All right. Good afternoon, y’all. Let's talk about blending reality and imagination. But first, let's take a step back in time to 2001. As an 11-year-old in India, I became obsessed with computer graphics and visual effects. Of course, at that age, it meant making cheesy videos kind of like this. But therein began a foundational theme in my life: the quest to blend reality and imagination. And that quest has stayed with me and permeated my decade-long career in tech, working as a product manager at companies like Google and as a content creator on platforms like YouTube and TikTok.
So today, let's deconstruct this quest to blend reality and imagination and explore how it’s getting supercharged -- buzzword alert -- by artificial intelligence. Let's start with the reality bit.
You've probably heard of photogrammetry. It's the art and science of measuring stuff in the real world using photos and other sensors. What required massive data centers and teams of experts in the 2000s became increasingly democratized by the 2010s. Then, of course, machine learning came along and took things to a whole new level with techniques like neural radiance fields, or NeRFs.
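To make the "measuring stuff from photos" idea concrete: the core operation in photogrammetry is triangulation, recovering a 3D point from where it appears in two calibrated photos. Here's a minimal sketch using the standard direct linear transform; the camera intrinsics, poses, and the point are all made up for illustration:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Recover a 3D point from its pixel positions in two calibrated views (DLT)."""
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)      # null space of A holds the solution
    X = vt[-1]
    return X[:3] / X[3]              # dehomogenize

def project(P, X):
    """Pinhole projection of a 3D point into pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two cameras sharing intrinsics K: one at the origin, one shifted along x.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free observations the point is recovered essentially exactly; real photogrammetry pipelines do this for millions of matched features at once.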
What you're seeing here is an AI model creating a ground-up volumetric 3D representation using 2D images alone. But unlike older techniques for reality capture, NeRFs do a really good job of encapsulating the sheer complexity and nuance of reality. The vibe, if you will.
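The rendering step at the heart of a NeRF can be sketched in a few lines: each sample along a camera ray contributes color in proportion to its density and to how much light survives to reach the camera. A toy illustration (the densities and colors here are invented, not from a trained model):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite sampled colors along one ray (NeRF volume rendering)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)      # opacity of each sample
    # Transmittance: fraction of light surviving past all earlier samples.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights.sum()

# Two samples: a faint red wisp in front of a dense blue surface.
sigmas = np.array([0.5, 10.0])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
deltas = np.array([0.1, 0.1])
rgb, opacity = render_ray(sigmas, colors, deltas)
```

The actual model is a neural network queried at each sample point; training it from 2D images is what makes the captured "vibe" possible.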
Twelve months later, you can do all of this stuff using the iPhone in your pocket, using apps like Luma. It's like 3D screenshots for the real world. Capture anything once and reframe it infinitely in postproduction, so you can start building that collection of spaces, places and objects that you truly care about and conjure them up in your future creations.
So that's the reality bit. As NeRFs were popping off last year, the AI summer was also in full effect, with Midjourney, DALL-E 2, Stable Diffusion all hitting the market around the same time. But what I fell in love with was inpainting. This technique allows you to take existing imagery and augment it with whatever you like, and the results are photorealistically fantastic. It blew my mind because stuff that would have taken me like three hours in classical workflows I could pull off in just three minutes.
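The principle behind inpainting can be stated simply: generated content replaces only the masked region, while every unmasked pixel of the original stays untouched. Diffusion-based inpainting pipelines enforce this at each denoising step in latent space, but the compositing idea reduces to the sketch below (the helper name is mine, for illustration):

```python
import numpy as np

def composite_inpaint(original, generated, mask):
    """Blend generated pixels into the masked region; keep original elsewhere.

    mask is 1 where new content goes, 0 where the original must be preserved.
    """
    m = mask[..., None].astype(float)   # broadcast mask over RGB channels
    return m * generated + (1.0 - m) * original

original = np.zeros((2, 2, 3))          # a tiny all-black "photo"
generated = np.ones((2, 2, 3))          # imagined all-white content
mask = np.array([[1, 0],
                 [0, 0]])               # only the top-left pixel is inpainted
result = composite_inpaint(original, generated, mask)
```

That hard guarantee on unmasked pixels is why the results stay photorealistic: the model only ever has to blend new content into a real photograph.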
But I wanted more. Enter ControlNet, a game-changing technique by Stanford researchers that allows you to use various input conditions to guide and control the AI image generation process. So in my case, I could take the depth information and the texture detail from my 3D scans and use it to literally reskin reality.
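Conceptually, the depth channel from a 3D scan just needs to be packaged as a conditioning image before it can steer generation. A minimal sketch of that preprocessing step (the function name is illustrative; depth-conditioned ControlNet pipelines consume a normalized image like the one produced here):

```python
import numpy as np

def depth_to_control_image(depth):
    """Normalize a metric depth map into an 8-bit, 3-channel control image."""
    d = depth.astype(float)
    d = (d - d.min()) / max(d.max() - d.min(), 1e-8)   # scale to [0, 1]
    img = (d * 255).astype(np.uint8)
    return np.stack([img] * 3, axis=-1)                # replicate to RGB

# A tiny fake depth map, in meters, standing in for a real 3D scan.
depth = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
control = depth_to_control_image(depth)
```

The generator is then free to invent any texture it likes, as long as the result stays consistent with this depth layout, which is what "reskinning reality" amounts to.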
Now, this isn't just cool video. There are a lot of useful use cases, too. For example, in this case I'm taking a 3D scan of my parents' drawing room, as my mother likes to call it, and reskinning it to different styles of Indian decor, and doing so while respecting the spatial context and the layout of the interior space. If you squint, I'm sure you can see how this is going to transform architecture and interior design forever.
You could take that 2016 scan of a Buddha statue and reskin it to be gloriously golden while pulling off these impossible camera moves you just couldn't do any other way. Or you could take that vacation footage from your trip to Tokyo and bring these cherry blossoms to life in a whole new way. And let me tell you, cherry blossoms look really good during the day, but they look even better at night. Oh, my God. They sure are glowing.
It's almost like this dreamlike quality where you can use AI to accentuate the best aspects of the real world. Natural landscapes look just as beautiful. Like this waterfall that could be on another planet. But of course, you could go over the hills and far away to the French Alps from another dimension.
But it's not just static scenes. You can do this stuff with video, too. I can't wait till this technology is running at 30 frames per second because it's going to transform augmented reality and 3D rendering. I mean, how soon until we're channel-surfing realities layered on top of the real world?
Of course, just like reality capture got democratized, all these tools from last year are getting even easier. So instead of me spending hours weaving together a bunch of different tools, tools like Runway and Kaiber let you do exactly the same stuff with just a couple clicks. Want to go from day to night? No problemo. Want to get that retro 90s aesthetic from "Full House"? You can do that too.
But it goes beyond reality capture. Companies like Wonder Dynamics are turning video into this immaculate form of performance capture so you can embody fantastical creatures using the phone in your pocket. This is stuff that James Cameron only dreamt about in the 2000s. And now you could do it with your iPhone? That’s absolutely wild to me.
So when I look back at the past two decades and this ill-tailored tapestry of tools that I've had to learn, I feel a sense of optimism for what lies ahead for the next generation of creators. The 11-year-olds of today don't have to worry about all of that crap. All they need to do is have a creative vision and a knack for working in concert with these AI models, these AI models that are truly a distillation of human knowledge and creativity. And that's a future I'm excited about, a future where you can blend reality and imagination with your trusty AI copilot.
Thank you very much.
(Applause)