All right. Good afternoon, y’all. Let's talk about blending reality and imagination. But first, let's take a step back in time to 2001. As an 11-year-old in India, I became obsessed with computer graphics and visual effects. Of course, at that age, it meant making cheesy videos kind of like this. But therein started a foundational theme in my life, the quest to blend reality and imagination. And that quest has stayed with me and permeated my decade-long career in tech, working as a product manager at companies like Google and as a content creator on platforms like YouTube and TikTok.
So today, let's deconstruct this quest to blend reality and imagination and explore how it’s getting supercharged -- buzzword alert -- by artificial intelligence. Let's start with the reality bit.
You probably heard about photogrammetry. It's the art and science of measuring stuff in the real world using photos and other sensors. What required massive data centers and teams of experts in the 2000s became increasingly democratized by the 2010s. Then, of course, machine learning came along and took things to a whole new level with techniques like neural radiance fields, or NeRFs.
What you're seeing here is an AI model creating a ground-up volumetric 3D representation using 2D images alone. But unlike older techniques for reality capture, NeRFs do a really good job of encapsulating the sheer complexity and nuance of reality. The vibe, if you will.
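To make "volumetric 3D representation" concrete: a NeRF renders each pixel by sampling a learned density-and-color field along a camera ray and alpha-compositing the samples. Here's a minimal numpy sketch of just that compositing step; the density and color values below are made-up stand-ins for what a trained network would actually output.

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (the NeRF volume-rendering rule).

    densities: (N,) non-negative volume density sigma_i at each sample
    colors:    (N, 3) RGB emitted at each sample
    deltas:    (N,) distance between consecutive samples
    Returns the rendered RGB color for the ray's pixel.
    """
    # Opacity contributed by each sample: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance T_i: probability the ray reaches sample i unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas                     # per-sample blending weights
    return (weights[:, None] * colors).sum(axis=0)

# Toy ray: empty space, then a dense red surface, then a blue one behind it
densities = np.array([0.0, 50.0, 50.0])
colors = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
deltas = np.array([0.1, 0.1, 0.1])
pixel = composite_ray(densities, colors, deltas)
```

The pixel comes out almost entirely red: the dense red sample absorbs nearly all the light, so the blue sample behind it barely contributes. Training a NeRF is just optimizing the field so that millions of such rendered rays match the input 2D photos.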
Twelve months later, you can do all of this stuff using the iPhone in your pocket, using apps like Luma. It's like 3D screenshots for the real world. Capture anything once and reframe it infinitely in postproduction, so you can start building that collection of spaces, places and objects that you truly care about and conjure them up in your future creations.
So that's the reality bit. As NeRFs were popping off last year, the AI summer was also in full effect, with Midjourney, DALL-E 2, Stable Diffusion all hitting the market around the same time. But what I fell in love with was inpainting. This technique allows you to take existing imagery and augment it with whatever you like, and the results are photorealistically fantastic. It blew my mind because stuff that would have taken me like three hours in classical workflows I could pull off in just three minutes.
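The reason inpainting can leave the rest of the image untouched is a simple trick: the model repeatedly denoises the whole image, but after every step the known pixels are copied back in from the original, so only the masked region is free to change. A toy numpy sketch of that keep/replace loop, with a dummy "denoiser" standing in for the actual diffusion model:

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_denoise(img):
    # Stand-in for the diffusion model's denoising step:
    # nudge every pixel toward mid-gray.
    return img + 0.5 * (0.5 - img)

def inpaint(original, mask, steps=20):
    """Toy inpainting loop: mask==1 marks pixels to regenerate.

    Each iteration denoises the full image, then copies the known
    (unmasked) pixels back in from the original -- the same trick real
    inpainting pipelines use to keep untouched regions pixel-perfect.
    """
    img = rng.random(original.shape)              # start from pure noise
    for _ in range(steps):
        img = fake_denoise(img)
        img = mask * img + (1 - mask) * original  # re-impose known pixels
    return img

original = np.zeros((4, 4))                  # a black image
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                         # regenerate only the center patch
result = inpaint(original, mask)
```

After the loop, the border pixels are still exactly black, while the masked center has converged to whatever the "denoiser" hallucinated, here mid-gray. Swap in a real model like Stable Diffusion's inpainting variant and the hallucination becomes photorealistic.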
But I wanted more. Enter ControlNet, a game-changing technique by Stanford researchers that allows you to use various input conditions to guide and control the AI image generation process. So in my case, I could take the depth information and the texture detail from my 3D scans and use it to literally reskin reality.
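The "depth information" handed to ControlNet is usually just an 8-bit grayscale image, with near surfaces bright and far surfaces dark. A small sketch of normalizing a raw depth buffer (say, exported from a 3D scan) into that conditioning image; the exact preprocessing varies across pipelines, so treat the sentinel value and near-is-bright convention here as assumptions.

```python
import numpy as np

def depth_to_control_image(depth, invalid=0.0):
    """Normalize a raw depth buffer into the 0-255 grayscale image
    that depth-conditioned ControlNet models typically take as input.

    depth:   (H, W) float array of distances from the camera
    invalid: sentinel value for pixels with no depth reading (assumption)
    """
    valid = depth != invalid
    d_min, d_max = depth[valid].min(), depth[valid].max()
    # Map near -> 255 (bright), far -> 0 (dark)
    norm = (d_max - depth) / max(d_max - d_min, 1e-8)
    out = np.zeros_like(depth)
    out[valid] = norm[valid]
    return (out * 255).astype(np.uint8)

# Toy depth buffer: a near object (1 m) in front of a far wall (5 m)
depth = np.full((4, 4), 5.0)
depth[1:3, 1:3] = 1.0
control = depth_to_control_image(depth)
```

Feed an image like this as the conditioning input and the generator is free to repaint textures and style while the scene's geometry, the layout the depth map encodes, stays locked in place. That is what makes the "reskinning reality" trick work.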
Now, this isn't just cool video. There are a lot of useful use cases, too. For example, in this case I'm taking a 3D scan of my parents' drawing room, as my mother likes to call it, and reskinning it to different styles of Indian decor, and doing so while respecting the spatial context and the layout of the interior space. If you squint, I'm sure you can see how this is going to transform architecture and interior design forever.
You could take that 2016 scan of a Buddha statue and reskin it to be gloriously golden while pulling off these impossible camera moves you just couldn't do any other way. Or you could take that vacation footage from your trip to Tokyo and bring these cherry blossoms to life in a whole new way. And let me tell you, cherry blossoms look really good during the day, but they look even better at night. Oh, my God. They sure are glowing.
It's almost like this dreamlike quality where you can use AI to accentuate the best aspects of the real world. Natural landscapes look just as beautiful. Like this waterfall that could be on another planet. But of course, you could go over the hills and far away to the French Alps from another dimension.
But it's not just static scenes. You can do this stuff with video, too. I can't wait till this technology is running at 30 frames per second because it's going to transform augmented reality and 3D rendering. I mean, how soon until we're channel-surfing realities layered on top of the real world?
Of course, just like reality capture got democratized, all these tools from last year are getting even easier. So instead of me spending hours weaving together a bunch of different tools, apps like Runway and Kaiber let you do exactly the same stuff with just a couple of clicks. Want to go from day to night? No problemo. Want to get that retro 90s aesthetic from "Full House"? You can do that too.
But it goes beyond reality capture. Companies like Wonder Dynamics are turning video into this immaculate form of performance capture so you can embody fantastical creatures using the phone in your pocket. This is stuff that James Cameron only dreamt about in the 2000s. And now you could do it with your iPhone? That’s absolutely wild to me.
So when I look back at the past two decades and this ill-tailored tapestry of tools that I've had to learn, I feel a sense of optimism for what lies ahead for the next generation of creators. The 11-year-olds of today don't have to worry about all of that crap. All they need to do is have a creative vision and a knack for working in concert with these AI models, these AI models that are truly a distillation of human knowledge and creativity. And that's a future I'm excited about, a future where you can blend reality and imagination with your trusty AI copilot.
Thank you very much.
(Applause)