Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.
Hello.
(Applause)
What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.
So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.
Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.
But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.
About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.
(Laughter)
Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.
(Laughter)
You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.
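The live loop he describes -- one camera frame in, a full description of the face out in about 16 milliseconds, handed to the renderer -- can be sketched roughly like this. Everything here is illustrative, not Digital Domain's actual pipeline: the array sizes, the parameter names, and the single dense layer standing in for their trained deep network are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the trained deep network: one dense layer mapping a
# flattened camera frame to a vector of face-rig parameters (expression,
# wrinkle and blood-flow weights, eyelash pose, ...). The real system
# uses a deep neural network trained on an enormous capture dataset.
FRAME_PIXELS = 64 * 64   # illustrative low-res input
FACE_PARAMS = 128        # illustrative parameter count
weights = rng.standard_normal((FACE_PARAMS, FRAME_PIXELS)) * 0.01

def infer_face_params(frame: np.ndarray) -> np.ndarray:
    """Map one camera frame to face-rig parameters (toy version)."""
    return np.tanh(weights @ frame.ravel())

# One step of the live loop: capture -> infer -> hand off to the renderer.
frame = rng.random((64, 64))          # pretend this came from the helmet camera
params = infer_face_params(frame)
print(params.shape)                   # one parameter vector per frame: (128,)
```

In the real system this step would run on every video frame, and the resulting parameters would drive the high-detail face model captured on the light stage.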
We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.
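For a sense of scale, the two timing figures in the talk can be put side by side: the 16 ms quoted earlier is the network inference alone, while the sixth of a second here is the whole capture-to-display path. The split of the remainder is my back-of-the-envelope arithmetic, not a breakdown from the talk.

```python
inference_ms = 16                 # neural-network inference time quoted earlier
end_to_end_ms = 1000 / 6          # the "sixth of a second" capture-to-display delay
other_ms = end_to_end_ms - inference_ms  # left for capture, transport, rendering

print(round(end_to_end_ms))  # -> 167
print(round(other_ms))       # -> 151
```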
(Laughter)
But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.
(Laughter)
How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.
(Laughter)
Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.
But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video, like how you frame a shot, that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfakes are doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.
I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.
And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.
Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.
(Laughter)
So this is what I really look like. Let me put this back on here.
(Laughter) Doink!
So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.
Thank you very much.
(Applause)