Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.
Hello.
(Applause)
What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.
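The live pipeline just described — a face camera feeding a learned model whose output drives a 3-D character — can be sketched per frame roughly as follows. This is a minimal illustrative sketch, not the studio's actual system; the names (`predict_expression`, `CharacterRig`) and the blendshape-weight representation are assumptions, and the model here is a stub so the sketch runs.

```python
import numpy as np

NUM_BLENDSHAPES = 52  # assumption: a typical facial blendshape count

def predict_expression(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the trained network: maps a camera frame to
    blendshape weights in [0, 1]. A fixed mapping is used here only
    so the sketch executes."""
    return np.clip(frame.mean() * np.ones(NUM_BLENDSHAPES), 0.0, 1.0)

class CharacterRig:
    """Minimal stand-in for the digital character being driven."""
    def __init__(self):
        self.weights = np.zeros(NUM_BLENDSHAPES)

    def apply(self, weights: np.ndarray) -> None:
        # In a real system this would deform the character's mesh.
        self.weights = weights

def run_frame(rig: CharacterRig, frame: np.ndarray) -> None:
    # One iteration of the live loop: camera frame -> expression
    # parameters -> character rig.
    rig.apply(predict_expression(frame))

rig = CharacterRig()
frame = np.full((480, 640), 0.5)  # fake grayscale camera frame
run_frame(rig, frame)
print(rig.weights.shape)  # (52,)
```

The point of the structure is that the character and the performer are decoupled: the same per-frame expression parameters can drive any rig, which is what later lets the speaker swap in a different character.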
So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.
Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.
But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.
About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.
(Laughter)
Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.
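The light stage works because light transport is linear: photograph the face once per individually controlled light, and the face's appearance under any new lighting environment is a weighted sum of those per-light captures. A toy sketch of that idea, with entirely synthetic data standing in for the captured photographs:

```python
import numpy as np

# Synthetic stand-ins for "one light at a time" captures: one tiny
# image per individually controlled light on the stage.
rng = np.random.default_rng(0)
num_lights = 8
h, w = 4, 4  # a tiny stand-in for a face photograph
olat_images = rng.random((num_lights, h, w))  # values in [0, 1]

def relight(olat: np.ndarray, light_weights: np.ndarray) -> np.ndarray:
    """Weighted sum of per-light captures = appearance under the
    new lighting environment described by light_weights."""
    return np.tensordot(light_weights, olat, axes=1)

# Example environment: one key light at full power, others dim or off.
env = np.array([1.0, 0.2, 0.2, 0.0, 0.0, 0.0, 0.1, 0.1])
relit = relight(olat_images, env)
print(relit.shape)  # (4, 4)
```

With enough lights, this is how a single capture session yields the face under "a myriad of lighting conditions," which is what makes the resulting model hold up in any shot.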
(Laughter)
You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.
We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.
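The two numbers quoted — 16 milliseconds per network inference and a sixth-of-a-second capture-to-display delay — are worth a quick sanity check: 16 ms supports roughly 60 frames per second, and the network accounts for only about a tenth of the end-to-end delay, with the rest living in capture, transport, rendering, and display. This is just arithmetic on the figures in the talk, not a measured breakdown:

```python
# Sanity-check the latency figures quoted in the talk.
inference_ms = 16.0
end_to_end_ms = 1000.0 / 6.0  # "a sixth of a second" ~= 167 ms

max_fps = 1000.0 / inference_ms          # frame rate the network alone allows
network_share = inference_ms / end_to_end_ms  # network's slice of total delay

print(f"max inference rate: {max_fps:.1f} fps")        # 62.5 fps
print(f"network share of delay: {network_share:.0%}")  # 10%
```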
(Laughter)
But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.
(Laughter)
How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.
(Laughter)
Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.
But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video, like how you frame a shot, that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake is doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.
I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.
And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.
Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.
(Laughter)
So this is where we are. Let me put this back on here.
(Laughter) Doink!
So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.
Thank you very much.
(Applause)