Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

Hello.

(Applause)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a one-on-one conversation with DigiDoug, would it be real enough that you could tell whether or not I was lying to you?

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

(Laughter)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT, the Institute for Creative Technologies. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.

(Laughter)
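
A minimal sketch of why a light stage can reconstruct a face under many lighting conditions, assuming the common one-light-at-a-time (OLAT) capture approach; the talk does not say exactly how this pipeline used its light-stage data. Because light adds linearly, a photo under any mix of the stage's lights can be approximated as a weighted sum of the photos taken with each light switched on by itself. The function name `relight`, the four-pixel "images", and the three-light stage are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical toy representation: a grayscale "image" is a flat list of pixel intensities.
Image = List[float]


def relight(basis: Sequence[Image], weights: Sequence[float]) -> Image:
    """Synthesize a new lighting condition from one-light-at-a-time captures.

    basis[i] is the image captured with only light i switched on.
    weights[i] is how bright light i should be in the target lighting.
    Light transport is linear, so the result is the weighted sum of the basis images.
    """
    if len(basis) != len(weights):
        raise ValueError("need exactly one weight per basis image")
    if not basis:
        raise ValueError("need at least one basis image")
    n_pixels = len(basis[0])
    out = [0.0] * n_pixels
    for image, w in zip(basis, weights):
        if len(image) != n_pixels:
            raise ValueError("all basis images must be the same size")
        for p, value in enumerate(image):
            out[p] += w * value
    return out


# Hypothetical three-light stage capturing a four-pixel face.
olat = [
    [0.8, 0.2, 0.0, 0.1],  # only the left light on
    [0.1, 0.7, 0.6, 0.0],  # only the front light on
    [0.0, 0.1, 0.3, 0.9],  # only the right light on
]

# A full-strength key light from the left plus a half-strength fill from the right.
print(relight(olat, [1.0, 0.0, 0.5]))
```

Real light stages use far more lights and full-resolution photographs, but the same weighted sum is what lets a single capture session stand in for lighting setups that were never physically photographed.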

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of a high-resolution motion-capture device. We also did traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and when we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, and sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.

(Laughter)
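
The two timing figures in the talk, 16 milliseconds for the neural network and a sixth of a second from capture to display, are easier to compare as video frames. A minimal sketch, assuming a 60 frames-per-second display rate (the talk does not state one); the helper names `frame_budget_ms` and `latency_in_frames` are hypothetical.

```python
from fractions import Fraction


def frame_budget_ms(fps: int) -> float:
    """Time available to produce one frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def latency_in_frames(latency_s: Fraction, fps: int) -> Fraction:
    """Number of display frames spanned by a given end-to-end delay."""
    return latency_s * fps


FPS = 60                       # assumed display rate; not stated in the talk
INFERENCE_MS = 16              # per-image network time quoted in the talk
END_TO_END_S = Fraction(1, 6)  # capture-to-display delay quoted in the talk

print(f"per-frame budget: {frame_budget_ms(FPS):.2f} ms")
print(f"network step fits in one frame: {INFERENCE_MS <= frame_budget_ms(FPS)}")
print(f"end-to-end delay: {float(END_TO_END_S) * 1000:.1f} ms "
      f"= {latency_in_frames(END_TO_END_S, FPS)} frames")
```

Under that assumed rate, the 16 ms network step fits inside a single frame, while the sixth-of-a-second delay spans 10 frames, so roughly 150 ms of it would have to come from other stages of the pipeline, which the talk does not break down.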

But why did we do this? Well, there are two reasons, really. First of all, it is just crazy cool.

(Laughter)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

(Laughter)

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image-based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video, like how you frame a shot, that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake are doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images some time ago.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

(Laughter)

So this is where we are. Let me put this back on here.

(Laughter) Doink!

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

Thank you very much.

(Applause)
