Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

سلام. من یک انسان واقعی نیستم. من در حقیقت کپی یک انسان واقعی هستم. اگر چه خیلی واقعی به نظر می‌آیم. توضیح آن کمی سخت است. صبر کنید -- فکر کنم یک انسان واقعی دیدم .. یکی آنجاست. بگذارید او را به صحنه بیاورم.

Hello.

سلام.

(Applause)

(تشویق حضار)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

چیزی که آنجا می‌بینید یک انسان دیجیتالی است. من لباس ضبط سه بعدی حرکات بدن پوشیده‌ام که حرکات بدنم را بازسازی می‌کند. من در اینجا تنها یک دوربین دارم که رو به صورت من است و حالاتی که از من دریافت می‌کند، مانند «اوه، هم،» را به نرم‌افزارهای یادگیری ماشینی می‌رساند، و آنها را به آن فرد انتقال می‌دهد. ما او را «دیجی داگ» می‌نامیم. او یک شخصیت سه بعدی است که من بصورت همزمان و زنده او را کنترل می‌کنم.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

خوب، من در جلوه‌های ویژه کار می‌کنم. در جلوه‌های ویژه، یکی از سخت‌ترین کارها این است که انسان‌های دیجیتالی باورپذیر خلق کنیم که مخاطب آنها را همچون انسان واقعی بپذیرد. مردم واقعا در شناخت دیگر انسان‌ها خوب هستند. فکرش را بکنید! خوب عیبی ندارد، ما چالش را دوست داریم.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

طی ۱۵ سال گذشته، انسان‌ها و موجوداتی را در فیلم گنجانده‌ایم که شما به عنوان واقعی می‌پذیرید. اگر خوشحال بودند، شما هم باید احساس خوشحالی کنید. و اگر درد بکشند، شما هم باید با آنها همدردی کنید. ما در این کار هم خوب هستیم. اما واقعا، واقعا کار سختی است. جلوه‌هایی مثل این هزاران ساعت کار و صدها هنرمند بااستعداد می‌طلبد.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.

اما اوضاع عوض شده است. در طی پنج سال گذشته، کامپیوترها و کارت‌های گرافیکی خیلی سریع شده‌اند. یادگیری ماشین و یادگیری عمیق بوجود آمده است. ما از خودمان پرسیدیم: آیا قرار است که یک تصویر واقع‌گرایانه از انسان بسازیم، مانند کاری که در فیلم می‌کنیم، اما آنجا شما احساسات واقعی و جزئیات همان فردی را می‌بینید که بطور همزمان انسان دیجیتالی را کنترل می‌کند؟ هدف ما در حقیقت این بود: اگر مکالمه‌ای با دیجی داگ داشتید، رو در رو، آنقدر واقعی به نظر برسد که شما بتوانید بگویید که آیا به شما دروغ می‌گفتم یا نه؟ پس هدف ما این بود.

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

در حدود یک سال و نیم پیش، ما سعی کردیم که به این هدف برسیم. حالا می‌خواهم که شما را به سفر کوتاهی ببرم تا ببینید که دقیقا ما باید چه می‌کردیم تا به این جایی که هستیم برسیم. ما باید حجم عظیمی از اطلاعات را جمع آوری می‌کردیم. در حقیقت، تا انتهای این کار، ما احتمالا یکی از بزرگترین داده‌های چهره‌ای در روی زمین را داشتیم. البته از چهره‌ی خودم.

(Laughter)

(خنده حضار)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.

چرا من؟ خوب، من برای علم هر کاری می‌کنم. منظورم این است، به من نگاه کنید! خوب، دیدید. ما باید اول تخمین می‌زدیم که صورت من دقیقا چه شکلی است. نه فقط یک عکس یا یک اسکن سه بعدی، بلکه آن چیزی که درهرعکسی به نظر می‌رسید، و تاثیری که نور روی پوست من می‌گذاشت. خوشبختانه، سه بلوک آن طرف‌تر از استودیوی ما در لس آنجلس این مکان قرار دارد که نامش ICT است. یک آزمایشگاه تحقیقاتی که با دانشگاه کالیفرنیای جنوبی همکاری دارد. آنها درآنجا وسیله‌ای به نام «صحنه‌ی نمایشِ نور» دارند. این وسیله تعداد زیادی چراغ دارد که بصورت جداگانه کنترل می‌شوند و تعداد زیادی دوربین آنجاست. با این وسیله، ما می‌توانیم صورتم را در نورپردازی‌های متفاوت بازسازی کنیم. ما حتی جریان خون را هم ضبط کردیم. و اینکه چطوروقتی حالتی به خود می‌گیرم، صورتم تغییر می‌کند. این کار به ما اجازه داد تا مدلی از صورتم بسازیم، که صادقانه بگویم، واقعا جالب بود. فقط متاسفانه یک سری جزئیات ناخوشایند را هم نشان می‌داد.

(Laughter)

(خنده حضار)

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.

شما می‌توانید تمام روزنه‌ها و چروک‌های صورتم را ببینید. اما ما باید آنها را هم داشته باشیم. جزئیات است که باعث واقعی به نظر رسیدن می‌شود. و بدون جزئیات، موفق نمی‌شوید. اگرچه تا اتمام آن راه درازی داریم. این تکنولوژی به ما اجازه داد تا مدلی از صورتم بسازیم که شبیه من بود. اما دقیقا مانند من حرکت نمی‌کرد. و اینجاست که یادگیری ماشینی وارد می‌شود. یادگیری ماشینی به یک عالمه داده نیاز دارد، پس من روبروی ابزارهای ضبط حرکات با رزولوشن بالا نشستم. ما همچنین ضبط حرکت سنتی با نشانگرها را هم انجام دادیم. ما یک سری کامل عکس از صورت من داشتیم و انبوهی از نقاط متحرک که شکل صورتم را نشان می‌دادند. من با صورتم حالت‌های مختلف درآوردم، عبارات مختلفی را با حالات احساسی متفاوت گفتم. ما باید با این تصاویر زیادی می‌گرفتیم. وقتی که این حجم انبوه داده‌ها را بدست آوردیم، شبکه‌های عصبی عمیقی ساختیم و آموزش دادیم. وقتی کارمان با آن تمام شد، در ۱۶ میلی‌ثانیه، شبکه عصبی می‌تواند به تصویر من نگاه کند و از همه چیز صورت من سر در بیاورد. این شبکه می‌تواند حالت، چین‌وچروک، و جریان خون مرا محاسبه کند -- حتی اینکه چطور مژه‌های من تکان می‌خورد. سپس در اینجاست که با تمام جزئیاتی که قبلا ضبط کرده‌ایم بازسازی و نمایش داده می‌شود.

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and when we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.

تا اتمام آن خیلی فاصله داریم. قسمت زیادی از این کار در دست اجراست. در حقیقت این اولین باری است که آن را بیرون ازشرکت‌مان نمایش می‌دهیم. می‌دانید آنقدر که ما می‌خواهیم قانع‌کننده به نظر نمی‌آید؛ سیم‌هایی از پشت من بیرون می‌آید، بین زمان ضبط ویدئو و نمایش آن، یک ششم ثانیه تاخیر وجود دارد. یک ششم ثانیه -- خیلی خوب است! اما به همین خاطر است که کمی صدا را با اکو می‌شنوید. این چنین یادگیری ماشین برای ما جدید است، گاهی اوقات سخت است که قانع شویم تا کار درست را انجام دهیم، متوجه منظورم هستید؟ اوضاع خراب‌تر می‌شود.
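The two timing figures quoted above, 16 milliseconds for the neural network and a sixth of a second from capture to display, can be put side by side with a little arithmetic. The following is a minimal sketch, not a description of the actual system: it assumes one network pass per displayed frame, which the talk does not state, and the function names are hypothetical.

```python
# Figures quoted in the talk.
INFERENCE_MS = 16.0           # network time per face image, in milliseconds
CAPTURE_TO_DISPLAY_S = 1 / 6  # delay between capturing video and showing it


def frames_per_second(ms_per_frame: float) -> float:
    """Frame rate if every frame costs ms_per_frame.

    Assumption (not from the talk): one network pass per displayed frame.
    """
    return 1000.0 / ms_per_frame


def frames_of_delay(delay_s: float, ms_per_frame: float) -> float:
    """How many frame intervals fit inside the end-to-end delay."""
    return delay_s * 1000.0 / ms_per_frame


fps = frames_per_second(INFERENCE_MS)
lag_frames = frames_of_delay(CAPTURE_TO_DISPLAY_S, INFERENCE_MS)

print(f"{fps:.1f} frames per second")
print(f"{lag_frames:.1f} frames of delay")
```

Under that assumption, 16 milliseconds per frame is a little over 60 frames per second, and the sixth-of-a-second delay is roughly ten frames. That is consistent with the network being only one stage among several (capture, rendering, display), though the talk does not break the delay down.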

(Laughter)

(خنده حضار)

But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.

اما چرا این کار را می‌کنیم؟ خوب، در واقع دو دلیل دارد. اول از همه، چون معرکه است.

(Laughter)

(خنده حضار)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

چقدر معرکه است؟ خوب با فشار یک دکمه، می‌توانم این گفتگو را از طریق یک شخصیت کاملا متفاوت انجام بدهم. این البور است. ما او را خلق کردیم تا ببینیم که با یک ظاهر متفاوت چطور کار می‌کند. جذابیت این تکنولوژی در این است که اگرچه من شخصیتم را تغییر می‌دهم، هنوز عملکرد من یکسان است. من تمایل دارم تا با طرف راست دهانم حرف بزنم؛ البور هم همینطور.

(Laughter)

(خنده حضار)

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

حالا، دلیل دوم اینکه چرا ما این کار را کردیم این است که این کار برای فیلم‌سازی خیلی خوب است. این تکنولوژی برای بازیگران، کارگردانان و داستان‌گویان خیلی جدید و جالب است. کاملا واضح است، اینطور نیست؟ یعنی خیلی خوب می‌شود اگراین تکنولوژی را داشته باشند اما حالا که آن را ساخته‌ایم، روشن است که فقط به فیلم‌سازی محدود نخواهد شد.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video like how you frame a shot that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake is doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.

اما صبر کنید. با فشار یک دکمه من هویت خودم را تغییر ندادم؟ آیا این تکنولوژی شبیه تغییر قیافه و «دیپ فیک» (جعل عمیق) نیست که شما با آن آشنا هستید؟ خوب بله. درحقیقت، ما از همان تکنولوژی‌ای استفاده می‌کنیم که دیپ‌فیک از آن استفاده می‌کند. دیپ‌فیک دوبعدی است و بر مبنای تصویر کار می‌کند، در صورتی که ساخت ما سه بعدی است و بسیار قوی‌تر است. اما آنها خیلی به هم مرتبط‌ اند. می‌توانم بشنوم که در ذهنتان می‌گویید، «وای! فکر می‌کردم که حداقل می‌توانم به آنچه که در ویدیو می‌بینم اعتماد کنم و آن را باور کنم. اگر ویدئو زنده باشد، دیگر نباید به آن اعتماد کنم؟» خوب، در واقع می‌دانیم که اینطور نیست، درسته؟ حتی بدون این تکنولوژی، کلک‌های ساده‌ای وجود دارد که شما می‌توانید با ویدیو همان کاری را کنید که در تدوین عکس انجام می‌دهید که باعث می‌شود آنچه که واقعا اتفاق می‌افتد را تحریف کند. من مدت زیادی در جلوه‌های ویژه کار می‌کردم، و به این معروفم که می‌توانم با جلوه‌های لازم هرکسی را درباره هرچیزی به اشتباه بیاندازم. کاری که این تکنولوژی و دیپ‌فیک انجام می‌دهد این است که دستکاری ویدیویی را سریعتر و راحتتر می‌کند. درست همان کاری که چند سال پیش فتوشاپ برای تغییر تصاویر انجام می‌داد.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

من ترجیح می‌دهم که به این فکر کنم که چگونه این تکنولوژی می‌تواند ویژگی‌های انسانی را وارد سایر فن‌آوری‌ها کند تا ما به هم نزدیک‌تر شویم. حالا که این تکنولوژی را دیدید، درباره قابلیت‌های آن هم فکر کنید. در ابتدا، شما درکنسرت‌ها و رویدادهای زنده آن را به این صورت خواهید دید. سلبریتی‌های دیجیتالی، به خصوص با تکنولوژی جدید نمایشی، درست مثل فیلم‌ها خواهند بود، اما زنده و بصورت همزمان. اشکال جدیدی از ارتباطات در راه است. شما هم‌اکنون هم می‌توانید با دیجی‌داگ در واقعیت مجازی (VR) تعامل داشته باشید. این تکنولوژی بسیار راه‎‌گشاست. انگار که من و شما در یک اتاق هستیم، در حالیکه کیلومترها از هم فاصله داریم. دفعه بعدی که تماس ویدئویی برقرار کنید، قادر خواهید بود تا تصویر دیگری برای خودتان انتخاب کنید، که دوست دارید مردم آن را ببینند. این دقیقا مثل یک آرایش خیلی خیلی خوب است. من یک سال و نیم پیش اسکن شدم. سن من بالا رفته است. اما سن دیجی‌داگ نه. در تماس‌های ویدیویی، من مجبور نیستم پیرتر بشوم.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

و همانطور که می‌توانید تصور کنید، این تکنولوژی قرار است برای دادن بدن و صورت به دستیاران مجازی مورد استفاده قرار بگیرد. یک ویژگی انسانی. همین حالا هم دوست دارم که وقتی با دستیاران مجازی صحبت می‌کنم آنها هم با صدایی به نرمی صدای انسان به من پاسخ دهند. حالا آنها تصویر هم خواهند داشت. و شما همه‌ی نشانه‌های غیرزبانی که ارتباط را آسان‌تر می‌کند در اختیار خواهید داشت. بسیار خوب خواهد شد. شما قادر خواهید بود که بگویید چه زمانی دستیار مجازی، گیج شده یا سرش شلوغ است یا اینکه درباره چیزی نگران است.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

حالا نمی‌توانستم صحنه را بدون اینکه شما چهره واقعی من را ببینید ترک کنم، پس شما می‌توانید کمی مقایسه کنید. بگذارید کلاهم را از سرم بردارم. بله، نگران نباشید، ظاهرش خیلی بدتر از حسی است که دارد.

(Laughter)

(خنده حضار)

So this is where we are. Let me put this back on here.

خوب ما اینجا هستیم. بگذارید تا این را سر جایش بگذارم.

(Laughter) Doink!

(خنده حضار) (دینک)

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

خوب پس ما اینجا هستیم. ما در شرف این هستیم که بتوانیم با انسان‌های دیجیتالی ارتباط برقرار کنیم که به طرز عجیبی واقعی هستند، خواه توسط انسان کنترل شوند یا ماشین. مانند تمام تکنولوژی‌های امروز، برخی نگرانی‌های جدی و واقعی نیز در پی خواهد داشت که باید با آنها روبرو شویم. اما من واقعا هیجان‌زده هستم که این تکنولوژی این توانایی را دارد چیزهایی را که تنها در داستان‌های علمی-تخیلی در طول زندگی دیده‌ام به واقعیت درآورد. برقراری ارتباط با کامپیوتر مانند صحبت کردن با یک دوست خواهد شد. صحبت با دوستانی که در راه دور هستند مثل این خواهد شد که انگار با آنها در یک اتاق نشسته‌ایم.

Thank you very much.

خیلی متشکرم.

(Applause)

(تشویق حضار)
