Doug Roble: Digital humans that look just like us

مرحبًا. أنا لست شخصًا حقيقيًا. في الواقع أنا نسخة لشخص حقيقي. وعلي الرغم من ذلك، أشعر كأني شخص حقيقي. من الصعب شرح ذلك. انتظروا.. أعتقد أنني رأيت شخصًا حقيقياً.. يوجد شخص! فلندعُه إلي المسرح.

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

مرحبًا.

Hello.

(تصفيق)

(Applause)

ما ترونه هناك هو إنسان رقمي. أرتدي بدلة مُلتقطة للحركة عن طريق القصور الذاتي تلتقط حركة جسدي. ولدي هنا كاميرا تراقب تعابير وجهي وتغذي آلة ذات نظام تعلمي تسجل تعابير وجهي، مثل: هممم، ممم، مم وتنقلها إلي ذلك الرجل. نطلق عليه اسم: "ديجي دوج". في الواقع، هو شخصية ثلاثية الأبعاد والتي أتحكم بها مباشرة.

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

أعمل في مجال المؤثرات البصرية. وفي هذا المجال، واحدة من أصعب الأشياء هي صناعة بشر رقميين واقعيين يتقبلهم الجمهور كبشر حقيقيين. الناس بارعون بالفعل في التعرف علي الناس الآخرين. اذهب واكتشف! إذن، هذا جيد، نحب المنافسة.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

علي مدار الخمسة عشر عامًا الماضية قمنا بوضع البشر والكائنات الحية في فيلم لتتقبلهم كحقيقيين. إن كانوا سعداء، يتعين عليك أن تكون سعيدًا. وإن تألموا، يتعين عليك أن تتعاطف معهم. وأصبحنا جيدين للغاية في ذلك. ولكن هذا الأمر صعب للغاية. مؤثرات مثل هذه تتطلب آلاف ساعات العمل ومئات من الفنانين الموهوبين بشدة.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

ولكن العالم تغير. في السنوات الخمس الأخيرة، أجهزة الكمبيوتر وكروت الرسوميات أصبحت سريعة للغاية. وتعلم الآلة، التعلم العميق قد تحقق. لذلك نسأل أنفسنا: هل تفترض أنه يمكننا صنع إنسان بصورة واقعية تمامًا، مثلما نفعل في الفيلم، ولكن الفرق أنك ترى الانفعالات الحقيقية والتفاصيل للشخص الذي يتحكم في الإنسان الرقمي مباشرةً؟ في الحقيقة، هذا هو هدفنا: إذا قمت بخوض محادثةٍ مع "ديجي دوج" وجهًا لوجه هل تكون واقعية كفاية بحث يمكنك الجزم بأني أكذب عليك أم لا؟ كان هذا هو هدفنا.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.

منذ قرابة العام ونصف انطلقنا لتحقيق هذا الهدف. ما سأفعله الآن هو أخذك في جزء صغير من رحلة لترى بالضبط ما توجب علينا فعله لنصل إلى ما نحن عليه. قمنا بجمع كم هائل من المعلومات. في الواقع، بنهاية هذا المشروع من الممكن أن يكون لدينا واحدة من أكبر قواعد البيانات الوجهية في العالم. لوجهي.

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

(ضحكات)

(Laughter)

لماذا أنا؟ حسنًا، سوف أفعل أي شيء للعلم. أعني، انظر إليّ! أعني، هيا! في البداية، يجب أن نعرف كيف يبدو وجهي. ليس فقط صورة فوتوغرافية أو صورة ثلاثية الأبعاد، ولكن كيف يبدو وجهي في أي صورة، وكيف يتفاعل الضوء مع جلدي. لحسن حظنا، على بعد ثلاثة عمارات سكنية من الاستوديو الخاص بنا في لوس أنجلوس في مكان يدعي بمعهد التكنولوجيات الإبداعية. يوجد هناك معمل تابع لجامعة جنوب كاليفورنيا. لديهم جهاز يسمي بـ (لايت ستيج). لديه زيلليون وحدة ضوء منفردة التحكم ومجموعة كاميرات كاملة. وبواسطة تلك المعدات، نستطيع إعادة بناء وجهي تحت ظروف الاضاءة الوافرة، لدرجة أننا التقطنا سيلان الدم وكيف يتغير وجهي عند قيامي بالتعبيرات. ساعدنا هذا على بناء نموذج لوجهي وهو بصراحة رائع جدًا. للأسف يملك وجهي قدراً سيئاً من التفاصيل.

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.

(ضحكات)

(Laughter)

يمكنك أن تري كل فتحة مسام، كل تجعيدة. وجب علينا فعل ذلك. الواقعية هي التفاصيل. وبدون التفاصيل، تفقد الواقعية. نحن بعيدون جدًأ عن تمام المشروع. مكننا هذا من بناء نموذج لوجهي يبدو مثلي تمامًا. ولكن لا يتحرك مثلي تمامًا. وها هنا يأتي دور التعلم الآلي. ونظام التعلم الآلي يحتاج إلي أطنان من المعلومات. ولهذا جلست أمام جهاز فائق الدقة ملتقط للحركة. قمنا أيضاً برسم اعتيادي لالتقاط الحركة. صنعنا كماً هائلاً من الصور لوجهي والنقاط المتحركة التي تشكّل وجهي. يا رجل، أقوم بالعديد من التعابير. قلت العديد من الجُمل بحالات عاطفية مختلفة. اضطررنا إلي التقاط العديد من الصور في هذا الوضع. وعندما حصلنا على هذا الكم الهائل من المعلومات، بنينا شبكات عصبية عميقة ودربناها. وعندما انتهينا من ذلك، في 16 ميللي ثانية، تستطيع الشبكة العصبية النظر إلي وجهي واكتشاف كل شيء حول وجهي. يمكنها برمجة تعابيري، تجاعيدي، تدفق الدم، حتي كيفية حركة الرموش. هذه المعلومات تُعالج وتُعرض هناك وبكل تلك المعلومات التي التقطناها سابقًا.

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.

نحن بعيدون جدًا عن التمام. يوجد الكثير من العمل قيد التنفيذ. في الواقع، هذه هي المرة الأولي التي نعرض فيها تلك البيانات خارج الشركة. وكما ترون، ليست تبدو مُقنعة كما نريد. لدي أسلاك تخرج من ظهري ويوجد تأخر بحوالي سدس من الثانية بين وقت التقاط الفيديو وعندما نعرضه هناك. سدس من الثانية-- ذلك جيد بجنون! ولكن يظل هناك سماعنا لصدى الصوت وتلك الأشياء. وكما تعلمون، نظام التعلم الآلي هذا جديد علينا كليًا، في بعض الأوقات يكون من الصعب الإقناع للقيام بالأمر الصواب، أليس كذلك؟ تنحرف الأمور عن مسارها قليلاً.

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and when we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.
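To put the two timing figures from the talk side by side (16 milliseconds for the neural network, a sixth of a second from capture to display), here is a rough back-of-the-envelope sketch. The 60 frames-per-second display rate is an assumption for illustration and is not stated in the talk. So is the idea that the network runs once per frame on the capture-to-display path.

```python
# Figures quoted in the talk.
INFERENCE_MS = 16.0          # time for the neural network to process one image
END_TO_END_DELAY_S = 1 / 6   # delay between capturing video and displaying it

# Assumption (not from the talk): the display runs at 60 frames per second.
ASSUMED_FPS = 60


def max_frame_rate(ms_per_frame: float) -> float:
    """Highest frame rate sustainable if each frame costs ms_per_frame."""
    return 1000.0 / ms_per_frame


def delay_in_frames(delay_s: float, fps: float) -> float:
    """How many displayed frames elapse during a delay of delay_s seconds."""
    return delay_s * fps


inference_fps = max_frame_rate(INFERENCE_MS)
delay_ms = END_TO_END_DELAY_S * 1000.0
frames_behind = delay_in_frames(END_TO_END_DELAY_S, ASSUMED_FPS)

# Delay not explained by one inference pass (capture, rendering, transport, ...),
# under the assumption that inference happens once on the path.
other_ms = delay_ms - INFERENCE_MS

print(f"Inference alone allows about {inference_fps:.1f} frames per second")
print(f"End-to-end delay is about {delay_ms:.0f} ms, "
      f"or roughly {frames_behind:.0f} frames at {ASSUMED_FPS} fps")
print(f"About {other_ms:.0f} ms of the delay lies outside the network")
```

Under these assumptions, the network itself is fast enough for real-time frame rates, and most of the sixth-of-a-second delay would come from the rest of the pipeline rather than from inference.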

(ضحكات)

(Laughter)

لماذا فعلنا ذلك؟ حسنًا، ثمة سببان لذلك. أولًا: ذلك رائع بشكل جنوني.

But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.

(ضحكات)

(Laughter)

إلي أي مدي ذلك رائع؟ إذًا، بكبسة زر أستطيع أن أنقل ذلك الحوار كشخصية مختلفة تمامًا. هذا إلبور. وضعناه لاختبار كيف يعمل ذلك بمظهر مختلف. الشيء الرائع بخصوص تلك التقنية أنني أستطيع التبديل بين الشخصيات، وما زلت أنا من أمثّل العرض كاملاً. أتحدث بجانب فمي الأيمن وكذلك ألبور.

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

(ضحكات)

(Laughter)

السبب الثاني لقيامنا بذلك، يمكنك التخيل، هو أن ذلك سوف يكون فيلمًا رائعًا. تلك تقنية جديدة تمامًا، أداة رائعة للفنانين والمخرجين ورواة القصص. هذا واضح للغاية، أليس كذلك؟ أعني أن ذلك سوف يكون رائعًا للغاية. ولكن أيضًا بما أننا بنينا ذلك فمن الواضح أن الأمر سوف يتعدي الأفلام.

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

ولكن انتظر.. ألم أغير هويتي فقط بكبسة زر؟ أليس ذلك كتقنية التزييف العميق "ديب فيك" وتغيير الوجوه "فيس سواب"، التي ربما سمعتم بها يا رفاق؟ حسنًا، نعم. في الواقع، نحن نستخدم بعض من تلك التكنولوجيا والتي تستخدمها تقنية التزييف العميق. تعتمد تلك على الصور ثنائية الأبعاد بينما برنامجنا ثلاثي الأبعاد بالكامل وأقوي بكثير. ولكنهما مرتبطين بشكل كبير. أستطيع رؤيتك تحدث نفسك الآن: "اللعنة! اعتقدت أنني علي الأقل أستطيع أن أثق في الفيديو وأصدقه. إذا كان بمثابة فيديو مباشر، ألا يتعين أن يكون حقيقيًا؟". حسنًا، نعلم أن ذلك غير صحيح، أليس كذلك؟ حتي بدون ذلك، توجد العديد من الحيل البسيطة التي يمكنك فعلها بالفيديو ككيفية أخذ لقطة قد لا تعبر بصدق عما يحدث بالفعل. أعمل في مجال المؤثرات البصرية منذ وقت طويل وأعلم منذ وقت طويل أنه وبالجهد الكافي يمكننا خداع أي شخص بأي شيء. ما يفعله ذلك الشئ هو والتزييف العميق هو جعل عملية التلاعب بالفيديو أسهل، تمامًا مثل الفوتوشوب للتلاعب بالصور في فترة ما.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video like how you frame a shot that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake are doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.

أفضل أن أفكر في كيف يمكن لهذه التقنية أن تجلب الطبيعة البشرية إلي تقنية أخري وتجعلنا جميعًا أقرب لبعضنا البعض. الأن وبما أنك رأيت كل ذلك، فكر بالاحتمالات. علي الفور، سوف تراها في الحفلات والفعاليات المباشرة مثل هذه. مشاهير رقميون، وخاصة بعد تقنية العرض الجديدة، سوف يصبحون بالضبط مثل الأفلام، ولكن بشكل مباشر. وسوف تظهر أشكال عديدة للتواصل. يمكنك التفاعل من ديجي دوج عن طريق نظارات الواقع الافتراضي. إنها مدهشة! كأنني وأنت في نفس الغرفة علي الرغم من كوننا علي بعد أميال. يا للدهشة، في المرة القادمة التي تجري فيها اتصال فيديو، سوف يمكنك اختيار نسختك التي تريد أن يراها الشخص الآخر. إنها مثل مساحيق التجميل الجيدة للغاية. فُحصت منذ قرابة العام ونصف. لقد تقدمت في العمر! ديجي دوج لم يتقدم بالعمر. في مكالمات الفيديو، ليس عليّ أن أتقدم بالعمر أبدًا.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

وكما تتخيل، هذا سوف يُستخدم لإعطاء المساعدين الشخصيين الافتراضيين وجهًا وجسدًا. لإعطائهم الطبيعة البشرية. أحب بالفعل عندما أتحدث مع المساعديين الافتراضيين ويردون بصوت بشري لطيف. الآن سوف يمتلكون وجهًا. وسوف يكون لديك جميع الإشارات غير اللفظية التي تجعل التواصل أسهل. سوف يكون ذلك لطيفًا للغاية. سوف يمكنك معرفة إذا ما كان المساعد الافتراضي مشغولاً أو مرتبكًا أو في خضم أمر ما.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

الآن، لا أستطيع أن أغادر المسرح بدون أن تستطيعوا فعلًا رؤية وجهي الحقيقي، لتقوموا بالمقارنة. دعوني أخلع تلك الخوذة. لا تقلقوا، يبدو الأمر أسوء بكثير مما تحس به فعلًا.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

(ضحكات)

(Laughter)

ها نحن ذا. دعوني أضع تلك الخوذة مجددًا.

So this is where we are. Let me put this back on here.

(ضحكات) ضخمة!

(Laughter) Doink!

إذًا، ها نحن ذا. نحن علي وشك أن نتمكن من التفاعل مع بشر رقميين حقيقيين بشكل مذهل! والذين يتم التحكم بهم إما عبر شخص أو آلة. وكسائر التقنيات الجديدة هذه الأيام، سوف يحل معها العديد من التساؤلات الجدية التي سيتعين علينا التعامل معها. ولكني متشوق للغاية لكيفية تحويل شيء أراه فقط في الخيال العلمي طوال حياتي إلي واقع. التفاعل مع أجهزة الكمبيوتر سيكون كالتحدث إلي صديق والتحدث إلى صديق بعيد جدًا سيكون مثل جلوسكما معًا في نفس الغرفة

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

شكرًا جزيلًا.

Thank you very much.

(تصفيق)

(Applause)

