Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

Hello.

(Applause)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, would it be real enough that you could tell whether or not I was lying to you? So that was our goal.

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

(Laughter)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.

(Laughter)

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of a high-resolution motion-capture device. We also did traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and when we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, and sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.
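The talk quotes two timing figures: about 16 milliseconds for the neural network to read a frame of the face, and about a sixth of a second from camera capture to display. The short Python calculation below relates them. It is only a sketch under an assumption the talk does not state: that network inference is one sequential stage inside the end-to-end delay, with capture, rendering and display making up the rest. The function name `latency_budget` is hypothetical.

```python
# Figures quoted in the talk: the neural network needs about 16 ms per frame,
# and the end-to-end delay from camera capture to display is about 1/6 s.
INFERENCE_MS = 16.0
END_TO_END_MS = 1000.0 / 6.0  # one-sixth of a second, in milliseconds


def latency_budget(inference_ms: float, end_to_end_ms: float) -> dict:
    """Relate per-frame inference time to the total capture-to-display delay.

    Hypothetical helper. Assumes (not stated in the talk) that inference is a
    single sequential stage inside the end-to-end delay, and that the other
    stages (capture, rendering, display) account for the remainder.
    """
    return {
        # Frames per second the network alone could keep up with.
        "max_inference_fps": 1000.0 / inference_ms,
        "end_to_end_ms": end_to_end_ms,
        # Fraction of the total delay spent inside the network.
        "inference_share": inference_ms / end_to_end_ms,
        # Time left over for every stage other than inference.
        "other_stages_ms": end_to_end_ms - inference_ms,
    }


if __name__ == "__main__":
    budget = latency_budget(INFERENCE_MS, END_TO_END_MS)
    print(f"Inference alone could sustain ~{budget['max_inference_fps']:.1f} frames/s")
    print(f"End-to-end delay: ~{budget['end_to_end_ms']:.1f} ms")
    print(f"Share spent in the network: ~{budget['inference_share']:.0%}")
    print(f"Left for other stages: ~{budget['other_stages_ms']:.1f} ms")
```

Under that assumption, the network on its own could keep pace with about 62.5 frames per second and would account for only about a tenth of the sixth-of-a-second delay; the talk does not say how the remaining time is split among the other stages.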

(Laughter)

But why did we do this? Well, there are two reasons, really. First of all, it is just crazy cool.

(Laughter)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

(Laughter)

Now, the second reason we did this, as you can imagine, is that this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image-based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video, like how you frame a shot, that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake are doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images some time ago.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

(Laughter)

So this is where we are. Let me put this back on here.

(Laughter) Doink!

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so excited about the ability to bring something that I've seen only in science fiction my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

Thank you very much.

(Applause)
