Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

Привіт. Я не справжня людина. Насправді, я копія справжньої людини. Проте, я почуваюся, наче справжня людина. Це трохи важко пояснити. Постривайте -- здається, я помітив справжню людину ... ось він. Давайте покличемо його на сцену.

Hello.

Привіт.

(Applause)

(Оплески)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring it to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

Те, що ви бачите на екрані -- цифрова людина. Я одягнутий в інерційний костюм для відтворення рухів, який визначає, що робить моє тіло. А тут у мене одна камера, яка слідкує за обличчям і передає інформацію у програму з технологією машинного навчання, яка приймає вирази обличчя типу "Хм, хм, хм" і перетворює у цього хлопця. Ми називаємо його "ДіджіДаг". Це 3D персонаж, якого я контролюю у режимі реального часу.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

Отже, я працюю над візуальними ефектами. І найскладніше в нашій галузі - це створення правдоподібних цифрових людей, яких глядачі сприйматимуть, як справжніх. Люди, насправді, досить добре розпізнають інших людей. Це ж треба! Ну, це добре, ми любимо виклики.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

Упродовж останніх 15 років ми додавали у фільми людей та істот, які сприймалися, як справжні. Якщо вони щасливі, то ви теж почуваєтеся щасливими. І якщо їм боляче, то ви їм співчуваєте. Нам вдається це робити щоразу краще. Але робити це дуже, дуже складно. Такі ефекти потребують тисячі годин і праці сотень справді талановитих митців.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.

Проте це змінилося. За останні п'ять років комп'ютери та відеокарти стали суттєво швидшими. З'явилося машинне та глибинне навчання. Тож ми подумали: "Чи зможемо ми створити фотореалістичну людину, як у фільмах, але таку, котра демонструватиме справжні емоції та деталі міміки людини, яка контролюватиме цифрову людину у реальному часі?" Якщо чесно, це і є наша мета: якби ви вели бесіду з ДіджіДагом віч-на-віч, чи змогли би ви зрозуміти, обманюю я вас, чи ні? Це і була наша ціль.

About a year and a half ago, we set off to achieve this goal.

Приблизно півтора року тому ми почали шлях до цієї цілі.

What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

Зараз я збираюся взяти вас у маленьку подорож, щоб показати саме те, що нам довелося зробити, аби опинитися там, де ми зараз. Нам довелося зібрати неймовірний обсяг даних. Фактично, наприкінці цього процесу у нас був, імовірно, один із найбільших наборів даних про обличчя на планеті. Про моє обличчя.

(Laughter)

(Сміх)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.

Чому саме я? Ну, я готовий на все заради науки. Та ви тільки погляньте на мене! Та серйозно. Отже, спершу нам потрібно було визначити, як насправді виглядає моє обличчя. Не просто на фото чи на 3D скані, але як воно виглядає на будь-якому фото, як світло взаємодіє з моєю шкірою. На щастя для нас, за три квартали від нашої студії в Лос-Анджелесі існує місце під назвою ІКТ. Там знаходиться дослідницька лабораторія, яка пов'язана з Південно-Каліфорнійським університетом. У них є так звана "світлова сцена" -- пристрій, обладнаний незліченною кількістю індивідуально контрольованих лампочок і цілою купою камер. І за допомогою цього пристрою ми змогли реконструювати моє обличчя за найрізноманітніших умов освітлення. Нам навіть вдалося зафіксувати кровообіг і те, як змінювалося моє обличчя залежно від різних виразів. Це дозволило нам побудувати модель мого обличчя, яка, чесно кажучи, просто неймовірна. Рівень її деталізованості виявився феноменальним. На жаль.

(Laughter)

(Сміх)

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.

Ви можете побачити кожну пору, кожну зморшку. Нам треба було зробити саме так. Реальність -- це деталі. І без деталей не було б такого результату. Проте залишилося ще дуже багато роботи. Згаданий пристрій дозволив нам зробити модель, яка виглядає, як я. Але вона не могла рухатися так, як я. Тут у гру і вступає машинне навчання. І воно потребує безлічі даних. Тому я мусив сидіти навпроти пристрою з високою роздільною здатністю для фіксації кожного руху. Окрім того, ми записували рухи ще й традиційно: за допомогою маркерів. Ми створили цілу купу зображень мого обличчя і рухливих хмар точок, які відображали форми мого обличчя. Мені треба було багато кривлятися... Я промовляв різні репліки з різними емоціями... Нам довелося записати дуже багато матеріалу. І коли ми отримали цей колосальний обсяг даних, ми побудували і натренували глибинні нейронні мережі. Коли ми завершили, ця мережа за 16 мілісекунд може подивитися на моє зображення і з'ясувати все про моє обличчя. Вона може обраховувати вирази мого обличчя, мої зморшки, мій кровообіг і навіть рух моїх вій. Після цього це все візуалізується й виводиться ось тут з усіма деталями, які ми зняли раніше.

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and when we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.

Ще далеко до завершення. Це, по суті, робота, яка ще триває. Насправді, це перший показ за межами нашої компанії. І, знаєте, воно виглядає не так переконливо, як нам хочеться. Позаду в мене кабелі, а також є затримка в одну шосту секунди між захопленням відео і демонстрацією на екрані. Одна шоста секунди -- це неймовірно добре! Але саме через цю затримку ви чуєте відлуння і тому подібне. Усе це машинне навчання -- новинка для нас, і, знаєте, деколи його важко змусити робити те, що нам потрібно. Воно деколи бешкетує.

(Laughter)

(Сміх)
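
The two timing figures quoted above (16 milliseconds per image and a sixth-of-a-second delay) can be put side by side with a little arithmetic. The sketch below is only a back-of-the-envelope reading of those numbers: it assumes, without the talk saying so, that the network runs once per displayed frame and that nothing else limits throughput. The function and constant names are made up for illustration.

```python
# Rough arithmetic using only the two figures quoted in the talk.
INFERENCE_TIME_S = 0.016     # "in 16 milliseconds" per image
END_TO_END_DELAY_S = 1 / 6   # "a sixth-of-a-second delay", capture to display


def frames_per_second(seconds_per_frame: float) -> float:
    """Upper bound on frame rate if every frame costs seconds_per_frame.

    Assumption (not stated in the talk): one network pass per frame,
    with no other bottleneck.
    """
    return 1.0 / seconds_per_frame


def frame_intervals_in_delay(delay_s: float, seconds_per_frame: float) -> float:
    """How many frame intervals fit inside the capture-to-display delay."""
    return delay_s / seconds_per_frame


fps = frames_per_second(INFERENCE_TIME_S)
lag_frames = frame_intervals_in_delay(END_TO_END_DELAY_S, INFERENCE_TIME_S)

print(f"{fps:.1f} frames per second at most")         # 62.5
print(f"{END_TO_END_DELAY_S * 1000:.0f} ms of delay")  # 167
print(f"{lag_frames:.1f} frame intervals of lag")      # 10.4
```

Under that assumption, the image on screen trails the live performance by roughly ten frames, which is consistent with the echo mentioned in the talk.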

But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.

Власне, чому ми розробили цю технологію? Ну, якщо чесно, є дві причини. Перша, це дико круто.

(Laughter)

(Сміх)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

Наскільки? Ну, натиснувши одну клавішу, я можу продовжувати цей виступ як зовсім інший персонаж. Ось це Елбор. Ми зробили його, щоб перевірити, як наша розробка працюватиме з іншим зовнішнім виглядом. Ця технологія крута тим, що, хоч я і змінив персонажа, уся гра залишається моєю. Наприклад, я розмовляю правою частиною рота, і Елбор теж.

(Laughter)

(Сміх)

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

Друга причина, чому ми це розробили, ви здогадуєтеся, бо ця технологія буде корисною для кінематографу. Вона стане новітнім, прекрасним інструментом для митців, режисерів і оповідачів. Це очевидно, правда? Буде чудово мати таку технологію. А ще, коли ми її розробили, стало очевидно, що вона буде застосована не тільки для фільмів.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video like how you frame a shot that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake is doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.

Але постривайте. Хіба я не змінив свою особистість одним натисканням на клавішу? Чи не схоже це на "deepfake" та заміну облич, про яку ви, можливо, чули? Так і є. Якщо чесно, ми навіть частково використовуємо ту саму технологію, що і "deepfake". "Deepfake" базується на 2D зображеннях, а наша технологія працює у 3D і є набагато потужнішою. Але вони дуже схожі. І тепер мені здається, що ви всі думаєте: "Щоб його! Я думав, що можна вірити хоча б відео. Якщо це пряма трансляція, то хіба вона не мусить бути правдою?" Що ж, ми ж знаємо, що це не так, правда? Навіть без цього існують прості хитрощі, які можна застосувати до відео, як, наприклад, побудова кадру, щоб показати все, що відбувається, в іншому світлі. Я працюю з візуальними ефектами вже дуже довго і давно знаю, що, приклавши достатньо зусиль, ми можемо обманути будь-кого стосовно будь-чого. Наша технологія і "deepfake" просто спрощують маніпуляції з відео і роблять їх доступнішими, як колись Photoshop вчинив із маніпуляціями над фото.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

Я надаю перевагу думкам про те, як ця технологія зможе додати людяності іншим технологіям і зблизити нас усіх. Тепер, коли ви побачили її в дії, подумайте про її можливості. Насамперед ви побачите її застосування на концертах та на заходах наживо, як тут. Цифрові знаменитості, особливо завдяки новим проєкційним технологіям, виглядатимуть як у фільмах, тільки наживо і в реальному часі. І нові форми спілкування також на підході. Навіть зараз ви можете взаємодіяти з ДіджіДагом у віртуальній реальності. І це просто приголомшує. Це так, ніби ми з вами сидимо в одній кімнаті, хоча насправді ми можемо бути на відстані миль. Коли ви будете телефонувати по відеозв'язку наступного разу, ви зможете обрати ту версію себе, яку захочете показати іншим. Це як дуже, дуже якісний макіяж. Мене просканували приблизно півтора року тому. Я постарів. А ДіджіДаг -- ні. Під час відеодзвінків я можу взагалі не старіти.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

І як ви можете уявити, це буде використано, щоб надати віртуальним помічникам тіло та обличчя. Людяність. Мені вже подобається те, що віртуальні помічники відповідають мені людиноподібним, спокійним голосом. Тепер вони матимуть і обличчя. А ви отримуватимете усі ці невербальні підказки, які роблять спілкування набагато простішим. Це буде справді приємно. Ви зможете зрозуміти, коли віртуальний помічник зайнятий, розгублений чи стурбований чимось.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

Я не можу піти зі сцени, не показавши вам свого справжнього обличчя, аби ви мали змогу порівняти. Отож, давайте я зніму шолом. Так, не хвилюйтесь: це виглядає набагато гірше, ніж відчувається.

(Laughter)

(Сміх)

So this is where we are. Let me put this back on here.

Ось чого ми досягнули. Тепер давайте я знову одягну шолом.

(Laughter) Doink!

(Сміх) Упс.

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

Ось чого ми досягнули. Ми за крок від здатності взаємодіяти з разюче правдоподібними цифровими людьми, незалежно від того, чи ними керує людина, чи машина. І як будь-яка інша сучасна технологія, ця можливість тягне за собою серйозні і справжні занепокоєння, з якими нам треба буде впоратися. Але я неймовірно захоплений можливістю втілити у життя те, що я все життя бачив хіба що в науковій фантастиці. Спілкування з комп'ютером буде схожим на розмову з другом. А розмова з далекими друзями буде схожою на те, ніби ви сидите з ними в одній кімнаті.

Thank you very much.

Дуже дякую.

(Applause)

(Оплески)
