Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

Hello.

(Applause)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

(Laughter)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.

(Laughter)

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of a high-resolution motion-capture device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.
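
To put the 16 milliseconds in context, here is a rough back-of-the-envelope sketch in Python. The talk gives only the 16 ms figure; the 60 Hz display rate and the function names below are assumptions made for illustration, not details of the actual system.

```python
# Back-of-the-envelope check, not the studio's actual pipeline.
# ASSUMPTION: a 60 Hz display; the talk only quotes the 16 ms figure.


def frames_per_second(frame_time_ms: float) -> float:
    """Frame rate that can be sustained if each frame takes frame_time_ms."""
    return 1000.0 / frame_time_ms


def fits_refresh_budget(frame_time_ms: float, refresh_hz: float) -> bool:
    """True if one frame's work finishes within one refresh interval."""
    return frame_time_ms <= 1000.0 / refresh_hz


inference_ms = 16.0        # figure quoted in the talk
assumed_refresh_hz = 60.0  # hypothetical display rate

print(frames_per_second(inference_ms))
print(fits_refresh_budget(inference_ms, assumed_refresh_hz))
```

Under that assumption, 16 ms per frame sits just inside the roughly 16.7 ms that a 60 Hz display allows, which is presumably why the figure matters for a live performance.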

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.
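
The sixth-of-a-second delay can be converted into more familiar units with a small sketch. The delay itself comes from the talk; the 60 frames-per-second rate and the helper names are assumptions for illustration only.

```python
from fractions import Fraction

# Illustrative arithmetic only.
# ASSUMPTION: 60 frames per second; the talk does not state a frame rate.


def delay_in_ms(delay_s: Fraction) -> float:
    """Convert a delay in seconds to milliseconds."""
    return float(delay_s * 1000)


def delay_in_frames(delay_s: Fraction, frame_rate_hz: int) -> Fraction:
    """Number of frames that elapse during the delay at the given rate."""
    return delay_s * frame_rate_hz


capture_to_display = Fraction(1, 6)  # "a sixth of a second", from the talk
assumed_frame_rate = 60              # hypothetical

print(round(delay_in_ms(capture_to_display), 1))
print(delay_in_frames(capture_to_display, assumed_frame_rate))
```

Using `Fraction` keeps the one-sixth value exact, so the frame count comes out as a whole number instead of a floating-point approximation.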

(Laughter)

But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.

(Laughter)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

(Laughter)

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video like how you frame a shot that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake are doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

(Laughter)

So this is where we are. Let me put this back on here.

(Laughter) Doink!

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

Thank you very much.

(Applause)
