Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

Hello.

(Applause)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring it to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

(Laughter)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.
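
One standard way to use one-light-at-a-time captures like these is image-based relighting: because light adds linearly, photographs of a face taken under each light on its own can be combined as a weighted sum to show the face under new lighting. The sketch below assumes that technique; the talk does not say which method was actually used, and the tiny 2x2 "images", the weights, and the `relight` helper are all hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def relight(basis_images: List[Image], light_weights: List[float]) -> Image:
    """Weighted sum of one-light-at-a-time images (light adds linearly)."""
    if len(basis_images) != len(light_weights):
        raise ValueError("need one weight per basis image")
    rows, cols = len(basis_images[0]), len(basis_images[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for image, weight in zip(basis_images, light_weights):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += weight * image[r][c]
    return out


# Hypothetical 2x2 captures: only the left light on, then only the right light on.
left_lit = [[0.8, 0.2], [0.6, 0.1]]
right_lit = [[0.1, 0.7], [0.2, 0.9]]

# New lighting condition: left light at half power, right light at full power.
relit = relight([left_lit, right_lit], [0.5, 1.0])
```

With thousands of lights and full-color images the same weighted sum applies per light and per color channel; only the size of the data changes.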

(Laughter)

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.
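
To put the two timing figures from the talk side by side, here is a rough back-of-the-envelope calculation: 16 milliseconds per image for the neural network, and a sixth of a second from capture to display. It is only arithmetic on the quoted numbers; the 60 Hz display rate is an assumption, since the talk does not state one, and the function names are made up for illustration.

```python
# Back-of-the-envelope latency arithmetic for the figures quoted in the talk.

INFERENCE_MS = 16.0          # per-image network time quoted in the talk
END_TO_END_DELAY_S = 1 / 6   # capture-to-display delay quoted in the talk
DISPLAY_HZ = 60              # assumed refresh rate, not from the talk


def max_frame_rate(frame_time_ms: float) -> float:
    """Frames per second sustainable if each frame takes frame_time_ms."""
    return 1000.0 / frame_time_ms


def delay_in_frames(delay_s: float, display_hz: float) -> float:
    """How many display refreshes elapse during delay_s."""
    return delay_s * display_hz


if __name__ == "__main__":
    print(f"{max_frame_rate(INFERENCE_MS):.1f} fps")
    print(f"{END_TO_END_DELAY_S * 1000:.0f} ms end to end")
    print(f"{delay_in_frames(END_TO_END_DELAY_S, DISPLAY_HZ):.0f} frames behind")
```

Under these assumptions the network alone could keep up with roughly 62 frames per second, while the full pipeline runs about ten display frames behind the live performance, which is consistent with the echo mentioned above.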

(Laughter)

But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.

(Laughter)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

(Laughter)

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video like how you frame a shot that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake is doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

(Laughter)

So this is where we are. Let me put this back on here.

(Laughter) Doink!

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

Thank you very much.

(Applause)
