Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

Hello.

(Applause)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

(Laughter)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.
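
One common way to use light-stage data of this kind is image-based relighting: because light adds up linearly, a face photographed once per individual light can be re-lit by taking a weighted sum of those photographs. The sketch below illustrates only that idea. It is not the speaker's or ICT's actual pipeline, and the function name `relight`, the three lights, and the pixel values are all made up for illustration.

```python
# Hypothetical illustration of image-based relighting; not the actual
# light-stage pipeline described in the talk. All numbers are toy data.

def relight(basis_images, light_weights):
    """Combine one-light-at-a-time images into an image under new lighting.

    basis_images: list of images, one per light, each a flat list of pixel
    intensities captured with only that light switched on.
    light_weights: brightness of each light in the new lighting condition.
    """
    if len(basis_images) != len(light_weights):
        raise ValueError("need exactly one weight per basis image")
    num_pixels = len(basis_images[0])
    result = [0.0] * num_pixels
    for image, weight in zip(basis_images, light_weights):
        for i, value in enumerate(image):
            # Light transport is linear, so contributions simply add up.
            result[i] += weight * value
    return result


# Three lights, four pixels each (toy data).
basis = [
    [0.8, 0.1, 0.0, 0.2],  # light on the left
    [0.1, 0.7, 0.3, 0.0],  # light on the right
    [0.2, 0.2, 0.2, 0.2],  # dim fill light
]
new_lighting = relight(basis, [1.0, 0.25, 0.5])
print(new_lighting)
```

With many lights and real photographs, the same weighted sum lets a single capture session stand in for a wide range of lighting conditions.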

(Laughter)

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.
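
To put the 16-millisecond figure in context, here is a small arithmetic sketch of what it implies for frame rate. Only the 16 ms comes from the talk. The 60 and 90 frames-per-second targets are assumed display rates chosen for illustration, and the calculation covers the network alone, not rendering or display.

```python
# Frame-budget arithmetic. INFERENCE_MS is from the talk; the target
# frame rates below are assumptions made for this example.

INFERENCE_MS = 16.0  # time for the network to process one image


def max_frames_per_second(ms_per_frame):
    """Upper bound on frame rate if each frame costs ms_per_frame."""
    return 1000.0 / ms_per_frame


def fits_frame_budget(ms_per_frame, target_fps):
    """True if one frame's work fits inside one frame interval."""
    return ms_per_frame <= 1000.0 / target_fps


print(max_frames_per_second(INFERENCE_MS))   # 62.5
print(fits_frame_budget(INFERENCE_MS, 60))   # True: 16 ms fits in about 16.7 ms
print(fits_frame_budget(INFERENCE_MS, 90))   # False: the budget is about 11.1 ms
```

In other words, 16 ms per image is just fast enough to keep up with a 60 frames-per-second feed, with very little headroom.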

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.
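
A rough way to picture the sixth-of-a-second delay is as a fixed-length queue between capture and display: each captured frame reaches the screen a set number of frames later. The sketch below is a toy model only. The 60 frames-per-second rate and the names `delay_in_frames` and `delayed_stream` are assumptions for illustration, not details from the talk.

```python
from collections import deque

DELAY_SECONDS = 1 / 6  # from the talk
ASSUMED_FPS = 60       # assumption: the talk does not state a frame rate


def delay_in_frames(delay_seconds, fps):
    """Convert a latency in seconds to a whole number of frames."""
    return round(delay_seconds * fps)


def delayed_stream(frames, delay_frames):
    """Yield what the screen shows at each tick.

    The screen shows None until the first captured frame has travelled
    through the whole pipeline.
    """
    pipeline = deque([None] * delay_frames)
    for frame in frames:
        pipeline.append(frame)
        yield pipeline.popleft()


frames_late = delay_in_frames(DELAY_SECONDS, ASSUMED_FPS)
shown = list(delayed_stream(range(15), frames_late))
print(frames_late)  # 10
print(shown)
```

At the assumed frame rate the picture runs ten frames behind the live performer, and that gap is what the speaker blames for the echo.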

(Laughter)

But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.

(Laughter)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

(Laughter)

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video like how you frame a shot that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake is doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

(Laughter)

So this is where we are. Let me put this back on here.

(Laughter) Doink!

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

Thank you very much.

(Applause)
