Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

Hello.

(Applause)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

(Laughter)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.

(Laughter)

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and when we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.

(Laughter)
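
[Editor's note: the two timing figures quoted in the talk, 16 milliseconds for the neural network and a sixth of a second from capture to display, can be put side by side with a little arithmetic. The sketch below is only a back-of-the-envelope illustration. The 60 Hz display rate is an assumption, since the talk does not state a frame rate, and all names in the code are hypothetical.]

```python
# Back-of-the-envelope timing using only the two figures quoted in the talk.
# ASSUMED_DISPLAY_HZ is an assumption; the talk does not give a frame rate.

INFERENCE_MS = 16.0         # "in 16 milliseconds, the neural network can look at my image"
END_TO_END_S = 1.0 / 6.0    # "a sixth-of-a-second delay" from capture to display
ASSUMED_DISPLAY_HZ = 60.0   # hypothetical display refresh rate


def max_frames_per_second(ms_per_frame: float) -> float:
    """Upper bound on throughput if each frame costs ms_per_frame milliseconds."""
    return 1000.0 / ms_per_frame


def frames_of_delay(delay_s: float, display_hz: float) -> float:
    """How many displayed frames elapse during a delay of delay_s seconds."""
    return delay_s * display_hz


inference_fps = max_frames_per_second(INFERENCE_MS)
end_to_end_ms = END_TO_END_S * 1000.0
delay_frames = frames_of_delay(END_TO_END_S, ASSUMED_DISPLAY_HZ)

print(f"Network alone could sustain up to {inference_fps:.1f} frames per second")
print(f"Capture-to-display delay is about {end_to_end_ms:.0f} ms")
print(f"At {ASSUMED_DISPLAY_HZ:.0f} Hz that is roughly {delay_frames:.0f} frames behind")
```

[Under that assumed frame rate, the picture on screen trails the speaker by about ten frames, which is consistent with the slight echo he mentions.]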

But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.

(Laughter)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

(Laughter)

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video like how you frame a shot that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake is doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

(Laughter)

So this is where we are. Let me put this back on here.

(Laughter) Doink!

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

Thank you very much.

(Applause)
