Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

Hello.

(Applause)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring it to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

(Laughter)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.

(Laughter)

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.
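
[Editor's note: The two timing figures quoted in the talk, 16 milliseconds for the neural network to process one image and a sixth of a second between capture and display, can be put side by side with a little arithmetic. The sketch below is only an illustration. It assumes, which the talk does not state, that one image is processed per displayed frame; the function names are made up for this note.]

```python
# Timing figures quoted in the talk.
INFERENCE_MS = 16.0   # time for the network to process one image
DELAY_S = 1.0 / 6.0   # delay between capturing video and displaying it


def frames_per_second(frame_ms: float) -> float:
    """Frame rate that is sustainable if each frame takes frame_ms milliseconds.

    Assumption (not from the talk): one image is processed per displayed frame.
    """
    return 1000.0 / frame_ms


def delay_in_frames(delay_s: float, frame_ms: float) -> float:
    """Number of frame intervals that fit inside the end-to-end delay."""
    return delay_s * 1000.0 / frame_ms


if __name__ == "__main__":
    print(f"{frames_per_second(INFERENCE_MS):.1f} frames per second")
    print(f"{DELAY_S * 1000.0:.0f} ms from capture to display")
    print(f"{delay_in_frames(DELAY_S, INFERENCE_MS):.1f} frame intervals of delay")
```

[Under that assumption, 16 milliseconds per image corresponds to 62.5 frames per second, and a sixth of a second is roughly 167 milliseconds, or a little over ten such frame intervals. That would suggest most of the delay comes from the rest of the capture, rendering, and display chain rather than from the network alone.]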

(Laughter)

But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.

(Laughter)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

(Laughter)

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video like how you frame a shot that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake is doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

(Laughter)

So this is where we are. Let me put this back on here.

(Laughter) Doink!

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

Thank you very much.

(Applause)
