Doug Roble: Digital humans that look just like us

Hello. I'm not a real person. I'm actually a copy of a real person. Although, I feel like a real person. It's kind of hard to explain. Hold on -- I think I saw a real person ... there's one. Let's bring him onstage.

Merhaba. Gerçek biri değilim. Gerçek birinin kopyasıyım. Fakat gerçek bir insan gibi hissediyorum. Açıklaması biraz zor. Bir saniye. Sanırım gerçek bir insan gördüm. İşte orada. Hadi onu sahneye alalım.

Hello.

Merhaba.

(Applause)

(Alkışlar)

What you see up there is a digital human. I'm wearing an inertial motion capture suit that's figuring out what my body is doing. And I've got a single camera here that's watching my face and feeding some machine-learning software that's taking my expressions, like, "Hm, hm, hm," and transferring them to that guy. We call him "DigiDoug." He's actually a 3-D character that I'm controlling live in real time.

Yukarıda görmüş olduğunuz şey dijital bir insan. Üzerimde hareket algılayıcı bir kıyafet var, vücudumun ne yaptığını anlıyor. Burada da yüzümü çeken tekli bir kamera var, ayrıca mimiklerimi yakalayan makine öğrenimli bir yazılıma aktarım yapıyor, ''hm, hm, hm'' gibi ve şu adama aktarıyor. Biz ona ''DigiDoug'' diyoruz. O aslında gerçek zamanda canlı kontrol ettiğim 3D bir karakter.

So, I work in visual effects. And in visual effects, one of the hardest things to do is to create believable, digital humans that the audience accepts as real. People are just really good at recognizing other people. Go figure! So, that's OK, we like a challenge.

Görsel efekt işinde çalışıyorum. Görsel efektlerde en zor şeylerden biri seyircilerin gerçek gibi kabul edecekleri inandırıcı, dijital insanlar yapmak. İnsanlar başka insanları tanımada gerçekten iyiler. Anlayın anlayabilirseniz. Peki, sorun değil. Biz zoru severiz.

Over the last 15 years, we've been putting humans and creatures into film that you accept as real. If they're happy, you should feel happy. And if they feel pain, you should empathize with them. We're getting pretty good at it, too. But it's really, really difficult. Effects like these take thousands of hours and hundreds of really talented artists.

On beş yılı aşkın süredir gerçek olarak kabul ettiğiniz insanları ve yaratıkları filmlere yerleştiriyoruz. Eğer onlar mutlularsa siz de mutlu hissetmelisiniz. Eğer onlar acı çekiyorlarsa siz de onlarla empati kurmalısınız. Bunda da oldukça iyiye gidiyoruz. Aslında gerçekten ama gerçekten çok zor. Bu gibi efektler binlerce saatinizi alır ve yüzlerce gerçekten yetenekli sanatçı gerektirir.

But things have changed. Over the last five years, computers and graphics cards have gotten seriously fast. And machine learning, deep learning, has happened. So we asked ourselves: Do you suppose we could create a photo-realistic human, like we're doing for film, but where you're seeing the actual emotions and the details of the person who's controlling the digital human in real time? In fact, that's our goal: If you were having a conversation with DigiDoug one-on-one, is it real enough so that you could tell whether or not I was lying to you? So that was our goal.

Ama şartlar değişti. Son beş yılı aşkın süredir, bilgisayarlar ve grafik kartları ciddi anlamda hızlandı. Ve makine öğrenimi, derin öğrenme, gerçekleşti. Biz de kendimize sorduk: Foto realistik bir insan yapabildiğimizi düşünebiliyor musunuz, tıpkı filmlerde yaptığımız gibi, ama dijital insanı gerçek zamanda kontrol eden kişinin asıl duygu ve detaylarını görebileceksiniz. Aslında, bu bizim amacımız. Eğer DigiDoug ile bire bir sohbet ediyorsanız size yalan söyleyip söylemediğimi anlayacak kadar gerçekçi midir peki? İşte bu bizim amacımızdı.

About a year and a half ago, we set off to achieve this goal. What I'm going to do now is take you basically on a little bit of a journey to see exactly what we had to do to get where we are. We had to capture an enormous amount of data. In fact, by the end of this thing, we had probably one of the largest facial data sets on the planet. Of my face.

Yaklaşık bir buçuk yıl önce bu amacı gerçekleştirmek için yola çıktık. Şimdi sizi küçük bir yolculuğa çıkaracağım ve bugünlere gelmek için neler yapmak zorunda olduğumuzu göstereceğim. Çok büyük miktarda veri toplamak zorundaydık. Aslında bu şeyin sonunda, muhtemelen gezegendeki en büyük yüz veri setlerinden bir tanesine sahiptik. Benim yüzümün.

(Laughter)

(Kahkahalar)

Why me? Well, I'll do just about anything for science. I mean, look at me! I mean, come on. We had to first figure out what my face actually looked like. Not just a photograph or a 3-D scan, but what it actually looked like in any photograph, how light interacts with my skin. Luckily for us, about three blocks away from our Los Angeles studio is this place called ICT. They're a research lab that's associated with the University of Southern California. They have a device there, it's called the "light stage." It has a zillion individually controlled lights and a whole bunch of cameras. And with that, we can reconstruct my face under a myriad of lighting conditions. We even captured the blood flow and how my face changes when I make expressions. This let us build a model of my face that, quite frankly, is just amazing. It's got an unfortunate level of detail, unfortunately.

Neden ben? Pekâlâ, bilim için her şeyi yaparım. Hâlime bir baksanıza! Hadi ama. İlk olarak benim yüzümün nasıl göründüğünü bilmeliydik. Sadece bir fotoğraf ya da 3D tarama değil herhangi bir fotoğrafta nasıl göründüğü önemliydi, ışığın cildimle nasıl etkileşime girdiği. Şanslıyız ki Los Angeles'taki stüdyomuzdan üç sokak ötede ICT denen bir yer var. Bir araştırma laboratuvarı, Southern California Üniversitesi'ne bağlı. Orada ''ışık sahnesi'' denen bir cihaz var. Ayrı ayrı kontrol edilen bir sürü ışık var ve bir hayli de kamera. Bununla, bir dizi ışıklandırma altında yüzümü yeni baştan düzenleyebiliriz. Kan dolaşımını ve mimiklerimle birlikte yüzümün nasıl değiştiğini bile yakaladık. Bu da yüzümün bir modelini çıkartmamızı sağladı, ki bu harika bir şey. Ne yazık ki rahatsız edici bir şekilde ayrıntılı.
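As a rough illustration of why a rig of individually controlled lights is useful: light adds linearly, so photos taken with one light on at a time can be mixed to approximate the face under a new lighting setup. The sketch below is only a toy under that assumption. The talk does not say how the team processes its light stage data, and the function name `relight` and the tiny four-pixel "images" are hypothetical.

```python
def relight(basis_images, light_weights):
    """Mix one-light-at-a-time images into a new lighting condition.

    basis_images[i] is the image captured with only light i switched on
    (here a flat list of grayscale pixel values). light_weights[i] is how
    bright light i should be in the new setup. Because light is additive,
    the relit image is the weighted sum of the basis images.
    """
    if len(basis_images) != len(light_weights):
        raise ValueError("need exactly one weight per basis image")
    relit = [0.0] * len(basis_images[0])
    for image, weight in zip(basis_images, light_weights):
        for i, value in enumerate(image):
            relit[i] += weight * value
    return relit


# Hypothetical toy data: 3 lights, 4 pixels each.
basis = [
    [1.0, 0.0, 0.5, 0.0],  # only light 0 on
    [0.0, 1.0, 0.5, 0.0],  # only light 1 on
    [0.0, 0.0, 0.0, 2.0],  # only light 2 on
]

print(relight(basis, [1.0, 1.0, 0.0]))  # [1.0, 1.0, 1.0, 0.0]
print(relight(basis, [0.5, 0.0, 0.5]))  # [0.5, 0.0, 0.25, 1.0]
```

A real capture would use many more lights and full-color, high-resolution images, but the mixing step has the same shape.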

(Laughter)

(Kahkahalar)

You can see every pore, every wrinkle. But we had to have that. Reality is all about detail. And without it, you miss it. We are far from done, though. This let us build a model of my face that looked like me. But it didn't really move like me. And that's where machine learning comes in. And machine learning needs a ton of data. So I sat down in front of some high-resolution motion-capturing device. And also, we did this traditional motion capture with markers. We created a whole bunch of images of my face and moving point clouds that represented the shapes of my face. Man, I made a lot of expressions, I said different lines in different emotional states ... We had to do a lot of capture with this. Once we had this enormous amount of data, we built and trained deep neural networks. And when we were finished with that, in 16 milliseconds, the neural network can look at my image and figure out everything about my face. It can compute my expression, my wrinkles, my blood flow -- even how my eyelashes move. This is then rendered and displayed up there with all the detail that we captured previously.

Bütün gözenekleri, kırışıklıkları görebilirsiniz. Fakat buna ihtiyacımız vardı. Gerçeklik detayda saklı. Ve onsuz, gerçeklik de olmaz. Henüz her şey bitmiş değil. Bu, yüzümün bir modelini yapmamızı sağladı. Fakat bu şey aslında benim gibi hareket etmedi. İşte bu da makine öğreniminin devreye girdiği yer. Makine öğrenimi tonlarca veriye ihtiyaç duyuyor. Bu yüzden yüksek çözünürlüklü hareket algılayan bir cihazın önüne oturdum. Ve kalemlerle bu geleneksel hareket algılayan şeyi de yaptık. Yüzümün bir sürü görselini çıkarttık ve hareket eden nokta bulutları da yüzümün şekillerini yansıttı. Ne kadar da çok mimik yaptım. Farklı cümleleri farklı duygularla söyledim... Bununla bir sürü şey yakalamalıydık. Büyük ölçüde veriye eriştiğimizde de derin sinir ağları kurduk ve eğittik. Bunu da bitirdiğimizde 16 milisaniyede sinir ağı, görüntüme bakabilir ve yüzümle alakalı her şeyi çözümleyebilir. Mimiklerimi, kırışıklıklarımı, kan dolaşımımı hesaplayabilir -- hatta kirpiklerimin hareketini bile. İşte bu şekilde önceden yakalamış olduğumuz bütün detaylar ile beraber yukarıda resmediliyor ve gösteriliyor.
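To put the 16-millisecond figure in context, here is a small back-of-the-envelope sketch. The only number taken from the talk is 16 ms per frame; the 60 Hz and 90 Hz display rates are assumptions chosen for comparison, and the function names are hypothetical.

```python
INFERENCE_MS = 16.0  # per-frame network time quoted in the talk


def max_frames_per_second(ms_per_frame):
    """Upper bound on frame rate if each frame costs ms_per_frame."""
    return 1000.0 / ms_per_frame


def fits_refresh_rate(ms_per_frame, refresh_hz):
    """True if one frame's work finishes within one display refresh."""
    return ms_per_frame <= 1000.0 / refresh_hz


print(max_frames_per_second(INFERENCE_MS))   # 62.5
print(fits_refresh_rate(INFERENCE_MS, 60))   # True  (budget is about 16.7 ms)
print(fits_refresh_rate(INFERENCE_MS, 90))   # False (budget is about 11.1 ms)
```

So 16 ms just fits inside a 60 Hz frame, which is consistent with calling the system real time, though rendering and display take additional time on top of the network itself.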

We're far from done. This is very much a work in progress. This is actually the first time we've shown it outside of our company. And, you know, it doesn't look as convincing as we want; I've got wires coming out of the back of me, and there's a sixth-of-a-second delay between when we capture the video and when we display it up there. Sixth of a second -- that's crazy good! But it's still why you're hearing a bit of an echo and stuff. And you know, this machine learning stuff is brand-new to us, sometimes it's hard to convince it to do the right thing, you know? It goes a little sideways.

Daha çok yolumuz var. Hâlâ sürmekte olan bir çalışma bu. Aslında bu, şirketimizin dışında yaptığımız ilk sunum. Ve istediğimiz gibi de inandırıcı görünmüyor. Arkamdan çıkan kablolar var. Görüntüyü yakalamamız ve yukarıda göstermemiz arasında saniyenin altıda biri gecikme var. Saniyenin altıda biri. Bu harika bir şey. Fakat hâlâ biraz eko ve benzeri şeyleri duymamızın sebebi. Ve bu makine öğrenimi bizim için çok yeni, bazen doğru şeyi yapmasını sağlamak zor. Biraz hata yapabiliyor.
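The sixth-of-a-second delay can be converted into more familiar units. This is a minimal sketch: the one-sixth-second figure comes from the talk, while the 60 frames-per-second display rate is an assumption used only for illustration.

```python
from fractions import Fraction

DELAY_S = Fraction(1, 6)  # capture-to-display delay quoted in the talk


def delay_in_ms(delay_s):
    """Convert a delay in seconds to milliseconds."""
    return float(delay_s * 1000)


def delay_in_frames(delay_s, fps):
    """How many displayed frames elapse during the delay."""
    return delay_s * fps


print(round(delay_in_ms(DELAY_S), 1))   # 166.7
print(delay_in_frames(DELAY_S, 60))     # 10
```

At an assumed 60 frames per second, the image on screen trails the live performer by about ten frames, which is why the live voice and the onscreen character can sound slightly out of step.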

(Laughter)

(Kahkahalar)

But why did we do this? Well, there's two reasons, really. First of all, it is just crazy cool.

Fakat biz bunu neden yaptık? Aslında iki tane sebebi var. Öncelikle son derece havalı.

(Laughter)

(Kahkahalar)

How cool is it? Well, with the push of a button, I can deliver this talk as a completely different character. This is Elbor. We put him together to test how this would work with a different appearance. And the cool thing about this technology is that, while I've changed my character, the performance is still all me. I tend to talk out of the right side of my mouth; so does Elbor.

Nasıl mı havalı? Düğmeye basmamla bu konuşmayı tamamen farklı bir karaktere iletebilirim. Bu Elbor. Onu buraya farklı bir görünüşle bu şeyin nasıl çalıştığını test etmek için koyduk. Bu teknolojiyle alakalı güzel olan şey de karakterimi değiştirdiğimde performansın hâlâ bana ait olması. Ağzımın sağ tarafıyla konuştuğumda Elbor da aynını yapıyor.

(Laughter)

(Kahkahalar)

Now, the second reason we did this, and you can imagine, is this is going to be great for film. This is a brand-new, exciting tool for artists and directors and storytellers. It's pretty obvious, right? I mean, this is going to be really neat to have. But also, now that we've built it, it's clear that this is going to go way beyond film.

Bunu yapmamızın ikinci sebebi ise, tahmin edebilirsiniz ki bu, sinema için harika bir şey olacak. Bu, sanatçılar, yönetmenler ve hikâye anlatıcıları için yepyeni, heyecan verici bir araç. Oldukça aşikâr, değil mi? Yani, buna sahip olmak harika bir şey. Ama aynı zamanda, bu yaptığımız şey belli ki sinemanın da ötesine geçecek.

But wait. Didn't I just change my identity with the push of a button? Isn't this like "deepfake" and face-swapping that you guys may have heard of? Well, yeah. In fact, we are using some of the same technology that deepfake is using. Deepfake is 2-D and image based, while ours is full 3-D and way more powerful. But they're very related. And now I can hear you thinking, "Darn it! I thought I could at least trust and believe in video. If it was live video, didn't it have to be true?" Well, we know that's not really the case, right? Even without this, there are simple tricks that you can do with video like how you frame a shot that can make it really misrepresent what's actually going on. And I've been working in visual effects for a long time, and I've known for a long time that with enough effort, we can fool anyone about anything. What this stuff and deepfake are doing is making it easier and more accessible to manipulate video, just like Photoshop did for manipulating images, some time ago.

Fakat bir saniye. Düğmeye basışımla kimliğimi değiştirmedim mi? Bu belki de daha önce duyduğunuz ''deepfake'' ve yüz değiştirme gibi bir şey değil mi? Evet, öyle. Aslında biz deepfake'in kullandığı aynı teknolojinin bazı yönlerini kullanıyoruz. Deepfake 2D ve görüntü tabanlı iken bizimki 3D ve daha güçlü. Ama ikisi oldukça bağlantılı. Şu anda düşüncelerinizi duyabiliyorum. ''Lanet olsun! Videoya inanıp güvenebileceğimi düşünmüştüm. Eğer canlı bir video olsaydı gerçek olması gerekmez miydi?'' Biliyoruz ki durum aslında böyle değil, değil mi? Bu olmadan bile video ile yapabileceğiniz basit numaralar var, bir çekimi nasıl kadrajladığınız gibi, ki bu da aslında gerçekleşmekte olan şeyi yanlış tanıtabilir. Uzun bir süredir görsel efekt işinde çalışıyorum. Uzun bir süredir de biliyorum ki yeterli eforla herhangi birini herhangi bir şey hakkında kandırabiliriz. Bu ve deepfake'in yapmakta olduğu şey video manipüle etmeyi daha kolay ve erişilebilir hâle getiriyor, bir süre önce Photoshop'ın görüntülerle oynadığı gibi.

I prefer to think about how this technology could bring humanity to other technology and bring us all closer together. Now that you've seen this, think about the possibilities. Right off the bat, you're going to see it in live events and concerts, like this. Digital celebrities, especially with new projection technology, are going to be just like the movies, but alive and in real time. And new forms of communication are coming. You can already interact with DigiDoug in VR. And it is eye-opening. It's just like you and I are in the same room, even though we may be miles apart. Heck, the next time you make a video call, you will be able to choose the version of you you want people to see. It's like really, really good makeup. I was scanned about a year and a half ago. I've aged. DigiDoug hasn't. On video calls, I never have to grow old.

Bu teknolojinin insanlığı nasıl başka bir teknolojiye bağladığını ve nasıl bizi yakınlaştırdığını düşünmek istiyorum. Siz de buna tanık oldunuz, olasılıkları düşünün. Çok geçmeden, onu canlı etkinliklerde ve konserlerde göreceksiniz, bunun gibi. Dijital ünlüler, özellikle de yeni projeksiyon teknolojisi ile tıpkı filmlerdeki gibi olacaklar fakat canlı ve gerçek zamanlı. Yeni iletişim modelleri çıkıyor. Daha şimdiden sanal gerçeklikte DigiDoug ile etkileşime girebilirsiniz. Ve o ufuk açıcı. Tıpkı sizlerle benim aynı odada olmamız gibi, kilometrelerce uzakta olsak bile. Bir sonraki video aramanızda insanların sizi görmesi için istediğiniz şekli seçebileceksiniz. Çok ama çok iyi bir makyaj gibi. Bir buçuk yıl önce bu sistemde tarandım. Yaşlandım. Fakat DigiDoug yaşlanmadı. Video aramalarında, yaşlı görünmek zorunda değilim.

And as you can imagine, this is going to be used to give virtual assistants a body and a face. A humanity. I already love it that when I talk to virtual assistants, they answer back in a soothing, humanlike voice. Now they'll have a face. And you'll get all the nonverbal cues that make communication so much easier. It's going to be really nice. You'll be able to tell when a virtual assistant is busy or confused or concerned about something.

Hayal edebileceğiniz gibi bu, sanal asistanlara yüz ve vücut vermek için kullanılacak. Bir insanlık. Sanal asistanlarla konuştuğumda insana aitmiş gibi sakin bir tonla cevaplamalarına bayılıyorum. Şimdi bir yüze de sahip olacaklar. Ve iletişimi çok daha kolay hâle getiren sözsüz işaretleri de göreceksiniz. Gerçekten çok güzel olacak. Bir sanal asistan meşgul veya kafası karışmış ya da bir şey hakkında endişeli olduğunda anlayabileceksiniz.

Now, I couldn't leave the stage without you actually being able to see my real face, so you can do some comparison. So let me take off my helmet here. Yeah, don't worry, it looks way worse than it feels.

Siz benim gerçek yüzümü görmeden bu sahneyi terk edemem çünkü biraz karşılaştırma yapabilirsiniz. Başlığımı çıkarayım. Endişelenmeyin, hissettirdiğinden daha kötü görünüyor.

(Laughter)

(Kahkahalar)

So this is where we are. Let me put this back on here.

İşte bu geldiğimiz nokta. Geri takayım şunu.

(Laughter) Doink!

(Kahkahalar)

So this is where we are. We're on the cusp of being able to interact with digital humans that are strikingly real, whether they're being controlled by a person or a machine. And like all new technology these days, it's going to come with some serious and real concerns that we have to deal with. But I am just so really excited about the ability to bring something that I've seen only in science fiction for my entire life into reality. Communicating with computers will be like talking to a friend. And talking to faraway friends will be like sitting with them together in the same room.

İşte bu geldiğimiz nokta. Ya bir kişi tarafından ya da bir makine tarafından kontrol ediliyor olsunlar şaşırtıcı bir şekilde gerçekçi olan dijital insanlarla etkileşim kurabilmenin zirvesindeyiz. Ve günümüzdeki diğer tüm yeni teknolojiler gibi bu şey, halletmemiz gereken bazı ciddi ve gerçek sorunları da beraberinde getirecektir. Ama şu beni çok sevindiriyor ki hayatım boyunca sadece bilim kurguda gördüğüm bir şeyi gerçek hayata dönüştürme yeteneğine sahibiz. Bilgisayarlarla iletişim kurmak tıpkı bir arkadaşla konuşmak gibi olacak. Ve uzaktaki arkadaşlarla konuşmak onlarla aynı odada birlikte oturuyormuşsunuz gibi olacak.

Thank you very much.

Çok teşekkür ederim.

(Applause)

(Alkışlar)

