Blaise Agüera y Arcas: How computers are learning to be creative

So, I lead a team at Google that works on machine intelligence; in other words, the engineering discipline of making computers and devices able to do some of the things that brains do. And this makes us interested in real brains and neuroscience as well, and especially interested in the things that our brains do that are still far superior to the performance of computers.

저는 구글에서 기계지능팀을 이끌고 있습니다. 다르게 표현하면, 컴퓨터와 장치를 공학적으로 훈련시켜 뇌가 하는 일을 할 수 있게 합니다. 그리고 이 일을 하면서 저희는 실제 뇌와 신경과학에 관심을 두게 되었습니다. 특히 관심 있는 부분은 우리의 뇌가 하는 일 중에 아직 컴퓨터보다 훨씬 뛰어난 부분에 대한 것입니다.

Historically, one of those areas has been perception, the process by which things out there in the world -- sounds and images -- can turn into concepts in the mind. This is essential for our own brains, and it's also pretty useful on a computer. The machine perception algorithms, for example, that our team makes, are what enable your pictures on Google Photos to become searchable, based on what's in them. The flip side of perception is creativity: turning a concept into something out there in the world. So over the past year, our work on machine perception has also unexpectedly connected with the world of machine creativity and machine art.

역사적으로 이런 부분 중에 하나로 인식이 언급돼 왔습니다. 세상에 존재하는 소리나 이미지를 과정을 통해 마음속에 개념화시키는 것입니다. 이것은 우리 뇌에 필수적인 기능이고 컴퓨터에도 꽤 유용합니다. 기계 인식 알고리즘의 예로 저희 팀에서 한 일은 구글 포토스에 올린 사진을 뭐가 찍혔냐에 따라 검색이 가능하게 한 것이죠. 인식의 반대말은 창의성입니다. 개념을 세상에 존재하는 것으로 바꾸는 것입니다. 지난 몇 년 동안 기계 인식에 대해 저희가 해온 일들은 뜻밖에도 기계의 창의력과 기계 예술을 연결했습니다.

I think Michelangelo had a penetrating insight into this dual relationship between perception and creativity. This is a famous quote of his: "Every block of stone has a statue inside of it, and the job of the sculptor is to discover it." So I think that what Michelangelo was getting at is that we create by perceiving, and that perception itself is an act of imagination and is the stuff of creativity.

저는 미켈란젤로가 인식과 창의성 간의 이중 관계를 꿰뚫어 보았다고 생각합니다. 이것은 그의 유명한 인용구입니다. "모든 돌덩이는 그 안에 조각상을 가지고 있고 그것을 발견하는 것이 조각가의 과업이다." 그래서 저는 미켈란젤로의 생각은 우리는 인식하는 것으로 창조하고 그 인식 자체가 상상하는 행위이며 창의성이라 여깁니다.

The organ that does all the thinking and perceiving and imagining, of course, is the brain. And I'd like to begin with a brief bit of history about what we know about brains. Because unlike, say, the heart or the intestines, you really can't say very much about a brain by just looking at it, at least with the naked eye. The early anatomists who looked at brains gave the superficial structures of this thing all kinds of fanciful names, like hippocampus, meaning "little shrimp." But of course that sort of thing doesn't tell us very much about what's actually going on inside.

생각하고 인식하고 상상하는 기관은 물론 뇌입니다. 그리고 저는 간략하게 뇌에 대한 연구의 역사에 대해서 이야기하고 싶습니다. 왜냐하면 심장이나 장과 달리 보기만 해선 뇌에 대해 이야기할 게 없기 때문입니다. 겉으로 보기에 말이죠. 초기 해부학자들은 뇌를 보고 표면상의 구조에 온갖 기발한 이름을 붙였습니다. 해마같이 말이죠, 뜻은 "작은 새우"입니다. 하지만 물론 이런 이름들이 실제로 무슨일을 하는지 말해 주지는 않습니다.

The first person who, I think, really developed some kind of insight into what was going on in the brain was the great Spanish neuroanatomist, Santiago Ramón y Cajal, in the 19th century, who used microscopy and special stains that could selectively fill in or render in very high contrast the individual cells in the brain, in order to start to understand their morphologies. And these are the kinds of drawings that he made of neurons in the 19th century.

제 생각에 최초로 뇌에서 무슨 일이 일어나는지에 대해 큰 공헌을 한 사람은 스페인의 위대한 신경 해부학자인 산티아고 라몬 이 카할입니다. 19세기에 현미경 관찰과 특수한 착색을 이용해 선택적으로 각각의 뇌세포를 채우거나 높은 대비를 만들어 내 형태학적인 이해를 할 수 있게 한 사람입니다. 이것들은 그가 신경 세포로 만든 그림들입니다. 19세기에 말이죠.

This is from a bird brain. And you see this incredible variety of different sorts of cells; even the cellular theory itself was quite new at this point. And these structures, these cells that have these arborizations, these branches that can go very, very long distances -- this was very novel at the time. They're reminiscent, of course, of wires. That might have been obvious to some people in the 19th century; the revolutions of wiring and electricity were just getting underway. But these microanatomical drawings of Ramón y Cajal's, like this one, are still in some ways unsurpassed.

이것은 새의 뇌 그림입니다. 그리고 굉장히 다양한 세포를 볼 수 있습니다. 심지어 세포이론도 알려진 지 얼마 안 된 때였습니다. 그리고 이 구조는 수지상부를 가지고 있는 세포들의 가지는 아주 멀리까지 뻗을 수 있는데 당시 매우 새로웠습니다. 이 구조는 전선을 연상시킵니다. 전선과 전기의 혁명이 일어나던 19세기 사람들은 당연히 그렇게 볼 수 있었을 것입니다. 하지만 여러 가지 면에서 이런 라몬 이 카할의 조직학적 그림은 오늘날에도 최고로 여겨집니다.

We're still, more than a century later, trying to finish the job that Ramón y Cajal started. These are raw data from our collaborators at the Max Planck Institute of Neuroscience. And what our collaborators have done is to image little pieces of brain tissue. The entire sample here is about one cubic millimeter in size, and I'm showing you a very, very small piece of it here. That bar on the left is about one micron. The structures you see are mitochondria that are the size of bacteria. And these are consecutive slices through this very, very tiny block of tissue. Just for comparison's sake, the diameter of an average strand of hair is about 100 microns. So we're looking at something much, much smaller than a single strand of hair.

우리는 지난 한 세기 동안 라몬 이 카할이 시작한 일을 끝내려고 노력하고 있습니다. 이것들은 막스플랑크 신경과학 연구소 협력자들의 기초 데이타입니다. 그리고 저희 협력자들이 한 것은 뇌세포의 작은 부분을 조명한 것 입니다. 이 샘플의 전체 크기는 대략 1 입방 밀리미터이고 결과물의 아주 작은 부분을 보고 계신 것입니다. 왼쪽에 있는 바는 1미크론 입니다. 보고 계신 구조는 미토콘드리아입니다. 이는 박테리아만큼 작습니다. 이것은 아주 작은 조직으로 자른 연속적인 단면입니다. 비교를 하자면 머리카락의 평균 지름은 100 미크론입니다. 저희가 보고 있는 것은 머리카락 한 가닥보다 훨씬 작은 것입니다.

And from these kinds of serial electron microscopy slices, one can start to make reconstructions in 3D of neurons that look like these. So these are sort of in the same style as Ramón y Cajal. Only a few neurons lit up, because otherwise we wouldn't be able to see anything here. It would be so crowded, so full of structure, of wiring all connecting one neuron to another.

그리고 이런 전자현미경으로 나눈 일련의 조각들로 신경세포를 3D로 이렇게 복원할 수 있습니다. 이것은 라몬 이 카할의 방식과 어느 정도 같습니다. 일부 신경세포만 비추었죠. 그렇지 않으면 아무것도 구분할 수 없을 것입니다. 사진 가득히 신경세포끼리 서로 연결된 구조만 보일 것입니다.

So Ramón y Cajal was a little bit ahead of his time, and progress on understanding the brain proceeded slowly over the next few decades. But we knew that neurons used electricity, and by World War II, our technology was advanced enough to start doing real electrical experiments on live neurons to better understand how they worked. This was the very same time when computers were being invented, very much based on the idea of modeling the brain -- of "intelligent machinery," as Alan Turing called it, one of the fathers of computer science.

라몬 이 카할은 시대를 앞서나갔고 그후 수십 년 동안 뇌의 이해에 대한 연구는 서서히 발전했습니다. 그러나 우리는 신경세포가 전기를 이용하는 것을 알아냈고 제2차 세계대전 때 발전한 기술로 실제로 신경세포에 전기 실험을 할 수 있게 되고 신경세포를 더 이해할 수 있었습니다. 컴퓨터가 발명된 것도 바로 이때인데 뇌를 모델로 한 아이디어였죠. 앨런 튜링은 "지능형 기계" 라고 불렀습니다. 컴퓨터 공학의 아버지 중에 한 명이죠.

Warren McCulloch and Walter Pitts looked at Ramón y Cajal's drawing of visual cortex, which I'm showing here. This is the cortex that processes imagery that comes from the eye. And for them, this looked like a circuit diagram. So there are a lot of details in McCulloch and Pitts's circuit diagram that are not quite right. But this basic idea that visual cortex works like a series of computational elements that pass information one to the next in a cascade, is essentially correct.

워렌 맥컬로흐와 월터 피츠는 어느날 라몬 이 카할의 시각 피질 그림을 보았습니다. 지금 보고 계신 그림말이죠. 이것은 눈을 통해 들어온 이미지를 처리하는 피질입니다. 그리고 그들에겐 이 그림은 마치 회로도처럼 보였습니다. 맥컬로흐와 피츠의 회로도에는 많은 세부사항이 있지만 정확하지는 않습니다. 하지만 기본 아이디어인 시각 피질의 원리가 일련의 계산 요소를 연속적으로 하나에서 다음으로 정보를 넘긴다는 것은 근본적으로 맞습니다.

Let's talk for a moment about what a model for processing visual information would need to do. The basic task of perception is to take an image like this one and say, "That's a bird," which is a very simple thing for us to do with our brains. But you should all understand that for a computer, this was pretty much impossible just a few years ago. The classical computing paradigm is not one in which this task is easy to do.

조금 더 이야기해 보겠습니다. 시각 정보를 처리하는 모델이 해야 하는 일에 대해서 말이죠. 인식이 기본적으로 하는 일은 이런 이미지를 보고 이렇게 말하는 것 입니다. "이것은 새입니다" 우리에게는 매우 쉬운 일입니다. 하지만 여러분 모두가 아셔야 하는 것이 몇 년 전까지 컴퓨터로는 이런 것이 불가능했습니다. 고전적인 컴퓨팅 패러다임은 이런 일을 쉽게 할 수 있는 것이 아닙니다.

So what's going on between the pixels, between the image of the bird and the word "bird," is essentially a set of neurons connected to each other in a neural network, as I'm diagramming here. This neural network could be biological, inside our visual cortices, or, nowadays, we start to have the capability to model such neural networks on the computer. And I'll show you what that actually looks like.

그래서 픽셀들 간의 관계와 만들어진 이미지와 "새"라는 단어의 관계는 근본적으로 신경세포들이 서로 연결되어 신경망을 구축하고 있는 것입니다. 제가 그린 도표처럼요. 이 신경망은 시각피질 내부의 생물학적인 것이나 오늘날에는 우리의 기술로 컴퓨터를 통해 신경망을 그릴 수 있습니다. 그리고 이것이 실제 모델입니다.

So the pixels you can think about as a first layer of neurons, and that's, in fact, how it works in the eye -- that's the neurons in the retina. And those feed forward into one layer after another layer, after another layer of neurons, all connected by synapses of different weights. The behavior of this network is characterized by the strengths of all of those synapses. Those characterize the computational properties of this network. And at the end of the day, you have a neuron or a small group of neurons that light up, saying, "bird."

픽셀이 신경세포의 첫 번째 층입니다. 그리고 이것은 실제로 눈으로 보는 과정으로 보면 픽셀이 망막인 것입니다. 그리고 이 자극을 신경세포의 한 층에서 다음 층으로 전달합니다. 이는 각각 다른 농도의 시냅스로 모두 연결되어있습니다. 이 네트워크의 동작은 모든 시냅스의 강도에 의해 구분됩니다. 이것으로 네트워크 내에서 계산되는 것을 특징짓습니다. 그리고 마지막에 신경 세포 하나 또는 한 무리가 반짝이며 "새"라고 말합니다.
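The layered picture described above can be sketched in a few lines of Python. This is only a toy illustration, not the network shown in the talk: the layer sizes, the weight values, the biases, and the choice of tanh as the neuron's response are all assumptions made for the example, and the names `layer` and `forward` are hypothetical.

```python
import math

def layer(inputs, weights, biases):
    """One layer of neurons: each neuron takes a weighted sum of its inputs,
    adds a bias, and squashes the result with tanh (an assumed nonlinearity)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Feed the input forward through one layer after another."""
    activation = x
    for weights, biases in layers:
        activation = layer(activation, weights, biases)
    return activation

# Toy network: 4 "pixels" -> 3 hidden neurons -> 1 output neuron.
# Every number below is made up; a real network has millions of pixels
# and billions of synapse weights.
layers = [
    ([[0.5, -0.2, 0.1, 0.4],
      [0.3, 0.8, -0.5, 0.0],
      [-0.6, 0.1, 0.2, 0.7]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]

pixels = [0.9, 0.1, 0.4, 0.7]
y = forward(pixels, layers)  # a single number standing in for the "bird" neuron
```

Changing any of the weights changes what the output neuron responds to, which is the sense in which the synapse strengths characterize the behavior of the network.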

Now I'm going to represent those three things -- the input pixels and the synapses in the neural network, and bird, the output -- by three variables: x, w and y. There are maybe a million or so x's -- a million pixels in that image. There are billions or trillions of w's, which represent the weights of all these synapses in the neural network. And there's a very small number of y's, of outputs that that network has. "Bird" is only four letters, right? So let's pretend that this is just a simple formula, x "x" w = y. I'm putting the times in scare quotes because what's really going on there, of course, is a very complicated series of mathematical operations.

이제 제가 이 세가지를 입력된 픽셀, 신경망의 시넵스 그리고 결과물인 새를 세 변수 x, w, y라고 하겠습니다. 픽셀이 백만 개는 있을테니 x는 이미지의 백만 개의 픽셀입니다. 그리고 w는 수십억 혹은 수조 개가 있습니다. 이는 신경망의 모든 시냅스의 농도를 말합니다. 그리고 적은 수의 y가 있습니다. 신경망의 결과물로써 말이죠. "Bird"는 네 글자뿐이잖아요. 그러면 이것을 간단한 공식이라고 해봅시다. x "x" w = y. 저는 곱하기를 큰따옴표 안에 넣었습니다. 실제로 저기서 일어나는 일은 매우 복잡한 일련의 수학적인 과정이기 때문입니다.

That's one equation. There are three variables. And we all know that if you have one equation, you can solve one variable by knowing the other two things. So the problem of inference, that is, figuring out that the picture of a bird is a bird, is this one: it's where y is the unknown and w and x are known. You know the neural network, you know the pixels. As you can see, that's actually a relatively straightforward problem. You multiply two times three and you're done. I'll show you an artificial neural network that we've built recently, doing exactly that.

이것은 한 공식입니다. 세 개의 변수가 있습니다. 그리고 우리가 알고 있는 것이 한 공식에서 두 개의 변수를 알면 남은 한 개를 알 수 있다는 것입니다. 그래서 추론해야 하는 새의 사진을 보고 새를 구분하는 공식은 바로 이것입니다. 이 경우는 y는 알려지지 않고 w와 x는 알려진 경우이죠 신경망과 픽셀이 무엇인지는 알고 있습니다. 보시다시피 사실 상대적으로 간단한 문제입니다 2 곱하기 3을 하면 끝나는 거죠 여러분께 최근에 만든 인공 신경망이 정확히 이것을 하는 것을 보여드리겠습니다

This is running in real time on a mobile phone, and that's, of course, amazing in its own right, that mobile phones can do so many billions and trillions of operations per second. What you're looking at is a phone looking at one after another picture of a bird, and actually not only saying, "Yes, it's a bird," but identifying the species of bird with a network of this sort. So in that picture, the x and the w are known, and the y is the unknown. I'm glossing over the very difficult part, of course, which is how on earth do we figure out the w, the brain that can do such a thing? How would we ever learn such a model?

이것은 휴대전화에서 실시간으로 돌아가는 것입니다. 그리고 물론 휴대전화에서 초당 수십억 수조 개의 동작을 한다는 것 자체만으로도 놀라운 일입니다 여러분이 보고 있는 것은 휴대전화가 다른 새 사진을 보고 “네, 이것은 새입니다.” 하고 끝나는 것이 아니라 네트워크 정보로 종까지 분류하는 모습입니다. 사진을 보면 x와 w는 밝혀져 있고 y는 밝혀지지 않았습니다. 지금 몹시 어려운 부분을 얼버무리고 지나가고 있는데 그것은 우리가 어떻게 w를 밝혀냈으며 뇌가 어떻게 그런 일을 하며 어떻게 이런 모델을 배울까입니다.

So this process of learning, of solving for w, if we were doing this with the simple equation in which we think about these as numbers, we know exactly how to do that: 6 = 2 x w, well, we divide by two and we're done. The problem is with this operator. So, division -- we've used division because it's the inverse to multiplication, but as I've just said, the multiplication is a bit of a lie here. This is a very, very complicated, very non-linear operation; it has no inverse. So we have to figure out a way to solve the equation without a division operator. And the way to do that is fairly straightforward. You just say, let's play a little algebra trick, and move the six over to the right-hand side of the equation. Now, we're still using multiplication. And that zero -- let's think about it as an error. In other words, if we've solved for w the right way, then the error will be zero. And if we haven't gotten it quite right, the error will be greater than zero.

w를 배우고 해결하는 과정을 간단한 공식으로 만들어 숫자를 대입해보면 정확히 알 수 있습니다. 6=2 x w라고 하면 양변을 2로 나누면 끝납니다. 문제점은 이 연산에서 나눗셈을 우리가 나눗셈을 썼는데 곱셈을 역으로 계산한 것입니다. 하지만 방금 말한 대로 실제 연산은 곱하기가 아닙니다. 이것은 매우 매우 복잡한 비선형 연산이고 역으로 계산할 수 없습니다. 그래서 우리는 이 공식을 나누지 않고 해결할 방법을 찾아야 합니다. 그리고 그 방법은 매우 간단합니다. 대수학을 조금 이용해 6을 공식의 우변으로 옮기겠습니다. 이러면 곱하기만 사용할 수 있습니다. 그리고 0은 오류라고 생각합시다. 다시 말해, 우리가 w를 해결해서 정답이 나오면 오류가 0이 될 것이고 우리가 잘못된 값을 구했다면 오류가 0보다 커질 것입니다.

So now we can just take guesses to minimize the error, and that's the sort of thing computers are very good at. So you've taken an initial guess: what if w = 0? Well, then the error is 6. What if w = 1? The error is 4. And then the computer can sort of play Marco Polo, and drive down the error close to zero. As it does that, it's getting successive approximations to w. Typically, it never quite gets there, but after about a dozen steps, we're up to w = 2.999, which is close enough. And this is the learning process.

이제 우리가 추측해서 오류를 최소화할 수 있습니다. 그리고 이런 것은 컴퓨터가 아주 잘하는 일이죠. 그래서 최초의 추측으로 w가 0이라면 오류는 6입니다. w가 1이면 오류는 4입니다. 컴퓨터가 계속 마르코 폴로같이 여행하면 오류가 0에 가까워질 것입니다. 그러면서 컴퓨터가 성공적으로 w 값의 근사치를 얻어가는 것입니다. 전형적으로 정확한 값을 얻진 못하지만 수십 단계가 지나면 w는 2.999를 얻게 되고 이는 충분히 근접한 값입니다. 그리고 이것이 학습 과정입니다.
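The guessing game above, applied to the toy equation 6 = 2 x w, might look something like the following sketch. The talk does not say which update rule is used, so this assumes plain gradient descent on the squared error; the step count, the learning rate, and the name `learn_w` are choices made for the example.

```python
def error(w):
    """How far 2 x w is from 6, as in the talk: error(0) is 6, error(1) is 4."""
    return abs(6 - 2 * w)

def learn_w(steps=12, learning_rate=0.1):
    """Start from a guess and repeatedly nudge w in the direction that shrinks the error."""
    w = 0.0  # initial guess
    history = [w]
    for _ in range(steps):
        # Derivative of the squared error (2*w - 6)**2 with respect to w.
        gradient = 4 * (2 * w - 6)
        w = w - learning_rate * gradient
        history.append(w)
    return w, history

w, history = learn_w()
for step, guess in enumerate(history):
    print(step, guess, error(guess))
```

Each pass brings the guess closer to 3 without ever dividing by anything, which is the point: the same recipe still works when the "multiplication" is replaced by an operation that has no inverse.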

So remember that what's been going on here is that we've been taking a lot of known x's and known y's and solving for the w in the middle through an iterative process. It's exactly the same way that we do our own learning. We have many, many images as babies and we get told, "This is a bird; this is not a bird." And over time, through iteration, we solve for w, we solve for those neural connections.

지금까지 이야기한 것은 수많은 x와 y 값을 알고 있고 가운데 w 값을 추론 과정에서 알아내고 있습니다. 이는 우리의 뇌가 학습하는 과정과 같습니다. 우리는 어릴 적 수많은 이미지를 접하고 "이것은 새다, 이것은 새가 아니다" 라고 듣습니다. 그리고 시간이 흘러 반복하면서 w를 알아내죠. 신경 연결을 해결하는 것입니다.

So now, we've held x and w fixed to solve for y; that's everyday, fast perception. We figure out how we can solve for w, that's learning, which is a lot harder, because we need to do error minimization, using a lot of training examples.

이제 우리는 고정된 x와 w값으로 y를 구합니다. 이것은 매일 우리가 하는 인식입니다. w 값을 구하는 과정은 학습이고 더 어렵습니다. 왜냐면 많은 훈련 예시를 통해 오류를 최소화 해야 하기 때문이죠.

And about a year ago, Alex Mordvintsev, on our team, decided to experiment with what happens if we try solving for x, given a known w and a known y. In other words, you know that it's a bird, and you already have your neural network that you've trained on birds, but what is the picture of a bird? It turns out that by using exactly the same error-minimization procedure, one can do that with the network trained to recognize birds, and the result turns out to be ... a picture of birds. So this is a picture of birds generated entirely by a neural network that was trained to recognize birds, just by solving for x rather than solving for y, and doing that iteratively.

약 1년 전에 저희 팀의 알렉스 모드빈츠세프는 우리가 x를 구하면 어떻게 되는지 실험하기로 했습니다. w와 y 값을 알고 있다는 조건에서 말이죠. 다시 말하자면 새라는 것을 알고 새라는 것을 인식할 수 있는 신경망이 구축된 상태에서 새의 모습을 알아내는 것입니다. 똑같은 오류 최소화 과정을 거쳐 컴퓨터가 새를 인식할 수 있는 네트워크를 통해 만들어낸 결과는 새의 그림입니다. 이 그림은 전적으로 새를 인식할 수 있는 신경 네트워크를 통해 y 값을 구하는 대신 x 값을 추론하여 구현됬습니다.
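Solving for x can be sketched with the same error-minimization idea, holding w and y fixed and adjusting the pixels instead. This is a hedged toy version, not the procedure used by the team: the "recognizer" here is a single sigmoid neuron with made-up weights, and the target value, starting image, learning rate, and function names are all assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def recognize(x, w, b):
    """Toy recognizer: one neuron whose output is read as 'how bird-like is x'."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def solve_for_x(w, b, target, steps=200, learning_rate=0.5):
    """Hold w and the desired output fixed, and iteratively adjust the pixels x."""
    x = [0.5] * len(w)  # start from a flat gray "image"
    for _ in range(steps):
        y_hat = recognize(x, w, b)
        # Gradient of (y_hat - target)**2 with respect to each pixel is scale * w_i.
        scale = 2.0 * (y_hat - target) * y_hat * (1.0 - y_hat)
        x = [min(1.0, max(0.0, xi - learning_rate * scale * wi))
             for xi, wi in zip(x, w)]
    return x

w = [2.0, -1.5, 0.5, -2.0]  # made-up, already "trained" weights
b = 0.0
x = solve_for_x(w, b, target=0.8)
print(x, recognize(x, w, b))
```

The weights never change in this loop. Only the image does, drifting toward whatever the fixed network counts as evidence for the chosen output.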

Here's another fun example. This was a work made by Mike Tyka in our group, which he calls "Animal Parade." It reminds me a little bit of William Kentridge's artworks, in which he makes sketches, rubs them out, makes sketches, rubs them out, and creates a movie this way. In this case, what Mike is doing is varying y over the space of different animals, in a network designed to recognize and distinguish different animals from each other. And you get this strange, Escher-like morph from one animal to another.

다른 재미있는 예를 보여드리면 이것은 저희 그룹의 마이크 티카의 작품입니다. 이 작품의 제목은 "동물 행진"입니다. 이것을 보고 윌리엄 켄트리지의 작품이 떠올랐습니다. 그는 스케치를 그렸다가 지우고 그렸다가 지워가며 이런 식으로 영상을 만들죠. 이 경우에는 마이크가 한 것은 변수 y를 다양한 동물들로 설정했습니다. 서로 다른 동물들을 구분할 수 있도록 설계된 네트워크 안에서 말이죠. 그렇게 이런 희안한 에셔 풍의 동물들이 변하는 그림이 나옵니다.

Here he and Alex together have tried reducing the y's to a space of only two dimensions, thereby making a map out of the space of all things recognized by this network. Doing this kind of synthesis or generation of imagery over that entire surface, varying y over the surface, you make a kind of map -- a visual map of all the things the network knows how to recognize. The animals are all here; "armadillo" is right in that spot.

여기서 마이크와 알렉스는 y 값을 줄여 2차원 평면에 표현했습니다. 그렇게 이 네트워크가 인식할 수 있는 모든 종류를 나타내는 지도를 만들었습니다. 이런 종류의 이미지 통합 혹은 생성은 표면 전반에 걸쳐 y를 다르게 해서 이런 지도를 만듭니다. 네트워크가 인식하는 모든 것의 시각적 지도입니다. 모든 동물이 있습니다. 저기 "아르마딜로"가 있습니다.

You can do this with other kinds of networks as well. This is a network designed to recognize faces, to distinguish one face from another. And here, we're putting in a y that says, "me," my own face parameters. And when this thing solves for x, it generates this rather crazy, kind of cubist, surreal, psychedelic picture of me from multiple points of view at once. The reason it looks like multiple points of view at once is because that network is designed to get rid of the ambiguity of a face being in one pose or another pose, being looked at with one kind of lighting, another kind of lighting. So when you do this sort of reconstruction, if you don't use some sort of guide image or guide statistics, then you'll get a sort of confusion of different points of view, because it's ambiguous. This is what happens if Alex uses his own face as a guide image during that optimization process to reconstruct my own face. So you can see it's not perfect. There's still quite a lot of work to do on how we optimize that optimization process. But you start to get something more like a coherent face, rendered using my own face as a guide.

이것을 다른 네트워크로 할 수 있습니다. 이 네트워크는 얼굴을 인식하도록 설계됬습니다. 서로 다른 얼굴을 구분하도록 말이죠. 여기서 저희가 y에 "저"를 넣었습니다. 제 얼굴을 변수로 말이죠. 그리고 이것이 x를 구하면 이런 상당히 정신없고 약간은 입체파, 초현실주의, 사이키델릭한 제 사진을 만듭니다. 여러 모습을 한 번에 보여주면서요. 여러 모습을 한 번에 보여주는 이유는 네트워크의 설계에서 얼굴의 한 모습에서 다른 모습으로 넘어가는 모호한 과정이 제거되었기 때문입니다. 특정 각도의 얼굴을 보는 것입니다. 그래서 이것을 재구성할 때 가이드 이미지나 통계를 사용하지 않으면 이런 혼란스러운 시점들이 나옵니다. 모호하기 떄문이죠. 이것은 알렉스가 본인 얼굴을 가이드로 이용해 최적화 과정을 거쳐 제 얼굴을 만든 것입니다. 보시다시피 완벽하진 않습니다. 어떻게 최적화를 해야 할지 아직도 갈 길이 멉니다. 하지만 제 얼굴을 가이드로 쓰면 더 일관된 얼굴을 구할 수 있습니다.

You don't have to start with a blank canvas or with white noise. When you're solving for x, you can begin with an x, that is itself already some other image. That's what this little demonstration is. This is a network that is designed to categorize all sorts of different objects -- man-made structures, animals ... Here we're starting with just a picture of clouds, and as we optimize, basically, this network is figuring out what it sees in the clouds. And the more time you spend looking at this, the more things you also will see in the clouds. You could also use the face network to hallucinate into this, and you get some pretty crazy stuff.

굳이 빈 캔버스로 시작하지 않아도 됩니다. 혹은 백색 잡음으로요. x를 구할 때 이미 그려진 그림 위에 x를 구해도 됩니다. 이것이 바로 그 예입니다. 이 네트워크는 온갖 물체를 구분하도록 설계되었습니다. 인조물이나 동물 등을 말이죠. 여기서 저희는 구름 사진을 이용했습니다. 그리고 저희가 최적화를 하면 기본적으로 이 네트워크는 구름에서 무엇이 보이는지 구분합니다. 그리고 이것을 더 자세히 보시면 구름에서 더 다양한 것을 볼 수 있습니다. 여기서 얼굴을 인식하는 네트워크로 환각을 만들면 꽤나 정신없는 그림이 나옵니다.

(Laughter)

(웃음)

Or, Mike has done some other experiments in which he takes that cloud image, hallucinates, zooms, hallucinates, zooms, hallucinates, zooms. And in this way, you can get a sort of fugue state of the network, I suppose, or a sort of free association, in which the network is eating its own tail. So every image is now the basis for, "What do I think I see next? What do I think I see next? What do I think I see next?"

혹은 마이크가 다른 시도를 했습니다. 바로 구름 그림을 이용해 환각을 만들고 확대하고 환각을 만들고 확대했습니다. 그리고 이렇게 방황하는 것처럼 보이는 네트워크나 자유 연상의 일종으로 네트워크가 스스로 꼬리를 물게 됩니다. 그래서 모든 이미지의 기본은 이렇습니다. "다음에는 무엇이 보이지? 다음에는 무엇이 보이지? 다음에는 무엇이 보이지?"
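The hallucinate-then-zoom feedback loop can be sketched roughly as follows. It is a toy under heavy assumptions: the "image" is a one-dimensional list of eight brightness values, the recognizer is a single sigmoid neuron with invented weights, and `hallucinate`, `zoom`, and `response` are hypothetical helpers rather than anything from the actual experiments.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def response(image, w):
    """How strongly the toy recognizer reacts to the image."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, image)))

def hallucinate(image, w, steps=5, learning_rate=0.5):
    """Nudge the existing image so the recognizer sees more of what it looks for."""
    x = list(image)
    for _ in range(steps):
        y_hat = response(x, w)
        slope = y_hat * (1.0 - y_hat)  # derivative of the sigmoid
        x = [min(1.0, max(0.0, xi + learning_rate * slope * wi))
             for xi, wi in zip(x, w)]
    return x

def zoom(image):
    """Keep the middle half of the image and stretch it back to full width."""
    n = len(image)
    middle = image[n // 4 : n // 4 + n // 2]
    return [value for value in middle for _ in range(2)]

w = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.5, -0.5]  # made-up weights
frame = [0.5] * 8  # stand-in for the starting cloud picture
frames = [frame]
for _ in range(4):
    # Each output becomes the input for the next round.
    frame = zoom(hallucinate(frame, w))
    frames.append(frame)
```

Because every frame is built from the previous one, the sequence never returns to the original picture; it keeps elaborating on whatever the network found last time.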

I showed this for the first time in public to a group at a lecture in Seattle called "Higher Education" -- this was right after marijuana was legalized.

이것을 최초로 공개한 곳은 시애틀의 "고등 교육"그룹의 강연에서였습니다. 마리화나가 합법화 된 직후에 말이죠.

(Laughter)

(웃음)

So I'd like to finish up quickly by just noting that this technology is not constrained. I've shown you purely visual examples because they're really fun to look at. It's not a purely visual technology. Our artist collaborator, Ross Goodwin, has done experiments involving a camera that takes a picture, and then a computer in his backpack writes a poem using neural networks, based on the contents of the image. And that poetry neural network has been trained on a large corpus of 20th-century poetry. And the poetry is, you know, I think, kind of not bad, actually.

그래서 정리를 짧게 하겠습니다. 이 기술에 제약이 없다는 것을 말하면서 말이죠. 순전히 시각자료를 보여드린 이유는 흥미를 유발하기 위해서 입니다. 이것은 순전히 시각 기술만은 아닙니다. 저희와 함께 일하는 아티스트 로스 굿윈은 실험을 했습니다. 사진을 찍는 사진기와 등에 매고 있는 컴퓨터로 신경 네트워크를 이용해 시를 썼습니다. 사진에 찍힌 내용을 보고 말이죠. 그리고 시인 신경 네트워크는 20세기 시의 집대성으로 훈련됬습니다. 그리고 결과로 나온 시는 말이죠 사실 제 생각엔 나쁘지 않아 보입니다.

(Laughter)

(웃음)

In closing, I think that per Michelangelo, I think he was right; perception and creativity are very intimately connected. What we've just seen are neural networks that are entirely trained to discriminate, or to recognize different things in the world, able to be run in reverse, to generate. One of the things that suggests to me is not only that Michelangelo really did see the sculpture in the blocks of stone, but that any creature, any being, any alien that is able to do perceptual acts of that sort is also able to create because it's exactly the same machinery that's used in both cases.

마지막으로 저는 미켈란젤로의 생각이 옳았다고 생각합니다. 인식과 창의성은 매우 밀접하게 연결되어 있습니다. 지금까지 보신 것은 신경 네트워크 입니다. 전적으로 훈련이 되어 구분하거나 혹은 다른 것들을 인식하거나 반대로 적용하여 만들어 낼 수 있습니다. 이것을 보고 느낀 점 중에 하나는 미켈란젤로가 정말로 본 것은 돌덩이 안에 있는 조각상뿐만 아니라 어떤 생물, 생명 심지어 외계인도 인식행위를 할 수 있으면 창조할 수 있다는 것 입니다. 두 경우 모두 같은 조작과정을 사용하기 때문이죠.

Also, I think that perception and creativity are by no means uniquely human. We start to have computer models that can do exactly these sorts of things. And that ought to be unsurprising; the brain is computational.

또한 저는 인식과 창의성은 결코 인간에 국한되지 않는다고 생각합니다. 저희는 똑같은 일을 할 수 있는 컴퓨터 모델을 만들었고 그리고 그 뇌가 컴퓨터로 만들어 졌다는 것은 놀랄 일도 아닙니다.

And finally, computing began as an exercise in designing intelligent machinery. It was very much modeled after the idea of how could we make machines intelligent. And we finally are starting to fulfill now some of the promises of those early pioneers, of Turing and von Neumann and McCulloch and Pitts. And I think that computing is not just about accounting or playing Candy Crush or something. From the beginning, we modeled them after our minds. And they give us both the ability to understand our own minds better and to extend them.

그리고 마지막으로 컴퓨터는 지능적 기계를 설계하면서 시작되었습니다. 이것은 이런 생각을 따라 만들어졌습니다. 어떻게 하면 우리가 기계를 똑똑하게 만들지 말이죠. 그리고 이제 선구자들과 한 약속 중에 일부를 이뤄가고 있습니다. 튜링, 폰 노이만 매컬로크 그리고 피트에게 말이죠. 그리고 저는 컴퓨터는 회계나 게임 할 때만 쓰는 것이 아니라고 생각합니다. 시작부터 인간을 본따 컴퓨터를 만들었고 그리고 그 과정에서 인간의 마음을 더 잘 이해하고 더 넓히게 되었습니다.

Thank you very much.

감사합니다.

(Applause)

(박수)
