Fei-Fei Li: How we're teaching computers to understand pictures

Let me show you something.

(Video) Girl: Okay, that's a cat sitting in a bed. The boy is petting the elephant. Those are people that are going on an airplane. That's a big airplane.

Fei-Fei Li: This is a three-year-old child describing what she sees in a series of photos. She might still have a lot to learn about this world, but she's already an expert at one very important task: to make sense of what she sees. Our society is more technologically advanced than ever. We send people to the moon, we make phones that talk to us or customize radio stations that can play only music we like. Yet, our most advanced machines and computers still struggle at this task. So I'm here today to give you a progress report on the latest advances in our research in computer vision, one of the most frontier and potentially revolutionary technologies in computer science.

Yes, we have prototyped cars that can drive by themselves, but without smart vision, they cannot really tell the difference between a crumpled paper bag on the road, which can be run over, and a rock that size, which should be avoided. We have made fabulous megapixel cameras, but we have not delivered sight to the blind. Drones can fly over massive land, but don't have enough vision technology to help us to track the changes of the rainforests. Security cameras are everywhere, but they do not alert us when a child is drowning in a swimming pool. Photos and videos are becoming an integral part of global life. They're being generated at a pace that's far beyond what any human, or teams of humans, could hope to view, and you and I are contributing to that at this TED. Yet our most advanced software is still struggling at understanding and managing this enormous content. So in other words, collectively as a society, we're very much blind, because our smartest machines are still blind.

"Why is this so hard?" you may ask. Cameras can take pictures like this one by converting lights into a two-dimensional array of numbers known as pixels, but these are just lifeless numbers. They do not carry meaning in themselves. Just like to hear is not the same as to listen, to take pictures is not the same as to see, and by seeing, we really mean understanding. In fact, it took Mother Nature 540 million years of hard work to do this task, and much of that effort went into developing the visual processing apparatus of our brains, not the eyes themselves. So vision begins with the eyes, but it truly takes place in the brain.

So for 15 years now, starting from my Ph.D. at Caltech and then leading Stanford's Vision Lab, I've been working with my mentors, collaborators and students to teach computers to see. Our research field is called computer vision and machine learning. It's part of the general field of artificial intelligence. So ultimately, we want to teach the machines to see just like we do: naming objects, identifying people, inferring 3D geometry of things, understanding relations, emotions, actions and intentions. You and I weave together entire stories of people, places and things the moment we lay our gaze on them.

The first step towards this goal is to teach a computer to see objects, the building block of the visual world. In its simplest terms, imagine this teaching process as showing the computers some training images of a particular object, let's say cats, and designing a model that learns from these training images. How hard can this be? After all, a cat is just a collection of shapes and colors, and this is what we did in the early days of object modeling. We'd tell the computer algorithm in a mathematical language that a cat has a round face, a chubby body, two pointy ears, and a long tail, and that looked all fine. But what about this cat? (Laughter) It's all curled up. Now you have to add another shape and viewpoint to the object model. But what if cats are hidden? What about these silly cats? Now you get my point. Even something as simple as a household pet can present an infinite number of variations to the object model, and that's just one object.
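
The hand-written object model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in the early research: the `Shapes` record and the `looks_like_cat` rule are hypothetical names, and the feature values are made up to mirror the examples in the talk.

```python
from dataclasses import dataclass


@dataclass
class Shapes:
    """Hypothetical shape features a detector might extract from a photo."""
    round_face: bool
    chubby_body: bool
    pointy_ears: int   # number of visible pointy ears
    long_tail: bool


def looks_like_cat(s: Shapes) -> bool:
    # The rule from the talk: round face, chubby body, two pointy ears, long tail.
    return s.round_face and s.chubby_body and s.pointy_ears == 2 and s.long_tail


# A cat sitting upright matches every part of the rule.
sitting_cat = Shapes(round_face=True, chubby_body=True, pointy_ears=2, long_tail=True)

# A curled-up cat hides its face, one ear, and its tail, so the same rule rejects it.
curled_up_cat = Shapes(round_face=False, chubby_body=True, pointy_ears=1, long_tail=False)

print(looks_like_cat(sitting_cat))
print(looks_like_cat(curled_up_cat))
```

Every new pose, viewpoint, or occlusion would need another hand-written rule, which is the weakness the talk goes on to describe.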

So about eight years ago, a very simple and profound observation changed my thinking. No one tells a child how to see, especially in the early years. They learn this through real-world experiences and examples. If you consider a child's eyes as a pair of biological cameras, they take one picture about every 200 milliseconds, the average time an eye movement is made. So by age three, a child would have seen hundreds of millions of pictures of the real world. That's a lot of training examples. So instead of focusing solely on better and better algorithms, my insight was to give the algorithms the kind of training data that a child was given through experiences in both quantity and quality.
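
The "hundreds of millions of pictures" figure can be checked with quick arithmetic. The sketch below uses the 200-millisecond interval from the talk; the number of waking hours per day is an assumption added here, and `pictures_seen` is a hypothetical helper name.

```python
def pictures_seen(years, interval_ms=200, waking_hours_per_day=12):
    """Rough count of 'pictures' taken by a child's eyes.

    interval_ms comes from the talk (one picture about every 200 ms).
    waking_hours_per_day is an assumption, not a figure from the talk.
    """
    pictures_per_second = 1000 / interval_ms
    seconds_awake = years * 365 * waking_hours_per_day * 3600
    return int(pictures_per_second * seconds_awake)


# Assuming 12 waking hours a day, by age three:
print(pictures_seen(3))

# Upper bound if the eyes were open around the clock:
print(pictures_seen(3, waking_hours_per_day=24))
```

Under either assumption the total lands in the hundreds of millions, in line with the estimate in the talk.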

Once we knew this, we knew we needed to collect a data set that has far more images than we have ever had before, perhaps thousands of times more, and together with Professor Kai Li at Princeton University, we launched the ImageNet project in 2007. Luckily, we didn't have to mount a camera on our head and wait for many years. We went to the Internet, the biggest treasure trove of pictures that humans have ever created. We downloaded nearly a billion images and used crowdsourcing technology like the Amazon Mechanical Turk platform to help us to label these images. At its peak, ImageNet was one of the biggest employers of the Amazon Mechanical Turk workers: together, almost 50,000 workers from 167 countries around the world helped us to clean, sort and label nearly a billion candidate images. That was how much effort it took to capture even a fraction of the imagery a child's mind takes in during the early developmental years.

In hindsight, this idea of using big data to train computer algorithms may seem obvious now, but back in 2007, it was not so obvious. We were fairly alone on this journey for quite a while. Some very friendly colleagues advised me to do something more useful for my tenure, and we were constantly struggling for research funding. Once, I even joked to my graduate students that I would just reopen my dry cleaner's shop to fund ImageNet. After all, that's how I funded my college years.

So we carried on. In 2009, the ImageNet project delivered a database of 15 million images across 22,000 classes of objects and things organized by everyday English words. In both quantity and quality, this was an unprecedented scale. As an example, in the case of cats, we have more than 62,000 cats of all kinds of looks and poses and across all species of domestic and wild cats. We were thrilled to have put together ImageNet, and we wanted the whole research world to benefit from it, so in the TED fashion, we opened up the entire data set to the worldwide research community for free. (Applause)

Now that we have the data to nourish our computer brain, we're ready to come back to the algorithms themselves. As it turned out, the wealth of information provided by ImageNet was a perfect match to a particular class of machine learning algorithms called the convolutional neural network, pioneered by Kunihiko Fukushima, Geoff Hinton, and Yann LeCun back in the 1970s and '80s. Just like the brain consists of billions of highly connected neurons, a basic operating unit in a neural network is a neuron-like node. It takes input from other nodes and sends output to others. Moreover, these hundreds of thousands or even millions of nodes are organized in hierarchical layers, also similar to the brain. A typical neural network we use to train our object recognition model has 24 million nodes, 140 million parameters, and 15 billion connections. That's an enormous model. Powered by the massive data from ImageNet and the modern CPUs and GPUs to train such a humongous model, the convolutional neural network blossomed in a way that no one expected. It became the winning architecture to generate exciting new results in object recognition. This is a computer telling us this picture contains a cat and where the cat is. Of course there are more things than cats, so here's a computer algorithm telling us the picture contains a boy and a teddy bear; a dog, a person, and a small kite in the background; or a picture of very busy things like a man, a skateboard, railings, a lamppost, and so on. Sometimes, when the computer is not so confident about what it sees, we have taught it to be smart enough to give us a safe answer instead of committing too much, just like we would do, but other times our computer algorithm is remarkable at telling us what exactly the objects are, like the make, model, and year of the cars.
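
To make the idea of a layer of neuron-like nodes concrete, here is a toy convolution in plain Python. It is a minimal sketch under stated assumptions, not the model from the talk: the function name `conv2d`, the tiny image, and the hand-picked kernel are all illustrative, and a real network learns its kernels from training data rather than having them written by hand.

```python
def conv2d(image, kernel):
    """Slide a small kernel over a 2D image (no padding, stride 1).

    Each output value plays the role of one node: it takes a weighted sum
    of a small patch of inputs and passes it through a ReLU, max(0, x).
    As in most deep learning libraries, the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(max(0, total))
        output.append(row)
    return output


# A tiny image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-picked kernel that responds where a dark pixel sits left of a bright one.
kernel = [[-1, 1]]

for row in conv2d(image, kernel):
    print(row)
```

The output is nonzero only at the column where dark meets bright. Stacking many such layers, with kernels learned from data, is what lets later layers respond to larger patterns built from these simple ones.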

We applied this algorithm to millions of Google Street View images across hundreds of American cities, and we have learned something really interesting: first, it confirmed our common wisdom that car prices correlate very well with household incomes. But surprisingly, car prices also correlate well with crime rates in cities, or voting patterns by zip codes.

So wait a minute. Is that it? Has the computer already matched or even surpassed human capabilities? Not so fast. So far, we have just taught the computer to see objects. This is like a small child learning to utter a few nouns. It's an incredible accomplishment, but it's only the first step. Soon, another developmental milestone will be hit, and children begin to communicate in sentences. So instead of saying this is a cat in the picture, you already heard the little girl telling us this is a cat lying on a bed.

So to teach a computer to see a picture and generate sentences, the marriage between big data and machine learning algorithms has to take another step. Now, the computer has to learn from both pictures and natural language sentences generated by humans. Just like the brain integrates vision and language, we developed a model that connects parts of visual things like visual snippets with words and phrases in sentences.

About four months ago, we finally tied all this together and produced one of the first computer vision models that is capable of generating a human-like sentence when it sees a picture for the first time. Now, I'm ready to show you what the computer says when it sees the picture that the little girl saw at the beginning of this talk.

(Video) Computer: A man is standing next to an elephant. A large airplane sitting on top of an airport runway.

FFL: Of course, we're still working hard to improve our algorithms, and it still has a lot to learn. (Applause)

And the computer still makes mistakes.

(Video) Computer: A cat lying on a bed in a blanket.

FFL: So of course, when it sees too many cats, it thinks everything might look like a cat.

(Video) Computer: A young boy is holding a baseball bat. (Laughter)

FFL: Or, if it hasn't seen a toothbrush, it confuses it with a baseball bat.

(Video) Computer: A man riding a horse down a street next to a building. (Laughter)

FFL: We haven't taught Art 101 to the computers.

(Video) Computer: A zebra standing in a field of grass.

FFL: And it hasn't learned to appreciate the stunning beauty of nature like you and I do.

So it has been a long journey. To get from age zero to three was hard. The real challenge is to go from three to 13 and far beyond. Let me remind you with this picture of the boy and the cake again. So far, we have taught the computer to see objects or even tell us a simple story when seeing a picture.

(Video) Computer: A person sitting at a table with a cake.

FFL: But there's so much more to this picture than just a person and a cake. What the computer doesn't see is that this is a special Italian cake that's only served during Easter time. The boy is wearing his favorite t-shirt given to him as a gift by his father after a trip to Sydney, and you and I can all tell how happy he is and what's exactly on his mind at that moment.

This is my son Leo. On my quest for visual intelligence, I think of Leo constantly and the future world he will live in. When machines can see, doctors and nurses will have extra pairs of tireless eyes to help them to diagnose and take care of patients. Cars will run smarter and safer on the road. Robots, not just humans, will help us to brave the disaster zones to save the trapped and wounded. We will discover new species, better materials, and explore unseen frontiers with the help of the machines.

Little by little, we're giving sight to the machines. First, we teach them to see. Then, they help us to see better. For the first time, human eyes won't be the only ones pondering and exploring our world. We will not only use the machines for their intelligence, we will also collaborate with them in ways that we cannot even imagine.

This is my quest: to give computers visual intelligence and to create a better future for Leo and for the world.

Thank you.

(Applause)

Related talks

Jeremy Howard: The wonderful and terrifying implications of computers that can learn

Pawan Sinha: How brains learn to see

Patricia Kuhl: The linguistic genius of babies

Joseph Redmon: How computers learn to recognize objects instantly

Sebastian Thrun and Chris Anderson: What AI is -- and isn't

Linda Liukas: A delightful way to teach kids about computers
