Fei-Fei Li: How we're teaching computers to understand pictures

Let me show you something.

(Video) Girl: Okay, that's a cat sitting in a bed. The boy is petting the elephant. Those are people that are going on an airplane. That's a big airplane.

Fei-Fei Li: This is a three-year-old child describing what she sees in a series of photos. She might still have a lot to learn about this world, but she's already an expert at one very important task: to make sense of what she sees. Our society is more technologically advanced than ever. We send people to the moon, we make phones that talk to us or customize radio stations that can play only music we like. Yet, our most advanced machines and computers still struggle at this task. So I'm here today to give you a progress report on the latest advances in our research in computer vision, one of the most frontier and potentially revolutionary technologies in computer science.

Yes, we have prototyped cars that can drive by themselves, but without smart vision, they cannot really tell the difference between a crumpled paper bag on the road, which can be run over, and a rock that size, which should be avoided. We have made fabulous megapixel cameras, but we have not delivered sight to the blind. Drones can fly over massive land, but don't have enough vision technology to help us to track the changes of the rainforests. Security cameras are everywhere, but they do not alert us when a child is drowning in a swimming pool. Photos and videos are becoming an integral part of global life. They're being generated at a pace that's far beyond what any human, or teams of humans, could hope to view, and you and I are contributing to that at this TED. Yet our most advanced software is still struggling at understanding and managing this enormous content. So in other words, collectively as a society, we're very much blind, because our smartest machines are still blind.

"Why is this so hard?" you may ask. Cameras can take pictures like this one by converting lights into a two-dimensional array of numbers known as pixels, but these are just lifeless numbers. They do not carry meaning in themselves. Just like to hear is not the same as to listen, to take pictures is not the same as to see, and by seeing, we really mean understanding. In fact, it took Mother Nature 540 million years of hard work to do this task, and much of that effort went into developing the visual processing apparatus of our brains, not the eyes themselves. So vision begins with the eyes, but it truly takes place in the brain.

So for 15 years now, starting from my Ph.D. at Caltech and then leading Stanford's Vision Lab, I've been working with my mentors, collaborators and students to teach computers to see. Our research field is called computer vision and machine learning. It's part of the general field of artificial intelligence. So ultimately, we want to teach the machines to see just like we do: naming objects, identifying people, inferring 3D geometry of things, understanding relations, emotions, actions and intentions. You and I weave together entire stories of people, places and things the moment we lay our gaze on them.

The first step towards this goal is to teach a computer to see objects, the building block of the visual world. In its simplest terms, imagine this teaching process as showing the computers some training images of a particular object, let's say cats, and designing a model that learns from these training images. How hard can this be? After all, a cat is just a collection of shapes and colors, and this is what we did in the early days of object modeling. We'd tell the computer algorithm in a mathematical language that a cat has a round face, a chubby body, two pointy ears, and a long tail, and that looked all fine. But what about this cat? (Laughter) It's all curled up. Now you have to add another shape and viewpoint to the object model. But what if cats are hidden? What about these silly cats? Now you get my point. Even something as simple as a household pet can present an infinite number of variations to the object model, and that's just one object.

So about eight years ago, a very simple and profound observation changed my thinking. No one tells a child how to see, especially in the early years. They learn this through real-world experiences and examples. If you consider a child's eyes as a pair of biological cameras, they take one picture about every 200 milliseconds, the average time an eye movement is made. So by age three, a child would have seen hundreds of millions of pictures of the real world. That's a lot of training examples. So instead of focusing solely on better and better algorithms, my insight was to give the algorithms the kind of training data that a child was given through experiences in both quantity and quality.
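
The arithmetic behind that estimate can be checked with a quick back-of-the-envelope sketch. The 200-millisecond interval and the age of three come from the talk; the figure of 12 waking hours per day is an assumption added here purely for illustration.

```python
# Rough estimate of how many "pictures" a child's eyes take by age three.
MS_PER_PICTURE = 200          # one picture about every 200 ms (from the talk)
YEARS = 3                     # age three (from the talk)
WAKING_HOURS_PER_DAY = 12     # assumption, not stated in the talk
DAYS_PER_YEAR = 365


def pictures_seen(years=YEARS, waking_hours=WAKING_HOURS_PER_DAY,
                  ms_per_picture=MS_PER_PICTURE):
    """Return the number of pictures taken during waking time."""
    waking_ms = years * DAYS_PER_YEAR * waking_hours * 3600 * 1000
    return waking_ms // ms_per_picture


total = pictures_seen()
print(f"{total:,} pictures")
```

Under these assumptions the count lands in the hundreds of millions, which matches the order of magnitude quoted above; a different guess for waking hours shifts the number but not the scale.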

Once we knew this, we knew we needed to collect a data set that has far more images than we have ever had before, perhaps thousands of times more, and together with Professor Kai Li at Princeton University, we launched the ImageNet project in 2007. Luckily, we didn't have to mount a camera on our head and wait for many years. We went to the Internet, the biggest treasure trove of pictures that humans have ever created. We downloaded nearly a billion images and used crowdsourcing technology like the Amazon Mechanical Turk platform to help us to label these images. At its peak, ImageNet was one of the biggest employers of the Amazon Mechanical Turk workers: together, almost 50,000 workers from 167 countries around the world helped us to clean, sort and label nearly a billion candidate images. That was how much effort it took to capture even a fraction of the imagery a child's mind takes in during the early developmental years.

In hindsight, this idea of using big data to train computer algorithms may seem obvious now, but back in 2007, it was not so obvious. We were fairly alone on this journey for quite a while. Some very friendly colleagues advised me to do something more useful for my tenure, and we were constantly struggling for research funding. Once, I even joked to my graduate students that I would just reopen my dry cleaner's shop to fund ImageNet. After all, that's how I funded my college years.

So we carried on. In 2009, the ImageNet project delivered a database of 15 million images across 22,000 classes of objects and things organized by everyday English words. In both quantity and quality, this was an unprecedented scale. As an example, in the case of cats, we have more than 62,000 cats of all kinds of looks and poses and across all species of domestic and wild cats. We were thrilled to have put together ImageNet, and we wanted the whole research world to benefit from it, so in the TED fashion, we opened up the entire data set to the worldwide research community for free. (Applause)

Now that we have the data to nourish our computer brain, we're ready to come back to the algorithms themselves. As it turned out, the wealth of information provided by ImageNet was a perfect match to a particular class of machine learning algorithms called convolutional neural networks, pioneered by Kunihiko Fukushima, Geoff Hinton, and Yann LeCun back in the 1970s and '80s. Just like the brain consists of billions of highly connected neurons, a basic operating unit in a neural network is a neuron-like node. It takes input from other nodes and sends output to others. Moreover, these hundreds of thousands or even millions of nodes are organized in hierarchical layers, also similar to the brain. A typical neural network we use to train our object recognition model has 24 million nodes, 140 million parameters, and 15 billion connections. That's an enormous model. Powered by the massive data from ImageNet and the modern CPUs and GPUs to train such a humongous model, the convolutional neural network blossomed in a way that no one expected. It became the winning architecture to generate exciting new results in object recognition. This is a computer telling us this picture contains a cat and where the cat is. Of course there are more things than cats, so here's a computer algorithm telling us the picture contains a boy and a teddy bear; a dog, a person, and a small kite in the background; or a picture of very busy things like a man, a skateboard, railings, a lamppost, and so on. Sometimes, when the computer is not so confident about what it sees, we have taught it to be smart enough to give us a safe answer instead of committing too much, just like we would do, but other times our computer algorithm is remarkable at telling us what exactly the objects are, like the make, model, and year of the cars.
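
To make the idea of neuron-like nodes arranged in layers concrete, here is a minimal sketch in plain Python. It is not the model described in the talk: the tiny 4x4 image, the hand-picked edge kernel, the ReLU nonlinearity, and the second-layer weights are all assumptions chosen for illustration, and nothing here is trained.

```python
def node(inputs, weights, bias=0.0):
    """A neuron-like node: weighted sum of its inputs, then a nonlinearity."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)  # ReLU, an assumed choice of nonlinearity


def conv_layer(image, kernel):
    """First layer: the same small set of weights (the kernel) is applied
    by one node at every position of the image."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [w for row in kernel for w in row]
    out = []
    for r in range(len(image) - kh + 1):
        out_row = []
        for c in range(len(image[0]) - kw + 1):
            patch = [image[r + i][c + j] for i in range(kh) for j in range(kw)]
            out_row.append(node(patch, flat_kernel))
        out.append(out_row)
    return out


# Hypothetical input: dark on the left, bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = conv_layer(image, edge_kernel)    # layer 1: edge detectors
flat = [v for row in feature_map for v in row]
score = node(flat, [0.5] * len(flat), bias=-1.0)  # layer 2: one node reads layer 1

print(feature_map)
print(score)
```

The first layer's nodes fire only along the column where the edge sits, and the second-layer node takes those outputs as its inputs, which is the layered, node-to-node flow described above. A real network learns its weights from data and has vastly more nodes.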

We applied this algorithm to millions of Google Street View images across hundreds of American cities, and we have learned something really interesting: first, it confirmed our common wisdom that car prices correlate very well with household incomes. But surprisingly, car prices also correlate well with crime rates in cities, or voting patterns by zip codes.
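
The talk does not say which correlation measure was used. Assuming the common Pearson coefficient, the calculation looks like the sketch below. The numbers are invented toy values, not data from the study, and they are deliberately perfectly linear so the result is easy to verify by hand.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values, made up for illustration only (one entry per hypothetical area).
car_prices = [10, 20, 30, 40]
household_incomes = [30, 50, 70, 90]

r = pearson(car_prices, household_incomes)
print(r)
```

A value near 1 means the two quantities rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.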

So wait a minute. Is that it? Has the computer already matched or even surpassed human capabilities? Not so fast. So far, we have just taught the computer to see objects. This is like a small child learning to utter a few nouns. It's an incredible accomplishment, but it's only the first step. Soon, another developmental milestone will be hit, and children begin to communicate in sentences. So instead of saying this is a cat in the picture, you already heard the little girl telling us this is a cat lying on a bed.

So to teach a computer to see a picture and generate sentences, the marriage between big data and machine learning algorithms has to take another step. Now, the computer has to learn from both pictures and natural language sentences generated by humans. Just like the brain integrates vision and language, we developed a model that connects parts of visual things like visual snippets with words and phrases in sentences.
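
One simple way to picture connecting visual snippets with words is to place both in a shared vector space and pair each word with the best-scoring image region. The sketch below assumes a dot-product score and uses invented three-number vectors; the region names, the word list, and the scoring rule are illustrative assumptions, not details given in the talk.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def align(region_vectors, word_vectors):
    """Pair each word with the image region whose vector scores highest."""
    return {
        word: max(region_vectors, key=lambda name: dot(region_vectors[name], vec))
        for word, vec in word_vectors.items()
    }


# Hypothetical embeddings; a real model would learn these from data.
regions = {
    "region_0": [0.9, 0.1, 0.0],
    "region_1": [0.0, 0.2, 0.9],
}
words = {
    "cat": [1.0, 0.0, 0.1],
    "bed": [0.1, 0.1, 1.0],
}

print(align(regions, words))
```

With these toy vectors each word lands on a different region, which is the kind of snippet-to-word link that lets a model build a phrase such as "a cat lying on a bed" out of parts of a picture.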

About four months ago, we finally tied all this together and produced one of the first computer vision models that is capable of generating a human-like sentence when it sees a picture for the first time. Now, I'm ready to show you what the computer says when it sees the picture that the little girl saw at the beginning of this talk.

(Video) Computer: A man is standing next to an elephant. A large airplane sitting on top of an airport runway.

FFL: Of course, we're still working hard to improve our algorithms, and they still have a lot to learn. (Applause)

And the computer still makes mistakes.

(Video) Computer: A cat lying on a bed in a blanket.

FFL: So of course, when it sees too many cats, it thinks everything might look like a cat.

(Video) Computer: A young boy is holding a baseball bat. (Laughter)

FFL: Or, if it hasn't seen a toothbrush, it confuses it with a baseball bat.

(Video) Computer: A man riding a horse down a street next to a building. (Laughter)

FFL: We haven't taught Art 101 to the computers.

(Video) Computer: A zebra standing in a field of grass.

FFL: And it hasn't learned to appreciate the stunning beauty of nature like you and I do.

So it has been a long journey. To get from age zero to three was hard. The real challenge is to go from three to 13 and far beyond. Let me remind you with this picture of the boy and the cake again. So far, we have taught the computer to see objects or even tell us a simple story when seeing a picture.

(Video) Computer: A person sitting at a table with a cake.

FFL: But there's so much more to this picture than just a person and a cake. What the computer doesn't see is that this is a special Italian cake that's only served during Easter time. The boy is wearing his favorite t-shirt given to him as a gift by his father after a trip to Sydney, and you and I can all tell how happy he is and what's exactly on his mind at that moment.

This is my son Leo. On my quest for visual intelligence, I think of Leo constantly and the future world he will live in. When machines can see, doctors and nurses will have extra pairs of tireless eyes to help them to diagnose and take care of patients. Cars will run smarter and safer on the road. Robots, not just humans, will help us to brave the disaster zones to save the trapped and wounded. We will discover new species, better materials, and explore unseen frontiers with the help of the machines.

Little by little, we're giving sight to the machines. First, we teach them to see. Then, they help us to see better. For the first time, human eyes won't be the only ones pondering and exploring our world. We will not only use the machines for their intelligence, we will also collaborate with them in ways that we cannot even imagine.

This is my quest: to give computers visual intelligence and to create a better future for Leo and for the world.

Thank you.

(Applause)
