Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.
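
[Editor's sketch] The claim that performance should depend only on the pixels on screen can be illustrated with a tiled image pyramid. The talk does not describe Seadragon's internals, so everything here is an assumption made for illustration: the 256-pixel tile size, the power-of-two levels, and the function names are hypothetical.

```python
import math

TILE = 256  # assumed tile edge, in pixels


def level_size(width, height, level):
    """Image dimensions at a pyramid level; level 0 is full resolution."""
    return math.ceil(width / 2 ** level), math.ceil(height / 2 ** level)


def pick_level(width, height, scale):
    """Coarsest level that still supplies >= 1 image pixel per screen pixel."""
    level = 0
    while (scale * 2 ** (level + 1) <= 1
           and max(level_size(width, height, level)) > TILE):
        level += 1
    return level


def tiles_in_view(width, height, view_w, view_h, scale, cx=0.5, cy=0.5):
    """Count the tiles that intersect a view_w x view_h screen window.

    scale: screen pixels per full-resolution image pixel.
    (cx, cy): view centre in normalised image coordinates (0..1).
    """
    level = pick_level(width, height, scale)
    lw, lh = level_size(width, height, level)
    px = scale * 2 ** level  # screen pixels per pixel of this level

    def span(centre, view, limit):
        # Visible pixel range along one axis, clipped to the image.
        lo = max(0, math.floor(centre * limit - view / (2 * px)))
        hi = min(limit, math.ceil(centre * limit + view / (2 * px)))
        if hi <= lo:
            return 0
        return (hi - 1) // TILE - lo // TILE + 1

    return span(cx, view_w, lw) * span(cy, view_h, lh)


# A 12-megapixel photo and a 300-megapixel scan, viewed at several zooms
# on a 1920 x 1080 screen.
for size in [(4000, 3000), (20000, 15000)]:
    for scale in [1.0, 0.1, 0.01]:
        print(size, scale, tiles_in_view(*size, 1920, 1080, scale))
```

Under these assumptions the tile count per frame stays bounded by the window size, however large the source image is, because a coarser level is fetched as the view zooms out.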

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

(Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

(Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and link all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.
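
[Editor's sketch] The idea that one person's tags enrich everyone else's photos can be shown with a toy data model. The talk does not say how Photosynth stores metadata, so this is an assumption: tags are attached to points of the shared model, and a photo inherits the tags of every point it observes. The function name, file names, and point identifiers are hypothetical.

```python
def inherited_tags(observations, point_tags):
    """Map each photo to the tags on the shared model points it sees.

    observations: photo name -> set of model point ids visible in that photo.
    point_tags:   model point id -> set of tags someone attached to it.
    """
    return {
        photo: {tag for p in points for tag in point_tags.get(p, ())}
        for photo, points in observations.items()
    }


# Hypothetical data: two photos overlap on model point "p2".
observations = {
    "their_photo.jpg": {"p1", "p2"},
    "my_photo.jpg": {"p2", "p3"},
}
point_tags = {"p2": {"statue of a saint"}}  # entered by someone else

tags = inherited_tags(observations, point_tags)
print(tags["my_photo.jpg"])
```

In this sketch the second photo picks up the tag without its owner entering anything, which is the enrichment effect described in the talk.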

Thank you so much.

(Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.
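
[Editor's sketch] Creating "hyperlinks" between images from their content can be pictured as building a graph in which two images are linked when they share enough visual features. This is a deliberate simplification and not the method described in the talk: integer ids stand in for matched features, exact set overlap stands in for approximate descriptor matching, and the function name, file names, and threshold are hypothetical.

```python
from itertools import combinations


def link_images(features, min_shared=3):
    """Return {image: set of linked images} for pairs sharing enough features.

    features maps an image name to a set of feature identifiers.
    """
    links = {name: set() for name in features}
    for a, b in combinations(features, 2):
        if len(features[a] & features[b]) >= min_shared:
            links[a].add(b)
            links[b].add(a)
    return links


# Hypothetical data: integers stand in for matched visual features.
photos = {
    "facade_phone.jpg": {1, 2, 3, 4, 5},
    "facade_slr.jpg": {3, 4, 5, 6, 7},
    "poster.jpg": {5, 6, 7, 8},
    "campus.jpg": {20, 21, 22},
}
links = link_images(photos)
for name, neighbours in links.items():
    print(name, sorted(neighbours))
```

Images with no overlap, such as the unrelated campus shot here, simply end up with no links, while each newly added overlapping photo can connect to several existing ones.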

CA: Truly incredible. Congratulations.

BAA: Thank you so much.

(Applause)

(Applause ends)
