Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300-megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.
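
The claim that only the number of on-screen pixels should limit performance can be illustrated with a small sketch. The talk does not describe how Seadragon is implemented, so everything here is an assumption made for illustration: a power-of-two image pyramid, a 256-pixel tile size, and the hypothetical function names `pyramid_levels`, `level_for_scale`, and `tiles_needed`. Under those assumptions, the number of tiles a viewer must fetch stays bounded by the screen size, whether the source image is 12 megapixels or 300.

```python
import math

TILE = 256  # assumed tile edge length in pixels (not stated in the talk)


def pyramid_levels(width, height):
    """Number of levels when each level halves the one above, down to 1 pixel."""
    return math.ceil(math.log2(max(width, height))) + 1


def level_for_scale(width, height, scale):
    """Pick the coarsest level that still has at least on-screen resolution.

    scale is screen pixels per full-resolution image pixel (must be > 0);
    1.0 means the image is shown at native size.
    """
    top = pyramid_levels(width, height) - 1
    drop = math.floor(-math.log2(scale)) if scale < 1 else 0
    return max(0, top - drop)


def tiles_needed(width, height, scale, screen_w, screen_h):
    """Return (level, upper bound on tiles covering the visible region)."""
    top = pyramid_levels(width, height) - 1
    level = level_for_scale(width, height, scale)
    level_scale = 2.0 ** (level - top)  # this level's size relative to full size
    level_w = math.ceil(width * level_scale)
    level_h = math.ceil(height * level_scale)
    # Visible region measured in this level's pixels, clamped to the level size.
    vis_w = min(level_w, math.ceil(screen_w * level_scale / scale))
    vis_h = min(level_h, math.ceil(screen_h * level_scale / scale))
    # The +1 allows for a viewport that is not aligned to the tile grid.
    cols = min(math.ceil(level_w / TILE), math.ceil(vis_w / TILE) + 1)
    rows = min(math.ceil(level_h / TILE), math.ceil(vis_h / TILE) + 1)
    return level, cols * rows


if __name__ == "__main__":
    for name, (w, h) in {"12 MP photo": (4000, 3000),
                         "300 MP scan": (20000, 15000)}.items():
        for scale in (0.01, 0.1, 1.0):
            level, count = tiles_needed(w, h, scale, 1920, 1080)
            print(f"{name} at scale {scale}: level {level}, up to {count} tiles")
```

Because the chosen level is never more than twice the on-screen resolution, the visible region spans fewer than twice the screen's width and height in level pixels, which caps the tile count regardless of how large the source image is.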

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

(Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

(Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and link all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.

Thank you so much.

(Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.
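
The idea of creating hyperlinks between images from their content, and of one person's tags enriching everyone else's photos, can be sketched as a small graph problem. This is a toy stand-in, not Photosynth's actual method: the integer feature IDs, the `min_shared` threshold, the file names, the tag text, and the functions `link_images` and `propagate_tags` are all hypothetical. Real systems match visual features extracted from the pixels; here each image is simply given a set of feature IDs by hand.

```python
from collections import defaultdict
from itertools import combinations


def link_images(features, min_shared=2):
    """Link every pair of images that share at least min_shared feature IDs.

    features maps an image name to a set of (hypothetical) feature IDs.
    Returns an adjacency mapping: image name -> set of linked image names.
    """
    links = defaultdict(set)
    for a, b in combinations(sorted(features), 2):
        if len(features[a] & features[b]) >= min_shared:
            links[a].add(b)
            links[b].add(a)
    return links


def propagate_tags(links, tags, start):
    """Collect the tags of every image reachable from start through links."""
    seen, stack, found = {start}, [start], set()
    while stack:
        image = stack.pop()
        found |= tags.get(image, set())
        for neighbour in links.get(image, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return found


# Made-up example data: three overlapping photos and one unrelated photo.
features = {
    "mine.jpg": {1, 2, 3},
    "tourist.jpg": {2, 3, 4, 5},
    "closeup.jpg": {4, 5, 6},
    "campus.jpg": {90, 91},
}
tags = {"closeup.jpg": {"example saint name"}}
links = link_images(features)

if __name__ == "__main__":
    print(propagate_tags(links, tags, "mine.jpg"))
    print(propagate_tags(links, tags, "campus.jpg"))
```

In this example `mine.jpg` shares no features with `closeup.jpg` directly, yet it still inherits the tag through `tourist.jpg`, while the unrelated `campus.jpg` inherits nothing. Each added photo can create new paths of this kind, which is one way to read the network effect mentioned in the answer.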

CA: Truly incredible. Congratulations.

BAA: Thank you so much.

(Applause)

(Applause ends)
