Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.
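
The claim that performance should depend only on the number of pixels on screen can be illustrated with a multi-resolution tile pyramid. The sketch below is a minimal illustration under stated assumptions, not Seadragon's actual implementation: the 256-pixel tile size, the level numbering (level 0 is full resolution, each higher level halves both dimensions), and the function names `level_for_view` and `tiles_needed` are all hypothetical.

```python
import math

TILE = 256  # assumed tile edge in pixels (hypothetical value)


def level_for_view(view_width_px, screen_width_px):
    """Pick the coarsest pyramid level that still fills the screen.

    view_width_px is the width of the visible region measured in
    full-resolution image pixels. Level 0 is full resolution and each
    higher level halves both dimensions (an assumed convention).
    """
    ratio = view_width_px / screen_width_px  # image pixels per screen pixel
    if ratio <= 1:
        return 0
    return int(math.floor(math.log2(ratio)))


def tiles_needed(view_w, view_h, screen_w, screen_h, tile=TILE):
    """Return (level, tile_count) needed to draw the visible region."""
    level = level_for_view(view_w, screen_w)
    scale = 2 ** level
    # Size of the visible region in pixels of the chosen level.
    w = math.ceil(view_w / scale)
    h = math.ceil(view_h / scale)
    # One extra row and column because the view rarely aligns to tile edges.
    cols = math.ceil(w / tile) + 1
    rows = math.ceil(h / tile) + 1
    return level, cols * rows


if __name__ == "__main__":
    # A whole 300-megapixel scan and a small photo on the same 1024x768 screen.
    for view in [(20000, 15000), (2048, 1536), (1024, 768)]:
        print(view, tiles_needed(*view, 1024, 768))
```

Under these assumptions the region drawn at the chosen level is always less than twice the screen size in each dimension, so the tile count stays bounded by the screen resolution no matter how large the source image is.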

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

(Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

(Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

The point here really is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and linking all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.

Thank you so much.

(Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.
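
The idea of creating links between images from their content, and of one person's tags enriching another person's photo, can be sketched as a small graph computation. This is a toy illustration under loud assumptions, not the Photosynth pipeline: real systems match visual features found by computer vision, whereas here each image is simply given a hypothetical set of integer feature IDs, and the names `link_images`, `propagate_tags`, and `min_shared` are invented for the example.

```python
from itertools import combinations


def link_images(features, min_shared=2):
    """Link every pair of images that share at least min_shared features.

    features maps an image name to a set of feature IDs (hypothetical
    stand-ins for matched visual features).
    """
    links = set()
    for a, b in combinations(sorted(features), 2):
        if len(features[a] & features[b]) >= min_shared:
            links.add((a, b))
    return links


def propagate_tags(links, tags):
    """Give each linked image the tags of its direct neighbors as well."""
    enriched = {img: set(t) for img, t in tags.items()}
    for a, b in links:
        enriched.setdefault(a, set()).update(tags.get(b, ()))
        enriched.setdefault(b, set()).update(tags.get(a, ()))
    return enriched


if __name__ == "__main__":
    # Made-up data: two views of the same facade and one unrelated photo.
    features = {
        "poster.jpg": {1, 2, 3},
        "facade.jpg": {2, 3, 4},
        "campus.jpg": {9},
    }
    tags = {"facade.jpg": {"Notre Dame"}}
    links = link_images(features)
    print(links)
    print(propagate_tags(links, tags))
```

In this sketch an untagged photo picks up tags as soon as it shares enough features with a tagged one, which is the network effect described above in miniature.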

CA: Truly incredible. Congratulations.

BAA: Thank you so much.

(Applause)

(Applause ends)
