Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.
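Editor's sketch: the claim that only the number of on-screen pixels should limit performance can be illustrated with a generic multi-resolution tile pyramid. The talk does not describe Seadragon's internals, so everything here is an assumption: the 256-pixel tile size, the power-of-two levels, and the function name `tiles_needed` are hypothetical and chosen only for illustration.

```python
import math

TILE = 256  # assumed tile edge in pixels; not stated in the talk


def tiles_needed(image_w, image_h, view_w, view_h, scale):
    """Worst-case number of tiles needed to fill a viewport.

    scale is screen pixels per full-resolution image pixel (0 < scale <= 1).
    The image is assumed to be stored as a pyramid in which each level
    halves the resolution of the one below it.
    """
    # Coarsest power-of-two downsample that still provides at least one
    # stored pixel per screen pixel.
    factor = 2 ** math.floor(math.log2(1.0 / scale))
    max_factor = 2 ** math.ceil(math.log2(max(image_w, image_h)))
    factor = min(factor, max_factor)

    # Size of the whole image at the chosen pyramid level.
    level_w = math.ceil(image_w / factor)
    level_h = math.ceil(image_h / factor)

    # Part of that level covered by the viewport, in level pixels.
    vis_w = min(level_w, math.ceil(view_w / (scale * factor)))
    vis_h = min(level_h, math.ceil(view_h / (scale * factor)))

    # One extra row and column because the view may straddle tile borders.
    cols = min(math.ceil(level_w / TILE), math.ceil(vis_w / TILE) + 1)
    rows = min(math.ceil(level_h / TILE), math.ceil(vis_h / TILE) + 1)
    return cols * rows


if __name__ == "__main__":
    # An ordinary 6-megapixel photo versus a 300-megapixel scan,
    # both viewed at full zoom on a 1920x1080 screen.
    print(tiles_needed(3000, 2000, 1920, 1080, 1.0))
    print(tiles_needed(20000, 15000, 1920, 1080, 1.0))
```

Under these assumptions both calls return the same tile count, because the work is bounded by the viewport rather than by the size of the source image.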

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

(Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

(Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and link all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.
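Editor's sketch: the idea that a photo inherits metadata entered by someone else can be modeled as tags spreading across links between images. This is a deliberate simplification and not Photosynth's method: the talk does not say how tags are transferred, and a real system would presumably transfer a tag only where two photos show the same region. The function name `propagate_tags` and the file names are hypothetical.

```python
from collections import defaultdict, deque


def propagate_tags(links, tags):
    """Give every image the union of tags found in its linked group.

    links: iterable of (image_a, image_b) pairs judged to show the same scene.
    tags:  dict mapping image name -> set of tags entered by its owner.
    Returns a dict mapping image name -> set of tags after sharing.
    """
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    for image in tags:
        graph[image]  # make sure unlinked images appear in the result

    result, seen = {}, set()
    for start in list(graph):
        if start in seen:
            continue
        # Breadth-first walk to collect one group of linked images.
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        merged = set().union(*(tags.get(name, set()) for name in group))
        for name in group:
            result[name] = set(merged)
    return result


if __name__ == "__main__":
    links = [("my_photo.jpg", "tagged_by_stranger.jpg")]
    tags = {"tagged_by_stranger.jpg": {"Saint Denis"}, "unrelated.jpg": {"campus"}}
    print(propagate_tags(links, tags))
```

In this toy run the untagged photo picks up the stranger's tag, while the unrelated photo keeps only its own.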

Thank you so much.

(Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.
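Editor's sketch: creating "hyperlinks" between images from their content can be pictured as linking any two photos that share enough visual features. The talk gives no algorithmic detail, so this is an assumption-laden toy: the string labels stand in for local feature descriptors that a real computer-vision pipeline would extract and match, and `link_images`, `min_shared`, and the file names are hypothetical.

```python
from itertools import combinations


def link_images(features, min_shared=2):
    """Link every pair of images that shares at least min_shared features.

    features: dict mapping image name -> set of feature identifiers.
    Returns a dict mapping (image_a, image_b) -> set of shared features,
    with each pair listed once in alphabetical order.
    """
    links = {}
    for a, b in combinations(sorted(features), 2):
        shared = features[a] & features[b]
        if len(shared) >= min_shared:
            links[(a, b)] = shared
    return links


if __name__ == "__main__":
    photos = {
        "facade_phone.jpg": {"rose_window", "left_tower", "portal"},
        "facade_slr.jpg": {"rose_window", "left_tower", "right_tower"},
        "poster.jpg": {"rose_window", "right_tower"},
        "campus.jpg": {"stadium"},
    }
    for pair, shared in link_images(photos).items():
        print(pair, sorted(shared))
```

In this toy data the poster links to one facade photo and the campus shot links to nothing, which mirrors the talk's point that the links come from what is in the pictures and not from their captions.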

CA: Truly incredible. Congratulations.

BAA: Thanks so much.

(Applause)

(Applause ends)
