Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

Prima di tutto vi farò vedere, il più rapidamente possibile, qualche lavoro fondamentale, della nuova tecnologia che abbiamo portato alla Microsoft in seguito a una acquisizione quasi un anno fa esattamente. Questo è Seadragon. Si tratta di un ambiente in cui potete interagire sia a livello locale che remoto con un'enorme quantità di dati visivi.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.

Qui stiamo vedendo moltissimi gigabyte di foto digitali con una specie di zoom costante e continuo, per una veloce panoramica, disponendo le immagini come vogliamo. E non importa quanto sia grande la quantità di informazioni che vediamo, quanto siano grandi queste raccolte di dati o le immagini. La maggior parte sono comuni fotografie digitali, ma questa per esempio è una scansione proveniente dalla Biblioteca del Congresso, e rientra nella gamma di 300 megapixel. Non fa alcuna differenza perché l'unica cosa che limita le prestazioni di un sistema come questo è il numero di pixel sul vostro schermo in qualsiasi momento. È anche un'architettura molto flessibile. Questo è un libro intero, esempio di dati non di immagini. Questo è La Casa desolata di Dickens. Ogni colonna rappresenta un capitolo. Per dimostrarvi che si tratta proprio di testo e non di immagini, possiamo fare una cosa del genere, per far vedere veramente che si tratta di una reale rappresentazione del testo, non di una foto. Potrebbe essere un modo un po' artificiale per leggere un libro elettronico. Non ve lo consiglio.
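The claim that only the number of screen pixels should limit performance can be illustrated with a multi-resolution tile pyramid. The sketch below is not Seadragon's actual implementation, which the talk does not describe. It assumes a 256-pixel tile, a hypothetical 20000 x 15000 scan standing in for the 300-megapixel image, and a viewer that draws from a level no finer than one image pixel per screen pixel. All function names are made up for illustration.

```python
import math

TILE = 256  # assumed tile edge in pixels; the talk does not give one


def level_count(width, height, tile=TILE):
    """Count pyramid levels: halve the image until it fits in a single tile."""
    levels = 1
    while max(width, height) > tile:
        width = (width + 1) // 2   # round up so odd sizes keep their last pixel
        height = (height + 1) // 2
        levels += 1
    return levels


def full_res_tiles(width, height, tile=TILE):
    """Tiles needed to store the finest level of the image."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def tiles_for_view(screen_w, screen_h, tile=TILE):
    """Upper bound on tiles a screen-sized window can touch at one level."""
    # An unaligned window can straddle one extra tile along each axis.
    return (math.ceil(screen_w / tile) + 1) * (math.ceil(screen_h / tile) + 1)


# Hypothetical 300-megapixel scan (20000 x 15000) viewed on a 1920 x 1080 screen.
print(level_count(20000, 15000))     # 8
print(full_res_tiles(20000, 15000))  # 4661
print(tiles_for_view(1920, 1080))    # 54
```

Under these assumptions the viewer never needs more than 54 tiles for one frame, however large the source image is, because `tiles_for_view` takes only the screen size as input.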

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Questo è un esempio più realistico. Si tratta di un numero di The Guardian. Ogni immagine grande è l'inizio di una sezione. E vi dà veramente il piacere e la bella esperienza di leggere la vera versione cartacea di una rivista o di un quotidiano, che è per natura un molteplice tipo di mezzo. Abbiamo anche fatto qualcosa con l'angolo di questo numero in particolare di The Guardian. Abbiamo creato una pubblicità fasulla ad alta risoluzione -- molto più alta di quella che si usa in una normale pubblicità -- e vi abbiamo inserito del contenuto extra. Se volete vedere le caratteristiche di questa automobile, le potete vedere qui. O altri modelli, o persino le specifiche tecniche. E questo davvero dimostra alcune di queste idee per eliminare i limiti di spazio sullo schermo. Speriamo che questo significhi mettere fine ai pop-up e ad altre porcherie simili, non dovrebbero più essere necessari.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

Naturalmente, il mapping è una delle applicazioni più ovvie per questo tipo di tecnologia. E su questa non voglio perdere tempo, se non per dire che abbiamo dei contributi da dare anche in questo campo. Queste sono tutte le strade negli USA sovrapposte sopra un'immagine geospaziale della NASA. Allora adesso passiamo a qualcos'altro. Questo ora è effettivamente dal vivo sul web; potete andarlo a vedere.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

Si tratta di un progetto chiamato Photosynth, che combina due diverse tecnologie. Una è Seadragon e l'altra è una ricerca visuale al computer molto interessante fatta da Noah Snavely, uno studente della University of Washington, con l'aiuto di Steve Seitz della UW e di Rick Szeliski di Microsoft Research. Un'ottima collaborazione. Ed è dal vivo sul web, gestito da Seadragon. Potete vederlo quando facciamo questo tipo di schermate, in cui possiamo passare da un'immagine all'altra e avere questo tipo di esperienza a risoluzione multipla.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

Ma la collocazione spaziale delle immagini qui ha effettivamente un significato. Gli algoritmi di visualizzazione del computer hanno registrato queste immagini insieme, in modo che corrispondano ai luoghi reali in cui le immagini-- tutte scattate vicino ai laghi Grassi sulle Montagne Rocciose canadesi-- sono state scattate. Qui dunque vedete degli elementi di uno slide-show stabile o di imaging panoramico, e queste cose sono state tutte collegate a livello spaziale. Non so se ho tempo di mostrarvi qualche altro ambiente. Ce ne sono alcuni molto più spaziali. Voglio passare direttamente a uno dei set di dati originali di Noah-- e questo proviene da un precedente prototipo di Photosynth che abbiamo fatto funzionare per la prima volta l'estate scorsa-- per mostrarvi quello che ritengo sia veramente la chiave dietro questa tecnologia, la tecnologia Photosynth. E non è necessariamente tanto evidente guardando gli ambienti che abbiamo messo nel sito web. Abbiamo dovuto preoccuparci degli aspetti legali e via dicendo.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

Questa è una ricostruzione della cattedrale di Notre Dame fatta interamente al computer con immagini prese da Flickr. Basta digitare Notre Dame in Flickr e ottenete immagini di tipi in maglietta, altre del campus e via dicendo. Ognuno di questi coni arancioni rappresenta un'immagine che si è scoperta appartenere a questo modello. Quindi sono tutte immagini prese da Flickr e sono state tutte collegate a livello spaziale in questo modo. Ed è possibile navigare in questo modo semplicissimo.

(Applause)

(Applausi).

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

Sapete, non avrei mai pensato di finire a lavorare per Microsoft. È molto gratificante ricevere questo tipo di accoglienza qui.

(Laughter)

(Risate).

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

Credo che possiate vedere che si tratta di un sacco di tipi diversi di fotocamere: da quelle dei cellulari a quelle professionali SLR, un numero significativo, messe insieme in questo ambiente. E se ci riesco, provo a cercare quelle più strane. Molte sono bloccate da facce, eccetera. Da qualche parte qui si trova una serie di fotografie -- eccole. Questo è un poster di Notre Dame registrato correttamente. Possiamo passare dal poster alla vista fisica di questo ambiente.

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and link all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.

Il punto essenziale è che possiamo fare delle cose con l'ambiente sociale. Questo ora sta prendendo dati da tutti-- dall'intera memoria collettiva di come appare la terra, dal punto di vista visivo-- e collega tutti quei dati insieme. Tutte quelle foto diventano collegate insieme, creando un risultato maggiore della somma delle parti. Avete un modello della terra intera che emerge. Pensate a questo come al prolungamento del lavoro di Stephen Lawler sulla Terra virtuale. È qualcosa che cresce in complessità man mano che le persone lo utilizzano, e i cui vantaggi per chi lo utilizza aumentano con l'uso. Le loro foto sono marcate con metadati inseriti da altre persone. Se a qualcuno viene l'idea di contrassegnare tutti questi santi dicendone il nome, la mia foto della cattedrale di Notre Dame improvvisamente si arricchisce di tutti quei dati, e la posso usare come punto di ingresso per immergermi in quello spazio, in quel metauniverso, usando le foto di chiunque altro, e avere un'esperienza sociale di tipo cross-modal e cross-user. Naturalmente, una conseguenza di tutto ciò sono modelli virtuali estremamente elaborati di ogni parte interessante della terra, raccolti non soltanto da viste aeree e da immagini satellitari e simili, ma dalla memoria collettiva.
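The idea that one person's tags enrich everyone's linked photos can be sketched as a small graph problem. This is a toy illustration, not Photosynth's pipeline: the real system registers images with computer vision, whereas here each photo is simply given a set of made-up feature labels, two photos are linked when they share an assumed minimum of two labels, and tags spread across a whole linked group rather than to specific regions of an image. File names, feature labels, tags, and function names are all hypothetical.

```python
from collections import defaultdict


def link_images(features, min_shared=2):
    """Link two images when they share at least min_shared feature labels."""
    names = sorted(features)
    links = defaultdict(set)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(features[a] & features[b]) >= min_shared:
                links[a].add(b)
                links[b].add(a)
    return links


def propagate_tags(links, tags, start):
    """Collect the tags of every image reachable from start through links."""
    seen, stack, found = {start}, [start], set()
    while stack:
        img = stack.pop()
        found |= tags.get(img, set())
        for nxt in links.get(img, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return found


# Hypothetical photos: two of the cathedral, one unrelated campus shot.
features = {
    "mine.jpg": {"portal", "rose_window", "left_tower"},
    "tagged.jpg": {"portal", "rose_window", "statue_row"},
    "campus.jpg": {"stadium", "lawn"},
}
tags = {"tagged.jpg": {"saint_statue"}, "campus.jpg": {"football"}}

links = link_images(features)
print(sorted(propagate_tags(links, tags, "mine.jpg")))    # ['saint_statue']
print(sorted(propagate_tags(links, tags, "campus.jpg")))  # ['football']
```

The untagged `mine.jpg` picks up the tag entered on `tagged.jpg` because the two share enough features to be linked, while the campus photo stays isolated.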

Thank you so much.

Mille grazie.

(Applause)

(Applausi).

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

Chris Anderson: Ho capito bene? Che quello che il tuo software ci consentirà di fare è che a un certo punto, nei prossimi anni, tutte le fotografie condivise da chiunque nel mondo intero saranno praticamente collegate insieme?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.

BAA: Sì. Quello che sta facendo in pratica è scoprire. Sta creando degli hyperlink, se si vuole, tra immagini. E lo sta facendo in base al contenuto delle immagini. La cosa diventa molto interessante se si pensa alla ricchezza di informazioni semantiche che molte di queste immagini possiedono. Come quando si fa una ricerca di immagini sul web, si digita una frase e il testo sulla pagina web ha molte informazioni riguardo alle immagini della foto. Ora, cosa succede se quella foto si collega a tutte le vostre foto? Allora la quantità di interconnessione semantica e la quantità di ricchezza che ne deriva è davvero enorme. È un classico effetto della rete.

CA: Truly incredible. Congratulations.

CA: Blaise, è davvero incredibile. Complimenti.

BAA: Thank you very much.

BAA: Vi ringrazio molto.

(Applause)

(Applausi).

(Applause ends)

