Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

Lo que voy a mostrarles primero, tan brevemente como pueda, es algo de trabajo de base, una tecnología nueva que hemos traído a Microsoft como parte de una adquisición realizada hace casi un año exacto. Se trata de Seadragon. Es un entorno en el que se puede interactuar en forma local o remota con grandes cantidades de datos visuales.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.

Aquí estamos viendo muchos, muchos gigabytes de fotos digitales, haciendo zoom en forma continua y sin dificultades, haciendo panorámicas y modificaciones de cualquier tipo. La cantidad de información que veamos, el tamaño de las colecciones y el de las imágenes ya no son un problema. En su mayoría son fotos de cámaras digitales comunes, pero esta, por ejemplo, es una escaneada de la Biblioteca del Congreso, con cerca de 300 megapíxeles. Es lo mismo, porque lo único que puede limitar el rendimiento de un sistema como este es el número de píxeles de su pantalla en un momento dado. También tiene una arquitectura muy flexible. Este es un libro completo, un ejemplo de datos sin imágenes. Se trata de "Casa desolada", de Dickens. Cada columna es un capítulo. Para probarles que se trata realmente de texto, y no de una imagen, podemos hacer algo para mostrar que se trata de una representación real del texto; no es una imagen. Quizá sea una forma algo artificial de leer un libro electrónico. No la recomendaría.

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Este es un caso más realista. Un ejemplar de The Guardian. Cada imagen grande es el comienzo de una sección. Y realmente proporciona el placer y la experiencia agradable de leer la versión real en papel de una revista o un diario, un tipo de medio propiamente de escalas múltiples. También hemos hecho algo en una esquina de este ejemplar de The Guardian. Hemos hecho un anuncio publicitario falso con alta resolución, mucho más de la que puede obtenerse en un anuncio común, y le hemos incorporado otros contenidos. Si desean ver las características de este coche, pueden hacerlo aquí. O ver otros modelos, e incluso especificaciones técnicas. Esto comprende algunas de las ideas sobre eliminar los límites del espacio disponible en pantalla. Esperamos que esto implique el fin de las pantallas emergentes y otros estorbos de ese tipo: ya no serían necesarios.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

Por cierto, el mapeo es una de las aplicaciones realmente obvias en una tecnología como esta. No voy a demorarme en esto, salvo decir que también tenemos cómo contribuir en este campo. Estas son todas las carreteras de los EE.UU. sobreimpresas en la parte superior de una imagen geoespacial de la NASA. Ahora veamos algo más. Esto está en directo en la red en este momento, pueden verlo.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

Es un proyecto llamado Photosynth, que combina dos tecnologías diferentes. Una es Seadragon, y la otra una investigación visual computarizada muy hermosa, realizada por Noah Snavely, estudiante de posgrado de la Universidad de Washington, codirigida por Steve Seitz de la misma universidad y Rick Szeliski en el Dpto. de Investigación de Microsoft. Una muy buena colaboración. Y está en directo en la web, con tecnología de Seadragon. Pueden apreciarlo cuando hacemos estos tipos de vistas, en las que podemos bucear a través de las imágenes y tener esta experiencia de resolución múltiple.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

El orden espacial de estas imágenes es realmente significativo. Los algoritmos visuales computarizados registraron estas imágenes en conjunto, de modo que se corresponden con el espacio real en que se hicieron las tomas, hechas en los Lagos Grassi, en las Montañas Rocallosas canadienses. Aquí ven elementos de diapositivas estabilizadas o imágenes panorámicas, todas relacionadas espacialmente. No sé si tengo tiempo para mostrarles otros entornos. Algunos son mucho más espaciales. Quisiera pasar directamente a uno de los conjuntos de datos originales de Noah; este es uno de los primeros prototipos de Photosynth por el que comenzamos en el verano, y sirve para mostrarles lo que considero la verdadera culminación de esta tecnología, Photosynth. Y esto no necesariamente se aprecia al ver los entornos que hemos subido a la red. Tuvimos que ocuparnos de los abogados y demás.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

Esta es una reconstrucción de la Catedral de Notre Dame, realizada totalmente con ordenador a partir de imágenes tomadas de Flickr. Simplemente pongan Notre Dame en Flickr, y podrán ver imágenes de personas en camiseta, del campus y demás. Cada uno de estos conos anaranjados representa una imagen perteneciente a este modelo. Y estas son todas imágenes de Flickr, relacionadas espacialmente de esta manera. Podemos navegar simplemente de esta forma tan sencilla.

(Applause)

(Aplausos).

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

Saben, nunca pensé que terminaría trabajando en Microsoft. Es muy gratificante tener una recepción así aquí.

(Laughter)

(Risas).

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

Supongo que sabrán que hay muchos tipos diferentes de cámaras: desde las de teléfonos móviles hasta SLR profesionales, gran parte de ellas ligadas a este entorno. Si puedo buscaré algunas de las más raras. Muchas están bloqueadas por rostros, y demás. Algunas de estas son realmente una serie de fotografías... veamos. Este es en realidad un póster de Notre Dame registrado correctamente. Podemos acercarnos desde el póster hasta una vista física de este entorno.

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and link all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.

Lo que importa realmente aquí es que podemos hacer algo en el entorno social. Aquí se están tomando datos de todos, de toda la memoria colectiva de la apariencia visual de la Tierra, y vinculándose en su totalidad. Todas estas fotos se vinculan y producen algo emergente que es más que la suma de las partes. Este es un modelo que surge de toda la Tierra. Véanlo como la larga cola del trabajo de Tierra Virtual de Stephen Lawler. Es algo cuya complejidad crece con el uso y cuyos beneficios para los usuarios se amplían a medida que lo utilizan. Sus propias fotos se etiquetan con metadatos que alguien introdujo. Si alguien se toma el trabajo de etiquetar todos estos santos e indicar quiénes son, mi foto de la Catedral de Notre Dame se enriquece de pronto con todos esos datos, y puedo utilizarla como punto de entrada para bucear en ese espacio, en ese metaverso, usando las fotos de todos los demás, y hacer un tipo de experiencia social de modelos y usuarios cruzados de esa forma. Por supuesto, una consecuencia de todo ello consiste en modelos virtuales enormemente ricos de cada parte interesante de la Tierra, tomados no solo de vuelos de altura e imágenes satelitales y demás, sino de la memoria colectiva.

Thank you so much.

Muchas gracias.

(Applause)

(Aplausos)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

Chris Anderson: A ver si lo comprendo bien: ¿este software permitirá en algún momento, en los próximos años, que todas las imágenes compartidas por cualquier persona en cualquier parte del mundo se vinculen?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.

BAA: Sí. Lo que hace realmente es descubrir, crear hipervínculos, si lo quieren, entre las imágenes. Y lo hace basándose en el contenido de las imágenes. Y es realmente emocionante pensar en la riqueza de la información semántica de muchas de estas imágenes. Como en una búsqueda de imágenes en la web, se introducen frases, y el texto de la página web lleva gran cantidad de información acerca de la imagen. Ahora, ¿qué sucede si dicha imagen se vincula con todas sus imágenes? La cantidad de interconexión semántica y de riqueza procedente de ello es verdaderamente enorme. Es un efecto de red clásico.

CA: Truly incredible. Congratulations.

CA: Blaise, es realmente increíble. Felicidades.

BAA: Thank you so much.

BAA: Muchas gracias.

(Applause)

(Aplausos).

(Applause ends)

