Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.
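
The talk does not say how Seadragon achieves this, but one common way to make the cost depend on screen pixels rather than image size is a tiled image pyramid: the image is stored at successively halved resolutions, each cut into fixed-size tiles, and the viewer fetches only the tiles of the one level that matches the current zoom. The sketch below is a minimal illustration under that assumption. The tile size and the names `TILE`, `choose_level`, and `visible_tiles` are hypothetical and are not taken from Seadragon.

```python
import math

TILE = 256  # assumed tile edge length in pixels (hypothetical choice)


def choose_level(view_w, screen_w, max_level):
    """Pick the pyramid level for the current zoom.

    Level 0 is full resolution; level k is downsampled by 2**k.
    view_w is the width of the visible region in full-resolution pixels.
    max_level is assumed to be deep enough for the image being viewed.
    """
    scale = view_w / screen_w  # image pixels per screen pixel
    if scale <= 1:
        return 0
    return min(max_level, math.floor(math.log2(scale)))


def visible_tiles(x0, y0, view_w, view_h, screen_w, max_level):
    """Return (level, [(col, row), ...]) for the tiles covering the viewport.

    (x0, y0, view_w, view_h) is the visible region in full-resolution pixels.
    """
    level = choose_level(view_w, screen_w, max_level)
    factor = 2 ** level
    # Viewport bounds expressed in the chosen level's pixel grid.
    first_col = int(x0 / factor) // TILE
    last_col = (math.ceil((x0 + view_w) / factor) - 1) // TILE
    first_row = int(y0 / factor) // TILE
    last_row = (math.ceil((y0 + view_h) / factor) - 1) // TILE
    tiles = [(col, row)
             for row in range(first_row, last_row + 1)
             for col in range(first_col, last_col + 1)]
    return level, tiles


if __name__ == "__main__":
    # A 20000 x 15000 image (300 megapixels) on a 1024 x 768 screen.
    zoomed_out = visible_tiles(0, 0, 20000, 15000, 1024, 8)
    zoomed_in = visible_tiles(10000, 7000, 1024, 768, 1024, 8)
    print(zoomed_out[0], len(zoomed_out[1]))
    print(zoomed_in[0], len(zoomed_in[1]))
```

Because the chosen level always has fewer than two image pixels per screen pixel, the number of tiles requested stays within a small bound set by the screen size, whether the whole 300-megapixel scan is in view or a single detail of it.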

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

(Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

(Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and link all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.
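
The enrichment described here can be pictured as tags flowing across links between photos. The sketch below is a toy illustration only: Photosynth links images through computer-vision registration, which the talk does not detail, so the example substitutes plain integer IDs for visual features and treats two photos as linked when they share enough of them. The names `link_images`, `propagate_tags`, and `min_shared`, and all the sample data, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def link_images(features, min_shared=2):
    """Link two images when they share at least min_shared feature IDs.

    features maps an image name to a set of feature IDs (stand-ins for
    whatever a real vision pipeline would extract from the picture).
    """
    links = defaultdict(set)
    for a, b in combinations(features, 2):
        if len(features[a] & features[b]) >= min_shared:
            links[a].add(b)
            links[b].add(a)
    return links


def propagate_tags(tags, links):
    """Give each image the tags of the images it is directly linked to."""
    enriched = {image: set(own) for image, own in tags.items()}
    for image, neighbours in links.items():
        for neighbour in neighbours:
            enriched.setdefault(image, set()).update(tags.get(neighbour, ()))
    return enriched


if __name__ == "__main__":
    features = {
        "mine.jpg": {1, 2, 3},     # my untagged photo of the facade
        "tagged.jpg": {2, 3, 4},   # someone else's photo, already tagged
        "campus.jpg": {9},         # unrelated search result
    }
    tags = {"tagged.jpg": {"saint statue"}}
    enriched = propagate_tags(tags, link_images(features))
    print(enriched.get("mine.jpg"))
    print(enriched.get("campus.jpg"))
```

In this toy data, the untagged photo picks up the tag from the photo it overlaps with, while the unrelated search result is left alone. That is the network effect in miniature: each newly tagged photo adds value to every photo linked to it.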

Thank you so much.

(Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.

CA: Truly incredible. Congratulations.

BAA: Thank you so much.

(Applause)

(Applause ends)
