Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.
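One common way to get performance that depends on screen pixels rather than image size is a multi-resolution tile pyramid. The talk does not describe how Seadragon does this internally, so the following is only a minimal sketch under assumptions: the 256-pixel tile size, the halving between levels, and the function name `visible_tiles` are all hypothetical.

```python
import math

TILE = 256  # assumed tile edge in pixels (hypothetical, not stated in the talk)


def visible_tiles(img_w, img_h, view_w, view_h, zoom):
    """Return (level, tile_count) needed to fill a viewport.

    Level 0 is the full-resolution image; each further level halves it.
    zoom is screen pixels per full-resolution image pixel (1.0 = native).
    The count is an upper bound that allows for tile-boundary straddling.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    max_level = math.ceil(math.log2(max(img_w, img_h)))
    # Pick the coarsest level that still has at least one pixel per screen pixel.
    level = min(max_level, max(0, math.floor(math.log2(1 / zoom))))
    scale = 2 ** level
    level_w = math.ceil(img_w / scale)
    level_h = math.ceil(img_h / scale)
    # Extent of the viewport measured in this level's pixels.
    span_w = min(level_w, math.ceil(view_w / (zoom * scale)))
    span_h = min(level_h, math.ceil(view_h / (zoom * scale)))
    # +1 covers a viewport that straddles a tile boundary.
    tiles_x = min(math.ceil(level_w / TILE), math.ceil(span_w / TILE) + 1)
    tiles_y = min(math.ceil(level_h / TILE), math.ceil(span_h / TILE) + 1)
    return level, tiles_x * tiles_y


if __name__ == "__main__":
    # An ordinary 12-megapixel photo and a 300-megapixel scan, same viewport.
    print(visible_tiles(4000, 3000, 1920, 1080, 1.0))
    print(visible_tiles(20000, 15000, 1920, 1080, 1.0))
```

In this sketch both calls request the same number of tiles, because the tile count is capped by the viewport, not by the size of the source image.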

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

(Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

(Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

The point here really is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and linking all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.

Thank you so much.

(Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.
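The idea of content-based hyperlinks between images, and of tags flowing along those links, can be illustrated with a toy graph. This is a rough sketch under stated assumptions, not the Photosynth pipeline: the feature identifiers stand in for whatever a real computer-vision matcher would produce, and the names `link_images`, `propagate_tags`, and the `min_shared` threshold are hypothetical.

```python
from itertools import combinations


def link_images(features, min_shared=2):
    """Link two images when they share at least min_shared feature ids."""
    links = {name: set() for name in features}
    for a, b in combinations(features, 2):
        if len(features[a] & features[b]) >= min_shared:
            links[a].add(b)
            links[b].add(a)
    return links


def propagate_tags(links, tags):
    """Give every image the union of the tags in its connected group."""
    result = {}
    seen = set()
    for start in links:
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from start.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(links[node] - group)
        seen |= group
        merged = set().union(*(tags.get(n, set()) for n in group))
        for n in group:
            result[n] = set(merged)
    return result


if __name__ == "__main__":
    # Made-up feature ids and tags, for illustration only.
    features = {
        "mine.jpg": {"f1", "f2", "f3"},
        "tagged.jpg": {"f2", "f3", "f4"},
        "campus.jpg": {"f9"},
    }
    tags = {"tagged.jpg": {"west facade"}}
    links = link_images(features)
    print(propagate_tags(links, tags)["mine.jpg"])
```

Here the untagged photo inherits the tag from a linked photo, while the unrelated campus picture stays unlinked and untagged. Spreading tags across a whole connected group is a deliberate simplification; a real system would presumably be more selective about which metadata applies to which image.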

CA: Truly incredible. Congratulations.

BAA: Thank you so much.

(Applause)

(Applause ends)
