Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment.
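One way to see why only the number of pixels on screen should limit performance is a multi-resolution tile pyramid. The sketch below is an illustration under assumptions, not Seadragon's actual implementation: the 256-pixel tile size, the power-of-two levels, and the function name `visible_tiles` are all hypothetical. It picks the coarsest level that still supplies at least one image pixel per screen pixel and counts the tiles needed to cover the viewport.

```python
import math

TILE = 256  # assumed tile edge length in pixels (hypothetical)


def visible_tiles(view_w, view_h, screen_w, screen_h, tile=TILE):
    """Return (downsample_factor, tile_count) for one viewport.

    view_w, view_h: size of the visible region in full-resolution image pixels.
    screen_w, screen_h: size of the display in pixels.
    """
    # Image pixels per screen pixel along the more demanding axis.
    scale = max(view_w / screen_w, view_h / screen_h)
    # Coarsest power-of-two downsample that keeps >= 1 image pixel per screen pixel.
    factor = 1 << math.floor(math.log2(scale)) if scale >= 1 else 1
    # Viewport size measured in pixels of the chosen pyramid level.
    level_w = math.ceil(view_w / factor)
    level_h = math.ceil(view_h / factor)
    # One extra tile per axis covers a viewport that is not aligned to the tile grid.
    tiles_x = math.ceil(level_w / tile) + 1
    tiles_y = math.ceil(level_h / tile) + 1
    return factor, tiles_x * tiles_y


if __name__ == "__main__":
    screen = (1920, 1080)
    # Whole 300-megapixel scan (20000 x 15000) versus a full-resolution crop.
    print(visible_tiles(20000, 15000, *screen))
    print(visible_tiles(1920, 1080, *screen))
```

With a 1920x1080 display, the tile count stays within the same small bound whether the viewport covers an entire 300-megapixel scan or a full-resolution crop of it, because the chosen level never holds more than about twice the screen's pixels per axis.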

It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see them here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way. (Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here. (Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and linking all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and have a kind of cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.

Thank you so much. (Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.

CA: Truly incredible. Congratulations.

BAA: Thank you very much.
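
The idea from the talk of creating hyperlinks between images based on their content, and of one person's tags enriching everyone else's photos, can be sketched as a small graph problem. This is a toy illustration, not the Photosynth pipeline: the integer feature IDs stand in for whatever visual features a computer-vision system would actually match, and the file names, the `min_shared` threshold, and all function names are hypothetical. Spreading tags across a whole linked group is also a deliberate simplification.

```python
from collections import defaultdict
from itertools import combinations


def link_images(features, min_shared=2):
    """Link two images when they share at least min_shared feature IDs."""
    links = defaultdict(set)
    for a, b in combinations(sorted(features), 2):
        if len(features[a] & features[b]) >= min_shared:
            links[a].add(b)
            links[b].add(a)
    return links


def connected_group(links, start):
    """Return every image reachable from start by following links."""
    seen, stack = {start}, [start]
    while stack:
        for neighbour in links[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def shared_tags(links, tags, image):
    """Collect the tags entered on any image in the same linked group."""
    group = connected_group(links, image)
    return set().union(*(tags.get(name, set()) for name in group))


# Hypothetical data: feature IDs per photo, and tags entered by one user only.
features = {
    "facade_a.jpg": {1, 2, 3, 4},
    "facade_b.jpg": {3, 4, 5, 6},
    "poster.jpg": {5, 6, 7},
    "tshirt.jpg": {90, 91},
}
tags = {"facade_b.jpg": {"west facade", "saints"}}

links = link_images(features)
print(sorted(connected_group(links, "facade_a.jpg")))
print(sorted(shared_tags(links, tags, "poster.jpg")))
```

Here the poster photo never overlaps directly with `facade_a.jpg`, yet both end up in the same group through `facade_b.jpg`, and both inherit its tags, while the unrelated T-shirt photo stays unlinked.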
