Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here, and we're seamlessly and continuously zooming in, panning through them, and rearranging them in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300-megapixel range. It doesn't make any difference, because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.
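
The claim that performance should depend only on the pixels on screen can be illustrated with a tiled image pyramid. The talk does not describe Seadragon's internals, so this is a minimal sketch under stated assumptions: the image is stored as a pyramid of levels that halve in size down to 1x1, each level is cut into square tiles (256 pixels here, a hypothetical choice), and the viewer draws from the coarsest level that still supplies at least one level pixel per screen pixel. The names `TILE`, `num_levels`, `pick_level` and `visible_tiles` are hypothetical.

```python
import math

TILE = 256  # hypothetical tile edge length, in pixels


def num_levels(img_w: int, img_h: int) -> int:
    """Levels in a halving pyramid: level 0 is 1x1, the top level is full size."""
    return math.ceil(math.log2(max(img_w, img_h))) + 1


def pick_level(img_w: int, img_h: int, scale: float) -> tuple[int, float]:
    """Choose the coarsest level that still has >= 1 level pixel per screen pixel
    (or the top level when zoomed in beyond 100%).

    scale is screen pixels per full-resolution image pixel (1.0 = 100% zoom).
    Returns (level, screen pixels per level pixel).
    """
    top = num_levels(img_w, img_h) - 1
    drop = 0
    # Step down one level (halving resolution) while that level is still detailed enough.
    while drop < top and scale * 2 ** (drop + 1) <= 1:
        drop += 1
    return top - drop, scale * 2 ** drop


def visible_tiles(img_w, img_h, view_w, view_h, scale, x=0.0, y=0.0):
    """Count the tiles needed to fill a view_w x view_h window.

    (x, y) is the window's top-left corner in full-resolution image pixels.
    """
    level, s = pick_level(img_w, img_h, scale)
    factor = 2 ** (num_levels(img_w, img_h) - 1 - level)  # image px per level px
    level_w, level_h = math.ceil(img_w / factor), math.ceil(img_h / factor)
    # Window edges in this level's pixel coordinates, clipped to the level's size.
    x0, y0 = x / factor, y / factor
    x1 = min(x0 + view_w / s, level_w)
    y1 = min(y0 + view_h / s, level_h)
    if x1 <= x0 or y1 <= y0:
        return 0  # window lies entirely outside the image
    cols = math.ceil(x1 / TILE) - math.floor(x0 / TILE)
    rows = math.ceil(y1 / TILE) - math.floor(y0 / TILE)
    return cols * rows
```

With a 1920x1080 window at 100% zoom, a 2000x1500 snapshot and a 20000x15000 (300-megapixel) scan both need 40 tiles, and fitting the whole scan on screen needs 80. Because each chosen level maps to between one and two level pixels per screen pixel, the tile count is capped by the window size, not the image size.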

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see them here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that; they shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.
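
One way to picture how an image is "discovered to belong" to the Notre Dame model is as a grouping step over pairwise matches. This is a minimal sketch, not the Photosynth pipeline, which the talk does not detail (a full reconstruction also estimates camera positions and 3D points). It assumes the number of verified feature matches between each pair of photos is already known, links pairs above a hypothetical threshold `MIN_MATCHES`, and keeps the largest connected group using union-find. Photo names and match counts are made up for illustration.

```python
from collections import Counter

MIN_MATCHES = 20  # hypothetical: verified matches needed to link two photos


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def largest_group(photos, match_counts, min_matches=MIN_MATCHES):
    """Return the set of photos in the largest group of linked photos.

    match_counts maps (photo_a, photo_b) -> number of verified feature matches.
    """
    parent = {p: p for p in photos}
    for (a, b), n in match_counts.items():
        if n >= min_matches:
            root_a, root_b = find(parent, a), find(parent, b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two groups
    sizes = Counter(find(parent, p) for p in photos)
    biggest_root, _ = sizes.most_common(1)[0]
    return {p for p in photos if find(parent, p) == biggest_root}


# Made-up photo names and match counts, for illustration only.
photos = ["facade_1", "facade_2", "portal", "rose_window", "tshirt", "campus"]
matches = {
    ("facade_1", "facade_2"): 310,
    ("facade_2", "portal"): 95,
    ("portal", "rose_window"): 4,
    ("facade_1", "rose_window"): 42,
    ("tshirt", "campus"): 3,
}
print(sorted(largest_group(photos, matches)))
# ['facade_1', 'facade_2', 'portal', 'rose_window']
```

The rose window joins the model through the first facade shot even though its direct link to the portal is weak, while the T-shirt and campus shots share too few matches with anything and are left out.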

(Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

(Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- so many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and linking all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. A model of the entire Earth emerges. Think of this as the long tail to Stephen Lawler's Virtual Earth work (Lawler headed Microsoft's Virtual Earth project). And this is something that grows in complexity as people use it, and whose benefits to the users grow as they use it. Their own photos are getting tagged with metadata that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that metaverse, using everybody else's photos, and have a kind of cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.
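
The tag-sharing effect described above can be sketched as a walk over the links between photos. This assumes the links are already known and stored in both directions, and that a photo inherits the tags of photos within a hypothetical `max_hops` links of it; the talk does not say how Photosynth propagates metadata, and all names and tags below are illustrative.

```python
from collections import deque


def enriched_tags(photo, links, tags, max_hops=1):
    """Return a photo's own tags plus tags on photos within max_hops links of it."""
    result = set(tags.get(photo, ()))
    seen = {photo}
    frontier = deque([(photo, 0)])
    # Breadth-first search, so each photo is reached by its shortest link path.
    while frontier:
        current, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not look beyond the hop limit
        for neighbor in links.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                result |= set(tags.get(neighbor, ()))
                frontier.append((neighbor, hops + 1))
    return result


# Hypothetical photos: mine is linked to another user's shot of the portal,
# which in turn overlaps a third user's shot of the rose window.
links = {
    "my_facade": ["their_portal"],
    "their_portal": ["my_facade", "their_rose"],
    "their_rose": ["their_portal"],
}
tags = {
    "my_facade": {"Notre Dame"},
    "their_portal": {"portal saints"},
    "their_rose": {"rose window"},
}
print(sorted(enriched_tags("my_facade", links, tags)))
# ['Notre Dame', 'portal saints']
```

Raising `max_hops` lets tags travel further through the network, which is one simple reading of how the benefit grows as more people add photos and labels.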

Thank you so much.

(Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.
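
The idea of hyperlinks discovered from image content can be sketched with local feature matching. The talk does not say which features or matching rule Photosynth uses; this sketch assumes each image has already been reduced to a list of feature descriptors (toy 2-D points here; real descriptors such as SIFT are 128-dimensional) and applies Lowe's ratio test, linking two images when enough descriptors match distinctively. The function names and the `min_matches` and `ratio` thresholds are hypothetical.

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Count descriptors in desc_a with a distinctive nearest neighbour in desc_b.

    Lowe's ratio test: keep a match only if the closest descriptor in desc_b is
    clearly closer than the second closest. The test is directional (a -> b).
    """
    if len(desc_b) < 2:
        return 0  # the test needs a second-closest neighbour
    count = 0
    for d in desc_a:
        nearest, second = sorted(math.dist(d, e) for e in desc_b)[:2]
        if nearest < ratio * second:
            count += 1
    return count


def content_links(images, min_matches=2, ratio=0.8):
    """Link each pair of images that shares at least min_matches distinctive matches."""
    names = sorted(images)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if ratio_test_matches(images[a], images[b], ratio) >= min_matches:
                links.append((a, b))
    return links


# Toy 2-D "descriptors"; the two front views share three features.
images = {
    "front_1": [(0, 0), (10, 0), (0, 10)],
    "front_2": [(0.1, 0), (10, 0.2), (0, 9.9), (50, 50)],
    "other": [(100, 100), (101, 100), (100, 101)],
}
print(content_links(images))
# [('front_1', 'front_2')]
```

A practical system would use a spatial index for the nearest-neighbour search and would check that the matches agree geometrically before trusting a link; this brute-force version only shows the idea.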

CA: Truly incredible. Congratulations.

BAA: Thank you all very much.

(Applause)

(Applause ends)