Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.
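
The claim that only the number of on-screen pixels should limit performance rests on multi-resolution tiling: the image is stored as a pyramid of progressively halved copies, each cut into fixed-size tiles, and a frame only fetches the tiles that cover the viewport at the coarsest level that still looks sharp. The Python below is a simplified sketch under assumptions, not Seadragon's actual code; the 256-pixel tile size, the function name `visible_tiles`, and its coordinate conventions are all hypothetical.

```python
import math

TILE = 256  # assumed tile edge length in pixels (hypothetical choice)


def visible_tiles(img_w, img_h, scale, view_x, view_y, view_w, view_h):
    """Return (downsample factor, list of (col, row) tiles) needed for one frame.

    scale:          screen pixels per full-resolution image pixel (0.1 = zoomed out).
    view_x, view_y: top-left corner of the viewport, in full-resolution image pixels.
    view_w, view_h: viewport size in screen pixels.
    """
    # Pick the coarsest pyramid level (power-of-two downsample) whose pixels are
    # still no larger than screen pixels; for scale <= 1, scale * factor lands in (0.5, 1].
    factor = 1
    while scale * factor * 2 <= 1:
        factor *= 2
    # Dimensions of that level, and the viewport expressed in level pixels.
    lvl_w, lvl_h = math.ceil(img_w / factor), math.ceil(img_h / factor)
    x0, y0 = view_x / factor, view_y / factor
    x1 = x0 + view_w / (scale * factor)
    y1 = y0 + view_h / (scale * factor)
    # Convert to tile indices, clamped to the tiles that actually exist.
    c0, r0 = max(0, math.floor(x0 / TILE)), max(0, math.floor(y0 / TILE))
    c1 = min(math.ceil(lvl_w / TILE), math.ceil(x1 / TILE))
    r1 = min(math.ceil(lvl_h / TILE), math.ceil(y1 / TILE))
    return factor, [(c, r) for c in range(c0, c1) for r in range(r0, r1)]


# Same 1920x1080 viewport at full resolution over an ~8 MP photo and a 300 MP scan.
for w, h in [(3264, 2448), (20000, 15000)]:
    factor, tiles = visible_tiles(w, h, 1.0, 0, 0, 1920, 1080)
    print(f"{w}x{h}: level factor {factor}, {len(tiles)} tiles")
```

Both images request the same 40 tiles here. When zoomed out, the sketch switches to a coarser pyramid level instead of loading more pixels, so the tile count stays bounded by the viewport size rather than the image size.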

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see them here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots, all taken near Grassi Lakes in the Canadian Rockies, were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.
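
Registering photos means finding the geometric transformation that maps features in one shot onto the same features in another. The system described here recovers full 3D camera positions, which is far beyond a short example; the sketch below is only a 2D toy that, assuming the point matches are already known, fits a similarity transform (rotation, uniform scale, translation) by least squares. The function name `fit_similarity` and the sample points are hypothetical.

```python
import cmath


def fit_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points onto dst points.

    Points are complex numbers (x + yj). The model is dst ~ a * src + b, where
    the complex factor a encodes rotation and uniform scale and b is the shift.
    """
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched point pairs of equal length")
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    # Closed-form minimizer of sum |a * s + b - d|^2 after centering both sets.
    num = sum((s - src_mean).conjugate() * (d - dst_mean) for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = dst_mean - a * src_mean
    return a, b


# Hypothetical matches: keypoints in photo A and the same features seen in photo B,
# which is rotated 30 degrees, scaled by 1.5 and shifted.
true_a, true_b = cmath.rect(1.5, cmath.pi / 6), complex(40, -12)
photo_a = [complex(10, 20), complex(200, 35), complex(120, 180), complex(60, 90)]
photo_b = [true_a * p + true_b for p in photo_a]

a, b = fit_similarity(photo_a, photo_b)
print("scale", abs(a), "rotation (deg)", cmath.phase(a) * 180 / cmath.pi, "shift", b)
```

Real matches contain outliers (a face or a T-shirt that happens to resemble some detail), so a robust wrapper such as RANSAC would normally sit around a fit like this one.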

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

(Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

(Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- so many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and linking all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model of the entire Earth that emerges. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with metadata that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that metaverse, using everybody else's photos, and do a kind of cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.
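
The idea that a photo gets enriched by tags someone else entered can be pictured as a walk over the links between registered photos. This is a minimal sketch under loud assumptions: the link graph, the tag dictionaries, the `max_hops` limit, and the function name `enriched_tags` are all hypothetical, and a real system would presumably attach tags to specific regions of the reconstructed model rather than to whole photos.

```python
from collections import deque


def enriched_tags(photo, links, tags, max_hops=2):
    """Gather tags from every photo reachable within max_hops content links.

    links: photo id -> set of photo ids it was registered against (hypothetical).
    tags:  photo id -> set of tags users typed in for that photo.
    """
    result = set(tags.get(photo, set()))
    seen = {photo}
    queue = deque([(photo, 0)])          # breadth-first search: (photo, hops so far)
    while queue:
        current, hops = queue.popleft()
        if hops == max_hops:
            continue                     # don't expand beyond the hop limit
        for neighbor in links.get(current, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                result |= tags.get(neighbor, set())
                queue.append((neighbor, hops + 1))
    return result


# Hypothetical link graph: my untagged photo overlaps "a", which overlaps "b", and so on.
links = {"mine": {"a"}, "a": {"mine", "b"}, "b": {"a", "c"}, "c": {"b"}}
tags = {"a": {"Notre Dame"}, "b": {"west facade", "Saint Denis"}, "c": {"rose window"}}
print(sorted(enriched_tags("mine", links, tags)))
```

Raising `max_hops` to 3 also pulls in the "rose window" tag from photo "c"; a hop limit is one crude way to keep tags from drifting too far from what a photo actually shows.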

Thank you so much.

(Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.
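
Discovering hyperlinks based on the content inside the images can be approximated by linking two images whenever they share enough local features. The sketch below assumes each image has already been reduced to a set of hashable feature IDs (a hypothetical stand-in for matched descriptors) and uses an inverted index so that only images sharing at least one feature are ever compared; `content_links` and `min_shared` are illustrative names, not part of Photosynth.

```python
from collections import Counter, defaultdict
from itertools import combinations


def content_links(features, min_shared=3):
    """Link every pair of images that shares at least min_shared feature IDs.

    features: image id -> set of feature IDs (hypothetical quantized descriptors).
    Returns a set of (image_a, image_b) pairs with image_a < image_b.
    """
    index = defaultdict(list)              # inverted index: feature ID -> images
    for image, feats in features.items():
        for feat in feats:
            index[feat].append(image)
    shared = Counter()                     # (image_a, image_b) -> shared feature count
    for images in index.values():
        for pair in combinations(sorted(images), 2):
            shared[pair] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


features = {
    "flickr_1": {1, 2, 3, 4},
    "flickr_2": {2, 3, 4, 9},
    "poster": {3, 4, 5, 9},
    "tshirt": {7, 8},
}
print(sorted(content_links(features)))
```

The network effect shows up in the counts: n images allow up to n(n-1)/2 links, so every new photo can potentially connect to everything already in the collection.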

CA: Blaise, that is truly incredible. Congratulations.

BAA: Thank you so much.

(Applause)

(Applause ends)
