Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.
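The claim that only the pixels on screen should limit performance can be illustrated with a generic tile pyramid. The sketch below is not Seadragon's actual implementation, which the talk does not describe. It assumes each image is stored at successively halved resolutions and cut into fixed-size tiles. The tile size, the image dimensions and every function name are hypothetical. Because the chosen level always has between one and two image pixels per screen pixel, the number of tiles to fetch depends on the window size rather than on the image size.

```python
import math

TILE = 256  # assumed tile edge in pixels; the talk does not give a figure


def top_level(width, height):
    """Index of the full-resolution level; level 0 is about one pixel across."""
    return math.ceil(math.log2(max(width, height)))


def level_for_scale(width, height, scale):
    """Coarsest level that still offers at least one image pixel per screen pixel.

    scale = screen pixels per full-resolution pixel (1.0 means 1:1).
    """
    top = top_level(width, height)
    if scale >= 1:
        return top
    return max(0, top - math.floor(-math.log2(scale)))


def visible_tiles(width, height, scale, view_x, view_y, view_w, view_h):
    """List (level, col, row) for every tile that intersects the viewport.

    view_x, view_y: top-left corner of the viewport in full-resolution pixels.
    view_w, view_h: viewport size in screen pixels.
    """
    level = level_for_scale(width, height, scale)
    factor = 2 ** (top_level(width, height) - level)  # downsampling at this level
    level_w = math.ceil(width / factor)
    level_h = math.ceil(height / factor)
    # Viewport rectangle in this level's pixel coordinates, clipped to the image.
    x0 = max(0.0, view_x / factor)
    y0 = max(0.0, view_y / factor)
    x1 = min(level_w, (view_x + view_w / scale) / factor)
    y1 = min(level_h, (view_y + view_h / scale) / factor)
    if x1 <= x0 or y1 <= y0:
        return []
    cols = range(int(x0 // TILE), math.ceil(x1 / TILE))
    rows = range(int(y0 // TILE), math.ceil(y1 / TILE))
    return [(level, c, r) for r in rows for c in cols]


# Fit a hypothetical 300-megapixel scan and a 3-megapixel photo into the same window.
big = visible_tiles(20000, 15000, scale=0.05, view_x=0, view_y=0, view_w=1000, view_h=750)
small = visible_tiles(2000, 1500, scale=0.5, view_x=0, view_y=0, view_w=1000, view_h=750)
print(len(big), len(small))
```

Both calls request a similar, small number of tiles even though one image has a hundred times as many pixels as the other, which is the behaviour described in the paragraph above.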

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

(Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

(Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

The point here really is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and linking all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.

Thank you so much.

(Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.
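The idea of hyperlinks discovered from image content, together with the tag enrichment described earlier in the talk, can be sketched as a small graph. This is only an illustration under stated assumptions. Each image is reduced to a set of integer feature IDs, which stand in for whatever visual features a real computer-vision pipeline would match. The talk does not describe the actual matching method, so the threshold, the file names and the function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_links(features, min_shared=2):
    """Link two images when they share at least `min_shared` feature IDs.

    features: dict mapping image name -> set of feature IDs (hypothetical
    stand-ins for matched visual features).
    """
    links = defaultdict(set)
    for a, b in combinations(sorted(features), 2):
        if len(features[a] & features[b]) >= min_shared:
            links[a].add(b)
            links[b].add(a)
    return links


def inherit_tags(links, tags, images):
    """Give every image its own tags plus those of directly linked images."""
    enriched = {}
    for img in images:
        merged = set(tags.get(img, ()))
        for neighbour in links.get(img, ()):
            merged |= set(tags.get(neighbour, ()))
        enriched[img] = merged
    return enriched


# Hypothetical data: two overlapping cathedral photos and one unrelated photo.
features = {
    "mine.jpg": {1, 2, 3, 4},
    "tagged.jpg": {3, 4, 5},
    "campus.jpg": {9, 10},
}
tags = {"tagged.jpg": {"portal statues"}}

links = build_links(features)
enriched = inherit_tags(links, tags, features)
print(enriched["mine.jpg"])
```

In this toy data the untagged photo picks up the tag somebody else entered because the two photos share content, while the unrelated photo stays unlinked and untagged. Every new image that links in can both contribute and inherit tags, which is the network effect mentioned above.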

CA: Truly incredible. Congratulations.

BAA: Thank you so much.

(Applause)

(Applause ends)
