Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment.
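The claim that only the on-screen pixel count should limit performance can be illustrated with a small sketch. The talk does not describe how Seadragon works internally, so everything here is an assumption made for illustration: a power-of-two image pyramid, a 256-pixel tile size, and the function names. Under those assumptions, the number of tiles a viewer has to fetch is bounded by the screen size, not by the image size.

```python
import math

TILE = 256  # assumed tile edge in pixels (hypothetical, not from the talk)


def pyramid_levels(width, height):
    """Number of levels if each level halves the one above it, down to 1 pixel."""
    return math.ceil(math.log2(max(width, height))) + 1


def level_for_scale(width, height, scale):
    """Pick the coarsest level that still meets the requested on-screen detail.

    `scale` is screen pixels per full-resolution image pixel (1.0 = full detail).
    """
    top = pyramid_levels(width, height) - 1
    if scale >= 1.0:
        return top
    drop = math.floor(math.log2(1.0 / scale))  # each level down halves resolution
    return max(0, top - drop)


def visible_tiles(width, height, scale, screen_w, screen_h):
    """Return (level, upper bound on tiles touched by the viewport)."""
    top = pyramid_levels(width, height) - 1
    level = level_for_scale(width, height, scale)
    level_scale = 2.0 ** (level - top)  # level resolution relative to full size
    # Part of the image covered by the screen, in full-resolution pixels.
    view_w = min(width, screen_w / scale)
    view_h = min(height, screen_h / scale)
    # Same region measured in pixels of the chosen level; +1 allows for the
    # viewport not being aligned to tile boundaries.
    cols = math.ceil(view_w * level_scale / TILE) + 1
    rows = math.ceil(view_h * level_scale / TILE) + 1
    return level, cols * rows


def max_tiles_for_screen(screen_w, screen_h):
    """Worst case: the chosen level has less than twice the on-screen detail."""
    return (math.ceil(2 * screen_w / TILE) + 1) * (math.ceil(2 * screen_h / TILE) + 1)


if __name__ == "__main__":
    # A 300-megapixel scan and an ordinary 3-megapixel photo on the same screen.
    for w, h in [(20000, 15000), (2000, 1500)]:
        for scale in (1.0, 0.1, 0.01):
            print((w, h), scale, visible_tiles(w, h, scale, 1920, 1080))
```

In this sketch the tile count for the 300-megapixel image never exceeds the bound computed from the screen dimensions alone, which is the property the speaker is describing.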

It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see them here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way. (Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here. (Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- so many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

The point here really is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and linking all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.
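The idea that one person's tags can enrich someone else's photo can be sketched in a few lines. This is not Photosynth's actual data model, which the talk does not describe. The sketch assumes that registration has already assigned each photo a set of shared scene-point ids and that tags are attached to those points; all names and sample data below are hypothetical.

```python
from collections import defaultdict


def propagate_tags(photo_points, point_tags):
    """Give each photo the tags attached to the scene points it shows.

    photo_points: photo id -> set of scene-point ids visible in that photo
    point_tags:   scene-point id -> set of tags someone attached to that point
    """
    result = {}
    for photo, points in photo_points.items():
        tags = set()
        for point in points:
            tags |= point_tags.get(point, set())
        result[photo] = tags
    return result


def linked_photos(photo_points):
    """Link two photos whenever they share at least one scene point."""
    by_point = defaultdict(set)
    for photo, points in photo_points.items():
        for point in points:
            by_point[point].add(photo)
    links = defaultdict(set)
    for photos in by_point.values():
        for photo in photos:
            links[photo] |= photos - {photo}
    return dict(links)


if __name__ == "__main__":
    # Made-up sample data: point ids and tags are illustrative only.
    photo_points = {
        "mine.jpg": {1, 2, 3},
        "tourist.jpg": {3, 4},
        "poster.jpg": {4, 5},
        "campus.jpg": {9},  # unrelated picture that shares no points
    }
    point_tags = {3: {"saint statue"}, 4: {"portal"}}
    print(propagate_tags(photo_points, point_tags))
    print(linked_photos(photo_points))
```

Here "mine.jpg" picks up the "saint statue" tag even though its owner never entered it, because another photo showing the same point was tagged. The unrelated "campus.jpg" stays unlinked and untagged.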

Thank you so much. (Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.

CA: Truly incredible. Congratulations.

BAA: Thank you very much.

