Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

میں آپ کو سب سے پہلے ہر ممکن جلد کچھ بنیادی کام دکھاؤں گا۔ یہ وہ نئی ٹیکنالوجی ہے، جو ہم ٹھیک ایک سال پہلے مائیکروسافٹ میں حصول کے جزو کے طور پر لے کر آئے تھے۔ اس کا نام سی ڈریگن (Seadragon) ہے۔ یہ ایک ایسا ماحول ہے جس میں آپ بصری کوائف کی بڑی مقدار کے ساتھ نزدیک یا فاصلے سے تعامل کر سکتے ہیں۔

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.

ہم یہاں کئی گیگا بائیٹس کی ڈیجیٹل تصاویر دیکھ رہے ہیں اور ایک لحاظ سے بلارکاوٹ اور مسلسل زوم ان کر سکتے ہیں، ہم اس کو گھما کر دیکھ سکتے ہیں اور جس طرح چاہیں دوبارہ ترتیب دے سکتے ہیں۔ اور اس سے کچھ فرق نہیں پڑتا کہ ہم کتنی معلومات کو دیکھ رہے ہیں، یا یہ مجموعے کتنے بڑے ہیں یا تصاویر کتنی بڑی ہیں۔ ان میں سے بیشتر ڈیجیٹل کیمرے سے لی گئی عام تصاویر ہیں، لیکن یہ والی، مثال کے طور پر، لائبریری آف کانگریس کی ایک اسکین تصویر ہے، اور اس کی حد 300 میگا پکسل ہے۔ اس سے کوئی فرق نہیں پڑتا کیونکہ اس طرح کے کسی نظام کی کارکردگی کو محدود کر سکنے والی واحد چیز آپ کی اسکرین پر اس وقت پائے جانے والے پکسلز کی تعداد ہے۔ یہ انتہائی لچک دار بناوٹ ہے۔ یہ ایک مکمل کتاب ہے، جو کہ بلاتصویر کوائف کی ایک مثال ہے۔ یہ ڈکنز کی لکھی ہوئی کتاب بلیک ہاؤس ہے۔ ہر کالم ایک باب ہے۔ آپ کے سامنے یہ ثابت کرنے کے لئے کہ یہ واقعی متن ہے اور تصویر نہیں، ہم اسے یوں کر سکتے ہیں، تاکہ واقعی یہ ظاہر کیا جا سکے کہ یہ متن کی حقیقی نمائندگی ہے؛ یہ کوئی تصویر نہیں ہے۔ ہوسکتا ہے کہ کسی برقی کتاب کو پڑھنے کا یہ کوئی مصنوعی طریقہ ہو۔ میں اس کی سفارش نہیں کروں گا۔
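
The claim that only the number of on-screen pixels should limit performance can be illustrated with a tiled image pyramid. The talk does not say how Seadragon stores images, so the following is only a minimal sketch under assumptions: the image is pre-cut into square tiles at successive half-resolution levels, and the function name `tiles_needed` and the 256-pixel tile size are hypothetical.

```python
import math

def tiles_needed(view, screen_w, screen_h, tile=256):
    """Return (downsample_factor, tile_count) needed to draw `view`.

    view = (x0, y0, x1, y1) in full-resolution image pixels.
    Assumes a pyramid whose levels halve the resolution each step
    (factor 1 is full resolution, then 2, 4, 8, ...).
    """
    x0, y0, x1, y1 = view
    # Image pixels covered by one screen pixel along the tighter axis.
    scale = max((x1 - x0) / screen_w, (y1 - y0) / screen_h)
    # Coarsest level that still has at least one image pixel per screen pixel.
    k = max(0, math.floor(math.log2(scale)))
    factor = 2 ** k
    # Tile columns and rows that the view overlaps at that level.
    cols = math.ceil(x1 / factor / tile) - math.floor(x0 / factor / tile)
    rows = math.ceil(y1 / factor / tile) - math.floor(y0 / factor / tile)
    return factor, cols * rows

# Whole 20000 x 15000 scan (300 megapixels) on a 1920 x 1080 screen.
print(tiles_needed((0, 0, 20000, 15000), 1920, 1080))
# An image 100 times larger in pixel count, same screen.
print(tiles_needed((0, 0, 200000, 150000), 1920, 1080))
# Zoomed far into the first image.
print(tiles_needed((10000, 7000, 10960, 7540), 1920, 1080))
```

In this sketch the total image size is not even a parameter: the chosen level keeps the view under twice the screen resolution, so the tile count stays bounded by the screen and tile size however large the source image is.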

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

یہ زیادہ حقیقت پسند مثال ہے۔ یہ روزنامہ گارڈین کا ایک شمارہ ہے۔ ہر بڑی تصویر ایک حصے کا آغاز ہے۔ اور اس سے آپ کو کسی رسالے یا اخبار کو حقیقی کاغذی شکل میں پڑھنے کا لطف اور اچھا تجربہ ملے گا، جو کہ فطری طور پر ایک کثیر الپیمانہ قسم کا ذریعہ ہے۔ ہم نے روزنامہ گارڈین کے اس مخصوص شمارے کے کونے پر کچھ تبدیلی کی ہے۔ ہم نے ایک جعلی اشتہار تیار کیا ہے جو انتہائی اعلی معیار (ہائی ریزولوشن) کا ہے ۔۔ یہ اس سے کافی زیادہ ہیں جو آپ کسی عام اشتہار میں دیکھتے ہیں ۔۔ اور ہم نے اس میں اضافی مواد شامل کیا ہے۔ اگر آپ اس کار کی خصوصیات دیکھنا چاہیں، تو آپ انہیں یہاں دیکھ سکتے ہیں۔ یا دیگر ماڈل یا پھر تکنیکی تصریحات بھی دیکھی جا سکتی ہیں۔ اور یہ ان خیالات کا پیکر ہے جس میں اسکرین کی حدود سے باہر نکلا جا سکتا ہے۔ ہمیں امید ہے کہ اس سے مراد پاپ اپ یا اس قسم کا دیگر بکواس نہیں ہے ۔۔ اس کی ضرورت بھی نہیں ہونی چاہیے۔

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

بلاشبہ، اس قسم کی نمایاں ٹیکنالوجی کا ایک اطلاق نقشوں پر بھی ہوتا ہے۔ میں اس پر زیادہ وقت صرف نہیں کروں گا، سوائے اس کے کہ اس شعبے میں بھی کردار ادا کرنے کے لئے ہمارے پاس چیزیں موجود ہیں۔ وہ سب امریکا کی سڑکیں ہیں جنہیں ناسا کی جغرافیائی مکان کی حامل ایک تصویر کے اوپر مصنوعی انداز سے چسپاں کیا گیا ہے۔ تو چلیں آئیں کچھ اور دیکھتے ہیں۔ یہ اس وقت ویب پر براہ راست ہے، آپ جا کر اسے چیک کر سکتے ہیں۔

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

یہ ایک منصوبہ ہے جس کا نام فوٹو سنتھ (Photosynth) ہے، جو درحقیقت دو مختلف ٹیکنالوجیوں کا بندھن ہے۔ ان میں سے ایک سی ڈریگن ہے اور دوسری ایک انتہائی خوبصورت کمپیوٹر بصری تحقیق ہے جسے یونیورسٹی آف واشنگٹن کے ایک گریجویٹ طالب علم نوہا اسنیولے نے سر انجام دیا ہے، اس کام میں یونیورسٹی آف واشنگٹن کے اسٹیو سائٹز اور مائیکروسافٹ ریسرچ کے رک سزیلیسکی نے ان کی معاونت کی ہے۔ ایک انتہائی عمدہ اشتراک۔ تو یہ ویب پر براہ راست ہے۔ اسے سی ڈریگن سے طاقت ملتی ہے۔ آپ دیکھ سکتے ہیں کہ جب ہم انہیں اس انداز سے ملاحظہ کرتے ہیں، جہاں ہم تصاویر کے اندر غوطہ لگا سکتے ہیں اور اس قسم کا ملٹی ریزولوشن تجربہ کر سکتے ہیں۔

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

لیکن یہاں موجود تصاویر میں خصوصیات کی ترتیب دراصل معنی خیز ہے۔ کمپیوٹر کے بصری الگورزم نے ان تصاویر کا اکٹھے اندراج کیا ہے، چنانچہ وہ اس حقیقی جگہ سے مطابقت رکھتے ہیں جن کی یہ تصاویر ہیں ۔۔ یہ تمام تصاویر کینیڈا کے چٹانی پہاڑی سلسلے میں واقع گراسی جھیلوں کے نزدیکی علاقے کی ہیں چنانچہ آپ کو یہاں مستحکم شدہ سلائیڈ شو یا منظر کی تصاویر کے عناصر نظر آئیں گے، اور مکان کے لحاظ سے یہ تمام چیزیں مشترک ہیں۔ مجھے یقینی طور پر نہیں معلوم کہ آیا میرے پاس آپ کو کوئی دیگر ماحول دکھانے کا وقت ہے۔ ان میں سے بعض ایسی ہیں جن میں مکان کا عنصر بہت زیادہ ہے۔ میں براہ راست نوہا کے حقیقی کوائف سیٹ پر جانا چاہوں گا ۔۔ اور یہ فوٹو سینتھ کے ایک پرانے آزمائشی سافٹ ویئر سے لیا گیا ہے جسے ہم نے پہلے پہل موسم گرما میں آزمایا ۔۔ میں اس میں آپ کو دکھاؤں گا کہ میرے خیال میں حقیقت میں اس ٹیکنالوجی کے پیچھے کیا اہم نقطہ ہے، یعنی فوٹو سینتھ ٹیکنالوجی۔ اور ضروری نہیں کہ ہم نے ویب سائٹ پر جو ماحول رکھے ہیں ان کو دیکھ کر یہ واضح ہو جائے۔ ہم وکلاء اور اسی طرح کے دیگر افراد کی جانب سے فکرمند تھے۔
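
The talk does not describe how the images are registered, so the following is not Photosynth's method. It is a deliberately simplified 2D stand-in for the general idea: given points that have already been matched between two photos (the matching step is assumed and not shown), estimate the scale, rotation and shift that map one photo onto the other. The function name `fit_similarity` is hypothetical.

```python
import cmath

def fit_similarity(src, dst):
    """Least-squares fit of dst ≈ a * src + b.

    Points are complex numbers x + yj. abs(a) is the scale,
    cmath.phase(a) the rotation, and b the translation.
    """
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    # Correlate the centered point sets to recover scale and rotation.
    num = sum((q - dst_mean) * (p - src_mean).conjugate()
              for p, q in zip(src, dst))
    den = sum(abs(p - src_mean) ** 2 for p in src)
    a = num / den
    b = dst_mean - a * src_mean
    return a, b

# Synthetic check: photo B shows photo A's points scaled 2x,
# rotated 30 degrees and shifted by (3, 4).
true_a = cmath.rect(2.0, cmath.pi / 6)
true_b = 3 + 4j
points_a = [0j, 1 + 0j, 1 + 1j, 0 + 1j, 2 + 3j]
points_b = [true_a * p + true_b for p in points_a]
a, b = fit_similarity(points_a, points_b)
print(abs(a), cmath.phase(a), b)
```

A real system would need to work in three dimensions and tolerate wrong matches; this sketch only shows how matched points pin down a spatial relationship between two images.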

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way.

یہ نوترے دیم کیتھیڈرل کی تعمیر نو کا نمونہ ہے جسے مکمل طور پر فلکر پر موجود تصاویر کی مدد سے کمپیوٹر کے ذریعے تیار کیا گیا۔ آپ فلکر میں صرف نوترے دیم ٹائپ کریں، اور آپ کو ٹی شرٹ میں ملبوس لڑکوں کی تصاویر اور کیمپس اور دیگر جگہوں کی تصاویر ملیں گی۔ اور نارنگی رنگ کی ہر کون ایک تصویر کو ظاہر کرتی ہے جسے اس نمونے کے حصے کے طور پر دریافت کیا گیا۔ تو یہ سب فلکر کی تصاویر ہیں، اور مکان کے اعتبار سے ان سب کا ایک دوسرے سے تعلق ہے۔ اور ہم اس آسان طریقے سے سمت متعین کر سکتے ہیں۔

(Applause)

(تالیاں)۔

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

آپ کو علم ہے، میں نے کبھی سوچا بھی نہیں تھا کہ میں مائیکروسافٹ میں کام کرنے لگوں گا۔ یہاں اس قسم کا استقبال بہت مسرت بخش ہے۔

(Laughter)

(قہقہے)۔

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

میرا اندازہ ہے کہ آپ کو یہاں بہت سے مختلف النوع کیمرے نظر آ رہے ہوں گے: ان میں موبائل فون کے کیمروں سے لیکر ایس ایل آر تک ہر قسم کے کیمرے ہیں، یہ بہت بڑی تعداد میں ہیں جو یہاں اس ماحول میں ایک دوسرے سے منسلک ہیں۔ اور اگر میں تلاش کر سکا تو میں کچھ عجیب طرح کے بھی ڈھونڈ نکالوں گا۔ ان میں بہت سے سامنے سے بند ہیں اور وغیرہ وغیرہ۔ یہاں کسی کے اندر دراصل تصاویر کا ایک سلسلہ ہے ۔۔ یہ رہا۔ یہ دراصل نوترے دیم کا ایک پوسٹر ہے جس کا درست طریقے سے اندراج کیا گیا ہے۔ ہم پوسٹر سے غوطہ لگا کر اس ماحول کا طبعی منظر دیکھ سکتے ہیں۔

What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and linking all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.

دراصل یہاں نقطہ یہ ہے کہ ہم سماجی ماحول کے ساتھ کام کر سکتے ہیں۔ اب یہ ہر کسی سے کوائف حاصل کر رہا ہے ۔۔ مکمل مجموعی یاد داشت کے ذریعے، بصری طور پر کہ زمین کیسی دکھائی دیتی ہے ۔۔ اور ان سب کو آپس میں جوڑ دے گا۔ یہ تمام تصاویر آپس میں مربوط ہو جاتی ہیں، اور ان سے وہ چیز نمودار ہوتی ہے جو تمام اجزا کے مجموعے سے بڑی ہے۔ آپ کے پاس ایک نمونہ ہے جو تمام زمین سے نمودار ہوتا ہے۔ اسے اسٹیفن لالر کی مجازی زمین کے کام کے ساتھ دمدار شکل کے طور پر سمجھیں۔ اور اس کی پیچیدگی میں اس وقت اضافہ ہو جاتا ہے جب لوگ اسے استعمال کرتے ہیں اور استعمال کنندگان کے لئے اس کے فوائد میں اضافہ ہو جاتا ہے۔ ان کی اپنی تصاویر پر میٹا کوائف کا ٹیگ لگ جاتا ہے جو کسی اور شخص نے داخل کیا ہوتا ہے۔ اگر کوئی شخص ان تمام ولیوں کو ٹیگ لگانے کی زحمت کرے اور کہے کہ وہ سب کون ہیں، پھر نوترے دیم کیتھیڈرل کی میری تصویر اچانک ہی ان تمام کوائف سے مزین ہوجاتی ہے، اور پھر میں خلا میں غوطہ لگانے کے لئے اسے انٹری پوائنٹ کے طور پر استعمال کر سکتا ہوں، اور ہر کسی کی تصاویر استعمال کرتے ہوئے اس ذاتی خلاصہ میں جا سکتا ہوں اور ایک طرح سے مختلف انداز کا اور مختلف استعمال کنندگان کے سماجی تجربے سے لطف اندوز ہو سکتا ہوں۔ اور بلاشبہ ان سب کی ایک ضمنی پیداوار زمین کے ہر دلچسپ حصے کے انتہائی بیش قیمت مجازی نمونے ہیں جنہیں ہوائی جہاز کی پروازوں اور سیٹیلائٹ کی تصاویر اور اس کے ساتھ ساتھ مشترکہ یاد داشت کی مدد سے بھی حاصل کیا جاتا ہے۔
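
The way one person's tags can enrich another person's photo can be sketched as follows. This is an illustration under assumptions, not the system's actual data model: each photo is assumed to record which shared scene points it shows, tags are assumed to be attached to those points, and all names (`link_photos`, `inherited_tags`, the sample photo ids and tag text) are hypothetical.

```python
from collections import defaultdict

def link_photos(observations, min_shared=2):
    """Link photos that show at least `min_shared` common scene points.

    observations maps a photo id to the set of scene-point ids it shows.
    """
    photos = sorted(observations)
    links = defaultdict(set)
    for i, a in enumerate(photos):
        for b in photos[i + 1:]:
            if len(observations[a] & observations[b]) >= min_shared:
                links[a].add(b)
                links[b].add(a)
    return dict(links)

def inherited_tags(photo, observations, point_tags):
    """Tags on the scene points visible in `photo`, whoever entered them."""
    return {point_tags[p] for p in observations[photo] if p in point_tags}

# Made-up data: two photos of the same facade and one unrelated photo.
observations = {
    "my_photo": {1, 2, 3},
    "someone_elses_photo": {2, 3, 4},
    "unrelated_photo": {9},
}
# Another user labeled scene point 2.
point_tags = {2: "statue, left portal"}

links = link_photos(observations)
print(links)
print(inherited_tags("my_photo", observations, point_tags))
```

Because the tag lives on the shared scene point rather than on one person's file, every photo that shows that point picks it up, which is the enrichment effect described above.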

Thank you so much.

آپ کا بہت شکریہ۔

(Applause)

(تالیاں)۔

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

کرس اینڈرسن: کیا میں ٹھیک سمجھ رہا ہوں؟ یہ کہ آپ کا سافٹ ویئر آئندہ چند برسوں میں کسی وقت دنیا بھر سے شراکت کی جانے والی تمام تصاویر کو آپس میں ملانے کی اجازت دے دے گا؟

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.

بلئیس اگوارا: جی ہاں۔ یہ حقیقت میں دریافت کر رہا ہے۔ یہ تصاویر کے درمیان ہائپر لنکس قائم کرتا ہے۔ اور وہ یہ کام تصاویر کے اندر موجود مواد کے ذریعے کرتا ہے۔ اور یہ چیز انتہائی سنسنی خیز ہوجاتی ہے جب ان میں سے بہت سی تصاویر کے اندر موجود معنویتی معلومات کی قدر کے متعلق سوچیں۔ اس طرح جیسے آپ تصاویر کی ویب تلاش کرتے ہیں، آپ الفاظ ٹائپ کرتے ہیں اور ویب صفحے پر موجود متن میں تصویر کے متعلق بہت سی معلومات شامل ہوتی ہیں۔ اب، اس وقت کیا ہوتا ہے جب وہ تصویر آپ کی تمام تصاویر سے منسلک ہوجاتی ہے؟ پھر معنویتی باہم مربوط اور اس سے باہر آنے والی معلومات کی مقدار حقیقت میں بہت بڑی ہے۔ یہ نیٹ ورک کا ایک کلاسیکی اثر ہے۔

CA: Truly incredible. Congratulations.

کرس اینڈرسن: بلیئس، یہ واقعی ناقابل یقین ہے۔ مبارک ہو۔

BAA: Thank you very much.

بلئیس اگوارا: بہت شکریہ۔

(Applause)

(تالیاں)۔

(Applause ends)

