Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data.

We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through it, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment.
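The claim that performance depends only on the pixels on screen can be illustrated with a small sketch. It assumes a power-of-two tile pyramid with 256-pixel tiles and a viewer that picks the level matching screen resolution. The talk does not describe Seadragon's internals, so the tile size, the function names and the 20000 x 15000 example dimensions are all assumptions for illustration.

```python
import math

TILE = 256  # assumed tile edge in pixels; the talk does not specify one


def pyramid_levels(width, height):
    """Levels in a pyramid where each level halves the previous one down to 1x1."""
    return math.ceil(math.log2(max(width, height))) + 1


def tiles_in_level(width, height, tile=TILE):
    """Tiles needed to cover a whole level of the given size."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def tiles_for_view(view_w, view_h, tile=TILE):
    """Upper bound on tiles a viewport can touch at any zoom level.

    Assumes the viewer picks the level whose resolution matches the screen,
    so the viewport covers about view_w x view_h pixels of that level.
    The +1 allows for the viewport straddling a tile boundary.
    """
    cols = math.ceil(view_w / tile) + 1
    rows = math.ceil(view_h / tile) + 1
    return cols * rows


if __name__ == "__main__":
    # Hypothetical dimensions for an image in the 300 megapixel range.
    w, h = 20000, 15000
    print("levels:", pyramid_levels(w, h))
    print("tiles at full resolution:", tiles_in_level(w, h))
    print("tiles needed for a 1920x1080 view:", tiles_for_view(1920, 1080))
```

Under these assumptions the number of tiles fetched for a view depends on the screen size alone, while the full-resolution tile count grows with the image.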

It's also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is an artificial way to read an e-book. I wouldn't recommend it.

This is a more realistic case, an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've done something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other rubbish like that -- shouldn't be necessary.

Of course, mapping is one of those obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out.

This is a project called Photosynth, which marries two different technologies. One of them is Seadragon and the other is some very beautiful computer-vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience.

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. Some are much more spatial. I would like to jump straight to one of Noah's original data-sets -- this is from an early prototype that we first got working this summer -- to show you what I think is really the punch line behind the Photosynth technology. It's not necessarily so apparent from looking at the environments we've put up on the website. We had to worry about the lawyers and so on.

This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in T-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. We can just navigate in this very simple way. (Applause)

(Applause ends)

You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here. (Laughter)

I guess you can see this is lots of different types of cameras: it's everything from cell-phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. If I can find some of the sort of weird ones -- So many of them are occluded by faces, and so on. Somewhere in here there is actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

The point here really is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory, visually, of what the Earth looks like -- and linking all of that together. Those photos become linked, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.
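The idea that one person's tags enrich everyone else's linked photos can be sketched as follows. This is a deliberate simplification rather than Photosynth's method: it assumes that links between photos are already known, and that tags spread to every photo in the same linked group. The function name, the photo identifiers and the tags are hypothetical.

```python
from collections import defaultdict


def propagate_tags(links, tags):
    """Share tags across every group of linked photos.

    links: iterable of (photo_a, photo_b) pairs judged to show the same scene.
    tags:  dict mapping a photo to the set of tags its owner entered.
    Returns a dict mapping each photo to the union of tags in its group.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    photos = set(neighbours) | set(tags)
    seen, result = set(), {}
    for start in photos:
        if start in seen:
            continue
        # Depth-first walk to collect one connected group of photos.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            photo = stack.pop()
            group.append(photo)
            for other in neighbours[photo]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        merged = set().union(*(tags.get(p, set()) for p in group))
        for photo in group:
            result[photo] = set(merged)
    return result


if __name__ == "__main__":
    links = [("my_photo", "photo_a"), ("photo_a", "photo_b")]
    tags = {"photo_a": {"Notre Dame"}, "photo_b": {"saint"}, "unrelated": {"cat"}}
    print(propagate_tags(links, tags))
```

In this sketch an untagged photo inherits the tags of every photo it is linked to, directly or through a chain of links, while an unlinked photo keeps only its own.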

Thank you so much. (Applause)

(Applause ends)

Chris Anderson: Do I understand this right? What your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?

BAA: Yes. What this is really doing is discovering, creating hyperlinks, if you will, between images. It's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information a lot of images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. What if that picture links to all of your pictures? The amount of semantic interconnection and richness that comes out of that is really huge. It's a classic network effect.

CA: Truly incredible. Congratulations.

BAA: Thank you very much.
