Frederic Kaplan: How to build an information time machine

This is an image of the planet Earth. It looks very much like the well-known Apollo pictures. But there is something different: you can click on it, and if you click on it, you can zoom in on almost any place on Earth. For instance, this is a bird's-eye view of the EPFL campus. In many cases, you can also see how a building looks from a nearby street. This is pretty amazing. But there's something missing in this wonderful tour: time. I'm not really sure when this picture was taken. I'm not even sure it was taken at the same moment as the bird's-eye view.

In my lab, we develop tools to travel not only in space but also through time. The kind of question we're asking is: Is it possible to build something like a Google Maps of the past? Can I add a slider on top of Google Maps and just change the year, seeing how it was 100 years before, 1,000 years before? Is that possible? Can I reconstruct social networks of the past? Can I make a Facebook of the Middle Ages? So, can I build time machines?

Maybe we can just say, "No, it's not possible." Or maybe we can think of it from an information point of view. This is what I call the information mushroom. Vertically, you have time, and horizontally, the amount of digital information available. Obviously, for the last 10 years, we have a lot of information. And obviously, the further we go into the past, the less information we have. If we want to build something like a Google Maps of the past, or a Facebook of the past, we need to enlarge this space; we need to make it like a rectangle.

How do we do that? One way is digitization. There's a lot of material available: newspapers, printed books, thousands of printed books. I can digitize all of these. I can extract information from them. Of course, the further you go into the past, the less information you will have. So it might not be enough. Then I can do what historians do: I can extrapolate. This is what we call, in computer science, simulation. If I take a log book, I can consider that it's not just the log book of a Venetian captain on a particular journey. I can consider that it is actually a log book that is representative of many journeys of that period. I'm extrapolating. If I have a painting of a facade, I can consider that it's not just that particular building, but that it probably also shares the same grammar as buildings for which we have lost all information.

So if we want to construct a time machine, we need two things. We need very large archives, and we need excellent specialists. The Venice Time Machine, the project I'm going to talk to you about, is a joint project between the EPFL and the University of Venice Ca'Foscari.

There's something very peculiar about Venice: its administration has been very, very bureaucratic. They've been keeping track of everything, almost like Google today. At the Archivio di Stato, you have 80 kilometers of archives documenting every aspect of the life of Venice over more than 1,000 years. You have every boat that goes out, every boat that comes in. You have every change that was made in the city. It is all there. We are setting up a 10-year digitization program with the objective of transforming this immense archive into a giant information system. The type of objective we want to reach is 450 books digitized per day.

Of course, digitizing is not enough, because most of these documents are in Latin, in Tuscan, or in Venetian dialect, so you need to transcribe them, to translate them in some cases, and to index them, and this is obviously not easy. In particular, traditional optical character recognition methods, which can be used for printed documents, do not work well on handwritten documents. So the solution is actually to take inspiration from another domain: speech recognition. This is a domain where something that seems impossible can actually be done, simply by adding constraints. If you have a very good model of the language being used, and a very good model of the documents and how they are structured (and these are administrative documents, well structured in many cases), and if you divide this huge archive into smaller subsets whose documents share similar features, then there's a chance of success.
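The idea of constraining recognition with a language model can be illustrated with a minimal sketch. It assumes a handwriting recognizer that returns several candidate readings of a word, each with a visual score, and re-ranks them with a simple word-frequency model. The lexicon counts, the candidate scores, and the function names are all hypothetical and chosen only for illustration; this is not the method used by the project.

```python
import math

# Hypothetical word counts standing in for a language model of the archive.
LEXICON_COUNTS = {"nave": 50, "neve": 5, "capitano": 30}

def language_log_prob(word, counts=LEXICON_COUNTS, smoothing=1.0):
    """Smoothed unigram log-probability; unseen words get a small, nonzero mass."""
    total = sum(counts.values()) + smoothing * (len(counts) + 1)
    return math.log((counts.get(word, 0) + smoothing) / total)

def rescore(candidates, weight=1.0):
    """Return the candidate word maximizing visual score + weight * language score.

    candidates: list of (word, visual_log_score) pairs from a recognizer.
    """
    return max(candidates, key=lambda c: c[1] + weight * language_log_prob(c[0]))[0]

# Hypothetical recognizer output: the visually best reading is "neve".
candidates = [
    ("neve", math.log(0.5)),
    ("nave", math.log(0.4)),
    ("nxve", math.log(0.1)),
]

best_with_model = rescore(candidates)            # language model favors "nave"
best_visual_only = rescore(candidates, weight=0.0)  # falls back to "neve"
```

With the language model switched off, the visually strongest reading wins; with it switched on, the reading that is common in the (assumed) lexicon is preferred. Restricting the lexicon to a subset of similar documents would tighten the constraint further.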

If we reach that stage, then there's something else: we can extract events from these documents. Probably 10 billion events can be extracted from this archive. And this giant information system can be searched in many ways. You can ask questions like, "Who lived in this palazzo in 1323?" "How much did a sea bream cost at the Rialto market in 1434?" "What was the salary of a glass maker in Murano, maybe over a decade?" You can ask even bigger questions because it will be semantically coded. And then what you can do is put that in space, because much of this information is spatial. From that, you can do things like reconstructing the extraordinary journey of this city, which managed to have sustainable development over a thousand years, maintaining all the time a form of equilibrium with its environment. You can reconstruct that journey and visualize it in many different ways. But of course, you cannot understand Venice if you just look at the city. You have to put it in a larger European context. So the idea is also to document all the things that worked at the European level. We can also reconstruct the journey of the Venetian maritime empire: how it progressively controlled the Adriatic Sea, how it became the most powerful medieval empire of its time, controlling most of the sea routes from the east to the south.
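A question such as "Who lived in this palazzo in 1323?" becomes a simple filter once events are stored in a structured form. The sketch below assumes a hypothetical `Event` record with a kind, a subject, a place, and a year range; the records themselves are placeholders, not data from the archive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str         # e.g. "residence" or "price" (hypothetical categories)
    subject: str      # who or what the event is about
    place: str        # where it happened
    start_year: int   # first year the event holds
    end_year: int     # last year the event holds (inclusive)

# Placeholder records for illustration only.
EVENTS = [
    Event("residence", "Person A", "Palazzo X", 1310, 1330),
    Event("residence", "Person B", "Palazzo X", 1331, 1350),
    Event("residence", "Person C", "Palazzo Y", 1320, 1325),
]

def residents(events, place, year):
    """Return the sorted subjects of residence events covering `year` at `place`."""
    return sorted(
        e.subject
        for e in events
        if e.kind == "residence"
        and e.place == place
        and e.start_year <= year <= e.end_year
    )

who_in_1323 = residents(EVENTS, "Palazzo X", 1323)
```

Because each record carries a place, the same structure could be joined to map coordinates to put the answers "in space", as described above.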

But you can even do other things, because in these maritime routes there are regular patterns. You can go one step further and actually create a simulation system, a Mediterranean simulator, which is capable of reconstructing even the information we are missing, and which would let us ask questions as if we were using a route planner.

"If I am in Corfu in June 1323 and want to go to Constantinople, where can I take a boat?"

Probably we can answer this question with one or two or three days' precision.

"How much will it cost?"

"What are the chances of encountering pirates?"
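A route-planner question of this kind can be sketched as a shortest-path search over a graph of ports. The example below uses Dijkstra's algorithm on a small graph; the port connections and sailing times in days are invented for illustration and are not historical data. A real simulator would also need departure dates, prices, and risk estimates, which this sketch leaves out.

```python
import heapq

# Hypothetical sailing times in days between ports (illustrative values only).
ROUTES = {
    "Corfu": {"Modon": 4, "Candia": 9},
    "Modon": {"Candia": 4, "Negroponte": 6},
    "Candia": {"Constantinople": 12},
    "Negroponte": {"Constantinople": 7},
    "Constantinople": {},
}

def shortest_route(routes, start, goal):
    """Dijkstra's algorithm: return (total_days, path), or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        days, port, path = heapq.heappop(queue)
        if port == goal:
            return days, path
        if port in visited:
            continue
        visited.add(port)
        for next_port, leg_days in routes.get(port, {}).items():
            if next_port not in visited:
                heapq.heappush(queue, (days + leg_days, next_port, path + [next_port]))
    return None

plan = shortest_route(ROUTES, "Corfu", "Constantinople")
```

Replacing the single number on each edge with a distribution learned from log books would be one way to express the "one or two or three days' precision" mentioned in the talk.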

Of course, you understand, the central scientific challenge of a project like this one is qualifying, quantifying and representing uncertainty and inconsistency at each step of this process. There are errors everywhere: errors in the documents, such as the wrong name for a captain, or boats that never actually took to sea. There are errors in translation and interpretative biases, and on top of that, if you add algorithmic processes, you're going to have errors in recognition and errors in extraction, so you have very, very uncertain data.

So how can we detect and correct these inconsistencies? How can we represent that form of uncertainty? It's difficult. One thing you can do is document each step of the process, coding not only the historical information but also what we call the meta-historical information: how historical knowledge is constructed. That will not guarantee that we actually converge toward a single story of Venice, but we can probably reconstruct a fully documented potential story of Venice. Maybe there's not a single map. Maybe there are several maps. The system should allow for that, because we have to deal with a form of uncertainty that is really new for this type of giant database.
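One way to picture "meta-historical information" is a record that keeps, for every claim, the chain of processing steps that produced it. The sketch below is an assumption-laden illustration: the `Step` and `Claim` types, the confidence values, and the rule of multiplying step confidences (which treats the steps as independent) are all hypothetical choices, not the project's actual model.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    actor: str         # who or what performed the step (person or algorithm)
    action: str        # e.g. "transcription", "translation", "extraction"
    confidence: float  # assumed reliability of this step, between 0 and 1

@dataclass
class Claim:
    statement: str
    steps: list = field(default_factory=list)  # provenance chain, in order

    def confidence(self):
        """Overall confidence, assuming the steps fail independently."""
        result = 1.0
        for step in self.steps:
            result *= step.confidence
        return result

# Two competing, fully documented readings of the same (hypothetical) record.
claim_a = Claim("Ship left in June", [Step("scanner+HTR", "transcription", 0.9),
                                      Step("parser", "extraction", 0.8)])
claim_b = Claim("Ship left in July", [Step("scanner+HTR", "transcription", 0.9),
                                      Step("parser", "extraction", 0.5)])

# Keep both claims and rank them instead of discarding the weaker one.
ranked = sorted([claim_a, claim_b], key=Claim.confidence, reverse=True)
```

Keeping both claims with their provenance, rather than forcing a single answer, matches the idea that the system should allow for several possible maps.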

And how should we communicate this new research to a large audience? Again, Venice is extraordinary for that. With the millions of visitors that come every year, it's actually one of the best places to try to invent the museum of the future. Imagine: horizontally, you see the reconstructed map of a given year, and vertically, you see the documents that served the reconstruction, paintings, for instance. Imagine an immersive system that lets you go in, dive, and reconstruct the Venice of a given year, an experience you could share within a group. Or, conversely, imagine that you start from a document, a Venetian manuscript, and you show what you can construct out of it, how it is decoded, how the context of that document can be recreated. This is an image from an exhibit currently being held in Geneva with that type of system.

So to conclude, we can say that research in the humanities is about to undergo an evolution which is maybe similar to what happened to the life sciences 30 years ago. It's really a question of scale. We see projects that go far beyond what any single research team can do, and this is really new for the humanities, which are very often in the habit of working in small groups or with only a couple of researchers. When you visit the Archivio di Stato, you feel this is beyond what any single team can do, and that it should be a joint and common effort. So what we must do for this paradigm shift is foster a new generation of "digital humanists" who are going to be ready for it.

Thank you very much.

(Applause)
