Frederic Kaplan: How to build an information time machine

This is an image of the planet Earth. It looks very much like the well-known Apollo pictures. But there is something different: you can click on it, and if you click on it, you can zoom in on almost any place on Earth. For instance, this is a bird's-eye view of the EPFL campus. In many cases, you can also see how a building looks from a nearby street. This is pretty amazing. But there's something missing in this wonderful tour: time. I'm not really sure when this picture was taken. I'm not even sure it was taken at the same moment as the bird's-eye view.

In my lab, we develop tools to travel not only in space but also through time. The kind of question we're asking is: Is it possible to build something like a Google Maps of the past? Can I add a slider on top of Google Maps and just change the year, seeing how it was 100 years before, 1,000 years before? Is that possible? Can I reconstruct social networks of the past? Can I make a Facebook of the Middle Ages? So, can I build time machines?

Maybe we can just say, "No, it's not possible." Or maybe we can think of it from an information point of view. This is what I call the information mushroom. Vertically, you have time, and horizontally, the amount of digital information available. Obviously, for the last 10 years, we have a lot of information. And obviously, the further we go into the past, the less information we have. If we want to build something like a Google Maps of the past, or a Facebook of the past, we need to enlarge this space; we need to make it like a rectangle.

How do we do that? One way is digitization. There's a lot of material available: newspapers, printed books, thousands of printed books. I can digitize all of these. I can extract information from them. Of course, the further you go into the past, the less information you will have. So it might not be enough. So I can do what historians do: I can extrapolate. This is what we call, in computer science, simulation. If I take a log book, I can consider that it's not just the log book of a Venetian captain on one particular journey. I can consider that it is actually a log book representative of many journeys of that period. I'm extrapolating. If I have a painting of a facade, I can consider that it's not just that particular building, but that it probably shares the same grammar as buildings about which we have lost all information.

So if we want to construct a time machine, we need two things. We need very large archives, and we need excellent specialists. The Venice Time Machine, the project I'm going to talk to you about, is a joint project between EPFL and Ca' Foscari University of Venice.

There's something very peculiar about Venice: its administration has been very, very bureaucratic. They've been keeping track of everything, almost like Google today. At the Archivio di Stato, you have 80 kilometers of archives documenting every aspect of the life of Venice over more than 1,000 years. You have every boat that goes out, every boat that comes in. You have every change that was made in the city. This is all there. We are setting up a 10-year digitization program which has the objective of transforming this immense archive into a giant information system. The type of objective we want to reach is 450 books digitized per day. Of course, digitizing is not enough, because most of these documents are in Latin, in Tuscan, in Venetian dialect, so you need to transcribe them, to translate them in some cases, to index them, and this is obviously not easy. In particular, traditional optical character recognition methods, which can be used for printed documents, do not work well on handwritten documents. So the solution is actually to take inspiration from another domain: speech recognition. This is a domain where something that seems impossible can actually be done, simply by adding constraints. If you have a very good model of the language being used, if you have a very good model of how the documents are structured (and these are administrative documents, so they are well structured in many cases), and if you divide this huge archive into smaller subsets whose documents share similar features, then there's a chance of success.
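
To make the idea of "adding constraints" concrete, here is a minimal sketch, not the project's actual method. It assumes a hypothetical recognizer that returns several candidate readings per word with a visual probability, and a hypothetical word-frequency table for one subset of the archive (ship registers). All words and numbers are invented for illustration.

```python
import math

# Hypothetical recognizer output: for each word position, candidate readings
# with the probability the visual model assigns to each (invented numbers).
candidates = [
    {"nave": 0.40, "neve": 0.45, "nove": 0.15},
    {"parte": 0.50, "porte": 0.50},
]

# Hypothetical word frequencies for one subset of the archive (ship registers).
# In practice such a model would be estimated from pages already transcribed.
register_vocabulary = {
    "nave": 0.08,
    "parte": 0.05,
    "nove": 0.01,
    "porte": 0.004,
    "neve": 0.001,
}


def best_reading(position, vocabulary, floor=1e-6):
    """Pick the candidate that maximizes log P(visual) + log P(word in this subset)."""
    def score(word):
        return math.log(position[word]) + math.log(vocabulary.get(word, floor))
    return max(position, key=score)


# Reading chosen from the visual probabilities alone, for comparison.
visual_only = [max(p, key=p.get) for p in candidates[:1]]

# Reading chosen once the language constraint is added.
transcription = [best_reading(p, register_vocabulary) for p in candidates]

print(visual_only)
print(transcription)
```

On its own, the visual model prefers "neve" (snow) for the first word; once the vocabulary of ship registers is taken into account, "nave" (ship) wins. Splitting the archive into subsets with similar features matters because each subset can then get its own, much sharper vocabulary table.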

If we reach that stage, then there's something else: we can extract events from these documents. Actually, probably 10 billion events can be extracted from this archive. And this giant information system can be searched in many ways. You can ask questions like, "Who lived in this palazzo in 1323?" "How much did a sea bream cost at the Rialto market in 1434?" "What was the salary of a glass maker in Murano, maybe over a decade?" You can ask even bigger questions, because it will be semantically coded. And then what you can do is put that in space, because much of this information is spatial. And from that, you can do things like reconstructing the extraordinary journey of a city that managed to have sustainable development over a thousand years, managing to maintain at all times a form of equilibrium with its environment. You can reconstruct that journey, visualize it in many different ways. But of course, you cannot understand Venice if you just look at the city. You have to put it in a larger European context. So the idea is also to document all the things that worked at the European level. We can also reconstruct the journey of the Venetian maritime empire, how it progressively controlled the Adriatic Sea, how it became the most powerful medieval empire of its time, controlling most of the sea routes from the east to the south.

But you can even do other things, because in these maritime routes, there are regular patterns. You can go one step beyond and actually create a simulation system, a Mediterranean simulator, which is actually capable of reconstructing even the information we are missing, and which would let you ask questions as if you were using a route planner.
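
As a rough illustration of what a route-planner query over such a simulator could look like, the sketch below runs Dijkstra's shortest-path algorithm over a tiny graph of ports. The ports, connections, and sailing times are invented for the example and are not taken from the archive; `fastest_route` is a hypothetical helper name.

```python
import heapq

# Hypothetical sailing times in days between ports, invented for illustration.
sailing_days = {
    "Corfu": {"Modon": 4, "Negroponte": 9},
    "Modon": {"Candia": 5, "Negroponte": 6},
    "Negroponte": {"Constantinople": 7},
    "Candia": {"Constantinople": 12},
    "Constantinople": {},
}


def fastest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_days, ports), or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        days, port, path = heapq.heappop(queue)
        if port == goal:
            return days, path
        if port in visited:
            continue
        visited.add(port)
        for next_port, leg_days in graph[port].items():
            if next_port not in visited:
                heapq.heappush(queue, (days + leg_days, next_port, path + [next_port]))
    return None


result = fastest_route(sailing_days, "Corfu", "Constantinople")
print(result)
```

A real simulator would attach ranges or probability distributions to each leg rather than single numbers, so that an answer could be given with a stated precision.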

"If I am in Corfu in June 1323 and want to go to Constantinople, where can I take a boat?"

Probably we can answer this question with one or two or three days' precision.

"How much will it cost?"

"What are the chances of encountering pirates?"

Of course, you understand, the central scientific challenge of a project like this one is qualifying, quantifying and representing uncertainty and inconsistency at each step of this process. There are errors everywhere. There are errors in the documents: the captain's name is wrong, or some of the boats never actually took to sea. There are errors in translation, interpretative biases, and on top of that, if you add algorithmic processes, you're going to have errors in recognition, errors in extraction, so you have very, very uncertain data.

So how can we detect and correct these inconsistencies? How can we represent that form of uncertainty? It's difficult. One thing you can do is document each step of the process, coding not only the historical information but also what we call the meta-historical information: how historical knowledge is constructed. That will not guarantee that we actually converge toward a single story of Venice, but probably we can reconstruct a fully documented potential story of Venice. Maybe there's not a single map. Maybe there are several maps. The system should allow for that, because we have to deal with a new form of uncertainty, which is really new for this type of giant database.
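
One possible way to picture "documenting each step" is sketched below. It is only an illustration under stated assumptions: the `Step` and `Claim` classes, the source names, and the confidence values are all hypothetical, and multiplying step confidences treats the steps as independent, which is a simplification. The point is that conflicting claims are kept side by side with their full processing history instead of being collapsed into one answer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One processing step applied to a source, with an estimated reliability."""
    name: str
    confidence: float


@dataclass
class Claim:
    """A statement extracted from a source, plus the steps that produced it."""
    statement: str
    source: str
    steps: list = field(default_factory=list)

    def confidence(self):
        # Simplifying assumption: steps fail independently, so confidences multiply.
        result = 1.0
        for step in self.steps:
            result *= step.confidence
        return result


# Two hypothetical, conflicting claims about the same voyage (invented data).
claims = [
    Claim("captain = Matteo", "register B",
          [Step("scan", 0.99), Step("handwriting recognition", 0.60), Step("name extraction", 0.90)]),
    Claim("captain = Marco", "register A",
          [Step("scan", 0.99), Step("handwriting recognition", 0.80), Step("name extraction", 0.90)]),
]

# Keep every alternative, ranked, rather than discarding the less likely one.
ranked = sorted(claims, key=lambda claim: claim.confidence(), reverse=True)
for claim in ranked:
    history = " > ".join(step.name for step in claim.steps)
    print(f"{claim.statement} ({claim.source}, {claim.confidence():.3f}) via {history}")
```

Because each claim carries its own history, a later correction to one step (for example, a better handwriting model) can be propagated without re-deriving everything else.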

And how should we communicate this new research to a large audience? Again, Venice is extraordinary for that. With the millions of visitors that come every year, it's actually one of the best places to try to invent the museum of the future. Imagine that horizontally you see the reconstructed map of a given year, and vertically you see the documents that served the reconstruction, paintings, for instance. Imagine an immersive system that lets you dive in and reconstruct the Venice of a given year, an experience you could share within a group. Conversely, imagine that you start from a document, a Venetian manuscript, and you show what you can construct out of it, how it is decoded, how the context of that document can be recreated. This is an image from an exhibit currently running in Geneva with that type of system.

So to conclude, we can say that research in the humanities is about to undergo an evolution which is maybe similar to what happened to the life sciences 30 years ago. It's really a question of scale. We see projects which are far beyond what any single research team can do, and this is really new for the humanities, which are very often in the habit of working in small groups or only with a couple of researchers. When you visit the Archivio di Stato, you feel this is beyond what any single team can do, and that it should be a joint and common effort. So what we must do for this paradigm shift is actually foster a new generation of "digital humanists" who are going to be ready for this shift.

I thank you very much.

(Applause)
