Jean-Baptiste Michel + Erez Lieberman Aiden: What we learned from 5 million books

Erez Lieberman Aiden: Everyone knows that a picture is worth a thousand words. But we at Harvard were wondering if this was really true. (Laughter) So we assembled a team of experts, spanning Harvard, MIT, The American Heritage Dictionary, The Encyclopedia Britannica and even our proud sponsors, the Google. And we cogitated about this for about four years. And we came to a startling conclusion. Ladies and gentlemen, a picture is not worth a thousand words. In fact, we found some pictures that are worth 500 billion words.

Jean-Baptiste Michel: So how did we get to this conclusion? Erez and I were thinking about ways to get a big picture of human culture and human history: how they change over time. So many books have been written over the years. So we were thinking, well, the best way to learn from them is to read all of these millions of books. Now of course, if there's a scale for how awesome that is, that has to rank extremely, extremely high. The problem is that there's also an X-axis, which is the practical axis. On that one, it's very, very low.

(Applause)

Now people tend to use an alternative approach, which is to take a few sources and read them very carefully. This is extremely practical, but not so awesome. What you really want is to get to the awesome yet practical part of this space. It turns out there was a company across the river called Google that had started a digitization project a few years back that might just enable this approach. They have digitized millions of books. What that means is that one could use computational methods to read all of the books at the click of a button. That's very practical and extremely awesome.

ELA: Let me tell you a little bit about where books come from. Since time immemorial, there have been authors. These authors have been striving to write books, and this became considerably easier with the development of the printing press some centuries ago. Since then, authors have won on 129 million distinct occasions, publishing books. Now if those books are not lost to history, then they are somewhere in a library, and many of them have been retrieved from libraries and digitized by Google, which has scanned 15 million books to date.

Now when Google digitizes a book, they put it into a really nice format. Now we've got the data, plus we have metadata. We have information about things like where was it published, who was the author, when was it published. And what we do is go through all of those records and exclude everything that's not the highest quality data. What we're left with is a collection of five million books, 500 billion words, a string of characters a thousand times longer than the human genome -- a text which, when written out, would stretch from here to the Moon and back 10 times over -- a veritable shard of our cultural genome. Of course what we did when faced with such outrageous hyperbole ... (Laughter) was what any self-respecting researchers would have done. We took a page out of XKCD, and we said, "Stand back. We're going to try science."

(Laughter)

JM: Now of course, we were thinking, well, let's first just put the data out there for people to do science with it. So we were thinking, what data can we release? Well of course, you want to take the books and release the full text of these five million books. Now Google, and Jon Orwant in particular, told us about a little equation that we should learn: five million books means five million authors, and five million plaintiffs is a massive lawsuit. So, although that would be really, really awesome, again, that's extremely, extremely impractical. (Laughter)

Now again, we kind of caved in and took the very practical approach, which was a bit less awesome. We said, well, instead of releasing the full text, we're going to release statistics about the books. Take for instance "A gleam of happiness." It's four words; we call that a four-gram. We're going to tell you how many times that particular four-gram appeared in books in 1801, 1802, 1803, all the way up to 2008. That gives us a time series of how frequently this particular phrase was used over time. We do that for all the words and phrases that appear in those books, and that gives us a big table of two billion lines that tell us about the way culture has been changing.
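The statistics described above amount to counting, for each year, how often each n-gram occurs. Here is a minimal Python sketch, assuming a toy corpus of (year, text) pairs and naive lowercase whitespace tokenization. The project's real tokenizer and pipeline are not described in the talk, and the two "books" below are invented:

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a space-joined string."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])

def yearly_ngram_counts(corpus, n):
    """Map each n-gram to {year: count} over a corpus of (year, text) pairs.

    Tokenization is a naive lowercase whitespace split, an illustrative
    assumption rather than the project's actual tokenizer.
    """
    table = defaultdict(Counter)
    for year, text in corpus:
        tokens = text.lower().split()
        for gram in ngrams(tokens, n):
            table[gram][year] += 1
    return table

# Hypothetical two-book corpus, for illustration only.
corpus = [
    (1801, "A gleam of happiness crossed her face"),
    (1802, "not a gleam of happiness remained a gleam of happiness"),
]
counts = yearly_ngram_counts(corpus, 4)
print(dict(counts["a gleam of happiness"]))  # {1801: 1, 1802: 2}
```

Each key of `counts` corresponds to one row of the "big table," and each row's per-year counts form the time series.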

ELA: So those two billion lines, we call them two billion n-grams. What do they tell us? Well, the individual n-grams measure cultural trends. Let me give you an example. Let's suppose that I am thriving, and tomorrow I want to tell you how well I did. I might say, "Yesterday, I throve." Alternatively, I could say, "Yesterday, I thrived." Well, which one should I use? How do I know?

As of about six months ago, the state of the art in this field was that you would, for instance, go up to the following psychologist with fabulous hair, and you'd say, "Steve, you're an expert on the irregular verbs. What should I do?" And he'd tell you, "Well, most people say thrived, but some people say throve." And you also knew, more or less, that if you were to go back in time 200 years and ask the following statesman with equally fabulous hair, (Laughter) "Tom, what should I say?" He'd say, "Well, in my day, most people throve, but some thrived." So what I'm going to show you now is raw data: two rows from this table of two billion entries. What you're seeing is the year-by-year frequency of "thrived" and "throve" over time. Now this is just two out of two billion rows, so the entire data set is a billion times more awesome than this slide.

(Laughter)

(Applause)
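Comparing "thrived" with "throve" means reading two rows of the table side by side. The Python sketch below assumes a four-column, tab-separated layout of ngram, year, match count and volume count. Check that layout against the documentation for whichever release of the n-gram files you download. Rather than normalizing by the total number of words printed each year, the sketch simply reports what fraction of the two spellings is "thrived." The counts are made up:

```python
from collections import defaultdict

def parse_rows(lines):
    """Parse tab-separated rows of: ngram, year, match_count, volume_count.

    This column layout is an assumption; adjust it to the release you use.
    """
    series = defaultdict(dict)
    for line in lines:
        ngram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
        series[ngram][int(year)] = int(match_count)
    return series

def share(series, word, rival):
    """Per year, the fraction of uses of word-or-rival that are `word`."""
    a_counts = series.get(word, {})
    b_counts = series.get(rival, {})
    out = {}
    for year in sorted(set(a_counts) | set(b_counts)):
        a, b = a_counts.get(year, 0), b_counts.get(year, 0)
        if a + b:  # skip years where neither spelling appears
            out[year] = a / (a + b)
    return out

# Made-up counts for illustration only; these are not real Ngram data.
rows = [
    "thrived\t1800\t2\t2",
    "throve\t1800\t6\t5",
    "thrived\t2000\t90\t80",
    "throve\t2000\t10\t9",
]
print(share(parse_rows(rows), "thrived", "throve"))  # {1800: 0.25, 2000: 0.9}
```

A share above 0.5 means the regular form "thrived" dominates in that year.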

JM: Now there are many other pictures that are worth 500 billion words. For instance, this one. If you just take "influenza," you will see peaks at the times when you know big flu epidemics were killing people around the globe.

ELA: If you were not yet convinced, sea levels are rising, and so are atmospheric CO2 and global temperature.

JM: You might also want to have a look at this particular n-gram, and that's to tell Nietzsche that God is not dead, although you might agree that he might need a better publicist.

(Laughter)

ELA: You can get at some pretty abstract concepts with this sort of thing. For instance, let me tell you the history of the year 1950. Pretty much for the vast majority of history, no one gave a damn about 1950. In 1700, in 1800, in 1900, no one cared. Through the 30s and 40s, no one cared. Suddenly, in the mid-40s, there started to be a buzz. People realized that 1950 was going to happen, and it could be big. (Laughter) But nothing got people interested in 1950 like the year 1950. (Laughter) People were walking around obsessed. They couldn't stop talking about all the things they did in 1950, all the things they were planning to do in 1950, all the dreams of what they wanted to accomplish in 1950. In fact, 1950 was so fascinating that for years thereafter, people just kept talking about all the amazing things that happened, in '51, '52, '53. Finally in 1954, someone woke up and realized that 1950 had gotten somewhat passé. (Laughter) And just like that, the bubble burst.

(Laughter)

And the story of 1950 is the story of every year that we have on record, with a little twist, because now we've got these nice charts. And because we have these nice charts, we can measure things. We can say, "Well how fast does the bubble burst?" And it turns out that we can measure that very precisely. Equations were derived, graphs were produced, and the net result is that we find that the bubble bursts faster and faster with each passing year. We are losing interest in the past more rapidly.
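One simple way to put a number on "how fast the bubble bursts" is a half-life: the number of years after its peak that it takes for a year's mention frequency to fall to half of that peak. The talk does not spell out its equations, so this Python sketch is an assumption and not necessarily the authors' exact method. The frequencies are invented:

```python
def half_life(series):
    """Years from the peak until frequency first drops to half the peak.

    `series` maps year -> frequency. Returns None if the frequency never
    falls to half its peak within the data.
    """
    years = sorted(series)
    peak_year = max(years, key=lambda y: series[y])
    half = series[peak_year] / 2
    for y in years:
        if y > peak_year and series[y] <= half:
            return y - peak_year
    return None

# Hypothetical mention frequencies of "1950", for illustration only.
mentions_1950 = {1948: 2.0, 1950: 10.0, 1952: 8.0, 1954: 6.0, 1956: 4.5, 1958: 3.0}
print(half_life(mentions_1950))  # 6
```

Computing this value for each year and checking whether it shrinks over time is one way to test the claim that the bubble bursts faster with each passing year.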

JM: Now a little piece of career advice. For those of you who seek to be famous, we can learn from the 25 most famous political figures, authors, actors and so on. If you want to become famous early on, you should be an actor, because then fame starts rising by the end of your 20s -- you're still young, it's really great. Now if you can wait a little bit, you should be an author, because then you rise to very great heights, like Mark Twain, for instance: extremely famous. But if you want to reach the very top, you should delay gratification and, of course, become a politician. Here you will become famous by the end of your 50s, and become very, very famous afterward. Scientists also tend to get famous when they're much older. For instance, biologists and physicists tend to be almost as famous as actors. One mistake you should not make is to become a mathematician. (Laughter) If you do that, you might think, "Oh great. I'm going to do my best work when I'm in my 20s." But guess what, nobody will really care.

(Laughter)
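The career comparison above rests on reading each person's frequency curve against their age. Here is a minimal Python sketch that computes the age at which a name's frequency first reaches a threshold. The threshold, the person and the data are hypothetical illustrations and are not values from the study:

```python
def age_at_fame(birth_year, series, threshold):
    """Age in the first year the name's frequency reaches `threshold`.

    `series` maps year -> frequency; returns None if it is never reached.
    """
    for year in sorted(series):
        if series[year] >= threshold:
            return year - birth_year
    return None

# Hypothetical frequencies for an invented actor born in 1900.
actor = {1920: 0.1, 1927: 0.8, 1929: 2.5, 1935: 6.0}
print(age_at_fame(1900, actor, threshold=1.0))  # 29
```

Averaging this age over many people in each profession would give the kind of comparison described above.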

ELA: There are more sobering notes among the n-grams. For instance, here's the trajectory of Marc Chagall, an artist born in 1887. And this looks like the normal trajectory of a famous person. He gets more and more and more famous, except if you look in German. If you look in German, you see something completely bizarre, something you pretty much never see, which is he becomes extremely famous and then all of a sudden plummets, going through a nadir between 1933 and 1945, before rebounding afterward. And of course, what we're seeing is the fact Marc Chagall was a Jewish artist in Nazi Germany.

Now these signals are actually so strong that we don't need to know that someone was censored. We can actually figure it out using really basic signal processing. Here's a simple way to do it. Well, a reasonable expectation is that somebody's fame in a given period of time should be roughly the average of their fame before and their fame after. So that's sort of what we expect. And we compare that to the fame that we observe. And we just divide one by the other to produce something we call a suppression index. If the suppression index is very, very, very small, then you very well might be being suppressed. If it's very large, maybe you're benefiting from propaganda.
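The suppression index described above is the ratio of observed fame to expected fame, where expected fame is the average of the fame before and the fame after the period. In this Python sketch, "fame" is assumed to be the mean frequency over a window of years. The window boundaries are analyst-chosen parameters (1933 to 1945 simply echoes the Chagall example), and the data are made up:

```python
from statistics import mean

def window_mean(series, start, end):
    """Mean frequency over the inclusive year range [start, end].

    Raises statistics.StatisticsError if no years fall in the window.
    """
    return mean(f for y, f in series.items() if start <= y <= end)

def suppression_index(series, before, during, after):
    """Observed fame during a period divided by the expected fame.

    Expected fame is the average of the fame before and after the period.
    `before`, `during`, `after` are (start_year, end_year) tuples.
    """
    expected = (window_mean(series, *before) + window_mean(series, *after)) / 2
    observed = window_mean(series, *during)
    return observed / expected

# Made-up frequencies for an invented person, for illustration only.
person = {1925: 4.0, 1930: 6.0, 1935: 0.5, 1940: 0.5, 1950: 8.0, 1955: 10.0}
s = suppression_index(person, before=(1925, 1932), during=(1933, 1945), after=(1946, 1960))
print(round(s, 3))  # 0.071
```

Here the expected fame is (5 + 9) / 2 = 7 and the observed fame is 0.5, so the index is about 0.07. An index near 1 means the person was mentioned about as often as expected.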

JM: Now you can actually look at the distribution of suppression indexes over whole populations. For instance, here is the suppression index for 5,000 people drawn from English books, where there's no known suppression. It looks like this, basically tightly centered on one: what you expect is basically what you observe. This is the distribution as seen in Germany. It's very different: it's shifted to the left. People were talked about half as much as they should have been. But much more importantly, the distribution is much wider. There are many people who end up on the far left of this distribution, who are talked about 10 times less than they should have been. But there are also many people on the far right who seem to benefit from propaganda. This picture is the hallmark of censorship in the book record.
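Comparing populations means looking at the whole distribution of indices instead of at any single person. The indices are ratios, so a log scale makes "10 times less" and "10 times more" symmetric. This Python sketch assumes a list of suppression indices is already available and summarizes its center, its width and its tails. The two example populations are invented:

```python
import math
from statistics import median, pstdev

def summarize(indices):
    """Summarize a population of suppression indices on a log10 scale.

    Returns the median index, the spread (population standard deviation
    of the log10 values), and the fractions of people at least 10 times
    below or above expectation.
    """
    logs = [math.log10(s) for s in indices]
    n = len(indices)
    return {
        "median": median(indices),
        "log10_spread": pstdev(logs),
        "frac_10x_below": sum(1 for s in indices if s <= 0.1) / n,
        "frac_10x_above": sum(1 for s in indices if s >= 10) / n,
    }

# Invented example populations, for illustration only.
control = [0.8, 0.9, 1.0, 1.0, 1.1, 1.25]
censored = [0.05, 0.1, 0.3, 0.5, 0.6, 12.0]
print(summarize(control)["median"])  # 1.0
print(summarize(censored)["log10_spread"] > summarize(control)["log10_spread"])  # True
```

A median below 1 corresponds to the leftward shift described above, and a larger log spread with heavy tails on both sides corresponds to the wider distribution.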

ELA: So culturomics is what we call this method. It's kind of like genomics, except that genomics is a lens on biology through the window of the sequence of bases in the human genome. Culturomics is similar. It's the application of massive-scale data collection and analysis to the study of human culture. Here, instead of looking through the lens of a genome, we look through the lens of digitized pieces of the historical record. The great thing about culturomics is that everyone can do it. Why can everyone do it? Because three guys over at Google, Jon Orwant, Matt Gray and Will Brockman, saw the prototype of the Ngram Viewer, and they said, "This is so fun. We have to make this available for people." So in two weeks flat -- the two weeks before our paper came out -- they coded up a version of the Ngram Viewer for the general public. And so you too can type in any word or phrase that you're interested in and see its n-gram immediately, and also browse examples of all the various books in which your n-gram appears.

JM: Now this was used over a million times on the first day, and this is really the best of all the queries. People want to be their best, put their best foot forward. But it turns out that in the 18th century, people didn't really care about that at all. They didn't want to be their best; they wanted to be their beft. Of course, this is just a mistake. It's not that they strove for mediocrity; it's just that the S used to be written differently, kind of like an F. Google didn't pick this up at the time, so we reported it in the article we wrote for Science. But this is a reminder that, although this is a lot of fun, you have to be very careful when you interpret these graphs, and you have to adopt the basic standards of the sciences.
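The "beft" artifact comes from the long s (ſ), which OCR can misread as an f. When a digitized text preserves the character itself, Unicode compatibility normalization (NFKC) maps ſ to s. When OCR has already turned it into an f, one rough heuristic is to flag words that are unknown but become known words when a single f is read as an s. That heuristic is offered as an illustrative assumption and not as the fix Google applied. The small vocabulary below is hypothetical:

```python
import unicodedata

def normalize_long_s(text):
    """NFKC normalization maps LATIN SMALL LETTER LONG S (U+017F) to 's'."""
    return unicodedata.normalize("NFKC", text)

def long_s_suspects(words, vocabulary):
    """Flag words that are not in `vocabulary` but become vocabulary words
    when one 'f' is read as 's' (a heuristic with possible false positives)."""
    suspects = {}
    for word in words:
        if word in vocabulary:
            continue
        for i, ch in enumerate(word):
            if ch == "f":
                candidate = word[:i] + "s" + word[i + 1:]
                if candidate in vocabulary:
                    suspects[word] = candidate
                    break
    return suspects

print(normalize_long_s("be\u017ft"))  # best
vocab = {"best", "first", "feast"}
print(long_s_suspects(["beft", "firft", "feaft", "best"], vocab))
# {'beft': 'best', 'firft': 'first', 'feaft': 'feast'}
```

Flagged words still need a human check, which is the kind of care the paragraph above calls for.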

ELA: People have been using this for all kinds of fun purposes. (Laughter) Actually, we're not going to have to talk; we're just going to show you all the slides and remain silent. This person was interested in the history of frustration. There are various types of frustration. If you stub your toe, that's a one-A "argh." If the planet Earth is annihilated by the Vogons to make room for an interstellar bypass, that's an eight-A "aaaaaaaargh." This person studied all the "arghs," from one through eight A's. And it turns out that the less frequent "arghs" are, of course, the ones that correspond to things that are more frustrating -- except, oddly, in the early 80s. We think that might have something to do with Reagan.

(Laughter)
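The "argh" study comes down to generating the spelling variants and comparing their counts. The public Ngram Viewer takes several comma-separated phrases in one query, so the variants can be joined into a single search. This Python sketch assumes you have already looked up a total count for each variant. The counts below are invented:

```python
def argh_variants(max_a=8):
    """Spellings of 'argh' with 1 through max_a letter A's: argh, aargh, ..."""
    return ["a" * k + "rgh" for k in range(1, max_a + 1)]

def rarer_with_more_as(counts):
    """True if adding an A never makes the spelling more frequent.

    `counts` maps each variant (1..n A's, with no gaps) to a total count.
    """
    values = [counts[v] for v in argh_variants(len(counts))]
    return all(x >= y for x, y in zip(values, values[1:]))

print(", ".join(argh_variants(3)))  # argh, aargh, aaargh
# Invented counts, for illustration only.
made_up = dict(zip(argh_variants(), [900, 300, 120, 50, 20, 9, 4, 2]))
print(rarer_with_more_as(made_up))  # True
```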

JM: There are many uses of this data, but the bottom line is that the historical record is being digitized. Google has started to digitize 15 million books. That's 12 percent of all the books that have ever been published. It's a sizable chunk of human culture. There's much more in culture: there are manuscripts, there are newspapers, there are things that are not text, like art and paintings. These are all going to end up on our computers, on computers across the world. And when that happens, it will transform the way we understand our past, our present and human culture.

Thank you very much.

(Applause)
