Jean-Baptiste Michel + Erez Lieberman Aiden: What we learned from 5 million books

Erez Lieberman Aiden: Everyone knows that a picture is worth a thousand words. But we at Harvard were wondering if this was really true. (Laughter) So we assembled a team of experts, spanning Harvard, MIT, The American Heritage Dictionary, The Encyclopedia Britannica and even our proud sponsors, the Google. And we cogitated about this for about four years. And we came to a startling conclusion. Ladies and gentlemen, a picture is not worth a thousand words. In fact, we found some pictures that are worth 500 billion words.

Jean-Baptiste Michel: So how did we get to this conclusion? So Erez and I were thinking about ways to get a big picture of human culture and human history: change over time. So many books actually have been written over the years. So we were thinking, well the best way to learn from them is to read all of these millions of books. Now of course, if there's a scale for how awesome that is, that has to rank extremely, extremely high. Now the problem is there's an X-axis for that, which is the practical axis. This is very, very low.

(Applause)

Now people tend to use an alternative approach, which is to take a few sources and read them very carefully. This is extremely practical, but not so awesome. What you really want to do is to get to the awesome yet practical part of this space. So it turns out there was a company across the river called Google who had started a digitization project a few years back that might just enable this approach. They have digitized millions of books. So what that means is, one could use computational methods to read all of the books in a click of a button. That's very practical and extremely awesome.

ELA: Let me tell you a little bit about where books come from. Since time immemorial, there have been authors. These authors have been striving to write books. And this became considerably easier with the development of the printing press some centuries ago. Since then, the authors have won on 129 million distinct occasions, publishing books. Now if those books are not lost to history, then they are somewhere in a library, and many of those books have been getting retrieved from the libraries and digitized by Google, which has scanned 15 million books to date.

Now when Google digitizes a book, they put it into a really nice format. Now we've got the data, plus we have metadata. We have information about things like where was it published, who was the author, when was it published. And what we do is go through all of those records and exclude everything that's not the highest quality data. What we're left with is a collection of five million books, 500 billion words, a string of characters a thousand times longer than the human genome -- a text which, when written out, would stretch from here to the Moon and back 10 times over -- a veritable shard of our cultural genome. Of course what we did when faced with such outrageous hyperbole ... (Laughter) was what any self-respecting researchers would have done. We took a page out of XKCD, and we said, "Stand back. We're going to try science."

(Laughter)

JM: Now of course, we were thinking, well let's just first put the data out there for people to do science to it. Now we're thinking, what data can we release? Well of course, you want to take the books and release the full text of these five million books. Now Google, and Jon Orwant in particular, told us a little equation that we should learn. So you have five million, that is, five million authors and five million plaintiffs is a massive lawsuit. So, although that would be really, really awesome, again, that's extremely, extremely impractical. (Laughter)

Now again, we kind of caved in, and we did the very practical approach, which was a bit less awesome. We said, well instead of releasing the full text, we're going to release statistics about the books. So take for instance "A gleam of happiness." It's four words; we call that a four-gram. We're going to tell you how many times a particular four-gram appeared in books in 1801, 1802, 1803, all the way up to 2008. That gives us a time series of how frequently this particular sentence was used over time. We do that for all the words and phrases that appear in those books, and that gives us a big table of two billion lines that tell us about the way culture has been changing.
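
The table described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(year, text)` corpus format, the function names `ngrams` and `yearly_counts`, and the three toy "books" are all invented here, and the sketch keeps raw counts without any of the filtering or normalization the real data set may apply.

```python
from collections import Counter, defaultdict


def ngrams(text, n):
    """Yield every run of n consecutive words in text, lowercased."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def yearly_counts(books, n):
    """Map each n-gram to a Counter of {year: occurrences}.

    books is an iterable of (year, text) pairs. That format is an
    assumption for this sketch, not the layout of the released table.
    """
    table = defaultdict(Counter)
    for year, text in books:
        for gram in ngrams(text, n):
            table[gram][year] += 1
    return table


# Hypothetical three-book corpus, invented for illustration.
books = [
    (1801, "a gleam of happiness lit her face"),
    (1802, "he felt a gleam of happiness and a gleam of happiness again"),
    (1803, "no happiness here"),
]

table = yearly_counts(books, 4)
series = table["a gleam of happiness"]
print([(year, series[year]) for year in (1801, 1802, 1803)])
# prints [(1801, 1), (1802, 2), (1803, 0)]
```

Each key in `table` plays the role of one row in the big table: a single four-gram with its year-by-year counts.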

ELA: So those two billion lines, we call them two billion n-grams. What do they tell us? Well the individual n-grams measure cultural trends. Let me give you an example. Let's suppose that I am thriving, then tomorrow I want to tell you about how well I did. And so I might say, "Yesterday, I throve." Alternatively, I could say, "Yesterday, I thrived." Well which one should I use? How to know?

As of about six months ago, the state of the art in this field is that you would, for instance, go up to the following psychologist with fabulous hair, and you'd say, "Steve, you're an expert on the irregular verbs. What should I do?" And he'd tell you, "Well most people say thrived, but some people say throve." And you also knew, more or less, that if you were to go back in time 200 years and ask the following statesman with equally fabulous hair, (Laughter) "Tom, what should I say?" He'd say, "Well, in my day, most people throve, but some thrived." So now what I'm just going to show you is raw data. Two rows from this table of two billion entries. What you're seeing is year by year frequency of "thrived" and "throve" over time. Now this is just two out of two billion rows. So the entire data set is a billion times more awesome than this slide.

(Laughter)

(Applause)

JM: Now there are many other pictures that are worth 500 billion words. For instance, this one. If you just take influenza, you will see peaks at the time where you knew big flu epidemics were killing people around the globe.

ELA: If you were not yet convinced, sea levels are rising, so is atmospheric CO2 and global temperature.

JM: You might also want to have a look at this particular n-gram, and that's to tell Nietzsche that God is not dead, although you might agree that he might need a better publicist.

(Laughter)

ELA: You can get at some pretty abstract concepts with this sort of thing. For instance, let me tell you the history of the year 1950. Pretty much for the vast majority of history, no one gave a damn about 1950. In 1700, in 1800, in 1900, no one cared. Through the 30s and 40s, no one cared. Suddenly, in the mid-40s, there started to be a buzz. People realized that 1950 was going to happen, and it could be big. (Laughter) But nothing got people interested in 1950 like the year 1950. (Laughter) People were walking around obsessed. They couldn't stop talking about all the things they did in 1950, all the things they were planning to do in 1950, all the dreams of what they wanted to accomplish in 1950. In fact, 1950 was so fascinating that for years thereafter, people just kept talking about all the amazing things that happened, in '51, '52, '53. Finally in 1954, someone woke up and realized that 1950 had gotten somewhat passé. (Laughter) And just like that, the bubble burst.

(Laughter)

And the story of 1950 is the story of every year that we have on record, with a little twist, because now we've got these nice charts. And because we have these nice charts, we can measure things. We can say, "Well how fast does the bubble burst?" And it turns out that we can measure that very precisely. Equations were derived, graphs were produced, and the net result is that we find that the bubble bursts faster and faster with each passing year. We are losing interest in the past more rapidly.

JM: Now a little piece of career advice. So for those of you who seek to be famous, we can learn from the 25 most famous political figures, authors, actors and so on. So if you want to become famous early on, you should be an actor, because then fame starts rising by the end of your 20s -- you're still young, it's really great. Now if you can wait a little bit, you should be an author, because then you rise to very great heights, like Mark Twain, for instance: extremely famous. But if you want to reach the very top, you should delay gratification and, of course, become a politician. So here you will become famous by the end of your 50s, and become very, very famous afterward. So scientists also tend to get famous when they're much older. Like for instance, biologists and physicists tend to be almost as famous as actors. One mistake you should not make is to become a mathematician. (Laughter) If you do that, you might think, "Oh great. I'm going to do my best work when I'm in my 20s." But guess what, nobody will really care.

(Laughter)

ELA: There are more sobering notes among the n-grams. For instance, here's the trajectory of Marc Chagall, an artist born in 1887. And this looks like the normal trajectory of a famous person. He gets more and more and more famous, except if you look in German. If you look in German, you see something completely bizarre, something you pretty much never see, which is he becomes extremely famous and then all of a sudden plummets, going through a nadir between 1933 and 1945, before rebounding afterward. And of course, what we're seeing is the fact Marc Chagall was a Jewish artist in Nazi Germany.

Now these signals are actually so strong that we don't need to know that someone was censored. We can actually figure it out using really basic signal processing. Here's a simple way to do it. Well, a reasonable expectation is that somebody's fame in a given period of time should be roughly the average of their fame before and their fame after. So that's sort of what we expect. And we compare that to the fame that we observe. And we just divide one by the other to produce something we call a suppression index. If the suppression index is very, very, very small, then you very well might be being suppressed. If it's very large, maybe you're benefiting from propaganda.
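
As a rough sketch of that calculation, and not the exact procedure used in the study: observed fame during a period is divided by the average of fame before and fame after. The 10-year comparison `window`, the function names, and the frequency values below are all assumptions made up for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def suppression_index(freq_by_year, start, end, window=10):
    """Observed fame in [start, end] divided by expected fame.

    Expected fame is the average of the mean frequency over the
    `window` years before the period and the `window` years after it.
    The window length is an assumption of this sketch.
    """
    during = [freq_by_year[y] for y in range(start, end + 1)]
    before = [freq_by_year[y] for y in range(start - window, start)]
    after = [freq_by_year[y] for y in range(end + 1, end + 1 + window)]
    expected = (mean(before) + mean(after)) / 2
    if expected == 0:
        raise ValueError("no fame before or after; index is undefined")
    return mean(during) / expected


# Invented frequencies (arbitrary units), not real n-gram data.
freq = {}
freq.update({y: 4.0 for y in range(1923, 1933)})  # before
freq.update({y: 0.5 for y in range(1933, 1946)})  # during
freq.update({y: 6.0 for y in range(1946, 1956)})  # after

index = suppression_index(freq, 1933, 1945)
print(round(index, 3))
# prints 0.1
```

An index near 1 means the observed fame matches the expectation; here the made-up person is mentioned about a tenth as often as expected during the period.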

JM: Now you can actually look at the distribution of suppression indexes over whole populations. So for instance, here -- this suppression index is for 5,000 people picked in English books where there's no known suppression -- it would be like this, basically tightly centered on one. What you expect is basically what you observe. This is the distribution as seen in Germany -- very different, it's shifted to the left. People were talked about half as much as they should have been. But much more importantly, the distribution is much wider. There are many people who end up on the far left of this distribution who are talked about 10 times less than they should have been. But then also many people on the far right who seem to benefit from propaganda. This picture is the hallmark of censorship in the book record.
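
One way to compare two such distributions is to summarize where each is centered and how wide it is. The sketch below is a hypothetical illustration: working on a log10 scale (so that "10 times less" and "10 times more" sit symmetrically around zero) is an assumption, the name `summarize` is invented, and both lists of index values are made up rather than taken from the study.

```python
import math
import statistics


def summarize(indexes):
    """Return (center, spread) of suppression indexes on a log10 scale.

    center is the median log10 index, where 0 means observed fame
    equals expected fame; spread is the population standard deviation
    of the log10 values.
    """
    logs = [math.log10(x) for x in indexes]
    return statistics.median(logs), statistics.pstdev(logs)


# Invented values for illustration only, not data from the study.
unsuppressed = [0.8, 0.9, 1.0, 1.0, 1.1, 1.25]
censored = [0.1, 0.2, 0.5, 0.5, 0.6, 5.0]

center_a, spread_a = summarize(unsuppressed)
center_b, spread_b = summarize(censored)
print("unsuppressed:", center_a, spread_a)
print("censored:    ", center_b, spread_b)
```

In this toy data the second population has a lower center (shifted to the left) and a larger spread (a wider distribution), which are the two features the talk points to.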

ELA: So culturomics is what we call this method. It's kind of like genomics. Except genomics is a lens on biology through the window of the sequence of bases in the human genome. Culturomics is similar. It's the application of massive-scale data collection analysis to the study of human culture. Here, instead of through the lens of a genome, through the lens of digitized pieces of the historical record. The great thing about culturomics is that everyone can do it. Why can everyone do it? Everyone can do it because three guys, Jon Orwant, Matt Gray and Will Brockman over at Google, saw the prototype of the Ngram Viewer, and they said, "This is so fun. We have to make this available for people." So in two weeks flat -- the two weeks before our paper came out -- they coded up a version of the Ngram Viewer for the general public. And so you too can type in any word or phrase that you're interested in and see its n-gram immediately -- also browse examples of all the various books in which your n-gram appears.

JM: Now this was used over a million times on the first day, and this is really the best of all the queries. So people want to be their best, put their best foot forward. But it turns out in the 18th century, people didn't really care about that at all. They didn't want to be their best, they wanted to be their beft. So what happened is, of course, this is just a mistake. It's not that they strove for mediocrity, it's just that the S used to be written differently, kind of like an F. Now of course, Google didn't pick this up at the time, so we reported this in the Science article that we wrote. But it turns out this is just a reminder that, although this is a lot of fun, when you interpret these graphs, you have to be very careful, and you have to adopt the base standards in the sciences.

ELA: People have been using this for all kinds of fun purposes. (Laughter) Actually, we're not going to have to talk, we're just going to show you all the slides and remain silent. This person was interested in the history of frustration. There's various types of frustration. If you stub your toe, that's a one A "argh." If the planet Earth is annihilated by the Vogons to make room for an interstellar bypass, that's an eight A "aaaaaaaargh." This person studies all the "arghs," from one through eight A's. And it turns out that the less-frequent "arghs" are, of course, the ones that correspond to things that are more frustrating -- except, oddly, in the early 80s. We think that might have something to do with Reagan.

(Laughter)

JM: There are many usages of this data, but the bottom line is that the historical record is being digitized. Google has started to digitize 15 million books. That's 12 percent of all the books that have ever been published. It's a sizable chunk of human culture. There's much more in culture: there's manuscripts, there's newspapers, there's things that are not text, like art and paintings. These all happen to be on our computers, on computers across the world. And when that happens, that will transform the way we have to understand our past, our present and human culture.

Thank you very much.

(Applause)
