Jean-Baptiste Michel + Erez Lieberman Aiden: What we learned from 5 million books

Erez Lieberman Aiden: Everyone knows that a picture is worth a thousand words. But we at Harvard were wondering if this was really true. (Laughter) So we assembled a team of experts, spanning Harvard, MIT, The American Heritage Dictionary, The Encyclopedia Britannica and even our proud sponsors, the Google. And we cogitated about this for about four years. And we came to a startling conclusion. Ladies and gentlemen, a picture is not worth a thousand words. In fact, we found some pictures that are worth 500 billion words.

Erez Lieberman Aiden: Svi znaju da slika vrijedi tisuću riječi. No, mi smo se na Harvardu zapitali je li to stvarno istina. (Smijeh) Tako smo okupili tim stručnjaka, koji obuhvaća ljude na Harvardu i MIT-u, one koji rade na rječniku American Heritage i Encyclopediji Britannici, čak i naše ponosne sponzore, Google. Razmišljali smo o tome oko četiri godine i došli smo do začuđujućeg zaključka. Dame i gospodo, slika ne vrijedi tisuću riječi. Čak smo pronašli neke slike koje vrijede 500 milijardi riječi.

Jean-Baptiste Michel: So how did we get to this conclusion? So Erez and I were thinking about ways to get a big picture of human culture and human history: change over time. So many books actually have been written over the years. So we were thinking, well the best way to learn from them is to read all of these millions of books. Now of course, if there's a scale for how awesome that is, that has to rank extremely, extremely high. Now the problem is there's an X-axis for that, which is the practical axis. This is very, very low.

Jean-Baptiste Michel: Kako smo došli do tog zaključka? Erez i ja razmišljali smo o načinima na koje bismo mogli steći općenitu sliku ljudske kulture i ljudske povijesti: promjene kroz vrijeme. Kroz vrijeme je zapravo napisano mnogo knjiga. Stoga smo mislili kako je najbolji način da nešto naučimo iz njih taj da pročitamo sve te milijune knjiga. Naravno, ako postoji ljestvica za mjerenje koliko je to fenomenalno, tako nešto mora biti rangirano vrlo, vrlo visoko. Problem je što za to postoji os x ili praktična os. Na njoj se to nalazi vrlo, vrlo nisko.

(Applause)

(Pljesak)

Now people tend to use an alternative approach, which is to take a few sources and read them very carefully. This is extremely practical, but not so awesome. What you really want to do is to get to the awesome yet practical part of this space. So it turns out there was a company across the river called Google who had started a digitization project a few years back that might just enable this approach. They have digitized millions of books. So what that means is, one could use computational methods to read all of the books in a click of a button. That's very practical and extremely awesome.

Ljudi su skloni primjenjivanju alternativnog pristupa, a to je da izaberu nekoliko izvora i njih pročitaju vrlo pažljivo. To je vrlo praktično, ali nije baš fenomenalno. Ono što zapravo želite jest doći do dijela koji je i fenomenalan i praktičan. Ispada da s druge strane rijeke postoji tvrtka koja se zove Google, koja je prije nekoliko godina počela s projektom digitalizacije koji bi mogao omogućiti upravo ovaj pristup. Digitalizirali su milijune knjiga. A to znači da se možemo služiti računalnim metodama kako bismo sve knjige pročitali pritiskom na tipku. To je vrlo praktično i poprilično fenomenalno.

ELA: Let me tell you a little bit about where books come from. Since time immemorial, there have been authors. These authors have been striving to write books. And this became considerably easier with the development of the printing press some centuries ago. Since then, the authors have won on 129 million distinct occasions, publishing books. Now if those books are not lost to history, then they are somewhere in a library, and many of those books have been getting retrieved from the libraries and digitized by Google, which has scanned 15 million books to date.

ELA: Ispričat ću vam malo o tome odakle dolaze knjige. Od pamtivijeka postoje autori. Oni teže tome da pišu knjige. To je postalo znatno lakše s razvojem tehnike tiskanja prije nekoliko stoljeća. Od tada su autori pobijedili 129 milijuna puta i objavili su knjige. Ako se te knjige s vremenom nisu izgubile, znači da su negdje u nekoj knjižnici. Mnoge od tih knjiga izvučene su iz knjižnica i Google ih je digitalizirao. Do danas je skenirano 15 milijuna knjiga.

Now when Google digitizes a book, they put it into a really nice format. Now we've got the data, plus we have metadata. We have information about things like where was it published, who was the author, when was it published. And what we do is go through all of those records and exclude everything that's not the highest quality data. What we're left with is a collection of five million books, 500 billion words, a string of characters a thousand times longer than the human genome -- a text which, when written out, would stretch from here to the Moon and back 10 times over -- a veritable shard of our cultural genome. Of course what we did when faced with such outrageous hyperbole ... (Laughter) was what any self-respecting researchers would have done. We took a page out of XKCD, and we said, "Stand back. We're going to try science."

Kad Google digitalizira knjigu, stavlja ju u zaista zgodan format. Imamo podatke, a imamo i metapodatke. Imamo informacije o stvarima kao što su mjesto izdavanja, ime autora, datum izdavanja. I mi tada prolazimo kroz sve te zapise i izostavljamo sve što nisu podaci najviše kvalitete. Ono što nam ostaje zbirka je od pet milijuna knjiga, 500 milijardi riječi, niz znakova koji je tisuću puta dulji od ljudskog genoma -- tekst koji bi se, kad bi se ispisao, protezao 10 puta odavde do Mjeseca i natrag -- zaista tek djelić našeg kulturnog genoma. Naravno, ono što smo učinili, kad smo se suočili s tako skandaloznom hiperbolom... (Smijeh) bilo je isto što bi učinili bilo koji istraživači koji drže do sebe. Uzeli smo jednu stranicu s XKCD-a i rekli: "Odmaknite se! Pokušat ćemo nešto znanstveno!"

(Laughter)

(Smijeh)

JM: Now of course, we were thinking, well let's just first put the data out there for people to do science to it. Now we're thinking, what data can we release? Well of course, you want to take the books and release the full text of these five million books. Now Google, and Jon Orwant in particular, told us a little equation that we should learn. So you have five million, that is, five million authors and five million plaintiffs is a massive lawsuit. So, although that would be really, really awesome, again, that's extremely, extremely impractical. (Laughter)

JM: Naravno, mislili smo, hajdemo prvo omogućiti pristup podacima kako bi ih ljudi mogli znanstveno promotriti. Razmišljali smo kojim podacima možemo omogućiti pristup? Naravno, želite uzeti te knjige i omogućiti pristup kompletnom tekstu tih pet milijuna knjiga. Google, a pogotovo Jon Orwant, pokazali su nam malu jednadžbu koju smo morali naučiti. Imate pet milijuna knjiga, odnosno pet milijuna autora i pet milijuna tužitelja u masovnoj tužbi. Dakle, iako bi to bilo stvarno, stvarno fenomenalno, to je opet vrlo, vrlo nepraktično. (Smijeh)

Now again, we kind of caved in, and we did the very practical approach, which was a bit less awesome. We said, well instead of releasing the full text, we're going to release statistics about the books. So take for instance "A gleam of happiness." It's four words; we call that a four-gram. We're going to tell you how many times a particular four-gram appeared in books in 1801, 1802, 1803, all the way up to 2008. That gives us a time series of how frequently this particular sentence was used over time. We do that for all the words and phrases that appear in those books, and that gives us a big table of two billion lines that tell us about the way culture has been changing.

Opet smo popustili i primijenili vrlo praktičan pristup, koji je bio nešto manje fenomenalan. Rekli smo, umjesto da omogućimo pristup kompletnom tekstu, omogućit ćemo pristup statistikama o knjigama. Uzmite primjerice "tračak sreće" (a gleam of happiness). To su četiri riječi i to zovemo četverogram. Reći ćemo vam koliko se puta određeni četverogram pojavio u knjigama 1801., 1802., 1803. godine, i tako sve do 2008. Tako dobivamo vremenski niz učestalosti korištenja određene rečenice kroz vrijeme. To smo napravili za sve riječi i izraze koji se pojavljuju u tim knjigama, što nam daje veliku tablicu od dvije milijarde redaka koji nam prikazuju način na koji se kultura mijenja.
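[Editor's sketch] The per-year counting described above can be illustrated with a few lines of Python. This is only a minimal sketch, not the pipeline the speakers used: the toy corpus, the whitespace tokenization, the lowercasing, and the function names `ngrams` and `ngram_time_series` are all assumptions made for illustration. It also reports raw counts, as in the talk's description, without any normalization.

```python
from collections import Counter, defaultdict


def ngrams(text, n):
    """Yield every run of n consecutive words in text, lowercased."""
    words = text.lower().split()  # naive whitespace tokenization (assumption)
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def ngram_time_series(books, n):
    """Map each n-gram to a {year: count} table.

    `books` is an iterable of (publication_year, full_text) pairs.
    """
    table = defaultdict(Counter)
    for year, text in books:
        for gram in ngrams(text, n):
            table[gram][year] += 1
    return table


# Toy corpus, invented for illustration only.
toy_books = [
    (1801, "she felt a gleam of happiness at last"),
    (1802, "a gleam of happiness and then a gleam of happiness again"),
    (1803, "no such phrase appears here"),
]

series = ngram_time_series(toy_books, 4)
print(dict(series["a gleam of happiness"]))
```

Each key of `series` plays the role of one row of the big table: one n-gram, with its count for every year in which it appears.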

ELA: So those two billion lines, we call them two billion n-grams. What do they tell us? Well the individual n-grams measure cultural trends. Let me give you an example. Let's suppose that I am thriving, then tomorrow I want to tell you about how well I did. And so I might say, "Yesterday, I throve." Alternatively, I could say, "Yesterday, I thrived." Well which one should I use? How to know?

ELA: Te dvije milijarde redaka zovemo dvije milijarde n-grama. Što nam oni govore? Pojedinačni n-grami mjere kulturne trendove. Dat ću vam primjer. Pretpostavimo da ja napredujem (thrive), a sutra vam želim ispričati koliko sam bio uspješan. Mogao bih koristiti oblik za prošlo vrijeme "throve", a mogao bih koristiti i oblik "thrived". Koji bih trebao koristiti? Kako to znati?

As of about six months ago, the state of the art in this field is that you would, for instance, go up to the following psychologist with fabulous hair, and you'd say, "Steve, you're an expert on the irregular verbs. What should I do?" And he'd tell you, "Well most people say thrived, but some people say throve." And you also knew, more or less, that if you were to go back in time 200 years and ask the following statesman with equally fabulous hair, (Laughter) "Tom, what should I say?" He'd say, "Well, in my day, most people throve, but some thrived." So now what I'm just going to show you is raw data. Two rows from this table of two billion entries. What you're seeing is year by year frequency of "thrived" and "throve" over time. Now this is just two out of two billion rows. So the entire data set is a billion times more awesome than this slide.

Prije otprilike šest mjeseci, najsuvremeniji podaci u tom polju kažu da biste, primjerice, otišli do ovog psihologa fantastične kose i rekli biste: "Steve, ti si stručnjak za nepravilne glagole. Što da radim?" A on bi vam rekao: "Pa, većina ljudi koristi "thrived", ali neki ljudi kažu "throve". A znali biste i, više-manje, da kad biste se vratili 200 godina u prošlost i pitali ovog državnika jednako fantastične kose, (Smijeh) "Tome, kako bih trebao govoriti?" On bi vam rekao: "Pa, u moje vrijeme većina je ljudi koristila "throve", ali neki su koristili "thrived". Sad ću vam pokazati samo sirove podatke. Dva reda iz ove tablice od dvije milijarde unosa. Sada gledate učestalost godinu za godinom korištenja "thrived" i "throve" kroz vrijeme. Dakle, to su samo dva reda od dvije milijarde redova. Ukupan skup podataka milijardu je puta fenomenalniji od ovog slajda.

(Laughter)

(Smijeh)

(Applause)

(Pljesak)

JM: Now there are many other pictures that are worth 500 billion words. For instance, this one. If you just take influenza, you will see peaks at the time where you knew big flu epidemics were killing people around the globe.

JM: Postoji mnogo drugih slika koje vrijede 500 milijardi riječi. Na primjer, ova ovdje. Ako uzmete samo gripu, vidjet ćete vrhove u vrijeme za koje znate da su velike epidemije tada ubijale ljude u cijelom svijetu.

ELA: If you were not yet convinced, sea levels are rising, so is atmospheric CO2 and global temperature.

ELA: Ako vam treba još dokaza, diže se razina mora, kao i CO2 i temperatura u svijetu.

JM: You might also want to have a look at this particular n-gram, and that's to tell Nietzsche that God is not dead, although you might agree that he might need a better publicist.

JM: Možda ne bi bilo loše da pogledate i ovaj konkretni n-gram, koji govori Nietzscheu da Bog nije mrtav, iako se možda slažete da bi mu trebao bolji izdavač.

(Laughter)

(Smijeh)

ELA: You can get at some pretty abstract concepts with this sort of thing. For instance, let me tell you the history of the year 1950. Pretty much for the vast majority of history, no one gave a damn about 1950. In 1700, in 1800, in 1900, no one cared. Through the 30s and 40s, no one cared. Suddenly, in the mid-40s, there started to be a buzz. People realized that 1950 was going to happen, and it could be big. (Laughter) But nothing got people interested in 1950 like the year 1950. (Laughter) People were walking around obsessed. They couldn't stop talking about all the things they did in 1950, all the things they were planning to do in 1950, all the dreams of what they wanted to accomplish in 1950. In fact, 1950 was so fascinating that for years thereafter, people just kept talking about all the amazing things that happened, in '51, '52, '53. Finally in 1954, someone woke up and realized that 1950 had gotten somewhat passé. (Laughter) And just like that, the bubble burst.

ELA: Na ovaj način možete dobiti prilično apstraktne koncepte. Na primjer, ispričat ću vam priču o 1950. godini. Veliki dio povijesti, nikoga nije bilo briga za 1950. godinu. 1700. godine, 1800., 1900., nikoga nije bilo briga. 30-ih i 40-ih godina, nikoga nije bilo briga. Odjednom, sredinom 40-ih, počelo se brujati o tome. Ljudi su shvatili da će doći 1950. godina i da bi mogla biti važna. (Smijeh) Ali ništa nije ljude zainteresiralo za 1950. godinu kao 1950. godina. (Smijeh) Ljudi su hodali uokolo opsjednuti. Nisu mogli prestati govoriti o svim stvarima koje su učinili 1950. godine, o svim stvarima koje planiraju učiniti 1950. godine, o svim snovima koje žele ostvariti 1950. godine. Zapravo, 1950. godina bila je toliko fascinantna da su i godinama kasnije ljudi i dalje govorili o fantastičnim stvarima koje su se dogodile, '51., '52., '53. Na kraju, 1954. godine, netko se otrijeznio i shvatio da je 1950. godina postala passé. (Smijeh) I tako se iznenada mjehurić rasprsnuo.

(Laughter)

(Smijeh)

And the story of 1950 is the story of every year that we have on record, with a little twist, because now we've got these nice charts. And because we have these nice charts, we can measure things. We can say, "Well how fast does the bubble burst?" And it turns out that we can measure that very precisely. Equations were derived, graphs were produced, and the net result is that we find that the bubble bursts faster and faster with each passing year. We are losing interest in the past more rapidly.

Priča o 1950. godini priča je o svakoj godini koju smo zabilježili, s malom razlikom, jer sad imamo ove krasne grafove. A budući da imamo te krasne grafove, možemo mjeriti razne stvari. Možemo pitati: "Koliko će se brzo mjehurić rasprsnuti?" Ispada da to možemo vrlo precizno izmjeriti. Jednadžbe su se derivirale, grafovi su se crtali, a ukupni rezultat jest taj da smo otkrili da se mjehurić rasprsne sve brže sa svakom godinom koja prođe. Sve brže gubimo zanimanje za prošlost.

JM: Now a little piece of career advice. So for those of you who seek to be famous, we can learn from the 25 most famous political figures, authors, actors and so on. So if you want to become famous early on, you should be an actor, because then fame starts rising by the end of your 20s -- you're still young, it's really great. Now if you can wait a little bit, you should be an author, because then you rise to very great heights, like Mark Twain, for instance: extremely famous. But if you want to reach the very top, you should delay gratification and, of course, become a politician. So here you will become famous by the end of your 50s, and become very, very famous afterward. So scientists also tend to get famous when they're much older. Like for instance, biologists and physicists tend to be almost as famous as actors. One mistake you should not do is become a mathematician. (Laughter) If you do that, you might think, "Oh great. I'm going to do my best work when I'm in my 20s." But guess what, nobody will really care.

JM: A sad mali savjet o odabiru karijere. Oni među vama koji žele biti slavni mogu ponešto naučiti od 25 najpoznatijih političkih ličnosti, pisaca, glumaca i drugih. Dakle, ako želite rano postati slavni, trebate postati glumac jer tada postajete slavni do kraja svojih 20-ih godina -- još uvijek ste mladi i to je odlično. Ako možete malo čekati, trebali biste biti pisac jer tada se možete vrlo visoko uzdignuti, poput primjerice Marka Twaina, on je bio zaista slavan. Ali ako želite dosegnuti sam vrh, trebali biste odgoditi zadovoljstvo i, naravno, postati političar. U tom ćete slučaju postati poznati do kraja svojih 50-ih godina, i ostati vrlo, vrlo poznati nakon toga. Znanstvenici uglavnom, isto tako, postaju poznati kad ostare. Biolozi i fizičari, primjerice, znaju biti gotovo jednako slavni kao i glumci. Trebate izbjeći samo jednu pogrešku - da postanete matematičar. (Smijeh) Ako to učinite, možda ćete pomisliti: "Odlično, u 20-ima ću napraviti svoje najbolje radove." No, znate što, nikoga neće biti briga.

(Laughter)

(Smijeh)

ELA: There are more sobering notes among the n-grams. For instance, here's the trajectory of Marc Chagall, an artist born in 1887. And this looks like the normal trajectory of a famous person. He gets more and more and more famous, except if you look in German. If you look in German, you see something completely bizarre, something you pretty much never see, which is he becomes extremely famous and then all of a sudden plummets, going through a nadir between 1933 and 1945, before rebounding afterward. And of course, what we're seeing is the fact Marc Chagall was a Jewish artist in Nazi Germany.

ELA: Postoje i neke ozbiljnije činjenice među n-gramima. Primjerice, evo putanje Marca Chagalla, umjetnika rođenog 1887. godine. Ovo izgleda kao normalna putanja poznate osobe. Postaje sve poznatiji i poznatiji, osim ako gledate za njemački jezik. Ako gledate za njemački, vidjet ćete nešto vrlo bizarno, nešto što gotovo nikad ne vidite, a to je da postaje iznimno poznat, a nakon toga mu popularnost iznenada padne, pri čemu su najniže točke bile između 1933. i 1945. godine, nakon čega mu se opet vratila popularnost. Naravno, ono što zapravo vidimo jest činjenica da je Marc Chagall bio židovski umjetnik u nacističkoj Njemačkoj.

Now these signals are actually so strong that we don't need to know that someone was censored. We can actually figure it out using really basic signal processing. Here's a simple way to do it. Well, a reasonable expectation is that somebody's fame in a given period of time should be roughly the average of their fame before and their fame after. So that's sort of what we expect. And we compare that to the fame that we observe. And we just divide one by the other to produce something we call a suppression index. If the suppression index is very, very, very small, then you very well might be being suppressed. If it's very large, maybe you're benefiting from propaganda.

Ovi su signali zapravo toliko jaki da ne trebamo ni znati da su nekoga cenzurirali. Zapravo to možemo zaključiti koristeći osnovnu obradu signala. Evo jednostavnog načina kako to učiniti. Razumno je za očekivati da će nečija slava u određenom razdoblju biti otprilike prosjek slave te osobe prije i nakon tog razdoblja. To je otprilike ono što mi očekujemo. I to uspoređujemo sa slavom koju promatramo. Samo podijelimo jedno drugim kako bismo dobili takozvani indeks zabrane. Ako je indeks zabrane vrlo, vrlo, vrlo malen, onda ste vrlo vjerojatno bili zabranjeni. Ako je vrlo velik, možda profitirate od propagande.
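[Editor's sketch] The suppression index described above comes down to one division: observed fame during a period, over the average of the fame before and after it. The Python below is a minimal sketch under stated assumptions. The talk does not say how fame is measured or how wide the "before" and "after" windows are, so here fame is a hypothetical per-year number, each window is simply every available year on that side of the period, and the name `suppression_index` and the toy values are invented for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def suppression_index(fame_by_year, start, end):
    """Observed fame in [start, end] divided by expected fame.

    Expected fame is the average of mean fame before `start` and
    mean fame after `end`, following the description in the talk.
    """
    before = [f for y, f in fame_by_year.items() if y < start]
    during = [f for y, f in fame_by_year.items() if start <= y <= end]
    after = [f for y, f in fame_by_year.items() if y > end]
    expected = (mean(before) + mean(after)) / 2
    observed = mean(during)
    return observed / expected


# Invented fame values (say, mentions per million words), for illustration.
toy_fame = {}
toy_fame.update({year: 4.0 for year in range(1925, 1933)})  # before
toy_fame.update({year: 0.5 for year in range(1933, 1946)})  # during
toy_fame.update({year: 6.0 for year in range(1946, 1956)})  # after

index = suppression_index(toy_fame, 1933, 1945)
print(round(index, 3))
```

With these toy numbers the expected fame is 5.0 and the observed fame is 0.5, so the index comes out far below one, the pattern the talk associates with suppression. An index well above one would correspond to the propaganda case.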

JM: Now you can actually look at the distribution of suppression indexes over whole populations. So for instance, here -- this suppression index is for 5,000 people picked in English books where there's no known suppression -- it would be like this, basically tightly centered on one. What you expect is basically what you observe. This is distribution as seen in Germany -- very different, it's shifted to the left. People talked about it twice less as it should have been. But much more importantly, the distribution is much wider. There are many people who end up on the far left on this distribution who are talked about 10 times fewer than they should have been. But then also many people on the far right who seem to benefit from propaganda. This picture is the hallmark of censorship in the book record.

JM: Zapravo možete promatrati raspored indeksa zabrane unutar populacija. Na primjer, ovdje -- ovo je indeks zabrane za 5.000 ljudi odabranih u engleskim knjigama u kojima nije zabilježeno zabranjivanje -- bilo bi ovako, usko centrirano oko jednog. Ono što očekujete u biti je ono što i vidite. Ovo je raspored za Njemačku -- vrlo različito, pomaknuto je ulijevo. Ljudi su o tome razgovarali upola manje nego što su trebali. No, mnogo je važnije da je raspored širi. Ima mnogo ljudi koji su sasvim na lijevoj strani rasporeda i o kojima se govori 10 puta manje nego što bi se trebalo. Ali isto tako ima mnogo ljudi na sasvim desnoj strani koji, izgleda, profitiraju od propagande. Ova je slika glavni simbol cenzure u knjigama.

ELA: So culturomics is what we call this method. It's kind of like genomics. Except genomics is a lens on biology through the window of the sequence of bases in the human genome. Culturomics is similar. It's the application of massive-scale data collection analysis to the study of human culture. Here, instead of through the lens of a genome, through the lens of digitized pieces of the historical record. The great thing about culturomics is that everyone can do it. Why can everyone do it? Everyone can do it because three guys, Jon Orwant, Matt Gray and Will Brockman over at Google, saw the prototype of the Ngram Viewer, and they said, "This is so fun. We have to make this available for people." So in two weeks flat -- the two weeks before our paper came out -- they coded up a version of the Ngram Viewer for the general public. And so you too can type in any word or phrase that you're interested in and see its n-gram immediately -- also browse examples of all the various books in which your n-gram appears.

ELA: Dakle, kulturomika jest ime koje smo dali ovoj metodi. Nalikuje na genomiku. Osim što je genomika pogled na biologiju, pogled na slijed baza u ljudskom genomu. Kulturomika je slična tome. To je primjena analize ogromnog skupa podataka na proučavanje ljudske kulture. Ovdje, umjesto da promatramo genom, promatramo digitalizirane dijelove povijesnih zapisa. Ono što je odlično kod kulturomike jest to da se svi mogu njome baviti. Zašto se svi mogu njome baviti? Svi se mogu njome baviti jer su tri tipa, Jon Orwant, Matt Gray i Will Brockman iz Googlea vidjeli prototip preglednika Ngram i rekli: "Ovo je tako zabavno. Moramo ljudima omogućiti pristup tome." Za samo dva tjedna -- dva tjedna prije nego nam je objavljen članak -- iskodirali su verziju preglednika Ngram za javnost. Tako da i vi možete unijeti bilo koju riječ ili izraz koji vas zanima i odmah vidjeti njegove n-grame -- isto tako možete pregledavati primjere iz svih knjiga u kojima se pojavljuje vaš n-gram.

JM: Now this was used over a million times on the first day, and this is really the best of all the queries. So people want to be their best, put their best foot forward. But it turns out in the 18th century, people didn't really care about that at all. They didn't want to be their best, they wanted to be their beft. So what happened is, of course, this is just a mistake. It's not that they strove for mediocrity, it's just that the S used to be written differently, kind of like an F. Now of course, Google didn't pick this up at the time, so we reported this in the science article that we wrote. But it turns out this is just a reminder that, although this is a lot of fun, when you interpret these graphs, you have to be very careful, and you have to adopt the base standards in the sciences.

JM: Ovaj je preglednik korišten više od milijun puta prvog dana, i ovo je zapravo najbolji od svih upita. Ljudi žele dati sve od sebe, pokazati se u najboljem svjetlu. Ali ispada da u 18. stoljeću ljudima uopće nije bilo stalo do toga. Nisu željeli dati sve od sebe, željeli su dati fve od sebe. Naravno, ovdje se radi samo o pogrešci. Nije da su težili osrednjosti, već se S prije pisao drugačije, pomalo nalik na F. Naravno, Google to nije prepoznao i to smo napomenuli u znanstvenom članku koji smo napisali. No, ispada da je ovo samo podsjetnik da, iako je ovo vrlo zabavno, kad tumačite ove grafove, morate biti vrlo oprezni i morate usvojiti ove temeljne znanstvene standarde.

ELA: People have been using this for all kinds of fun purposes. (Laughter) Actually, we're not going to have to talk, we're just going to show you all the slides and remain silent. This person was interested in the history of frustration. There's various types of frustration. If you stub your toe, that's a one A "argh." If the planet Earth is annihilated by the Vogons to make room for an interstellar bypass, that's an eight A "aaaaaaaargh." This person studies all the "arghs," from one through eight A's. And it turns out that the less-frequent "arghs" are, of course, the ones that correspond to things that are more frustrating -- except, oddly, in the early 80s. We think that might have something to do with Reagan.

ELA: Ljudi ovo koriste za razne zabavne namjene. (Smijeh) Zapravo, ne moramo ni govoriti, samo ćemo vam pokazati sve slajdove i šutjeti. Ovu osobu je zanimala povijest frustracije. Postoje različite vrste frustracija. Kad se udarite u nožni prst, to je "argh" s jednim A. Ako planet Zemlju unište Vogonci kako bi napravili mjesta za međuzvjezdanu zaobilaznicu, to je "aaaaaaaargh" s 8 A-ova. Ova osoba proučava sve "arghove", od jednog do 8 A-ova. Ispada da su manje učestali "arghovi" naravno, oni koji odgovaraju stvarima koje izazivaju veću frustraciju -- osim, čudno, početkom 80-ih. Mislimo da to možda ima veze s Reaganom.

(Laughter)

(Smijeh)

JM: There are many usages of this data, but the bottom line is that the historical record is being digitized. Google has started to digitize 15 million books. That's 12 percent of all the books that have ever been published. It's a sizable chunk of human culture. There's much more in culture: there's manuscripts, there's newspapers, there's things that are not text, like art and paintings. These all happen to be on our computers, on computers across the world. And when that happens, that will transform the way we have to understand our past, our present and human culture.

JM: Ovi se podaci mogu koristiti za razne namjene, ali ono što je bitno jest da se povijesni zapisi digitaliziraju. Google je počeo digitalizirati 15 milijuna knjiga. To je 12 posto svih knjiga koje su ikad izdane. To je povelik dio ljudske kulture. U kulturi ima još mnogo toga: rukopisi, novine, postoje stvari koje nisu tekst, poput umjetnosti i slika. To će sve biti na našim računalima, na računalima u cijelome svijetu. A kad se to dogodi, promijenit će se način na koji smo shvaćali svoju prošlost, svoju sadašnjost i ljudsku kulturu.

Thank you very much.

Hvala vam puno.

(Applause)

(Pljesak)
