Jean-Baptiste Michel + Erez Lieberman Aiden: What we learned from 5 million books

Erez Lieberman Aiden: Everyone knows that a picture is worth a thousand words. But we at Harvard were wondering if this was really true. (Laughter) So we assembled a team of experts, spanning Harvard, MIT, The American Heritage Dictionary, The Encyclopedia Britannica and even our proud sponsors, the Google. And we cogitated about this for about four years. And we came to a startling conclusion. Ladies and gentlemen, a picture is not worth a thousand words. In fact, we found some pictures that are worth 500 billion words.

Erez Lieberman Aiden: Mindenki tudja, hogy egy kép felér ezer szóval. De mi a Harvardon elgondolkoztunk, hogy ez tényleg így van-e. (Nevetés) Így összeraktunk egy szakértőkből álló csapatot, Harvardról, MIT-ről az American Heritage Dictionarytől, az Encyclopedia Britannicától és még a büszke szponzorunktól is, a Google-től. És rágódtunk rajta nagyjából négy évig. És egy ijesztő megállapításra jutottunk. Hölgyeim és uraim, egy kép nem ér fel ezer szóval. Valójában, találtunk néhány képet amely 500 milliárd szót ér.

Jean-Baptiste Michel: So how did we get to this conclusion? So Erez and I were thinking about ways to get a big picture of human culture and human history: change over time. So many books actually have been written over the years. So we were thinking, well the best way to learn from them is to read all of these millions of books. Now of course, if there's a scale for how awesome that is, that has to rank extremely, extremely high. Now the problem is there's an X-axis for that, which is the practical axis. This is very, very low.

Jean-Baptiste Michel: De hogyan jutottunk erre a következtetésre? Erez és én olyan módszereket kerestünk, amelyekből egy áttekintő képet kaphatunk az emberi kultúráról és az emberi történelemről, időbeli változásáról. Rengeteg könyvet írtak az évek során. Így mi arra gondoltunk, legjobban úgy tanulhatunk belőlük, ha ezt a több millió könyvet elolvassuk. Természetesen, ha lenne arra egy skála, ez mennyire döbbenetes, akkor ez extrém, extrém módon magas lenne. A probléma viszont az, hogy van egy X tengelye is, ami a praktikusság tengelye. Ez nagyon, nagyon alacsony.

(Applause)

(Taps)

Now people tend to use an alternative approach, which is to take a few sources and read them very carefully. This is extremely practical, but not so awesome. What you really want to do is to get to the awesome yet practical part of this space. So it turns out there was a company across the river called Google who had started a digitization project a few years back that might just enable this approach. They have digitized millions of books. So what that means is, one could use computational methods to read all of the books in a click of a button. That's very practical and extremely awesome.

Manapság hajlamosak az emberek egy másfajta megközelítést használni: vesznek néhány forrást és nagyon alaposan elolvassák. Ez rendkívül hasznos, de nem annyira döbbenetes. Amit igazán szeretnél az az, hogy eljuss az ábra döbbenetes, mégis hasznos részére. Kiderült, van egy vállalat, amely tudja a megoldást: a Google, mely néhány évvel korábban elkezdett egy digitalizálási projektet, ami lehetővé teheti ezt a megközelítést. Könyvek millióit digitalizálták. Mindez azt jelenti, hogy számítási metódusokkal egy gombnyomásra elolvashatóak ezek a könyvek. Ez nagyon hasznos és igazán döbbenetes.

ELA: Let me tell you a little bit about where books come from. Since time immemorial, there have been authors. These authors have been striving to write books. And this became considerably easier with the development of the printing press some centuries ago. Since then, the authors have won on 129 million distinct occasions, publishing books. Now if those books are not lost to history, then they are somewhere in a library, and many of those books have been getting retrieved from the libraries and digitized by Google, which has scanned 15 million books to date.

ELA: Hadd beszéljek egy kicsit arról, honnan is jönnek ezek a könyvek. Emberi emlékezet óta vannak szerzők. Ezek a szerzők arra törekedtek, hogy könyveket írjanak. És mindez nagyságrendekkel könnyebbé vált a nyomtatott sajtó néhány századdal ezelőtti fejlődésével. A szerzők azóta sikeresen, 129 millió különböző alkalommal publikáltak könyvet. Ha ezek a könyvek nem tűntek el a történelemben, akkor valahol megtalálhatóak egy könyvtárban, és a legtöbbjüket a Google kikölcsönözte és digitalizálta -- a mai napig 15 millió könyvet.

Now when Google digitizes a book, they put it into a really nice format. Now we've got the data, plus we have metadata. We have information about things like where was it published, who was the author, when was it published. And what we do is go through all of those records and exclude everything that's not the highest quality data. What we're left with is a collection of five million books, 500 billion words, a string of characters a thousand times longer than the human genome -- a text which, when written out, would stretch from here to the Moon and back 10 times over -- a veritable shard of our cultural genome. Of course what we did when faced with such outrageous hyperbole ... (Laughter) was what any self-respecting researchers would have done. We took a page out of XKCD, and we said, "Stand back. We're going to try science."

Amikor a Google bedigitalizál egy könyvet, egy elég szép formátumba rakja. Szóval megvan az adat és megvan a metaadat. Van információnk arról, hol publikálták, ki volt a szerző, mikor publikálták. Mi azt csináljuk, hogy átnézzük ezeket a rekordokat, és kizárjuk azokat, amelyek nem a legjobb minőségűek. A végén maradt egy ötmillió könyvből álló kollekciónk, 500 milliárd szó, egy ezerszer hosszabb karakterlánc, mint az emberi genom -- egy szöveg, mely leírva elérne a Holdig és vissza 10-szer -- a kulturális genom egy igazi darabja. Természetesen, amikor egy ilyen elképesztő túlzással találkozunk... (Nevetés) ugyanazt tesszük, mint bármely magára valamit is adó kutató tenne. Vettünk egy oldalt az XKCD-ről, és azt mondtuk, "Egy kis helyet! Kipróbáljuk a tudományt."

(Laughter)

(Nevetés)

JM: Now of course, we were thinking, well let's just first put the data out there for people to do science to it. Now we're thinking, what data can we release? Well of course, you want to take the books and release the full text of these five million books. Now Google, and Jon Orwant in particular, told us a little equation that we should learn. So you have five million, that is, five million authors and five million plaintiffs is a massive lawsuit. So, although that would be really, really awesome, again, that's extremely, extremely impractical. (Laughter)

JM: Természetesen, gondoltuk mi, elsőként adjuk oda az adatokat embereknek, akik tanulmányozzák. Arra gondoltunk, milyen adatot adhatunk oda? Természetesen, veheted a könyveket és kiadhatod mind az ötmillió könyv teljes szövegét. Nos, a Google és különösképpen Jon Orwant elmagyarázott nekünk egy kis egyenletet, amelyet meg kellene tanulnunk. Ha van 5 millió, azaz 5 millió szerződ és 5 millió felperes, az egy egész szép peres eljárás. Így, annak ellenére, hogy az igazán, igazán döbbenetes lenne, ismét csak, hihetetlenül, hihetetlenül haszontalan lenne. (Nevetés)

Now again, we kind of caved in, and we did the very practical approach, which was a bit less awesome. We said, well instead of releasing the full text, we're going to release statistics about the books. So take for instance "A gleam of happiness." It's four words; we call that a four-gram. We're going to tell you how many times a particular four-gram appeared in books in 1801, 1802, 1803, all the way up to 2008. That gives us a time series of how frequently this particular sentence was used over time. We do that for all the words and phrases that appear in those books, and that gives us a big table of two billion lines that tell us about the way culture has been changing.

Mi eléggé korlátoltak vagyunk, és vettük az elég praktikus megközelítést, amely valamivel kevésbé döbbenetes. Azt mondtuk, ahelyett, hogy kiadnánk a teljes szöveget, statisztikákat fogunk kiadni a könyvekről. Vegyünk egy példát: "A boldogság egy fénysugara." Négy szó. Négy-gramnak hívjuk. Meg fogjuk mondani, hogy egy bizonyos négy-gram hányszor bukkant fel a könyvekben 1801-ben, 1802-ben, 1803-ban, egészen 2008-ig. Kapunk egy idősort arról, hogy milyen gyakran használták ezt a bizonyos mondatot az idők során. Megcsináljuk ezt minden szóra és kifejezésre, amely azokban a könyvekben előfordul, amely egy 2 milliárd sorból álló halmazt ad, amely elmondja, miként változott a kultúra.
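The counting described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the pipeline the speakers used: the function names `ngrams` and `ngram_time_series`, the toy corpus, and the whitespace-only tokenization (no handling of case or punctuation) are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def ngrams(text, n):
    """Yield every run of n consecutive words in text as a single string."""
    words = text.split()  # naive tokenization: split on whitespace only
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])

def ngram_time_series(books, n):
    """Build a table mapping each n-gram to a Counter of {year: occurrences}.

    books is an iterable of (publication_year, full_text) pairs.
    """
    table = defaultdict(Counter)
    for year, text in books:
        for gram in ngrams(text, n):
            table[gram][year] += 1
    return table

# Toy corpus, invented for illustration only.
books = [
    (1801, "a gleam of happiness lit the room"),
    (1802, "she felt a gleam of happiness and then a gleam of happiness again"),
    (1803, "no joy was found"),
]

table = ngram_time_series(books, 4)
series = table["a gleam of happiness"]

# One row of the table: the yearly counts for a single four-gram.
print([(year, series[year]) for year in (1801, 1802, 1803)])
```

Each key of `table` corresponds to one line of the big table described in the talk; only these aggregate counts are kept, not the full text of any book.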

ELA: So those two billion lines, we call them two billion n-grams. What do they tell us? Well the individual n-grams measure cultural trends. Let me give you an example. Let's suppose that I am thriving, then tomorrow I want to tell you about how well I did. And so I might say, "Yesterday, I throve." Alternatively, I could say, "Yesterday, I thrived." Well which one should I use? How to know?

ELA: Ezt a kétmilliárd sort 2 milliárd n-gramnak hívjuk. Mit mondanak nekünk? Nos, az egyes n-gramok a kulturális trendeket mérik. Hadd mondjak egy példát! Tegyük fel, hogy jómódban élek, aztán holnap el akarom mondani, milyen jól éltem. És azt mondhatom, "Tegnap jól éltem (throve)." Másféleképpen, azt mondhatom, "Tegnap jól éltem (thrived)." Nos, melyiket kellene használnom? Honnan lehet tudni?

As of about six months ago, the state of the art in this field is that you would, for instance, go up to the following psychologist with fabulous hair, and you'd say, "Steve, you're an expert on the irregular verbs. What should I do?" And he'd tell you, "Well most people say thrived, but some people say throve." And you also knew, more or less, that if you were to go back in time 200 years and ask the following statesman with equally fabulous hair, (Laughter) "Tom, what should I say?" He'd say, "Well, in my day, most people throve, but some thrived." So now what I'm just going to show you is raw data. Two rows from this table of two billion entries. What you're seeing is year by year frequency of "thrived" and "throve" over time. Now this is just two out of two billion rows. So the entire data set is a billion times more awesome than this slide.

Nagyjából 6 hónappal ezelőtt, a tudomány akkori állása szerint megtehetted volna például, hogy felkeresed az alábbi furcsa hajú pszichológust, és azt mondod, "Steve, a rendhagyó igék szakértője vagy. Mit kéne tennem?" És azt mondta volna, "Nos, a legtöbb ember a 'thrived'-ot használja, de néhányan a 'throve'-ot." És többé-kevésbé azt is tudnád, ha 200 évet visszamész az időben, és megkérdezed az alábbi, szintén furcsa hajú államférfit, (Nevetés) "Tom, melyiket kellene használnom?" Azt mondaná, "Nos, az én időmben a legtöbb ember a 'throve'-ot, de néhányan a 'thrived'-ot." Amit most meg fogok mutatni azok nyers adatok. Két sort ebből a kétmilliárdos halmazból. Az ábrán a "thrive" és a "throve" előfordulási gyakorisága látható az évek során. Nos, ez csak kettő a kétmilliárd sorból. Az egész adathalmaz kétmilliárdszor döbbenetesebb, mint ez a dia.

(Laughter)

(Nevetés)

(Applause)

(Taps)

JM: Now there are many other pictures that are worth 500 billion words. For instance, this one. If you just take influenza, you will see peaks at the times when you know big flu epidemics were killing people around the globe.

JM: Számtalan más kép van, amely felér 500 milliárd szóval. Például ez. Ha csupán az influenzát vesszük, kiugrásokat fogunk látni azokra az időszakokra, ahol tudjuk, hogy nagy influenza fertőzésekben haltak meg az emberek világszerte.

ELA: If you were not yet convinced, sea levels are rising, so is atmospheric CO2 and global temperature.

ELA: Ha még mindig nem lennének meggyőzve, a tengerszint emelkedik, akárcsak a légkör CO2 tartalma és a globális hőmérséklet.

JM: You might also want to have a look at this particular n-gram, and that's to tell Nietzsche that God is not dead, although you might agree that he might need a better publicist.

JM: Vagy akár megnézhetitek ezt a bizonyos n-gramot, melyben Nietzsche szerint Isten nem halott, habár azzal egyetértenének, hogy jobb publicistára lenne szüksége.

(Laughter)

(Nevetés)

ELA: You can get at some pretty abstract concepts with this sort of thing. For instance, let me tell you the history of the year 1950. Pretty much for the vast majority of history, no one gave a damn about 1950. In 1700, in 1800, in 1900, no one cared. Through the 30s and 40s, no one cared. Suddenly, in the mid-40s, there started to be a buzz. People realized that 1950 was going to happen, and it could be big. (Laughter) But nothing got people interested in 1950 like the year 1950. (Laughter) People were walking around obsessed. They couldn't stop talking about all the things they did in 1950, all the things they were planning to do in 1950, all the dreams of what they wanted to accomplish in 1950. In fact, 1950 was so fascinating that for years thereafter, people just kept talking about all the amazing things that happened, in '51, '52, '53. Finally in 1954, someone woke up and realized that 1950 had gotten somewhat passé. (Laughter) And just like that, the bubble burst.

ELA: Egészen szép absztrakt koncepciókat kaphatsz ilyen dolgokkal. Például, hadd meséljek az 1950. év történelméről! Nagy valószínűséggel a történelem túlnyomó részében senkit sem érdekelt 1950. 1700-ban, 1800-ban, 1900-ban senkit sem érdekelt. A 30-as és a 40-es években senkit sem érdekelt. A 40-es évek közepén hirtelen elkezdődött a mozgolódás. Rájöttek az emberek, hogy hamarosan 1950 lesz, és nagy lehet. (Nevetés) 1950-ben azonban semmi más nem érdekelte úgy az embereket, mint az 1950. év. (Nevetés) Az emberek megszállottan járkáltak. Egyfolytában azokról a dolgokról beszéltek, mit csináltak 1950-ben, mit fognak csinálni 1950-ben, az álmokról, amelyeket meg szeretnének valósítani 1950-ben. 1950 valójában annyira szenzációs volt, hogy évekkel utána is, az emberek egyfolytában az akkor történt csodálatos dolgokról beszéltek, '51-ben, '52-ben, '53-ban. Végül 1954-ben, felébred valaki és rájött, hogy 1950 valahogyan... elmúlt. (Nevetés) És egy csapásra a buborék kipukkant.

(Laughter)

(Nevetés)

And the story of 1950 is the story of every year that we have on record, with a little twist, because now we've got these nice charts. And because we have these nice charts, we can measure things. We can say, "Well how fast does the bubble burst?" And it turns out that we can measure that very precisely. Equations were derived, graphs were produced, and the net result is that we find that the bubble bursts faster and faster with each passing year. We are losing interest in the past more rapidly.

És 1950 története megegyezik valamennyi rendelkezésünkre álló év történetével, egy kis csavarral, hiszen megvannak ezek a szép ábráink. És mivel megvannak ezek a szép ábrák, meg tudunk mérni dolgokat. Azt mondhatjuk, "Nézzük, milyen gyorsan pukkant ki a buborék?" És kiderül, ezt egészen pontosan meg tudjuk mérni. Egyenleteket írtunk, grafikonokat állítottunk fel, és a végső eredmény az, hogy a buborék egyre gyorsabban és gyorsabban pukkan ki, ahogy telnek az egyes évek. Egyre gyorsabban veszítjük el a múlttal kapcsolatos érdeklődésünket.

JM: Now a little piece of career advice. So for those of you who seek to be famous, we can learn from the 25 most famous political figures, authors, actors and so on. So if you want to become famous early on, you should be an actor, because then fame starts rising by the end of your 20s -- you're still young, it's really great. Now if you can wait a little bit, you should be an author, because then you rise to very great heights, like Mark Twain, for instance: extremely famous. But if you want to reach the very top, you should delay gratification and, of course, become a politician. So here you will become famous by the end of your 50s, and become very, very famous afterward. So scientists also tend to get famous when they're much older. Like for instance, biologists and physicists tend to be almost as famous as actors. One mistake you should not make is to become a mathematician. (Laughter) If you do that, you might think, "Oh great. I'm going to do my best work when I'm in my 20s." But guess what, nobody will really care.

JM: Most pedig egy kis karrier tanács. Mindazok, akik híresek akarnak lenni, tanulhatnak a 25 leghíresebb politikai szereplőtől, szerzőtől, színésztől és így tovább. Szóval ha idejekorán híres akarsz lenni, színésznek kell menned, mert a hírnév a 20-as éveid végén kezd el növekedni -- még fiatal vagy, ez igazán nagyszerű. Ha tudsz egy kicsit tovább várni, szerzőnek kell menned, mivel akkor igen nagy magasságokba emelkedhetsz, mint Mark Twain például: elképesztően híres. Ha viszont a legmagasabbra akarsz jutni, késleltetned kell az önmegvalósítást, és természetesen, politikusnak kell állnod. Így az 50-es éveid végére kezdesz híres lenni, és csak aztán leszel nagyon, nagyon híres. A tudósok is akkor kezdenek híressé válni, amikor idősebbek lesznek. Mint például a biológusok és fizikusok, akik csaknem olyan híressé válhatnak, mint a színészek. Egy hibát nem szabad elkövetned: matematikusnak menned. (Nevetés) Ha így döntesz, azt gondolhatod, "Oh, remek. A legjobbat fogom teljesíteni, amikor a 20-as éveimben járok." De tudod mit? Senkit nem fog érdekelni.

(Laughter)

(Nevetés)

ELA: There are more sobering notes among the n-grams. For instance, here's the trajectory of Marc Chagall, an artist born in 1887. And this looks like the normal trajectory of a famous person. He gets more and more and more famous, except if you look in German. If you look in German, you see something completely bizarre, something you pretty much never see, which is he becomes extremely famous and then all of a sudden plummets, going through a nadir between 1933 and 1945, before rebounding afterward. And of course, what we're seeing is the fact that Marc Chagall was a Jewish artist in Nazi Germany.

ELA: Vannak még jobban kijózanító megjegyzések az n-gramok között. Itt van például Marc Chagall pályája, egy 1887-ben született művészé. Ez úgy néz ki, mint egy híres ember átlagos pályája. Egyre jobban és jobban híres lett, kivéve, ha Németországban nézed. Ha Németországban nézed, valami egészen bizarr dolgot láthatsz, valamit, amit szinte még sosem láttál, nevezetesen, hogy hihetetlen híres lesz, aztán egyszercsak bezuhan, egy mélypontra érve 1933 és 1945 között, mielőtt ismét visszapattanna. Természetesen, amit látunk az az a tény, hogy Marc Chagall egy zsidó művész volt a náci Németországban.

Now these signals are actually so strong that we don't need to know that someone was censored. We can actually figure it out using really basic signal processing. Here's a simple way to do it. Well, a reasonable expectation is that somebody's fame in a given period of time should be roughly the average of their fame before and their fame after. So that's sort of what we expect. And we compare that to the fame that we observe. And we just divide one by the other to produce something we call a suppression index. If the suppression index is very, very, very small, then you very well might be being suppressed. If it's very large, maybe you're benefiting from propaganda.

Ezek a jelek igazából annyira erősek, hogy nem kell tudnunk, hogy valaki cenzorálva volt. Valójában ki tudjuk találni, egészen egyszerű jelzőrendszer segítségével. Itt egy egyszerű módszer minderre. Egy ésszerű várakozás, hogy egy adott időszakban valakinek a hírneve a korábbi és a későbbi hírnevének az átlaga. Ez az amit várnánk. És ezt összehasonlítjuk az általunk megfigyelt hírnévvel. Aztán elosztjuk egyiket a másikkal, hogy előállítsunk valamit, amit elnyomási indexnek hívunk. Ha az elnyomási index nagyon, nagyon, nagyon alacsony, akkor nagyon el lehetsz nyomva. Ha nagyon nagy, akkor lehet, hogy propaganda áldozata vagy.
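The suppression index described here can be written down as a short Python sketch. It follows the recipe in the talk (expected fame is roughly the average of fame before and fame after; the index is observed fame divided by expected fame), but the details are assumptions: the function name `suppression_index`, the use of simple yearly means, and the fame values below, which are invented and do not come from the real data.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

def suppression_index(fame_by_year, start, end):
    """Observed fame during [start, end] divided by expected fame.

    Expected fame is taken to be the average of the mean fame before the
    period and the mean fame after it. An index far below 1 hints at
    suppression; an index far above 1 hints at promotion.
    """
    before = [f for year, f in fame_by_year.items() if year < start]
    during = [f for year, f in fame_by_year.items() if start <= year <= end]
    after = [f for year, f in fame_by_year.items() if year > end]
    expected = (mean(before) + mean(after)) / 2
    observed = mean(during)
    return observed / expected

# Invented yearly "fame" values (for example, mentions per million words).
fame = {}
fame.update({year: 4.0 for year in range(1920, 1933)})  # before the period
fame.update({year: 0.5 for year in range(1933, 1946)})  # during the period
fame.update({year: 6.0 for year in range(1946, 1961)})  # after the period

index = suppression_index(fame, 1933, 1945)
print(round(index, 3))  # expected fame is 5.0, observed is 0.5
```

With these made-up numbers the person is mentioned a tenth as often as expected during the period, which is the kind of very small index the talk associates with suppression.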

JM: Now you can actually look at the distribution of suppression indexes over whole populations. So for instance, here -- this suppression index is for 5,000 people picked in English books where there's no known suppression -- it would be like this, basically tightly centered on one. What you expect is basically what you observe. This is the distribution as seen in Germany -- very different, it's shifted to the left. People were talked about half as much as they should have been. But much more importantly, the distribution is much wider. There are many people who end up on the far left of this distribution who are talked about 10 times less than they should have been. But then also many people on the far right who seem to benefit from propaganda. This picture is the hallmark of censorship in the book record.

Igazából meg tudod nézni az elnyomási index eloszlását a teljes populáción. Így például, itt -- ez 5000 ember elnyomási indexe, melyet olyan angol nyelvű könyvekből választottunk ki, ahol nincs tudomásunk elnyomásról -- így kellene kinéznie, nagyjából az egy körül csoportosulva. Amire számítasz az az általad megfigyelt. Ez a Németországban megfigyelhető eloszlás -- nagyon különböző, a baloldalra tolódott. Az emberek kétszer kevesebbszer beszéltek róla, mint kellett volna. De sokkal fontosabb, hogy az eloszlás sokkal szélesebb. Sokan vannak, akik az eloszlás bal szélén helyezkednek el, akikről 10-szer kevesebbszer beszéltek, mint ahogy kellett volna. De aztán sokan vannak a jobb szélén, akik feltehetően propaganda áldozatai. Ez a kép jól illusztrálja a könyvekben megjelenő cenzúrát.

ELA: So culturomics is what we call this method. It's kind of like genomics. Except genomics is a lens on biology through the window of the sequence of bases in the human genome. Culturomics is similar. It's the application of massive-scale data collection analysis to the study of human culture. Here, instead of through the lens of a genome, through the lens of digitized pieces of the historical record. The great thing about culturomics is that everyone can do it. Why can everyone do it? Everyone can do it because three guys, Jon Orwant, Matt Gray and Will Brockman over at Google, saw the prototype of the Ngram Viewer, and they said, "This is so fun. We have to make this available for people." So in two weeks flat -- the two weeks before our paper came out -- they coded up a version of the Ngram Viewer for the general public. And so you too can type in any word or phrase that you're interested in and see its n-gram immediately -- also browse examples of all the various books in which your n-gram appears.

ELA: Kulturonómia, így hívjuk ezt a módszert. Olyan, mint a genomika. Azt leszámítva, hogy a genomika a biológia egyik lencséje, melyen keresztül az emberi genom alapvető szekvenciáit vizsgáljuk. A kulturonómia hasonló. Egy hatalmas méretű adatgyűjtemény analizálásának eszköze, amellyel az emberi kultúrát tanulmányozhatjuk. Itt nem a genom lencséjén keresztül, hanem a történelmi emlékek digitalizált darabjain keresztül. A kulturonómia nagy előnye, hogy bárki művelheti. Miért teheti meg bárki? Azért teheti, mivel három srác, Jon Orwant, Matt Gray és Will Brockman a Google-nél meglátta az Ngram Viewer prototípusát, és azt mondta, "Ez vicces. Az emberek számára elérhetővé kell tenni." Így nagyjából 2 hét alatt -- a tanulmányunk megjelenése előtti 2 hétben -- összerakták az Ngram Viewer publikus verziójának kódját. És így bármilyen szót vagy kifejezést be tudsz táplálni, ami érdekel és azonnal láthatod az n-gramját -- még példákat is mutat a különféle könyvekből, melyekben az n-gramod megtalálható.

JM: Now this was used over a million times on the first day, and this is really the best of all the queries. So people want to be their best, put their best foot forward. But it turns out in the 18th century, people didn't really care about that at all. They didn't want to be their best, they wanted to be their beft. So what happened is, of course, this is just a mistake. It's not that they strove for mediocrity, it's just that the S used to be written differently, kind of like an F. Now of course, Google didn't pick this up at the time, so we reported this in the science article that we wrote. But it turns out this is just a reminder that, although this is a lot of fun, when you interpret these graphs, you have to be very careful, and you have to adopt the base standards in the sciences.

JM: Az első napon több mint egymilliószor használták, és ez a legjobb az összes keresés közül. Az emberek a legjobbak akarnak lenni, a legjobban előre haladni. De kiderült, hogy a 18. században az emberek egyáltalán nem törődtek ezzel. Nem a legjobbak (best) akartak lenni, hanem a legjobbak (beft). Természetesen ami történt az csak egy hiba. Nem egy szándékos középszerűség, hanem csak az, hogy az S betűt régen másképp írták, kicsit hasonlóan, mint az F-et. A Google természetesen nem jött rá időben, így ezt jeleztük is az általunk írt tudományos cikkünkben. De igazából ez csak egy figyelmeztetés, hogy habár igen szórakoztató amikor ezeket a grafikonokat értelmezed, nagyon óvatosnak kell lenned, és el kell fogadnod a tudomány alapfeltételeit.

ELA: People have been using this for all kinds of fun purposes. (Laughter) Actually, we're not going to have to talk, we're just going to show you all the slides and remain silent. This person was interested in the history of frustration. There's various types of frustration. If you stub your toe, that's a one A "argh." If the planet Earth is annihilated by the Vogons to make room for an interstellar bypass, that's an eight A "aaaaaaaargh." This person studies all the "arghs," from one through eight A's. And it turns out that the less-frequent "arghs" are, of course, the ones that correspond to things that are more frustrating -- except, oddly, in the early 80s. We think that might have something to do with Reagan.

ELA: Az emberek a legkülönfélébb célokra használják. (Nevetés) Igazából, nem is kell beszélnünk, csak megmutatjuk az összes diát és csendben maradunk. Ez a személy a frusztráció történelmére volt kíváncsi. Különféle frusztrációk vannak. Ha levágod a lábujjad, az egy A-s "argh". Ha a Földet elpusztítják a Vogonok, hogy helyet adjanak egy csillagközi kerülőútnak, az egy 8 A-s "aaaaaaaargh". Ez a személy valamennyi "argh"-ot tanulmányozza, egytől nyolc A-ig. És kiderül, hogy a legkevésbé gyakori "argh" természetesen a legjobban frusztráló dolgokhoz kapcsolódik -- leszámítva, furcsán, a 80-as évek elejét. Szerintünk ez valahogy összefügg Reagennel.

(Laughter)

(Nevetés)

JM: There are many uses of this data, but the bottom line is that the historical record is being digitized. Google has started to digitize 15 million books. That's 12 percent of all the books that have ever been published. It's a sizable chunk of human culture. There's much more in culture: there's manuscripts, there's newspapers, there's things that are not text, like art and paintings. These are all going to end up on our computers, on computers across the world. And when that happens, that will transform the way we have to understand our past, our present and human culture.

JM: Sokféle felhasználási módja van ezeknek az adatoknak, de a lényeg, hogy a történelmi emlékek digitalizálva lettek. A Google elkezdett 15 millió könyvet bedigitalizálni. Ez 12 százaléka a valaha megjelent összes könyvnek. Az emberi kultúra egy méretes darabja. Sokkal több van a kultúrában: kéziratok, újságok, vannak dolgok, amelyek nem szövegek, mint a műalkotások és festmények. Hamarosan mindezek a számítógépünkön lesznek, számítógépeken világszerte. És amikor ez megtörténik, megváltozik a módszer, ahogy korábban a múltunkat, a jelenünket és emberi kultúránkat vizsgáltuk.

Thank you very much.

Köszönjük szépen.

(Applause)

(Taps)


