Kenneth Cukier: Big data is better data

America's favorite pie is?

Audience: Apple. Kenneth Cukier: Apple. Of course it is. How do we know it? Because of data. You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place. Why? What happened? Okay, think about it. When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone's second favorite. (Laughter) But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice. You have more data. You can see something that you couldn't see when you only had smaller amounts of it.

Now, the point here is that more data doesn't just let us see more, more of the same thing we were looking at. More data allows us to see new. It allows us to see better. It allows us to see different. In this case, it allows us to see what America's favorite pie is: not apple.
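The pie effect can be reproduced with a few made-up households. This is only a sketch: the flavors, the households, and the rule that a family settles on the flavor with the lowest total rank (a Borda-style compromise) are all assumptions for illustration, not supermarket data.

```python
from collections import Counter

FLAVORS = ["apple", "cherry", "pecan", "pumpkin"]

def ranking(first_choice):
    """Full preference order: the first choice, then apple, then the rest."""
    rest = [f for f in FLAVORS if f not in (first_choice, "apple")]
    if first_choice == "apple":
        return ["apple"] + rest
    return [first_choice, "apple"] + rest

def family_pick(first_choices):
    """One shared pie: the flavor with the lowest total rank across the family."""
    orders = [ranking(choice) for choice in first_choices]
    return min(FLAVORS, key=lambda f: sum(order.index(f) for order in orders))

# Hypothetical households, each listed by its members' first choices.
FAMILIES = [
    ["cherry", "pecan", "pumpkin"],
    ["cherry", "cherry", "pecan"],
    ["pumpkin", "pecan", "apple"],
    ["pecan", "pumpkin", "cherry"],
]

# Large pies: one purchase per family. Small pies: one purchase per person.
family_sales = Counter(family_pick(family) for family in FAMILIES)
individual_sales = Counter(choice for family in FAMILIES for choice in family)

print("large pies:", family_sales.most_common())
print("small pies:", individual_sales.most_common())
```

With these invented households, apple takes three of the four large-pie sales, yet it is the first choice of only one person in twelve. The finer-grained data shows a preference that the aggregated data hid.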

Now, you probably all have heard the term big data. In fact, you're probably sick of hearing the term big data. It is true that there is a lot of hype around the term, and that is very unfortunate, because big data is an extremely important tool by which society is going to advance. In the past, we used to look at small data and think about what it would mean to try to understand the world, and now we have a lot more of it, more than we ever could before. What we find is that when we have a large body of data, we can fundamentally do things that we couldn't do when we only had smaller amounts. Big data is important, and big data is new, and when you think about it, the only way this planet is going to deal with its global challenges (to feed people, supply them with medical care, supply them with energy, electricity, and to make sure they're not burnt to a crisp because of global warming) is through the effective use of data.

So what is new about big data? What is the big deal? Well, to answer that question, let's think about what information looked like, physically looked like in the past. In 1908, on the island of Crete, archaeologists discovered a clay disc. They dated it from 2000 B.C., so it's 4,000 years old. Now, there are inscriptions on this disc, but we actually don't know what they mean. It's a complete mystery, but the point is that this is what information used to look like 4,000 years ago. This is how society stored and transmitted information.

Now, society hasn't advanced all that much. We still store information on discs, but now we can store a lot more information, more than ever before. Searching it is easier. Copying it is easier. Sharing it is easier. Processing it is easier. And what we can do is we can reuse this information for uses that we never even imagined when we first collected the data. In this respect, the data has gone from a stock to a flow, from something that is stationary and static to something that is fluid and dynamic. There is, if you will, a liquidity to information. The disc that was discovered on Crete, which is 4,000 years old, is heavy, it doesn't store a lot of information, and that information is unchangeable. By contrast, all of the files that Edward Snowden took from the National Security Agency in the United States fit on a memory stick the size of a fingernail, and they can be shared at the speed of light. More data. More.

Now, one reason why we have so much data in the world today is we are collecting things that we've always collected information on, but another reason why is we're taking things that have always been informational but have never been rendered into a data format and we are putting them into data. Think, for example, of the question of location. Take, for example, Martin Luther. If we wanted to know in the 1500s where Martin Luther was, we would have to follow him at all times, maybe with a feathery quill and an inkwell, and record it, but now think about what it looks like today. You know that somewhere, probably in a telecommunications carrier's database, there is a spreadsheet or at least a database entry that records where you've been at all times. If you have a cell phone, and that cell phone has GPS, but even if it doesn't have GPS, it can record your information. In this respect, location has been datafied.

Now think, for example, of the issue of posture, the way that you are all sitting right now, the way that you sit, the way that you sit, the way that you sit. It's all different, and it's a function of your leg length and your back and the contours of your back, and if I were to put sensors, maybe 100 sensors into all of your chairs right now, I could create an index that's fairly unique to you, sort of like a fingerprint, but it's not your finger.

So what could we do with this? Researchers in Tokyo are using it as a potential anti-theft device in cars. The idea is that the carjacker sits behind the wheel, tries to speed off, but the car recognizes that a non-approved driver is behind the wheel, and maybe the engine just stops, unless you type a password into the dashboard to say, "Hey, I have authorization to drive." Great.
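One way to picture the seat recognizing its driver is a simple profile match. The sketch below is an assumption throughout: the five-sensor readings, the `enroll` and `is_approved` helpers, and the distance threshold are hypothetical, and the talk does not say how the Tokyo researchers actually do it.

```python
import math

def enroll(readings):
    """Average several seat-pressure readings into one reference profile."""
    return [sum(column) / len(readings) for column in zip(*readings)]

def distance(a, b):
    """Euclidean distance between two equally long sensor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_approved(profile, reading, threshold=1.0):
    """Accept the driver only if the reading is close to the enrolled profile."""
    return distance(profile, reading) <= threshold

# Hypothetical readings from five sensors (a real seat might have 100).
owner_profile = enroll([
    [5.0, 3.1, 4.2, 1.0, 2.2],
    [5.2, 2.9, 4.0, 1.2, 2.0],
    [4.8, 3.0, 4.1, 1.1, 2.1],
])

owner_today = [5.1, 3.0, 4.0, 1.0, 2.2]
stranger = [3.0, 4.5, 2.0, 2.5, 3.5]

print("owner approved:", is_approved(owner_profile, owner_today))
print("stranger approved:", is_approved(owner_profile, stranger))
```

The threshold is the design choice that matters here: set it too tight and the owner gets locked out after changing coats, too loose and a stranger with a similar build drives away.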

What if every single car in Europe had this technology in it? What could we do then? Maybe, if we aggregated the data, maybe we could identify telltale signs that best predict that a car accident is going to take place in the next five seconds. And then what we will have datafied is driver fatigue, and the service would be that when the car senses that the person slumps into that position, it automatically knows to set an internal alarm that would vibrate the steering wheel and honk inside to say, "Hey, wake up, pay more attention to the road." These are the sorts of things we can do when we datafy more aspects of our lives.

So what is the value of big data? Well, think about it. You have more information. You can do things that you couldn't do before. One of the most impressive areas where this concept is taking place is in the area of machine learning. Machine learning is a branch of artificial intelligence, which itself is a branch of computer science. The general idea is that instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out for itself. And it will help you understand it by seeing its origins. In the 1950s, a computer scientist at IBM named Arthur Samuel liked to play checkers, so he wrote a computer program so he could play against the computer. He played. He won. He played. He won. He played. He won, because the computer only knew what a legal move was. Arthur Samuel knew something else. Arthur Samuel knew strategy. So he wrote a small sub-program alongside it operating in the background, and all it did was score the probability that a given board configuration would likely lead to a winning board versus a losing board after every move. He plays the computer. He wins. He plays the computer. He wins. He plays the computer. He wins. And then Arthur Samuel leaves the computer to play itself. It plays itself. It collects more data. It collects more data. It increases the accuracy of its prediction. And then Arthur Samuel goes back to the computer and he plays it, and he loses, and he plays it, and he loses, and he plays it, and he loses, and Arthur Samuel has created a machine that surpasses his ability in a task that he taught it.
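The idea of scoring positions from self-play can be sketched in a few lines. To be clear about the assumptions: this is not Samuel's program. It swaps checkers for a tiny stone-taking game (remove 1 to 3 stones, whoever takes the last stone wins), it plays purely random games against itself, and the names `self_play` and `best_move` are made up for illustration.

```python
import random
from collections import defaultdict

MAX_TAKE = 3   # a move removes 1, 2 or 3 stones
MAX_PILE = 7   # games start with 1 to 7 stones

def self_play(games, rng):
    """Play random games against itself and record, for each pile size,
    how often the player about to move from it went on to win."""
    wins = defaultdict(int)
    visits = defaultdict(int)
    for _ in range(games):
        pile = rng.randint(1, MAX_PILE)
        history = []   # (pile seen, player to move)
        player = 0
        while pile > 0:
            history.append((pile, player))
            pile -= rng.randint(1, min(MAX_TAKE, pile))
            player = 1 - player
        winner = 1 - player   # the player who just took the last stone
        for seen, mover in history:
            visits[seen] += 1
            wins[seen] += mover == winner
    return {pile: wins[pile] / visits[pile] for pile in visits}

def best_move(pile, win_rate):
    """Take the number of stones that leaves the opponent the pile with the
    lowest recorded win rate (an empty pile means they already lost)."""
    def opponent_chance(take):
        left = pile - take
        return 0.0 if left == 0 else win_rate[left]
    return min(range(1, min(MAX_TAKE, pile) + 1), key=opponent_chance)

win_rate = self_play(20000, random.Random(0))
for pile in range(1, MAX_PILE + 1):
    print(pile, "stones: take", best_move(pile, win_rate))
```

Nobody tells the program that leaving four stones is a strong move. That shows up only in the tallies: positions that led to losses for the player to move get low scores, and the move chooser steers the opponent toward them.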

And this idea of machine learning is going everywhere. How do you think we have self-driving cars? Are we any better off as a society enshrining all the rules of the road into software? No. Memory is cheaper? No. Algorithms are faster? No. Processors are better? No. All of those things matter, but that's not why. It's because we changed the nature of the problem. We changed the nature of the problem from one in which we tried to overtly and explicitly explain to the computer how to drive to one in which we say, "Here's a lot of data around the vehicle. You figure it out. You figure it out that that is a traffic light, that that traffic light is red and not green, that that means that you need to stop and not go forward."

Machine learning is at the basis of many of the things that we do online: search engines, Amazon's personalization algorithm, computer translation, voice recognition systems. Researchers recently have looked at the question of biopsies, cancerous biopsies, and they've asked the computer to determine, by looking at the data and survival rates, whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that this biopsy of the breast cancer cells is indeed cancerous. The problem: The medical literature only knew nine of them. Three of the traits were ones that people didn't need to look for, but that the machine spotted.

Now, there are dark sides to big data as well. It will improve our lives, but there are problems that we need to be conscious of, and the first one is the idea that we may be punished for predictions, that the police may use big data for their purposes, a little bit like "Minority Report." Now, it's a term called predictive policing, or algorithmic criminology, and the idea is that if we take a lot of data, for example where past crimes have been, we know where to send the patrols. That makes sense, but the problem, of course, is that it's not simply going to stop at location data, it's going to go down to the level of the individual. Why don't we use data about the person's high school transcript? Maybe we should use whether they're unemployed or not, their credit score, their web-surfing behavior, whether they're up late at night. Their Fitbit, when it's able to identify biochemistries, will show that they have aggressive thoughts. We may have algorithms that are likely to predict what we are about to do, and we may be held accountable before we've actually acted. Privacy was the central challenge in a small data era. In the big data age, the challenge will be safeguarding free will, moral choice, human volition, human agency.

There is another problem: Big data is going to steal our jobs. Big data and algorithms are going to challenge white collar, professional knowledge work in the 21st century in the same way that factory automation and the assembly line challenged blue collar labor in the 20th century. Think about a lab technician who is looking through a microscope at a cancer biopsy and determining whether it's cancerous or not. The person went to university. The person buys property. He or she votes. He or she is a stakeholder in society. And that person, as well as an entire fleet of professionals like that person, is going to find that their jobs are radically changed or actually completely eliminated. Now, we like to think that technology creates jobs over a period of time after a short, temporary period of dislocation, and that is true for the frame of reference with which we all live, the Industrial Revolution, because that's precisely what happened. But we forget something in that analysis: There are some categories of jobs that simply get eliminated and never come back. The Industrial Revolution wasn't very good if you were a horse. So we're going to need to be careful and take big data and adjust it for our needs, our very human needs. We have to be the master of this technology, not its servant. We are just at the outset of the big data era, and honestly, we are not very good at handling all the data that we can now collect. It's not just a problem for the National Security Agency. Businesses collect lots of data, and they misuse it too, and we need to get better at this, and this will take time. It's a little bit like the challenge that was faced by primitive man and fire. This is a tool, but this is a tool that, unless we're careful, will burn us.

Big data is going to transform how we live, how we work and how we think. It is going to help us manage our careers and lead lives of satisfaction and hope and happiness and health, but in the past, we've often looked at information technology and our eyes have only seen the T, the technology, the hardware, because that's what was physical. We now need to recast our gaze at the I, the information, which is less apparent, but in some ways a lot more important. Humanity can finally learn from the information that it can collect, as part of our timeless quest to understand the world and our place in it, and that's why big data is a big deal.

(Applause)
