Kenneth Cukier: Big data is better data

America's favorite pie is?

Audience: Apple. Kenneth Cukier: Apple. Of course it is. How do we know it? Because of data. You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place. Why? What happened? Okay, think about it. When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone's second favorite. (Laughter) But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice. You have more data. You can see something that you couldn't see when you only had smaller amounts of it.

Now, the point here is that more data doesn't just let us see more, more of the same thing we were looking at. More data allows us to see new. It allows us to see better. It allows us to see different. In this case, it allows us to see what America's favorite pie is: not apple.

Now, you probably all have heard the term big data. In fact, you're probably sick of hearing the term big data. It is true that there is a lot of hype around the term, and that is very unfortunate, because big data is an extremely important tool by which society is going to advance. In the past, we used to look at small data and think about what it would mean to try to understand the world, and now we have a lot more of it, more than we ever could before. What we find is that when we have a large body of data, we can fundamentally do things that we couldn't do when we only had smaller amounts. Big data is important, and big data is new, and when you think about it, the only way this planet is going to deal with its global challenges — to feed people, supply them with medical care, supply them with energy, electricity, and to make sure they're not burnt to a crisp because of global warming — is because of the effective use of data.

So what is new about big data? What is the big deal? Well, to answer that question, let's think about what information looked like, physically looked like, in the past. In 1908, on the island of Crete, archaeologists discovered a clay disc. They dated it from 2000 B.C., so it's 4,000 years old. Now, there are inscriptions on this disc, but we actually don't know what they mean. It's a complete mystery, but the point is that this is what information used to look like 4,000 years ago. This is how society stored and transmitted information.

Now, society hasn't advanced all that much. We still store information on discs, but now we can store a lot more information, more than ever before. Searching it is easier. Copying it is easier. Sharing it is easier. Processing it is easier. And what we can do is reuse this information for purposes that we never even imagined when we first collected the data. In this respect, the data has gone from a stock to a flow, from something that is stationary and static to something that is fluid and dynamic. There is, if you will, a liquidity to information. The disc that was discovered on Crete, which is 4,000 years old, is heavy, it doesn't store a lot of information, and that information is unchangeable. By contrast, all of the files that Edward Snowden took from the National Security Agency in the United States fit on a memory stick the size of a fingernail, and they can be shared at the speed of light. More data. More.

Now, one reason why we have so much data in the world today is that we are collecting things that we've always collected information on, but another reason is that we're taking things that have always been informational but have never been rendered into a data format, and we are putting them into data. Think, for example, of the question of location. Take, for example, Martin Luther. If we wanted to know in the 1500s where Martin Luther was, we would have to follow him at all times, maybe with a feathery quill and an inkwell, and record it, but now think about what it looks like today. You know that somewhere, probably in a telecommunications carrier's database, there is a spreadsheet or at least a database entry that records your information of where you've been at all times. If you have a cell phone, and that cell phone has GPS, but even if it doesn't have GPS, it can record your information. In this respect, location has been datafied.

Now think, for example, of the issue of posture, the way that you are all sitting right now, the way that you sit, the way that you sit, the way that you sit. It's all different, and it's a function of your leg length and your back and the contours of your back, and if I were to put sensors, maybe 100 sensors into all of your chairs right now, I could create an index that's fairly unique to you, sort of like a fingerprint, but it's not your finger.

So what could we do with this? Researchers in Tokyo are using it as a potential anti-theft device in cars. The idea is that the carjacker sits behind the wheel, tries to speed off, but the car recognizes that a non-approved driver is behind the wheel, and maybe the engine just stops, unless you type a password into the dashboard to say, "Hey, I have authorization to drive." Great.
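
The posture check described here can be sketched in a few lines. This is only an illustration under stated assumptions, not the Tokyo researchers' method: the sensor values, the four-sensor profile, the `threshold`, and the names `distance` and `is_approved` are all hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length lists of sensor readings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_approved(reading, profiles, threshold):
    """True if the reading is within `threshold` of any enrolled driver's profile."""
    return any(distance(reading, p) <= threshold for p in profiles.values())

# Hypothetical enrolled profile: pressure on four seat sensors, scaled to 0..1.
profiles = {"owner": [0.80, 0.60, 0.30, 0.90]}

owner_today = [0.79, 0.62, 0.31, 0.88]  # close to the enrolled profile
stranger = [0.40, 0.90, 0.70, 0.20]     # a very different way of sitting

print(is_approved(owner_today, profiles, threshold=0.1))
print(is_approved(stranger, profiles, threshold=0.1))
```

A real system would need many more sensors and a threshold tuned on real data; the point is only that once posture is a list of numbers, comparing drivers becomes a distance calculation.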

What if every single car in Europe had this technology in it? What could we do then? Maybe, if we aggregated the data, maybe we could identify telltale signs that best predict that a car accident is going to take place in the next five seconds. And then what we will have datafied is driver fatigue, and the service would be when the car senses that the person slumps into that position, automatically knows, hey, set an internal alarm that would vibrate the steering wheel, honk inside to say, "Hey, wake up, pay more attention to the road." These are the sorts of things we can do when we datafy more aspects of our lives.

So what is the value of big data? Well, think about it. You have more information. You can do things that you couldn't do before. One of the most impressive areas where this concept is taking place is in the area of machine learning. Machine learning is a branch of artificial intelligence, which itself is a branch of computer science. The general idea is that instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out for itself. And it will help you understand it by seeing its origins. In the 1950s, a computer scientist at IBM named Arthur Samuel liked to play checkers, so he wrote a computer program so he could play against the computer. He played. He won. He played. He won. He played. He won, because the computer only knew what a legal move was. Arthur Samuel knew something else. Arthur Samuel knew strategy. So he wrote a small sub-program alongside it operating in the background, and all it did was score the probability that a given board configuration would likely lead to a winning board versus a losing board after every move. He plays the computer. He wins. He plays the computer. He wins. He plays the computer. He wins. And then Arthur Samuel leaves the computer to play itself. It plays itself. It collects more data. It collects more data. It increases the accuracy of its prediction. And then Arthur Samuel goes back to the computer and he plays it, and he loses, and he plays it, and he loses, and he plays it, and he loses, and Arthur Samuel has created a machine that surpasses his ability in a task that he taught it.
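
The idea of scoring positions from self-play can be sketched on a much smaller game. This is not Samuel's checkers program; it is a toy under stated assumptions: a take-away game in which players alternately remove 1 to 3 stones and whoever takes the last stone wins. The names `self_play` and `best_move` are hypothetical.

```python
import random
from collections import defaultdict

def self_play(games, start=10, seed=0):
    """Play random games against itself and estimate, for each pile size,
    how often the player to move went on to win."""
    rng = random.Random(seed)
    wins = defaultdict(int)    # wins by the player who faced n stones
    visits = defaultdict(int)  # how often a player faced n stones
    for _ in range(games):
        stones, player, history = start, 0, []
        while stones > 0:
            history.append((stones, player))
            stones -= rng.randint(1, min(3, stones))
            player = 1 - player
        winner = 1 - player  # the player who took the last stone
        for n, p in history:
            visits[n] += 1
            wins[n] += (p == winner)
    return {n: wins[n] / visits[n] for n in visits}

def best_move(stones, win_prob):
    """Take the number of stones that leaves the opponent in the position
    with the lowest estimated chance of winning (an empty pile counts as 0)."""
    takes = range(1, min(3, stones) + 1)
    return min(takes, key=lambda t: win_prob.get(stones - t, 0.0))

win_prob = self_play(20000)
print(best_move(5, win_prob))
```

The program is told only the legal moves. The preference for leaving the opponent with four stones comes entirely from the recorded outcomes, and the estimates sharpen as more games are played.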

And this idea of machine learning is going everywhere. How do you think we have self-driving cars? Are we any better off as a society enshrining all the rules of the road into software? No. Memory is cheaper. No. Algorithms are faster. No. Processors are better. No. All of those things matter, but that's not why. It's because we changed the nature of the problem. We changed the nature of the problem from one in which we tried to overtly and explicitly explain to the computer how to drive to one in which we say, "Here's a lot of data around the vehicle. You figure it out. You figure it out that that is a traffic light, that that traffic light is red and not green, that that means that you need to stop and not go forward."

Machine learning is at the basis of many of the things that we do online: search engines, Amazon's personalization algorithm, computer translation, voice recognition systems. Researchers recently have looked at the question of biopsies, cancerous biopsies, and they've asked the computer to identify by looking at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that this biopsy of the breast cancer cells are indeed cancerous. The problem: The medical literature only knew nine of them. Three of the traits were ones that people didn't need to look for, but that the machine spotted.

Now, there are dark sides to big data as well. It will improve our lives, but there are problems that we need to be conscious of, and the first one is the idea that we may be punished for predictions, that the police may use big data for their purposes, a little bit like "Minority Report." Now, it's a term called predictive policing, or algorithmic criminology, and the idea is that if we take a lot of data, for example where past crimes have been, we know where to send the patrols. That makes sense, but the problem, of course, is that it's not simply going to stop on location data, it's going to go down to the level of the individual. Why don't we use data about the person's high school transcript? Maybe we should use the fact that they're unemployed or not, their credit score, their web-surfing behavior, whether they're up late at night. Their Fitbit, when it's able to identify biochemistries, will show that they have aggressive thoughts. We may have algorithms that are likely to predict what we are about to do, and we may be held accountable before we've actually acted. Privacy was the central challenge in a small data era. In the big data age, the challenge will be safeguarding free will, moral choice, human volition, human agency.

There is another problem: Big data is going to steal our jobs. Big data and algorithms are going to challenge white collar, professional knowledge work in the 21st century in the same way that factory automation and the assembly line challenged blue collar labor in the 20th century. Think about a lab technician who is looking through a microscope at a cancer biopsy and determining whether it's cancerous or not. The person went to university. The person buys property. He or she votes. He or she is a stakeholder in society. And that person's job, as well as an entire fleet of professionals like that person, is going to find that their jobs are radically changed or actually completely eliminated. Now, we like to think that technology creates jobs over a period of time after a short, temporary period of dislocation, and that is true for the frame of reference with which we all live, the Industrial Revolution, because that's precisely what happened. But we forget something in that analysis: There are some categories of jobs that simply get eliminated and never come back. The Industrial Revolution wasn't very good if you were a horse. So we're going to need to be careful and take big data and adjust it for our needs, our very human needs. We have to be the master of this technology, not its servant. We are just at the outset of the big data era, and honestly, we are not very good at handling all the data that we can now collect. It's not just a problem for the National Security Agency. Businesses collect lots of data, and they misuse it too, and we need to get better at this, and this will take time. It's a little bit like the challenge that was faced by primitive man and fire. This is a tool, but this is a tool that, unless we're careful, will burn us.

Big data is going to transform how we live, how we work and how we think. It is going to help us manage our careers and lead lives of satisfaction and hope and happiness and health, but in the past, we've often looked at information technology and our eyes have only seen the T, the technology, the hardware, because that's what was physical. We now need to recast our gaze at the I, the information, which is less apparent, but in some ways a lot more important. Humanity can finally learn from the information that it can collect, as part of our timeless quest to understand the world and our place in it, and that's why big data is a big deal.

(Applause)
