Cathy O'Neil: The era of blind faith in big data must end

Algorithms are everywhere. They sort and separate the winners from the losers. The winners get the job or a good credit card offer. The losers don't even get an interview or they pay more for insurance. We're being scored with secret formulas that we don't understand and that often have no system of appeal. That raises the question: What if the algorithms are wrong?

To build an algorithm you need two things: you need data, what happened in the past, and a definition of success, the thing you're looking for and often hoping for. You train an algorithm by having it look at that data and figure out what is associated with success. What situation leads to success?

Actually, everyone uses algorithms. They just don't formalize them in written code. Let me give you an example. I use an algorithm every day to make a meal for my family. The data I use is the ingredients in my kitchen, the time I have, the ambition I have, and I curate that data. I don't count those little packages of ramen noodles as food.

(Laughter)

My definition of success is: a meal is successful if my kids eat vegetables. It's very different from if my youngest son were in charge. He'd say success is if he gets to eat lots of Nutella. But I get to choose success. I am in charge. My opinion matters. That's the first rule of algorithms.

Algorithms are opinions embedded in code. That's really different from what most people think of algorithms. They think algorithms are objective and true and scientific. That's a marketing trick. It's also a marketing trick to intimidate you with algorithms, to make you trust and fear algorithms because you trust and fear mathematics. A lot can go wrong when we put blind faith in big data.

This is Kiri Soares. She's a high school principal in Brooklyn. In 2011, she told me her teachers were being scored with a complex, secret algorithm called the "value-added model." I told her, "Well, figure out what the formula is, show it to me. I'm going to explain it to you." She said, "Well, I tried to get the formula, but my Department of Education contact told me it was math and I wouldn't understand it."

It gets worse. The New York Post filed a Freedom of Information Act request, got all the teachers' names and all their scores and they published them as an act of teacher-shaming. When I tried to get the formulas, the source code, through the same means, I was told I couldn't. I was denied. I later found out that nobody in New York City had access to that formula. No one understood it. Then someone really smart got involved, Gary Rubinstein. He found 665 teachers from that New York Post data that actually had two scores. That could happen if they were teaching seventh grade math and eighth grade math. He decided to plot them. Each dot represents a teacher.

(Laughter)

What is that?

(Laughter)

That should never have been used for individual assessment. It's almost a random number generator.
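
One hedged way to make the "random number generator" remark concrete: if a score measured something stable about a teacher, the two scores the same teacher received should be strongly correlated. The sketch below computes a Pearson correlation over paired scores. The numbers are invented for illustration and are not Rubinstein's data; `pearson` and the variable names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Invented scores for six teachers who each taught two grades (assumption).
seventh_grade = [10, 85, 40, 95, 20, 60]
eighth_grade = [45, 50, 70, 55, 65, 30]

r = pearson(seventh_grade, eighth_grade)
print(f"correlation between a teacher's two scores: {r:.2f}")
```

A value near 1 would mean the two scores agree; a value near 0 would mean that knowing one score tells you almost nothing about the other.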

(Applause)

But it was. This is Sarah Wysocki. She got fired, along with 205 other teachers, from the Washington, DC school district, even though she had great recommendations from her principal and the parents of her kids.

I know what a lot of you guys are thinking, especially the data scientists, the AI experts here. You're thinking, "Well, I would never make an algorithm that inconsistent." But algorithms can go wrong, and even have deeply destructive effects, despite good intentions. And whereas an airplane that's designed badly crashes to the earth and everyone sees it, an algorithm designed badly can go on for a long time, silently wreaking havoc.

This is Roger Ailes.

(Laughter)

He founded Fox News in 1996. More than 20 women complained about sexual harassment. They said they weren't allowed to succeed at Fox News. He was ousted last year, but we've seen recently that the problems have persisted. That raises the question: What should Fox News do to turn over a new leaf?

Well, what if they replaced their hiring process with a machine-learning algorithm? That sounds good, right? Think about it. The data, what would the data be? A reasonable choice would be the last 21 years of applications to Fox News. Reasonable. What about the definition of success? A reasonable choice would be, well, who is successful at Fox News? I guess someone who, say, stayed there for four years and was promoted at least once. Sounds reasonable. And then the algorithm would be trained. It would be trained to learn what led to success, what kind of applications historically led to success by that definition. Now think about what would happen if we applied that to a current pool of applicants. It would filter out women because they do not look like people who were successful in the past.
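
The hiring example above can be sketched in a few lines. This is a deliberately naive stand-in for a machine-learning model, not a real hiring system: it learns the historical success rate for each feature value and averages those rates to score an applicant. The records, field names, and the `train` and `score` helpers are all invented for illustration.

```python
from collections import defaultdict

def train(history):
    """Learn the historical 'success' rate for each (field, value) pair."""
    counts = defaultdict(lambda: [0, 0])  # (field, value) -> [successes, total]
    for features, succeeded in history:
        for item in features.items():
            counts[item][1] += 1
            counts[item][0] += int(succeeded)
    return {item: wins / total for item, (wins, total) in counts.items()}

def score(model, applicant, default=0.5):
    """Average the learned success rates over the applicant's features."""
    rates = [model.get(item, default) for item in applicant.items()]
    return sum(rates) / len(rates)

# Invented history: "success" means stayed four years and was promoted once.
# In this made-up past, no woman was allowed to meet that definition.
history = (
    [({"gender": "man", "degree": "yes"}, True)] * 3
    + [({"gender": "man", "degree": "yes"}, False)] * 1
    + [({"gender": "woman", "degree": "yes"}, False)] * 4
)

model = train(history)
man = score(model, {"gender": "man", "degree": "yes"})
woman = score(model, {"gender": "woman", "degree": "yes"})
print(f"man: {man:.4f}  woman: {woman:.4f}")
```

Two applicants with identical qualifications get different scores, purely because the definition of success encodes who was allowed to succeed in the past.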

Algorithms don't make things fair if you just blithely, blindly apply algorithms. They don't make things fair. They repeat our past practices, our patterns. They automate the status quo. That would be great if we had a perfect world, but we don't. And I'll add that most companies don't have embarrassing lawsuits, but the data scientists in those companies are told to follow the data, to focus on accuracy. Think about what that means. Because we all have bias, it means they could be codifying sexism or any other kind of bigotry.

Thought experiment, because I like them: an entirely segregated society -- racially segregated, all towns, all neighborhoods and where we send the police only to the minority neighborhoods to look for crime. The arrest data would be very biased. What if, on top of that, we found the data scientists and paid the data scientists to predict where the next crime would occur? Minority neighborhood. Or to predict who the next criminal would be? A minority. The data scientists would brag about how great and how accurate their model would be, and they'd be right.
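
A minimal sketch of this thought experiment, under loudly stated assumptions: two neighborhoods with the same true number of crimes, police sent to only one of them, and a "model" that simply predicts the neighborhood with the most recorded arrests. All names and numbers are hypothetical.

```python
def recorded_arrests(true_crimes, patrolled):
    """Crimes only enter the data where police are present to record them."""
    return {hood: (count if hood in patrolled else 0)
            for hood, count in true_crimes.items()}

def predict_hotspot(arrests):
    """Predict the next crime location as the place with the most arrests."""
    return max(arrests, key=arrests.get)

# Assumption: both neighborhoods have the same true amount of crime.
true_crimes = {"minority_neighborhood": 50, "majority_neighborhood": 50}
patrolled = {"minority_neighborhood"}

for year in range(3):
    arrests = recorded_arrests(true_crimes, patrolled)
    prediction = predict_hotspot(arrests)
    # "Accuracy" measured against the arrest data the model was built on.
    accuracy = arrests[prediction] / sum(arrests.values())
    print(year, prediction, accuracy)
    patrolled = {prediction}  # next year's patrols follow the prediction
```

Measured against the arrest data, the prediction looks perfect every year, even though the underlying crime is identical in both neighborhoods.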

Now, reality isn't that drastic, but we do have severe segregation in many cities and towns, and we have plenty of evidence of biased policing and justice system data. And we actually do predict hotspots, places where crimes will occur. And we do predict, in fact, the individual criminality, the criminality of individuals. The news organization ProPublica recently looked into one of those "recidivism risk" algorithms, as they're called, being used in Florida during sentencing by judges. Bernard, on the left, the black man, was scored a 10 out of 10. Dylan, on the right, 3 out of 10. 10 out of 10, high risk. 3 out of 10, low risk. They were both brought in for drug possession. They both had records, but Dylan had a felony and Bernard didn't. This matters, because the higher your score, the more likely you are to be given a longer sentence.

What's going on? Data laundering. It's a process by which technologists hide ugly truths inside black box algorithms and call them objective; call them meritocratic. When they're secret, important and destructive, I've coined a term for these algorithms: "weapons of math destruction."

(Laughter)

(Applause)

They're everywhere, and it's not a mistake. These are private companies building private algorithms for private ends. Even the ones I talked about for teachers and the public police, those were built by private companies and sold to the government institutions. They call it their "secret sauce" -- that's why they can't tell us about it. It's also private power. They are profiting from wielding the authority of the inscrutable. Now you might think, since all this stuff is private and there's competition, maybe the free market will solve this problem. It won't. There's a lot of money to be made in unfairness.

Also, we're not economic rational agents. We all are biased. We're all racist and bigoted in ways that we wish we weren't, in ways that we don't even know. We know this, though, in aggregate, because sociologists have consistently demonstrated this with these experiments they build, where they send out a bunch of job applications, equally qualified but some have white-sounding names and some have black-sounding names, and it's always disappointing, the results -- always.

So we are the ones that are biased, and we are injecting those biases into the algorithms by choosing what data to collect, like I chose not to think about ramen noodles -- I decided it was irrelevant. But by trusting the data that's actually picking up on past practices and by choosing the definition of success, how can we expect the algorithms to emerge unscathed? We can't. We have to check them. We have to check them for fairness.

The good news is, we can check them for fairness. Algorithms can be interrogated, and they will tell us the truth every time. And we can fix them. We can make them better. I call this an algorithmic audit, and I'll walk you through it.

First, data integrity check. For the recidivism risk algorithm I talked about, a data integrity check would mean we'd have to come to terms with the fact that in the US, whites and blacks smoke pot at the same rate but blacks are far more likely to be arrested -- four or five times more likely, depending on the area. What is that bias looking like in other crime categories, and how do we account for it?
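
A rough sketch of what such a data integrity check might compute, assuming you have both recorded arrests and an independent estimate of the underlying behavior (for example, survey data). The counts below are invented to mirror the "same rate of use, four times the arrests" pattern described above; `rate` and `disparity` are hypothetical helpers.

```python
def rate(events, population):
    """Events per person in a group."""
    return events / population

def disparity(group_a, group_b):
    """Ratio of group A's rate to group B's rate; 1.0 means no difference."""
    return rate(*group_a) / rate(*group_b)

# Invented counts as (events, population) per group (assumption).
use_black, use_white = (120, 1000), (120, 1000)      # self-reported use
arrest_black, arrest_white = (40, 1000), (10, 1000)  # recorded arrests

use_ratio = disparity(use_black, use_white)
arrest_ratio = disparity(arrest_black, arrest_white)

# How much the arrest data overstates the difference in behavior.
data_bias = arrest_ratio / use_ratio
print(use_ratio, arrest_ratio, data_bias)
```

A model trained on the arrest column alone would treat that gap as a difference in behavior, when by construction here it is a difference in enforcement.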

Second, we should think about the definition of success, audit that. Remember -- with the hiring algorithm? We talked about it. Someone who stays for four years and is promoted once? Well, that is a successful employee, but it's also an employee that is supported by their culture. That said, it can also be quite biased. We need to separate those two things. We should look to the blind orchestra audition as an example. That's where the people auditioning are behind a sheet. What I want to think about there is the people who are listening have decided what's important and they've decided what's not important, and they're not getting distracted by that. When the blind orchestra auditions started, the number of women in orchestras went up by a factor of five.

Next, we have to consider accuracy. This is where the value-added model for teachers would fail immediately. No algorithm is perfect, of course, so we have to consider the errors of every algorithm. How often are there errors, and for whom does this model fail? What is the cost of that failure?
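
One hedged way to ask "for whom does this model fail?" is to break a single error measure out by group rather than reporting one overall accuracy figure. The sketch below computes the false positive rate per group for a risk score: the share of people who did not reoffend but were still labeled high risk. The records and field names are invented.

```python
def false_positive_rate(records):
    """Share of people who did NOT reoffend but were labeled high risk."""
    negatives = [r for r in records if not r["reoffended"]]
    if not negatives:
        return None  # undefined when nobody in the group is a true negative
    flagged = sum(1 for r in negatives if r["high_risk"])
    return flagged / len(negatives)

def fpr_by_group(records):
    """Compute the false positive rate separately for each group."""
    groups = {}
    for r in records:
        groups.setdefault(r["group"], []).append(r)
    return {g: false_positive_rate(rs) for g, rs in groups.items()}

def person(group, high_risk, reoffended):
    return {"group": group, "high_risk": high_risk, "reoffended": reoffended}

# Invented records (assumption): four non-reoffenders per group.
records = (
    [person("a", True, False)] * 2 + [person("a", False, False)] * 2
    + [person("b", True, False)] * 1 + [person("b", False, False)] * 3
    + [person("a", True, True), person("b", True, True)]
)

rates = fpr_by_group(records)
print(rates)
```

If the rates differ between groups, the cost of the model's mistakes is not shared evenly, which is exactly the question an accuracy audit is meant to surface.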

And finally, we have to consider the long-term effects of algorithms, the feedback loops they are engendering. That sounds abstract, but imagine if Facebook engineers had considered that before they decided to show us only things that our friends had posted.

I have two more messages, one for the data scientists out there. Data scientists: we should not be the arbiters of truth. We should be translators of ethical discussions that happen in larger society.

(Applause)

And the rest of you, the non-data scientists: this is not a math test. This is a political fight. We need to demand accountability for our algorithmic overlords.

(Applause)

The era of blind faith in big data must end.

Thank you very much.

(Applause)
