Peter Donnelly: How juries are fooled by statistics

As other speakers have said, it's a rather daunting experience -- a particularly daunting experience -- to be speaking in front of this audience. But unlike the other speakers, I'm not going to tell you about the mysteries of the universe, or the wonders of evolution, or the really clever, innovative ways people are attacking the major inequalities in our world. Or even the challenges of nation-states in the modern global economy. My brief, as you've just heard, is to tell you about statistics -- and, to be more precise, to tell you some exciting things about statistics. And that's -- (Laughter) -- that's rather more challenging than all the speakers before me and all the ones coming after me. (Laughter) One of my senior colleagues told me, when I was a youngster in this profession, rather proudly, that statisticians were people who liked figures but didn't have the personality skills to become accountants. (Laughter) And there's another in-joke among statisticians, and that's, "How do you tell the introverted statistician from the extroverted statistician?" To which the answer is, "The extroverted statistician's the one who looks at the other person's shoes." (Laughter) But I want to tell you something useful -- and here it is, so concentrate now. This evening, there's a reception in the University's Museum of Natural History. And it's a wonderful setting, as I hope you'll find, and a great icon to the best of the Victorian tradition. It's very unlikely -- in this special setting, and this collection of people -- but you might just find yourself talking to someone you'd rather wish that you weren't. So here's what you do. When they say to you, "What do you do?" -- you say, "I'm a statistician." (Laughter) Well, except they've been pre-warned now, and they'll know you're making it up. And then one of two things will happen. They'll either discover their long-lost cousin in the other corner of the room and run over and talk to them. 
Or they'll suddenly become parched and/or hungry -- and often both -- and sprint off for a drink and some food. And you'll be left in peace to talk to the person you really want to talk to.

It's one of the challenges in our profession to try and explain what we do. We're not top on people's lists for dinner party guests and conversations and so on. And it's something I've never really found a good way of doing. But my wife -- who was then my girlfriend -- managed it much better than I've ever been able to. Many years ago, when we first started going out, she was working for the BBC in Britain, and I was, at that stage, working in America. I was coming back to visit her. She told this to one of her colleagues, who said, "Well, what does your boyfriend do?" Sarah thought quite hard about the things I'd explained -- and she concentrated, in those days, on listening. (Laughter) Don't tell her I said that. And she was thinking about the work I did developing mathematical models for understanding evolution and modern genetics. So when her colleague said, "What does he do?" She paused and said, "He models things." (Laughter) Well, her colleague suddenly got much more interested than I had any right to expect and went on and said, "What does he model?" Well, Sarah thought a little bit more about my work and said, "Genes." (Laughter) "He models genes."

That is my first love, and that's what I'll tell you a little bit about. What I want to do more generally is to get you thinking about the place of uncertainty and randomness and chance in our world, and how we react to that, and how well we do or don't think about it. So you've had a pretty easy time up till now -- a few laughs, and all that kind of thing -- in the talks to date. You've got to think, and I'm going to ask you some questions. So here's the scene for the first question I'm going to ask you. Can you imagine tossing a coin successively? And for some reason -- which shall remain rather vague -- we're interested in a particular pattern. Here's one -- a head, followed by a tail, followed by a tail.

So suppose we toss a coin repeatedly. Then the pattern, head-tail-tail, that we've suddenly become fixated with happens here. And you can count: one, two, three, four, five, six, seven, eight, nine, 10 -- it happens after the 10th toss. So you might think there are more interesting things to do, but humor me for the moment. Imagine this half of the audience each get out coins, and they toss them until they first see the pattern head-tail-tail. The first time they do it, maybe it happens after the 10th toss, as here. The second time, maybe it's after the fourth toss. The next time, after the 15th toss. So you do that lots and lots of times, and you average those numbers. That's what I want this side to think about.

The other half of the audience doesn't like head-tail-tail -- they think, for deep cultural reasons, that's boring -- and they're much more interested in a different pattern -- head-tail-head. So, on this side, you get out your coins, and you toss and toss and toss. And you count the number of times until the pattern head-tail-head appears and you average them. OK? So on this side, you've got a number -- you've done it lots of times, so you get it accurately -- which is the average number of tosses until head-tail-tail. On this side, you've got a number -- the average number of tosses until head-tail-head.

So here's a deep mathematical fact -- if you've got two numbers, one of three things must be true. Either they're the same, or this one's bigger than that one, or that one's bigger than this one. So what's going on here? So you've all got to think about this, and you've all got to vote -- and we're not moving on until everyone's expressed a view. And I don't want to end up in a two-minute silence to give you more time to think about it. OK. So what you want to do is compare the average number of tosses until we first see head-tail-head with the average number of tosses until we first see head-tail-tail.

Who thinks that A is true -- that, on average, it'll take longer to see head-tail-head than head-tail-tail? Who thinks that B is true -- that on average, they're the same? Who thinks that C is true -- that, on average, it'll take less time to see head-tail-head than head-tail-tail? OK, who hasn't voted yet? Because that's really naughty -- I said you had to. (Laughter) OK. So most people think B is true. And you might be relieved to know even rather distinguished mathematicians think that. It's not. A is true here. It takes longer, on average. In fact, the average number of tosses till head-tail-head is 10 and the average number of tosses until head-tail-tail is eight. How could that be? Anything different about the two patterns? There is. Head-tail-head overlaps itself. If you went head-tail-head-tail-head, you can cunningly get two occurrences of the pattern in only five tosses. You can't do that with head-tail-tail. That turns out to be important.
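The experiment put to the two halves of the audience can be tried on a computer. What follows is a minimal sketch, assuming a fair coin simulated with Python's `random` module. The function names `tosses_until` and `average_tosses`, the seed and the number of trials are illustrative choices, not anything from the talk.

```python
import random


def tosses_until(pattern: str, rng: random.Random) -> int:
    """Toss a fair coin until `pattern` (e.g. "HTH") first appears; return the toss count."""
    recent = ""  # the last len(pattern) tosses seen so far
    count = 0
    while not recent.endswith(pattern):
        recent = (recent + rng.choice("HT"))[-len(pattern):]
        count += 1
    return count


def average_tosses(pattern: str, trials: int = 20_000, seed: int = 1) -> float:
    """Repeat the experiment `trials` times and average the toss counts."""
    rng = random.Random(seed)
    return sum(tosses_until(pattern, rng) for _ in range(trials)) / trials


avg_hth = average_tosses("HTH")
avg_htt = average_tosses("HTT")
print(f"head-tail-head: {avg_hth:.2f} tosses on average")
print(f"head-tail-tail: {avg_htt:.2f} tosses on average")
```

With enough trials the two averages should settle close to the 10 and 8 quoted above, with head-tail-head taking longer.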

There are two ways of thinking about this. I'll give you one of them. So imagine -- let's suppose we're doing it. Remember, this side is excited about head-tail-tail, and that side is excited about head-tail-head. Take the head-tail-head side first. We start tossing a coin, and we get a head -- and you start sitting on the edge of your seat because something great and wonderful, or awesome, might be about to happen. The next toss is a tail -- you get really excited. The champagne's on ice just next to you; you've got the glasses chilled to celebrate. You're waiting with bated breath for the final toss. And if it comes down a head, that's great. You're done, and you celebrate. If it's a tail -- well, rather disappointedly, you put the glasses away and put the champagne back. And you keep tossing, to wait for the next head, to get excited.

On the head-tail-tail side, there's a different experience. It's the same for the first two parts of the sequence. You're a little bit excited with the first head -- you get rather more excited with the next tail. Then you toss the coin. If it's a tail, you crack open the champagne. If it's a head you're disappointed, but you're still a third of the way to your pattern again, because that head can start a new head-tail-tail. And that's an informal way of presenting it -- that's why there's a difference. Another way of thinking about it -- if we tossed a coin eight million times, then we'd expect a million head-tail-heads and a million head-tail-tails -- but the head-tail-heads could occur in clumps. So if you want to put a million things down amongst eight million positions and you can have some of them overlapping, the clumps will be further apart. It's another way of getting the intuition.
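The champagne story can also be written as a small set of equations. This is a sketch assuming a fair coin. The symbols $E_0$, $E_H$ and $E_{HT}$ are notation introduced here, not used in the talk: they stand for the expected number of further tosses when nothing useful, a head, or head-tail has just been seen.

```latex
% Waiting for head-tail-head: a tail after head-tail sends you back to the start.
\begin{align*}
E_0    &= 1 + \tfrac12 E_H + \tfrac12 E_0, \\
E_H    &= 1 + \tfrac12 E_{HT} + \tfrac12 E_H, \\
E_{HT} &= 1 + \tfrac12 \cdot 0 + \tfrac12 E_0
\quad\Longrightarrow\quad E_0 = 10.
\end{align*}
% Waiting for head-tail-tail: a head after head-tail still leaves you a third of the way there.
\begin{align*}
E_0    &= 1 + \tfrac12 E_H + \tfrac12 E_0, \\
E_H    &= 1 + \tfrac12 E_{HT} + \tfrac12 E_H, \\
E_{HT} &= 1 + \tfrac12 \cdot 0 + \tfrac12 E_H
\quad\Longrightarrow\quad E_0 = 8.
\end{align*}
```

The two systems differ only in the last term of the third line: after a disappointing toss, one side restarts from $E_0$ and the other from $E_H$.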

What's the point I want to make? It's a very, very simple example, an easily stated question in probability, which every -- you're in good company -- everybody gets wrong. This is my little diversion into my real passion, which is genetics. There's a connection between head-tail-heads and head-tail-tails in genetics, and it's the following. When you toss a coin, you get a sequence of heads and tails. When you look at DNA, there's a sequence of not two things -- heads and tails -- but four letters -- As, Gs, Cs and Ts. And there are little chemical scissors, called restriction enzymes which cut DNA whenever they see particular patterns. And they're an enormously useful tool in modern molecular biology. And instead of asking the question, "How long until I see a head-tail-head?" -- you can ask, "How big will the chunks be when I use a restriction enzyme which cuts whenever it sees G-A-A-G, for example? How long will those chunks be?"
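The chunk-length question can be explored with a toy simulation. This is only a sketch under a strong assumption that the talk does not make: the four letters are treated as independent and equally likely, which real genomes are not, and the exact place where an enzyme cuts relative to its site is ignored. Under that assumption a four-letter site turns up at about one position in 256, so chunks should average roughly 256 letters. The function names, the seed and the sequence length are illustrative.

```python
import random


def cut_positions(dna: str, site: str) -> list[int]:
    """Return every index where `site` starts in `dna`, overlapping matches included."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def mean_fragment_length(dna: str, site: str) -> float:
    """Average distance between consecutive recognition sites."""
    cuts = cut_positions(dna, site)
    gaps = [b - a for a, b in zip(cuts, cuts[1:])]
    return sum(gaps) / len(gaps)


rng = random.Random(7)  # fixed seed so the run is repeatable
dna = "".join(rng.choices("ACGT", k=500_000))  # ASSUMPTION: uniform, independent letters
print(round(mean_fragment_length(dna, "GAAG")))
```

Like head-tail-head, G-A-A-G overlaps itself (it starts and ends with G), so its occurrences can clump in the same way.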

That's a rather trivial connection between probability and genetics. There's a much deeper connection, which I don't have time to go into, and that is that modern genetics is a really exciting area of science. And we'll hear some talks later in the conference specifically about that. But it turns out that a key part of unlocking the secrets in the information generated by modern experimental technologies has to do with fairly sophisticated -- you'll be relieved to know that I do something useful in my day job, rather more sophisticated than the head-tail-head story -- quite sophisticated computer modeling and mathematical modeling and modern statistical techniques. And I will give you two little snippets -- two examples -- of projects we're involved in in my group in Oxford, both of which I think are rather exciting. You know about the Human Genome Project. That was a project which aimed to read one copy of the human genome. The natural thing to do after you've done that is to look at how genomes differ -- and that's what this project, the International HapMap Project, which is a collaboration between labs in five or six different countries, is doing. Think of the Human Genome Project as learning what we've got in common, and the HapMap Project is trying to understand where there are differences between different people.

Why do we care about that? Well, there are lots of reasons. The most pressing one is that we want to understand how some differences make some people susceptible to one disease -- type-2 diabetes, for example -- and other differences make people more susceptible to heart disease, or stroke, or autism and so on. That's one big project. There's a second big project, recently funded by the Wellcome Trust in this country, involving very large studies -- thousands of individuals with each of eight different diseases, common diseases like type-1 and type-2 diabetes, and coronary heart disease, bipolar disease and so on -- to try and understand the genetics. To try and understand what it is about genetic differences that causes the diseases. Why do we want to do that? Because we understand very little about most human diseases. We don't know what causes them. And if we can get in at the bottom and understand the genetics, we'll have a window on the way the disease works, and a whole new way of thinking about disease therapies and preventative treatment and so on. So that's, as I said, the little diversion on my main love.

Back to some of the more mundane issues of thinking about uncertainty. Here's another quiz for you -- now suppose we've got a test for a disease which isn't infallible, but it's pretty good. It gets it right 99 percent of the time. And I take one of you, or I take someone off the street, and I test them for the disease in question. Let's suppose there's a test for HIV -- the virus that causes AIDS -- and the test says the person has the disease. What's the chance that they do? The test gets it right 99 percent of the time. So a natural answer is 99 percent. Who likes that answer? Come on -- everyone's got to get involved. Don't think you don't trust me anymore. (Laughter) Well, you're right to be a bit skeptical, because that's not the answer. That's what you might think. It's not the answer, and it's not because it's only part of the story. It actually depends on how common or how rare the disease is. So let me try and illustrate that. Here's a little caricature of a million individuals. So let's think about a disease that affects -- it's pretty rare, it affects one person in 10,000. Amongst these million individuals, most of them are healthy and some of them will have the disease. And in fact, if this is the prevalence of the disease, about 100 will have the disease and the rest won't. So now suppose we test them all. What happens? Well, amongst the 100 who do have the disease, the test will get it right 99 percent of the time, and 99 will test positive. Amongst all these other people who don't have the disease, the test will get it right 99 percent of the time. It'll only get it wrong one percent of the time. But there are so many of them that there'll be an enormous number of false positives. Put that another way -- of all of them who test positive -- so here they are, the individuals involved -- less than one in 100 actually have the disease. 
So even though we think the test is accurate, the important part of the story is there's another bit of information we need.
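The counting argument above is short enough to check directly. Here is a minimal sketch using the talk's own numbers (a million people, a prevalence of one in 10,000, a test that is right 99 percent of the time for ill and healthy people alike). The function name `screen` and its parameters are made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def screen(population: int, prevalence: Fraction, accuracy: Fraction):
    """Return (true positives, false positives, share of positives who are ill)."""
    ill = population * prevalence
    healthy = population - ill
    true_positives = ill * accuracy              # ill people correctly flagged
    false_positives = healthy * (1 - accuracy)   # healthy people wrongly flagged
    share_ill = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_ill


tp, fp, share = screen(1_000_000, Fraction(1, 10_000), Fraction(99, 100))
print(tp, fp, share)  # 99 9999 1/102
```

So 99 of the 10,098 positive results belong to people who have the disease, which is one in 102, a little less than one in 100.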

Here's the key intuition. What we have to do, once we know the test is positive, is to weigh up the plausibility, or the likelihood, of two competing explanations. Each of those explanations has a likely bit and an unlikely bit. One explanation is that the person doesn't have the disease -- that's overwhelmingly likely, if you pick someone at random -- but the test gets it wrong, which is unlikely. The other explanation is that the person does have the disease -- that's unlikely -- but the test gets it right, which is likely. And the number we end up with -- that number which is a little bit less than one in 100 -- is to do with how likely one of those explanations is relative to the other. Each of them taken together is unlikely.
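Written out for the disease example, the weighing of the two explanations looks like this. It is a sketch in notation not used in the talk, with $+$ standing for a positive test result and the numbers taken from the one-in-10,000 example.

```latex
\[
\frac{P(\text{disease} \mid +)}{P(\text{no disease} \mid +)}
= \frac{P(\text{disease}) \, P(+ \mid \text{disease})}
       {P(\text{no disease}) \, P(+ \mid \text{no disease})}
= \frac{\tfrac{1}{10\,000} \times \tfrac{99}{100}}
       {\tfrac{9\,999}{10\,000} \times \tfrac{1}{100}}
= \frac{99}{9\,999} = \frac{1}{101}
\]
```

Odds of 1 to 101 correspond to a probability of $1/102$, the number a little less than one in 100.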

Here's a more topical example of exactly the same thing. Those of you in Britain will know about what's become rather a celebrated case of a woman called Sally Clark, who had two babies who died suddenly. And initially, it was thought that they died of what's known informally as "cot death," and more formally as "Sudden Infant Death Syndrome." For various reasons, she was later charged with murder. And at the trial, her trial, a very distinguished pediatrician gave evidence that the chance of two cot deaths, innocent deaths, in a family like hers -- which was professional and non-smoking -- was one in 73 million. To cut a long story short, she was convicted at the time. Later, and fairly recently, acquitted on appeal -- in fact, on the second appeal. And just to set it in context, you can imagine how awful it is for someone to have lost one child, and then two, if they're innocent, to be convicted of murdering them. To be put through the stress of the trial, convicted of murdering them -- and to spend time in a women's prison, where all the other prisoners think you killed your children -- is a really awful thing to happen to someone. And it happened in large part here because the expert got the statistics horribly wrong, in two different ways.

So where did he get the one in 73 million number? He looked at some research, which said the chance of one cot death in a family like Sally Clark's is about one in 8,500. So he said, "I'll assume that if you have one cot death in a family, the chance of a second child dying from cot death isn't changed." So that's what statisticians would call an assumption of independence. It's like saying, "If you toss a coin and get a head the first time, that won't affect the chance of getting a head the second time." So if you toss a coin twice, the chance of getting a head twice is a half -- that's the chance the first time -- times a half -- the chance the second time. So he said, "Here, I'll assume that these events are independent. When you multiply 8,500 by itself, you get about 73 million." And none of this was stated to the court as an assumption or presented to the jury that way. Unfortunately here -- and, really, regrettably -- first of all, in a situation like this you'd have to verify it empirically. And secondly, it's palpably false. There are lots and lots of things that we don't know about sudden infant deaths. It might well be that there are environmental factors that we're not aware of, and it's pretty likely to be the case that there are genetic factors we're not aware of. So if a family suffers from one cot death, you'd put them in a high-risk group. They've probably got these environmental risk factors and/or genetic risk factors we don't know about. And to argue, then, that the chance of a second death is as if you didn't know that information is really silly. It's worse than silly -- it's really bad science. Nonetheless, that's how it was presented, and at trial nobody even argued it. That's the first problem. The second problem is, what does the number of one in 73 million mean? 
So after Sally Clark was convicted -- you can imagine, it made rather a splash in the press -- one of the journalists from one of Britain's more reputable newspapers wrote that what the expert had said was, "The chance that she was innocent was one in 73 million." Now, that's a logical error. It's exactly the same logical error as the logical error of thinking that after the disease test, which is 99 percent accurate, the chance of having the disease is 99 percent. In the disease example, we had to bear in mind two things, one of which was the possibility that the test got it right or not. And the other one was the chance, a priori, that the person had the disease or not. It's exactly the same in this context. There are two things involved -- two parts to the explanation. We want to know how likely, or relatively how likely, two different explanations are. One of them is that Sally Clark was innocent -- which is, a priori, overwhelmingly likely -- most mothers don't kill their children. And the second part of the explanation is that she suffered an incredibly unlikely event. Not as unlikely as one in 73 million, but nonetheless rather unlikely. The other explanation is that she was guilty. Now, we probably think a priori that's unlikely. And we certainly should think in the context of a criminal trial that that's unlikely, because of the presumption of innocence. And then if she were trying to kill the children, she succeeded. So the chance that she's innocent isn't one in 73 million. We don't know what it is. It has to do with weighing up the strength of the other evidence against her and the statistical evidence. We know the children died. What matters is how likely or unlikely, relative to each other, the two explanations are. And they're both implausible. There's a situation where errors in statistics had really profound and really unfortunate consequences. 
In fact, there are two other women who were convicted on the basis of the evidence of this pediatrician, who have subsequently been released on appeal. Many cases were reviewed. And it's particularly topical because he's currently facing a disrepute charge at Britain's General Medical Council.
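
The two errors described above can be made concrete with a few lines of arithmetic. What follows is only a sketch: the one-in-8,500 figure is the one quoted in the talk, while the tenfold risk multiplier and the probability of a double murder are hypothetical placeholders chosen purely for illustration, not real estimates.

```python
from fractions import Fraction

# Figure quoted in the talk: chance of one cot death in a family like this.
p_one = Fraction(1, 8500)

# First error: assuming independence means simply squaring that figure.
p_two_independent = p_one * p_one
print(p_two_independent)  # 1/72250000, i.e. "about one in 73 million"

# HYPOTHETICAL: suppose a first cot death makes a second one 10 times
# more likely (shared genetic or environmental risk factors).
risk_multiplier = 10
p_two_dependent = p_one * (p_one * risk_multiplier)
print(p_two_dependent)  # 1/7225000

# Second error: the chance of two natural deaths is not the chance of
# innocence. Given that both children died, compare the two explanations.
def prob_innocent(p_natural, p_murder):
    """Chance of innocence given the deaths, if these are the only two
    explanations and each has the stated prior probability."""
    return p_natural / (p_natural + p_murder)

# HYPOTHETICAL: suppose a double murder is a priori exactly as rare as
# the (already understated) one-in-73-million figure.
p_double_murder = p_two_independent
print(prob_innocent(p_two_independent, p_double_murder))  # 1/2
```

Even if the one-in-73-million figure is taken at face value, the chance of innocence in this illustration comes out at one half, not one in 73 million, because the competing explanation is also extremely unlikely. Other evidence in the case would shift that number in either direction.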

So just to conclude -- what are the take-home messages from this? Well, we know that randomness and uncertainty and chance are very much a part of our everyday life. It's also true -- and, although, you, as a collective, are very special in many ways, you're completely typical in not getting the examples I gave right. It's very well documented that people get things wrong. They make errors of logic in reasoning with uncertainty. We can cope with the subtleties of language brilliantly -- and there are interesting evolutionary questions about how we got here. We are not good at reasoning with uncertainty. That's an issue in our everyday lives. As you've heard from many of the talks, statistics underpins an enormous amount of research in science -- in social science, in medicine and indeed, quite a lot of industry. All of quality control, which has had a major impact on industrial processing, is underpinned by statistics. It's something we're bad at doing. At the very least, we should recognize that, and we tend not to. To go back to the legal context, at the Sally Clark trial all of the lawyers just accepted what the expert said. So if a pediatrician had come out and said to a jury, "I know how to build bridges. I've built one down the road. Please drive your car home over it," they would have said, "Well, pediatricians don't know how to build bridges. That's what engineers do." On the other hand, he came out and effectively said, or implied, "I know how to reason with uncertainty. I know how to do statistics." And everyone said, "Well, that's fine. He's an expert." So we need to understand where our competence is and isn't. Exactly the same kinds of issues arose in the early days of DNA profiling, when scientists, and lawyers and in some cases judges, routinely misrepresented evidence. Usually -- one hopes -- innocently, but misrepresented evidence. Forensic scientists said, "The chance that this guy's innocent is one in three million." 
Even if you believe the number, just like the 73 million to one, that's not what it meant. And there have been celebrated appeal cases in Britain and elsewhere because of that.
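
The DNA version of the same mistake can be sketched the same way. The one-in-three-million match probability is the figure quoted above; the size of the pool of other possible sources, and the assumption that everyone in it is equally likely a priori, are hypothetical and chosen only to keep the arithmetic simple.

```python
from fractions import Fraction

# Figure quoted in the talk: chance that a random innocent person matches.
match_probability = Fraction(1, 3_000_000)

def prob_source_given_match(other_people, match_probability):
    """Chance that a matching person is the true source, assuming the true
    source always matches and that, before the DNA evidence, this person and
    each of the other_people were equally likely to be the source."""
    expected_innocent_matches = other_people * match_probability
    return 1 / (1 + expected_innocent_matches)

# HYPOTHETICAL: six million other people could in principle be the source.
other_people = 6_000_000
p_source = prob_source_given_match(other_people, match_probability)
print(p_source)      # 1/3
print(1 - p_source)  # 2/3: chance of innocence on the DNA evidence alone
```

Under these assumptions the chance of innocence is two in three, nowhere near one in three million. The match probability only becomes compelling once it is combined with other evidence that narrows the pool.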

And just to finish in the context of the legal system. It's all very well to say, "Let's do our best to present the evidence." But more and more, in cases of DNA profiling -- this is another one -- we expect juries, who are ordinary people -- and it's documented they're very bad at this -- we expect juries to be able to cope with the sort of reasoning that goes on. In other spheres of life, if people argued -- well, except possibly for politics -- but in other spheres of life, if people argued illogically, we'd say that's not a good thing. We sort of expect it of politicians and don't hope for much more. In the case of uncertainty, we get it wrong all the time -- and at the very least, we should be aware of that, and ideally, we might try and do something about it. Thanks very much.
