Peter Donnelly: How juries are fooled by statistics

As other speakers have said, it's a rather daunting experience -- a particularly daunting experience -- to be speaking in front of this audience. But unlike the other speakers, I'm not going to tell you about the mysteries of the universe, or the wonders of evolution, or the really clever, innovative ways people are attacking the major inequalities in our world. Or even the challenges of nation-states in the modern global economy. My brief, as you've just heard, is to tell you about statistics -- and, to be more precise, to tell you some exciting things about statistics. And that's -- (Laughter) -- that's rather more challenging than all the speakers before me and all the ones coming after me. (Laughter) One of my senior colleagues told me, when I was a youngster in this profession, rather proudly, that statisticians were people who liked figures but didn't have the personality skills to become accountants. (Laughter) And there's another in-joke among statisticians, and that's, "How do you tell the introverted statistician from the extroverted statistician?" To which the answer is, "The extroverted statistician's the one who looks at the other person's shoes." (Laughter) But I want to tell you something useful -- and here it is, so concentrate now. This evening, there's a reception in the University's Museum of Natural History. And it's a wonderful setting, as I hope you'll find, and a great icon to the best of the Victorian tradition. It's very unlikely -- in this special setting, and this collection of people -- but you might just find yourself talking to someone you'd rather wish that you weren't. So here's what you do. When they say to you, "What do you do?" -- you say, "I'm a statistician." (Laughter) Well, except they've been pre-warned now, and they'll know you're making it up. And then one of two things will happen. They'll either discover their long-lost cousin in the other corner of the room and run over and talk to them. 
Or they'll suddenly become parched and/or hungry -- and often both -- and sprint off for a drink and some food. And you'll be left in peace to talk to the person you really want to talk to.

It's one of the challenges in our profession to try and explain what we do. We're not top on people's lists for dinner party guests and conversations and so on. And it's something I've never really found a good way of doing. But my wife -- who was then my girlfriend -- managed it much better than I've ever been able to. Many years ago, when we first started going out, she was working for the BBC in Britain, and I was, at that stage, working in America. I was coming back to visit her. She told this to one of her colleagues, who said, "Well, what does your boyfriend do?" Sarah thought quite hard about the things I'd explained -- and she concentrated, in those days, on listening. (Laughter) Don't tell her I said that. And she was thinking about the work I did developing mathematical models for understanding evolution and modern genetics. So when her colleague said, "What does he do?" She paused and said, "He models things." (Laughter) Well, her colleague suddenly got much more interested than I had any right to expect and went on and said, "What does he model?" Well, Sarah thought a little bit more about my work and said, "Genes." (Laughter) "He models genes."

That is my first love, and that's what I'll tell you a little bit about. What I want to do more generally is to get you thinking about the place of uncertainty and randomness and chance in our world, and how we react to that, and how well we do or don't think about it. So you've had a pretty easy time up till now -- a few laughs, and all that kind of thing -- in the talks to date. You've got to think, and I'm going to ask you some questions. So here's the scene for the first question I'm going to ask you. Can you imagine tossing a coin successively? And for some reason -- which shall remain rather vague -- we're interested in a particular pattern. Here's one -- a head, followed by a tail, followed by a tail.

So suppose we toss a coin repeatedly. Then the pattern, head-tail-tail, that we've suddenly become fixated with happens here. And you can count: one, two, three, four, five, six, seven, eight, nine, 10 -- it happens after the 10th toss. So you might think there are more interesting things to do, but humor me for the moment. Imagine this half of the audience each get out coins, and they toss them until they first see the pattern head-tail-tail. The first time they do it, maybe it happens after the 10th toss, as here. The second time, maybe it's after the fourth toss. The next time, after the 15th toss. So you do that lots and lots of times, and you average those numbers. That's what I want this side to think about.

The other half of the audience doesn't like head-tail-tail -- they think, for deep cultural reasons, that's boring -- and they're much more interested in a different pattern -- head-tail-head. So, on this side, you get out your coins, and you toss and toss and toss. And you count the number of times until the pattern head-tail-head appears and you average them. OK? So on this side, you've got a number -- you've done it lots of times, so you get it accurately -- which is the average number of tosses until head-tail-tail. On this side, you've got a number -- the average number of tosses until head-tail-head.

So here's a deep mathematical fact -- if you've got two numbers, one of three things must be true. Either they're the same, or this one's bigger than this one, or this one's bigger than that one. So what's going on here? So you've all got to think about this, and you've all got to vote -- and we're not moving on. And I don't want to end up in the two-minute silence to give you more time to think about it, until everyone's expressed a view. OK. So what you want to do is compare the average number of tosses until we first see head-tail-head with the average number of tosses until we first see head-tail-tail.

Who thinks that A is true -- that, on average, it'll take longer to see head-tail-head than head-tail-tail? Who thinks that B is true -- that on average, they're the same? Who thinks that C is true -- that, on average, it'll take less time to see head-tail-head than head-tail-tail? OK, who hasn't voted yet? Because that's really naughty -- I said you had to. (Laughter) OK. So most people think B is true. And you might be relieved to know even rather distinguished mathematicians think that. It's not. A is true here. It takes longer, on average. In fact, the average number of tosses till head-tail-head is 10 and the average number of tosses until head-tail-tail is eight. How could that be? Anything different about the two patterns? There is. Head-tail-head overlaps itself. If you went head-tail-head-tail-head, you can cunningly get two occurrences of the pattern in only five tosses. You can't do that with head-tail-tail. That turns out to be important.
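
The figures of 10 and 8 can be checked with a short sketch. It relies on a standard result that the talk does not state: for a fair coin, the expected number of tosses until a pattern first appears is the sum of 2^k over every length k at which the pattern's first k symbols equal its last k symbols. The function names and the seeded simulation are illustrative assumptions, not part of the talk.

```python
import random


def expected_wait(pattern: str) -> int:
    """Expected tosses of a fair coin until `pattern` first appears.

    Adds 2**k for every length k where the first k symbols of the
    pattern equal its last k symbols (k = len(pattern) always counts).
    """
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[n - k:])


def simulated_wait(pattern: str, trials: int = 20_000, seed: int = 0) -> float:
    """Average number of tosses until `pattern` first appears, by simulation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        recent, tosses = "", 0
        while recent != pattern:
            # Keep only the last len(pattern) tosses.
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            tosses += 1
        total += tosses
    return total / trials


print(expected_wait("HTH"), expected_wait("HTT"))    # 10 8
print(simulated_wait("HTH"), simulated_wait("HTT"))  # close to 10 and 8
```

Head-tail-head picks up an extra 2 because its first symbol matches its last symbol, which is exactly the self-overlap described above; head-tail-tail has no such match.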

There are two ways of thinking about this. I'll give you one of them. So imagine -- let's suppose we're doing it. On this side -- remember, you're excited about head-tail-tail; you're excited about head-tail-head. We start tossing a coin, and we get a head -- and you start sitting on the edge of your seat because something great and wonderful, or awesome, might be about to happen. The next toss is a tail -- you get really excited. The champagne's on ice just next to you; you've got the glasses chilled to celebrate. You're waiting with bated breath for the final toss. And if it comes down a head, that's great. You're done, and you celebrate. If it's a tail -- well, rather disappointedly, you put the glasses away and put the champagne back. And you keep tossing, to wait for the next head, to get excited.

On this side, there's a different experience. It's the same for the first two parts of the sequence. You're a little bit excited with the first head -- you get rather more excited with the next tail. Then you toss the coin. If it's a tail, you crack open the champagne. If it's a head you're disappointed, but you're still a third of the way to your pattern again. And that's an informal way of presenting it -- that's why there's a difference. Another way of thinking about it -- if we tossed a coin eight million times, then we'd expect a million head-tail-heads and a million head-tail-tails -- but the head-tail-heads could occur in clumps. So if you want to put a million things down amongst eight million positions and you can have some of them overlapping, the clumps will be further apart. It's another way of getting the intuition.
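
The clumping argument can be illustrated with a small sketch. It uses 80,000 simulated tosses rather than eight million, so each pattern should turn up roughly 10,000 times; the seed, the sample size and the function names are assumptions made for this illustration.

```python
import random


def count_occurrences(tosses: str, pattern: str) -> int:
    """Count every position where `pattern` starts, overlaps included."""
    return sum(
        tosses.startswith(pattern, i)
        for i in range(len(tosses) - len(pattern) + 1)
    )


def count_self_overlaps(tosses: str, pattern: str, shift: int = 2) -> int:
    """Count occurrences followed by another occurrence `shift` tosses later."""
    return sum(
        tosses.startswith(pattern, i) and tosses.startswith(pattern, i + shift)
        for i in range(len(tosses) - len(pattern) - shift + 1)
    )


rng = random.Random(1)  # fixed seed so the run is repeatable
tosses = "".join(rng.choice("HT") for _ in range(80_000))

for pattern in ("HTH", "HTT"):
    print(pattern, count_occurrences(tosses, pattern), count_self_overlaps(tosses, pattern))
```

Both patterns occur about equally often, but only head-tail-head ever overlaps itself. Because some of its occurrences are bunched together, the gaps between bunches have to be longer.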

What's the point I want to make? It's a very, very simple example, an easily stated question in probability, which every -- you're in good company -- everybody gets wrong. This is my little diversion into my real passion, which is genetics. There's a connection between head-tail-heads and head-tail-tails in genetics, and it's the following. When you toss a coin, you get a sequence of heads and tails. When you look at DNA, there's a sequence of not two things -- heads and tails -- but four letters -- As, Gs, Cs and Ts. And there are little chemical scissors, called restriction enzymes which cut DNA whenever they see particular patterns. And they're an enormously useful tool in modern molecular biology. And instead of asking the question, "How long until I see a head-tail-head?" -- you can ask, "How big will the chunks be when I use a restriction enzyme which cuts whenever it sees G-A-A-G, for example? How long will those chunks be?"
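
A minimal sketch of the chunk-length question follows. It makes a simplifying assumption that is not from the talk: the enzyme cuts exactly at the start of each G-A-A-G site (real enzymes cut at an enzyme-specific position in or near the site). The function names and the short example sequence are made up for illustration.

```python
def find_sites(dna: str, site: str = "GAAG") -> list[int]:
    """Return every index where the recognition pattern `site` starts."""
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]


def fragment_lengths(dna: str, site: str = "GAAG") -> list[int]:
    """Lengths of the chunks left after cutting at the start of each site."""
    cuts = [0] + find_sites(dna, site) + [len(dna)]
    # Skip zero-length pieces (for example, a site at the very start).
    return [end - start for start, end in zip(cuts, cuts[1:]) if end > start]


dna = "TTGAAGCCCGAAGTT"  # made-up sequence, not real data
print(find_sites(dna))        # [2, 9]
print(fragment_lengths(dna))  # [2, 7, 6]
```

Like head-tail-head, G-A-A-G starts and ends with the same letter, so it can overlap itself, which is why the coin-tossing question carries over.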

That's a rather trivial connection between probability and genetics. There's a much deeper connection, which I don't have time to go into, and that is that modern genetics is a really exciting area of science. And we'll hear some talks later in the conference specifically about that. But it turns out that a key part of unlocking the secrets in the information generated by modern experimental technologies has to do with fairly sophisticated -- you'll be relieved to know that I do something useful in my day job, rather more sophisticated than the head-tail-head story -- quite sophisticated computer modeling, mathematical modeling and modern statistical techniques. And I will give you two little snippets -- two examples -- of projects we're involved in in my group in Oxford, both of which I think are rather exciting. You know about the Human Genome Project. That was a project which aimed to read one copy of the human genome. The natural thing to do after you've done that is what this project is about -- the International HapMap Project, which is a collaboration between labs in five or six different countries. Think of the Human Genome Project as learning what we've got in common, and the HapMap Project as trying to understand where there are differences between different people.

Why do we care about that? Well, there are lots of reasons. The most pressing one is that we want to understand how some differences make some people susceptible to one disease -- type-2 diabetes, for example -- and other differences make people more susceptible to heart disease, or stroke, or autism and so on. That's one big project. There's a second big project, recently funded by the Wellcome Trust in this country, involving very large studies -- thousands of individuals, with each of eight different diseases, common diseases like type-1 and type-2 diabetes, and coronary heart disease, bipolar disease and so on -- to try and understand the genetics. To try and understand what it is about genetic differences that causes the diseases. Why do we want to do that? Because we understand very little about most human diseases. We don't know what causes them. And if we can get in at the bottom and understand the genetics, we'll have a window on the way the disease works, and a whole new way of thinking about disease therapies and preventative treatment and so on. So that's, as I said, the little diversion on my main love.

Back to some of the more mundane issues of thinking about uncertainty. Here's another quiz for you -- now suppose we've got a test for a disease which isn't infallible, but it's pretty good. It gets it right 99 percent of the time. And I take one of you, or I take someone off the street, and I test them for the disease in question. Let's suppose there's a test for HIV -- the virus that causes AIDS -- and the test says the person has the disease. What's the chance that they do? The test gets it right 99 percent of the time. So a natural answer is 99 percent. Who likes that answer? Come on -- everyone's got to get involved. Don't think you don't trust me anymore. (Laughter) Well, you're right to be a bit skeptical, because that's not the answer. That's what you might think. It's not the answer, and it's not because it's only part of the story. It actually depends on how common or how rare the disease is. So let me try and illustrate that. Here's a little caricature of a million individuals. So let's think about a disease that affects -- it's pretty rare, it affects one person in 10,000. Amongst these million individuals, most of them are healthy and some of them will have the disease. And in fact, if this is the prevalence of the disease, about 100 will have the disease and the rest won't. So now suppose we test them all. What happens? Well, amongst the 100 who do have the disease, the test will get it right 99 percent of the time, and 99 will test positive. Amongst all these other people who don't have the disease, the test will get it right 99 percent of the time. It'll only get it wrong one percent of the time. But there are so many of them that there'll be an enormous number of false positives. Put that another way -- of all of them who test positive -- so here they are, the individuals involved -- less than one in 100 actually have the disease. 
So even though we think the test is accurate, the important part of the story is there's another bit of information we need.
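
The head count in this example can be reproduced with a few lines of arithmetic. The sketch below assumes, as the talk does, that the test is right 99 percent of the time for sick and healthy people alike; the function and variable names are illustrative.

```python
from fractions import Fraction


def positive_test_breakdown(population: int, prevalence: Fraction, accuracy: Fraction):
    """Split the people who test positive into true and false positives."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * accuracy            # sick and correctly flagged
    false_positives = healthy * (1 - accuracy)  # healthy but wrongly flagged
    chance_sick = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, chance_sick


tp, fp, chance = positive_test_breakdown(
    population=1_000_000,
    prevalence=Fraction(1, 10_000),
    accuracy=Fraction(99, 100),
)
print(tp, fp, chance)  # 99 9999 1/102
```

So 99 true positives sit among 9,999 false positives, and a positive result means a chance of 1 in 102 of actually having the disease, which is the "less than one in 100" quoted above.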

Here's the key intuition. What we have to do, once we know the test is positive, is to weigh up the plausibility, or the likelihood, of two competing explanations. Each of those explanations has a likely bit and an unlikely bit. One explanation is that the person doesn't have the disease -- that's overwhelmingly likely, if you pick someone at random -- but the test gets it wrong, which is unlikely. The other explanation is that the person does have the disease -- that's unlikely -- but the test gets it right, which is likely. And the number we end up with -- that number which is a little bit less than one in 100 -- is to do with how likely one of those explanations is relative to the other. Each of them taken together is unlikely.
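
The weighing of the two explanations can be written out with the numbers from the disease example (a prevalence of one in 10,000 and a test that is right 99 percent of the time). The symbols D, for "has the disease", and +, for "tests positive", are notation introduced here and are not used in the talk.

```latex
\begin{align*}
P(\text{no } D) \cdot P(+ \mid \text{no } D) &= 0.9999 \times 0.01 = 0.009999 \\
P(D) \cdot P(+ \mid D) &= 0.0001 \times 0.99 = 0.000099 \\
P(D \mid +) &= \frac{0.000099}{0.000099 + 0.009999} = \frac{1}{102} \approx 0.0098
\end{align*}
```

Each product on its own is small, but the first is 101 times larger than the second, and it is that ratio which fixes the final answer.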

Here's a more topical example of exactly the same thing. Those of you in Britain will know about what's become rather a celebrated case of a woman called Sally Clark, who had two babies who died suddenly. And initially, it was thought that they died of what's known informally as "cot death," and more formally as "Sudden Infant Death Syndrome." For various reasons, she was later charged with murder. And at the trial, her trial, a very distinguished pediatrician gave evidence that the chance of two cot deaths, innocent deaths, in a family like hers -- which was professional and non-smoking -- was one in 73 million. To cut a long story short, she was convicted at the time. Later, and fairly recently, acquitted on appeal -- in fact, on the second appeal. And just to set it in context, you can imagine how awful it is for someone to have lost one child, and then two, if they're innocent, to be convicted of murdering them. To be put through the stress of the trial, convicted of murdering them -- and to spend time in a women's prison, where all the other prisoners think you killed your children -- is a really awful thing to happen to someone. And it happened in large part here because the expert got the statistics horribly wrong, in two different ways.

So where did he get the one in 73 million number? He looked at some research, which said the chance of one cot death in a family like Sally Clark's is about one in 8,500. So he said, "I'll assume that if you have one cot death in a family, the chance of a second child dying from cot death isn't changed." So that's what statisticians would call an assumption of independence. It's like saying, "If you toss a coin and get a head the first time, that won't affect the chance of getting a head the second time." So if you toss a coin twice, the chance of getting a head twice is a half -- that's the chance the first time -- times a half -- the chance the second time. So he said, "Here, I'll assume that these events are independent. When you multiply 8,500 by itself, you get about 73 million." And none of this was stated to the court as an assumption or presented to the jury that way. Unfortunately here -- and, really, regrettably -- first of all, in a situation like this you'd have to verify it empirically. And secondly, it's palpably false. There are lots and lots of things that we don't know about sudden infant deaths. It might well be that there are environmental factors that we're not aware of, and it's pretty likely to be the case that there are genetic factors we're not aware of. So if a family suffers from one cot death, you'd put them in a high-risk group. They've probably got these environmental risk factors and/or genetic risk factors we don't know about. And to argue, then, that the chance of a second death is as if you didn't know that information is really silly. It's worse than silly -- it's really bad science. Nonetheless, that's how it was presented, and at trial nobody even argued it. That's the first problem. The second problem is, what does the number of one in 73 million mean?
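The independence arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names are invented for the sketch, and the tenfold risk multiplier is a made-up number chosen to show the effect of dependence, not an estimate from the talk or from any study.

```python
from fractions import Fraction


def joint_if_independent(p_single: Fraction) -> Fraction:
    """Chance of the event happening twice, ASSUMING the two are independent."""
    return p_single * p_single


def joint_with_dependence(p_first: Fraction, p_second_given_first: Fraction) -> Fraction:
    """Chance of both events when the second depends on the first."""
    return p_first * p_second_given_first


# Coin example from the talk: a half times a half.
two_heads = joint_if_independent(Fraction(1, 2))

# The expert's calculation, using the rounded one-in-8,500 figure.
two_cot_deaths = joint_if_independent(Fraction(1, 8500))

# HYPOTHETICAL: suppose a first cot death made a second one 10 times
# more likely. The multiplier is invented purely for illustration.
HYPOTHETICAL_RISK_MULTIPLIER = 10
second_given_first = Fraction(HYPOTHETICAL_RISK_MULTIPLIER, 8500)
two_cot_deaths_dependent = joint_with_dependence(Fraction(1, 8500), second_given_first)

print("two heads:", two_heads)
print("two cot deaths, independence assumed: 1 in", two_cot_deaths.denominator)
print("two cot deaths, hypothetical dependence: 1 in", two_cot_deaths_dependent.denominator)
```

With the rounded figure the product is one in 72.25 million, close to the 73 million quoted in court. Under the invented multiplier the joint chance becomes ten times larger, which is why the unstated independence assumption matters so much.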
So after Sally Clark was convicted -- you can imagine, it made rather a splash in the press -- one of the journalists from one of Britain's more reputable newspapers wrote that what the expert had said was, "The chance that she was innocent was one in 73 million." Now, that's a logical error. It's exactly the same logical error as thinking that after the disease test, which is 99 percent accurate, the chance of having the disease is 99 percent. In the disease example, we had to bear in mind two things, one of which was the possibility that the test got it right or not. And the other one was the chance, a priori, that the person had the disease or not. It's exactly the same in this context. There are two things involved -- two parts to the explanation. We want to know how likely, or relatively how likely, two different explanations are. One of them is that Sally Clark was innocent -- which is, a priori, overwhelmingly likely -- most mothers don't kill their children. And the second part of the explanation is that she suffered an incredibly unlikely event. Not as unlikely as one in 73 million, but nonetheless rather unlikely. The other explanation is that she was guilty. Now, we probably think a priori that's unlikely. And we certainly should think in the context of a criminal trial that that's unlikely, because of the presumption of innocence. And then if she were trying to kill the children, she succeeded. So the chance that she's innocent isn't one in 73 million. We don't know what it is. It has to do with weighing up the strength of the other evidence against her and the statistical evidence. We know the children died. What matters is how likely or unlikely, relative to each other, the two explanations are. And they're both implausible. That's a situation where errors in statistics had really profound and really unfortunate consequences.
In fact, there are two other women who were convicted on the basis of the evidence of this pediatrician, who have subsequently been released on appeal. Many cases were reviewed. And it's particularly topical because he's currently facing a disrepute charge at Britain's General Medical Council.
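The point about weighing two implausible explanations against each other can be made concrete with a small sketch. Every number fed to it below is hypothetical: as the talk says, the real chance of innocence isn't known. The function name and the sample rates are assumptions made for illustration, and the expert's one-in-73-million figure is taken at face value only to show that, even then, it is not the chance of innocence.

```python
from fractions import Fraction


def prob_innocent(p_natural_explanation: Fraction, p_murder_explanation: Fraction) -> Fraction:
    """Chance of innocence given that both children died.

    Each argument is the a priori chance of one complete explanation
    (two natural deaths, or two murders). Both explanations account for
    the deaths, so only their sizes relative to each other matter.
    """
    return p_natural_explanation / (p_natural_explanation + p_murder_explanation)


# The expert's figure, taken at face value for the sake of argument.
natural = Fraction(1, 73_000_000)

# HYPOTHETICAL a priori chances of a double murder; none of these is a real estimate.
same_rarity = prob_innocent(natural, Fraction(1, 73_000_000))
murder_ten_times_rarer = prob_innocent(natural, Fraction(1, 730_000_000))
murder_ten_times_commoner = prob_innocent(natural, Fraction(10, 73_000_000))

print("murder equally rare:       ", same_rarity)
print("murder ten times rarer:    ", murder_ten_times_rarer)
print("murder ten times commoner: ", murder_ten_times_commoner)
```

Under these invented inputs the answer ranges from 1/11 to 10/11, and none of the results is anywhere near one in 73 million. The sketch also leaves out all the other evidence in the case, which is the other part of the weighing.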

So just to conclude -- what are the take-home messages from this? Well, we know that randomness and uncertainty and chance are very much a part of our everyday life. It's also true -- and although you, as a collective, are very special in many ways, you're completely typical in not getting the examples I gave right. It's very well documented that people get things wrong. They make errors of logic in reasoning with uncertainty. We can cope with the subtleties of language brilliantly -- and there are interesting evolutionary questions about how we got here. We are not good at reasoning with uncertainty. That's an issue in our everyday lives. As you've heard from many of the talks, statistics underpins an enormous amount of research in science -- in social science, in medicine and indeed, quite a lot of industry. All of quality control, which has had a major impact on industrial processing, is underpinned by statistics. It's something we're bad at doing. At the very least, we should recognize that, and we tend not to. To go back to the legal context, at the Sally Clark trial all of the lawyers just accepted what the expert said. So if a pediatrician had come out and said to a jury, "I know how to build bridges. I've built one down the road. Please drive your car home over it," they would have said, "Well, pediatricians don't know how to build bridges. That's what engineers do." On the other hand, he came out and effectively said, or implied, "I know how to reason with uncertainty. I know how to do statistics." And everyone said, "Well, that's fine. He's an expert." So we need to understand where our competence is and isn't. Exactly the same kinds of issues arose in the early days of DNA profiling, when scientists, lawyers and in some cases judges routinely misrepresented evidence. Usually -- one hopes -- innocently, but misrepresented evidence. Forensic scientists said, "The chance that this guy's innocent is one in three million."
Even if you believe the number, just like the 73 million to one, that's not what it meant. And there have been celebrated appeal cases in Britain and elsewhere because of that.

And just to finish in the context of the legal system. It's all very well to say, "Let's do our best to present the evidence." But more and more, in cases of DNA profiling -- this is another one -- we expect juries, who are ordinary people -- and it's documented they're very bad at this -- we expect juries to be able to cope with the sort of reasoning that goes on. In other spheres of life, if people argued -- well, except possibly for politics -- but in other spheres of life, if people argued illogically, we'd say that's not a good thing. We sort of expect it of politicians and don't hope for much more. In the case of uncertainty, we get it wrong all the time -- and at the very least, we should be aware of that, and ideally, we might try and do something about it. Thanks very much.
