Peter Donnelly: How juries are fooled by statistics

As other speakers have said, it's a rather daunting experience -- a particularly daunting experience -- to be speaking in front of this audience. But unlike the other speakers, I'm not going to tell you about the mysteries of the universe, or the wonders of evolution, or the really clever, innovative ways people are attacking the major inequalities in our world. Or even the challenges of nation-states in the modern global economy. My brief, as you've just heard, is to tell you about statistics -- and, to be more precise, to tell you some exciting things about statistics. And that's -- (Laughter) -- that's rather more challenging than all the speakers before me and all the ones coming after me. (Laughter) One of my senior colleagues told me, when I was a youngster in this profession, rather proudly, that statisticians were people who liked figures but didn't have the personality skills to become accountants. (Laughter) And there's another in-joke among statisticians, and that's, "How do you tell the introverted statistician from the extroverted statistician?" To which the answer is, "The extroverted statistician's the one who looks at the other person's shoes." (Laughter) But I want to tell you something useful -- and here it is, so concentrate now. This evening, there's a reception in the University's Museum of Natural History. And it's a wonderful setting, as I hope you'll find, and a great icon to the best of the Victorian tradition. It's very unlikely -- in this special setting, and this collection of people -- but you might just find yourself talking to someone you'd rather wish that you weren't. So here's what you do. When they say to you, "What do you do?" -- you say, "I'm a statistician." (Laughter) Well, except they've been pre-warned now, and they'll know you're making it up. And then one of two things will happen. They'll either discover their long-lost cousin in the other corner of the room and run over and talk to them. 
Or they'll suddenly become parched and/or hungry -- and often both -- and sprint off for a drink and some food. And you'll be left in peace to talk to the person you really want to talk to.

It's one of the challenges in our profession to try and explain what we do. We're not top on people's lists for dinner party guests and conversations and so on. And it's something I've never really found a good way of doing. But my wife -- who was then my girlfriend -- managed it much better than I've ever been able to. Many years ago, when we first started going out, she was working for the BBC in Britain, and I was, at that stage, working in America. I was coming back to visit her. She told this to one of her colleagues, who said, "Well, what does your boyfriend do?" Sarah thought quite hard about the things I'd explained -- and she concentrated, in those days, on listening. (Laughter) Don't tell her I said that. And she was thinking about the work I did developing mathematical models for understanding evolution and modern genetics. So when her colleague said, "What does he do?" She paused and said, "He models things." (Laughter) Well, her colleague suddenly got much more interested than I had any right to expect and went on and said, "What does he model?" Well, Sarah thought a little bit more about my work and said, "Genes." (Laughter) "He models genes."

That is my first love, and that's what I'll tell you a little bit about. What I want to do more generally is to get you thinking about the place of uncertainty and randomness and chance in our world, and how we react to that, and how well we do or don't think about it. So you've had a pretty easy time up till now -- a few laughs, and all that kind of thing -- in the talks to date. Now you've got to think, and I'm going to ask you some questions. So here's the scene for the first question I'm going to ask you. Can you imagine tossing a coin successively? And for some reason -- which shall remain rather vague -- we're interested in a particular pattern. Here's one -- a head, followed by a tail, followed by a tail.

So suppose we toss a coin repeatedly. Then the pattern, head-tail-tail, that we've suddenly become fixated with happens here. And you can count: one, two, three, four, five, six, seven, eight, nine, 10 -- it happens after the 10th toss. So you might think there are more interesting things to do, but humor me for the moment. Imagine this half of the audience each get out coins, and they toss them until they first see the pattern head-tail-tail. The first time they do it, maybe it happens after the 10th toss, as here. The second time, maybe it's after the fourth toss. The next time, after the 15th toss. So you do that lots and lots of times, and you average those numbers. That's what I want this side to think about.

The other half of the audience doesn't like head-tail-tail -- they think, for deep cultural reasons, that's boring -- and they're much more interested in a different pattern -- head-tail-head. So, on this side, you get out your coins, and you toss and toss and toss. And you count the number of times until the pattern head-tail-head appears and you average them. OK? So on this side, you've got a number -- you've done it lots of times, so you get it accurately -- which is the average number of tosses until head-tail-tail. On this side, you've got a number -- the average number of tosses until head-tail-head.

So here's a deep mathematical fact -- if you've got two numbers, one of three things must be true. Either they're the same, or this one's bigger than this one, or this one's bigger than that one. So what's going on here? So you've all got to think about this, and you've all got to vote -- and we're not moving on. And I don't want to end up in the two-minute silence to give you more time to think about it, until everyone's expressed a view. OK. So what you want to do is compare the average number of tosses until we first see head-tail-head with the average number of tosses until we first see head-tail-tail.

Who thinks that A is true -- that, on average, it'll take longer to see head-tail-head than head-tail-tail? Who thinks that B is true -- that on average, they're the same? Who thinks that C is true -- that, on average, it'll take less time to see head-tail-head than head-tail-tail? OK, who hasn't voted yet? Because that's really naughty -- I said you had to. (Laughter) OK. So most people think B is true. And you might be relieved to know even rather distinguished mathematicians think that. It's not. A is true here. It takes longer, on average. In fact, the average number of tosses till head-tail-head is 10 and the average number of tosses until head-tail-tail is eight. How could that be? Anything different about the two patterns? There is. Head-tail-head overlaps itself. If you went head-tail-head-tail-head, you can cunningly get two occurrences of the pattern in only five tosses. You can't do that with head-tail-tail. That turns out to be important.
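
To check the 10-versus-8 answer empirically, here is a minimal Python sketch of the audience experiment described above. Each trial tosses a simulated fair coin until the pattern first appears, and the counts are then averaged. The function names, the number of trials and the random seed are illustrative choices, not anything taken from the talk.

```python
import random


def tosses_until(pattern: str, rng: random.Random) -> int:
    """Toss a simulated fair coin ('H' or 'T') until `pattern` first appears.

    Returns how many tosses that took, counting the final one.
    """
    window = ""  # the most recent len(pattern) tosses
    tosses = 0
    while True:
        window = (window + rng.choice("HT"))[-len(pattern):]
        tosses += 1
        if window == pattern:
            return tosses


def average_tosses(pattern: str, trials: int = 100_000, seed: int = 2024) -> float:
    """Repeat the experiment `trials` times and average the waiting times."""
    rng = random.Random(seed)
    return sum(tosses_until(pattern, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for pattern in ("HTH", "HTT"):
        print(pattern, round(average_tosses(pattern), 2))
```

With this many trials, the two averages should land close to 10 and 8 respectively. The exact digits printed depend on the seed.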

There are two ways of thinking about this. I'll give you one of them. So imagine -- let's suppose we're doing it. On this side -- remember, you're excited about head-tail-tail; you're excited about head-tail-head. We start tossing a coin, and we get a head -- and you start sitting on the edge of your seat because something great and wonderful, or awesome, might be about to happen. The next toss is a tail -- you get really excited. The champagne's on ice just next to you; you've got the glasses chilled to celebrate. You're waiting with bated breath for the final toss. And if it comes down a head, that's great. You're done, and you celebrate. If it's a tail -- well, rather disappointedly, you put the glasses away and put the champagne back. And you keep tossing, to wait for the next head, to get excited.

On this side, there's a different experience. It's the same for the first two parts of the sequence. You're a little bit excited with the first head -- you get rather more excited with the next tail. Then you toss the coin. If it's a tail, you crack open the champagne. If it's a head you're disappointed, but you're still a third of the way to your pattern again. And that's an informal way of presenting it -- that's why there's a difference. Another way of thinking about it -- if we tossed a coin eight million times, then we'd expect a million head-tail-heads and a million head-tail-tails -- but the head-tail-heads could occur in clumps. So if you want to put a million things down amongst eight million positions and you can have some of them overlapping, the clumps will be further apart. It's another way of getting the intuition.
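
The "how far along the pattern am I?" story can be made exact. This is a minimal sketch assuming a fair coin. Treat the number of pattern letters matched so far as a state, and work out where each toss sends you:

- For head-tail-head, after "H-T" a head finishes the pattern and a tail sends you back to the start.
- For head-tail-tail, after "H-T" a head still leaves you one third of the way.

Then solve the resulting linear equations for the expected number of tosses from the start. This is standard first-step analysis of a Markov chain, not necessarily how the speaker derived his figures. The function names are hypothetical.

```python
from fractions import Fraction


def advance(pattern: str, matched: int, toss: str) -> int:
    """Letters of `pattern` matched after `toss`, given `matched` already matched.

    This is the length of the longest prefix of `pattern` that is also a suffix
    of pattern[:matched] + toss, so any partial progress is kept.
    """
    text = pattern[:matched] + toss
    for k in range(min(len(text), len(pattern)), 0, -1):
        if text.endswith(pattern[:k]):
            return k
    return 0


def expected_tosses(pattern: str, outcomes: str = "HT") -> Fraction:
    """Exact expected number of tosses until `pattern` first appears."""
    m = len(pattern)
    p = Fraction(1, len(outcomes))  # each outcome equally likely
    # One equation per unfinished state s, with E[m] = 0:
    #   E[s] - sum over outcomes of p * E[advance(s, outcome)] = 1
    # Each row is [coefficients for E[0..m-1]..., right-hand side].
    rows = []
    for s in range(m):
        row = [Fraction(0)] * m + [Fraction(1)]
        row[s] += 1
        for toss in outcomes:
            t = advance(pattern, s, toss)
            if t < m:  # completing the pattern contributes E[m] = 0
                row[t] -= p
        rows.append(row)
    # Gauss-Jordan elimination with exact fractions.
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return rows[0][m]  # expected tosses starting from nothing matched


if __name__ == "__main__":
    print(expected_tosses("HTH"), expected_tosses("HTT"))  # 10 8
```

The only difference between the two systems of equations is where a disappointing third toss sends you. For head-tail-head it sends you back to the start, while for head-tail-tail it leaves you one letter in. That asymmetry alone is what turns 8 into 10.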

What's the point I want to make? It's a very, very simple example, an easily stated question in probability, which every -- you're in good company -- everybody gets wrong. This is my little diversion into my real passion, which is genetics. There's a connection between head-tail-heads and head-tail-tails in genetics, and it's the following. When you toss a coin, you get a sequence of heads and tails. When you look at DNA, there's a sequence of not two things -- heads and tails -- but four letters -- As, Gs, Cs and Ts. And there are little chemical scissors, called restriction enzymes which cut DNA whenever they see particular patterns. And they're an enormously useful tool in modern molecular biology. And instead of asking the question, "How long until I see a head-tail-head?" -- you can ask, "How big will the chunks be when I use a restriction enzyme which cuts whenever it sees G-A-A-G, for example? How long will those chunks be?"
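
The same question for DNA can be sketched in Python, under a deliberately simple assumption that real genomes do not satisfy: each base is A, C, G or T independently with probability 1/4. For uniform, independent letters, a known shortcut gives the expected wait for a pattern. You sum (alphabet size)^k over every k for which the pattern's first k letters equal its last k letters. That sum is 2 + 8 = 10 for head-tail-head, 8 for head-tail-tail, and 4 + 256 = 260 for G-A-A-G. The function names and the simulated sequence length are hypothetical.

```python
import random


def expected_wait(pattern: str, alphabet_size: int) -> int:
    """Expected number of independent, uniformly random letters until `pattern` first appears.

    Sums alphabet_size**k over every k where the length-k prefix of the pattern
    equals its length-k suffix (k = len(pattern) always counts).
    """
    m = len(pattern)
    return sum(alphabet_size ** k
               for k in range(1, m + 1)
               if pattern[:k] == pattern[m - k:])


def mean_site_spacing(site: str, length: int = 1_000_000, seed: int = 7) -> float:
    """Average distance between successive occurrences of `site` in a simulated
    uniform random DNA sequence (a stand-in for the average chunk length)."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", k=length))
    starts = []
    i = seq.find(site)
    while i != -1:
        starts.append(i)
        i = seq.find(site, i + 1)  # search from i + 1 so overlapping sites are found too
    return (starts[-1] - starts[0]) / (len(starts) - 1)


if __name__ == "__main__":
    print(expected_wait("HTH", 2), expected_wait("HTT", 2), expected_wait("GAAG", 4))  # 10 8 260
    print(round(mean_site_spacing("GAAG")))  # close to 256; exact value depends on the seed
```

The average chunk length along a long sequence is a slightly different quantity from the wait from a fresh start. Sites occur at a rate of 1 in 4^4 = 256 positions, so successive cut sites are about 256 bases apart on average. The wait from a fresh start is 260. This is the same self-overlap effect as head-tail-head, whose wait is 10 even though its average spacing in a long run of tosses is 8.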

That's a rather trivial connection between probability and genetics. There's a much deeper connection, which I don't have time to go into, and that is that modern genetics is a really exciting area of science. And we'll hear some talks later in the conference specifically about that. But it turns out that in unlocking the secrets in the information generated by modern experimental technologies, a key part has to do with fairly sophisticated -- you'll be relieved to know that I do something useful in my day job, rather more sophisticated than the head-tail-head story -- but quite sophisticated computer modeling and mathematical modeling and modern statistical techniques. And I will give you two little snippets -- two examples -- of projects we're involved in in my group in Oxford, both of which I think are rather exciting. You know about the Human Genome Project. That was a project which aimed to read one copy of the human genome. The natural thing to do after you've done that is what this project, the International HapMap Project, is doing; it's a collaboration between labs in five or six different countries. Think of the Human Genome Project as learning what we've got in common, and the HapMap Project as trying to understand where there are differences between different people.

Why do we care about that? Well, there are lots of reasons. The most pressing one is that we want to understand how some differences make some people susceptible to one disease -- type-2 diabetes, for example -- and other differences make people more susceptible to heart disease, or stroke, or autism and so on. That's one big project. There's a second big project, recently funded by the Wellcome Trust in this country, involving very large studies -- thousands of individuals, with each of eight different diseases, common diseases like type-1 and type-2 diabetes, and coronary heart disease, bipolar disease and so on -- to try and understand the genetics. To try and understand what it is about genetic differences that causes the diseases. Why do we want to do that? Because we understand very little about most human diseases. We don't know what causes them. And if we can get in at the bottom and understand the genetics, we'll have a window on the way the disease works, and a whole new way of thinking about disease therapies and preventative treatment and so on. So that's, as I said, the little diversion on my main love.

Back to some of the more mundane issues of thinking about uncertainty. Here's another quiz for you -- now suppose we've got a test for a disease which isn't infallible, but it's pretty good. It gets it right 99 percent of the time. And I take one of you, or I take someone off the street, and I test them for the disease in question. Let's suppose there's a test for HIV -- the virus that causes AIDS -- and the test says the person has the disease. What's the chance that they do? The test gets it right 99 percent of the time. So a natural answer is 99 percent. Who likes that answer? Come on -- everyone's got to get involved. Don't think you don't trust me anymore. (Laughter) Well, you're right to be a bit skeptical, because that's not the answer. That's what you might think. It's not the answer, because it's only part of the story. It actually depends on how common or how rare the disease is. So let me try and illustrate that. Here's a little caricature of a million individuals. So let's think about a disease that affects -- it's pretty rare, it affects one person in 10,000. Amongst these million individuals, most of them are healthy and some of them will have the disease. And in fact, if this is the prevalence of the disease, about 100 will have the disease and the rest won't. So now suppose we test them all. What happens? Well, amongst the 100 who do have the disease, the test will get it right 99 percent of the time, and 99 will test positive. Amongst all these other people who don't have the disease, the test will get it right 99 percent of the time. It'll only get it wrong one percent of the time. But there are so many of them that there'll be an enormous number of false positives. Put that another way -- of all of them who test positive -- so here they are, the individuals involved -- less than one in 100 actually have the disease. 
So even though we think the test is accurate, the important part of the story is there's another bit of information we need.
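
The head-count argument above translates directly into a few lines of arithmetic. Here is a minimal Python sketch. It assumes, as the talk does, that "99 percent accurate" means the test is right 99 percent of the time both for people who have the disease and for people who don't, that is, a sensitivity and specificity of 0.99. The function name is hypothetical.

```python
def positive_predictive_value(population: int, prevalence: float,
                              sensitivity: float, specificity: float) -> float:
    """Fraction of people who test positive who actually have the disease."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity            # sick people the test gets right
    false_positives = healthy * (1 - specificity)  # healthy people the test gets wrong
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    ppv = positive_predictive_value(1_000_000, 1 / 10_000, 0.99, 0.99)
    print(f"P(disease | positive test) ~ {ppv:.4f}")
```

With a million people, about 99 true positives sit among roughly 9,999 false positives. The result, 99 / 10,098 or about 0.0098, is the "less than one in 100" from the talk.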

Here's the key intuition. What we have to do, once we know the test is positive, is to weigh up the plausibility, or the likelihood, of two competing explanations. Each of those explanations has a likely bit and an unlikely bit. One explanation is that the person doesn't have the disease -- that's overwhelmingly likely, if you pick someone at random -- but the test gets it wrong, which is unlikely. The other explanation is that the person does have the disease -- that's unlikely -- but the test gets it right, which is likely. And the number we end up with -- that number which is a little bit less than one in 100 -- is to do with how likely one of those explanations is relative to the other. Each explanation, with its two parts taken together, is unlikely.
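
In symbols, the two competing explanations are compared through Bayes' theorem in odds form. Here D means "has the disease" and + means "tests positive". The numbers are those from the example above: a prevalence of 1 in 10,000, and a test that is right 99 percent of the time either way.

```latex
\frac{P(D \mid +)}{P(\bar{D} \mid +)}
  = \underbrace{\frac{P(D)}{P(\bar{D})}}_{\text{prior odds}}
    \times \underbrace{\frac{P(+ \mid D)}{P(+ \mid \bar{D})}}_{\text{likelihood ratio}}
  = \frac{0.0001}{0.9999} \times \frac{0.99}{0.01}
  = \frac{99}{9999} \approx 0.0099,
\qquad
P(D \mid +) = \frac{99}{99 + 9999} = \frac{99}{10098} \approx 0.0098 .
```

Each explanation's "likely bit" and "unlikely bit" contribute one factor each. Neither the 99 percent accuracy nor the 1-in-10,000 prevalence decides the answer on its own.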

Here's a more topical example of exactly the same thing. Those of you in Britain will know about what's become rather a celebrated case of a woman called Sally Clark, who had two babies who died suddenly. [Note: Sally Clark, a solicitor, spent about three years in prison before being freed on appeal; she later died of alcohol poisoning.] And initially, it was thought that they died of what's known informally as "cot death," and more formally as "Sudden Infant Death Syndrome." For various reasons, she was later charged with murder. And at the trial, her trial, a very distinguished pediatrician [Note: Roy Meadow, who first described Munchausen syndrome by proxy] gave evidence that the chance of two cot deaths, innocent deaths, in a family like hers -- which was professional and non-smoking -- was one in 73 million. To cut a long story short, she was convicted at the time. Later, and fairly recently, acquitted on appeal -- in fact, on the second appeal. And just to set it in context, you can imagine how awful it is for someone to have lost one child, and then two, if they're innocent, to be convicted of murdering them. To be put through the stress of the trial, convicted of murdering them -- and to spend time in a women's prison, where all the other prisoners think you killed your children -- is a really awful thing to happen to someone. And it happened in large part here because the expert got the statistics horribly wrong, in two different ways.

So where did he get the one in 73 million number? He looked at some research, which said the chance of one cot death in a family like Sally Clark's is about one in 8,500. So he said, "I'll assume that if you have one cot death in a family, the chance of a second child dying from cot death isn't changed." So that's what statisticians would call an assumption of independence. It's like saying, "If you toss a coin and get a head the first time, that won't affect the chance of getting a head the second time." So if you toss a coin twice, the chance of getting a head twice is a half -- that's the chance the first time -- times a half -- the chance the second time. So he said, "Here, I'll assume that these events are independent. When you multiply 8,500 by itself, you get about 73 million." And none of this was stated to the court as an assumption or presented to the jury that way. Unfortunately here -- and, really, regrettably -- first of all, in a situation like this you'd have to verify it empirically. And secondly, it's palpably false. There are lots and lots of things that we don't know about sudden infant deaths. It might well be that there are environmental factors that we're not aware of, and it's pretty likely to be the case that there are genetic factors we're not aware of. So if a family suffers from one cot death, you'd put them in a high-risk group. They've probably got these environmental risk factors and/or genetic risk factors we don't know about. And to argue, then, that the chance of a second death is as if you didn't know that information is really silly. It's worse than silly -- it's really bad science. Nonetheless, that's how it was presented, and at trial nobody even argued it. That's the first problem. The second problem is, what does the number of one in 73 million mean? 
So after Sally Clark was convicted -- you can imagine, it made rather a splash in the press -- one of the journalists from one of Britain's more reputable newspapers wrote that what the expert had said was, "The chance that she was innocent was one in 73 million." Now, that's a logical error. It's exactly the same logical error as the logical error of thinking that after the disease test, which is 99 percent accurate, the chance of having the disease is 99 percent. In the disease example, we had to bear in mind two things, one of which was the possibility that the test got it right or not. And the other one was the chance, a priori, that the person had the disease or not. It's exactly the same in this context. There are two things involved -- two parts to the explanation. We want to know how likely, or relatively how likely, two different explanations are. One of them is that Sally Clark was innocent -- which is, a priori, overwhelmingly likely -- most mothers don't kill their children. And the second part of the explanation is that she suffered an incredibly unlikely event. Not as unlikely as one in 73 million, but nonetheless rather unlikely. The other explanation is that she was guilty. Now, we probably think a priori that's unlikely. And we certainly should think in the context of a criminal trial that that's unlikely, because of the presumption of innocence. And then if she were trying to kill the children, she succeeded. So the chance that she's innocent isn't one in 73 million. We don't know what it is. It has to do with weighing up the strength of the other evidence against her and the statistical evidence. We know the children died. What matters is how likely or unlikely, relative to each other, the two explanations are. And they're both implausible. There's a situation where errors in statistics had really profound and really unfortunate consequences. 
In fact, there are two other women who were convicted on the basis of the evidence of this pediatrician, who have subsequently been released on appeal. Many cases were reviewed. And it's particularly topical because he's currently facing a disrepute charge at Britain's General Medical Council.
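
Both of the expert's errors can be sketched in a few lines of Python.

- `joint_probability` reproduces the multiplication under the independence assumption and shows how dependence between the two deaths changes the result. 8,500 squared is 72.25 million. The slightly larger 73 million quoted presumably reflects a less rounded per-family rate.
- `prob_innocent` applies Bayes' rule to the two competing explanations.

Every probability below other than the expert's figures is a purely hypothetical input chosen for illustration, not an estimate of any real rate. The point is only that the chance of innocence depends on those inputs, and is not 1 in 73 million.

```python
def joint_probability(p_first: float, p_second_given_first: float) -> float:
    """P(both events) = P(first) * P(second | first).

    Assuming independence means plugging in P(second | first) = P(first).
    """
    return p_first * p_second_given_first


def prob_innocent(prior_guilty: float, p_deaths_if_innocent: float,
                  p_deaths_if_guilty: float = 1.0) -> float:
    """Bayes' rule over the two explanations for the same evidence (two deaths).

    p_deaths_if_guilty defaults to 1: as the talk notes, if she had been trying
    to kill the children, she succeeded.
    """
    innocent = (1 - prior_guilty) * p_deaths_if_innocent
    guilty = prior_guilty * p_deaths_if_guilty
    return innocent / (innocent + guilty)


if __name__ == "__main__":
    one_cot_death = 1 / 8_500
    # The expert's calculation: independence assumed.
    independent = joint_probability(one_cot_death, one_cot_death)
    print(f"independent: 1 in {1 / independent:,.0f}")  # 1 in 72,250,000
    # HYPOTHETICAL: a raised risk of a second death once a family has had one.
    dependent = joint_probability(one_cot_death, 1 / 100)
    print(f"with a hypothetical 1-in-100 second risk: 1 in {1 / dependent:,.0f}")

    # Even taking 1 in 73 million at face value, P(innocent | two deaths)
    # depends on the (HYPOTHETICAL) prior probability of guilt:
    for prior in (1e-9, 1e-8, 1e-7):
        print(prior, round(prob_innocent(prior, 1 / 73_000_000), 3))
```

The loop prints a different answer for each hypothetical prior, and none of them is 1 in 73 million. This is the talk's point: the statistical figure has to be weighed against the other evidence rather than read off as the chance of innocence.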

So just to conclude -- what are the take-home messages from this? Well, we know that randomness and uncertainty and chance are very much a part of our everyday life. It's also true -- and, although, you, as a collective, are very special in many ways, you're completely typical in not getting the examples I gave right. It's very well documented that people get things wrong. They make errors of logic in reasoning with uncertainty. We can cope with the subtleties of language brilliantly -- and there are interesting evolutionary questions about how we got here. We are not good at reasoning with uncertainty. That's an issue in our everyday lives. As you've heard from many of the talks, statistics underpins an enormous amount of research in science -- in social science, in medicine and indeed, quite a lot of industry. All of quality control, which has had a major impact on industrial processing, is underpinned by statistics. It's something we're bad at doing. At the very least, we should recognize that, and we tend not to. To go back to the legal context, at the Sally Clark trial all of the lawyers just accepted what the expert said. So if a pediatrician had come out and said to a jury, "I know how to build bridges. I've built one down the road. Please drive your car home over it," they would have said, "Well, pediatricians don't know how to build bridges. That's what engineers do." On the other hand, he came out and effectively said, or implied, "I know how to reason with uncertainty. I know how to do statistics." And everyone said, "Well, that's fine. He's an expert." So we need to understand where our competence is and isn't. Exactly the same kinds of issues arose in the early days of DNA profiling, when scientists, and lawyers and in some cases judges, routinely misrepresented evidence. Usually -- one hopes -- innocently, but misrepresented evidence. Forensic scientists said, "The chance that this guy's innocent is one in three million." 
Even if you believe the number, just like the 73 million to one, that's not what it meant. And there have been celebrated appeal cases in Britain and elsewhere because of that.
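The DNA example turns on the same confusion: "one in three million" is the chance that an innocent person matches, not the chance that a matching person is innocent. A minimal sketch, assuming (hypothetically) that before the test each of `pool_size` people was equally likely to have left the sample, that the true source always matches, and that there is no other evidence. Under those assumptions Bayes' rule gives the probability that a matching person is the source as 1 / (1 + (pool_size - 1) × match_prob). The function name and the pool sizes are illustrative assumptions, not figures from any real case.

```python
from fractions import Fraction


def prob_source_given_match(match_prob: Fraction, pool_size: int) -> Fraction:
    """P(person is the source | their DNA profile matches).

    Hypothetical simplifying assumptions: exactly one of `pool_size` people
    left the sample, each was equally likely to beforehand, the true source
    always matches, and there is no other evidence.
    """
    # Prior odds that this person is the source: 1 to (pool_size - 1).
    # A match multiplies those odds by 1 / match_prob (the likelihood ratio),
    # which simplifies to the expression below.
    expected_coincidental_matches = (pool_size - 1) * match_prob
    return 1 / (1 + expected_coincidental_matches)


if __name__ == "__main__":
    match_prob = Fraction(1, 3_000_000)  # the "one in three million" figure
    for pool_size in (300_001, 3_000_001):  # hypothetical pool sizes
        p_source = prob_source_given_match(match_prob, pool_size)
        print(pool_size, "people: P(innocent | match) =", 1 - p_source)
```

With a hypothetical pool of 3,000,001 people, the chance that the matching suspect is innocent is one half, not 1 in 3 million; with a pool of 300,001 it is 1/11. The match probability by itself never fixes the answer.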

And just to finish in the context of the legal system. It's all very well to say, "Let's do our best to present the evidence." But more and more, in cases of DNA profiling -- this is another one -- we expect juries, who are ordinary people -- and it's documented they're very bad at this -- we expect juries to be able to cope with the sorts of reasoning that goes on. In other spheres of life, if people argued -- well, except possibly for politics -- but in other spheres of life, if people argued illogically, we'd say that's not a good thing. We sort of expect it of politicians and don't hope for much more. In the case of uncertainty, we get it wrong all the time -- and at the very least, we should be aware of that, and ideally, we might try and do something about it. Thanks very much.
