I don't mean to brag, but there are lots of things that I'm pretty average at: playing table tennis, cooking risotto, finding countries on a map, just to name a few. Now, in our everyday lives, we're not typically assessed on our skills and abilities, so we're forced to rely on our own judgments. I may think I'm pretty decent with Italian cuisine, but how accurate is my assessment?
Now, what we're talking about here is metacognition: our insight into our own thought processes. If I have good metacognitive insight, then how good I think I am at a particular task should line up pretty well with how good I actually am. Of course, in the real world this is often not the case. And indeed, we probably all know someone who thinks they're great at navigating maps, when the reality is quite the opposite. Not to name any names, of course, but still.
Perhaps you think this applies to other people and that you, yourself, wouldn't make this sort of mistake. So let's try a quick experiment. I want you to think about how you would rate yourself in terms of your driving ability. Would you rate yourself as below average, average, or perhaps even above average? Well, most people rate themselves as above average, which, of course, can't be true of most of us. We call this the "better than average" effect, and it's just one of a number of cognitive biases that we see when people judge their own abilities.
Today, I'm going to focus on a related bias, the Dunning-Kruger effect. Back in 1999, two psychologists at Cornell University, Dunning and Kruger, described the mistakes people make when estimating their own abilities. Imagine we take a sample of people, divide them into four groups based on their scores on a test, and order those groups from lowest to highest. If we then plot those scores on a graph along with their self-estimates, so how well they thought they did on the test, this is the pattern we see. The red line is a steep slope representing their actual scores, as it must be, since we ordered the groups by their scores in the first place. What's interesting is the shallower blue line, which represents their self-estimates: how well they thought they did on the test. The Dunning-Kruger effect describes how the weakest performers significantly overestimate their performance, shown here in the green oval. The explanation for this, according to Dunning and Kruger, is that insight and ability rely on the same thing: if I'm poor at a task, I also lack the metacognitive insight to accurately assess my ability.
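If you'd like to see how little it takes to produce that plot, here is a rough simulation sketch, with made-up numbers rather than anyone's real data. The only assumption is that self-estimates track actual scores imperfectly, with a slope well below one:

```python
# A rough sketch with invented numbers (not Dunning and Kruger's data).
# Self-estimates track actual scores, but with a slope well below 1.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

scores = rng.normal(50, 15, n)                         # actual test scores
estimates = 0.3 * scores + 35 + rng.normal(0, 10, n)   # hypothetical self-estimates

# Split people into four groups ordered by actual score, as in the talk.
order = np.argsort(scores)
for q, idx in enumerate(np.array_split(order, 4), start=1):
    print(f"Group {q}: actual = {scores[idx].mean():5.1f}, "
          f"self-estimate = {estimates[idx].mean():5.1f}")
```

The weakest group's self-estimate comes out well above its actual score, and the strongest group's comes out below it, simply because the estimates have the shallower slope.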
Now this pattern has been seen again and again across a number of domains, from driving skill to exam-taking, even chess-playing. However, in recent years, a number of criticisms have been leveled at this approach, and we now have reason to believe that this pattern of results is virtually unavoidable.
One reason for this is a statistical effect called regression to the mean. This is something that comes about when we have two measures that are related, but not perfectly so. So imagine we have a sample of people and we measure their heights and their weights. Height and weight are related, tall people are typically heavier, but the relationship is far from perfect. So unlike in the figure at the top here, the shortest people, in red, won't all be the lightest people. Some of them will be overweight or particularly muscular, for example. Similarly, at the top end, the tallest people, in blue, won't all be the heaviest people. Some of them will be underweight, and so on. As a result, on average, the shortest people will rank higher for weight than they do for height, and the tallest people will rank lower for weight than they do for height, producing this blue line here and the crossover pattern you're now becoming familiar with.
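Here is the same idea as a small simulation. The heights and weights are invented and correlated at roughly r = 0.5, nothing more:

```python
# A small simulation of regression to the mean: two measures that are
# related (r ~ 0.5) but far from perfectly so.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
height = rng.normal(0, 1, n)
weight = 0.5 * height + rng.normal(0, (1 - 0.5**2) ** 0.5, n)

# Convert both measures to ranks scaled from 0 (lowest) to 1 (highest).
h_rank = height.argsort().argsort() / (n - 1)
w_rank = weight.argsort().argsort() / (n - 1)

order = np.argsort(height)
for label, idx in zip(["Shortest", "Second", "Third", "Tallest"],
                      np.array_split(order, 4)):
    print(f"{label:8s}: height rank = {h_rank[idx].mean():.2f}, "
          f"weight rank = {w_rank[idx].mean():.2f}")
```

The shortest quarter comes out closer to the middle on weight than on height, and the tallest quarter does the same from the other side, purely because the correlation is imperfect.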
Now, some people might put forward a spurious explanation for why short people are relatively overweight or tall people relatively underweight, when in fact no explanation is needed.
Perhaps a more compelling reason to doubt the Dunning-Kruger effect is that we can produce the same pattern even when our data are entirely meaningless. If we collect people's test scores along with their self-estimates of those scores, shuffle those self-estimates so they no longer belong to the people who gave them, and then analyze as before, we still find the same pattern in the data. Of course, any effect that we can find with shuffled or randomized data is one that we should surely be suspicious of.
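Again, a minimal sketch with simulated numbers: shuffle the self-estimates so they carry no information at all about the person who gave them, rerun the quartile analysis from before, and the crossover is still there:

```python
# The same quartile analysis as before, but with the self-estimates
# shuffled so they carry no information about the person who gave them.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
scores = rng.normal(50, 15, n)
estimates = 0.3 * scores + 35 + rng.normal(0, 10, n)
shuffled = rng.permutation(estimates)   # destroy any genuine insight

order = np.argsort(scores)
for q, idx in enumerate(np.array_split(order, 4), start=1):
    print(f"Group {q}: actual = {scores[idx].mean():5.1f}, "
          f"shuffled estimate = {shuffled[idx].mean():5.1f}")
```

The shuffled estimates come out flat across the four groups, so the weakest group "overestimates" and the strongest group "underestimates" by construction, with no metacognition involved at all.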
So, given these and other issues with the Dunning-Kruger approach, I was saddened and disappointed and, frankly, a little annoyed to discover that the same approach was now being applied in my field of expertise, which is face matching. Now, this is a task where we're shown two images of faces, or an image and a live person, and we're asked to decide whether they show the same person or two different people. We've all stood in line at passport control, anxiously awaiting the passport officer's decision as to whether our ID photos look sufficiently like us or not. Indeed, I've included at the top here some examples of ID images from my own life, just to illustrate some of that variability. Some proud moments in photographic history, I'm sure you'll agree. So what I'd like to do now is first see how well you might perform as passport officers.
So here are four pairs of images: some student ID images alongside some student photos. For each pair, I'd like you to decide whether it's a match, so two images of the same person, or a mismatch, two images of different people. Some of you might be surprised to hear that the top two pairs are matches, so images of the same people, and the bottom two pairs are mismatches, so two different people.
Now we know this task is particularly difficult when the images show identities that we're unfamiliar with. This is because it's hard to take into account the changes that can happen to a face over time, as well as across different situations, such as changes in facial expression or lighting. We know this task is difficult for passport officers as well, and they also make mistakes. So this is why I thought it would be particularly interesting to look at the relationship between insight and ability in this important security context.
So given the issues we've described already with looking at overall scores and people's self-estimates, I instead decided to focus on individual decision-making. Over a series of experiments, I asked people to look at pairs of images and decide whether they were a match or a mismatch. But I also asked them to provide a rating of confidence in each decision. Good metacognitive insight would be reflected in people being much more confident in decisions that turned out to be correct, and much less confident in decisions that turned out to be incorrect.
So let's have a look at how people did. Now I think this pattern is particularly fascinating, but also fairly intuitive. Let's start with the red line, which represents people's confidence in their incorrect responses. As you can see, it doesn't matter how good people were at the test overall, represented by the score on the X-axis at the bottom there; people were approximately the same in terms of their confidence when they were incorrect. What's interesting is the blue line, which represents confidence when people were correct in their decisions. As you can see, the best performers on the test were much more confident in their correct responses in comparison with their incorrect ones, so they show good metacognitive insight. The weakest performers, on the other hand, were no different in their confidence for their correct and incorrect responses, shown here in the green circle, and so they show poor metacognitive insight.
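As a rough illustration, and I stress these are simulated numbers rather than our study's data, here is what that pattern looks like if we simply assume that the confidence gap between correct and incorrect decisions grows with ability:

```python
# Simulated numbers, not the study's data. The one assumption: the
# confidence gap between correct and incorrect decisions grows with ability.
import numpy as np

rng = np.random.default_rng(3)
n_people, n_trials = 300, 200

ability = rng.uniform(0.55, 0.9, n_people)   # P(correct) on each trial
insight = (ability - 0.55) / 0.35            # assumed to scale with ability

rows = []
for a, ins in zip(ability, insight):
    correct = rng.random(n_trials) < a
    # Baseline confidence of 50; only insight lifts it on correct trials.
    conf = 50 + 15 * ins * correct + rng.normal(0, 5, n_trials)
    rows.append((a, conf[correct].mean(), conf[~correct].mean()))

rows.sort()  # order people from weakest to strongest
for label, chunk in zip(["Weakest", "Middle", "Strongest"],
                        np.array_split(np.array(rows), 3)):
    acc, c_right, c_wrong = chunk.mean(axis=0)
    print(f"{label:9s}: accuracy = {acc:.2f}, confidence when correct = "
          f"{c_right:4.1f}, when wrong = {c_wrong:4.1f}")
```

Confidence on incorrect trials stays flat across the groups, while confidence on correct trials climbs with ability, which is exactly the shape of the red and blue lines.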
So what might be going on with these weak performers? It could be that they have some sense that they tend to perform poorly on tests in general, and so they're just less confident overall in their responses. However, I didn't find that pattern of lower confidence in my data, at least with individual decision-making. Instead, it's more likely that their confidence simply wasn't tracking the accuracy of each decision: they were no more confident in their correct responses than in their incorrect ones, because they had poor insight.
So how does this all fit in with the Dunning-Kruger effect? Dunning and Kruger argued that the weakest performers show the least insight, overestimating their performance, and that implies greater confidence. We didn't see that here in our data: the weakest performers didn't seem to be overly confident. However, the Dunning-Kruger effect also describes how insight depends on ability. And as we've just seen, the weakest performers do seem to show the least insight; here, they couldn't differentiate between their correct and incorrect responses. So insight does appear to depend on ability, but not in the way that Dunning and Kruger originally thought.
So if there are two things I'd like you to remember from this talk, to take home and think about afterwards, they are these. First, more broadly, science is always updating. New research comes along, new evidence that may contradict or even disprove previous work. In this case, the Dunning-Kruger effect may well not be a thing, despite the fact that it's so prevalent in popular culture.
Second, insight depends on ability. For the weakest performers, there's no difference between their confidence for correct and incorrect responses: they have poor insight, they can't tell the difference. For the strongest performers, when they're giving a correct answer, they're much more confident. Of course, the converse isn't always true. Being more confident doesn't mean that you're right. You might be wrong and simply have poor insight.
So in your everyday life, think about whose opinions you seek. If someone is an expert in their field and they're confident, they're probably right; and if they're unsure, that's also informative and tells us something useful. It's much more sensible to find someone we know is knowledgeable in an area, rather than someone who is simply confident in their opinion, because confidence is easily misplaced.
And finally, for those of you who are still wondering how good my risotto actually is, that may have to wait for a future talk.
Thank you.
(Applause)