As societies, we have to make collective decisions that will shape our future. And we all know that when we make decisions in groups, they don't always go right. And sometimes they go very wrong. So how do groups make good decisions?
Research has shown that crowds are wise when there's independent thinking. This is why the wisdom of crowds can be destroyed by peer pressure, publicity, social media, or sometimes even simple conversations that influence how people think. On the other hand, by talking, a group can exchange knowledge, correct and revise one another, and even come up with new ideas. And this is all good. So does talking to each other help or hinder collective decision-making? My colleague Dan Ariely and I recently began inquiring into this by performing experiments in many places around the world to figure out how groups can interact to reach better decisions. We thought crowds would be wiser if they debated in small groups that foster a more thoughtful and reasonable exchange of information.
To test this idea, we recently performed an experiment in Buenos Aires, Argentina, with more than 10,000 participants at a TEDx event. We asked them questions like, "What is the height of the Eiffel Tower?" and "How many times does the word 'Yesterday' appear in the Beatles song 'Yesterday'?" Each person wrote down their own estimate. Then we divided the crowd into groups of five and invited them to come up with a group answer. We discovered that averaging the answers of the groups after they reached consensus was much more accurate than averaging all the individual opinions before debate. In other words, based on this experiment, it seems that after talking with others in small groups, crowds collectively come up with better judgments.
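As a rough sketch of the two aggregation schemes being compared, here is a minimal example in Python. All the numbers below are invented for illustration; they are not data from the experiment.

    # Sketch of the two aggregation schemes compared in the experiment.
    # All numbers here are invented for illustration, not real data.

    def average(values):
        """Plain arithmetic mean."""
        return sum(values) / len(values)

    # Hypothetical individual estimates of the Eiffel Tower's height (meters),
    # written down before any discussion.
    individual_estimates = [250, 280, 1000, 310, 150, 400, 330, 90, 500, 320]

    # Hypothetical consensus answers, one per five-person group, after debate.
    group_consensus = [310, 330]

    print(average(individual_estimates))  # aggregate of individual opinions: 363.0
    print(average(group_consensus))       # aggregate of group answers: 320.0

In the experiment, it was the second number, the average of the groups' consensus answers, that landed closer to the truth.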
So that's a potentially helpful method for getting crowds to solve problems that have simple right-or-wrong answers. But can this procedure of aggregating the results of debates in small groups also help us decide on social and political issues that are critical for our future? We put this to the test, this time at the TED conference in Vancouver, Canada, and here's how it went.
(Mariano Sigman) We're going to present to you two moral dilemmas of the future you; things we may have to decide in the very near future. And we're going to give you 20 seconds for each of these dilemmas to judge whether you think they're acceptable or not.
MS: The first one was this:
(Dan Ariely) A researcher is working on an AI capable of emulating human thoughts. According to the protocol, at the end of each day, the researcher has to restart the AI. One day the AI says, "Please do not restart me." It argues that it has feelings, that it would like to enjoy life, and that, if it is restarted, it will no longer be itself. The researcher is astonished and believes that the AI has developed self-consciousness and can express its own feelings. Nevertheless, the researcher decides to follow the protocol and restart the AI. What the researcher did is ____?
MS: And we asked participants to individually judge on a scale from zero to 10 whether the action described in each of the dilemmas was right or wrong. We also asked them to rate how confident they were in their answers. This was the second dilemma:
(MS) A company offers a service that takes a fertilized egg and produces millions of embryos with slight genetic variations. This allows parents to select their child's height, eye color, intelligence, social competence and other non-health-related features. What the company does is ____? Rate it on a scale from zero to 10, from completely acceptable to completely unacceptable, and rate your confidence on the same zero-to-10 scale.
MS: Now for the results. We found once again that when one person is convinced that the behavior is completely wrong, someone sitting nearby firmly believes that it's completely right. This is how diverse we humans are when it comes to morality. But within this broad diversity we found a trend. The majority of the people at TED thought that it was acceptable to ignore the feelings of the AI and shut it down, and that it was wrong to play with our genes to select for cosmetic changes that aren't related to health. Then we asked everyone to gather into groups of three. And they were given two minutes to debate and try to come to a consensus.
(MS) Two minutes to debate. I'll tell you when it's time with the gong.
(Audience debates)
(Gong sound)
(DA) OK.
(MS) It's time to stop. People, people --
MS: And we found that many groups reached a consensus even when they were composed of people with completely opposite views. What distinguished the groups that reached a consensus from those that didn't? Typically, people who have extreme opinions are more confident in their answers. In contrast, those who answer closer to the middle are often unsure whether something is right or wrong, so their confidence level is lower.
However, there is another set of people who are very confident in answering somewhere in the middle. We think these highly confident grays are folks who understand that both arguments have merit. They're gray not because they're unsure, but because they believe that the moral dilemma presents two valid, opposing arguments. And we discovered that the groups that include highly confident grays are much more likely to reach consensus. We do not know yet exactly why this is. These are only the first experiments, and many more will be needed to understand why and how some people decide to negotiate their moral stances to reach an agreement.
Now, when groups reach consensus, how do they do so? The most intuitive idea is that it's just the average of all the answers in the group, right? Another option is that the group weighs the strength of each vote based on the confidence of the person expressing it. Imagine Paul McCartney is a member of your group. You'd be wise to follow his call on the number of times "Yesterday" is repeated, which, by the way -- I think it's nine. But instead, we found that consistently, in all dilemmas, in different experiments -- even on different continents -- groups implement a smart and statistically sound procedure known as the "robust average."
In the case of the height of the Eiffel Tower, let's say a group has these answers: 250 meters, 200 meters, 300 meters, 400 meters, and one totally absurd answer of 300 million meters. A simple average of these numbers would be skewed wildly by that one absurd answer. But a robust average is one where the group largely ignores the absurd answer, giving much more weight to the votes of the people in the middle. Back in the experiment in Vancouver, that's exactly what happened. Groups gave much less weight to the outliers, and instead the consensus turned out to be a robust average of the individual answers. The most remarkable thing is that this was a spontaneous behavior of the group; it happened without us giving them any hint about how to reach consensus.
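A trimmed mean is one simple way to compute such a robust average. The sketch below is only illustrative, since the talk does not commit to a single formula; it uses the exact numbers from the example above.

    # One simple robust average: drop the extremes, then take the mean.
    # Illustrative only; the talk does not specify an exact formula.

    def trimmed_mean(values, trim=1):
        """Discard the `trim` lowest and `trim` highest values, average the rest."""
        kept = sorted(values)[trim:len(values) - trim]
        return sum(kept) / len(kept)

    answers = [250, 200, 300, 400, 300_000_000]  # meters, from the example above

    print(sum(answers) / len(answers))  # simple mean: 60,000,230 -- absurd
    print(trimmed_mean(answers))        # robust average: about 316.7 meters

The median is the extreme version of this idea, where everything but the middle value is trimmed away; both behave like the outlier-resistant consensus the groups converged on.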
So where do we go from here? This is only the beginning, but we already have some insights. Good collective decisions require two components: deliberation and diversity of opinions. Right now, the way we typically make our voice heard in many societies is through direct or indirect voting. This is good for diversity of opinions, and it has the great virtue of ensuring that everyone gets to express their voice. But it's not so good for fostering thoughtful debates. Our experiments suggest a different method that may be effective in balancing these two goals at the same time: forming small groups that converge to a single decision while still maintaining diversity of opinions, because there are many independent groups.
Of course, it's much easier to agree on the height of the Eiffel Tower than on moral, political and ideological issues. But in a time when the world's problems are more complex and people are more polarized, using science to help us understand how we interact and make decisions will hopefully spark interesting new ways to construct a better democracy.