This is something you won't like. But here everyone is a liar. Don't take it too personally. What I mean is that lying is very common and it is now well-established that we lie on a daily basis. Indeed, scientists have estimated that we tell around two lies per day, although, of course, it's not that easy to establish those numbers with certainty.
And, well, let me introduce myself. I'm Riccardo, I'm a psychologist and a PhD candidate, and for my research project I study how good people are at detecting lies. Seems cool, right? But I'm not joking. And you might wonder why a psychologist was then invited to give a TED Talk about AI.
And well, I'm here today because I'm about to tell you how AI could be used to detect lies. And you will be very surprised by the answer.
But first of all, when is it relevant to detect lies? A first clear example that comes to my mind is in the field of criminal investigation. Imagine you are a police officer and you want to interview a suspect. The suspect is providing some information to you, and this information is leading to the next steps of the investigation. You certainly want to understand whether the suspect is reliable or whether they are trying to deceive you.
Then another example comes to my mind, and I think this really affects all of us. So please raise your hands if you would like to know if your partner cheated on you.
(Laughter)
And don't be shy because I know.
(Laughter)
Yeah. You see? It's very relevant. However, I have to say that we as humans are very bad at detecting lies. In fact, many studies have already confirmed that when people are asked to judge whether someone is lying or not, without knowing much about that person or the context, their accuracy is no better than chance level, about the same as flipping a coin.
You might also wonder if experts, such as police officers, prosecutors and even psychologists, are better at detecting lies. And the answer is complex, because experience alone doesn't seem to be enough to detect lies accurately. It might help, but it's not enough.
To give you some numbers: in a well-known meta-analysis from 2006, scholars found that naive judges' accuracy was on average around 54 percent. Experts performed only slightly better, with an accuracy rate of around 55 percent.
(Laughter)
Not that impressive, right? And ... Those numbers actually come from the analysis of the results of 108 studies, meaning that these findings are quite robust. And of course, the debate is also much more complicated than this and also more nuanced. But here the main take-home message is that humans are not good at detecting lies.
What if we could create an AI tool that lets everyone detect whether someone else is lying? This is not possible yet, so please don't panic.
(Laughter)
But this is what we tried to do in a recent study that I did together with my brilliant colleagues whom I need to thank. And actually, to let you understand what we did in our study, I need to first introduce you to some technical concepts and to the main characters of this story: Large language models.
Large language models are AI systems designed to generate outputs in natural language in a way that almost mimics human communication. If you are wondering how we teach these AI systems to detect lies, this is where something called fine-tuning comes in. But let's use a metaphor. Imagine large language models as students who have gone through years of school, learning a little bit about everything, such as language, concepts, facts. But when it's time for them to specialize, as in law school or medical school, they need more focused training. Fine-tuning is that extra education. Of course, large language models don't learn the way humans do, but this is just to give you the main idea.
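To make the idea of fine-tuning a bit more concrete, here is a minimal sketch of how a model like FLAN-T5 can be fine-tuned to label statements as truthful or deceptive, using the Hugging Face libraries. The example statements, the prompt wording and the hyperparameters are illustrative assumptions, not the study's actual code.

```python
# Minimal sketch: fine-tuning FLAN-T5 to generate the word "truthful" or
# "deceptive" for a given statement. All data and settings are illustrative.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments,
                          DataCollatorForSeq2Seq)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Hypothetical training examples: each statement is paired with a text label.
train = Dataset.from_dict({
    "statement": ["Last summer I travelled to Vietnam with my family.",
                  "I strongly believe city centres should be car-free."],
    "label": ["deceptive", "truthful"],
})

def preprocess(batch):
    # Frame lie detection as text-to-text: the model reads a prompted
    # statement and learns to generate the matching label word.
    inputs = tokenizer(["Is this statement truthful or deceptive? " + s
                        for s in batch["statement"]],
                       truncation=True, max_length=512)
    labels = tokenizer(batch["label"], truncation=True, max_length=8)
    inputs["labels"] = labels["input_ids"]
    return inputs

train = train.map(preprocess, batched=True, remove_columns=train.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="flan_t5_lie_detection",
                                  num_train_epochs=3,
                                  per_device_train_batch_size=8),
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```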
Then, just as training students requires books, lectures and examples, training large language models requires datasets. For our study we considered three datasets: one about personal opinions, one about past autobiographical memories and one about future intentions. These datasets were already available from previous studies and contained both truthful and deceptive statements.
Typically, you collect these types of statements by asking participants to tell the truth or to lie about something. For example, if I were a participant in the truthful condition and the task was "tell me about your past holidays," then I would tell the researcher about my previous holidays in Vietnam, and here we have a slide to prove it. For the deceptive condition, the researchers would randomly pick some of you who have never been to Vietnam and ask you to make up a story and convince someone else that you've really been to Vietnam. And this is how it typically works.
And as in all university courses, you might know this, after lectures you have exams. Likewise, after training our AI models, we want to test them. The procedure we followed, which is actually the typical one, is the following: we picked some statements at random from each dataset and set them aside, so the model never saw these statements during the training phase. Only after the training was completed did we use them as the test, as the final exam.
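As a small illustration of that hold-out procedure, here is a sketch of how a random slice of a dataset can be set aside before training and used only afterwards as the "final exam". The statements and labels below are hypothetical stand-ins, not items from the actual datasets.

```python
# Sketch of a hold-out split: reserve part of the data before any training.
from datasets import Dataset

# A handful of hypothetical statements standing in for one of the datasets.
data = Dataset.from_dict({
    "statement": ["I visited Vietnam last year.",
                  "I support a four-day work week.",
                  "I plan to move abroad next spring.",
                  "I once met the president.",
                  "My favourite holiday was a trip to Norway."],
    "label": ["deceptive", "truthful", "truthful", "deceptive", "truthful"],
})

# Set a random 20% aside: the model never sees these statements during
# fine-tuning, and they are used only afterwards as the test set.
split = data.train_test_split(test_size=0.2, seed=42)
train_set, test_set = split["train"], split["test"]
```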
But who was our student then? In this case, it was a large language model developed by Google and called FLAN-T5. Flanny, for friends. And now that we have all the pieces of the process together, we can actually dig deep into our study.
Our study was composed of three main experiments. For the first experiment, we fine-tuned our model, our FLAN-T5, on each single dataset separately. For the second experiment, we fine-tuned our model on two datasets together and tested it on the third, remaining one, using all three possible combinations. For the final experiment, we fine-tuned the model on a new, larger training set that we obtained by combining all three datasets together.
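To picture how these three set-ups differ, here is a schematic, runnable sketch of the experimental design. The tiny datasets and the fine_tune and evaluate helpers are placeholders introduced purely for illustration; they are not the study's actual pipeline.

```python
# Schematic sketch of the three experiments: train/test per dataset,
# leave-one-dataset-out, and training on all datasets combined.
from itertools import chain

# Hypothetical mini-datasets of (statement, label) pairs, already split.
datasets = {
    "opinions":   {"train": [("I think remote work is great.", "truthful")],
                   "test":  [("I believe taxes should rise.", "deceptive")]},
    "memories":   {"train": [("I went to Vietnam last summer.", "deceptive")],
                   "test":  [("I broke my arm as a child.", "truthful")]},
    "intentions": {"train": [("I will start a PhD next year.", "truthful")],
                   "test":  [("I plan to open a bakery.", "deceptive")]},
}

def fine_tune(examples):
    """Placeholder for fine-tuning FLAN-T5 on (statement, label) pairs."""
    return lambda statement: "truthful"  # dummy 'model' for illustration

def evaluate(model, examples):
    """Fraction of held-out statements the model labels correctly."""
    return sum(model(s) == y for s, y in examples) / len(examples)

# Experiment 1: fine-tune and test on each dataset separately.
for name, d in datasets.items():
    print("exp1", name, evaluate(fine_tune(d["train"]), d["test"]))

# Experiment 2: fine-tune on two datasets, test on the held-out third.
for held_out in datasets:
    pool = list(chain.from_iterable(
        d["train"] for n, d in datasets.items() if n != held_out))
    print("exp2", held_out, evaluate(fine_tune(pool), datasets[held_out]["test"]))

# Experiment 3: fine-tune on all three datasets combined, test on each one.
combined = list(chain.from_iterable(d["train"] for d in datasets.values()))
model = fine_tune(combined)
for name, d in datasets.items():
    print("exp3", name, evaluate(model, d["test"]))
```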
The results were quite interesting. In the first experiment, FLAN-T5 achieved an accuracy between 70 percent and 80 percent. However, in the second experiment, its accuracy dropped to almost 50 percent. And then, surprisingly, in the third experiment, it rose back to almost 80 percent. But what does this mean? What can we learn from these results?
From experiments one and three, we learn that language models can effectively classify statements as truthful or deceptive, outperforming human benchmarks and performing in line with the machine learning and deep learning models that previous studies trained on the same datasets.
However, from the second experiment, we see that language models struggle to generalize this learning across different contexts. This is apparently because there is no single universal rule of deception that we can easily apply in every context; linguistic cues of deception are context-dependent.
And from the third experiment, we learned that language models can actually generalize well across different contexts, as long as they have previously been exposed to such examples during the training phase. And I think this sounds like good news.
But while this suggests that language models could be effectively applied to real-life lie detection, more replication is needed. A single study is never enough for all of us to have these AI systems on our smartphones from tomorrow and start detecting other people's lies.
But as a scientist, I have a vivid imagination and I would like to dream big. And I would like to bring you with me on this futuristic journey for a while. So please imagine with me living in a world where this lie detection technology is well integrated into our lives, making everything from national security to social media a little bit safer. And imagine having an AI system that could actually spot fake opinions. From tomorrow, we could tell when a politician is saying one thing but truly believes something else.
(Laughter)
And what about border security, where people are asked about their intentions and their reasons for crossing borders or boarding planes? Well, with these systems, we could actually spot malicious intentions before they are carried out. And what about the recruiting process?
(Laughter)
We heard about this already. But actually, companies could employ this AI to distinguish those who are really passionate about the role from those who are just trying to say the right things to get the job.
And finally, we have social media. Scammers trying to deceive you or to steal your identity? All gone. And what about fake news? Well, language models could automatically read news articles, flag them as deceptive or fake, and we could even provide users with a credibility score for the information they read. It sounds like a brilliant future, right?
(Laughter)
Yes, but ... all great progress comes with risks. As much as I'm excited about this future, I think we need to be careful.
If we are not cautious, in my view, we could end up in a world where people just blindly believe AI outputs. And I'm afraid this means that people will be more likely to accuse others of lying simply because an AI says so. And I'm not the only one with this view, because another study has already shown it.
In addition, if we rely entirely on this lie detection technology to decide whether someone else is lying or not, we risk losing another key value in society. We lose trust.
We won't need to trust people anymore, because what we will do is just ask an AI to double check for us. But are we really willing to blindly believe AI and give up our critical thinking? I think that's the future we need to avoid.
My hope for the future is more interpretability. And I'm about to tell you what I mean. It's similar to when we look at reviews online: we can look at the total number of stars a place has, but we can also look in more detail at the positive and negative reviews, and try to understand what went well but also what might have gone wrong, to eventually form our own personal idea of whether that is the place where we want to go, where we want to be.
Likewise, imagine a world where AI doesn't just offer conclusions, but also provides clear and understandable explanations behind its decisions. And I envision a future where this lie detection technology wouldn't just provide us with a simple judgment, but also with clear explanations for why it thinks someone else is lying.
And I would like a future where, yes, this lie detection technology, and AI technology in general, is integrated into our lives, but where, at the same time, we are still able to think critically and decide when we want to trust an AI's judgment and when we want to question it.
To conclude, I think the future of using AI for lie detection is not just about technological advancement, but about enhancing our understanding and fostering trust. It's about developing tools that don't replace human judgment but empower it, ensuring that we remain at the helm. Let's not step into a future of blind reliance on technology. Let's commit to deep understanding and ethical use as we pursue the truth.
(Applause)
Thank you.