Yejin Choi: Why AI is incredibly smart and shockingly stupid

So I'm excited to share a few spicy thoughts on artificial intelligence. But first, let's get philosophical by starting with this quote by Voltaire, an 18th-century Enlightenment philosopher, who said, "Common sense is not so common." Turns out this quote couldn't be more relevant to artificial intelligence today. Despite that, AI is an undeniably powerful tool, beating a world-class Go champion, acing college admission tests and even passing the bar exam.

I’m a computer scientist of 20 years, and I work on artificial intelligence. I am here to demystify AI. So AI today is like a Goliath. It is literally very, very large. It is speculated that the recent ones are trained on tens of thousands of GPUs and a trillion words. Such extreme-scale AI models, often referred to as "large language models," appear to demonstrate sparks of AGI, artificial general intelligence. Except when they make small, silly mistakes, which they often do. Many believe that whatever mistakes AI makes today can be easily fixed with brute force, bigger scale and more resources. What could possibly go wrong?

So there are three immediate challenges we face already at the societal level. First, extreme-scale AI models are so expensive to train, and only a few tech companies can afford to do so. So we already see the concentration of power. But what's worse for AI safety, we are now at the mercy of those few tech companies because researchers in the larger community do not have the means to truly inspect and dissect these models. And let's not forget their massive carbon footprint and the environmental impact.

And then there are these additional intellectual questions. Can AI, without robust common sense, be truly safe for humanity? And is brute-force scale really the only way and even the correct way to teach AI?

So I’m often asked these days whether it's even feasible to do any meaningful research without extreme-scale compute. And I work at a university and a nonprofit research institute, so I cannot afford a massive GPU farm to create enormous language models. Nevertheless, I believe that there's so much we need to do and can do to make AI sustainable and humanistic. We need to make AI smaller, to democratize it. And we need to make AI safer by teaching human norms and values. Perhaps we can draw an analogy from "David and Goliath," here, Goliath being the extreme-scale language models, and seek inspiration from an old-time classic, "The Art of War," which tells us, in my interpretation, know your enemy, choose your battles, and innovate your weapons.

Let's start with the first, know your enemy, which means we need to evaluate AI with scrutiny. AI is passing the bar exam. Does that mean that AI is robust at common sense? You might assume so, but you never know.

So suppose I left five clothes to dry out in the sun, and it took them five hours to dry completely. How long would it take to dry 30 clothes? GPT-4, the newest, greatest AI system, says 30 hours. Not good. A different one. I have a 12-liter jug and a six-liter jug, and I want to measure six liters. How do I do it? Just use the six-liter jug, right? GPT-4 spits out some very elaborate nonsense.

(Laughter)

Step one, fill the six-liter jug; step two, pour the water from the six-liter jug into the 12-liter jug; step three, fill the six-liter jug again; step four, very carefully, pour the water from the six-liter jug into the 12-liter jug. And finally you have six liters of water in the six-liter jug, which should be empty by now.

(Laughter)

OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass? Yes, highly likely, GPT-4 says, presumably because it cannot correctly reason that if a bridge is suspended over the broken nails and broken glass, then the surface of the bridge doesn't touch the sharp objects directly.

OK, so how would you feel about an AI lawyer that aced the bar exam yet randomly fails at such basic common sense? AI today is unbelievably intelligent and then shockingly stupid.

(Laughter)

It is an unavoidable side effect of teaching AI through brute-force scale. Some scale optimists might say, “Don’t worry about this. All of these can be easily fixed by adding similar examples as yet more training data for AI." But the real question is this. Why should we even do that? You are able to get the correct answers right away without having to train yourself with similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.

So this observation leads us to the next wisdom, choose your battles. So what fundamental questions should we ask right now and tackle today in order to overcome this status quo with extreme-scale AI? I'll say common sense is among the top priorities.

So common sense has been a long-standing challenge in AI. To explain why, let me draw an analogy to dark matter. So only five percent of the universe is normal matter that you can see and interact with, and the remaining 95 percent is dark matter and dark energy. Dark matter is completely invisible, but scientists speculate that it's there because it influences the visible world, even including the trajectory of light. So for language, the normal matter is the visible text, and the dark matter is the unspoken rules about how the world works, including naive physics and folk psychology, which influence the way people use and interpret language.

So why is this common sense even important? Well, in a famous thought experiment proposed by Nick Bostrom, an AI was asked to produce and maximize paper clips. And that AI decided to kill humans to utilize them as additional resources, to turn you into paper clips, because the AI didn't have a basic human understanding of human values. Now, writing a better objective and equation that explicitly states “Do not kill humans” will not work either, because the AI might go ahead and kill all the trees, thinking that's a perfectly OK thing to do. And in fact, there are endless other things that AI obviously shouldn’t do while maximizing paper clips, including “Don’t spread fake news,” “Don’t steal,” “Don’t lie,” which are all part of our common sense understanding about how the world works.

However, the AI field for decades has considered common sense a nearly impossible challenge. So much so that when my students and colleagues and I started working on it several years ago, we were very much discouraged. We were told that it was a research topic of the ’70s and ’80s; that we shouldn’t work on it because it would never work; in fact, that we shouldn’t even say the word if we wanted to be taken seriously. Now fast forward to this year, I’m hearing: “Don’t work on it because ChatGPT has almost solved it.” And: “Just scale things up and magic will arise, and nothing else matters.”

So my position is that giving true common sense, human-like robust common sense, to AI is still a moonshot. And you don’t reach the Moon by making the tallest building in the world one inch taller at a time. Extreme-scale AI models do acquire an ever-increasing amount of commonsense knowledge, I'll give you that. But remember, they still stumble on such trivial problems that even children can do.

So AI today is awfully inefficient. And what if there is an alternative path, or a path yet to be found? A path that can build on the advancements of deep neural networks, but without going so extreme with the scale.

So this leads us to our final wisdom: innovate your weapons. In the modern-day AI context, that means innovate your data and algorithms. OK, so there are, roughly speaking, three types of data that modern AI is trained on: raw web data, crafted examples custom developed for AI training, and then human judgments, also known as human feedback on AI performance. If the AI is only trained on the first type, raw web data, which is freely available, it's not good because this data is loaded with racism and sexism and misinformation. So no matter how much of it you use, garbage in, garbage out. So the newest, greatest AI systems are now powered with the second and third types of data that are crafted and judged by human workers. It's analogous to writing specialized textbooks for AI to study from and then hiring human tutors to give constant feedback to AI. These are proprietary data, by and large, speculated to cost tens of millions of dollars. We don't know what's in them, but they should be open and publicly available so that we can inspect them and ensure [they support] diverse norms and values. So for this reason, my teams at UW and AI2 have been working on commonsense knowledge graphs as well as moral norm repositories to teach AI basic commonsense norms and morals. Our data is fully open so that anybody can inspect the content and make corrections as needed, because transparency is the key for such an important research topic.

Now let's think about learning algorithms. No matter how amazing large language models are, by design they may not be the best suited to serve as reliable knowledge models. These language models do acquire a vast amount of knowledge, but they do so as a byproduct rather than as a direct learning objective, resulting in unwanted side effects such as hallucinations and a lack of common sense. Now, in contrast, human learning is never about predicting which word comes next; it's really about making sense of the world and learning how the world works. Maybe AI should be taught that way as well.

So as a quest toward more direct commonsense knowledge acquisition, my team has been investigating potential new algorithms, including symbolic knowledge distillation, which can take a very large language model, as shown here, that I couldn't fit onto the screen because it's too large, and crunch it down into much smaller commonsense models using deep neural networks. And in doing so, we also algorithmically generate a human-inspectable, symbolic commonsense knowledge representation, so that people can inspect it, make corrections and even use it to train other neural commonsense models.

More broadly, we have been tackling this seemingly impossible giant puzzle of common sense, ranging from physical, social and visual common sense to theory of mind, norms and morals. Each individual piece may seem quirky and incomplete, but when you step back, it's almost as if these pieces weave together into a tapestry that we call human experience and common sense.

We're now entering a new era in which AI is almost like a new intellectual species with unique strengths and weaknesses compared to humans. In order to make this powerful AI sustainable and humanistic, we need to teach AI common sense, norms and values.

Thank you.

(Applause)

Chris Anderson: Look at that. Yejin, please stay one sec. This is so interesting, this idea of common sense. We obviously all really want this from whatever's coming. But help me understand. Like, so we've had this model of a child learning. How does a child gain common sense apart from the accumulation of more input and some, you know, human feedback? What else is there?

Yejin Choi: So fundamentally, there are several things missing, but one of them is, for example, the ability to make hypotheses and run experiments, to interact with the world and develop these hypotheses. We abstract away the concepts about how the world works, and that's how we truly learn, as opposed to today's language models. Some of that is really not there quite yet.

CA: You use the analogy that we can’t get to the Moon by extending a building a foot at a time. But the experience that most of us have had of these language models is not a foot at a time. It's like, the sort of, breathtaking acceleration. Are you sure, given the pace at which those things are going? Each next level seems to be bringing with it what feels kind of like wisdom and knowledge.

YC: I totally agree that it's remarkable how much scaling things up really enhances the performance across the board. So there's real learning happening due to the scale of the compute and data.

However, there's a quality of learning that is still not quite there. And the thing is, we don't yet know whether we can fully get there or not just by scaling things up. And if we cannot, then there's this question of what else? And then even if we could, do we like this idea of having very, very extreme-scale AI models that only a few can create and own?

CA: I mean, if OpenAI said, you know, "We're interested in your work, we would like you to help improve our model," can you see any way of combining what you're doing with what they have built?

YC: Certainly what I envision will need to build on the advancements of deep neural networks. And it might be that there’s some scale Goldilocks Zone, such that ... I'm not imagining that the smaller is the better either, by the way. It's likely that there's the right amount of scale, but beyond that, the winning recipe might be something else. So some synthesis of ideas will be critical here.

CA: Yejin Choi, thank you so much for your talk.

(Applause)
