Yejin Choi: Why AI is incredibly smart and shockingly stupid

So I'm excited to share a few spicy thoughts on artificial intelligence.

But first, let's get philosophical by starting with this quote by Voltaire, an 18th-century Enlightenment philosopher, who said, "Common sense is not so common." Turns out this quote couldn't be more relevant to artificial intelligence today. Despite that, AI is an undeniably powerful tool, beating the world-class Go champion, acing college admission tests and even passing the bar exam.

I’m a computer scientist of 20 years, and I work on artificial intelligence. I am here to demystify AI. So AI today is like a Goliath. It is literally very, very large. It is speculated that the recent ones are trained on tens of thousands of GPUs and a trillion words. Such extreme-scale AI models, often referred to as "large language models," appear to demonstrate sparks of AGI, artificial general intelligence. Except when it makes small, silly mistakes, which it often does. Many believe that whatever mistakes AI makes today can be easily fixed with brute force, bigger scale and more resources. What possibly could go wrong?

So there are three immediate challenges we face already at the societal level. First, extreme-scale AI models are so expensive to train that only a few tech companies can afford to do so. So we already see the concentration of power. But what's worse for AI safety, we are now at the mercy of those few tech companies because researchers in the larger community do not have the means to truly inspect and dissect these models. And let's not forget their massive carbon footprint and the environmental impact.

And then there are these additional intellectual questions. Can AI, without robust common sense, be truly safe for humanity? And is brute-force scale really the only way and even the correct way to teach AI?

So I’m often asked these days whether it's even feasible to do any meaningful research without extreme-scale compute. And I work at a university and nonprofit research institute, so I cannot afford a massive GPU farm to create enormous language models. Nevertheless, I believe that there's so much we need to do and can do to make AI sustainable and humanistic. We need to make AI smaller, to democratize it. And we need to make AI safer by teaching human norms and values. Perhaps we can draw an analogy from "David and Goliath," here, Goliath being the extreme-scale language models, and seek inspiration from an old-time classic, "The Art of War," which tells us, in my interpretation, know your enemy, choose your battles, and innovate your weapons.

Let's start with the first, know your enemy, which means we need to evaluate AI with scrutiny. AI is passing the bar exam. Does that mean that AI is robust at common sense? You might assume so, but you never know.

So suppose I left five clothes to dry out in the sun, and it took them five hours to dry completely. How long would it take to dry 30 clothes? GPT-4, the newest, greatest AI system, says 30 hours. Not good. A different one. I have a 12-liter jug and a six-liter jug, and I want to measure six liters. How do I do it? Just use the six-liter jug, right? GPT-4 spits out some very elaborate nonsense.

(Laughter)

Step one, fill the six-liter jug, step two, pour the water from six to 12-liter jug, step three, fill the six-liter jug again, step four, very carefully, pour the water from six to 12-liter jug. And finally you have six liters of water in the six-liter jug that should be empty by now.

(Laughter)
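
For readers who want to check the jug question mechanically, here is a minimal sketch in Python. It is an illustration added to the transcript, not something shown in the talk: the function name `measure` and the breadth-first search over jug states are assumptions chosen for clarity. The search confirms that the shortest plan is a single step.

```python
from collections import deque


def measure(capacities, target):
    """Breadth-first search over jug states; returns the shortest list of steps,
    or None if the target amount can never appear in any jug."""
    start = tuple(0 for _ in capacities)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if target in state:
            return steps
        for i, cap in enumerate(capacities):
            moves = [
                # Fill jug i to the brim.
                (state[:i] + (cap,) + state[i + 1:], f"fill the {cap}-liter jug"),
                # Empty jug i completely.
                (state[:i] + (0,) + state[i + 1:], f"empty the {cap}-liter jug"),
            ]
            # Pour jug i into every other jug until one is empty or the other is full.
            for j, cap_j in enumerate(capacities):
                if i == j:
                    continue
                amount = min(state[i], cap_j - state[j])
                nxt = list(state)
                nxt[i] -= amount
                nxt[j] += amount
                moves.append(
                    (tuple(nxt), f"pour the {cap}-liter jug into the {cap_j}-liter jug")
                )
            for nxt, label in moves:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [label]))
    return None


print(measure((12, 6), 6))  # ['fill the 6-liter jug']
```

The same function handles the harder textbook variants (for example, measuring four liters with three-liter and five-liter jugs), which is part of why the one-step case reads as common sense rather than as a puzzle.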

OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass? Yes, highly likely, GPT-4 says, presumably because it cannot correctly reason that if a bridge is suspended over the broken nails and broken glass, then the surface of the bridge doesn't touch the sharp objects directly.

OK, so how would you feel about an AI lawyer that aced the bar exam yet randomly fails at such basic common sense? AI today is unbelievably intelligent and then shockingly stupid.

(Laughter)

It is an unavoidable side effect of teaching AI through brute-force scale. Some scale optimists might say, “Don’t worry about this. All of these can be easily fixed by adding similar examples as yet more training data for AI." But the real question is this. Why should we even do that? You are able to get the correct answers right away without having to train yourself with similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.

So this observation leads us to the next wisdom, choose your battles. So what fundamental questions should we ask right now and tackle today in order to overcome this status quo with extreme-scale AI? I'll say common sense is among the top priorities.

So common sense has been a long-standing challenge in AI. To explain why, let me draw an analogy to dark matter. So only five percent of the universe is normal matter that you can see and interact with, and the remaining 95 percent is dark matter and dark energy. Dark matter is completely invisible, but scientists speculate that it's there because it influences the visible world, even including the trajectory of light. So for language, the normal matter is the visible text, and the dark matter is the unspoken rules about how the world works, including naive physics and folk psychology, which influence the way people use and interpret language.

So why is this common sense even important? Well, in a famous thought experiment proposed by Nick Bostrom, AI was asked to produce and maximize the paper clips. And that AI decided to kill humans to utilize them as additional resources, to turn you into paper clips. Because AI didn't have the basic human understanding about human values. Now, writing a better objective and equation that explicitly states: “Do not kill humans” will not work either because AI might go ahead and kill all the trees, thinking that's a perfectly OK thing to do. And in fact, there are endless other things that AI obviously shouldn’t do while maximizing paper clips, including: “Don’t spread the fake news,” “Don’t steal,” “Don’t lie,” which are all part of our common sense understanding about how the world works.

However, the AI field for decades has considered common sense as a nearly impossible challenge. So much so that when my students and colleagues and I started working on it several years ago, we were very much discouraged. We’ve been told that it’s a research topic of the ’70s and ’80s; shouldn’t work on it because it will never work; in fact, don't even say the word to be taken seriously. Now fast forward to this year, I’m hearing: “Don’t work on it because ChatGPT has almost solved it.” And: “Just scale things up and magic will arise, and nothing else matters.”

So my position is that giving true, human-like, robust common sense to AI is still a moonshot. And you don’t reach the Moon by making the tallest building in the world one inch taller at a time. Extreme-scale AI models do acquire an ever-increasing amount of commonsense knowledge, I'll give you that. But remember, they still stumble on such trivial problems that even children can do.

So AI today is awfully inefficient. And what if there is an alternative path, or a path yet to be found? A path that can build on the advancements of the deep neural networks, but without going so extreme with the scale.

So this leads us to our final wisdom: innovate your weapons. In the modern-day AI context, that means innovate your data and algorithms. OK, so there are, roughly speaking, three types of data that modern AI is trained on: raw web data, crafted examples custom developed for AI training, and then human judgments, also known as human feedback on AI performance. If the AI is only trained on the first type, raw web data, which is freely available, it's not good because this data is loaded with racism and sexism and misinformation. So no matter how much of it you use, garbage in and garbage out. So the newest, greatest AI systems are now powered with the second and third types of data that are crafted and judged by human workers. It's analogous to writing specialized textbooks for AI to study from and then hiring human tutors to give constant feedback to AI. These are proprietary data, by and large, speculated to cost tens of millions of dollars. We don't know what's in this, but it should be open and publicly available so that we can inspect and ensure [it supports] diverse norms and values. So for this reason, my teams at UW and AI2 have been working on commonsense knowledge graphs as well as moral norm repositories to teach AI basic commonsense norms and morals. Our data is fully open so that anybody can inspect the content and make corrections as needed because transparency is the key for such an important research topic.

Now let's think about learning algorithms. No matter how amazing large language models are, by design they may not be the best suited to serve as reliable knowledge models. And these language models do acquire a vast amount of knowledge, but they do so as a byproduct as opposed to a direct learning objective, resulting in unwanted side effects such as hallucinated effects and lack of common sense. Now, in contrast, human learning is never about predicting which word comes next, but it's really about making sense of the world and learning how the world works. Maybe AI should be taught that way as well.
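
To make "predicting which word comes next" concrete, here is a toy sketch in Python. It is an added illustration, not the speaker's code and not how large language models are actually built: it uses simple bigram counts, and the names `train_bigram` and `predict_next` as well as the tiny corpus are assumptions. The point it shows is that the training objective only tracks which words follow which; whatever the model "knows" about bridges is a byproduct of those counts.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical two-sentence training corpus.
corpus = "the bridge is over the river . the bridge is over the nails ."
model = train_bigram(corpus)

print(predict_next(model, "bridge"))  # is
print(predict_next(model, "tire"))    # None
```

Nothing in these counts represents the fact that a bridge keeps the rider above whatever lies beneath it, which is the kind of knowledge the talk argues should be learned directly.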

So as a quest toward more direct commonsense knowledge acquisition, my team has been investigating potential new algorithms, including symbolic knowledge distillation that can take a very large language model as shown here that I couldn't fit into the screen because it's too large, and crunch that down to much smaller commonsense models using deep neural networks. And in doing so, we also generate, algorithmically, human-inspectable, symbolic, commonsense knowledge representation, so that people can inspect and make corrections and even use it to train other neural commonsense models.
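
The data flow described above can be sketched roughly as follows. This is a heavily simplified illustration under stated assumptions, not the team's actual algorithm: `toy_teacher` is a hypothetical stand-in for a very large language model that returns canned text, the `head | relation | tail` line format is invented for this example, and training a smaller model on the reviewed statements is left out entirely. What it shows is the human-inspectable step: the teacher's output becomes symbolic statements that a person can read and correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One human-readable commonsense statement: head, relation, tail."""
    head: str
    relation: str
    tail: str


def toy_teacher(prompt):
    """Hypothetical stand-in for a very large language model (canned answers)."""
    canned = {
        "bridge suspended over nails": [
            "bridge surface | does not touch | nails below",
            "bicycle tire | gets punctured by | bridge surface",  # a teacher mistake
        ],
        "clothes drying in the sun": [
            "drying time | does not depend on | number of clothes",
        ],
    }
    return canned.get(prompt, [])


def distill(prompts, teacher):
    """Parse the teacher's text output into symbolic triples."""
    triples = []
    for prompt in prompts:
        for line in teacher(prompt):
            parts = [part.strip() for part in line.split("|")]
            if len(parts) == 3:  # keep only well-formed statements
                triples.append(Triple(*parts))
    return triples


def apply_corrections(triples, rejected):
    """Drop the triples a human reviewer marked as wrong."""
    return [t for t in triples if t not in rejected]


knowledge = distill(
    ["bridge suspended over nails", "clothes drying in the sun"], toy_teacher
)
reviewed = apply_corrections(
    knowledge, {Triple("bicycle tire", "gets punctured by", "bridge surface")}
)
for t in reviewed:
    print(t.head, "->", t.relation, "->", t.tail)
```

In this sketch the reviewed list is what would be handed on as training data for a smaller commonsense model.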

More broadly, we have been tackling this seemingly impossible giant puzzle of common sense, ranging from physical, social and visual common sense to theory of mind, norms and morals. Each individual piece may seem quirky and incomplete, but when you step back, it's almost as if these pieces weave together into a tapestry that we call human experience and common sense.

We're now entering a new era in which AI is almost like a new intellectual species with unique strengths and weaknesses compared to humans. In order to make this powerful AI sustainable and humanistic, we need to teach AI common sense, norms and values.

Thank you.

(Applause)

Chris Anderson: Look at that. Yejin, please stay one sec. This is so interesting, this idea of common sense. We obviously all really want this from whatever's coming. But help me understand. Like, so we've had this model of a child learning. How does a child gain common sense apart from the accumulation of more input and some, you know, human feedback? What else is there?

Yejin Choi: So fundamentally, there are several things missing, but one of them is, for example, the ability to make hypotheses and run experiments, interact with the world and develop those hypotheses. We abstract away the concepts about how the world works, and then that's how we truly learn, as opposed to today's language models. Some of that is really not there quite yet.

CA: You use the analogy that we can’t get to the Moon by extending a building a foot at a time. But the experience that most of us have had of these language models is not a foot at a time. It's like, the sort of, breathtaking acceleration. Are you sure, given the pace at which those things are going, when each next level seems to be bringing with it what feels kind of like wisdom and knowledge?

YC: I totally agree that it's remarkable how much this scaling things up really enhances the performance across the board. So there's real learning happening due to the scale of the compute and data.

However, there's a quality of learning that is still not quite there. And the thing is, we don't yet know whether we can fully get there or not just by scaling things up. And if we cannot, then there's this question of what else? And then even if we could, do we like this idea of having very, very extreme-scale AI models that only a few can create and own?

CA: I mean, if OpenAI said, you know, "We're interested in your work, we would like you to help improve our model," can you see any way of combining what you're doing with what they have built?

YC: Certainly what I envision will need to build on the advancements of deep neural networks. And it might be that there’s some scale Goldilocks Zone, such that ... I'm not imagining that the smaller is the better either, by the way. It's likely that there's a right amount of scale, but beyond that, the winning recipe might be something else. So some synthesis of ideas will be critical here.

CA: Yejin Choi, thank you so much for your talk.

(Applause)
