Yejin Choi: Why AI is incredibly smart and shockingly stupid

So I'm excited to share a few spicy thoughts on artificial intelligence.

But first, let's get philosophical by starting with this quote by Voltaire, an 18th century Enlightenment philosopher, who said, "Common sense is not so common." Turns out this quote couldn't be more relevant to artificial intelligence today. Despite that, AI is an undeniably powerful tool, beating the world-class "Go" champion, acing college admission tests and even passing the bar exam.

I’m a computer scientist of 20 years, and I work on artificial intelligence. I am here to demystify AI. So AI today is like a Goliath. It is literally very, very large. It is speculated that the recent ones are trained on tens of thousands of GPUs and a trillion words. Such extreme-scale AI models, often referred to as "large language models," appear to demonstrate sparks of AGI, artificial general intelligence. Except when it makes small, silly mistakes, which it often does. Many believe that whatever mistakes AI makes today can be easily fixed with brute force, bigger scale and more resources. What possibly could go wrong?

So there are three immediate challenges we face already at the societal level. First, extreme-scale AI models are so expensive to train, and only a few tech companies can afford to do so. So we already see the concentration of power. But what's worse for AI safety, we are now at the mercy of those few tech companies because researchers in the larger community do not have the means to truly inspect and dissect these models. And let's not forget their massive carbon footprint and the environmental impact.

And then there are these additional intellectual questions. Can AI, without robust common sense, be truly safe for humanity? And is brute-force scale really the only way and even the correct way to teach AI?

So I’m often asked these days whether it's even feasible to do any meaningful research without extreme-scale compute. And I work at a university and nonprofit research institute, so I cannot afford a massive GPU farm to create enormous language models. Nevertheless, I believe that there's so much we need to do and can do to make AI sustainable and humanistic. We need to make AI smaller, to democratize it. And we need to make AI safer by teaching human norms and values. Perhaps we can draw an analogy from "David and Goliath," here, Goliath being the extreme-scale language models, and seek inspiration from an old-time classic, "The Art of War," which tells us, in my interpretation, know your enemy, choose your battles, and innovate your weapons.

Let's start with the first, know your enemy, which means we need to evaluate AI with scrutiny. AI is passing the bar exam. Does that mean that AI is robust at common sense? You might assume so, but you never know.

So suppose I left five clothes to dry out in the sun, and it took them five hours to dry completely. How long would it take to dry 30 clothes? GPT-4, the newest, greatest AI system, says 30 hours. Not good. A different one. I have a 12-liter jug and a six-liter jug, and I want to measure six liters. How do I do it? Just use the six-liter jug, right? GPT-4 spits out some very elaborate nonsense.

(Laughter)

Step one, fill the six-liter jug, step two, pour the water from the six-liter jug into the 12-liter jug, step three, fill the six-liter jug again, step four, very carefully, pour the water from the six-liter jug into the 12-liter jug. And finally you have six liters of water in the six-liter jug that should be empty by now.
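
The two puzzles above can be checked mechanically. The following is a minimal Python sketch, not something shown in the talk: the function names are made up for illustration, and the drying model assumes that all the clothes fit in the sun at the same time.

```python
def simulate_jug_steps():
    """Replay the four quoted steps and return (six_liter_jug, twelve_liter_jug) in liters."""
    six, twelve = 0, 0                 # both jugs start empty
    six = 6                            # step one: fill the six-liter jug
    twelve, six = twelve + six, 0      # step two: pour it into the 12-liter jug
    six = 6                            # step three: fill the six-liter jug again
    twelve, six = twelve + six, 0      # step four: pour it into the 12-liter jug
    return six, twelve


def naive_proportional_hours(items, base_items=5, base_hours=5):
    """The mistaken answer: treat drying time as proportional to the number of items."""
    return base_hours * items / base_items


def parallel_drying_hours(items, base_hours=5):
    """Assumption: every item dries in the sun at once, so the count does not matter."""
    return base_hours


if __name__ == "__main__":
    print(simulate_jug_steps())          # state of the jugs after the four steps
    print(naive_proportional_hours(30))  # proportional scaling
    print(parallel_drying_hours(30))     # all items drying in parallel
```

Replaying the steps leaves the six-liter jug empty and the 12-liter jug full, which is why the quoted conclusion does not follow. The one-step answer, filling the six-liter jug, needs no simulation at all.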

(Laughter)

OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass? Yes, highly likely, GPT-4 says, presumably because it cannot correctly reason that if a bridge is suspended over the nails and broken glass, then the surface of the bridge doesn't touch the sharp objects directly.

OK, so how would you feel about an AI lawyer that aced the bar exam yet randomly fails at such basic common sense? AI today is unbelievably intelligent and then shockingly stupid.

(Laughter)

It is an unavoidable side effect of teaching AI through brute-force scale. Some scale optimists might say, “Don’t worry about this. All of these can be easily fixed by adding similar examples as yet more training data for AI." But the real question is this. Why should we even do that? You are able to get the correct answers right away without having to train yourself with similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.

So this observation leads us to the next wisdom, choose your battles. So what fundamental questions should we ask right now and tackle today in order to overcome this status quo with extreme-scale AI? I'll say common sense is among the top priorities.

So common sense has been a long-standing challenge in AI. To explain why, let me draw an analogy to dark matter. So only five percent of the universe is normal matter that you can see and interact with, and the remaining 95 percent is dark matter and dark energy. Dark matter is completely invisible, but scientists speculate that it's there because it influences the visible world, even including the trajectory of light. So for language, the normal matter is the visible text, and the dark matter is the unspoken rules about how the world works, including naive physics and folk psychology, which influence the way people use and interpret language.

So why is this common sense even important? Well, in a famous thought experiment proposed by Nick Bostrom, AI was asked to produce and maximize paper clips. And that AI decided to kill humans to utilize them as additional resources, to turn you into paper clips, because AI didn't have the basic human understanding about human values. Now, writing a better objective and equation that explicitly states: “Do not kill humans” will not work either because AI might go ahead and kill all the trees, thinking that's a perfectly OK thing to do. And in fact, there are endless other things that AI obviously shouldn’t do while maximizing paper clips, including: “Don’t spread fake news,” “Don’t steal,” “Don’t lie,” which are all part of our commonsense understanding about how the world works.

However, the AI field for decades has considered common sense as a nearly impossible challenge. So much so that when my students and colleagues and I started working on it several years ago, we were very much discouraged. We were told that it’s a research topic of the ’70s and ’80s; that we shouldn’t work on it because it will never work; in fact, that we shouldn't even say the word if we wanted to be taken seriously. Now fast forward to this year, I’m hearing: “Don’t work on it because ChatGPT has almost solved it.” And: “Just scale things up and magic will arise, and nothing else matters.”

So my position is that giving true, human-like, robust common sense to AI is still a moonshot. And you don’t reach the Moon by making the tallest building in the world one inch taller at a time. Extreme-scale AI models do acquire an ever-increasing amount of commonsense knowledge, I'll give you that. But remember, they still stumble on such trivial problems that even children can do.

So AI today is awfully inefficient. And what if there is an alternative path, or a path yet to be found? A path that can build on the advancements of the deep neural networks, but without going so extreme with the scale.

So this leads us to our final wisdom: innovate your weapons. In the modern-day AI context, that means innovate your data and algorithms. OK, so there are, roughly speaking, three types of data that modern AI is trained on: raw web data, crafted examples custom developed for AI training, and then human judgments, also known as human feedback on AI performance. If the AI is only trained on the first type, raw web data, which is freely available, it's not good because this data is loaded with racism and sexism and misinformation. So no matter how much of it you use, garbage in and garbage out. So the newest, greatest AI systems are now powered with the second and third types of data that are crafted and judged by human workers. It's analogous to writing specialized textbooks for AI to study from and then hiring human tutors to give constant feedback to AI. These are proprietary data, by and large, speculated to cost tens of millions of dollars. We don't know what's in this, but it should be open and publicly available so that we can inspect and ensure [it supports] diverse norms and values. So for this reason, my teams at UW and AI2 have been working on commonsense knowledge graphs as well as moral norm repositories to teach AI basic commonsense norms and morals. Our data is fully open so that anybody can inspect the content and make corrections as needed because transparency is the key for such an important research topic.

Now let's think about learning algorithms. No matter how amazing large language models are, by design they may not be the best suited to serve as reliable knowledge models. And these language models do acquire a vast amount of knowledge, but they do so as a byproduct as opposed to a direct learning objective, resulting in unwanted side effects such as hallucinations and lack of common sense. Now, in contrast, human learning is never about predicting which word comes next, but it's really about making sense of the world and learning how the world works. Maybe AI should be taught that way as well.

So as a quest toward more direct commonsense knowledge acquisition, my team has been investigating potential new algorithms, including symbolic knowledge distillation that can take a very large language model as shown here that I couldn't fit into the screen because it's too large, and crunch that down to much smaller commonsense models using deep neural networks. And in doing so, we also generate, algorithmically, human-inspectable, symbolic, commonsense knowledge representation, so that people can inspect and make corrections and even use it to train other neural commonsense models.
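
As a rough illustration of the idea described above, here is a toy Python sketch. It is not the team's actual algorithm: `toy_teacher`, its canned statements, the made-up plausibility scores and the `min_score` filter are all hypothetical stand-ins, and the "student" is only a lookup table rather than a neural model. It shows the general shape only: a large model proposes statements, they are kept as symbolic triples that a person can read and correct, and those triples then serve as training data for something much smaller.

```python
from collections import defaultdict


def toy_teacher(event):
    """Hypothetical stand-in for a very large language model.

    Returns canned (head, relation, tail, score) candidates; the scores are invented.
    """
    canned = {
        "X leaves clothes in the sun": [
            ("X leaves clothes in the sun", "effect", "the clothes dry", 0.95),
            ("X leaves clothes in the sun", "effect", "the clothes get wetter", 0.10),
        ],
        "X rides over a bridge": [
            ("X rides over a bridge", "effect", "X reaches the other side", 0.90),
        ],
    }
    return canned.get(event, [])


def distill(events, min_score=0.5):
    """Collect teacher outputs as symbolic triples, dropping low-scoring candidates."""
    kept = []
    for event in events:
        for head, relation, tail, score in toy_teacher(event):
            if score >= min_score:
                kept.append((head, relation, tail))
    return kept


def train_student(triples):
    """Toy 'student': an index from (head, relation) to the list of known tails."""
    student = defaultdict(list)
    for head, relation, tail in triples:
        student[(head, relation)].append(tail)
    return dict(student)


if __name__ == "__main__":
    triples = distill(["X leaves clothes in the sun", "X rides over a bridge"])
    for triple in triples:          # the symbolic layer is plain, inspectable data
        print(triple)
    student = train_student(triples)
    print(student[("X leaves clothes in the sun", "effect")])
```

Because the intermediate triples are plain data, anyone can inspect them, delete a bad one, or add a missing one before the smaller model is trained.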

More broadly, we have been tackling this seemingly impossible giant puzzle of common sense, ranging from physical, social and visual common sense to theory of mind, norms and morals. Each individual piece may seem quirky and incomplete, but when you step back, it's almost as if these pieces weave together into a tapestry that we call human experience and common sense.

We're now entering a new era in which AI is almost like a new intellectual species with unique strengths and weaknesses compared to humans. In order to make this powerful AI sustainable and humanistic, we need to teach AI common sense, norms and values.

Thank you.

(Applause)

Chris Anderson: Look at that. Yejin, please stay one sec. This is so interesting, this idea of common sense. We obviously all really want this from whatever's coming. But help me understand. Like, so we've had this model of a child learning. How does a child gain common sense apart from the accumulation of more input and some, you know, human feedback? What else is there?

Yejin Choi: So fundamentally, there are several things missing, but one of them is, for example, the ability to form hypotheses and run experiments, to interact with the world and develop those hypotheses. We abstract away the concepts about how the world works, and that's how we truly learn, as opposed to today's language models. Some of that is really not quite there yet.

CA: You use the analogy that we can’t get to the Moon by extending a building a foot at a time. But the experience that most of us have had of these language models is not a foot at a time. It's, like, this sort of breathtaking acceleration. Are you sure, given the pace at which those things are going? Each next level seems to be bringing with it what feels kind of like wisdom and knowledge.

YC: I totally agree that it's remarkable how much this scaling things up really enhances the performance across the board. So there's real learning happening due to the scale of the compute and data.

However, there's a quality of learning that is still not quite there. And the thing is, we don't yet know whether we can fully get there or not just by scaling things up. And if we cannot, then there's this question of what else? And then even if we could, do we like this idea of having very, very extreme-scale AI models that only a few can create and own?

CA: I mean, if OpenAI said, you know, "We're interested in your work, we would like you to help improve our model," can you see any way of combining what you're doing with what they have built?

YC: Certainly what I envision will need to build on the advancements of deep neural networks. And it might be that there’s some scale Goldilocks Zone, such that ... I'm not imagining that the smaller is the better either, by the way. It's likely that there's a right amount of scale, but beyond that, the winning recipe might be something else. So some synthesis of ideas will be critical here.

CA: Yejin Choi, thank you so much for your talk.

(Applause)
