So I'm excited to share a few spicy thoughts on artificial intelligence. But first, let's get philosophical by starting with this quote by Voltaire, an 18th century Enlightenment philosopher, who said, "Common sense is not so common." Turns out this quote couldn't be more relevant to artificial intelligence today. Despite that, AI is an undeniably powerful tool, beating the world-class "Go" champion, acing college admission tests and even passing the bar exam.
I've been a computer scientist for 20 years, and I work on artificial intelligence. I am here to demystify AI. So AI today is like a Goliath. It is literally very, very large. It is speculated that the most recent models are trained on tens of thousands of GPUs and a trillion words. Such extreme-scale AI models, often referred to as "large language models," appear to demonstrate sparks of AGI, artificial general intelligence. Except when they make small, silly mistakes, which they often do. Many believe that whatever mistakes AI makes today can be easily fixed with brute force, bigger scale and more resources. What possibly could go wrong?
So there are three immediate challenges we face already at the societal level. First, extreme-scale AI models are so expensive to train, and only a few tech companies can afford to do so. So we already see the concentration of power. But what's worse for AI safety, we are now at the mercy of those few tech companies because researchers in the larger community do not have the means to truly inspect and dissect these models. And let's not forget their massive carbon footprint and the environmental impact.
And then there are these additional intellectual questions. Can AI, without robust common sense, be truly safe for humanity? And is brute-force scale really the only way and even the correct way to teach AI?
So I’m often asked these days whether it's even feasible to do any meaningful research without extreme-scale compute. And I work at a university and nonprofit research institute, so I cannot afford a massive GPU farm to create enormous language models. Nevertheless, I believe that there's so much we need to do and can do to make AI sustainable and humanistic. We need to make AI smaller, to democratize it. And we need to make AI safer by teaching human norms and values. Perhaps we can draw an analogy from "David and Goliath," here, Goliath being the extreme-scale language models, and seek inspiration from an old-time classic, "The Art of War," which tells us, in my interpretation, know your enemy, choose your battles, and innovate your weapons.
Let's start with the first, know your enemy, which means we need to evaluate AI with scrutiny. AI is passing the bar exam. Does that mean that AI is robust at common sense? You might assume so, but you never know.
So suppose I left five clothes to dry out in the sun, and it took them five hours to dry completely. How long would it take to dry 30 clothes? GPT-4, the newest, greatest AI system, says 30 hours. Not good. A different one. I have a 12-liter jug and a six-liter jug, and I want to measure six liters. How do I do it? Just use the six-liter jug, right? GPT-4 spits out some very elaborate nonsense.
(Laughter)
Step one, fill the six-liter jug, step two, pour the water from six to 12-liter jug, step three, fill the six-liter jug again, step four, very carefully, pour the water from six to 12-liter jug. And finally you have six liters of water in the six-liter jug that should be empty by now.
(Laughter)
OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass? Yes, highly likely, GPT-4 says, presumably because it cannot correctly reason that if a bridge is suspended over the nails and broken glass, then the surface of the bridge doesn't touch the sharp objects directly.
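The three puzzles above all reduce to reasoning that takes one line each to state explicitly. As a minimal sketch (the function names and this framing are mine, not from the talk), here are the commonsense answers written as plain code:

```python
# A sketch of the "correct" commonsense answers to the three puzzles above,
# written as trivial functions. The point is how little reasoning is needed,
# even though a large language model can miss it.

def drying_time(num_clothes: int, hours_for_five: float = 5.0) -> float:
    """Clothes hung in the sun dry in parallel, so the time does not
    depend on how many you hang out (space permitting)."""
    return hours_for_five  # 30 clothes still take about 5 hours

def measure_six_liters() -> list[str]:
    """With a 12-liter jug and a six-liter jug, measuring six liters
    takes exactly one step."""
    return ["fill the six-liter jug"]

def flat_tire_over_suspended_bridge() -> bool:
    """A bridge suspended over nails and broken glass keeps the tires
    off the sharp objects, so no flat tire."""
    return False
```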
OK, so how would you feel about an AI lawyer that aced the bar exam yet randomly fails at such basic common sense? AI today is unbelievably intelligent and then shockingly stupid.
(Laughter)
It is an unavoidable side effect of teaching AI through brute-force scale. Some scale optimists might say, “Don’t worry about this. All of these can be easily fixed by adding similar examples as yet more training data for AI." But the real question is this. Why should we even do that? You are able to get the correct answers right away without having to train yourself with similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.
So this observation leads us to the next wisdom, choose your battles. So what fundamental questions should we ask right now and tackle today in order to overcome this status quo with extreme-scale AI? I'll say common sense is among the top priorities.
So common sense has been a long-standing challenge in AI. To explain why, let me draw an analogy to dark matter. So only five percent of the universe is normal matter that you can see and interact with, and the remaining 95 percent is dark matter and dark energy. Dark matter is completely invisible, but scientists speculate that it's there because it influences the visible world, even including the trajectory of light. So for language, the normal matter is the visible text, and the dark matter is the unspoken rules about how the world works, including naive physics and folk psychology, which influence the way people use and interpret language.
So why is this common sense even important? Well, in a famous thought experiment proposed by Nick Bostrom, an AI was asked to produce and maximize paper clips. And that AI decided to kill humans to utilize them as additional resources, to turn you into paper clips. Because the AI didn't have a basic understanding of human values. Now, writing a better objective and equation that explicitly states: "Do not kill humans" will not work either, because the AI might go ahead and kill all the trees, thinking that's a perfectly OK thing to do. And in fact, there are endless other things that AI obviously shouldn't do while maximizing paper clips, including: "Don't spread fake news," "Don't steal," "Don't lie," which are all part of our commonsense understanding about how the world works.
However, the AI field for decades has considered common sense a nearly impossible challenge. So much so that when my students and colleagues and I started working on it several years ago, we were very much discouraged. We were told that it was a research topic of the '70s and '80s; that we shouldn't work on it because it would never work; in fact, that we shouldn't even say the word if we wanted to be taken seriously. Now fast-forward to this year, and I'm hearing: "Don't work on it because ChatGPT has almost solved it." And: "Just scale things up and magic will arise, and nothing else matters."
So my position is that giving true common sense, human-like robust common sense, to AI is still a moonshot. And you don't reach the Moon by making the tallest building in the world one inch taller at a time. Extreme-scale AI models do acquire an ever-increasing amount of commonsense knowledge, I'll give you that. But remember, they still stumble on trivial problems that even children can solve.
So AI today is awfully inefficient. And what if there is an alternative path or path yet to be found? A path that can build on the advancements of the deep neural networks, but without going so extreme with the scale.
So this leads us to our final wisdom: innovate your weapons. In the modern-day AI context, that means innovate your data and algorithms. OK, so there are, roughly speaking, three types of data that modern AI is trained on: raw web data, crafted examples custom-developed for AI training, and then human judgments, also known as human feedback on AI performance. If the AI is only trained on the first type, raw web data, which is freely available, it's not good, because this data is loaded with racism and sexism and misinformation. So no matter how much of it you use, garbage in, garbage out. So the newest, greatest AI systems are now powered with the second and third types of data, which are crafted and judged by human workers. It's analogous to writing specialized textbooks for AI to study from and then hiring human tutors to give constant feedback to AI. These are proprietary data, by and large, speculated to cost tens of millions of dollars. We don't know what's in these datasets, but they should be open and publicly available so that we can inspect them and ensure they support diverse norms and values. So for this reason, my teams at UW and AI2 have been working on commonsense knowledge graphs as well as moral norm repositories to teach AI basic commonsense norms and morals. Our data is fully open so that anybody can inspect the content and make corrections as needed, because transparency is the key for such an important research topic.
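To make the idea of a commonsense knowledge graph concrete, here is a minimal sketch of what one entry might look like. The relation names and triples below are purely illustrative, my own invention for this example; they echo, but do not reproduce, the schema of the open resources from the speaker's group:

```python
# A toy commonsense knowledge graph: (head, relation, tail) triples that a
# human can read, inspect, and correct -- the transparency property the
# passage argues for. Relation names here are illustrative only.

from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    head: str      # an event or concept
    relation: str  # a commonsense relation
    tail: str      # the inferred consequence or property

graph = [
    Triple("PersonX leaves clothes in the sun", "causes", "the clothes dry"),
    Triple("drying clothes in the sun", "is-parallel", "each item dries independently"),
    Triple("a bridge suspended over glass", "implies", "the deck does not touch the glass"),
]

def query(graph: list[Triple], head_substring: str) -> list[Triple]:
    """Return all stored inferences whose head mentions the given phrase."""
    return [t for t in graph if head_substring in t.head]
```

Because the triples are symbolic rather than buried in model weights, anyone can run a query, spot a wrong inference, and fix it in place.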
Now let's think about learning algorithms. No matter how amazing large language models are, by design they may not be best suited to serve as reliable knowledge models. These language models do acquire a vast amount of knowledge, but they do so as a byproduct as opposed to a direct learning objective, resulting in unwanted side effects such as hallucinations and a lack of common sense. Now, in contrast, human learning is never about predicting which word comes next; it's really about making sense of the world and learning how the world works. Maybe AI should be taught that way as well.
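The "predict the next word" objective being contrasted here can be sketched in its most stripped-down form: a bigram counter that picks the most frequent follower. This is of course nothing like a real large language model; it just makes visible how knowledge enters only as a statistical byproduct of the text the model saw:

```python
# A minimal next-word predictor: count which word follows which, then
# predict the most frequent follower. Any "knowledge" it has is a
# byproduct of word co-occurrence, not a learned model of the world.

from collections import Counter, defaultdict

def train_bigrams(text: str) -> dict:
    followers: dict = defaultdict(Counter)
    words = text.split()
    for word, nxt in zip(words, words[1:]):
        followers[word][nxt] += 1
    return followers

def predict_next(followers: dict, word: str) -> str:
    # Return the most common word seen after `word` in training.
    return followers[word].most_common(1)[0][0]

model = train_bigrams("the sun dries the clothes and the sun sets")
```

Here `predict_next(model, "the")` returns `"sun"` simply because "the sun" occurred twice in the training text; the model has no idea what a sun is.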
So as a quest toward more direct commonsense knowledge acquisition, my team has been investigating potential new algorithms, including symbolic knowledge distillation, which can take a very large language model, as shown here, that I couldn't fit onto the screen because it's too large, and crunch it down to much smaller commonsense models using deep neural networks. And in doing so, we also generate, algorithmically, a human-inspectable, symbolic, commonsense knowledge representation, so that people can inspect it, make corrections, and even use it to train other neural commonsense models.
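The distillation loop just described can be caricatured in a few lines: a large "teacher" model generates candidate commonsense statements, a critic filters out low-quality ones, and the survivors become the inspectable corpus a much smaller "student" model trains on. In this sketch the teacher and critic are stubs I made up for illustration; in the real work they are neural models, and none of these function names come from an actual API:

```python
# A toy sketch of a symbolic-knowledge-distillation pipeline:
# generate candidates with a big teacher, filter with a critic,
# keep the survivors as a human-readable training corpus.

def teacher_generate(prompt: str) -> list[str]:
    # Stand-in for sampling from an extreme-scale language model.
    return [
        "If you drop a glass on a hard floor, it may shatter.",
        "Dropping a glass makes it fly upward.",  # a bad generation
    ]

def critic_accepts(statement: str) -> bool:
    # Stand-in for a learned quality filter; here, a crude heuristic.
    return "fly upward" not in statement

def distill(prompt: str) -> list[str]:
    """Generate, filter, and return the symbolic knowledge corpus."""
    corpus = [s for s in teacher_generate(prompt) if critic_accepts(s)]
    # In the real pipeline, a small student model would be
    # fine-tuned on `corpus` at this point.
    return corpus
```

The key design point survives the caricature: because the intermediate corpus is symbolic text rather than weights, humans can audit it before any student model learns from it.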
More broadly, we have been tackling this seemingly impossible giant puzzle of common sense, ranging from physical, social and visual common sense to theory of minds, norms and morals. Each individual piece may seem quirky and incomplete, but when you step back, it's almost as if these pieces weave together into a tapestry that we call human experience and common sense.
We're now entering a new era in which AI is almost like a new intellectual species with unique strengths and weaknesses compared to humans. In order to make this powerful AI sustainable and humanistic, we need to teach AI common sense, norms and values.
Thank you.
(Applause)
Chris Anderson: Look at that. Yejin, please stay one sec. This is so interesting, this idea of common sense. We obviously all really want this from whatever's coming. But help me understand. Like, so we've had this model of a child learning. How does a child gain common sense apart from the accumulation of more input and some, you know, human feedback? What else is there?
Yejin Choi: So fundamentally, there are several things missing, but one of them is, for example, the ability to form hypotheses, run experiments, interact with the world and refine those hypotheses. We abstract away concepts about how the world works, and that's how we truly learn, as opposed to today's language models. Some of that is really not quite there yet.
CA: You use the analogy that we can't get to the Moon by extending a building a foot at a time. But the experience that most of us have had of these language models is not a foot at a time. It's, like, a sort of breathtaking acceleration. Are you sure, given the pace at which those things are going? Each next level seems to be bringing with it what feels kind of like wisdom and knowledge.
YC: I totally agree that it's remarkable how much this scaling things up really enhances the performance across the board. So there's real learning happening due to the scale of the compute and data.
However, there's a quality of learning that is still not quite there. And the thing is, we don't yet know whether we can fully get there or not just by scaling things up. And if we cannot, then there's this question of what else? And then even if we could, do we like this idea of having very, very extreme-scale AI models that only a few can create and own?
CA: I mean, if OpenAI said, you know, "We're interested in your work, we would like you to help improve our model," can you see any way of combining what you're doing with what they have built?
YC: Certainly what I envision will need to build on the advancements of deep neural networks. And it might be that there's some scale Goldilocks Zone, such that ... I'm not imagining that smaller is better either, by the way. It's likely that there's a right amount of scale, but beyond that, the winning recipe might be something else. So some synthesis of ideas will be critical here.
CA: Yejin Choi, thank you so much for your talk.
(Applause)