Five years ago, I stood on the TED stage and warned about the dangers of superintelligence. I was wrong. It went even worse than I thought.
(Laughter)
I never thought governments would let AI companies get this far without any meaningful regulation. And the progress of AI went even faster than I predicted. Look, I showed this abstract landscape of tasks where the elevation represented how hard it was for AI to do each task at human level. And the sea level represented what AI could do back then. And boy oh boy, has the sea been rising fast ever since. But a lot of these tasks have already gone blub blub blub blub blub blub. And the water is on track to submerge all land, matching human intelligence at all cognitive tasks.
This is a definition of artificial general intelligence, AGI, which is the stated goal of companies like OpenAI, Google DeepMind and Anthropic. And these companies are also trying to build superintelligence, leaving human intelligence far behind. And many think it'll only be a few years, maybe, from AGI to superintelligence.
So when are we going to get AGI? Well, until recently, most AI researchers thought it was at least decades away. And now Microsoft is saying, "Oh, it's almost here." We're seeing sparks of AGI in GPT-4, and the Metaculus forecasting site is showing the time left to AGI plummeting from 20 years away to three years away in the last 18 months. And leading industry people are now predicting that we have maybe two or three years left until we get outsmarted. So you better stop talking about AGI as a long-term risk, or someone might call you a dinosaur stuck in the past.
It's really remarkable how AI has progressed recently. Not long ago, robots moved like this.
(Music)
Now they can dance.
(Music)
Just last year, Midjourney produced this image. This year, the exact same prompt produces this. Deepfakes are getting really convincing.
(Video) Deepfake Tom Cruise: I’m going to show you some magic.
It's the real thing.
(Laughs)
I mean ... It's all ... the real ... thing.
Max Tegmark: Or is it?
And Yoshua Bengio now argues that large language models have mastered language and knowledge to the point that they pass the Turing test. I know some skeptics are saying, "Nah, they're just overhyped stochastic parrots that lack a model of the world," but they clearly have a representation of the world. In fact, we recently found that Llama-2 even has a literal map of the world in it. And AI also builds geometric representations of more abstract concepts like what it thinks is true and false.
So what's going to happen if we get AGI and superintelligence? If you only remember one thing from my talk, let it be this. AI godfather Alan Turing predicted that the default outcome is that the machines take control. The machines take control. I know this sounds like science fiction, but, you know, having AI as smart as GPT-4 also sounded like science fiction not long ago. And if you think of AI, if you think of superintelligence in particular, as just another technology, like electricity, you're probably not very worried. But you see, Turing thinks of superintelligence more like a new species. Think of it: we are building creepy, super capable, amoral psychopaths that don't sleep and think much faster than us, can make copies of themselves and have nothing human about them at all. So what could possibly go wrong?
(Laughter)
And it's not just Turing. OpenAI CEO Sam Altman, who gave us ChatGPT, recently warned that it could be "lights out for all of us." Anthropic CEO Dario Amodei even put a number on this risk: 10-25 percent. And it's not just them. Human extinction from AI went mainstream in May when all the AGI CEOs and a who's who of AI researchers came out and warned about it. And last month, even the president of the European Commission warned about human extinction by AI.
So let me summarize everything I've said so far in just one slide of cat memes. Three years ago, people were saying it's inevitable, superintelligence, it'll be fine, it's decades away. Last year it was more like, It's inevitable, it'll be fine. Now it's more like, It's inevitable.
(Laughter)
But let's take a deep breath and try to raise our spirits and cheer ourselves up, because the rest of my talk is going to be about the good news, that it's not inevitable, and we can absolutely do better, alright?
(Applause)
So ... The real problem is that we lack a convincing plan for AI safety. People are working hard on evals looking for risky AI behavior, and that's good, but clearly not good enough. They're basically training AI to not say bad things rather than not do bad things. Moreover, evals and debugging are really just necessary, not sufficient, conditions for safety. In other words, they can prove the presence of risk, not the absence of risk. So let's up our game, alright? Try to see how we can make provably safe AI that we can control.
Guardrails try to physically limit harm. But if your adversary is superintelligence or a human using superintelligence against you, right, trying is just not enough. You need to succeed. Harm needs to be impossible. So we need provably safe systems. Provable, not in the weak sense of convincing some judge, but in the strong sense of there being something that's impossible according to the laws of physics. Because no matter how smart an AI is, it can't violate the laws of physics and do what's provably impossible. Steve Omohundro and I wrote a paper about this, and we're optimistic that this vision can really work. So let me tell you a little bit about how.
There's a venerable field called formal verification, which proves stuff about code. And I'm optimistic that AI will revolutionize the automatic proving business and also revolutionize program synthesis, the ability to automatically write really good code. So here is how our vision works. You, the human, write a specification that your AI tool must obey: for example, that it's impossible to log in to your laptop without the correct password, or that a DNA printer cannot synthesize dangerous viruses. Then a very powerful AI creates both your AI tool and a proof that your tool meets your spec. Machine learning is uniquely good at learning algorithms, but once the algorithm has been learned, you can re-implement it in a different computational architecture that's easier to verify.
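To make that concrete, here is a minimal Python sketch of the spec-then-verify idea, with toy stand-ins of my own: a one-line spec for the password example, a candidate tool, and an exhaustive check over test inputs standing in for the formal proof a real system would produce.

# Illustrative sketch only: the spec, the tool and the brute-force check
# are toy stand-ins for a formal specification, a synthesized program
# and a machine-checked proof.

CORRECT_PASSWORD = "hunter2"  # hypothetical example value

def spec(password: str, granted: bool) -> bool:
    # Specification: access is granted exactly when the password is correct.
    return granted == (password == CORRECT_PASSWORD)

def candidate_tool(password: str) -> bool:
    # The tool that a powerful AI would synthesize for us.
    return password == CORRECT_PASSWORD

def check_tool_against_spec(test_passwords) -> bool:
    # Stand-in for proof checking: confirm the spec holds on every test input.
    return all(spec(p, candidate_tool(p)) for p in test_passwords)

print(check_tool_against_spec(["", "guess", "hunter2", "HUNTER2"]))  # True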
Now you might worry, how on earth am I going to understand this powerful AI and the powerful AI tool it built and the proof, if they're all too complicated for any human to grasp? Here is the really great news. You don't have to understand any of that stuff, because it's much easier to verify a proof than to discover it. So you only have to understand or trust your proof-checking code, which could be just a few hundred lines long. And Steve and I envision that such proof checkers get built into all our compute hardware, so it just becomes impossible to run very unsafe code.
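The asymmetry between discovering and checking is easy to see in a toy Python example of my own, not from our paper: finding a factor of a large number takes a long search, but verifying a claimed factorization is a single multiplication, just as checking a proof is far cheaper than discovering it.

# Discovery is expensive: trial division searches for a factor.
def find_factor(n: int) -> int:
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

# Verification is cheap: one multiplication checks the certificate.
def check_certificate(n: int, p: int, q: int) -> bool:
    return 1 < p < n and 1 < q < n and p * q == n

n = 1_000_003 * 999_983      # a big composite number
p = find_factor(n)           # slow: about a million division tests
print(check_certificate(n, p, n // p))  # fast, and prints True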
What if the AI, though, isn't able to write that AI tool for you? Then there's another possibility. You train an AI to first just learn to do what you want and then you use a different AI to extract out the learned algorithm and knowledge for you, like an AI neuroscientist. This is in the spirit of the field of mechanistic interpretability, which is making really impressive rapid progress. Provably safe systems are clearly not impossible.
Let's look at a simple example of where we first machine-learn an algorithm from data and then distill it out in the form of code that provably meets spec, OK? Let’s do it with an algorithm that you probably learned in first grade, addition, where you loop over the digits from right to left, and sometimes you do a carry. We'll do it in binary, as if you were counting on two fingers instead of ten. And we first train a recurrent neural network, never mind the details, to nail the task. So now you have this algorithm that you don't understand how it works in a black box defined by a bunch of tables of numbers that we, in nerd speak, call parameters. Then we use an AI tool we built to automatically distill out from this the learned algorithm in the form of a Python program. And then we use the formal verification tool known as Dafny to prove that this program correctly adds up any numbers, not just the numbers that were in your training data.
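Here is the kind of Python program that distillation step produces, reconstructed as an illustrative sketch rather than the actual distilled code or the Dafny proof: ripple-carry addition over binary digits, right to left, with a carry.

# Illustrative reconstruction: add two binary strings digit by digit,
# right to left, carrying when needed.
def add_binary(a: str, b: str) -> str:
    i, j, carry = len(a) - 1, len(b) - 1, 0
    digits = []
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += int(a[i])
            i -= 1
        if j >= 0:
            total += int(b[j])
            j -= 1
        digits.append(str(total % 2))
        carry = total // 2
    return "".join(reversed(digits)) or "0"

# Spot-check against Python's own integer addition on inputs that need
# not have appeared in any training data (Dafny proves it for all inputs).
assert add_binary("1011", "110") == bin(0b1011 + 0b110)[2:]
print(add_binary("1011", "110"))  # 10001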
So in summary, provably safe AI, I'm convinced is possible, but it's going to take time and work. And in the meantime, let's remember that all the AI benefits that most people are excited about actually don't require superintelligence. We can have a long and amazing future with AI.
So let's not pause AI. Let's just pause the reckless race to superintelligence. Let's stop obsessively training ever-larger models that we don't understand. Let's heed the warning from ancient Greece and not get hubris, like in the story of Icarus. Because artificial intelligence is giving us incredible intellectual wings with which we can do things beyond our wildest dreams if we stop obsessively trying to fly to the sun.
Thank you.
(Applause)