Since 2001, I have been working on what we would now call the problem of aligning artificial general intelligence: how to shape the preferences and behavior of a powerful artificial mind such that it does not kill everyone.
I more or less founded the field two decades ago, when nobody else considered it rewarding enough to work on. I tried to get this very important project started early so we'd be in less of a drastic rush later. I consider myself to have failed.
(Laughter)
Nobody understands how modern AI systems do what they do. They are giant, inscrutable matrices of floating point numbers that we nudge in the direction of better performance until they inexplicably start working. At some point, the companies rushing headlong to scale AI will cough out something that's smarter than humanity. Nobody knows how to calculate when that will happen. My wild guess is that it will happen after zero to two more breakthroughs the size of transformers.
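A minimal illustrative sketch of that "nudging," assuming only a toy two-layer network and random data (nothing here comes from any real system): gradient descent adjusts every entry of two weight matrices until a loss falls, and no step in the loop explains why the resulting weights work.

```python
import numpy as np

# Toy sketch of "nudging matrices toward better performance".
# All data and sizes are arbitrary assumptions for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))        # toy inputs
y = rng.normal(size=(256, 1))         # toy targets
W1 = rng.normal(size=(16, 64)) * 0.1  # "inscrutable matrices of
W2 = rng.normal(size=(64, 1)) * 0.1   #  floating point numbers"
lr = 1e-2

for step in range(1000):
    h = np.maximum(X @ W1, 0.0)       # forward pass (ReLU)
    pred = h @ W2
    err = pred - y
    loss = float((err ** 2).mean())
    # Backpropagate and nudge every weight slightly in whatever
    # direction reduces the loss.
    gW2 = h.T @ (2 * err / len(X))
    gh = (2 * err / len(X)) @ W2.T
    gW1 = X.T @ (gh * (h > 0))
    W1 -= lr * gW1
    W2 -= lr * gW2

# The loss falls and "performance" improves, but no line above
# tells us what any individual weight now means.
print(loss)
```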
What happens if we build something smarter than us that we understand that poorly? Some people find it obvious that building something smarter than us that we don't understand might go badly. Others come in with a very wide range of hopeful thoughts about how it might possibly go well.
Even if I had 20 minutes for this talk and months to prepare it, I would not be able to refute all the ways people find to imagine that things might go well. But I will say that there is no standard scientific consensus for how things will go well. There is no hope that has been widely persuasive and stood up to skeptical examination. There is nothing resembling a real engineering plan for us surviving that I could critique. This is not a good place in which to find ourselves.
If I had more time, I'd try to tell you about the predictable reasons why the current paradigm will not work to build a superintelligence that likes you or is friends with you, or that just follows orders. Why, if you press "thumbs up" when humans think that things went right or "thumbs down" when another AI system thinks that they went wrong, you do not get a mind that wants nice things in a way that generalizes well outside the training distribution to where the AI is smarter than the trainers. You can search for "Yudkowsky list of lethalities" for more.
(Laughter)
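A hedged toy sketch of that "thumbs up" / "thumbs down" shaping (a stand-in, not any lab's actual training stack): a one-parameter policy is nudged toward whatever a rater proxy approves on the training distribution, and nothing in the loop constrains what the learned behavior does on inputs far outside that distribution.

```python
import numpy as np

# Toy illustration of training on approval signals. The rater,
# the policy, and the update rule are all simplifying assumptions.
rng = np.random.default_rng(1)

def rater_approves(x):
    # Stand-in for a human "thumbs up": approves small positive
    # outputs, and is only ever queried on training-range inputs.
    return 1.0 if 0.0 < x < 1.0 else -1.0

theta = 0.0                             # one-parameter "policy"
for step in range(5000):
    ctx = rng.uniform(0.0, 1.0)         # training distribution
    action = theta * ctx + rng.normal(scale=0.1)
    reward = rater_approves(action)
    # REINFORCE-style nudge: move theta toward actions that drew
    # a thumbs up, away from actions that drew a thumbs down.
    theta += 0.01 * reward * (action - theta * ctx) * ctx

# In-distribution the behavior looks fine to the rater...
print("in-dist action:", theta * 0.5)
# ...but on a context 100x larger than anything ever rated, the
# same policy extrapolates to outputs no rater ever scored.
print("out-of-dist action:", theta * 50.0)
```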
But to worry, you do not need to believe me about exact predictions of exact disasters. You just need to expect that things are not going to work great on the first really serious, really critical try, because an AI system smart enough to be truly dangerous is meaningfully different from AI systems stupider than that. My prediction is that this ends up with us facing down something smarter than us that does not want what we want, that does not want anything we recognize as valuable or meaningful.
I cannot predict exactly how a conflict between humanity and a smarter AI would go for the same reason I can't predict exactly how you would lose a chess game to one of the current top AI chess programs, let's say Stockfish. If I could predict exactly where Stockfish could move, I could play chess that well myself. I can't predict exactly how you'll lose to Stockfish, but I can predict who wins the game. I do not expect something actually smart to attack us with marching robot armies with glowing red eyes where there could be a fun movie about us fighting them. I expect an actually smarter and uncaring entity will figure out strategies and technologies that can kill us quickly and reliably and then kill us.
I am not saying that the problem of aligning superintelligence is unsolvable in principle. I expect we could figure it out with unlimited time and unlimited retries, which the usual process of science assumes that we have. The problem here is the part where we don't get to say, “Ha ha, whoops, that sure didn’t work. That clever idea that used to work on earlier systems sure broke down when the AI got smarter, smarter than us.” We do not get to learn from our mistakes and try again because everyone is already dead.
It is a large ask to get an unprecedented scientific and engineering challenge correct on the first critical try. Humanity is not approaching this issue with remotely the level of seriousness that would be required. Some of the people leading these efforts have spent the last decade not denying that creating a superintelligence might kill everyone, but joking about it.
We are very far behind. This is not a gap we can overcome in six months, given a six-month moratorium. If we actually try to do this in real life, we are all going to die.
People say to me at this point, what's your ask? I do not have any realistic plan, which is why I spent the last two decades trying and failing to end up anywhere but here. My best bad take is that we need an international coalition banning large AI training runs, including extreme and extraordinary measures to have that ban be actually and universally effective, like tracking all GPU sales, monitoring all the data centers, being willing to risk a shooting conflict between nations in order to destroy an unmonitored data center in a non-signatory country.
I say this, not expecting that to actually happen. I say this expecting that we all just die. But it is not my place to just decide on my own that humanity will choose to die, to the point of not bothering to warn anyone. I have heard that people outside the tech industry are getting this point faster than people inside it. Maybe humanity wakes up one morning and decides to live.
Thank you for coming to my brief TED talk.
(Laughter)
(Applause and cheers)
Chris Anderson: So, Eliezer, thank you for coming and giving that. It seems like what you're raising the alarm about is that like, for this to happen, for an AI to basically destroy humanity, it has to break out, escape controls of the internet and, you know, start commanding actual real-world resources. You say you can't predict how that will happen, but just paint one or two possibilities.
Eliezer Yudkowsky: OK, so why is this hard? First, because you can't predict exactly where a smarter chess program will move. Maybe even more importantly than that, imagine sending the design for an air conditioner back to the 11th century. Even if they -- if it’s enough detail for them to build it, they will be surprised when cold air comes out because the air conditioner will use the temperature-pressure relation and they don't know about that law of nature. So if you want me to sketch what a superintelligence might do, I can go deeper and deeper into places where we think there are predictable technological advancements that we haven't figured out yet. And as I go deeper, it will get harder and harder to follow.
It could be super persuasive. That's relatively easy to understand. We do not understand exactly how the brain works, so it's a great place to exploit laws of nature that we do not know about. It can exploit rules of the environment, and invent new technologies beyond that. Can you build a synthetic virus that gives humans a cold and then a bit of neurological change and they're easier to persuade? Can you build your own synthetic biology, synthetic cyborgs? Can you blow straight past that to covalently bonded equivalents of biology, where instead of proteins that fold up and are held together by static cling, you've got things that go down much sharper potential energy gradients and are bonded together? People have done advanced design work about this sort of thing for artificial red blood cells that could hold 100 times as much oxygen, if they were using tiny sapphire vessels to store the oxygen. There's lots and lots of room above biology, but it gets harder and harder to understand.
CA: So what I hear you saying is that these terrifying possibilities are there, but your real guess is that AIs will work out something more devious than that. Is that really a likely pathway in your mind?
EY: Which part? That they're smarter than I am? Absolutely.
CA: Not that they're smarter, but why would they want to go in that direction? Like, AIs don't have our feelings of sort of envy and jealousy and anger and so forth. So why might they go in that direction?
EY: Because it's convergently implied by almost any of the strange, inscrutable things that they might end up wanting as a result of gradient descent on these "thumbs up" and "thumbs down" things internally. If all you want is to make tiny little molecular squiggles or that's like, one component of what you want, but it's a component that never saturates, you just want more and more of it, the same way that we would want more and more galaxies filled with life and people living happily ever after. Anything that just keeps going, you just want to use more and more material for that, that could kill everyone on Earth as a side effect. It could kill us because it doesn't want us making other superintelligences to compete with it. It could kill us because it's using up all the chemical energy on earth and we contain some chemical potential energy.
CA: So some people in the AI world worry that your views are strong enough and they would say extreme enough that you're willing to advocate extreme responses to it. And therefore, they worry that you could be, you know, in one sense, a very destructive figure. Do you draw the line yourself in terms of the measures that we should take to stop this happening? Or is actually anything justifiable to stop the scenarios you're talking about happening?
EY: I don't think that "anything" works. I think that this takes state actors and international agreements, and all international agreements, by their nature, tend to ultimately be backed by force on the signatory countries and on the non-signatory countries, which is a more extreme measure. I have not proposed that individuals run out and use violence, and I think that the killer argument for that is that it would not work.
CA: Well, you are definitely not the only person to propose that what we need is some kind of international reckoning here on how to manage this going forward.
Thank you so much for coming here to TED, Eliezer.
(Applause)