Five years ago, I stood on the TED stage and warned about the dangers of superintelligence. I was wrong. It went even worse than I thought.
(Laughter)
I never thought governments would let AI companies get this far without any meaningful regulation. And the progress of AI went even faster than I predicted. Look, I showed this abstract landscape of tasks where the elevation represented how hard it was for AI to do each task at human level. And the sea level represented what AI could do back then. And boy oh boy, has the sea been rising fast ever since. But a lot of these tasks have already gone blub blub blub blub blub blub. And the water is on track to submerge all land, matching human intelligence at all cognitive tasks.
This is a definition of artificial general intelligence, AGI, which is the stated goal of companies like OpenAI, Google DeepMind and Anthropic. And these companies are also trying to build superintelligence, leaving human intelligence far behind. And many think it'll only be a few years, maybe, from AGI to superintelligence.
So when are we going to get AGI? Well, until recently, most AI researchers thought it was at least decades away. And now Microsoft is saying, "Oh, it's almost here," with its researchers reporting "sparks of AGI" in GPT-4. And the forecasting site Metaculus has seen its predicted time to AGI plummet from 20 years away to three years away in the last 18 months. And leading industry figures are now predicting that we have maybe two or three years left until we get outsmarted. So you better stop talking about AGI as a long-term risk, or someone might call you a dinosaur stuck in the past.
It's really remarkable how AI has progressed recently. Not long ago, robots moved like this.
(Music)
Now they can dance.
(Music)
Just last year, Midjourney produced this image. This year, the exact same prompt produces this. Deepfakes are getting really convincing.
(Video) Deepfake Tom Cruise: I’m going to show you some magic.
It's the real thing.
(Laughs)
I mean ... It's all ... the real ... thing.
Max Tegmark: Or is it?
And Yoshua Bengio now argues that large language models have mastered language and knowledge to the point that they pass the Turing test. I know some skeptics are saying, "Nah, they're just overhyped stochastic parrots that lack a model of the world," but they clearly have a representation of the world. In fact, we recently found that Llama-2 even has a literal map of the world in it. And AI also builds geometric representations of more abstract concepts like what it thinks is true and false.
So what's going to happen if we get AGI and superintelligence? If you only remember one thing from my talk, let it be this. AI godfather Alan Turing predicted that the default outcome is the machines take control. The machines take control. I know this sounds like science fiction, but, you know, having AI as smart as GPT-4 also sounded like science fiction not long ago. And if you think of AI, if you think of superintelligence in particular, as just another technology, like electricity, you're probably not very worried. But you see, Turing thinks of superintelligence more like a new species. Think of it, we are building creepy, super capable, amoral psychopaths that don't sleep and think much faster than us, can make copies of themselves and have nothing human about them at all. So what could possibly go wrong?
(Laughter)
And it's not just Turing. OpenAI CEO Sam Altman, who gave us ChatGPT, recently warned that it could be "lights out for all of us." Anthropic CEO Dario Amodei even put a number on this risk: 10 to 25 percent. And it's not just them. Human extinction from AI went mainstream in May, when all the AGI CEOs and the who's who of AI researchers came out and warned about it. And last month, even the number one of the European Union warned about human extinction by AI.
So let me summarize everything I've said so far in just one slide of cat memes. Three years ago, people were saying, "Superintelligence? It's inevitable, it'll be fine, it's decades away." Last year it was more like, "It's inevitable, it'll be fine." Now it's more like, "It's inevitable."
(Laughter)
But let's take a deep breath and try to raise our spirits and cheer ourselves up, because the rest of my talk is going to be about the good news, that it's not inevitable, and we can absolutely do better, alright?
(Applause)
So ... The real problem is that we lack a convincing plan for AI safety. People are working hard on evals looking for risky AI behavior, and that's good, but clearly not good enough. They're basically training AI to not say bad things rather than not do bad things. Moreover, evals and debugging are really just necessary, not sufficient, conditions for safety. In other words, they can prove the presence of risk, not the absence of risk. So let's up our game, alright? Try to see how we can make provably safe AI that we can control.
Guardrails try to physically limit harm. But if your adversary is superintelligence or a human using superintelligence against you, right, trying is just not enough. You need to succeed. Harm needs to be impossible. So we need provably safe systems. Provable, not in the weak sense of convincing some judge, but in the strong sense of there being something that's impossible according to the laws of physics. Because no matter how smart an AI is, it can't violate the laws of physics and do what's provably impossible. Steve Omohundro and I wrote a paper about this, and we're optimistic that this vision can really work. So let me tell you a little bit about how.
There's a venerable field called formal verification, which proves stuff about code. And I'm optimistic that AI will revolutionize the automatic proving business and also revolutionize program synthesis, the ability to automatically write really good code. So here is how our vision works. You, the human, write a specification that your AI tool must obey: that it's impossible to log in to your laptop without the correct password, or that a DNA printer cannot synthesize dangerous viruses. Then a very powerful AI creates both your AI tool and a proof that your tool meets your spec. Machine learning is uniquely good at learning algorithms, but once the algorithm has been learned, you can re-implement it in a different computational architecture that's easier to verify.
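To make the idea of a machine-checkable specification concrete, here is a minimal Python sketch of the laptop-login spec just mentioned. Everything in it — the password, the `login` function, the candidate list — is a made-up illustration: the spec becomes an executable predicate, and a finite test stands in for the formal proof a real provably-safe system would require.

```python
import hashlib

# Hypothetical stored credential: the hash of the one correct password.
STORED_HASH = hashlib.sha256(b"correct horse").hexdigest()

def login(password: str) -> bool:
    """The 'AI tool': grants access iff the password hashes to the stored hash."""
    return hashlib.sha256(password.encode()).hexdigest() == STORED_HASH

def meets_spec(candidates) -> bool:
    """Spec: no input other than the true password may ever log in.
    A real system would prove this for ALL strings; here we can only
    check it over a finite list of candidates."""
    return all(login(p) == (p == "correct horse") for p in candidates)

print(meets_spec(["", "hunter2", "correct horse", "CORRECT HORSE"]))  # True
```

The point of the sketch is the division of labor: the human writes only `meets_spec`; synthesizing a `login` that satisfies it, plus a proof, is the powerful AI's job.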
Now you might worry, how on earth am I going to understand this powerful AI and the powerful AI tool it built and the proof, if they're all too complicated for any human to grasp? Here is the really great news. You don't have to understand any of that stuff, because it's much easier to verify a proof than to discover it. So you only have to understand or trust your proof-checking code, which could be just a few hundred lines long. And Steve and I envision that such proof checkers get built into all our compute hardware, so it just becomes impossible to run very unsafe code.
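The claim that checking a proof is far easier than discovering it has a familiar analogue in arithmetic: verifying a claimed factorization takes one multiplication, while finding the factors takes a search. This is only an analogy for that asymmetry, not the envisioned proof-checking code itself:

```python
def verify_factorization(n: int, p: int, q: int) -> bool:
    """Checking a claimed factorization is a single multiplication."""
    return 1 < p and 1 < q and p * q == n

def find_factorization(n: int):
    """Discovering the factors is trial division: far more work."""
    for p in range(2, int(n ** 0.5) + 1):
        if n % p == 0:
            return p, n // p
    return None  # n is prime

n = 62615533              # = 7907 * 7919
p, q = find_factorization(n)
print(verify_factorization(n, p, q))  # True: cheap to check, slow to find
```

In the same spirit, the trusted proof checker only has to do the cheap "multiplication" side of this bargain, which is why a few hundred lines can suffice.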
What if the AI, though, isn't able to write that AI tool for you? Then there's another possibility. You train an AI to first just learn to do what you want and then you use a different AI to extract out the learned algorithm and knowledge for you, like an AI neuroscientist. This is in the spirit of the field of mechanistic interpretability, which is making really impressive rapid progress. Provably safe systems are clearly not impossible.
Let's look at a simple example of where we first machine-learn an algorithm from data and then distill it out in the form of code that provably meets spec, OK? Let’s do it with an algorithm that you probably learned in first grade, addition, where you loop over the digits from right to left, and sometimes you do a carry. We'll do it in binary, as if you were counting on two fingers instead of ten. And we first train a recurrent neural network, never mind the details, to nail the task. So now you have this algorithm that you don't understand how it works in a black box defined by a bunch of tables of numbers that we, in nerd speak, call parameters. Then we use an AI tool we built to automatically distill out from this the learned algorithm in the form of a Python program. And then we use the formal verification tool known as Dafny to prove that this program correctly adds up any numbers, not just the numbers that were in your training data.
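The distilled program described above might look something like the following Python sketch. The talk doesn't show the actual distilled code or its Dafny proof, so this is an assumed reconstruction of the grade-school algorithm, with an exhaustive check over small inputs standing in for the formal proof:

```python
def add_binary(a: list, b: list) -> list:
    """Add two binary numbers given as bit lists, least-significant bit first,
    looping over the digits with a carry -- the algorithm distilled out of
    the trained recurrent network."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        bit_a = a[i] if i < len(a) else 0
        bit_b = b[i] if i < len(b) else 0
        total = bit_a + bit_b + carry
        out.append(total % 2)   # this position's bit
        carry = total // 2      # carry into the next position
    if carry:
        out.append(carry)
    return out

def to_int(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def to_bits(n):
    return [int(c) for c in bin(n)[2:][::-1]]

# Stand-in for Dafny's proof: exhaustively check a finite range.
# (A real proof covers ALL numbers, not just these.)
ok = all(to_int(add_binary(to_bits(x), to_bits(y))) == x + y
         for x in range(64) for y in range(64))
print(ok)  # True
```

The exhaustive loop is exactly what the formal proof replaces: Dafny would establish the spec for every input, not just the 64 by 64 grid tested here.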
So in summary, provably safe AI, I'm convinced is possible, but it's going to take time and work. And in the meantime, let's remember that all the AI benefits that most people are excited about actually don't require superintelligence. We can have a long and amazing future with AI.
So let's not pause AI. Let's just pause the reckless race to superintelligence. Let's stop obsessively training ever-larger models that we don't understand. Let's heed the warning from ancient Greece and not get hubris, like in the story of Icarus. Because artificial intelligence is giving us incredible intellectual wings with which we can do things beyond our wildest dreams if we stop obsessively trying to fly to the sun.
Thank you.
(Applause)