This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --
(Laughter)
a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?
Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that, along with the ability to look further ahead than humans can, as we've already seen in Go, means that if machines also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.
Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?
Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?
So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes.
(Laughter)
So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.
So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?
So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."
Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.
(Laughter)
It's very simple. Just remember that. Repeat it to yourself three times a day.
(Laughter)
And in fact, this is exactly the plot of "2001: A Space Odyssey." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. So what are we going to do?
I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.
The second principle is one of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are: it has to maximize them, but it doesn't know what they are. And that avoids the problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.
Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles; taken together, they fit as in the sketch below.
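Read as an agent design, the three principles form a simple loop: hold a belief over candidate human objectives, choose actions that maximize their expected value for humans, and update the belief from observed human choices. What follows is a minimal sketch in Python; the class name, objective labels, and likelihood numbers are invented for illustration, not anything specified in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class HumanCompatibleAgent:
    # Principle 2 (humility): the agent is uncertain about what humans
    # value, represented here as weights over candidate objectives.
    belief: dict = field(default_factory=lambda: {"objective_A": 0.5,
                                                  "objective_B": 0.5})

    def choose_action(self, actions, value_under):
        # Principle 1 (altruism): maximize expected *human* value only;
        # the agent has no objective of its own, self-preservation included.
        return max(actions,
                   key=lambda a: sum(p * value_under[obj](a)
                                     for obj, p in self.belief.items()))

    def observe_human_choice(self, likelihood):
        # Principle 3: human choices are evidence about human objectives.
        # likelihood[obj] = probability of the observed choice under obj.
        posterior = {obj: p * likelihood[obj] for obj, p in self.belief.items()}
        z = sum(posterior.values())
        self.belief = {obj: p / z for obj, p in posterior.items()}

# Example: a choice that is far likelier under objective_A shifts the belief.
agent = HumanCompatibleAgent()
agent.observe_human_choice({"objective_A": 0.9, "objective_B": 0.1})
print(agent.belief)  # {'objective_A': 0.9, 'objective_B': 0.1}
```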
Let's see how that applies to the question Turing raised: "Can you switch the machine off?" So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective "Fetch the coffee." It reasons, "I must fetch the coffee; I can't fetch the coffee if I'm dead." So obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me."
(Laughter)
So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.
So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.
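The incentive Russell says you can calculate can be made concrete with a small Monte Carlo estimate, in the spirit of his off-switch analysis. This is a minimal sketch assuming the robot holds a Gaussian belief over the hidden human utility U of its action, and that a rational human flips the switch exactly when U < 0; the distribution and its parameters are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# The robot's belief about the hidden human utility U of its intended
# action ("fetch the coffee"): a Gaussian is an illustrative assumption.
samples = rng.normal(loc=0.5, scale=2.0, size=100_000)

# Acting unilaterally earns E[U]; switching itself off earns 0.
act_now = samples.mean()
best_unilateral = max(act_now, 0.0)

# Deferring to a rational human, who flips the switch exactly when U < 0,
# earns max(U, 0) in expectation.
defer = np.maximum(samples, 0.0).mean()

# The gap is the robot's incentive to leave its off switch enabled.
print(f"E[U] acting now:        {act_now:+.3f}")
print(f"E[max(U, 0)] deferring: {defer:+.3f}")
print(f"incentive to defer:     {defer - best_unilateral:+.3f}")
```

Shrinking the spread of the belief (say, scale=0.01) drives the printed incentive toward zero, which is exactly the tie to uncertainty described above.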
And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.
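The inequality behind "provably better off" can be stated compactly; the notation here is an assumption of this sketch, not the talk's own.

```latex
% U is the hidden human utility of the robot's proposed action; a rational
% human presses the off switch exactly when U < 0.
\[
  \mathbb{E}\bigl[\max(U,\,0)\bigr] \;\ge\; \max\bigl(\mathbb{E}[U],\,0\bigr)
\]
% Left side: expected value of deferring to the human and the off switch.
% Right side: the best the robot can do acting (or shutting down) unilaterally.
% Equality holds only when the robot is already certain of the sign of U,
% so the gap -- the incentive to keep the switch enabled -- grows with the
% robot's uncertainty about the objective.
```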
Now, this third principle is, I think, the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict, for any person and for any possible life that they could live, and for the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.
As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand your nastiness: for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.
We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.
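One standard way to "invert through" such a model is to assume noisy rationality: the human picks actions with probability proportional to exp(beta * Q), where beta stands in for computational limits, and Bayes' rule then turns observed moves into evidence about goals. The sketch below illustrates that assumption; the goals, Q-values, prior, and beta are invented for the example.

```python
import numpy as np

def boltzmann_policy(q_values, beta):
    # Noisy-rational choice: beta -> infinity is a perfect player,
    # beta -> 0 is a random one (our stand-in for computational limits).
    logits = beta * np.asarray(q_values, dtype=float)
    logits -= logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Q-values of three candidate moves under two hypothetical goals.
q_under_goal = {
    "wants_to_win":  [1.0, 0.2, -0.5],
    "wants_to_lose": [-0.3, 0.9, 0.4],
}
prior = {"wants_to_win": 0.95, "wants_to_lose": 0.05}  # champions want to win
observed_move = 1  # a "mistake" if the player wants to win

# Bayesian inversion: a suboptimal move is weak, not conclusive, evidence.
beta = 2.0
posterior = {g: prior[g] * boltzmann_policy(q_under_goal[g], beta)[observed_move]
             for g in prior}
z = sum(posterior.values())
for g, p in posterior.items():
    print(f"P({g} | move) = {p / z:.3f}")
```

With these numbers the posterior still favors "wants_to_win" (about 0.82), which is the point about Lee Sedol: a computationally bounded player can prefer to win and still lose.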
Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.
Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"
"Uh, your 20th anniversary at 7pm."
"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"
"Well, I did warn you, but you overrode my recommendation."
"Well, what am I going to do? I can't just tell him I'm too busy."
"Don't worry. I arranged for his plane to be delayed."
(Laughter)
"Some kind of computer malfunction."
(Laughter)
"Really? You can do that?"
"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."
(Laughter)
So the values here -- there's a slight mistake going on. This is clearly following my wife's values: "Happy wife, happy life."
(Laughter)
It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"
"Yes, I didn't even have time for lunch."
"You must be very hungry."
"Starving, yeah. Could you make some dinner?"
"There's something I need to tell you."
(Laughter)
"There are humans in South Sudan who are in more urgent need than you."
(Laughter)
"So I'm leaving. Make your own dinner."
(Laughter)
So we have to solve these problems, and I'm looking forward to working on them.
There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.
There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat.
(Laughter)
And the robot hasn't quite learned the human value function properly, so it doesn't understand that the sentimental value of the cat outweighs the nutritional value of the cat.
(Laughter)
So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.
So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.
(Applause)
Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.
A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot from reading literature, discovering this idea that knowledge is actually better than ignorance, and then shifting its own goals and rewriting that programming?
Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.
CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."
SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing it is to be switched off. If the person is completely random or even malicious, then it's less willing to be switched off.
CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.
SR: Thank you.
(Applause)