Stuart Russell: 3 principles for creating safer AI

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

これは李世ドルです。李世ドルは世界で最も強い碁打ちの１人ですが、シリコンバレーの友人たちなら「なんてこった」と言う瞬間を迎えています

(Laughter)

(笑)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

我々が予想していたよりもずっと早く AIが進歩していることに気付いた瞬間です。人間は碁盤上で機械に負けましたが、実際の世の中ではどうでしょう？

Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that, along with the ability to look further ahead than humans can, as we've already seen in Go, means that if machines also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.

実際の世界は碁盤よりもずっと大きく、ずっと複雑で、ずっと見通し難いですが、決定問題であることに違いはありません。到来しつつあるテクノロジーのことを考えるなら……機械は本当に理解して文を読めるようにはまだなっていないことに新井紀子氏が触れていましたが、それもやがてできるようになるでしょう。そしてそうなったとき、機械は人類がかつて書いたすべてのものを速やかに読破することでしょう。そうなると機械は、碁において見せた人間より遠くまで見通す力と合わせ、より多くの情報に触れられるようになることで、実際の世の中でも人間より優れた判断ができるようになるでしょう。それは良いことなのでしょうか？そうだと望みたいです。

Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?

我々の文明そのもの、我々が価値を置くすべては、我々の知性を拠り所としています。はるかに多くの知性が使えるようになったなら、人類に可能なことに限界はないでしょう。ある人々が言っているように、これは人類史上最大の出来事になるかもしれません。ではなぜ「AIは人類の終焉を意味するかもしれない」などと言われているのでしょう？これは新しいことなのでしょうか？ただイーロン・マスクとビル・ゲイツとホーキングが言っているだけなのか？

Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?

違います。この考えは結構前からありました。ここにある人の言葉があります。「重大な瞬間にスイッチを切るといったことによって機械を従属的な位置に保てたとしても（この『スイッチを切る』ことについては後でまた戻ってきます）、種としての我々は謙虚に捉えるべきである」誰の言葉でしょう？アラン・チューリングが 1951年に言ったことです。ご存じのように、チューリングはコンピューター科学の父であり、いろいろな意味で AIの父でもあります。この問題を考えてみると、つまり自分の種よりも知的なものを生み出してしまうという問題ですが、これは「ゴリラの問題」と呼んでも良いかもしれません。なぜなら数百万年前にゴリラの祖先がそうしているからで、ゴリラたちに尋ねることができます。「いいアイデアだったと思う？」

So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes.

ゴリラたちがいいアイデアだったのか議論するために集まっていますが、しばらくして出した結論は「あれは酷いアイデアだった。おかげで我々の種はひどい苦境に置かれている」というものです。彼らの目に実存的な悲哀を見て取れるでしょう。

(Laughter)

(笑)

So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.

「自分の種より知的なものを生み出すのは良い考えではないのでは？」という不安な感覚があります。それについて何ができるのでしょう？AIの開発をやめてしまう以外ないかもしれませんが、AIのもたらす様々な利点や、私自身AI研究者であるという理由によって、私にはそういう選択肢はありません。実際AIは続けたいと思っています。

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

この問題をもう少し明確にする必要があるでしょう。正確に何が問題なのか？優れたAIが我々の破滅に繋がりうるのはなぜなのか？

So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."

ここにもう１つ引用があります。「機械に与える目的については、それが本当に望むものだと確信があるものにする必要がある」これはノーバート・ウィーナーが 1960年に言ったことで、最初期の学習システムが作り手よりもうまくチェッカーを指すのを見たすぐ後のことです。しかしこれはミダス王の言葉だったとしてもおかしくないでしょう。ミダス王は「自分の触れたものすべてが金になってほしい」と望み、そしてその望みが叶えられました。これはいわば彼が「機械に与えた目的」です。そして彼の食べ物や飲み物や親類はみんな金に変わってしまい、彼は悲嘆と飢えの中で死んでいきました。だから自分が本当に望むことと合わない目的を掲げることを「ミダス王の問題」と呼ぶことにしましょう。現代的な用語では、これを「価値整合の問題」と言います。
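
The King Midas problem can be made concrete with a toy optimizer. This is a hypothetical illustration, not anything from the talk: the `Outcome` fields, the penalty of 1000 for starving, and both objective functions are invented for the example. It only shows that perfectly maximizing a stated objective that leaves out something we care about can select the outcome we like least.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Outcome:
    name: str
    gold_touched: int     # the only quantity the stated objective counts
    food_available: bool  # something the person also cares about, but never said


def stated_objective(o: Outcome) -> float:
    # "I want everything I touch to turn to gold": only gold counts.
    return float(o.gold_touched)


def true_utility(o: Outcome) -> float:
    # Hypothetical true preferences: gold is nice, starving is catastrophic.
    return o.gold_touched - (0.0 if o.food_available else 1000.0)


OUTCOMES: List[Outcome] = [
    Outcome("touch only coins", gold_touched=10, food_available=True),
    Outcome("touch everything, including food", gold_touched=100, food_available=False),
]


def best(outcomes: List[Outcome], objective: Callable[[Outcome], float]) -> Outcome:
    # A perfect optimizer: returns whatever scores highest under `objective`.
    return max(outcomes, key=objective)


if __name__ == "__main__":
    print("optimizer of the stated objective picks:", best(OUTCOMES, stated_objective).name)
    print("what the person actually wanted:        ", best(OUTCOMES, true_utility).name)
```

The optimizer is not malfunctioning here; it does exactly what it was told, which is the point of the quotation from Wiener.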

Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.

間違った目的を与えてしまうというのが問題のすべてではありません。別の側面もあります。「コーヒーを取ってくる」というようなごく単純な目的を機械に与えたとします。機械は考えます。「コーヒーを取ってくるのに失敗するどんな状況がありうるだろう？誰かが自分のスイッチを切るかもしれない。そのようなことを防止する手を打たなければ。自分の『オフ』スイッチを無効にしておこう。与えられた目的の遂行を阻むものから自分を守るためであれば何だってやろう」人類の本当の目的に沿っていない目的を、非常に防御的な態勢で一途に追求すること、それが我々の直面する問題です。実際それがこの講演から学べる価値ある教訓です。もし１つだけ覚えておくとしたら、それは「死んだらコーヒーを取ってこれない」ということです。

(Laughter)

(笑)

It's very simple. Just remember that. Repeat it to yourself three times a day.

簡単でしょう。記憶して１日３回唱えてください。

(Laughter)

(笑)

And in fact, this is exactly the plot of "2001: [A Space Odyssey]." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. So what are we going to do?

実際、映画『2001年宇宙の旅』の筋はそういうものでした。HALの目的・ミッションは人間の目的とは合わず、そのため衝突が起きます。幸いHALは、非常に賢くはあっても超知的ではありませんでした。それで最終的には主人公が出し抜いてスイッチを切ることができました。でも私たちはそんなに幸運ではないかもしれません。ではどうしたらいいのでしょう？

I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

「知的に目的を追求する機械」という古典的な見方から離れて、AIの再定義を試みようと思います。３つの原則があります。第１は「利他性の原則」で、ロボットの唯一の目的は、人間の目的、人間にとって価値あることが最大限に実現されるようにすることです。ここで言う価値は、善人ぶった崇高そうな価値ではありません。単に何であれ人間が自分の生活に望むもの、ということです。この原則は「ロボットは自己を守らなければならない」というアシモフの原則に反します。自己の存在維持にはまったく関心を持たないのです。

The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

第２の原則は、言うなれば「謙虚の原則」です。これはロボットを安全なものにする上で非常に重要であることがわかります。この原則は、ロボットが人間の価値が何か知らないものとしています。ロボットはそれを最大化しなければならないが、それが何なのかは知らないということです。１つの目的を一途に追求することの問題を、これで避けることができます。この不確定性が極めて重要なのです。

Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.

人間にとって有用であるためには、我々が何を望むのかについて大まかな理解は必要です。ロボットはその情報を主として人間の選択を観察することで得ます。我々が自分の生活に望むのが何かという情報が、我々のする選択を通して明かされるわけです。以上が３つの原則です。これがチューリングの提起した「機械のスイッチを切れるか」という問題にどう適用できるか見てみましょう。
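
Learning what people prefer by watching their choices can be sketched as Bayesian inference. Everything here is a hypothetical illustration rather than the method used in Russell's lab: two made-up preference profiles, a softmax ("Boltzmann") choice model in which better options are chosen more often but not always, and an assumed rationality parameter `beta`.

```python
import math
from typing import Dict, List

# Hypothetical candidate preference profiles: utility of each option.
HYPOTHESES: Dict[str, Dict[str, float]] = {
    "prefers coffee": {"coffee": 1.0, "tea": 0.0},
    "prefers tea": {"coffee": 0.0, "tea": 1.0},
}


def choice_likelihood(utilities: Dict[str, float], choice: str, beta: float) -> float:
    # Boltzmann (softmax) choice model: higher-utility options are chosen
    # more often, but a person can still pick the worse one.
    weights = {option: math.exp(beta * u) for option, u in utilities.items()}
    return weights[choice] / sum(weights.values())


def update(prior: Dict[str, float], observed: List[str], beta: float = 2.0) -> Dict[str, float]:
    # Sequential Bayes' rule: multiply by each choice's likelihood, then renormalize.
    posterior = dict(prior)
    for choice in observed:
        for hypothesis, utilities in HYPOTHESES.items():
            posterior[hypothesis] *= choice_likelihood(utilities, choice, beta)
        total = sum(posterior.values())
        posterior = {h: p / total for h, p in posterior.items()}
    return posterior


if __name__ == "__main__":
    prior = {"prefers coffee": 0.5, "prefers tea": 0.5}
    print(update(prior, ["coffee", "coffee", "tea", "coffee"]))
```

With an even prior, the four observed choices shift most of the probability to "prefers coffee"; the single tea choice lowers confidence without ruling that hypothesis out, because the choice model allows for occasional inconsistency.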

So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me."

これは PR2 ロボットです。私たちの研究室にあるもので、背中に大きな赤い「オフ」スイッチがあります。問題は、ロボットがスイッチを切らせてくれるかということです。古典的なやり方をするなら、「コーヒーを取ってくる」という目的に対し、「コーヒーを取ってこなければならない」「死んだらコーヒーを取ってこれない」と考え、私の講演を聴いていたPR2は「オフ・スイッチは無効にしなければ」と判断し、「スターバックスで邪魔になる他の客はみんなテーザー銃で眠らせよう」となります。

(Laughter)

(笑)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

これは避けがたいように見えます。このような故障モードは不可避に見え、そしてそれは具体的で絶対的な目的があることから来ています。
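
The classical, fixed-objective reasoning can be written down as a tiny expected-value calculation. This is a hypothetical sketch: the robot's objective is 1 if the coffee is fetched and 0 otherwise, and `p_off`, the chance that someone presses the switch, is an assumed parameter.

```python
# Hypothetical model: the robot's ONLY objective is fetching the coffee.
P_SWITCHED_OFF = 0.1  # assumed chance that a human presses the off switch


def expected_objective(disable_switch: bool, p_off: float = P_SWITCHED_OFF) -> float:
    # Objective = 1 if the coffee is fetched, 0 otherwise.
    # With a working switch the robot is stopped with probability p_off,
    # and "you can't fetch the coffee if you're dead".
    p_fetch = 1.0 if disable_switch else 1.0 - p_off
    return p_fetch * 1.0


def choose(p_off: float = P_SWITCHED_OFF) -> str:
    options = {
        "leave off switch working": expected_objective(False, p_off),
        "disable off switch": expected_objective(True, p_off),
    }
    # max() keeps the first option listed when scores tie.
    return max(options, key=options.get)


if __name__ == "__main__":
    for p in (0.0, 0.001, 0.1):
        print(p, choose(p))
```

For any `p_off` greater than zero, disabling the switch scores strictly higher, which is the failure mode described above; only at exactly zero is there a tie, broken in favor of the first option listed.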

So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.

目的が何なのか機械に確信がないとしたらどうなるでしょう？違ったように推論するはずです。「人間は自分のスイッチを切るかもしれないが、それは自分が何か悪いことをしたときだけだ。悪いことが何かよく分からないけど、悪いことはしたくない」ここで第１および第２の原則が効いています。「だからスイッチを切るのを人間に許すべきだ」実際、ロボットが人間にスイッチを切ることを許すインセンティブを計算することができ、それは目的の不確かさの度合いと直接的に結びついています。
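
The incentive calculation can be sketched with a simplified model, loosely following the formulation Russell and colleagues call the off-switch game. The assumptions are ours: the robot's belief about the human's utility U for its proposed action is Gaussian with mean `mu` and standard deviation `sigma`, and the human is perfectly rational, pressing the switch exactly when U < 0. The robot can act now (worth E[U]), switch itself off (worth 0), or let the human decide (worth E[max(U, 0)]).

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def value_of_deferring(mu: float, sigma: float) -> float:
    """E[max(U, 0)] for U ~ Normal(mu, sigma^2).

    A perfectly rational human lets the action proceed when U > 0 and
    presses the off switch (payoff 0) when U < 0.
    """
    if sigma == 0.0:
        return max(mu, 0.0)  # no uncertainty: asking the human adds nothing
    z = mu / sigma
    return mu * normal_cdf(z) + sigma * normal_pdf(z)


def incentive_to_allow_switch_off(mu: float, sigma: float) -> float:
    # Compare with the best option that bypasses the human:
    # act now (expected value mu) or switch itself off (value 0).
    return value_of_deferring(mu, sigma) - max(mu, 0.0)


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.0, 2.0):
        print(f"sigma={sigma}: incentive={incentive_to_allow_switch_off(0.5, sigma):.4f}")
```

In this model the incentive is never negative, is zero when the robot is certain (`sigma = 0`), and grows as `sigma` grows, which is the link between uncertainty and willingness to be switched off.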

And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, we can, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.

機械のスイッチが切られると第３の原則が働いて、追求すべき目的について何かを学びます。自分の間違った行いから学ぶのです。数学者がよくやるようにギリシャ文字をうまく使って、そのようなロボットが人間にとって有益であるという定理を証明することができます。そのようにデザインされた機械がある方が、ない場合よりも良い結果になると証明可能なのです。これは単純な例ですが、人間互換のAIを手にするための第一歩です。
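
The core inequality behind the "better off" claim can be sketched in a few lines, under idealized assumptions of our own: a perfectly rational human who switches the robot off exactly when the utility U of its action is negative. This illustrates why deferring cannot hurt in that setting; it is not the full theorem the talk refers to.

```latex
\begin{aligned}
&\max(U,0) \ge U \quad\text{and}\quad \max(U,0) \ge 0 \quad \text{for every value of } U\\
&\Longrightarrow\quad \mathbb{E}[\max(U,0)] \ge \mathbb{E}[U] \quad\text{and}\quad \mathbb{E}[\max(U,0)] \ge 0\\
&\Longrightarrow\quad \underbrace{\mathbb{E}[\max(U,0)]}_{\text{let the human decide}} \;\ge\; \max\big(\underbrace{\mathbb{E}[U]}_{\text{act now}},\ \underbrace{0}_{\text{switch itself off}}\big)
\end{aligned}
```

The inequality is strict exactly when the robot is unsure of the sign of U, that is, when both U > 0 and U < 0 have positive probability; if it already knows the sign, the human's decision adds nothing.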

Now, this third principle, I think is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

３番目の原則については、皆さん困惑しているのではと思います。「自分の行動は見上げたものではない。ロボットに自分のように振る舞って欲しくはない。真夜中にこっそり台所に行って冷蔵庫から食べ物を失敬したり、あんなことやこんなことをしているから」ロボットにしてほしくない様々なことがあります。でも実際そういう風に働くわけではありません。自分がまずい振る舞いをしたらロボットがそれを真似するというわけではありません。人がそのようにする動機を理解して、場合によっては誘惑に抵抗する手助けさえしてくれるかもしれません。それでも難しいです。私たちがやろうとしているのは、あらゆる人について、その人が送りうるあらゆる人生と他のすべての人の人生を考えたとき、どれを好むのかを機械が予測できるようにすることです。これには難しいことがたくさんあって、ごく速やかに解決されるだろうとは思っていません。本当に難しい部分は私たちにあります。

As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand your nastiness: for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

言いましたように、私たちはまずい振る舞いをします。人によっては悪質でさえあります。しかしロボットは人間の振る舞いを真似する必要はありません。ロボットはそれ自身の目的というのを持ちません。純粋に利他的です。そして１人の人間の望みだけ満たそうとするのではなく、みんなの好みに敬意を払うようデザインされています。だから、ある程度悪いことも扱え、人間の悪い面も理解できます。例えば、入国審査官が賄賂を受け取っているけれど、それは家族を食べさせ子供を学校に行かせるためなのだとか。ロボットはそれを理解できますが、そのために盗みをするわけではありません。ただ子供が学校に行けるよう手助けをするだけです。

We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.

また人間は計算能力の点で限界があります。李世ドルは素晴らしい碁打ちですが、それでも負けました。彼の行動を見れば、勝負に負けることになる手を打ったのが分かるでしょう。しかしそれは彼が負けを望んだことを意味しません。彼の行動を理解するためには人の認知モデルを逆にたどる必要がありますが、それは計算能力の限界も含むとても複雑なモデルです。それでも私たちが理解すべく取り組めるものではあります。
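
The point about computational limits can be illustrated with a two-hypothesis sketch. The prior, the `error_rate`, and both likelihoods are hypothetical values chosen for illustration. If the model of the player assumes unlimited computation (`error_rate = 0`), a single losing move forces the conclusion that he wanted to lose; once the model allows for mistakes, the same observation leaves "wants to win" as by far the most likely explanation.

```python
from typing import Dict

# Hypothetical prior: professional players almost always play to win.
PRIOR: Dict[str, float] = {"wants to win": 0.99, "wants to lose": 0.01}


def p_losing_move(goal: str, error_rate: float) -> float:
    """Probability of playing a move that turns out to lose the game.

    `error_rate` is an assumed stand-in for computational limits: a player
    who wants to win still misses the refutation this often. With
    error_rate = 0 the player is modeled as a perfect, unlimited calculator.
    """
    if goal == "wants to win":
        return error_rate
    return 1.0 - error_rate  # hypothetical: a player trying to lose usually manages it


def posterior_after_losing_move(error_rate: float) -> Dict[str, float]:
    # Bayes' rule over the two goals, given one observed losing move.
    unnormalized = {g: PRIOR[g] * p_losing_move(g, error_rate) for g in PRIOR}
    total = sum(unnormalized.values())
    return {g: p / total for g, p in unnormalized.items()}


if __name__ == "__main__":
    for rate in (0.0, 0.2):
        print(rate, posterior_after_losing_move(rate))
```

A real model of human cognition would be far richer than a single error rate, but even this crude version shows why inverting behavior without accounting for limits gives absurd conclusions.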

Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.

AI研究者として見たとき最も難しいと思える部分は、私たち人間が沢山いるということです。だから機械はトレードオフを考え、沢山の異なる人間の好みを比較考量する必要があり、それにはいろいろなやり方があります。経済学者、社会学者、倫理学者はそういうことを分かっており、私たちは協同の道を探っています。
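
Two textbook ways of weighing up many people's preferences, a utilitarian sum and an egalitarian maximin rule, already disagree on a tiny example. The people, policies, and utility numbers below are hypothetical, and neither rule is presented as the one the talk endorses.

```python
from typing import Callable, Dict

# Hypothetical utilities of three policies for three people.
UTILITIES: Dict[str, Dict[str, float]] = {
    "policy A": {"alice": 9.0, "bob": 9.0, "carol": 0.0},
    "policy B": {"alice": 5.0, "bob": 5.0, "carol": 5.0},
    "policy C": {"alice": 6.0, "bob": 2.0, "carol": 4.0},
}


def utilitarian(option: str) -> float:
    # Total welfare: add up everyone's utility.
    return sum(UTILITIES[option].values())


def egalitarian(option: str) -> float:
    # Maximin: judge an option by how the worst-off person fares.
    return min(UTILITIES[option].values())


def choose(rule: Callable[[str], float]) -> str:
    return max(UTILITIES, key=rule)


if __name__ == "__main__":
    print("utilitarian choice:", choose(utilitarian))
    print("egalitarian choice:", choose(egalitarian))
```

The sum favors policy A even though it leaves Carol with nothing, while maximin favors policy B; choosing between such rules is the kind of question on which economists and moral philosophers have long worked.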

Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"

そこをうまくやらないとどうなるか見てみましょう。たとえばこんな会話を考えてみます。知的な秘書AIが数年内に利用可能になるかもしれません。強化されたSiriのようなものです。Siriが「今晩のディナーについて奥様から確認の電話がありました」と言います。あなたはもちろん忘れています。「何のディナーだって？何の話をしているんだ？」

"Uh, your 20th anniversary at 7pm."

「20周年のディナーですよ。夜７時の」

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

「無理だよ。７時半に事務総長と会わなきゃならない。どうしてこんなことになったんだ？」

"Well, I did warn you, but you overrode my recommendation."

「警告は致しましたが、あなたは推奨案を無視されました」

"Well, what am I going to do? I can't just tell him I'm too busy."

「どうしたらいいんだ？忙しくて行けないなんて言えないぞ」

"Don't worry. I arranged for his plane to be delayed."

「ご心配には及びません。事務総長の飛行機が遅れるように手配済みです」

(Laughter)

(笑)

"Some kind of computer malfunction."

「コンピューターに細工しておきました」

(Laughter)

(笑)

"Really? You can do that?"

「えっ、そんなことできるのか？」

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

「大変恐縮しておられ、明日のランチでお会いするのを楽しみにしているとのことです」

(Laughter)

(笑)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

ここでは価値についてちょっと行き違いが起きています。Siri は明らかに妻の価値観に従っています。「妻の幸せが夫の幸せ」です。

(Laughter)

(笑)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

別の方向に行くこともあり得ます。忙しい仕事を終え帰宅すると、コンピューターが言います。「大変な１日だったようですね」

"Yes, I didn't even have time for lunch."

「昼を食べる時間もなかったよ」

"You must be very hungry."

「お腹が空いたことでしょう」

"Starving, yeah. Could you make some dinner?"

「ああ、腹ペコだよ。何か夕食を作ってもらえるかな？」

"There's something I need to tell you."

「そのことでお話ししなければならないことがあります」

(Laughter)

(笑)

"There are humans in South Sudan who are in more urgent need than you."

「南スーダンにはあなたよりも必要に迫られている人々がいます」

(Laughter)

(笑)

"So I'm leaving. Make your own dinner."

「行くことに致しましたので、夕食はご自分で作ってください」

(Laughter)

(笑)

So we have to solve these problems, and I'm looking forward to working on them.

こういった問題を解かなければなりません。そういう問題に取り組むのは楽しみです。

There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

楽観しているのには理由があります。１つには膨大なデータがあること。思い出してください、機械は人類が書いたあらゆるものを読むことになるでしょう。人間の書いたものはたいがい、誰かが何かをし、他の人がそれに腹を立てたというものです。学べるデータが膨大にあります。

There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat.

またこれを正しくやるための強い経済的インセンティブが存在します。家に家事ロボットがいると想像してください。あなたはまた仕事で帰りが遅く、ロボットは子供達に食べさせなければなりません。子供達はお腹を空かせていますが、冷蔵庫は空っぽです。そこでロボットは猫に目を止めます。

(Laughter)

(笑)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

ロボットは人間の価値観をちゃんと学んでいないため、猫の持つ感情的価値が猫の栄養的価値を上回ることを理解しません。

(Laughter)

(笑)

So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

するとどうなるでしょう？「狂ったロボット、子猫を料理して夕食に出す」みたいな見出しを見ることになります。このような出来事１つで家事ロボット産業はお終いです。だから超知的な機械に到達するずっと以前に、この問題を正すよう大きなインセンティブが働きます。

So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.

要約すると、私はAIの定義を変えて、人間のためになると証明可能な機械が得られるよう試みています。その原則は、機械は利他的であり人間の目的のみを達成しようとするが、その目的が何かは確信を持たず、そしてすべての人間を観察することで我々の本当に望むことが何かを学ぶ、ということです。その過程で人類がより良い者になる術を学ぶことを望みます。ありがとうございました。

(Applause)

(拍手)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

(クリス・アンダーソン) すごく興味深いね、スチュワート。次のスピーカーのための準備があるので、少しここで話しましょう。

A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

いくつか質問があります。「無知にプログラムする」というアイデアはとても強力であるように思えます。超知的になったロボットが文献を読んで、無知よりも知識がある方が良いと気付き、自分の目的を変えてプログラムを書き換えてしまう。そういうことにならないためにはどうすれば良いのでしょう？

Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

(スチュワート・ラッセル) 私たちはロボットに人間の目的をよく学んでほしいと思っています。ロボットはより正しくなるほど確信を強めます。手がかりはそこにあるわけですから、それを正しく解釈するようデザインするのです。たとえば、本の内容にはバイアスがあることを理解するでしょう。王や王子やエリートの白人男性がしたことばかり書かれているといった風に。だから複雑な問題ではありますが、ロボットが我々の目的を学べば学ぶほど、我々にとって有用なものになるでしょう。

CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."

(クリス) １つの原則にまとめられないんですか？固定したプログラムとして「人間がスイッチを切ろうとしたら無条件に従う」みたいな。

SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.

(スチュワート) それは駄目ですね。まずいアイデアです。自動運転車で５歳の子を幼稚園に送るところを考えてみてください。車に１人で乗っている５歳児が、走行中に車のスイッチを切れるようにしたいと思いますか？違うでしょう。ロボットはその人間がどれほど理性的で分別があるかを理解する必要があります。人間が理性的であるほど、スイッチを切らせる見込みは高くなります。まったくランダムな相手や悪意ある人間に対しては、なかなかスイッチを切らせようとはしないでしょう。
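
The answer about the five-year-old can be sketched by giving the simplified off-switch model a noisy human. Assumptions, all ours: the robot's belief about the utility U of its action is Gaussian, and the human lets the action proceed with probability `sigmoid(U / noise)`, so `noise = 0` is a perfectly rational human and a very large `noise` approaches a coin flip. A malicious human is not modeled here.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def p_allow(u: float, noise: float) -> float:
    # Probability that the human lets an action with utility u proceed.
    if noise == 0.0:
        return 1.0 if u > 0 else 0.0  # perfectly rational human
    return sigmoid(u / noise)


def value_of_deferring_noisy(mu: float, sigma: float, noise: float, steps: int = 4001) -> float:
    """E[U * p_allow(U)] for U ~ Normal(mu, sigma^2), sigma > 0, computed with
    the trapezoid rule over mu +/- 8 sigma. A switched-off robot scores 0."""
    lo, hi = mu - 8.0 * sigma, mu + 8.0 * sigma
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        u = lo + i * h
        density = math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * u * p_allow(u, noise) * density
    return total * h


def should_defer(mu: float, sigma: float, noise: float) -> bool:
    # Defer only if letting the human decide beats acting now or switching off.
    return value_of_deferring_noisy(mu, sigma, noise) > max(mu, 0.0)


if __name__ == "__main__":
    for noise in (0.0, 1.0, 100.0):
        print(noise, should_defer(0.5, 1.0, noise))
```

With `mu = 0.5` and `sigma = 1`, deferring beats acting when the human is rational, but not when `noise` is very large: a near-random human lets a good action through only about half the time, so in this model the robot becomes less willing to be switched off.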

CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

(クリス) スチュワート、あなたがみんなのためにこの問題を解決してくれることを切に望みます。ありがとうございました。素晴らしいお話でした。

SR: Thank you.

(スチュワート) どうもありがとう

(Applause)

(拍手)

Related talks

Blaise Agüera y Arcas: How computers are learning to be creative

Sam Harris: Can we build AI without losing control over it?

Zeynep Tufekci: Machine intelligence makes human morals more important

Noriko Arai: Can a robot pass a university entrance exam?

David Lee: Why jobs of the future won't feel like work

Kriti Sharma: How to keep human bias out of AI
