Stuart Russell: 3 principles for creating safer AI

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

(Laughter)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that will enable machines, along with the ability to look further ahead than humans can, as we've already seen in Go, if they also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.

Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?

Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?

So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes.

(Laughter)

So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."

Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.

(Laughter)

It's very simple. Just remember that. Repeat it to yourself three times a day.

(Laughter)

And in fact, this is exactly the plot of "2001: [A Space Odyssey]." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. So what are we going to do?

I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.
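
As a rough illustration of the third principle, learning what a person prefers from the choices they make can be sketched as a Bayesian update. Everything specific here is an assumption made for illustration and is not taken from the talk: the two hypothetical preference profiles, the option names, and the softmax ("noisily rational") choice model with its `rationality` parameter.

```python
import math

# Hypothetical preference profiles: how much the human values each option.
# These names and numbers are illustrative assumptions only.
HYPOTHESES = {
    "likes_coffee": {"coffee": 2.0, "tea": 0.0},
    "likes_tea": {"coffee": 0.0, "tea": 2.0},
}


def choice_probability(chosen, values, rationality):
    """Probability that a noisily rational human picks `chosen` (softmax over values)."""
    weights = {option: math.exp(rationality * v) for option, v in values.items()}
    return weights[chosen] / sum(weights.values())


def update_belief(belief, chosen, rationality=1.0):
    """Bayes' rule: reweight each hypothesis by how well it explains the observed choice."""
    unnormalized = {
        name: prob * choice_probability(chosen, HYPOTHESES[name], rationality)
        for name, prob in belief.items()
    }
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}


# Start undecided, then watch a few choices (one of them "out of character").
belief = {"likes_coffee": 0.5, "likes_tea": 0.5}
for observed in ["coffee", "coffee", "tea", "coffee"]:
    belief = update_belief(belief, observed)
print(belief)
```

In this sketch the belief shifts strongly toward `likes_coffee` even though one observed choice went the other way. With `rationality=0.0` the choices are treated as random and the belief does not move at all, which is one way to see why the assumed model of human decision-making matters when reading preferences off behavior.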

So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me."

(Laughter)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.
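
The link between uncertainty and the incentive to allow switch-off can be made concrete with a toy calculation. This is only a sketch under stated assumptions, not necessarily the model behind the result mentioned in the talk: the robot's belief is a hypothetical list of `(utility, probability)` pairs for its proposed action, being switched off is worth 0, and the human is assumed to know the true utility and to switch the robot off exactly when the action would be harmful.

```python
def deference_incentive(belief):
    """Expected gain from letting the human decide instead of deciding alone.

    `belief` is a hypothetical list of (utility, probability) pairs describing
    the robot's uncertainty about how good its proposed action is for the human.
    """
    expected_utility = sum(u * p for u, p in belief)

    # Deciding alone: either act (expected utility) or switch itself off (0).
    best_alone = max(expected_utility, 0.0)

    # Deferring: an assumed perfectly rational human lets good actions proceed
    # and switches the robot off (utility 0) whenever the action is harmful.
    defer = sum(max(u, 0.0) * p for u, p in belief)

    return defer - best_alone


certain = [(1.0, 1.0)]                    # robot is sure the action is good
uncertain = [(-2.0, 0.5), (4.0, 0.5)]     # same mean utility, real doubt
more_uncertain = [(-4.0, 0.5), (6.0, 0.5)]

for name, b in [("certain", certain), ("uncertain", uncertain),
                ("more uncertain", more_uncertain)]:
    print(name, deference_incentive(b))
```

All three beliefs have the same expected utility, but the incentive to defer is zero when the robot is certain and grows as the belief spreads out. Under these assumptions the incentive is never negative, so a robot that is unsure about the objective loses nothing by leaving the switch alone.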

And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, we can, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.

Now, this third principle, I think is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand that your nastiness, for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.

Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.

Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"

"Uh, your 20th anniversary at 7pm."

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"Well, I did warn you, but you overrode my recommendation."

"Well, what am I going to do? I can't just tell him I'm too busy."

"Don't worry. I arranged for his plane to be delayed."

(Laughter)

"Some kind of computer malfunction."

(Laughter)

"Really? You can do that?"

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

(Laughter)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

(Laughter)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

"Yes, I didn't even have time for lunch."

"You must be very hungry."

"Starving, yeah. Could you make some dinner?"

"There's something I need to tell you."

(Laughter)

"There are humans in South Sudan who are in more urgent need than you."

(Laughter)

"So I'm leaving. Make your own dinner."

(Laughter)

So we have to solve these problems, and I'm looking forward to working on them.

There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat.

(Laughter)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

(Laughter)

So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.

(Applause)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "If any human ever tries to switch me off, I comply. I comply."

SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.
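
The dependence on how rational the person is can be sketched with a toy calculation. The setup is an assumption for illustration and does not come from the talk: the robot's belief is a hypothetical list of `(utility, probability)` pairs for its proposed action, being switched off is worth 0, and `human_accuracy` is the assumed probability that the person makes the right call (1.0 is perfectly sensible, 0.5 is completely random).

```python
def incentive_to_defer(belief, human_accuracy):
    """Expected gain from letting the human decide, given how reliable the human is.

    `belief` is a hypothetical list of (utility, probability) pairs for the
    robot's proposed action; `human_accuracy` is the assumed chance that the
    human allows a good action or stops a harmful one.
    """
    expected_utility = sum(u * p for u, p in belief)
    best_alone = max(expected_utility, 0.0)  # act, or switch itself off (0)

    defer = 0.0
    for utility, prob in belief:
        # The human lets the action go ahead correctly with probability
        # `human_accuracy` when it is good, and mistakenly otherwise.
        allow_prob = human_accuracy if utility > 0 else 1.0 - human_accuracy
        defer += prob * allow_prob * utility

    return defer - best_alone


belief = [(-2.0, 0.5), (4.0, 0.5)]
for accuracy in (1.0, 0.8, 0.5):
    print(accuracy, incentive_to_defer(belief, accuracy))
```

With a perfectly reliable human the incentive to defer is positive; as `human_accuracy` drops toward 0.5 it turns negative in this example, meaning the robot would do better by not handing the decision over. That matches the five-year-old in the self-driving car: the off switch should carry weight in proportion to how sensible the person pressing it is.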

CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

SR: Thank you.

(Applause)
