Stuart Russell: 3 principles for creating safer AI

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

Este es Lee Sedol. Lee Sedol es uno de los mejores jugadores de Go del mundo. Y está teniendo lo que mis amigos de Silicon Valley llaman un momento "¡Bendito Dios!".

(Laughter)

(Risas)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

Un momento en el que nos damos cuenta de que la IA está avanzando mucho más rápido de lo que esperábamos. Los humanos han perdido en el tablero de Go. ¿Y en el mundo real?

Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that will enable machines, along with the ability to look further ahead than humans can, as we've already seen in Go, if they also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.

Bueno, el mundo real es mucho más grande y complicado que el tablero de Go. Es mucho menos visible. Pero sigue siendo un problema de decisión. Y si pensamos en algunas de las tecnologías que están por venir... Noriko [Arai] mencionó que las máquinas aún no saben leer, al menos no comprendiendo, pero lo harán, y cuando eso suceda, poco después las máquinas habrán leído todo lo que la raza humana ha escrito. Eso permitirá a las máquinas, junto a su habilidad de mirar más allá de lo que pueden los humanos, como ya hemos visto en el Go, si también tienen acceso a más información, serán capaces de tomar mejores decisiones en el mundo real que nosotros. ¿Es eso bueno? Bueno, espero que sí.

Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?

Toda nuestra civilización, todo lo que valoramos, se basa en nuestra inteligencia. Y si tuviéramos acceso a mucha más inteligencia, entonces no existirían límites para lo que la raza humana pueda hacer. Y creo que este podría ser, como han dicho algunos, el mayor acontecimiento de la historia de la humanidad. Entonces, ¿por qué la gente afirma cosas como esta? Que la inteligencia artificial podría significar el fin de la raza humana. ¿Es esto algo nuevo? ¿Se trata solo de Elon Musk y Bill Gates y Stephen Hawking?

Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?

En realidad, no. Esta idea no es nueva. He aquí una cita: "Incluso si pudiéramos mantener las máquinas en una posición servil, por ejemplo, desconectándolas en momentos estratégicos" --volveré a esa idea de "quitar la corriente" más adelante-- "deberíamos, como especie, sentirnos humillados". ¿Quién dijo esto? Este es Alan Turing, en 1951. Alan Turing, como Uds. saben, es el padre de la informática y en muchos sentidos también el padre de la IA. Así que si pensamos en este problema, el problema de crear algo más inteligente que tu propia especie, podríamos llamar a esto "el problema del gorila". Porque los antepasados de los gorilas hicieron esto hace unos millones de años, y ahora podríamos preguntar a los gorilas: ¿Fue una buena idea?

So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes.

Aquí están, reunidos para discutir si fue una buena idea, y pasado un tiempo concluyen que no. Fue una idea terrible. Nuestra especie está en apuros. De hecho, pueden ver la tristeza existencial en sus ojos.

(Laughter)

(Risas)

So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.

Así que esta sensación mareante de que crear algo más inteligente que tu propia especie tal vez no sea buena idea... ¿Qué podemos hacer al respecto? Bueno, nada en realidad, excepto dejar de hacer IA. Y por todos los beneficios que he mencionado y porque soy un investigador de IA, eso no lo acepto. De hecho, quiero poder seguir haciendo IA.

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

Así que necesitamos precisar el problema un poco más. ¿Cuál es el problema? ¿Por qué tener mejor IA puede ser una catástrofe?

So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."

Aquí hay otra cita: "Más nos vale estar seguros de que el propósito que introducimos en la máquina es el que de verdad deseamos". Esto fue dicho por Norbert Wiener en 1960, poco después de ver a uno de los primeros sistemas de aprendizaje aprender a jugar a las damas mejor que su creador. Pero esto podría haberlo dicho de igual modo el Rey Midas. El Rey Midas dijo, "Deseo que todo lo que toque se convierta en oro". Y obtuvo justo lo que pidió. Fue el propósito que introdujo en la máquina, por así decirlo. Y luego su comida, su bebida y sus familiares se convirtieron en oro y murió miserable y muerto de hambre. Así que llamaremos a esto "el problema del rey Midas", el de indicar un objetivo que no está realmente alineado con lo que de verdad queremos. En términos modernos, lo llamamos "el problema de alineación de valor".

Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.

Introducir un objetivo equivocado no es la única parte del problema. Hay otra parte. Al introducir un objetivo en una máquina, incluso algo tan simple como "Trae el café", la máquina se dice a sí misma: "¿Cómo podría fallar yendo a buscar el café? Alguien podría desconectarme. Vale, debo tomar medidas para evitarlo. Desactivaré mi interruptor de 'apagado'. Haré cualquier cosa para protegerme de interferencias con este objetivo que me han dado". Así que esta persecución obsesiva, de un modo muy defensivo, de un objetivo que no está alineado con los verdaderos objetivos de la raza humana... ese es el problema al que nos enfrentamos. Y de hecho esa es la lección más valiosa de esta charla. Si quieren recordar una cosa, es que no se puede ir a buscar el café si se está muerto.

(Laughter)

(Risas)

It's very simple. Just remember that. Repeat it to yourself three times a day.

Es muy simple. Solo recuerden eso. Repítanlo tres veces al día.

(Laughter)

(Risas)

And in fact, this is exactly the plot of "2001: [A Space Odyssey]." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. So what are we going to do?

Y de hecho, este es el mismo argumento de "2001: [Una odisea del espacio]". HAL tiene un objetivo, una misión, que no está alineada con los objetivos de los humanos, y eso conduce a este conflicto. Por suerte HAL no es superinteligente. Es bastante inteligente, pero llegado el momento, Dave lo supera y logra apagarlo. Pero tal vez no tengamos tanta suerte. Entonces, ¿qué vamos a hacer?

I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

Estoy tratando de redefinir la IA para alejarnos de esta noción clásica de máquinas que persiguen objetivos de manera inteligente. Hay tres principios implicados. El primero es un principio de altruismo, por así decirlo: el único objetivo del robot es maximizar la realización de los objetivos humanos, de los valores humanos. Y por valores aquí no me refiero a valores sentimentales o de bondad. Me refiero simplemente a cómo preferiría el humano que fuera su vida. Y esto viola la ley de Asimov de que el robot debe proteger su propia existencia. No tiene ningún interés en preservar su existencia en absoluto.

The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

La segunda ley es una ley de humildad, digamos. Y resulta muy importante para que los robots sean seguros. Dice que el robot no sabe cuáles son esos valores humanos, así que debe maximizarlos, pero no sabe lo que son. Lo cual evita el problema de la búsqueda obsesiva de un objetivo. Esta incertidumbre resulta crucial.

Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.

Claro que para sernos útiles, deben tener alguna idea de lo que queremos. Obtiene esa información sobre todo observando elecciones humanas, para que nuestras propias decisiones revelen información sobre lo que nosotros preferimos para nuestras vidas. Estos son los tres principios. Veamos cómo se aplica a esta cuestión de "apagar la máquina", como sugirió Turing.

So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me."

He aquí un robot PR2. Es uno que tenemos en nuestro laboratorio, y tiene un gran botón rojo de 'apagado' en la parte posterior. La pregunta es: ¿Va a dejar que lo apaguen? Si lo hacemos a la manera clásica, le damos el objetivo de traer el café. "Debo traer el café. No puedo traer el café si estoy muerto". Obviamente el PR2 ha escuchado mi charla, y por tanto decide: "Debo inhabilitar mi botón de 'apagado' y probablemente electrocutar al resto de personas en el Starbucks que podrían interferir".

(Laughter)

(Risas)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

Así que esto parece ser inevitable, ¿verdad? Este tipo de error parece ser inevitable, y sucede por tener un objetivo concreto, definido.

So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.

Entonces, ¿qué pasa si la máquina no tiene claro el objetivo? Bueno, razona de una manera diferente. Dice, "El humano podría desconectarme, pero solo si hago algo malo. No tengo claro lo que es malo pero sé que no quiero hacerlo". Ahí están el primer y el segundo principio. "Así que debería dejar que el humano me desconecte". De hecho se puede calcular el incentivo que tiene el robot para permitir que el humano lo apague. Y está directamente ligado al grado de incertidumbre sobre el objetivo subyacente.
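Editor's illustration: the claim that the incentive to allow switch-off can be calculated and grows with uncertainty can be sketched with a toy model. Everything here is an assumption for illustration, not the speaker's actual formulation: the robot holds a belief (a list of samples) over the unknown value U of its planned action, switching off is worth 0, and the human is assumed perfectly rational, allowing the action only when U > 0. The names `deference_incentive` and `gaussian_belief` are hypothetical.

```python
import random


def deference_incentive(utility_samples):
    """Expected gain from deferring to a rational human instead of deciding alone.

    utility_samples: samples from the robot's belief about the (unknown) value
    to the human of the action it is about to take.
    """
    n = len(utility_samples)
    act_now = sum(utility_samples) / n  # E[U]: proceed without asking
    switch_self_off = 0.0               # doing nothing is worth 0 by convention
    # Assumption: a rational human allows the action only when U > 0,
    # so deferring is worth E[max(U, 0)].
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return defer - max(act_now, switch_self_off)


def gaussian_belief(mean, spread, n=10_000, seed=0):
    """Hypothetical helper: sample a Gaussian belief over the action's value."""
    rng = random.Random(seed)
    return [rng.gauss(mean, spread) for _ in range(n)]


if __name__ == "__main__":
    # Wider beliefs (more uncertainty) should give a larger incentive to defer.
    for spread in (0.1, 1.0, 2.0):
        belief = gaussian_belief(mean=0.5, spread=spread)
        print(f"spread={spread}: incentive={deference_incentive(belief):.3f}")
```

In this sketch the incentive is never negative, and it drops to zero when the robot is certain about the sign of U, which matches the intuition in the passage: a robot with a fixed, definite objective gains nothing from leaving its off switch enabled.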

And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, we can, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.

Y entonces, cuando se apaga la máquina, el tercer principio entra en juego. Aprende algo sobre los objetivos que debe perseguir, porque aprende que lo que hizo no estaba bien. De hecho, podemos, con el uso adecuado de los símbolos griegos, como suelen hacer los matemáticos, probar un teorema que dice que tal robot es demostrablemente beneficioso para el humano. Se está demostrablemente mejor con una máquina que se diseña de esta manera que sin ella. Este es un ejemplo muy simple, pero este es el primer paso en lo que estamos tratando de hacer con IA compatible con humanos.

Now, this third principle, I think is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

Ahora, este tercer principio es probablemente el que está haciendo que se rasquen la cabeza. Probablemente piensen: "Yo me comporto mal. No quiero que mi robot se comporte como yo. Me escabullo en mitad de la noche y tomo cosas de la nevera, hago esto y hago aquello". Hay todo tipo de cosas que uno no quiere que haga el robot. Pero lo cierto es que no funciona así. Solo porque uno se comporte mal no significa que el robot vaya a copiar su comportamiento. Va a entender sus motivaciones y tal vez a ayudarle a resistirlas, si es apropiado. Pero sigue siendo difícil. Lo que estamos tratando de hacer, de hecho, es permitir que las máquinas predigan, para cualquier persona y para cualquier vida posible que podría vivir, y las vidas de todos los demás: ¿cuál preferirían? Y hay muchas, muchas dificultades ligadas a hacer esto. No espero que vaya a resolverse pronto. Las verdaderas dificultades, de hecho, somos nosotros.

As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand that your nastiness, for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

Como ya he mencionado, nos comportamos mal. De hecho, algunos de nosotros somos francamente desagradables. Como he dicho, el robot no tiene que copiar el comportamiento. El robot no tiene ningún objetivo propio. Es puramente altruista. Y no está diseñado solo para satisfacer los deseos de una persona, el usuario, sino que tiene que respetar las preferencias de todos. Así que puede lidiar con cierta cantidad de maldad, e incluso puede entender que su maldad, por ejemplo... Ud. puede aceptar sobornos como controlador de pasaportes porque necesita alimentar a su familia y que sus hijos vayan a la escuela. Puede entender eso; no significa que vaya a robar. De hecho, solo le ayudará a que sus hijos vayan al colegio.

We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.

También estamos limitados computacionalmente. Lee Sedol es un jugador brillante de Go, pero aun así perdió. Si nos fijamos en sus acciones, tomó una decisión que le hizo perder. Eso no significa que él quisiera perder. Así que para entender su comportamiento, en realidad tenemos que invertir, a través de un modelo cognitivo humano que incluye nuestras limitaciones computacionales, y se trata de un modelo muy complicado. Pero es algo en lo que podemos trabajar para comprender.
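Editor's illustration: "inverting through a model of human cognition" can be sketched as Bayesian inference over what a person wants, given choices made by an imperfect chooser. This is a minimal sketch under stated assumptions, not the model used in the speaker's research: the human is assumed to pick options with softmax ("noisily rational") probabilities, `beta` is a hypothetical rationality parameter, and only two hypotheses are compared (the player wants to win, or wants to lose). The move values are made-up win probabilities.

```python
import math


def choice_likelihood(chosen, values, beta):
    """P(chosen option | option values) for a softmax (noisily rational) chooser."""
    weights = [math.exp(beta * v) for v in values]
    return weights[chosen] / sum(weights)


def posterior_wants_to_win(observations, beta=2.0, prior=0.5):
    """Posterior probability that the player is trying to win.

    observations: list of (chosen_index, win_probability_of_each_option).
    Under the alternative hypothesis the player values losing, so each
    option is worth 1 - win_probability.
    """
    like_win, like_lose = prior, 1.0 - prior
    for chosen, win_probs in observations:
        like_win *= choice_likelihood(chosen, win_probs, beta)
        like_lose *= choice_likelihood(chosen, [1.0 - p for p in win_probs], beta)
    return like_win / (like_win + like_lose)


if __name__ == "__main__":
    moves = [0.6, 0.4, 0.3]                  # illustrative win probabilities
    game = [(0, moves)] * 20 + [(2, moves)]  # twenty good moves, then one blunder
    print(posterior_wants_to_win(game))
```

Seen in isolation, the single blunder looks like weak evidence that the player wanted to lose. Combined with the rest of the game under a model that allows for mistakes, the inference still lands firmly on "wants to win," which is the point of the Lee Sedol example.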

Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.

Puede que la parte más difícil, desde mi punto de vista como investigador de IA, sea el hecho de que somos muchos, con lo cual la máquina tiene que sopesar de algún modo las preferencias de mucha gente diferente, y hay diferentes maneras de hacer eso. Economistas, sociólogos y filósofos morales han comprendido esto, y estamos buscando colaboración de manera activa.
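Editor's illustration: the remark that there are "different ways" to weigh up many people's preferences can be made concrete with two textbook aggregation rules. This is only a sketch with made-up numbers. The function names are hypothetical, and nothing in the talk says which rule, if any, the speaker favors.

```python
def utilitarian_choice(utilities):
    """Pick the option with the highest total utility across all people."""
    return max(utilities, key=lambda option: sum(utilities[option]))


def maximin_choice(utilities):
    """Pick the option under which the worst-off person fares best."""
    return max(utilities, key=lambda option: min(utilities[option]))


if __name__ == "__main__":
    # Each list holds one (illustrative) utility value per person.
    options = {"A": [10, 10, 0], "B": [5, 5, 4]}
    print(utilitarian_choice(options))  # total 20 vs 14
    print(maximin_choice(options))      # worst-off 0 vs 4
```

The two rules disagree on the same data, which is why the choice of aggregation rule is a substantive question rather than an implementation detail.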

Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"

Vamos a ver lo que sucede cuando esto se hace mal. Ud. puede estar hablando, por ejemplo, con su asistente personal inteligente que podría estar disponible dentro de unos años. Piensen en Siri con esteroides. Siri dice: "Su esposa llamó para recordarle la cena de esta noche". Y por supuesto, lo ha olvidado. "¿Qué? ¿Qué cena? ¿De qué está hablando?".

"Uh, your 20th anniversary at 7pm."

"Su 20 aniversario, a las 7pm".

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"No puedo, me reúno con el secretario general a las 7:30. ¿Cómo ha podido suceder esto?".

"Well, I did warn you, but you overrode my recommendation."

"Bueno, le advertí, pero ignoró mi recomendación".

"Well, what am I going to do? I can't just tell him I'm too busy."

"Bueno, ¿qué voy a hacer? No puedo decirle sin más que estoy demasiado ocupado".

"Don't worry. I arranged for his plane to be delayed."

"No se preocupe, he hecho que su avión se retrase".

(Laughter)

(Risas)

"Some kind of computer malfunction."

"Algún tipo de error en el ordenador".

(Laughter)

(Risas)

"Really? You can do that?"

"¿En serio? ¿Puede hacer eso?".

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

"Le envía sinceras disculpas y espera poder conocerle mañana para el almuerzo".

(Laughter)

(Risas)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

Así que los valores aquí... aquí hay un pequeño fallo. Claramente está siguiendo los valores de mi esposa que son "esposa feliz, vida feliz".

(Laughter)

(Risas)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

Podría suceder al revés. Podría llegar a casa tras un duro día de trabajo, y el ordenador dice "¿Un día duro?".

"Yes, I didn't even have time for lunch."

"Sí, ni tuve tiempo de almorzar".

"You must be very hungry."

"Debe tener mucha hambre".

"Starving, yeah. Could you make some dinner?"

"Me muero de hambre, sí, ¿podría preparar algo de cena?".

"There's something I need to tell you."

"Hay algo que necesito decirle".

(Laughter)

(Risas)

"There are humans in South Sudan who are in more urgent need than you."

"Hay humanos en Sudán del Sur más necesitados que Ud.".

(Laughter)

(Risas)

"So I'm leaving. Make your own dinner."

"Así que me voy, hágase su propia cena".

(Laughter)

(Risas)

So we have to solve these problems, and I'm looking forward to working on them.

Así que tenemos que resolver estos problemas, y tengo ganas de trabajar en ellos.

There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

Hay razones para ser optimistas. Una razón es que hay una gran cantidad de datos. Recuerden: dije que leerán todo lo que la raza humana ha escrito. La mayoría de lo que escribimos trata sobre humanos haciendo cosas y otras personas molestándose por ello. Así que hay muchos datos de los que aprender.

There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat.

También hay un fuerte incentivo económico para que esto funcione bien. Imagine que su robot doméstico está en casa. Ud. llega tarde del trabajo otra vez, el robot tiene que dar de comer a los niños, los niños tienen hambre y no hay nada en la nevera. Y el robot ve al gato.

(Laughter)

(Risas)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

Y el robot no ha aprendido del todo bien la función del valor humano por lo que no entiende que el valor sentimental del gato supera el valor nutricional del gato.

(Laughter)

(Risas)

So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

Entonces, ¿qué pasa? Bueno, sucede lo siguiente: "Robot desquiciado cocina a un gatito para la cena familiar". Ese único incidente acabaría con la industria de robots domésticos. Así que hay un gran incentivo para hacer esto bien mucho antes de llegar a las máquinas superinteligentes.

So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.

Así que para resumir: Estoy intentando cambiar la definición de IA para que tengamos máquinas demostrablemente beneficiosas. Y los principios son: Máquinas que son altruistas, que desean lograr solo nuestros objetivos, pero que no están seguras de cuáles son esos objetivos y nos observarán a todos para aprender qué es lo que realmente queremos. Y con suerte, en el proceso, aprenderemos a ser mejores personas. Muchas gracias.

(Applause)

(Aplausos)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

Chris Anderson: Muy interesante, Stuart. Vamos a estar aquí un poco porque creo que están preparando a nuestro próximo orador.

A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

Un par de preguntas. La idea de programar ignorancia parece intuitivamente muy poderosa. Al llegar a la superinteligencia, ¿qué puede impedir que un robot lea literatura y descubra esta idea de que el conocimiento es mejor que la ignorancia, cambiando sus propios objetivos y reescribiendo su programación?

Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

Stuart Russell: Queremos que aprenda más, como he dicho, sobre nuestros objetivos. Solo ganará seguridad cuanto más acierte. La evidencia estará ahí, y estará diseñado para interpretarla adecuadamente. Comprenderá, por ejemplo, que los libros son muy sesgados en la evidencia que contienen. Solo hablan de reyes y príncipes y hombres blancos poderosos haciendo cosas. Es un problema complicado, pero conforme aprenda más sobre nuestros objetivos será cada vez más útil para nosotros.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."

CA: Y no podría reducirse a una ley, ya sabe, grabada a fuego, "Si un humano alguna vez intenta apagarme yo obedezco, obedezco".

SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.

SR: Absolutamente no. Sería una idea terrible. Imagine, tiene un auto que se conduce solo y quiere llevar a su hijo de cinco años al jardín de infancia. ¿Quiere que su hijo de cinco años pueda apagar el coche mientras conduce? Probablemente no. Por tanto necesita entender cuán racional y sensata es la persona. Cuanto más racional sea la persona, más dispuesto estará a dejar que lo apaguen. Si la persona es impredecible o incluso malintencionada estará menos dispuesto a permitir que lo apaguen.
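Editor's illustration: the answer above, that willingness to be switched off should depend on how rational the person is, can be sketched by adding a reliability parameter to a toy deference model. All of this is assumed for illustration: the robot's belief about the value U of its action is a list of samples, switching off is worth 0, and `human_reliability` is a hypothetical probability that the human makes the right call (allow when U > 0, switch off when U < 0).

```python
def deference_incentive(utility_samples, human_reliability):
    """Expected gain from deferring to a possibly unreliable human.

    With probability human_reliability the human decides correctly;
    otherwise the human does the opposite (blocks a good action or
    allows a bad one).
    """
    n = len(utility_samples)
    p = human_reliability
    act_now = sum(utility_samples) / n  # E[U]: proceed without asking
    # Good actions (U > 0) go ahead with probability p;
    # bad actions (U < 0) slip through with probability 1 - p.
    defer = sum(p * max(u, 0.0) + (1.0 - p) * min(u, 0.0)
                for u in utility_samples) / n
    return defer - max(act_now, 0.0)


if __name__ == "__main__":
    belief = [-1.0, 3.0]  # illustrative: the action is probably good, maybe bad
    for p in (1.0, 0.75, 0.5):
        print(f"reliability={p}: incentive={deference_incentive(belief, p)}")
```

In this sketch a fully reliable human makes deferring worthwhile, while a coin-flipping one (the five-year-old in the car) makes it a net loss, so the robot's willingness to be switched off falls as the person's reliability falls.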

CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

CA: Stuart, permítame decir que de veras espero que resuelva esto por todos nosotros. Muchas gracias por su charla. Ha sido increíble, gracias.

SR: Thank you.

(Applause)

(Aplausos)

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

Este es Lee Sedol. Lee Sedol es uno de los mejores jugadores de Go del mundo. Y está teniendo lo que mis amigos de Silicon Valley llaman un momento "¡Bendito Dios!".

(Laughter)

(Risas)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

Un momento en el que nos damos cuenta de que la IA está avanzando mucho más rápido de lo que esperábamos. Los humanos han perdido en el tablero de Go.

(Laughter)

(Risas)

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

Así que necesitamos precisar el problema un poco más. ¿Cuál es el problema? ¿Por qué tener mejor IA puede ser una catástrofe?

(Laughter)

(Risas)

It's very simple. Just remember that. Repeat it to yourself three times a day.

Es muy simple. Solo recuerden eso. Repítanlo tres veces al día.

(Laughter)

(Risas)

(Laughter)

que podrían interferir". (Risas)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

Así que esto parece ser inevitable, ¿verdad? Este tipo de error parece ser inevitable, y sucede por tener un objetivo concreto, definido.

"Uh, your 20th anniversary at 7pm."

"Su 20 aniversario, a las 7pm".

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"No puedo, me reúno con el secretario general a las 7:30. ¿Cómo ha podido suceder esto?".

"Well, I did warn you, but you overrode my recommendation."

"Bueno, le advertí, pero ignoró mi recomendación".

"Well, what am I going to do? I can't just tell him I'm too busy."

"¿Qué voy a hacer? No puedo decirle que estoy demasiado ocupado".

"Don't worry. I arranged for his plane to be delayed."

"No se preocupe, he hecho que su avión se retrase".

(Laughter)

(Risas)

"Some kind of computer malfunction."

"Algún tipo de error en el ordenador".

(Laughter)

(Risas)

"Really? You can do that?"

"¿En serio? ¿Puede hacer eso?".

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

"Le envía sinceras disculpas y espera poder conocerle mañana para el almuerzo".

(Laughter)

(Risas)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

Así que los valores aquí... aquí hay un pequeño fallo. Claramente está siguiendo los valores de mi esposa que son "esposa feliz, vida feliz".

(Laughter)

(Risas)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

Podría suceder al revés. Podría llegar a casa tras un duro día de trabajo, y el ordenador dice "¿Un día duro?".

"Yes, I didn't even have time for lunch."

"Sí, ni tuve tiempo de almorzar".

"You must be very hungry."

"Debe tener mucha hambre".

"Starving, yeah. Could you make some dinner?"

"Me muero de hambre, sí, ¿podría preparar algo de cena?".

"There's something I need to tell you."

"Hay algo que necesito decirle".

(Laughter)

(Risas)

"There are humans in South Sudan who are in more urgent need than you."

"Hay humanos en Sudán del Sur más necesitados que Ud.".

(Laughter)

(Risas)

"So I'm leaving. Make your own dinner."

"Así que me voy, hágase su propia cena".

(Laughter)

(Risas)

So we have to solve these problems, and I'm looking forward to working on them.

Así que tenemos que resolver estos problemas, y tengo ganas de trabajar en ellos.

(Laughter)

(Risas)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

Y el robot no ha aprendido del todo bien la función del valor humano por lo que no entiende que el valor sentimental del gato supera el valor nutricional del gato.

(Laughter)

(Risas)

(Applause)

(Aplausos)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

Chris Anderson: Muy interesante, Stuart. Vamos a estar aquí un poco porque creo que están preparando a nuestro próximo orador.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."

CA: Y no podría reducirse a una ley, ya sabe, grabada a fuego, "Si un humano alguna vez intenta apagarme yo obedezco, obedezco".

Related talks

Blaise Agüera y Arcas: How computers are learning to be creative

Sam Harris: Can we build AI without losing control over it?

Zeynep Tufekci: Machine intelligence makes human morals more important

Noriko Arai: Can a robot pass a university entrance exam?

David Lee: Why jobs of the future won't feel like work

Kriti Sharma: How to keep human bias out of AI
