Stuart Russell: 3 principles for creating safer AI

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

(Laughter)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that will enable machines, along with the ability to look further ahead than humans can, as we've already seen in Go, if they also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.

Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. [Welcome to Utopia. Enjoy your journey.] So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?

Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?

So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes.

(Laughter)

So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."

Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.
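The coffee argument can be written as a small expected-value calculation. The sketch below is an illustration and not something shown in the talk: the function names, the probabilities, and the 0/1 payoff for fetching the coffee are assumptions made here. It suggests why, for a robot with a fixed and known objective, any nonzero chance of being switched off makes disabling the switch look like the better plan.

```python
def expected_coffee(p_fetch: float, p_switched_off: float, disable_switch: bool) -> float:
    """Expected value of a fixed 'fetch the coffee' objective (1 = fetched, 0 = not).

    All quantities are hypothetical and chosen only for illustration.
    """
    # A robot that has been switched off can never fetch the coffee.
    p_still_running = 1.0 if disable_switch else 1.0 - p_switched_off
    return p_still_running * p_fetch


def best_plan(p_fetch: float, p_switched_off: float) -> str:
    """Pick the plan that maximizes the fixed objective, and nothing else."""
    keep = expected_coffee(p_fetch, p_switched_off, disable_switch=False)
    disable = expected_coffee(p_fetch, p_switched_off, disable_switch=True)
    return "disable off switch" if disable > keep else "leave off switch alone"


if __name__ == "__main__":
    # Even a 5% chance of interference is enough to tip the decision.
    print(best_plan(p_fetch=0.9, p_switched_off=0.05))
    print(best_plan(p_fetch=0.9, p_switched_off=0.0))
```

Nothing in this toy model refers to what the human actually wants; the preference for disabling the switch falls out of the fixed objective alone.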

(Laughter)

It's very simple. Just remember that. Repeat it to yourself three times a day.

(Laughter)

And in fact, this is exactly the plot of "2001: [A Space Odyssey]." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. [Sorry, Dave, but I can't do that.] So what are we going to do?

I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.

So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me."

(Laughter)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.
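One way to make the incentive calculation concrete is the following sketch. It is not the calculation from the talk: it assumes the robot's belief about the value U of its planned action is Gaussian with mean mu and standard deviation sigma, that being switched off is worth 0, and that the human is perfectly rational and lets the action proceed only when U > 0. The function names are hypothetical. Under those assumptions, deferring to the human is worth E[max(U, 0)], while acting without asking is worth max(E[U], 0), and the gap between the two grows with sigma.

```python
import math


def expected_positive_part(mu: float, sigma: float) -> float:
    """E[max(U, 0)] for U ~ Normal(mu, sigma^2), using the closed form."""
    if sigma == 0.0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf


def incentive_to_defer(mu: float, sigma: float) -> float:
    """Extra expected value from letting a rational human hold the off switch."""
    # Deferring: the human blocks the action exactly when U < 0.
    defer = expected_positive_part(mu, sigma)
    # Not deferring: the robot either acts (worth mu) or switches itself off (worth 0).
    act_alone = max(mu, 0.0)
    return defer - act_alone


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 2.0):
        print(sigma, incentive_to_defer(mu=1.0, sigma=sigma))
```

With sigma = 0 the robot is certain about the objective and the incentive vanishes, which matches the fixed-objective failure mode described earlier.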

And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, we can, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.

Now, this third principle, I think is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand that your nastiness, for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.
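The Lee Sedol point can be illustrated with a toy Bayesian inference. This is a sketch under stated assumptions, not the model of human cognition the talk refers to: it assumes only two candidate goals (wants to win, wants to lose), two possible moves, and a Boltzmann choice rule in which a hypothetical parameter beta stands in for how reliably the player finds the move that serves their goal. A large beta means a near-perfect player; a small beta means a computationally limited one.

```python
import math


def posterior_wants_to_win(prior_win: float, beta: float) -> float:
    """P(player wants to win | player made a losing move).

    Assumed choice model: the move that serves the player's goal is worth +1,
    the other move -1, and moves are chosen with probability proportional to
    exp(beta * value).
    """
    # Probability of picking the move that serves your own goal.
    p_serves_goal = 1.0 / (1.0 + math.exp(-2.0 * beta))
    # The observed move loses the game, so it is the wrong move for a player
    # who wants to win and the right move for a player who wants to lose.
    likelihood_if_win = 1.0 - p_serves_goal
    likelihood_if_lose = p_serves_goal
    evidence = prior_win * likelihood_if_win + (1.0 - prior_win) * likelihood_if_lose
    return prior_win * likelihood_if_win / evidence


if __name__ == "__main__":
    # Treating the player as nearly infallible makes the loss look deliberate.
    print(posterior_wants_to_win(prior_win=0.999, beta=5.0))
    # Allowing for limited computation keeps "wants to win" the likely story.
    print(posterior_wants_to_win(prior_win=0.999, beta=0.5))
```

The same observed move leads to opposite conclusions depending on how much rationality the observer attributes to the player, which is why the model of the human's limitations matters.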

Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.
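To see that "different ways to do that" can give different answers, here is a minimal sketch comparing two textbook aggregation rules: summing everyone's utility (utilitarian) and protecting the worst-off person (maximin). The options, the numbers, and the function names are invented for illustration; the talk does not say which rule, if any, a machine should use.

```python
from typing import Dict, List


def utilitarian_choice(utilities: Dict[str, List[float]]) -> str:
    """Pick the option with the highest total utility across people."""
    return max(utilities, key=lambda option: sum(utilities[option]))


def maximin_choice(utilities: Dict[str, List[float]]) -> str:
    """Pick the option whose worst-off person is best off."""
    return max(utilities, key=lambda option: min(utilities[option]))


if __name__ == "__main__":
    # Hypothetical utilities of three people under two options.
    options = {
        "A": [10.0, 10.0, 0.0],  # higher total, but one person gets nothing
        "B": [6.0, 6.0, 5.0],    # lower total, but nobody is left out
    }
    print(utilitarian_choice(options))  # A (total 20 vs 17)
    print(maximin_choice(options))      # B (minimum 5 vs 0)
```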

Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"

"Uh, your 20th anniversary at 7pm."

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"Well, I did warn you, but you overrode my recommendation."

"Well, what am I going to do? I can't just tell him I'm too busy."

"Don't worry. I arranged for his plane to be delayed."

(Laughter)

"Some kind of computer malfunction."

(Laughter)

"Really? You can do that?"

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

(Laughter)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

(Laughter)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

"Yes, I didn't even have time for lunch."

"You must be very hungry."

"Starving, yeah. Could you make some dinner?"

"There's something I need to tell you."

(Laughter)

"There are humans in South Sudan who are in more urgent need than you."

(Laughter)

"So I'm leaving. Make your own dinner."

(Laughter)

So we have to solve these problems, and I'm looking forward to working on them.

There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat.

(Laughter)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

(Laughter)

So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.

(Applause)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "If any human ever tries to switch me off, I comply. I comply."

SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.
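The dependence on the person's rationality can be sketched with a toy model. It is an illustration only, with assumptions that are not from the talk: the robot's belief about the value U of continuing is Gaussian with mean mu and standard deviation sigma, being switched off is worth 0, and the human makes the wrong call with a hypothetical probability error_rate (0 for a perfectly sensible adult, 0.5 for someone pressing the switch at random). The function names are invented here.

```python
import math


def expected_positive_part(mu: float, sigma: float) -> float:
    """E[max(U, 0)] for U ~ Normal(mu, sigma^2)."""
    if sigma == 0.0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf


def value_of_deferring(mu: float, sigma: float, error_rate: float) -> float:
    """Expected value of leaving the off switch to a human who errs with error_rate."""
    correct = expected_positive_part(mu, sigma)  # human stops the robot only when U < 0
    mistaken = mu - correct                      # E[min(U, 0)]: human stops it only when U > 0
    return (1.0 - error_rate) * correct + error_rate * mistaken


def willing_to_defer(mu: float, sigma: float, error_rate: float) -> bool:
    """True if deferring is at least as good as the robot deciding by itself."""
    return value_of_deferring(mu, sigma, error_rate) >= max(mu, 0.0)


if __name__ == "__main__":
    for error_rate in (0.0, 0.1, 0.5):
        print(error_rate, willing_to_defer(mu=1.0, sigma=2.0, error_rate=error_rate))
```

In this sketch the robot accepts the switch from a reliable person and declines it from a random one, which is the five-year-old-in-the-car case in miniature.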

CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

SR: Thank you.

(Applause)
