Stuart Russell: 3 principles for creating safer AI

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

이 사람은 이세돌입니다. 이세돌은 세계에서 가장 뛰어난 바둑 기사 중 한 명이죠. 보고 계신 이 사진의 순간에 실리콘밸리의 제 친구들은 "세상에나!" 라고 외쳤습니다.

(Laughter)

(웃음)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

바로 이 순간에 인공지능이 생각했던 것보다 빠르게 발전하고 있음을 깨닫게 되었죠. 자, 바둑판에서 인간들은 패배했습니다. 그럼 현실 세상에서는 어떨까요?

Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that will enable machines, along with the ability to look further ahead than humans can, as we've already seen in Go, if they also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.

현실 세계는 훨씬 크고 바둑판보다 훨씬 복잡합니다. 한 눈에 들어오지도 않고 결정에 관한 문제도 여전히 남아있습니다. 그리고 새롭게 떠오르는 기술들을 살펴보면 ... 노리코 아라이 교수가 말한대로 독서하는 인공지능은 아직 없습니다. 적어도 이해를 동반한 독서 말이죠. 하지만 그런 인공지능도 출현할 겁니다. 그때가 되면 얼마 안돼서 인공지능은 인류가 지금까지 쓴 모든 것을 읽게 될 것입니다. 그리고 인공지능 기계들이 그렇게 할 수 있게 되면 인간보다 더 멀리 예측하는 능력을 갖게 되고 바둑 시합에서 이미 드러났듯이 인공지능 기계들이 더 많은 정보에 접근할 수 있으면 실제 세계에서 우리보다 더 나은 결정들을 할 수 있을 것입니다. 그러면 좋은 것일까요? 음, 그랬으면 좋겠네요.

Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?

우리 인류 문명을 통틀어 가치 있다고 여기는 모든 것들은 우리 지식에 바탕한 것입니다. 그리고 만약 우리가 더 많은 지식에 접근할 수 있다면 인류가 할 수 있는 것들에서 진정한 한계는 없을 것입니다. 누군가 말했듯이 그렇게만 된다면 인류 역사상 가장 큰 사건이 되겠죠. 그런데 왜 사람들은 이런 말을 할까요? 인공지능(AI)이 우리 인류의 종말을 가져올 거라고 말이죠. 인공지능이 새로운 것일까요? 엘런 머스크, 빌 게이츠, 스티븐 호킹. 이런 사람들만 아는 것인가요?

Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?

사실, 아닙니다. 이 개념이 나온 지는 좀 되었죠. 이런 말이 있습니다. "중요한 순간에 전원을 꺼버리는 식으로 기계를 우리 인간에게 계속 복종하도록 만들 수 있더라도.." "전원을 끈다"는 개념은 나중에 다시 설명하겠습니다. "우리는 인류로서 겸손함을 느껴야 합니다." 누가 한 말일까요? 앨런 튜링이 1951년에 한 말입니다. 앨런 튜링은 아시다시피 컴퓨터 과학의 아버지입니다. 그리고 여러 측면에서 인공지능의 아버지이기도 하죠. 이런 문제를 생각해볼까요. 우리 인류의 지능을 뛰어넘는 무언가를 창조하는 문제입니다. 이른바 "고릴라 문제"라 할 수 있죠. 수백만 년 전 고릴라들의 조상들이 바로 그런 일을 했으니까요. 그럼 이제 고릴라들에게 이렇게 물어보면 어떨까요. "좋은 아이디어였나요?"

So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes.

고릴라들은 좋은 생각이었는지 의논하기 위해 모였습니다. 그리고 잠시 후 결론을 내립니다. 최악의 아이디어였다고 결론짓죠. 그 때문에 자신들이 곤경에 처했다면서요. 실제로 그들 눈에서 존재론적 슬픔이 엿보이네요.

(Laughter)

(웃음)

So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.

따라서 우리보다 더 뛰어난 무언가를 만들어낸다는 이러한 초조한 느낌은 좋은 아이디어가 아닐 겁니다. 그럼 어떻게 해야 할까요? 음, 사실 인공지능을 중단시키는 것 말고는 딱히 방법이 없습니다. 그리고 제가 앞서 말씀드렸던 모든 장점들 때문이기도 하고 물론 제가 인공지능을 연구하고 있어서도 그렇지만 저는 중단을 고려하지는 않습니다. 저는 사실 계속 인공지능을 가능하게 하고 싶습니다.

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

여기서 이 문제에 대해 좀 더 살펴볼 필요가 있습니다. 문제가 정확히 무엇일까요? 왜 뛰어난 인공지능은 재앙을 뜻할까요?

So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."

자, 이런 말도 있습니다. "기계에 부여한 그 목적이 우리가 정말 원하는 목적인지를 확실히 해두어야 합니다." 이건 노버트 위너가 1960년에 한 말입니다. 매우 초기의 기계 학습 장치가 체커 두는 법을 배우고 그것을 만든 사람보다 체커를 더 잘 두게 된 것을 보고 얼마 지나지 않아 한 말이죠. 하지만 어떻게 보자면 마이더스 왕이 한 말이라고 해도 될 겁니다. 손대는 모든 것이 금으로 변하길 원했던 왕이죠. 그리고 그가 원했던대로 실제로 그렇게 되었습니다. 그게 바로 그가 기계에 입력한 목표입니다. 비유를 하자면 그렇죠. 그리고 그의 음식, 술, 가족까지 모두 금으로 변했습니다. 그리고 그는 처절함과 배고픔 속에서 죽었습니다. 우리는 이것을 "마이더스 왕의 문제"라고 부릅니다. 이 문제는 그가 말한 목적이 실제로 그가 정말 원했던 것과 일치하지 않는 문제입니다. 현대 용어로는 "가치 정렬 문제"라고 하죠.

Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.

잘못된 목표를 입력하는 것만 문제가 되는 것은 아닙니다. 또 다른 문제가 있습니다. 여러분이 기계에 목표를 입력할 때 "커피를 갖고 와"같이 단순한 것이라고 해도 기계는 이렇게 생각할지 모릅니다. "자, 어떻게 하면 내가 커피를 못가져가게 될까? 누가 전원을 꺼버릴 수도 있잖아. 좋아, 그런 일을 막기 위해 조치를 취해야겠어. 내 '꺼짐' 버튼을 고장 내야겠어. 주어진 임무를 못하게 방해하는 것들로부터 나를 보호하기 위해 뭐든지 할 거야. " 사실, 이렇게 방어적 자세로 오직 목표 달성만을 추구하는 것은 인류의 진실된 목표와 일치하지는 않습니다. 우리가 직면한 문제점이 바로 이것입니다. 이건 사실 이번 강연 에서 무척 고차원적인 부분인데요. 만약 하나만 기억해야 한다면 여러분이 죽으면 커피를 가져다 주지 않을 거라는 것입니다.

(Laughter)

(웃음)

It's very simple. Just remember that. Repeat it to yourself three times a day.

간단하죠. 하루에 세 번씩 외우세요.

(Laughter)

(웃음)

And in fact, this is exactly the plot of "2001: [A Space Odyssey]." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. So what are we going to do?

그리고 사실 이게 바로 "2001:스페이스 오디세이" 영화의 줄거리입니다. 인공지능인 '할(HAL)'은 목표, 즉 임무를 갖고 있습니다. 이건 인류의 목표와 일치하지는 않습니다. 그래서 결국 서로 충돌하죠. 다행히도 HAL의 지능이 아주 뛰어나진 않았습니다. 꽤 똑똑했지만 결국 데이브가 HAL보다 한 수 위였고 HAL의 전원을 끌 수 있게 됩니다. 하지만 우리는 영화처럼 운이 좋지 않을 수 있습니다. 그럼 어떻게 해야 할까요?

I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

저는 인공지능을 다시 정의하려 합니다. 고전적 개념을 깨려고 해요. 기계는 지능적으로 목표를 달성하려 한다는 개념이죠. 여기에는 세 가지 원칙이 있습니다. 첫번째는 이타주의 원칙입니다. 로봇의 유일한 목적은 인간의 목표와 인간의 가치를 최대한 실현하는 것입니다. 여기서 말하는 가치는 닭살돋는 숭고한 가치를 의미하진 않습니다. 제가 말씀드리는 가치는 어떻게 해야 사람들의 삶이 더 나아질지를 의미하는 것입니다. 그리고 사실 이건 아시모프의 원칙 중에서 로봇은 스스로를 보호해야 한다는 원칙과 충돌합니다. 어쨌든 로봇은 스스로를 보호해서 얻는 것이 없죠.

The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

두 번째 원칙은 겸손에 관한 법칙이라고 말할 수 있습니다. 로봇의 안전을 지키기 위한 아주 중요한 원칙이죠. 이 원칙은 로봇들은 인간이 추구하는 가치가 무엇인지는 알 수 없다는 것입니다. 가치를 극대화해야 하지만 그것이 무엇인지는 알지 못한다는 것이죠. 그렇기 때문에 이를 통해서 목표만을 맹목적으로 추구하는 문제를 피할 수 있습니다. 이런 불확실성은 매우 중요한 부분입니다.

Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.

자, 우리에게 도움이 되기 위해서는 우리가 무엇을 원하는지에 대한 개념을 갖고 있어야 합니다. 로봇은 주로 사람들이 선택하는 것을 관찰해서 정보를 얻습니다. 그래서 우리들의 선택 안에는 우리 삶이 어떻게 되기를 바라는지에 관한 정보가 있습니다 이것들이 세 개의 원칙입니다. 자 그러면 이것이 다음 질문에 어떻게 적용되는지 살펴봅시다. 기계의 전원을 끌 수 있을지에 관해 앨런 튜링이 제기했던 문제입니다.
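The idea of learning what a person wants from the choices they make can be sketched as a small Bayesian update. This is only an illustration under assumptions that do not come from the talk: there are two hypothetical candidate preference profiles (`likes_coffee`, `likes_tea`), the person is assumed to choose "noisily rationally" (better options are picked more often, with a made-up sharpness parameter `beta`), and the observed choices are invented for the example.

```python
import math

# Hypothetical candidate preference profiles: utility of each option to the person.
HYPOTHESES = {
    "likes_coffee": {"coffee": 1.0, "tea": 0.0},
    "likes_tea": {"coffee": 0.0, "tea": 1.0},
}


def choice_likelihood(choice, options, utilities, beta=2.0):
    """Probability of picking `choice` from `options` under a softmax choice model.

    beta is an assumed rationality parameter: larger beta means the person
    picks the higher-utility option more reliably.
    """
    weights = {o: math.exp(beta * utilities[o]) for o in options}
    return weights[choice] / sum(weights.values())


def update(prior, choice, options, beta=2.0):
    """One step of Bayes' rule: reweight each hypothesis by how well it explains the choice."""
    unnormalized = {
        h: prior[h] * choice_likelihood(choice, options, HYPOTHESES[h], beta)
        for h in prior
    }
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}


# Start out uncertain, then watch a few (invented) choices.
belief = {"likes_coffee": 0.5, "likes_tea": 0.5}
for observed in ["coffee", "coffee", "tea", "coffee"]:
    belief = update(belief, observed, ["coffee", "tea"])

print(belief)
```

In this sketch the machine never becomes fully certain: one "tea" choice among several "coffee" choices shifts the belief back slightly instead of being ignored, which is the sense in which choices reveal information about preferences.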

So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me."

여기에 PR2 로봇이 있습니다. 저희 연구소에 있는 로봇입니다. 이 로봇 뒤에는 커다랗고 빨간 "꺼짐" 버튼이 있습니다. 자 여기에서 질문입니다. 로봇은 여러분이 전원을 끄도록 할까요? 고전적 방법대로 로봇에게 이런 목표를 부여한다고 치죠. "커피를 갖고 간다. 나는 커피를 가져가야만 한다. 내 전원이 꺼지면 커피를 갖고 갈 수 없다." PR2는 제 명령을 그대로 따르기 위해 이렇게 말하겠죠. "좋아, 내 '꺼짐' 버튼을 망가뜨려야 겠어. 그리고 스타벅스에 있는 모든 사람들을 전기총으로 쏴야겠어. 이 사람들이 장애물이 될 테니까."

(Laughter)

(웃음)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

이렇게 될 수 밖에 없잖아요. 이러한 종류의 실패 모드는 피할 수 없을 겁니다. 확실하고 명확한 목적을 갖고 그대로 따르고 있으니까요.

So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.

그러면 이 기계가 목표를 부정확하게 알고 있으면 어떤 일이 일어날까요? 음, 좀 다른 방식으로 이렇게 생각하게 될 겁니다. "좋아, 인간이 내 전원을 끌 수 있어. 하지만 내가 뭔가 잘못했을 때만 그럴 거야. 그런데 나는 잘못한다는 게 뭔지는 모르지만 잘못하고 싶지는 않아." 자, 여기에 첫 번째와 두 번째 원칙이 있습니다. "그러면 나는 인간이 전원을 끄게 놔둬야겠다." 사실 로봇이 인간으로 하여금 전원을 끌 수 있도록 놔둘 때 어떤 이득이 있을지 예측할 수 있고 이것은 기저에 깔린 목적에 대한 불확실성의 정도와 직접적으로 연결됩니다.
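A toy calculation can make the incentive claim concrete. The sketch below is an illustration under assumptions that are not in the talk: the robot's belief about the value U of its planned action is modeled as a normal distribution with mean `mu` and standard deviation `sigma`, being switched off is worth 0, and the human is assumed to allow the action exactly when U is positive. The function names are hypothetical.

```python
import math


def expected_positive_part(mu, sigma):
    """E[max(U, 0)] for U ~ Normal(mu, sigma^2).

    This is what the robot expects if it defers to a human who lets the
    action go ahead only when it is actually good (U > 0).
    """
    if sigma == 0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return mu * cdf + sigma * pdf


def incentive_to_defer(mu, sigma):
    """Extra expected value from leaving the off switch in human hands."""
    # Best the robot can do on its own: act (worth mu) or switch itself off (worth 0).
    alone = max(mu, 0.0)
    defer = expected_positive_part(mu, sigma)
    return defer - alone


# Same expected value of the action, increasing uncertainty about it.
for sigma in (0.0, 0.5, 1.0, 2.0):
    print(f"sigma={sigma}: incentive={incentive_to_defer(0.5, sigma):.3f}")
```

Under these assumptions the incentive is zero when `sigma` is 0 (a robot that is certain of its objective gains nothing from the off switch) and grows as `sigma` increases, which matches the qualitative claim that the incentive is tied to the degree of uncertainty.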

And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, we can, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.

그리고 로봇의 전원이 꺼지면 세 번째 원칙이 작동하게 됩니다. 로봇은 추구해야 하는 목표에 대해 이해하게 되죠. 목표 수행에서 무엇이 잘못됐는지 알게 되기 때문입니다. 사실 우리가 그리이스 문자를 사용해서 수학을 늘 공부하는 것처럼 우리는 하나의 가설을 증명할 수 있습니다. 어떤 정의냐면, 이런 로봇은 아마도 인간에게 도움이 될 거라는 겁니다. 아마도 이렇게 작동하게 만들어진 기계로 인해 혜택을 보게 될 겁니다. 이런 기계가 없을 때보다 말이죠. 이것은 매우 간단한 예시입니다. 인간과 공존할 수 있는 인공지능으로 무얼 할 수 있는지에 대한 첫걸음이죠.

Now, this third principle, I think is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

자, 이제 세 번째 원칙입니다. 이걸 보고 어쩌면 여러분이 머리를 긁적이실 것 같은데요. 이렇게 생각할 수도 있습니다. "음, 내가 좀 못됐잖아. 나는 내 로봇이 나처럼 행동하는 건 별로야. 나는 한밤중에 몰래 냉장고를 뒤져서 먹을 걸 찾기도 하잖아. 그것 말고도 많지." 로봇이 하지 않았으면 하는 여러 행동들이 있을 겁니다. 그런데, 그런 일은 없을 거예요. 단순히 여러분이 나쁜 행동을 했다고 해서 로봇이 그 행동을 그대로 따라서 하는 건 아닙니다. 로봇은 여러분의 동기를 이해하고 그 나쁜 행동을 하지 않도록 돕습니다. 만약 그게 옳다면 말이죠. 하지만 이건 어려운 문제입니다. 사실 우리가 하려고 하는 건 기계로 하여금 누군가를 대상으로 앞으로 그가 살게 될 삶이 어떠한지를 예측하게 하는 겁니다. 그리고 다른 여러 모두의 사람들의 삶을 말이죠. 어떤 것을 그들이 더 선호할까요? 이를 가능하게 하려면 너무나 많은 난관을 넘어야 합니다. 저는 이 문제들이 단기간에 해결되리라고 보지 않습니다. 사실 그중 가장 큰 난관은 바로 우리 자신입니다.

As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand that your nastiness, for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

앞서 말씀드렸듯이 우리는 나쁜 행동을 합니다. 사실 우리 중에 누군가는 정말 형편 없을지도 모르죠. 말씀드렸듯이 로봇은 그런 행동을 그대로 따라 하지는 않습니다. 로봇은 스스로를 위한 어떤 목적도 갖지 않습니다. 로봇은 순전히 이타적이죠. 그리고 사용자라는 한 사람만의 욕구를 충족하기 위해 만들어진 것도 아닙니다. 사실은 모든 사람들이 원하는 바를 고려해야 하죠. 어느 정도의 형편없는 상황은 감내하게 될 것입니다. 여러분이 아무리 형편없어도 어느 정도는 양해해 주겠죠. 예를 들어 여러분이 여권 발급 공무원인데 뇌물을 받았다고 칩시다. 그런데 뇌물을 받은 이유가 생활비와 아이들 교육을 위한 것이었다면 로봇은 이런 걸 이해하고, 빼앗아가지는 않을 겁니다. 오히려 로봇은 여러분의 자녀가 학교에 갈 수 있도록 도울 거예요.

We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.

우리의 연산능력은 한계가 있습니다. 이세돌은 천재 바둑기사입니다. 하지만 그는 로봇에게 졌죠. 그의 수를 살펴보면, 그는 로봇에게 질 수 밖에 없는 수를 두었습니다. 그렇다고 해서 그가 패배를 원했다는 건 아닙니다. 따라서 그의 수를 이해하려면 우리는 인간의 지각 모델을 아예 뒤집어 봐야 합니다. 인간의 지각 모델은 우리의 계산적 한계를 담고 있어서 매우 복잡합니다. 하지만 연구를 통해서 계속 발전하고 있습니다.

Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.

인공지능 연구자의 입장에서 제가 보기에 가장 어려운 부분은 인간이 너무 많다는 것입니다. 그래서 인공지능 기계는 수많은 사람들의 선호도를 비교하면서 균형을 유지해야 하죠. 거기에는 여러 방법이 있습니다. 그에 관해 경제학자, 사회학자, 윤리학자들이 연구를 수행했고 각 분야가 서로 협력하는 시도를 하고 있습니다.

Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"

자 여러분이 잘못했을 때 어떤 일이 일어날지 살펴보죠. 예를 들어 여러분이 대화를 하고 있습니다. 여러분의 인공지능 비서와 이야기를 하고 있습니다. 이런 대화는 앞으로 몇 년 안에 가능해질 겁니다. 시리를 생각해보세요. 시리가 "사모님이 오늘 저녁 약속을 잊지말라고 전화하셨어요"라고 말합니다. 여러분은 약속을 잊고 있었죠. "뭐라고? 무슨 저녁? 무슨 말 하는거야?"

"Uh, your 20th anniversary at 7pm."

"음, 7시로 예정된 결혼 20주년 축하 저녁식사 입니다."

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"시간이 안되는데.. 사무총장님과 7시 반에 약속이 있단 말이야. 어떻게 이렇게 겹쳤지?"

"Well, I did warn you, but you overrode my recommendation."

"음, 제가 말씀드리긴 했지만 제 이야기는 무시하셨죠."

"Well, what am I going to do? I can't just tell him I'm too busy."

"그럼 어떡하지? 사무총장님한테 바쁘다고 핑계를 댈 수는 없잖아."

"Don't worry. I arranged for his plane to be delayed."

"걱정마세요. 제가 사무총장님이 타신 비행기가 연착되도록 손 썼습니다."

(Laughter)

(웃음)

"Some kind of computer malfunction."

"알 수 없는 컴퓨터 오류가 일어나도록 했죠."

(Laughter)

(웃음)

"Really? You can do that?"

"정말? 그렇게 할 수 있어?"

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

"사무총장님이 정말 미안하다고 하셨습니다. 그리고 대신 내일 점심에 꼭 보자고 하셨습니다."

(Laughter)

(웃음)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

이 상황에서 가치를 둔 것은.. 방향이 조금 잘못되기는 했지만요. 이건 분명 제 아내에게 가치를 둔 결정입니다. "아내가 행복하면, 삶이 행복하다." 라는 말처럼요.

(Laughter)

(웃음)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

다른 상황도 벌어질 수 있습니다. 여러분이 정말 힘든 하루를 보내고 집에 왔습니다. 그리고 컴퓨터가 말합니다. "힘든 하루였죠?"

"Yes, I didn't even have time for lunch."

"맞아, 점심 먹을 시간도 없었어."

"You must be very hungry."

"배가 많이 고프시겠네요."

"Starving, yeah. Could you make some dinner?"

"배고파 죽을 것 같아. 저녁 좀 해줄래?"

"There's something I need to tell you."

"제가 먼저 드릴 말씀이 있습니다."

(Laughter)

(웃음)

"There are humans in South Sudan who are in more urgent need than you."

"남 수단에는 주인님보다 더 심한 굶주림에 시달리는 사람들이 있습니다."

(Laughter)

(웃음)

"So I'm leaving. Make your own dinner."

"저는 이제 떠나겠습니다. 저녁은 알아서 드세요."

(Laughter)

(웃음)

So we have to solve these problems, and I'm looking forward to working on them.

우리는 이런 문제들을 해결해야 합니다. 그리고 저는 이 문제들을 연구해 나가기를 고대하고 있습니다.

There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

제가 낙관하는 이유가 있습니다. 그 중 하나는 정말 방대한 양의 정보가 있다는 거죠. 기계들은 인류가 기록해 둔 모든 자료를 읽어 들일 것입니다. 우리가 기록한 자료 대부분은 누가 무엇을 했고 누가 그 일에 반대했는가에 관한 것들입니다. 이런 방대한 양의 정보에서 배울 점이 있습니다.

There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat.

이것을 제대로 해내야 할 아주 강력한 경제적 동기도 있습니다. 자, 여러분 집에 가정부 로봇이 있다고 상상해보세요. 여러분이 야근으로 귀가가 늦어졌고, 로봇이 아이들의 식사를 챙겨야 합니다. 아이들은 배가 고프고 냉장고에는 먹을 게 없습니다. 그때 로봇은 고양이를 쳐다봅니다.

(Laughter)

(웃음)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

그리고 이 로봇은 인간의 가치 평가 방식을 아직 제대로 배우지 않았습니다. 그래서 고양이의 영양학적 가치보다 인간이 고양이를 아끼는 감성적 가치가 더 중요하다는 것을 이해하지 못합니다.

(Laughter)

(웃음)

So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

어떤 일이 일어날까요? 음, 이런 일이 일어날 겁니다. "미친 로봇이 저녁 메뉴로 고양이를 요리했다" 이 사건 하나로 가정용 로봇 업계는 망해버릴 수 있습니다. 따라서 초지능을 가진 로봇이 출현하기 전에 이걸 바로 잡는 것이 무척 중요합니다.

So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.

자, 요약을 하자면 저는 인공지능에 대한 정의를 바꾸기 위해 노력합니다. 그래야만 정말 도움이 되는 로봇들을 가질 수 있을 겁니다. 여기에는 기본 원칙이 있죠. 로봇은 이타적이어야 하고 우리가 원하는 목적만을 이루려고 해야 합니다. 하지만 그 목적이 정확히 무엇인지 몰라야 합니다. 그리고 우리 인간을 잘 살펴야 하죠. 우리가 진정 무엇을 원하는지 이해하기 위해서입니다. 그리고 그 과정에서 우리는 더 나은 사람이 되는 법을 배우게 될 것입니다. 감사합니다.

(Applause)

(박수)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

CA: 정말 흥미롭네요. 스튜어트. 다음 강연 순서를 준비하는 동안에 잠깐 이야기를 나누겠습니다. 몇 가지 질문이 있는데요.

A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

무지한 상태로 프로그래밍한다는 개념이 직관적으로 무척 강력해 보입니다. 초지능에 도달하게 되면 로봇의 이런 부분을 막을 수 있을까요? 로봇이 문학 서적을 읽고 지식이 무지보다 더 낫다는 개념을 발견하게 되고 그러면서도 계속 스스로 목적을 바꾸고 그걸 다시 프로그래밍하는 걸 막을 수 있나요?

Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

SR: 네. 말씀드렸다시피, 우리는 로봇이 더 많이 배우길 바랍니다. 우리의 목표에 대해서 말이죠 목표가 더 정확해질수록 더 확실해지게 됩니다. 그런 증거들이 있습니다. 그리고 목표를 정확하게 해석할 수 있도록 설계될 겁니다. 예를 들어, 인공지능이 책들을 읽고 그 안에 담긴 내용들이 한쪽으로 편향되어 있다고 이해하게 될 겁니다. 그 책들의 내용은 왕과 왕자, 뛰어난 백인 남성들의 업적들 뿐이죠. 이것은 어려운 문제입니다. 하지만 우리의 목적에 대해 더 많이 알게 될수록 우리에게 더욱 유용하게 될 겁니다.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."

CA: 그런데 그걸 하나의 법칙으로만 말할 수는 없겠죠. 말하자면, 이렇게 입력하는 건가요. "만약 어떤 인간이라도 내 전원을 끄려고 한다면 나는 복종한다. 복종한다"

SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.

SR: 물론 아닙니다. 그건 정말 끔찍한 생각이에요. 무인자동차를 갖고 있다고 생각해보세요. 그리고 다섯 살이 된 아이를 유치원에 데려다 주려고 합니다. 그때 운전중에 다섯 살짜리 아이가 전원을 꺼버리게 두시겠어요? 아마 아닐 겁니다. 그래서 로봇은 그 사람이 얼마나 이성적이고 분별이 있는지 파악해야 합니다. 사람이 이성적일수록 로봇은 전원이 꺼지는 것을 더 기꺼이 받아들입니다. 만약 그 사람이 완전히 제멋대로이거나 심지어 악의적이라면 전원이 꺼지는 것을 덜 받아들이게 되죠.
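The point about rationality can be illustrated with a small numeric sketch. Everything here is an assumption made for illustration, not something stated in the talk: the robot's belief is a made-up two-outcome distribution, being switched off is worth 0, and the human allows the action with a logistic probability controlled by a hypothetical parameter `beta` (0 means the human presses the switch at random, large values mean the human reliably stops only bad actions).

```python
import math


def p_allow(u, beta):
    """Probability that the human lets an action of true value u go ahead."""
    return 1.0 / (1.0 + math.exp(-beta * u))


def value_of_deferring(belief, beta):
    """Expected value if the robot leaves the decision to the human.

    belief is a list of (utility, probability) pairs; a switched-off
    robot contributes 0 value.
    """
    return sum(p * u * p_allow(u, beta) for u, p in belief)


def value_of_acting_alone(belief):
    """Best the robot can do by ignoring the human: act, or switch itself off."""
    expected_u = sum(p * u for u, p in belief)
    return max(expected_u, 0.0)


# Invented belief: 40% chance the action is harmful (-1), 60% chance it is good (+2).
belief = [(-1.0, 0.4), (2.0, 0.6)]

for beta in (0.0, 1.0, 5.0, 50.0):
    defer = value_of_deferring(belief, beta)
    alone = value_of_acting_alone(belief)
    print(f"beta={beta}: defer={defer:.3f}, alone={alone:.3f}")
```

With these invented numbers, deferring to a random human (`beta` of 0) is worth less than acting alone, while deferring to a reliable human is worth more, which is the trade-off described in the answer above.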

CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

CA: 알겠습니다. 스튜어트 씨, 저희 모두를 위해 그 문제를 해결해 주시기를 바랍니다. 좋은 강연 감사합니다. 정말 놀라운 이야기였습니다.

SR: Thank you.

SR: 감사합니다.

(Applause)

(박수)

Related talks

Blaise Agüera y Arcas: How computers are learning to be creative

Sam Harris: Can we build AI without losing control over it?

Zeynep Tufekci: Machine intelligence makes human morals more important

Noriko Arai: Can a robot pass a university entrance exam?

David Lee: Why jobs of the future won't feel like work

Kriti Sharma: How to keep human bias out of AI
