Stuart Russell: 3 principles for creating safer AI

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

Ő Lee Sedol, a világ egyik legjobb Go-játékosa, aki itt épp arra gondol, amit Szilícium-völgyi barátaim csak úgy hívnak: "azt a leborult szivarvégit!".

(Laughter)

(Nevetés)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

A pillanat, amikor rádöbbenünk, hogy a MI sokkal gyorsabban fejlődik, mint amire számítottunk. A Go táblán az ember alulmaradt. De mi lesz a való életben?

Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that will enable machines, along with the ability to look further ahead than humans can, as we've already seen in Go, if they also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.

A való világ sokkal nagyobb, sokszorta összetettebb, mint a Go. Nem annyira látható, de az is döntési probléma. Ha néhány új technológiára gondolunk, melyek szárnyaikat bontogatják... Noriko [Arai] megemlítette, hogy a gépek még nem olvasnak, legalábbis még nem értik. De ez is el fog jönni, és amikor bekövetkezik, onnan már nem kell sok, hogy elolvassanak mindent, amit az emberi faj valaha leírt. Ez képessé fogja tenni őket, hogy távolabbra lássanak, mint amire az ember képes, ahogy a Go-ban már láttuk. Ha még több információhoz is hozzáférnek, képesek lesznek a való világban is jobb döntéseket hozni nálunk. Vajon ez jó hír? Remélem.

Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?

A teljes civilizációnk, minden, amit számunkra értéket képvisel, az intelligenciánkon alapul. Ha sokkal több intelligenciához férnénk hozzá, határtalan lehetőségek nyílnának meg az emberi faj előtt. Azt hiszem, ez lehet - ahogy néhányan megfogalmazták - az emberi történelem legnagyobb eseménye. Mégis miért hangzanak el akkor olyanok, hogy a MI az emberi faj végét jelentheti? Ez valami új dolog? Csak Elon Musk, Bill Gates és Stephen Hawking gondolja így?

Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?

Valójában nem. Ez már régebb óta kísértő elgondolás. Nézzük ezt az idézetet: "Még ha képesek is lennénk a gépeket alárendelt szolgai szerepben tartani, például úgy, hogy kritikus helyzetekben kikapcsolnánk őket," - később még visszatérek erre a kikapcsolás-ötletre - "mi, mint élőlények, nagyon megalázva éreznénk magunkat." Ki mondta ezt? Alan Turing 1951-ben. Alan Turingról tudjuk, hogy a számítástudomány atyja, és sok tekintetben a MI atyja is. Ezt a problémakört, amikor saját fajunknál intelligensebb teremtményeket hozunk létre, hívhatjuk a "gorilla-problémának", mivel a gorillák ősei épp ezt csinálták néhány millió évvel ezelőtt, így most megkérdezhetjük a gorillákat: Jó ötlet volt?

So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes.

Itt éppen arról értekeznek, hogy jó ötlet volt-e, és kisvártatva arra jutnak, hogy nem, ez szörnyű ötlet volt. Fajunk a kihalás szélén áll. Valójában a lét szomorúsága tükröződik a szemükben.

(Laughter)

(Nevetés)

So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.

A saját fajunknál valami okosabbat létrehozni, talán nem is jó ötlet, ami felkavaró érzés - mégis hogyan kerülhetjük el? Tulajdonképpen sehogy, kivéve, ha leállítjuk a MI kutatást, de a sok előny, amit említettem, és mert én MI-kutató vagyok, nem állítanám le. Igazából folytatni szeretném a MI fejlesztést.

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

Egy kissé pontosítanunk kell a problémát. Mi is igazából a probléma? Miért jelenthet katasztrófát egy jobb MI?

So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."

Vegyünk egy másik idézetet: "Jó lesz nagyon odafigyelnünk rá, hogy a gépbe táplált cél valóban az a cél, amit tényleg akarunk." Ezt Norbert Wiener mondta 1960-ban, közvetlen azután, hogy látott egy nagyon korai tanuló rendszert, amelyik az alkotójánál jobb volt a dámajátékban. De ugyanezt elmondhatta volna Midász király is. Így szólt: "Azt akarom, változzon minden arannyá, amit megérintek", és pontosan azt kapta, amit kért. Ez volt a cél, amit betáplált a gépbe, mondhatjuk így is, így aztán az étele, az itala, és a rokonai is arannyá változtak, ő pedig nyomorúságos éhhalált halt. Szóval ezt hívjuk "Midász király problémának", amikor olyan célt nevezünk meg, ami valójában nem igazodik jól szándékainkhoz. Mai kifejezéssel ezt "érték-illesztési problémának" nevezzük.

Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.

A probléma nem csak a rossz cél betáplálását jelenti. Van egy másik oldala is. Ha egy célt előírunk egy gépnek, akár olyan egyszerűt is, mint: "Hozd ide a kávét", a gép azt mondja magának: "Miképp vallhatok kudarcot a kávé odavitelében? Például valaki közben kikapcsol. Rendben, akkor ezt meg kell akadályoznom. Le fogom tiltani a kikapcsológombomat. Mindent megteszek az akadályok elhárításáért a cél érdekében, amit feladatul kaptam." Szóval ez az együgyű törekvés ilyen önvédelmező módon olyan cél felé viszi, ami igazából nem illeszkedik jól az emberi faj valós céljaihoz - ez a probléma, amivel szembesülünk. Valójában ez a nagyon értékes útravaló üzenete ennek az előadásnak. Ha csak egy dologra szeretnének emlékezni, ez legyen: nem tudod felszolgálni a kávét, ha meghaltál.

(Laughter)

(Nevetés)

It's very simple. Just remember that. Repeat it to yourself three times a day.

Roppant egyszerű. Csak erre emlékezzenek! Ismételjék el naponta háromszor!

(Laughter)

(Nevetés)

And in fact, this is exactly the plot of "2001: [A Space Odyssey]." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. So what are we going to do?

Tulajdonképpen erről szól a "2001: Űrodüsszeia". HAL-nak van egy célja, egy küldetése, mely nincs az emberi igényekhez igazítva, és ez nézeteltérésekhez vezet. De szerencsére HAL nem szuperintelligens. Meglehetősen okos, de végül Dave túljár az eszén, és sikerül kikapcsolnia. De lehet, hogy mi nem leszünk ilyen szerencsések. Akkor mit tegyünk majd?

I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

Megpróbálom újraértelmezni az MI-t, hogy eltávolodjunk ettől a klasszikus szemlélettől, mely szerint az intelligens gépek célokért küzdenek. Három alapelvről van szó. Az első az altruizmus alapelve, ha úgy tetszik: a robot egyetlen célja, hogy végsőkig segítse az emberi értékekhez igazodó emberi célok megvalósulását. Értékek alatt nem mézes-mázas, szuper jó értékekre gondolok. Hanem arra, amilyennek az emberek az életüket szeretnék. Ez valójában megsérti Asimov törvényét, hogy a robotnak meg kell védenie saját magát. Semmiféle önmegóvási célja nincs.

The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

A második törvény, ha úgy tetszik, az alázatosság törvénye. Úgy tűnik, ez rendkívül fontos, hogy a robotokat biztonságossá tegyük. Azt mondja ki, hogy a robot nem ismeri az emberi értékeket, maximalizálni igyekszik, ám mégsem tudja, mik azok. Ez elkerüli az együgyű célratörő magatartást. Ez a bizonytalanság kulcsfontosságú.

Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.

Hogy ennek hasznát vegyük, némi elképzelésének kell lennie, hogy mit is akarunk. Ehhez az információt az emberi döntések megfigyeléséből szerzi, mert a döntéseink árulkodnak arról, hogy milyenné szeretnénk tenni az életünket. Ez a három alapelv. Nézzük, hogyan alkalmazhatóak a következő kérdésre: "Ki tudod kapcsolni a gépet?" - ahogy Turing javasolta.

So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me."

Tehát itt egy PR2-es robot. Ez van nálunk a laborban, és van egy nagy, piros "off" kapcsoló a hátán. A kérdés: engedni fogja, hogy kikapcsoljuk? Ha hagyományosan csináljuk, azt a parancsot adjuk neki, hogy: "Hozd ide a kávét!", "Hoznom kell a kávét, nem tudom hozni, ha halott vagyok" - tehát a PR2 nyilván hallgatta az előadásomat, ezért azt mondja: "Le kell tiltanom az 'off' gombom, és jobb lenne ártalmatlanítani mindenkit a Starbucksnál is, aki utamat állhatja."

(Laughter)

(Nevetés)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

Ez tehát elkerülhetetlennek tűnik, igaz? Ez a zátonyra futás elkerülhetetlennek látszik, és ez egyszerűen a konkrét, meghatározott célból következik.

So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.

Mi történik, ha a gép nem biztos a céljában? Akkor másképp okoskodik. Azt mondja magában: "Talán kikapcsol az ember, de csak akkor, ha valami rosszat teszek. Nem igazán tudom, mi rossz, de azt tudom, hogy olyat nem akarok tenni." Ez volt eddig az első és a második alapelv. "Engednem kell tehát, hogy az ember kikapcsoljon." Tényleg ki lehet számolni a robotnak szükséges ösztönzést, hogy megengedje a kikapcsolását, és ez közvetlenül azzal van összefüggésben, hogy az elérni kívánt cél mennyire bizonytalan.
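The talk does not spell out the calculation behind this incentive, so the following is only a minimal sketch under stated assumptions. It assumes (hypothetically) that the robot's belief about the human's utility U for its proposed action is a normal distribution with mean `mu` and standard deviation `sigma`, that switching off is worth 0, and that the human is perfectly rational: they let the action proceed when U > 0 and switch the robot off otherwise. Under those assumptions, deferring to the human is worth E[max(U, 0)], while ignoring the human is worth max(E[U], 0), and the difference is the incentive to leave the off switch alone. All function names are illustrative.

```python
import math


def expected_positive_part(mu: float, sigma: float) -> float:
    """E[max(U, 0)] for U ~ Normal(mu, sigma^2) (assumed belief model)."""
    if sigma == 0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return mu * cdf + sigma * pdf


def incentive_to_defer(mu: float, sigma: float) -> float:
    """Value of letting a rational human decide, minus the value of the
    best option that bypasses the human (act anyway, or shut down)."""
    bypass_human = max(mu, 0.0)
    defer_to_human = expected_positive_part(mu, sigma)
    return defer_to_human - bypass_human


if __name__ == "__main__":
    # The incentive grows as the robot becomes less certain about U.
    for sigma in (0.0, 0.1, 1.0, 2.0):
        print(sigma, incentive_to_defer(0.5, sigma))
```

In this toy model the incentive is zero when the robot is certain of the objective (`sigma == 0`) and grows with `sigma`, which matches the qualitative claim that the incentive is tied to the degree of uncertainty.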

And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, we can, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.

Itt jön képbe a harmadik alapelv, amikor a gép ki van kapcsolva. Megtanul valami újat a célokról, amikért küzd, mivel rájön, hogy amit tett, nem volt helyes. Ténylegesen képesek vagyunk ezt levezetni egy halom görög betűvel. Ahogy a matematikusoknál ez szokás: be tudjuk bizonyítani a tételt, mely szerint egy ilyen robot garantáltan hasznos az ember számára. Jobban járunk egy olyan géppel, amit így terveztek, mint nélküle. Szóval ez egy egyszerű példa volt, de ez az első lépés abban, amin dolgozunk, hogy ember-kompatibilis MI-t alkossunk.

Now, this third principle, I think is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

Ez a harmadik alapelv... Talán már többen merengenek rajta ezen gondolatmenettel: "Néha én nem viselkedek túl jól, nem akarom, hogy a robot úgy viselkedjen, mint én. Éjjel lesettenkedek és kifosztom a hűtőt. Csinálok még ezt-azt." Sok mindent nem szeretnénk, hogy a robot csináljon. Szerencsére azért ez nem így működik. Csak mert rosszul viselkedünk, a robot nem fog leutánozni. Meg fogja érteni a motivációnk, és talán segít ellenállni a kísértésnek, alkalomadtán. Azért ez még így is nehéz. Azon dolgozunk, hogy a gépek képessé váljanak arra, hogy bárkinek bármilyen életkörülményei is vannak, és egyáltalán mindenki esetén meg tudják jósolni, hogy ő mit szeretne. Ebben azonban nagyon sok a nehézség. Nem számítok rá, hogy ez egyhamar megoldódik. A fő nehézség éppen mi magunk vagyunk.

As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand that your nastiness, for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

Ahogy már említettem, csúnyán viselkedünk, néhányunk egyenesen gazemberként. A robotnak pedig, ahogy mondtam, nem kell másolnia minket. Nincs is önálló célja. Csak tisztán altruista. Nem is úgy van tervezve, hogy csak egy ember kívánságait tartsa szem előtt, hanem mindenki igényeire tekintettel kell lennie. El kell tehát boldoguljon bizonyos mértékű galádsággal, sőt, meg kell értenie, miért viselkedünk renitens módon: például, hogy útlevélkezelőként csúszópénzt fogadunk el, mert el kell látnunk a családunkat és fizetnünk a gyerekeink iskoláztatását. Ezt meg tudja érteni, de ez nem jelenti, hogy ő is lopni fog. Azon lesz, hogy segítsen a gyerekek beiskolázásában.

We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.

Továbbá az agyunk számítási teljesítménye is korlátozott. Lee Sedol briliáns Go játékos, mégis veszített. Ha konkrétan elemezzük a lépéseit, volt egy lépése, ami miatt veszített. Holott nem akart veszíteni. Hogy megértsük a viselkedését, igazából meg kell fordítanunk az emberi értelem modelljét, melyben helyet kap a véges számítási kapacitás - ez egy bonyolult modell. De ez is olyasmi, aminek megértésén van még mit dolgozni.

Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.

Szerintem egy MI-kutatónak valószínűleg az a legnehezebb, hogy sokan vagyunk, és a gépnek valamiképp súlyoznia kell és egyszerre optimalizálni megannyi különböző emberre tekintettel, és erre különböző megoldások vannak. Közgazdászok, szociológusok, filozófusok rájöttek már erre, s mi élénken keressük az együttműködést.

Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"

Nézzük meg, mi történik, ha nem jól alkalmazzuk az eddigieket. Megbeszéljük a dolgokat, például az intelligens személyi asszisztensünkkel, ami akár néhány éven belül elérhető lehet. Olyan, mint egy felturbózott Siri. Siri megszólal: "Hívott a feleséged, hogy figyelmeztessen a mai vacsorára." Mi persze elfeledkeztünk róla: "Mi? Miféle vacsora? Miről beszélsz?"

"Uh, your 20th anniversary at 7pm."

"Hát, a 20-adik évfordulótok, este 7-kor."

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"Ez nem fog menni. 7:30-kor a főtitkárral van találkozóm. Hogyan fordulhatott ez elő?"

"Well, I did warn you, but you overrode my recommendation."

"Nos, én figyelmeztettelek, de te máshogy döntöttél."

"Well, what am I going to do? I can't just tell him I'm too busy."

"Most mitévő legyek? Nem mondhatom neki, hogy sok a dolgom."

"Don't worry. I arranged for his plane to be delayed."

"Aggodalomra semmi ok. Elintéztem, hogy késsen a gépe."

(Laughter)

(Nevetés)

"Some kind of computer malfunction."

"Lesz valami számítástechnikai gubanc."

(Laughter)

(Nevetés)

"Really? You can do that?"

"Komolyan? Te erre is képes vagy?"

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

"Üzent, hogy mélységesen sajnálja, és már nagyon várja a holnapi közös ebédet."

(Laughter)

(Nevetés)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

Itt az értékek tekintetében van egy kis megbicsaklás... Ez a feleségem értékrendjéről szól, ami úgy szól: "Boldog feleség, boldog élet."

(Laughter)

(Nevetés)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

De ez elsülhet ellenkezőleg is. Hazaérünk a munkából egy nehéz nap után, és a számítógép így szól: "Hosszú nap?"

"Yes, I didn't even have time for lunch."

"Igen, még ebédelni sem volt időm."

"You must be very hungry."

"Akkor nagyon éhes lehetsz."

"Starving, yeah. Could you make some dinner?"

"Igen, éhen halok. Készítenél valami vacsorát?"

"There's something I need to tell you."

"Valamit el kell mondanom..."

(Laughter)

(Nevetés)

"There are humans in South Sudan who are in more urgent need than you."

"Dél-Szudánban emberi lényeknek sokkal nagyobb szükségük van az ételre."

(Laughter)

(Nevetés)

"So I'm leaving. Make your own dinner."

"Szóval én távoztam. Csinálj magadnak vacsorát!"

(Laughter)

(Nevetés)

So we have to solve these problems, and I'm looking forward to working on them.

Ezeket a problémákat még meg kell oldanunk, de alig várom, hogy dolgozhassak rajtuk.

There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

Több okunk is van optimizmusra. Az első, hogy rengeteg adat áll rendelkezésre. Mert mindent el fognak olvasni, amit az emberi faj valaha is írt. Az írásaink zöme arról szól, hogy emberek tesznek valamit, és ettől más emberi lények dühösek. Rengeteg ismeretanyag van, amiből tanulni lehet.

There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat.

Továbbá van egy nagyon erős gazdasági nyomás, hogy ezt jól valósítsuk meg. Képzeljük el az otthoni háztartási robotunkat. Megint későn érünk haza, a robotnak kell megetetnie a gyerekeket. A gyerekek éhesek, de üres a hűtőszekrény. Ekkor a robot meglátja a macskát.

(Laughter)

(Nevetés)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

Mivel a robot nem sajátította el elég jól az emberi értékrendet, nem érti, hogy a macska szentimentális értéke túlmutat a tápértékén.

(Laughter)

(Nevetés)

So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

Vajon mi történik? Valami ilyesmi: "A megháborodott robot feltálalta vacsorára a család cicáját." Egyetlen ilyen fiaskó a háztartási robotipar végét jelentené. Szóval nagy nyomás van rajtunk, hogy ügyesen kezeljük ezt, még jóval azelőtt, hogy elérnénk a szuperintelligens gépekig.

So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.

Összefoglalva tehát: megpróbálom megváltoztatni a MI definícióját úgy, hogy bizonyíthatóan hasznunkra levő gépeket jelentsenek. Az alapelvek pedig: a gépek altruisták, azaz csak a mi céljaink elérésével foglalkoznak, miközben bizonytalanok a mi céljainkban, de figyelnek minket, hogy megértsék, mit is akarunk valójában. És remélhetően ebben a folyamatban mi is jobb emberekké válunk. Nagyon köszönöm.

(Applause)

(Taps)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

Chris Anderson: Nagyon érdekes, Stuart. Álljunk arrébb kicsit, mert rendezkednek a következő előadó miatt.

A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

Pár kérdés. Ez a tudatlanság alapú programozás elég hatékony dolognak tűnik. Ha elérjük a szuperintelligenciát, mi akadályozza meg a robotot abban, hogy az irodalmat olvasgatva arra a felismerésre jusson, hogy a tudás jobb, mint a tudatlanság, és ezért átírja a saját programját, hogy legyenek saját céljai?

Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

Stuart Russell: Igen, ahogy mondtam is, azt akarjuk, hogy jobban megismerje a céljainkat. Csak annyival válik magabiztosabbá, amennyivel pontosabbá válik, tehát a bizonyítékok rendelkezésre állnak, és úgy lesz tervezve, hogy ezeket jól értelmezze. Például rá fog jönni, hogy a könyvek nagyon elfogultak a bennük található bizonyítékok tekintetében. Kizárólag királyokról és hercegekről szólnak, csak elit fehér férfiak csinálnak mindent. Szóval komplikált a probléma, de ahogy egyre jobban megismeri a céljainkat, annál inkább lesz hasznunkra.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."

CA: Nem lehetne csak egy szabályra egyszerűsíteni, simán betáplálva, hogy: "Ha egy ember megpróbál kikapcsolni, elfogadom, elfogadom."

SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.

SR: Egyáltalán nem. Ez egy szörnyű elképzelés. Képzeld csak el, hogy van egy önvezető autód, és el akarod vele küldeni az ötéves gyereked az óvodába. Azt akarod, hogy az ötéves képes legyen kikapcsolni az autót útközben? Valószínűleg nem. Tehát szükséges, hogy értse, mennyire racionális és józan az adott ember. Minél racionálisabb az ember, annál inkább hajlandó engedni, hogy kikapcsolják. Ha a személy teljesen zavart, vagy akár ártó szándékú, akkor kevésbé lesz hajlandó rá, hogy kikapcsolják.
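The answer above says the robot's willingness to be switched off should depend on how rational the person is. The talk gives no formula for this, so the sketch below is only an illustration under assumptions: the robot's belief about the utility U of its action is Normal(`mu`, `sigma`^2), switching off is worth 0, and the human makes the correct call with a hypothetical probability `p_correct` and the opposite call otherwise. A wrong call either blocks a good action (worth 0) or allows a bad one (worth U < 0). All names are illustrative.

```python
import math


def expected_positive_part(mu: float, sigma: float) -> float:
    """E[max(U, 0)] for U ~ Normal(mu, sigma^2) (assumed belief model)."""
    if sigma == 0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return mu * cdf + sigma * pdf


def deference_value(mu: float, sigma: float, p_correct: float) -> float:
    """Expected utility of letting a fallible human decide."""
    good_calls = expected_positive_part(mu, sigma)  # E[max(U, 0)]
    bad_calls = mu - good_calls                     # E[min(U, 0)], never positive
    return p_correct * good_calls + (1 - p_correct) * bad_calls


def willingness_to_be_switched_off(mu: float, sigma: float, p_correct: float) -> float:
    """Positive: deferring beats bypassing the human. Negative: it does not."""
    return deference_value(mu, sigma, p_correct) - max(mu, 0.0)


if __name__ == "__main__":
    # A fully rational adult, a mostly reliable one, and a coin-flipping child.
    for p in (1.0, 0.8, 0.5):
        print(p, willingness_to_be_switched_off(0.5, 1.0, p))
```

In this toy model the willingness falls as `p_correct` falls and turns negative for a person who chooses at random, which mirrors the five-year-old in the self-driving car.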

CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

CA: Rendben, Stuart, csak annyit mondok, nagyon-nagyon remélem, hogy megoldod ezt nekünk. Igazán köszönöm az előadást. Lenyűgöző volt.

SR: Thank you.

SR: Köszönöm.

(Applause)

(Taps)
