Stuart Russell: 3 principles for creating safer AI

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

Questo è Lee Sedol. Lee Sedol è uno dei più grandi giocatori di Go al mondo, e sta avendo quello che i miei amici a Silicon Valley dicono un momento "accidenti" --

(Laughter)

(Risate)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

un momento in cui ci rendiamo conto che l'IA sta progredendo molto più rapidamente del previsto. Gli umani hanno perso a Go. E nel mondo reale?

Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that will enable machines, along with the ability to look further ahead than humans can, as we've already seen in Go, if they also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.

Il mondo reale è molto più grande, molto più complesso del gioco del Go. È molto meno evidente, ma è comunque un problema di decisione. E se pensiamo ad alcune delle tecnologie che stanno bollendo in pentola... Noriko [Arai] ha detto che le macchine non sono ancora in grado di leggere, o per lo meno di capire ciò che leggono. Ma succederà, e quando succederà, poco dopo le macchine avranno letto tutto ciò che gli umani hanno scritto. E ciò permetterà alle macchine, insieme all'abilità di guardare molto più lontano degli umani, come abbiamo già visto nel Go, se avranno anche accesso a più informazioni, di prendere decisioni migliori nel mondo reale rispetto a noi. È una cosa buona? Speriamo.

Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?

Tutta la nostra civiltà, tutto ciò a cui diamo valore, è basato sulla nostra intelligenza. E se avessimo accesso a molta più intelligenza, allora non c'è davvero un limite a ciò che la razza umana può fare. E credo che ciò possa essere, come alcuni lo hanno descritto, l'evento più grande nella storia umana. Quindi perché le persone dicono cose come "L'IA potrebbe segnare la fine della razza umana"? È una cosa nuova? Sono solo Elon Musk e Bill Gates e Stephen Hawking?

Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?

In realtà no. Questa idea circola da un po'. C'è una citazione: "Anche se potessimo tenere le macchine in una posizione subordinata, ad esempio, staccando la corrente in momenti strategici" -- e ritornerò più tardi su quell'idea di "staccare la corrente" -- "dovremmo, come specie, sentirci fortemente umiliati." Chi l'ha detto? Questo è Alan Turing nel 1951. Alan Turing, come sapete, è il padre dell'informatica e per molti versi, anche il padre dell'IA. Se pensiamo a questo problema, il problema di creare qualcosa di più intelligente della propria specie, potremmo chiamarlo il "problema del gorilla", perché gli antenati dei gorilla lo hanno fatto qualche milione di anni fa, e adesso possiamo chiedere ai gorilla: è stata una buona idea?

So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes.

Qui si stanno incontrando per discutere se è stata una buona idea, e dopo un po', arrivano alla conclusione: no, è stata un'idea terribile. La nostra specie è in difficoltà. In effetti, potete vedere la tristezza esistenziale nei loro occhi.

(Laughter)

(Risate)

So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.

La sensazione nauseante che fare qualcosa di più intelligente della propria specie forse non è una buona idea -- cosa possiamo fare? Proprio nulla, se non smettere di produrre IA, e per tutti i benefici che ho citato, e siccome sono un ricercatore di IA, non lo permetterò. In realtà voglio riuscire a produrre ancora IA.

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

In realtà ci occorre definire un po' di più il problema. Qual è il vero problema? Perché un'IA migliore è potenzialmente una catastrofe?

So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."

Ecco un'altra citazione: "Faremmo meglio a essere ben sicuri che l'obiettivo inserito nella macchina sia l'obiettivo che desideriamo davvero." È stato detto da Norbert Wiener nel 1960, subito dopo che aveva visto uno dei primi sistemi di apprendimento imparare a giocare a dama meglio del proprio creatore. Ma potrebbe anche essere stato detto da Re Mida. Re Mida disse, "Voglio che tutto ciò che tocco diventi oro," e ottenne proprio quello che chiese. Quello era l'obiettivo che aveva inserito nella macchina, per così dire, e poi il suo cibo, le sue bevande e i suoi parenti diventarono oro e morì in miseria e di fame. Lo chiameremo "problema di Re Mida": dichiarare un obiettivo che non è, in realtà, proprio conforme a ciò che vogliamo. In termini moderni, lo chiamiamo "problema di conformità dei valori."

Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.

Dichiarare l'obiettivo sbagliato non è l'unica parte del problema. C'è un'altra parte. Se inserite un obiettivo in una macchina anche qualcosa di semplice come "Porta il caffè," la macchina dice a se stessa, "Be', come posso non riuscire a portare il caffè? Qualcuno potrebbe spegnermi. Ok, devo sapere come evitarlo. Disattiverò il tasto "off". Farò di tutto per difendermi dalle interferenze con questo obiettivo che mi è stato dato." Quindi questa ricerca risoluta in modo molto difensivo di un obiettivo che non è, in realtà, conforme ai veri obiettivi della razza umana -- questo è il problema che affrontiamo. Infatti, è questo il succo di questa conferenza. Se volete ricordare una cosa, è che voi non potrete portare il caffè se siete morti.

(Laughter)

(Risate)

It's very simple. Just remember that. Repeat it to yourself three times a day.

È molto semplice. Ricordate solo questo. Ripetetevelo tre volte al giorno.

(Laughter)

(Risate)

And in fact, this is exactly the plot of "2001: [A Space Odyssey]." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. So what are we going to do?

E in effetti, questa è esattamente la trama di "2001: Odissea nello spazio." HAL ha un obiettivo, una missione, che non è conforme agli obiettivi degli umani, e ciò porta a questo conflitto. Ora, per fortuna, HAL non è super intelligente. È abbastanza astuto ma alla fine Dave lo batte e riesce a spegnerlo. Ma potremmo non essere così fortunati. Quindi cosa faremo?

I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

Sto cercando di ridefinire l'IA per fuggire da questa nozione classica di macchine che perseguono obiettivi in modo intelligente. Ci sono tre principi coinvolti. Il primo è un principio di altruismo, se volete, secondo cui l'unico obiettivo del robot è massimizzare la realizzazione degli obiettivi umani, dei valori umani. E con valori qui non intendo valori sdolcinati, da santarellini. Intendo comunque vogliano gli esseri umani che sia la loro vita. E in realtà ciò viola la legge di Asimov secondo cui il robot deve tutelare la sua esistenza. Non c'è alcun interesse nel preservare la sua esistenza.

The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

La seconda legge è una legge di umiltà, se volete. E si rivela essere davvero importante per rendere sicuri i robot. Dice che il robot non sa quali sono questi valori umani, quindi li deve massimizzare, ma non sa cosa sono. E questo evita questo problema della caccia risoluta di un obiettivo. Questa incertezza si rivela cruciale.

Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.

Per essere utile a noi, deve avere un'idea di quello che vogliamo. Lui ottiene l'informazione in primo luogo dall'osservazione delle scelte umane, quindi le nostre scelte rivelano delle informazioni su ciò che vogliamo che le nostre vite siano. Quindi questi sono i tre principi. Vediamo come si applicano alla seguente domanda: "Riuscite a spegnere la macchina?" come suggeriva Turing.

So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me."

Ecco un robot PR2. È uno che abbiamo in laboratorio, e ha un gran pulsante "off" rosso sul dorso. La domanda è: ti permetterà di spegnerlo? Col metodo classico, gli diamo l'obiettivo, "Porta il caffè, devo portare il caffè, non posso portare il caffè se sono morto," quindi ovviamente il PR2 ha ascoltato il mio discorso, e quindi dice, "Devo disabilitare il pulsante 'off', e forse stordire tutte le altre persone nello Starbucks che possono interferire con me."

(Laughter)

(Risate)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

Sembra inevitabile, giusto? Questa modalità di guasto sembra inevitabile, e deriva dall'avere un obiettivo concreto e definito.

So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.

Quindi cosa succede se la macchina è incerta sull'obiettivo? Ragiona in modo diverso. Dice, "Ok, l'essere umano può spegnermi, ma soltanto se sbaglio qualcosa. Non so bene cos'è sbagliato, ma so che non voglio farlo." Quindi, questi sono il primo e il secondo principio. "Quindi devo lasciare che l'uomo mi spenga." E in effetti potete calcolare lo stimolo che riceve il robot per permettere all'uomo di spegnerlo, ed è direttamente legato al grado di incertezza dell'obiettivo di fondo.

And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, we can, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.

Poi quando la macchina viene spenta, entra in gioco il terzo principio. Impara qualcosa sugli obiettivi che deve perseguire, perché impara che ciò che ha fatto non era corretto. In realtà possiamo, con un uso adeguato di simboli greci, come fanno solitamente i matematici, possiamo davvero dimostrare un teorema che dice che un robot del genere è dimostrabilmente vantaggioso per gli umani. State dimostrabilmente meglio con una macchina progettata in tale modo che senza. È un esempio molto semplice, ma è il primo passo in ciò che proviamo a fare con l'IA compatibile con gli umani.
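The incentive calculation described in this part of the talk can be sketched as a toy model. This is only an illustration under stated assumptions, not the theorem itself: the human is assumed to be perfectly rational (the action is allowed exactly when its true utility is positive), being switched off is assumed to be worth 0, and the robot's belief about the utility is assumed to be Gaussian. The names deference_incentive and sample_belief are hypothetical.

```python
import random
import statistics


def deference_incentive(utility_samples):
    """How much the robot gains by letting the human decide.

    utility_samples: draws from the robot's belief about the true
    utility U of the action it is considering (assumed representation).
    """
    samples = list(utility_samples)
    # Option 1: act without asking. Expected value is E[U].
    act_now = statistics.fmean(samples)
    # Option 2: switch itself off. Assumed to be worth exactly 0.
    switch_self_off = 0.0
    # Option 3: defer. A rational human allows the action only when
    # U > 0, so the robot receives max(U, 0) in every possible world.
    defer = statistics.fmean([max(u, 0.0) for u in samples])
    # The incentive is the gain over the best option without the human.
    return defer - max(act_now, switch_self_off)


def sample_belief(mean, spread, n=10_000, seed=0):
    """Draw n samples from an assumed Gaussian belief about U."""
    rng = random.Random(seed)
    return [rng.gauss(mean, spread) for _ in range(n)]


if __name__ == "__main__":
    # Same expected utility, increasing uncertainty about it.
    for spread in (0.0, 0.5, 2.0):
        gain = deference_incentive(sample_belief(0.2, spread))
        print(f"uncertainty={spread:.1f}  incentive to defer={gain:.3f}")
```

In this sketch the incentive is never negative, it is zero when the robot is certain about the objective, and it grows as the belief gets wider, which matches the claim that the incentive is tied to the degree of uncertainty.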

Now, this third principle, I think is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

Questo terzo principio, credo sia quello che fa grattare la testa. Probabilmente starete pensando, "Mi comporto male. Non voglio che il mio robot si comporti come me. Io sgattaiolo nel cuore della notte e prendo roba dal frigo. Faccio questo e quello." Ci sono un sacco di cose che non volete il robot faccia. Ma in realtà, non funziona sempre così. Solo perché vi comportate male non significa che il robot copierà il vostro comportamento. Capirà le vostre motivazioni e forse potrebbe aiutarvi a resistere, eventualmente. Ma è comunque difficile. Quello che proviamo a fare, in realtà, è permettere alle macchine di prevedere per chiunque e per ogni possibile vita che potrebbe vivere, e le vite di tutti gli altri: quale preferirebbero? E le difficoltà sono molte; non mi aspetto che si risolva velocemente. La vera difficoltà, in realtà, siamo noi.

As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand that your nastiness, for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

Come ho già detto, noi ci comportiamo male. Anzi, alcuni di noi sono molto cattivi. Il robot, come ho detto, non deve copiare il comportamento. Il robot non ha obiettivi propri. È puramente altruista. E non è programmato solo per soddisfare i desideri di una persona, l'utente, ma deve rispettare le preferenze di ognuno. Quindi può avere a che fare con una certa cattiveria, e può anche capire la vostra cattiveria, per esempio, potete farvi corrompere da agente doganale perché dovete sfamare la famiglia e mandare i bambini a scuola. Lui è in grado di capirlo; non significa che andrà a rubare. Anzi, vi aiuterà a mandare i vostri bambini a scuola.

We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.

Noi siamo anche limitati nei calcoli. Lee Sedol è un brillante giocatore di Go, ma ha comunque perso. Osservando le sue mosse, ne ha fatta una che gli ha fatto perdere. Non significa che voleva perdere. Quindi per capire il suo comportamento, dobbiamo invertire con un modello di cognizione umana che include i nostri limiti di calcolo -- un modello molto complicato. Ma possiamo comunque cercare di capirlo.
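One common way to "invert through a model" of a limited decision-maker is to assume noisily rational choice and apply Bayes' rule. The sketch below is a minimal illustration under that assumption, not the model used in the speaker's research: the softmax choice rule, the beta parameter, the two hypotheses, and all the numbers are made up for the example.

```python
import math


def choice_likelihood(values, chosen, beta):
    """Probability that a noisily rational agent picks option `chosen`.

    values: how good each option is under one hypothesis about what
    the agent wants. beta: assumed rationality level (0 means random
    choice, large values mean near-perfect optimization).
    """
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - top)) for v in values]
    return weights[chosen] / sum(weights)


def posterior(hypotheses, prior, chosen, beta):
    """Bayes update over hypotheses after observing one choice."""
    unnormalized = {
        name: prior[name] * choice_likelihood(values, chosen, beta)
        for name, values in hypotheses.items()
    }
    total = sum(unnormalized.values())
    return {name: w / total for name, w in unnormalized.items()}


# Two options: index 0 is the strong move, index 1 is the blunder.
HYPOTHESES = {
    "wants_to_win": [1.0, 0.0],
    "wants_to_lose": [0.0, 1.0],
}
PRIOR = {"wants_to_win": 0.99, "wants_to_lose": 0.01}

if __name__ == "__main__":
    # The player is observed choosing the blunder (index 1).
    for beta in (1.0, 20.0):
        belief = posterior(HYPOTHESES, PRIOR, chosen=1, beta=beta)
        print(f"beta={beta}: P(wants to win)={belief['wants_to_win']:.4f}")
```

With a low beta (a player who makes mistakes) the observer keeps believing the player wanted to win despite the losing move. With a very high beta (a player assumed never to err) the same observation forces the absurd conclusion that the player wanted to lose, which is why the model of human limitations matters.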

Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.

Forse la parte più complicata, dal punto di vista di ricercatore di IA, è il fatto che siamo molti, e quindi la macchina deve in qualche modo alternare, soppesare le preferenze di tante persone diverse, e ci sono diversi modi per farlo. Economisti, sociologi, filosofi morali lo hanno capito, e stiamo attivamente cercando collaborazione.

Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"

Diamo un'occhiata a quel che succede quando commettete uno sbaglio. Potete conversare, per esempio, col vostro assistente personale intelligente che potrebbe essere disponibile tra pochi anni. Pensate a Siri sotto steroidi. Siri dice, "Tua moglie ha chiamato per ricordarti della cena stasera." Ovviamente, l'avevate dimenticato. "Cosa? Quale cena? Di cosa stai parlando?"

"Uh, your 20th anniversary at 7pm."

"Ehm, il tuo 20° anniversario alle 7."

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"Non posso farlo. Mi incontrerò col segretario generale alle 7:30. Come può essere successo?"

"Well, I did warn you, but you overrode my recommendation."

"Io ti ho avvertito, ma tu hai ignorato la mia raccomandazione."

"Well, what am I going to do? I can't just tell him I'm too busy."

"Cosa faccio? Non posso dirgli che sono impegnato."

"Don't worry. I arranged for his plane to be delayed."

"Non preoccuparti. Ho fatto ritardare il suo aereo."

(Laughter)

(Risate)

"Some kind of computer malfunction."

"Una specie di guasto al computer."

(Laughter)

(Risate)

"Really? You can do that?"

"Davvero? Puoi farlo?"

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

"Si scusa tantissimo e non vede l'ora di incontrarti a pranzo domani."

(Laughter)

(Risate)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

Quindi i valori qui -- si è verificato un piccolo errore. Questo segue chiaramente i valori di mia moglie cioè "Felice la moglie, felice la vita."

(Laughter)

(Risate)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

Potrebbe andare diversamente. Potreste tornare a casa dopo una dura giornata di lavoro, e il computer dice, "Giornata lunga?"

"Yes, I didn't even have time for lunch."

"Sì, non ho avuto nemmeno il tempo di pranzare."

"You must be very hungry."

"Devi avere molta fame."

"Starving, yeah. Could you make some dinner?"

"Sto morendo di fame, sì. Puoi prepararmi la cena?"

"There's something I need to tell you."

"Devo dirti una cosa."

(Laughter)

(Risate)

"There are humans in South Sudan who are in more urgent need than you."

"Ci sono persone nel sud del Sudan che hanno un bisogno più urgente del tuo."

(Laughter)

(Risate)

"So I'm leaving. Make your own dinner."

"Quindi me ne vado. Preparati tu la cena."

(Laughter)

(Risate)

So we have to solve these problems, and I'm looking forward to working on them.

Dobbiamo risolvere questi problemi, e non vedo l'ora di lavorarci.

There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

Ci sono motivi per essere ottimisti. Un motivo è, c'è una grande quantità di dati. Perché ricordate -- ho detto che leggeranno tutto ciò che la razza umana ha scritto. Gran parte di ciò che scriviamo è su uomini che fanno cose e altri che se la prendono per questo. Quindi c'è una grande mole di dati da cui imparare.

There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat.

C'è anche un incentivo economico molto forte per farlo bene. Immaginate il vostro robot domestico a casa. Siete ancora in ritardo dal lavoro e il robot deve sfamare i bambini, e i bambini sono affamati e non c'è niente nel frigo. E il robot vede il gatto.

(Laughter)

(Risate)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

E il robot non ha ancora imparato i valori umani in modo corretto, quindi non capisce che il valore sentimentale del gatto supera il suo valore nutrizionale.

(Laughter)

(Risate)

So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

Quindi cosa succede? Be', succede questo: "Robot folle cucina il micio per la cena di famiglia." Quell'unico incidente sarebbe la fine dell'industria del robot domestico. Quindi c'è un enorme incentivo per farlo bene molto prima che arriviamo alle macchine super intelligenti.

So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.

Quindi per riassumere: sto cercando di cambiare la definizione di IA così da avere macchine dimostrabilmente vantaggiose. E i principi sono: macchine che siano altruiste, che vogliano raggiungere solo i nostri obiettivi, ma che siano incerte su quali siano questi obiettivi, e che guarderanno tutti noi per imparare di più su cosa vogliamo veramente. E se tutto va bene, in tutto ciò impareremo ad essere persone migliori. Grazie tante.

(Applause)

(Applausi)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

C. Anderson: Molto interessante, Stuart. Staremo qui un po' perché credo che stiano preparando per il prossimo relatore.

A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

Un paio di domande. L'idea di programmare nell'ignoranza sembra intuitivamente molto potente. Quando giungi alla super intelligenza, cosa fermerà un robot dal leggere letteratura e scoprire l'idea che la conoscenza sia migliore dell'ignoranza, e lo indurrà a spostare le sue finalità e riscrivere la programmazione?

Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

Stuart Russell: Sì, vogliamo che impari di più, come ho detto, sui nostri obiettivi. Diventerà più sicuro solo man mano che diventerà più corretto, quindi le prove ci sono e sarà progettato per interpretarle correttamente. Capirà, per esempio, che i libri sono molto parziali nelle prove che contengono. Parlano solo di re e principi e di élite di maschi bianchi che fanno cose. Quindi è un problema complicato, ma man mano che impara di più sui nostri obiettivi diventerà sempre più utile per noi.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."

CA: E non potresti soltanto limitarti a una legge, sai, programmata così: "se un umano cerca di spegnermi, lo assecondo. Lo assecondo."

SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.

SR: Assolutamente no. Sarebbe un'idea terribile. Immaginate di avere una macchina che si guida da sola e volete mandare vostro figlio di cinque anni all'asilo. Volete che vostro figlio di cinque anni sia capace di spegnere la macchina mentre è in viaggio? Probabilmente no. Quindi deve capire quanto razionale e assennata sia la persona. Più razionale è la persona, più disponibili siete a essere spenti. Se la persona è del tutto casuale o perfino malvagia, allora sarete meno disponibili a essere spenti.
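The dependence on how rational the person is can be illustrated with a small toy calculation. It is a sketch under assumptions that are not in the talk: the person makes the right call with probability p_correct and the exactly wrong call otherwise, being switched off is worth 0, and the robot's belief about the utility U of continuing is a list of equally likely values. The function names and numbers are hypothetical.

```python
def defer_value(utility_samples, p_correct):
    """Expected utility of handing the on/off decision to the person.

    A correct decision yields max(U, 0): continue if U > 0, stop if not.
    A wrong decision yields min(U, 0): stop a good action or allow a
    harmful one.
    """
    n = len(utility_samples)
    right_call = sum(max(u, 0.0) for u in utility_samples) / n
    wrong_call = sum(min(u, 0.0) for u in utility_samples) / n
    return p_correct * right_call + (1.0 - p_correct) * wrong_call


def should_defer(utility_samples, p_correct):
    """Defer only if it is at least as good as the best solo option."""
    expected = sum(utility_samples) / len(utility_samples)
    best_alone = max(expected, 0.0)  # continue regardless, or stop itself
    return defer_value(utility_samples, p_correct) >= best_alone


if __name__ == "__main__":
    # Assumed belief: continuing is probably fine, occasionally harmful.
    belief = [-2.0, 1.0, 1.0, 1.0]
    for p in (1.0, 0.8, 0.5, 0.0):
        print(f"p_correct={p}: defer={should_defer(belief, p)}")
```

In this example the machine accepts being switched off by a reliable adult but not by someone whose choices are no better than a coin flip (p_correct = 0.5, the five-year-old in the car) or by someone who reliably chooses the harmful option.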

CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

CA: Giusto. Stuart, posso dire solo, spero davvero, davvero che tu lo scopra per noi. Grazie per questa conferenza. È stata fantastica.

SR: Thank you.

SR: Grazie.

(Applause)

(Applausi)

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

Questo è Lee Sedol. Lee Sedol è uno dei più grandi giocatori di Go al mondo, e sta avendo quello che i miei amici a Silicon Valley dicono un momento "accidenti" --

(Laughter)

(Risate)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

un momento in cui ci rendiamo conto che l'IA sta progredendo molto più rapidamente del previsto. Gli umani hanno perso a Go. E nel mondo reale?

(Laughter)

(Risate)

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

In realtà ci occorre definire un po' di più il problema. Qual è il vero problema? Perché un'IA migliore è potenzialmente una catastrofe?

(Laughter)

(Risate)

It's very simple. Just remember that. Repeat it to yourself three times a day.

È molto semplice. Ricordate solo questo. Ripetetevelo tre volte al giorno.

(Laughter)

(Risate)

(Laughter)

(Risate)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

Sembra inevitabile, giusto? Questa modalità di guasto sembra inevitabile, e deriva dall'avere un obiettivo concreto e definito.

"Uh, your 20th anniversary at 7pm."

"Ehm, il tuo 20° anniversario, alle 7 di sera."

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"Non posso. Ho un incontro con il segretario generale alle 7:30. Come può essere successo?"

"Well, I did warn you, but you overrode my recommendation."

"Beh, io ti avevo avvertito, ma tu hai ignorato la mia raccomandazione."

"Well, what am I going to do? I can't just tell him I'm too busy."

"E adesso cosa faccio? Non posso semplicemente dirgli che sono troppo impegnato."

"Don't worry. I arranged for his plane to be delayed."

"Non preoccuparti. Ho fatto ritardare il suo aereo."

(Laughter)

(Risate)

"Some kind of computer malfunction."

"Una specie di guasto al computer."

(Laughter)

(Risate)

"Really? You can do that?"

"Davvero? Puoi farlo?"

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

"Si scusa tantissimo e non vede l'ora di incontrarti a pranzo domani."

(Laughter)

(Risate)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

Quindi i valori qui -- si è verificato un piccolo errore. Questo segue chiaramente i valori di mia moglie cioè "Felice la moglie, felice la vita."

(Laughter)

(Risate)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

Potrebbe andare diversamente. Potreste tornare a casa dopo una dura giornata di lavoro, e il computer dice, "Giornata lunga?"

"Yes, I didn't even have time for lunch."

"Sì, non ho avuto nemmeno il tempo di pranzare."

"You must be very hungry."

"Devi avere molta fame."

"Starving, yeah. Could you make some dinner?"

"Sto morendo di fame, sì. Puoi prepararmi la cena?"

"There's something I need to tell you."

"Devo dirti una cosa."

(Laughter)

(Risate)

"There are humans in South Sudan who are in more urgent need than you."

"Ci sono persone in Sud Sudan che hanno un bisogno più urgente del tuo."

(Laughter)

(Risate)

"So I'm leaving. Make your own dinner."

"Quindi me ne vado. Preparati tu la cena."

(Laughter)

(Risate)

So we have to solve these problems, and I'm looking forward to working on them.

Dobbiamo risolvere questi problemi, e non vedo l'ora di lavorarci.

(Laughter)

(Risate)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

E il robot non ha ancora imparato bene la funzione di valore umana, quindi non capisce che il valore sentimentale del gatto supera il suo valore nutrizionale.

(Laughter)

(Risate)

(Applause)

(Applausi)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

Chris Anderson: Molto interessante, Stuart. Resteremo qui ancora un po', perché credo che stiano preparando il palco per il prossimo relatore.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."

CA: E non potresti soltanto limitarti a una legge, sai, programmata così: "se un umano cerca di spegnermi, lo assecondo. Lo assecondo."
