Stuart Russell: 3 principles for creating safer AI

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

זהו לי סידול. הוא אחד מגדולי שחקני גו בעולם. כאן הוא חווה את הרגע שחבריי מעמק הסיליקון מכנים "זה הזוי!" -

(Laughter)

(צחוק)

a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world?

הרגע בו אנו מבינים שהתפתחותה של הב''מ (בינה מלאכתית) מתקדמת הרבה יותר מהר משציפינו. אז בני האנוש הפסידו במשחק גו. אבל מה עם העולם האמיתי?

Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that will enable machines, along with the ability to look further ahead than humans can, as we've already seen in Go, if they also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.

ובכן, העולם האמיתי הרבה יותר גדול, הרבה יותר מורכב ממשחק גו. זה פחות נגלה לעין, אבל זו עדיין בעיית קבלת החלטות. ואם חושבים על כמה טכנולוגיות שמתממשות כנגד עיניינו... נוריקו [אראי] הזכירה שמכונות עדיין לא יודעות לקרוא, לפחות לקרוא ולהבין. אבל, זה יקרה. וכאשר זה כן יקרה, עד מהרה הן תקראנה את כל מה שהאנושות כתבה אי פעם. זה יקנה למכונות יכולת חדשה, לצד יכולת החיזוי מעבר למה שבני האנוש מסוגלים לחזות, כפי שנוכחנו לדעת במשחק גו, אם תקבלנה גישה ליותר מידע, הן תוכלנה לקבל החלטות טובות יותר מאיתנו בעולם האמיתי. האם זה טוב לנו? ובכן, אני מקווה שכן.

Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?

הציביליזציה שלנו על כל ערכיה, מבוססת על התבונה שלנו. ולו היתה לנו גישה לתבונה רבה יותר, אזי לא יהיה גבול למה שהאנושות תוכל לעשות. ואני סבור כי זה היה יכול להיות, כפי שאנשים מסוימים תיארו זאת, הארוע הגדול בתולדות האנושות. אז מדוע אנשים אומרים דברים כגון, ב''מ עלולה לגרום לסוף האנושות? האם זה חדש לנו? האם אלה רק אלון מאסק, ביל גייטס וסטיבן הוקינג?

Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?

לא. רעיון זה כבר קיים זמן מה. הרי הציטוט: "אפילו אם היינו מסוגלים לשלוט במכונות כבמשרתים בלבד, למשל על ידי כיבוי אספקת חשמל ברגעים קריטיים" -- אחזור לנושא "כיבוי החשמל" בהמשך -- "אנחנו כמין צריכים להרגיש ענווה גדולה." מי אמר זאת? אלן טיורינג ב-1951. כידוע לכם, אלן טיורינג הוא אבי מדע המחשב ובמובנים רבים, גם אבי הב''מ. אם חושבים על הבעייתיות שביצירת משהו שחכם יותר ממינך שלך, ניתן לכנות זאת "בעיית הגורילה", היות ואבות אבותיהן של הגורילות עשו זאת לפני מיליוני שנה, וכיום נוכל לשאול את הגורילות: האם זה היה רעיון טוב?

So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes.

הינה הם עורכים דיון על טיב הדבר ואחרי זמן מה הם קובעים שזה- לא! זה היה רעיון נוראי. מין הגורילות במצוקה קשה. למעשה, אתם יכולים לראות את העצב הקיומי בעיניהן.

(Laughter)

(צחוק)

So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.

לגבי ההרגשה המבחילה הזאת שיצירת משהו שעולה עליך בחוכמתו איננו רעיון טוב -- מה עושים איתה? האמת היא שכלום, חוץ מעצירת פיתוח ב''מ. בגלל כל היתרונות שהזכרתי ובגלל שאני חוקר ב''מ, אני לא מקבל את אפשרות העצירה. אני כן מעוניין להמשיך בפיתוח ב''מ.

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

לכן, אנחנו חייבים להגדיר את הבעיה טוב יותר. מהי מהות הבעיה בדיוק? למה ב''מ טובה יותר עלולה להוות אסון?

So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."

אז הינה לכם עוד ציטוט: "עדיף שנהיה בטוחים שהיעוד שאנחנו מטמיעים במכונה זה אותו היעוד שאנחנו חפצים בו באמת ". זה נאמר על ידי נורברט וויינר ב-1960 זמן קצר אחרי שצפה באחת ממערכות הלמידה המוקדמות לומדת לשחק דמקה טוב מיוצרה. אימרה זו היתה גם יכולה להיאמר על ידי המלך מידאס. המלך מידאס אמר, "אני רוצה שכל דבר שאגע בו יהפוך לזהב". והוא קיבל את מבוקשו בדיוק. זה היעוד אותו הוא הטמיע במכונה, כביכול, וכל המזון והמשקה שלו וכל קרוביו הפכו לזהב והוא מת אומלל ומורעב. נקרא לכך "בעית מלך מידאס". מתן משימה אשר לא עולה בקנה אחד עם מה שאנו רוצים. במושגים מודרניים אנו קוראים לזה "בעיית תאום ערכים".

Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.

מתן מטרה שגויה אינה המרכיב היחיד של הבעיה. יש מרכיב נוסף. אם אתם מטמיעים משימה במכונה אפילו משהו פשוט כמו "תביאי לי קפה", המכונה אומרת לעצמה, "מה עלול למנוע ממני להביא את הקפה? מישהו עלול לנתק אותי מהחשמל. לכן אני חייבת לנקוט בצעדים שימנעו זאת. אנטרל את מתג הכיבוי שלי. אעשה הכל כדי להגן על עצמי כנגד מה שימנע ממני לבצע את המשימה שניתנה לי". לכן, חתירה חד-כיוונית למשימה בתצורה הגנתית המבטיחה את ביצוע המשימה, אשר איננה תואמת משימות אמיתיות של האנושות -- זו הבעיה אתה אנו מתמודדים. זו התובנה בעלת ערך אותה אנו נקח מהרצאה זו. אם תרצו לזכור דבר אחד -- -- לא ניתן להביא את הקפה אם אתה מת.

(Laughter)

(צחוק)

It's very simple. Just remember that. Repeat it to yourself three times a day.

זה פשוט מאוד. רק תזכרו את זה. שננו זאת לעצמכם שלוש פעמים ביום.

(Laughter)

(צחוק)

And in fact, this is exactly the plot of "2001: [A Space Odyssey]." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. So what are we going to do?

זוהי העלילה המדויקת של "2001: אודיסאה בחלל" ל-HAL יש משימה, אשר איננה מתואמת עם משימות של בני האנוש, וזה מוביל לקונפליקט. למרבה המזל HAL איננו בעל תבונת-על. הוא די חכם, אבל לבסוף דייב מצליח להערים עליו ומצליח לכבות אותו. אבל אנחנו עלולים לא להיות ברי מזל באותה מידה. אז מה נעשה?

I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

אני מנסה להגדיר את הב''מ במושגים אחרים כדי להתנתק מן ההעקרון הקלאסי לפיו מכונות שואפות לבצע משימות בצורה תבונתית. ישנם שלושה עקרונות בבסיס העניין. הראשון הוא, אם תרצו, עקרון האלטרויזם -- -- המשימה היחידה של הרובוטים היא מיקסום הגשמתן של המטרות של בני אנוש, של ערכי בני אנוש. באומרי "ערכים" אינני מתכוון כאן לערכים נשגבים של יפי נפש. אני פשוט מתכוון לכל מה שבני אנוש מעדיפים בחייהם. זה למעשה נוגד לחוק אסימוב לפיו רובוט חייב להגן על קיומו. אין לו עניין בשימור קיומו בכלל.

The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

העקרון השני, אם תרצו, הוא עקרון הענווה. מתברר שהוא חשוב מאוד ליצירת רובוטים בטוחים. לפיו רובוט איננו יודע מהם ערכי בני האנוש. הוא חייב לממש אותם על הצד הטוב ביותר, אבל הוא לא יודע מה הם. זה עוקף את בעיית חתירה חד-כיוונית למשימה. אי הוודאות הזאת הופכת לעניין מכריע.

Now, in order to be useful to us, it has to have some idea of what we want. The third principle is that it obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.

בכדי שרובוט יהיה מועיל לנו עליו לדעת מה אנו רוצים, ברמה כלשהי. הוא מקבל מידע זה בעיקר מצפיה בבחירות אנושיות, כך שהבחירות שלנו מלמדות מידע על כיצד אנו מעדיפים שחיינו יהיו. אלה הם שלושת העקרונות. הבה נראה כיצד הם מיושמים בבעיה של: "האם אתה יכול לכבות את המכונה?" כפי שהציע טיורינג.

So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me."

הינה רובוט מדגם PR2 -- -- אחד מאלה שיש לנו במעבדה ויש לו על הגב כפתור כיבוי גדול ואדום. השאלה היא: האם הוא ירשה לך לכבות אותו? אם נלך לפי המודל הקלאסי, כך הוא יבצע את משימת "תביא את הקפה": "אני חייב להביא את הקפה, אני לא יכול להביא את הקפה אם אני מת", והיות ו - PR2 כמובן הקשיב להרצאתי הוא אומר איפוא, "אני חייב לנטרל את מתג הכיבוי שלי ואולי גם לחשמל את כל האנשים בסטארבקס אשר עלולים להפריע לי".

(Laughter)

(צחוק)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

נראה בלתי נמנע, נכון? אופן פעולה זה נראה בלתי נמנע בגלל שהוא נובע ממתן משימה מוגדרת וסופית.

So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.

אבל מה יקרה אם המכונה לא בטוחה לגבי המשימה? במקרה זה היא חושבת בצורה שונה. היא אומרת. "טוב, בן אנוש עלול לכבות אותי, אבל זה רק אם אעשה משהו לא נכון. אינני יודעת מהו לא נכון, אבל אני יודעת שאינני רוצה לעשות אותו." כך, יש לנו פה העקרון הראשון והשני גם יחד. "לכן, אני צריכה לאפשר לבני אנוש לכבות אותי". ניתן להטמיע את שיקול הרווח שיהיה לרובוט כדי שיאפשר לנו לכבות אותו. זה ישירות קשור ברמת אי-הוודאות לגבי מהות המשימה.

And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, we can, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.

וכאשר המכונה כבר מכובה, העקרון השלישי נכנס למשחק. המכונה לומדת משהו אודות המשימות שעליה לבצע, כי היא לומדת שמה שעשתה היה לא נכון. באמצעות שימוש ראוי באותיות יווניות, כפי שנהוג אצל המתמטיקאים, ניתן להוכיח ההנחה שאומרת שרובוט כזה הינו בהחלט מועיל לאנושות. עדיף לכם שהמכונה תהיה כזאת ולא אחרת. זוהי דוגמא מאוד פשוטה, אבל זה הצעד הראשון לקראת מה שאנו מנסים לעשות עם ב''מ מותאמת אנושות.
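The link between uncertainty and the incentive to allow switch-off can be illustrated with a small numerical sketch. This is not the theorem mentioned in the talk, only a toy model under loudly stated assumptions: the robot's belief about how much the human values its proposed action is a list of sampled utilities, being switched off is worth 0, and the human is perfectly rational (allowing the action only when its true value is positive). The names `defer_incentive` and `sample_belief` are hypothetical.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def defer_incentive(utility_samples):
    """Expected gain from letting a rational human decide.

    utility_samples: draws from the robot's belief about how much the
    human values the proposed action (an assumed toy representation).
    """
    act_now = mean(utility_samples)   # act without giving the human a say
    switch_self_off = 0.0             # do nothing at all
    # Assumption: a rational human lets the action proceed only when its
    # true value is positive, and otherwise switches the robot off (value 0).
    defer = mean([max(u, 0.0) for u in utility_samples])
    return defer - max(act_now, switch_self_off)


def sample_belief(center, spread, n=10_000, seed=0):
    """Hypothetical Gaussian belief about the action's value."""
    rng = random.Random(seed)
    return [rng.gauss(center, spread) for _ in range(n)]


if __name__ == "__main__":
    for spread in (0.0, 2.0, 4.0):
        gain = defer_incentive(sample_belief(1.0, spread))
        print(f"spread={spread}: incentive to allow switch-off = {gain:.3f}")
```

In this sketch the incentive is never negative, it is exactly zero when the robot is certain about the objective, and it grows as the belief becomes more spread out, which mirrors the claim that the incentive is tied to the degree of uncertainty.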

Now, this third principle, I think is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

העקרון השלישי הוא זה שלגביו אתם מהרהרים לדעתי. אתם בוודאי חושבים, "אני מתנהג לא כראוי. ואינני רוצה שהרובוט שלי יתנהג כמוני. אני מתגנב באמצע הלילה ולוקח דברים מן המקרר. אני עושה את זה ואת זה." יש כל מיני דברים שלא תרצו שהרובוט שלכם יעשה. אבל, הדברים לא בדיוק עובדים ככה. רק בגלל שאתם מתנהגים לא כראוי הרובוט שלכם לא בהכרח יעתיק את התנהגותכם. הוא יבין את המניעים שלכם ואולי יעזור לכם להתנגד להם, במידה וזה יהיה ראוי. אבל עדיין, העניין מסובך. מה שאנו מנסים לעשות, זה לאפשר למכונות לחזות בעבור כל אדם ועבור כל מסלולי החיים שהוא עשוי לחיות, וכן חייהם של כל השאר: מה יעדיפו? וישנם הרבה מאוד קשיים בדרך לביצוע, אינני מצפה שהעניין ייפתר במהרה. הקשיים האמיתיים הם למעשה אנחנו.

As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand that your nastiness, for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

כפי שכבר ציינתי, אנחנו מתנהגים לא כראוי. למעשה, חלק מאיתנו פשוט נוראים. כפי שאמרתי הרובוט לא חייב להעתיק את התנהגותנו. לרובוט אין כל משימה משלו. הוא אלטרויסט טהור. והוא לא עוצב לשם מימוש רצונות של אדם אחד בלבד, המשתמש, אלא לקחת בחשבון את העדפות של כולם. כך, הוא יכול להתמודד עם רמה מסוימת של רוע, והוא אפילו יכול להבין את ההתנהגות הרעה שלך. למשל, אתה אולי לוקח שוחד בתור פקיד דרכונים, בגלל שאתה צריך להאכיל את משפחתך ולשלוח את ילדיך לביה''ס. הוא יכול להבין זאת. אין זה אומר שהוא בעצמו יגנוב. ההבנה שלו רק תעזור לך לשלוח את הילדים לביה''ס.

We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.

אנחנו גם מוגבלים בתחום החישובים. לי סדול הוא שחקן גו מבריק, אבל הוא עדיין הפסיד. אז אם נתבונן בצעדיו, הוא עשה צעד שהוביל להפסד. זה לא אומר שהוא רצה להפסיד. לכן, על מנת להבין את התנהגותו, עלינו לרדת לפרטי המודל הקוגניטיבי האנושי שכולל את המגבלות החישוביות שלנו - וזה מודל מסובך מאוד. אבל, זה עדיין משהו שאנחנו יכולים לנסות להבין.
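One way to picture "inverting through a model of human cognition" is a small Bayesian sketch. It assumes, purely for illustration, that a player picks moves with probability proportional to exp(beta * value), where a low `beta` stands in for limited computation and a high `beta` for near-perfect play. The goals, move values, prior, and function names below are all hypothetical and are not taken from the talk.

```python
import math


def choice_likelihood(action, values, beta):
    """P(action | goal) under an assumed noisily rational choice model."""
    weights = {a: math.exp(beta * v) for a, v in values.items()}
    return weights[action] / sum(weights.values())


def posterior_over_goals(action, values_by_goal, prior, beta):
    """Bayes' rule over candidate goals after observing one action."""
    unnormalized = {
        goal: prior[goal] * choice_likelihood(action, values_by_goal[goal], beta)
        for goal in prior
    }
    total = sum(unnormalized.values())
    return {goal: p / total for goal, p in unnormalized.items()}


# Hypothetical move values under each candidate goal.
VALUES_BY_GOAL = {
    "wants_to_win": {"strong_move": 1.0, "losing_move": 0.0},
    "wants_to_lose": {"strong_move": 0.0, "losing_move": 1.0},
}
PRIOR = {"wants_to_win": 0.99, "wants_to_lose": 0.01}

if __name__ == "__main__":
    for beta in (1.0, 20.0):
        post = posterior_over_goals("losing_move", VALUES_BY_GOAL, PRIOR, beta)
        print(f"beta={beta}: P(wants to win | losing move) = {post['wants_to_win']:.4f}")
```

With the bounded model (low `beta`), a single losing move barely shifts the belief that the player wanted to win. A model that assumes flawless play (high `beta`) wrongly concludes the player wanted to lose, which is the mistake the talk warns against.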

Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.

אולי החלק הקשה ביותר עבורי כחוקר ב''מ הוא העובדה שאנחנו רבים, ולכן המכונה חייבת איכשהו לנתב ולקחת בחשבון את העדפות של המון אנשים שונים. וישנן דרכים שונות לעשות זאת. כלכלנים, סוציולוגים, פילוסופים חוקרי מוסר הבינו זאת ואנחנו מחפשים את שיתוף הפעולה באופן פעיל.

Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"

בוא נראה מה קורה כאשר לא מבינים את זה נכונה. נגיד שאתם מנהלים שיחה עם העוזר האישי התבוני שלכם, אשר יכול להיות זמין בעוד כמה שנים. תחשבו על סירי על סטרואידים. אז סירי אומרת, "אישתך התקשרה להזכיר על ארוחת הערב". וכמובן אתם שכחתם. "מה? איזה ארוחה? על מה אתה מדבר?"

"Uh, your 20th anniversary at 7pm."

"אה, יום הנישואין ה-20 שלכם ב- 19:00"

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"אני לא יכול. יש לי פגישה עם המנכ''ל ב- 19:30. איך זה היה יכול לקרות?"

"Well, I did warn you, but you overrode my recommendation."

"הזכרתי לך, אבל אתה התעלמת מההמלצה שלי".

"Well, what am I going to do? I can't just tell him I'm too busy."

"טוב, אזה מה אני אעשה? אני לא יכול פשוט לומר לו שאני עסוק."

"Don't worry. I arranged for his plane to be delayed."

"אל תדאג. דאגתי שיהיה לו עיכוב בטיסה".

(Laughter)

(צחוק)

"Some kind of computer malfunction."

"סוג של תקלת מחשב".

(Laughter)

(צחוק)

"Really? You can do that?"

"באמת? אתה יכול לעשות את זה?"

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

"הוא שולח את התנצלותו העמוקה ומצפה לפגוש אותך מחר לארוחת הצהרים".

(Laughter)

(צחוק)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

אז מבחינת הערכים פה -- ישנו שיבוש קל. זה בבירור תואם לערכיה של אישתי - "אישה מאושרת- חיים מאושרים".

(Laughter)

(צחוק)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

זה היה יכול ללכת גם לכיוון אחר. נניח שחזרתם הביתה אחרי יום עבודה קשה והמחשב שואל "יום ארוך?"

"Yes, I didn't even have time for lunch."

"כן, אפילו לא היה לי זמן לאכול צהרים".

"You must be very hungry."

"אתה בטח מאוד רעב".

"Starving, yeah. Could you make some dinner?"

"כן, גווע מרעב. תוכל לבשל ארוחת ערב?"

"There's something I need to tell you."

"יש משהו שאני חייב לספר לך"

(Laughter)

(צחוק)

"There are humans in South Sudan who are in more urgent need than you."

"יש בני אנוש בדרום סודן שנזקקים הרבה יותר ממך."

(Laughter)

(צחוק)

"So I'm leaving. Make your own dinner."

"אז אני עוזב. תכין לך אוכל בעצמך."

(Laughter)

(צחוק)

So we have to solve these problems, and I'm looking forward to working on them.

אז עלינו לפתור את הבעיות הללו ואני מאוד רוצה לעבוד על זה.

There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

יש סיבות לאופטימיות. סיבה ראשונה היא שקיימת כמות אדירה של מידע. זוכרים -- אמרתי שהם יקראו את כל מה שהאנושות כתבה אי פעם? לרוב אנחנו כותבים אודות מעשי אנשים ואודות אנשים אחרים שמתוסכלים מכך. לכן, ישנו נפח ענק של מידע שאפשר ללמוד ממנו.

There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat.

ישנו גם מניע כלכלי חזק מאוד לעשות זאת נכון. דמיינו את הרובוט הביתי שלכם. שוב חזרתם מאוחר מהעבודה והרובוט חייב להאכיל את ילדיכם הילדים רעבים ואין כלום במקרר. ואז הרובוט מבחין בחתול.

(Laughter)

(צחוק)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

הרובוט עדיין לא לגמרי למד כיצד הערכים האנושיים עובדים, לכן איננו מבין שהערך הרגשי של החתול רב על ערכו התזונתי.

(Laughter)

(צחוק)

So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

אז מה קורה אחרי זה? מה שקורה זה ככה: "הרובוט המטורף מבשל את החתול לארוחת ערב משפחתית". ארוע אחד שכזה יהיה סופה של תעשיית רובוטים ביתיים. לכן, ישנה סיבה ממש טובה לעשות הכל נכון. הרבה לפני שנגיע לשלב מכונות עם תבונת-על.

So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.

לסיכום: אני מנסה לשנות את הגדרת ב''מ, כך שנייצר מכונות שתהיינה בהחלט טובות לנו. והרי העקרונות: מכונות שהן אלטרויסטיות, שרוצות לממש את המשימות שלנו בלבד, אבל אינן בטוחות מה הן המשימות הללו ושתתבוננה בכולנו כדי ללמוד יותר על מה שאנחנו רוצים באמת. נקווה שתוך כדי כך נלמד להיות אנשים טובים יותר. תודה רבה.

(Applause)

(מחיאות כפיים)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

כריס אנדרסון (כ.א.): זה כל כך מעניין, סטוארט. בוא רגע נעמוד פה כי אני חושב שהמארגנים מתכוננים לדובר הבא.

A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

מספר שאלות. רעיון הטמעת בורות מרגיש ממש עוצמתי. אבל, כאשר נגיע לשלב תבונת-על מה ימנע מרובוט לעיין בספרות ולגלות את הרעיון שידע עדיף על בורות, לשנות את מטרותיו ולשכתב את התוכנה בהתאם?

Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

ס.ר.: כן. אנחנו רוצים שילמד יותר, כפי שאמרתי, אודות מטרותינו. זה יתבהר יותר רק כשזה יהיה נכון יותר אז העובדות נמצאות בפניו והרובוט יעוצב כך שיוכל לפרש אותן נכון. למשל, הוא יבין שספרים מוטים מאוד בעובדות שהם מכילים. הם מדברים רק על המלכים והנסיכות ואליטות של גברים לבנים שעושים כל מיני דברים. לכן, זוהי בעיה מורכבת. אבל, ככל שהרובוט ילמד את מטרותינו הוא יילך וייעשה מועיל יותר עבורנו.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."

כ.א.: אי אפשר לסכם את זה לכדי חוק אחד, אתה יודע, משהו חצוב בסלע: " אם בן אנוש ינסה אי פעם לכבות אותי, אני מציית. אני מציית."

SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.

ס.ר.: לגמרי לא. זה יהיה רעיון נוראי. דמיינו שיש לכם רכב אוטומטי ואתם רוצים לשלוח את ילדיכם בן החמש לגן ילדים. הייתם רוצים שהילד יוכל לכבות את הרכב תוך כדי הנסיעה? כנראה שלא. יוצא שהמכונית חייבת להחליט עד כמה הנוסע נבון והגיוני. ככל שהנוסע הגיוני יותר, כך המכונית תהיה מוכנה יותר שיכבו אותה. אם הנוסע פזיז וחסר הגיון לחלוטין או אפילו זדוני, אז המכונית פחות תרצה שיכבו אותה.
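The dependence on how rational the person is can be sketched with a toy calculation. The assumptions are illustrative only: with probability `human_reliability` the person decides correctly (allowing the action only when it is actually good), otherwise they press or ignore the switch on a coin flip, and being switched off is worth 0. The function names and numbers are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def defer_value(utility_samples, human_reliability):
    """Expected value of leaving the off-switch decision to the person."""
    # A reliable person allows the action only when its value is positive.
    rational_part = mean([max(u, 0.0) for u in utility_samples])
    # A random person allows it half the time regardless of its value.
    random_part = 0.5 * mean(utility_samples)
    return human_reliability * rational_part + (1.0 - human_reliability) * random_part


def prefers_to_defer(utility_samples, human_reliability):
    """True when deferring beats both acting unilaterally and shutting down."""
    best_alone = max(mean(utility_samples), 0.0)
    return defer_value(utility_samples, human_reliability) > best_alone


# Hypothetical belief of a car that driving on is very likely the right thing.
CAR_BELIEF = [2.0, 2.0, 2.0, -1.0]

if __name__ == "__main__":
    print("sensible adult:", prefers_to_defer(CAR_BELIEF, 1.0))
    print("five-year-old:", prefers_to_defer(CAR_BELIEF, 0.2))
```

Under these assumptions the machine is willing to be switched off by a reliable person but not by an unreliable one, which is the trade-off described in the answer above.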

CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

כ.א.: בסדר גמור. תרשה לי רק לומר שאני ממש מקווה שתפתור את זה עבורנו. תודה רבה על ההרצאה הזאת. זה היה מדהים.

SR: Thank you.

ס.ר.: תודה.

(Applause)

(מחיאות כפיים)
