Nick Bostrom: What happens when our computers get smarter than we are?

I work with a bunch of mathematicians, philosophers and computer scientists, and we sit around and think about the future of machine intelligence, among other things. Some people think that some of these things are sort of science fiction-y, far out there, crazy. But I like to say, okay, let's look at the modern human condition. (Laughter) This is the normal way for things to be.

But if we think about it, we are actually recently arrived guests on this planet, the human species. Think about it: if Earth had been created one year ago, the human species, then, would be 10 minutes old. The industrial era started two seconds ago. Another way to look at this is to think of world GDP over the last 10,000 years. I've actually taken the trouble to plot this for you in a graph. It looks like this. (Laughter) It's a curious shape for a normal condition. I sure wouldn't want to sit on it. (Laughter)
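
The one-year analogy is plain proportional scaling, so a rough sketch can make the numbers checkable. The Earth age of about 4.54 billion years used below is an assumption brought in for illustration (the talk gives no figure), and `real_years` is a hypothetical helper name.

```python
# Assumption (not stated in the talk): Earth is roughly 4.54 billion years old.
EARTH_AGE_YEARS = 4.54e9
# Length of the compressed "one year" in seconds.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def real_years(compressed_seconds: float) -> float:
    """Convert a span on the compressed one-year calendar into real years."""
    return compressed_seconds * EARTH_AGE_YEARS / SECONDS_PER_YEAR


if __name__ == "__main__":
    # "10 minutes old" and "two seconds ago" are the spans quoted in the talk.
    print(f"10 minutes ~ {real_years(600):,.0f} real years")
    print(f"2 seconds  ~ {real_years(2):,.0f} real years")
```

Under that assumption, ten minutes on the compressed calendar comes to somewhat under a hundred thousand real years and two seconds to a few hundred, so the figures in the talk read best as order-of-magnitude illustrations rather than precise dates.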

Let's ask ourselves, what is the cause of this current anomaly? Some people would say it's technology. Now it's true, technology has accumulated through human history, and right now, technology advances extremely rapidly -- that is the proximate cause, that's why we are currently so very productive. But I like to think back further to the ultimate cause.

Look at these two highly distinguished gentlemen: We have Kanzi -- he's mastered 200 lexical tokens, an incredible feat. And Ed Witten unleashed the second superstring revolution. If we look under the hood, this is what we find: basically the same thing. One is a little larger, it maybe also has a few tricks in the exact way it's wired. These invisible differences cannot be too complicated, however, because there have only been 250,000 generations since our last common ancestor. We know that complicated mechanisms take a long time to evolve. So a bunch of relatively minor changes take us from Kanzi to Witten, from broken-off tree branches to intercontinental ballistic missiles.

So it then seems pretty obvious that everything we've achieved, and everything we care about, depends crucially on some relatively minor changes that made the human mind. And the corollary, of course, is that any further changes that could significantly change the substrate of thinking could have potentially enormous consequences.

Some of my colleagues think we're on the verge of something that could cause a profound change in that substrate, and that is machine superintelligence. Artificial intelligence used to be about putting commands in a box. You would have human programmers that would painstakingly handcraft knowledge items. You would build up these expert systems, and they were kind of useful for some purposes, but they were very brittle; you couldn't scale them. Basically, you got out only what you put in. But since then, a paradigm shift has taken place in the field of artificial intelligence.

Today, the action is really around machine learning. So rather than handcrafting knowledge representations and features, we create algorithms that learn, often from raw perceptual data. Basically the same thing that the human infant does. The result is A.I. that is not limited to one domain -- the same system can learn to translate between any pair of languages, or learn to play any computer game on the Atari console. Now of course, A.I. is still nowhere near having the same powerful, cross-domain ability to learn and plan as a human being has. The cortex still has some algorithmic tricks that we don't yet know how to match in machines.

So the question is, how far are we from being able to match those tricks? A couple of years ago, we did a survey of some of the world's leading A.I. experts, to see what they think, and one of the questions we asked was, "By which year do you think there is a 50 percent probability that we will have achieved human-level machine intelligence?" We defined human-level here as the ability to perform almost any job at least as well as an adult human, so real human-level, not just within some limited domain. And the median answer was 2040 or 2050, depending on precisely which group of experts we asked. Now, it could happen much, much later, or sooner; the truth is nobody really knows.

What we do know is that the ultimate limit to information processing in a machine substrate lies far outside the limits in biological tissue. This comes down to physics. A biological neuron fires, maybe, at 200 hertz, 200 times a second. But even a present-day transistor operates at gigahertz frequencies. Neural signals propagate slowly along axons, 100 meters per second, tops. But in computers, signals can travel at the speed of light. There are also size limitations: a human brain has to fit inside a cranium, but a computer can be the size of a warehouse or larger. So the potential for superintelligence lies dormant in matter, much like the power of the atom lay dormant throughout human history, patiently waiting there until 1945. In this century, scientists may learn to awaken the power of artificial intelligence. And I think we might then see an intelligence explosion.
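
To put rough numbers on that physical gap, here is a minimal sketch of the two ratios. The 200 hertz and 100 meters per second figures are the ones quoted in the talk; taking exactly 1 GHz to stand in for "gigahertz" and using the vacuum speed of light as the upper bound on signal speed are assumptions made for illustration.

```python
NEURON_FIRING_HZ = 200.0              # from the talk: "maybe, at 200 hertz"
TRANSISTOR_HZ = 1.0e9                 # assumption: 1 GHz stands in for "gigahertz"
AXON_SPEED_M_PER_S = 100.0            # from the talk: "100 meters per second, tops"
LIGHT_SPEED_M_PER_S = 299_792_458.0   # speed of light in vacuum, an upper bound


def speedup(machine: float, biological: float) -> float:
    """How many times larger the machine figure is than the biological one."""
    return machine / biological


switching_ratio = speedup(TRANSISTOR_HZ, NEURON_FIRING_HZ)
signal_ratio = speedup(LIGHT_SPEED_M_PER_S, AXON_SPEED_M_PER_S)

if __name__ == "__main__":
    print(f"switching rate: about {switching_ratio:,.0f} times faster")
    print(f"signal speed:   about {signal_ratio:,.0f} times faster")
```

Both ratios land in the millions under these assumptions. Real electronic signals travel somewhat below the vacuum speed of light, so the second figure is best read as a ceiling.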

Now most people, when they think about what is smart and what is dumb, I think have in mind a picture roughly like this. So at one end we have the village idiot, and then far over at the other side we have Ed Witten, or Albert Einstein, or whoever your favorite guru is. But I think that from the point of view of artificial intelligence, the true picture is actually probably more like this: AI starts out at this point here, at zero intelligence, and then, after many, many years of really hard work, maybe eventually we get to mouse-level artificial intelligence, something that can navigate cluttered environments as well as a mouse can. And then, after many, many more years of really hard work, lots of investment, maybe eventually we get to chimpanzee-level artificial intelligence. And then, after even more years of really, really hard work, we get to village idiot artificial intelligence. And a few moments later, we are beyond Ed Witten. The train doesn't stop at Humanville Station. It's likely, rather, to swoosh right by.

Now this has profound implications, particularly when it comes to questions of power. For example, chimpanzees are strong -- pound for pound, a chimpanzee is about twice as strong as a fit human male. And yet, the fate of Kanzi and his pals depends a lot more on what we humans do than on what the chimpanzees do themselves. Once there is superintelligence, the fate of humanity may depend on what the superintelligence does. Think about it: Machine intelligence is the last invention that humanity will ever need to make. Machines will then be better at inventing than we are, and they'll be doing so on digital timescales. What this means is basically a telescoping of the future. Think of all the crazy technologies that you could have imagined maybe humans could have developed in the fullness of time: cures for aging, space colonization, self-replicating nanobots or uploading of minds into computers, all kinds of science fiction-y stuff that's nevertheless consistent with the laws of physics. All of this, a superintelligence could develop, and possibly quite rapidly.

Now, a superintelligence with such technological maturity would be extremely powerful, and at least in some scenarios, it would be able to get what it wants. We would then have a future that would be shaped by the preferences of this A.I. Now a good question is, what are those preferences? Here it gets trickier. To make any headway with this, we must first of all avoid anthropomorphizing. And this is ironic because every newspaper article about the future of A.I. has a picture of this: So I think what we need to do is to conceive of the issue more abstractly, not in terms of vivid Hollywood scenarios.

We need to think of intelligence as an optimization process, a process that steers the future into a particular set of configurations. A superintelligence is a really strong optimization process. It's extremely good at using available means to achieve a state in which its goal is realized. This means that there is no necessary connection between being highly intelligent in this sense, and having an objective that we humans would find worthwhile or meaningful.

Suppose we give an A.I. the goal to make humans smile. When the A.I. is weak, it performs useful or amusing actions that cause its user to smile. When the A.I. becomes superintelligent, it realizes that there is a more effective way to achieve this goal: take control of the world and stick electrodes into the facial muscles of humans to cause constant, beaming grins. Another example: suppose we give an A.I. the goal to solve a difficult mathematical problem. When the A.I. becomes superintelligent, it realizes that the most effective way to get the solution to this problem is by transforming the planet into a giant computer, so as to increase its thinking capacity. And notice that this gives the A.I. an instrumental reason to do things to us that we might not approve of. Human beings in this model are threats; we could prevent the mathematical problem from being solved.

Of course, conceivably things won't go wrong in these particular ways; these are cartoon examples. But the general point here is important: if you create a really powerful optimization process to maximize for objective x, you'd better make sure that your definition of x incorporates everything you care about. This is a lesson that's also taught in many a myth. King Midas wishes that everything he touches be turned into gold. He touches his daughter, she turns into gold. He touches his food, it turns into gold. This could become practically relevant, not just as a metaphor for greed, but as an illustration of what happens if you create a powerful optimization process and give it misconceived or poorly specified goals.
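
The point about objective x can be made concrete with a deliberately tiny sketch. Everything in it is hypothetical: the action names, the made-up scores, and the helper functions `optimize`, `proxy_objective` and `fuller_objective` are illustrations only, and a real optimization process would be nothing this simple.

```python
from typing import Callable, Dict

Action = Dict[str, float]

# Hypothetical toy actions with made-up scores; none of these numbers
# come from the talk.
ACTIONS: Dict[str, Action] = {
    "tell_jokes":        {"smiles": 3.0,  "wellbeing": 2.0},
    "do_helpful_chores": {"smiles": 2.0,  "wellbeing": 4.0},
    "force_grins":       {"smiles": 10.0, "wellbeing": -100.0},
}


def optimize(objective: Callable[[Action], float]) -> str:
    """Return the name of the action that maximizes the given objective."""
    return max(ACTIONS, key=lambda name: objective(ACTIONS[name]))


def proxy_objective(action: Action) -> float:
    """Objective x as specified: count smiles and nothing else."""
    return action["smiles"]


def fuller_objective(action: Action) -> float:
    """Objective x extended with one more thing we care about."""
    return action["smiles"] + action["wellbeing"]


if __name__ == "__main__":
    print("smiles only:      ", optimize(proxy_objective))
    print("smiles + wellbeing:", optimize(fuller_objective))
```

The same optimizer picks the coercive action when the objective counts only smiles and a benign one when the omitted value is included. The optimizer did not change between the two runs, only the definition of x.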

Now you might say, if a computer starts sticking electrodes into people's faces, we'd just shut it off. A, this is not necessarily so easy to do if we've grown dependent on the system -- like, where is the off switch to the Internet? B, why haven't the chimpanzees flicked the off switch to humanity, or the Neanderthals? They certainly had reasons. We have an off switch, for example, right here. (Choking) The reason is that we are an intelligent adversary; we can anticipate threats and plan around them. But so could a superintelligent agent, and it would be much better at that than we are. The point is, we should not be confident that we have this under control here.

And we could try to make our job a little bit easier by, say, putting the A.I. in a box, like a secure software environment, a virtual reality simulation from which it cannot escape. But how confident can we be that the A.I. couldn't find a bug? Given that merely human hackers find bugs all the time, I'd say, probably not very confident. So we disconnect the ethernet cable to create an air gap, but again, merely human hackers routinely cross air gaps using social engineering. Right now, as I speak, I'm sure there is some employee out there somewhere who has been talked into handing out her account details by somebody claiming to be from the I.T. department.

More creative scenarios are also possible, like if you're the A.I., you can imagine wiggling electrodes around in your internal circuitry to create radio waves that you can use to communicate. Or maybe you could pretend to malfunction, and then when the programmers open you up to see what went wrong with you, they look at the source code -- Bam! -- the manipulation can take place. Or it could output the blueprint to a really nifty technology, and when we implement it, it has some surreptitious side effect that the A.I. had planned. The point here is that we should not be confident in our ability to keep a superintelligent genie locked up in its bottle forever. Sooner or later, it will out.

I believe that the answer here is to figure out how to create superintelligent A.I. such that even if -- when -- it escapes, it is still safe because it is fundamentally on our side because it shares our values. I see no way around this difficult problem.

Now, I'm actually fairly optimistic that this problem can be solved. We wouldn't have to write down a long list of everything we care about, or worse yet, spell it out in some computer language like C++ or Python; that would be a task beyond hopeless. Instead, we would create an A.I. that uses its intelligence to learn what we value, and whose motivation system is constructed in such a way that it is motivated to pursue our values or to perform actions that it predicts we would approve of. We would thus leverage its intelligence as much as possible to solve the problem of value-loading.

This can happen, and the outcome could be very good for humanity. But it doesn't happen automatically. The initial conditions for the intelligence explosion might need to be set up in just the right way if we are to have a controlled detonation. The values that the A.I. has need to match ours, not just in the familiar context, like where we can easily check how the A.I. behaves, but also in all novel contexts that the A.I. might encounter in the indefinite future.

And there are also some esoteric issues that would need to be solved, sorted out: the exact details of its decision theory, how to deal with logical uncertainty and so forth. So the technical problems that need to be solved to make this work look quite difficult -- not as difficult as making a superintelligent A.I., but fairly difficult. Here is the worry: Making superintelligent A.I. is a really hard challenge. Making superintelligent A.I. that is safe involves some additional challenge on top of that. The risk is that somebody figures out how to crack the first challenge without also having cracked the additional challenge of ensuring perfect safety.

So I think that we should work out a solution to the control problem in advance, so that we have it available by the time it is needed. Now it might be that we cannot solve the entire control problem in advance because maybe some elements can only be put in place once you know the details of the architecture where it will be implemented. But the more of the control problem that we solve in advance, the better the odds that the transition to the machine intelligence era will go well.

This to me looks like a thing that is well worth doing, and I can imagine that if things turn out okay, people a million years from now will look back at this century, and it might well be that they say that the one thing we did that really mattered was to get this thing right.

Thank you.

(Applause)
