Stuart Russell: 3 principles for creating safer AI

This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment --

(Laughter) a moment where we realize that AI is actually progressing a lot faster than we expected. So humans have lost on the Go board. What about the real world? Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ...

Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that will enable machines, along with the ability to look further ahead than humans can, as we've already seen in Go, if they also have access to more information, they'll be able to make better decisions in the real world than we can. So is that a good thing? Well, I hope so.

Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history. So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking?

Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well. So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea?

So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes. (Laughter)

So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI.

So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?

So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator. But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."

Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face. And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead.

(Laughter) It's very simple. Just remember that. Repeat it to yourself three times a day. (Laughter)

And in fact, this is exactly the plot of "2001: [A Space Odyssey]." HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off. But we might not be so lucky. So what are we going to do?

I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives. There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like. So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.
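To make the third principle concrete, here is a minimal sketch of how a machine might update its belief about what a person prefers by watching their choices. It is an illustration only, not the speaker's actual system: the two hypotheses, the coffee/tea options, the softmax ("noisily rational") choice model, and the names `choice_likelihood` and `update_belief` are all assumptions made for the example.

```python
import math

def choice_likelihood(utilities, chosen, rationality=1.0):
    """Probability that a noisily rational human picks `chosen`.

    Assumption: choices follow a softmax over utilities, so better
    options are picked more often but not always.
    """
    weights = {option: math.exp(rationality * u) for option, u in utilities.items()}
    return weights[chosen] / sum(weights.values())

def update_belief(prior, hypotheses, chosen, rationality=1.0):
    """One step of Bayes' rule over candidate preference hypotheses."""
    unnormalized = {
        name: prior[name] * choice_likelihood(hypotheses[name], chosen, rationality)
        for name in prior
    }
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}

# Two hypothetical candidate value functions the machine is unsure between.
hypotheses = {
    "likes_coffee": {"coffee": 1.0, "tea": 0.0},
    "likes_tea": {"coffee": 0.0, "tea": 1.0},
}

# Start fully uncertain, then watch four choices.
belief = {"likes_coffee": 0.5, "likes_tea": 0.5}
for observed in ["coffee", "coffee", "tea", "coffee"]:
    belief = update_belief(belief, hypotheses, observed)

print(belief)
```

The one "tea" choice does not overturn the belief, because the model allows for people sometimes acting against their own preferences; the machine becomes more confident, not certain.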

So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me." (Laughter)

So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.

So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.
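The calculation mentioned here can be sketched in a simplified form. The model below is an assumption made for illustration, not necessarily the one the speaker has in mind: the robot's belief about the utility U of its proposed action is Gaussian with mean `mu` and standard deviation `sigma`, and the human is taken to be perfectly rational, allowing the action only when U is positive. The function names are hypothetical.

```python
import math

def expected_positive_part(mu, sigma):
    """E[max(U, 0)] for U ~ Normal(mu, sigma^2)."""
    if sigma == 0.0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf

def incentive_to_defer(mu, sigma):
    """Expected gain from letting the human decide.

    Deferring: the human lets the action through only if U > 0,
    so the robot expects E[max(U, 0)].
    Acting alone: the robot either acts (expects mu) or switches
    itself off (gets 0), whichever is better.
    """
    defer = expected_positive_part(mu, sigma)
    best_alone = max(mu, 0.0)
    return defer - best_alone

# The incentive grows with the robot's uncertainty about the objective.
for sigma in [0.0, 0.5, 1.0, 2.0]:
    print(sigma, incentive_to_defer(1.0, sigma))
```

Under these assumptions, a robot that is certain about the objective (`sigma = 0`) gains nothing from leaving the off switch available, while any uncertainty makes deferring strictly worthwhile.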

And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, we can, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.

Now, this third principle, I think is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate. But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand that your nastiness, for example, you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.

Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.

Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids. So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?"

"Uh, your 20th anniversary at 7pm."

"I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?"

"Well, I did warn you, but you overrode my recommendation."

"Well, what am I going to do? I can't just tell him I'm too busy."

"Don't worry. I arranged for his plane to be delayed." (Laughter)

"Some kind of computer malfunction." (Laughter)

"Really? You can do that?"

"He sends his profound apologies and looks forward to meeting you for lunch tomorrow."

(Laughter)

So the values here -- there's a slight mistake going on. This is clearly following my wife's values which is "Happy wife, happy life."

(Laughter)

It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?"

"Yes, I didn't even have time for lunch."

"You must be very hungry."

"Starving, yeah. Could you make some dinner?"

"There's something I need to tell you." (Laughter)

"There are humans in South Sudan who are in more urgent need than you."

(Laughter)

"So I'm leaving. Make your own dinner."

(Laughter)

So we have to solve these problems, and I'm looking forward to working on them.

There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat. (Laughter)

And the robot hasn't quite learned the human value function properly, so it doesn't understand the sentimental value of the cat outweighs the nutritional value of the cat.

(Laughter)

So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much. (Applause)

Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker.

A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

CA: And you couldn't just boil it down to one law, you know, hardwired in: "if any human ever tries to switch me off, I comply. I comply."

SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.
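The dependence on the person's rationality can be sketched numerically. This is a toy model under stated assumptions, not the speaker's formulation: the robot's belief about the utility U of its action is Gaussian (`mu`, `sigma`), and the human makes the right call about switching it off with probability `p_correct` and the wrong call otherwise. All names here are hypothetical.

```python
import math

def expected_positive_part(mu, sigma):
    """E[max(U, 0)] for U ~ Normal(mu, sigma^2)."""
    if sigma == 0.0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf

def value_of_deferring(mu, sigma, p_correct):
    """Expected utility when a fallible human controls the off switch."""
    # Right call: bad actions are blocked, good ones proceed -> max(U, 0).
    right_call = expected_positive_part(mu, sigma)
    # Wrong call: good actions are blocked, bad ones proceed -> min(U, 0).
    wrong_call = mu - right_call
    return p_correct * right_call + (1.0 - p_correct) * wrong_call

def willingness_to_be_switched_off(mu, sigma, p_correct):
    """Gain from deferring versus the robot's best option on its own."""
    return value_of_deferring(mu, sigma, p_correct) - max(mu, 0.0)

# p_correct = 0.5 models a human who presses the switch at random.
for p in [1.0, 0.9, 0.7, 0.5]:
    print(p, willingness_to_be_switched_off(1.0, 1.0, p))
```

In this sketch the gain from deferring is positive for a reliable human and turns negative as the human's decisions approach random, which matches the five-year-old-in-the-car intuition.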

CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

SR: Thank you. (Applause)
