Blaise Agüera y Arcas: How computers are learning to be creative

حسنا، أقود فريقاً في غوغل يعمل في مجال الذكاء الاصطناعي؛ بعبارة أخرى، النظام الهندسي لصنع الحواسيب والأجهزة القادرة علي القيام ببعض الأمور التي يفعلها الدماغ. وهذا مايجعلنا مهتمين بالدماغ الطبيعي وعلم الأعصاب علي حد سواء، ونهتم بشكل خاص بالأمور التي تقوم بها أدمغتنا والتي ما تزال متفوقة جداُ علي أداء الحواسيب.

So, I lead a team at Google that works on machine intelligence; in other words, the engineering discipline of making computers and devices able to do some of the things that brains do. And this makes us interested in real brains and neuroscience as well, and especially interested in the things that our brains do that are still far superior to the performance of computers.

تاريخياً، كان الإدراك أحد تلك الأشياء، وهي العملية التي من خلالها يمكن للأشياء المحيطة -- كالأصوات والصور -- أن تتحول إلي أفكار في العقل. وهذا أساسي لأدمغتنا، وأيضاً مفيد جداً في الحواسيب. خوارزميات الإدراك الآليه، على سبيل المثال، التي يصنعها فريقنا، هي مايجعل صورك على محرك بحث غوغل للصور قابلة للبحث، بناءً علي محتوياتها. الوجة الآخر للإدراك هو الإبداع: أن تحول مفهوماً ما إلى شئ ملموس يهم العالم. لذك خلال العام الماضي، فإن عملنا في الإدراك الإصطناعي قد اقترن على نحو غير متوقع بالإبداع الآلي والفن الآلي.

Historically, one of those areas has been perception, the process by which things out there in the world -- sounds and images -- can turn into concepts in the mind. This is essential for our own brains, and it's also pretty useful on a computer. The machine perception algorithms, for example, that our team makes, are what enable your pictures on Google Photos to become searchable, based on what's in them. The flip side of perception is creativity: turning a concept into something out there in the world. So over the past year, our work on machine perception has also unexpectedly connected with the world of machine creativity and machine art.

أعتقد أن (مايكل أنجلو) كان يملك بصيرةً نافذة في هذه العلاقة الثنائية بين الإدراك والإبداع. هذه مقولة مشهورة نقلاً عنه: "كل كتلة حجرية تحمل في داخها تمثالاً ومهمة النحات هي أن يكتشفه". لذا أعتقد أن مايرمي اليه (مايكل أنجلو) هو أننا نبدع بمدى استيعابنا، وأن الإدراك ذاته هو عملية تخيل وهو أيضاً أداة الإبداع.

I think Michelangelo had a penetrating insight into this dual relationship between perception and creativity. This is a famous quote of his: "Every block of stone has a statue inside of it, and the job of the sculptor is to discover it." So I think that what Michelangelo was getting at is that we create by perceiving, and that perception itself is an act of imagination and is the stuff of creativity.

العضو الذي يقوم بكل التفكير والإستيعاب والتخيّل، هو بالطبع، الدماغ. و أودّ أن أبدأ بنبذه تاريخية قصيرة عن ما نعرفه عن الدماغ. لأنه، خلافاً لمثلاً، القلب أو الأمعاء. لا يمكنك قول الكثير عن الدماغ بمجرد النظر اليه، على الأقل بالعين المجردة. علماء التشريج الأوائل الذين نظروا في الدماغ منحوا البنى السطحية للدماغ جميع المصطلحات المبهرجة، مثل الحصين، الذي يعني "الجمبري الصغير." وبالطبع هذا النمط من الأمور لا يخبرنا بالكثير عن ما يحدث في الداخل.

The organ that does all the thinking and perceiving and imagining, of course, is the brain. And I'd like to begin with a brief bit of history about what we know about brains. Because unlike, say, the heart or the intestines, you really can't say very much about a brain by just looking at it, at least with the naked eye. The early anatomists who looked at brains gave the superficial structures of this thing all kinds of fanciful names, like hippocampus, meaning "little shrimp." But of course that sort of thing doesn't tell us very much about what's actually going on inside.

أعتقد بحق، أن أول من قام بتكوين نوع من البصيرة عما يحدث داخل الدماغ كان عالم تشريح الأعصاب العظيم (سانتياغو رامون كاخال)، في القرن التاسع عشر، و الذي استخدم المجهر وأصبغة خاصة و التي كان بإمكانها أن تملأ الخلايا المفردة في الدماغ بتباينٍ شديد الوضوح، من أجل البدء بفهم تكوينها الشكلي. و هذه هي أنواع الرسومات التي ابتكرها من الخلايا العصبية في القرن التاسع عشر.

The first person who, I think, really developed some kind of insight into what was going on in the brain was the great Spanish neuroanatomist, Santiago Ramón y Cajal, in the 19th century, who used microscopy and special stains that could selectively fill in or render in very high contrast the individual cells in the brain, in order to start to understand their morphologies. And these are the kinds of drawings that he made of neurons in the 19th century.

هذا من دماغ طائر. ويمكنك رؤية التنوع الرائع لمختلف أنواع الخلايا، حتى النظرية الخلوية نفسها كانت حديثة العهد في تلك المرحلة. وهذه البنى،

This is from a bird brain. And you see this incredible variety of different sorts of cells; even the cellular theory itself was quite new at this point. And these structures,

هذه الخلايا التي لديها هذه التغصنات النهائية، هذه التفرعات التي يمكنها أن تتمدد لمسافات طويلة جداً جداً كان أمراً غير مألوف في تلك الحقبة. بالطبع، إنها أسلاك حافلة بالذكريات،. قد يبدو الأمر واضحاً للبعض في القرن 19؛ ثورة الأسلاك و الكهرباء كانت لا تزال قيد البناء. لكن في العديد من النواحي، كانت هذه الرسومات المجهرية ل(رامون كاخال) كهذه الرسمة، كانت لا تزال بشكل ما متعثرة الخطى.

these cells that have these arborizations, these branches that can go very, very long distances -- this was very novel at the time. They're reminiscent, of course, of wires. That might have been obvious to some people in the 19th century; the revolutions of wiring and electricity were just getting underway. But in many ways, these microanatomical drawings of Ramón y Cajal's, like this one, they're still in some ways unsurpassed.

ولا نزال بعد أكثر من قرن، نحاول إنهاء المهمة التي بدأها (رامون كاخال). هذه بيانات خام من مساعدينا في معهد ماكس بلانك لعلم الأعصاب. وما فعله مساعدونا هو رسم أجزاء صغيرة من نسيج دماغي. حجم العينة الكاملة هنا حوالي ميليميتر مكعب واحد، هنا أريكم جزءاً صغيراً جداً جداً منها. طول الخط الموجود إلى اليسار حوالي مايكرون واحد. البنى التي ترونها هي الجسيمات الكوندرية (ميتوكوندريا) و هي بحجم الباكتريا. و هذه شرائح متلاحقة خلال كتلة النسيج هذه المتناهية في الصغر. و على سبيل المقارنة فقط، فإن قطر خصلة شعر عادية حوالي 100 مايكرون. و بهذا فإننا ننظر إلى شئ أصغر بكثيرٍ جداً من خصلة شعر واحدة.

We're still, more than a century later, trying to finish the job that Ramón y Cajal started. These are raw data from our collaborators at the Max Planck Institute of Neuroscience. And what our collaborators have done is to image little pieces of brain tissue. The entire sample here is about one cubic millimeter in size, and I'm showing you a very, very small piece of it here. That bar on the left is about one micron. The structures you see are mitochondria that are the size of bacteria. And these are consecutive slices through this very, very tiny block of tissue. Just for comparison's sake, the diameter of an average strand of hair is about 100 microns. So we're looking at something much, much smaller than a single strand of hair.

و من خلال هذه الأنواع من الشرائح المأخوذة بمجهر إلكتروني تسلسلي، يمكن للمرء أن يبدأ العمل على إعادة بناء نموذج ثلاثي الأبعاد لخلية عصبية تبدو كهذه. إذاً هذه أنماط مشابهة نوعاً ما لتلك التي لدى (رامون كاخال). و قد ظهرت بضع خلايا عصبية فقط، و إلا ما كنا لنستطيع أن نرى أي شيء هنا. كانت لتبدو شديدة الإزدحام، غنية بالتركيب، و الوصلات التي تربط الخلايا العصبية ببعضها البعض.

And from these kinds of serial electron microscopy slices, one can start to make reconstructions in 3D of neurons that look like these. So these are sort of in the same style as Ramón y Cajal. Only a few neurons lit up, because otherwise we wouldn't be able to see anything here. It would be so crowded, so full of structure, of wiring all connecting one neuron to another.

بهذا كان (رامون كاخال) سابقاً لعصره بعض الشيء، و متقدماً في فهمه للدماغ تابع ببطئ خلال العقود اللاحقة. و لكننا علمنا أن الخلايا العصبية استخدمت الكهرباء، و بحلول الحرب العالمية الثانية، تطورت تقنياتنا على نحو كافٍ للبدء بإجراء تجارب كهربائية فعلية على خلايا عصبية حية من أجل زيادة فهمنا لكيفية عملها. و في ذات الفترة تماماً بدأ اختراع الحواسيب. و قد اعتمدت الفكرة إلى حد كبير على نمذجة الدماغ -- "الآلة الذكية،" كما أطلق عليها (ألان تورينغ)، أحد آباء علم الحاسوب.

So Ramón y Cajal was a little bit ahead of his time, and progress on understanding the brain proceeded slowly over the next few decades. But we knew that neurons used electricity, and by World War II, our technology was advanced enough to start doing real electrical experiments on live neurons to better understand how they worked. This was the very same time when computers were being invented, very much based on the idea of modeling the brain -- of "intelligent machinery," as Alan Turing called it, one of the fathers of computer science.

اتطلع (وارن ماكولوكش) و(والتر بيتس) على رسومات (رامون كاخال) لمنطقة القشرة البصرية، التي أعرضها هنا. هذه هي القشرة التي تقوم بمعالجة الصور القادمة من العين. و بالنسبة لهما، بدا هذا كمخطط دارة كهربائية. و بهذا ثمة الكثير من التفاصيل في مخطط الدارة لكل من (ماكولوتش) و(بيتس) ليست صحيحة تماماً.

Warren McCulloch and Walter Pitts looked at Ramón y Cajal's drawing of visual cortex, which I'm showing here. This is the cortex that processes imagery that comes from the eye. And for them, this looked like a circuit diagram. So there are a lot of details in McCulloch and Pitts's circuit diagram that are not quite right.

لكن هذه الفكرة الأساسية أن القشرة الدماغية البصرية تعمل كسلسلة من العناصر الحاسوبية التي تمرر المعلومات من عنصر إلى التالي بتسلسل، هي فكرة صحيحة أساساً.

But this basic idea that visual cortex works like a series of computational elements that pass information one to the next in a cascade, is essentially correct.

دعونا نتكلم لبرهة عن النموذج المطلوب من أجل معالجة المعلومات البصرية. المهمة الأساسية للإدراك هي أخذ صورة كهذه والقول، "ذلك طائر،" و هو أمر في غاية السهولة بالنسبة لنا باستخدام أدمغتنا. و لكن ما عليكم أن تفهموه هو أنه بالنسبة لحاسوب، فإن هذا الأمر كان من المحال تحقيقه قبل بضعة سنوات قليلة. لم يكن نموذج الحاسوب التقليدي واحداً يمكن من خلاله تحقيق هذه المهمة بسهولة.

Let's talk for a moment about what a model for processing visual information would need to do. The basic task of perception is to take an image like this one and say, "That's a bird," which is a very simple thing for us to do with our brains. But you should all understand that for a computer, this was pretty much impossible just a few years ago. The classical computing paradigm is not one in which this task is easy to do.

إذاً إن ما يحدث بين نقاط البيكسل، بين صورة الطائر، و كلمة "طائر،" أساساً هو مجموعة من العصبونات المرتبطة ببعضها ضمن شبكة عصبونية، كما أعرضها هنا. يمكن لهذه الشبكة العصبونية أن تكون حيوية، ضمن القشرة الدماغية البصرية، أو، حالياً، نبدأ العمل على إمكانية نمذجة شبكات عصبونية مماثلة في الحاسوب. و سوف أريكم حقيقة كيف تبدو تلك الشبكات.

So what's going on between the pixels, between the image of the bird and the word "bird," is essentially a set of neurons connected to each other in a neural network, as I'm diagramming here. This neural network could be biological, inside our visual cortices, or, nowadays, we start to have the capability to model such neural networks on the computer. And I'll show you what that actually looks like.

إذاً يمكنك تصور نقاط البيكسل كطبقة أولى من العصبونات، و هذا، في الواقع، كيفية عملها في العين -- تلك هي العصبونات في الشبكية. و تلك العصبونات تلقم الإشارة داخل طبقة بعد طبقة أخرى، بعد طبقة أخرى من العصبونات، جميعها مرتبطة بواسطة مشابك ذات أوزان مختلفة. إن سلوك هذه الشبكة تتميز عن طريق قوة جميع تلك المشابك. و هذه تميز السمات الحسابية لهذه الشبكة. وبنهاية المطاف، يغدو لديك عصبون أو مجموعة صغيرة من العصبونات التي تضيء، كلمة، "طائر."

So the pixels you can think about as a first layer of neurons, and that's, in fact, how it works in the eye -- that's the neurons in the retina. And those feed forward into one layer after another layer, after another layer of neurons, all connected by synapses of different weights. The behavior of this network is characterized by the strengths of all of those synapses. Those characterize the computational properties of this network. And at the end of the day, you have a neuron or a small group of neurons that light up, saying, "bird."
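As an illustration of the layered picture just described, here is a minimal sketch in Python. It is not the network shown in the talk: the layer sizes, the weight values, and the tanh squashing function are all assumptions chosen to keep the example tiny. It only shows pixels feeding forward through one layer after another, with every connection carrying its own weight.

```python
import math

def layer(inputs, weights):
    """One layer: each output neuron takes a weighted sum of all inputs,
    then squashes it with tanh (an assumed nonlinearity)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def feed_forward(pixels, layers):
    """Pass the signal through one layer after another."""
    signal = pixels
    for weights in layers:
        signal = layer(signal, weights)
    return signal

# Toy sizes (assumed): 4 "pixels" -> 3 hidden neurons -> 1 output neuron.
pixels = [0.0, 0.5, 1.0, 0.25]
layers = [
    [[0.2, -0.1, 0.4, 0.0],
     [-0.3, 0.8, 0.1, 0.5],
     [0.6, 0.0, -0.2, 0.1]],
    [[1.0, -1.0, 0.5]],
]
output = feed_forward(pixels, layers)
print(output)  # a single number; a strongly lit output neuron would mean "bird"
```

Changing any weight changes the output, which is the sense in which the strengths of the synapses characterize what the network computes.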

سأقوم الآن بتمثيل تلك الأمور الثلاث -- المدخلات بيكسلات والمشابك في الشبكة العصبية، و طائر، المخرجات -- تحدد بمتغيرات ثلاث: x وw وy ربما ثمة مليون أو نحو ذلك من المتغير x مليون بيكسل في تلك الصورة. ثمة مليونات أو ترليونات من المتغير w، التي تمثل وزن جميع هذه المشابك في الشبكة العصبية. و يوجد كم ضئيل من المتغير y، من المخرجات التي تمتلكها الشبكة. "طائر" هي كلمة من أربعة أحرف، صحيح؟ إذاً دعونا نتظاهر بأن هذه مجرد معادلة بسيطة، x" x" w = y. أضع التكرار ضمن إشارتي اقتباس مخيفتين لأن ما يحدث بالفعل هناك، بالطبع، عبارة عن سلاسل معقدة من العمليات الحسابية الرياضية.

Now I'm going to represent those three things -- the input pixels and the synapses in the neural network, and bird, the output -- by three variables: x, w and y. There are maybe a million or so x's -- a million pixels in that image. There are billions or trillions of w's, which represent the weights of all these synapses in the neural network. And there's a very small number of y's, of outputs that that network has. "Bird" is only four letters, right? So let's pretend that this is just a simple formula, x "x" w = y. I'm putting the times in scare quotes because what's really going on there, of course, is a very complicated series of mathematical operations.

تلك معادلة واحدة. يوجد ثلاثة متغيرات. و جميعنا نعلم أنه عندما يكون لدينا معادلة واحدة، يمكنكم حل أحد المتغيرات بمعرفة المتغيرين الآخرين. لذلك فإن مشكلة الاستدلال، هي أن نعلم أن صورة الطائر هي طائر، هي هذه: هي أن المتغير y مجهول، والمتغيرين w و x معلومان. أنتم تعرفون الشبكات العصبية، تعرفون البيكسلات. كما ترون، تلك هي مشكلة بسيطة نسبياً. تضاعفون مرتين بثلاث وتنتهون. سأريكم شبكة عصبية اصطناعية قمنا ببنائها مؤخراً، بنفس الأسلوب تماماً.

That's one equation. There are three variables. And we all know that if you have one equation, you can solve one variable by knowing the other two things. So the problem of inference, that is, figuring out that the picture of a bird is a bird, is this one: it's where y is the unknown and w and x are known. You know the neural network, you know the pixels. As you can see, that's actually a relatively straightforward problem. You multiply two times three and you're done. I'll show you an artificial neural network that we've built recently, doing exactly that.

تعمل بالزمن الحقيقي على الهاتف المحمول، و هذا، بالطبع، أمر رائع بحد ذاته، الهاتف النقال يستطيع القيام بمليارات بل ترليونات العمليات بالثانية. ما تنظر اليه هو هاتف تنظر لصورة تلو الأخرى لطائر، وفي الحقيقية، لا تقول فقط "نعم، إنه طائر" بل تميّز نوع الطائر بشبكة من هذا النوع. اذاً في تلك الصورة، ال x وw معروفان، بينما y غير معروفة. أنا أموه حول الجزء الصعب كما ترون اذاً كيف من الممكن أن نميز w، الدماغ الذي يستطيع القيام بشيء كهذا؟ كيف يمكننا تعلم هذا النموذج؟

This is running in real time on a mobile phone, and that's, of course, amazing in its own right, that mobile phones can do so many billions and trillions of operations per second. What you're looking at is a phone looking at one after another picture of a bird, and actually not only saying, "Yes, it's a bird," but identifying the species of bird with a network of this sort. So in that picture, the x and the w are known, and the y is the unknown. I'm glossing over the very difficult part, of course, which is how on earth do we figure out the w, the brain that can do such a thing? How would we ever learn such a model?

إذاً عملية التعليم هذه، لحل w اذا ما كنا نقوم بهذا بمساعدة المعادلة البسيطة والتي نعامل فيها هذه الأحرف كأرقام، نستطيع فهم ذلك تماماً 6 = 2 x w حسناً نقسم على اثنين وانتهينا. المشكلة ستكون بهذه العملية. اذاً، القسمة -- استخدمنا القسمة ، لأنها عكس الضرب، لكن كما قلت، الضرب هو كذبة صغيرة هنا. هذه العملية معقدة للغاية، وهي عملية غير خطية وليس لديها معكوس. اذاً ،علينا إيجاد طريقة لحل هذه المعادلة بدون عملية قسمة. والطريقة للقيام بذلك غير واضحة نوعاً ما. لنقل، دعونا نقوم بلعبة جبرية ما ولننقل الرقم ستة الى الجانب اليميني من المعادلة. الأن، مازلنا نستخدم عملية الضرب. وذلك الصفر -- لنفكر به كأنه خطأ ما. بعبارة أخرى، إذا حلينا المعادلة ل w بالطريقة الصحيحة، اذاً سيكون الخطأ صفراً. واذا لم نحلها حلاً صحيحاً، سيكون الخطأ أكبر من الصفر.

So this process of learning, of solving for w, if we were doing this with the simple equation in which we think about these as numbers, we know exactly how to do that: 6 = 2 x w, well, we divide by two and we're done. The problem is with this operator. So, division -- we've used division because it's the inverse to multiplication, but as I've just said, the multiplication is a bit of a lie here. This is a very, very complicated, very non-linear operation; it has no inverse. So we have to figure out a way to solve the equation without a division operator. And the way to do that is fairly straightforward. You just say, let's play a little algebra trick, and move the six over to the right-hand side of the equation. Now, we're still using multiplication. And that zero -- let's think about it as an error. In other words, if we've solved for w the right way, then the error will be zero. And if we haven't gotten it quite right, the error will be greater than zero.

الأن سوف نخمن حتى يكون الخطأ أصغر، وهذا هو الشيء الذي تبرع فيه أجهزة الحاسوب جداً. اذاً، لناخذ تخميناً أولياً: مائا لو w=0 ؟ إذاً، الخطأ سيكون 6. ماذا لو W=1 ؟ إذاً الخطأ 4 . ومن ثم يستطيع الحاسوب أن يلعب (ماركو بولو)، حتى يقوم بإنقاص الخطأ الى الصفر. وبينما يفعل ذلك، فهو يحصل على قيمة تقريبة متعاقبة ل w. بالعادة ،لايصل الى هذه القيمة بسرعة لكن بعد الكثير من الخطوات، نصل لحوالي w = 2.999 وهي قيمة تقريبية كافية. وهذه هي العملية التعليمية.

So now we can just take guesses to minimize the error, and that's the sort of thing computers are very good at. So you've taken an initial guess: what if w = 0? Well, then the error is 6. What if w = 1? The error is 4. And then the computer can sort of play Marco Polo, and drive down the error close to zero. As it does that, it's getting successive approximations to w. Typically, it never quite gets there, but after about a dozen steps, we're up to w = 2.999, which is close enough. And this is the learning process.
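The guessing game for 6 = 2 x w can be sketched in a few lines of Python. This is one possible way to play it, not necessarily the procedure behind the slide: the update rule is plain gradient descent on the squared error, and the step size of 0.05 and the count of 12 steps are assumptions picked for illustration.

```python
def error(w, x=2.0, y=6.0):
    """How far x * w is from y; zero means w solves the equation."""
    return abs(x * w - y)

def solve_for_w(x=2.0, y=6.0, w=0.0, step=0.05, steps=12):
    """Repeatedly nudge w in the direction that shrinks the squared error."""
    for _ in range(steps):
        gradient = 2.0 * (x * w - y) * x  # derivative of (x*w - y)**2 w.r.t. w
        w -= step * gradient
    return w

print(error(0.0), error(1.0))  # 6.0 4.0, the two guesses in the talk
w = solve_for_w()
print(w, error(w))  # w ends up just under 3, with a small leftover error
```

No division is used anywhere; the loop only ever evaluates the error and moves w a little, so the same idea carries over to operations that have no inverse.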

اذاً تذكر ما كان يجري هنا كنا نقوم باخذ الكثير من قيم x و y المعلومة ونقوم بحل w خلال عمليات متعاقبة. تماماً هي نفس الطريقة التي نتعلم بها. تستصحب أذهاننا الكثير من صور فترة الطفولة ويُقال لنا "هذا طائر، ليس هذا طائراً." ومع مرور الوقت، ومع التكرار، نحل الw نقوم بالحل عن طريق تلك الوصلات العصبية.

So remember that what's been going on here is that we've been taking a lot of known x's and known y's and solving for the w in the middle through an iterative process. It's exactly the same way that we do our own learning. We have many, many images as babies and we get told, "This is a bird; this is not a bird." And over time, through iteration, we solve for w, we solve for those neural connections.
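With many known x's and y's, the same nudge can be applied one example at a time. The sketch below assumes a toy setting in which every example obeys y = 3 x, so there is a single weight to recover; the example values, step size, and number of passes are invented for illustration.

```python
def learn_w(examples, w=0.0, step=0.01, epochs=200):
    """Visit every (x, y) pair repeatedly, shrinking the squared error each time."""
    for _ in range(epochs):
        for x, y in examples:
            w -= step * 2.0 * (x * w - y) * x
    return w

# Assumed training set: known inputs paired with known outputs.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
learned_w = learn_w(examples)
print(learned_w)  # approaches 3.0 as the passes accumulate
```

A real network has billions of weights rather than one, but each of them is adjusted by the same kind of small, repeated correction.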

اذاً الأن، لدينا x و y كقيم ثابتة لكي نحل y ، هذا ككل يوم، تصور سريع. نكتشف كيف نستطيع إيجاد الحل ل w، ذلك هو التعلم، وهو الأكثر صعوبة، لأننا نحتاج إلى تصغير الخطأ، باستخدام الكثير من أمثلة التعلم،

So now, we've held x and w fixed to solve for y; that's everyday, fast perception. We figure out how we can solve for w, that's learning, which is a lot harder, because we need to do error minimization, using a lot of training examples.

ومنذ حوالي السنة، أحد أعضاء فريقنا (أليكس ماريفينسف)، قرر أن يجرب ماذا سيحدث إذا ما حاولنا حل المعادلة لأجل x، بإعطاء قيمة معلومة ل w و y . بعبارة أخرى، أنت تعرف أنه طائر، وتتمتع مسبقاً بشبكة عصبية دربتها على أن الذي أمامك طائر، لكن ما الذي تبدو عليه صورة الطائر؟ اتضح أنه باستخدام نفس عملية تقليل الخطأ، نستطيع فعل المثل عن طريق الشبكة المدربة على التعرف على الطيور، واتضح أن النتيجة ستكون ... صورة لطيور. هذه صورة لطائر تم توليدها كلياً بواسطة شبكة عصبية والتي دُربت لتتعرف على الطيور. فقط بحل بالنسبة ل x بدل الحل بالنسبة ل y. وبالقيام بتلك التكرارات.

And about a year ago, Alex Mordvintsev, on our team, decided to experiment with what happens if we try solving for x, given a known w and a known y. In other words, you know that it's a bird, and you already have your neural network that you've trained on birds, but what is the picture of a bird? It turns out that by using exactly the same error-minimization procedure, one can do that with the network trained to recognize birds, and the result turns out to be ... a picture of birds. So this is a picture of birds generated entirely by a neural network that was trained to recognize birds, just by solving for x rather than solving for y, and doing that iteratively.
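Solving for x can be sketched the same way. The "network" below is a deliberately tiny stand-in, a single tanh neuron with three made-up weights, not the bird recognizer from the talk. The weights w and the desired output y are held fixed, and the error-minimization loop adjusts the input x instead.

```python
import math

def predict(x, w):
    """Toy stand-in for a trained network: one neuron with tanh output."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def solve_for_x(w, y_target, x_start, step=0.1, steps=500):
    """Keep w fixed and nudge the input x until predict(x, w) is near y_target."""
    x = list(x_start)
    for _ in range(steps):
        y = predict(x, w)
        # Chain rule for (y - y_target)**2 with y = tanh(w . x).
        scale = 2.0 * (y - y_target) * (1.0 - y * y)
        x = [xi - step * scale * wi for xi, wi in zip(x, w)]
    return x

w = [0.5, -1.0, 2.0]     # assumed "trained" weights
blank = [0.0, 0.0, 0.0]  # a blank canvas to start from
x = solve_for_x(w, 0.9, blank)
print(x, predict(x, w))  # an input the toy network scores close to 0.9
```

The starting point is a parameter, so the loop can just as well begin from an existing input rather than a blank one; many different inputs satisfy the same target, which is one reason the generated pictures can look so strange.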

هنا مثال مسلي آخر. هذا العمل صنعه (مايك تايكو) من فريقنا، والذي يدعوه "موكب الحيوانات". ويذكرني قليلاً بأعمال (وليام كينتردوغ) الفنية، حيث يقوم برسم نماذج، ثم يقوم بتحريكها، يرسم النماذج، ويحركها، ويصنع فلم بهذه الطريقة. في هذه الحالة، مايقوم به (مايك) هو تغيير y عبر مساحة متباينة من الحيوانات، ضمن شبكة مصصمة، لكي تميز وتعرف الحيوانات المختلفة عن بعضها البعض. وستحصل على هذا الشكل الغريب من حيوان لآخر،

Here's another fun example. This was a work made by Mike Tyka in our group, which he calls "Animal Parade." It reminds me a little bit of William Kentridge's artworks, in which he makes sketches, rubs them out, makes sketches, rubs them out, and creates a movie this way. In this case, what Mike is doing is varying y over the space of different animals, in a network designed to recognize and distinguish different animals from each other. And you get this strange, Escher-like morph from one animal to another.

هو و(ألكس) حاولا تقليل ال y الى مساحة بعدين فقط، مما سمح لهما بصنع خريطة من مساحة كل الأشياء المُتعرف عليها من قبل هذه الشبكة. بالقيام بهذا النوع من التركيب أو توليد صورة من ذلك السطح الكامل، بتغيير y عبر ذلك السطح، ستحصل على خريطة نوعاً ما -- خريطة بصرية من كل الأشياء التي تستطيع الشبكة تمييزها.

Here he and Alex together have tried reducing the y's to a space of only two dimensions, thereby making a map out of the space of all things recognized by this network. Doing this kind of synthesis or generation of imagery over that entire surface, varying y over the surface, you make a kind of map -- a visual map of all the things the network knows how to recognize.

الحيوانات كلها موجودة هنا، و"أرمانديلو" في البقعة المناسبة.

The animals are all here; "armadillo" is right in that spot.

وتستطيع القيام بذلك مع أنواع آخرى من الشبكات أيضاً. هذه الشبكة مصممة لمعرفة الوجوه، لتميز كل وجه عن الآخر. وهنا نقوم بوضع ال y التي تقول "أنا،" مقاييس وجهي الخاصة. وعندما تُحل هذه الشبكة بالنسبة لx، بالأحرى تولد هذا الجنون، صورة تكعيبية، وسريالية، وغريبة لي من عدة وجهات نظر في نفس الوقت.

You can do this with other kinds of networks as well. This is a network designed to recognize faces, to distinguish one face from another. And here, we're putting in a y that says, "me," my own face parameters. And when this thing solves for x, it generates this rather crazy, kind of cubist, surreal, psychedelic picture of me from multiple points of view at once.

والسبب في تكوينها من عدة وجهات نظر في نفس الوقت هو أن هذه الشبكة صُممت لكي تتخلص من الغموض الذي يلحق بالوجوه من وضعية تصوير لآخرى، وبالنظر من وضع إضاءة الى آخر. إذاً حتى تقوم بهذا النوع من إعادة التكوين، اذا لم تستخدم نوع من الصور الدليلية أو دليل إحصائي، حينها ستحصل على قليل من الإرتباك من نقاط مختلفة، لأنه غامض. هذا ما سيحصل إذا استخدم (أليكس) وجهه كصورة دليلية خلال عملية إعادة تكوين وجهي. سترون بأنها ليست مثالية. وما يزال هناك الكثير من العمل لنقوم به لتحسين عملية إعادة التكوين هذه. لكن بدأنا بالحصول على شيء يشبه الوجه المتماسك، وذلك باستخدام وجهي كدليل.

The reason it looks like multiple points of view at once is because that network is designed to get rid of the ambiguity of a face being in one pose or another pose, being looked at with one kind of lighting, another kind of lighting. So when you do this sort of reconstruction, if you don't use some sort of guide image or guide statistics, then you'll get a sort of confusion of different points of view, because it's ambiguous. This is what happens if Alex uses his own face as a guide image during that optimization process to reconstruct my own face. So you can see it's not perfect. There's still quite a lot of work to do on how we optimize that optimization process. But you start to get something more like a coherent face, rendered using my own face as a guide.

ليس عليك البدء بقماش فارغ أو ضوضاء بيضاء. بحل المعادلة بالنسبة لx تستطيع البدء بx والتي هي نفسها صورة ما مسبقاً. هذا هو هذا المنظر الصغير. هذه شبكة مُصممة لكي تصنف العديد من الأشياء المختلفة أشخاص، أشكال، حيوانات ... هنا، بدأنا بصورة للغيوم، وبينما نقوم بعملية الاستمثال، أساساً، هذه الشبكة تميز ما تراه بين الغيوم. وكلما استغرقت بالنظر الى هذا، سترى المزيد من الأشياء بين الغيوم تستطيع أيضاً استخدام شبكة الوجوه لكي تهلوس الى هذا، وستحصل على أشياء مجنونة جداً.

You don't have to start with a blank canvas or with white noise. When you're solving for x, you can begin with an x, that is itself already some other image. That's what this little demonstration is. This is a network that is designed to categorize all sorts of different objects -- man-made structures, animals ... Here we're starting with just a picture of clouds, and as we optimize, basically, this network is figuring out what it sees in the clouds. And the more time you spend looking at this, the more things you also will see in the clouds. You could also use the face network to hallucinate into this, and you get some pretty crazy stuff.

(ضحك)

(Laughter)

أو، أجرى (مايك) تجارب أخرى حيث يأخذ صورة الغيمة هذه، يهلوسها ويكبرها، يهلوسها ويكبرها، يهلوسها ويكبرها. وبهذه الطريقة، نستطيع الحصول على حالة من الضباب لهذه الشبكة كما أعتقد، أو نوعاً ما من المجمعات الحرة، حيث تقوم الشبكات بتدمير نفسها. إذاً كل صورة هي الأساس لما "ما الذي أعتقد أني سأراه بعدها؟ مالذي أعتقد أني سأراه بعدها؟ مالذي أعتقد أني سأراه بعدها؟

Or, Mike has done some other experiments in which he takes that cloud image, hallucinates, zooms, hallucinates, zooms, hallucinates, zooms. And in this way, you can get a sort of fugue state of the network, I suppose, or a sort of free association, in which the network is eating its own tail. So every image is now the basis for, "What do I think I see next? What do I think I see next? What do I think I see next?"
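The hallucinate-then-zoom loop is a feedback loop: each output image becomes the next input. The sketch below is a loose illustration only. The "image" is a one-dimensional list of eight numbers, the detector is an assumed linear one, and the zoom simply crops the middle half and stretches it back, so nothing here reproduces the actual experiments.

```python
def hallucinate(image, weights, step=0.1):
    """Nudge each pixel so a toy linear detector (weights . image) responds more.
    For a linear detector, the gradient with respect to the image is the weights."""
    return [p + step * w for p, w in zip(image, weights)]

def zoom(image):
    """Crop the central half and stretch it back by repeating each pixel.
    Assumes the image length is a multiple of 4."""
    n = len(image)
    start = n // 4
    middle = image[start:start + n // 2]
    return [p for p in middle for _ in range(2)]

def fugue(image, weights, rounds=3):
    """Feed each result back in as the next starting image."""
    frames = [image]
    for _ in range(rounds):
        image = zoom(hallucinate(image, weights))
        frames.append(image)
    return frames

clouds = [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1]        # assumed starting image
weights = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # assumed detector
frames = fugue(clouds, weights)
for frame in frames:
    print(frame)
```

Because every frame is built from the previous one, whatever the detector "sees" in one round is amplified and magnified in the next.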

عرضت هذا لأول مرة على الملأ لمجموعة خلال محاضرة في سياتل تحت عوان (التعليم العالي) -- كان هذا مباشرة بعد إجازة الماريجونا.

I showed this for the first time in public to a group at a lecture in Seattle called "Higher Education" -- this was right after marijuana was legalized.

(ضحك)

(Laughter)

إذاً، أريد أن أختم سريعاً بالإشارة الى أن هذه التكنولوجيا غير مقيدة لقد أريتكم أمثلة بصرية بحتة لأنه من الممتع النظر اليها لكنها ليست تقنية بصرية بحتة. الفنان المتعاون معنا، (روس غوردن) قام بتجارب، تتضمن كاميرا تقوم بأخذ صورة، ثم حاسوب في حقيبة ظهره يقوم بكتابة قصيدة باستخدام الشبكات العصبية، وذلك استناداً على محتوى الصورة. وقد دُربت الشبكة العصبية الشعرية على أشعار كثيرة من القرن العشرين. والشعر كما تعلمون، كما أعتقد، هذا ليس سيئاً في الواقع.

So I'd like to finish up quickly by just noting that this technology is not constrained. I've shown you purely visual examples because they're really fun to look at. It's not a purely visual technology. Our artist collaborator, Ross Goodwin, has done experiments involving a camera that takes a picture, and then a computer in his backpack writes a poem using neural networks, based on the contents of the image. And that poetry neural network has been trained on a large corpus of 20th-century poetry. And the poetry is, you know, I think, kind of not bad, actually.

(ضحك)

(Laughter)

في الختام، أعتقد أن (مايكل أنجيلو)، كان على حق، الإدراك والإبداع مرتبطان ارتباطاً وثيقاً. ما رأيناه للتو هو شبكات عصبية، مُدربة كلياً لكي تميز، أو للتعرف على الأشياء المختلفة في هذا العالم، وقادرة على العمل في الإتجاه المعاكس، لتولد. وأحد الأشياء التي اقترحت لي ليس فقط ما رأه (مايكل أنجيلو) المنحوتة في قطعة الحجر، لكن أي مخلوق، أي كائن، أي فضائي يستطيع أن يقوم بأعمال حسية من هذا النوع هو قادر أيضاً على التكوين لأنها نفس الآلية المستخدمة في الحالتين.

In closing, I think Michelangelo was right; perception and creativity are very intimately connected. What we've just seen are neural networks that are entirely trained to discriminate, or to recognize different things in the world, able to be run in reverse, to generate. One of the things that suggests to me is not only that Michelangelo really did see the sculpture in the blocks of stone, but that any creature, any being, any alien that is able to do perceptual acts of that sort is also able to create because it's exactly the same machinery that's used in both cases.

كذلك، أعتقد أن الإدراك والإبداع لا يعنيان بالضرورة إنسان على نحو مميز. بدأنا باختراع حواسيب تقوم بنفس هذه الأشياء. ومن المفترض أن لا يكون ذلك مفاجئاً؛ فالدماغ في الأساس حسابي.

Also, I think that perception and creativity are by no means uniquely human. We start to have computer models that can do exactly these sorts of things. And that ought to be unsurprising; the brain is computational.

وأخيراً، بدأت الحوسبة كتدريب لتطويرألات ذكية. وتم تغييرها على نحو كبير بعد فكرة كيف نستطيع جعل الآلات ذكية. وأخيراً، بدأنا نوفي ببعض وعود أولئك الرعيل الأول، ل(تورينج) و(فون نيومان) و(مكولوتش) و(بيتس). وأعتقد أن الحوسبة لا تتعلق فقط بالحساب أو لعب الكاندي كراش أو شيء ما. من البداية، قمنا بأخذ أدمغتنا كنموذج. وأعطانا ذلك القابلية لفهم أدمغتنا فهماً أفضل ومدّها.

And finally, computing began as an exercise in designing intelligent machinery. It was very much modeled after the idea of how could we make machines intelligent. And we finally are starting to fulfill now some of the promises of those early pioneers, of Turing and von Neumann and McCulloch and Pitts. And I think that computing is not just about accounting or playing Candy Crush or something. From the beginning, we modeled them after our minds. And they give us both the ability to understand our own minds better and to extend them.

شكراً جزيلاً لكم.

Thank you very much.

(تصفيق)

(Applause)
