Blaise Agüera y Arcas: How computers are learning to be creative

So, I lead a team at Google that works on machine intelligence; in other words, the engineering discipline of making computers and devices able to do some of the things that brains do. And this makes us interested in real brains and neuroscience as well, and especially interested in the things that our brains do that are still far superior to the performance of computers.

Historically, one of those areas has been perception, the process by which things out there in the world -- sounds and images -- can turn into concepts in the mind. This is essential for our own brains, and it's also pretty useful on a computer. The machine perception algorithms, for example, that our team makes, are what enable your pictures on Google Photos to become searchable, based on what's in them. The flip side of perception is creativity: turning a concept into something out there in the world. So over the past year, our work on machine perception has also unexpectedly connected with the world of machine creativity and machine art.

I think Michelangelo had a penetrating insight into this dual relationship between perception and creativity. This is a famous quote of his: "Every block of stone has a statue inside of it, and the job of the sculptor is to discover it." So I think that what Michelangelo was getting at is that we create by perceiving, and that perception itself is an act of imagination and is the stuff of creativity.

The organ that does all the thinking and perceiving and imagining, of course, is the brain. And I'd like to begin with a brief bit of history about what we know about brains. Because unlike, say, the heart or the intestines, you really can't say very much about a brain by just looking at it, at least with the naked eye. The early anatomists who looked at brains gave the superficial structures of this thing all kinds of fanciful names, like hippocampus, meaning "seahorse." But of course that sort of thing doesn't tell us very much about what's actually going on inside.

The first person who, I think, really developed some kind of insight into what was going on in the brain was the great Spanish neuroanatomist, Santiago Ramón y Cajal, in the 19th century, who used microscopy and special stains that could selectively fill in or render in very high contrast the individual cells in the brain, in order to start to understand their morphologies. And these are the kinds of drawings that he made of neurons in the 19th century.

This is from a bird brain. And you see this incredible variety of different sorts of cells; even the cellular theory itself was quite new at this point. And these structures, these cells that have these arborizations, these branches that can go very, very long distances -- this was very novel at the time. They're reminiscent, of course, of wires. That might have been obvious to some people in the 19th century; the revolutions of wiring and electricity were just getting underway. But these microanatomical drawings of Ramón y Cajal's, like this one, are still in some ways unsurpassed.

We're still, more than a century later, trying to finish the job that Ramón y Cajal started. These are raw data from our collaborators at the Max Planck Institute of Neuroscience. And what our collaborators have done is to image little pieces of brain tissue. The entire sample here is about one cubic millimeter in size, and I'm showing you a very, very small piece of it here. That bar on the left is about one micron. The structures you see are mitochondria that are the size of bacteria. And these are consecutive slices through this very, very tiny block of tissue. Just for comparison's sake, the diameter of an average strand of hair is about 100 microns. So we're looking at something much, much smaller than a single strand of hair.

And from these kinds of serial electron microscopy slices, one can start to make reconstructions in 3D of neurons that look like these. So these are sort of in the same style as Ramón y Cajal. Only a few neurons lit up, because otherwise we wouldn't be able to see anything here. It would be so crowded, so full of structure, of wiring all connecting one neuron to another.

So Ramón y Cajal was a little bit ahead of his time, and progress on understanding the brain proceeded slowly over the next few decades. But we knew that neurons used electricity, and by World War II, our technology was advanced enough to start doing real electrical experiments on live neurons to better understand how they worked. This was the very same time when computers were being invented, very much based on the idea of modeling the brain -- of "intelligent machinery," as Alan Turing called it, one of the fathers of computer science.

Warren McCulloch and Walter Pitts looked at Ramón y Cajal's drawing of visual cortex, which I'm showing here. This is the cortex that processes imagery that comes from the eye. And for them, this looked like a circuit diagram. So there are a lot of details in McCulloch and Pitts's circuit diagram that are not quite right. But this basic idea that visual cortex works like a series of computational elements that pass information one to the next in a cascade, is essentially correct.

Let's talk for a moment about what a model for processing visual information would need to do. The basic task of perception is to take an image like this one and say, "That's a bird," which is a very simple thing for us to do with our brains. But you should all understand that for a computer, this was pretty much impossible just a few years ago. The classical computing paradigm is not one in which this task is easy to do.

So what's going on between the pixels, between the image of the bird and the word "bird," is essentially a set of neurons connected to each other in a neural network, as I'm diagramming here. This neural network could be biological, inside our visual cortices, or, nowadays, we start to have the capability to model such neural networks on the computer. And I'll show you what that actually looks like.

So the pixels you can think about as a first layer of neurons, and that's, in fact, how it works in the eye -- that's the neurons in the retina. And those feed forward into one layer after another layer, after another layer of neurons, all connected by synapses of different weights. The behavior of this network is characterized by the strengths of all of those synapses. Those characterize the computational properties of this network. And at the end of the day, you have a neuron or a small group of neurons that light up, saying, "bird."

Now I'm going to represent those three things -- the input pixels and the synapses in the neural network, and bird, the output -- by three variables: x, w and y. There are maybe a million or so x's -- a million pixels in that image. There are billions or trillions of w's, which represent the weights of all these synapses in the neural network. And there's a very small number of y's, of outputs that that network has. "Bird" is only four letters, right? So let's pretend that this is just a simple formula, x "x" w = y. I'm putting the times in scare quotes because what's really going on there, of course, is a very complicated series of mathematical operations.

That's one equation. There are three variables. And we all know that if you have one equation, you can solve one variable by knowing the other two things. So the problem of inference, that is, figuring out that the picture of a bird is a bird, is this one: it's where y is the unknown and w and x are known. You know the neural network, you know the pixels. As you can see, that's actually a relatively straightforward problem. You multiply two times three and you're done. I'll show you an artificial neural network that we've built recently, doing exactly that.
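As a rough illustration of inference (x and w known, y unknown), here is a minimal Python sketch of a feed-forward pass. It is not the network from the talk: the function name `forward`, the tanh activation, the layer sizes, and every number below are assumptions chosen only to keep the example tiny.

```python
import math

def forward(x, layers):
    """Feed the input x through each layer in turn; return the last layer's activations."""
    activations = x
    for weights in layers:
        # Each row of `weights` holds the synapse strengths feeding one neuron.
        activations = [
            math.tanh(sum(w_ij * a for w_ij, a in zip(row, activations)))
            for row in weights
        ]
    return activations

# Hypothetical toy values: 3 "pixels", one hidden layer of 2 neurons, 1 output neuron.
x = [0.5, -1.0, 0.25]
w = [
    [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]],  # input -> hidden
    [[1.0, -1.0]],                          # hidden -> output
]
y = forward(x, w)  # a single number standing in for "how bird-like is this?"
```

The point of the sketch is only that, once w is fixed, computing y is a straightforward chain of multiply-and-sum steps, however many layers there are.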

This is running in real time on a mobile phone, and that's, of course, amazing in its own right, that mobile phones can do so many billions and trillions of operations per second. What you're looking at is a phone looking at one after another picture of a bird, and actually not only saying, "Yes, it's a bird," but identifying the species of bird with a network of this sort. So in that picture, the x and the w are known, and the y is the unknown. I'm glossing over the very difficult part, of course, which is how on earth do we figure out the w, the brain that can do such a thing? How would we ever learn such a model?

So this process of learning, of solving for w, if we were doing this with the simple equation in which we think about these as numbers, we know exactly how to do that: 6 = 2 x w, well, we divide by two and we're done. The problem is with this operator. So, division -- we've used division because it's the inverse to multiplication, but as I've just said, the multiplication is a bit of a lie here. This is a very, very complicated, very non-linear operation; it has no inverse. So we have to figure out a way to solve the equation without a division operator. And the way to do that is fairly straightforward. You just say, let's play a little algebra trick, and move the six over to the right-hand side of the equation. Now, we're still using multiplication. And that zero -- let's think about it as an error. In other words, if we've solved for w the right way, then the error will be zero. And if we haven't gotten it quite right, the error will be greater than zero.
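Written out, the algebra trick looks roughly like this. Taking the absolute value is an assumption made here so that the error is never negative, as the description above requires; the talk itself does not spell out the exact form.

```latex
\begin{align*}
6 &= 2 \times w \\
0 &= 2 \times w - 6 \\
\mathrm{error}(w) &= \lvert 2 \times w - 6 \rvert \;\ge\; 0,
\qquad \mathrm{error}(w) = 0 \iff w = 3
\end{align*}
```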

So now we can just take guesses to minimize the error, and that's the sort of thing computers are very good at. So you've taken an initial guess: what if w = 0? Well, then the error is 6. What if w = 1? The error is 4. And then the computer can sort of play Marco Polo, and drive down the error close to zero. As it does that, it's getting successive approximations to w. Typically, it never quite gets there, but after about a dozen steps, we're up to w = 2.999, which is close enough. And this is the learning process.
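One common way to make this guessing systematic is gradient descent, sketched below for the toy equation 6 = 2 x w. This is an illustration under assumptions, not the procedure used in the talk: the names `error` and `learn_w`, the squared-error gradient, and the learning rate of 0.0625 (picked so that a dozen steps land near 2.999) are all choices made for this example.

```python
def error(w):
    """Distance between the guess 2 * w and the target 6."""
    return abs(2 * w - 6)

def learn_w(w=0.0, learning_rate=0.0625, steps=12):
    """Nudge w downhill on the squared error (2w - 6)^2 for a fixed number of steps."""
    history = [w]
    for _ in range(steps):
        gradient = 4 * (2 * w - 6)  # derivative of (2w - 6)^2 with respect to w
        w -= learning_rate * gradient
        history.append(w)
    return w, history

w_final, history = learn_w()
print(error(0), error(1))   # 6 4
print(round(w_final, 3))    # 2.999
```

With these settings each step halves the remaining distance to 3, so the guesses get closer and closer without ever landing exactly on it.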

So remember that what's been going on here is that we've been taking a lot of known x's and known y's and solving for the w in the middle through an iterative process. It's exactly the same way that we do our own learning. We have many, many images as babies and we get told, "This is a bird; this is not a bird." And over time, through iteration, we solve for w, we solve for those neural connections.

So now, we've held x and w fixed to solve for y; that's everyday, fast perception. We figured out how we can solve for w; that's learning, which is a lot harder, because we need to do error minimization, using a lot of training examples.

And about a year ago, Alex Mordvintsev, on our team, decided to experiment with what happens if we try solving for x, given a known w and a known y. In other words, you know that it's a bird, and you already have your neural network that you've trained on birds, but what is the picture of a bird? It turns out that by using exactly the same error-minimization procedure, one can do that with the network trained to recognize birds, and the result turns out to be ... a picture of birds. So this is a picture of birds generated entirely by a neural network that was trained to recognize birds, just by solving for x rather than solving for y, and doing that iteratively.
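The same error-minimization idea can be sketched with x as the unknown. The code below is a toy stand-in, not the method Alex Mordvintsev used: the "network" is a single sigmoid neuron, and the names `predict` and `solve_for_x`, the weights, the target of 0.9, and the step settings are all assumptions for illustration.

```python
import math

def predict(x, w):
    """A stand-in 'bird detector': one sigmoid neuron over the pixels."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def solve_for_x(w, y_target, x_start, learning_rate=0.5, steps=500):
    """Hold w fixed and adjust the pixels x until the output is close to y_target."""
    x = list(x_start)
    for _ in range(steps):
        y = predict(x, w)
        # Gradient of 0.5 * (y - y_target)^2 with respect to x_i is
        # (y - y_target) * y * (1 - y) * w_i, by the chain rule through the sigmoid.
        common = (y - y_target) * y * (1.0 - y)
        x = [x_i - learning_rate * common * w_i for x_i, w_i in zip(x, w)]
    return x

w = [1.5, -2.0, 0.5, 1.0]      # hypothetical trained weights, kept fixed
x0 = [0.0, 0.0, 0.0, 0.0]      # blank starting "image"
x = solve_for_x(w, y_target=0.9, x_start=x0)
```

Because `x_start` is a parameter, the same loop can begin from a blank input or from an existing image; only the pixels change, never the weights.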

Here's another fun example. This was a work made by Mike Tyka in our group, which he calls "Animal Parade." It reminds me a little bit of William Kentridge's artworks, in which he makes sketches, rubs them out, makes sketches, rubs them out, and creates a movie this way. In this case, what Mike is doing is varying y over the space of different animals, in a network designed to recognize and distinguish different animals from each other. And you get this strange, Escher-like morph from one animal to another.

Here he and Alex together have tried reducing the y's to a space of only two dimensions, thereby making a map out of the space of all things recognized by this network. Doing this kind of synthesis or generation of imagery over that entire surface, varying y over the surface, you make a kind of map -- a visual map of all the things the network knows how to recognize. The animals are all here; "armadillo" is right in that spot.

You can do this with other kinds of networks as well. This is a network designed to recognize faces, to distinguish one face from another. And here, we're putting in a y that says, "me," my own face parameters. And when this thing solves for x, it generates this rather crazy, kind of cubist, surreal, psychedelic picture of me from multiple points of view at once. The reason it looks like multiple points of view at once is because that network is designed to get rid of the ambiguity of a face being in one pose or another pose, being looked at with one kind of lighting, another kind of lighting. So when you do this sort of reconstruction, if you don't use some sort of guide image or guide statistics, then you'll get a sort of confusion of different points of view, because it's ambiguous. This is what happens if Alex uses his own face as a guide image during that optimization process to reconstruct my own face. So you can see it's not perfect. There's still quite a lot of work to do on how we optimize that optimization process. But you start to get something more like a coherent face, rendered using my own face as a guide.

You don't have to start with a blank canvas or with white noise. When you're solving for x, you can begin with an x that is itself already some other image. That's what this little demonstration is. This is a network that is designed to categorize all sorts of different objects -- man-made structures, animals ... Here we're starting with just a picture of clouds, and as we optimize, basically, this network is figuring out what it sees in the clouds. And the more time you spend looking at this, the more things you also will see in the clouds. You could also use the face network to hallucinate into this, and you get some pretty crazy stuff.

(Laughter)

Or, Mike has done some other experiments in which he takes that cloud image, hallucinates, zooms, hallucinates, zooms, hallucinates, zooms. And in this way, you can get a sort of fugue state of the network, I suppose, or a sort of free association, in which the network is eating its own tail. So every image is now the basis for, "What do I think I see next? What do I think I see next? What do I think I see next?"

I showed this for the first time in public to a group at a lecture in Seattle called "Higher Education" -- this was right after marijuana was legalized.

(Laughter)

So I'd like to finish up quickly by just noting that this technology is not constrained. I've shown you purely visual examples because they're really fun to look at. It's not a purely visual technology. Our artist collaborator, Ross Goodwin, has done experiments involving a camera that takes a picture, and then a computer in his backpack writes a poem using neural networks, based on the contents of the image. And that poetry neural network has been trained on a large corpus of 20th-century poetry. And the poetry is, you know, I think, kind of not bad, actually.

(Laughter)

In closing, I think Michelangelo was right; perception and creativity are very intimately connected. What we've just seen are neural networks that are entirely trained to discriminate, or to recognize different things in the world, able to be run in reverse, to generate. One of the things that suggests to me is not only that Michelangelo really did see the sculpture in the blocks of stone, but that any creature, any being, any alien that is able to do perceptual acts of that sort is also able to create because it's exactly the same machinery that's used in both cases.

Also, I think that perception and creativity are by no means uniquely human. We start to have computer models that can do exactly these sorts of things. And that ought to be unsurprising; the brain is computational.

And finally, computing began as an exercise in designing intelligent machinery. It was very much modeled after the idea of how could we make machines intelligent. And we finally are starting to fulfill now some of the promises of those early pioneers, of Turing and von Neumann and McCulloch and Pitts. And I think that computing is not just about accounting or playing Candy Crush or something. From the beginning, we modeled computers after our minds. And they give us both the ability to understand our own minds better and to extend them.

Thank you very much.

(Applause)
