So what the heck happened in the field of AI in the last decade? It's like a strange new type of intelligence appeared on our planet. But it's not like human intelligence. It has remarkable capabilities, but it also makes egregious errors that we never make. And it doesn't yet do the deep logical reasoning that we can do. It has a very mysterious surface of both capabilities and fragilities. And we understand almost nothing about how it works. I would like a deeper scientific understanding of intelligence.
But to understand AI, it's useful to place it in the historical context of biological intelligence. The story of human intelligence might as well have started with this little critter. It's the last common ancestor of all vertebrates. We are all descended from it. It lived about 500 million years ago. Then evolution went on to build the brain, which in turn, in the space of a few centuries from Newton to Einstein, developed the deep math and physics required to understand the universe, from quarks to cosmology. And it did this all without consulting ChatGPT.
And then, of course, there's the advances of the last decade. To really understand what just happened in AI, we need to combine physics, math, neuroscience, psychology, computer science and more, to develop a new science of intelligence. The science of intelligence can simultaneously help us understand biological intelligence and create better artificial intelligence. And we need this science now, because the engineering of intelligence has vastly outstripped our ability to understand it.
I want to take you on a tour of our work in the science of intelligence that addresses five critical areas in which AI can improve -- data efficiency, energy efficiency, going beyond evolution, explainability and melding minds and machines. Let's address these critical gaps one by one.
First, data efficiency. AI is vastly more data-hungry than humans. For example, we train our language models on the order of one trillion words now. Well, how many words do we get? Just 100 million. It's that tiny little red dot at the center. You might not be able to see it. It would take us 24,000 years to read the rest of the one trillion words.
OK, now, you might say that's unfair. Sure, AI read for 24,000 human-equivalent years, but humans got 500 million years of vertebrate brain evolution. But there's a catch. Your entire legacy of evolution is given to you through your DNA, and your DNA is only about 700 megabytes, or roughly the information content of 600 million words. So the combined information we get from learning and evolution is minuscule compared to what AI gets. You are all incredibly efficient learning machines.
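Here is a quick back-of-envelope check of these figures. The reading speed, daily reading hours, and bytes-per-word conversion are assumptions of mine, chosen only to show the orders of magnitude; they are not figures from the talk.

```python
# Back-of-envelope check of the data-budget numbers above.
WORDS_TRAINED = 1_000_000_000_000   # ~1 trillion words of LLM training data
WORDS_HEARD   = 100_000_000         # ~100 million words a human gets growing up

words_per_minute = 240              # assumed adult reading speed
hours_per_day    = 8                # assumed reading time per day
words_per_year   = words_per_minute * 60 * hours_per_day * 365

years_to_read = WORDS_TRAINED / words_per_year
print(f"Years to read the training set: {years_to_read:,.0f}")   # ~24,000

# DNA budget: ~700 MB, at an assumed ~1.2 bytes of information per word
# (a rough conversion chosen purely for illustration).
dna_bytes = 700e6
dna_word_equivalent = dna_bytes / 1.2
print(f"DNA word-equivalent: {dna_word_equivalent:,.0f}")        # ~600 million
```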
So how do we bridge the gap between AI and humans? We started to tackle this problem by revisiting the famous scaling laws. Here's an example of a scaling law, where error falls off as a power law with the amount of training data. These scaling laws have captured the imagination of industry and motivated significant societal investments in energy, compute and data collection.
But there's a problem. The exponents of these scaling laws are small. So to reduce the error by a little bit, you might need to ten-x your amount of training data. This is unsustainable in the long run. And even if it leads to improvements in the short run, there must be a better way.
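To make that arithmetic concrete: if the error falls off as a power law in the dataset size $N$, a small exponent makes extra data extremely expensive. The exponent below is illustrative only, not a figure from the talk.

$$
E(N) \approx c\,N^{-\alpha} \qquad\Rightarrow\qquad \frac{E(10N)}{E(N)} = 10^{-\alpha}.
$$

With an illustrative exponent of $\alpha = 0.1$, ten times more data shaves the error by only a factor of $10^{-0.1} \approx 0.79$, and halving the error requires $2^{1/\alpha} \approx 1000$ times more data.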
We developed a theory that explains why these scaling laws are so bad. The basic idea is that large random datasets are incredibly redundant. If you already have billions of data points, the next data point doesn't tell you much that's new. But what if you could create a nonredundant dataset, where each data point is chosen carefully to tell you something new, compared to all the other data points? We developed theory and algorithms to do just this. We theoretically predicted and experimentally verified that we could bend these bad power laws down to much better exponentials, where adding just a few more carefully chosen data points can cut your error, instead of having to ten-x the amount of data.
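To give a feel for what "choosing each data point to tell you something new" can look like, here is a generic greedy farthest-point selection sketch. It is an illustration of nonredundant data selection under my own simplifying choices, not the specific pruning algorithm behind the result described above.

```python
import numpy as np

def select_nonredundant(X, k, seed=0):
    """Greedy farthest-point selection: repeatedly pick the example that is
    farthest (in feature space) from everything already selected, so each
    new point adds information the current subset lacks.
    Generic illustration only, not the talk's algorithm."""
    rng = np.random.default_rng(seed)
    n = len(X)
    selected = [int(rng.integers(n))]                  # start from a random point
    # distance from every point to its nearest selected point
    dist = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                     # most "novel" remaining point
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected)

# Example: keep 100 maximally spread-out points from 10,000 embeddings
X = np.random.randn(10_000, 64)
subset = select_nonredundant(X, k=100)
```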
So what theory did we use to get this result? We used ideas from statistical physics, and these are the equations. Now, for the rest of this entire talk, I'm going to go through these equations one by one.
(Laughter)
You think I'm joking? And explain them to you. OK, you're right, I'm joking. I'm not that mean. But you should have seen the faces of the TED organizers when I said I was going to do that.
Alright, let's move on. Let's zoom out a little bit, and think more generally about what it takes to make AI less data-hungry.
Imagine if we trained our kids the same way we pretrain our large language models, by next-word prediction. So I'd give my kid a random chunk of the internet and say, "By the way, this is the next word." I'd give them another random chunk of the internet and say, "This is the next word." If that's all we did, it would take our kids 24,000 years to learn anything useful. But we do so much more than that.
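For concreteness, "pretraining by next-word prediction" means minimizing the standard autoregressive objective: the model is scored only on how well it predicts each word given all the words before it,

$$
\mathcal{L}(\theta) = -\sum_{t} \log p_\theta\big(w_t \mid w_1, \dots, w_{t-1}\big),
$$

summed over a trillion-word corpus. Nothing in this objective conveys the reasoning behind an answer, only the answer's next word.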
For example, when I teach my son math, I teach him the algorithm required to solve the problem, then he can immediately solve new problems and generalize using far less training data than any AI system would do. I don't just throw millions of math problems at him.
So to really make AI more data-efficient, we have to go far beyond our current training algorithms and turn machine learning into a new science of machine teaching. And neuroscience, psychology and math can really help here.
Let's go on to the next big gap, energy efficiency. Our brains are incredibly efficient. We only consume 20 watts of power. For reference, our old light bulbs were 100 watts. So we are all literally dimmer than light bulbs.
(Laughter)
But what about AI? Training a large model can consume as much as 10 million watts, and there’s talk of going nuclear to power one-billion-watt data centers. So why is AI so much more energy-hungry than brains?
Well, the fault lies in the choice of digital computation itself, where we rely on fast and reliable bit flips at every intermediate step of the computation. Now, the laws of thermodynamics demand that every fast and reliable bit flip must consume a lot of energy.
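One way to quantify that thermodynamic cost (this framing via Landauer's principle is my gloss, not necessarily the speaker's exact argument): erasing a single bit at temperature $T$ costs at least $k_B T \ln 2$, and demanding that the flip also be fast and nearly error-free pushes the real cost far above that floor. At room temperature,

$$
E_{\min} = k_B T \ln 2 \approx (1.38 \times 10^{-23}\,\mathrm{J/K}) \times (300\,\mathrm{K}) \times 0.693 \approx 2.9 \times 10^{-21}\,\mathrm{J},
$$

which is many orders of magnitude below what practical digital logic actually dissipates per fast, reliable bit flip.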
Biology took a very different route. Biology computes the right answer just in time, using intermediate steps that are as slow and as unreliable as possible. In essence, biology does not rev its engine any more than it needs to.
In addition, biology matches computation to physics much better. Consider, for example, addition. Our computers add using really complex energy-consuming transistor circuits, but neurons just directly add their voltage inputs, because Maxwell's laws of electromagnetism already know how to add voltage. In essence, biology matches its computation to the native physics of the universe.
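As a toy illustration of letting the physics do the addition: in a leaky-integrator model of a neuron's membrane, the input currents sum at a single node simply because charge is conserved (Kirchhoff's current law), with no adder circuit anywhere. A minimal sketch with made-up parameter values:

```python
import numpy as np

# Leaky integrator: C dV/dt = -V/R + sum of input currents.
# The "addition" of inputs happens for free, by charge conservation
# at the membrane node, rather than in a transistor adder circuit.
C, R, dt = 1.0, 1.0, 0.01                   # illustrative units, not biological constants
V = 0.0
inputs = np.array([0.2, -0.1, 0.5, 0.3])    # synaptic input currents

for _ in range(1000):
    I_total = inputs.sum()                  # the node sums the currents
    V += dt * (-V / R + I_total) / C

print(V)   # settles near R * I_total = 0.9
```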
So to really build more energy-efficient AI, we need to rethink our entire technology stack, from electrons to algorithms, and better match computational dynamics to physical dynamics.
For example, what are the fundamental limits on the speed and accuracy of any given computation, given an energy budget? And what kinds of electrochemical computers can achieve these fundamental limits? We recently solved this problem for the computation of sensing, which is something that every neuron has to do. We were able to find fundamental lower limits on the error as a function of the energy budget. That's that red curve. And we were able to find the chemical computers that achieve these limits. And remarkably, they looked a lot like G-protein coupled receptors, which every neuron uses to sense external signals. So this suggests that biology can achieve levels of efficiency close to the fundamental limits set by the laws of physics itself.
Popping up a level, neuroscience now gives us the ability to measure not only neural activity, but also energy consumption across, for example, the entire brain of the fly. The energy consumption is measured through ATP usage, which is the chemical fuel that powers all neurons. So now let me ask you a question. Let's say in a certain brain region, neural activity goes up. Does the ATP go up or down? A natural guess would be that the ATP goes down, because neural activity costs energy, so it's got to consume the fuel.
We found the exact opposite. When neural activity goes up, ATP goes up and it stays elevated just long enough to power expected future neural activity. This suggests that the brain follows a predictive energy allocation principle, where it can predict how much energy is needed, where and when, and it delivers just the right amount of energy at just the right location, for just the right amount of time.
So clearly, we have a lot to learn from physics, neuroscience and evolution about building more energy-efficient AI. But we don't need to be limited by evolution. We can go beyond evolution, to co-opt the neural algorithms discovered by evolution, but implement them in quantum hardware that evolution could never figure out.
For example, we can replace neurons with atoms. The different firing states of neurons correspond to the different electronic states of atoms. And we can replace synapses with photons. Just as synapses allow two neurons to communicate, photons allow two atoms to communicate through photon emission and absorption. So what can we build with this?
We can build a quantum associative memory out of atoms and photons. This is the same memory system that won John Hopfield his recent Nobel Prize in physics, but this time, it's a quantum-mechanical system built of atoms and photons, and we can analyze its performance and show that the quantum dynamics yields enhanced memory capacity, robustness and recall.
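For readers unfamiliar with Hopfield's associative memory, here is the classical version in a few lines: patterns are stored in the couplings via a Hebbian rule, and a corrupted memory is cleaned up by simple sign updates. The quantum variant described above replaces the binary neurons with atomic states and the couplings with photon-mediated interactions, and is of course not captured by this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 10                                  # neurons, stored patterns
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian storage: couplings are the sum of outer products of the patterns
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0)

# Recall: start from a corrupted memory and let the dynamics clean it up
state = patterns[0].copy()
flip = rng.choice(N, size=40, replace=False)    # corrupt 20% of the bits
state[flip] *= -1

for _ in range(10):                             # synchronous updates, for simplicity
    state = np.sign(W @ state)
    state[state == 0] = 1

print("overlap with stored pattern:", (state @ patterns[0]) / N)   # close to 1.0
```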
We can also build new types of quantum optimizers built directly out of photons, and we can analyze their energy landscape and explain how they solve optimization problems in fundamentally new ways.
This marriage between neural algorithms and quantum hardware opens up an entirely new field, which I like to call quantum neuromorphic computing.
OK, but let's return to the brain, where explainable AI can help us understand how it works. So now, AI allows us to build incredibly accurate but complicated models of the brain. So where is this all going? Are we simply replacing something we don't understand, the brain, with something else we don't understand, our complex model of it? As scientists, we'd like to have a conceptual understanding of how the brain works, not just have a model handed to us.
So basically, I'd like to give you an example of our work on explainable AI, applied to the retina. So the retina is a multilayered circuit of photoreceptors going to hidden neurons, going to output neurons. So how does it work? Well, we recently built the world's most accurate model of the retina. It could reproduce two decades of experiments on the retina. So this is fantastic.
We have a digital twin of the retina. But how does the twin work? Why is it designed the way it is? To make these questions concrete, I'd like to discuss just one of the two decades of experiments that I mentioned. And we're going to do this experiment on you right now.
I'd like you to focus on my hand, and I'd like you to track it.
OK, great. Let's do that just one more time.
OK. You might have been slightly surprised when my hand reversed direction. And you should be surprised, because my hand just violated Newton's first law of motion, which states that objects that are in motion tend to remain in motion.
So where in your brain is a violation of Newton's first law first detected? The answer is remarkable. It's in your retina. There are neurons in your retina that will fire if and only if Newton's first law is violated. So does our model do that? Yes, it does. It reproduces it.
But now, there's a puzzle. How does the model do it? Well, we developed methods, explainable AI methods, to take any given stimulus that causes a neuron to fire, and we carve out the essential subcircuit responsible for that firing, and we explain how it works. We were able to do this not only for Newton's first law violations, but for the two decades of experiments that our model reproduced. And so this one model reproduces two decades' worth of neuroscience and also makes some new predictions.
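As a cartoon of what "carving out the essential subcircuit" can mean, here is a generic attribution-and-pruning sketch, not necessarily the method used in the retina work: for a given stimulus, rank the hidden units by their contribution to the output neuron and keep the smallest set that preserves its response.

```python
import numpy as np

def essential_subcircuit(W1, W2, x, tol=0.05):
    """For stimulus x, find a small set of hidden units whose contributions
    alone nearly reproduce the output neuron's response.
    Generic illustration of subcircuit attribution, not the talk's method."""
    h = np.maximum(W1 @ x, 0)                  # hidden activations (ReLU)
    contrib = W2 * h                           # per-unit contribution to the output
    full = contrib.sum()                       # full response of the output neuron
    order = np.argsort(-np.abs(contrib))       # most important units first
    keep = []
    for j in order:
        keep.append(int(j))
        if abs(contrib[keep].sum() - full) <= tol * abs(full):
            break
    return keep, full

# Tiny random model standing in for a trained retina model
rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((50, 20)), rng.standard_normal(50)
x = rng.standard_normal(20)
units, response = essential_subcircuit(W1, W2, x)
print(f"{len(units)} of 50 hidden units account for the response {response:.2f}")
```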
This opens up a new pathway to accelerating neuroscience discovery using AI. Basically, build digital twins of the brain, and then use explainable AI to understand how they work. We're actually engaged in a big effort at Stanford to build a digital twin of the entire primate visual system and explain how it works.
But we can go beyond that and use our digital twins to meld minds and machines, by allowing bidirectional communication between them.
So imagine a scenario where you have a brain, you record from it, you build a digital twin. Then you use control theory to learn neural activity patterns that you can write directly into the digital twin to control it. Then, you take those same neural activity patterns and you write them into the brain to control the brain.
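In the simplest possible caricature of that loop (a linear digital twin, which is my own simplification, not the actual models used): fit a map from recorded activity to percept, then invert it to find an activity pattern that would produce a desired percept in the twin; that pattern is what you would then try to write back into the brain.

```python
import numpy as np

rng = np.random.default_rng(2)

# Step 1: "digital twin" -- fit a linear readout from recorded activity to percept.
# (Real twins are deep networks; a linear map keeps the sketch self-contained.)
activity = rng.standard_normal((500, 100))           # 500 trials x 100 neurons
W_true = rng.standard_normal((5, 100))                # unknown brain mapping
percepts = activity @ W_true.T                        # 5-dimensional percept code
W_twin, *_ = np.linalg.lstsq(activity, percepts, rcond=None)
W_twin = W_twin.T                                     # twin's activity -> percept map

# Step 2: control -- find an activity pattern whose predicted percept hits a target.
target = rng.standard_normal(5)
pattern = np.linalg.pinv(W_twin) @ target             # minimum-norm activity pattern

# Step 3: this is the pattern you would attempt to write back into the brain.
print(np.allclose(W_twin @ pattern, target, atol=1e-6))   # twin predicts the target percept
```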
In essence, we can learn the language of the brain, and then speak directly back to it. So we recently carried out this program in mice, where we could use AI to read the mind of a mouse.
So on the top row, you're seeing images that we actually showed to the mouse, and in the bottom row, you're seeing images that we decoded from the brain of the mouse. Our decoded images are lower-resolution than the actual images, but not because our decoders are bad. It's because mouse visual resolution is bad. So actually, the decoded images show you what the world would actually look like if you were a mouse.
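A minimal sketch of what this kind of image decoding can look like, using plain ridge regression from neural activity to pixels; the real decoders are far more sophisticated, and every size and signal here is invented for illustration. Note that with fewer neurons than pixels, the reconstruction is necessarily lossy, echoing the resolution point above.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented sizes: 1,000 image presentations, 32x32 images, 500 recorded neurons.
n_trials, n_pixels, n_neurons = 1000, 32 * 32, 500
images = rng.standard_normal((n_trials, n_pixels))
encode = rng.standard_normal((n_pixels, n_neurons)) / np.sqrt(n_pixels)   # toy stand-in for the brain's encoding
responses = images @ encode + 0.1 * rng.standard_normal((n_trials, n_neurons))

# Ridge-regression decoder: predict pixels from neural responses.
lam = 10.0
A = responses.T @ responses + lam * np.eye(n_neurons)
B = np.linalg.solve(A, responses.T @ images)
decoded = responses @ B

err = np.mean((decoded - images) ** 2) / np.mean(images ** 2)
print(f"relative reconstruction error: {err:.2f}")   # imperfect: 500 neurons can't carry 1,024 pixels
```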
Now, we can go beyond that. We can now write neural activity patterns into the mouse's brain, so we can make it hallucinate any particular percept we would like it to hallucinate. And we got so good at this that we could make it reliably hallucinate a percept by controlling only 20 neurons in the mouse's brain, by figuring out the right 20 neurons to control. So essentially, we can control what the mouse sees directly, by writing to its brain.
The possibilities of bidirectional communication between brains and machines are limitless. To understand, to cure and to augment the brain.
So I hope you'll see that the pursuit of a unified science of intelligence that spans brains and machines can both help us better understand biological intelligence and help us create more efficient, explainable and powerful artificial intelligence.
But it's important that this pursuit be done out in the open so the science can be shared with the world, and it must be done with a very long time horizon. This makes academia the perfect place to pursue a science of intelligence.
In academia, we're free from the tyranny of quarterly earnings reports. We're free from the censorship of corporate legal departments. We can be far more interdisciplinary than any one company. And our very mission is to share what we learn with the world. For all these reasons, we're actually building a new center for the science of intelligence at Stanford.
While there have been incredible advances in industry on the engineering of intelligence, now increasingly happening behind closed doors, I'm very excited about what the science of intelligence can achieve out in the open.
You know, in the last century, one of the greatest intellectual adventures lay in humanity peering outwards into the universe to understand it, from quarks to cosmology. I think one of the greatest intellectual adventures of this century will lie in humanity peering inwards, both into ourselves and into the AIs that we create, in order to develop a deeper, new scientific understanding of intelligence.
Thank you.
(Applause)