Agenda
Narrowing the problem
The world is systems
Data flows between systems
Systems transform inputs to outputs
Intelligence is a measure of prediction accuracy
Intelligence is a function of environmental complexity
Practical applications
Tip: Some of you may find it helpful to read the Practical Applications section first.
Narrowing the Problem
What is Intelligence?
The simple definition is cognitive capability, but it’s not a very helpful definition. Capability to do what? What does it mean for something to be “cognitive”?
Intelligence is the ability to…
Think abstractly
Reason
Acquire knowledge
Again, these are all true, but they don’t really bring us any closer to an understanding of intelligence based on first principles. We’d have to define equally difficult terms like “thinking”, “abstraction”, “reasoning”, and “knowledge”.
In fact, there is an entire branch of philosophy devoted to the definition of knowledge, called epistemology. Unfortunately, we’re not going to make much progress in epistemology in a Substack post.
The World Is Systems
The following are more concrete (although still not my final definition). Intelligence is the ability to…
Adapt to novel situations and solve problems
Benefit from instruction and experience
Act purposefully
These definitions emphasize that intelligence is about a system’s ability to interact with its environment, including other systems. These systems can be biological, like people and animals. They can be composed of groups of organisms, like an ecosystem. They can also be artificial, like the internet, a social network, or a computer. There is a flow of data between systems, and intelligent systems can use that data to act effectively within their environment.
You can see why this definition is more interesting for robotics and AI. We don’t create AI systems simply because we want to create something like a human. We have easier ways of creating more humans. We build robots as tools to help us act in the environment. We want to give them “intelligence” so that they can deal with obstacles in the environment without our help and attention.
In this Substack, I talk a lot about giving AI agency because that’s the route to more useful, beneficial AI. We need our robots and internet bots to be able to roam their environment and perform useful work.
Data Flows Between Systems
The world is a collection of systems: galaxies, solar systems, planets, ecosystems, individual organisms, and societies. There is data flowing between these systems. Distant galaxies exchange radiation, matter, and gravitational effects. Ecosystems exchange nutrients, water, and even organisms.
I would argue that humans operate within the most complicated environment of all. We interact with systems smaller than a single atom and systems larger than the planet we inhabit. We've used our intelligence to turn atomic forces, observable on the smallest scale, into catastrophic weapons, destructive on a planetary scale. We live within complex social, economic, and political systems, all of which exist to effect change within our world.
Humans take in all this input through our senses, and we interact with our environment through our choices and actions. Our intelligence is our ability to use all this data effectively to alter the environment to serve our own purposes.
Systems Transform Inputs into Outputs
We’ve established that the world is full of a variety of systems, all nested within each other. There is data flowing between these systems, meaning each system has inputs and outputs. We know that the outputs are based on the inputs. We make our choices based on our observations and experiences.
The main question I want to address in this post is—how do intelligent systems transform their inputs into outputs?
As I discussed in Artificial Neural Networks: They Think Like Us, networks are systems that transform input signals into output signals, but networks can be trained to do a lot of different transformations.
When I worked on speech recognition systems, we trained neural networks that accepted digitized audio and transcribed it into words.
Computer vision neural networks transform images into categories of objects. For example, an autonomous vehicle would need to identify pedestrians and road signs in images.
Neural networks can detect email spam, labeling emails as “spam” or “not spam”.
All three of these examples are classification tasks.
Classify audio data into words
Classify image data into classes of objects
Classify emails into spam labels
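To make this concrete, here is a minimal sketch of a classification network in PyTorch. The architecture and dimensions are illustrative stand-ins, not taken from any production system:

```python
import torch
import torch.nn as nn

# A minimal classifier: a fixed-size feature vector in, one score per class out.
# Dimensions here are illustrative; real speech or vision models are far larger.
class TinyClassifier(nn.Module):
    def __init__(self, input_dim=128, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),  # e.g. 2 classes: "spam" / "not spam"
        )

    def forward(self, x):
        return self.net(x)

model = TinyClassifier()
features = torch.randn(1, 128)    # stand-in for audio/image/email features
scores = model(features)          # one score per class
predicted = scores.argmax(dim=1)  # pick the highest-scoring class
```

Whatever the domain, the shape of the task is the same: a fixed input representation goes in, and a score for each of a fixed set of classes comes out.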
Neural networks can perform classification tasks like these with a high degree of accuracy, but there are a few big problems.
First Problem: Classification networks need to be trained on lots of examples. When training a speech recognition network, we feed it examples of speech audio, along with the text contained in the audio. For this, you need a large dataset of examples. Depending on the task, it can be expensive or even impossible to create an adequate dataset.
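As a rough illustration of what that training loop looks like (random tensors stand in for a real labeled dataset here):

```python
import torch
import torch.nn as nn

# Supervised training needs many (input, label) pairs. Random tensors stand in
# for a real labeled dataset; real tasks need vastly more, and better, data.
model = nn.Linear(128, 10)              # toy stand-in for a classification network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(1000, 128)         # 1,000 "labeled examples"
labels = torch.randint(0, 10, (1000,))  # one of 10 classes per example

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)  # how wrong is the network on the labels?
    loss.backward()
    optimizer.step()                       # nudge weights toward the labeled answers
```

Every one of those labeled pairs has to come from somewhere, and for a task like speech recognition, "somewhere" means hours of human transcription.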
Second Problem: Classification involves a finite set of classes. For example, a classification network can be trained to classify text based on sentiment (angry, happy, envious, etc.), but how do we choose the list of all possible sentiments?
Third Problem: Classification is not general. Even if we could train a network to classify sentiment perfectly, that's just one tiny piece of the puzzle. Humans don't hear someone speak and then run sentiment classification, classify the parts of speech, and generate a tree of all possible meanings. We just "understand", and that understanding cannot be broken down into a sequence of classification tasks.
Intelligence Is a Measure of Prediction Accuracy
Humans do not learn by classification. Sure, there is some teaching, but the human brain is not solely dependent on other humans to teach it. There is a natural process of learning, more fundamental than education, that occurs in every experience.
Neuroscientist Karl Friston proposes that we actually learn simply by trying to predict our own sensory inputs accurately. Biological systems have evolved to seek equilibrium: we stay alive by maintaining a steady internal state. It turns out that to maintain a steady internal state, you also need to maintain a steady external state. In other words, you need to be able to predict what's going to happen in your environment. Friston calls this "minimizing surprise".
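In information-theoretic terms (my shorthand here, not Friston's full free-energy formalism), the surprise of a sensory observation $o$ under an internal model $m$ is its negative log probability:

$$\text{surprise}(o) = -\log p(o \mid m)$$

Observations your model considers likely carry low surprise; observations it considers nearly impossible carry high surprise. Learning and acting to minimize this quantity means making the world match your model's predictions.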
But we’re not just passive observers. We also have the ability to act to shape our environment. This gives us extra power to improve our prediction accuracy because we can put ourselves in positions that make it easy to predict what’s going to happen.
You might think this would lead to extremely safe behavior in all circumstances, but Friston points out that this is not necessarily the case. Leaving a venomous snake to roam freely outside your house is more likely to lead to surprises than hunting the snake down. There is a balance between traveling safe paths and exploring new ones. In the field of reinforcement learning, this is known as the exploration vs. exploitation trade-off.
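A textbook way to make that balance concrete is the epsilon-greedy strategy from reinforcement learning (this is the standard illustration, not something specific to Friston's framework):

```python
import random

def epsilon_greedy(action_values, epsilon=0.1):
    """Mostly exploit the best-known action; occasionally explore a random one."""
    if random.random() < epsilon:
        return random.randrange(len(action_values))  # explore a new path
    # Exploit: travel the safe, well-predicted path.
    return max(range(len(action_values)), key=lambda a: action_values[a])

# Estimated value of three options, e.g. ignore the snake / avoid the yard / remove it
values = [0.2, 0.5, 0.9]
action = epsilon_greedy(values)
```

Most of the time, the agent exploits what it already predicts well; occasionally it explores and risks a surprise in the hope of finding a better path.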
This has an interesting consequence. If learning is a process of making better and better predictions about the environment, then intelligence is a measure of how good you are at predicting and acting to make your predictions come true.
This serves as the most concrete definition of intelligence that I’ve found.
Human intelligence comes from our ability to predict our environment accurately (including ourselves as part of our environment). Our large brains play a big role in this, but just as important is our ability to sense our environment and act within it. We use tools both to sense the environment more accurately and to make larger changes to the environment.
Intelligence Is a Function of Environmental Complexity
This also means that intelligence is limited by the complexity of the environment. A fish’s environment just isn’t that complex compared to the hierarchy of systems that humans inhabit. I think the process runs something like this:
Complex environment develops.
The “intelligent” organisms that survive are the ones that are able to sense the most from the environment, act the most within the environment, and predict the results of their actions the most accurately. (big brains + tools)
The intelligent systems are able to create more complex environments while maintaining prediction accuracy, driving higher levels of intelligence.
Humans have made the world more complex with technology, and we’ve had to become smarter to handle it. A perfect example is the development of artificial intelligence. A world with AI is a lot more difficult to navigate, but AI provides dramatically higher potential for agency within the environment. It’s the ultimate tool—a tool that works on its own to shape the environment how we want it.
Practical Applications
As it turns out, transformer-based large language models like the ones powering ChatGPT learn using a similar principle. Rather than classifying the environment based on labeled examples, GPTs simply learn to predict their next input: the next word in a sentence.
You can think of GPT-4 as an intelligence that is trained in the “environment of internet text”, where it senses words and learns to minimize its surprise when observing the next word.
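Concretely, the training objective is just cross-entropy on the next token, which is exactly the "surprise" quantity from earlier. This is a toy sketch with random stand-in values, not a real training setup:

```python
import torch
import torch.nn.functional as F

# Next-token prediction: the model's only job is to assign high probability to
# the token that actually comes next. All values here are toy stand-ins.
vocab_size = 50_000
logits = torch.randn(1, vocab_size)       # model's scores over the whole vocabulary
actual_next_token = torch.tensor([1234])  # the token that really appeared

loss = F.cross_entropy(logits, actual_next_token)
# cross_entropy = -log p(actual token), i.e. the model's surprise at its own input
```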
Based on the preceding sections, we can form the following hypotheses:
GPT intelligence will scale in proportion to its predictive capabilities. Larger models provide more prediction power. I discuss this in depth in my article OpenAI Does Not Need a Breakthrough.
Intelligence is a function of environmental complexity, so GPT models (or other transformer models) will be more intelligent if they can be trained in more complex environments. E.g. multi-modal training that includes images, video, sounds, etc.
Intelligence grows with a system’s ability to act within its environment. Greater agency leads to a more favorable environment, which leads to better prediction accuracy. Another good reason to keep reading Agentic AI.
I’m interested in applying the transformer model architecture to other environments besides internet text, so I’ve started experimenting with Minecraft bots.
There are already a lot of tools for creating simple rule-driven bots. My idea is to create rule-driven bots that can act in a Minecraft environment in order to collect data to train a transformer.
Essentially, this would be a way to generate synthetic data to bootstrap the transformer, which could then act in the environment itself and produce higher-quality synthetic data. As I discussed in previous sections, intelligent agents can increase the complexity of the environment and produce higher and higher levels of intelligence.
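As a rough sketch of the bootstrap loop I have in mind (the environment and bot classes here are hypothetical placeholders, not a real API; an actual bot would sit on top of a library like Mineflayer):

```python
import random

# Hypothetical sketch: `MinecraftEnv` and `RuleDrivenBot` are placeholders.
class MinecraftEnv:
    def reset(self):
        return {"block_ahead": "dirt"}

    def step(self, action):
        # A real environment would apply the action; here we fake the next observation.
        return {"block_ahead": random.choice(["dirt", "stone", "tree"])}

class RuleDrivenBot:
    def act(self, observation):
        # Simple hand-written rule: chop trees, otherwise keep walking.
        return "chop" if observation["block_ahead"] == "tree" else "walk_forward"

# Collect (observation, action) pairs as synthetic training data for a transformer.
env, bot, dataset = MinecraftEnv(), RuleDrivenBot(), []
observation = env.reset()
for _ in range(10_000):
    action = bot.act(observation)
    dataset.append((observation, action))
    observation = env.step(action)
```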
If this method is correct, it could be used to train robots in the real world that are capable of artificial general intelligence (AGI). Training robots would take a lot more compute power than any of us have access to, but we may still be able to prove out the method in simulation.