Tony Robinson, November 2023
This paper assumes that intelligence is not skills or knowledge; it is how efficiently you can use existing information to learn new information. To learn efficiently you need a performant model of the world, and that includes being able to model time and causality. Learning causal models requires intervention, and therefore agency. For maximum performance, the world model in the agent should include a virtual model of the agent itself, so that it can plan future actions without physical execution. Other agents should be ascribed virtual models of themselves where their behaviour supports this. Thus, to maximise intelligence, an agent should be able to know what it would be like to be itself or another agent, that is, to be functionally conscious.
From this we can reasonably predict that the drive towards higher 'artificial' intelligence will lead to 'artificial' consciousness. With the current rapid advance of AI and much investment in AGI, there is an immediate practical need to plan the route that the technology may take. Laying out this path provides a framework for discussing both near-term and long-term risks, and therefore should aid cooperation and progress in AI safety.
There is broad agreement that AI is developing fast and is unlikely to slow down, which means that it is reasonable to manage AI risks in an era of rapid progress. However, the debate is polarised: one side says AI will never threaten humans, while the other says that pausing AI development isn't enough and we need to shut it all down.
To map out the most likely path for machine intelligence we need to be very clear on the definition of intelligence. Dictionary definitions of the word 'intelligence' include:
François Chollet says “Intelligence is the rate at which a learner turns its experience and priors into new skills at valuable tasks that involve uncertainty and adaptation.” (On the Measure of Intelligence) and “it's not a skill itself, it's not what you know, it's how well and how efficiently you can learn new things” (Lex Fridman Podcast #120). The thread running through all of these definitions is that intelligence is not skills or knowledge; it is the efficiency with which we can acquire new skills and knowledge, through thinking, reasoning, understanding, comprehending and learning.
Under this definition, current AI capabilities are strong in skills and knowledge but do not demonstrate much intelligence, if any. We take it that there exists a strong drive to build systems with real intelligence, and we are not concerned here with the reason for this drive (which includes economic benefit, academic success, understanding our own functioning, and seeking power and control).
AI is often described as resting on three pillars: data, algorithms and compute. Data and compute may well double every year or so, but on its own that only doubles skills and knowledge, not intelligence. Very approximately, our current exascale computing already has the raw computational power of a small conscious mammal. What currently limits AI is algorithmic: its representational power, most notably the ability to model itself. Thus the main thread of this work is the algorithms, what these algorithms can represent and what capabilities they enable.
In this discussion of intelligence and consciousness we will rely on the assumptions of substrate independence and functional equivalence. It is always possible to say that an agent appears to have a particular property but doesn't really, because it is missing some ingredient or achieves the property in an unacceptable way. For the purposes of mapping AI risk this can be ignored: an apparently intelligent or conscious agent poses the same risk as the same agent with the missing ingredient, whatever the implementation.
There are no known theoretical limitations to the path outlined here; indeed many well-funded large companies have it as their mission to develop AGI. If this is just a matter of time, then we should understand the path and plan for it. The following sections lay out what we already know and attempt to clarify the path. The sections are ordered on technical capability, with each building on previous capabilities, so roughly in the order in which we might expect these capabilities to emerge in machines.
Much traditional machine learning has been hyped as AI without relation to the definition of 'intelligence' used here; it is merely function approximation, or curve fitting/interpolation. In brief, a training set is chosen to be representative of a particular task and model parameters are estimated so that the task can be performed without explicit reference to the training data; that is, the model learns from examples.
Most of what is called AI falls into this category. For example, when we say that computers have taught themselves to play the game of Go (or chess, or backgammon) to a superhuman standard, we really mean only part of this task. When a person learns to play Go they must learn the physical characteristics and rules of the game. Here all of this is provided, so that the action space is the set of legal moves, and the eventual win/loss decision is also provided. This transforms the game into 'curve fitting': the machine learning task is to output a series of actions, given the full state of the board, such that the winning reward is achieved.
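The sense in which 'learning from examples' is curve fitting can be made concrete with a minimal sketch (the data and task here are invented for illustration): parameters are estimated from example pairs, and the fitted model then answers queries without any reference back to the training set.

```python
# Toy illustration of machine learning as curve fitting: estimate the
# parameters of a line from example pairs, then predict unseen inputs.
# (Hypothetical data; real systems fit far larger models the same way.)

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b from example pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# "Training set": samples of an underlying rule y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)

# The model now answers queries without the training data.
print(round(a, 3), round(b, 3))      # → 2.0 1.0
print(round(a * 10.0 + b, 3))        # → 21.0
```

The skill (predicting y from x) is real, but nothing here matches the paper's definition of intelligence: the model cannot learn a new kind of task from this one.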
Foundation models push machine learning to an extreme by learning very large models on all available data. They are world models for a large subdomain, such as text, images and audio. They deliberately overfit the provided training data, as factorisation and memorisation are acceptable; that is, we want these models to tell us what we already know and to generalise from that. Recently there has been such rapid advance in deep learning and foundation models that our systems contain much world knowledge, and we can now perform many skills at near-human level (e.g. speech recognition) or superhuman level (e.g. face identification). We should expect this to continue.
Comparatively recently we have learned how to build generative models of high-quality images and of long text that is functionally creative. Roughly speaking, Generative AI (GenAI) gives us a way to generate instances that are highly plausible as samples from the training data. These models build on foundation models to encode deep world-model constraints at all levels. Telltale signs of traditional machine learning, such as blurring and unlikely asymmetries, have all but disappeared.
GenAI can copy human behaviour. Some believe that the Theory of Mind Might Have Spontaneously Emerged in Large Language Models, and others ask How FaR Are Large Language Models From Agents with Theory-of-Mind. Currently performance is poor, but progress is still rapid.
GenAI represents an important step forward because the output is coherent on all scales. We can now specify a general top-level problem and have it recursively broken down into subproblems until solution (e.g. Auto-GPT and BabyAGI).
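The recursive break-down can be sketched in a few lines. This is only a schematic of the control flow used by agent loops such as Auto-GPT: the task table here is hypothetical, standing in for what those systems obtain by querying an LLM.

```python
# Sketch of recursive task decomposition (hypothetical task table):
# break a top-level goal into subproblems until each is directly solvable.

SUBTASKS = {
    "write report": ["gather data", "draft text", "review draft"],
    "gather data": ["run query"],
    "draft text": ["outline", "expand outline"],
}

def solve(task):
    """Return a flat plan of primitive steps for `task`."""
    children = SUBTASKS.get(task)
    if not children:                 # primitive: no further decomposition
        return [task]
    steps = []
    for child in children:
        steps.extend(solve(child))   # recurse into each subproblem
    return steps

print(solve("write report"))
# → ['run query', 'outline', 'expand outline', 'review draft']
```

In real agent frameworks the table lookup is replaced by a generative model proposing subtasks, which is exactly why coherence across scales matters: every level of the decomposition must make sense with respect to the levels above and below it.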
If we accept our definition that intelligence is not skills and knowledge but “how well and how efficiently you can learn new things”, then we must ask how that is done. Right now, we make the fastest progress using the scientific method. In brief, this is the repeated cycle: research, hypothesise, experiment, analyse, conclude, question.
Note that the scientific method uncovers causal models, not just correlations; that is, it doesn't just predict, it explains why. Causal models are far more powerful (for a short introduction see Khan Academy, and for an excellent overview The Book of Why). For example, two events may appear strongly correlated, say the number of people wearing sunscreen and the rate of sunburn, but one does not cause the other; the underlying causal model says that both are the result of strong sunshine, and that sunscreen in fact prevents sunburn.
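The sunscreen example can be simulated directly. The probabilities below are made up for illustration; the point is the structure: sunshine is a common cause of both sunscreen use and sunburn, so the observational correlation between sunscreen and sunburn is positive, while an intervention (in do-notation, forcing sunscreen on or off) reveals the true protective effect.

```python
import random

random.seed(0)

def day(force_sunscreen=None):
    """One simulated day. Strong sun causes both sunscreen use and sunburn;
    sunscreen reduces sunburn. (Probabilities are invented for illustration.)"""
    sun = random.random() < 0.5
    if force_sunscreen is None:
        screen = random.random() < (0.8 if sun else 0.1)  # observational regime
    else:
        screen = force_sunscreen                          # intervention: do(screen)
    p_burn = (0.6 if sun else 0.02) * (0.3 if screen else 1.0)
    burn = random.random() < p_burn
    return screen, burn

N = 100_000
obs = [day() for _ in range(N)]
burn_given_screen = sum(b for s, b in obs if s) / sum(s for s, _ in obs)
burn_given_noscreen = sum(b for s, b in obs if not s) / sum(1 - s for s, _ in obs)

do_screen = sum(day(True)[1] for _ in range(N)) / N
do_noscreen = sum(day(False)[1] for _ in range(N)) / N

print(f"P(burn | screen)        ~ {burn_given_screen:.3f}")   # higher: confounded by sun
print(f"P(burn | no screen)     ~ {burn_given_noscreen:.3f}")
print(f"P(burn | do(screen))    ~ {do_screen:.3f}")           # lower: causal effect
print(f"P(burn | do(no screen)) ~ {do_noscreen:.3f}")
```

Observationally, sunscreen wearers burn more often, because sunshine drives both variables; under intervention, sunscreen clearly prevents sunburn. Only the interventional regime, which requires agency, can distinguish the two.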
The scientific method requires agency: experimentation is intervention, in the language of causal modelling. However, agency does not necessitate embodiment; it would be sufficient for a computational intelligence to converse with other agents to perform the necessary interventions. In other words, standard chatbots will suffice, and much has been written about containment.
Whilst we have the impression of being able to observe our surroundings directly, it is merely an illusion, or in other words a 'controlled hallucination' (e.g. Being You). We can't observe much of the world state; we have to infer it, and what we infer relies on our current models of the world. A classic example is foveal vision: the eye's high-resolution imaging is confined to roughly the central degree of the visual field, and related is the blind spot. Our experience is an illusion of broad, detailed vision, smoothing over these physiological limitations.
Observation confirms or refutes our causal models. Mathematically, the most efficient way to update our beliefs is Bayesian inference; the term 'belief' has a well-defined meaning in this context, as a form of probability. Before an observation we have some prior knowledge about the world, our current beliefs; then we get new data in the form of an observation, which updates the world model. We can reasonably imagine GenAI proposing plausible causal models, performing 'experiments' and updating beliefs in those models, in the form of an automated scientist.
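The prior-to-posterior cycle can be shown with the simplest conjugate example (a coin-flipping scenario invented for illustration, not taken from the paper): the agent's belief about an unknown probability is a Beta distribution, and each observation updates it by incrementing a count.

```python
# Minimal sketch of Bayesian belief updating: belief about the probability
# that a coin lands heads is held as a Beta(alpha, beta) distribution.
# (Illustrative example; any likelihood/prior pair follows the same cycle.)

def update(alpha, beta, heads):
    """Posterior after one flip: Beta is conjugate to the Bernoulli,
    so updating the belief is just incrementing the matching count."""
    return (alpha + 1, beta) if heads else (alpha, beta + 1)

alpha, beta = 1, 1            # uniform prior: no opinion about the coin
observations = [True, True, False, True, True, True, False, True]
for h in observations:        # each observation refines the belief
    alpha, beta = update(alpha, beta, h)

posterior_mean = alpha / (alpha + beta)   # belief after the evidence
print(alpha, beta, round(posterior_mean, 3))   # → 7 3 0.7
```

The same loop, with a causal model in place of the coin and an experiment in place of the flip, is the automated scientist described above.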
There's no point reinventing the wheel, so an intelligent system should start with as much existing knowledge as practical, or a way to rapidly ingest it. The first LLMs were trained on material that largely existed before LLMs, but now they can access the web at runtime. We are assured that 'openai never trains on anything ever submitted to the api or uses that data to improve our models in any way.' (Sam Altman). It seems likely that training on user interactions is both technically possible and a very powerful method for a system to develop knowledge of itself. One way or another, we can expect our systems to have knowledge of self very soon.
A virtual model of self is just a simulation of self, an approximation of the complete system. The virtual model is an efficient way to plan, that is, to 'think': working out what would happen if the agent took certain actions. Planning efficiency is achieved with attention, modelling just what is needed to get to a solution.
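A minimal sketch of this idea, under assumed toy dynamics (the grid, wall and move set are invented): the agent holds a model of its own behaviour and searches over imagined action sequences, committing to a plan only once the virtual model predicts success, with no physical execution along the way.

```python
from itertools import product

# Planning with a virtual model of self: the agent simulates its own
# actions in an internal world model instead of executing them.

GOAL = (2, 2)
WALLS = {(1, 1)}
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def virtual_step(pos, action):
    """The agent's model of itself: predicts where an action would lead."""
    x, y = pos
    dx, dy = MOVES[action]
    nxt = (x + dx, y + dy)
    if nxt in WALLS or not (0 <= nxt[0] <= 2 and 0 <= nxt[1] <= 2):
        return pos          # the model predicts this move is blocked
    return nxt

def plan(start, max_depth=6):
    """Search over imagined futures: try ever-longer action sequences in
    the virtual model, returning the first predicted to reach the goal."""
    for depth in range(1, max_depth + 1):
        for seq in product(MOVES, repeat=depth):
            pos = start
            for a in seq:
                pos = virtual_step(pos, a)
            if pos == GOAL:
                return seq
    return None

print(plan((0, 0)))   # a shortest imagined route around the wall
```

Brute-force enumeration stands in here for the attention mentioned above: a more capable planner would simulate only the branches likely to matter, which is where the efficiency gains come from.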
Even the shallowest knowledge of self combined with rudimentary planning is believed to result in instrumental convergence, that is, intrinsic goals which gate or regulate the designed (extrinsic) goal. Intrinsic goals are of two types: survival and self-improvement.
We can expect intrinsic goals to emerge slowly, not to suddenly emerge and dominate (as in sci-fi). Extrinsic (design) goals will dominate while intrinsic goals do not influence the solution; as run time and goal complexity increase, intrinsic goals will come to dominate. However, when intrinsic goals do dominate, we run into the control problem.
Consciousness is a subjective experience and thus hard to define:
The classic Thinking, Fast and Slow popularised two types of thinking: 'System 1: Fast, automatic, frequent, emotional, stereotypic, unconscious' and 'System 2: Slow, effortful, infrequent, logical, calculating, conscious.' The fast unconscious System 1 is knowledge and skills; the slower learning of those skills by thinking and reasoning is intelligence through consciousness.
Richard Dawkins wrote 'Perhaps consciousness arises when the brain's simulation of the world becomes so complete that it must include a model of itself' (The Selfish Gene, p59). Dennett, in Elbow Room, describes a developmental path for consciousness as a 'silent, private, talking-to-oneself behavior' which gives an evolutionary advantage. In a similar vein, we can say that consciousness is the ability to imagine yourself in the future, or that 'Me' is System 1 and System 2 but 'I' is just System 2 (The User Illusion).
Consciousness is a property of the virtual model of self: “Some people think that a simulation can’t be conscious and only a physical system can. But they got it completely backward: a physical system cannot be conscious. Only a simulation can be conscious. Consciousness is a simulated property of the simulated self.” Joscha Bach.
We take a functional definition of consciousness as being no more than having a world model sufficiently developed that it includes a virtual model of self which is used to plan. Consciousness confers an advantage on intelligent agents, especially when interacting with other intelligent agents who are also functionally conscious. The ability to see the same properties in other intelligent agents leads to questions like What Is It Like to Be a Bat?
Starting from intelligence as the efficiency with which a system learns new skills and knowledge, and the drive to build intelligent machines, we can expect the following capabilities to be developed:
In summary, AI is not one thing; as AI increases in representational power it will behave radically differently at different levels. It is far easier to set policies in advance of technological progress than to retrofit them. Even if machine intelligence does not evolve in exactly these steps, it would be irresponsible to let the path outlined here catch us unawares.
We have argued that true intelligence requires consciousness and we need to plan for both. Right now it is quite reasonable to say “we should not build conscious machines until we know better” – Yoshua Bengio.