Last month (Nov 2023) Google’s AI research group, DeepMind, published an academic paper entitled “Levels of AGI: Operationalizing Progress on the Path to AGI” in which they set out to define artificial general intelligence (AGI). It’s a big step and a big call by Google, no doubt about that. But I welcome the paper; I think it’s a good one.
In this post I’m going to present a summary of what Google has released, followed by my commentary. The post is broken down into sections that correspond to those of the publication.
1. Introduction
The key quote in this section is this one:
[I]f you were to ask 100 AI experts to define what they mean by “AGI,” you would likely get 100 related but different definitions.
page 1.
That’s the problem. We’re all using terms in the field of AI without a clear consensus on what we mean by them. The purpose of this paper, then, is to start clearing up this mess by explicitly reflecting on what is meant by AGI and then attempting to express that definition in quantifiable attributes of AI systems, such as their performance, generality, and autonomy.
2. Defining AGI: Case Studies
This section is akin to a literature review. It looks at what other organisations or people have proposed as a definition for AGI. Nine case studies are examined. I’ll summarise most of them.
Case Study 1: The Turing Test
Turing’s famous “imitation game” is looked at here. The goal of the test is for a machine to fool a human into thinking they are conversing with another human being; a machine that passes the test can then be deemed to “think”, and so a thinking machine has achieved AGI.
Here is where Google takes an important step. Whether a machine can think or not is deemed a philosophical question, one that does not focus on a machine’s capabilities. And capabilities are:
much more straightforward to measure and more important for evaluating impacts. Therefore we propose that AGI should be defined in terms of capabilities rather than processes.
page 2 [emphasis mine].
So, a definition of AGI should be framed in terms of what a program can DO rather than whether a machine can think.
Case Studies 2 and 3: Systems Possessing Consciousness or Mimicking the Human Brain
Some have proposed to define AGI in terms of whether a machine can be said to understand and to have other cognitive states. However, no consensus exists on how to test for such things as consciousness. So, as with Case Study 1, Google suggests that one should steer clear of process-oriented definitions of AGI and frame the definition in terms of capabilities.
Likewise, the machine does not have to operate or process things like a human brain – capabilities (final results) are what count.
Case Study 4: Human-Level Performance on Cognitive Tasks
Some researchers have suggested that an AGI machine is one that can perform the cognitive tasks (i.e. non-physical/robotic tasks) that people can typically perform. But ambiguity exists with this approach because no consensus has been reached as to which tasks and which types of people this definition would entail.
Case Study 6: Economically Valuable Work
This section looks at how OpenAI uses the term AGI:
[AGI are] highly autonomous systems that outperform humans at most economically valuable work
OpenAI Charter, 2018.
Google’s research group likes this definition because it focuses on capabilities rather than processes. It also provides a yardstick for measurement: economic value. But the definition does not capture aspects of intelligence that aren’t directly within the scope of economic value, such as artistic creativity or emotional intelligence. The definition also fails to take into consideration machines that may have potential economic value but are not deployed in the world for various ethical, legal, or social reasons; such systems would never get to realise their economic value.
Case Studies 7 and 9: Flexible and General
Gary Marcus, a leading expert in AI, has suggested on X that AGI is:
shorthand for any intelligence (there might be many) that is flexible and general, with resourcefulness and reliability comparable to (or beyond) human intelligence.
X post, 25 May 2022 (retrieved 23 December 2023).
DeepMind also likes this definition because it captures both generality and performance. Current state-of-the-art LLMs, for example, appear to have significant generality, but their performance is lacking (they still make basic mistakes). Also noteworthy is the requirement, according to Prof. Marcus, that a machine be flexible, implying that it will need to learn and adapt in order to achieve sufficient generality.
3. Defining AGI: Six Principles
After analysing what others have proposed for a definition of AGI, Google sits down and identifies “properties and commonalities that [they] feel contribute to a clear, operationalizable definition of AGI” (pg. 4).
Here we go!
So, a definition of AGI should follow these six principles:
- Focus on Capabilities, not Processes. So, a machine does not need to think or understand or have sentience or consciousness to achieve AGI. What matters is what tasks it can and can’t perform.
- Focus on Generality and Performance. The next section will elucidate how these interplay and their varying levels.
- Focus on Cognitive and Metacognitive Tasks. There is some debate whether to include robotic embodiment in a definition of AGI. Google suggests that the ability to perform physical tasks simply increases a system’s generality and hence is not a prerequisite for AGI.
- Focus on Potential, not Deployment. The deployment of an AGI system should not be a prerequisite for calling it AGI. Demonstrating that the requisite criteria have been met (as per the next section) should suffice. This keeps the definition free of considerations, such as legal and ethical ones, that could hinder deployment.
- Focus on Ecological Validity. The tasks an AI system should be able to do to be granted AGI status should be aligned with the real world, i.e. they should be tasks that people value.
- Focus on the Path to AGI, not a Single Endpoint. Inspired by the success of adopting a standard set of Levels of Driving Automation for autonomous cars, Google suggests that we do the same for AGI. That is, they posit value in defining “Levels of AGI” rather than a single endpoint. The next section defines these levels.
4. Levels of AGI
The publication presents a table here showing the different levels of AGI in terms of performance (rows) and generality (columns). I’m going to include a simplified version of this table below. Note the different levels of AGI in the third column, starting from the row “Level 1: Emerging”.
| Performance (rows) x Generality (columns) | Narrow (clearly scoped task or set of tasks) | General (wide range of non-physical tasks) |
| --- | --- | --- |
| Level 0: No AI | Narrow Non-AI: calculator software; compiler | General Non-AI: human-in-the-loop computing, e.g. Amazon Mechanical Turk |
| Level 1: Emerging (equal to or somewhat better than an unskilled human) | Emerging Narrow AI: simple rule-based systems | Emerging AGI: ChatGPT, Bard, Llama 2 |
| Level 2: Competent (at least 50th percentile of skilled adults) | Competent Narrow AI: smart speakers such as Siri; LLMs for a subset of tasks (e.g. short essay writing, simple coding) | Competent AGI: not yet achieved |
| Level 3: Expert (at least 90th percentile of skilled adults) | Expert Narrow AI: generative image models such as Imagen or Dall-E 2 | Expert AGI: not yet achieved |
| Level 4: Virtuoso (at least 99th percentile of skilled adults) | Virtuoso Narrow AI: Deep Blue, AlphaGo | Virtuoso AGI: not yet achieved |
| Level 5: Superhuman (outperforms 100% of humans) | Superhuman Narrow AI: AlphaFold, Stockfish | Artificial Superintelligence (ASI): not yet achieved |
Hence, according to DeepMind, our latest LLMs (e.g. ChatGPT) have only achieved Emerging AGI status.
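To make the performance axis of the taxonomy a little more concrete, here is a minimal sketch in Python (my own illustration, not code from the paper) that encodes the percentile thresholds from the table above and maps a hypothetical percentile score to a performance level:

```python
from enum import Enum


class PerformanceLevel(Enum):
    """Performance axis of DeepMind's Levels of AGI taxonomy."""
    NO_AI = 0       # Level 0: No AI
    EMERGING = 1    # equal to or somewhat better than an unskilled human
    COMPETENT = 2   # at least 50th percentile of skilled adults
    EXPERT = 3      # at least 90th percentile of skilled adults
    VIRTUOSO = 4    # at least 99th percentile of skilled adults
    SUPERHUMAN = 5  # outperforms 100% of humans


def performance_level(percentile: float) -> PerformanceLevel:
    """Map a system's percentile score (vs. skilled adults) to a level.

    `percentile` is a hypothetical score in [0, 100]; the paper does not
    prescribe how such a score would actually be measured, and mapping
    "Emerging" to anything below the 50th percentile is my simplification.
    """
    if percentile >= 100:
        return PerformanceLevel.SUPERHUMAN
    if percentile >= 99:
        return PerformanceLevel.VIRTUOSO
    if percentile >= 90:
        return PerformanceLevel.EXPERT
    if percentile >= 50:
        return PerformanceLevel.COMPETENT
    return PerformanceLevel.EMERGING


print(performance_level(92))  # PerformanceLevel.EXPERT
```

Note that this only captures the performance axis; whether a system counts as Narrow AI or AGI at a given level depends on the generality axis, i.e. the breadth of tasks over which that performance holds.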
5. Testing for AGI
With respect to testing for the different levels of AGI, a number of questions need to be asked:
What is the set of tasks that constitute the generality criteria? What proportion of such tasks must an AI system master to achieve a given level of generality in our schema? Are there some tasks that must always be performed to meet the criteria for certain generality levels, such as metacognitive tasks?
page 8.
Challenging tasks and benchmarks (constantly updated) are needed to deal with these questions. The paper, however, leaves all this for future work. It wants to get the ball rolling by initially clarifying the ontology a benchmark should attempt to measure.
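As a toy illustration of the kind of open question being raised here, this is a small sketch (again my own, with an entirely made-up task list and an arbitrary 80% threshold, neither of which comes from the paper) of how a benchmark might aggregate per-task results into a verdict on a generality level:

```python
# Hypothetical aggregation rule: a system attains a given level of AGI if it
# reaches that performance level on at least `required_proportion` of the
# benchmark's tasks. The task names and the 0.8 threshold are illustrative
# assumptions, not values proposed in the paper.

def meets_level(task_levels: dict[str, int], target_level: int,
                required_proportion: float = 0.8) -> bool:
    """Return True if enough tasks are at or above the target performance level."""
    at_or_above = sum(1 for level in task_levels.values() if level >= target_level)
    return at_or_above / len(task_levels) >= required_proportion


# Per-task performance levels (0 = No AI ... 5 = Superhuman) for some system.
scores = {"summarisation": 2, "coding": 2, "maths": 1,
          "translation": 3, "planning": 1}

print(meets_level(scores, target_level=2))  # False: only 3 of 5 tasks reach level 2
```

How the task set is chosen, and whether certain tasks (e.g. metacognitive ones) must always be included, is exactly what the paper leaves open.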
6. Risk in Context: Autonomy and Human-AI Interaction
Providing an ordered framework for AGI levels will make it easier to analyse and categorise risk for AI. In this section, Google also provides a table specifying different levels of AI autonomy to further improve risk assessment.
I won’t discuss this section further as I want to focus more on the definition of AGI in this post rather than anything else that may stem from it.
7. Commentary
As I said earlier, I welcome this attempt by DeepMind to define AGI. It’s been a long time coming. Whenever the term AGI is used anywhere (e.g. in the media) nobody knows exactly what is meant by it. Some think in purely practical terms, as discussed above, but some allow their imaginations to run wild and automatically think about consciousness, understanding, and machines taking over the world. So, which is it? Currently, nobody knows! And that’s the problem.
Hopefully this paper will help the current state of affairs. Whether it will be utilised, and whether the levels of AGI will henceforth be referenced, is another question.
I also like the fact that Google has decided to ground AGI in purely practical terms: capability and generality measured against human competence. Computer science venturing into the realm of philosophy and discussing things like consciousness is muddying the waters and undoubtedly asking for trouble. There’s no need for this.
However, the waters are already muddied because we use the word “intelligence” in the context of machines – even if we precede it with the adjectives “artificial” or “artificial general”. I’ve discussed this before (“The Need for New Terminology in AI”). Intelligence is a loaded term that implies something profound in the existence of an entity that is said to be intelligent. In my last post (“AI Needs to be Unmasked”) I talked about how AI is just if-else statements executed at incredible speed. That’s all it is and there’s certainly nothing magical about it.
So, just like Google decided to steer clear of words like consciousness and understanding, perhaps the word “intelligence” should also be avoided. We’re not being precise when we use it around machines (especially when we’re focusing on capabilities rather than processes). A key indicator of this is how easily everything is classified as AI. Realistically speaking, however, the terms are here to stay, I know. But one can dream. (Can you picture, though, how the hype around AI would diminish if it was suddenly being referred to as Applied Statistics?)
In conclusion, I’m glad we have a reference point when discussing AGI. It’ll make things easier for all of us. The taxonomy presented by Google seems to me to be a good one. Let’s see where this all goes in the future.