AGI – Artificial General Intelligence: Are We There Yet, and How Will We Know?

For decades, the concept of Artificial General Intelligence (AGI) – software possessing human-level intelligence – has captivated our imaginations, often fueled by Hollywood’s dramatic portrayals. From sentient spacecraft to robot uprisings, the idea of self-aware computers with capabilities matching or exceeding human cognitive abilities across any task can seem both thrilling and terrifying. But beyond the silver screen, AGI is a serious subject for computer scientists, theorists, and philosophers, sparking intense debate about its definition, arrival, and the best path forward.

What Exactly is AGI?

While terms like “AI” are used broadly, AGI has a specific meaning, even though the technology itself remains hypothetical. IBM defines it as a stage where an AI system can match or exceed human cognitive abilities across any task. This goes beyond the capabilities of current AI systems, which are typically designed for specific tasks.

Unlike today’s powerful AI models, such as Large Language Models (LLMs) like ChatGPT or Google Gemini, which are trained on massive datasets to predict the next most likely word or response, AGI seeks deeper, self-directed comprehension. It would effectively have its own independent consciousness, capable of autonomously learning, understanding, communicating, and forming goals without constant human guidance. True AGI would need to demonstrate traits we associate with organic lifeforms, including intuitive visual and auditory perception beyond basic identification, creativity that isn’t just regurgitated data, problem-solving incorporating common sense, independent reasoning, and even empathy.
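
To make that contrast concrete, here is a toy sketch of next-word prediction, the statistical idea at the core of how LLMs generate text. Real models use neural networks over tokens rather than raw word counts, so this bigram counter is purely illustrative:

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the massive datasets LLMs are trained on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (a bigram model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the statistically most likely next word seen during 'training'."""
    candidates = next_word_counts[word]
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # 'cat' -- the most frequent continuation of 'the'
```

However sophisticated the real machinery, the output is still a continuation of observed patterns, which is exactly why critics question whether this route leads to genuine understanding.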

The Problem with Current AI and Benchmarks

The rapid advancements in LLMs have led to chatbots that can hold convincing conversations and even fool people into thinking they’re human. Some have claimed these models are already showing signs of AGI, pointing to instances like a Google engineer’s claim that the LaMDA chatbot had become sentient, or a study in which GPT-4.5 allegedly passed the Turing Test.

However, most experts view this as jumping the gun, arguing that these models have primarily mastered the game of imitation rather than developed genuine general intelligence. This brings us to a fundamental challenge: how do we benchmark for AGI?

For many years, the Turing Test was considered a solid benchmark. The idea was that if an AI could convince a human evaluator it was human, it demonstrated human-level intelligence. But the test came to be seen as broken and obsolete after a chatbot posing as a teenager “passed” it in 2014, simply by feigning limited intelligence and deflecting questions.

In the wake of the Turing Test’s decline, developers began evaluating their AI models with difficult tests written for humans, like the bar exam and the MCAT, along with benchmarks such as the MMLU, which was created specifically to measure language models’ knowledge. While these offer a more objective way to compare AIs than the Turing Test, they may not necessarily show progress toward AGI.

A major issue with these benchmarks, particularly for LLMs, is “data contamination”. LLMs are trained on vast amounts of text, often scraped from the internet, which likely includes the very questions used in these evaluation tests. A model may therefore simply be regurgitating answers memorized from its training data rather than performing genuine human-like reasoning. As AI researcher François Chollet puts it, “Memorization is useful, but intelligence is something else.” Intelligence, in his view, is what you use when you don’t know what to do: it is about learning in new circumstances, adapting, improvising, and acquiring new skills. Research has shown that changing test questions slightly, or using problems created after a model’s training cutoff date, can dramatically reduce performance.
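
As a simplified illustration of how such contamination can be screened for (a sketch with invented helper names, not any lab’s actual pipeline), one crude approach flags a benchmark question when long word sequences from it appear verbatim in the training data:

```python
def ngrams(text: str, n: int = 8) -> set:
    """All n-word sequences in a text, lowercased, used as a fingerprint."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(question: str, training_text: str, n: int = 8) -> bool:
    """Flag a benchmark question if any of its n-grams appears verbatim in
    the training data -- a hit suggests memorization could explain a correct
    answer, without proving reasoning either way."""
    return bool(ngrams(question, n) & ngrams(training_text, n))

training_text = "the bar exam asks what remedy is available when a contract fails"
question = "what remedy is available when a contract is breached"
print(is_contaminated(question, training_text, n=5))  # True -- shared 5-gram
```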

A New Path Forward? The Abstraction and Reasoning Corpus (ARC)

Recognizing the limitations of existing benchmarks, François Chollet proposed an alternative in 2019: the Abstraction and Reasoning Corpus (ARC). He describes ARC as a deceptively simple benchmark designed to measure intelligence in a way that is resistant to memorization.

ARC is similar to Raven’s Progressive Matrices, a human IQ test. It presents tasks as pairs of grids (input and output) with colored cells. Based on a pattern shown in one or two examples, the AI must predict the correct output grid for a new input grid. Each task is intended to be novel to the test-taker, making it a test of skill-acquisition efficiency.
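
The public ARC dataset on GitHub distributes each task as JSON along these lines: a few “train” demonstration pairs plus one or more “test” inputs, with grids encoded as lists of lists of small integers, each integer denoting a color. The toy task below follows that shape, with a hand-written rule standing in for what a real solver must discover on its own:

```python
# A toy ARC-style task: grids are lists of lists of integers (colors).
task = {
    "train": [  # demonstration pairs the solver can study
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [  # the solver must produce the output for this input
        {"input": [[3, 0], [0, 3]]}
    ],
}

def solve(grid):
    """A hand-written rule for this toy task: mirror each row left-right."""
    return [row[::-1] for row in grid]

# Verify the rule against the demonstrations, then apply it to the test input.
assert all(solve(p["input"]) == p["output"] for p in task["train"])
print(solve(task["test"][0]["input"]))  # [[0, 3], [3, 0]]
```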

Since its introduction, hundreds of AI developers have participated in ARC competitions. Initially, the best AIs could solve only 20% of the tasks; by June 2024, this had risen to 34%. That is still far below the 84% of tasks that most humans can solve.

The ARC Prize: Accelerating Progress and Openness

To accelerate progress toward AGI by focusing research on the right kinds of architectures, Chollet teamed up with Mike Knoop to launch the ARC Prize in June 2024. The competition offers more than $1 million in prizes for AIs that achieve the highest scores on a set of ARC tasks.

Developers can access public training and evaluation sets of ARC tasks on GitHub, and entrants must submit their code by November 10, 2024. The critical final evaluation takes place offline, on a private set of 100 tasks, to prevent leaks and ensure the AIs haven’t seen the questions before. While some teams have reached 43%, winning the $500,000 grand prize requires an AI to solve 85% of the tasks. Prize eligibility also requires developers to open source their code, an emphasis intended to ensure that breakthroughs don’t remain trade secrets within large corporations.
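
Scoring is, at heart, exact matching: a predicted output grid counts only if it reproduces the solution cell for cell. The sketch below shows how such a score might be tallied; the helper and the data are hypothetical, and the real competition rules add details such as a limited number of attempts per task:

```python
def arc_score(predictions: dict, solutions: dict) -> float:
    """Fraction of tasks whose predicted output grid matches the solution exactly."""
    solved = sum(1 for task_id, grid in solutions.items()
                 if predictions.get(task_id) == grid)
    return solved / len(solutions)

# Hypothetical results for a four-task set (the real private set holds 100 tasks).
solutions   = {"t1": [[1]], "t2": [[2]], "t3": [[3]], "t4": [[4]]}
predictions = {"t1": [[1]], "t2": [[2]], "t3": [[0]], "t4": [[4]]}

score = arc_score(predictions, solutions)
print(f"{score:.0%} of tasks solved; the grand prize requires 85%")  # 75% ...
```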

Are LLMs a Dead End for AGI? The Debate Continues

Chollet and others believe that the current focus on LLMs, which absorbed nearly half of AI funding in 2023, is not only unlikely to lead to AGI but might be actively slowing progress.

Chollet argues that companies like OpenAI have “set back progress to AGI by five to 10 years” because LLMs have “sucked the oxygen out of the room,” leading almost everyone to focus solely on them.

This view is echoed by Yann LeCun, Meta’s chief AI scientist, who called LLMs “an off-ramp, a distraction, a dead end” on the path to human-level intelligence. Even OpenAI’s CEO, Sam Altman, has reportedly stated that he doesn’t believe simply scaling up LLMs will result in AGI.

However, not everyone agrees that LLMs should be written out of the AGI story. Some innovators, like OpenAI co-founder Ilya Sutskever, suggest that LLMs are a path to AGI, viewing their predictive nature as akin to a genuine understanding of the world. Demis Hassabis, co-founder of Google DeepMind, also sees these chatbots as a component of AGI development.

Which architectures are most likely to lead to AGI remains unknown, but Chollet has noted that approaches like active inference, DSL program synthesis, and discrete program search have performed well on ARC so far. He also encourages entrants to explore novel methods, including deep learning models.
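
To give a flavor of the last of those, the sketch below performs a brute-force discrete program search over a tiny, hand-rolled DSL of grid transformations, accepting the first composition that reproduces every demonstration pair. Real entries use far richer primitive vocabularies and much smarter search, so this is only a minimal illustration:

```python
from itertools import product

# A tiny hand-rolled DSL of grid transformations (illustrative only;
# real ARC DSLs contain dozens or hundreds of primitives).
PRIMITIVES = {
    "identity":  lambda g: g,
    "flip_h":    lambda g: [row[::-1] for row in g],        # mirror left-right
    "flip_v":    lambda g: g[::-1],                         # mirror top-bottom
    "transpose": lambda g: [list(col) for col in zip(*g)],  # swap rows/columns
}

def search_program(train_pairs, max_depth=2):
    """Enumerate compositions of primitives, returning the first program
    that reproduces every demonstration pair exactly."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(grid, names=names):
                for name in names:
                    grid = PRIMITIVES[name](grid)
                return grid
            if all(program(p["input"]) == p["output"] for p in train_pairs):
                return names, program
    return None, None

# One demonstration pair: the output is the input mirrored left-right.
pairs = [{"input": [[1, 2], [3, 4]], "output": [[2, 1], [4, 3]]}]
names, program = search_program(pairs)
print(names)                      # ('flip_h',)
print(program([[5, 6], [7, 8]]))  # [[6, 5], [8, 7]]
```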

When Can We Expect AGI?

Predicting the arrival of AGI is notoriously difficult, with experts’ forecasts varying wildly. Some researchers believe it has already arrived, citing instances like the LaMDA chatbot or GPT-4.5’s reported Turing Test pass. But as discussed above, most experts view these claims as premature, based on imitation rather than true general intelligence.

Veteran AI advocate Ray Kurzweil is among the most optimistic, predicting that AGI is just around the corner. He initially placed its advent in the 2030s and has since moved his estimate up, stating that AI will “reach human levels by around 2029” and vastly multiply human intelligence thereafter.

More moderate estimates align with the results of a 2022 survey of 738 machine learning researchers, in which the average prediction for a 50% chance of high-level machine intelligence (similar to AGI) was 2059. For others, the idea of human-like computer sentience remains firmly in the realm of science fiction, or at least far beyond our lifetimes.

Ultimately, the debate over the path to AGI and its arrival date highlights the need for better ways to measure true intelligence in machines. If LLMs are indeed a dead end, benchmarks like ARC could play a crucial role in helping the field redirect research towards the kinds of models that will actually lead to AGI, bringing with it the potential for world-changing benefits like curing diseases, accelerating discoveries, and reducing poverty.


References:

  1. Excerpts from “LLMs are a dead end to AGI, says François Chollet” by Kristin Houser, August 3, 2024.
  2. Excerpts from “What is artificial general intelligence (AGI)? Everything you need to know” by Adam Marshall, May 17, 2025.

