Will AGI Ever Happen?

A Morning of Realization

I woke up today, like most days, scrolling through social media while my coffee brewed—and something struck me. It wasn't a technical paper. It wasn't a data point. It was a vibe. A shift. I've been deeply embedded in the world of artificial intelligence—building, shipping, and integrating cutting-edge tools into real manufacturing environments. But today, for the first time, I paused and truly asked: Will we ever reach AGI? Not the marketing version. Not the "GPT-5 is coming" hype. I'm talking about true artificial general intelligence—an entity that reasons, reflects, and understands.

And right now, I'm not sure anymore.

Paul Graham on AGI & Prompting

This morning I stumbled across a tweet from Paul Graham—and I agree with him...

"AGI would mean the end of prompt engineering... The fact that we still rely on prompts tells us we aren't there yet."

He's absolutely right. The need to craft precise, often finicky prompts is not something you'd expect from a generally intelligent agent. Humans don't require this level of specificity to understand what's needed. So why should AGI?

We're spending billions developing models so sensitive they fail without the "right" phrasing. Would you consider someone "intelligent" if they couldn't function unless spoken to with a precise sequence of words?

Gary Marcus & Nicholas Thompson's Warning

I also tuned into a podcast with Nicholas Thompson and Gary Marcus, a duo who never fail to bring grounded thinking to the hype. Gary has long warned that scaling alone won't get us to AGI. He advocates for combining symbolic reasoning with modern machine learning. Thompson added a stark economic reality to the discussion:

"Each version of GPT has cost vastly more than the last, yet the improvements—while real—have begun to plateau."

A Shift in Perspective

At IMTS in Chicago this year, I was on stage for a keynote and was asked bluntly:

"Do you believe AGI will happen within a few years?"

At the time, my answer was a confident yes. We were witnessing remarkable acceleration: GPT-4 had emerged, chain-of-thought prompting was improving reasoning, and startups were shipping agents that could write code, book travel, or solve equations. But here's the thing: everything looks like progress until it doesn't.

Behind the scenes, working daily on our Vault (retrieval) framework and CoT (Chain-of-Thought) processing at Multiaxis Intelligence, I've seen the cracks—and they're growing.

The Diminishing Returns of Scaling

We need to talk about the economics.

From GPT-1 to GPT-4 Turbo, training costs have climbed by orders of magnitude. The intelligence gained with each new generation? Smaller every time.

As Nicholas Thompson (CEO of The Atlantic) put it: "The marginal cost of intelligence is rising, while the returns are flattening."

We've hit a wall of diminishing returns. Each improvement is more expensive, more complex, and more fragile than the last.

Pliny the Liberator: The Hacker's Mirror

Pliny the Liberator—aka @elder_plinius—has become one of the sharpest minds in testing the limits of LLMs. He doesn't build. He breaks. He prompt-injects, collides contexts, and watches as even the most advanced LLMs unravel.

"These models don't think. They autocomplete."

And he's right. These systems aren't thinking. They're predicting the next word based on the last 8,000–32,000 tokens. We've mistaken fluency for understanding.
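
You can see the "autocomplete" mechanic in miniature with a toy bigram model. The sketch below is deliberately crude, nowhere near a real transformer, but the core move is the one Pliny is pointing at: pick the next word from statistics over what came before, with no model of meaning anywhere in the loop.

import random
from collections import defaultdict

# Toy bigram "language model": it stores which word followed which in
# its training text, and nothing else. No meaning, no intent, no world.
corpus = "the model predicts the next word and the model repeats the pattern".split()
table = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    table[prev].append(nxt)

def autocomplete(word, steps=8):
    out = [word]
    for _ in range(steps):
        followers = table.get(out[-1])
        if not followers:
            break  # dead end: this word never had a successor
        out.append(random.choice(followers))
    return " ".join(out)

print(autocomplete("the"))  # fluent-looking, understanding-free output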

Scaling vs. Reasoning: The Company Divide

Here's how major AI players are aligning:

  • OpenAI & Google - Betting on scale and compute
  • Anthropic - Constitutional AI and safety-first approaches
  • Meta - Open source and massive parameter counts
  • DeepMind - Hybrid symbolic-neural approaches

What's missing? A unifying theory of intelligence. Everyone's guessing. Everyone's experimenting. No one knows for sure.

What We're Building

With Multiaxis Intelligence, we're doing something different. We're not chasing AGI. We're building real reasoning systems for real people making things... things that require a single source of truth. Here's how:

1. The Vault

Our Retrieval-Augmented Generation system prioritizes verified, structured content. We don't let models guess—we give them grounded facts.
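
The retrieve-then-ground pattern is simple to sketch. The snippet below is a minimal illustration, not the Vault codebase: the documents, the bag-of-words similarity, and the prompt wording are all stand-ins.

import math
import re
from collections import Counter

# Stand-in "vault" of verified facts (illustrative, not real Vault data).
VAULT = [
    "Roughing 6061 aluminum: run a 1/2 in end mill at 12,000 RPM.",
    "The M6 tool-change macro requires the door interlock to be closed.",
]

def vector(text):
    # Bag-of-words term counts; a real system would use embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    # Rank the verified documents against the query, return the top k.
    q = vector(query)
    return sorted(VAULT, key=lambda doc: cosine(q, vector(doc)), reverse=True)[:k]

question = "What spindle speed for aluminum roughing?"
context = "\n".join(retrieve(question))
prompt = (
    "Answer ONLY from the verified context below. If the context does "
    "not contain the answer, say so.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(prompt)  # the model answers from grounded facts, not guesses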

2. Chain-of-Thought (CoT)

Before an answer is given, our models outline their reasoning. Not just a reply—a rationale.
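
The mechanic is easy to show in a few lines. This is an illustrative template, not our production prompt: the point is that the reasoning comes out first, so a bad rationale can be caught before the answer is trusted.

# Illustrative chain-of-thought template (not the production prompt).
COT_TEMPLATE = (
    "Question: {question}\n\n"
    "Work through the problem step by step under 'Reasoning:', then "
    "state the final result under 'Answer:'.\n\n"
    "Reasoning:"
)

question = "0.002 in/tooth chip load, 4 flutes, 10,000 RPM: feed rate?"
print(COT_TEMPLATE.format(question=question))

# Shape of a good completion (hand-written here, not real model output):
#   Reasoning: feed = 0.002 in/tooth x 4 flutes x 10,000 RPM = 80 in/min
#   Answer: 80 in/min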

3. Prompt Minimalism

Our internal metric? Fewer tokens, better thinking. We strive for systems that understand your intent without gymnastics.
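
A back-of-the-napkin version of that metric might look like the sketch below. Both the whitespace tokenizer and the scoring rule are assumptions for illustration, not our internal implementation.

def token_count(prompt: str) -> int:
    # Crude whitespace proxy for tokens (an assumption, not a real tokenizer).
    return len(prompt.split())

def efficiency(prompt: str, answered_correctly: bool) -> float:
    # Correct answers per prompt token: higher means less prompt gymnastics.
    return (1.0 if answered_correctly else 0.0) / token_count(prompt)

minimal = "Feed rate for 0.002 ipt, 4 flutes, 10,000 RPM?"
verbose = (
    "You are a world-class machinist. Think very carefully and "
    "thoroughly, considering every edge case, then tell me the feed "
    "rate for 0.002 ipt, 4 flutes, 10,000 RPM."
)

# If both prompts yield the right answer, the leaner one scores higher.
print(efficiency(minimal, True) > efficiency(verbose, True))  # True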

We're not building minds—we're building mirrors with structure.

Let's Be Honest

  • AGI might happen.
  • But not tomorrow. And not through scale alone.
  • It will take reasoning. Planning. Memory. Grounding. Intuition. Causality.
  • Right now, LLMs can barely withstand their own jailbreak tests.
  • So, let's stop selling sci-fi. Let's build systems that actually work.
  • Let's equip people with tools that reason in context. That cite their sources. That don't need perfect prompts.

And maybe—just maybe—we'll get closer.