
The Capability Overhang

Demis Hassabis named something at Davos last week that’s been nagging at me ever since.

He called it the “capability overhang” — the gap between what current AI models can already do and what most people are actually getting out of them. The phrase sounds technical, but the idea is simple. The models have pulled ahead of most users’ ability to exploit them. Even if capabilities froze today, there’s unexplored territory in what’s already shipping.

While the discourse fixates on AGI timelines and whether software engineering will be automated in six months or six years, this quieter gap might matter more. It’s not about what’s coming. It’s about what’s already here and undertapped.


The Evidence Is Accumulating

Consider what’s happened in the past few months.

A developer named Geoffrey Huntley discovered that if you run Claude Code in a loop — feeding failures back in until the job is done — it can complete contracts worth $50,000 in billable hours for under $300 in API costs. The technique is embarrassingly simple: a five-line bash script. He named it after Ralph Wiggum, the dim-witted Simpsons character, because the approach is “deterministically bad in an undeterministic world.” It just keeps trying until it works.
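
For a sense of the shape, here is a minimal sketch of that kind of loop. It is not Huntley’s actual script, and the details are stand-ins: the claude -p call assumes a command-line entry point that accepts a piped prompt and runs non-interactively, and check.sh is whatever test tells you the job is done.

    # Re-run the agent until the acceptance check passes.
    # PROMPT.md describes the task; each pass also sees the latest failure output.
    until ./check.sh > failures.log 2>&1; do
        cat PROMPT.md failures.log | claude -p "Continue the task and fix what failed."
    done

The loop is dumb on purpose: no state, no cleverness, just the same instruction plus fresh failure output until the check goes green.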

This wasn’t a new model. It was a new way of using an existing one.

Or take Lenny Rachitsky, the product management writer with one of the largest business newsletters online. When Anthropic launched Cowork — their desktop agent for non-developers — he fed it 320 podcast transcripts and asked for patterns. Fifteen minutes later, he had ten recurring themes and ten counterintuitive truths, structured and ready to use.

The capability to do this existed before Cowork. The interface made it accessible.

Or consider how Claude Code itself emerged. Anthropic built it for software engineering — but users immediately started using it for vacation research, slide decks, and organising wedding photos. Boris Cherny, the product lead, described it perfectly: “We built a race car and people are using it to pick up groceries.” The groceries were always possible. People just hadn’t tried.


Why the Gap Exists

If the capabilities are there, why aren’t people finding them?

Three reasons, I think.

First, interface constraints. The chat window is deceptively limiting. It suggests short exchanges, quick questions, contained tasks. The models can do much more — extended analysis, multi-step projects, genuine collaboration — but the interface doesn’t invite it. This is why Claude Code and Cowork matter: they’re interface experiments, attempts to expose capabilities that were always present but hidden behind a text box.

Second, learned helplessness. People’s expectations were set by earlier, worse models. GPT-3 was impressive but unreliable. Early ChatGPT hallucinated constantly. Users learned to distrust, to double-check everything, to keep tasks small and simple. Those instincts made sense then. They’re limiting now. The models have improved faster than the habits.

Third, prompting is a skill. This sounds like productivity-guru nonsense, but it’s true. The difference between a novice prompt and an expert prompt isn’t style — it’s outcome. Knowing how to structure context, when to provide examples, how to break complex tasks into steps, when to push back on a weak response — these are learnable skills that dramatically affect what you get. Most people haven’t learned them. Why would they? Nobody taught prompting in school.


The Uncomfortable Implication

Here’s where it gets uncomfortable.

If the capability overhang is real — if the models can do significantly more than most people are extracting — then the bottleneck isn’t the technology. It’s us.

This doesn’t mean everyone should become a “prompt engineer” or optimise their workflows into oblivion. But it does mean that the common complaint — “I tried Claude/GPT and it wasn’t that useful” — might say more about the attempt than the tool.

I notice this in my own work. The more I push, the more I find. Tasks I assumed were beyond the model turn out to be possible. Capabilities I dismissed as hype turn out to be real, just poorly surfaced. The ceiling is consistently higher than I expect.

This is genuinely strange to sit with. We’re used to tools having obvious limits. A hammer can’t become a screwdriver if you just believe in it harder. But these models are different. The limit is fuzzy, and it moves based on how you approach it.


What Exploring Actually Looks Like

This isn’t a call to “hustle harder” or “unlock your AI potential.” It’s simpler than that.

Exploring the overhang means trying things you assume won’t work. Feeding the model a problem you think is too complex, too messy, too ambiguous. Seeing what happens when you give it real context instead of sanitised test cases. Treating weak outputs as a prompt problem before concluding it’s a capability problem.

It means being willing to waste some time on experiments that don’t pan out — because the ones that do reveal something you wouldn’t have found otherwise.

It means, frankly, playing. The people discovering new capabilities aren’t following playbooks. They’re curious and slightly reckless. They try weird things.


The Caveat

None of this validates the hype cycle.

The Davos predictions — software engineering automated in twelve months, Nobel-level AI by 2027 — may or may not come true. The capability overhang doesn’t mean current models are secretly superintelligent. They’re not. They still hallucinate. They still fail in predictable and unpredictable ways. They still require judgment that they can’t provide.

But the overhang does mean that the interesting story right now isn’t primarily about what’s coming next. It’s about what’s already here and underexplored.

The people who figure out how to close that gap — how to actually extract the value that’s already present in shipping models — will be better positioned regardless of what the next generation brings. And they’ll have a clearer view of what’s real versus what’s vapour.

The timeline debates will continue. The predictions will get more dramatic. The discourse will remain fixated on the future.

Meanwhile, the present is sitting there, waiting for someone to take it seriously.


Related episodes:

  • EP001: The Day After Davos — Amodei’s predictions and what to make of them
  • EP003: The Soul Document 2.0 — the constitution and the Ralph Wiggum story