The Eventual Clone Wars
Coding agents should work more like a swarm than a human software team - Part 2 of a two-part article series on AI
Open any agentic coding platform today and you'll find something familiar: an architect agent, a developer agent, a QA agent, maybe a product manager. The org chart has been faithfully reproduced in YAML. We've spent decades building software in teams with specialized roles, and now we're teaching AI to do the same thing. While replicating traditional software teams for agents feels natural, it's ultimately a mistake.
The Bitter Lesson
In The Bitter Lesson, Rich Sutton argues that AI methods leveraging computation tend to win over those that rely on human knowledge. The reason is straightforward: computation gets cheaper over time. Human knowledge doesn't scale the same way.
Sutton wrote this before LLMs caught fire, yet the lesson has proved remarkably accurate at predicting which ways of using LLMs are most effective.
Sutton's stories about AI systems that play chess and Go are instructive. Researchers spent decades encoding expert knowledge into their programs: evaluation functions, opening books, positional heuristics. Those programs kept losing to ones that just searched deeper. More hardware beats more human knowledge. The researchers who built those knowledge-heavy systems weren't wrong about chess. They were wrong about what would matter as computation got cheaper.
Replicating the software team
Many contemporary agentic coding tools attempt to encode established software development practices into agentic systems, replicating roles such as architect, developer, and designer. AutoGen pioneered this approach in 2023, and even the widely adopted Superpowers framework draws heavily on conventions from traditional practices. Over-reliance on the human-knowledge approach that Sutton cautions against risks repeating the failures observed in chess and Go.
The architecture of these platforms is understandable. We know how software teams work. We know the failure modes. Bad designs produced by an architect agent are easy to identify and correct. A developer agent that writes buggy code can be reined in with better prompts or guardrails. The system is legible in the same way that a human team is legible.
But legibility is a management convenience, not a sign that we're on the right path. Human-team structures are predictable and easy to debug, and we deeply understand their strengths and limitations. But they're an artifact of today's cost constraints, not the right long-term model.
Letting computation do the work
Heeding Sutton's advice, agentic coding systems should be designed to exploit far more computation and tokens than is economical today. Solutions like Gastown get closer to this approach.
We've been experimenting with even more extreme approaches. Imagine starting a task and telling a swarm of thousands of agents to each implement a solution, then picking the best one. No architect, no code review, no role specialization. Just a population of attempts and a selection mechanism.
Building agentic software is closer to natural selection than it is to a software team. You don't need to encode knowledge about how to write good software. You need a way to generate many candidates and a way to identify which ones work. The knowledge is implicit in the selection, not explicit in the process.
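As a sketch, the generate-and-select loop is almost embarrassingly simple. Everything here is hypothetical: `run_agent` stands in for whatever coding agent you invoke, and `score` for whatever selection mechanism you trust (a test suite, a judge agent, a benchmark).

```python
# Hypothetical stand-ins: run_agent would invoke a coding agent on the task;
# score would evaluate a candidate, e.g. by running the project's test suite.
def run_agent(task: str, seed: int) -> str:
    # A real implementation would sample an independent attempt per seed.
    return f"candidate {seed}: solution for {task!r}"

def score(candidate: str) -> float:
    # Placeholder: in practice, count passing tests, lint results, benchmarks.
    return float(sum(candidate.encode()) % 97)

def swarm(task: str, n: int = 1000) -> str:
    # No architect, no pipeline: generate n independent attempts,
    # then let the selection mechanism pick the survivor.
    candidates = [run_agent(task, seed) for seed in range(n)]
    return max(candidates, key=score)

best = swarm("implement a rate limiter", n=100)
```

Note that all the "knowledge" about what makes a good solution lives in `score`, not in any prescribed process for producing candidates.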
The next problem to solve is: how do you pick the best solution? Do you run tests? Build a judge agent? Use a tournament-style bracket? This is an area ripe for experimentation. The evaluation mechanism is as important as the generation mechanism, and it's where most of the interesting work is happening.
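One of those options, a tournament bracket, can be sketched in a few lines. The `judge` function here is a hypothetical pairwise comparator; in practice it might be an LLM asked which of two candidate implementations is better. Pairwise comparison is often an easier question for a judge than assigning an absolute score.

```python
# Single-elimination tournament over candidate solutions.
# judge(a, b) is a hypothetical pairwise comparator; a real system might ask
# a judge agent "which of these two implementations is better?"
def judge(a: str, b: str) -> str:
    return a if len(a) >= len(b) else b  # placeholder heuristic

def tournament(candidates: list[str]) -> str:
    pool = list(candidates)
    while len(pool) > 1:
        winners = []
        # Pair adjacent candidates; an odd one out advances automatically.
        for i in range(0, len(pool) - 1, 2):
            winners.append(judge(pool[i], pool[i + 1]))
        if len(pool) % 2 == 1:
            winners.append(pool[-1])
        pool = winners
    return pool[0]

winner = tournament(["aa", "bbbb", "c", "ddddd"])
```

A bracket needs only O(n) comparisons for n candidates, which matters when each comparison is itself an expensive agent call.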
This approach is currently cost-prohibitive at scale. Running thousands of agents on a single task burns through tokens quickly. But token costs have been dropping steadily, and there's little reason to think that the trend will slow down.
A step in the right direction
For now, we need a cost-effective solution. But we should pick one that moves in the right direction: toward more computation rather than encoded human process.
A small swarm is feasible today. Five to ten agents, each independently implementing a solution, with a selection step at the end. It's not thousands, but it's a fundamentally different approach than assigning one agent to be the architect and another to be the developer. It treats the problem as search rather than process.
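A small swarm of this kind is easy to run today with ordinary concurrency primitives. This is a minimal sketch, assuming a hypothetical `run_agent` call (standing in for any coding-agent API) and a placeholder `select_best` step that a real system would replace with its test suite or judge.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for an agent invocation and a selection step.
def run_agent(task: str, idx: int) -> str:
    return f"attempt {idx}: {task}"

def select_best(candidates: list[str]) -> str:
    # Placeholder selection: in practice, run the test suite on each candidate.
    return max(candidates, key=len)

def small_swarm(task: str, n: int = 8) -> str:
    # Launch n independent attempts in parallel; no roles, no pipeline.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda i: run_agent(task, i), range(n)))
    return select_best(candidates)

result = small_swarm("add retry logic to the HTTP client", n=8)
```

The attempts share nothing and never coordinate, which is exactly what makes the approach scale: adding agents means widening the pool, not renegotiating a workflow.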
As costs drop, the swarm can grow. The approach scales naturally in a way that role-based systems don't. An architect-developer-QA pipeline doesn't get better when you add more architects. A swarm gets better when you add more agents.
Nobody knows exactly what agentic software engineering will look like in a few years. But if The Bitter Lesson is any guide, it will look less familiar, and certainly less like today's software teams.