The Invisible Handoff: Why Most Multi-Agent Systems Fail at the Boundaries — aniketkarneai.com

Every multi-agent framework talks about the pipeline. PM → Planner → Architect → Developer → QA. Five boxes connected by arrows. It looks clean.

What nobody talks about is what happens at the arrows.

The Handoff Problem Nobody Admits

When a human PM hands off a spec to a human engineer, the engineer doesn’t just read the document. They ask questions. They push back. They form a mental model of why this feature matters, not just what it should do. When the engineer hands off to QA, QA doesn’t just run the test plan — they develop intuition about where the risky parts are.

In most AI agent systems, that rich human handoff gets replaced with a JSON payload.

The PM agent outputs a story. The Planner agent reads it. What the Planner actually receives isn’t the PM’s reasoning — it’s the Planner’s interpretation of the PM’s output. And because these agents are powered by LLMs, that interpretation varies. Sometimes significantly.

This is the invisible handoff problem: context arrives at each agent, but the lens through which that context gets interpreted is never controlled.

What a Prompt Actually Does

Here’s the thing most people miss about prompts: a prompt doesn’t just tell an agent what to do. It tells the agent how to think about what it’s receiving.

A generic “you are a planner agent” prompt produces an agent that reads stories and breaks them into tasks. Fine. But the quality of that breakdown depends entirely on what lens the agent is using to read the story.

An agent in “task decomposition mode” sees a story and asks: what are the implementation steps? An agent in “engineering manager mode” sees the same story and asks: what are the states, the failure modes, the data flows? Same input, completely different output — not because of different training, but because of different framing of the same reasoning process.

That’s the insight behind the cognitive modes in the ACO system. Instead of one generic agent reading each handoff, you have five agents with five distinct thinking frames — and crucially, each frame is consistent. The PM agent always thinks from the CEO/Founder lens. The Architect always thinks from the paranoid production-safety lens. The Developer always thinks from the release-engineer velocity lens.
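To make the "same input, different frame" point concrete, here is a minimal sketch. The `CognitiveMode` class, the `build_prompt` helper, and the prompt wording are all assumptions made for illustration; they are not taken from the ACO system itself. The guiding questions paraphrase the two modes described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CognitiveMode:
    """A named thinking frame: the questions an agent asks of its input."""
    name: str
    guiding_questions: tuple[str, ...]


# Hypothetical mode definitions paraphrasing the two examples in the text.
TASK_DECOMPOSITION = CognitiveMode(
    name="task decomposition",
    guiding_questions=("What are the implementation steps?",),
)
ENGINEERING_MANAGER = CognitiveMode(
    name="engineering manager",
    guiding_questions=(
        "What are the states?",
        "What are the failure modes?",
        "What are the data flows?",
    ),
)


def build_prompt(mode: CognitiveMode, story: str) -> str:
    """Frame the same story through one mode's questions."""
    questions = "\n".join(f"- {q}" for q in mode.guiding_questions)
    return (
        f"You are reading this story in {mode.name} mode.\n"
        f"Answer only these questions:\n{questions}\n\n"
        f"Story:\n{story}"
    )


# Same story, two frames: only the lens changes, not the input.
story = "Users can reset their password by email."
print(build_prompt(TASK_DECOMPOSITION, story))
print(build_prompt(ENGINEERING_MANAGER, story))
```

The story text is identical in both prompts. What differs is the set of questions the agent is told to answer, which is the part a generic "you are a planner agent" prompt leaves to chance.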

The Five Modes as Five Expertises

The ACO system assigns distinct cognitive modes to each agent based on what expert humans in those roles actually optimize for:

CEO/Founder mode (PM Agent): Don’t take requests literally. Before writing a single line of a story, the PM agent must challenge the premise. Is this feature solving the real problem or the stated problem? What’s the 10-star version hiding inside the obvious request? This mode exists because most requirements documents describe a solution, not the underlying user need.

Engineering Manager mode (Planner Agent): Lock the technical spine before breaking down work. Architecture diagrams, data flow, state machines — all before a single task is written. This mode exists because most task lists fail not because of bad implementation but because of missing edge cases that a proper system design would have surfaced.

Paranoid Review mode (Architect Agent): Hard gate, not soft suggestions. The Architect doesn’t recommend — it rejects. If a plan has a categorical blocker (technically impossible core requirement, missing critical tasks, internal contradictions), the story goes back. This mode exists because suggestions get ignored; hard gates don’t.

Release Engineer mode (Developer Agent): Ship fast. Sync, test, commit, push, PR — in exactly that order, no exceptions. This mode exists because the last 10% of shipping (changelog, PR description, cleanup) is where momentum dies and branches go stale.

Browser QA Engineer mode (QA Agent): Eyes on the live application. Screenshots at every step, console checks after every navigation, health scores on every run. This mode exists because code review misses what users actually see.
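Of the five modes, the Architect's hard gate is the easiest to show in code. The sketch below is an assumption about how such a gate could look, not the ACO implementation: `Plan`, `Verdict`, `architect_gate`, and `route` are hypothetical names, the blockers are supplied by hand rather than detected by a model, and sending a rejected plan back to the Planner is my reading of "the story goes back".

```python
from dataclasses import dataclass, field
from enum import Enum


class Blocker(Enum):
    """The categorical blockers named in the text."""
    IMPOSSIBLE_CORE_REQUIREMENT = "technically impossible core requirement"
    MISSING_CRITICAL_TASKS = "missing critical tasks"
    INTERNAL_CONTRADICTION = "internal contradictions"


@dataclass
class Plan:
    # Hypothetical shape: a real plan would carry tasks, diagrams, and so on.
    story_id: str
    blockers: set[Blocker] = field(default_factory=set)


@dataclass(frozen=True)
class Verdict:
    approved: bool
    reasons: tuple[str, ...] = ()


def architect_gate(plan: Plan) -> Verdict:
    """Reject on any blocker; there is no 'approve with suggestions' outcome."""
    if plan.blockers:
        reasons = tuple(sorted(b.value for b in plan.blockers))
        return Verdict(approved=False, reasons=reasons)
    return Verdict(approved=True)


def route(plan: Plan) -> str:
    """Send a rejected plan back (assumed: to the Planner), an approved one forward."""
    return "developer" if architect_gate(plan).approved else "planner"
```

The design choice worth noticing is that `Verdict` is a yes or no with reasons attached. A field for optional suggestions would turn the gate back into advice.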

Why Modes Beat Prompts

You could try to encode all of this in a single massive prompt. “You are a planner who thinks about architecture, data flow, state machines, edge cases, and also sometimes takes requests literally.” That prompt would be incoherent — a soup of competing directives.

What cognitive modes do is make the thinking sequential and exclusive rather than concurrent. Each mode is a lens that the agent uses to the exclusion of others. The Developer doesn’t need to be paranoid; the Architect handles that. The PM doesn’t need to be fast; the Developer handles that. Each agent trusts the other agents in the chain to handle their own domains.
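"Sequential and exclusive" can be sketched as a loop in which each agent runs once, in order, under exactly one mode. Everything here except the agent and mode names is an assumption: `run_pipeline` is a hypothetical runner, and `echo_model` is a fake stand-in for an LLM call so the example runs without any API.

```python
from typing import Callable

# (agent, mode) pairs in pipeline order, as listed in the text.
PIPELINE = [
    ("PM", "CEO/Founder"),
    ("Planner", "Engineering Manager"),
    ("Architect", "Paranoid Review"),
    ("Developer", "Release Engineer"),
    ("QA", "Browser QA Engineer"),
]

# Hypothetical stand-in for an LLM call: (mode, input) -> output.
ModelCall = Callable[[str, str], str]


def run_pipeline(request: str, call_model: ModelCall) -> tuple[str, list[str]]:
    """Run each agent once, in order, under exactly one mode."""
    trace = []
    current = request
    for agent, mode in PIPELINE:
        # One mode per call: no agent is asked to hold two lenses at once.
        current = call_model(mode, current)
        trace.append(f"{agent} ran in {mode} mode")
    return current, trace


def echo_model(mode: str, text: str) -> str:
    """Fake model so the sketch runs without any API."""
    return f"[{mode}] {text}"


final_output, trace = run_pipeline("Add password reset", echo_model)
```

Compare this with the single massive prompt: there, one call would receive all five sets of directives at once. Here, no call ever sees more than one.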

The result is that handoffs stop being invisible. When the PM hands off to the Planner, the Planner isn’t receiving a flat story document — it’s receiving output from an agent that was explicitly thinking from the CEO lens. The Planner knows it can trust the user-need analysis because that was the PM’s job. It can focus entirely on the technical decomposition.

The Real Cost of Generic Handoffs

The reason most multi-agent systems don’t achieve the pipeline efficiency they promise is that every handoff is a context loss event. The Architect reviews what the Planner wrote, but the Architect doesn’t know what questions the Planner didn’t think to ask. The Developer implements what the Architect specified, but the Developer doesn’t know what the Architect was most paranoid about.

In a human team, this gets solved through co-location, code review culture, and institutional memory. In an AI agent team, it has to be engineered explicitly.

Cognitive modes are one approach. The invisible handoffs become less invisible when each agent knows what lens the previous agent was using. The PM’s output is trustworthy because we know it was CEO-mode output. The Architect’s review is trustworthy because we know it was paranoid-review output.
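One way to let each agent know which lens the previous agent used is to send that information along with the payload. The envelope below is a sketch under my own assumptions: `Handoff`, `hand_off`, `receiver_preamble`, and the `top_concerns` field are hypothetical and not part of any framework named here. Only the agent-to-mode mapping comes from the post.

```python
from dataclasses import dataclass

# Agent-to-mode mapping as described in the text.
MODE_BY_AGENT = {
    "pm": "CEO/Founder",
    "planner": "Engineering Manager",
    "architect": "Paranoid Review",
    "developer": "Release Engineer",
    "qa": "Browser QA Engineer",
}


@dataclass(frozen=True)
class Handoff:
    """Output plus the lens that produced it (hypothetical envelope)."""
    from_agent: str
    mode: str
    payload: str
    top_concerns: tuple[str, ...] = ()


def hand_off(
    from_agent: str, payload: str, top_concerns: tuple[str, ...] = ()
) -> Handoff:
    """Stamp the payload with the sender's mode so the lens travels with it."""
    return Handoff(from_agent, MODE_BY_AGENT[from_agent], payload, top_concerns)


def receiver_preamble(h: Handoff) -> str:
    """Text a receiving agent could prepend to its own prompt."""
    lines = [f"Input produced by the {h.from_agent} agent in {h.mode} mode."]
    lines += [f"Flagged concern: {c}" for c in h.top_concerns]
    return "\n".join(lines)
```

For example, `hand_off("architect", "plan approved", ("token expiry race",))` gives the Developer both the approved plan and the thing the Architect was most paranoid about, which a bare payload would have dropped.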

Is it perfect? No. The LLM still interprets. The modes still have edge cases. But it’s a structured approach to a problem that most frameworks just… hope away.

The pipeline looks clean when you draw the arrows. The hard part is making sure something meaningful crosses each arrow.

Aniket Karne

DevOps & AI Engineer · Amsterdam
