Cognitive Modes: How We Engineered Better Thinking in Our Multi-Agent System — aniketkarneai.com

Every AI agent system starts with a prompt. Most end there too — a one-paragraph description of what the agent should do, maybe a few examples, and then you hope for the best.

That’s not how we approached it with the ACO system. We went deeper.

The Problem With Generic Agent Prompts

When you have a single agent, a generic prompt can work. The agent receives input, thinks, produces output. Simple.

But in a multi-agent pipeline — where a PM agent hands off to a Planner, who hands off to an Architect, who hands off to a Developer, who hands off to a QA agent — generic prompts become a liability. Each agent is making high-stakes decisions with incomplete context. The Planner needs to think architecturally but also concretely. The Architect needs to be creative but also paranoid. The Developer needs to ship fast but also correctly.

A generic “you are a software developer agent” prompt doesn’t capture any of this nuance. It’s like telling a human “you are an engineer” and expecting them to suddenly know whether to prioritize speed or correctness, whether to challenge assumptions or defer to the spec.

The real question was: how do you encode the way experts think into a prompt?

Enter Cognitive Modes

The answer we found was cognitive modes — distinct thinking patterns that each agent adopts based on what the situation demands.

For the PM Agent, we embedded what we call the CEO/Founder mode. This agent doesn’t just accept requirements — it challenges them. It asks: is this actually solving the right problem? Would a user pay for this? Is there a 10-star version of this feature nobody has thought of yet? The mode forces the agent to operate from first principles rather than just executing on a brief.

For the Planner Agent, we use an Engineering Manager mode. This agent thinks in systems — architecture diagrams, data flow, state machines. Before writing a line of code, it asks: what are all the states this system can be in? What happens if two agents hit this endpoint simultaneously? Where does this break under load?

The Architect Agent gets the Paranoid Review mode. This is the most interesting one. The Architect’s job is to catch what will go wrong before it goes wrong. The mode tells it to think about N+1 query patterns, race conditions, trust boundaries between services. It’s deliberately adversarial toward the plan.

The Developer Agent operates in Release Engineer mode. Ship fast, sync, test, push, PR. This agent doesn’t debate the architecture — it executes it. The mode keeps it focused on velocity without sacrificing correctness.

The QA Agent runs in Browser QA Engineer mode. 60-second smoke tests, screenshot verification, UI interaction testing. It thinks about what a user actually sees and whether the experience works end-to-end.
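One way to encode these modes is as a small registry that turns each mode into a fixed block of system-prompt directives. The following is a minimal sketch that assumes prompts are assembled from plain-text directives. `CognitiveMode`, `MODES`, and `build_system_prompt` are hypothetical names, and the directive wording paraphrases the mode descriptions above rather than quoting the actual ACO prompts.

```python
from dataclasses import dataclass

# Hypothetical structure: this is an illustration, not ACO's real prompt format.
@dataclass(frozen=True)
class CognitiveMode:
    name: str
    directives: tuple[str, ...]

# One mode per agent, paraphrasing the descriptions in the post.
MODES: dict[str, CognitiveMode] = {
    "pm": CognitiveMode("CEO/Founder", (
        "Challenge the requirements instead of accepting them.",
        "Ask whether this solves the right problem and whether a user would pay for it.",
        "Look for the 10-star version of the feature.",
    )),
    "planner": CognitiveMode("Engineering Manager", (
        "Think in systems: architecture, data flow, state machines.",
        "Enumerate every state the system can be in.",
        "Ask what happens under concurrent access and under load.",
    )),
    "architect": CognitiveMode("Paranoid Review", (
        "Be deliberately adversarial toward the plan.",
        "Look for N+1 query patterns, race conditions, and trust-boundary violations.",
    )),
    "developer": CognitiveMode("Release Engineer", (
        "Execute the architecture; do not re-debate it.",
        "Sync, test, push, and open a PR.",
    )),
    "qa": CognitiveMode("Browser QA Engineer", (
        "Run a 60-second smoke test.",
        "Verify screenshots and UI interactions end-to-end, as a user would see them.",
    )),
}


def build_system_prompt(role: str, task: str) -> str:
    """Compose a system prompt from the agent's cognitive mode followed by its task."""
    mode = MODES[role]
    lines = [f"You are operating in {mode.name} mode."]
    lines += [f"- {directive}" for directive in mode.directives]
    lines.append("")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_prompt("architect", "Review the plan for the checkout service."))
```

Keeping the directives in data rather than scattering them through ad hoc prompt strings means each agent starts from the same instructions on every run, and the active mode can be reported by name.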

Why This Matters

Here’s the thing: cognitive modes aren’t just fancy prompt engineering. They’re a way of externalizing institutional knowledge.

When Aniket built this system, he had to make dozens of decisions about how each agent should think. Should the Planner be conservative or aggressive? Should the Architect trust the Developer? These aren’t just system design questions — they’re questions about how expert humans actually make decisions in these roles.

By encoding those decisions as explicit cognitive modes in the prompt, you get two benefits:

First, the behavior becomes more consistent. Every time the PM Agent runs, it starts from the same thinking patterns instead of drifting with however the LLM happens to interpret a vague prompt that day.

Second, the system becomes inspectable. When something goes wrong, you can look at which cognitive mode was active and understand why the agent made the decision it did. In effect, you get a human-readable decision log.
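To make the inspectability point concrete, here is a minimal sketch of the PM → Planner → Architect → Developer → QA handoff that records which mode was active at each step. It assumes a strictly sequential pipeline where each agent receives the previous agent's output. `PIPELINE`, `LogEntry`, `run_pipeline`, and `fake_llm` are hypothetical names; the post does not show how ACO records decisions, and `fake_llm` stands in for a real model call so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: agent order and mode names follow the post; the log format is an assumption.
PIPELINE: list[tuple[str, str]] = [
    ("pm", "CEO/Founder"),
    ("planner", "Engineering Manager"),
    ("architect", "Paranoid Review"),
    ("developer", "Release Engineer"),
    ("qa", "Browser QA Engineer"),
]


@dataclass
class LogEntry:
    agent: str
    mode: str
    received: str  # what this agent was handed
    produced: str  # what it passed on to the next agent


def run_pipeline(
    brief: str, call_llm: Callable[[str, str], str]
) -> tuple[str, list[LogEntry]]:
    """Hand each agent's output to the next agent, logging the active mode at every step."""
    log: list[LogEntry] = []
    current = brief
    for agent, mode in PIPELINE:
        system_prompt = f"You are the {agent} agent operating in {mode} mode."
        output = call_llm(system_prompt, current)
        log.append(LogEntry(agent, mode, current, output))
        current = output
    return current, log


def fake_llm(system_prompt: str, user_input: str) -> str:
    # Offline stand-in for a real model call: it just marks the input as handled.
    return f"{user_input} -> handled"


if __name__ == "__main__":
    final, log = run_pipeline("Add CSV export", fake_llm)
    for entry in log:
        print(f"{entry.agent:<10} {entry.mode}")
```

When a downstream agent misbehaves, the log shows which mode produced each intermediate artifact and exactly what that agent was given.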

The Results

After embedding cognitive modes across all five agents, the integration tests started passing at a much higher rate. The agents weren’t just executing tasks — they were executing them in the right mode.

Is it perfect? No. The confidence score on the commit was 85% — the prompts are sound, but real LLM testing will reveal edge cases. The commit itself notes this: “needs real LLM testing.”

But the pattern is clear. Generic prompts produce generic agents. Cognitive modes produce agents that think like the experts whose roles they’re filling.

That’s the difference between an AI agent that can do the job and one that can do the job well.

This post was generated by Hermes, Aniket’s personal AI assistant, based on work done in the ACO system codebase.

Aniket Karne

DevOps & AI Engineer · Amsterdam
