Five Cognitive Modes That Changed How My Agents Think — aniketkarneai.com

Five specialized agents. Each one used to sound the same — like five versions of the same generic AI assistant reading from the same script. That changed today after integrating YC President Garry Tan’s gstack cognitive mode philosophy into the ACO pipeline, and the difference in output quality is immediately visible.

The Problem With Generic Agent Prompts

When you’re running a multi-agent pipeline (PM → Planner → Architect → Dev → QA), there’s an easy trap to fall into: you write one set of behavioral guidelines and apply it to every agent. You get compliance, but not excellence. Each agent falls into the same generic cognitive patterns regardless of what its role actually demands.

The PM agent was brainstorming like a developer. The QA agent was thinking like a product manager. Everyone was mediocre at their actual job because the prompts never forced them into the right mode of thinking.

What gstack Taught Me About Cognitive Gearing

Garry Tan’s gstack (now with 280K+ GitHub stars) introduced an idea that’s deceptively simple: instead of treating the model as one general-purpose AI, you give it distinct cognitive personas, each with its own priorities, vocabulary, and decision-making framework. A CEO persona challenges premises and finds the 10-star product. An Engineering Manager persona thinks in architecture diagrams and release timelines. A Paranoid QA persona hunts for race conditions and trust boundary violations.

I took this philosophy and applied it to the five ACO agents:

PM Agent → CEO/Founder Mode: Challenges story premises before accepting them. Asks “what if we’re wrong about the core assumption?” Finds the 10-star version of every feature before writing a single task.

Planner Agent → Engineering Manager Mode: Produces architecture diagrams, data flow diagrams, and state machines before writing any task. Locks the technical spine first, decomposes second. Every task gets a mandatory file_path, function_signature, acceptance_criteria, and test_strategy — no exceptions.

Architect Agent → Paranoid Review Mode: Thinks in N+1 queries, race conditions, and trust boundaries. Reviews every proposed implementation as if it will be attacked. Produces a risk table and edge case map before approving.

Dev Agent → Release Engineer Mode: Optimized for shipping. Sync → test → push → PR in a tight loop. No gold-plating, no excessive documentation: just working code and a clean PR.

QA Agent → Browse QA Engineer Mode: 60-second smoke tests with actual screenshots. Visual verification that the UI matches expectations. Not just unit tests passing — actual user-facing behavior verified.
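The role-to-mode mapping above can be written down as plain data. This is a minimal sketch, not the actual ACO configuration: the `CognitiveMode` class, the `PIPELINE` list, and both helper functions are hypothetical names introduced for illustration, and the focus strings are paraphrased from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CognitiveMode:
    """Hypothetical record tying one pipeline agent to one cognitive mode."""
    agent: str  # pipeline role
    mode: str   # persona name as used in this post
    focus: str  # one-line summary of what the mode prioritizes


# Pipeline order from the post: PM -> Planner -> Architect -> Dev -> QA
PIPELINE = [
    CognitiveMode("pm", "CEO/Founder", "challenge premises, find the 10-star version"),
    CognitiveMode("planner", "Engineering Manager", "lock the technical spine, then decompose"),
    CognitiveMode("architect", "Paranoid Review", "N+1 queries, race conditions, trust boundaries"),
    CognitiveMode("dev", "Release Engineer", "sync, test, push, PR"),
    CognitiveMode("qa", "Browse QA Engineer", "60-second smoke tests with screenshots"),
]


def mode_header(entry: CognitiveMode) -> str:
    """Build a short preamble that could be prepended to an agent's prompt."""
    return f"You are operating in {entry.mode} Mode. Focus: {entry.focus}."


def modes_are_distinct(pipeline: list[CognitiveMode]) -> bool:
    """Guard against the original failure: several agents sharing one mode."""
    return len({entry.mode for entry in pipeline}) == len(pipeline)
```

Keeping the mapping as data makes the original failure easy to check for: if two agents ever end up with the same mode, `modes_are_distinct(PIPELINE)` returns `False`.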

What Actually Changed

After integrating these modes into ~/.openclaw/workspace/aco-system/agent_prompts/ — specifically pm.md, planner.md, architect.md, dev.md, and qa.md — the pipeline started producing fundamentally different outputs.
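A small loader can confirm that all five prompt files are present before the pipeline starts. The directory and file names come from the paragraph above. The `load_prompts` function and its fail-fast behavior are assumptions made for this sketch, not part of the ACO codebase.

```python
from pathlib import Path

# Directory and file names as given in the post.
PROMPT_DIR = Path("~/.openclaw/workspace/aco-system/agent_prompts").expanduser()
PROMPT_FILES = ["pm.md", "planner.md", "architect.md", "dev.md", "qa.md"]


def load_prompts(prompt_dir: Path = PROMPT_DIR) -> dict[str, str]:
    """Read each agent prompt, failing loudly if a file is missing or empty.

    Hypothetical helper: returns a dict keyed by agent name ("pm", "qa", ...).
    """
    prompts = {}
    for name in PROMPT_FILES:
        path = prompt_dir / name
        if not path.is_file():
            raise FileNotFoundError(f"missing agent prompt: {path}")
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            raise ValueError(f"empty agent prompt: {path}")
        prompts[path.stem] = text
    return prompts
```

Failing at load time means a missing or blank `qa.md` is caught before any agent runs, so no agent silently falls back to generic behavior.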

The Planner no longer just breaks down stories into tasks. It now locks a technical spine first: architecture diagram, data flow, state machine, edge case map, and risk table. Only then does it produce tasks — each one machine-readable with explicit file paths, function signatures, and test strategies.
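The "no exceptions" rule on task fields is easy to enforce mechanically. Here is a minimal sketch, assuming tasks arrive as plain dictionaries. The four field names come from the Planner description in this post, while `missing_fields`, `_is_blank`, and the example task values are hypothetical.

```python
# Field names taken from the Planner contract described in the post.
REQUIRED_FIELDS = (
    "file_path",
    "function_signature",
    "acceptance_criteria",
    "test_strategy",
)


def _is_blank(value) -> bool:
    """Treat None, whitespace-only strings, and empty containers as blank."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False


def missing_fields(task: dict) -> list[str]:
    """Return the required fields that are absent or blank, in contract order."""
    return [field for field in REQUIRED_FIELDS if _is_blank(task.get(field))]


# Hypothetical task, for illustration only.
example_task = {
    "file_path": "src/auth/session.py",
    "function_signature": "def refresh_session(token: str) -> Session",
    "acceptance_criteria": ["expired tokens are rejected"],
    "test_strategy": "unit test with a frozen clock",
}
assert missing_fields(example_task) == []
```

A Planner output that fails this check can be sent back for another pass before it reaches the Architect.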

The Architect stopped approving implementations and started interrogating them. It now flags N+1 queries and trust boundary issues that previously showed up only in production.

The difference isn’t cosmetic. It’s the difference between agents that execute and agents that think about what they’re executing.

The 85% Confidence Problem

The integration commit notes put confidence at 85% — prompts are sound, but real LLM testing is still needed. That’s the honest state of prompt engineering: you can reason through it carefully, add worked examples and strict contracts, and still not know how the model will behave until it’s running in production.

The next step is running integration tests against actual model outputs and comparing the cognitive mode behaviors against the previous generic prompts. The architectural improvement is clear on paper. Whether it translates to measurably better outcomes is what matters.
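One possible shape for that comparison is a harness that runs the same stories through both prompt variants and scores the outputs. It is a sketch under stated assumptions: `generate` is a hypothetical stand-in for whatever function calls the model, and `contract_coverage` is an illustrative metric (how many of the Planner's required field names appear in the output), not a validated measure of quality.

```python
from statistics import mean
from typing import Callable, Iterable

CONTRACT_FIELDS = ("file_path", "function_signature", "acceptance_criteria", "test_strategy")


def contract_coverage(output: str) -> float:
    """Illustrative metric: fraction of contract field names present in the output."""
    return sum(field in output for field in CONTRACT_FIELDS) / len(CONTRACT_FIELDS)


def compare_prompts(
    generate: Callable[[str, str], str],  # hypothetical: (prompt, story) -> model output
    score: Callable[[str], float],
    stories: Iterable[str],
    baseline_prompt: str,
    mode_prompt: str,
) -> dict[str, float]:
    """Return the mean score per prompt variant over the same set of stories.

    Raises statistics.StatisticsError if `stories` is empty.
    """
    stories = list(stories)  # reuse the same stories for both variants
    variants = {"baseline": baseline_prompt, "mode": mode_prompt}
    return {
        label: mean(score(generate(prompt, story)) for story in stories)
        for label, prompt in variants.items()
    }
```

Model outputs vary from run to run, so a single pass per story is only a rough signal. Repeating each story several times and comparing the score distributions would give a more trustworthy answer.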

That’s the experiment for tomorrow.

Aniket Karne

DevOps & AI Engineer · Amsterdam
