The Prompt That Thinks Like an Engineer: Lessons from Enhancing the ACO System's Five Agents

There’s a moment in building multi-agent systems when you realize the agents aren’t the hard part. The hard part is getting them to think correctly before they act. An agent that knows Python but writes sloppy, unshippable code is often less useful than one that needs guidance on architecture but produces clean, tested, PR-ready output.

That’s the lesson embedded in a commit from mid-March on the ACO system workspace — a commit that enhanced all five agent prompts with what the team called “gstack wisdom”: cognitive modes borrowed from years of shipping software with actual senior engineers.

What We Actually Changed

The ACO system has five agents: PM, Planner, Architect, Developer, and QA. Each had a base prompt that described their role and a generic set of behaviors. The enhancement added role-specific cognitive modes:

PM Agent got CEO/Founder mode: challenged to find 10-star product perspectives and to reframe problems before solving them
Planner Agent got Eng Manager mode: required to produce architecture diagrams, data flow diagrams, and state machines before any code is written
Architect Agent got Paranoid Review mode: tasked with hunting N+1 queries, race conditions, and trust boundary violations before approving any spec
Developer Agent got Release Engineer mode: ship fast, then sync, test, push, and PR in a tight loop
QA Agent got Browse QA Engineer mode: 60-second smoke tests, screenshots, and UI verification
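
One way to picture the change is a base role prompt with a mode block appended to it. The sketch below is illustrative only: the `COGNITIVE_MODES` table, the `build_prompt` helper, and the directive wording are assumptions that paraphrase the list above, not the actual ACO prompts.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CognitiveMode:
    name: str
    directives: Tuple[str, ...]


# Hypothetical table paraphrasing the five modes listed above.
# The real prompt text is not shown in this post.
COGNITIVE_MODES: Dict[str, CognitiveMode] = {
    "PM": CognitiveMode("CEO/Founder", (
        "Look for 10-star product perspectives.",
        "Reframe the problem before solving it.",
    )),
    "Planner": CognitiveMode("Eng Manager", (
        "Produce an architecture diagram, a data flow diagram, "
        "and a state machine before any code is written.",
    )),
    "Architect": CognitiveMode("Paranoid Review", (
        "Hunt for N+1 queries, race conditions, and trust boundary "
        "violations before approving any spec.",
    )),
    "Developer": CognitiveMode("Release Engineer", (
        "Ship fast: sync, test, push, and open a PR in a tight loop.",
    )),
    "QA": CognitiveMode("Browse QA Engineer", (
        "Run 60-second smoke tests.",
        "Capture screenshots and verify the UI.",
    )),
}


def build_prompt(role: str, base_prompt: str) -> str:
    """Append the role's cognitive mode to its base prompt."""
    mode = COGNITIVE_MODES[role]
    lines = [base_prompt.strip(), "", f"Cognitive mode: {mode.name}"]
    lines += [f"- {directive}" for directive in mode.directives]
    return "\n".join(lines)
```

Calling `build_prompt("Architect", "You are the Architect agent.")` returns the base prompt followed by a `Cognitive mode: Paranoid Review` block. Keeping the modes in one table makes it easy to see at a glance whether every agent has an explicit mode.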

This sounds obvious when written out. But the difference between “write good code” and “ship fast, sync, test, push, PR” is the difference between a todo list and a workflow.

Why This Matters for Multi-Agent Pipelines

The failure mode we kept hitting in the ACO system wasn’t that agents made mistakes. It was that agents with identical base prompts made wildly inconsistent decisions about how much to do before handing off to the next agent.

The Planner would sometimes hand off a three-sentence spec to the Architect. The Architect would sometimes approve it without finding the obvious race condition. The Developer would sometimes write code without tests and push it directly. No single agent was wrong — but the pipeline was producing fragile output.

The gstack wisdom enhancement was an attempt to close those inconsistency gaps by making each agent’s cognitive mode explicit in the prompt. Instead of “do a good job,” each agent got a frame: “think like a paranoid reviewer who has to explain every trust boundary violation to a courtroom.”

What Changed When We Ran Real Stories

Integration tests against the enhanced prompts showed measurable improvements in output completeness. The Planner now produces architecture diagrams before handing off every time, not just sometimes. The Architect finds the edge cases because it’s explicitly told to think like someone who has to explain failures in a post-mortem.
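
A completeness check of this kind could look like the sketch below. It is an assumption, not the team's actual integration test: the required artifact names come from the Planner's mode description, and `missing_artifacts` with plain keyword matching is a hypothetical stand-in for whatever the real tests inspect.

```python
from typing import List, Sequence

# Artifacts the Planner's mode requires before handoff (names taken from the post).
REQUIRED_PLANNER_ARTIFACTS = (
    "architecture diagram",
    "data flow diagram",
    "state machine",
)


def missing_artifacts(
    spec: str, required: Sequence[str] = REQUIRED_PLANNER_ARTIFACTS
) -> List[str]:
    """Return the required artifacts that the spec text never mentions."""
    text = spec.lower()
    return [name for name in required if name not in text]


thin_spec = "Add a retry button to the upload page."
full_spec = (
    "Architecture diagram: ...\n"
    "Data flow diagram: ...\n"
    "State machine: idle -> uploading -> done"
)

print(missing_artifacts(thin_spec))  # all three artifacts are missing
print(missing_artifacts(full_spec))  # nothing missing
```

Keyword matching only shows that a section is mentioned, not that it is any good, so a check like this can flag a thin spec but cannot certify a complete one.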

The Developer agent’s release-engineer mode deserves special mention. In the original prompts, “write code” was the core instruction. In the enhanced version, the loop is explicit: write it, test it, sync with the spec, push, open a PR. The handoff to QA then happens with a specific checklist.
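
The explicit loop can be modeled as an ordered checklist that gates the handoff. This is a minimal sketch under stated assumptions: the step identifiers and the `next_step` and `ready_for_qa` helpers are hypothetical, and the contents of the real QA checklist are not given in this post.

```python
from typing import Dict, Optional

# Step order as described above; the identifiers themselves are hypothetical.
RELEASE_LOOP = ["write", "test", "sync_with_spec", "push", "open_pr"]


def next_step(done: Dict[str, bool]) -> Optional[str]:
    """Return the first step not yet completed, or None when the loop is finished."""
    for step in RELEASE_LOOP:
        if not done.get(step, False):
            return step
    return None


def ready_for_qa(done: Dict[str, bool]) -> bool:
    """Allow the handoff to QA only once every step in the loop is complete."""
    return next_step(done) is None
```

With `{"write": True}` as input, `next_step` returns `"test"`, so code cannot skip straight to a push. The ordering is the point: a todo list says what must exist, while a workflow also says what comes next.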

This matters because multi-agent pipelines are only as strong as their weakest cognitive link. An agent that cuts corners to feel faster creates work for every downstream agent. The release-engineer mode doesn’t make the Developer faster — it makes the pipeline faster by reducing rework.

The Engineering Takeaway

If you’re building multi-agent systems and your agents produce inconsistent output, the first question isn’t “how do I add more agents?” It’s “what cognitive mode is each agent operating in, and is it explicit?”

Prompts that describe roles are table stakes. Prompts that embed cognitive modes — the actual thinking patterns of experienced engineers in each role — are what separate a working pipeline from a fragile one.

The ACO system is now running stories through enhanced agents that think like their human counterparts. The integration tests pass. The next real-world validation is already queued.
