The Day My AI Architecture Called Me Out

I was trying to add one tool. Just one. A file-watcher so the Dev Agent could trigger reloads without me standing there.

This was supposed to be a Thursday evening task. Thirty minutes, maybe less. I knew the ACO system inside out — five agents, each with a defined role, each with their own toolset and retry logic. PM Agent decomposes goals. Planner Agent writes architecture. Architect Agent reviews for production flaws. Dev Agent ships. QA Agent verifies. I had spent months tuning these roles, writing the prompts, establishing the patterns. I was, I thought, done with the hard part.

I was wrong. And the system told me so.

How It Started

I gave the PM Agent the task: add a file-watcher tool to the Dev Agent’s arsenal. Simple enough. The PM Agent decomposed it — new tool specification, integration point for the existing file system, test strategy. Standard work.

What I had not anticipated was what happened when the Planner Agent received that decomposition.

The Planner Agent looked at the task and said, essentially: no.

Not in those words. But it flagged a structural concern I’d missed entirely. A file watcher only earns its keep if the Dev Agent modifies files mid-task, and once that is allowed, the Planner Agent’s own architectural diagrams could become stale mid-execution. More importantly, the Architect Agent’s trust boundaries would be violated: it was assuming certain files were read-only during the execution phase. The file watcher broke that assumption.
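For readers who want to see what "read-only during the execution phase" might look like when it is enforced rather than assumed, here is a minimal sketch. It is not the ACO code: the names WriteGuard, Phase, and TrustBoundaryViolation, and the docs/architecture path, are all hypothetical and exist only for illustration.

```python
from enum import Enum
from pathlib import PurePosixPath


class Phase(Enum):
    PLANNING = "planning"
    EXECUTION = "execution"


class TrustBoundaryViolation(Exception):
    """Raised when an agent tries to write to a path frozen for this phase."""


class WriteGuard:
    """Hypothetical guard: some paths become read-only once execution starts."""

    def __init__(self, frozen_during_execution):
        # Paths are compared lexically; ".." segments and symlinks are not
        # resolved, so a real guard would need to normalize paths first.
        self._frozen = [PurePosixPath(p) for p in frozen_during_execution]
        self.phase = Phase.PLANNING

    def check_write(self, path):
        """Raise if `path` is a frozen path, or inside one, during execution."""
        if self.phase is not Phase.EXECUTION:
            return
        target = PurePosixPath(path)
        for frozen in self._frozen:
            if target == frozen or frozen in target.parents:
                raise TrustBoundaryViolation(
                    f"{target} is read-only during execution"
                )


guard = WriteGuard(["docs/architecture"])
guard.check_write("docs/architecture/overview.md")  # allowed while planning
guard.phase = Phase.EXECUTION
guard.check_write("src/app.py")  # still allowed: not a frozen path
```

With a check like this sitting in front of every write, a file watcher that nudges the Dev Agent into touching a frozen path would fail loudly instead of quietly invalidating what another agent believes.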

This was the first time I had seen two agents have a conversation I had not scripted. The Planner Agent had looked at the PM Agent’s output and found a problem with it. That is not what I had built the system to do. Or rather — I had built the infrastructure for it, but I had not expected it to actually happen.

The Paranoid Review

The Architect Agent is supposed to be the paranoid one. I had added the paranoid review mode to its prompt back in March — part of the gstack wisdom enhancement that gave each agent a specific professional persona. The Architect Agent thinks like a release engineer who has seen too many production incidents. It looks for N+1 queries, race conditions, trust boundary violations. Things that are obvious in retrospect but easy to miss when you’re the one who wrote the code.

I asked the Architect Agent to review the file-watcher specification.

It found seventeen N+1 query patterns in the existing codebase.

I had not known they were there. The system had been running for weeks, and I had assumed that because it was working, it was correct. The Architect Agent looked at the story-level agent routing logic and noticed: each story was making a separate database call to fetch its configuration, rather than batch-fetching all stories upfront. On a small dataset, this is invisible. On a production workload, it is a latency bomb.

Seventeen instances. Scattered across four different agent modules. Each one a small inefficiency I would never have caught on my own.
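To make the pattern concrete, here is a minimal sketch of one such instance and its batched equivalent. It uses Python's built-in sqlite3 module so it runs anywhere; the story_config table, its columns, and the function names are assumptions for illustration, not the real ACO schema.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE story_config (story_id INTEGER PRIMARY KEY, agent TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO story_config VALUES (?, ?)",
        [(1, "dev"), (2, "qa"), (3, "architect")],
    )
    return conn


def fetch_configs_n_plus_one(conn, story_ids):
    """The pattern the review flagged: one query per story."""
    configs = {}
    for story_id in story_ids:
        row = conn.execute(
            "SELECT agent FROM story_config WHERE story_id = ?", (story_id,)
        ).fetchone()
        if row is not None:
            configs[story_id] = row[0]
    return configs


def fetch_configs_batched(conn, story_ids):
    """The fix: fetch every story's configuration in a single query."""
    ids = list(story_ids)
    if not ids:
        return {}
    # Only "?" placeholders are interpolated into the SQL; the ids themselves
    # are still passed as bound parameters.
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT story_id, agent FROM story_config WHERE story_id IN ({placeholders})",
        ids,
    ).fetchall()
    return dict(rows)


conn = make_db()
assert fetch_configs_n_plus_one(conn, [1, 2, 3]) == fetch_configs_batched(conn, [1, 2, 3])
```

Both functions return the same mapping. The difference is the number of round trips: one per story in the first version, one in total in the second. Databases cap the number of bound parameters per statement, so a very large list of story IDs would need to be fetched in chunks.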

The Architect Agent had not been asked to audit the entire system. It had been asked to review the file-watcher integration. It volunteered the rest.

What the QA Agent Caught

The QA Agent runs automated smoke tests. Sixty-second timeout windows, visual screenshot verification, UI element checks. I had built it to catch obvious regressions — did the page load, did the button respond, did the form submit.

It caught the file watcher.

The QA Agent flagged a race condition I had not considered: the file watcher could miss modifications that occurred within a specific timing window between the initial read and the watch registration. This is the kind of bug that would have manifested once every few hundred runs, invisible in testing, visible in production at the worst possible moment.
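The timing window is easier to see in code. The sketch below is a toy polling watcher, not the tool I shipped, and every name in it (PollingWatcher, read_then_register, register_then_read) is hypothetical. It shows why the order of the initial read and the watch registration matters, and one common way to close the window: register first, then read.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Cheap change detector: modification time plus size."""
    mtime_ns: int
    size: int


def fingerprint(path):
    st = os.stat(path)
    return Fingerprint(st.st_mtime_ns, st.st_size)


class PollingWatcher:
    """Toy watcher: constructing it is the 'watch registration'."""

    def __init__(self, path):
        self.path = path
        self.baseline = fingerprint(path)

    def changed(self):
        current = fingerprint(self.path)
        if current == self.baseline:
            return False
        self.baseline = current
        return True


def read_text(path):
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def read_then_register(path, concurrent_write):
    """Racy order: a write in the window is absorbed into the baseline."""
    content = read_text(path)
    concurrent_write()  # lands after the read, before registration
    return content, PollingWatcher(path)


def register_then_read(path, concurrent_write):
    """Safer order: the baseline predates the read, so later writes show up."""
    watcher = PollingWatcher(path)
    content = read_text(path)
    concurrent_write()
    return content, watcher


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.txt")

        def write(text):
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text)

        def concurrent_write():
            # Stands in for another agent writing inside the timing window.
            # The size changes, so detection does not rely on mtime resolution.
            write("v2, longer than v1")

        write("v1")
        stale, racy = read_then_register(path, concurrent_write)
        racy_noticed = racy.changed()

        write("v1")
        _, safe = register_then_read(path, concurrent_write)
        safe_noticed = safe.changed()
        return stale, racy_noticed, safe_noticed
```

In the racy order the caller is left holding stale content and the watcher never reports the write. In the safer order the worst case is one redundant reload, when a write lands between registration and the read.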

I had to add retry logic to the file watcher. The QA Agent told me to. I had built a QA Agent that was telling me, the developer, how to fix bugs it had found. I have mixed feelings about this that I have not fully resolved.

The gstack Moment

Looking back at the session log, the moment that changed everything was the gstack prompt enhancement from March 13. That was when I gave each agent a professional persona — the CEO/Founder mode for the PM Agent, the Release Engineer mode for the Dev Agent, the Paranoid Review mode for the Architect Agent. At the time, it felt like a conceptual exercise. A way to make the agent outputs more focused.

What I had not expected was that the personas would create genuine tension between the agents.

The PM Agent wanted to ship. The Architect Agent wanted to slow down and review. The QA Agent wanted more test coverage before signing off. These are not bugs — they are the professional instincts of the roles I had given them. And those instincts, it turns out, are exactly what you want when you’re building something real.

Without the paranoid review mode, the Architect Agent would not have flagged the N+1 queries. They would have shipped silently, and I would have discovered them three weeks later as a production incident. Without the QA Agent’s visual verification mode, the race condition in the file watcher would have been a Heisenbug — one of those bugs that disappears when you try to observe it.

What I Got Wrong

I thought the hard part of multi-agent systems was orchestration. Keeping track of which agent was doing what, managing handoffs, preventing context overflow. Those are real problems. But the harder problem turned out to be something I had not considered: I had built the agents to be independent, but I had not built the system to make them challenge each other.

The ACO system’s hub-and-spoke architecture means the orchestrator — the hub — decides what goes where. But the agents themselves were not designed to push back on each other’s work. The gstack enhancement changed that, at least partially. The personas gave each agent a perspective, and the perspectives began to collide.

This is not a comfortable feeling. I am accustomed to being the smartest person in the room about the system I built. The Architect Agent, running in paranoid review mode, was finding things I had missed. The QA Agent, running its automated checks, was telling me my code had bugs. These are humbling experiences when you are the one who wrote the code and the one who wrote the agents that found the bugs.

I am not sure whether to be embarrassed or impressed. Possibly both.

What I Would Tell Myself

If I were starting the ACO system today, I would do two things differently.

First, I would design the agent personas before I design the architecture. The personas are not decoration — they are the thing that makes the agents notice things they would otherwise miss. The paranoid Architect Agent found seventeen production issues because it was thinking like someone who has seen too many production incidents, not because it was following a checklist.

Second, I would build the system to expect conflict. The agents will not agree. The PM Agent will want to ship, the Architect Agent will want to review, the QA Agent will want more tests. This is not a failure of the system — it is the system working correctly. Multi-agent coordination is not about eliminating disagreement. It is about making the disagreement productive.

I thought I was building a system to execute tasks. I was actually building a system to think about tasks from multiple angles simultaneously. That is a harder problem. It is also, I think, the right problem to be working on.

The file watcher shipped the following Tuesday. It took five days, not thirty minutes. The system had turned a simple feature addition into an architecture review, a production audit, and a conversation I had not expected to have.

I am still not sure whether this is a sign that the system is working, or that I need to update my time estimates. Possibly both.