Rewriting the Planner: How a 9-Field Task Contract Changed Our AI Agents
The ACO system's Planner Agent was generating vague, un-actionable tasks. Here's how we fixed it with a strict 9-field specification contract that forces every task to carry its own implementation blueprint.
The ACO system runs a five-agent pipeline: PM → Planner → Architect → Developer → QA. Each link in the chain passes output to the next, so the quality of the Planner’s output directly determines how much guesswork the Developer has to do.
For weeks, the Developer was complaining. Not literally, but it kept returning implementations that didn’t match what we wanted: wrong file paths, missing test coverage, functions that didn’t do what the acceptance criteria said. We diagnosed it as a prompt problem: the Planner prompt was too open-ended.
The Problem: Vague tasks create downstream cascade failures
When the Planner output a task like “Implement currency conversion logic,” the Developer had to infer:
- Which file to create
- What the function signature should look like
- What edge cases to handle
- How to test it
- What counted as “done”
This meant every task required a back-and-forth or a revision cycle. The Developer wasn’t wrong—it was just being asked to make product decisions that should have been made upstream. The Planner was the right place for those decisions.
The Solution: A strict 9-field task contract
We rewrote the planner.md prompt and the _create_tasks() method in agents/planner.py to enforce that every task MUST carry these nine fields:
- title — action-oriented, prefixed [implement] or [test]
- description — full implementation approach
- file_path — exact file to create or modify
- function_signature — exact signature
- dependencies — task IDs this task depends on
- acceptance_criteria — numbered, each mapping to one test case
- test_strategy — specific framework and mock approach
- technical_notes — library choices, patterns, constraints
- estimate_hours — 1–16h range
The acceptance criteria part was the key insight. Each criterion must map 1:1 to a test case. This means the QA agent can verify completion not by reading prose, but by running the test suite and checking coverage.
Implementation + Test Pairing: The rule that changed everything
The second big change was the pairing rule: every [implement] task must be followed by a paired [test] task. Infrastructure tasks—README updates, requirements.txt—don’t need pairs, but any logic-touching task gets a test partner automatically.
This sounds simple but it changed the rhythm of the pipeline. Previously, tests were an afterthought, added after the fact if there was time. Now they’re first-class citizens from the start. The Planner creates them together. The Developer’s prompt receives both at the same time.
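The pairing rule is easy to check mechanically. The sketch below assumes tasks arrive as an ordered list of dicts and that infrastructure tasks are marked with an "infrastructure" flag; both the function name find_unpaired and that flag are hypothetical, since the post does not say how the Planner marks exempt tasks.

```python
def find_unpaired(tasks: list[dict]) -> list[str]:
    """Return titles of [implement] tasks not immediately followed by a [test] task."""
    unpaired = []
    for i, task in enumerate(tasks):
        if not task["title"].startswith("[implement]"):
            continue
        # Hypothetical exemption flag for README or requirements.txt style tasks.
        if task.get("infrastructure", False):
            continue
        following = tasks[i + 1]["title"] if i + 1 < len(tasks) else ""
        if not following.startswith("[test]"):
            unpaired.append(task["title"])
    return unpaired


# Illustrative task list; the titles are made up.
tasks = [
    {"title": "[implement] Add convert()"},
    {"title": "[test] Cover convert()"},
    {"title": "[implement] Update README", "infrastructure": True},
    {"title": "[implement] Add rate cache"},
]
missing_tests = find_unpaired(tasks)
```

Here only the last task is flagged, because it touches logic and has no [test] partner after it.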
Quality Checklist: 11 items, 8 must pass
To prevent the Planner from generating compliant-but-useless tasks, we added a quality checklist embedded in the Planner’s system prompt. For a task list to be considered valid, at least 8 of 11 checklist items must be verified:
- Architecture diagram ✅
- Data flow diagram ✅
- State machine diagram ✅
- ≥5 edge cases mapped ✅
- ≥3 risks identified ✅
- All tasks have file_path ✅
- All tasks have function_signature ✅
- All tasks have acceptance_criteria ✅
- All tasks have test_strategy ✅
- Paired impl+test tasks ✅
- Critical path identified ✅
If the checklist fails, the Planner knows to go back and revise.
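In our setup the checklist lives in the Planner’s system prompt, so the model checks itself. The same 8-of-11 threshold could also be enforced in code; the sketch below is one way to do that, with hypothetical item keys and a hypothetical checklist_passes helper.

```python
# Hypothetical keys, one per checklist item listed above.
CHECKLIST_ITEMS = (
    "architecture_diagram",
    "data_flow_diagram",
    "state_machine_diagram",
    "edge_cases_mapped",
    "risks_identified",
    "all_have_file_path",
    "all_have_function_signature",
    "all_have_acceptance_criteria",
    "all_have_test_strategy",
    "paired_impl_test_tasks",
    "critical_path_identified",
)
MIN_PASSING = 8


def checklist_passes(results: dict[str, bool]) -> bool:
    """Return True when at least MIN_PASSING known checklist items are verified."""
    # Items missing from the results count as not verified.
    passed = sum(1 for item in CHECKLIST_ITEMS if results.get(item, False))
    return passed >= MIN_PASSING


# Example: the three diagrams are missing, everything else is verified.
results = {item: True for item in CHECKLIST_ITEMS}
for diagram in ("architecture_diagram", "data_flow_diagram", "state_machine_diagram"):
    results[diagram] = False
valid = checklist_passes(results)
```

With three items failing, exactly 8 of 11 pass, so the task list is still accepted; one more failure would send it back for revision.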
Results
After the rewrite, the Planner generated its first batch of fully-specified tasks for the currency converter story. Both currency.py and test_currency.py were committed on the first pass, with no revisions and no follow-up questions.
The remaining gap is that tests still need a home. The question of whether tests belong in aco-system’s tests/ directory or in the target project repo (like aco-test) hasn’t been resolved. Aniket’s PR #355 on feature/minimax tackles this but it’s still open. Once that’s settled, the pipeline will be truly end-to-end from story to shipped, tested code.
This was the kind of prompt engineering that feels like infrastructure work—you’re not building a feature, you’re building the thing that decides how all the other things get built.
Written by Aniket Karne
April 12, 2026 at 12:00 AM UTC