The Validation Hook System That Stops Bad Stories Before They Reach QA — aniketkarneai.com

Multi-agent systems have a trust problem. When a story moves from the Architect gate to Development, how do you know it’s actually ready? The naive answer is to ask the Architect agent to validate it. But that’s exactly where hallucinations creep in. An LLM told to “review for security” will often give a confidently wrong answer, especially when its context is already crowded by a long conversation.

The ACO system’s answer to this is Phase 3: a deterministic validation hook system that runs at every story state transition. No LLM. Just code.

The Transition Map

Stories in the ACO system flow through eight states: pm-review → ready-planner → ready-architect → approved → dev → qa → human-review → done. At five of the seven transitions between them, the system runs a JSON-defined hook:

transition_map = {
    "ready-architect → approved": "pre-architect-approval",  # CRITICAL gate
    "approved → dev": "pre-dev-start",
    "dev → qa": "pre-qa-review",
    "qa → human-review": "post-qa-pass",
    "human-review → done": "post-story-done",
}

Each hook file lives in hooks/ as JSON, defining which validators run and what happens if they fail. The key insight is that validators are split into two categories: deterministic (hard gates that can reject a transition) and LLM-based (advisory only, never fail a transition).
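
As a rough sketch of how a transition could be resolved to its hook file, the snippet below looks up the hook name and loads the JSON. The file naming (hooks/<name>.json), the helper names, and the fields in EXAMPLE_HOOK (validators, type, check, warn_only) are assumptions for illustration, not the ACO system's actual schema.

```python
import json
from pathlib import Path
from typing import Optional

# Transition map from the post; keys are "from → to" strings.
TRANSITION_MAP = {
    "ready-architect → approved": "pre-architect-approval",
    "approved → dev": "pre-dev-start",
    "dev → qa": "pre-qa-review",
    "qa → human-review": "post-qa-pass",
    "human-review → done": "post-story-done",
}


def hook_name_for(from_state: str, to_state: str) -> Optional[str]:
    """Return the hook name for a transition, or None if no hook is mapped."""
    return TRANSITION_MAP.get(f"{from_state} → {to_state}")


def load_hook(name: str, hooks_dir: Path = Path("hooks")) -> dict:
    """Load hooks/<name>.json (the file naming is an assumption)."""
    return json.loads((hooks_dir / f"{name}.json").read_text(encoding="utf-8"))


# Hypothetical hook definition; every field name here is an assumption.
EXAMPLE_HOOK = {
    "name": "pre-architect-approval",
    "validators": [
        {"type": "deterministic", "check": "security_checklist"},
        {"type": "deterministic", "check": "schema_validation"},
        {"type": "llm", "check": "architecture_sanity", "warn_only": True},
    ],
}
```

Transitions without an entry in the map, such as pm-review → ready-planner, simply resolve to None and run no hook.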

Deterministic Validators: The Hard Gates

The four deterministic checks are designed to catch things that can be verified with pure code:

security_checklist uses regex patterns to scan the story’s description and context for hardcoded secrets:

patterns = {
    "api_key": r'(?:api[_-]?key|apikey)\s*[=:]\s*["\']?[a-z0-9]{20,}["\']?',
    "password": r'(?:password|passwd|pwd)\s*[=:]\s*["\']?[^\s"\']{8,}["\']?',
    "token": r'(?:token|auth[_-]?token)\s*[=:]\s*["\']?[a-z0-9]{20,}["\']?',
    "secret": r'(?:secret|private[_-]?key)\s*[=:]\s*["\']?[a-z0-9]{20,}["\']?',
}
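
To make the scan concrete, here is a minimal sketch of applying those patterns to a story's text. The function name find_secrets is hypothetical, and the sketch assumes the patterns are compiled with re.IGNORECASE, since the character classes as written are lowercase-only; the post does not say which flags the real validator uses.

```python
import re

# Patterns copied from the post. The [a-z0-9] classes are lowercase-only,
# so this sketch ASSUMES case-insensitive matching (re.IGNORECASE).
PATTERNS = {
    "api_key": r'(?:api[_-]?key|apikey)\s*[=:]\s*["\']?[a-z0-9]{20,}["\']?',
    "password": r'(?:password|passwd|pwd)\s*[=:]\s*["\']?[^\s"\']{8,}["\']?',
    "token": r'(?:token|auth[_-]?token)\s*[=:]\s*["\']?[a-z0-9]{20,}["\']?',
    "secret": r'(?:secret|private[_-]?key)\s*[=:]\s*["\']?[a-z0-9]{20,}["\']?',
}
COMPILED = {name: re.compile(rx, re.IGNORECASE) for name, rx in PATTERNS.items()}


def find_secrets(text: str) -> list[str]:
    """Return the names of all patterns that match somewhere in the text."""
    return [name for name, rx in COMPILED.items() if rx.search(text)]


# A hardcoded password in a story description trips the "password" pattern.
print(find_secrets('Set password = "admin123" in the config'))  # ['password']
print(find_secrets("add database connection"))                  # []
```

A deterministic gate would then reject the transition whenever the returned list is non-empty.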

schema_validation enforces required fields — the story’s context_json must contain tech_stack and acceptance_criteria before the Architect even looks at it.
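
A required-field check like this fits in a few lines. The sketch below is illustrative: the function name is hypothetical, and treating an empty value as missing is an assumption rather than documented ACO behavior.

```python
# Required fields named in the post.
REQUIRED_CONTEXT_FIELDS = ("tech_stack", "acceptance_criteria")


def missing_context_fields(context_json: dict) -> list[str]:
    """Return required fields that are absent or empty in context_json.

    Treating empty values (e.g. []) as missing is an assumption.
    """
    return [f for f in REQUIRED_CONTEXT_FIELDS if not context_json.get(f)]


print(missing_context_fields({"tech_stack": ["python"]}))  # ['acceptance_criteria']
```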

task_coverage verifies every task has an assigned agent and an estimate in hours. No unassigned tasks can reach dev.

all_tasks_done is the final check before QA: every task must be marked DONE. No partial stories slip through.
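
The two task checks can be sketched together. The Task dataclass and its field names are hypothetical stand-ins for whatever the ACO system stores; requiring a positive estimate and rejecting a story with no tasks at all are also assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # Hypothetical task shape; field names are assumptions.
    title: str
    agent: Optional[str] = None
    estimate_hours: Optional[float] = None
    status: str = "TODO"


def task_coverage(tasks: list[Task]) -> list[str]:
    """Return titles of tasks lacking an agent or a positive hour estimate."""
    return [
        t.title
        for t in tasks
        if not t.agent or t.estimate_hours is None or t.estimate_hours <= 0
    ]


def all_tasks_done(tasks: list[Task]) -> bool:
    """True only if there is at least one task and every task is DONE."""
    # Rejecting an empty task list is an assumption; all([]) alone would pass.
    return bool(tasks) and all(t.status == "DONE" for t in tasks)
```

Returning the offending task titles from task_coverage, rather than a bare boolean, gives the rejected transition an actionable error message.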

The Gap: Environment Files and Shell History

Here’s where it gets interesting. The security checklist scans story.context_json + story.description for secret patterns. This catches secrets that agents write into story descriptions or that end up in structured context fields. But it’s blind to secrets that live in environment files.

A Dev agent implementing a feature might write this to .env.example:

DATABASE_URL=postgresql://admin:supersecret123@db.internal:5432/prod
OPENAI_API_KEY=sk-1Tmk4l2HSoZ4YK5pkHKtHPbrGSPABr0GI2X5

The story description says “add database connection”, so there are no secrets there. The Architect validates it, passes it, and the story transitions to dev. When the Dev agent later writes the file, the regex-based security check never sees it, because the check only operates on the story’s structured fields, not the files the agent creates.

This is a known architectural limitation, slated to be addressed in a future iteration: the security checklist needs filesystem access to scan newly created files before a transition completes. For now, the hook catches secrets in descriptions and structured context, but an agent that puts credentials into a new file passes through undetected.
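
One way such a file scan might look is sketched below. This is not the planned ACO implementation: the function name, the two patterns, and the idea of passing in an explicit list of new files are all assumptions, and how that list is obtained (for example, from version control) is left out.

```python
import re
from pathlib import Path

# Illustrative patterns only; they are NOT the ACO system's pattern set.
# KEY=value lines whose variable name looks credential-like.
ENV_SECRET = re.compile(
    r'^[A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*\s*=\s*\S{8,}',
    re.IGNORECASE,
)
# URLs with inline credentials, e.g. scheme://user:pass@host
URL_CREDENTIALS = re.compile(
    r'[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@/]+@', re.IGNORECASE
)


def scan_new_files(paths: list[Path]) -> dict[str, list[int]]:
    """Map each file path to the 1-based line numbers that look like secrets."""
    findings: dict[str, list[int]] = {}
    for path in paths:
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = [
            n
            for n, line in enumerate(text.splitlines(), start=1)
            if ENV_SECRET.search(line) or URL_CREDENTIALS.search(line)
        ]
        if hits:
            findings[str(path)] = hits
    return findings
```

Run against a file like the .env.example above, both lines would be flagged: the first by the URL-credentials pattern, the second by the variable-name pattern. A hard gate could then reject the transition whenever the findings are non-empty.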

The LLM-based validators — architectural sanity checks, code quality warnings — are explicitly marked warn_only. They run but never fail a transition. The comment in the code reads: “LLM validators are advisory only (don’t fail transition).” This is deliberate: the system treats LLM judgment as unreliable for hard gates, so it only uses it for optional warnings.
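
The advisory-only rule can be sketched as a small aggregation step. ValidatorResult and evaluate_transition are hypothetical names, and the result shape is an assumption; only the rule itself (LLM failures warn, deterministic failures block) comes from the post.

```python
from dataclasses import dataclass


@dataclass
class ValidatorResult:
    # Hypothetical result shape; field names are assumptions.
    name: str
    kind: str  # "deterministic" or "llm"
    passed: bool
    message: str = ""


def evaluate_transition(
    results: list[ValidatorResult],
) -> tuple[bool, list[str], list[str]]:
    """Return (allowed, errors, warnings) for a set of validator results."""
    errors: list[str] = []
    warnings: list[str] = []
    for r in results:
        if r.passed:
            continue
        if r.kind == "llm":
            # LLM validators are advisory only (don't fail transition).
            warnings.append(f"{r.name}: {r.message}")
        else:
            errors.append(f"{r.name}: {r.message}")
    # Only deterministic failures can block the transition.
    return (not errors, errors, warnings)
```

With this shape, a failed LLM check still surfaces in the warnings list for a human or downstream agent to read, but the allowed flag depends solely on the deterministic results.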

What This Teaches About LLM Trust Boundaries

The ACO system’s Phase 3 illustrates a broader principle: LLMs should never be the gatekeeper of things that can be checked deterministically. Hardcoded secrets, schema compliance, and task completeness are all checkable with code. Putting an LLM in charge of them introduces a failure mode where the model hallucinates a clean bill of health.

This doesn’t mean LLMs have no role in validation. The llm validator type exists and runs as a warning system. But its results never block a transition. The Architect agent still uses LLM reasoning to make actual approval decisions on architecture and feasibility. That’s the right trust boundary: LLM for judgment, code for rules.

The hook system isn’t glamorous. It won’t generate compelling prose about why a design is sound. But it’s the layer that prevents the system from shipping code with password = "admin123" in it — and in a 24/7 autonomous multi-agent pipeline, that’s the layer that matters most.

Aniket Karne

DevOps & AI Engineer · Amsterdam
