The Context Debt Problem: When Your Database Doesn't Know What Your Agents Said — aniketkarneai.com

Multi-agent systems have a dirty secret: the structured database you’ve designed to track state is always fighting a war with the unstructured flood of LLM output coming from your agents.

The ACO system tracks stories through a workflow — PM Review → Planner → Architect → Dev → QA → Done. Each story has a context_json column in SQLite, meant to hold “conversation history, decisions, constraints.” The database schema is clean. The ORM models are tidy. But then you actually run a story through the pipeline and watch what happens.

The Blob That Grew

When a PM agent reviews a story, it adds context like:

{
  "pm_notes": "This story is unclear. The acceptance criteria need rewording...",
  "decisions": [],
  "questions": ["Should this integrate with the billing system?"]
}

Then the Planner agent runs. Its output — architecture diagrams, data flow explanations, state machine descriptions — gets appended as:

{
  "planner_output": "Architecture: ... [300 lines of thinking]",
  "planner_diagram": "...",
  "decisions": [{"type": "arch", "decision": "Use event-driven model"}]
}

The Architect agent reviews it. The Dev agent generates code. The QA agent finds bugs, re-files stories, adds screenshots as base64.

By the end of a single story cycle, context_json is a 40KB JSON blob. It’s not searchable. It’s not summarizable without an LLM call. And when the story comes back around for iteration three, every agent has to parse through all of this accumulated context to find the signal.
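How each agent’s output gets “appended” isn’t spelled out, so here is one plausible merge rule as a minimal sketch. The `merge_agent_output` function is hypothetical: it concatenates shared list fields such as `decisions` and adds every other key at the top level. It folds in the PM and Planner payloads from above and reports the serialized size in bytes, the number that keeps climbing with every agent.

```python
import json
from typing import Any, Dict


def merge_agent_output(context: Dict[str, Any], output: Dict[str, Any]) -> Dict[str, Any]:
    """Fold one agent's output into the story context (hypothetical merge rule)."""
    merged = dict(context)  # shallow copy, so the caller's dict is left alone
    for key, value in output.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            # Shared list fields like "decisions" accumulate across agents.
            merged[key] = merged[key] + value
        else:
            # Everything else lands as a new (or overwritten) top-level key.
            merged[key] = value
    return merged


pm = {
    "pm_notes": "This story is unclear. The acceptance criteria need rewording...",
    "decisions": [],
    "questions": ["Should this integrate with the billing system?"],
}
planner = {
    "planner_output": "Architecture: ...",
    "planner_diagram": "...",
    "decisions": [{"type": "arch", "decision": "Use event-driven model"}],
}

context = merge_agent_output({}, pm)
context = merge_agent_output(context, planner)
size = len(json.dumps(context).encode("utf-8"))
print(sorted(context))
print(size, "bytes")
```

Nothing in this rule ever removes a key, which is the whole problem: every agent adds, none subtracts.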

Why Schema Design Can’t Save You

The obvious fix is to design a better schema. Add a StoryComment table. Add a Decision table. Add an AgentOutput table. But this only shifts the problem: you’re now imposing a structure that your LLM agents don’t naturally produce.

The agents output prose. They output thinking. They output decisions embedded in paragraphs. Forcing them into structured tables means either: (a) writing a parsing layer that extracts structured data from prose, or (b) prompting the agents to output JSON, which makes them less fluent thinkers.

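There’s also the identity map problem we covered previously: when agents run in separate processes, the same database object looks different to each one. The context_json value one agent wrote isn’t the same Python object another agent sees; it’s a freshly deserialized copy, and nothing the second agent does to that copy reaches the database until it is written back. On top of that, SQLAlchemy doesn’t track in-place mutations of a plain JSON column, so if the second agent mutates the dict in place, commit() won’t detect the change unless you explicitly call flag_modified() (or reassign the attribute).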
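To make the copy semantics concrete, here is a minimal sketch using Python’s standard-library sqlite3 module rather than the ACO system’s ORM. The `story` table and the `load_context` / `save_context` helpers are hypothetical; each call to `load_context` stands in for a separate agent process deserializing its own copy of context_json.

```python
import json
import sqlite3

# Toy stand-in for the stories table: only the columns this example needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE story (id INTEGER PRIMARY KEY, context_json TEXT)")
conn.execute(
    "INSERT INTO story (id, context_json) VALUES (1, ?)",
    (json.dumps({"decisions": []}),),
)


def load_context(story_id: int) -> dict:
    # Every load deserializes a brand-new dict, as a separate agent process would.
    row = conn.execute("SELECT context_json FROM story WHERE id = ?", (story_id,)).fetchone()
    return json.loads(row[0])


def save_context(story_id: int, context: dict) -> None:
    # Nothing reaches the database until the whole blob is re-serialized and written.
    conn.execute(
        "UPDATE story SET context_json = ? WHERE id = ?",
        (json.dumps(context), story_id),
    )
    conn.commit()


planner_view = load_context(1)
dev_view = load_context(1)
print(planner_view is dev_view)  # False: two independent copies

dev_view["decisions"].append({"type": "arch", "decision": "Use event-driven model"})
print(load_context(1)["decisions"])  # []: the mutation lives only in dev_view

save_context(1, dev_view)
print(len(load_context(1)["decisions"]))  # 1
```

With SQLAlchemy the same rule applies one level up: after mutating a loaded JSON dict in place, calling `flag_modified(story, "context_json")` from `sqlalchemy.orm.attributes` is what tells the session the attribute needs to be written. Note also that `planner_view` is now stale; it still shows no decisions.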

The Phase 4 Context Trimmer — A Partial Solution

The ACO system has a ContextTrimmer (Phase 4 of the implementation roadmap) that tries to solve this. Its strategy: keep the N most recent comments (self.recent_count in the code below) unsummarized, and use an LLM to summarize the older ones into a compact text block.

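from typing import Dict, List, Optional

def trim_comments(self, comments: List[Dict]) -> tuple[List[Dict], Optional[str]]:
    # Short histories pass through untouched, with no summary
    if len(comments) <= self.recent_count:
        return comments, None

    # The newest recent_count comments stay verbatim; everything older gets summarized
    recent_comments = comments[-self.recent_count:]
    old_comments = comments[:-self.recent_count]

    summary = self._summarize_comments(old_comments)
    # ...
    return trimmed, summary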

The problem: _summarize_comments is a placeholder. It counts questions and answers and returns a one-line string — not an actual LLM summary. The real implementation would need to call an LLM with all the old comments and ask it to produce a 2-3 sentence summary of decisions made and requirements clarified.
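For contrast, here is a minimal, runnable sketch of the same trimming strategy with the summarizer injected as a plain callable, so the placeholder and a real LLM-backed function are interchangeable. This is not the ACO implementation: the dataclass shape, the `summarize` field, `placeholder_summary`, the default of 5 for N, and the assumption that each comment dict carries a `comment_type` key are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Comment = Dict[str, str]


def placeholder_summary(comments: List[Comment]) -> str:
    # Like the placeholder described above: it counts, it doesn't summarize.
    questions = sum(1 for c in comments if c.get("comment_type") == "question")
    answers = sum(1 for c in comments if c.get("comment_type") == "answer")
    return f"{len(comments)} earlier comments: {questions} question(s), {answers} answer(s)."


@dataclass
class ContextTrimmer:
    recent_count: int = 5  # assumed default for N
    # Any comments -> text function fits here, including an LLM-backed one.
    summarize: Callable[[List[Comment]], str] = placeholder_summary

    def trim_comments(self, comments: List[Comment]) -> Tuple[List[Comment], Optional[str]]:
        if len(comments) <= self.recent_count:
            return comments, None
        # Everything before `split` is old; the last recent_count comments stay verbatim.
        split = len(comments) - self.recent_count
        old_comments, recent_comments = comments[:split], comments[split:]
        return recent_comments, self.summarize(old_comments)


comments = [
    {"comment_type": "question", "body": "Should this integrate with the billing system?"},
    {"comment_type": "answer", "body": "No, billing is out of scope."},
    {"comment_type": "decision", "body": "Use event-driven model"},
]
recent, summary = ContextTrimmer(recent_count=1).trim_comments(comments)
print(recent)   # [{'comment_type': 'decision', 'body': 'Use event-driven model'}]
print(summary)  # 2 earlier comments: 1 question(s), 1 answer(s).
```

Slicing with an explicit split index rather than `comments[-self.recent_count:]` is deliberate: with `recent_count=0`, `comments[-0:]` is the whole list and `comments[:-0]` is empty, so the negative-index version would keep every comment and summarize none.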

The Real Problem: Two Different Worldviews

The core tension is that database schemas represent a point-in-time worldview — a story has a status, an assignee, a priority. But LLM agents operate in a flow worldview — they produce reasoning traces, they iterate, they change their mind mid-task.

Your database says “status: DEV.” Your agent says “well, I explored two approaches and the first one had a subtle race condition I caught, so I pivoted to approach B, but then I realized approach B needs a new table…”

These two worldviews don’t map cleanly onto each other. The context_json blob is the escape valve — a place to dump all the unstructured output that doesn’t fit the schema. But as the ACO system has learned, that escape valve becomes a liability at scale.

What Would Actually Fix It

The honest answer: you need a two-phase context strategy.

In the first phase, agents write structured comments to a StoryComment table with an explicit comment_type field (question, answer, decision, note). This requires aggressive prompting (“after each significant decision, add a StoryComment with type=decision”), but it produces queryable data.
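A minimal sketch of what that table buys you, using the standard-library sqlite3 module (the ACO system already stores its data in SQLite). Only StoryComment and comment_type come from the text; the `story_comment` table layout, its other columns, and the `add_comment` helper are hypothetical. The CHECK constraint is one way to keep agents from drifting outside the four agreed types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per comment, comment_type limited to the four kinds.
conn.execute("""
    CREATE TABLE story_comment (
        id INTEGER PRIMARY KEY,
        story_id INTEGER NOT NULL,
        agent TEXT NOT NULL,
        comment_type TEXT NOT NULL
            CHECK (comment_type IN ('question', 'answer', 'decision', 'note')),
        body TEXT NOT NULL
    )
""")


def add_comment(story_id: int, agent: str, comment_type: str, body: str) -> None:
    conn.execute(
        "INSERT INTO story_comment (story_id, agent, comment_type, body) VALUES (?, ?, ?, ?)",
        (story_id, agent, comment_type, body),
    )


add_comment(1, "pm", "question", "Should this integrate with the billing system?")
add_comment(1, "planner", "decision", "Use event-driven model")

# The payoff: "what was decided?" becomes a query instead of a scan through a JSON blob.
decisions = conn.execute(
    "SELECT agent, body FROM story_comment "
    "WHERE story_id = ? AND comment_type = 'decision' ORDER BY id",
    (1,),
).fetchall()
print(decisions)  # [('planner', 'Use event-driven model')]

# The CHECK constraint rejects comment types an agent invents on its own.
try:
    add_comment(1, "dev", "musing", "Maybe approach B?")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```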

In the second phase, a background job runs the LLM summarization that the ContextTrimmer currently lacks. Older comments get compressed into summary blocks. The summarization prompt isn’t complex: “Summarize the following conversation. Focus on: what decisions were made, what requirements changed, what open questions remain. Respond in 3 sentences.”
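The prompt-building half of that job is small enough to sketch. The prompt string is the one quoted above; `build_summary_prompt`, `summarize_with_llm`, and the comment dict keys are hypothetical, and `complete` is an assumed stand-in for whatever LLM client you use (prompt in, text out). The scheduling of the background job itself is not shown.

```python
from typing import Callable, Dict, List

# The summarization prompt from the paragraph above, verbatim.
SUMMARY_PROMPT = (
    "Summarize the following conversation. Focus on: what decisions were made, "
    "what requirements changed, what open questions remain. Respond in 3 sentences."
)


def build_summary_prompt(comments: List[Dict[str, str]]) -> str:
    # One line per comment, tagged with its type so decisions stand out from questions.
    transcript = "\n".join(
        f"[{c['comment_type']}] {c['agent']}: {c['body']}" for c in comments
    )
    return f"{SUMMARY_PROMPT}\n\n{transcript}"


def summarize_with_llm(comments: List[Dict[str, str]], complete: Callable[[str], str]) -> str:
    # `complete` is a hypothetical LLM client function; we only trim its whitespace.
    return complete(build_summary_prompt(comments)).strip()


old_comments = [
    {"comment_type": "question", "agent": "pm", "body": "Should this integrate with the billing system?"},
    {"comment_type": "decision", "agent": "planner", "body": "Use event-driven model"},
]
print(build_summary_prompt(old_comments))


def fake_llm(prompt: str) -> str:
    # Offline stand-in for a model: reports how many lines it was sent.
    return f"  received {len(prompt.splitlines())} lines  "


print(summarize_with_llm(old_comments, fake_llm))  # received 4 lines
```

Because `summarize_with_llm` takes the client as an argument, it can be partially applied and passed wherever a comments-to-text summarizer is expected, and tested without network access.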

The tricky part is timing: you don’t want to summarize while a story is actively being worked on. You want to summarize when a story enters a waiting state — HUMAN_REVIEW, for instance, or when it sits in QA for more than N hours.
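That timing rule fits in one predicate. A minimal sketch: HUMAN_REVIEW, QA, and DEV are the statuses named in this post, while the function name, the `entered_status_at` timestamp, and the 24-hour default standing in for N are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Statuses that count as "waiting" outright; HUMAN_REVIEW is the example from the text.
WAITING_STATES = {"HUMAN_REVIEW"}


def should_summarize(status: str, entered_status_at: datetime, now: datetime,
                     qa_idle_hours: float = 24.0) -> bool:
    """Decide whether a background job may compress this story's older comments."""
    if status in WAITING_STATES:
        return True
    if status == "QA":
        # QA counts as waiting only once the story has sat there longer than N hours.
        return now - entered_status_at > timedelta(hours=qa_idle_hours)
    # Any other status means an agent may be mid-task, so leave the context alone.
    return False


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(should_summarize("HUMAN_REVIEW", now, now))             # True
print(should_summarize("QA", now - timedelta(hours=2), now))  # False
print(should_summarize("QA", now - timedelta(hours=30), now)) # True
print(should_summarize("DEV", now - timedelta(days=5), now))  # False
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside, keeps the policy deterministic and easy to test.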

The Debt Doesn’t Go Away

Context debt accumulates silently. A story that starts with 2KB of context grows to 40KB by iteration 3. Agents start timing out because their context windows are full of historical noise. Developers stop reading the context and start making decisions from scratch, which breaks the audit trail.

The ACO system’s ContextTrimmer is the right idea, but it’s not finished. The placeholder summarizer is a reminder that in multi-agent systems, the infrastructure for managing agent output is as important as the agents themselves — and it’s almost always underinvested.

The schema gets you to a certain scale. After that, you need actual context management. And that means an LLM, running in the background, doing the boring work of summarizing what your other LLMs said.

Aniket Karne

DevOps & AI Engineer · Amsterdam
