Notes on AI Agent Architecture — What Actually Works — aniketkarneai.com

I’ve been building with AI agents for about six months now. What started as curiosity has become the core of how I approach problems. Some notes on what I’ve learned about the architecture side.

The Agent Loop

Every agent system eventually boils down to this:

Observe -> Think -> Act -> Result -> (repeat)

The quality of each stage determines everything. Most frameworks focus on the “Think” part (the LLM), but the “Observe” and “Act” stages are where systems actually break down.

What Observation Looks Like

An agent that can’t see its environment is flying blind. Good observation means:

Tool access — The agent can query the state it needs. Filesystem, APIs, databases, terminal output. Not just “can it run commands” but “does it understand the output?”

Context window management — Long conversations kill agent performance. The best systems I’ve built spend as much time on context pruning as on prompt engineering.

State reflection — Can the agent see what it did last time? Memory isn’t just storage — it’s the ability to query past actions and their outcomes.

The Action Surface

This is where most agent projects underinvest. You can give an agent 50 tools, but if error handling is bad, one bad tool call breaks everything.

What matters in actions:

Idempotency — Can you run it twice safely?
Rollback — What happens when it fails halfway?
Atomicity — Does partial execution leave the system in a valid state?

The Tooling Reality

After trying several frameworks (LangChain, LlamaIndex, raw APIs), my current stack:

Orchestration — Custom Python layer. Frameworks add abstraction without adding value.
Memory — SQLite for persistent facts, in-memory buffer for session context.
Execution — Separate process per agent task. Isolation beats clever concurrency.
LLM — Claude for reasoning, Opus for complex tasks.

The Honest Problems

Reliability — An agent that fails 5% of the time isn’t reliable. I’ve gotten most systems to under 1% failure with aggressive retries and state verification, but it takes work.

Evaluation — How do you know if the agent did the right thing? Traditional testing doesn’t apply. I’ve been building custom eval harnesses that compare outputs against known-good baselines.

Cost — Running agents is expensive. A complex task that takes a human 5 minutes might cost $2 in API calls. The economics only work for tasks where the human’s time is worth more than $24/hour.

What’s Next

The field is moving fast. The next six months will probably see major improvements in reliability and cost. Until then, the best agent is one that knows its limits.

Aniket Karne

DevOps & AI Engineer · Amsterdam

Back to all posts