
Orchestrating AI Agents with Claudio: Lessons from Parallelizing Claude Code


The Single-Threaded Problem

If you’ve spent any meaningful time with AI coding agents like Claude Code, you’ve probably hit the same bottleneck I have: they’re single-threaded. One agent, one task, one branch. You sit and watch it work through a feature, then move on to the next thing. For small changes, this is fine. For larger projects where you want to ship multiple features or refactors in parallel, it starts to feel like a waste. You wouldn’t assign one engineer to every task sequentially — you’d split the work across a team.

That’s the idea behind Claudio. It’s a CLI and TUI tool, written in Go, that orchestrates multiple Claude Code (or Codex) instances running simultaneously on a single project. Each instance gets its own isolated workspace, its own branch, and its own task. The result is something closer to how a team of engineers works: in parallel, with coordination.

Building it taught me a lot about what it takes to parallelize AI-assisted development. Here’s what I learned.

Git Worktrees Are the Key Insight

The first thing you realize when trying to run multiple agents in parallel is that they can’t share a working directory. Two processes modifying the same files at the same time is a recipe for corruption and confusion. You need isolation.

Git worktrees solve this elegantly. A worktree is a separate working copy of the same repository, backed by the same .git directory. Each worktree can be checked out to a different branch, and changes in one don’t affect another. They’re lightweight, easy to create, and easy to clean up.

In Claudio, each agent instance gets its own worktree stored under .claudio/worktrees/<instance-id>/. Each one operates on an independent branch. The agents don’t know or care about each other’s file systems — they just see a normal git repo and work as usual.

# Under the hood, Claudio creates worktrees like:
git worktree add .claudio/worktrees/instance-1 -b claudio/instance-1-auth-api
git worktree add .claudio/worktrees/instance-2 -b claudio/instance-2-unit-tests

This is far simpler than alternatives like Docker containers or full repo clones. Worktrees share the object store, so they’re fast to create and take minimal disk space. And when you’re done, claudio cleanup removes the stale worktrees and branches in one shot.

The key takeaway: if you’re building any kind of tool that needs multiple parallel workstreams on the same repo, reach for git worktrees before anything else.

The Coordination Challenge

Isolation gets you parallelism, but parallelism without coordination gets you chaos. The hardest part of building Claudio wasn’t getting multiple agents running — it was keeping them from stepping on each other.

Shared Context

Each instance in Claudio is aware of what the others are working on. The orchestrator auto-generates a context.md file and injects it into every worktree. This file contains the current task assignments, their statuses, and which files each instance has modified. When an agent starts working, it knows the lay of the land.

This is a lightweight form of inter-agent communication. The agents don’t talk to each other directly. Instead, the orchestrator acts as a shared state coordinator, broadcasting context to everyone. It’s the same pattern you’d use with a team of humans: a shared project board that everyone can see.

Conflict Detection

Even with isolation, two agents might independently decide to modify the same file. Claudio monitors the files each instance touches and raises alerts when overlap is detected. This doesn’t block the agents — it surfaces the conflict early so you can intervene before merge time.

Task Dependencies

Some tasks have natural ordering. You can’t write integration tests for an API that doesn’t exist yet. Claudio supports task chaining via a --depends-on flag:

claudio add "Implement user authentication API"
claudio add "Write integration tests for auth" --depends-on auth-api

Dependent tasks won’t start until their prerequisites complete. This prevents wasted work and keeps agents from operating on assumptions that haven’t materialized yet.

Planning Before Execution

Throwing tasks at agents one at a time is fine for simple work. But for larger objectives — “add authentication to this app,” “migrate the data layer to async/await” — you want a structured plan that decomposes the work, identifies dependencies, and determines what can run in parallel. Claudio offers several modes for this, and they turned out to be some of the most interesting parts of the project.

UltraPlan

UltraPlan is Claudio’s “project manager” mode. You give it a high-level objective, and it runs a four-phase process:

  1. Analysis — an agent scans the codebase, understanding the architecture, existing patterns, and relevant files
  2. Decomposition — the objective is broken into discrete, independently shippable tasks
  3. Dependency mapping — tasks are ordered by their dependencies, identifying which can run in parallel and which must wait
  4. Execution — all unblocked tasks launch simultaneously, with dependent tasks starting as their prerequisites complete

claudio ultraplan "Add user authentication with JWT tokens and role-based access control"

The key insight is that the decomposition step produces a structured JSON plan that you can review before any agents start coding. You see the task graph, the dependency edges, and the parallelism budget. If the plan looks wrong — maybe it wants to write tests before the code they test — you can adjust before burning compute on execution. Planning is cheap. Execution is expensive. Inspect the plan.

For even more rigor, Multi-Pass Planning generates three competing decomposition strategies for the same objective, evaluates their tradeoffs, and selects the strongest one. This is useful when you’re not sure what the right architecture should be — let three strategies compete and pick the best.

TripleShot

TripleShot takes the idea of competition from planning into execution. For any given task, it spawns three agents working in parallel on the same problem, each in its own worktree. When all three finish, a judge agent evaluates the implementations and selects the best one.

# From the TUI command mode
:tripleshot "Implement the caching layer with LRU eviction"

This sounds wasteful — you’re paying 3x the compute. But for tasks that are tricky, ambiguous, or have multiple valid approaches, it’s remarkably effective. Different agents make different design choices. One might use a dictionary with a doubly-linked list. Another might reach for NSCache. A third might build something custom. The judge evaluates correctness, performance characteristics, and code quality, then picks the winner.

The mental model is a tournament. Three engineers independently tackle the same problem, and you ship the best solution. In practice, this is most valuable for core infrastructure code where getting the implementation right matters more than getting it fast.

Adversarial Review

Adversarial Review pairs every implementing agent with a dedicated critical reviewer. The reviewer’s job is adversarial by design — it actively looks for bugs, edge cases, design flaws, and violations of project conventions. The implementation goes through iterative feedback loops: implement → review → revise → review → ship.

The adversarial agent is configured with a minimum passing score (default: 8/10) and a maximum iteration count (default: 10). The loop continues until the implementation either passes the reviewer’s bar or hits the iteration limit. In practice, most implementations pass within 2-4 iterations.

This is the closest analog to a real code review process. The reviewer isn’t rubber-stamping — it’s actively trying to find problems. And because the reviewer is a separate agent with its own context, it approaches the code with fresh eyes on each pass. Combined with TripleShot, you can have three implementations each going through adversarial review before the judge picks the best one. It’s expensive, but for high-stakes changes, the quality improvement is meaningful.

Choosing the Right Mode

Each mode reflects a different strategy for managing uncertainty:

  • UltraPlan when you have a large objective and need task decomposition
  • TripleShot when the task is tricky and you want to explore multiple approaches
  • Adversarial when correctness is critical and you want rigorous review
  • Multi-Pass when you’re not sure what the right plan even looks like

You can also combine them. UltraPlan for decomposition, TripleShot for the critical tasks in the plan, adversarial review for everything. The compute cost scales accordingly, but so does the quality.

Building the TUI with Bubbletea

Running multiple agents means you need visibility into what they’re all doing. A CLI that dumps interleaved stdout from five processes isn’t useful. You need a dashboard.

I built the TUI layer using Bubbletea, the Elm-architecture TUI framework for Go. It gives you a clean model-update-view loop, and the Charm ecosystem provides excellent primitives for styling and layout.

The dashboard shows each instance’s status, output stream, and current task. You can tab between instances, pause and resume them, view diffs, and check for conflicts — all without leaving the terminal. The keyboard-driven interface keeps it fast:

Tab / l / Right  → Next instance
a                → Add new instance
p                → Pause/Resume
d                → Toggle diff view
c                → Toggle conflict view

Bubbletea’s message-passing architecture maps naturally to this problem. Each agent instance emits events (output lines, status changes, file modifications), and the TUI subscribes to them. The separation between orchestration logic and rendering keeps the codebase manageable.

How This Changes the Workflow

The practical impact of running parallel agents is significant. Tasks that used to be sequential — implement a feature, write its tests, update the docs — can now happen simultaneously. A session might look like:

claudio start my-feature
claudio add "Implement the new caching layer"
claudio add "Write unit tests for the cache" --depends-on caching-layer
claudio add "Update the configuration docs"
claudio add "Refactor the existing storage adapter"

Three of those tasks run immediately in parallel, and the fourth starts once its dependency completes. You monitor everything from the TUI, intervene when conflicts arise, and end up with a set of branches ready for review.

It’s not a replacement for careful engineering. You still need to review what the agents produce, resolve conflicts thoughtfully, and make architectural decisions. But it compresses the mechanical parts of the workflow considerably.

What’s Next: Orchestrator of Orchestrators

Everything described above orchestrates individual Claude Code instances. Each instance is a single agent, working alone in its worktree, communicating only through shared context files. But Claude Code now supports Agent Teams — a lead agent coordinating multiple teammates via mailbox messaging, shared task lists, and dynamic work assignment. An Agent Team is already a self-coordinating unit. The question becomes: what happens when you orchestrate teams instead of instances?

That’s the north star for Claudio’s next chapter, tracked as the Orchestration 2.0 initiative. The vision:

Today:    Claudio → [Instance A] [Instance B] [Instance C]  (isolated, static)
Tomorrow: Claudio → [Team Alpha ↔ Team Beta ↔ Team Gamma]  (collaborative, adaptive)

Instead of each worktree containing a single Claude process, each worktree contains an entire Agent Team — a lead plus N teammates, self-coordinating internally via mailbox. Claudio becomes a meta-orchestrator: it manages team lifecycle, routes messages between teams, enforces budgets, and handles phase transitions. The teams handle the actual work internally.

This unlocks things that isolated instances can’t do. A single instance hits context window limits on large tasks. A team can distribute context across multiple windows while maintaining internal coordination. A single instance can’t adapt when it discovers unexpected complexity. A team lead can dynamically spawn more teammates, split tasks, and redistribute work. And when a teammate fails, the lead can replace it and continue — no need to restart the entire task.

The Foundation Is Mostly Built

The prerequisite infrastructure for Orchestrator of Orchestrators has already shipped across the sister epics:

Inter-Instance Mailbox (#629) replaced the one-way sentinel file protocol with a real-time messaging system. Instances can send discoveries, claim file ownership, ask questions, and share progress. Messages are delivered via .claudio/mailbox/ directories with both broadcast and targeted messaging. This is the communication primitive everything else builds on.

Dynamic Task Queue (#630) replaced UltraPlan’s static execution order with a self-claiming task queue. Instances pull work when they’re ready, respecting dependency graphs. If a task fails, it goes back in the queue with failure context for the next claimant. No more idle instances waiting for a bottleneck task to complete.

Adaptive Lead (#634) introduced a Claude instance as the orchestration coordinator — a hybrid architecture where Go handles infrastructure (worktrees, tmux, budgets, TUI) and Claude handles strategy (plan adjustment, task redistribution, decision-making). When a task reveals unexpected complexity (“Cannot refactor auth.go — 47 callers”), the lead can dynamically split it into sub-tasks and add them to the queue.

Peer Debate Mode (#633) added a new workflow where multiple instances investigate competing hypotheses, actively challenge each other’s reasoning, and converge through structured debate. Unlike TripleShot (three isolated attempts, judge picks winner), Peer Debate involves cross-pollination — each instance reads others’ findings and writes challenges and rebuttals. The debate converges when instances agree, confidence is high, or no new evidence emerges. This is particularly valuable for bug investigation and architecture decisions, where the first plausible answer is often not the right one.

The remaining pieces — Context Propagation (#632), Elastic Scaling (#636), File Conflict Prevention (#635), and Plan Approval Gates (#631) — have all shipped as well. The foundation is in place. What remains is the capstone: wiring it all together so that each “instance” in Claudio’s orchestration is itself a self-coordinating Agent Team.

What This Looks Like

Imagine an UltraPlan session where the planning phase is handled by a Planning Team — three planners debating approach with a lead synthesizing the final plan. Execution groups become Execution Teams — a backend team, a frontend team, an infrastructure team — each with a lead dynamically assigning sub-tasks to teammates. Cross-team discoveries get routed through Claudio (“Team Alpha found the API schema changed → notify Team Beta”). A Review Team handles cross-cutting quality review. A Merge Team coordinates integration across branches.

TripleShot evolves too: instead of three isolated instances competing, three teams compete — each team internally collaborating for a more thorough solution before the judge evaluates.

It’s orchestration all the way down. And it’s the next thing I’m building.

Closing Thoughts

Building Claudio reinforced a few lessons that apply well beyond this specific tool. Git worktrees are an underused primitive for parallel workflows. Coordination is harder than parallelism. Planning is cheap and execution is expensive, so invest in planning. And if you’re building developer tools, a good TUI goes a long way toward making complex workflows approachable.

If you want to try it out or dig into the code, Claudio is open source at github.com/Iron-Ham/claudio.