Symposium: Agents That Start Themselves


This post is a companion to my talk, The Unsupervised Agent Pipeline.

The Problem with Supervised Agents

I built Claudio to run multiple AI coding agents in parallel, and it works well. But someone still has to read the issue, write the prompt, kick off the agent, and open the PR. For complex, ambiguous work, that human judgment is the point. For bug fixes with clear reproduction steps, dependency updates with automated tests, mechanical refactors with well-defined scopes? The agent is capable enough. The bottleneck is the human who has to tell it to start.

Symposium removes that bottleneck. It’s a Rust service that takes an issue from your tracker, works it end-to-end (isolated workspace, coding agent, code review agent, draft PR), and then monitors that PR for reviewer feedback. If a reviewer requests changes, it spins up another agent to address them. The human reviews at their convenience, not at the agent’s pace.

It’s an implementation of the OpenAI Symphony spec, a pattern for autonomous issue-to-PR pipelines, but with Notion as the issue tracker (instead of Linear) and MCP as the communication layer (instead of REST APIs).

Architecture

The service is a single Tokio event loop running on a timer. Each tick:

  1. Poll Notion for issues in active states (e.g., “Todo”, “In Progress”)
  2. Sort by priority
  3. Filter out issues that are already running, in retry backoff, or over the concurrency limit
  4. For each eligible issue: create a workspace, run hooks, spawn an agent

Workers communicate back to the orchestrator through an mpsc channel. When a worker completes, the event loop processes the result: schedule a retry if it failed, or move on to PR creation if it succeeded.
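The eligibility filter in step 3 can be sketched as a pure function. The types and names here are my assumptions, not Symposium's actual internals:

```rust
use std::collections::HashSet;

struct Issue {
    id: String,
    priority: u8, // lower = more urgent, an assumption for this sketch
}

/// Sort by priority, drop issues that are already running or waiting out a
/// retry backoff, then truncate to the remaining concurrency budget.
fn eligible(
    mut issues: Vec<Issue>,
    running: &HashSet<String>,
    in_backoff: &HashSet<String>,
    max_concurrent: usize,
) -> Vec<Issue> {
    issues.sort_by_key(|i| i.priority);
    let capacity = max_concurrent.saturating_sub(running.len());
    issues
        .into_iter()
        .filter(|i| !running.contains(&i.id) && !in_backoff.contains(&i.id))
        .take(capacity)
        .collect()
}
```

Because the function is pure, the orchestrator can run it on every tick against fresh tracker data without any bookkeeping beyond the two sets.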

CLI (clap)
  └─ Orchestrator (tokio event loop)
       ├─ Config Layer (WORKFLOW.md watcher + hot-reload)
       ├─ Notion Tracker (MCP client → notion MCP server)
       ├─ Workspace Manager (fs + hook subprocess)
       ├─ Agent Runner (claude CLI + streaming JSON parser)
       └─ HTTP Server (axum: dashboard + REST API)

Talking to Notion Through MCP

Symposium never calls the Notion API directly. It spawns @notionhq/notion-mcp-server as a child process and speaks JSON-RPC 2.0 over stdio:

use std::process::{Command, Stdio};

// mcp_command comes from WORKFLOW.md,
// e.g. "npx -y @notionhq/notion-mcp-server"
let child = Command::new("sh")
    .arg("-c")
    .arg(&mcp_command)
    .stdin(Stdio::piped())   // JSON-RPC requests are written here
    .stdout(Stdio::piped())  // responses are read back from here
    .stderr(Stdio::null())
    .spawn()?;

Each tick, the client handshakes, queries the database, and gets dropped, which kills the subprocess:

SELECT * FROM "collection://{database_id}"
  WHERE "Status" IN ('Todo', 'In Progress')

The server lives only for one polling cycle. Process isolation means a misbehaving MCP server can’t corrupt the orchestrator’s state, and the ~50-100ms spawn overhead is negligible against a 30-second polling interval.
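Framing a request for the stdio transport is just newline-delimited JSON-RPC 2.0. A dependency-free sketch (a real client would use serde; the method and params here are illustrative, not the Notion server's actual tool names):

```rust
/// Build one newline-delimited JSON-RPC 2.0 request line.
fn jsonrpc_request(id: u64, method: &str, params: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"{}\",\"params\":{}}}\n",
        id, method, params
    )
}
```

The handshake begins by writing an `initialize` request like `jsonrpc_request(1, "initialize", "{…}")` to the child's stdin and reading the response line from its stdout.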

Why MCP instead of REST? The Notion MCP server already exists as a maintained package, and MCP gives agents access to the same tools the orchestrator uses. When an agent needs to read Notion comments or update issue status, it calls the same MCP server through its own config. One protocol, one server, two consumers.

The Worker Pipeline

When an issue is dispatched, the worker runs through these stages:

1. Workspace Creation

Each issue gets an isolated directory. The after_create hook controls how it’s set up; typically this is a git worktree add from a local repo:

hooks:
  after_create: |
    git -C ~/Developer/my-org/my-repo worktree add \
      {{ workspace }} -b symposium/bug-{{ issue.safe_identifier }}

Hooks are Liquid templates with access to the issue’s properties. safe_identifier sanitizes the issue ID for branch names (replacing colons, slashes, etc. with hyphens).
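A plausible sketch of that sanitization (the exact character set Symposium replaces is my assumption):

```rust
/// Replace characters that are unsafe in git branch names with hyphens,
/// collapsing runs so "BUG:12//3" becomes "BUG-12-3" rather than "BUG-12--3".
fn safe_identifier(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut last_was_dash = false;
    for c in raw.chars() {
        if c.is_ascii_alphanumeric() || c == '.' || c == '_' {
            out.push(c);
            last_was_dash = false;
        } else if !last_was_dash {
            out.push('-');
            last_was_dash = true;
        }
    }
    out.trim_matches('-').to_string()
}
```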

2. Optional Preflight Verification

Before spending agent time on an issue, a separate agent session can verify the issue is still valid. This is useful for bug workflows where issues go stale, get fixed by other changes, or turn out to be expected behavior.

The preflight agent runs with full access to the workspace and tools. If it determines the issue should be skipped, it drops a PREFLIGHT_SKIP file into the workspace. In practice, the agent runs:

echo "Bug no longer reproduces after commit abc123 fixed the underlying race condition" > PREFLIGHT_SKIP

Symposium reads this file and short-circuits: no implementation, no review, no PR. The design is fail-open; if the preflight agent crashes or times out, the main agent proceeds anyway. A broken preflight should never block real work. This principle (a failing component degrades gracefully, never halts the system) runs through every layer of Symposium: config hot-reload, retries, even the MCP client. Early versions didn’t do this, and a single flaky Notion API response would stall the entire pipeline.
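The skip check itself is tiny. A sketch of the fail-open read (the function name is mine):

```rust
use std::fs;
use std::path::Path;

/// Returns Some(reason) when the preflight agent left a PREFLIGHT_SKIP file.
/// Any failure to read it is treated as "proceed": a broken preflight must
/// never block real work.
fn preflight_skip_reason(workspace: &Path) -> Option<String> {
    match fs::read_to_string(workspace.join("PREFLIGHT_SKIP")) {
        Ok(reason) => Some(reason.trim().to_string()),
        Err(_) => None, // missing or unreadable file: fail open, run the main agent
    }
}
```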

3. Main Agent Session

The agent runs as claude -p --output-format stream-json --verbose, with the full prompt piped to stdin. The prompt is rendered from the Liquid template in WORKFLOW.md, which has access to the issue’s title, description, priority, comments, and any extra Notion properties.

The --output-format stream-json flag is key. Instead of waiting for the agent to finish, Symposium reads newline-delimited JSON from stdout in real time. Each message is typed: system for initialization, assistant for content blocks (text and tool calls), and result for completion. This streaming protocol drives a built-in web dashboard (served by the axum HTTP server in the architecture diagram) where you can watch every tool call and text response as they happen across all active sessions.

One subtle detail: the CLAUDECODE environment variable is stripped from the subprocess. Without this, a nested Claude Code instance detects it’s running inside another Claude Code and changes behavior. Removing the variable makes the agent think it’s running standalone.
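You can see the effect of stripping with `Command::env_remove` and a portable probe child (a `sh` one-liner standing in for the actual claude CLI):

```rust
use std::process::Command;

/// Spawn a child the way Symposium spawns the agent, stripping CLAUDECODE
/// so a nested Claude Code instance behaves as if it were standalone.
fn child_sees_claudecode() -> String {
    let out = Command::new("sh")
        .arg("-c")
        .arg("echo ${CLAUDECODE:-unset}")
        .env("CLAUDECODE", "1")   // simulate the variable being inherited
        .env_remove("CLAUDECODE") // the strip: removed before exec
        .output()
        .expect("failed to spawn sh");
    String::from_utf8_lossy(&out.stdout).trim().to_string()
}
```

With the `env_remove` line the child prints `unset`; without it, `1`.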

4. Code Review Agent

After the implementation agent finishes, a second agent session reviews the changes. This agent runs in the same workspace with a review-focused prompt. The review tool is configurable; you can point it at a linter, a custom review skill, or just a prompt that says “review the diff.” If it finds issues, it fixes them and commits. This catches the obvious mistakes that a single-pass agent misses: unused imports, missing error handling, test files that import the wrong module.

5. PR Creation

The implementation agent writes PR_TITLE and PR_BODY.md files to the workspace root (not committed to git, to avoid accidental git add). Symposium reads these, moves them to a temp directory, and runs git push && gh pr create --draft.

The PR body includes the investigation reasoning, what was changed and why, and a link back to the Notion issue. If the metadata files are missing, Symposium falls back to a generic title derived from the issue.

This pattern (the agent writes files, the orchestrator reads them) is the only IPC between the two processes. PREFLIGHT_SKIP, PR_TITLE, PR_BODY.md, the workspace directory itself: no shared memory, no sockets, no database. It’s simple to debug, and if the agent crashes mid-run, the workspace is still on disk. You can inspect it, re-run manually, or let the retry mechanism pick it up.
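The read-with-fallback step can be sketched like this (the struct and the fallback title format are hypothetical, not Symposium's actual ones):

```rust
use std::fs;
use std::path::Path;

struct PrMetadata {
    title: String,
    body: String,
}

/// Read the PR_TITLE / PR_BODY.md files the agent left in the workspace
/// root, falling back to a title derived from the issue when missing.
fn read_pr_metadata(workspace: &Path, issue_title: &str) -> PrMetadata {
    let read = |name: &str| fs::read_to_string(workspace.join(name)).ok();
    PrMetadata {
        title: read("PR_TITLE")
            .map(|s| s.trim().to_string())
            .unwrap_or_else(|| format!("Fix: {issue_title}")),
        body: read("PR_BODY.md").unwrap_or_default(),
    }
}
```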

6. PR Review Monitoring

This is the feature that saves the most time. The initial implementation is useful, but you’d review that PR anyway. The real cost in most teams is the back-and-forth: a reviewer requests changes, the author context-switches back to the branch days later, pushes a fix, waits for another review. That cycle can stretch a one-hour fix into a week of calendar time.

If pr_review.enabled is set, each tick checks open PRs for reviewer feedback. The logic groups reviews by author, keeps only the latest per author, and checks whether any CHANGES_REQUESTED or COMMENTED reviews are newer than the last time an agent addressed feedback. A reviewer who requests changes and later approves doesn’t trigger another agent.
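That grouping logic is compact enough to sketch in full (types and timestamp representation are my assumptions):

```rust
use std::collections::HashMap;

enum ReviewState {
    Approved,
    ChangesRequested,
    Commented,
}

struct Review {
    author: String,
    state: ReviewState,
    submitted_at: u64, // epoch seconds, for this sketch
}

/// Keep only each author's latest review, then check whether any
/// CHANGES_REQUESTED or COMMENTED review is newer than the last time an
/// agent addressed feedback. An author who requested changes and later
/// approved contributes only the approval, so no new agent is triggered.
fn needs_revision(reviews: &[Review], last_addressed: u64) -> bool {
    let mut latest: HashMap<&str, &Review> = HashMap::new();
    for r in reviews {
        let entry = latest.entry(r.author.as_str()).or_insert(r);
        if r.submitted_at > entry.submitted_at {
            *entry = r;
        }
    }
    latest.values().any(|r| {
        r.submitted_at > last_addressed
            && matches!(r.state, ReviewState::ChangesRequested | ReviewState::Commented)
    })
}
```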

When actionable feedback is found, an agent spins up in the existing workspace (no new branch or worktree) to address the comments. This loop continues until the PR is approved, merged, or closed. The revision cycle that normally takes days of asynchronous waiting just happens.

Config as a Single File

The entire workflow configuration lives in one WORKFLOW.md file: YAML front matter for config, Liquid template for the prompt. Everything between the first and second --- delimiters is YAML; everything after is the template.

---
tracker:
  kind: notion
  mcp_command: "npx -y @notionhq/notion-mcp-server"
  database_id: "your-database-uuid"
  active_states: ["Todo", "In Progress"]
  terminal_states: ["Done", "Cancelled"]
  # ...

agent:
  max_concurrent_agents: 3
---

You are working on bug {{ issue.identifier }}: {{ issue.title }}.

{{ issue.description }}

{% if attempt %}
This is retry attempt {{ attempt }}.
Review what happened in the previous attempt and continue.
{% endif %}
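Extracting the two halves takes only a few lines. A sketch, not Symposium's actual parser:

```rust
/// Split a WORKFLOW.md source into (yaml front matter, liquid template):
/// everything between the first and second `---` lines is YAML, everything
/// after the second is the prompt template.
fn split_workflow(src: &str) -> Option<(String, String)> {
    let rest = src.strip_prefix("---")?;
    let (yaml, template) = rest.split_once("\n---")?;
    Some((
        yaml.trim_start_matches('\n').to_string(),
        template.trim_start_matches('\n').to_string(),
    ))
}
```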

The file is hot-reloaded while the service is running. A filesystem watcher (debounced, watching the parent directory to catch atomic renames from editors) detects changes and propagates the new config through tokio::sync::watch channels. On parse error, the previous config is retained. Already-running workers keep their config snapshot from dispatch time, but hooks read the latest config, so updated scripts take effect immediately.

Multiple workflow files can run simultaneously (symposium WORKFLOW.bugs.md WORKFLOW.sentry.md). Each gets its own tracker, polling interval, and workspace root. State keys are namespaced by workflow ID to prevent collisions.

Retry Strategy

Failed issues get exponential backoff: 1s, 2s, 4s, 8s, … up to a 5-minute cap. The mechanism is a tokio::spawn that sleeps for the delay, then fires a RetryFired event on the orchestrator’s mpsc channel. The timer signals readiness; the next tick performs the dispatch. Stale retries (issue already re-dispatched by another path) are safely skipped.

The issue is re-fetched from Notion at retry time, not cached. If someone changed the issue’s status to “Won’t Fix” while the agent was retrying, the fresh fetch catches that and skips dispatch.
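The schedule itself is a one-liner; a sketch of the delay calculation:

```rust
use std::time::Duration;

/// Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 5 minutes.
/// `attempt` is zero-based; the shift saturates to avoid overflow.
fn retry_delay(attempt: u32) -> Duration {
    const CAP: u64 = 300; // 5 minutes, in seconds
    let secs = 1u64.checked_shl(attempt).unwrap_or(CAP).min(CAP);
    Duration::from_secs(secs)
}
```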

What I Learned

Using MCP as the universal adapter turned out better than I expected. Notion, Sentry, the coding agent: they all speak the same protocol, so there’s one serialization format, one error handling pattern, and one mental model. The overhead of spawning a subprocess per tick is negligible compared to the simplicity it buys.

The fail-open principle I described in the preflight section extends everywhere. Config hot-reload, retry dispatch, MCP client lifecycle: every component follows the same rule. It took a few production stalls to get there, but once the pattern was in place, the system became dramatically more resilient. The same goes for filesystem-as-IPC; I didn’t initially plan for it to be the only communication mechanism, but every alternative I considered (shared memory, SQLite, Unix sockets) added complexity without solving a real problem.

Agents do still fail. They fixate on the wrong part of an issue, produce changes that pass tests but miss the point, or generate PRs so large they’re harder to review than the original bug. The orchestrator has its own failure modes too: issue IDs with slashes broke URL routing, the capacity guard accidentally blocked PR review checks from running, and a restart wiped all tracked PR state because it lived only in memory. Every one of these surfaced from running Symposium against a real codebase. That’s why everything opens as a draft.

What’s Next

The current version handles independent issues well, but real projects have dependencies. A feature might need a data model change before the API layer, and the API before the UI. You can’t throw all the tasks at agents simultaneously and hope the merge order works out.

The next major change adds epic execution: Symposium reads a dependency graph (from Mermaid diagrams or Notion “Blocked by” relations), determines which tasks are ready based on what’s already merged, and stacks PRs on the correct base branches. A task with one dependency gets a stacked PR on that dependency’s branch. A fan-in with multiple dependencies bases off main.

Concretely: an epic with a data model migration, an API endpoint, and a UI component would produce three PRs in the right order, each waiting for its parent to merge before the agent starts the next one. That’s the leap from “fix individual bugs” to “execute multi-task projects.”
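Since this feature doesn't exist yet, the following is only a sketch of the readiness rule described above, with a plain map standing in for the Mermaid or Notion dependency source:

```rust
use std::collections::{HashMap, HashSet};

/// A task is ready when it hasn't merged yet and every task it is blocked
/// by has already merged.
fn ready_tasks(
    blocked_by: &HashMap<String, Vec<String>>,
    merged: &HashSet<String>,
) -> Vec<String> {
    let mut ready = Vec::new();
    for (task, deps) in blocked_by {
        if !merged.contains(task) && deps.iter().all(|d| merged.contains(d)) {
            ready.push(task.clone());
        }
    }
    ready.sort(); // deterministic order for this sketch
    ready
}
```

Running it over the epic above: with nothing merged, only the data model task is ready; once it merges, the API task becomes ready, and so on.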

Symposium is open source at github.com/Iron-Ham/symposium.