Preparing Your iOS Codebase for AI Agents

This post is a companion to my talk, Preparing Your iOS Codebase for AI Agents.

The Setup Problem

AI coding agents are good at writing code. They’re less good at writing the right code for your project. Drop Claude Code into a large iOS codebase with no guidance and you’ll get syntactically correct Swift that ignores your architecture, uses the wrong testing framework, calls xcodebuild with missing flags, and puts files in the wrong directories. The agent isn’t dumb; it just doesn’t have the context that you carry in your head.

Over the past few months I’ve been working on a medium-sized iOS codebase (SwiftUI + TCA, ~20 modules, Tuist for project generation) and figuring out how to make it agent-friendly. The goal: an AI agent should be able to pick up a task, understand the conventions, build the project, write tests, and verify its own work, all without a human babysitting every step.

Here’s what worked.

Hierarchical Documentation

The most impactful thing you can do is write layered AGENTS.md files. Not one giant document. A hierarchy.

The Root: Operating Contract

At the repo root, the AGENTS.md defines the operating contract. This is the stuff that applies everywhere: how to name things, how to handle version control, what the testing philosophy is, how to debug. Think of it as the team handbook.

The important sections:

Architecture overview: What services exist, what languages they use, how they communicate. An agent needs this map to understand where its work fits.
Critical workflow rules: Things like “always run yarn build:lib before building services.” These are the kinds of gotchas that waste 20 minutes when an agent discovers them through trial and error.
Behavioral guidelines: Rules about how the agent should behave. Ask for clarification rather than assuming. Don’t rewrite implementations without permission. Fix bugs immediately. These feel obvious, but agents without explicit behavioral guidelines will make surprising choices.
Troubleshooting table: A simple symptom-to-fix lookup. “Module not found? Run yarn build:lib.” This saves agents from lengthy debugging cycles.

## Troubleshooting

| Symptom | Fix |
|---------|-----|
| Module not found | `yarn build:lib` |
| GraphQL errors | `yarn graphql` |
| Port conflicts | `make dev/stop` |

The Subsystem: Platform-Specific Guide

Below the root, each major subsystem gets its own AGENTS.md. The iOS client guide covers what an agent needs to work in that part of the codebase: build commands, architecture, testing patterns, and platform-specific rules.

The key design principle: the subsystem guide refines, not replaces, the root. It starts with “Read the root AGENTS.md first” and only adds iOS-specific content. When the two conflict, the subsystem guide wins for that subsystem.

An early mistake I made was putting too much in the subsystem guide. The first version of our iOS AGENTS.md was nearly 800 lines. It included a full project structure tree, a list of every module with descriptions, generic design principles (Single Responsibility, Separation of Concerns) that were already in the root, and detailed reference sections for accessibility, localization, and SwiftUI previews. It was comprehensive, but it was also a wall of text that buried the actionable content.

The fix was aggressive trimming. The current guide is about 415 lines, roughly half the original. The approach:

Remove anything derivable from the code. A list of every module with a one-sentence description is redundant when make modules prints the same information from the source of truth (the Tuist configuration). A “Key Technologies” section listing SwiftUI, Apollo, and GRDB adds nothing; an agent can see the imports. A “Key Features” section that reads like marketing copy is noise.

Remove anything already in the root. Generic principles like “each function should do one thing well” and “prefer composition of small protocols over large ones” belong in the root AGENTS.md. Repeating them in the subsystem guide is clutter, and worse, it creates a maintenance burden where changes need to happen in two places.

Extract reference material, keep the rule inline. The accessibility section was 120+ lines of VoiceOver patterns, keyboard access requirements, Dynamic Type examples, and testing checklists. Valuable, but only when you’re actually working on accessibility. For the subsystem guide, the rule is what matters:

## Accessibility Requirements

**Accessibility is not optional.** All new UI code must be accessible
from the start (WCAG 2.1 AA, MAS-C). Treat a11y bugs as P0 issues.
See [`docs/accessibility.md`](docs/accessibility.md) for VoiceOver,
keyboard, Dynamic Type, color contrast, touch target, and testing
requirements.

A handful of lines instead of 120+. The full reference lives in docs/accessibility.md where an agent can find it when it needs it. Same treatment for localization and SwiftUI previews.

Replace trees with tables. A 30-line ASCII project structure tree looks nice in a README but is wasted context for an agent. A compact table communicates the same information in a third of the space:

| Directory | Contents |
|-----------|----------|
| `Tuist/` | SPM dependencies, project description helpers (see Tuist/AGENTS.md) |
| `Modules/` | Feature modules; run `make modules` for the current list |
| `Tests/` | Test suites mirroring module structure |

What stays in the subsystem guide is the stuff an agent needs on every task: build commands, architecture decisions that are iOS-specific (Apollo types as DTOs, TCA action dispatch rules, dependency injection patterns), banned patterns with code examples, and a table of directory-specific guides for deeper reading.

The lesson is counterintuitive: shorter agent documentation is more effective. Agents have finite context windows, so every redundant line pushes a line of actual code or conversation out of that window. Keep the guide lean and scannable, with detailed reference material one link away.

The Module: Directory-Specific Knowledge

The deepest layer is per-module AGENTS.md files. These live right next to the code they describe and contain the kind of knowledge you’d normally only get from someone who’s been working in that directory for months.

For example, the GraphQL module’s guide explains:

The file layout (.graphql source files vs. generated/ output)
That generated files must never be edited manually
How Apollo test mocks work, including the @_spi(Unsafe) escape hatch and why new code should avoid it
A gotcha about type shadowing when GraphQLTestMocks is imported alongside AppFeature

The AppFeature module’s guide takes a different approach. Its main message is: stop adding code here. The module is too large, new features should be extracted into their own modules, and the guide lists candidates for future extraction. This kind of architectural guidance prevents agents from making an existing problem worse.

The subsystem guide ties these together with a directory-specific guides table so agents know where to look:

| Area | Guide | Scope |
|------|-------|-------|
| AppFeature | `Modules/AppFeature/AGENTS.md` | Database layer, module structure |
| GraphQL | `Modules/GraphQL/AGENTS.md` | Apollo codegen, query conventions |
| Analytics | `Modules/Analytics/AGENTS.md` | Event tracking, user ID wiring |
| Networking | `Modules/Networking/AGENTS.md` | HTTP layer, connectivity |
| Tuist | `Tuist/AGENTS.md` | Adding dependencies and modules |

Each AGENTS.md answers one question: “What would a new team member need to know before making their first change here?” If you can answer that in a markdown file, an agent can use it too.
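To make the layering concrete, here is a minimal sketch of how the guides that apply to a given file could be resolved, ordered from the root down so that deeper guides refine shallower ones. How an agent tool actually discovers nested AGENTS.md files varies; the guideChain function and the ios/ directory name are hypothetical and only illustrate the precedence described above.

```swift
// Returns the AGENTS.md paths that apply to a file, ordered from the repo
// root down to the file's own directory. Later entries refine earlier ones.
// `guides` is the set of guide paths known to exist in the repo.
func guideChain(forFile path: String, guides: Set<String>) -> [String] {
    // Drop the file name and keep only the directory components.
    let directories = path.split(separator: "/").dropLast().map(String.init)
    var chain: [String] = []
    var prefix = ""
    // Check the root first, then each successively deeper directory.
    for directory in [""] + directories {
        if !directory.isEmpty { prefix += directory + "/" }
        let candidate = prefix + "AGENTS.md"
        if guides.contains(candidate) { chain.append(candidate) }
    }
    return chain
}

// Hypothetical repo layout with the iOS client under ios/.
let guides: Set<String> = [
    "AGENTS.md",
    "ios/AGENTS.md",
    "ios/Modules/GraphQL/AGENTS.md",
    "ios/Modules/AppFeature/AGENTS.md",
]
let chain = guideChain(forFile: "ios/Modules/GraphQL/Queries/User.graphql", guides: guides)
print(chain)
// ["AGENTS.md", "ios/AGENTS.md", "ios/Modules/GraphQL/AGENTS.md"]
```

The AppFeature guide is not in the chain because the file does not live under that module, which is the point of keeping module knowledge next to the code it describes.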

CLI-Friendly Tooling

Agents operate through a terminal. They can’t click Xcode’s Build button or browse the Issue navigator. If the only way to build your project involves the GUI, an agent can’t do its job.

The fix is a Makefile that wraps every common operation as a single command: make build, make test, make format, make modules. Each command encodes the correct flags, simulators, and sequencing so the agent doesn’t have to guess. The AGENTS.md then includes a blocklist telling agents to use the Makefile instead of calling xcodebuild or tuist directly.

If you’re running multiple agents in parallel using git worktrees, the Makefile also handles simulator isolation and project-local DerivedData so that make build works identically in the main repo and in any worktree.

I wrote a dedicated post on this: Give Your Agents a Makefile.

Skills: Executable Methodology

Documentation tells agents what to do. Skills tell them how to do it, step by step.

Claude Code has a concept of “skills”: markdown files that describe a multi-step workflow the agent can invoke. They’re like runbooks, but for AI. Our codebase defines about a dozen skills, each encoding a specific methodology.

Workflow Skills

Some skills encode multi-step workflows that would otherwise require careful human sequencing:

verify-ios-change builds the app, launches it on a simulator, observes the result, and produces a pass/fail report with video evidence. This one is too project-specific to open source, but I wrote a full walkthrough that covers the approach if you want to build your own.
deep-review runs a comprehensive code review using specialized agents for architecture, error handling, types, accessibility, concurrency, and platform-specific concerns. It distinguishes new issues from pre-existing ones, so the reviewer focuses on what actually changed.
split takes a large branch and splits it into N stacked logical branches with hunk-level granularity, turning a monolithic PR into a reviewable sequence.

Each of these would take a human 15-30 minutes of careful manual work. As a skill, the agent handles the orchestration and the human reviews the output.

Visual Verification

The verification skill deserves a closer look. It uses XcodeBuildMCP to take screenshots, snapshot the view hierarchy, and record video. It also piggybacks an accessibility audit on every run, catching missing accessibilityLabel on interactive elements as a side effect of normal verification. The full walkthrough covers the workflow step by step.

Library Reference Skills

Some skills aren’t workflows; they’re reference material. We have skills for The Composable Architecture, Swift Dependencies, IdentifiedCollections, and Swift Sharing that include the full API interface and usage patterns. When an agent needs to use @Dependency correctly or construct an IdentifiedArray safely, the skill provides the authoritative reference rather than letting the agent rely on its training data (which may be outdated or wrong for the specific version we use).

The Swift Concurrency skill is particularly valuable. It includes project-specific settings (which modules have strict concurrency, which use complete vs targeted checking), the team’s concurrency philosophy (avoid blanket @MainActor, prefer structured concurrency, document escape hatches), and references for specific patterns like GRDB concurrency and Sendable conformance.

Design System Compliance

One class of agent mistakes is harder to catch than wrong architecture or missing flags: UI that compiles and looks plausible but doesn’t match your design system. Hardcoded colors instead of semantic tokens. Arbitrary spacing instead of your scale. Corner radii that are close but not quite right. These slip through code review because they render fine at a glance.

Our approach is to make the design system the only practical way to style UI. The architecture does the enforcement; the agent doesn’t need to “remember” to use the right values.

Token-Driven Architecture

The design system is built on a protocol layer (AppTheme) that defines every visual property: colors, spacing, radii, typography, shadows, animation curves, stroke widths. A concrete implementation maps each token to the actual asset catalog values. Views access the theme through a single injection point:

@Environment(\.appTheme) private var theme

From there, convenience modifiers make the tokens easy to use:

Text("Hello")
    .themeForeground(.primary)
    .themePadding(.md)
    .themeCornerRadius(.lg)

There’s no Color.blue or .padding(16) in the design system’s vocabulary. The modifiers accept semantic levels (.primary, .md, .lg), not raw values. An agent reaching for inline styles would have to fight the API to do it wrong.
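One way the protocol, the injection point, and the modifiers could fit together is sketched below. Only AppTheme, appTheme, and the three modifier names come from the snippets above; ColorRole, SizeLevel, Scale, DefaultTheme, the asset names, and the numeric values are assumptions for illustration, not the project’s actual implementation.

```swift
import SwiftUI

// Hypothetical semantic levels; the post only names .primary, .md, and .lg.
enum ColorRole { case primary, secondary }
enum SizeLevel { case sm, md, lg }

// A three-step scale, readable as theme.spacing.sm or looked up by level.
struct Scale: Sendable {
    var sm, md, lg: CGFloat
    subscript(level: SizeLevel) -> CGFloat {
        switch level {
        case .sm: return sm
        case .md: return md
        case .lg: return lg
        }
    }
}

// Protocol layer: views depend on this, never on raw values.
protocol AppTheme: Sendable {
    var spacing: Scale { get }
    var radii: Scale { get }
    func color(_ role: ColorRole) -> Color
}

// Concrete layer. Asset names and point values are placeholders.
struct DefaultTheme: AppTheme {
    let spacing = Scale(sm: 8, md: 16, lg: 24)
    let radii = Scale(sm: 4, md: 8, lg: 16)
    func color(_ role: ColorRole) -> Color {
        switch role {
        case .primary: return Color("TextPrimary")
        case .secondary: return Color("TextSecondary")
        }
    }
}

// Single injection point, read in views as @Environment(\.appTheme).
private struct AppThemeKey: EnvironmentKey {
    static let defaultValue: any AppTheme = DefaultTheme()
}

extension EnvironmentValues {
    var appTheme: any AppTheme {
        get { self[AppThemeKey.self] }
        set { self[AppThemeKey.self] = newValue }
    }
}

// Each modifier resolves its semantic level through the injected theme.
private struct ThemeForeground: ViewModifier {
    @Environment(\.appTheme) var theme
    let role: ColorRole
    func body(content: Content) -> some View {
        content.foregroundStyle(theme.color(role))
    }
}

private struct ThemePadding: ViewModifier {
    @Environment(\.appTheme) var theme
    let level: SizeLevel
    func body(content: Content) -> some View {
        content.padding(theme.spacing[level])
    }
}

private struct ThemeCornerRadius: ViewModifier {
    @Environment(\.appTheme) var theme
    let level: SizeLevel
    func body(content: Content) -> some View {
        content.clipShape(RoundedRectangle(cornerRadius: theme.radii[level]))
    }
}

extension View {
    func themeForeground(_ role: ColorRole) -> some View {
        modifier(ThemeForeground(role: role))
    }
    func themePadding(_ level: SizeLevel) -> some View {
        modifier(ThemePadding(level: level))
    }
    func themeCornerRadius(_ level: SizeLevel) -> some View {
        modifier(ThemeCornerRadius(level: level))
    }
}
```

Because the modifiers take an enum rather than a CGFloat or a Color, a call like .themePadding(16) does not compile in this sketch. That is the sense in which the API, rather than the agent’s memory, does the enforcing.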

Making Tokens Discoverable

A TOKENS.md file in the Tokens module serves as the agent’s design reference. It documents every token category with purpose tables, correct/incorrect code examples, and a migration reference that maps legacy values to their token equivalents. When the agent needs to choose between theme.spacing.sm and theme.spacing.md, the reference has the answer.

The module hierarchy enforces the layering: Tokens (protocols) -> Primitives (concrete values + atomic components) -> Components (molecules like chips, avatars, toasts) -> feature screens. Each layer can only import downward. An agent writing a feature screen can use Components and Tokens but can’t bypass them to access asset catalog colors directly.
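The downward-only rule can be stated as a small check over a module graph. The sketch below is a hypothetical illustration of the rule, not how the project enforces it; the Layer enum, the layeringViolations function, and the ProfileFeature module name are assumptions. It models the rule permissively as “a module may import only modules in strictly lower layers.”

```swift
// Layers ordered from lowest to highest; raw values give the ordering.
enum Layer: Int {
    case tokens, primitives, components, feature
}

// Returns every import that points sideways or upward, which the rule forbids.
// `imports` maps a module name to the modules it imports.
func layeringViolations(
    imports: [String: [String]],
    layers: [String: Layer]
) -> [String] {
    var violations: [String] = []
    // Sort by module name so the output order is deterministic.
    for (module, dependencies) in imports.sorted(by: { $0.key < $1.key }) {
        guard let from = layers[module] else { continue }
        for dependency in dependencies {
            guard let to = layers[dependency] else { continue }
            if to.rawValue >= from.rawValue {
                violations.append("\(module) imports \(dependency)")
            }
        }
    }
    return violations
}

let layers: [String: Layer] = [
    "Tokens": .tokens,
    "Primitives": .primitives,
    "Components": .components,
    "ProfileFeature": .feature, // hypothetical feature module
]
let imports = [
    "ProfileFeature": ["Components", "Tokens"], // allowed: both are lower layers
    "Tokens": ["Components"],                   // forbidden: points upward
]
print(layeringViolations(imports: imports, layers: layers))
// ["Tokens imports Components"]
```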

Lint Rules as the Safety Net

The API makes the design system easy to use. Lint rules make it hard to skip. Custom SwiftLint rules flag direct use of Color(...), inline .padding(...) with literal values, and other patterns that bypass the token system. These are configured as warnings in Xcode, but make build passes RUN_SWIFTLINT=1 with warnings-as-errors enabled. An agent building through the Makefile (which the blocklist ensures) can’t merge code that violates the design system; the build simply fails.

This is the piece that ties everything together. The token API makes the right approach easy. The lint rules make the wrong approach a build failure. And because linting only runs through the Makefile, not through raw xcodebuild, the blocklist from the CLI tooling section is doing double duty: it both gives agents the right flags and ensures design system enforcement is active.
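SwiftLint custom rules are regular expressions declared in .swiftlint.yml. To show the kind of pattern involved, here is a rough sketch that applies two such regexes to a line of source. The rule names and the exact patterns are assumptions modeled on the two cases mentioned above; the project’s real rules are not shown in this post.

```swift
import Foundation

// Hypothetical rules modeled on the two patterns the post mentions.
let designSystemRules: [(name: String, pattern: String)] = [
    // Direct construction of a color, e.g. Color(red: 1, green: 0, blue: 0).
    ("direct_color", #"\bColor\("#),
    // .padding(...) whose arguments contain a numeric literal.
    ("literal_padding", #"\.padding\([^)]*[0-9]"#),
]

// Returns the names of the rules that a single line of source violates.
func violations(in line: String) -> [String] {
    designSystemRules
        .filter { line.range(of: $0.pattern, options: .regularExpression) != nil }
        .map { $0.name }
}

print(violations(in: #"Text("Hi").padding(16)"#))
// ["literal_padding"]
print(violations(in: #"Text("Hi").themePadding(.md)"#))
// []
```

Regex-based rules like these are approximate by nature. They catch the obvious bypasses cheaply, which is why the visual checks described next still matter.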

Visual Verification Catches the Rest

Even with good architecture and lint enforcement, an agent might combine tokens in ways that look wrong. The preview-build skill catches this: after modifying a view, the agent renders its #Preview block and evaluates the screenshot against the design system. Snapshot tests across multiple configurations (light mode, dark mode, large text, accessibility sizes) provide a permanent regression check.

This is a topic that deserves its own post. For now, the key insight is: if you want agents to produce design-system-compliant UI, layer your enforcement. Make the design system easy to use at the API level, enforce it with lint rules at build time, and verify it visually with previews and snapshots.

The Feedback Loop

All of this documentation would become stale without a maintenance strategy. The root AGENTS.md includes a section on maintaining itself:

AGENTS.md files and skills are living documents. Update them when you discover undocumented conventions, missing workflows, gotchas that cost you time, or outdated instructions.

The agents themselves are encouraged to update the docs. Small factual corrections can be made directly. Structural changes need human approval. The result is a self-improving system: as agents hit new gotchas, the documentation grows to prevent the next agent from hitting the same problem.

This is probably the most underrated part of the setup. Documentation that doesn’t evolve with the codebase is documentation that lies. By making it part of the agent’s operating contract to maintain the docs, you get documentation that stays current because the entities reading it are also the entities updating it.

But there’s a tension here. Agents adding to the docs is good. Agents adding too much to the docs is how you end up with an 800-line subsystem guide. The maintenance section needs guardrails too:

Every change must leave the document shorter or more useful, ideally both.

This single rule prevents documentation bloat. An agent that discovers a new gotcha should add it, but an agent that wants to add a 50-line example section should think twice about whether a link to a docs/ file would serve better.
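The “more useful” half of that rule needs judgment, but the “shorter” half can be checked mechanically. Below is a minimal sketch of such a check, assuming a line budget per guide; reviewGuideChange and DocChangeVerdict are hypothetical names, and the 400-line default simply echoes the rule of thumb for subsystem guides given later in this post.

```swift
// Outcome of checking a proposed edit to a guide.
enum DocChangeVerdict: Equatable {
    case ok
    case overBudget(lines: Int)
}

// Counts lines, treating any newline character as a separator and keeping
// blank lines in the count.
func lineCount(_ text: String) -> Int {
    text.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline).count
}

// Flags an edit only when it both grows the guide and leaves it over budget.
// Edits that shrink an oversized guide are always accepted.
func reviewGuideChange(old: String, new: String, budget: Int = 400) -> DocChangeVerdict {
    let oldCount = lineCount(old)
    let newCount = lineCount(new)
    if newCount <= oldCount || newCount <= budget { return .ok }
    return .overBudget(lines: newCount)
}
```

A flagged edit is a prompt to move the new material into a docs/ file and link to it, not an automatic rejection.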

Practical Advice

If you’re looking to make your own iOS codebase agent-friendly, here’s the order I’d suggest:

Start with a Makefile. Wrap your build, test, and codegen commands. This alone eliminates the most common class of agent errors.
Write a root AGENTS.md. Cover architecture, conventions, and your operating contract. Focus on things an agent would get wrong without guidance: naming conventions, file organization, testing philosophy, banned patterns.
Add subsystem guides, but keep them lean. Each platform or service gets its own guide. Rules and decisions stay inline; detailed reference material goes to docs/ with links. If your guide is over 400 lines, it’s probably too long.
Encode your testing methodology as a skill. This is where the highest ROI is for code quality. Agents write a lot of tests, and the difference between good and bad tests is entirely in the methodology.
Add module-level docs as you discover gaps. Don’t try to document everything upfront. When an agent makes a mistake because it didn’t know a module-specific convention, add an AGENTS.md to that directory.
Make the docs self-maintaining, with guardrails. Tell agents to update docs when they find something undocumented. But also tell them that every change should leave the document shorter or more useful.

The investment compounds. Every hour spent on documentation saves many hours of correcting agent mistakes. And unlike human onboarding documents that get read once and forgotten, agent documentation gets read at the start of every single session.