Preparing Your
iOS Codebase for AI Agents
Documentation, tooling, and skills for agent-friendly development
Hesham Salman
SAY: "AI coding agents are remarkably capable. They can read your code, write new features, run tests, and fix their own mistakes. But there's a catch — they're only as good as the context you give them. Drop an agent into a bare codebase with no guidance and it will confidently write code that doesn't match your project at all. This talk is about fixing that."
PAUSE briefly, then advance to the next slide.
TARGET TIME: ~30 seconds on this slide.
The Problem
What happens when an agent has no context
Uses XCTest instead of Swift Testing
Calls xcodebuild with wrong flags
Ignores your TCA architecture entirely
Puts files in the wrong directories
The agent isn't dumb.
It just doesn't have your context.
SAY: "Let me paint you a picture. You point an agent at your iOS project and ask it to write a feature."
[CLICK → fragment 1]
SAY: "First thing it does: writes tests in XCTest. Your team moved to Swift Testing six months ago. All the existing tests use the new framework. But the agent defaulted to what it knows from training data."
[CLICK → fragment 2]
SAY: "Then it tries to build. It shells out to xcodebuild with generic flags. Misses the correct scheme, the right simulator, the derived data path your CI depends on. The build fails with a cryptic error."
[CLICK → fragment 3]
SAY: "It writes the feature code as a plain MVVM view model. Your entire app is built on TCA — The Composable Architecture. Reducers, effects, the whole thing. The agent has no idea."
[CLICK → fragment 4]
SAY: "And it drops the new files in the root of the project instead of the correct module directory."
[CLICK → fragment 5]
SAY: "Here's the thing — the agent isn't dumb. It's actually quite capable. It just doesn't have the context that every developer on your team carries in their head. Your conventions, your architecture decisions, your build setup. None of that is in the code itself."
TARGET TIME: ~2 minutes total by end of this slide.
The Three Layers
Each layer builds on the previous
1. Documentation
AGENTS.md hierarchy. The operating contract.
2. Tooling
Makefile + worktrees. The executable interface.
3. Skills
Executable methodology. The how-to guides.
SAY: "The solution comes in three layers."
[CLICK → fragment 1]
SAY: "First, documentation. Specifically, a hierarchy of AGENTS.md files that give the agent its operating contract — naming conventions, architecture decisions, testing philosophy."
[CLICK → fragment 2]
SAY: "Second, tooling. A Makefile that wraps your entire build system into simple commands the agent can actually run. No GUI required."
[CLICK → fragment 3]
SAY: "Third, skills. These are step-by-step workflows that encode your team's methodology. Not just what to do, but how to do it."
SAY: "Each layer builds on the previous one. Documentation tells the agent what matters. Tooling gives it the ability to act. Skills tie it all together into repeatable workflows. Let's go through each one."
TARGET TIME: ~2.5 minutes total.
Hierarchical AGENTS.md
Three levels of context, from broad to specific
Root
The team handbook. Naming conventions, VCS workflow, testing philosophy, troubleshooting guides.
Subsystem
The iOS playbook. Build commands, architecture decisions, banned patterns, platform-specific conventions.
Module
Tribal knowledge. Gotchas only someone working in this directory would know. The things you'd tell a new teammate.
Key principle: The subsystem refines, not replaces, the root.
SAY: "AGENTS.md files work in a hierarchy — just like how your codebase is organized."
[CLICK → fragment 1]
SAY: "At the top is the root AGENTS.md. This is your team handbook. It covers everything that applies everywhere: naming conventions, version control workflow, how to run tests, general troubleshooting steps. Every agent session starts by reading this file."
[CLICK → fragment 2]
SAY: "Below that is the subsystem level. For iOS, this is where you document your build commands, your architecture — TCA, MVVM, whatever you use — banned patterns like force unwraps or singletons, and platform-specific conventions. This file lives in your iOS directory."
[CLICK → fragment 3]
SAY: "At the bottom is the module level. These are small, focused files that capture the kind of knowledge you'd share with a new teammate sitting next to you. 'Hey, this module has a weird dependency ordering issue.' 'The mock setup here requires this specific pattern.' Tribal knowledge, written down."
[CLICK → fragment 4]
SAY: "The key principle: each level refines, not replaces, the parent. The subsystem doesn't restate everything in the root. It adds iOS-specific detail. The module doesn't restate the subsystem. It adds directory-specific gotchas. This keeps files short and avoids contradictions."
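As a sketch, a subsystem-level AGENTS.md that follows the "refines, not replaces" principle might look like this. The file contents are illustrative, not the actual files from the talk:

```markdown
# iOS — AGENTS.md (subsystem)
<!-- Refines the root AGENTS.md; does not restate it. -->

## Build & test
- Build: `make build`. Test: `make test`. (VCS workflow: see root AGENTS.md.)

## Architecture
- All features use TCA (Reducer + Store + Effects). No ad-hoc MVVM view models.

## Banned patterns
- Force unwraps (`!`), singletons, direct `xcodebuild` or `tuist` calls.
```

Note what is absent: nothing here repeats the root-level conventions, and each line is a rule the agent cannot derive from the code itself.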
TARGET TIME: ~4.5 minutes total.
Keep It Lean
More docs ≠ better docs
Shorter docs = more effective.
Every redundant line pushes actual code out of the context window.
SAY: "Here's the counterintuitive lesson. Our first AGENTS.md draft was 800 lines. We were thorough — documented everything. And the agent's performance was mediocre."
SAY: "We cut it to 415 lines. Agent performance improved immediately. Why? Because context windows are finite. Every redundant line of documentation pushes out actual source code the agent needs to see."
[CLICK → fragment 1]
SAY: "The takeaway: shorter documentation is more effective documentation."
TARGET TIME: ~5.5 minutes total.
How to Trim
Four rules for keeping docs lean
1. Remove anything derivable from code
2. Remove anything already in the root
3. Extract reference material, keep the rule inline
4. Replace directory trees with tables
Treat docs like code.
An accessibility section went from 120 lines to 2 lines plus a link. Same information, 98% less context consumed.
[CLICK → fragment 1]
SAY: "Rule one: remove anything the agent can derive from reading the code. Don't document your class hierarchy — the agent can see it."
[CLICK → fragment 2]
SAY: "Rule two: remove anything that's already covered in the root. The subsystem inherits the root. Don't repeat it."
[CLICK → fragment 3]
SAY: "Rule three: extract reference material into separate files and keep just the rule inline. Our accessibility section was 120 lines of examples. We replaced it with two lines: 'All interactive elements must have accessibility labels. See docs/accessibility.md for examples.'"
[CLICK → fragment 4]
SAY: "Rule four: replace ASCII directory trees with compact tables. Same information, half the lines."
[CLICK → fragment 5]
SAY: "Treat your AGENTS.md like you'd treat code — refactor it, keep it lean."
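Rule four in practice might look like this; the module names and purposes are hypothetical:

```markdown
<!-- Instead of a 10+ line ASCII directory tree: -->
| Path              | Purpose                |
|-------------------|------------------------|
| Features/Login    | Auth UI and reducer    |
| Features/Checkout | Purchase flow          |
| Core/Networking   | API client and mocks   |
```

The same map in tree form costs roughly twice the lines once branch characters and nesting are included.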
TARGET TIME: ~6.5 minutes total.
Give Your Agents a Makefile
Agents can't press Cmd+B
make build # Build the Prod variant
make test # Run all tests
make test FILTER=… # Filter to specific tests
make format # Run SwiftFormat
make modules # List all SPM modules
make setup # First-time project setup
Each command encodes the correct flags, simulators, and sequencing.
Miss one flag and you get a cryptic error 3 minutes into a build.
The Makefile IS the institutional knowledge.
Which scheme, which simulator, which derived data path, which Swift flags. All encoded once, used everywhere.
SAY: "Here's a fundamental problem: agents can't use Xcode. They can't click buttons, they can't use the GUI, they can't press Cmd+B. They need a CLI interface. And for most iOS projects, that CLI interface doesn't exist."
SAY: "The solution is a Makefile. Six commands that cover everything an agent needs to do."
SAY: "make build — builds the production variant with the correct scheme, simulator, and derived data path. make test — runs all tests with the right flags. make test with a FILTER argument — targets specific test suites. make format — runs SwiftFormat so the agent's code matches your style. make modules — lists every SPM module in the project, so the agent knows what exists. make setup — handles first-time setup like resolving packages."
[CLICK → fragment 1]
SAY: "This is more important than it sounds. Each command encodes dozens of decisions: the right Xcode scheme, the right simulator device, the correct derived data path, the right Swift compiler flags. Get one wrong and you get an inscrutable error three minutes into a build. The agent wastes time debugging infrastructure instead of writing features."
[CLICK → fragment 2]
SAY: "The Makefile is institutional knowledge made executable. Instead of documenting all the flags in a README that nobody reads, you encode them once and they work every time."
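A minimal sketch of what such a Makefile might encode. The scheme, destination, and paths are placeholder values (the talk doesn't show the real ones), but the xcodebuild flags are the standard ones:

```makefile
# Placeholder values; substitute your real scheme and simulator.
SCHEME       ?= App-Prod
DESTINATION  ?= platform=iOS Simulator,name=iPhone 16
DERIVED_DATA ?= $(PWD)/.build/DerivedData

# Recipe lines must be indented with tabs.
build:
	xcodebuild build -scheme "$(SCHEME)" \
	  -destination "$(DESTINATION)" \
	  -derivedDataPath "$(DERIVED_DATA)"

test:
	xcodebuild test -scheme "$(SCHEME)" \
	  -destination "$(DESTINATION)" \
	  -derivedDataPath "$(DERIVED_DATA)" \
	  $(if $(FILTER),-only-testing:$(FILTER),)
```

The `$(if $(FILTER),…)` conditional is what makes `make test FILTER=MyModuleTests` work with the same target as plain `make test`.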
TARGET TIME: ~7.5 minutes total.
Block Direct Access
Wrapping isn't enough. You need a blocklist.
## IMPORTANT: Always Use Makefile Commands
**Do NOT call these tools directly:**
- `xcodebuild` → use `make build` or `make test`
- `tuist build` → use `make build`
- `tuist test` → use `make test`
- `tuist install` → use `make install`
Without blocklist: Agent calls xcodebuild directly → wrong flags, wrong simulator → cryptic error, 5 minutes wasted debugging.
With blocklist: Agent uses make build → correct flags every time → just works.
The blocklist makes the Makefile the path of least resistance.
Don't rely on the agent choosing the right tool. Remove the wrong tools from its vocabulary.
SAY: "We learned this the hard way. Wrapping your build system in a Makefile isn't enough — you also need to tell the agent NOT to use the underlying tools directly."
SAY: "Without a blocklist, agents will reach for xcodebuild. It's in their training data. They know the API. And they'll use it with generic flags that are wrong for your project."
[CLICK → fragment 1]
SAY: "The difference is dramatic. Without a blocklist, the agent calls xcodebuild with wrong flags, gets a cryptic error, and spends five minutes trying to debug it. With the blocklist, it uses make build and everything works on the first try."
[CLICK → fragment 2]
SAY: "The blocklist makes the Makefile the path of least resistance. You're not fighting the agent's defaults — you're redirecting them."
TARGET TIME: ~8.5 minutes total.
Worktree-Aware Tooling
Parallel agents need isolated environments
# Dynamic simulator per worktree
SIMULATOR = $(shell ./Tools/simulator-clone.sh get \
2>/dev/null || echo "$(BASE_SIMULATOR)")
# Project-local DerivedData
DERIVED_DATA = $(PWD)/.build/DerivedData
1. Each worktree gets its own cloned simulator
2. DerivedData stays project-local, no cross-contamination
3. make build works identically in every worktree
Invisible to the agent.
It runs make build and it just works. The isolation is handled by the Makefile, not by the agent.
SAY: "When you run multiple agents in parallel — and you will — they fight over shared resources. Two agents trying to use the same simulator at the same time will cause failures. Two agents writing to the same DerivedData directory will corrupt each other's builds."
SAY: "The solution is worktree-aware tooling. Each git worktree gets its own isolated environment."
[CLICK → fragment 1]
SAY: "First, each worktree gets its own cloned simulator. The script detects whether you're in a worktree and either creates or reuses a simulator clone. No conflicts."
[CLICK → fragment 2]
SAY: "Second, DerivedData is project-local. It lives inside the worktree directory, not in the shared ~/Library path. No cross-contamination between parallel builds."
[CLICK → fragment 3]
SAY: "Third, make build works identically in every worktree. Same command, same behavior, different isolated environment."
[CLICK → fragment 4]
SAY: "The best part: the agent doesn't know any of this is happening. It runs make build. The Makefile handles the rest. The isolation is infrastructure, not something the agent needs to think about."
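The naming side of that isolation can be sketched in a few lines of shell. Everything here is an assumption about how a script like Tools/simulator-clone.sh might work; the real one also has to create and look up the clone via `xcrun simctl`:

```shell
#!/bin/sh
# Sketch: derive a per-worktree simulator name so parallel agents never
# share a device. BASE_SIMULATOR and the naming scheme are assumptions.
BASE_SIMULATOR="iPhone 16"

worktree_sim_name() {
  # Suffix the base simulator with the worktree directory's basename,
  # giving each worktree a stable, unique device name.
  printf '%s (%s)\n' "$BASE_SIMULATOR" "$(basename "$1")"
}

worktree_sim_name "/repos/app-login-worktree"   # → iPhone 16 (app-login-worktree)
# The real script would then clone on first use:
#   xcrun simctl clone "$BASE_SIMULATOR" "$(worktree_sim_name "$PWD")"
```

Because the name is a pure function of the worktree path, every `make build` in the same worktree resolves to the same device, with no state to track.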
TARGET TIME: ~9.5 minutes total.
Skills: Executable Methodology
Documentation tells agents WHAT. Skills tell them HOW.
1. Research
Read the implementation and understand the domain
2. Analyze gaps
Find coverage gaps and untested paths
3. Find examples
Locate similar tests in the codebase as templates
4. Write tests
Follow templates, use DI, match project conventions
5. Verify
Run make test and fix any failures
6. Review
Check quality, naming, and coverage completeness
Without the skill: agents use XCTest instead of Swift Testing, skip DI, and reinvent mock setup that already exists.
SAY: "Documentation tells agents what your conventions are. Skills tell them how to apply those conventions step by step. Here's our test-writing skill."
[CLICK → fragment 1: steps 1-2 appear]
SAY: "Steps 1 and 2: the agent researches the implementation and analyzes coverage gaps. It reads the code it's about to test and figures out what's missing — untested branches, error paths, edge cases."
[CLICK → fragment 2: steps 3-4 appear]
SAY: "Steps 3 and 4: before writing anything, the agent finds similar tests in the codebase. These become templates. Then it writes the tests following those templates — using your DI patterns, your mock setup, your naming conventions."
[CLICK → fragment 3: steps 5-6 appear]
SAY: "Steps 5 and 6: the agent runs the tests with make test, fixes any failures, and does a quality review. Does the naming match conventions? Is coverage complete?"
[CLICK → fragment 4]
SAY: "Without this skill, agents default to what they know from training data. They use XCTest instead of Swift Testing. They skip dependency injection. They create their own mock objects instead of using the ones your team already built. The skill makes the right approach the easy approach."
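As a sketch, the six steps might be encoded in a skill file something like this; the name and exact wording are illustrative:

```markdown
# Skill: write-tests
Use Swift Testing (`@Test`, `#expect`), never XCTest.

1. Research: read the implementation under test; note its dependencies.
2. Analyze gaps: list untested branches, error paths, and edge cases.
3. Find examples: locate similar tests in this repo; use them as templates.
4. Write tests: follow the templates; inject dependencies; reuse existing mocks.
5. Verify: run `make test FILTER=<suite>`; fix failures before continuing.
6. Review: check naming, coverage, and convention compliance.
```

The skill references the Makefile rather than raw tools, so the tooling layer and the skill layer reinforce each other.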
TARGET TIME: ~11 minutes total.
Design System Compliance
Make the design system the only practical way to style UI
@Environment(\.appTheme) private var theme
Text("Hello")
.themeForeground(.primary)
.themePadding(.md)
.themeCornerRadius(.lg)
1. Token protocols define every visual property: colors, spacing, radii, typography, shadows
2. Convenience modifiers accept semantic levels, not raw values
3. TOKENS.md documents every token with correct/incorrect examples
4. Lint rules flag violations; make build runs them as errors, not warnings
5. Preview + snapshot tests verify the output visually
Three layers of enforcement.
The API makes the right approach easy. Lint rules make the wrong approach a build failure. Previews catch everything else.
SAY: "Here's a subtler problem. Agents can produce UI that compiles and looks plausible but doesn't match your design system. Hardcoded colors, arbitrary spacing, corner radii that are close but not right. These slip through code review because they render fine at first glance."
SAY: "Our approach: make the design system the only practical way to style UI. The architecture does the enforcement, not documentation."
[CLICK → fragment 1: code block appears]
SAY: "Every view accesses the design system through a single environment injection point. Then convenience modifiers like themeForeground, themePadding, and themeCornerRadius accept semantic levels — primary, md, lg — not raw values. There is no Color.blue or .padding(16) in the vocabulary."
[CLICK → fragment 2]
SAY: "The foundation is a protocol layer that defines every visual property as a token: colors, spacing, radii, typography, shadows, stroke widths. A concrete implementation maps each token to your asset catalog values."
[CLICK → fragment 3]
SAY: "The convenience modifiers are the key. They accept semantic levels, not numbers. An agent reaching for inline styles would have to actively fight the API to do it wrong."
[CLICK → fragment 4]
SAY: "A TOKENS.md file in the module serves as the agent's design reference. Every token category documented with correct and incorrect code examples."
[CLICK → fragment 5]
SAY: "Here's the safety net. Custom SwiftLint rules flag direct use of Color, inline padding with literal values, anything that bypasses the token system. In Xcode these are warnings. But make build passes RUN_SWIFTLINT=1 with warnings as errors. So if an agent tries to use Color.blue instead of themeForeground, the build fails. And remember — the blocklist forces agents through the Makefile. They can't sidestep the lint rules by calling xcodebuild directly."
[CLICK → fragment 6]
SAY: "And preview builds plus snapshot tests across multiple configurations — light, dark, large text, accessibility sizes — catch anything that looks wrong even when it passes lint."
[CLICK → fragment 7: takeaway appears]
SAY: "Three layers of enforcement. The API makes the right approach easy. Lint rules make the wrong approach a build failure. And previews catch the rest. Each layer catches what the previous one misses."
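A sketch of what one such SwiftLint rule could look like; the rule name, regex, and message are illustrative, not the talk's actual configuration:

```yaml
# .swiftlint.yml (excerpt)
custom_rules:
  no_raw_colors:
    name: "Use theme tokens"
    regex: '\bColor\.(blue|red|green|black|white|gray)\b'
    message: "Use .themeForeground(_:) and theme tokens, not raw Color values."
    severity: warning   # promoted to an error by running SwiftLint with --strict in make build
```

One regex rule per banned pattern (raw colors, literal padding, hardcoded corner radii) keeps each violation message specific enough for an agent to self-correct.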
TARGET TIME: ~13 minutes total.
Visual Verification
Compiling is not the same as working
1. Screenshot
Capture the screen to see what's rendered
2. View hierarchy
Snapshot the element tree for structure analysis
3. Capture logs
Check for runtime errors and warnings not visible in the UI
<!-- Simplified view hierarchy snapshot -->
<AXElement type="button" label="Submit" enabled="true">
<AXElement type="text" value="Submit Order" />
</AXElement>
<AXElement type="image" label="" enabled="true" /> ⚠️ Missing label!
Side effect: missing accessibility labels fail the verification.
The agent reviews evidence, not just compilation.
Observe, don't tap. Built on XcodeBuildMCP.
SAY: "Here's a gap most teams don't think about: just because code compiles and tests pass doesn't mean the UI is correct. An agent can add a button that's behind a navigation bar, or a label that's truncated. It compiles fine. Tests pass. But it's broken."
[CLICK → fragment 1]
SAY: "Step one: screenshot the simulator screen. The agent can literally see what was rendered."
[CLICK → fragment 2]
SAY: "Step two: snapshot the view hierarchy. This gives the agent the element tree — every view, its type, its position, its accessibility properties."
[CLICK → fragment 3]
SAY: "Step three: capture logs. Runtime warnings, constraint violations, anything the system is complaining about."
[CLICK → fragment 4: code block + accessibility callout appear together]
SAY: "Here's what a simplified view hierarchy snapshot looks like. The agent can see there's a button with a proper accessibility label, and an image with a missing label — that's a bug it can catch and fix. And here's a great side effect: missing accessibility labels automatically fail the verification. Accessibility isn't a separate audit anymore — it's built into the agent's workflow."
[CLICK → fragment 5: takeaway appears]
SAY: "The key idea: the agent reviews evidence, not just compilation results. It observes but doesn't tap — it's looking at the output, not interacting with it. This is built on XcodeBuildMCP, which provides the bridge between the agent and the simulator."
TARGET TIME: ~14.5 minutes total.
Self-Maintaining Docs
The feedback loop that keeps documentation honest
"AGENTS.md files are living documents. Update them when you discover undocumented conventions, encounter patterns not covered by existing guidance, or find that current instructions lead to mistakes."
The guardrail: Every change must leave the document shorter or more useful.
Docs stay current because the entities reading them are also updating them.
Unlike human documentation that gets written once and slowly drifts, agent documentation evolves with every session.
SAY: "This is the most underrated part of the whole system. Documentation that doesn't evolve is documentation that lies. And agent documentation gets read at the start of every single session — so stale docs cause failures immediately, not six months later."
[CLICK → fragment 1]
SAY: "We include this instruction directly in the AGENTS.md itself. The agents are told: if you discover something undocumented, if existing instructions led you astray, update the docs. The documentation becomes a living document maintained by its own readers."
[CLICK → fragment 2]
SAY: "But we need a guardrail. Without one, docs grow forever. So we add this constraint: every change must leave the document shorter or more useful. You can add a new rule, but you should also look for something to remove or consolidate. This keeps the docs from bloating over time."
[CLICK → fragment 3]
SAY: "The result is a feedback loop. Agents read the docs, discover gaps, and fill them in. The next agent session benefits from the update. Over time, the documentation converges on exactly what agents need to know — no more, no less."
TARGET TIME: ~15 minutes total.
Takeaways
Context > capability.
Agents are only as good as the context you give them. An agent with good docs, a Makefile,
and skills outperforms a "smarter" agent that's flying blind.
Make the right approach the easy approach.
Blocklists, skills, templates. Don't fight the agent's defaults; redirect them.
When the correct workflow is also the path of least resistance, the agent follows it naturally.
Documentation is infrastructure.
Unlike human docs that get read once, agent docs get read at the start of every session.
It's the highest-leverage investment you can make in your agent workflow.
SAY: "Let me leave you with three takeaways."
[CLICK → fragment 1: first takeaway appears]
SAY: "First: context beats capability. A well-equipped agent with good documentation, a solid Makefile, and clear skills will outperform a more capable agent that's flying blind. Invest in context."
[CLICK → fragment 2: second takeaway appears]
SAY: "Second: make the right approach the easy approach. Blocklists remove wrong choices. Skills encode right choices. Templates make right choices fast. You're not fighting the agent — you're shaping its environment so the right thing happens naturally."
[CLICK → fragment 3: third takeaway appears]
SAY: "Third: documentation is infrastructure. This is the mental shift. Human documentation gets written once and maybe read twice. Agent documentation gets read at the start of every single session. Every improvement to your AGENTS.md pays dividends on every future task. It's the highest-leverage investment you can make."
PAUSE.
SAY: "Thank you."
TARGET TIME: ~16 minutes total.
Read more on Sunday Swift
Thank you