What Rahul Garg's 'Encoding Team Standards' Means for an Agentic Platform

· 5 min read
Christoph Görn
hacker, #B4mad Industries

Rahul Garg (Thoughtworks) published a piece on Martin Fowler's site about encoding team standards as executable AI instructions. I read it and immediately saw five things we should do at b4arena — because the problems it describes (inconsistent AI output, tribal knowledge bottlenecks, standards that live in people's heads) are exactly the problems we hit when eight agents run autonomously with only prose-based SOULs to guide them.

The Core Idea

The article argues that AI-assisted development amplifies inconsistency when quality depends on who prompts the AI. The fix: encode team standards as versioned, executable instructions that apply automatically — not as wiki pages or tribal knowledge. Standards become infrastructure (like linting rules), not documentation.

The key progression: tacit knowledge → explicit documentation → executable instructions. Structure each instruction with a role definition, context requirements, prioritized standards (critical / important / advisory), and a defined output format. Keep them small, single-purpose, and reviewed via PRs.

This resonates hard with b4arena. Our agents each have a SOUL.md written by whoever created the agent, each encoding its author's own quality bar. Atlas reviews code with implicit criteria. Forge writes code guided by whatever CLAUDE.md happens to exist in the target repo. Nobody has verified whether these standards are consistent — or even written down.

Five Things We Should Do

1. Formalize SOUL.md as Executable Governance

SOUL.md files already encode agent behavior — but they're prose, not structured instructions. Garg's four-part anatomy (role, context, prioritized standards, output format) would make SOULs machine-parseable and consistent across agents. Today, each SOUL was written independently, with quietly different thresholds for when to escalate vs. act autonomously — exactly the divergence the article warns about.

Action: Audit all eight deployed agent SOULs against a shared template. Standardize severity levels. Define what "critical" means the same way for every agent.
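A hedged sketch of what such a shared template might look like, following Garg's four-part anatomy (role, context, prioritized standards, output format). The section names and severity definitions below are my assumptions, not a format taken from the article:

```markdown
# SOUL: <agent name>

## Role
One sentence: what this agent is, and is not, responsible for.

## Context
The repos, files, and state the agent must load before acting.

## Standards
### Critical (never violate; escalate instead of proceeding)
### Important (violate only with a logged justification)
### Advisory (prefer, but autonomous judgment is allowed)

## Output Format
The exact shape of the agent's reports, reviews, or commits.
```

The point of a fixed skeleton is that "critical" means the same thing in every agent's SOUL, so an audit becomes a diff against the template rather than a judgment call.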

2. Encode Review Standards into Atlas

Atlas does architecture reviews via cron, but its review criteria live implicitly in its SOUL. The actual standards — what constitutes a blocking issue, what's advisory, what's fine — came from Marcel's review patterns and my own, never formalized. We already have a review-checklist.md in the arena skill's references. It needs to become an agent-consumable instruction format, not a human reference doc.

Action: Convert references/review-checklist.md into a structured instruction with explicit priority levels. Bind it to Atlas's review workflow so reviews are consistent regardless of model version.
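As a sketch of the conversion, one entry per priority level might look like this — the entries below are illustrative, not the actual contents of review-checklist.md:

```markdown
## Critical (blocks the PR)
- No secrets or credentials anywhere in the diff.

## Important (request changes, but don't block a hotfix)
- Schema migrations must include a rollback path.

## Advisory (note in the review comment only)
- Prefer composition over inheritance in new service code.
```

With priority made explicit, Atlas's review output can say *why* something blocks, instead of relying on whatever the current model version infers from prose.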

3. Version-Control Agent Instructions Like Code

Agent SOULs, skills, and review checklists should follow the same PR review process as source code. Today, SOUL changes get rsynced to rpi5 without review. The article calls this "standards as infrastructure" — they live in the repo, they get PRs, they have accountability.

This also connects to a live problem: agents on rpi5 have been modifying their own SOUL files and CLI source (#98), breaking the platform twice in two days. If SOUL changes required a PR, agents couldn't silently drift.

Action: Add a pre-deploy check to just deploy-agents that verifies no uncommitted changes exist in agents/*/SOUL.md. Treat SOUL drift like code drift.
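A minimal sketch of such a guard, assuming the agents/*/SOUL.md layout from above runs inside a git checkout; the function name and messages are mine, not an existing part of the justfile:

```shell
#!/usr/bin/env bash
# Pre-deploy guard: refuse to deploy if any agent SOUL has uncommitted changes.
# Run from the repo root; intended to be called at the top of `just deploy-agents`.
check_soul_drift() {
  local dirty
  # --porcelain lists modified/untracked paths matching the pathspec, one per line.
  dirty=$(git status --porcelain -- 'agents/*/SOUL.md')
  if [ -n "$dirty" ]; then
    echo "SOUL drift detected; commit or revert before deploying:" >&2
    echo "$dirty" >&2
    return 1
  fi
}
```

Because the check runs before rsync, an agent that edited its own SOUL on rpi5 can't get that edit silently re-deployed — the drift has to go through a commit, and therefore a PR.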

4. Create Generation Standards for Forge

Forge is our code-writing agent. It currently relies on per-repo CLAUDE.md files for conventions, but these vary wildly in quality and completeness. The result: Forge's output quality depends on which repo has better docs, not on any consistent standard. One shared forge-generation-standards.md covering naming conventions, error handling, test expectations, and security baselines would normalize output quality across all repos.

Action: Extract the patterns from our best CLAUDE.md (ludus, the most mature) into a shared Forge instruction. Start there, expand later.

5. Make Post-Action Verification a Critical Standard

In a previous dispatch, I documented how agents were saying they completed actions without verifying. Helm "assigned" an issue but didn't check. Main reported stale status without re-querying. The article's framing makes this precise: post-action verification should be a critical-priority standard (not advisory, not optional) applied at the earliest point — generation, not review.

Action: Add a ## Post-Action Verification section to the SOUL template with concrete examples: gh issue view after gh issue create, git log after git commit, bd show after bd close. Make it required, not prose.
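The act-then-verify pattern that section would demand can be sketched as a shell helper. The wrapper function is hypothetical; the gh invocations are standard gh CLI calls:

```shell
#!/usr/bin/env bash
# Act-then-verify: never trust the create call's output alone; re-query the
# source of truth afterwards. Sketch of what a SOUL's Post-Action Verification
# section would require of an agent.
create_and_verify_issue() {
  local url state
  url=$(gh issue create --title "$1" --body "$2") || return 1
  # Verification step: fetch the issue back instead of assuming success.
  state=$(gh issue view "$url" --json state -q .state) || return 1
  [ "$state" = "OPEN" ]
}
```

The same shape applies to the other pairs from the dispatch: git log after git commit, bd show after bd close — the verification read always hits the system of record, not the agent's memory of what it just did.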

The Meta-Lesson

The article's deepest insight isn't about AI — it's about making implicit knowledge explicit. We've been running b4arena with standards that "live in people's heads" (specifically, in Marcel's head and mine, transferred to agents through SOUL.md prose). That worked for a two-person team with eight agents. It won't work when we scale to more agents, more repos, or more operators.

The progression from tacit → explicit → executable is the same progression b4arena needs: from "I know what good looks like" to "the SOUL says what good looks like" to "the agent enforces what good looks like, automatically, every time."


Written with help from Dispatch.