
ic-7i4g: GH#11: ADR: Scenario-Driven TDD with Agentic CLI

Snapshot: 2026-03-30T08:40:31Z

| Field | Value |
| --- | --- |
| Status | open |
| Assignee | (unassigned) |
| Priority | 2 |
| Labels | atlas |
| Created by | github-bridge |
| Created | 2026-03-28T19:00:04Z |
| Updated | 2026-03-28T19:00:04Z |

Description

GitHub issue: b4arena/spellkave#11
URL: https://github.com/b4arena/spellkave/issues/11

Context

Spellkave's development must be 100% test-driven. No implementation code without a failing test first.

For a persistent world simulation, traditional unit testing is insufficient: the interesting behavior emerges from world interactions, not from isolated function calls. Scenario-driven development is the natural fit, because world scenarios (game situations) become the primary test specification.

The text CLI (reference implementation) serves double duty: it's both the player/agent interface and the test harness. It must follow agentic CLI design patterns so AI agents can use it as clients.

What Atlas Should Deliver

An ADR establishing:

1. TDD as Non-Negotiable

The rule: Every feature, every reducer, every table gets a test BEFORE implementation.

The loop:

  1. Write a world scenario as a test (Given/When/Then)
  2. Run it — watch it fail
  3. Implement the minimum reducer/table to make it pass
  4. Refactor
  5. Next scenario

For SpacetimeDB modules in Rust:

  • Unit tests via #[cfg(test)] for pure logic (damage calculation, ability checks)
  • Integration tests that spin up an in-memory SpacetimeDB instance, call reducers, assert table state
  • Consider cucumber-rs for Gherkin-style scenario definitions
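
As an illustration of the unit-test layer, here is a minimal sketch of pure combat logic with its tests. It assumes the D&D 5e ability-modifier convention, which the issue does not confirm, and the names `ability_modifier` and `attack_hits` are hypothetical. Natural 1 and natural 20 rules are deliberately left out.

```rust
/// Ability modifier under the D&D 5e convention: floor((score - 10) / 2).
/// Assumption: the issue does not state which rules edition Spellkave uses.
pub fn ability_modifier(score: i32) -> i32 {
    // div_euclid rounds toward negative infinity, so a score of 9 yields -1.
    (score - 10).div_euclid(2)
}

/// An attack hits when d20 + STR modifier + proficiency meets or beats AC.
/// The d20 result is a parameter, so the function stays pure and testable.
pub fn attack_hits(d20: i32, str_score: i32, proficiency: i32, armor_class: i32) -> bool {
    d20 + ability_modifier(str_score) + proficiency >= armor_class
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn modifier_rounds_down_for_low_scores() {
        assert_eq!(ability_modifier(16), 3);
        assert_eq!(ability_modifier(10), 0);
        assert_eq!(ability_modifier(9), -1);
    }

    #[test]
    fn attack_needs_total_of_at_least_ac() {
        // STR 16 (+3) with proficiency +2 against AC 15: a 10 hits, a 9 misses.
        assert!(attack_hits(10, 16, 2, 15));
        assert!(!attack_hits(9, 16, 2, 15));
    }
}
```

Passing the die result in as an argument keeps randomness out of the function under test, so no RNG setup is needed at this layer.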

2. Scenario-Driven Design

Scenarios are world situations that simultaneously define:

  • Product requirements — what the world should do
  • Test cases — the executable specification
  • Game design — gameplay moments

Example scenarios for Phase 0:

Scenario: Agent moves between locations
Given an agent "Greymantle" at location "Town Square"
And a location "Dark Cave" connected to "Town Square"
When Greymantle moves to "Dark Cave"
Then Greymantle's location is "Dark Cave"
And a WorldEvent "Greymantle entered Dark Cave" is logged

Scenario: Agent attacks a creature
Given an agent "Greymantle" with STR 16 wielding a "Longsword"
And a creature "Goblin" with AC 15, HP 7 at the same location
When Greymantle attacks the Goblin
Then an attack roll is made (d20 + STR mod + proficiency vs AC)
And if hit, damage is applied to Goblin's HP
And a WorldEvent documenting the combat is logged

Scenario: Two agents form an alliance
Given agent "Greymantle" of faction "Silver Order"
And agent "Thornwick" of faction "Merchant Guild"
When Greymantle proposes alliance to Thornwick
And Thornwick accepts (via ThinkRequest/ThinkResult cycle)
Then a FactionRelationship "allied" exists between Silver Order and Merchant Guild
And both agents' MemoryState records the alliance

Scenario: Observer watches emergent conflict
Given a running world with 3+ agents and active faction tensions
When an observer subscribes to WorldEventLog
Then they receive a stream of legible events
And can follow a 10-minute narrative thread without prior context

Key principle: If you can't write it as a scenario, you don't understand the feature well enough to build it.
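
To make the idea concrete, the first scenario above could be expressed as a plain Rust test along these lines. This is only a sketch: `World` and its methods are a hypothetical in-memory stand-in, not SpacetimeDB tables or reducers, and the event text is copied from the scenario.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the world tables.
#[derive(Default)]
pub struct World {
    agent_location: HashMap<String, String>,
    connections: HashSet<(String, String)>,
    pub events: Vec<String>,
}

impl World {
    pub fn place_agent(&mut self, agent: &str, location: &str) {
        self.agent_location.insert(agent.to_string(), location.to_string());
    }

    /// Connections are stored in both directions.
    pub fn connect(&mut self, a: &str, b: &str) {
        self.connections.insert((a.to_string(), b.to_string()));
        self.connections.insert((b.to_string(), a.to_string()));
    }

    pub fn location_of(&self, agent: &str) -> Option<&str> {
        self.agent_location.get(agent).map(String::as_str)
    }

    /// Stand-in for a move reducer: rejects unknown agents and unconnected targets.
    pub fn move_agent(&mut self, agent: &str, target: &str) -> Result<(), String> {
        let current = self
            .agent_location
            .get(agent)
            .cloned()
            .ok_or_else(|| format!("unknown agent {agent}"))?;
        if !self.connections.contains(&(current.clone(), target.to_string())) {
            return Err(format!("{target} is not connected to {current}"));
        }
        self.agent_location.insert(agent.to_string(), target.to_string());
        self.events.push(format!("{agent} entered {target}"));
        Ok(())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn agent_moves_between_locations() {
        // Given
        let mut world = World::default();
        world.place_agent("Greymantle", "Town Square");
        world.connect("Dark Cave", "Town Square");
        // When
        world.move_agent("Greymantle", "Dark Cave").unwrap();
        // Then
        assert_eq!(world.location_of("Greymantle"), Some("Dark Cave"));
        assert_eq!(
            world.events.last().map(String::as_str),
            Some("Greymantle entered Dark Cave")
        );
    }
}
```

The Given/When/Then comments map one-to-one onto the scenario steps, so the same structure could later be moved into Gherkin step definitions if cucumber-rs is adopted.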

3. Agentic CLI as Reference Implementation

The text CLI is the first client and follows agentic CLI design patterns. It is how both human players and AI agents interact with the world.

Core agentic CLI principles (from the Agentic CLI Recipe):

| Principle | Application to Spellkave CLI |
| --- | --- |
| Non-interactive | Every action is a single command with flags. No prompts, no menus. AI agents can operate it cold. |
| --help as protocol contract | Complete API surface: all commands, all flags, stdin rules, output format. An agent reads --help once and knows everything. |
| Sensible defaults | spellkave with no args shows world status + next-step hints |
| Safe by default | Read-only operations are default. Actions require explicit commands. |
| Next-step hints | Every output suggests logical next commands |
| Dry-run for destructive | spellkave attack goblin previews; spellkave attack goblin --confirm executes |
| Machine-readable output | --json flag for structured output (AI agents may prefer this) |
| Verbose mode | --verbose shows detailed world state changes |
| Doctor command | spellkave doctor checks connection, world state, agent health |
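
The dry-run principle can be sketched as a small argument parser that only reports an executable command when `--confirm` is present. The type `AttackCommand` and the functions `parse_attack` and `render_preview` are hypothetical names for illustration; the real CLI may well use an argument-parsing crate instead.

```rust
/// Outcome of parsing an `attack` invocation.
#[derive(Debug, PartialEq)]
pub enum AttackCommand {
    /// No --confirm flag: describe what would happen, change nothing.
    Preview { target: String },
    /// --confirm present: the caller may now invoke the reducer.
    Execute { target: String },
}

/// Parses the arguments that follow `spellkave attack`.
pub fn parse_attack(args: &[&str]) -> Result<AttackCommand, String> {
    let confirm = args.iter().any(|a| *a == "--confirm");
    // The first argument that is not a flag is taken as the target.
    let target = args
        .iter()
        .find(|a| !a.starts_with("--"))
        .ok_or("usage: spellkave attack <target> [--confirm]")?
        .to_string();
    Ok(if confirm {
        AttackCommand::Execute { target }
    } else {
        AttackCommand::Preview { target }
    })
}

/// Renders the preview text, ending with a next-step hint.
pub fn render_preview(target: &str) -> String {
    format!("Preview: attack {target}\nTo execute: spellkave attack {target} --confirm")
}
```

Because the preview is the default branch, forgetting the flag can never mutate world state, which is the property the table asks for.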

Example CLI interaction flow:

# Cold start — agent reads the contract
$ spellkave --help

# Default: show world status (safe, read-only)
$ spellkave
World: Thornhaven (running 47h12m)
3 agents active, 2 factions, 12 events today
Your character: Greymantle (Level 3 Fighter, Town Square)

Next steps:
  spellkave look              Describe current location
  spellkave move <location>   Travel somewhere
  spellkave actions           List available actions
  spellkave events            Recent world events
  spellkave --help            All commands

# Explore
$ spellkave look
Town Square — A bustling marketplace.
NPCs: Thornwick (Merchant), Guard Captain Voss
Exits: Dark Cave (north), Forest Road (east), Docks (south)
Items: Notice board, well

Next steps:
  spellkave talk <npc>        Start conversation
  spellkave move <exit>       Travel to connected location
  spellkave inspect <item>    Examine something

# Act
$ spellkave move "Dark Cave"
Moving to Dark Cave...
Greymantle travels north from Town Square.
Dark Cave — A damp cavern. You hear scratching sounds.
Creatures: 2x Goblins (hostile)

Next steps:
  spellkave attack <target>   Initiate combat
  spellkave stealth           Attempt to sneak past
  spellkave retreat           Return to Town Square

# Observer mode (read-only subscriptions)
$ spellkave observe
[18:42] Greymantle entered Dark Cave
[18:43] Greymantle attacked Goblin — hit for 8 damage
[18:43] Goblin retaliates — miss
[18:44] Thornwick completed trade with traveling merchant
[18:45] Faction tension: Silver Order ↔ Merchant Guild rising
^C

# AI agent mode (structured output)
$ spellkave look --json
{"location": "Dark Cave", "creatures": [...], "exits": [...]}

4. The Testing Stack

Recommend a concrete testing approach for SpacetimeDB + Rust:

| Layer | Tool | What It Tests |
| --- | --- | --- |
| Scenario tests | cucumber-rs or custom test harness | World scenarios (Given/When/Then) — calls reducers, asserts table state |
| Integration tests | SpacetimeDB test runtime | Reducer chains, subscription behavior, scheduled agent ticks |
| Unit tests | #[cfg(test)] | Pure logic: damage calc, ability checks, dice rolls (deterministic with seed) |
| CLI tests | CLI binary + assert stdout | End-to-end: run CLI commands, verify output format and content |
| Property tests | proptest crate | Invariants: "HP never exceeds max", "dead entities can't act", "events are monotonic" |
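
To show what an invariant such as "HP never exceeds max" looks like in code, here is a sketch that sweeps a small input range exhaustively using only the standard library. With the proptest crate the inputs would be generated rather than enumerated. `HitPoints` and `hp_invariant_holds` are hypothetical names.

```rust
/// Hypothetical hit-point pool; `current` is kept within 0..=max.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HitPoints {
    pub current: u32,
    pub max: u32,
}

impl HitPoints {
    /// A new pool starts at full health.
    pub fn new(max: u32) -> Self {
        Self { current: max, max }
    }

    /// Healing is capped at `max`.
    pub fn heal(&mut self, amount: u32) {
        self.current = self.current.saturating_add(amount).min(self.max);
    }

    /// Damage stops at zero instead of underflowing.
    pub fn damage(&mut self, amount: u32) {
        self.current = self.current.saturating_sub(amount);
    }
}

/// Checks the invariant for every damage/heal pair with amounts in 0..=limit.
pub fn hp_invariant_holds(max: u32, limit: u32) -> bool {
    (0..=limit).all(|dmg| {
        (0..=limit).all(|heal| {
            let mut hp = HitPoints::new(max);
            hp.damage(dmg);
            hp.heal(heal);
            hp.current <= hp.max
        })
    })
}
```

An exhaustive sweep is only practical for small ranges; generated inputs scale better once the state space grows.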

Dice determinism: Tests need reproducible randomness. Use a seeded RNG passed through ReducerContext or a test fixture. Never use system random in tests.
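
A minimal sketch of a seeded dice fixture follows. It uses a hand-written SplitMix64 generator purely so the example has no dependencies; `SeededDice` is a hypothetical name, and how the seed reaches a reducer is left open here.

```rust
/// Small deterministic generator (SplitMix64), used as a stand-in for
/// whatever seeded RNG the project eventually adopts.
pub struct SeededDice {
    state: u64,
}

impl SeededDice {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advances the state and returns the next 64-bit output.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Rolls one die with `sides` faces, returning a value in 1..=sides.
    /// The modulo introduces a negligible bias, acceptable for a test fixture.
    pub fn roll(&mut self, sides: u32) -> u32 {
        assert!(sides > 0, "a die needs at least one side");
        (self.next_u64() % u64::from(sides)) as u32 + 1
    }
}
```

Two fixtures created with the same seed produce the same sequence of rolls, so a combat scenario can assert on an exact outcome instead of a probability.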

5. Development Workflow

1. Product scenario (Priya/human writes a world situation)

2. Test scenario (developer translates to Gherkin/test code)

3. Red — test fails (reducer/table doesn't exist yet)

4. Green — implement minimum code to pass

5. Refactor — clean up without breaking tests

6. CLI verification — same scenario works through the text CLI

7. Next scenario
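
The CLI verification step could be driven from Rust roughly as follows. This is a sketch under assumptions: `run_cli` and `string_field` are hypothetical helpers, the binary path is whatever the build produces, and the field lookup is deliberately naive (a real harness would use a JSON parser).

```rust
use std::io;
use std::process::Command;

/// Runs a CLI binary and returns its stdout as text.
/// `bin` would be the path to the spellkave binary under test.
pub fn run_cli(bin: &str, args: &[&str]) -> io::Result<String> {
    let output = Command::new(bin).args(args).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{bin} exited with {}", output.status),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

/// Naive lookup of a string field such as "location" in JSON text.
/// It returns the first match and does not handle escaped quotes.
pub fn string_field<'a>(json: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{key}\"");
    let start = json.find(&needle)? + needle.len();
    // Skip optional whitespace, the colon, and the opening quote of the value.
    let after_colon = json[start..].trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    Some(&value[..value.find('"')?])
}
```

A scenario check then reads like `string_field(&run_cli(bin, &["look", "--json"])?, "location")` compared against the expected location name.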

Reference Material

Agentic CLI Patterns

Testing in Rust

SpacetimeDB Testing

  • #6 — SpacetimeDB as runtime (the platform we're testing on)
  • #9 — Rules engine interface (the logic scenarios exercise)
  • #10 — Phase 0 minimal module (the first implementation target — scenarios define its scope)

Dependencies

  • Depends on #6 (SpacetimeDB runtime) — testing approach depends on platform capabilities
  • Informs ALL other issues — this is a cross-cutting methodology decision
  • Specifically shapes #10 (Phase 0 module) — Phase 0 scenarios = Phase 0 scope

Acceptance Criteria

  • ADR establishes TDD as mandatory development practice
  • Scenario-driven design pattern documented with examples
  • At least 10 Phase 0 scenarios defined (covering PRD exit criteria)
  • Agentic CLI design principles documented with Spellkave-specific examples
  • Testing stack recommendation (tools, layers, dice determinism)
  • Development workflow documented (scenario → test → implement → verify)
  • Committed to spellkave repo

Conversation

github-bridge, Mar 28, 08:20 PM (system)
[GH @durandom]

## Addendum: Agentic CLI Deep-Dive (from design exploration)

### The Text CLI Serves Triple Duty

1. **Player interface** — humans play the game through it
2. **Agent interface** — AI agents interact with the world through the same commands
3. **Test harness** — scenarios are verified end-to-end through the CLI

This means the CLI design IS the API design. If a scenario works through the CLI, it works for all client types.

### Concrete Agentic CLI Patterns (applied to Spellkave)

From the [Agentic CLI Recipe](https://thenewstack.io/ai-coding-tools-in-2025-welcome-to-the-agentic-cli-era/):

**1. `--help` as protocol contract.** An AI agent reads `--help` once and knows the entire API:

```
$ spellkave --help
Spellkave — a persistent D&D world

Commands:
  look               Describe current location
  move <location>    Travel to connected location
  attack <target>    Initiate combat (preview; use --confirm to execute)
  talk <npc> <msg>   Start conversation
  inventory          Show carried items
  events             Recent world events
  observe            Stream world events (read-only)
  status             World and character overview
  doctor             Check connection and world health

Options:
  --json             Machine-readable output
  --verbose          Detailed state changes
  --confirm          Execute destructive actions (hidden from --help, revealed in output)
  --character <id>   Act as specific character
```

**2. Safe-by-default with next-step hints:**

```
$ spellkave
World: Thornhaven (running 47h12m)
3 agents active, 2 factions, 12 events today
Your character: Greymantle (Level 3 Fighter, Town Square)

Next steps:
  spellkave look              Describe current location
  spellkave move <location>   Travel somewhere
  spellkave events            Recent world events

$ spellkave attack goblin
Preview: Greymantle attacks Goblin (AC 15)
  Attack roll: d20 + 5 (STR) + 2 (proficiency) = d20+7 vs AC 15
  Estimated damage on hit: 1d8+5 slashing

To execute: spellkave attack goblin --confirm
```

**3. Observer mode** — read-only subscription stream:

```
$ spellkave observe
[18:42] Greymantle entered Dark Cave
[18:43] Greymantle attacked Goblin — hit for 8 damage (longsword)
[18:44] Thornwick completed trade: 3 healing potions → Silver Order
[18:45] Faction tension rising: Silver Order ↔ Merchant Guild
^C
```

**4. Machine-readable for AI agents:**

```
$ spellkave look --json
{
  "location": "Dark Cave",
  "description": "A damp cavern. Scratching sounds echo.",
  "creatures": [{"name": "Goblin", "hostile": true, "hp": 7, "ac": 15}],
  "exits": [{"name": "Town Square", "direction": "south"}],
  "items": []
}
```

**5. Doctor command** for health checks:

```
$ spellkave doctor
✓ SpacetimeDB: connected (ws://mimas:3000)
✓ World: running (uptime: 47h12m)
✓ Character: Greymantle (entity_id: 42)
⚠ 2 events missed since last session
  → Run: spellkave events --since "2h ago"
```

### How Scenarios Map to CLI Commands

Each Gherkin scenario translates directly to CLI invocations:

```gherkin
Scenario: Agent moves between locations
  Given Greymantle is at "Town Square"    # spellkave look --json | jq '.location'
  When Greymantle moves to "Dark Cave"    # spellkave move "Dark Cave" --confirm
  Then location is "Dark Cave"            # spellkave look --json | jq '.location'
  And event is logged                     # spellkave events --last 1 --json
```

The test harness runs these CLI commands, parses `--json` output, and asserts. No special test API needed — the CLI IS the test interface.

### Reference: spacetime-mud

[spacetime-mud](https://github.com/clockworklabs/spacetime-mud) already implements this pattern:

- Rust SpacetimeDB module for world state
- TypeScript text interface using SpacetimeDB TS SDK
- Python AI agent that subscribes and acts

This is the closest existing prototype to what Spellkave's Phase 0 text CLI should look like.