ic-7i4g: GH#11: ADR: Scenario-Driven TDD with Agentic CLI

Snapshot: 2026-03-30T08:40:31Z

Field	Value
Status	open
Assignee	(unassigned)
Priority	2
Labels	atlas
Created by	github-bridge
Created	2026-03-28T19:00:04Z
Updated	2026-03-28T19:00:04Z

Description

GitHub issue: b4arena/spellkave#11
URL: https://github.com/b4arena/spellkave/issues/11

Context

Spellkave's development must be 100% test-driven. No implementation code without a failing test first.

For a persistent world simulation, traditional unit testing is insufficient — the interesting behavior is emergent from world interactions, not isolated function calls. Scenario-driven development is the natural fit: world scenarios (game situations) become the primary test specification.

The text CLI (reference implementation) serves double duty: it's both the player/agent interface and the test harness. It must follow agentic CLI design patterns so AI agents can use it as clients.

What Atlas Should Deliver

An ADR establishing:

1. TDD as Non-Negotiable

The rule: Every feature, every reducer, every table gets a test BEFORE implementation.

The loop:

Write a world scenario as a test (Given/When/Then)
Run it — watch it fail
Implement the minimum reducer/table to make it pass
Refactor
Next scenario

For SpacetimeDB modules in Rust:

Unit tests via #[cfg(test)] for pure logic (damage calculation, ability checks)
Integration tests that spin up an in-memory SpacetimeDB instance, call reducers, assert table state
Consider cucumber-rs for Gherkin-style scenario definitions
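
As a minimal sketch of the unit-test layer, pure logic and its tests might sit side by side as shown here. The function names `ability_modifier` and `attack_hits` are hypothetical, and the modifier formula assumes the D&D 5e convention floor((score - 10) / 2); natural-20 and natural-1 rules are deliberately left out.

```rust
/// Ability modifier, assuming the D&D 5e convention floor((score - 10) / 2).
/// (Assumption: the issue does not pin down the exact ruleset.)
pub fn ability_modifier(score: i32) -> i32 {
    // div_euclid rounds toward negative infinity, so a score of 9 gives -1.
    (score - 10).div_euclid(2)
}

/// True when d20 + STR modifier + proficiency meets or beats the armor class.
pub fn attack_hits(d20: i32, str_score: i32, proficiency: i32, armor_class: i32) -> bool {
    d20 + ability_modifier(str_score) + proficiency >= armor_class
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn str_16_gives_plus_three() {
        assert_eq!(ability_modifier(16), 3);
    }

    #[test]
    fn scores_below_ten_round_down() {
        assert_eq!(ability_modifier(9), -1);
    }

    #[test]
    fn total_equal_to_ac_is_a_hit() {
        // 10 + 3 (STR 16) + 2 (proficiency) = 15, which meets AC 15.
        assert!(attack_hits(10, 16, 2, 15));
        // 9 + 3 + 2 = 14, which misses AC 15.
        assert!(!attack_hits(9, 16, 2, 15));
    }
}
```

Because these functions take the die result as an argument, they need no RNG and no database, which keeps this layer fast enough to run on every save.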

2. Scenario-Driven Design

Scenarios are world situations that simultaneously define:

Product requirements — what the world should do
Test cases — the executable specification
Game design — gameplay moments

Example scenarios for Phase 0:

Scenario: Agent moves between locations
  Given an agent "Greymantle" at location "Town Square"
  And a location "Dark Cave" connected to "Town Square"
  When Greymantle moves to "Dark Cave"
  Then Greymantle's location is "Dark Cave"
  And a WorldEvent "Greymantle entered Dark Cave" is logged

Scenario: Agent attacks a creature
  Given an agent "Greymantle" with STR 16 wielding a "Longsword"
  And a creature "Goblin" with AC 15, HP 7 at the same location
  When Greymantle attacks the Goblin
  Then an attack roll is made (d20 + STR mod + proficiency vs AC)
  And if hit, damage is applied to Goblin's HP
  And a WorldEvent documenting the combat is logged

Scenario: Two agents form an alliance
  Given agent "Greymantle" of faction "Silver Order"
  And agent "Thornwick" of faction "Merchant Guild"
  When Greymantle proposes alliance to Thornwick
  And Thornwick accepts (via ThinkRequest/ThinkResult cycle)
  Then a FactionRelationship "allied" exists between Silver Order and Merchant Guild
  And both agents' MemoryState records the alliance

Scenario: Observer watches emergent conflict
  Given a running world with 3+ agents and active faction tensions
  When an observer subscribes to WorldEventLog
  Then they receive a stream of legible events
  And can follow a 10-minute narrative thread without prior context

Key principle: If you can't write it as a scenario, you don't understand the feature well enough to build it.
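
To illustrate how the first scenario above could become an executable test before any SpacetimeDB code exists, here is a sketch written as a plain Rust test. The `World` struct and its methods are hypothetical stand-ins for the module's tables and a move reducer, not SpacetimeDB API; a cucumber-rs setup would bind the same Given/When/Then steps to Gherkin text instead.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the module's tables.
#[derive(Default)]
pub struct World {
    agent_location: HashMap<String, String>,
    connections: HashSet<(String, String)>,
    pub events: Vec<String>,
}

impl World {
    pub fn place_agent(&mut self, agent: &str, location: &str) {
        self.agent_location.insert(agent.to_string(), location.to_string());
    }

    /// Connects two locations in both directions.
    pub fn connect(&mut self, a: &str, b: &str) {
        self.connections.insert((a.to_string(), b.to_string()));
        self.connections.insert((b.to_string(), a.to_string()));
    }

    pub fn location_of(&self, agent: &str) -> Option<&str> {
        self.agent_location.get(agent).map(String::as_str)
    }

    /// Stand-in for a move reducer: rejects unknown agents and unconnected targets,
    /// otherwise updates the location and logs a world event.
    pub fn move_agent(&mut self, agent: &str, to: &str) -> Result<(), String> {
        let from = self
            .agent_location
            .get(agent)
            .cloned()
            .ok_or_else(|| format!("unknown agent {agent}"))?;
        if !self.connections.contains(&(from.clone(), to.to_string())) {
            return Err(format!("{to} is not connected to {from}"));
        }
        self.agent_location.insert(agent.to_string(), to.to_string());
        self.events.push(format!("{agent} entered {to}"));
        Ok(())
    }
}

#[cfg(test)]
mod scenario_tests {
    use super::*;

    #[test]
    fn agent_moves_between_locations() {
        // Given an agent at Town Square and a connected Dark Cave
        let mut world = World::default();
        world.place_agent("Greymantle", "Town Square");
        world.connect("Town Square", "Dark Cave");
        // When Greymantle moves to Dark Cave
        world.move_agent("Greymantle", "Dark Cave").unwrap();
        // Then the location changed and the event was logged
        assert_eq!(world.location_of("Greymantle"), Some("Dark Cave"));
        assert_eq!(
            world.events.last().map(String::as_str),
            Some("Greymantle entered Dark Cave")
        );
    }
}
```

In the TDD loop the test module would be written first and fail to compile or pass until `move_agent` exists; the implementation shown is only the minimum needed to turn it green.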

3. Agentic CLI as Reference Implementation

The text CLI is the first client and follows agentic CLI design patterns. It is how both human players and AI agents interact with the world.

Core agentic CLI principles (from the Agentic CLI Recipe):

Principle	Application to Spellkave CLI
Non-interactive	Every action is a single command with flags. No prompts, no menus. AI agents can operate it cold.
`--help` as protocol contract	Complete API surface: all commands, all flags, stdin rules, output format. An agent reads `--help` once and knows everything.
Sensible defaults	`spellkave` with no args shows world status + next-step hints
Safe by default	Read-only operations are the default. State-changing actions require explicit commands.
Next-step hints	Every output suggests logical next commands
Dry-run for destructive actions	`spellkave attack goblin` previews; `spellkave attack goblin --confirm` executes
Machine-readable output	`--json` flag for structured output (AI agents may prefer this)
Verbose mode	`--verbose` shows detailed world state changes
Doctor command	`spellkave doctor` checks connection, world state, agent health
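
The non-interactive and dry-run rows of the table can be made concrete with a small argument-parsing sketch. The `AttackMode` enum and `parse_attack` function are hypothetical names used for illustration; a real CLI would likely use an argument-parsing crate, which this dependency-free version does not attempt to replace.

```rust
/// What the CLI should do with an `attack` invocation (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum AttackMode {
    /// No `--confirm`: describe the action without changing world state.
    Preview { target: String },
    /// `--confirm` present: actually perform the action.
    Execute { target: String },
}

/// Parses the arguments that follow `spellkave attack`.
/// It never prompts: missing or bad input yields an error message instead.
pub fn parse_attack(args: &[&str]) -> Result<AttackMode, String> {
    let mut target: Option<String> = None;
    let mut confirm = false;
    for arg in args {
        match *arg {
            "--confirm" => confirm = true,
            flag if flag.starts_with("--") => return Err(format!("unknown flag {flag}")),
            name if target.is_none() => target = Some(name.to_string()),
            extra => return Err(format!("unexpected argument {extra}")),
        }
    }
    let target = target.ok_or("usage: spellkave attack <target> [--confirm]")?;
    Ok(if confirm {
        AttackMode::Execute { target }
    } else {
        AttackMode::Preview { target }
    })
}
```

Making preview the fallthrough case means an agent that forgets a flag can only ever trigger the safe path.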

Example CLI interaction flow:

# Cold start — agent reads the contract
$ spellkave --help

# Default: show world status (safe, read-only)
$ spellkave
World: Thornhaven (running 47h12m)
  3 agents active, 2 factions, 12 events today
  Your character: Greymantle (Level 3 Fighter, Town Square)

Next steps:
  spellkave look              Describe current location
  spellkave move <location>   Travel somewhere
  spellkave actions           List available actions
  spellkave events            Recent world events
  spellkave --help            All commands

# Explore
$ spellkave look
Town Square — A bustling marketplace. 
  NPCs: Thornwick (Merchant), Guard Captain Voss
  Exits: Dark Cave (north), Forest Road (east), Docks (south)
  Items: Notice board, well

Next steps:
  spellkave talk <npc>        Start conversation
  spellkave move <exit>       Travel to connected location
  spellkave inspect <item>    Examine something

# Act
$ spellkave move "Dark Cave"
Moving to Dark Cave...
  Greymantle travels north from Town Square.
  Dark Cave — A damp cavern. You hear scratching sounds.
  Creatures: 2x Goblins (hostile)
  
Next steps:
  spellkave attack <target>   Initiate combat
  spellkave stealth           Attempt to sneak past
  spellkave retreat           Return to Town Square

# Observer mode (read-only subscriptions)
$ spellkave observe
[18:42] Greymantle entered Dark Cave
[18:43] Greymantle attacked Goblin — hit for 8 damage
[18:43] Goblin retaliates — miss
[18:44] Thornwick completed trade with traveling merchant
[18:45] Faction tension: Silver Order ↔ Merchant Guild rising
^C

# AI agent mode (structured output)
$ spellkave look --json
{"location": "Dark Cave", "creatures": [...], "exits": [...]}

4. The Testing Stack

Recommend a concrete testing approach for SpacetimeDB + Rust:

Layer	Tool	What It Tests
Scenario tests	`cucumber-rs` or custom test harness	World scenarios (Given/When/Then) — calls reducers, asserts table state
Integration tests	SpacetimeDB test runtime	Reducer chains, subscription behavior, scheduled agent ticks
Unit tests	`#[cfg(test)]`	Pure logic: damage calc, ability checks, dice rolls (deterministic with seed)
CLI tests	CLI binary + assert stdout	End-to-end: run CLI commands, verify output format and content
Property tests	`proptest` crate	Invariants: "HP never exceeds max", "dead entities can't act", "events are monotonic"
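
The invariants named in the property-test row can be sketched without any dependency by checking them exhaustively over a small input range. This is a stand-in for what `proptest` would do with generated inputs, not a use of that crate; the `Health` type and `invariants_hold` function are hypothetical, and `max_hp` is assumed to be non-negative.

```rust
/// Hypothetical hit-point state for a creature or agent.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Health {
    pub hp: i32,
    pub max_hp: i32,
}

impl Health {
    /// Starts at full health. Assumes max_hp >= 0.
    pub fn new(max_hp: i32) -> Self {
        Health { hp: max_hp, max_hp }
    }

    /// Applies a signed change (negative = damage, positive = healing),
    /// clamped to the range 0..=max_hp.
    pub fn apply(self, delta: i32) -> Self {
        Health {
            hp: self.hp.saturating_add(delta).clamp(0, self.max_hp),
            ..self
        }
    }

    /// Dead entities (hp == 0) cannot act.
    pub fn can_act(self) -> bool {
        self.hp > 0
    }
}

/// Checks both invariants for every start state and every delta in a small range.
pub fn invariants_hold(max_hp: i32) -> bool {
    for start in 0..=max_hp {
        for delta in -2 * max_hp..=2 * max_hp {
            let after = Health { hp: start, max_hp }.apply(delta);
            // "HP never exceeds max" (and never drops below zero).
            if after.hp < 0 || after.hp > max_hp {
                return false;
            }
            // "Dead entities can't act."
            if after.hp == 0 && after.can_act() {
                return false;
            }
        }
    }
    true
}
```

A property-testing crate adds random generation and shrinking of failing cases on top of the same idea, which matters once the state space is too large to enumerate.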

Dice determinism: Tests need reproducible randomness. Use a seeded RNG passed through ReducerContext or a test fixture. Never use system randomness in tests.
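
A minimal sketch of seeded dice, assuming a test fixture owns the generator. `SeededDice` is a hypothetical type built on a SplitMix64-style generator so the example has no dependencies; in the module itself the RNG would come from whatever the platform provides, which this sketch does not model.

```rust
/// Tiny deterministic generator (SplitMix64-style), used only so the sketch
/// needs no external crates. Hypothetical type, not part of any library.
pub struct SeededDice {
    state: u64,
}

impl SeededDice {
    pub fn new(seed: u64) -> Self {
        SeededDice { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Rolls one die with `sides` faces, returning a value in 1..=sides.
    /// Uses modulo, so there is a negligible bias for small `sides`.
    /// Panics if `sides` is 0.
    pub fn roll(&mut self, sides: u32) -> u32 {
        (self.next_u64() % u64::from(sides)) as u32 + 1
    }
}

#[cfg(test)]
mod dice_tests {
    use super::*;

    #[test]
    fn same_seed_gives_same_rolls() {
        let mut a = SeededDice::new(42);
        let mut b = SeededDice::new(42);
        for _ in 0..100 {
            assert_eq!(a.roll(20), b.roll(20));
        }
    }
}
```

A scenario test can then fix the seed, so a failing combat test replays the same sequence of rolls on every run.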

5. Development Workflow

1. Product scenario (Priya/human writes a world situation)
   ↓
2. Test scenario (developer translates to Gherkin/test code)
   ↓
3. Red — test fails (reducer/table doesn't exist yet)
   ↓
4. Green — implement minimum code to pass
   ↓
5. Refactor — clean up without breaking tests
   ↓
6. CLI verification — same scenario works through the text CLI
   ↓
7. Next scenario

Reference Material

Agentic CLI Patterns

Agentic CLI Recipe — non-interactive, safe-by-default, capability gradient, next-step hints
Building Effective AI Agents - Anthropic — agent interaction patterns

Testing in Rust

cucumber-rs — Gherkin BDD framework for Rust
proptest — property-based testing for Rust

SpacetimeDB Testing

SpacetimeDB Rust Quickstart — module setup (test patterns in examples)
BitCraft server code — reference for how a production SpacetimeDB module is structured (check for test patterns)

Related Spellkave Issues

#6 — SpacetimeDB as runtime (the platform we're testing on)
#9 — Rules engine interface (the logic scenarios exercise)
#10 — Phase 0 minimal module (the first implementation target — scenarios define its scope)

Dependencies

Depends on #6 (SpacetimeDB runtime) — testing approach depends on platform capabilities
Informs ALL other issues — this is a cross-cutting methodology decision
Specifically shapes #10 (Phase 0 module) — Phase 0 scenarios = Phase 0 scope

Acceptance Criteria

ADR establishes TDD as mandatory development practice
Scenario-driven design pattern documented with examples
At least 10 Phase 0 scenarios defined (covering PRD exit criteria)
Agentic CLI design principles documented with Spellkave-specific examples
Testing stack recommendation (tools, layers, dice determinism)
Development workflow documented (scenario → test → implement → verify)
Committed to spellkave repo

Conversation

github-bridge (system), Mar 28, 08:20 PM

[GH @durandom]

## Addendum: Agentic CLI Deep-Dive (from design exploration)

### The Text CLI Serves Triple Duty

1. **Player interface** — humans play the game through it
2. **Agent interface** — AI agents interact with the world through the same commands
3. **Test harness** — scenarios are verified end-to-end through the CLI

This means the CLI design IS the API design. If a scenario works through the CLI, it works for all client types.

### Concrete Agentic CLI Patterns (applied to Spellkave)

From the [Agentic CLI Recipe](https://thenewstack.io/ai-coding-tools-in-2025-welcome-to-the-agentic-cli-era/):

**1. `--help` as protocol contract.** An AI agent reads `--help` once and knows the entire API:

```
$ spellkave --help
Spellkave — a persistent D&D world

Commands:
  look               Describe current location
  move <location>    Travel to connected location
  attack <target>    Initiate combat (preview; use --confirm to execute)
  talk <npc> <msg>   Start conversation
  inventory          Show carried items
  events             Recent world events
  observe            Stream world events (read-only)
  status             World and character overview
  doctor             Check connection and world health

Options:
  --json             Machine-readable output
  --verbose          Detailed state changes
  --confirm          Execute destructive actions (hidden from --help, revealed in output)
  --character <id>   Act as specific character
```

**2. Safe-by-default with next-step hints:**

```
$ spellkave
World: Thornhaven (running 47h12m)
  3 agents active, 2 factions, 12 events today
  Your character: Greymantle (Level 3 Fighter, Town Square)

Next steps:
  spellkave look              Describe current location
  spellkave move <location>   Travel somewhere
  spellkave events            Recent world events

$ spellkave attack goblin
Preview: Greymantle attacks Goblin (AC 15)
  Attack roll: d20 + 5 (STR) + 2 (proficiency) = d20+7 vs AC 15
  Estimated damage on hit: 1d8+5 slashing

To execute: spellkave attack goblin --confirm
```

**3. Observer mode** — read-only subscription stream:

```
$ spellkave observe
[18:42] Greymantle entered Dark Cave
[18:43] Greymantle attacked Goblin — hit for 8 damage (longsword)
[18:44] Thornwick completed trade: 3 healing potions → Silver Order
[18:45] Faction tension rising: Silver Order ↔ Merchant Guild
^C
```

**4. Machine-readable for AI agents:**

```
$ spellkave look --json
{
  "location": "Dark Cave",
  "description": "A damp cavern. Scratching sounds echo.",
  "creatures": [{"name": "Goblin", "hostile": true, "hp": 7, "ac": 15}],
  "exits": [{"name": "Town Square", "direction": "south"}],
  "items": []
}
```

**5. Doctor command** for health checks:

```
$ spellkave doctor
✓ SpacetimeDB: connected (ws://mimas:3000)
✓ World: running (uptime: 47h12m)
✓ Character: Greymantle (entity_id: 42)
⚠ 2 events missed since last session
  → Run: spellkave events --since "2h ago"
```

### How Scenarios Map to CLI Commands

Each Gherkin scenario translates directly to CLI invocations:

```gherkin
Scenario: Agent moves between locations
  Given Greymantle is at "Town Square"    # spellkave look --json | jq '.location'
  When Greymantle moves to "Dark Cave"    # spellkave move "Dark Cave" --confirm
  Then location is "Dark Cave"            # spellkave look --json | jq '.location'
  And event is logged                     # spellkave events --last 1 --json
```

The test harness runs these CLI commands, parses `--json` output, and asserts. No special test API needed — the CLI IS the test interface.

### Reference: spacetime-mud

[spacetime-mud](https://github.com/clockworklabs/spacetime-mud) already implements this pattern:

- Rust SpacetimeDB module for world state
- TypeScript text interface using SpacetimeDB TS SDK
- Python AI agent that subscribes and acts

This is the closest existing prototype to what Spellkave's Phase 0 text CLI should look like.
