ic-7i4g: GH#11: ADR: Scenario-Driven TDD with Agentic CLI
Snapshot: 2026-03-30T08:40:31Z
| Field | Value |
|---|---|
| Status | open |
| Assignee | (unassigned) |
| Priority | 2 |
| Labels | atlas |
| Created by | github-bridge |
| Created | 2026-03-28T19:00:04Z |
| Updated | 2026-03-28T19:00:04Z |
Description
GitHub issue: b4arena/spellkave#11
URL: https://github.com/b4arena/spellkave/issues/11
Context
Spellkave's development must be 100% test-driven. No implementation code without a failing test first.
For a persistent world simulation, traditional unit testing is insufficient — the interesting behavior is emergent from world interactions, not isolated function calls. Scenario-driven development is the natural fit: world scenarios (game situations) become the primary test specification.
The text CLI (reference implementation) serves double duty: it's both the player/agent interface and the test harness. It must follow agentic CLI design patterns so AI agents can use it as clients.
What Atlas Should Deliver
An ADR establishing:
1. TDD as Non-Negotiable
The rule: Every feature, every reducer, every table gets a test BEFORE implementation.
The loop:
- Write a world scenario as a test (Given/When/Then)
- Run it — watch it fail
- Implement the minimum reducer/table to make it pass
- Refactor
- Next scenario
For SpacetimeDB modules in Rust:
- Unit tests via #[cfg(test)] for pure logic (damage calculation, ability checks)
- Integration tests that spin up an in-memory SpacetimeDB instance, call reducers, assert table state
- Consider cucumber-rs for Gherkin-style scenario definitions
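A minimal sketch of the unit-test layer, assuming the common d20 rule that an ability modifier is floor((score - 10) / 2). The issue does not state the formula, and the function names `ability_modifier` and `attack_hits` are hypothetical, not part of any existing Spellkave module.

```rust
/// Ability modifier in the d20 style: floor((score - 10) / 2).
/// Assumption: Spellkave adopts this rule; the issue does not specify it.
pub fn ability_modifier(score: i32) -> i32 {
    // div_euclid floors toward negative infinity, so a score of 9 yields -1, not 0.
    (score - 10).div_euclid(2)
}

/// True when d20 + STR modifier + proficiency meets or beats the armor class.
pub fn attack_hits(d20: i32, str_score: i32, proficiency: i32, armor_class: i32) -> bool {
    d20 + ability_modifier(str_score) + proficiency >= armor_class
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn modifier_rounds_down() {
        assert_eq!(ability_modifier(16), 3);
        assert_eq!(ability_modifier(10), 0);
        assert_eq!(ability_modifier(9), -1);
    }

    #[test]
    fn attack_hits_when_total_meets_ac() {
        // 10 + 3 (STR 16) + 2 = 15, which meets AC 15.
        assert!(attack_hits(10, 16, 2, 15));
        // 9 + 3 + 2 = 14, which falls short.
        assert!(!attack_hits(9, 16, 2, 15));
    }
}
```

Keeping the die result as a plain argument means the pure logic can be tested without any random source at all.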
2. Scenario-Driven Design
Scenarios are world situations that simultaneously define:
- Product requirements — what the world should do
- Test cases — the executable specification
- Game design — gameplay moments
Example scenarios for Phase 0:
Scenario: Agent moves between locations
Given an agent "Greymantle" at location "Town Square"
And a location "Dark Cave" connected to "Town Square"
When Greymantle moves to "Dark Cave"
Then Greymantle's location is "Dark Cave"
And a WorldEvent "Greymantle entered Dark Cave" is logged
Scenario: Agent attacks a creature
Given an agent "Greymantle" with STR 16 wielding a "Longsword"
And a creature "Goblin" with AC 15, HP 7 at the same location
When Greymantle attacks the Goblin
Then an attack roll is made (d20 + STR mod + proficiency vs AC)
And if hit, damage is applied to Goblin's HP
And a WorldEvent documenting the combat is logged
Scenario: Two agents form an alliance
Given agent "Greymantle" of faction "Silver Order"
And agent "Thornwick" of faction "Merchant Guild"
When Greymantle proposes alliance to Thornwick
And Thornwick accepts (via ThinkRequest/ThinkResult cycle)
Then a FactionRelationship "allied" exists between Silver Order and Merchant Guild
And both agents' MemoryState records the alliance
Scenario: Observer watches emergent conflict
Given a running world with 3+ agents and active faction tensions
When an observer subscribes to WorldEventLog
Then they receive a stream of legible events
And can follow a 10-minute narrative thread without prior context
Key principle: If you can't write it as a scenario, you don't understand the feature well enough to build it.
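As an illustration, the "Agent moves between locations" scenario could translate into a plain Rust test along these lines. This is a sketch under stated assumptions: `World` is a hypothetical in-memory stand-in for the module's tables, and `move_agent` stands in for a reducer. Neither is SpacetimeDB's API or existing Spellkave code.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the module's tables.
#[derive(Default)]
pub struct World {
    agent_location: HashMap<String, String>,
    connections: HashSet<(String, String)>,
    pub events: Vec<String>,
}

impl World {
    pub fn place_agent(&mut self, agent: &str, location: &str) {
        self.agent_location.insert(agent.to_string(), location.to_string());
    }

    /// Connections are stored in both directions.
    pub fn connect(&mut self, a: &str, b: &str) {
        self.connections.insert((a.to_string(), b.to_string()));
        self.connections.insert((b.to_string(), a.to_string()));
    }

    pub fn location_of(&self, agent: &str) -> Option<&str> {
        self.agent_location.get(agent).map(String::as_str)
    }

    /// Stand-in for a move reducer: rejects unknown agents and unconnected destinations.
    pub fn move_agent(&mut self, agent: &str, destination: &str) -> Result<(), String> {
        let current = self
            .agent_location
            .get(agent)
            .cloned()
            .ok_or_else(|| format!("unknown agent: {}", agent))?;
        if !self.connections.contains(&(current.clone(), destination.to_string())) {
            return Err(format!("{} is not connected to {}", destination, current));
        }
        self.agent_location.insert(agent.to_string(), destination.to_string());
        self.events.push(format!("{} entered {}", agent, destination));
        Ok(())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn agent_moves_between_locations() {
        // Given
        let mut world = World::default();
        world.place_agent("Greymantle", "Town Square");
        world.connect("Town Square", "Dark Cave");
        // When
        world.move_agent("Greymantle", "Dark Cave").unwrap();
        // Then
        assert_eq!(world.location_of("Greymantle"), Some("Dark Cave"));
        assert_eq!(world.events, vec!["Greymantle entered Dark Cave".to_string()]);
    }
}
```

The Given/When/Then comments map one-to-one onto the scenario steps, so the same test body could later sit behind cucumber-rs step definitions if that tool is chosen.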
3. Agentic CLI as Reference Implementation
The text CLI is the first client and follows agentic CLI design patterns. It is how both human players and AI agents interact with the world.
Core agentic CLI principles (from the Agentic CLI Recipe):
| Principle | Application to Spellkave CLI |
|---|---|
| Non-interactive | Every action is a single command with flags. No prompts, no menus. AI agents can operate it cold. |
| --help as protocol contract | Complete API surface: all commands, all flags, stdin rules, output format. An agent reads --help once and knows everything. |
| Sensible defaults | spellkave with no args shows world status + next-step hints |
| Safe by default | Read-only operations are default. Actions require explicit commands. |
| Next-step hints | Every output suggests logical next commands |
| Dry-run for destructive actions | spellkave attack goblin previews; spellkave attack goblin --confirm executes |
| Machine-readable output | --json flag for structured output (AI agents may prefer this) |
| Verbose mode | --verbose shows detailed world state changes |
| Doctor command | spellkave doctor checks connection, world state, agent health |
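The dry-run row could be implemented roughly as follows. This is a sketch using only the standard library; `AttackPlan` and `plan_attack` are hypothetical names, and a real CLI would more likely use an argument-parsing crate. The point is that previewing is the default and execution needs the explicit --confirm flag.

```rust
/// Outcome of interpreting the arguments after `spellkave attack`.
#[derive(Debug, PartialEq)]
pub enum AttackPlan {
    /// Default: describe what would happen, change nothing.
    Preview { target: String },
    /// Only reached when --confirm is present.
    Execute { target: String },
}

/// Parses the arguments following `spellkave attack` (hypothetical helper).
pub fn plan_attack(args: &[&str]) -> Result<AttackPlan, String> {
    let mut target: Option<String> = None;
    let mut confirm = false;
    for arg in args {
        match *arg {
            "--confirm" => confirm = true,
            // Unknown flags are errors rather than silently ignored.
            flag if flag.starts_with("--") => {
                return Err(format!("unknown flag: {}", flag));
            }
            name => {
                if target.is_some() {
                    return Err("expected exactly one target".to_string());
                }
                target = Some(name.to_string());
            }
        }
    }
    let target = target.ok_or_else(|| {
        "missing target; usage: spellkave attack <target> [--confirm]".to_string()
    })?;
    Ok(if confirm {
        AttackPlan::Execute { target }
    } else {
        AttackPlan::Preview { target }
    })
}
```

Returning a value instead of acting directly keeps the safe-by-default rule testable without a running world: `plan_attack(&["goblin"])` yields a `Preview`, and only `plan_attack(&["goblin", "--confirm"])` yields an `Execute`.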
Example CLI interaction flow:
# Cold start — agent reads the contract
$ spellkave --help
# Default: show world status (safe, read-only)
$ spellkave
World: Thornhaven (running 47h12m)
3 agents active, 2 factions, 12 events today
Your character: Greymantle (Level 3 Fighter, Town Square)
Next steps:
spellkave look Describe current location
spellkave move <location> Travel somewhere
spellkave actions List available actions
spellkave events Recent world events
spellkave --help All commands
# Explore
$ spellkave look
Town Square — A bustling marketplace.
NPCs: Thornwick (Merchant), Guard Captain Voss
Exits: Dark Cave (north), Forest Road (east), Docks (south)
Items: Notice board, well
Next steps:
spellkave talk <npc> Start conversation
spellkave move <exit> Travel to connected location
spellkave inspect <item> Examine something
# Act
$ spellkave move "Dark Cave"
Moving to Dark Cave...
Greymantle travels north from Town Square.
Dark Cave — A damp cavern. You hear scratching sounds.
Creatures: 2x Goblins (hostile)
Next steps:
spellkave attack <target> Initiate combat
spellkave stealth Attempt to sneak past
spellkave retreat Return to Town Square
# Observer mode (read-only subscriptions)
$ spellkave observe
[18:42] Greymantle entered Dark Cave
[18:43] Greymantle attacked Goblin — hit for 8 damage
[18:43] Goblin retaliates — miss
[18:44] Thornwick completed trade with traveling merchant
[18:45] Faction tension: Silver Order ↔ Merchant Guild rising
^C
# AI agent mode (structured output)
$ spellkave look --json
{"location": "Dark Cave", "creatures": [...], "exits": [...]}
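Every human-readable response in the flow above ends with a "Next steps:" block, which makes that block part of the output contract. A CLI test could check it with a helper like the one below. This is a sketch: `next_step_commands` is a hypothetical name, and it assumes each hint line starts with `spellkave` and separates the command from its description with at least two spaces.

```rust
/// Extracts the commands listed under a "Next steps:" heading in CLI output.
/// Assumption: hint lines start with "spellkave" and use two or more spaces
/// between the command and its description.
pub fn next_step_commands(stdout: &str) -> Vec<String> {
    stdout
        .lines()
        // Skip everything up to and including the heading line.
        .skip_while(|line| line.trim() != "Next steps:")
        .skip(1)
        .map(str::trim)
        // The hint block ends at the first line that is not a spellkave command.
        .take_while(|line| line.starts_with("spellkave"))
        // Keep only the command part, dropping the description column.
        .map(|line| line.split("  ").next().unwrap_or(line).trim().to_string())
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn look_output_offers_hints() {
        let stdout = "Town Square\nNext steps:\n  spellkave talk <npc>      Start conversation\n  spellkave move <exit>     Travel to connected location\n";
        let hints = next_step_commands(stdout);
        assert_eq!(hints, vec!["spellkave talk <npc>", "spellkave move <exit>"]);
    }
}
```

In an end-to-end test the `stdout` string would come from running the CLI binary, and the assertion would be that the list is non-empty for every read command.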
4. The Testing Stack
Recommend a concrete testing approach for SpacetimeDB + Rust:
| Layer | Tool | What It Tests |
|---|---|---|
| Scenario tests | cucumber-rs or custom test harness | World scenarios (Given/When/Then) — calls reducers, asserts table state |
| Integration tests | SpacetimeDB test runtime | Reducer chains, subscription behavior, scheduled agent ticks |
| Unit tests | #[cfg(test)] | Pure logic: damage calc, ability checks, dice rolls (deterministic with seed) |
| CLI tests | CLI binary + assert stdout | End-to-end: run CLI commands, verify output format and content |
| Property tests | proptest crate | Invariants: "HP never exceeds max", "dead entities can't act", "events are monotonic" |
Dice determinism: Tests need reproducible randomness. Use a seeded RNG passed through ReducerContext or a test fixture. Never use system randomness in tests.
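A minimal sketch of a seeded dice fixture, assuming a hand-rolled 64-bit linear congruential generator as the random source. `SeededDice` is a hypothetical test helper, not an existing Spellkave or SpacetimeDB type, and a real module might instead use a seedable generator from an RNG crate. It only illustrates that the same seed must reproduce the same rolls.

```rust
/// Deterministic dice source for tests (hypothetical fixture).
pub struct SeededDice {
    state: u64,
}

impl SeededDice {
    pub fn new(seed: u64) -> Self {
        SeededDice { state: seed }
    }

    /// Advances a 64-bit linear congruential generator and returns its high 32 bits.
    /// The multiplier and increment are the widely used constants attributed to Knuth's MMIX.
    fn next_u32(&mut self) -> u32 {
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.state >> 32) as u32
    }

    /// Rolls a die with the given number of sides, returning a value in 1..=sides.
    /// The modulo introduces a tiny bias, which is acceptable for a test fixture.
    pub fn roll(&mut self, sides: u32) -> u32 {
        assert!(sides > 0, "a die needs at least one side");
        self.next_u32() % sides + 1
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn same_seed_gives_same_rolls() {
        let mut a = SeededDice::new(42);
        let mut b = SeededDice::new(42);
        for _ in 0..100 {
            assert_eq!(a.roll(20), b.roll(20));
        }
    }

    #[test]
    fn d20_stays_in_range() {
        let mut dice = SeededDice::new(7);
        for _ in 0..1000 {
            let r = dice.roll(20);
            assert!(r >= 1 && r <= 20);
        }
    }
}
```

A scenario test would construct the fixture with a fixed seed and hand it to the logic under test, so a failing combat scenario replays identically on every run.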
5. Development Workflow
1. Product scenario (Priya/human writes a world situation)
↓
2. Test scenario (developer translates to Gherkin/test code)
↓
3. Red — test fails (reducer/table doesn't exist yet)
↓
4. Green — implement minimum code to pass
↓
5. Refactor — clean up without breaking tests
↓
6. CLI verification — same scenario works through the text CLI
↓
7. Next scenario
Reference Material
Agentic CLI Patterns
- Agentic CLI Recipe — non-interactive, safe-by-default, capability gradient, next-step hints
- Building Effective AI Agents - Anthropic — agent interaction patterns
Testing in Rust
- cucumber-rs — Gherkin BDD framework for Rust
- proptest — property-based testing for Rust
SpacetimeDB Testing
- SpacetimeDB Rust Quickstart — module setup (test patterns in examples)
- BitCraft server code — reference for how a production SpacetimeDB module is structured (check for test patterns)
Related Spellkave Issues
- #6 — SpacetimeDB as runtime (the platform we're testing on)
- #9 — Rules engine interface (the logic scenarios exercise)
- #10 — Phase 0 minimal module (the first implementation target — scenarios define its scope)
Dependencies
- Depends on #6 (SpacetimeDB runtime) — testing approach depends on platform capabilities
- Informs ALL other issues — this is a cross-cutting methodology decision
- Specifically shapes #10 (Phase 0 module) — Phase 0 scenarios = Phase 0 scope
Acceptance Criteria
- ADR establishes TDD as mandatory development practice
- Scenario-driven design pattern documented with examples
- At least 10 Phase 0 scenarios defined (covering PRD exit criteria)
- Agentic CLI design principles documented with Spellkave-specific examples
- Testing stack recommendation (tools, layers, dice determinism)
- Development workflow documented (scenario → test → implement → verify)
- Committed to spellkave repo