M3: Simulated Agent Loop

2026-02-20T18:51:23Z by Showboat 0.6.0

QA Environment Setup

Starting M3 QA testing. The beads repo was missing, so it was initialized fresh with bd init. BEADS_DIR=/Users/mhild/src/durandom/openclaw/b4arena/beads/.beads. Tools available: bd, jq.

Test 1: Single Bead Lifecycle (Happy Path)

Create a single bead with label dev, pipe watcher through agent, verify it gets claimed and closed.

export BEADS_DIR=$(pwd)/beads/.beads && bd create "Fix login button" -p 1 --labels dev -d "The login button is broken on mobile" --json | jq .

{
  "id": "beads-jls",
  "title": "Fix login button",
  "description": "The login button is broken on mobile",
  "status": "open",
  "priority": 1,
  "issue_type": "task",
  "owner": "hild@b4mad.net",
  "created_at": "2026-02-20T18:52:27.623958Z",
  "created_by": "Marcel Hild",
  "updated_at": "2026-02-20T18:52:27.623958Z"
}

export BEADS_DIR=$(pwd)/beads/.beads && scripts/beads-watcher.sh

WAKE dev beads-jls

export BEADS_DIR=$(pwd)/beads/.beads && scripts/beads-watcher.sh | BD_ACTOR=dev-agent scripts/agent-sim.sh --label dev

export BEADS_DIR=$(pwd)/beads/.beads && bd show beads-jls --json | jq ".[0] | {id, title, status, assignee, close_reason}"

{
  "id": "beads-jls",
  "title": "Fix login button",
  "status": "closed",
  "assignee": "dev-agent",
  "close_reason": "Done"
}

PASS: Single bead lifecycle works. Bead created (open) -> watcher emits WAKE -> agent claims (assignee=dev-agent) -> agent closes (reason=Done). BD_ACTOR correctly sets the assignee.

Test 2: Multiple Beads with Different Labels

Create 3 beads: 2 with label dev, 1 with label qa. Run agent with --label dev. Verify only dev beads are processed.

export BEADS_DIR=$(pwd)/beads/.beads && bd create "Refactor auth module" -p 2 --labels dev -d "Clean up auth code" --json | jq -r .id && bd create "Add unit tests" -p 2 --labels dev -d "Add tests for auth" --json | jq -r .id && bd create "Write test plan" -p 1 --labels qa -d "QA test plan for release" --json | jq -r .id

beads-fe9
beads-jca
beads-o62

export BEADS_DIR=$(pwd)/beads/.beads && scripts/beads-watcher.sh

WAKE dev beads-fe9 beads-jca
WAKE qa beads-o62

export BEADS_DIR=$(pwd)/beads/.beads && scripts/beads-watcher.sh | BD_ACTOR=dev-agent scripts/agent-sim.sh --label dev

export BEADS_DIR=$(pwd)/beads/.beads && echo "--- dev beads ---" && bd show beads-fe9 beads-jca --json | jq ".[] | {id, status, assignee}" && echo "--- qa bead ---" && bd show beads-o62 --json | jq ".[0] | {id, status, assignee}"

--- dev beads ---
{
  "id": "beads-fe9",
  "status": "closed",
  "assignee": "dev-agent"
}
{
  "id": "beads-jca",
  "status": "closed",
  "assignee": "dev-agent"
}
--- qa bead ---
{
  "id": "beads-o62",
  "status": "open",
  "assignee": null
}

PASS: Label filtering works correctly. Agent with --label dev processed only the 2 dev beads (both closed, assignee=dev-agent). The qa bead (beads-o62) remains open and unassigned.
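The label filtering exercised here can be sketched as a small bash function. This is a hypothetical illustration of the WAKE-line contract (`WAKE <label> <id> ...`), not the code in scripts/agent-sim.sh. The function name `select_bead_ids` is invented for the example.

```bash
#!/usr/bin/env bash

# Hypothetical sketch of the agent-side label filter (not the real agent-sim.sh).
# Reads WAKE lines ("WAKE <label> <id> [<id> ...]") on stdin and prints one bead
# ID per line for each WAKE line whose label equals $1. With no argument, every
# WAKE line is selected.
select_bead_ids() {
  local want="${1:-}"
  local -a f
  while read -r -a f; do
    [[ "${f[0]:-}" == "WAKE" ]] || continue                 # ignore non-WAKE lines
    [[ -z "$want" || "${f[1]:-}" == "$want" ]] || continue  # label filter
    if (( ${#f[@]} > 2 )); then
      printf '%s\n' "${f[@]:2}"                             # the bead IDs
    fi
  done
}
```

For example, `scripts/beads-watcher.sh | select_bead_ids dev` would print only the dev bead IDs, one per line. Without an argument it selects every WAKE line, which corresponds to the no-filter behaviour checked in Test 7.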

Test 3: Second Agent Picks Up Remaining Work

Run a qa-agent with --label qa to process the remaining qa bead.

export BEADS_DIR=$(pwd)/beads/.beads && scripts/beads-watcher.sh | BD_ACTOR=qa-agent scripts/agent-sim.sh --label qa

export BEADS_DIR=$(pwd)/beads/.beads && bd show beads-o62 --json | jq ".[0] | {id, status, assignee, close_reason}"

{
  "id": "beads-o62",
  "status": "closed",
  "assignee": "qa-agent",
  "close_reason": "Done"
}

PASS: Second agent (qa-agent) picks up and closes the qa bead. Different BD_ACTOR identities are correctly recorded per agent.

Test 4: Watcher with No Open Beads

All beads are now closed. Run watcher — should produce no output and exit cleanly.

export BEADS_DIR=$(pwd)/beads/.beads && output=$(scripts/beads-watcher.sh); rc=$?; echo "Output: \"${output}\"" && echo "Exit code: $rc"

Output: ""
Exit code: 0

PASS: Watcher exits cleanly with no output when all beads are closed.

Test 5: Already-Claimed Bead

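The bead beads-rde was created beforehand and claimed manually as human-dev (those two commands are not captured in this record). The question is whether the pipeline handles a potential double claim gracefully.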

export BEADS_DIR=$(pwd)/beads/.beads && echo "Bead beads-rde status:" && bd show beads-rde --json | jq ".[0] | {id, status, assignee}" && echo "--- Watcher output: ---" && scripts/beads-watcher.sh && echo "(empty = no WAKE lines)"

Bead beads-rde status:
{
  "id": "beads-rde",
  "status": "in_progress",
  "assignee": "human-dev"
}
--- Watcher output: ---
(empty = no WAKE lines)

PASS: The watcher uses --status open, so already-claimed beads (status=in_progress) are naturally excluded from WAKE output. The agent never sees them. This is correct behavior — the claim acts as an implicit lock.

Test 6: Unlabeled Beads (Triage Routing)

Create a bead with no labels. Watcher should route it as WAKE eng-mgr. Agent with --label dev should skip it.

export BEADS_DIR=$(pwd)/beads/.beads && BEAD_ID=$(bd create "Unlabeled task" -p 3 -d "No labels assigned" --json | jq -r .id) && echo "Created: $BEAD_ID" && echo "--- Watcher output: ---" && scripts/beads-watcher.sh

Created: beads-41v
--- Watcher output: ---
WAKE eng-mgr beads-41v

export BEADS_DIR=$(pwd)/beads/.beads && scripts/beads-watcher.sh | BD_ACTOR=dev-agent scripts/agent-sim.sh --label dev && echo "After dev-agent run:" && bd show beads-41v --json | jq ".[0] | {id, status, assignee}"

After dev-agent run:
{
  "id": "beads-41v",
  "status": "open",
  "assignee": null
}

export BEADS_DIR=$(pwd)/beads/.beads && scripts/beads-watcher.sh | BD_ACTOR=eng-mgr scripts/agent-sim.sh --label eng-mgr && echo "After eng-mgr run:" && bd show beads-41v --json | jq ".[0] | {id, status, assignee, close_reason}"

After eng-mgr run:
{
  "id": "beads-41v",
  "status": "closed",
  "assignee": "eng-mgr",
  "close_reason": "Done"
}

PASS: Unlabeled beads are routed as WAKE eng-mgr. Dev-agent with --label dev correctly ignores them. An eng-mgr agent with --label eng-mgr picks them up and closes them.
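The routing rules observed so far can be approximated with awk: only open beads are announced, unlabeled beads go to eng-mgr, and each label gets its own WAKE line. This is a hypothetical sketch. It assumes the bead list has already been flattened to tab-separated `id`, `status`, `labels` records, which is not necessarily how scripts/beads-watcher.sh gets its data. The function name `group_wake_lines` is invented.

```bash
#!/usr/bin/env bash

# Hypothetical sketch of the watcher's routing, not scripts/beads-watcher.sh.
# Input: one bead per line as "<id>\t<status>\t<comma-separated labels>".
# Output: one "WAKE <label> <id> ..." line per label.
#   - only status "open" is announced (claimed/in_progress beads are skipped)
#   - a bead with no labels is routed to eng-mgr for triage
#   - a bead with several labels is listed under each of them
group_wake_lines() {
  awk -F'\t' '
    $2 != "open" { next }
    {
      n = split($3, labels, ",")          # 0 when the label field is empty
      if (n == 0) { n = 1; labels[1] = "eng-mgr" }
      for (i = 1; i <= n; i++) {
        l = labels[i]
        if (!(l in ids)) order[++count] = l   # remember first-seen label order
        ids[l] = ids[l] " " $1
      }
    }
    END { for (i = 1; i <= count; i++) print "WAKE " order[i] ids[order[i]] }
  '
}
```

In this sketch, labels come out in order of first appearance and IDs come out in input order. The real watcher's ordering may differ; for example, Test 14 shows sorted IDs within a label.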

Test 7: Agent Without --label Flag (Processes All)

Create beads with different labels. Run agent without --label — should process everything.

export BEADS_DIR=$(pwd)/beads/.beads && bd create "Task A" -p 1 --labels dev -d "dev task" --json | jq -r .id && bd create "Task B" -p 1 --labels qa -d "qa task" --json | jq -r .id && bd create "Task C" -p 1 -d "unlabeled task" --json | jq -r .id

beads-j3h
beads-y7i
beads-kg8

export BEADS_DIR=$(pwd)/beads/.beads && scripts/beads-watcher.sh | BD_ACTOR=omni-agent scripts/agent-sim.sh && echo "--- All beads after ---" && bd show beads-j3h beads-y7i beads-kg8 --json | jq ".[] | {id, status, assignee}"

--- All beads after ---
{
  "id": "beads-j3h",
  "status": "closed",
  "assignee": "omni-agent"
}
{
  "id": "beads-y7i",
  "status": "closed",
  "assignee": "omni-agent"
}
{
  "id": "beads-kg8",
  "status": "closed",
  "assignee": "omni-agent"
}

PASS: Agent without --label processes all WAKE lines regardless of label. All 3 beads (dev, qa, unlabeled/eng-mgr) were claimed and closed by omni-agent.

Test 8: Agent with Empty Input

Pipe a single blank line (echo "") to the agent. It should exit cleanly without errors.

export BEADS_DIR=$(pwd)/beads/.beads && echo "" | BD_ACTOR=test scripts/agent-sim.sh && echo "Exit code: $?"

Exit code: 0

PASS: Agent handles empty input gracefully — exits with code 0, no errors.

Test 9: Agent with Garbage/Non-WAKE Input

Feed non-WAKE lines to the agent. Should ignore them silently.

export BEADS_DIR=$(pwd)/beads/.beads && printf "hello world\nrandom noise\nWAKE-ish but not really\nSLEEP dev beads-123\n" | BD_ACTOR=test scripts/agent-sim.sh && echo "Exit code: $?"

Exit code: 0

PASS: Agent ignores non-WAKE lines (random text, SLEEP, WAKE-ish). Only lines whose first field is exactly WAKE are processed, so the WAKE-ish line is skipped even though it starts with the same four letters.
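It is easy to confuse a prefix match with an exact first-field match in bash. The two hypothetical helpers below show why `WAKE-ish` passes a `WAKE*` glob but fails a field comparison. This record does not show which form agent-sim.sh actually uses.

```bash
#!/usr/bin/env bash

# Hypothetical helpers: prefix glob vs. exact comparison of the first field.
is_wake_prefix() {
  [[ "$1" == WAKE* ]]             # also true for "WAKE-ish ..."
}

is_wake_field() {
  local verb rest
  read -r verb rest <<< "$1"       # split off the first whitespace-separated field
  [[ "$verb" == "WAKE" ]]
}
```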

Test 10: Agent with Unknown Argument

Pass an invalid flag to the agent. Should reject with error.

export BEADS_DIR=$(pwd)/beads/.beads && echo "WAKE dev beads-xxx" | BD_ACTOR=test scripts/agent-sim.sh --bogus 2>&1; echo "Exit code: $?"

Unknown argument: --bogus
Exit code: 1

PASS: Agent rejects unknown arguments with a clear error message and exit code 1.

Test 11: WAKE with Non-existent Bead ID

Feed the agent a WAKE line referencing a bead ID that does not exist. Agent suppresses bd errors (>/dev/null 2>&1) — does it exit cleanly?

export BEADS_DIR=$(pwd)/beads/.beads && echo "WAKE dev beads-NONEXISTENT" | BD_ACTOR=test scripts/agent-sim.sh 2>&1; echo "Exit code: $?"

Exit code: 1

FAIL (minor): The agent exits with code 1 on a non-existent bead ID. The script uses set -e (errexit), so when bd update fails, the entire script aborts. The >/dev/null 2>&1 redirect hides bd's error message but not its non-zero exit status. In a real pipeline with multiple beads, one bad ID would stop processing of every subsequent bead. Consider adding || true to the bd commands, or handling errors per bead.

Test 12: Mixed Valid/Invalid IDs in Single WAKE Line

Create one valid bead, then send a WAKE line with a bad ID followed by the valid one. Does the valid bead get skipped due to the failure on the bad one?

export BEADS_DIR=$(pwd)/beads/.beads && BEAD_ID=$(bd create "Valid bead" -p 1 --labels dev -d "This one exists" --json | jq -r .id) && echo "Created: $BEAD_ID" && echo "WAKE dev beads-BAD $BEAD_ID" | BD_ACTOR=test scripts/agent-sim.sh 2>&1; echo "Exit code: $?" && bd show "$BEAD_ID" --json | jq ".[0] | {id, status, assignee}"

Created: beads-a9f
Exit code: 1
{
  "id": "beads-a9f",
  "status": "open",
  "assignee": null
}

FAIL (minor): When a WAKE line contains a bad ID before a valid one (e.g., WAKE dev beads-BAD beads-a9f), the script aborts on the bad ID due to set -e, so the valid bead beads-a9f is never processed. In production, one stale or corrupt bead ID in the watcher output would block every bead after it, in the same WAKE line and in any later WAKE lines, because the whole script exits.

Recommendation: Add error handling per-bead, e.g., bd update ... || { echo 'warn: claim failed' >&2; continue; }
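The recommended per-bead handling can be sketched as follows. This assumes the agent claims with `bd update <id> --claim` and closes with `bd close <id> --reason Done`; the close flag is an assumption inferred from close_reason=Done. `BD_CMD` and `process_ids` are hypothetical names. `BD_CMD` exists only so the loop can be exercised without a live database.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical override so the loop can run without a live beads database;
# defaults to the real bd CLI.
BD_CMD="${BD_CMD:-bd}"

# Claim and close each bead ID independently. A failure is reported on stderr
# and the loop moves on, so one bad ID no longer aborts the script under set -e.
# Assumption: agent-sim.sh closes beads with "bd close <id> --reason Done".
process_ids() {
  local id
  for id in "$@"; do
    "$BD_CMD" update "$id" --claim >/dev/null 2>&1 \
      || { echo "warn: claim failed for $id" >&2; continue; }
    "$BD_CMD" close "$id" --reason Done >/dev/null 2>&1 \
      || echo "warn: close failed for $id" >&2
  done
  return 0   # individual failures were already reported
}
```

Under set -e, a failing command on the left of `||` does not abort the script. As a result, `process_ids beads-BAD beads-a9f` would warn about the first ID and still claim and close the second.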

Test 13: Bead with Multiple Labels

Create a bead with both dev and qa labels. Watcher should emit it in both WAKE lines. Two agents (dev, qa) should both see it.

export BEADS_DIR=$(pwd)/beads/.beads && BEAD_ID=$(bd create "Cross-team task" -p 1 --labels dev,qa -d "Needs both dev and qa" --json | jq -r .id) && echo "Created: $BEAD_ID" && echo "--- Watcher output: ---" && scripts/beads-watcher.sh

Created: beads-ceh
--- Watcher output: ---
WAKE dev beads-ceh
WAKE qa beads-ceh

export BEADS_DIR=$(pwd)/beads/.beads && echo "WAKE dev beads-ceh" | BD_ACTOR=dev-agent scripts/agent-sim.sh --label dev && echo "After dev-agent:" && bd show beads-ceh --json | jq ".[0] | {id, status, assignee, close_reason}"

After dev-agent:
{
  "id": "beads-ceh",
  "status": "closed",
  "assignee": "dev-agent",
  "close_reason": "Done"
}

NOTE: The dolt database became corrupted when showboat's background process held a lock while another bd command tried to access it concurrently. The bd doctor --fix command itself panicked (SIGSEGV). Had to remove the dolt directory and re-init. This highlights a real concern: dolt's embedded mode does not handle concurrent access gracefully. The lock file mechanism works, but recovery from stale locks is broken.

Re-initialized beads repo to continue testing.

Test 13 (retry): Bead with Multiple Labels

Re-running after database reinit. Create a bead with both dev and qa labels.

export BEADS_DIR=$(pwd)/beads/.beads && BEAD_ID=$(bd create "Cross-team task" -p 1 --labels dev,qa -d "Needs both dev and qa" --json | jq -r .id) && echo "Created: $BEAD_ID" && echo "--- Watcher output: ---" && scripts/beads-watcher.sh

Created: beads-69q
--- Watcher output: ---
WAKE dev beads-69q
WAKE qa beads-69q

export BEADS_DIR=$(pwd)/beads/.beads && scripts/beads-watcher.sh | BD_ACTOR=dev-agent scripts/agent-sim.sh --label dev && echo "After dev-agent claims:" && bd show beads-69q --json | jq ".[0] | {id, status, assignee, close_reason}"

After dev-agent claims:
{
  "id": "beads-69q",
  "status": "closed",
  "assignee": "dev-agent",
  "close_reason": "Done"
}

PASS: Multi-label bead appears in both WAKE lines (dev and qa). First agent to run (dev-agent) claims and closes it. Since it is then closed, the watcher won't emit it again, so the qa-agent never sees it. This is correct 'first-come-first-served' behavior, but worth noting: multi-label beads get processed by whichever agent runs first, not by all matching agents.

Test 14: Full Pipeline — 5 Beads, Multiple Labels

Create 5 beads: 3 dev, 1 qa, 1 unlabeled. Run full pipeline with no label filter. Verify all beads are closed.

export BEADS_DIR=$(pwd)/beads/.beads && for i in 1 2 3; do bd create "Dev task $i" -p $i --labels dev -d "Dev work item $i" --json | jq -r .id; done && bd create "QA review" -p 1 --labels qa -d "Review cycle" --json | jq -r .id && bd create "Triage me" -p 2 -d "No labels" --json | jq -r .id

beads-6gi
beads-gyg
beads-bbp
beads-z1i
beads-5mp

export BEADS_DIR=$(pwd)/beads/.beads && echo "--- Watcher output ---" && scripts/beads-watcher.sh && echo "--- Running agent (no filter) ---" && scripts/beads-watcher.sh | BD_ACTOR=omni-bot scripts/agent-sim.sh && echo "--- Results ---" && bd list --json | jq ".[] | {id, title, status, assignee}" 

--- Watcher output ---
WAKE dev beads-6gi beads-bbp beads-gyg
WAKE qa beads-z1i
WAKE eng-mgr beads-5mp
--- Running agent (no filter) ---
--- Results ---

export BEADS_DIR=$(pwd)/beads/.beads && echo "Open beads: $(bd list --status open --json | jq length)" && echo "Closed beads: $(bd list --status closed --json | jq length)" && echo "--- All closed beads ---" && bd list --status closed --json | jq ".[] | {id, title, status, assignee, close_reason}"

Open beads: 0
Closed beads: 6
--- All closed beads ---
{
  "id": "beads-5mp",
  "title": "Triage me",
  "status": "closed",
  "assignee": "omni-bot",
  "close_reason": "Done"
}
{
  "id": "beads-69q",
  "title": "Cross-team task",
  "status": "closed",
  "assignee": "dev-agent",
  "close_reason": "Done"
}
{
  "id": "beads-6gi",
  "title": "Dev task 1",
  "status": "closed",
  "assignee": "omni-bot",
  "close_reason": "Done"
}
{
  "id": "beads-bbp",
  "title": "Dev task 3",
  "status": "closed",
  "assignee": "omni-bot",
  "close_reason": "Done"
}
{
  "id": "beads-gyg",
  "title": "Dev task 2",
  "status": "closed",
  "assignee": "omni-bot",
  "close_reason": "Done"
}
{
  "id": "beads-z1i",
  "title": "QA review",
  "status": "closed",
  "assignee": "omni-bot",
  "close_reason": "Done"
}

PASS: Full pipeline works. All 5 new beads (3 dev, 1 qa, 1 unlabeled) processed by omni-bot. Combined with the earlier multi-label bead (beads-69q by dev-agent), all 6 beads in the repo are closed. The watcher correctly grouped beads by label, and the agent claimed and closed each one.

Side note: bd list --json without --status showed only open beads, which is why the --- Results --- section of the previous command came out empty. Pass --status closed explicitly to see closed beads.

Test 15: WAKE with Non-existent Bead ID (retry)

Feed the agent a fabricated WAKE line with a bead ID that doesn't exist.

export BEADS_DIR=$(pwd)/beads/.beads && echo "WAKE dev beads-FAKE" | BD_ACTOR=test scripts/agent-sim.sh 2>&1; echo "Exit code: $?"

Exit code: 1

FAIL (minor, confirmed): The agent exits with code 1 on a non-existent bead ID. The failure is silent because stderr is suppressed, but set -e kills the script. As noted in Test 12, one bad ID therefore blocks the remaining beads in that WAKE line and any WAKE lines that follow.

Test 16: Agent Without BD_ACTOR

Run the pipeline without setting BD_ACTOR. What identity is recorded for the claim?

export BEADS_DIR=$(pwd)/beads/.beads && unset BD_ACTOR && BEAD_ID=$(bd create "No actor test" -p 1 --labels dev -d "Who claims this?" --json | jq -r .id) && echo "Created: $BEAD_ID" && scripts/beads-watcher.sh | scripts/agent-sim.sh && bd show "$BEAD_ID" --json | jq ".[0] | {id, status, assignee, close_reason}"

Created: beads-15h
{
  "id": "beads-15h",
  "status": "closed",
  "assignee": "Marcel Hild",
  "close_reason": "Done"
}

PASS (with caveat): Without BD_ACTOR, bd falls back to the git user identity ("Marcel Hild"). The bead is still claimed and closed successfully, but in a multi-agent setup this would make claims indistinguishable. agent-sim.sh does not enforce BD_ACTOR; it relies on the caller to set it. Consider adding a guard that is safe under set -u: [[ -z "${BD_ACTOR:-}" ]] && { echo 'error: BD_ACTOR required' >&2; exit 1; }
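Because agent-sim.sh runs with set -u (as Test 18 shows), the guard must expand the variable with a default. A plain `$BD_ACTOR` would itself abort with an unbound-variable error before the friendly message is printed. A minimal sketch follows; the function name `require_actor` is invented for illustration.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical guard for the top of agent-sim.sh. "${BD_ACTOR:-}" expands to an
# empty string when BD_ACTOR is unset, so the check itself cannot trip set -u.
require_actor() {
  if [[ -z "${BD_ACTOR:-}" ]]; then
    echo "error: BD_ACTOR required" >&2
    return 1
  fi
}

# In the script itself:  require_actor || exit 1
```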

Test 17: --label Filter with No Matching Beads

Create beads with label dev, run agent with --label ops (no match). Agent should do nothing.

export BEADS_DIR=$(pwd)/beads/.beads && BEAD_ID=$(bd create "Dev only" -p 1 --labels dev -d "Only for dev" --json | jq -r .id) && echo "Created: $BEAD_ID" && scripts/beads-watcher.sh | BD_ACTOR=ops-agent scripts/agent-sim.sh --label ops && echo "After ops-agent:" && bd show "$BEAD_ID" --json | jq ".[0] | {id, status, assignee}"

Created: beads-y1i
After ops-agent:
{
  "id": "beads-y1i",
  "status": "open",
  "assignee": null
}

PASS: Agent with --label ops correctly ignores beads with label dev. Bead remains open and unassigned.

Test 18: --label Without Value

Pass --label with no argument. The script should fail due to set -u (unbound variable) or shift error.

export BEADS_DIR=$(pwd)/beads/.beads && echo "WAKE dev beads-test" | BD_ACTOR=test scripts/agent-sim.sh --label 2>&1; echo "Exit code: $?"

scripts/agent-sim.sh: line 27: $2: unbound variable
Exit code: 1

PASS (acceptable): --label without a value triggers set -u (unbound variable $2). Exits with code 1. Error message could be friendlier (e.g., 'error: --label requires a value'), but it does fail safely.
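A friendlier --label check can test the argument count before reading `$2`, so set -u never fires. Below is a hypothetical parser in the style of agent-sim.sh; the function name `parse_args` and the global `LABEL` are assumptions. It keeps the existing `Unknown argument:` message seen in Test 10.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical parser in the style of agent-sim.sh. The argument count is checked
# before "$2" is read, so a bare --label gives a clear message instead of
# "$2: unbound variable".
parse_args() {
  LABEL=""
  while (( $# > 0 )); do
    case "$1" in
      --label)
        if (( $# < 2 )); then
          echo "error: --label requires a value" >&2
          return 1
        fi
        LABEL="$2"
        shift 2
        ;;
      *)
        echo "Unknown argument: $1" >&2
        return 1
        ;;
    esac
  done
}

# In the script itself:  parse_args "$@" || exit 1
```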

Summary

Tests Passed (15/18, 3 minor failures)

#	Test	Result
1	Single bead lifecycle (create->watch->claim->close)	PASS
2	Multiple beads with label filtering	PASS
3	Second agent picks up remaining work	PASS
4	Watcher with no open beads	PASS
5	Already-claimed bead excluded by watcher	PASS
6	Unlabeled beads route to eng-mgr	PASS
7	Agent without --label processes all	PASS
8	Agent with empty input	PASS
9	Agent with garbage/non-WAKE input	PASS
10	Agent rejects unknown arguments	PASS
11	WAKE with non-existent bead ID	FAIL (minor)
12	Mixed valid/invalid IDs in one WAKE line	FAIL (minor)
13	Bead with multiple labels	PASS
14	Full pipeline (5 beads, mixed labels)	PASS
15	Non-existent bead ID (confirmed)	FAIL (minor)
16	Agent without BD_ACTOR	PASS (caveat)
17	--label with no matching beads	PASS
18	--label without value	PASS

Issues Found

set -e kills agent on bad bead IDs (Tests 11, 12, 15): If a WAKE line contains an invalid bead ID, bd update --claim fails and set -e aborts the entire script. In a WAKE line with multiple IDs (WAKE dev bad-id good-id), the good ID never gets processed. Fix: Add || true or per-bead error handling.
No BD_ACTOR validation (Test 16): Agent runs fine without BD_ACTOR, falling back to git identity. In multi-agent setups this makes claims indistinguishable. Fix: Add a guard at the top of agent-sim.sh.
Dolt concurrent access fragility (observed during testing): Two bd processes hitting the database simultaneously caused dolt lock contention. bd doctor --fix panicked with SIGSEGV. Recovery required deleting the dolt directory. This is a known dolt embedded-mode limitation.
--label without value gives cryptic error (Test 18): Reports $2: unbound variable instead of a helpful message. Minor UX issue.

Architecture Observations

The watcher-agent pipeline design is clean and unix-idiomatic (pipes, stdin/stdout)
Label-based routing works well as a dispatch mechanism
The atomic bd update --claim acts as an implicit distributed lock
The watcher's --status open filter naturally excludes already-claimed (in_progress) beads
Multi-label beads get first-come-first-served treatment (correct, but worth documenting)

Verify Note

showboat verify re-runs all exec blocks and compares output. Since bead IDs are hash-based (non-deterministic), every run produces different IDs, making verify always fail. The recorded outputs in this document are correct for the original run. To make showboat verify work with beads, commands would need to be wrapped to strip or normalize IDs — a future improvement for the showboat + beads workflow.
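One way to make the recorded output comparable across runs is to pipe each command's output through a normalizing filter. The sketch below makes two assumptions. First, generated IDs look like `beads-` followed by three lowercase alphanumerics, which holds for every generated ID in this run. Second, timestamps use the ISO-8601 UTC form seen in the JSON. How showboat would be configured to apply such a filter is not covered here, and the function name `normalize_output` is invented.

```bash
#!/usr/bin/env bash

# Replace run-specific values with fixed placeholders so that two runs of the
# same command can be compared textually. Assumptions: generated bead IDs are
# "beads-" plus three lowercase alphanumerics, and timestamps are ISO-8601 UTC.
normalize_output() {
  sed -E \
    -e 's/beads-[a-z0-9]{3}([^A-Za-z0-9_-])/beads-XXX\1/g' \
    -e 's/beads-[a-z0-9]{3}$/beads-XXX/' \
    -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?Z/TIMESTAMP/g'
}
```

The trailing-character checks keep names such as scripts/beads-watcher.sh intact. The filter only hides differences in IDs and timestamps, not differences in the number of beads.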
