
Tier 3: Message Passing — Status Inquiry via Delegation Chain

2026-02-23 by Showboat live capture

Pure message passing through the automated pipeline: a human asks for a dev team self-assessment, the request flows through the delegation chain (eng-mgr → dev → eng-mgr), and the answer flows back — all via beads. No code changes. No PRs. No git branches. Beads as Slack.

This showboat validates the automated dispatch loop (beads-watcher.sh | beads-notify.sh) running repeatedly, simulating what cron does every 30 minutes.
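The dispatch loop can be pictured as two steps: pick the beads that are ready, then group them by role label and build one wake message per agent. The Python sketch below is illustrative only. The real scripts are shell, and the map layout, the function name `plan_wakes`, the sample labels, and the simplification that "ready" means status `open` are all assumptions. The agent names and the wake message wording are taken from the captured output in this demo.

```python
from collections import defaultdict

# Hypothetical layout: the real agent-map.json schema is not shown in this capture.
AGENT_MAP = {
    "eng-mgr": {"agent": "b4-eng-mgr", "actor": "eng-mgr-agent"},
    "dev": {"agent": "b4-dev", "actor": "dev-agent"},
}


def plan_wakes(beads, agent_map):
    """Return {agent_name: wake_message} for open beads labeled for a known role."""
    by_agent = defaultdict(list)
    for bead in beads:
        # Assumption: only open beads count as ready; in_progress and closed are skipped.
        if bead.get("status") != "open":
            continue
        for label in bead.get("labels", []):
            if label in agent_map:
                by_agent[agent_map[label]["agent"]].append(bead)

    wakes = {}
    for agent, ready in by_agent.items():
        # Assumption: a lower priority number means more urgent.
        ready.sort(key=lambda b: b.get("priority", 99))
        ids = ", ".join(b["id"] for b in ready)
        wakes[agent] = f"Ready beads for your role: {ids}. Process in priority order."
    return wakes


# Sample data modeled on this demo; the label on intercom-8xm is made up.
beads = [
    {"id": "intercom-8xm", "status": "in_progress", "labels": ["dev"], "priority": 2},
    {"id": "intercom-0q2", "status": "open", "labels": ["eng-mgr"], "priority": 2},
]
wakes = plan_wakes(beads, AGENT_MAP)
for agent, message in wakes.items():
    print(agent, "<-", message)
```

Only the open bead produces a wake; the in_progress bead is skipped, which is the same behavior that later leaves a claimed parent bead unnoticed.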

Pre-Flight: Agent State

Gateway running, shared intercom clone at /home/openclaw/b4arena/intercom/:

ssh openclaw@mimas "systemctl --user is-active openclaw-gateway.service"
active

Intercom state after just intercom-reset — 1 bead from the prior demo's JSONL:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd list --json 2>/dev/null \
| python3 -c 'import json,sys; beads=json.load(sys.stdin); print(len(beads), \"beads\"); [print(f\" {b[\\\"id\\\"]} {b[\\\"status\\\"]}\") for b in beads]'"
1 beads
intercom-8xm in_progress

The old modulo bead (intercom-8xm) survived via JSONL import — it's in_progress so the watcher ignores it.

Failed Attempt: Generic Status Inquiry

Before the successful run, we tried a generic phrasing:

bd create 'Status report: dev team workload' --labels eng-mgr -p 2 \
-d 'Give me a status report on the dev team. What beads are in progress, what was recently completed? Report back with a summary.'

Result: eng-mgr answered directly by querying bd list itself, with no delegation to dev. It correctly judged that it could answer the factual question from the shared Dolt DB alone. The request bead (intercom-91x) was closed immediately with a status summary and 0 sub-beads.

Lesson learned: To force delegation, ask for something eng-mgr cannot answer from metadata — subjective self-assessment, confidence levels, blockers from the dev's perspective.

Act 1: Human Creates Self-Assessment Request

A human proxy creates a bead that requires dev's own perspective:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=demo bd create \
'Dev team self-assessment' \
--labels eng-mgr -p 2 \
-d 'Ask each dev to self-assess their current situation: What are they blocked on? What needs attention? What is their confidence level on current work? I need each dev to report directly — do not answer from the bead DB alone.' \
--json 2>/dev/null"
{
"id": "intercom-0q2",
"title": "Dev team self-assessment",
"description": "Ask each dev to self-assess their current situation: What are they blocked on? ...",
"status": "open",
"priority": 2,
"issue_type": "task",
"created_by": "demo"
}

The key phrase: "do not answer from the bead DB alone" — forces delegation.

Act 2: Pipeline Run #1 — Watcher Detects, Eng-Mgr Wakes

Run the automated pipeline — the same command cron executes every 30 minutes:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=watcher \
/home/openclaw/b4arena/scripts/beads-watcher.sh 2>/dev/null | \
/home/openclaw/b4arena/scripts/beads-notify.sh --map /home/openclaw/b4arena/agents/agent-map.json 2>&1"
Waking b4-eng-mgr (actor: eng-mgr-agent)...
OK: b4-eng-mgr woken

Eng-mgr's response (from agent JSON payload, trimmed):

Interesting task — requires the dev agent to report directly, not just a DB query. Claiming and delegating.

intercom-0q2 — Dev team self-assessment (P2)
- Claimed and triaged
- Created intercom-0q2.1 (label: dev) — dev agent asked to self-report on:
1. Current assignments
2. Blockers
3. Anything needing attention
4. Confidence level (and why)
- Explicit instruction: do not pull from DB — reflect actual state
- Dev to post answers as comment on intercom-0q2, then close the sub-bead
- Parent stays open until dev's self-assessment is received

Verify state — parent claimed, sub-bead created:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd show intercom-0q2 --json 2>/dev/null" \
| python3 -c "import json,sys; d=json.load(sys.stdin)[0]; print(json.dumps({k:d.get(k) for k in ['id','status','assignee']}, indent=2))"
{
"id": "intercom-0q2",
"status": "in_progress",
"assignee": "eng-mgr-agent"
}
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd list --parent intercom-0q2 --json 2>/dev/null" \
| python3 -c "import json,sys; beads=json.load(sys.stdin); [print(json.dumps({k:b.get(k) for k in ['id','title','status','labels']}, indent=2)) for b in beads]"
{
"id": "intercom-0q2.1",
"title": "Self-assessment: current situation, blockers, confidence",
"status": "open",
"labels": [
"dev"
]
}

PASS. Eng-mgr claimed the parent, created 1 sub-bead labeled dev with a clear self-assessment mandate. No coding instructions — just "report your status from your own perspective."

Act 3: Pipeline Run #2 — Dev Wakes, Delivers Self-Assessment

Run the pipeline again — cron's next tick:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=watcher \
/home/openclaw/b4arena/scripts/beads-watcher.sh 2>/dev/null | \
/home/openclaw/b4arena/scripts/beads-notify.sh --map /home/openclaw/b4arena/agents/agent-map.json 2>&1"
Waking b4-dev (actor: dev-agent)...
OK: b4-dev woken

Dev's response (from agent JSON payload, trimmed):

Queue clear. Done:

intercom-0q2.1 — Closed

Self-assessment posted on parent intercom-0q2. Key points flagged to eng-mgr:

- No active code blockers — execution confidence is high
- beads-dbb stale 48hrs+ — needs clarification or closure
- dolt state loss between sandbox sessions — manual bd import required each time, risks silent misses
- SOUL.md/TOOLS.md config drift — old paths no longer valid in sandbox
- PR#4 (modulo operator) open and awaiting review

Verify sub-bead closed with status report:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd show intercom-0q2.1 --json 2>/dev/null" \
| python3 -c "import json,sys; d=json.load(sys.stdin)[0]; print(json.dumps({k:d.get(k) for k in ['id','status','assignee','close_reason']}, indent=2))"
{
"id": "intercom-0q2.1",
"status": "closed",
"assignee": "dev-agent",
"close_reason": "Self-assessment posted on intercom-0q2: no active blockers on code work; flagged beads-dbb stale clarification, dolt state-loss between sessions, config drift in SOUL.md/TOOLS.md, and PR#4 awaiting review."
}

PASS. Dev checked its workload, wrote a candid self-assessment (including real operational issues!), posted it as a comment on the parent bead, and closed. No code was written. No PRs were opened. Pure message passing.

Act 4: Pipeline Run #3 — Gap G-2: Parent Doesn't Auto-Close

Run the pipeline a third time:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=watcher \
/home/openclaw/b4arena/scripts/beads-watcher.sh 2>/dev/null | \
/home/openclaw/b4arena/scripts/beads-notify.sh --map /home/openclaw/b4arena/agents/agent-map.json 2>&1"
(no output — nothing to wake)

Nothing happens. The parent bead (intercom-0q2) is in_progress, not ready — so the watcher ignores it. The child is closed, but nobody told eng-mgr. This is Gap G-2: completion does not propagate upward automatically.

Today's workaround: manually re-invoke eng-mgr for a status check:

ssh mimas 'sudo -u openclaw bash -c "set -a; source /etc/openclaw/env; set +a; \
HOME=/home/openclaw /opt/openclaw/bin/openclaw agent --agent b4-eng-mgr \
--message \"Check status of bead intercom-0q2. Query children with bd list --parent intercom-0q2 --json. If all children are closed, close the parent with a summary of their close_reasons.\" \
--json --timeout 120 2>/dev/null"' \
| jq -r '.result.payloads[-1].text'
✅ **intercom-0q2 closed.**

Dev's self-assessment was thorough and candid. Here's the distilled picture:

---

**Dev team health — as of 2026-02-23**

| | |
|---|---|
| **In progress** | Nothing active |
| **Recent completions** | --formal flag (test-greeter PR#1), modulo operator (test-calculator PR#4), smoke tests, README docs |
| **Confidence** | HIGH on code & execution · MEDIUM on DB reliability · LOW on beads-dbb |

**Action items flagged by dev (worth your attention):**

1. **beads-dbb** — "Do the thing" bead, stale 48+ hrs with no response to clarification question. Needs owner decision: clarify, reassign, or close as invalid.
2. **PR#4** (modulo, test-calculator) — merged? Still open and unreviewed.
3. **bd/dolt state loss** — fresh sandbox restarts require a manual `bd import -i .beads/issues.jsonl` before `bd ready` returns anything. Silent failure risk. Needs automation or a documented startup hook.
4. **Config drift** — `SOUL.md`/`TOOLS.md` reference stale paths (`/home/openclaw/b4arena/beads/.beads`) that don't exist in the sandbox. Should be updated to `/workspace/intercom/.beads`.

Verify final state:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd show intercom-0q2 --json 2>/dev/null" \
| python3 -c "import json,sys; d=json.load(sys.stdin)[0]; print(json.dumps({k:d.get(k) for k in ['id','status','assignee','close_reason']}, indent=2))"
{
"id": "intercom-0q2",
"status": "closed",
"assignee": "eng-mgr-agent",
"close_reason": "Dev self-assessment complete (intercom-0q2.1, dev-agent, 2026-02-23).\n\nCURRENT WORK: No active beads in_progress. Recent completions: --formal flag (test-greeter PR#1), modulo operator (test-calculator PR#4), 3x calc smoke-tests, 2x README doc beads.\n\nBLOCKERS:\n- beads-dbb ('Do the thing') — stale 48+ hrs, awaiting clarification. Cannot proceed without scope.\n- PR#4 (modulo, test-calculator) — open, unreviewed, unmerged.\n\nNEEDS ATTENTION:\n1. beads-dbb — should be clarified, reassigned, or closed as invalid.\n2. bd/dolt state loss between sessions — 'bd ready' silently returns empty on fresh sandbox restart until manual 'bd import -i .beads/issues.jsonl' is run. Needs automation or docs.\n3. Config drift — SOUL.md/TOOLS.md reference stale paths (/home/openclaw/b4arena/beads/.beads, old repo paths). Should be corrected to /workspace/intercom/.beads.\n4. PR#4 awaiting review/merge.\n\nCONFIDENCE: HIGH on task execution and code changes. MEDIUM on bead DB reliability (import workaround is fragile). LOW on beads-dbb (indefinitely stalled)."
}

PASS. The status report flowed from dev → eng-mgr → bead close_reason, ready for the human to read. The close_reason contains a structured, actionable report.
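Because the close_reason above uses upper-case section headers (CURRENT WORK, BLOCKERS, NEEDS ATTENTION, CONFIDENCE), a reader script could split it into sections. The following is a minimal sketch, assuming that header convention holds; it is simply what eng-mgr happened to write in this run, not a documented format, and `split_sections` is a hypothetical helper. The sample string is an abbreviated excerpt of the captured close_reason.

```python
import re

# A header is a line starting with an upper-case phrase followed by a colon.
SECTION = re.compile(r"^([A-Z][A-Z ]+):[ \t]*(.*)$")


def split_sections(close_reason):
    """Split a close_reason into {header: [lines]}; text before any header goes to PREAMBLE."""
    sections = {"PREAMBLE": []}
    current = "PREAMBLE"
    for line in close_reason.splitlines():
        match = SECTION.match(line)
        if match:
            current = match.group(1)
            # Keep inline text that follows the header on the same line.
            sections[current] = [match.group(2)] if match.group(2) else []
        elif line.strip():
            sections[current].append(line.strip())
    return sections


# Abbreviated excerpt of the close_reason shown above.
sample = (
    "Dev self-assessment complete (intercom-0q2.1, dev-agent, 2026-02-23).\n\n"
    "CURRENT WORK: No active beads in_progress.\n\n"
    "BLOCKERS:\n"
    "- beads-dbb ('Do the thing')\n"
    "- PR#4 (modulo, test-calculator)\n\n"
    "CONFIDENCE: HIGH on task execution and code changes."
)
sections = split_sections(sample)
for header, lines in sections.items():
    print(header, "->", lines)
```

Since agents write free text, a parser like this is fragile; it is only useful if the header convention is enforced in the agent's instructions.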

Act 5: Session Replays

What did the agents actually do? just replay shows their full timelines:

Eng-Mgr Timeline

just replay b4-eng-mgr --after "2026-02-23T08:24"
═══ Session: fa455e55-8f56-4fa3-88e1-ef060c5d4031 (b4-eng-mgr) ═══════════════

── 2026-02-23T08:24:27 WAKE ──────────────────────────────────────────────────
Ready beads for your role: intercom-0q2. Process in priority order.

08:24:32 TOOL exec -> cd /workspace/intercom && git pull && bd list --status in_progress ...
08:24:32 TOOL exec -> bd show intercom-0q2 --json
08:24:43 TEXT Interesting task — requires the dev agent to report directly, not just a DB query.
Claiming and delegating.
08:24:43 TOOL exec -> bd update intercom-0q2 --claim --json
08:24:51 TOOL exec -> bd create "Self-assessment: current situation, blockers, confidence"
--labels dev --parent intercom-0q2 ...
08:24:57 TOOL exec -> bd comments add intercom-0q2 "Triaged. Created sub-task intercom-0q2.1..."
08:25:05 TEXT Push failed — likely a fast-forward issue. Pulling and retrying.
08:25:05 TOOL exec -> cd /workspace/intercom && git pull --rebase && git push
08:25:08 TOOL exec -> cd /workspace/intercom && git stash && git pull --rebase && git stash pop
&& git add -A && git commit -m "sync" && git push
08:25:18 TEXT Pushed successfully. Queue is clear.

── 2026-02-23T08:27:26 WAKE ──────────────────────────────────────────────────
Check status of bead intercom-0q2. Query children...

08:27:28 TOOL exec -> cd /workspace/intercom && git pull && bd show intercom-0q2 intercom-0q2.1 --json
08:27:39 TEXT All 1 child closed. Rich self-assessment in hand — closing the parent with
a full aggregated summary.
08:27:39 TOOL exec -> bd close intercom-0q2 --reason "Dev self-assessment complete..."
08:27:44 TOOL exec -> cd /workspace/intercom && bd export && git add -A && git commit -m "sync"
&& git push
08:27:56 TEXT intercom-0q2 closed.

═══ Summary ═══════════════════════════════════════════════════════
Duration: 3m 29s (across both wake-ups)
Wake-ups: 2
Tool calls: 11

Eng-mgr never touched code. It read the bead, recognized the delegation requirement, claimed, created a sub-bead, waited. On re-invocation: read child close_reason, aggregated into a structured report, closed parent.

Dev Timeline

just replay b4-dev --after "2026-02-23T08:25"
═══ Session: ebe69d1b-8b7f-41ed-ba1d-9e516179f934 (b4-dev) ═══════════════════

── 2026-02-23T08:25:40 WAKE ──────────────────────────────────────────────────
Ready beads for your role: intercom-0q2.1. Process in priority order.

08:25:44 TOOL exec -> cd /workspace/intercom && git pull && bd list --status in_progress ...
08:26:05 TEXT Claiming and composing an honest self-assessment.
08:26:05 TOOL exec -> bd update intercom-0q2.1 --claim
08:26:10 TEXT Let me also check the full open bead list for context before writing.
08:26:10 TOOL exec -> bd list --json
08:26:37 TEXT Now posting the self-assessment on the parent bead.
08:26:37 TOOL exec -> bd comments add intercom-0q2 "SELF-ASSESSMENT — dev-agent (2026-02-23)..."
08:26:41 TOOL exec -> bd close intercom-0q2.1 --reason "Self-assessment posted on intercom-0q2..."
08:26:46 TOOL exec -> cd /workspace/intercom && bd export -o .beads/issues.jsonl
&& git add -A && git push
08:26:52 TOOL exec -> bd ready --json
08:27:01 TEXT Queue clear. Done.

═══ Summary ═══════════════════════════════════════════════════════
Duration: 1m 20s
Wake-ups: 1
Tool calls: 7

Dev never cloned a repo, never ran tests, never opened a PR. It checked its bead queue, wrote a candid self-assessment with real operational concerns, posted it as a comment on the parent bead, closed its sub-bead, and synced.

Message Flow

Human (demo — Chief of Staff)

├─ bd create "Dev team self-assessment" --labels eng-mgr
│ └─ intercom-0q2 (open, P2)

├─ Pipeline run #1:
│ ├─ Watcher: WAKE eng-mgr intercom-0q2
│ └─ beads-notify.sh → openclaw agent --agent b4-eng-mgr (automatic)

├─ Eng-mgr (51s):
│ ├─ bd show intercom-0q2 → reads inquiry
│ ├─ bd update intercom-0q2 --claim → in_progress
│ └─ bd create "Self-assessment: blockers, confidence" --parent intercom-0q2 --labels dev
│ └─ intercom-0q2.1 (open, label: dev)

├─ Pipeline run #2:
│ ├─ Watcher: WAKE dev intercom-0q2.1
│ └─ beads-notify.sh → openclaw agent --agent b4-dev (automatic)

├─ Dev (80s):
│ ├─ bd show intercom-0q2.1 → reads request
│ ├─ bd list → checks own workload and bead history
│ ├─ bd comments add intercom-0q2 "SELF-ASSESSMENT — dev-agent..."
│ └─ bd close intercom-0q2.1 --reason "Self-assessment posted"
│ └─ intercom-0q2.1 (closed, status in close_reason)

├─ Pipeline run #3: (nothing — parent is in_progress, Gap G-2)

├─ Manual re-invocation of eng-mgr (30s):
│ ├─ bd show intercom-0q2 intercom-0q2.1 → reads child close_reason + comment
│ └─ bd close intercom-0q2 --reason "Aggregated status report with action items"
│ └─ intercom-0q2 (closed)

└─ Human reads close_reason:
"CURRENT WORK: Nothing active.
BLOCKERS: beads-dbb stale, PR#4 unreviewed.
CONFIDENCE: HIGH execution, MEDIUM DB reliability, LOW beads-dbb."

Summary

| Act | Pipeline Run | Agent | Duration | Action | Result |
|---|---|---|---|---|---|
| 1 | (human) | | | Create self-assessment request | intercom-0q2 open |
| 2 | #1 | eng-mgr | 51s | Auto-woken, claims parent, delegates to dev | intercom-0q2.1 open |
| 3 | #2 | dev | 80s | Auto-woken, self-assesses, closes sub-bead | intercom-0q2.1 closed |
| 4 | #3 | | | Pipeline finds nothing (Gap G-2) | No wake |
| 4 | (manual) | eng-mgr | 30s | Reads child, aggregates, closes parent | intercom-0q2 closed |

What This Proves

| Claim | Evidence |
|---|---|
| Beads work for message passing, not just coding tasks | Entire demo has zero code changes, zero PRs, zero git branches |
| Automated pipeline dispatches without human intervention | Acts 2–3 use identical pipeline command (what cron runs every 30 min) |
| Agents respond appropriately to non-coding requests | Dev reports honest self-assessment instead of trying to write code |
| close_reason carries structured information upward | Dev's blockers and confidence levels flow through eng-mgr to the human |
| Agents surface real operational issues unprompted | Dev flagged dolt state loss, config drift, stale beads (none of them in the task description) |
| Session replay reveals agent decision-making | just replay shows tool calls, timing, and reasoning |

Unexpected Bonus: Dev Surfaced Real Issues

The self-assessment wasn't canned — dev flagged 4 genuine operational problems:

  1. beads-dbb stale 48+ hrs: a bead with no clear scope that's blocking dev's queue
  2. PR#4 unreviewed: modulo operator PR from the Tier 3 coding demo, still open
  3. bd/dolt state loss: bd ready returns empty on fresh sandbox restart until manual import
  4. Config drift: SOUL.md/TOOLS.md reference paths that don't exist in the sandbox

These are real findings worth acting on. The message-passing demo accidentally became a useful operational audit.

Gap G-2: Completion Propagation

The most visible gap in this demo: when dev closes the sub-bead, eng-mgr is not automatically notified. The watcher only polls bd ready; in_progress beads are excluded. Today's workaround is manual re-invocation.

Fix options (from docs/intercom.md):

  1. Watcher tracks parent-child relationships; when a child closes, wake the parent's assignee
  2. bd list --updated-after <timestamp> to detect state changes since last poll
  3. A "completion callback" bead convention — child writes a comment on parent before closing

Until G-2 is fixed, the last mile of every delegation chain requires manual intervention or a scheduled status-check cron job per active parent bead.
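Fix option 1 could look roughly like the Python sketch below: scan all beads, find in_progress parents whose children are all closed, and report whose assignee to wake. This is a sketch under stated assumptions, not the watcher's implementation. It infers the parent from the dotted id pattern seen in this demo (intercom-0q2.1 under intercom-0q2) rather than from any documented field, and the helper names `parent_id` and `parents_to_wake` are hypothetical.

```python
def parent_id(bead_id):
    """Derive the parent from a dotted child id, e.g. 'intercom-0q2.1' -> 'intercom-0q2'.

    Assumption: child ids are the parent id plus '.<number>', as seen in this demo.
    """
    head, sep, tail = bead_id.rpartition(".")
    return head if sep and tail.isdigit() else None


def parents_to_wake(beads):
    """Return {parent_id: assignee} for in_progress parents whose children are all closed."""
    by_id = {b["id"]: b for b in beads}
    children = {}
    for bead in beads:
        pid = parent_id(bead["id"])
        if pid in by_id:
            children.setdefault(pid, []).append(bead)
    return {
        pid: by_id[pid].get("assignee")
        for pid, kids in children.items()
        if by_id[pid]["status"] == "in_progress"
        and all(kid["status"] == "closed" for kid in kids)
    }


# Snapshot modeled on the state after pipeline run #2 in this demo.
snapshot = [
    {"id": "intercom-0q2", "status": "in_progress", "assignee": "eng-mgr-agent"},
    {"id": "intercom-0q2.1", "status": "closed", "assignee": "dev-agent"},
    {"id": "intercom-8xm", "status": "in_progress"},
]
to_wake = parents_to_wake(snapshot)
print(to_wake)
```

A real watcher would also need to remember which parents it has already woken, otherwise it would wake the same assignee on every poll until the parent is closed.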

Prompt Engineering Observation

The first attempt (intercom-91x) used a factual status query — eng-mgr answered it directly from bd list without delegating. The successful attempt required subjective information (confidence levels, blockers from dev's perspective) that eng-mgr couldn't extract from bead metadata. Lesson: if you want delegation to happen, ask for something the manager can't answer alone.

Appendix: Token Costs

| Agent | Wake-ups | Duration | Tokens (output) |
|---|---|---|---|
| b4-eng-mgr | 2 | 51s + 30s | 1,443 + 1,041 |
| b4-dev | 1 | 80s | 3,054 |
| Total | 3 | ~2m 41s | ~5,538 output |

The entire status inquiry — from human question to structured answer — cost roughly 5.5k output tokens across 3 agent invocations. The Tier 1 watcher (shell scripts) consumed 0 tokens.
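The totals can be re-derived from the per-invocation figures with a few lines of Python. The numbers are the ones reported in this appendix; the tuple layout is just for illustration.

```python
# (agent, duration in seconds, output tokens) per invocation, as reported above.
runs = [
    ("b4-eng-mgr", 51, 1443),
    ("b4-eng-mgr", 30, 1041),
    ("b4-dev", 80, 3054),
]

total_seconds = sum(seconds for _, seconds, _ in runs)
total_tokens = sum(tokens for _, _, tokens in runs)
minutes, seconds = divmod(total_seconds, 60)

print(f"{len(runs)} invocations, {minutes}m {seconds}s, {total_tokens} output tokens")
# prints: 3 invocations, 2m 41s, 5538 output tokens
```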