Tier 3: Message Passing — Status Inquiry via Delegation Chain
2026-02-23 by Showboat live capture
Pure message passing through the automated pipeline: a human asks for a dev team self-assessment, the request flows through the delegation chain (eng-mgr → dev → eng-mgr), and the answer flows back — all via beads. No code changes. No PRs. No git branches. Beads as Slack.
This showboat validates the automated dispatch loop (beads-watcher.sh | beads-notify.sh) running repeatedly, simulating what cron does every 30 minutes.
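As a rough illustration of the dispatch step, the sketch below groups ready beads by label and resolves each label to an agent. The real scripts are shell and their internals are not shown in this capture, so the `AGENT_MAP` layout and the `plan_wakes` and `wake_message` helpers are hypothetical, modeled only on the log lines and wake prompts that appear in this run.

```python
from collections import defaultdict

# Hypothetical map layout; the real agent-map.json schema is not shown in this capture.
AGENT_MAP = {
    "eng-mgr": {"agent": "b4-eng-mgr", "actor": "eng-mgr-agent"},
    "dev": {"agent": "b4-dev", "actor": "dev-agent"},
}


def plan_wakes(ready_beads, agent_map):
    """Group ready bead ids by the agent responsible for each bead label."""
    wakes = defaultdict(list)
    for bead in ready_beads:
        for label in bead.get("labels", []):
            entry = agent_map.get(label)
            if entry is not None:  # labels without a mapped agent are skipped
                wakes[entry["agent"]].append(bead["id"])
    return dict(wakes)


def wake_message(bead_ids):
    """Build a wake prompt in the shape seen in the session replays."""
    return f"Ready beads for your role: {', '.join(bead_ids)}. Process in priority order."


if __name__ == "__main__":
    ready = [{"id": "intercom-0q2", "labels": ["eng-mgr"]}]
    for agent, ids in plan_wakes(ready, AGENT_MAP).items():
        print(f"Waking {agent}: {wake_message(ids)}")
```

Keeping the label-to-agent lookup in data rather than code means a new role would only need a new map entry, which is presumably why the pipeline takes a `--map` argument.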
Pre-Flight: Agent State
Gateway running, shared intercom clone at /home/openclaw/b4arena/intercom/:
ssh openclaw@mimas "systemctl --user is-active openclaw-gateway.service"
active
Intercom state after `just intercom-reset` (1 bead from the prior demo's JSONL):
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd list --json 2>/dev/null \
| python3 -c 'import json,sys; beads=json.load(sys.stdin); print(len(beads), \"beads\"); [print(f\" {b[\\\"id\\\"]} {b[\\\"status\\\"]}\") for b in beads]'"
1 beads
intercom-8xm in_progress
The old modulo bead (intercom-8xm) survived via JSONL import — it's in_progress so the watcher ignores it.
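The same summary is easier to read as a small script than as an escaped one-liner. The sketch below is a hypothetical helper, not part of the repo. It also flags which beads the watcher would consider, under the assumption that only beads with status `open` count as ready; the capture does not show the watcher's actual filter.

```python
import json


def summarize(beads):
    """Return a count line plus one line per bead, flagging watcher candidates."""
    lines = [f"{len(beads)} beads"]
    for bead in beads:
        # Assumption: the watcher only considers beads whose status is "open".
        note = "candidate" if bead["status"] == "open" else "ignored by watcher"
        lines.append(f"  {bead['id']} {bead['status']} ({note})")
    return lines


if __name__ == "__main__":
    # Sample input mirroring the `bd list --json` result captured above.
    sample = json.loads('[{"id": "intercom-8xm", "status": "in_progress"}]')
    print("\n".join(summarize(sample)))
```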
Failed Attempt: Generic Status Inquiry
Before the successful run, we tried a generic phrasing:
bd create 'Status report: dev team workload' --labels eng-mgr -p 2 \
-d 'Give me a status report on the dev team. What beads are in progress, what was recently completed? Report back with a summary.'
Result: eng-mgr answered directly by querying bd list itself — no delegation to dev. It correctly judged it could answer the factual question from the shared Dolt DB alone. Created intercom-91x, closed it immediately with a status summary. 0 sub-beads.
Lesson learned: To force delegation, ask for something eng-mgr cannot answer from metadata — subjective self-assessment, confidence levels, blockers from the dev's perspective.
Act 1: Human Creates Self-Assessment Request
A human proxy creates a bead that requires dev's own perspective:
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=demo bd create \
'Dev team self-assessment' \
--labels eng-mgr -p 2 \
-d 'Ask each dev to self-assess their current situation: What are they blocked on? What needs attention? What is their confidence level on current work? I need each dev to report directly — do not answer from the bead DB alone.' \
--json 2>/dev/null"
{
"id": "intercom-0q2",
"title": "Dev team self-assessment",
"description": "Ask each dev to self-assess their current situation: What are they blocked on? ...",
"status": "open",
"priority": 2,
"issue_type": "task",
"created_by": "demo"
}
The key phrase: "do not answer from the bead DB alone" — forces delegation.
Act 2: Pipeline Run #1 — Watcher Detects, Eng-Mgr Wakes
Run the automated pipeline — the same command cron executes every 30 minutes:
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=watcher \
/home/openclaw/b4arena/scripts/beads-watcher.sh 2>/dev/null | \
/home/openclaw/b4arena/scripts/beads-notify.sh --map /home/openclaw/b4arena/agents/agent-map.json 2>&1"
Waking b4-eng-mgr (actor: eng-mgr-agent)...
OK: b4-eng-mgr woken
Eng-mgr's response (from agent JSON payload, trimmed):
Interesting task — requires the dev agent to report directly, not just a DB query. Claiming and delegating.
intercom-0q2 — Dev team self-assessment (P2)
- Claimed and triaged
- Created intercom-0q2.1 (label: dev) — dev agent asked to self-report on:
1. Current assignments
2. Blockers
3. Anything needing attention
4. Confidence level (and why)
- Explicit instruction: do not pull from DB — reflect actual state
- Dev to post answers as comment on intercom-0q2, then close the sub-bead
- Parent stays open until dev's self-assessment is received
Verify state — parent claimed, sub-bead created:
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd show intercom-0q2 --json 2>/dev/null" \
| python3 -c "import json,sys; d=json.load(sys.stdin)[0]; print(json.dumps({k:d.get(k) for k in ['id','status','assignee']}, indent=2))"
{
"id": "intercom-0q2",
"status": "in_progress",
"assignee": "eng-mgr-agent"
}
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd list --parent intercom-0q2 --json 2>/dev/null" \
| python3 -c "import json,sys; beads=json.load(sys.stdin); [print(json.dumps({k:b.get(k) for k in ['id','title','status','labels']}, indent=2)) for b in beads]"
{
"id": "intercom-0q2.1",
"title": "Self-assessment: current situation, blockers, confidence",
"status": "open",
"labels": [
"dev"
]
}
PASS. Eng-mgr claimed the parent, created 1 sub-bead labeled dev with a clear self-assessment mandate. No coding instructions — just "report your status from your own perspective."
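The two manual checks above could be folded into one assertion for repeated runs. This is a minimal sketch that assumes the JSON shapes printed in this act (`status`, `assignee`, `labels`); the `delegation_ok` helper is hypothetical and not part of the pipeline.

```python
def delegation_ok(parent, children, expected_label="dev"):
    """True if the parent is claimed and at least one open child carries the label."""
    claimed = parent.get("status") == "in_progress" and bool(parent.get("assignee"))
    delegated = any(
        child.get("status") == "open" and expected_label in child.get("labels", [])
        for child in children
    )
    return claimed and delegated


if __name__ == "__main__":
    # Values copied from the `bd show` and `bd list --parent` output above.
    parent = {"id": "intercom-0q2", "status": "in_progress", "assignee": "eng-mgr-agent"}
    children = [{"id": "intercom-0q2.1", "status": "open", "labels": ["dev"]}]
    print("PASS" if delegation_ok(parent, children) else "FAIL")
```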
Act 3: Pipeline Run #2 — Dev Wakes, Delivers Self-Assessment
Run the pipeline again — cron's next tick:
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=watcher \
/home/openclaw/b4arena/scripts/beads-watcher.sh 2>/dev/null | \
/home/openclaw/b4arena/scripts/beads-notify.sh --map /home/openclaw/b4arena/agents/agent-map.json 2>&1"
Waking b4-dev (actor: dev-agent)...
OK: b4-dev woken
Dev's response (from agent JSON payload, trimmed):
Queue clear. Done:
intercom-0q2.1 — Closed
Self-assessment posted on parent intercom-0q2. Key points flagged to eng-mgr:
- No active code blockers — execution confidence is high
- beads-dbb stale 48hrs+ — needs clarification or closure
- dolt state loss between sandbox sessions — manual bd import required each time, risks silent misses
- SOUL.md/TOOLS.md config drift — old paths no longer valid in sandbox
- PR#4 (modulo operator) open and awaiting review
Verify sub-bead closed with status report:
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd show intercom-0q2.1 --json 2>/dev/null" \
| python3 -c "import json,sys; d=json.load(sys.stdin)[0]; print(json.dumps({k:d.get(k) for k in ['id','status','assignee','close_reason']}, indent=2))"
{
"id": "intercom-0q2.1",
"status": "closed",
"assignee": "dev-agent",
"close_reason": "Self-assessment posted on intercom-0q2: no active blockers on code work; flagged beads-dbb stale clarification, dolt state-loss between sessions, config drift in SOUL.md/TOOLS.md, and PR#4 awaiting review."
}
PASS. Dev checked its workload, wrote a candid self-assessment (including real operational issues!), posted it as a comment on the parent bead, and closed its sub-bead. No code was written. No PRs were opened. Pure message passing.
Act 4: Pipeline Run #3 — Gap G-2: Parent Doesn't Auto-Close
Run the pipeline a third time:
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=watcher \
/home/openclaw/b4arena/scripts/beads-watcher.sh 2>/dev/null | \
/home/openclaw/b4arena/scripts/beads-notify.sh --map /home/openclaw/b4arena/agents/agent-map.json 2>&1"
(no output — nothing to wake)
Nothing happens. The parent bead (intercom-0q2) is in_progress, not ready — so the watcher ignores it. The child is closed, but nobody told eng-mgr. This is Gap G-2: completion does not propagate upward automatically.
Today's workaround: manually re-invoke eng-mgr for a status check:
ssh mimas 'sudo -u openclaw bash -c "set -a; source /etc/openclaw/env; set +a; \
HOME=/home/openclaw /opt/openclaw/bin/openclaw agent --agent b4-eng-mgr \
--message \"Check status of bead intercom-0q2. Query children with bd list --parent intercom-0q2 --json. If all children are closed, close the parent with a summary of their close_reasons.\" \
--json --timeout 120 2>/dev/null"' \
| jq -r '.result.payloads[-1].text'
✅ **intercom-0q2 closed.**
Dev's self-assessment was thorough and candid. Here's the distilled picture:
---
**Dev team health — as of 2026-02-23**
| | |
|---|---|
| **In progress** | Nothing active |
| **Recent completions** | --formal flag (test-greeter PR#1), modulo operator (test-calculator PR#4), smoke tests, README docs |
| **Confidence** | HIGH on code & execution · MEDIUM on DB reliability · LOW on beads-dbb |
**Action items flagged by dev (worth your attention):**
1. **beads-dbb** — "Do the thing" bead, stale 48+ hrs with no response to clarification question. Needs owner decision: clarify, reassign, or close as invalid.
2. **PR#4** (modulo, test-calculator) — merged? Still open and unreviewed.
3. **bd/dolt state loss** — fresh sandbox restarts require a manual `bd import -i .beads/issues.jsonl` before `bd ready` returns anything. Silent failure risk. Needs automation or a documented startup hook.
4. **Config drift** — `SOUL.md`/`TOOLS.md` reference stale paths (`/home/openclaw/b4arena/beads/.beads`) that don't exist in the sandbox. Should be updated to `/workspace/intercom/.beads`.
Verify final state:
ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd show intercom-0q2 --json 2>/dev/null" \
| python3 -c "import json,sys; d=json.load(sys.stdin)[0]; print(json.dumps({k:d.get(k) for k in ['id','status','assignee','close_reason']}, indent=2))"
{
"id": "intercom-0q2",
"status": "closed",
"assignee": "eng-mgr-agent",
"close_reason": "Dev self-assessment complete (intercom-0q2.1, dev-agent, 2026-02-23).\n\nCURRENT WORK: No active beads in_progress. Recent completions: --formal flag (test-greeter PR#1), modulo operator (test-calculator PR#4), 3x calc smoke-tests, 2x README doc beads.\n\nBLOCKERS:\n- beads-dbb ('Do the thing') — stale 48+ hrs, awaiting clarification. Cannot proceed without scope.\n- PR#4 (modulo, test-calculator) — open, unreviewed, unmerged.\n\nNEEDS ATTENTION:\n1. beads-dbb — should be clarified, reassigned, or closed as invalid.\n2. bd/dolt state loss between sessions — 'bd ready' silently returns empty on fresh sandbox restart until manual 'bd import -i .beads/issues.jsonl' is run. Needs automation or docs.\n3. Config drift — SOUL.md/TOOLS.md reference stale paths (/home/openclaw/b4arena/beads/.beads, old repo paths). Should be corrected to /workspace/intercom/.beads.\n4. PR#4 awaiting review/merge.\n\nCONFIDENCE: HIGH on task execution and code changes. MEDIUM on bead DB reliability (import workaround is fragile). LOW on beads-dbb (indefinitely stalled)."
}
PASS. The status report flowed from dev → eng-mgr → bead close_reason, ready for the human to read. The close_reason contains a structured, actionable report.
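Because the close_reason uses upper-case section labels, a consumer could split it into sections mechanically. The sketch below assumes that `LABEL:` convention holds, which is only what this one capture shows and not a documented format; `parse_close_reason` is a hypothetical helper.

```python
import re

# A section starts with an all-caps label followed by a colon, e.g. "BLOCKERS:".
SECTION = re.compile(r"^([A-Z][A-Z ]+):\s*(.*)$")


def parse_close_reason(text):
    """Split a close_reason into {section label: [lines]}, ignoring any preamble."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SECTION.match(line)
        if match:
            current = match.group(1)
            sections[current] = [match.group(2)] if match.group(2) else []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


if __name__ == "__main__":
    # Shortened sample in the same shape as the close_reason above.
    sample = (
        "Dev self-assessment complete.\n\n"
        "CURRENT WORK: No active beads in_progress.\n\n"
        "BLOCKERS:\n- beads-dbb stale\n- PR#4 open\n\n"
        "CONFIDENCE: HIGH on task execution."
    )
    for label, lines in parse_close_reason(sample).items():
        print(label, "->", lines)
```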
Act 5: Session Replays
What did the agents actually do? `just replay` shows their full timelines:
Eng-Mgr Timeline
just replay b4-eng-mgr --after "2026-02-23T08:24"
═══ Session: fa455e55-8f56-4fa3-88e1-ef060c5d4031 (b4-eng-mgr) ═══════════════
── 2026-02-23T08:24:27 WAKE ──────────────────────────────────────────────────
Ready beads for your role: intercom-0q2. Process in priority order.
08:24:32 TOOL exec -> cd /workspace/intercom && git pull && bd list --status in_progress ...
08:24:32 TOOL exec -> bd show intercom-0q2 --json
08:24:43 TEXT Interesting task — requires the dev agent to report directly, not just a DB query.
Claiming and delegating.
08:24:43 TOOL exec -> bd update intercom-0q2 --claim --json
08:24:51 TOOL exec -> bd create "Self-assessment: current situation, blockers, confidence"
--labels dev --parent intercom-0q2 ...
08:24:57 TOOL exec -> bd comments add intercom-0q2 "Triaged. Created sub-task intercom-0q2.1..."
08:25:05 TEXT Push failed — likely a fast-forward issue. Pulling and retrying.
08:25:05 TOOL exec -> cd /workspace/intercom && git pull --rebase && git push
08:25:08 TOOL exec -> cd /workspace/intercom && git stash && git pull --rebase && git stash pop
&& git add -A && git commit -m "sync" && git push
08:25:18 TEXT Pushed successfully. Queue is clear.
── 2026-02-23T08:27:26 WAKE ──────────────────────────────────────────────────
Check status of bead intercom-0q2. Query children...
08:27:28 TOOL exec -> cd /workspace/intercom && git pull && bd show intercom-0q2 intercom-0q2.1 --json
08:27:39 TEXT All 1 child closed. Rich self-assessment in hand — closing the parent with
a full aggregated summary.
08:27:39 TOOL exec -> bd close intercom-0q2 --reason "Dev self-assessment complete..."
08:27:44 TOOL exec -> cd /workspace/intercom && bd export && git add -A && git commit -m "sync"
&& git push
08:27:56 TEXT intercom-0q2 closed.
═══ Summary ═══════════════════════════════════════════════════════
Duration: 3m 29s (across both wake-ups)
Wake-ups: 2
Tool calls: 11
Eng-mgr never touched code. It read the bead, recognized the delegation requirement, claimed, created a sub-bead, waited. On re-invocation: read child close_reason, aggregated into a structured report, closed parent.
Dev Timeline
just replay b4-dev --after "2026-02-23T08:25"
═══ Session: ebe69d1b-8b7f-41ed-ba1d-9e516179f934 (b4-dev) ═══════════════════
── 2026-02-23T08:25:40 WAKE ──────────────────────────────────────────────────
Ready beads for your role: intercom-0q2.1. Process in priority order.
08:25:44 TOOL exec -> cd /workspace/intercom && git pull && bd list --status in_progress ...
08:26:05 TEXT Claiming and composing an honest self-assessment.
08:26:05 TOOL exec -> bd update intercom-0q2.1 --claim
08:26:10 TEXT Let me also check the full open bead list for context before writing.
08:26:10 TOOL exec -> bd list --json
08:26:37 TEXT Now posting the self-assessment on the parent bead.
08:26:37 TOOL exec -> bd comments add intercom-0q2 "SELF-ASSESSMENT — dev-agent (2026-02-23)..."
08:26:41 TOOL exec -> bd close intercom-0q2.1 --reason "Self-assessment posted on intercom-0q2..."
08:26:46 TOOL exec -> cd /workspace/intercom && bd export -o .beads/issues.jsonl
&& git add -A && git push
08:26:52 TOOL exec -> bd ready --json
08:27:01 TEXT Queue clear. Done.
═══ Summary ═══════════════════════════════════════════════════════
Duration: 1m 20s
Wake-ups: 1
Tool calls: 7
Dev never cloned a repo, never ran tests, never opened a PR. It checked its bead queue, wrote a candid self-assessment with real operational concerns, posted it as a comment on the parent bead, closed its sub-bead, and synced.
Message Flow
Human (demo — Chief of Staff)
│
├─ bd create "Dev team self-assessment" --labels eng-mgr
│ └─ intercom-0q2 (open, P2)
│
├─ Pipeline run #1:
│ ├─ Watcher: WAKE eng-mgr intercom-0q2
│ └─ beads-notify.sh → openclaw agent --agent b4-eng-mgr (automatic)
│
├─ Eng-mgr (51s):
│ ├─ bd show intercom-0q2 → reads inquiry
│ ├─ bd update intercom-0q2 --claim → in_progress
│ └─ bd create "Self-assessment: blockers, confidence" --parent intercom-0q2 --labels dev
│ └─ intercom-0q2.1 (open, label: dev)
│
├─ Pipeline run #2:
│ ├─ Watcher: WAKE dev intercom-0q2.1
│ └─ beads-notify.sh → openclaw agent --agent b4-dev (automatic)
│
├─ Dev (80s):
│ ├─ bd show intercom-0q2.1 → reads request
│ ├─ bd list → checks own workload and bead history
│ ├─ bd comments add intercom-0q2 "SELF-ASSESSMENT — dev-agent..."
│ └─ bd close intercom-0q2.1 --reason "Self-assessment posted"
│ └─ intercom-0q2.1 (closed, status in close_reason)
│
├─ Pipeline run #3: (nothing — parent is in_progress, Gap G-2)
│
├─ Manual re-invocation of eng-mgr (30s):
│ ├─ bd show intercom-0q2 intercom-0q2.1 → reads child close_reason + comment
│ └─ bd close intercom-0q2 --reason "Aggregated status report with action items"
│ └─ intercom-0q2 (closed)
│
└─ Human reads close_reason:
"CURRENT WORK: Nothing active.
BLOCKERS: beads-dbb stale, PR#4 unreviewed.
CONFIDENCE: HIGH execution, MEDIUM DB reliability, LOW beads-dbb."
Summary
| Act | Pipeline Run | Agent | Duration | Action | Result |
|---|---|---|---|---|---|
| 1 | — | (human) | — | Create self-assessment request | intercom-0q2 open |
| 2 | #1 | eng-mgr | 51s | Auto-woken, claims parent, delegates to dev | intercom-0q2.1 open |
| 3 | #2 | dev | 80s | Auto-woken, self-assesses, closes sub-bead | intercom-0q2.1 closed |
| 4 | #3 | — | — | Pipeline finds nothing (Gap G-2) | No wake |
| 4 | (manual) | eng-mgr | 30s | Reads child, aggregates, closes parent | intercom-0q2 closed |
What This Proves
| Claim | Evidence |
|---|---|
| Beads work for message passing, not just coding tasks | Entire demo has zero code changes, zero PRs, zero git branches |
| Automated pipeline dispatches without human intervention | Acts 2–3 use identical pipeline command (what cron runs every 30 min) |
| Agents respond appropriately to non-coding requests | Dev reports honest self-assessment instead of trying to write code |
| close_reason carries structured information upward | Dev's blockers and confidence levels flow through eng-mgr to the human |
| Agents surface real operational issues unprompted | Dev flagged dolt state loss, config drift, stale beads — not in the task description |
| Session replay reveals agent decision-making | `just replay` shows tool calls, timing, and reasoning |
Unexpected Bonus: Dev Surfaced Real Issues
The self-assessment wasn't canned — dev flagged 4 genuine operational problems:
- beads-dbb stale 48+ hrs — a bead with no clear scope that's blocking dev's queue
- PR#4 unreviewed — modulo operator PR from the Tier 3 coding demo, still open
- bd/dolt state loss — `bd ready` returns empty on fresh sandbox restart until manual import
- Config drift — SOUL.md references paths that don't exist in the sandbox
These are real findings worth acting on. The message-passing demo accidentally became a useful operational audit.
Gap G-2: Completion Propagation
The most visible gap in this demo: when dev closes the sub-bead, eng-mgr is not automatically notified. The watcher only polls bd ready — in_progress beads are excluded. Today's workaround is manual re-invocation.
Fix options (from docs/intercom.md):
- Watcher tracks parent-child relationships; when a child closes, wake the parent's assignee
- `bd list --updated-after <timestamp>` to detect state changes since last poll
- A "completion callback" bead convention — child writes a comment on parent before closing
Until G-2 is fixed, the last mile of every delegation chain requires manual intervention or a scheduled status-check cron job per active parent bead.
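The first fix option can be sketched as a pure function over the bead list. This is an illustration, not the implementation described in docs/intercom.md. It assumes child ids extend the parent id with a numeric dotted suffix (as `intercom-0q2.1` does here) and that the watcher can see every bead's `status` and `assignee`; `parent_id` and `parents_to_wake` are hypothetical names.

```python
def parent_id(bead_id):
    """Assumption: a child id is '<parent id>.<number>', e.g. intercom-0q2.1."""
    head, sep, tail = bead_id.rpartition(".")
    return head if sep and tail.isdigit() else None


def parents_to_wake(beads):
    """Return (parent id, assignee) for in_progress parents whose children are all closed."""
    by_id = {bead["id"]: bead for bead in beads}
    children = {}
    for bead in beads:
        pid = parent_id(bead["id"])
        if pid in by_id:
            children.setdefault(pid, []).append(bead)
    wake = []
    for pid, kids in children.items():
        parent = by_id[pid]
        if parent["status"] == "in_progress" and all(k["status"] == "closed" for k in kids):
            wake.append((pid, parent.get("assignee")))
    return wake


if __name__ == "__main__":
    # State after Act 3: child closed, parent still in_progress.
    beads = [
        {"id": "intercom-0q2", "status": "in_progress", "assignee": "eng-mgr-agent"},
        {"id": "intercom-0q2.1", "status": "closed", "assignee": "dev-agent"},
    ]
    print(parents_to_wake(beads))
```

Run against the state after Act 3, a check like this would have flagged intercom-0q2 for eng-mgr-agent on pipeline run #3 instead of producing no output.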
Prompt Engineering Observation
The first attempt (intercom-91x) used a factual status query — eng-mgr answered it directly from bd list without delegating. The successful attempt required subjective information (confidence levels, blockers from dev's perspective) that eng-mgr couldn't extract from bead metadata. Lesson: if you want delegation to happen, ask for something the manager can't answer alone.
Appendix: Token Costs
| Agent | Wake-ups | Duration | Tokens (output) |
|---|---|---|---|
| b4-eng-mgr | 2 | 51s + 30s | 1,443 + 1,041 |
| b4-dev | 1 | 80s | 3,054 |
| Total | 3 | ~2m 41s | ~5,538 output |
The entire status inquiry — from human question to structured answer — cost roughly 5.5k output tokens across 3 agent invocations. The Tier 1 watcher (shell scripts) consumed 0 tokens.
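The totals in the table follow directly from the per-invocation figures. The short check below uses only the numbers from the table; the variable names are illustrative.

```python
# (agent, duration in seconds, output tokens) per invocation, from the table above.
runs = [
    ("b4-eng-mgr", 51, 1443),
    ("b4-eng-mgr", 30, 1041),
    ("b4-dev", 80, 3054),
]

total_seconds = sum(seconds for _, seconds, _ in runs)
total_tokens = sum(tokens for _, _, tokens in runs)
minutes, seconds = divmod(total_seconds, 60)

print(f"{len(runs)} invocations, {minutes}m {seconds}s, {total_tokens} output tokens")
# prints: 3 invocations, 2m 41s, 5538 output tokens
```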