Tier 3: Message Passing — Status Inquiry via Delegation Chain

2026-02-23 by Showboat live capture

Pure message passing through the automated pipeline: a human asks for a dev team self-assessment, the request flows through the delegation chain (eng-mgr → dev → eng-mgr), and the answer flows back — all via beads. No code changes. No PRs. No git branches. Beads as Slack.

This showboat validates the automated dispatch loop (beads-watcher.sh | beads-notify.sh) running repeatedly, simulating what cron does every 30 minutes.

Pre-Flight: Agent State

Gateway running, shared intercom clone at /home/openclaw/b4arena/intercom/:

ssh openclaw@mimas "systemctl --user is-active openclaw-gateway.service"

active

Intercom state after `just intercom-reset`, with 1 bead left over from the prior demo's JSONL:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd list --json 2>/dev/null \
  | python3 -c 'import json,sys; beads=json.load(sys.stdin); print(len(beads), \"beads\"); [print(f\"  {b[\\\"id\\\"]} {b[\\\"status\\\"]}\") for b in beads]'"

1 beads
  intercom-8xm in_progress

The old modulo bead (intercom-8xm) survived via JSONL import — it's in_progress so the watcher ignores it.
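The inline Python in the command above is hard to read through two layers of shell quoting. A minimal sketch of the same summary as a standalone function follows. It assumes only that `bd list --json` emits a JSON array of objects with `id` and `status` keys, as the output above suggests; the function name `summarize_beads` is hypothetical.

```python
import json


def summarize_beads(raw_json: str) -> str:
    """Return a count line plus one indented 'id status' line per bead."""
    beads = json.loads(raw_json)
    lines = [f"{len(beads)} beads"]
    lines += [f"  {b['id']} {b['status']}" for b in beads]
    return "\n".join(lines)


# Sample input mirroring the single bead shown above.
sample = '[{"id": "intercom-8xm", "status": "in_progress"}]'
print(summarize_beads(sample))
```

Saved as a script on the host, something like this could replace the escaped one-liner and be fed from `bd list --json` on stdin.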

Failed Attempt: Generic Status Inquiry

Before the successful run, we tried a generic phrasing:

bd create 'Status report: dev team workload' --labels eng-mgr -p 2 \
  -d 'Give me a status report on the dev team. What beads are in progress, what was recently completed? Report back with a summary.'

Result: eng-mgr answered directly by querying bd list itself, with no delegation to dev. It correctly judged that it could answer the factual question from the shared Dolt DB alone. The request bead (intercom-91x) was closed immediately with a status summary and 0 sub-beads.

Lesson learned: To force delegation, ask for something eng-mgr cannot answer from metadata — subjective self-assessment, confidence levels, blockers from the dev's perspective.

Act 1: Human Creates Self-Assessment Request

A human proxy creates a bead that requires dev's own perspective:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=demo bd create \
  'Dev team self-assessment' \
  --labels eng-mgr -p 2 \
  -d 'Ask each dev to self-assess their current situation: What are they blocked on? What needs attention? What is their confidence level on current work? I need each dev to report directly — do not answer from the bead DB alone.' \
  --json 2>/dev/null"

{
  "id": "intercom-0q2",
  "title": "Dev team self-assessment",
  "description": "Ask each dev to self-assess their current situation: What are they blocked on? ...",
  "status": "open",
  "priority": 2,
  "issue_type": "task",
  "created_by": "demo"
}

The key phrase: "do not answer from the bead DB alone" — forces delegation.

Act 2: Pipeline Run #1 — Watcher Detects, Eng-Mgr Wakes

Run the automated pipeline — the same command cron executes every 30 minutes:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=watcher \
  /home/openclaw/b4arena/scripts/beads-watcher.sh 2>/dev/null | \
  /home/openclaw/b4arena/scripts/beads-notify.sh --map /home/openclaw/b4arena/agents/agent-map.json 2>&1"

Waking b4-eng-mgr (actor: eng-mgr-agent)...
OK: b4-eng-mgr woken
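The dispatch step can be pictured as a small mapping from watcher output to agent wake-ups. The sketch below is illustrative only: the real scripts are shell, and both the `WAKE <label> <bead-id>` line format and the agent-map schema used here are assumptions inferred from the output in this demo, not the actual contents of beads-watcher.sh or agent-map.json.

```python
from collections import defaultdict

# Hypothetical map schema: role label -> agent name and BD_ACTOR value.
AGENT_MAP = {
    "eng-mgr": {"agent": "b4-eng-mgr", "actor": "eng-mgr-agent"},
    "dev": {"agent": "b4-dev", "actor": "dev-agent"},
}


def plan_wakes(watcher_lines, agent_map):
    """Group 'WAKE <label> <bead-id>' lines into one wake message per agent."""
    beads_by_label = defaultdict(list)
    for line in watcher_lines:
        parts = line.split()
        if len(parts) == 3 and parts[0] == "WAKE":
            beads_by_label[parts[1]].append(parts[2])

    plan = []
    for label, bead_ids in beads_by_label.items():
        entry = agent_map.get(label)
        if entry is None:
            continue  # no agent registered for this label
        message = (
            f"Ready beads for your role: {', '.join(bead_ids)}. "
            "Process in priority order."
        )
        plan.append((entry["agent"], entry["actor"], message))
    return plan


for agent, actor, message in plan_wakes(["WAKE eng-mgr intercom-0q2"], AGENT_MAP):
    print(f"Waking {agent} (actor: {actor})...")
    print(f"  {message}")
```

Grouping by label means an agent with several ready beads would be woken once with the full list, which matches the "Process in priority order" wording of the wake message seen in the session replays.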

Eng-mgr's response (from agent JSON payload, trimmed):

Interesting task — requires the dev agent to report directly, not just a DB query. Claiming and delegating.

intercom-0q2 — Dev team self-assessment (P2)
- Claimed and triaged
- Created intercom-0q2.1 (label: dev) — dev agent asked to self-report on:
  1. Current assignments
  2. Blockers
  3. Anything needing attention
  4. Confidence level (and why)
  - Explicit instruction: do not pull from DB — reflect actual state
  - Dev to post answers as comment on intercom-0q2, then close the sub-bead
- Parent stays open until dev's self-assessment is received

Verify state — parent claimed, sub-bead created:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd show intercom-0q2 --json 2>/dev/null" \
  | python3 -c "import json,sys; d=json.load(sys.stdin)[0]; print(json.dumps({k:d.get(k) for k in ['id','status','assignee']}, indent=2))"

{
  "id": "intercom-0q2",
  "status": "in_progress",
  "assignee": "eng-mgr-agent"
}

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd list --parent intercom-0q2 --json 2>/dev/null" \
  | python3 -c "import json,sys; beads=json.load(sys.stdin); [print(json.dumps({k:b.get(k) for k in ['id','title','status','labels']}, indent=2)) for b in beads]"

{
  "id": "intercom-0q2.1",
  "title": "Self-assessment: current situation, blockers, confidence",
  "status": "open",
  "labels": [
    "dev"
  ]
}

PASS. Eng-mgr claimed the parent, created 1 sub-bead labeled dev with a clear self-assessment mandate. No coding instructions — just "report your status from your own perspective."
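The PASS criteria for this act can be written down as a small check. This is a sketch under the assumption that the projected JSON shown above (fields `status`, `assignee`, `labels`) is what a test harness would receive; `delegation_ok` is a hypothetical helper, not part of the demo tooling.

```python
def delegation_ok(parent: dict, children: list) -> bool:
    """True when the parent is claimed by eng-mgr and exactly one open child is labeled dev."""
    dev_children = [
        c for c in children
        if "dev" in c.get("labels", []) and c.get("status") == "open"
    ]
    return (
        parent.get("status") == "in_progress"
        and parent.get("assignee") == "eng-mgr-agent"
        and len(dev_children) == 1
    )


# Values copied from the two verification outputs above.
parent = {"id": "intercom-0q2", "status": "in_progress", "assignee": "eng-mgr-agent"}
children = [{"id": "intercom-0q2.1", "status": "open", "labels": ["dev"]}]
print("PASS" if delegation_ok(parent, children) else "FAIL")
```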

Act 3: Pipeline Run #2 — Dev Wakes, Delivers Self-Assessment

Run the pipeline again — cron's next tick:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=watcher \
  /home/openclaw/b4arena/scripts/beads-watcher.sh 2>/dev/null | \
  /home/openclaw/b4arena/scripts/beads-notify.sh --map /home/openclaw/b4arena/agents/agent-map.json 2>&1"

Waking b4-dev (actor: dev-agent)...
OK: b4-dev woken

Dev's response (from agent JSON payload, trimmed):

Queue clear. Done:

intercom-0q2.1 — Closed

Self-assessment posted on parent intercom-0q2. Key points flagged to eng-mgr:

- No active code blockers — execution confidence is high
- beads-dbb stale 48hrs+ — needs clarification or closure
- dolt state loss between sandbox sessions — manual bd import required each time, risks silent misses
- SOUL.md/TOOLS.md config drift — old paths no longer valid in sandbox
- PR#4 (modulo operator) open and awaiting review

Verify sub-bead closed with status report:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd show intercom-0q2.1 --json 2>/dev/null" \
  | python3 -c "import json,sys; d=json.load(sys.stdin)[0]; print(json.dumps({k:d.get(k) for k in ['id','status','assignee','close_reason']}, indent=2))"

{
  "id": "intercom-0q2.1",
  "status": "closed",
  "assignee": "dev-agent",
  "close_reason": "Self-assessment posted on intercom-0q2: no active blockers on code work; flagged beads-dbb stale clarification, dolt state-loss between sessions, config drift in SOUL.md/TOOLS.md, and PR#4 awaiting review."
}

PASS. Dev checked its workload, wrote a candid self-assessment (including real operational issues!), posted it as a comment on the parent bead, and closed. No code was written. No PRs were opened. Pure message passing.

Act 4: Pipeline Run #3 — Gap G-2: Parent Doesn't Auto-Close

Run the pipeline a third time:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads BD_ACTOR=watcher \
  /home/openclaw/b4arena/scripts/beads-watcher.sh 2>/dev/null | \
  /home/openclaw/b4arena/scripts/beads-notify.sh --map /home/openclaw/b4arena/agents/agent-map.json 2>&1"

(no output — nothing to wake)

Nothing happens. The parent bead (intercom-0q2) is in_progress, not ready — so the watcher ignores it. The child is closed, but nobody told eng-mgr. This is Gap G-2: completion does not propagate upward automatically.
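Why the third run is silent can be seen with a toy version of the readiness filter. The sketch assumes "ready" simply means `status == "open"`; the real `bd ready` may apply further rules such as dependency checks.

```python
def ready_beads(beads):
    """Return IDs the watcher would act on, assuming 'ready' means status == 'open'."""
    return [b["id"] for b in beads if b["status"] == "open"]


# Bead states as reported in this demo after pipeline run #2.
state_after_run_2 = [
    {"id": "intercom-8xm", "status": "in_progress"},
    {"id": "intercom-0q2", "status": "in_progress"},
    {"id": "intercom-0q2.1", "status": "closed"},
]
print(ready_beads(state_after_run_2))  # prints []
```

Neither the claimed parent nor the closed child is open, so no wake line is produced for anyone.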

Today's workaround: manually re-invoke eng-mgr for a status check:

ssh mimas 'sudo -u openclaw bash -c "set -a; source /etc/openclaw/env; set +a; \
  HOME=/home/openclaw /opt/openclaw/bin/openclaw agent --agent b4-eng-mgr \
  --message \"Check status of bead intercom-0q2. Query children with bd list --parent intercom-0q2 --json. If all children are closed, close the parent with a summary of their close_reasons.\" \
  --json --timeout 120 2>/dev/null"' \
  | jq -r '.result.payloads[-1].text'

✅ **intercom-0q2 closed.**

Dev's self-assessment was thorough and candid. Here's the distilled picture:

---

**Dev team health — as of 2026-02-23**

| | |
|---|---|
| **In progress** | Nothing active |
| **Recent completions** | --formal flag (test-greeter PR#1), modulo operator (test-calculator PR#4), smoke tests, README docs |
| **Confidence** | HIGH on code & execution · MEDIUM on DB reliability · LOW on beads-dbb |

**Action items flagged by dev (worth your attention):**

1. **beads-dbb** — "Do the thing" bead, stale 48+ hrs with no response to clarification question. Needs owner decision: clarify, reassign, or close as invalid.
2. **PR#4** (modulo, test-calculator) — merged? Still open and unreviewed.
3. **bd/dolt state loss** — fresh sandbox restarts require a manual `bd import -i .beads/issues.jsonl` before `bd ready` returns anything. Silent failure risk. Needs automation or a documented startup hook.
4. **Config drift** — `SOUL.md`/`TOOLS.md` reference stale paths (`/home/openclaw/b4arena/beads/.beads`) that don't exist in the sandbox. Should be updated to `/workspace/intercom/.beads`.

Verify final state:

ssh openclaw@mimas "BEADS_DIR=/home/openclaw/b4arena/intercom/.beads bd show intercom-0q2 --json 2>/dev/null" \
  | python3 -c "import json,sys; d=json.load(sys.stdin)[0]; print(json.dumps({k:d.get(k) for k in ['id','status','assignee','close_reason']}, indent=2))"

{
  "id": "intercom-0q2",
  "status": "closed",
  "assignee": "eng-mgr-agent",
  "close_reason": "Dev self-assessment complete (intercom-0q2.1, dev-agent, 2026-02-23).\n\nCURRENT WORK: No active beads in_progress. Recent completions: --formal flag (test-greeter PR#1), modulo operator (test-calculator PR#4), 3x calc smoke-tests, 2x README doc beads.\n\nBLOCKERS:\n- beads-dbb ('Do the thing') — stale 48+ hrs, awaiting clarification. Cannot proceed without scope.\n- PR#4 (modulo, test-calculator) — open, unreviewed, unmerged.\n\nNEEDS ATTENTION:\n1. beads-dbb — should be clarified, reassigned, or closed as invalid.\n2. bd/dolt state loss between sessions — 'bd ready' silently returns empty on fresh sandbox restart until manual 'bd import -i .beads/issues.jsonl' is run. Needs automation or docs.\n3. Config drift — SOUL.md/TOOLS.md reference stale paths (/home/openclaw/b4arena/beads/.beads, old repo paths). Should be corrected to /workspace/intercom/.beads.\n4. PR#4 awaiting review/merge.\n\nCONFIDENCE: HIGH on task execution and code changes. MEDIUM on bead DB reliability (import workaround is fragile). LOW on beads-dbb (indefinitely stalled)."
}

PASS. The status report flowed from dev → eng-mgr → bead close_reason, ready for the human to read. The close_reason contains a structured, actionable report.
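Because the close_reason uses upper-case section headers separated by blank lines, a consumer could split it into fields mechanically. The following is a minimal sketch: the header convention is inferred from this one close_reason and is not a documented bead format, `parse_close_reason` is a hypothetical helper, and the sample is an abbreviated excerpt of the text above.

```python
import re

# A section starts with an all-caps header (letters and spaces) followed by a colon.
SECTION = re.compile(r"^([A-Z][A-Z ]+):\s*(.*)$", re.DOTALL)


def parse_close_reason(text: str) -> dict:
    """Split a close_reason into {HEADER: body} using blank-line-separated blocks."""
    sections = {}
    for block in text.split("\n\n"):
        match = SECTION.match(block.strip())
        if match:
            sections[match.group(1)] = match.group(2).strip()
    return sections


# Abbreviated excerpt of the close_reason shown above.
sample = (
    "Dev self-assessment complete (intercom-0q2.1, dev-agent, 2026-02-23).\n\n"
    "CURRENT WORK: No active beads in_progress.\n\n"
    "NEEDS ATTENTION:\n4. PR#4 awaiting review/merge.\n\n"
    "CONFIDENCE: HIGH on task execution and code changes."
)

for header, body in parse_close_reason(sample).items():
    print(f"{header} -> {body}")
```

The opening sentence is skipped because it does not begin with an all-caps header. Free-form agent output may not always follow this shape, so a real consumer would need a fallback for unparsed text.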

Act 5: Session Replays

What did the agents actually do? The `just replay` command shows their full timelines:

Eng-Mgr Timeline

just replay b4-eng-mgr --after "2026-02-23T08:24"

═══ Session: fa455e55-8f56-4fa3-88e1-ef060c5d4031 (b4-eng-mgr) ═══════════════

── 2026-02-23T08:24:27 WAKE ──────────────────────────────────────────────────
  Ready beads for your role: intercom-0q2. Process in priority order.

  08:24:32  TOOL    exec -> cd /workspace/intercom && git pull && bd list --status in_progress ...
  08:24:32  TOOL    exec -> bd show intercom-0q2 --json
  08:24:43  TEXT    Interesting task — requires the dev agent to report directly, not just a DB query.
                    Claiming and delegating.
  08:24:43  TOOL    exec -> bd update intercom-0q2 --claim --json
  08:24:51  TOOL    exec -> bd create "Self-assessment: current situation, blockers, confidence"
                            --labels dev --parent intercom-0q2 ...
  08:24:57  TOOL    exec -> bd comments add intercom-0q2 "Triaged. Created sub-task intercom-0q2.1..."
  08:25:05  TEXT    Push failed — likely a fast-forward issue. Pulling and retrying.
  08:25:05  TOOL    exec -> cd /workspace/intercom && git pull --rebase && git push
  08:25:08  TOOL    exec -> cd /workspace/intercom && git stash && git pull --rebase && git stash pop
                            && git add -A && git commit -m "sync" && git push
  08:25:18  TEXT    Pushed successfully. Queue is clear.

── 2026-02-23T08:27:26 WAKE ──────────────────────────────────────────────────
  Check status of bead intercom-0q2. Query children...

  08:27:28  TOOL    exec -> cd /workspace/intercom && git pull && bd show intercom-0q2 intercom-0q2.1 --json
  08:27:39  TEXT    All 1 child closed. Rich self-assessment in hand — closing the parent with
                    a full aggregated summary.
  08:27:39  TOOL    exec -> bd close intercom-0q2 --reason "Dev self-assessment complete..."
  08:27:44  TOOL    exec -> cd /workspace/intercom && bd export && git add -A && git commit -m "sync"
                            && git push
  08:27:56  TEXT    intercom-0q2 closed.

═══ Summary ═══════════════════════════════════════════════════════
  Duration: 3m 29s (across both wake-ups)
  Wake-ups: 2
  Tool calls: 11

Eng-mgr never touched code. It read the bead, recognized the delegation requirement, claimed, created a sub-bead, waited. On re-invocation: read child close_reason, aggregated into a structured report, closed parent.

Dev Timeline

just replay b4-dev --after "2026-02-23T08:25"

═══ Session: ebe69d1b-8b7f-41ed-ba1d-9e516179f934 (b4-dev) ═══════════════════

── 2026-02-23T08:25:40 WAKE ──────────────────────────────────────────────────
  Ready beads for your role: intercom-0q2.1. Process in priority order.

  08:25:44  TOOL    exec -> cd /workspace/intercom && git pull && bd list --status in_progress ...
  08:26:05  TEXT    Claiming and composing an honest self-assessment.
  08:26:05  TOOL    exec -> bd update intercom-0q2.1 --claim
  08:26:10  TEXT    Let me also check the full open bead list for context before writing.
  08:26:10  TOOL    exec -> bd list --json
  08:26:37  TEXT    Now posting the self-assessment on the parent bead.
  08:26:37  TOOL    exec -> bd comments add intercom-0q2 "SELF-ASSESSMENT — dev-agent (2026-02-23)..."
  08:26:41  TOOL    exec -> bd close intercom-0q2.1 --reason "Self-assessment posted on intercom-0q2..."
  08:26:46  TOOL    exec -> cd /workspace/intercom && bd export -o .beads/issues.jsonl
                            && git add -A && git push
  08:26:52  TOOL    exec -> bd ready --json
  08:27:01  TEXT    Queue clear. Done.

═══ Summary ═══════════════════════════════════════════════════════
  Duration: 1m 20s
  Wake-ups: 1
  Tool calls: 7

Dev never cloned a repo, never ran tests, never opened a PR. It checked its bead queue, wrote a candid self-assessment with real operational concerns, posted it as a comment on the parent bead, closed its sub-bead, and synced.

Message Flow

Human (demo — Chief of Staff)
│
├─ bd create "Dev team self-assessment" --labels eng-mgr
│  └─ intercom-0q2 (open, P2)
│
├─ Pipeline run #1:
│  ├─ Watcher: WAKE eng-mgr intercom-0q2
│  └─ beads-notify.sh → openclaw agent --agent b4-eng-mgr (automatic)
│
├─ Eng-mgr (51s):
│  ├─ bd show intercom-0q2 → reads inquiry
│  ├─ bd update intercom-0q2 --claim → in_progress
│  └─ bd create "Self-assessment: blockers, confidence" --parent intercom-0q2 --labels dev
│     └─ intercom-0q2.1 (open, label: dev)
│
├─ Pipeline run #2:
│  ├─ Watcher: WAKE dev intercom-0q2.1
│  └─ beads-notify.sh → openclaw agent --agent b4-dev (automatic)
│
├─ Dev (80s):
│  ├─ bd show intercom-0q2.1 → reads request
│  ├─ bd list → checks own workload and bead history
│  ├─ bd comments add intercom-0q2 "SELF-ASSESSMENT — dev-agent..."
│  └─ bd close intercom-0q2.1 --reason "Self-assessment posted"
│     └─ intercom-0q2.1 (closed, status in close_reason)
│
├─ Pipeline run #3: (nothing — parent is in_progress, Gap G-2)
│
├─ Manual re-invocation of eng-mgr (30s):
│  ├─ bd show intercom-0q2 intercom-0q2.1 → reads child close_reason + comment
│  └─ bd close intercom-0q2 --reason "Aggregated status report with action items"
│     └─ intercom-0q2 (closed)
│
└─ Human reads close_reason:
   "CURRENT WORK: Nothing active.
    BLOCKERS: beads-dbb stale, PR#4 unreviewed.
    CONFIDENCE: HIGH execution, MEDIUM DB reliability, LOW beads-dbb."

Summary

| Act | Pipeline Run | Agent | Duration | Action | Result |
|---|---|---|---|---|---|
| 1 | - | (human) | - | Create self-assessment request | `intercom-0q2` open |
| 2 | #1 | eng-mgr | 51s | Auto-woken, claims parent, delegates to dev | `intercom-0q2.1` open |
| 3 | #2 | dev | 80s | Auto-woken, self-assesses, closes sub-bead | `intercom-0q2.1` closed |
| 4 | #3 | - | - | Pipeline finds nothing (Gap G-2) | No wake |
| 4 | (manual) | eng-mgr | 30s | Reads child, aggregates, closes parent | `intercom-0q2` closed |

What This Proves

| Claim | Evidence |
|---|---|
| Beads work for message passing, not just coding tasks | Entire demo has zero code changes, zero PRs, zero git branches |
| Automated pipeline dispatches without human intervention | Acts 2–3 use identical pipeline command (what cron runs every 30 min) |
| Agents respond appropriately to non-coding requests | Dev reports honest self-assessment instead of trying to write code |
| close_reason carries structured information upward | Dev's blockers and confidence levels flow through eng-mgr to the human |
| Agents surface real operational issues unprompted | Dev flagged dolt state loss, config drift, stale beads, none of which were in the task description |
| Session replay reveals agent decision-making | `just replay` shows tool calls, timing, and reasoning |

Unexpected Bonus: Dev Surfaced Real Issues

The self-assessment wasn't canned — dev flagged 4 genuine operational problems:

- beads-dbb stale 48+ hrs: a bead with no clear scope that's blocking dev's queue
- PR#4 unreviewed: modulo operator PR from the Tier 3 coding demo, still open
- bd/dolt state loss: bd ready returns empty on fresh sandbox restart until manual import
- Config drift: SOUL.md/TOOLS.md reference paths that don't exist in the sandbox

These are real findings worth acting on. The message-passing demo accidentally became a useful operational audit.

Gap G-2: Completion Propagation

The most visible gap in this demo: when dev closes the sub-bead, eng-mgr is not automatically notified. The watcher only polls bd ready — in_progress beads are excluded. Today's workaround is manual re-invocation.

Fix options (from docs/intercom.md):

- Watcher tracks parent-child relationships; when a child closes, wake the parent's assignee
- bd list --updated-after <timestamp> to detect state changes since last poll
- A "completion callback" bead convention: child writes a comment on parent before closing

Until G-2 is fixed, the last mile of every delegation chain requires manual intervention or a scheduled status-check cron job per active parent bead.
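One possible shape for the first fix option (the watcher tracking parent-child relationships) is sketched below. It is not taken from docs/intercom.md. It assumes a child's ID is its parent's ID plus a numeric `.<n>` suffix, as with intercom-0q2 and intercom-0q2.1 in this demo; a real implementation would more likely read an explicit parent field if `bd` exposes one.

```python
def parents_ready_to_close(beads):
    """Return IDs of in_progress beads that have children, all of them closed.

    Assumption: a child ID is '<parent-id>.<n>' with a numeric suffix.
    """
    children = {}
    for bead in beads:
        parent_id, dot, suffix = bead["id"].rpartition(".")
        if dot and suffix.isdigit():
            children.setdefault(parent_id, []).append(bead)

    return [
        b["id"]
        for b in beads
        if b["status"] == "in_progress"
        and children.get(b["id"])
        and all(c["status"] == "closed" for c in children[b["id"]])
    ]


# Bead states as reported in this demo after pipeline run #2.
state = [
    {"id": "intercom-8xm", "status": "in_progress"},
    {"id": "intercom-0q2", "status": "in_progress"},
    {"id": "intercom-0q2.1", "status": "closed"},
]
print(parents_ready_to_close(state))  # prints ['intercom-0q2']
```

The watcher could then emit a wake for each returned parent's assignee, replacing the manual re-invocation used in Act 4. intercom-8xm is not returned because it has no children.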

Prompt Engineering Observation

The first attempt (intercom-91x) used a factual status query — eng-mgr answered it directly from bd list without delegating. The successful attempt required subjective information (confidence levels, blockers from dev's perspective) that eng-mgr couldn't extract from bead metadata. Lesson: if you want delegation to happen, ask for something the manager can't answer alone.

Appendix: Token Costs

| Agent | Wake-ups | Duration | Tokens (output) |
|---|---|---|---|
| b4-eng-mgr | 2 | 51s + 30s | 1,443 + 1,041 |
| b4-dev | 1 | 80s | 3,054 |
| Total | 3 | ~2m 41s | ~5,538 output |

The entire status inquiry — from human question to structured answer — cost roughly 5.5k output tokens across 3 agent invocations. The Tier 1 watcher (shell scripts) consumed 0 tokens.
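The totals can be rechecked from the per-invocation figures. The snippet below simply re-adds the numbers reported in the table above; nothing in it is measured independently.

```python
# Per-invocation figures copied from the table above: (agent, seconds, output tokens).
invocations = [
    ("b4-eng-mgr", 51, 1443),
    ("b4-eng-mgr", 30, 1041),
    ("b4-dev", 80, 3054),
]

total_seconds = sum(secs for _, secs, _ in invocations)
total_tokens = sum(tokens for _, _, tokens in invocations)
minutes, rem_seconds = divmod(total_seconds, 60)

print(f"{len(invocations)} invocations, {minutes}m {rem_seconds}s, {total_tokens} output tokens")
# prints: 3 invocations, 2m 41s, 5538 output tokens
```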
