From fc6297efa54e1949b1c62a980d5ef2b3d2628dec Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Sat, 4 Apr 2026 17:20:20 -0400
Subject: [PATCH] docs: define hub-and-spoke IPC doctrine (#157)

---
 README.md | 6 +-
 docs/ipc-hub-and-spoke-doctrine.md | 166 +++++++++++++++++++++++++++++
 2 files changed, 171 insertions(+), 1 deletion(-)
 create mode 100644 docs/ipc-hub-and-spoke-doctrine.md

diff --git a/README.md b/README.md
index a82de6f7..bd64f65f 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,9 @@ timmy-config/
 ├── skins/ ← UI skins (timmy skin)
 ├── playbooks/ ← Agent playbooks (YAML)
 ├── cron/ ← Cron job definitions
-├── docs/automation-inventory.md ← Live automation + stale-state inventory
+├── docs/
+│   ├── automation-inventory.md ← Live automation + stale-state inventory
+│   └── ipc-hub-and-spoke-doctrine.md ← Coordinator-first, transport-agnostic fleet IPC doctrine
 └── training/ ← Transitional training recipes, not canonical lived data
 ```
 
@@ -45,6 +47,8 @@ The scripts in `bin/` are sidecar-managed operational helpers for the Hermes lay
 Do NOT assume older prose about removed loops is still true at runtime.
 Audit the live machine first, then read `docs/automation-inventory.md` for the
 current reality and stale-state risks.
+For fleet routing semantics over sovereign transport, read
+`docs/ipc-hub-and-spoke-doctrine.md`.
 
 ## Orchestration: Huey
 
diff --git a/docs/ipc-hub-and-spoke-doctrine.md b/docs/ipc-hub-and-spoke-doctrine.md
new file mode 100644
index 00000000..d96739e4
--- /dev/null
+++ b/docs/ipc-hub-and-spoke-doctrine.md
@@ -0,0 +1,166 @@
+# IPC Doctrine: Hub-and-Spoke Semantics over Sovereign Transport
+
+Status: canonical doctrine for issue #157
+Parent: #154
+Related migration work:
+- [`../son-of-timmy.md`](../son-of-timmy.md) for Timmy's layered communications worldview
+- [`nostr_agent_research.md`](nostr_agent_research.md) for one sovereign transport candidate under evaluation
+
+## Why this exists
+
+Timmy is in an ongoing migration toward sovereign transport.
+The first question is not which bus wins. The first question is what semantics every bus must preserve.
+Those semantics matter more than any one transport.
+
+Telegram is not the target backbone for fleet IPC.
+It may exist as a temporary edge or operator convenience while migration is in flight, but the architecture we are building toward must stand on sovereign transport.
+
+This doctrine defines the routing and failure semantics that any transport adapter must honor, whether the carrier is Matrix, Nostr, NATS, or something we have not picked yet.
+
+## Roles
+
+- Coordinator: the only actor allowed to own routing authority for live agent work
+- Spoke: an executing agent that receives work, asks for clarification, and returns results
+- Durable execution truth: the visible task system of record, which remains authoritative for ownership and state transitions
+- Operator: the human principal who can direct the coordinator but is not a transport shim
+
+Timmy world-state stays the same while transport changes:
+- Gitea remains visible execution truth
+- live IPC accelerates coordination, but does not become a hidden source of authority
+- transport migration may change the wire, but not the rules
+
+## Core rules
+
+### 1. Coordinator-first routing
+
+Coordinator-first routing is the default system rule.
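A minimal sketch of coordinator-first routing, modeled as a gate that every message passes through. It assumes a single hub identity and uses hypothetical names (`Hub`, `Message`, `RouteRejected`, `COORDINATOR`); the `owner` table is an in-memory stand-in for the durable task system, not a real Gitea integration. New work and reroutes both arrive as coordinator dispatches, a spoke may only address the coordinator about tasks it owns, and every other path fails closed.

```python
from dataclasses import dataclass

COORDINATOR = "coordinator"  # hypothetical identity of the hub


class RouteRejected(Exception):
    """Raised instead of guessing a peer path: the hub fails closed."""


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    kind: str      # e.g. "dispatch", "status", "result", "escalate"
    task_id: str


class Hub:
    """Enforces coordinator-first routing for every message on the wire."""

    def __init__(self, spokes: set[str]) -> None:
        self.spokes = set(spokes)
        # In-memory stand-in for the durable task system of record.
        self.owner: dict[str, str] = {}

    def route(self, msg: Message) -> Message:
        if msg.sender == COORDINATOR:
            if msg.recipient not in self.spokes:
                raise RouteRejected(f"unknown spoke {msg.recipient!r}")
            if msg.kind == "dispatch":
                # New work and reroutes are both coordinator dispatches,
                # so only the hub ever changes who owns a task.
                self.owner[msg.task_id] = msg.recipient
            return msg
        if msg.sender in self.spokes and msg.recipient == COORDINATOR:
            if self.owner.get(msg.task_id) != msg.sender:
                raise RouteRejected(f"{msg.sender} does not own {msg.task_id}")
            return msg
        # Spoke-to-spoke traffic, unknown senders, anything else: fail closed.
        raise RouteRejected(f"{msg.sender} -> {msg.recipient} is not a hub route")


if __name__ == "__main__":
    hub = Hub({"builder", "verifier"})
    hub.route(Message(COORDINATOR, "builder", "dispatch", "t1"))
    hub.route(Message("builder", COORDINATOR, "result", "t1"))
    try:
        hub.route(Message("builder", "verifier", "dispatch", "t1"))
    except RouteRejected as err:
        print(err)  # builder -> verifier is not a hub route
```

A spoke that discovers new sub-work would send an `escalate` message for its own task rather than dispatching to a peer; the coordinator then decides whether to issue a new dispatch.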
+
+- All new work enters through the coordinator
+- All reroutes, cancellations, escalations, and cross-agent handoffs go through the coordinator
+- A spoke receives assignments from the coordinator and reports back to the coordinator
+- A spoke does not mutate the routing graph on its own
+- If route intent is ambiguous, the system should fail closed and ask the coordinator instead of guessing a peer path
+
+The coordinator is the hub.
+Spokes are not free-roaming routers.
+
+### 2. Anti-cascade behavior
+
+The system must resist cascade failures and mesh chatter.
+
+- A spoke MUST NOT recursively fan out work to other spokes
+- A spoke MUST NOT create hidden side queues or recruit additional agents without coordinator approval
+- Broadcasts are coordinator-owned and should be rare, deliberate, and bounded
+- Retries must be bounded and idempotent
+- Transport adapters must not auto-bridge, auto-replay, or auto-forward in ways that amplify loops or duplicate storms
+
+A worker that encounters new sub-work should escalate back to the coordinator.
+It should not become a shadow dispatcher.
+
+### 3. Limited peer mesh
+
+Direct spoke-to-spoke communication is an exception, not the default.
+
+It is allowed only when the coordinator opens an explicit peer window.
+That peer window must define:
+- the allowed participants
+- the task or correlation ID
+- the narrow purpose
+- the expiry, timeout, or close condition
+- the expected artifact or summary that returns to the coordinator
+
+Peer windows are tightly scoped:
+- they are time-bounded
+- they are non-transitive
+- they do not grant standing routing authority
+- they close back to coordinator-first behavior when the declared purpose is complete
+
+Good uses for a peer window:
+- artifact handoff between two already-assigned agents
+- verifier-to-builder clarification on a bounded review loop
+- short-lived data exchange where routing everything through the coordinator would be pure latency
+
+Bad uses for a peer window:
+- ad hoc planning rings
+- recursive delegation chains
+- quorum gossip
+- hidden ownership changes
+- free-form peer mesh as the normal operating mode
+
+### 4. Transport independence
+
+The doctrine is transport-agnostic on purpose.
+
+NATS, Matrix, Nostr, or a future bus are acceptable only if they preserve the same semantics.
+If a transport cannot preserve these semantics, it is not acceptable as the fleet backbone.
+
+A valid transport layer must carry or emulate:
+- authenticated sender identity
+- intended recipient or bounded scope
+- task or work identifier
+- correlation identifier
+- message type
+- timeout or TTL semantics
+- acknowledgement or explicit timeout behavior
+- idempotency or deduplication signals
+
+Transport choice does not change authority.
+Semantics matter more than any one transport.
+
+### 5. Circuit breakers
+
+Every acceptable IPC layer must support circuit-breaker behavior.
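A minimal sketch of circuit-breaker behavior for a single route, assuming a consecutive-failure threshold, a cooldown, and bounded retries. `RouteBreaker`, `dispatch`, `BreakerOpen`, `Fleet`, and the default thresholds are hypothetical; the doctrine does not prescribe them. `Fleet.peer_windows_enabled` mirrors the doctrine's requirement to disable direct peer windows and collapse back to strict hub-and-spoke mode while any route is isolated.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class BreakerOpen(Exception):
    """The route is isolated; new dispatches onto it must stop."""


@dataclass
class RouteBreaker:
    failure_threshold: int = 3   # consecutive failures before the route opens (assumed)
    cooldown_s: float = 30.0     # how long an open route stays isolated (assumed)
    clock: Callable[[], float] = time.monotonic
    failures: int = 0
    opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        return self.opened_at is not None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: close tentatively, but one more failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()


def dispatch(breaker: RouteBreaker, send: Callable[[], bool], max_attempts: int = 3) -> bool:
    """Retry a send a bounded number of times, never past an open breaker."""
    for _ in range(max_attempts):
        if not breaker.allow():
            raise BreakerOpen("route isolated; fall back to coordinator-mediated path")
        ok = send()
        breaker.record(ok)
        if ok:
            return True
    return False  # bounded: stop here and let the caller escalate to the coordinator


@dataclass
class Fleet:
    breakers: dict[str, RouteBreaker] = field(default_factory=dict)

    def breaker(self, spoke: str) -> RouteBreaker:
        return self.breakers.setdefault(spoke, RouteBreaker())

    @property
    def peer_windows_enabled(self) -> bool:
        # Any open breaker collapses the fleet to strict hub-and-spoke mode.
        return not any(b.is_open for b in self.breakers.values())
```

The tentative-close rule and the numeric thresholds are one common breaker design, chosen here for brevity; the doctrine only requires that isolation, bounded retries, and the collapse to hub-and-spoke mode exist.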
+
+At minimum, the system must be able to:
+- isolate a noisy or unhealthy spoke
+- stop new dispatches onto a failing route
+- disable direct peer windows and collapse back to strict hub-and-spoke mode
+- stop retrying after a bounded count or deadline
+- quarantine duplicate storms, fan-out anomalies, or missing coordinator acknowledgements instead of amplifying them
+
+When a breaker trips, the fallback is slower coordinator-mediated operation over durable machine-readable channels.
+It is not a return to hidden relays.
+It is not a reason to rebuild the fleet around Telegram.
+
+No human-token fallback patterns:
+- do not route agent IPC through personal chat identities
+- do not rely on operator copy-paste as a standing transport layer
+- do not treat human-owned bot tokens as the resilience plan
+
+## Required message classes
+
+Any transport mapping should preserve these message classes, even if the carrier names differ:
+
+- dispatch
+- ack or nack
+- status or progress
+- clarify or question
+- result
+- failure or escalation
+- control messages such as cancel, pause, resume, open-peer-window, and close-peer-window
+
+## Failure semantics
+
+When things break, authority should degrade safely.
+
+- If a spoke loses contact with the coordinator, it may finish currently safe local work and persist a checkpoint, but it must not appoint itself as a router
+- If a spoke receives an unscoped peer message, it should ignore or quarantine it and report the event to the coordinator when possible
+- If delivery is duplicated or reordered, recipients should prefer correlation IDs and idempotency keys over guesswork
+- If the live transport is degraded, the system may fall back to slower durable coordination paths, but routing authority remains coordinator-first
+
+## World-state alignment
+
+This doctrine sits above transport selection.
+It does not try to settle every Matrix-vs-Nostr-vs-NATS debate inside one file.
+It constrains those choices.
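One way to make these constraints concrete is a wire-independent envelope plus a receiver-side guard that any adapter, whether Matrix, Nostr, or NATS, would have to feed. This Python sketch combines the fields a transport must carry or emulate (rule 4), the required message classes, and the failure semantics for expired messages, unscoped peer messages, and duplicate deliveries. All names (`Kind`, `Envelope`, `PeerWindow`, `Inbox`), the enum spellings, and the in-memory dedup set are hypothetical; sender authentication is assumed to happen inside the adapter before `accept` is called.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    # Message classes the doctrine requires; the spellings are assumptions.
    DISPATCH = "dispatch"
    ACK = "ack"
    NACK = "nack"
    STATUS = "status"
    CLARIFY = "clarify"
    RESULT = "result"
    FAILURE = "failure"
    ESCALATE = "escalate"
    CANCEL = "cancel"                       # control messages from here down
    PAUSE = "pause"
    RESUME = "resume"
    OPEN_PEER_WINDOW = "open-peer-window"
    CLOSE_PEER_WINDOW = "close-peer-window"


@dataclass(frozen=True)
class Envelope:
    sender: str           # authenticated identity (assumed verified by the adapter)
    recipient: str        # intended recipient or bounded scope
    task_id: str          # work identifier in the durable task system
    correlation_id: str
    kind: Kind
    expires_at: float     # TTL expressed as an absolute deadline
    idempotency_key: str
    body: str = ""

    def __post_init__(self) -> None:
        for name in ("sender", "recipient", "task_id", "correlation_id", "idempotency_key"):
            if not getattr(self, name):
                raise ValueError(f"envelope is missing {name}")


@dataclass(frozen=True)
class PeerWindow:
    participants: frozenset[str]
    correlation_id: str
    expires_at: float

    def covers(self, env: Envelope, now: float) -> bool:
        return (now < self.expires_at
                and env.correlation_id == self.correlation_id
                and {env.sender, env.recipient} <= self.participants)


class Inbox:
    """Receiver-side guard for one spoke, independent of the wire format."""

    def __init__(self, owner: str, coordinator: str = "coordinator") -> None:
        self.owner = owner
        self.coordinator = coordinator
        self.windows: list[PeerWindow] = []     # opened by the coordinator only
        self.quarantined: list[Envelope] = []   # to be reported to the coordinator
        self._seen: set[str] = set()

    def accept(self, env: Envelope, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if env.recipient != self.owner or now >= env.expires_at:
            return False                        # misaddressed or expired: do not act
        scoped = env.sender == self.coordinator or any(
            w.covers(env, now) for w in self.windows)
        if not scoped:
            self.quarantined.append(env)        # unscoped peer message: hold, do not act
            return False
        if env.idempotency_key in self._seen:
            return False                        # duplicate delivery: already handled
        self._seen.add(env.idempotency_key)
        return True
```

In this sketch a window would be appended only after an `open-peer-window` control message from the coordinator; parsing that message is omitted. A real deployment would likely also need to bound the dedup set, for example by pruning keys whose envelopes have expired.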
+
+Current Timmy alignment:
+- sovereign transport migration is ongoing
+- Telegram is not the backbone we are building toward
+- Matrix remains relevant for human-to-fleet interaction
+- Nostr remains relevant as a sovereign option under evaluation
+- NATS remains relevant as a strong internal bus candidate
+- the semantics stay constant across all of them
+
+If we swap the wire and keep the semantics, the fleet stays coherent.
+If we keep the wire and lose the semantics, the fleet regresses into chatter, hidden routing, and cascade failure.
-- 
2.43.0