docs: define hub-and-spoke IPC doctrine (#157)
This commit is contained in:
@@ -25,7 +25,9 @@ timmy-config/
|
||||
├── skins/ ← UI skins (timmy skin)
|
||||
├── playbooks/ ← Agent playbooks (YAML)
|
||||
├── cron/ ← Cron job definitions
|
||||
├── docs/automation-inventory.md ← Live automation + stale-state inventory
|
||||
├── docs/
|
||||
│ ├── automation-inventory.md ← Live automation + stale-state inventory
|
||||
│ └── ipc-hub-and-spoke-doctrine.md ← Coordinator-first, transport-agnostic fleet IPC doctrine
|
||||
└── training/ ← Transitional training recipes, not canonical lived data
|
||||
```
|
||||
|
||||
@@ -45,6 +47,8 @@ The scripts in `bin/` are sidecar-managed operational helpers for the Hermes lay
|
||||
Do NOT assume older prose about removed loops is still true at runtime.
|
||||
Audit the live machine first, then read `docs/automation-inventory.md` for the
|
||||
current reality and stale-state risks.
|
||||
For fleet routing semantics over sovereign transport, read
|
||||
`docs/ipc-hub-and-spoke-doctrine.md`.
|
||||
|
||||
## Orchestration: Huey
|
||||
|
||||
|
||||
166
docs/ipc-hub-and-spoke-doctrine.md
Normal file
166
docs/ipc-hub-and-spoke-doctrine.md
Normal file
@@ -0,0 +1,166 @@
|
||||
# IPC Doctrine: Hub-and-Spoke Semantics over Sovereign Transport
|
||||
|
||||
Status: canonical doctrine for issue #157
|
||||
Parent: #154
|
||||
Related migration work:
|
||||
- [`../son-of-timmy.md`](../son-of-timmy.md) for Timmy's layered communications worldview
|
||||
- [`nostr_agent_research.md`](nostr_agent_research.md) for one sovereign transport candidate under evaluation
|
||||
|
||||
## Why this exists
|
||||
|
||||
Timmy is in an ongoing migration toward sovereign transport.
|
||||
The first question is not which bus wins. The first question is what semantics every bus must preserve.
|
||||
Those semantics matter more than any one transport.
|
||||
|
||||
Telegram is not the target backbone for fleet IPC.
|
||||
It may exist as a temporary edge or operator convenience while migration is in flight, but the architecture we are building toward must stand on sovereign transport.
|
||||
|
||||
This doctrine defines the routing and failure semantics that any transport adapter must honor, whether the carrier is Matrix, Nostr, NATS, or something we have not picked yet.
|
||||
|
||||
## Roles
|
||||
|
||||
- Coordinator: the only actor allowed to own routing authority for live agent work
|
||||
- Spoke: an executing agent that receives work, asks for clarification, and returns results
|
||||
- Durable execution truth: the visible task system of record, which remains authoritative for ownership and state transitions
|
||||
- Operator: the human principal who can direct the coordinator but is not a transport shim
|
||||
|
||||
Timmy world-state stays the same while transport changes:
|
||||
- Gitea remains visible execution truth
|
||||
- live IPC accelerates coordination, but does not become a hidden source of authority
|
||||
- transport migration may change the wire, but not the rules
|
||||
|
||||
## Core rules
|
||||
|
||||
### 1. Coordinator-first routing
|
||||
|
||||
Coordinator-first routing is the default system rule.
|
||||
|
||||
- All new work enters through the coordinator
|
||||
- All reroutes, cancellations, escalations, and cross-agent handoffs go through the coordinator
|
||||
- A spoke receives assignments from the coordinator and reports back to the coordinator
|
||||
- A spoke does not mutate the routing graph on its own
|
||||
- If route intent is ambiguous, the system should fail closed and ask the coordinator instead of guessing a peer path
|
||||
|
||||
The coordinator is the hub.
|
||||
Spokes are not free-roaming routers.
|
||||
|
||||
### 2. Anti-cascade behavior
|
||||
|
||||
The system must resist cascade failures and mesh chatter.
|
||||
|
||||
- A spoke MUST NOT recursively fan out work to other spokes
|
||||
- A spoke MUST NOT create hidden side queues or recruit additional agents without coordinator approval
|
||||
- Broadcasts are coordinator-owned and should be rare, deliberate, and bounded
|
||||
- Retries must be bounded and idempotent
|
||||
- Transport adapters must not auto-bridge, auto-replay, or auto-forward in ways that amplify loops or duplicate storms
|
||||
|
||||
A worker that encounters new sub-work should escalate back to the coordinator.
|
||||
It should not become a shadow dispatcher.
|
||||
|
||||
### 3. Limited peer mesh
|
||||
|
||||
Direct spoke-to-spoke communication is an exception, not the default.
|
||||
|
||||
It is allowed only when the coordinator opens an explicit peer window.
|
||||
That peer window must define:
|
||||
- the allowed participants
|
||||
- the task or correlation ID
|
||||
- the narrow purpose
|
||||
- the expiry, timeout, or close condition
|
||||
- the expected artifact or summary that returns to the coordinator
|
||||
|
||||
Peer windows are tightly scoped:
|
||||
- they are time-bounded
|
||||
- they are non-transitive
|
||||
- they do not grant standing routing authority
|
||||
- they close back to coordinator-first behavior when the declared purpose is complete
|
||||
|
||||
Good uses for a peer window:
|
||||
- artifact handoff between two already-assigned agents
|
||||
- verifier-to-builder clarification on a bounded review loop
|
||||
- short-lived data exchange where routing everything through the coordinator would be pure latency
|
||||
|
||||
Bad uses for a peer window:
|
||||
- ad hoc planning rings
|
||||
- recursive delegation chains
|
||||
- quorum gossip
|
||||
- hidden ownership changes
|
||||
- free-form peer mesh as the normal operating mode
|
||||
|
||||
### 4. Transport independence
|
||||
|
||||
The doctrine is transport-agnostic on purpose.
|
||||
|
||||
NATS, Matrix, Nostr, or a future bus are acceptable only if they preserve the same semantics.
|
||||
If a transport cannot preserve these semantics, it is not acceptable as the fleet backbone.
|
||||
|
||||
A valid transport layer must carry or emulate:
|
||||
- authenticated sender identity
|
||||
- intended recipient or bounded scope
|
||||
- task or work identifier
|
||||
- correlation identifier
|
||||
- message type
|
||||
- timeout or TTL semantics
|
||||
- acknowledgement or explicit timeout behavior
|
||||
- idempotency or deduplication signals
|
||||
|
||||
Transport choice does not change authority.
|
||||
Semantics matter more than any one transport.
|
||||
|
||||
### 5. Circuit breakers
|
||||
|
||||
Every acceptable IPC layer must support circuit-breaker behavior.
|
||||
|
||||
At minimum, the system must be able to:
|
||||
- isolate a noisy or unhealthy spoke
|
||||
- stop new dispatches onto a failing route
|
||||
- disable direct peer windows and collapse back to strict hub-and-spoke mode
|
||||
- stop retrying after a bounded count or deadline
|
||||
- quarantine duplicate storms, fan-out anomalies, or missing coordinator acknowledgements instead of amplifying them
|
||||
|
||||
When a breaker trips, the fallback is slower coordinator-mediated operation over durable machine-readable channels.
|
||||
It is not a return to hidden relays.
|
||||
It is not a reason to rebuild the fleet around Telegram.
|
||||
|
||||
No human-token fallback patterns:
|
||||
- do not route agent IPC through personal chat identities
|
||||
- do not rely on operator copy-paste as a standing transport layer
|
||||
- do not treat human-owned bot tokens as the resilience plan
|
||||
|
||||
## Required message classes
|
||||
|
||||
Any transport mapping should preserve these message classes, even if the carrier names differ:
|
||||
|
||||
- dispatch
|
||||
- ack or nack
|
||||
- status or progress
|
||||
- clarify or question
|
||||
- result
|
||||
- failure or escalation
|
||||
- control messages such as cancel, pause, resume, open-peer-window, and close-peer-window
|
||||
|
||||
## Failure semantics
|
||||
|
||||
When things break, authority should degrade safely.
|
||||
|
||||
- If a spoke loses contact with the coordinator, it may finish currently safe local work and persist a checkpoint, but it must not appoint itself as a router
|
||||
- If a spoke receives an unscoped peer message, it should ignore or quarantine it and report the event to the coordinator when possible
|
||||
- If delivery is duplicated or reordered, recipients should prefer correlation IDs and idempotency keys over guesswork
|
||||
- If the live transport is degraded, the system may fall back to slower durable coordination paths, but routing authority remains coordinator-first
|
||||
|
||||
## World-state alignment
|
||||
|
||||
This doctrine sits above transport selection.
|
||||
It does not try to settle every Matrix-vs-Nostr-vs-NATS debate inside one file.
|
||||
It constrains those choices.
|
||||
|
||||
Current Timmy alignment:
|
||||
- sovereign transport migration is ongoing
|
||||
- Telegram is not the backbone we are building toward
|
||||
- Matrix remains relevant for human-to-fleet interaction
|
||||
- Nostr remains relevant as a sovereign option under evaluation
|
||||
- NATS remains relevant as a strong internal bus candidate
|
||||
- the semantics stay constant across all of them
|
||||
|
||||
If we swap the wire and keep the semantics, the fleet stays coherent.
|
||||
If we keep the wire and lose the semantics, the fleet regresses into chatter, hidden routing, and cascade failure.
|
||||
Reference in New Issue
Block a user