# IPC Doctrine: Hub-and-Spoke Semantics over Sovereign Transport

Status: canonical doctrine for issue #157
Parent: #154
Related migration work:
- [`../son-of-timmy.md`](../son-of-timmy.md) for Timmy's layered communications worldview
- [`nostr_agent_research.md`](nostr_agent_research.md) for one sovereign transport candidate under evaluation

## Why this exists

Timmy is in an ongoing migration toward sovereign transport.
The first question is not which bus wins. The first question is what semantics every bus must preserve.
Those semantics matter more than any one transport.

Telegram is not the target backbone for fleet IPC.
It may exist as a temporary edge or operator convenience while migration is in flight, but the architecture we are building toward must stand on sovereign transport.

This doctrine defines the routing and failure semantics that any transport adapter must honor, whether the carrier is Matrix, Nostr, NATS, or something we have not picked yet.

## Roles

- Coordinator: the only actor allowed to own routing authority for live agent work
- Spoke: an executing agent that receives work, asks for clarification, and returns results
- Durable execution truth: the visible task system of record, which remains authoritative for ownership and state transitions
- Operator: the human principal who can direct the coordinator but is not a transport shim

Timmy world-state stays the same while transport changes:
- Gitea remains visible execution truth
- live IPC accelerates coordination, but does not become a hidden source of authority
- transport migration may change the wire, but not the rules

## Core rules

### 1. Coordinator-first routing

Coordinator-first routing is the default system rule.

- All new work enters through the coordinator
- All reroutes, cancellations, escalations, and cross-agent handoffs go through the coordinator
- A spoke receives assignments from the coordinator and reports back to the coordinator
- A spoke does not mutate the routing graph on its own
- If route intent is ambiguous, the system should fail closed and ask the coordinator instead of guessing a peer path

The coordinator is the hub.
Spokes are not free-roaming routers.

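
The routing rule above can be sketched as a single next-hop decision. This is a minimal illustration, not a prescribed implementation: `COORDINATOR_ID`, `RouteRequest`, and `resolve_next_hop` are hypothetical names, and peer windows (rule 3) are deliberately left out so the default path stays visible.

```python
from dataclasses import dataclass
from typing import Optional

COORDINATOR_ID = "coordinator"  # hypothetical identity, assumed for illustration


@dataclass(frozen=True)
class RouteRequest:
    sender: str
    recipient: Optional[str]  # None models ambiguous route intent


def resolve_next_hop(request: RouteRequest) -> str:
    """Return the actor that should receive this message next."""
    # Only the coordinator may address a spoke directly.
    if request.sender == COORDINATOR_ID and request.recipient:
        return request.recipient
    # Everything else, including ambiguous or peer-addressed traffic,
    # fails closed to the coordinator instead of guessing a peer path.
    return COORDINATOR_ID
```

A spoke that addresses another spoke is therefore routed to the coordinator, which decides whether to forward, reroute, or reject.
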
### 2. Anti-cascade behavior

The system must resist cascade failures and mesh chatter.

- A spoke MUST NOT recursively fan out work to other spokes
- A spoke MUST NOT create hidden side queues or recruit additional agents without coordinator approval
- Broadcasts are coordinator-owned and should be rare, deliberate, and bounded
- Retries must be bounded and idempotent
- Transport adapters must not auto-bridge, auto-replay, or auto-forward in ways that amplify loops or duplicate storms

A worker that encounters new sub-work should escalate back to the coordinator.
It should not become a shadow dispatcher.

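
One way to read "bounded and idempotent" is a fixed attempt limit combined with a single idempotency key reused across attempts. The sketch below assumes a hypothetical `send` callable that returns `True` on acknowledgement; the limit of 3 is an arbitrary placeholder, not a value this doctrine sets.

```python
import uuid
from typing import Callable

MAX_ATTEMPTS = 3  # assumed bound; the doctrine only requires that one exists


def send_with_bounded_retry(
    send: Callable[[str, dict], bool],
    payload: dict,
    max_attempts: int = MAX_ATTEMPTS,
) -> bool:
    """Try to deliver payload at most max_attempts times."""
    # One idempotency key for all attempts, so a receiver that sees the
    # message twice can recognize and drop the duplicate.
    idempotency_key = str(uuid.uuid4())
    for _ in range(max_attempts):
        if send(idempotency_key, payload):
            return True
    # Bound reached: stop retrying and let the caller escalate.
    return False
```

When the function returns `False`, the caller would escalate to the coordinator instead of looping or finding another route on its own.
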
### 3. Limited peer mesh

Direct spoke-to-spoke communication is an exception, not the default.

It is allowed only when the coordinator opens an explicit peer window.
That peer window must define:
- the allowed participants
- the task or correlation ID
- the narrow purpose
- the expiry, timeout, or close condition
- the expected artifact or summary that returns to the coordinator

Peer windows are tightly scoped:
- they are time-bounded
- they are non-transitive
- they do not grant standing routing authority
- they close back to coordinator-first behavior when the declared purpose is complete

Good uses for a peer window:
- artifact handoff between two already-assigned agents
- verifier-to-builder clarification on a bounded review loop
- short-lived data exchange where routing everything through the coordinator would be pure latency

Bad uses for a peer window:
- ad hoc planning rings
- recursive delegation chains
- quorum gossip
- hidden ownership changes
- free-form peer mesh as the normal operating mode

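
The required fields of a peer window map naturally onto a small record with one permission check. The following is a sketch under assumptions: `PeerWindow`, its field names, and the use of epoch seconds for expiry are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PeerWindow:
    participants: FrozenSet[str]  # the allowed participants
    task_id: str                  # the task or correlation ID
    purpose: str                  # the narrow purpose
    expires_at: float             # expiry as epoch seconds (assumed unit)
    expected_artifact: str        # what returns to the coordinator
    closed: bool = False          # set once the declared purpose is complete

    def permits(self, sender: str, recipient: str, task_id: str, now: float) -> bool:
        """True only for in-scope, unexpired traffic between named participants."""
        if self.closed or now >= self.expires_at:
            return False  # time-bounded: expired or closed windows grant nothing
        if task_id != self.task_id:
            return False  # scoped to exactly one task
        # Non-transitive: both ends must be named; a participant cannot
        # extend the window to a third agent.
        return sender != recipient and {sender, recipient} <= self.participants
```

Because the record is frozen, widening a window means the coordinator issues a new one; a spoke cannot edit its own scope.
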
### 4. Transport independence

The doctrine is transport-agnostic on purpose.

NATS, Matrix, Nostr, or a future bus are acceptable only if they preserve the same semantics.
If a transport cannot preserve these semantics, it is not acceptable as the fleet backbone.

A valid transport layer must carry or emulate:
- authenticated sender identity
- intended recipient or bounded scope
- task or work identifier
- correlation identifier
- message type
- timeout or TTL semantics
- acknowledgement or explicit timeout behavior
- idempotency or deduplication signals

Transport choice does not change authority.
Semantics matter more than any one transport.

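
The list of required signals can be read as a transport-neutral envelope that every adapter fills in. The sketch below is an assumption about shape only: `Envelope`, its field names, and `missing_fields` are hypothetical, and verifying the sender identity is left to the adapter and is not modeled here.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Envelope:
    sender_id: str        # authenticated sender identity (verification not shown)
    recipient: str        # intended recipient or bounded scope
    task_id: str          # task or work identifier
    correlation_id: str   # correlation identifier
    message_type: str     # message type
    ttl_seconds: float    # timeout or TTL semantics
    ack_required: bool    # acknowledgement or explicit timeout behavior
    idempotency_key: str  # idempotency or deduplication signal


def missing_fields(envelope: Envelope) -> List[str]:
    """Names of empty string fields, plus ttl_seconds if it is not positive."""
    problems = [
        f.name
        for f in fields(envelope)
        if isinstance(getattr(envelope, f.name), str) and not getattr(envelope, f.name)
    ]
    if envelope.ttl_seconds <= 0:
        problems.append("ttl_seconds")
    return problems
```

An adapter for any carrier could run this check before handing a message to the fleet, rejecting envelopes the carrier cannot fully populate.
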
### 5. Circuit breakers

Every acceptable IPC layer must support circuit-breaker behavior.

At minimum, the system must be able to:
- isolate a noisy or unhealthy spoke
- stop new dispatches onto a failing route
- disable direct peer windows and collapse back to strict hub-and-spoke mode
- stop retrying after a bounded count or deadline
- quarantine duplicate storms, fan-out anomalies, or missing coordinator acknowledgements instead of amplifying them

When a breaker trips, the fallback is slower coordinator-mediated operation over durable machine-readable channels.
It is not a return to hidden relays.
It is not a reason to rebuild the fleet around Telegram.

No human-token fallback patterns:
- do not route agent IPC through personal chat identities
- do not rely on operator copy-paste as a standing transport layer
- do not treat human-owned bot tokens as the resilience plan

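
A coordinator-side breaker covering the first three capabilities might look like the sketch below. Everything here is assumed for illustration: the `RouteBreaker` name, the failure threshold, and in particular the choice to disable peer windows fleet-wide when any one spoke trips, which is one possible policy and not something this doctrine mandates.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RouteBreaker:
    failure_threshold: int = 3  # assumed bound
    failures: Dict[str, int] = field(default_factory=dict)
    isolated: Set[str] = field(default_factory=set)
    peer_windows_enabled: bool = True

    def record_failure(self, spoke_id: str) -> None:
        """Count a failure and trip the breaker once the threshold is hit."""
        self.failures[spoke_id] = self.failures.get(spoke_id, 0) + 1
        if self.failures[spoke_id] >= self.failure_threshold:
            self.trip(spoke_id)

    def trip(self, spoke_id: str) -> None:
        # Isolate the unhealthy spoke and collapse to strict hub-and-spoke.
        self.isolated.add(spoke_id)
        self.peer_windows_enabled = False

    def may_dispatch(self, spoke_id: str) -> bool:
        # No new dispatches onto an isolated route.
        return spoke_id not in self.isolated
```

Resetting the breaker is intentionally absent; in this sketch, reopening a route would be an explicit coordinator or operator decision.
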
## Required message classes

Any transport mapping should preserve these message classes, even if the carrier names differ:

- dispatch
- ack or nack
- status or progress
- clarify or question
- result
- failure or escalation
- control messages such as cancel, pause, resume, open-peer-window, and close-peer-window

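
As an illustration, the classes above could be pinned down as enumerations that each adapter maps its carrier-specific names onto. The names are assumptions: where the list offers alternatives such as "status or progress", this sketch arbitrarily picks one spelling, and `parse_message_class` is a hypothetical helper.

```python
from enum import Enum


class MessageClass(Enum):
    DISPATCH = "dispatch"
    ACK = "ack"
    NACK = "nack"
    STATUS = "status"
    CLARIFY = "clarify"
    RESULT = "result"
    FAILURE = "failure"
    CONTROL = "control"


class ControlAction(Enum):
    CANCEL = "cancel"
    PAUSE = "pause"
    RESUME = "resume"
    OPEN_PEER_WINDOW = "open-peer-window"
    CLOSE_PEER_WINDOW = "close-peer-window"


def parse_message_class(name: str) -> MessageClass:
    """Map a normalized name onto a doctrine class; unknown names raise ValueError."""
    return MessageClass(name.strip().lower())
```

Rejecting unknown names keeps an adapter from quietly inventing a message class the rest of the fleet does not understand.
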
## Failure semantics

When things break, authority should degrade safely.

- If a spoke loses contact with the coordinator, it may finish currently safe local work and persist a checkpoint, but it must not appoint itself as a router
- If a spoke receives an unscoped peer message, it should ignore or quarantine it and report the event to the coordinator when possible
- If delivery is duplicated or reordered, recipients should prefer correlation IDs and idempotency keys over guesswork
- If the live transport is degraded, the system may fall back to slower durable coordination paths, but routing authority remains coordinator-first

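
On the receiving side, the duplicate and unscoped-peer cases can be handled by one inbound check on the spoke. This is a sketch under assumptions: `SpokeInbox`, the dictionary message shape, and the returned labels are hypothetical, and reporting the quarantined event to the coordinator is left out.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

COORDINATOR_ID = "coordinator"  # hypothetical identity, assumed for illustration


@dataclass
class SpokeInbox:
    seen_keys: Set[str] = field(default_factory=set)
    # (peer_id, task_id) pairs for which the coordinator opened a peer window
    open_peer_tasks: Set[Tuple[str, str]] = field(default_factory=set)
    quarantine: List[dict] = field(default_factory=list)

    def accept(self, message: dict) -> str:
        """Classify an inbound message as 'duplicate', 'quarantined', or 'process'."""
        key = message["idempotency_key"]
        if key in self.seen_keys:
            return "duplicate"  # already handled; drop instead of guessing
        self.seen_keys.add(key)
        sender = message["sender"]
        scoped = (sender, message["task_id"]) in self.open_peer_tasks
        if sender != COORDINATOR_ID and not scoped:
            # Unscoped peer traffic is held for the coordinator, never acted on.
            self.quarantine.append(message)
            return "quarantined"
        return "process"
```

Checking the idempotency key first means a replayed unscoped message is quarantined once and then dropped, so a duplicate storm does not grow the quarantine list.
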
## World-state alignment

This doctrine sits above transport selection.
It does not try to settle every Matrix-vs-Nostr-vs-NATS debate inside one file.
It constrains those choices.

Current Timmy alignment:
- sovereign transport migration is ongoing
- Telegram is not the backbone we are building toward
- Matrix remains relevant for human-to-fleet interaction
- Nostr remains relevant as a sovereign option under evaluation
- NATS remains relevant as a strong internal bus candidate
- the semantics stay constant across all of them

If we swap the wire and keep the semantics, the fleet stays coherent.
If we keep the wire and lose the semantics, the fleet regresses into chatter, hidden routing, and cascade failure.