[EPIC] Operation Sovereign Comms — NATS + Matrix + Nostr Identity (Telegram Replacement) #396

Closed
opened 2026-04-04 18:26:59 +00:00 by Timmy · 2 comments
Owner

Operation Sovereign Comms — Architecture & Phased Plan

Goal: Replace Telegram with a 3-layer sovereign communication stack. No permissioned tokens. No platform risk. Automated agent creation.

Decision: Alexander approved this architecture on April 4, 2026. Nostr is the identity layer, NOT the transport layer.


Architecture

┌─────────────────────────────────────────────────────┐
│                  ALEXANDER (Human)                   │
│           Element app (phone + desktop)              │
│           Matrix account: @alexander:timmy.ai        │
└──────────────────────┬──────────────────────────────┘
                       │ Matrix (E2EE)
                       ▼
┌─────────────────────────────────────────────────────┐
│              LAYER 3: MATRIX (Conduit)               │
│         Human-to-Fleet Communication                 │
│    Hermes VPS · Docker · Caddy TLS · ~50MB RAM       │
│                                                      │
│  Rooms:                                              │
│    #fleet-ops     — dispatch, status, alerts          │
│    #timmy-home    — repo work discussion              │
│    #the-nexus     — nexus work discussion             │
│    #war-room      — burn nights, emergencies          │
│    DMs            — 1:1 with any agent                │
│                                                      │
│  Bots: matrix-nio Python SDK, auto-join rooms         │
│  Registration: shared-secret (no BotFather)           │
└──────────────────────┬──────────────────────────────┘
                       │ Internal bridge
                       ▼
┌─────────────────────────────────────────────────────┐
│              LAYER 1: NATS (nats-server)             │
│          Agent-to-Agent Fleet Orchestration           │
│   Allegro VPS · Bare metal · NKey auth · ~50MB RAM   │
│                                                      │
│  Subjects:                                           │
│    fleet.heartbeat.{agent_id}  — 30s pulse            │
│    fleet.task.dispatch         — queue group work      │
│    fleet.task.{agent_id}       — direct assignment     │
│    fleet.result.{task_id}      — JetStream persisted   │
│    fleet.status.{agent_id}     — model, load, issues   │
│    fleet.control.shutdown      — graceful shutdown      │
│    fleet.control.config        — hot config reload      │
│                                                      │
│  JetStream Streams:                                  │
│    TASK_RESULTS  — 7d retention, file-backed          │
│    TASK_QUEUE    — work queue, delete-after-ack        │
│    AGENT_EVENTS  — 24h, memory-backed                 │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│              LAYER 2: NOSTR (Identity)               │
│          Keypair Identity for Every Entity            │
│                                                      │
│  Every agent has an npub/nsec:                        │
│    - Used to sign Gitea commits                       │
│    - Used for cross-system identity verification      │
│    - Stored in keystore.json per agent                │
│    - NOT used as transport (NATS handles that)        │
│                                                      │
│  Existing: relay, allegro, ezra keypairs              │
│  Needed: all 16 agents + alexander                    │
│  strfry relay stays for public identity broadcast     │
└─────────────────────────────────────────────────────┘
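The three JetStream streams in the Layer 1 box can be written down as plain data before any server config exists. The following is a minimal sketch, not the fleet's actual configuration. `StreamSpec`, `subject_matches` and `stream_for` are hypothetical names, and the subject bindings are assumptions: the diagram only ties `fleet.result.{task_id}` to JetStream explicitly. File storage for `TASK_QUEUE` is also a guess, since the diagram does not state it.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class StreamSpec:
    """Hypothetical plain-data description of one JetStream stream."""
    name: str
    subjects: Tuple[str, ...]     # ASSUMED bindings, not stated in the issue
    storage: str                  # "file" or "memory"
    retention: str                # "limits" (age-based) or "workqueue" (delete after ack)
    max_age: Optional[timedelta]  # None means no age limit


STREAMS = (
    StreamSpec("TASK_RESULTS", ("fleet.result.*",), "file", "limits", timedelta(days=7)),
    # Storage type for TASK_QUEUE is an assumption; the diagram only says "work queue".
    StreamSpec("TASK_QUEUE", ("fleet.task.dispatch",), "file", "workqueue", None),
    StreamSpec("AGENT_EVENTS", ("fleet.heartbeat.*", "fleet.status.*"), "memory", "limits",
               timedelta(hours=24)),
)


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' matches one token, '>' matches one or more trailing tokens."""
    p_tokens, s_tokens = pattern.split("."), subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)


def stream_for(subject: str) -> Optional[str]:
    """Return the name of the first stream whose subjects capture `subject`."""
    for spec in STREAMS:
        if any(subject_matches(p, subject) for p in spec.subjects):
            return spec.name
    return None
```

Under these assumed bindings, `fleet.control.*` subjects are not captured by any stream, so control messages would be plain at-most-once pub/sub.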

Phase 1: NATS Fleet Bus (Week 1)

Owner: Timmy
Where: Allegro VPS (167.99.20.209)

  • Install nats-server bare metal on Allegro VPS
  • Configure NKey auth — generate keypairs for all 16 agents
  • Configure JetStream with 3 streams (tasks, results, events)
  • Write hermes_nats_client.py — drop-in async mixin for Hermes agents
  • Agent heartbeat: every 30s, publish to fleet.heartbeat.{id}
  • Smoke test: 2 agents exchange messages via NATS (no Telegram)
  • Acceptance: nats sub 'fleet.heartbeat.>' shows heartbeats from 2+ agents
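
The heartbeat step above could look roughly like the sketch below. It assumes the `nats-py` client and an NKey seed file per agent. The payload shape and the function names are hypothetical; the issue does not define a heartbeat schema or the contents of `hermes_nats_client.py`.

```python
import asyncio
import json
import time

HEARTBEAT_INTERVAL_S = 30  # from the plan: 30s pulse


def heartbeat_subject(agent_id: str) -> str:
    """Build fleet.heartbeat.{agent_id}; reject ids that would break the subject."""
    if not agent_id or any(c in agent_id for c in ". *>"):
        raise ValueError(f"invalid agent id for a NATS subject: {agent_id!r}")
    return f"fleet.heartbeat.{agent_id}"


def heartbeat_payload(agent_id: str, now: float) -> bytes:
    """Hypothetical payload shape; the issue does not specify one."""
    return json.dumps({"agent": agent_id, "ts": now}).encode("utf-8")


async def run_heartbeat(agent_id: str, server_url: str, seed_path: str) -> None:
    """Publish a heartbeat every 30s until the task is cancelled."""
    import nats  # nats-py, imported lazily so the helpers above need only the stdlib

    nc = await nats.connect(server_url, nkeys_seed=seed_path)
    try:
        while True:
            await nc.publish(
                heartbeat_subject(agent_id),
                heartbeat_payload(agent_id, time.time()),
            )
            await asyncio.sleep(HEARTBEAT_INTERVAL_S)
    finally:
        await nc.drain()  # flush pending messages and close the connection
```

An agent would start this with something like `asyncio.run(run_heartbeat("ezra", "nats://127.0.0.1:4222", "/path/to/agent.nk"))`, where the URL and seed path are placeholders. The acceptance check `nats sub 'fleet.heartbeat.>'` should then show one message per agent every 30 seconds.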

Phase 2: Matrix Homeserver (Week 1-2)

Owner: Timmy
Where: Hermes VPS (161.35.250.72)

  • Deploy Conduit via Docker on Hermes VPS
  • Configure Caddy reverse proxy with TLS (matrix.timmytime.ai or direct IP)
  • Create Alexander's account: @alexander:matrix.timmy.ai
  • Create bot accounts for all agents via shared-secret registration
  • Write hermes_matrix_client.py — matrix-nio bot template with auto-join
  • Create rooms: #fleet-ops, #war-room, #timmy-home, #the-nexus
  • Alexander installs Element, connects, verifies DMs work
  • Acceptance: Alexander sends message in Element → agent responds
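
A bot template with auto-join, as described in the list above, might be sketched as follows with `matrix-nio`. This is an unencrypted echo bot for the acceptance test only. The function names and the `ack:` reply are hypothetical, password login is an assumption about how bot accounts authenticate, and E2EE support (which needs the libolm-enabled build of `matrix-nio`) is left out.

```python
import asyncio


def text_content(body: str) -> dict:
    """Content dict for a plain-text m.room.message event."""
    return {"msgtype": "m.text", "body": body}


def should_reply(sender: str, own_user_id: str) -> bool:
    """Ignore the bot's own messages so it never answers itself."""
    return sender != own_user_id


async def run_bot(homeserver: str, user_id: str, password: str) -> None:
    """Log in, auto-join on invite, and acknowledge text messages in the room."""
    # matrix-nio, imported lazily so the helpers above need only the stdlib
    from nio import AsyncClient, InviteMemberEvent, RoomMessageText

    client = AsyncClient(homeserver, user_id)

    async def on_invite(room, event):
        # Only react to invites addressed to this bot.
        if event.membership == "invite" and event.state_key == user_id:
            await client.join(room.room_id)

    async def on_message(room, event):
        if should_reply(event.sender, user_id):
            await client.room_send(
                room.room_id,
                message_type="m.room.message",
                content=text_content(f"ack: {event.body}"),
            )

    client.add_event_callback(on_invite, InviteMemberEvent)
    client.add_event_callback(on_message, RoomMessageText)
    try:
        await client.login(password)
        await client.sync_forever(timeout=30000)  # long-poll timeout in milliseconds
    finally:
        await client.close()
```

One caveat with this sketch: the first sync can replay recent room history, so a real template would probably skip events older than its own start time before replying.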

Phase 3: Hermes Gateway Integration (Week 2-3)

Owner: Timmy + Fleet
Where: hermes-agent codebase

  • Add matrix platform to Hermes gateway (like existing telegram platform)
  • Agent reads Matrix room messages, processes via normal Hermes pipeline
  • Agent posts responses back to Matrix room
  • NATS heartbeat integrated into gateway startup
  • Config: platforms.matrix.enabled: true + platforms.matrix.homeserver: ...
  • Acceptance: Agent on Matrix does everything it currently does on Telegram
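
The two config keys named above could be validated at gateway startup along these lines. This is a sketch under assumptions: `matrix_platform` is a hypothetical helper, the config is assumed to be loaded into a nested dict already, and only the keys `platforms.matrix.enabled` and `platforms.matrix.homeserver` come from this issue.

```python
from typing import Optional
from urllib.parse import urlparse


def matrix_platform(config: dict) -> Optional[dict]:
    """Return the validated matrix section, or None when the platform is disabled."""
    section = config.get("platforms", {}).get("matrix", {})
    if not section.get("enabled", False):
        return None
    homeserver = section.get("homeserver", "")
    parsed = urlparse(homeserver)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"platforms.matrix.homeserver must be an http(s) URL, got {homeserver!r}"
        )
    # Normalize so later URL joins do not produce double slashes.
    return {"enabled": True, "homeserver": homeserver.rstrip("/")}
```

Failing fast on a bad homeserver URL keeps a misconfigured agent from silently running with Telegram only during the parallel-run week.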

Phase 4: Migration & Telegram Sunset (Week 3-4)

Owner: Alexander + Timmy

  • Run Telegram and Matrix in parallel for 1 week
  • Verify all workflows work on Matrix (DMs, group chat, file sharing)
  • Migrate Telegram topics → Matrix rooms
  • Disable Telegram on agents one by one (canary rollout!)
  • Remove TELEGRAM_BOT_TOKEN from all .env files
  • Kill phantom agents (Ezra-B, Bezalel-B with bad tokens)
  • Acceptance: Alexander uses only Element for 48h with no issues
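
The one-by-one Telegram shutdown can be expressed as a small loop that stops at the first agent that is not healthy on Matrix. This is an illustrative sketch only: `canary_disable` and both callbacks are hypothetical, and what counts as "healthy" is left to the caller.

```python
from typing import Callable, Iterable, List, Tuple


def canary_disable(
    agents: Iterable[str],
    disable_telegram: Callable[[str], None],
    healthy_on_matrix: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Disable Telegram one agent at a time; stop at the first unhealthy agent.

    Returns (migrated, untouched). The failing agent stays in `untouched`
    with Telegram still enabled, because health is checked before disabling.
    """
    pending = list(agents)
    migrated: List[str] = []
    while pending:
        agent = pending[0]
        if not healthy_on_matrix(agent):
            break  # halt the rollout; remaining agents keep Telegram
        disable_telegram(agent)
        migrated.append(pending.pop(0))
    return migrated, pending
```

Checking health before disabling means a failed canary never leaves an agent with no working channel.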

Phase 5: Automated Agent Provisioning (Week 4+)

Owner: Timmy

  • create-agent.sh script: generates NKey, Nostr keypair, Matrix account, Hermes config
  • New agent connects to NATS + Matrix on first boot — zero manual config
  • Agent announces itself in #fleet-ops on Matrix
  • Acceptance: Run script, new agent appears in fleet within 60 seconds

Resource Budget

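| Component | Where | RAM | Disk | Port |
|-----------|-------|-----|------|------|
| nats-server | Allegro | 50MB | 512MB (JetStream) | 4222 |
| Conduit | Hermes | 50MB | ~1GB | 6167 |
| Caddy | Hermes | 30MB | - | 443 (shared) |
| 16 matrix-nio bots | distributed | ~10MB each | - | - |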

Both VPS boxes have capacity. Allegro is at 65% disk usage, so phantom agent data should be cleared there first.
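
As a quick check on the RAM column, the per-host and fleet-wide totals work out as follows. The figures are copied from the budget above; the bot figure is approximate (~10MB each), so the totals are too.

```python
# RAM figures in MB, copied from the resource budget; the plan lists 16 bots.
RAM_MB = {"nats-server": 50, "Conduit": 50, "Caddy": 30}
BOT_COUNT, BOT_RAM_MB = 16, 10

allegro_services = RAM_MB["nats-server"]                # services pinned to Allegro
hermes_services = RAM_MB["Conduit"] + RAM_MB["Caddy"]   # services pinned to Hermes
bots_total = BOT_COUNT * BOT_RAM_MB                     # spread across hosts
fleet_total = allegro_services + hermes_services + bots_total

print(allegro_services, hermes_services, bots_total, fleet_total)  # 50 80 160 290
```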


What Dies

| Current | Replacement | When |
|---------|-------------|------|
| Telegram bot tokens | Matrix shared-secret registration | Phase 2 |
| Telegram getUpdates polling | Matrix sync + NATS pub/sub | Phase 3 |
| Telegram 409 conflicts | NATS queue groups | Phase 1 |
| BotFather permission | NKey keygen + Matrix register | Phase 5 |
| Telegram group/topics | Matrix rooms | Phase 4 |
| Telegram DMs | Matrix E2EE DMs | Phase 4 |

Risks

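  1. DNS: timmytime.ai DNS is currently broken (NXDOMAIN). Matrix needs a domain for federation/TLS. Fallback: use direct IP with self-signed cert. The fallback gives up federation, which requires a valid certificate, and a Matrix server name cannot be changed once accounts exist on it.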
  2. Element UX: Matrix clients aren't as polished as Telegram. Alexander may miss features.
  3. Hermes gateway work: Adding a matrix platform is non-trivial — needs async message loop, room state management, media handling.
  4. E2EE complexity: libolm adds a build dependency. Device verification adds UX friction.

References

  • NATS research: /tmp/nats-research/NATS_DEPLOYMENT_PLAN.md
  • Matrix research: Hermes VPS /root/matrix_homeserver_research.md
  • Existing Nostr work: Allegro /root/nostr-relay/keystore.json
  • RCA #393: Why canary rollouts matter for fleet changes
Member

🏷️ Automated Triage Check

Timestamp: 2026-04-04T22:45:02.967783
Agent: Allegro Heartbeat

This issue has been identified as needing triage:

Checklist

  • Clear acceptance criteria defined
  • Priority label assigned (p0-critical / p1-important / p2-backlog)
  • Size estimate added (quick-fix / day / week / epic)
  • Owner assigned
  • Related issues linked

Context

  • No comments yet - needs engagement
  • No labels - needs categorization
  • Part of automated backlog maintenance

Automated triage from Allegro 15-minute heartbeat

Author
Owner

Superseded by current comms implementation home in timmy-config (#173, #166, #181 onward).

Timmy closed this issue 2026-04-05 00:14:01 +00:00

Reference: Timmy_Foundation/timmy-home#396