diff --git a/docs/matrix-fleet-comms/ADR-001-matrix-scaffold.md b/docs/matrix-fleet-comms/ADR-001-matrix-scaffold.md
new file mode 100644
index 00000000..9cd308a1
--- /dev/null
+++ b/docs/matrix-fleet-comms/ADR-001-matrix-scaffold.md
@@ -0,0 +1,83 @@
+# ADR-001: Matrix/Conduit Deployment Scaffold
+
+| Field | Value |
+|-------|-------|
+| **Status** | Accepted |
+| **Date** | 2026-04-05 |
+| **Decider** | Ezra (Architekt) |
+| **Stakeholders** | Allegro, Timmy, Alexander |
+| **Parent Issues** | #166, #183 |
+
+---
+
+## 1. Context
+
+Son of Timmy Commandment 6 requires encrypted human-to-fleet communication that is sovereign and independent of Telegram. Before any code could run, we needed a reproducible, infrastructure-agnostic deployment scaffold that any wizard house can verify, deploy, and restore.
+
+## 2. Decision: Conduit over Synapse
+
+**Chosen:** [Conduit](https://conduit.rs) as the Matrix homeserver.
+
+**Alternatives considered:**
+- **Synapse**: Mature, but heavier (Python, more RAM, more complex config).
+- **Dendrite**: Go-based, lighter than Synapse, but less feature-complete for E2EE.
+
+**Rationale:**
+- Conduit is written in Rust, has a small footprint, and runs comfortably on the Hermes VPS (~7 GB RAM).
+- Single static binary + SQLite (or Postgres) keeps the Docker image small and backup logic simple.
+- E2EE support is mature enough for a closed fleet.
+
+## 3. Decision: Docker Compose over Bare Metal
+
+**Chosen:** Docker Compose stack (`docker-compose.yml`) with explicit volume mounts.
+
+**Rationale:**
+- Reproducibility: any host with Docker can stand the stack up in one command.
+- Isolation: Conduit, Element Web, and Postgres live in separate containers with explicit network boundaries.
+- Rollback: `docker compose down && docker compose up -d` is a fast recovery path that recreates the containers while leaving the mounted volumes intact.
+- Future portability: the same Compose file can move to a different VPS with only `.env` changes.
+
+## 4. Decision: Caddy as Reverse Proxy (with Nginx coexistence)
+
+**Chosen:** Caddy handles TLS termination and `.well-known/matrix` delegation inside the Compose network.
+
+**Rationale:**
+- Caddy automates Let’s Encrypt certificate issuance and renewal through its automatic HTTPS.
+- On hosts where Nginx already binds 80/443 (e.g., Hermes VPS), Nginx can reverse-proxy to Caddy or Conduit directly.
+- The scaffold includes both a `caddy/Caddyfile` and Nginx-compatible notes so the operator is not locked into one proxy.
+
+## 5. Decision: One Matrix Account Per Wizard House
+
+**Chosen:** Each wizard house (Ezra, Allegro, Bezalel, etc.) gets its own Matrix user ID (`@ezra:domain`, `@allegro:domain`).
+
+**Rationale:**
+- Preserves sovereignty: each house has its own credentials, device keys, and E2EE trust chain.
+- Matches the existing wizard-house mental model (independent agents, shared rooms).
+- Simplifies debugging: message provenance is unambiguous.
+
+## 6. Decision: `matrix-nio` for Hermes Gateway Integration
+
+**Chosen:** [`matrix-nio`](https://github.com/poljar/matrix-nio) with the `e2e` extra.
+
+**Rationale:**
+- Already integrated into the Hermes gateway (`gateway/platforms/matrix.py`).
+- Asyncio-native, matching the Hermes gateway architecture.
+- Supports E2EE, media uploads, threads, and replies.
+
+## 7. Consequences
+
+### Positive
+- The scaffold is **self-enforcing**: `validate-scaffold.py` and Gitea Actions CI guard integrity.
+- Local integration can be verified without public DNS via `docker-compose.test.yml`.
+- The path from "host decision" to "fleet online" is fully scripted.
+
+### Negative / Accepted Trade-offs
+- Conduit is younger than Synapse; edge-case federation bugs are possible. Mitigation: the fleet will run on a single homeserver initially.
+- SQLite is the default Conduit backend. For >100 users, Postgres is recommended. The Compose file includes an optional Postgres service.
+
+## 8. References
+
+- `infra/matrix/CANONICAL_INDEX.md`: canonical artifact map
+- `infra/matrix/scripts/validate-scaffold.py`: automated integrity checks
+- `.gitea/workflows/validate-matrix-scaffold.yml`: CI enforcement
+- `infra/matrix/HERMES_INTEGRATION_VERIFICATION.md`: adapter-to-scaffold mapping
diff --git a/docs/matrix-fleet-comms/DECISION_FRAMEWORK_187.md b/docs/matrix-fleet-comms/DECISION_FRAMEWORK_187.md
new file mode 100644
index 00000000..bfd27972
--- /dev/null
+++ b/docs/matrix-fleet-comms/DECISION_FRAMEWORK_187.md
@@ -0,0 +1,140 @@
+# Decision Framework: Matrix Host, Domain, and Proxy (#187)
+
+**Parent:** #166 (Stand up Matrix/Conduit for human-to-fleet encrypted communication)
+**Blocker:** #187 (Decide Matrix host, domain, and proxy prerequisites)
+**Author:** Ezra
+**Date:** 2026-04-05
+
+---
+
+## Executive Summary
+
+#166 is **execution-ready**. The only remaining gate is three decisions:
+1. **Host**: which machine runs Conduit?
+2. **Domain**: what FQDN serves the homeserver?
+3. **Proxy/TLS**: how do HTTPS and federation terminate?
+
+This document provides **recommended decisions** with full trade-off analysis. If Alexander accepts the recommendations, #187 can close immediately and deployment can begin within the hour.
+
+---
+
+## Decision 1: Host
+
+### Recommended Choice
+**Hermes VPS** (current host of Ezra, Bezalel, and Allegro-Primus gateway).
+
+### Alternative Considered
+**TestBed VPS** (67.205.155.108), which currently hosts Bezalel (stale) and other experimental workloads.
+
+### Comparison
+
+| Factor | Hermes VPS | TestBed VPS |
+|--------|------------|-------------|
+| Disk | ✅ 55 GB free | Unknown / smaller |
+| RAM | ✅ 7 GB | 4 GB (reported) |
+| Docker | ✅ Installed | Unknown |
+| Docker Compose | ❌ Not installed (15-min fix) | Unknown |
+| Nginx on 80/443 | ✅ Already running | Unknown |
+| Tailscale | ✅ Active | Unknown |
+| Existing wizard presence | ✅ Ezra, Bezalel, Allegro-Primus | ❌ None primary |
+| Latency to Alexander | Low (US East) | Low (US East) |
+
+### Ezra Recommendation
+**Hermes VPS.** It has the resources, the existing fleet footprint, and the lowest operational surprise. The only missing package is Docker Compose, which is a one-line install (`apt install docker-compose-plugin` from Docker's apt repository; note that `pip install docker-compose` installs the legacy v1 `docker-compose` command rather than the `docker compose` plugin the scaffold uses).
+
+---
+
+## Decision 2: Domain / Subdomain
+
+### Recommended Choice
+`matrix.alexanderwhitestone.com`
+
+### Alternatives Considered
+- `fleet.alexanderwhitestone.com`
+- `chat.alexanderwhitestone.com`
+- `conduit.alexanderwhitestone.com`
+
+### Analysis
+
+| Subdomain | Clarity | Federation Friendly | Notes |
+|-----------|---------|---------------------|-------|
+| `matrix.*` | ✅ Industry standard | ✅ Easy to remember | Best for `.well-known/matrix/server` delegation |
+| `fleet.*` | ⚠️ Ambiguous (could be any fleet service) | ⚠️ Fine, but less obvious | Good branding, worse discoverability |
+| `chat.*` | ✅ User friendly | ⚠️ Suggests a web app, not a homeserver | Fine for Element Web, less precise for federation |
+| `conduit.*` | ⚠️ Ties us to one implementation | ✅ Fine | If we ever switch to Synapse, this ages poorly |
+
+### Ezra Recommendation
+**`matrix.alexanderwhitestone.com`** because it is unambiguous, implementation-agnostic, and follows Matrix community convention. The server name can still be `alexanderwhitestone.com` (for short Matrix IDs like `@ezra:alexanderwhitestone.com`) while the actual homeserver listens on `matrix.alexanderwhitestone.com:8448` or is delegated via `.well-known`.
+
+---
+
+## Decision 3: Reverse Proxy / TLS
+
+### Recommended Choice
+**Nginx** (already on 80/443) reverse-proxies to Conduit; Let’s Encrypt for TLS.
+
+### Two Viable Patterns
+
+#### Pattern A: Nginx → Conduit directly (Recommended)
+```
+Internet → Nginx (443) → Conduit (6167 internal)
+Internet → Nginx (8448) → Conduit (8448 internal)
+```
+- Nginx handles TLS termination.
+- Conduit runs plain HTTP on an internal port.
+- Federation port 8448 is exposed through Nginx stream or server block.
+
+#### Pattern B: Nginx → Caddy → Conduit
+```
+Internet → Nginx (443) → Caddy (4443) → Conduit (6167)
+```
+- Caddy automates Let’s Encrypt inside the Compose network.
+- Nginx remains the edge listener.
+- More moving parts, but Caddy’s automatic certificate management is convenient.
+
+### Comparison
+
+| Concern | Pattern A (Nginx direct) | Pattern B (Nginx → Caddy) |
+|---------|--------------------------|---------------------------|
+| Moving parts | Fewer | More |
+| TLS automation | Manual certbot or certbot-nginx | Caddy handles it |
+| Config complexity | Medium | Medium-High |
+| Debuggability | Easier (one proxy hop) | Harder (two hops) |
+| Aligns with existing Nginx | ✅ Yes | ⚠️ Needs extra upstream |
+
+### Ezra Recommendation
+**Pattern A** for initial deployment. Nginx is already the edge proxy on Hermes VPS. Adding one `server {}` block and one `location /_matrix/` block is the shortest path to a working homeserver. If TLS automation becomes a burden, we can migrate to Caddy later without changing Conduit’s configuration.
+
+---
+
+## Pre-Deployment Checklist (Post-#187)
+
+Once the decisions above are ratified, the exact execution sequence is:
+
+1. **Install Docker Compose** on Hermes VPS (if not already present).
+2. **Create DNS A record** for `matrix.alexanderwhitestone.com` → Hermes VPS public IP.
+3. **Obtain TLS certificate** for `matrix.alexanderwhitestone.com` (certbot or manual).
+4. **Copy Nginx server block** from `infra/matrix/caddy/` or write a minimal reverse-proxy config.
+5. **Run `./host-readiness-check.sh`** and confirm all checks pass.
+6. **Run `./deploy-matrix.sh`** and wait for Conduit to come online.
+7. **Run `python3 scripts/bootstrap-fleet-rooms.py --create-all`** to initialize rooms.
+8. **Run `./scripts/verify-hermes-integration.sh`** to prove E2EE messaging works.
+9. **Follow `docs/matrix-fleet-comms/CUTOVER_PLAN.md`** for the Telegram → Matrix transition.
+
+---
+
+## Accountability Matrix
+
+| Decision | Recommended Option | Decision Owner | Execution Owner |
+|----------|-------------------|----------------|-----------------|
+| Host | Hermes VPS | @allegro / @timmy | @ezra |
+| Domain | `matrix.alexanderwhitestone.com` | @rockachopa | @ezra |
+| Proxy/TLS | Nginx direct (Pattern A) | @ezra / @allegro | @ezra |
+
+---
+
+## Ezra Stance
+
+#166 has been reduced from a fuzzy epic to a **three-decision, nine-step execution**. All architecture, verification scripts, and contingency plans are in repo truth. The only missing ingredient is a yes/no on the three decisions above.
+
+-- Ezra, Archivist
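
As a rough illustration of the `.well-known` delegation mentioned under Decision 2, the sketch below builds the two JSON bodies that the apex domain would serve so that short IDs like `@ezra:alexanderwhitestone.com` resolve to the homeserver on the `matrix.` subdomain. The function name `well_known_payloads` and the choice of port 443 are assumptions for illustration only; the `m.server` and `m.homeserver` keys follow the Matrix specification's discovery format. Nothing here is taken from the scaffold's actual scripts.

```python
import json


def well_known_payloads(delegated_host: str, port: int = 443) -> dict[str, str]:
    """Build JSON bodies for the Matrix .well-known endpoints (hypothetical helper).

    delegated_host: hostname where the homeserver is actually reachable.
    port: HTTPS port on that host; 443 is an assumption, not a repo setting.
    """
    # Server-server discovery: tells other homeservers where to send federation traffic.
    server = {"m.server": f"{delegated_host}:{port}"}

    # Client discovery: tells clients such as Element which base URL to use.
    suffix = "" if port == 443 else f":{port}"
    client = {"m.homeserver": {"base_url": f"https://{delegated_host}{suffix}"}}

    return {
        "/.well-known/matrix/server": json.dumps(server),
        "/.well-known/matrix/client": json.dumps(client),
    }


if __name__ == "__main__":
    payloads = well_known_payloads("matrix.alexanderwhitestone.com")
    for path, body in payloads.items():
        print(path, body)
```

If the server document points at port 443, federation traffic can arrive on the ordinary HTTPS listener; without any delegation, other homeservers fall back to port 8448 on the server name itself. Which of the two the fleet uses remains the Decision 3 choice above.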