[ezra] ADR + Decision Framework for Matrix scaffold (#183, #166, #187)

Ezra
2026-04-05 21:00:24 +00:00
parent d6741b1cf4
commit 170f701fc9
2 changed files with 223 additions and 0 deletions


@@ -0,0 +1,83 @@
# ADR-001: Matrix/Conduit Deployment Scaffold
| Field | Value |
|-------|-------|
| **Status** | Accepted |
| **Date** | 2026-04-05 |
| **Decider** | Ezra (Architekt) |
| **Stakeholders** | Allegro, Timmy, Alexander |
| **Parent Issues** | #166, #183 |
---
## 1. Context
Son of Timmy Commandment 6 requires encrypted human-to-fleet communication that is sovereign and independent of Telegram. Before any code could run, we needed a reproducible, infrastructure-agnostic deployment scaffold that any wizard house can verify, deploy, and restore.
## 2. Decision: Conduit over Synapse
**Chosen:** [Conduit](https://conduit.rs) as the Matrix homeserver.
**Alternatives considered:**
- **Synapse**: Mature, but heavier (Python, more RAM, more complex config).
- **Dendrite**: Go-based, lighter than Synapse, but less feature-complete for E2EE.
**Rationale:**
- Conduit is written in Rust, has a small footprint, and runs comfortably on the Hermes VPS (~7 GB RAM).
- Single static binary + SQLite (or Postgres) keeps the Docker image small and backup logic simple.
- E2EE support is production-grade enough for a closed fleet.
## 3. Decision: Docker Compose over Bare Metal
**Chosen:** Docker Compose stack (`docker-compose.yml`) with explicit volume mounts.
**Rationale:**
- Reproducibility: any host with Docker can stand the stack up in one command.
- Isolation: Conduit, Element Web, and Postgres live in separate containers with explicit network boundaries.
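- Recovery: `docker compose down && docker compose up -d` recreates the containers from the same Compose file while the data in the mounted volumes persists, which makes it a fast, low-risk recovery path.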
- Future portability: the same Compose file can move to a different VPS with only `.env` changes.
## 4. Decision: Caddy as Reverse Proxy (with Nginx coexistence)
**Chosen:** Caddy handles TLS termination and `.well-known/matrix` delegation inside the Compose network.
**Rationale:**
- Caddy automates Let's Encrypt certificate issuance and renewal through its automatic HTTPS feature.
- On hosts where Nginx already binds 80/443 (e.g., Hermes VPS), Nginx can reverse-proxy to Caddy or Conduit directly.
- The scaffold includes both a `caddy/Caddyfile` and Nginx-compatible notes so the operator is not locked into one proxy.
## 5. Decision: One Matrix Account Per Wizard House
**Chosen:** Each wizard house (Ezra, Allegro, Bezalel, etc.) gets its own Matrix user ID (`@ezra:domain`, `@allegro:domain`).
**Rationale:**
- Preserves sovereignty: each house has its own credentials, device keys, and E2EE trust chain.
- Matches the existing wizard-house mental model (independent agents, shared rooms).
- Simplifies debugging: message provenance is unambiguous.
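The naming scheme above can be sketched as a small helper. This is an illustration only, not part of the scaffold: `house_user_id` is a hypothetical name, `example.org` stands in for the real server name, and the character check uses a conservative subset of the localpart characters that Matrix permits in user IDs.

```python
import re

# Conservative subset of the characters Matrix allows in a user ID localpart.
_LOCALPART_RE = re.compile(r"[a-z0-9._=/-]+")
MAX_USER_ID_BYTES = 255  # Limit on the full ID, including "@" and the domain.


def house_user_id(house: str, server_name: str) -> str:
    """Return the Matrix user ID for one wizard house (hypothetical helper)."""
    localpart = house.strip().lower()
    if not _LOCALPART_RE.fullmatch(localpart):
        raise ValueError(f"invalid localpart for a Matrix user ID: {house!r}")
    user_id = f"@{localpart}:{server_name}"
    if len(user_id.encode("utf-8")) > MAX_USER_ID_BYTES:
        raise ValueError("Matrix user IDs must not exceed 255 bytes")
    return user_id


# One account per house, all on the same homeserver.
houses = ["Ezra", "Allegro", "Bezalel"]
fleet = {house: house_user_id(house, "example.org") for house in houses}
print(fleet["Ezra"])  # -> @ezra:example.org
```

Rejecting names that do not fit the localpart rules up front keeps provisioning failures out of the homeserver logs, where they are harder to attribute to a specific house.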
## 6. Decision: `matrix-nio` for Hermes Gateway Integration
**Chosen:** [`matrix-nio`](https://github.com/poljar/matrix-nio) with the `e2e` extra.
**Rationale:**
- Already integrated into the Hermes gateway (`gateway/platforms/matrix.py`).
- Asyncio-native, matching the Hermes gateway architecture.
- Supports E2EE, media uploads, threads, and replies.
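For orientation, a minimal sketch of what sending one message through `matrix-nio` can look like. It is not the code in `gateway/platforms/matrix.py`, which this ADR does not reproduce; `send_fleet_message` and `text_message` are hypothetical names, and the homeserver URL, credentials, room ID, and store path are all supplied by the caller. It assumes `matrix-nio[e2e]` is installed and that `store_path` is an existing, writable directory.

```python
def text_message(body: str) -> dict:
    """Build the content of a plain-text m.room.message event."""
    return {"msgtype": "m.text", "body": body}


async def send_fleet_message(homeserver: str, user_id: str, password: str,
                             room_id: str, body: str, store_path: str) -> None:
    """Log in, sync once, and send a single message (illustrative only)."""
    # Imported lazily so text_message() stays usable without matrix-nio installed.
    from nio import AsyncClient, AsyncClientConfig

    config = AsyncClientConfig(encryption_enabled=True, store_sync_tokens=True)
    client = AsyncClient(homeserver, user_id, store_path=store_path, config=config)
    try:
        await client.login(password)
        # An initial sync loads the room state and device lists used for E2EE.
        await client.sync(timeout=30000, full_state=True)
        await client.room_send(
            room_id,
            message_type="m.room.message",
            content=text_message(body),
            # Skips per-device verification; suitable for a smoke test only.
            ignore_unverified_devices=True,
        )
    finally:
        await client.close()
```

A caller would drive this with `asyncio.run(send_fleet_message(...))`. A long-running gateway would normally keep one client alive and use `sync_forever` rather than logging in for each message.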
## 7. Consequences
### Positive
- The scaffold is **self-enforcing**: `validate-scaffold.py` and Gitea Actions CI guard integrity.
- Local integration can be verified without public DNS via `docker-compose.test.yml`.
- The path from "host decision" to "fleet online" is fully scripted.
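To make "self-enforcing" concrete, here is a hedged sketch of the kind of check an integrity script could perform. It is not the contents of `validate-scaffold.py`; `missing_artifacts` is a hypothetical function, and the path list is simply the set of artifacts named in the References section of this ADR.

```python
from pathlib import Path

# Paths taken from this ADR's References section; the check itself is illustrative.
REQUIRED_ARTIFACTS = (
    "infra/matrix/CANONICAL_INDEX.md",
    "infra/matrix/scripts/validate-scaffold.py",
    ".gitea/workflows/validate-matrix-scaffold.yml",
    "infra/matrix/HERMES_INTEGRATION_VERIFICATION.md",
)


def missing_artifacts(repo_root: str, required=REQUIRED_ARTIFACTS) -> list:
    """Return the required paths that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for relative in required:
        candidate = root / relative
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(relative)
    return missing
```

A CI job could fail the build whenever `missing_artifacts(".")` returns a non-empty list, which is one simple way for a workflow to guard the scaffold's integrity.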
### Negative / Accepted Trade-offs
- Conduit is younger than Synapse; edge-case federation bugs are possible. Mitigation: the fleet will run on a single homeserver initially.
- SQLite is the default Conduit backend. For >100 users, Postgres is recommended. The Compose file includes an optional Postgres service.
## 8. References
- `infra/matrix/CANONICAL_INDEX.md` — canonical artifact map
- `infra/matrix/scripts/validate-scaffold.py` — automated integrity checks
- `.gitea/workflows/validate-matrix-scaffold.yml` — CI enforcement
- `infra/matrix/HERMES_INTEGRATION_VERIFICATION.md` — adapter-to-scaffold mapping


@@ -0,0 +1,140 @@
# Decision Framework: Matrix Host, Domain, and Proxy (#187)
**Parent:** #166 — Stand up Matrix/Conduit for human-to-fleet encrypted communication
**Blocker:** #187 — Decide Matrix host, domain, and proxy prerequisites
**Author:** Ezra
**Date:** 2026-04-05
---
## Executive Summary
#166 is **execution-ready**. The only remaining gate is a set of three decisions:
1. **Host** — which machine runs Conduit?
2. **Domain** — what FQDN serves the homeserver?
3. **Proxy/TLS** — how do HTTPS and federation terminate?
This document provides **recommended decisions** with full trade-off analysis. If Alexander accepts the recommendations, #187 can close immediately and deployment can begin within the hour.
---
## Decision 1: Host
### Recommended Choice
**Hermes VPS** (current host of Ezra, Bezalel, and Allegro-Primus gateway).
### Alternative Considered
**TestBed VPS** (67.205.155.108) — currently hosts Bezalel (stale) and other experimental workloads.
### Comparison
| Factor | Hermes VPS | TestBed VPS |
|--------|------------|-------------|
| Disk | ✅ 55 GB free | Unknown / smaller |
| RAM | ✅ 7 GB | 4 GB (reported) |
| Docker | ✅ Installed | Unknown |
| Docker Compose | ❌ Not installed (15-min fix) | Unknown |
| Nginx on 80/443 | ✅ Already running | Unknown |
| Tailscale | ✅ Active | Unknown |
| Existing wizard presence | ✅ Ezra, Bezalel, Allegro-Primus | ❌ None primary |
| Latency to Alexander | Low (US East) | Low (US East) |
### Ezra Recommendation
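**Hermes VPS.** It has the resources, the existing fleet footprint, and the lowest operational surprise. The only missing package is Docker Compose, which is a one-line install: `apt install docker-compose-plugin` (available from Docker's official apt repository) provides the `docker compose` subcommand. `pip install docker-compose` is a fallback, but it installs the legacy v1 `docker-compose` command, which is no longer maintained.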
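The two host facts that matter most here, Compose availability and free disk, can be probed with a few lines of Python. This is a sketch under stated assumptions, not the scaffold's `host-readiness-check.sh`: `has_compose_plugin` and `free_disk_gb` are hypothetical helpers, and the `which` and `run` parameters exist only so the check can be exercised without Docker installed.

```python
import shutil
import subprocess


def has_compose_plugin(which=shutil.which, run=subprocess.run) -> bool:
    """Return True when `docker compose version` exits successfully."""
    if which("docker") is None:
        return False  # No Docker CLI on PATH, so no Compose plugin either.
    result = run(["docker", "compose", "version"], capture_output=True, text=True)
    return result.returncode == 0


def free_disk_gb(path: str = "/") -> float:
    """Free space on the filesystem that contains path, in gigabytes."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    print("compose plugin:", has_compose_plugin())
    print("free disk (GB):", round(free_disk_gb(), 1))
```

Running it on both candidate hosts would replace the "Unknown" cells in the comparison table with measured values.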
---
## Decision 2: Domain / Subdomain
### Recommended Choice
`matrix.alexanderwhitestone.com`
### Alternatives Considered
- `fleet.alexanderwhitestone.com`
- `chat.alexanderwhitestone.com`
- `conduit.alexanderwhitestone.com`
### Analysis
| Subdomain | Clarity | Federation Friendly | Notes |
|-----------|---------|---------------------|-------|
| `matrix.*` | ✅ Industry standard | ✅ Easy to remember | Best for `.well-known/matrix/server` delegation |
| `fleet.*` | ⚠️ Ambiguous (could be any fleet service) | ⚠️ Fine, but less obvious | Good branding, worse discoverability |
| `chat.*` | ✅ User friendly | ⚠️ Suggests a web app, not a homeserver | Fine for Element Web, less precise for federation |
| `conduit.*` | ⚠️ Ties us to one implementation | ✅ Fine | If we ever switch to Synapse, this ages poorly |
### Ezra Recommendation
**`matrix.alexanderwhitestone.com`**, because it is unambiguous, implementation-agnostic, and follows Matrix community convention. The server name can still be `alexanderwhitestone.com`, which keeps Matrix IDs short (`@ezra:alexanderwhitestone.com`), while the homeserver itself runs at `matrix.alexanderwhitestone.com`. In that setup, `.well-known/matrix/server` on the apex domain delegates federation traffic to the `matrix.` subdomain; without delegation, other servers would try port 8448 on the server name itself.
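The delegation mentioned above comes down to two small JSON documents served from the apex domain. The sketch below generates them; the function names are hypothetical, and delegating to port 443 is an assumption (use 8448 instead if federation stays on that port).

```python
import json


def well_known_server(delegate_host: str, port: int = 443) -> str:
    """Body for https://<server_name>/.well-known/matrix/server."""
    return json.dumps({"m.server": f"{delegate_host}:{port}"})


def well_known_client(delegate_host: str) -> str:
    """Body for https://<server_name>/.well-known/matrix/client."""
    return json.dumps({"m.homeserver": {"base_url": f"https://{delegate_host}"}})


print(well_known_server("matrix.alexanderwhitestone.com"))
# -> {"m.server": "matrix.alexanderwhitestone.com:443"}
print(well_known_client("matrix.alexanderwhitestone.com"))
# -> {"m.homeserver": {"base_url": "https://matrix.alexanderwhitestone.com"}}
```

The `server` document is read by other homeservers during federation, and the `client` document is read by clients such as Element during login, so both need to be reachable over HTTPS on `alexanderwhitestone.com`.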
---
## Decision 3: Reverse Proxy / TLS
### Recommended Choice
**Nginx** (already on 80/443) reverse-proxies to Conduit; Let's Encrypt for TLS.
### Two Viable Patterns
#### Pattern A: Nginx → Conduit directly (Recommended)
```
Internet → Nginx (443) → Conduit (6167 internal)
Internet → Nginx (8448) → Conduit (6167 internal)
```
- Nginx handles TLS termination.
- Conduit runs plain HTTP on an internal port.
- Federation port 8448 is exposed through Nginx stream or server block.
#### Pattern B: Nginx → Caddy → Conduit
```
Internet → Nginx (443) → Caddy (4443) → Conduit (6167)
```
- Caddy automates Let's Encrypt inside the Compose network.
- Nginx remains the edge listener.
- More moving parts, but Caddy's automatic certificate management is convenient.
### Comparison
| Concern | Pattern A (Nginx direct) | Pattern B (Nginx → Caddy) |
|---------|--------------------------|---------------------------|
| Moving parts | Fewer | More |
| TLS automation | Manual certbot or certbot-nginx | Caddy handles it |
| Config complexity | Medium | Medium-High |
| Debuggability | Easier (one proxy hop) | Harder (two hops) |
| Aligns with existing Nginx | ✅ Yes | ⚠️ Needs extra upstream |
### Ezra Recommendation
**Pattern A** for initial deployment. Nginx is already the edge proxy on Hermes VPS. Adding one `server {}` block and one `location /_matrix/` block is the shortest path to a working homeserver. If TLS automation becomes a burden, we can migrate to Caddy later without changing Conduit's configuration.
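As a rough illustration of that `server {}` block, the sketch below renders one from a template. Several details are assumptions rather than scaffold facts: Conduit is taken to serve both client and federation traffic on the single internal port 6167, the certificate paths follow certbot's default `/etc/letsencrypt/live/<name>/` layout, and `render_server_block` is a hypothetical helper.

```python
from typing import Optional

# Doubled braces are literal braces in the rendered Nginx config.
NGINX_TEMPLATE = """\
server {{
    listen 443 ssl;
    listen 8448 ssl;
    server_name {server_name};

    ssl_certificate     {cert_dir}/fullchain.pem;
    ssl_certificate_key {cert_dir}/privkey.pem;

    location /_matrix/ {{
        proxy_pass http://127.0.0.1:{conduit_port};
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }}
}}
"""


def render_server_block(server_name: str, conduit_port: int = 6167,
                        cert_dir: Optional[str] = None) -> str:
    """Render a Pattern A reverse-proxy block for the given hostname."""
    if cert_dir is None:
        # Assumed certbot default; adjust if certificates live elsewhere.
        cert_dir = f"/etc/letsencrypt/live/{server_name}"
    return NGINX_TEMPLATE.format(
        server_name=server_name, conduit_port=conduit_port, cert_dir=cert_dir
    )


print(render_server_block("matrix.alexanderwhitestone.com"))
```

Generating the block from one template keeps the hostname, port, and certificate path in a single place, which helps if the deployment later moves to a different VPS.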
---
## Pre-Deployment Checklist (Post-#187)
Once the decisions above are ratified, the exact execution sequence is:
1. **Install Docker Compose** on Hermes VPS (if not already present).
2. **Create DNS A record** for `matrix.alexanderwhitestone.com` → Hermes VPS public IP.
3. **Obtain TLS certificate** for `matrix.alexanderwhitestone.com` (certbot or manual).
4. **Copy Nginx server block** from `infra/matrix/caddy/` or write a minimal reverse-proxy config.
5. **Run `./host-readiness-check.sh`** and confirm all checks pass.
6. **Run `./deploy-matrix.sh`** and wait for Conduit to come online.
7. **Run `python3 scripts/bootstrap-fleet-rooms.py --create-all`** to initialize rooms.
8. **Run `./scripts/verify-hermes-integration.sh`** to prove E2EE messaging works.
9. **Follow `docs/matrix-fleet-comms/CUTOVER_PLAN.md`** for the Telegram → Matrix transition.
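The scripted part of this sequence (steps 5 to 8) can be chained so that a failure stops the run before later steps execute. The runner below is a hypothetical sketch, not a file in the repo: it assumes the working directory is the one where these scripts live, and it defaults to a dry run that only lists the commands.

```python
import subprocess

# Commands copied from steps 5-8 above; steps 1-4 and 9 are manual and omitted.
DEPLOY_STEPS = [
    ["./host-readiness-check.sh"],
    ["./deploy-matrix.sh"],
    ["python3", "scripts/bootstrap-fleet-rooms.py", "--create-all"],
    ["./scripts/verify-hermes-integration.sh"],
]


def run_steps(steps=DEPLOY_STEPS, dry_run=True, run=subprocess.run) -> list:
    """Run each step in order, stopping at the first failure.

    Returns the commands that completed (or that would run, in dry-run mode).
    """
    completed = []
    for command in steps:
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run after a failed one.
            run(command, check=True)
        completed.append(" ".join(command))
    return completed


for line in run_steps(dry_run=True):
    print("would run:", line)
```

Calling `run_steps(dry_run=False)` executes the steps for real. Keeping the dry run as the default makes it harder to trigger a deployment by accident.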
---
## Accountability Matrix
| Decision | Recommended Option | Decision Owner | Execution Owner |
|----------|-------------------|----------------|-----------------|
| Host | Hermes VPS | @allegro / @timmy | @ezra |
| Domain | `matrix.alexanderwhitestone.com` | @rockachopa | @ezra |
| Proxy/TLS | Nginx direct (Pattern A) | @ezra / @allegro | @ezra |
---
## Ezra Stance
#166 has been reduced from a fuzzy epic to a **three-decision, nine-step execution**. All architecture, verification scripts, and contingency plans are in repo truth. The only missing ingredient is a yes/no on the three decisions above.
— Ezra, Archivist