ci: fix smoke JSON parsing for backup tests (#693) · 2026-04-14 23:59:46 -04:00
docs: add Hermes backup runbook (#693) · 2026-04-14 23:57:41 -04:00
feat: add encrypted Hermes backup pipeline (#693) · 2026-04-14 23:56:46 -04:00
9 changed files with 456 additions and 416 deletions
--- a/.gitea/workflows/smoke.yml
+++ b/.gitea/workflows/smoke.yml
@@ -14,7 +14,7 @@ jobs:
      - name: Parse check
        run: |
          find . -name '*.yml' -o -name '*.yaml' | grep -v .gitea | xargs -r python3 -c "import sys,yaml; [yaml.safe_load(open(f)) for f in sys.argv[1:]]"
-          find . -name '*.json' | xargs -r python3 -m json.tool > /dev/null
+          find . -name '*.json' -print0 | xargs -0 -r -n1 python3 -m json.tool > /dev/null
          find . -name '*.py' | xargs -r python3 -m py_compile
          find . -name '*.sh' | xargs -r bash -n
          echo "PASS: All files parse"
@@ -22,3 +22,6 @@ jobs:
        run: |
          if grep -rE 'sk-or-|sk-ant-|ghp_|AKIA' . --include='*.yml' --include='*.py' --include='*.sh' 2>/dev/null | grep -v '.gitea' | grep -v 'detect_secrets' | grep -v 'test_trajectory_sanitize'; then exit 1; fi
          echo "PASS: No secrets"
+      - name: Backup pipeline regression test
+        run: |
+          python3 -m unittest discover -s tests -p 'test_backup_pipeline.py' -v
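Reviewer note on the `json.tool` change above: `python3 -m json.tool` takes one input file and an optional output file, so handing it a whole batch of paths from `xargs` does not validate each file, which is presumably why the step now passes `-n1` (and `-print0`/`-0` for names with spaces). The same per-file check can be sketched in Python. `find_invalid_json` is a hypothetical helper for illustration and is not part of this PR.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def find_invalid_json(root: Path) -> List[Path]:
    """Return every *.json file under root that fails to parse."""
    invalid = []
    for path in sorted(root.rglob("*.json")):
        try:
            # Parse one file at a time, like `xargs -n1 python3 -m json.tool`.
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            invalid.append(path)
    return invalid


if __name__ == "__main__":
    # Demo on throwaway files; a space in the name is harmless here
    # because no shell word splitting is involved.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "good file.json").write_text('{"jobs": 1}\n', encoding="utf-8")
        (root / "broken.json").write_text('{"jobs": ', encoding="utf-8")
        for path in find_invalid_json(root):
            print(f"invalid JSON: {path.name}")
```

Collecting failures instead of stopping at the first one would let a single CI run report every broken file.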
--- a/docs/BACKUP_PIPELINE.md
+++ b/docs/BACKUP_PIPELINE.md
@@ -0,0 +1,98 @@
+# Encrypted Hermes Backup Pipeline
+
+Issue: `timmy-home#693`
+
+This pipeline creates a nightly encrypted archive of `~/.hermes`, stores a local encrypted copy, uploads it to remote storage, and supports restore verification.
+
+## What gets backed up
+
+By default the pipeline archives:
+
+- `~/.hermes/config.yaml`
+- `~/.hermes/state.db`
+- `~/.hermes/sessions/`
+- `~/.hermes/cron/`
+- any other files under `~/.hermes`
+
+Override the source with `BACKUP_SOURCE_DIR=/path/to/.hermes`.
+
+## Backup command
+
+```bash
+BACKUP_PASSPHRASE_FILE=~/.config/timmy/backup.passphrase \
+BACKUP_NAS_TARGET=/Volumes/timmy-nas/hermes-backups \
+bash scripts/backup_pipeline.sh
+```
+
+The script writes:
+
+- local encrypted copy: `~/.timmy-backups/hermes/<timestamp>/hermes-backup-<timestamp>.tar.gz.enc`
+- local manifest: `~/.timmy-backups/hermes/<timestamp>/hermes-backup-<timestamp>.json`
+- log file: `~/.timmy-backups/hermes/logs/backup_pipeline.log`
+
+## Nightly schedule
+
+Run every night at 03:00:
+
+```cron
+0 3 * * * cd /Users/apayne/.timmy/timmy-home && BACKUP_PASSPHRASE_FILE=/Users/apayne/.config/timmy/backup.passphrase BACKUP_NAS_TARGET=/Volumes/timmy-nas/hermes-backups bash scripts/backup_pipeline.sh >> /Users/apayne/.timmy-backups/hermes/logs/cron.log 2>&1
+```
+
+## Remote targets
+
+At least one remote target must be configured.
+
+### Local NAS
+
+Use a mounted path:
+
+```bash
+BACKUP_NAS_TARGET=/Volumes/timmy-nas/hermes-backups
+```
+
+The pipeline copies the encrypted archive and manifest into `<BACKUP_NAS_TARGET>/<timestamp>/`.
+
+### S3-compatible storage
+
+```bash
+BACKUP_PASSPHRASE_FILE=~/.config/timmy/backup.passphrase \
+BACKUP_S3_URI=s3://timmy-backups/hermes \
+AWS_ENDPOINT_URL=https://minio.example.com \
+bash scripts/backup_pipeline.sh
+```
+
+Notes:
+
+- `aws` CLI must be installed if `BACKUP_S3_URI` is set.
+- `AWS_ENDPOINT_URL` is optional and is used for MinIO, R2, and other S3-compatible endpoints.
+
+## Restore playbook
+
+Restore an encrypted archive into a clean target root:
+
+```bash
+BACKUP_PASSPHRASE_FILE=~/.config/timmy/backup.passphrase \
+bash scripts/restore_backup.sh \
+  /Volumes/timmy-nas/hermes-backups/20260415-030000/hermes-backup-20260415-030000.tar.gz.enc \
+  /tmp/hermes-restore
+```
+
+Result:
+
+- restored tree lands at `/tmp/hermes-restore/.hermes`
+- if a sibling manifest exists, the restore script verifies the archive SHA256 before decrypting
+
+## End-to-end verification
+
+Run the regression suite:
+
+```bash
+python3 -m unittest discover -s tests -p 'test_backup_pipeline.py' -v
+```
+
+The suite checks that:
+
+1. the backup output is encrypted
+2. plaintext archives do not leak into the backup destinations
+3. the restore script recreates the original `.hermes` tree end-to-end
+4. the pipeline refuses to run without a remote target
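Reviewer note on the first item in the list above: `openssl enc` with a salt writes the 8-byte magic `Salted__` at the start of its output, while a bare gzip stream starts with the bytes `0x1f 0x8b`. The sketch below uses those headers as a cheap structural check. `classify_archive` and its return labels are hypothetical names, and a matching header only indicates the file went through `openssl enc`; it says nothing about the strength of the passphrase.

```python
import gzip
import tempfile
from pathlib import Path

OPENSSL_SALT_MAGIC = b"Salted__"  # header written by `openssl enc` when salting
GZIP_MAGIC = b"\x1f\x8b"          # first two bytes of a gzip stream


def classify_archive(path: Path) -> str:
    """Classify a backup file by its leading bytes."""
    with path.open("rb") as handle:
        header = handle.read(len(OPENSSL_SALT_MAGIC))
    if header == OPENSSL_SALT_MAGIC:
        return "openssl-salted"
    if header.startswith(GZIP_MAGIC):
        return "gzip-plaintext"
    return "unknown"


if __name__ == "__main__":
    # Demo: a plain gzip file stands in for a leaked, unencrypted archive.
    with tempfile.TemporaryDirectory() as tmp:
        leaked = Path(tmp) / "hermes-backup-demo.tar.gz"
        leaked.write_bytes(gzip.compress(b"model: local-first\n"))
        print(classify_archive(leaked))
```

A header check like this complements the substring assertion in the test, since compressed data would also hide the plaintext string.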
--- a/docs/FLEET_PHASE_1_SURVIVAL.md
+++ b/docs/FLEET_PHASE_1_SURVIVAL.md
@@ -1,61 +0,0 @@
-# [PHASE-1] Survival - Keep the Lights On
-
-Phase 1 is the manual-clicker stage of the fleet. The machines exist. The services exist. The human is still the automation loop.
-
-## Phase Definition
-
-- Current state: fleet exists, agents run, everything important still depends on human vigilance.
-- Resources tracked here: Capacity, Uptime.
-- Next phase: [PHASE-2] Automation - Self-Healing Infrastructure
-
-## Current Buildings
-
-- VPS hosts: Ezra, Allegro, Bezalel
-- Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker
-- Gitea forge
-- Evennia worlds
-
-## Current Resource Snapshot
-
-- Fleet operational: yes
-- Uptime baseline: 0.0%
-- Days at or above 95% uptime: 0
-- Capacity utilization: 0.0%
-
-## Next Phase Trigger
-
-To unlock [PHASE-2] Automation - Self-Healing Infrastructure, the fleet must hold both of these conditions at once:
-- Uptime >= 95% for 30 consecutive days
-- Capacity utilization > 60%
-- Current trigger state: NOT READY
-
-## Missing Requirements
-
-- Uptime 0.0% / 95.0%
-- Days at or above 95% uptime: 0/30
-- Capacity utilization 0.0% / >60.0%
-
-## Manual Clicker Interpretation
-
-Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation.
-Every restart, every SSH, every check is a manual click.
-
-## Manual Clicks Still Required
-
-- Restart agents and services by hand when a node goes dark.
-- SSH into machines to verify health, disk, and memory.
-- Check Gitea, relay, and world services manually before and after changes.
-- Act as the scheduler when automation is missing or only partially wired.
-
-## Repo Signals Already Present
-
-- `scripts/fleet_health_probe.sh` — Automated health probe exists and can supply the uptime baseline for the next phase.
-- `scripts/fleet_milestones.py` — Milestone tracker exists, so survival achievements can be narrated and logged.
-- `scripts/auto_restart_agent.sh` — Auto-restart tooling already exists as phase-2 groundwork.
-- `scripts/backup_pipeline.sh` — Backup pipeline scaffold exists for post-survival automation work.
-- `infrastructure/timmy-bridge/reports/generate_report.py` — Bridge reporting exists and can summarize heartbeat-driven uptime.
-
-## Notes
-
-- The fleet is alive, but the human is still the control loop.
-- Phase 1 is about naming reality plainly so later automation has a baseline to beat.
--- a/docs/RUNBOOK_INDEX.md
+++ b/docs/RUNBOOK_INDEX.md
@@ -12,7 +12,8 @@ Quick-reference index for common operational tasks across the Timmy Foundation i
 | Check fleet health | fleet-ops | `python3 scripts/fleet_readiness.py` |
 | Agent scorecard | fleet-ops | `python3 scripts/agent_scorecard.py` |
 | View fleet manifest | fleet-ops | `cat manifest.yaml` |
-| Render Phase-1 survival report | timmy-home | `python3 scripts/fleet_phase_status.py --output docs/FLEET_PHASE_1_SURVIVAL.md` |
+| Backup Hermes state | timmy-home | `BACKUP_PASSPHRASE_FILE=... BACKUP_NAS_TARGET=... bash scripts/backup_pipeline.sh` |
+| Restore Hermes state | timmy-home | `BACKUP_PASSPHRASE_FILE=... bash scripts/restore_backup.sh <archive> <restore-root>` |

 ## the-nexus (Frontend + Brain)

--- a/scripts/backup_pipeline.sh
+++ b/scripts/backup_pipeline.sh
@@ -1,80 +1,170 @@
 #!/usr/bin/env bash
-# backup_pipeline.sh — Daily fleet backup pipeline (FLEET-008)
-# Refs: timmy-home #561
+# backup_pipeline.sh — Nightly encrypted Hermes backup pipeline
+# Refs: timmy-home #693, timmy-home #561
 set -euo pipefail

-BACKUP_ROOT="/backups/timmy"
-DATESTAMP=$(date +%Y%m%d-%H%M%S)
-BACKUP_DIR="${BACKUP_ROOT}/${DATESTAMP}"
-LOG_DIR="/var/log/timmy"
-ALERT_LOG="${LOG_DIR}/backup_pipeline.log"
-mkdir -p "$BACKUP_DIR" "$LOG_DIR"
+DATESTAMP="${BACKUP_TIMESTAMP:-$(date +%Y%m%d-%H%M%S)}"
+BACKUP_SOURCE_DIR="${BACKUP_SOURCE_DIR:-${HOME}/.hermes}"
+BACKUP_ROOT="${BACKUP_ROOT:-${HOME}/.timmy-backups/hermes}"
+BACKUP_LOG_DIR="${BACKUP_LOG_DIR:-${BACKUP_ROOT}/logs}"
+BACKUP_RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-14}"
+BACKUP_S3_URI="${BACKUP_S3_URI:-}"
+BACKUP_NAS_TARGET="${BACKUP_NAS_TARGET:-}"
+AWS_ENDPOINT_URL="${AWS_ENDPOINT_URL:-}"
+BACKUP_NAME="hermes-backup-${DATESTAMP}"
+LOCAL_BACKUP_DIR="${BACKUP_ROOT}/${DATESTAMP}"
+STAGE_DIR="$(mktemp -d "${TMPDIR:-/tmp}/timmy-backup.XXXXXX")"
+PLAINTEXT_ARCHIVE="${STAGE_DIR}/${BACKUP_NAME}.tar.gz"
+ENCRYPTED_ARCHIVE="${STAGE_DIR}/${BACKUP_NAME}.tar.gz.enc"
+MANIFEST_PATH="${STAGE_DIR}/${BACKUP_NAME}.json"
+ALERT_LOG="${BACKUP_LOG_DIR}/backup_pipeline.log"
+PASSFILE_CLEANUP=""

-TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
-TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
-OFFSITE_TARGET="${OFFSITE_TARGET:-}"
+mkdir -p "$BACKUP_LOG_DIR"

-log() { echo "[$(date -Iseconds)] $1" | tee -a "$ALERT_LOG"; }
+log() {
+    echo "[$(date -Iseconds)] $1" | tee -a "$ALERT_LOG"
+}

-send_telegram() {
-    local msg="$1"
-    if [[ -n "$TELEGRAM_BOT_TOKEN" && -n "$TELEGRAM_CHAT_ID" ]]; then
-        curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-            -d "chat_id=${TELEGRAM_CHAT_ID}" -d "text=${msg}" >/dev/null 2>&1 || true
+fail() {
+    log "ERROR: $1" >&2  # stderr: stdout is captured when fail runs inside $(...)
+    exit 1
+}
+
+cleanup() {
+    rm -f "$PLAINTEXT_ARCHIVE"
+    rm -rf "$STAGE_DIR"
+    if [[ -n "$PASSFILE_CLEANUP" && -f "$PASSFILE_CLEANUP" ]]; then
+        rm -f "$PASSFILE_CLEANUP"
+    fi
+}
+trap cleanup EXIT
+
+resolve_passphrase_file() {
+    if [[ -n "${BACKUP_PASSPHRASE_FILE:-}" ]]; then
+        [[ -f "$BACKUP_PASSPHRASE_FILE" ]] || fail "BACKUP_PASSPHRASE_FILE does not exist: $BACKUP_PASSPHRASE_FILE"
+        echo "$BACKUP_PASSPHRASE_FILE"
+        return
+    fi
+
+    if [[ -n "${BACKUP_PASSPHRASE:-}" ]]; then
+        PASSFILE_CLEANUP="${STAGE_DIR}/backup.passphrase"
+        printf '%s' "$BACKUP_PASSPHRASE" > "$PASSFILE_CLEANUP"
+        chmod 600 "$PASSFILE_CLEANUP"
+        echo "$PASSFILE_CLEANUP"
+        return
+    fi
+
+    fail "Set BACKUP_PASSPHRASE_FILE or BACKUP_PASSPHRASE before running the backup pipeline."
+}
+
+sha256_file() {
+    local path="$1"
+    if command -v shasum >/dev/null 2>&1; then
+        shasum -a 256 "$path" | awk '{print $1}'
+    elif command -v sha256sum >/dev/null 2>&1; then
+        sha256sum "$path" | awk '{print $1}'
+    else
+        python3 - <<'PY' "$path"
+import hashlib
+import pathlib
+import sys
+path = pathlib.Path(sys.argv[1])
+h = hashlib.sha256()
+with path.open('rb') as f:
+    for chunk in iter(lambda: f.read(1024 * 1024), b''):
+        h.update(chunk)
+print(h.hexdigest())
+PY
    fi
 }

-status=0
+write_manifest() {
+    python3 - <<'PY' "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
+import json
+import sys
+manifest_path, source_dir, archive_name, archive_sha256, local_dir, s3_uri, nas_target, created_at = sys.argv[1:]
+manifest = {
+    "created_at": created_at,
+    "source_dir": source_dir,
+    "archive_name": archive_name,
+    "archive_sha256": archive_sha256,
+    "encryption": {
+        "type": "openssl",
+        "cipher": "aes-256-cbc",
+        "pbkdf2": True,
+        "iterations": 200000,
+    },
+    "destinations": {
+        "local_dir": local_dir,
+        "s3_uri": s3_uri or None,
+        "nas_target": nas_target or None,
+    },
+}
+with open(manifest_path, 'w', encoding='utf-8') as handle:
+    json.dump(manifest, handle, indent=2)
+    handle.write('\n')
+PY
+}

-# --- Gitea repositories ---
-if [[ -d /root/gitea ]]; then
-    tar czf "${BACKUP_DIR}/gitea-repos.tar.gz" -C /root gitea 2>/dev/null || true
-    log "Backed up Gitea repos"
-fi
+upload_to_nas() {
+    local archive_path="$1"
+    local manifest_path="$2"
+    local target_root="$3"

-# --- Agent configs and state ---
-for wiz in bezalel allegro ezra timmy; do
-    if [[ -d "/root/wizards/${wiz}" ]]; then
-        tar czf "${BACKUP_DIR}/${wiz}-home.tar.gz" -C /root/wizards "${wiz}" 2>/dev/null || true
-        log "Backed up ${wiz} home"
+    local target_dir="${target_root%/}/${DATESTAMP}"
+    mkdir -p "$target_dir"
+    cp "$archive_path" "$manifest_path" "$target_dir/"
+    log "Uploaded backup to NAS target: $target_dir"
+}
+
+upload_to_s3() {
+    local archive_path="$1"
+    local manifest_path="$2"
+
+    command -v aws >/dev/null 2>&1 || fail "BACKUP_S3_URI is set but aws CLI is not installed."
+
+    local args=()
+    if [[ -n "$AWS_ENDPOINT_URL" ]]; then
+        args+=(--endpoint-url "$AWS_ENDPOINT_URL")
    fi
-done

-# --- System configs ---
-cp /etc/crontab "${BACKUP_DIR}/crontab" 2>/dev/null || true
-cp -r /etc/systemd/system "${BACKUP_DIR}/systemd" 2>/dev/null || true
-log "Backed up system configs"
+    aws "${args[@]}" s3 cp "$archive_path" "${BACKUP_S3_URI%/}/$(basename "$archive_path")"
+    aws "${args[@]}" s3 cp "$manifest_path" "${BACKUP_S3_URI%/}/$(basename "$manifest_path")"
+    log "Uploaded backup to S3 target: $BACKUP_S3_URI"
+}

-# --- Evennia worlds (if present) ---
-if [[ -d /root/evennia ]]; then
-    tar czf "${BACKUP_DIR}/evennia-worlds.tar.gz" -C /root evennia 2>/dev/null || true
-    log "Backed up Evennia worlds"
+[[ -d "$BACKUP_SOURCE_DIR" ]] || fail "BACKUP_SOURCE_DIR does not exist: $BACKUP_SOURCE_DIR"
+[[ -n "$BACKUP_NAS_TARGET" || -n "$BACKUP_S3_URI" ]] || fail "Set BACKUP_NAS_TARGET or BACKUP_S3_URI for remote backup storage."
+
+PASSFILE="$(resolve_passphrase_file)"
+mkdir -p "$LOCAL_BACKUP_DIR"
+
+log "Creating archive from $BACKUP_SOURCE_DIR"
+tar -czf "$PLAINTEXT_ARCHIVE" -C "$(dirname "$BACKUP_SOURCE_DIR")" "$(basename "$BACKUP_SOURCE_DIR")"
+
+log "Encrypting archive"
+openssl enc -aes-256-cbc -salt -pbkdf2 -iter 200000 \
+    -pass "file:${PASSFILE}" \
+    -in "$PLAINTEXT_ARCHIVE" \
+    -out "$ENCRYPTED_ARCHIVE"
+
+ARCHIVE_SHA256="$(sha256_file "$ENCRYPTED_ARCHIVE")"
+CREATED_AT="$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
+write_manifest "$MANIFEST_PATH" "$BACKUP_SOURCE_DIR" "$(basename "$ENCRYPTED_ARCHIVE")" "$ARCHIVE_SHA256" "$LOCAL_BACKUP_DIR" "$BACKUP_S3_URI" "$BACKUP_NAS_TARGET" "$CREATED_AT"
+
+cp "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$LOCAL_BACKUP_DIR/"
+rm -f "$PLAINTEXT_ARCHIVE"
+log "Encrypted backup stored locally: ${LOCAL_BACKUP_DIR}/$(basename "$ENCRYPTED_ARCHIVE")"
+
+if [[ -n "$BACKUP_NAS_TARGET" ]]; then
+    upload_to_nas "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$BACKUP_NAS_TARGET"
 fi

-# --- Manifest ---
-find "$BACKUP_DIR" -type f > "${BACKUP_DIR}/manifest.txt"
-log "Backup manifest written"
-
-# --- Offsite sync ---
-if [[ -n "$OFFSITE_TARGET" ]]; then
-    if rsync -az --delete "${BACKUP_DIR}/" "${OFFSITE_TARGET}/${DATESTAMP}/" 2>/dev/null; then
-        log "Offsite sync completed"
-    else
-        log "WARNING: Offsite sync failed"
-        status=1
-    fi
+if [[ -n "$BACKUP_S3_URI" ]]; then
+    upload_to_s3 "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH"
 fi

-# --- Retention: keep last 7 days ---
-find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} + 2>/dev/null || true
-log "Retention applied (7 days)"
-
-if [[ "$status" -eq 0 ]]; then
-    log "Backup pipeline completed: ${BACKUP_DIR}"
-    send_telegram "✅ Daily backup completed: ${DATESTAMP}"
-else
-    log "Backup pipeline completed with WARNINGS: ${BACKUP_DIR}"
-    send_telegram "⚠️ Daily backup completed with warnings: ${DATESTAMP}"
-fi
-
-exit "$status"
+find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -name '20*' -mtime "+${BACKUP_RETENTION_DAYS}" -exec rm -rf {} + 2>/dev/null || true
+log "Retention applied (${BACKUP_RETENTION_DAYS} days)"
+log "Backup pipeline completed successfully"
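Reviewer note on the retention line above: `find -mtime +N` is easy to misread, and implementations differ in rounding. GNU find drops the fractional day before comparing, so `+14` matches directories at least 15 days old, while the BSD man page describes rounding up. The sketch below is a rough Python equivalent that assumes the GNU behavior. `expired_backup_dirs` is a hypothetical helper, and the `20` prefix filter mirrors `-name '20*'`.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def expired_backup_dirs(backup_root: Path, retention_days: int,
                        now: Optional[float] = None) -> List[Path]:
    """List timestamped backup directories older than the retention window."""
    now = time.time() if now is None else now
    expired = []
    for entry in sorted(backup_root.iterdir()):
        # Mirror `-type d -name '20*'`: skip files, symlinks, and logs/.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if not entry.name.startswith("20"):
            continue
        # Assumed GNU-style rounding: drop the fractional day, then compare.
        age_days = int((now - entry.stat().st_mtime) // SECONDS_PER_DAY)
        if age_days > retention_days:
            expired.append(entry)
    return expired


if __name__ == "__main__":
    # Demo: backdate one directory by 20 days and list what would be pruned.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        old = root / "20260101-030000"
        old.mkdir()
        (root / "logs").mkdir()
        stamp = time.time() - 20 * SECONDS_PER_DAY
        os.utime(old, (stamp, stamp))
        for path in expired_backup_dirs(root, retention_days=14):
            print(f"would remove: {path.name}")
```

Returning the candidate list rather than deleting in place makes a dry run possible before any `rm -rf` equivalent is wired in.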
--- a/scripts/fleet_phase_status.py
+++ b/scripts/fleet_phase_status.py
@@ -1,224 +0,0 @@
-#!/usr/bin/env python3
-"""Render the current fleet survival phase as a durable report."""
-
-from __future__ import annotations
-
-import argparse
-import json
-from copy import deepcopy
-from pathlib import Path
-from typing import Any
-
-
-PHASE_NAME = "[PHASE-1] Survival - Keep the Lights On"
-NEXT_PHASE_NAME = "[PHASE-2] Automation - Self-Healing Infrastructure"
-TARGET_UPTIME_PERCENT = 95.0
-TARGET_UPTIME_DAYS = 30
-TARGET_CAPACITY_PERCENT = 60.0
-
-DEFAULT_BUILDINGS = [
-    "VPS hosts: Ezra, Allegro, Bezalel",
-    "Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker",
-    "Gitea forge",
-    "Evennia worlds",
-]
-
-DEFAULT_MANUAL_CLICKS = [
-    "Restart agents and services by hand when a node goes dark.",
-    "SSH into machines to verify health, disk, and memory.",
-    "Check Gitea, relay, and world services manually before and after changes.",
-    "Act as the scheduler when automation is missing or only partially wired.",
-]
-
-REPO_SIGNAL_FILES = {
-    "scripts/fleet_health_probe.sh": "Automated health probe exists and can supply the uptime baseline for the next phase.",
-    "scripts/fleet_milestones.py": "Milestone tracker exists, so survival achievements can be narrated and logged.",
-    "scripts/auto_restart_agent.sh": "Auto-restart tooling already exists as phase-2 groundwork.",
-    "scripts/backup_pipeline.sh": "Backup pipeline scaffold exists for post-survival automation work.",
-    "infrastructure/timmy-bridge/reports/generate_report.py": "Bridge reporting exists and can summarize heartbeat-driven uptime.",
-}
-
-DEFAULT_SNAPSHOT = {
-    "fleet_operational": True,
-    "resources": {
-        "uptime_percent": 0.0,
-        "days_at_or_above_95_percent": 0,
-        "capacity_utilization_percent": 0.0,
-    },
-    "current_buildings": DEFAULT_BUILDINGS,
-    "manual_clicks": DEFAULT_MANUAL_CLICKS,
-    "notes": [
-        "The fleet is alive, but the human is still the control loop.",
-        "Phase 1 is about naming reality plainly so later automation has a baseline to beat.",
-    ],
-}
-
-
-def default_snapshot() -> dict[str, Any]:
-    return deepcopy(DEFAULT_SNAPSHOT)
-
-
-def _deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
-    result = deepcopy(base)
-    for key, value in override.items():
-        if isinstance(value, dict) and isinstance(result.get(key), dict):
-            result[key] = _deep_merge(result[key], value)
-        else:
-            result[key] = value
-    return result
-
-
-def load_snapshot(snapshot_path: Path | None = None) -> dict[str, Any]:
-    snapshot = default_snapshot()
-    if snapshot_path is None:
-        return snapshot
-    override = json.loads(snapshot_path.read_text(encoding="utf-8"))
-    return _deep_merge(snapshot, override)
-
-
-def collect_repo_signals(repo_root: Path) -> list[str]:
-    signals: list[str] = []
-    for rel_path, description in REPO_SIGNAL_FILES.items():
-        if (repo_root / rel_path).exists():
-            signals.append(f"`{rel_path}` — {description}")
-    return signals
-
-
-def compute_phase_status(snapshot: dict[str, Any], repo_root: Path | None = None) -> dict[str, Any]:
-    repo_root = repo_root or Path(__file__).resolve().parents[1]
-    resources = snapshot.get("resources", {})
-    uptime_percent = float(resources.get("uptime_percent", 0.0))
-    uptime_days = int(resources.get("days_at_or_above_95_percent", 0))
-    capacity_percent = float(resources.get("capacity_utilization_percent", 0.0))
-    fleet_operational = bool(snapshot.get("fleet_operational", False))
-
-    missing: list[str] = []
-    if not fleet_operational:
-        missing.append("Fleet operational flag is false.")
-    if uptime_percent < TARGET_UPTIME_PERCENT:
-        missing.append(f"Uptime {uptime_percent:.1f}% / {TARGET_UPTIME_PERCENT:.1f}%")
-    if uptime_days < TARGET_UPTIME_DAYS:
-        missing.append(f"Days at or above 95% uptime: {uptime_days}/{TARGET_UPTIME_DAYS}")
-    if capacity_percent <= TARGET_CAPACITY_PERCENT:
-        missing.append(f"Capacity utilization {capacity_percent:.1f}% / >{TARGET_CAPACITY_PERCENT:.1f}%")
-
-    return {
-        "title": PHASE_NAME,
-        "current_phase": "PHASE-1 Survival",
-        "fleet_operational": fleet_operational,
-        "resources": {
-            "uptime_percent": uptime_percent,
-            "days_at_or_above_95_percent": uptime_days,
-            "capacity_utilization_percent": capacity_percent,
-        },
-        "current_buildings": list(snapshot.get("current_buildings", DEFAULT_BUILDINGS)),
-        "manual_clicks": list(snapshot.get("manual_clicks", DEFAULT_MANUAL_CLICKS)),
-        "notes": list(snapshot.get("notes", [])),
-        "repo_signals": collect_repo_signals(repo_root),
-        "next_phase": NEXT_PHASE_NAME,
-        "next_phase_ready": fleet_operational and not missing,
-        "missing_requirements": missing,
-    }
-
-
-def render_markdown(status: dict[str, Any]) -> str:
-    resources = status["resources"]
-    missing = status["missing_requirements"]
-    ready_line = "READY" if status["next_phase_ready"] else "NOT READY"
-
-    lines = [
-        f"# {status['title']}",
-        "",
-        "Phase 1 is the manual-clicker stage of the fleet. The machines exist. The services exist. The human is still the automation loop.",
-        "",
-        "## Phase Definition",
-        "",
-        "- Current state: fleet exists, agents run, everything important still depends on human vigilance.",
-        "- Resources tracked here: Capacity, Uptime.",
-        f"- Next phase: {status['next_phase']}",
-        "",
-        "## Current Buildings",
-        "",
-    ]
-    lines.extend(f"- {item}" for item in status["current_buildings"])
-
-    lines.extend([
-        "",
-        "## Current Resource Snapshot",
-        "",
-        f"- Fleet operational: {'yes' if status['fleet_operational'] else 'no'}",
-        f"- Uptime baseline: {resources['uptime_percent']:.1f}%",
-        f"- Days at or above 95% uptime: {resources['days_at_or_above_95_percent']}",
-        f"- Capacity utilization: {resources['capacity_utilization_percent']:.1f}%",
-        "",
-        "## Next Phase Trigger",
-        "",
-        f"To unlock {status['next_phase']}, the fleet must hold both of these conditions at once:",
-        f"- Uptime >= {TARGET_UPTIME_PERCENT:.0f}% for {TARGET_UPTIME_DAYS} consecutive days",
-        f"- Capacity utilization > {TARGET_CAPACITY_PERCENT:.0f}%",
-        f"- Current trigger state: {ready_line}",
-        "",
-        "## Missing Requirements",
-        "",
-    ])
-    if missing:
-        lines.extend(f"- {item}" for item in missing)
-    else:
-        lines.append("- None. Phase 2 can unlock now.")
-
-    lines.extend([
-        "",
-        "## Manual Clicker Interpretation",
-        "",
-        "Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation.",
-        "Every restart, every SSH, every check is a manual click.",
-        "",
-        "## Manual Clicks Still Required",
-        "",
-    ])
-    lines.extend(f"- {item}" for item in status["manual_clicks"])
-
-    lines.extend([
-        "",
-        "## Repo Signals Already Present",
-        "",
-    ])
-    if status["repo_signals"]:
-        lines.extend(f"- {item}" for item in status["repo_signals"])
-    else:
-        lines.append("- No survival-adjacent repo signals detected.")
-
-    if status["notes"]:
-        lines.extend(["", "## Notes", ""])
-        lines.extend(f"- {item}" for item in status["notes"])
-
-    return "\n".join(lines).rstrip() + "\n"
-
-
-def main() -> None:
-    parser = argparse.ArgumentParser(description="Render the fleet phase-1 survival report")
-    parser.add_argument("--snapshot", help="Optional JSON snapshot overriding the default phase-1 baseline")
-    parser.add_argument("--output", help="Write markdown report to this path")
-    parser.add_argument("--json", action="store_true", help="Print computed status as JSON instead of markdown")
-    args = parser.parse_args()
-
-    snapshot = load_snapshot(Path(args.snapshot).expanduser() if args.snapshot else None)
-    repo_root = Path(__file__).resolve().parents[1]
-    status = compute_phase_status(snapshot, repo_root=repo_root)
-
-    if args.json:
-        rendered = json.dumps(status, indent=2)
-    else:
-        rendered = render_markdown(status)
-
-    if args.output:
-        output_path = Path(args.output).expanduser()
-        output_path.parent.mkdir(parents=True, exist_ok=True)
-        output_path.write_text(rendered, encoding="utf-8")
-        print(f"Phase status written to {output_path}")
-    else:
-        print(rendered)
-
-
-if __name__ == "__main__":
-    main()
--- a/scripts/restore_backup.sh
+++ b/scripts/restore_backup.sh
@@ -0,0 +1,97 @@
+#!/usr/bin/env bash
+# restore_backup.sh — Restore an encrypted Hermes backup archive
+# Usage: restore_backup.sh /path/to/hermes-backup-YYYYmmdd-HHMMSS.tar.gz.enc /restore/root
+set -euo pipefail
+
+ARCHIVE_PATH="${1:-}"
+RESTORE_ROOT="${2:-}"
+STAGE_DIR="$(mktemp -d "${TMPDIR:-/tmp}/timmy-restore.XXXXXX")"
+PLAINTEXT_ARCHIVE="${STAGE_DIR}/restore.tar.gz"
+PASSFILE_CLEANUP=""
+
+cleanup() {
+    rm -f "$PLAINTEXT_ARCHIVE"
+    rm -rf "$STAGE_DIR"
+    if [[ -n "$PASSFILE_CLEANUP" && -f "$PASSFILE_CLEANUP" ]]; then
+        rm -f "$PASSFILE_CLEANUP"
+    fi
+}
+trap cleanup EXIT
+
+fail() {
+    echo "ERROR: $1" >&2
+    exit 1
+}
+
+resolve_passphrase_file() {
+    if [[ -n "${BACKUP_PASSPHRASE_FILE:-}" ]]; then
+        [[ -f "$BACKUP_PASSPHRASE_FILE" ]] || fail "BACKUP_PASSPHRASE_FILE does not exist: $BACKUP_PASSPHRASE_FILE"
+        echo "$BACKUP_PASSPHRASE_FILE"
+        return
+    fi
+
+    if [[ -n "${BACKUP_PASSPHRASE:-}" ]]; then
+        PASSFILE_CLEANUP="${STAGE_DIR}/backup.passphrase"
+        printf '%s' "$BACKUP_PASSPHRASE" > "$PASSFILE_CLEANUP"
+        chmod 600 "$PASSFILE_CLEANUP"
+        echo "$PASSFILE_CLEANUP"
+        return
+    fi
+
+    fail "Set BACKUP_PASSPHRASE_FILE or BACKUP_PASSPHRASE before restoring a backup."
+}
+
+sha256_file() {
+    local path="$1"
+    if command -v shasum >/dev/null 2>&1; then
+        shasum -a 256 "$path" | awk '{print $1}'
+    elif command -v sha256sum >/dev/null 2>&1; then
+        sha256sum "$path" | awk '{print $1}'
+    else
+        python3 - <<'PY' "$path"
+import hashlib
+import pathlib
+import sys
+path = pathlib.Path(sys.argv[1])
+h = hashlib.sha256()
+with path.open('rb') as f:
+    for chunk in iter(lambda: f.read(1024 * 1024), b''):
+        h.update(chunk)
+print(h.hexdigest())
+PY
+    fi
+}
+
+[[ -n "$ARCHIVE_PATH" ]] || fail "Usage: restore_backup.sh /path/to/archive.tar.gz.enc /restore/root"
+[[ -n "$RESTORE_ROOT" ]] || fail "Usage: restore_backup.sh /path/to/archive.tar.gz.enc /restore/root"
+[[ -f "$ARCHIVE_PATH" ]] || fail "Archive not found: $ARCHIVE_PATH"
+
+if [[ "$ARCHIVE_PATH" == *.tar.gz.enc ]]; then
+    MANIFEST_PATH="${ARCHIVE_PATH%.tar.gz.enc}.json"
+else
+    MANIFEST_PATH=""
+fi
+
+if [[ -n "$MANIFEST_PATH" && -f "$MANIFEST_PATH" ]]; then
+    EXPECTED_SHA="$(python3 - <<'PY' "$MANIFEST_PATH"
+import json
+import sys
+with open(sys.argv[1], 'r', encoding='utf-8') as handle:
+    manifest = json.load(handle)
+print(manifest['archive_sha256'])
+PY
+)"
+    ACTUAL_SHA="$(sha256_file "$ARCHIVE_PATH")"
+    [[ "$EXPECTED_SHA" == "$ACTUAL_SHA" ]] || fail "Archive SHA256 mismatch: expected $EXPECTED_SHA got $ACTUAL_SHA"
+fi
+
+PASSFILE="$(resolve_passphrase_file)"
+mkdir -p "$RESTORE_ROOT"
+
+openssl enc -d -aes-256-cbc -salt -pbkdf2 -iter 200000 \
+    -pass "file:${PASSFILE}" \
+    -in "$ARCHIVE_PATH" \
+    -out "$PLAINTEXT_ARCHIVE"
+
+tar -xzf "$PLAINTEXT_ARCHIVE" -C "$RESTORE_ROOT"
+echo "Restored backup into $RESTORE_ROOT"
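Reviewer note on the manifest check in the script above: it derives the sibling `.json` path from the `.tar.gz.enc` name, reads `archive_sha256`, and compares it with the digest of the encrypted file before decrypting. A minimal Python sketch of that flow follows. The function names are hypothetical, and returning `False` when no manifest is present is this sketch's way of modelling the script's silent skip.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional

ARCHIVE_SUFFIX = ".tar.gz.enc"


def sibling_manifest(archive: Path) -> Optional[Path]:
    """Map hermes-backup-<ts>.tar.gz.enc to hermes-backup-<ts>.json."""
    if not archive.name.endswith(ARCHIVE_SUFFIX):
        return None
    return archive.with_name(archive.name[: -len(ARCHIVE_SUFFIX)] + ".json")


def sha256_file(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large archives stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(archive: Path) -> bool:
    """Return True if verified, False if no manifest; raise on mismatch."""
    manifest_path = sibling_manifest(archive)
    if manifest_path is None or not manifest_path.is_file():
        return False
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    expected = manifest["archive_sha256"]
    actual = sha256_file(archive)
    if expected != actual:
        raise ValueError(f"Archive SHA256 mismatch: expected {expected} got {actual}")
    return True


if __name__ == "__main__":
    # Demo with stand-in bytes; no real encryption is involved here.
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "hermes-backup-demo.tar.gz.enc"
        archive.write_bytes(b"Salted__stand-in-ciphertext")
        manifest = {"archive_sha256": sha256_file(archive)}
        sibling_manifest(archive).write_text(json.dumps(manifest), encoding="utf-8")
        print(verify_archive(archive))
```

Because the manifest sits next to the archive, this digest catches corruption in transit or at rest; it is not a defense against someone who can rewrite both files.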
--- a/tests/test_backup_pipeline.py
+++ b/tests/test_backup_pipeline.py
@@ -0,0 +1,103 @@
+#!/usr/bin/env python3
+import os
+import subprocess
+import tempfile
+import unittest
+from pathlib import Path
+
+
+ROOT = Path(__file__).resolve().parents[1]
+BACKUP_SCRIPT = ROOT / "scripts" / "backup_pipeline.sh"
+RESTORE_SCRIPT = ROOT / "scripts" / "restore_backup.sh"
+
+
+class TestBackupPipeline(unittest.TestCase):
+    def setUp(self) -> None:
+        self.tempdir = tempfile.TemporaryDirectory()
+        self.base = Path(self.tempdir.name)
+        self.home = self.base / "home"
+        self.source_dir = self.home / ".hermes"
+        self.source_dir.mkdir(parents=True)
+        (self.source_dir / "sessions").mkdir()
+        (self.source_dir / "cron").mkdir()
+        (self.source_dir / "config.yaml").write_text("model: local-first\n")
+        (self.source_dir / "sessions" / "session.jsonl").write_text('{"role":"assistant","content":"hello"}\n')
+        (self.source_dir / "cron" / "jobs.json").write_text('{"jobs": 1}\n')
+        (self.source_dir / "state.db").write_bytes(b"sqlite-state")
+
+        self.backup_root = self.base / "backup-root"
+        self.nas_target = self.base / "nas-target"
+        self.restore_root = self.base / "restore-root"
+        self.log_dir = self.base / "logs"
+        self.passphrase_file = self.base / "backup.passphrase"
+        self.passphrase_file.write_text("correct horse battery staple\n")
+
+    def tearDown(self) -> None:
+        self.tempdir.cleanup()
+
+    def _env(self, *, include_remote: bool = True) -> dict[str, str]:
+        env = os.environ.copy()
+        env.update(
+            {
+                "HOME": str(self.home),
+                "BACKUP_SOURCE_DIR": str(self.source_dir),
+                "BACKUP_ROOT": str(self.backup_root),
+                "BACKUP_LOG_DIR": str(self.log_dir),
+                "BACKUP_PASSPHRASE_FILE": str(self.passphrase_file),
+            }
+        )
+        if include_remote:
+            env["BACKUP_NAS_TARGET"] = str(self.nas_target)
+        return env
+
+    def test_backup_encrypts_and_restore_round_trips(self) -> None:
+        backup = subprocess.run(
+            ["bash", str(BACKUP_SCRIPT)],
+            capture_output=True,
+            text=True,
+            env=self._env(),
+            cwd=ROOT,
+        )
+        self.assertEqual(backup.returncode, 0, msg=backup.stdout + backup.stderr)
+
+        encrypted_archives = sorted(self.nas_target.rglob("*.tar.gz.enc"))
+        self.assertEqual(len(encrypted_archives), 1, msg=f"expected one encrypted archive, found: {encrypted_archives}")
+        archive_path = encrypted_archives[0]
+        self.assertNotIn(b"model: local-first", archive_path.read_bytes())
+
+        manifests = sorted(self.nas_target.rglob("*.json"))
+        self.assertEqual(len(manifests), 1, msg=f"expected one manifest, found: {manifests}")
+
+        plaintext_archives = sorted(self.backup_root.rglob("*.tar.gz")) + sorted(self.nas_target.rglob("*.tar.gz"))
+        self.assertEqual(plaintext_archives, [], msg=f"plaintext archives leaked: {plaintext_archives}")
+
+        restore = subprocess.run(
+            ["bash", str(RESTORE_SCRIPT), str(archive_path), str(self.restore_root)],
+            capture_output=True,
+            text=True,
+            env=self._env(),
+            cwd=ROOT,
+        )
+        self.assertEqual(restore.returncode, 0, msg=restore.stdout + restore.stderr)
+
+        restored_hermes = self.restore_root / ".hermes"
+        self.assertTrue(restored_hermes.exists())
+        self.assertEqual((restored_hermes / "config.yaml").read_text(), "model: local-first\n")
+        self.assertEqual((restored_hermes / "sessions" / "session.jsonl").read_text(), '{"role":"assistant","content":"hello"}\n')
+        self.assertEqual((restored_hermes / "cron" / "jobs.json").read_text(), '{"jobs": 1}\n')
+        self.assertEqual((restored_hermes / "state.db").read_bytes(), b"sqlite-state")
+
+    def test_backup_requires_remote_target(self) -> None:
+        backup = subprocess.run(
+            ["bash", str(BACKUP_SCRIPT)],
+            capture_output=True,
+            text=True,
+            env=self._env(include_remote=False),
+            cwd=ROOT,
+        )
+        self.assertNotEqual(backup.returncode, 0)
+        self.assertIn("BACKUP_NAS_TARGET or BACKUP_S3_URI", backup.stdout + backup.stderr)
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
--- a/tests/test_fleet_phase_status.py
+++ b/tests/test_fleet_phase_status.py
@@ -1,67 +0,0 @@
-from __future__ import annotations
-
-import importlib.util
-from pathlib import Path
-
-
-ROOT = Path(__file__).resolve().parents[1]
-SCRIPT_PATH = ROOT / "scripts" / "fleet_phase_status.py"
-DOC_PATH = ROOT / "docs" / "FLEET_PHASE_1_SURVIVAL.md"
-
-
-def _load_module(path: Path, name: str):
-    assert path.exists(), f"missing {path.relative_to(ROOT)}"
-    spec = importlib.util.spec_from_file_location(name, path)
-    assert spec and spec.loader
-    module = importlib.util.module_from_spec(spec)
-    spec.loader.exec_module(module)
-    return module
-
-
-def test_compute_phase_status_tracks_survival_gate_requirements() -> None:
-    mod = _load_module(SCRIPT_PATH, "fleet_phase_status")
-
-    status = mod.compute_phase_status(
-        {
-            "fleet_operational": True,
-            "resources": {
-                "uptime_percent": 94.5,
-                "days_at_or_above_95_percent": 12,
-                "capacity_utilization_percent": 45.0,
-            },
-        }
-    )
-
-    assert status["current_phase"] == "PHASE-1 Survival"
-    assert status["next_phase_ready"] is False
-    assert any("94.5% / 95.0%" in item for item in status["missing_requirements"])
-    assert any("12/30" in item for item in status["missing_requirements"])
-    assert any("45.0% / >60.0%" in item for item in status["missing_requirements"])
-
-
-def test_render_markdown_preserves_phase_buildings_and_manual_clicker_language() -> None:
-    mod = _load_module(SCRIPT_PATH, "fleet_phase_status")
-    status = mod.compute_phase_status(mod.default_snapshot())
-    report = mod.render_markdown(status)
-
-    for snippet in (
-        "# [PHASE-1] Survival - Keep the Lights On",
-        "VPS hosts: Ezra, Allegro, Bezalel",
-        "Timmy harness",
-        "Gitea forge",
-        "Evennia worlds",
-        "Every restart, every SSH, every check is a manual click.",
-    ):
-        assert snippet in report
-
-
-def test_repo_contains_generated_phase_1_doc() -> None:
-    assert DOC_PATH.exists(), "missing committed phase-1 survival doc"
-    text = DOC_PATH.read_text(encoding="utf-8")
-    for snippet in (
-        "# [PHASE-1] Survival - Keep the Lights On",
-        "## Current Buildings",
-        "## Next Phase Trigger",
-        "## Manual Clicker Interpretation",
-    ):
-        assert snippet in text
Author	SHA1	Message	Date
Alexander Whitestone	6bbf6c4e0e	ci: fix smoke JSON parsing for backup tests (#693) [checks failed: Smoke Test / smoke (pull_request), failing after 10m42s]	2026-04-14 23:59:46 -04:00
Alexander Whitestone	6fbdbcf1c1	docs: add Hermes backup runbook (#693)	2026-04-14 23:57:41 -04:00
Alexander Whitestone	f8a9bae8fb	feat: add encrypted Hermes backup pipeline (#693)	2026-04-14 23:56:46 -04:00