Compare commits

..

3 Commits

Author SHA1 Message Date
Alexander Whitestone
6bbf6c4e0e ci: fix smoke JSON parsing for backup tests (#693)
Some checks failed
Smoke Test / smoke (pull_request) Failing after 10m42s
2026-04-14 23:59:46 -04:00
Alexander Whitestone
6fbdbcf1c1 docs: add Hermes backup runbook (#693) 2026-04-14 23:57:41 -04:00
Alexander Whitestone
f8a9bae8fb feat: add encrypted Hermes backup pipeline (#693) 2026-04-14 23:56:46 -04:00
8 changed files with 456 additions and 428 deletions

View File

@@ -14,7 +14,7 @@ jobs:
- name: Parse check
run: |
find . -name '*.yml' -o -name '*.yaml' | grep -v .gitea | xargs -r python3 -c "import sys,yaml; [yaml.safe_load(open(f)) for f in sys.argv[1:]]"
find . -name '*.json' | xargs -r python3 -m json.tool > /dev/null
find . -name '*.json' -print0 | xargs -0 -r -n1 python3 -m json.tool > /dev/null
find . -name '*.py' | xargs -r python3 -m py_compile
find . -name '*.sh' | xargs -r bash -n
echo "PASS: All files parse"
@@ -22,3 +22,6 @@ jobs:
run: |
if grep -rE 'sk-or-|sk-ant-|ghp_|AKIA' . --include='*.yml' --include='*.py' --include='*.sh' 2>/dev/null | grep -v '.gitea' | grep -v 'detect_secrets' | grep -v 'test_trajectory_sanitize'; then exit 1; fi
echo "PASS: No secrets"
- name: Backup pipeline regression test
run: |
python3 -m unittest discover -s tests -p 'test_backup_pipeline.py' -v

304
GENOME.md
View File

@@ -1,304 +0,0 @@
# GENOME.md — timmy-config
Auto-generated facts were derived from the local `~/.timmy/timmy-config` checkout on 2026-04-14 and then reviewed manually for architecture, API surface, and operational meaning.
## Project Overview
`timmy-config` is the sovereign sidecar for Timmy.
It is not the model runtime and it is not the work archive. It is the configuration and orchestration layer that tells Hermes who Timmy is, how he is routed, what scripts are available, what memories and skins are installed, and how the fleet is operated.
The repo exists because the Timmy Foundation made a hard architectural split:
- `hermes-agent` is the engine
- `timmy-config` is the driver's seat
- `timmy-home` is the lived workspace and artifact archive
This repo is therefore a mixed codebase with three major responsibilities:
1. Sidecar deployment into `~/.hermes/`
2. Operational automation for the fleet
3. Thin training/orchestration support without becoming a monolith again
Observed quick facts from the current local checkout:
- Source files: 178
- Test files: 36
- Config files: 24
- Total lines: 35,624
- Last commit on analyzed source: `7630806f` (`sync: align repo with live system config`)
- Total commits: 393
## Architecture Diagram
```mermaid
flowchart TD
A[timmy-config repo] --> B[deploy.sh]
B --> C[~/.hermes/config.yaml]
B --> D[~/.hermes/bin/*]
B --> E[~/.hermes/playbooks/*]
B --> F[~/.hermes/cron/*]
B --> G[~/.hermes/memories/*]
B --> H[~/.timmy/SOUL.md]
C --> I[Hermes gateway/runtime]
D --> J[Operational scripts]
E --> K[Playbook-driven agent behavior]
F --> L[Cron-scheduled automation]
J --> M[Gitea forge]
J --> N[Telegram / platform gateways]
J --> O[tmux wizard fleet]
J --> P[metrics + local files]
Q[orchestration.py + tasks.py] --> R[Huey / SQLite queue]
R --> J
Q --> M
Q --> P
S[gitea_client.py] --> M
T[training/*] --> U[curated datasets + evaluation recipes]
U --> P
V[timmy-home] <-->|artifacts / logs / training outputs| J
V <-->|continuity / notes / metrics| Q
```
## Entry Points and Data Flow
### Primary entry points
1. `deploy.sh`
- canonical sidecar deployment entry point
- copies config, scripts, playbooks, cron definitions, skins, and memories into `~/.hermes/`
- writes `SOUL.md` into `~/.timmy/`
2. `orchestration.py`
- defines the Huey SQLite queue
- the root scheduler primitive for queued work
3. `tasks.py`
- the real orchestration surface
- contains scheduled jobs, local Hermes invocations, archive/training helpers, JSONL helpers, continuity flushing, and repo lists
4. `gitea_client.py`
- typed stdlib-only interface to Gitea
- shared API layer replacing scattered raw curl logic
5. `bin/*`
- operational executables for health checks, deadman switch, dispatch, watchdogs, scans, and status panels
- this is the hands-on operator layer
### Data flow summary
- Configuration starts in repo files like `config.yaml`, `fallback-portfolios.yaml`, `channel_directory.json`, `memories/`, `skins/`, `playbooks/`, and `cron/`
- `deploy.sh` overlays these into the Hermes runtime directory
- Hermes runtime reads the deployed config and scripts
- `tasks.py` and `bin/*` then interact with:
- Gitea (`gitea_client.py`, issue/PR automation)
- local Hermes sessions (`run_hermes_local` paths in `tasks.py`)
- local files in `~/.timmy/` and `~/.hermes/`
- metrics JSONL outputs
- Telegram / gateway surfaces
This is a classic sidecar pattern: the repo does not own the engine, but it owns almost every operational decision around the engine.
## Key Abstractions
### 1. Sidecar overlay
Core idea: never fork Hermes if a sidecar can express the behavior.
This abstraction appears in:
- `deploy.sh`
- repo layout under `bin/`, `cron/`, `playbooks/`, `memories/`, `skins/`
- the explicit README boundary between `timmy-config` and `timmy-home`
### 2. Typed forge client
`gitea_client.py` turns the forge into a stable internal API surface.
Important abstractions:
- `Issue`
- `PullRequest`
- `Comment`
- `Label`
- `User`
- `GiteaClient`
This is important because many other scripts can depend on one client instead of shelling out to brittle curl commands.
### 3. Queue-backed orchestration
`orchestration.py` + `tasks.py` define the move from ad-hoc shell automation to queued work.
The central abstraction is not just “a cron job” but “a schedulable task with local model execution, continuity, and metrics.”
### 4. Continuity as files, not vibes
`tasks.py` contains explicit file-backed continuity helpers:
- `flush_continuity(...)`
- JSON readers/writers
- JSONL append/load helpers
- archive checkpoint/state files
This abstraction matters because Timmy continuity survives compaction or restart by being written to disk.
### 5. Training as thin recipes
The training directory is intentionally framed as transitional.
It exposes recipes and helper scripts, but the README is explicit that lived data belongs elsewhere.
This is an important abstraction boundary:
- configs + generators here
- real activity artifacts in `timmy-home`
## API Surface
### Shell / operator API
Important user-facing commands implied by the repo:
- `./deploy.sh`
- Huey consumer startup via `huey_consumer.py tasks.huey -w 2 -k thread`
- scripts in `bin/` such as:
- `deadman-switch.sh`
- `fleet-status.sh`
- `model-health-check.sh`
- `start-loops.sh`
- `agent-dispatch.sh`
### Python API surface
Most reusable programmatic interfaces:
#### `gitea_client.GiteaClient`
Key methods include:
- `list_org_repos`
- `list_issues`
- `get_issue`
- `create_issue`
- `update_issue`
- `close_issue`
- `assign_issue`
- `add_labels`
- `list_comments`
- `create_comment`
- `list_pulls`
- `get_pull`
- `create_pull`
- `merge_pull`
- `update_pull_branch`
- `close_pull`
#### `tasks.py` helpers
Notable reusable surfaces:
- `run_hermes_local(...)`
- `hermes_local(...)`
- `run_reflex_task(...)`
- `run_archive_hermes(...)`
- `flush_continuity(...)`
- JSON/JSONL primitives:
- `read_json`
- `write_json`
- `load_jsonl`
- `write_jsonl`
- `append_jsonl`
- `count_jsonl_rows`
This API surface is broad enough that timmy-config is functionally an operations SDK as much as a config repo.
## Test Coverage Gaps
The codebase-genome pipeline estimated:
- source modules: 86
- test modules: 36
- estimated coverage: 15%
- untested modules: 73
That estimate is crude, but directionally useful.
### High-value gaps
The most important untested or under-tested areas are not random scripts. They are the system boundary scripts and operator-critical surfaces:
1. `deploy.sh`
- highest leverage file in the repo
- if deploy breaks, the sidecar stops being the source of truth in practice
2. `orchestration.py` and large portions of `tasks.py`
- especially scheduling, local-Hermes execution wrappers, checkpoint/state flows, and failure modes
3. `gitea_client.py`
- some behavior may be covered indirectly, but the client is important enough to deserve deeper contract tests for pagination, merge failures, 405 handling, and retry behavior
4. `bin/*` operational scripts
- deadman, watchdog, model health, and dispatch paths are safety critical
5. Ansible deployment surface
- `ansible/playbooks/*` and `ansible/roles/*` are configuration-heavy but still represent production behavior
### Practical takeaway
If the next wave of testing work is prioritized, the order should be:
1. deploy + runtime overlay correctness
2. Gitea client contracts
3. queue/task execution behavior in `tasks.py`
4. deadman / watchdog / fleet-health scripts
5. training and archive helper edge cases
## Security Considerations
The pipeline already flags several categories, and manual review supports them.
### 1. Subprocess-heavy code
This repo uses shell and subprocess execution widely.
That is expected for an ops repo, but it increases command-injection risk.
Files like `tasks.py`, deployment helpers, and many `bin/*` scripts need careful boundary checking around interpolated inputs.
### 2. Secret adjacency
This repo references tokens, auth files, routing configs, and platform integrations.
Even when secrets are not hardcoded, the repo lives close to sensitive paths.
That means review discipline matters:
- no accidental dumps of live tokens
- no committing generated auth artifacts
- no relaxed assumptions about local paths being safe to expose
### 3. HTTP and webhook surfaces
Multiple scripts make outbound HTTP calls or serve automation endpoints.
This means input validation, response validation, and timeout/error handling matter more than in a static config repo.
### 4. SQLite and file-backed state
SQLite/Huey plus JSON/JSONL file state are simple and sovereign, but they create corruption and stale-state risks if writes are partial or multiple writers race.
The repo already encodes a strong preference for explicit checkpoints and continuity flushes, which is the right direction.
### 5. Sidecar privilege
Because `deploy.sh` writes directly into `~/.hermes/`, this repo effectively has configuration authority over the live runtime.
That is powerful and dangerous. A bad deploy can break routing, scripts, or identity. In security terms, this repo is a control plane.
## Performance Characteristics
`timmy-config` is not performance-sensitive in the same way a game loop or serving stack is. Its performance profile is operational:
- Many small scripts, low-latency startup expectations
- File-backed state and JSONL append patterns optimized for simplicity over throughput
- Huey + SQLite chosen for low operational overhead, not horizontal scale
- Deployment is copy-based and cheap; correctness matters more than speed
- The repo is large in surface area but shallow in runtime residency — most code runs only when called
Practical performance characteristics:
- deploy path should remain fast because it is mostly file copies
- Gitea automation cost is network-bound, not CPU-bound
- training helpers are the heaviest local operations but intentionally limited in scope
- orchestration latency is dominated by external model/API calls rather than Python compute
The biggest performance risk is not CPU.
It is operational drift: too many scripts, overlapping logic, or old paths that still appear to exist. That increases human debugging cost even when machine runtime is fine.
## Final Assessment
`timmy-config` is best understood as a sovereign control plane.
It is part config repo, part operations toolkit, part lightweight SDK, and part deployment overlay.
Its strongest architectural idea is the boundary:
- keep Hermes upstream
- keep Timmy sovereign through the sidecar
- keep lived work in timmy-home
Its biggest risk is sprawl.
A repo with 178 source files and 393 commits can easily become the place where every operational idea goes to live forever.
The right long-term direction is not to make it smaller by deleting its authority, but to keep sharpening the boundaries inside it:
- sidecar deploy surface
- queue/orchestration surface
- Gitea client surface
- training recipe surface
- operator script surface
That keeps the genome legible.
Without that, the repo becomes powerful but opaque — and a control plane cannot afford opacity.

98
docs/BACKUP_PIPELINE.md Normal file
View File

@@ -0,0 +1,98 @@
# Encrypted Hermes Backup Pipeline
Issue: `timmy-home#693`
This pipeline creates a nightly encrypted archive of `~/.hermes`, stores a local encrypted copy, uploads it to remote storage, and supports restore verification.
## What gets backed up
By default the pipeline archives:
- `~/.hermes/config.yaml`
- `~/.hermes/state.db`
- `~/.hermes/sessions/`
- `~/.hermes/cron/`
- any other files under `~/.hermes`
Override the source with `BACKUP_SOURCE_DIR=/path/to/.hermes`.
## Backup command
```bash
BACKUP_PASSPHRASE_FILE=~/.config/timmy/backup.passphrase \
BACKUP_NAS_TARGET=/Volumes/timmy-nas/hermes-backups \
bash scripts/backup_pipeline.sh
```
The script writes:
- local encrypted copy: `~/.timmy-backups/hermes/<timestamp>/hermes-backup-<timestamp>.tar.gz.enc`
- local manifest: `~/.timmy-backups/hermes/<timestamp>/hermes-backup-<timestamp>.json`
- log file: `~/.timmy-backups/hermes/logs/backup_pipeline.log`
## Nightly schedule
Run every night at 03:00:
```cron
0 3 * * * cd /Users/apayne/.timmy/timmy-home && BACKUP_PASSPHRASE_FILE=/Users/apayne/.config/timmy/backup.passphrase BACKUP_NAS_TARGET=/Volumes/timmy-nas/hermes-backups bash scripts/backup_pipeline.sh >> /Users/apayne/.timmy-backups/hermes/logs/cron.log 2>&1
```
## Remote targets
At least one remote target must be configured.
### Local NAS
Use a mounted path:
```bash
BACKUP_NAS_TARGET=/Volumes/timmy-nas/hermes-backups
```
The pipeline copies the encrypted archive and manifest into `<BACKUP_NAS_TARGET>/<timestamp>/`.
### S3-compatible storage
```bash
BACKUP_PASSPHRASE_FILE=~/.config/timmy/backup.passphrase \
BACKUP_S3_URI=s3://timmy-backups/hermes \
AWS_ENDPOINT_URL=https://minio.example.com \
bash scripts/backup_pipeline.sh
```
Notes:
- `aws` CLI must be installed if `BACKUP_S3_URI` is set.
- `AWS_ENDPOINT_URL` is optional and is used for MinIO, R2, and other S3-compatible endpoints.
## Restore playbook
Restore an encrypted archive into a clean target root:
```bash
BACKUP_PASSPHRASE_FILE=~/.config/timmy/backup.passphrase \
bash scripts/restore_backup.sh \
/Volumes/timmy-nas/hermes-backups/20260415-030000/hermes-backup-20260415-030000.tar.gz.enc \
/tmp/hermes-restore
```
Result:
- restored tree lands at `/tmp/hermes-restore/.hermes`
- if a sibling manifest exists, the restore script verifies the archive SHA256 before decrypting
## End-to-end verification
Run the regression suite:
```bash
python3 -m unittest discover -s tests -p 'test_backup_pipeline.py' -v
```
This proves:
1. the backup output is encrypted
2. plaintext archives do not leak into the backup destinations
3. the restore script recreates the original `.hermes` tree end-to-end
4. the pipeline refuses to run without a remote target

View File

@@ -12,6 +12,8 @@ Quick-reference index for common operational tasks across the Timmy Foundation i
| Check fleet health | fleet-ops | `python3 scripts/fleet_readiness.py` |
| Agent scorecard | fleet-ops | `python3 scripts/agent_scorecard.py` |
| View fleet manifest | fleet-ops | `cat manifest.yaml` |
| Backup Hermes state | timmy-home | `BACKUP_PASSPHRASE_FILE=... BACKUP_NAS_TARGET=... bash scripts/backup_pipeline.sh` |
| Restore Hermes state | timmy-home | `BACKUP_PASSPHRASE_FILE=... bash scripts/restore_backup.sh <archive> <restore-root>` |
## the-nexus (Frontend + Brain)

View File

@@ -1,80 +1,170 @@
#!/usr/bin/env bash
# backup_pipeline.sh — Daily fleet backup pipeline (FLEET-008)
# Refs: timmy-home #561
# backup_pipeline.sh — Nightly encrypted Hermes backup pipeline
# Refs: timmy-home #693, timmy-home #561
set -euo pipefail
BACKUP_ROOT="/backups/timmy"
DATESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="${BACKUP_ROOT}/${DATESTAMP}"
LOG_DIR="/var/log/timmy"
ALERT_LOG="${LOG_DIR}/backup_pipeline.log"
mkdir -p "$BACKUP_DIR" "$LOG_DIR"
DATESTAMP="${BACKUP_TIMESTAMP:-$(date +%Y%m%d-%H%M%S)}"
BACKUP_SOURCE_DIR="${BACKUP_SOURCE_DIR:-${HOME}/.hermes}"
BACKUP_ROOT="${BACKUP_ROOT:-${HOME}/.timmy-backups/hermes}"
BACKUP_LOG_DIR="${BACKUP_LOG_DIR:-${BACKUP_ROOT}/logs}"
BACKUP_RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-14}"
BACKUP_S3_URI="${BACKUP_S3_URI:-}"
BACKUP_NAS_TARGET="${BACKUP_NAS_TARGET:-}"
AWS_ENDPOINT_URL="${AWS_ENDPOINT_URL:-}"
BACKUP_NAME="hermes-backup-${DATESTAMP}"
LOCAL_BACKUP_DIR="${BACKUP_ROOT}/${DATESTAMP}"
STAGE_DIR="$(mktemp -d "${TMPDIR:-/tmp}/timmy-backup.XXXXXX")"
PLAINTEXT_ARCHIVE="${STAGE_DIR}/${BACKUP_NAME}.tar.gz"
ENCRYPTED_ARCHIVE="${STAGE_DIR}/${BACKUP_NAME}.tar.gz.enc"
MANIFEST_PATH="${STAGE_DIR}/${BACKUP_NAME}.json"
ALERT_LOG="${BACKUP_LOG_DIR}/backup_pipeline.log"
PASSFILE_CLEANUP=""
TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
OFFSITE_TARGET="${OFFSITE_TARGET:-}"
mkdir -p "$BACKUP_LOG_DIR"
log() { echo "[$(date -Iseconds)] $1" | tee -a "$ALERT_LOG"; }
log() {
echo "[$(date -Iseconds)] $1" | tee -a "$ALERT_LOG"
}
send_telegram() {
local msg="$1"
if [[ -n "$TELEGRAM_BOT_TOKEN" && -n "$TELEGRAM_CHAT_ID" ]]; then
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" -d "text=${msg}" >/dev/null 2>&1 || true
fail() {
log "ERROR: $1"
exit 1
}
cleanup() {
rm -f "$PLAINTEXT_ARCHIVE"
rm -rf "$STAGE_DIR"
if [[ -n "$PASSFILE_CLEANUP" && -f "$PASSFILE_CLEANUP" ]]; then
rm -f "$PASSFILE_CLEANUP"
fi
}
trap cleanup EXIT
resolve_passphrase_file() {
if [[ -n "${BACKUP_PASSPHRASE_FILE:-}" ]]; then
[[ -f "$BACKUP_PASSPHRASE_FILE" ]] || fail "BACKUP_PASSPHRASE_FILE does not exist: $BACKUP_PASSPHRASE_FILE"
echo "$BACKUP_PASSPHRASE_FILE"
return
fi
if [[ -n "${BACKUP_PASSPHRASE:-}" ]]; then
PASSFILE_CLEANUP="${STAGE_DIR}/backup.passphrase"
printf '%s' "$BACKUP_PASSPHRASE" > "$PASSFILE_CLEANUP"
chmod 600 "$PASSFILE_CLEANUP"
echo "$PASSFILE_CLEANUP"
return
fi
fail "Set BACKUP_PASSPHRASE_FILE or BACKUP_PASSPHRASE before running the backup pipeline."
}
sha256_file() {
local path="$1"
if command -v shasum >/dev/null 2>&1; then
shasum -a 256 "$path" | awk '{print $1}'
elif command -v sha256sum >/dev/null 2>&1; then
sha256sum "$path" | awk '{print $1}'
else
python3 - <<'PY' "$path"
import hashlib
import pathlib
import sys
path = pathlib.Path(sys.argv[1])
h = hashlib.sha256()
with path.open('rb') as f:
for chunk in iter(lambda: f.read(1024 * 1024), b''):
h.update(chunk)
print(h.hexdigest())
PY
fi
}
status=0
write_manifest() {
python3 - <<'PY' "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
import json
import sys
manifest_path, source_dir, archive_name, archive_sha256, local_dir, s3_uri, nas_target, created_at = sys.argv[1:]
manifest = {
"created_at": created_at,
"source_dir": source_dir,
"archive_name": archive_name,
"archive_sha256": archive_sha256,
"encryption": {
"type": "openssl",
"cipher": "aes-256-cbc",
"pbkdf2": True,
"iterations": 200000,
},
"destinations": {
"local_dir": local_dir,
"s3_uri": s3_uri or None,
"nas_target": nas_target or None,
},
}
with open(manifest_path, 'w', encoding='utf-8') as handle:
json.dump(manifest, handle, indent=2)
handle.write('\n')
PY
}
# --- Gitea repositories ---
if [[ -d /root/gitea ]]; then
tar czf "${BACKUP_DIR}/gitea-repos.tar.gz" -C /root gitea 2>/dev/null || true
log "Backed up Gitea repos"
fi
upload_to_nas() {
local archive_path="$1"
local manifest_path="$2"
local target_root="$3"
# --- Agent configs and state ---
for wiz in bezalel allegro ezra timmy; do
if [[ -d "/root/wizards/${wiz}" ]]; then
tar czf "${BACKUP_DIR}/${wiz}-home.tar.gz" -C /root/wizards "${wiz}" 2>/dev/null || true
log "Backed up ${wiz} home"
local target_dir="${target_root%/}/${DATESTAMP}"
mkdir -p "$target_dir"
cp "$archive_path" "$manifest_path" "$target_dir/"
log "Uploaded backup to NAS target: $target_dir"
}
upload_to_s3() {
local archive_path="$1"
local manifest_path="$2"
command -v aws >/dev/null 2>&1 || fail "BACKUP_S3_URI is set but aws CLI is not installed."
local args=()
if [[ -n "$AWS_ENDPOINT_URL" ]]; then
args+=(--endpoint-url "$AWS_ENDPOINT_URL")
fi
done
# --- System configs ---
cp /etc/crontab "${BACKUP_DIR}/crontab" 2>/dev/null || true
cp -r /etc/systemd/system "${BACKUP_DIR}/systemd" 2>/dev/null || true
log "Backed up system configs"
aws "${args[@]}" s3 cp "$archive_path" "${BACKUP_S3_URI%/}/$(basename "$archive_path")"
aws "${args[@]}" s3 cp "$manifest_path" "${BACKUP_S3_URI%/}/$(basename "$manifest_path")"
log "Uploaded backup to S3 target: $BACKUP_S3_URI"
}
# --- Evennia worlds (if present) ---
if [[ -d /root/evennia ]]; then
tar czf "${BACKUP_DIR}/evennia-worlds.tar.gz" -C /root evennia 2>/dev/null || true
log "Backed up Evennia worlds"
[[ -d "$BACKUP_SOURCE_DIR" ]] || fail "BACKUP_SOURCE_DIR does not exist: $BACKUP_SOURCE_DIR"
[[ -n "$BACKUP_NAS_TARGET" || -n "$BACKUP_S3_URI" ]] || fail "Set BACKUP_NAS_TARGET or BACKUP_S3_URI for remote backup storage."
PASSFILE="$(resolve_passphrase_file)"
mkdir -p "$LOCAL_BACKUP_DIR"
log "Creating archive from $BACKUP_SOURCE_DIR"
tar -czf "$PLAINTEXT_ARCHIVE" -C "$(dirname "$BACKUP_SOURCE_DIR")" "$(basename "$BACKUP_SOURCE_DIR")"
log "Encrypting archive"
openssl enc -aes-256-cbc -salt -pbkdf2 -iter 200000 \
-pass "file:${PASSFILE}" \
-in "$PLAINTEXT_ARCHIVE" \
-out "$ENCRYPTED_ARCHIVE"
ARCHIVE_SHA256="$(sha256_file "$ENCRYPTED_ARCHIVE")"
CREATED_AT="$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
write_manifest "$MANIFEST_PATH" "$BACKUP_SOURCE_DIR" "$(basename "$ENCRYPTED_ARCHIVE")" "$ARCHIVE_SHA256" "$LOCAL_BACKUP_DIR" "$BACKUP_S3_URI" "$BACKUP_NAS_TARGET" "$CREATED_AT"
cp "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$LOCAL_BACKUP_DIR/"
rm -f "$PLAINTEXT_ARCHIVE"
log "Encrypted backup stored locally: ${LOCAL_BACKUP_DIR}/$(basename "$ENCRYPTED_ARCHIVE")"
if [[ -n "$BACKUP_NAS_TARGET" ]]; then
upload_to_nas "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$BACKUP_NAS_TARGET"
fi
# --- Manifest ---
find "$BACKUP_DIR" -type f > "${BACKUP_DIR}/manifest.txt"
log "Backup manifest written"
# --- Offsite sync ---
if [[ -n "$OFFSITE_TARGET" ]]; then
if rsync -az --delete "${BACKUP_DIR}/" "${OFFSITE_TARGET}/${DATESTAMP}/" 2>/dev/null; then
log "Offsite sync completed"
else
log "WARNING: Offsite sync failed"
status=1
fi
if [[ -n "$BACKUP_S3_URI" ]]; then
upload_to_s3 "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH"
fi
# --- Retention: keep last 7 days ---
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} + 2>/dev/null || true
log "Retention applied (7 days)"
if [[ "$status" -eq 0 ]]; then
log "Backup pipeline completed: ${BACKUP_DIR}"
send_telegram "✅ Daily backup completed: ${DATESTAMP}"
else
log "Backup pipeline completed with WARNINGS: ${BACKUP_DIR}"
send_telegram "⚠️ Daily backup completed with warnings: ${DATESTAMP}"
fi
exit "$status"
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -name '20*' -mtime "+${BACKUP_RETENTION_DAYS}" -exec rm -rf {} + 2>/dev/null || true
log "Retention applied (${BACKUP_RETENTION_DAYS} days)"
log "Backup pipeline completed successfully"

97
scripts/restore_backup.sh Normal file
View File

@@ -0,0 +1,97 @@
#!/usr/bin/env bash
# restore_backup.sh — Restore an encrypted Hermes backup archive
# Usage: restore_backup.sh /path/to/hermes-backup-YYYYmmdd-HHMMSS.tar.gz.enc /restore/root
set -euo pipefail
ARCHIVE_PATH="${1:-}"
RESTORE_ROOT="${2:-}"
STAGE_DIR="$(mktemp -d "${TMPDIR:-/tmp}/timmy-restore.XXXXXX")"
PLAINTEXT_ARCHIVE="${STAGE_DIR}/restore.tar.gz"
PASSFILE_CLEANUP=""
cleanup() {
rm -f "$PLAINTEXT_ARCHIVE"
rm -rf "$STAGE_DIR"
if [[ -n "$PASSFILE_CLEANUP" && -f "$PASSFILE_CLEANUP" ]]; then
rm -f "$PASSFILE_CLEANUP"
fi
}
trap cleanup EXIT
fail() {
echo "ERROR: $1" >&2
exit 1
}
resolve_passphrase_file() {
if [[ -n "${BACKUP_PASSPHRASE_FILE:-}" ]]; then
[[ -f "$BACKUP_PASSPHRASE_FILE" ]] || fail "BACKUP_PASSPHRASE_FILE does not exist: $BACKUP_PASSPHRASE_FILE"
echo "$BACKUP_PASSPHRASE_FILE"
return
fi
if [[ -n "${BACKUP_PASSPHRASE:-}" ]]; then
PASSFILE_CLEANUP="${STAGE_DIR}/backup.passphrase"
printf '%s' "$BACKUP_PASSPHRASE" > "$PASSFILE_CLEANUP"
chmod 600 "$PASSFILE_CLEANUP"
echo "$PASSFILE_CLEANUP"
return
fi
fail "Set BACKUP_PASSPHRASE_FILE or BACKUP_PASSPHRASE before restoring a backup."
}
sha256_file() {
local path="$1"
if command -v shasum >/dev/null 2>&1; then
shasum -a 256 "$path" | awk '{print $1}'
elif command -v sha256sum >/dev/null 2>&1; then
sha256sum "$path" | awk '{print $1}'
else
python3 - <<'PY' "$path"
import hashlib
import pathlib
import sys
path = pathlib.Path(sys.argv[1])
h = hashlib.sha256()
with path.open('rb') as f:
for chunk in iter(lambda: f.read(1024 * 1024), b''):
h.update(chunk)
print(h.hexdigest())
PY
fi
}
[[ -n "$ARCHIVE_PATH" ]] || fail "Usage: restore_backup.sh /path/to/archive.tar.gz.enc /restore/root"
[[ -n "$RESTORE_ROOT" ]] || fail "Usage: restore_backup.sh /path/to/archive.tar.gz.enc /restore/root"
[[ -f "$ARCHIVE_PATH" ]] || fail "Archive not found: $ARCHIVE_PATH"
if [[ "$ARCHIVE_PATH" == *.tar.gz.enc ]]; then
MANIFEST_PATH="${ARCHIVE_PATH%.tar.gz.enc}.json"
else
MANIFEST_PATH=""
fi
if [[ -n "$MANIFEST_PATH" && -f "$MANIFEST_PATH" ]]; then
EXPECTED_SHA="$(python3 - <<'PY' "$MANIFEST_PATH"
import json
import sys
with open(sys.argv[1], 'r', encoding='utf-8') as handle:
manifest = json.load(handle)
print(manifest['archive_sha256'])
PY
)"
ACTUAL_SHA="$(sha256_file "$ARCHIVE_PATH")"
[[ "$EXPECTED_SHA" == "$ACTUAL_SHA" ]] || fail "Archive SHA256 mismatch: expected $EXPECTED_SHA got $ACTUAL_SHA"
fi
PASSFILE="$(resolve_passphrase_file)"
mkdir -p "$RESTORE_ROOT"
openssl enc -d -aes-256-cbc -salt -pbkdf2 -iter 200000 \
-pass "file:${PASSFILE}" \
-in "$ARCHIVE_PATH" \
-out "$PLAINTEXT_ARCHIVE"
tar -xzf "$PLAINTEXT_ARCHIVE" -C "$RESTORE_ROOT"
echo "Restored backup into $RESTORE_ROOT"

View File

@@ -0,0 +1,103 @@
#!/usr/bin/env python3
import os
import subprocess
import tempfile
import unittest
from pathlib import Path
ROOT = Path(__file__).resolve().parents[1]
BACKUP_SCRIPT = ROOT / "scripts" / "backup_pipeline.sh"
RESTORE_SCRIPT = ROOT / "scripts" / "restore_backup.sh"
class TestBackupPipeline(unittest.TestCase):
def setUp(self) -> None:
self.tempdir = tempfile.TemporaryDirectory()
self.base = Path(self.tempdir.name)
self.home = self.base / "home"
self.source_dir = self.home / ".hermes"
self.source_dir.mkdir(parents=True)
(self.source_dir / "sessions").mkdir()
(self.source_dir / "cron").mkdir()
(self.source_dir / "config.yaml").write_text("model: local-first\n")
(self.source_dir / "sessions" / "session.jsonl").write_text('{"role":"assistant","content":"hello"}\n')
(self.source_dir / "cron" / "jobs.json").write_text('{"jobs": 1}\n')
(self.source_dir / "state.db").write_bytes(b"sqlite-state")
self.backup_root = self.base / "backup-root"
self.nas_target = self.base / "nas-target"
self.restore_root = self.base / "restore-root"
self.log_dir = self.base / "logs"
self.passphrase_file = self.base / "backup.passphrase"
self.passphrase_file.write_text("correct horse battery staple\n")
def tearDown(self) -> None:
self.tempdir.cleanup()
def _env(self, *, include_remote: bool = True) -> dict[str, str]:
env = os.environ.copy()
env.update(
{
"HOME": str(self.home),
"BACKUP_SOURCE_DIR": str(self.source_dir),
"BACKUP_ROOT": str(self.backup_root),
"BACKUP_LOG_DIR": str(self.log_dir),
"BACKUP_PASSPHRASE_FILE": str(self.passphrase_file),
}
)
if include_remote:
env["BACKUP_NAS_TARGET"] = str(self.nas_target)
return env
def test_backup_encrypts_and_restore_round_trips(self) -> None:
backup = subprocess.run(
["bash", str(BACKUP_SCRIPT)],
capture_output=True,
text=True,
env=self._env(),
cwd=ROOT,
)
self.assertEqual(backup.returncode, 0, msg=backup.stdout + backup.stderr)
encrypted_archives = sorted(self.nas_target.rglob("*.tar.gz.enc"))
self.assertEqual(len(encrypted_archives), 1, msg=f"expected one encrypted archive, found: {encrypted_archives}")
archive_path = encrypted_archives[0]
self.assertNotIn(b"model: local-first", archive_path.read_bytes())
manifests = sorted(self.nas_target.rglob("*.json"))
self.assertEqual(len(manifests), 1, msg=f"expected one manifest, found: {manifests}")
plaintext_archives = sorted(self.backup_root.rglob("*.tar.gz")) + sorted(self.nas_target.rglob("*.tar.gz"))
self.assertEqual(plaintext_archives, [], msg=f"plaintext archives leaked: {plaintext_archives}")
restore = subprocess.run(
["bash", str(RESTORE_SCRIPT), str(archive_path), str(self.restore_root)],
capture_output=True,
text=True,
env=self._env(),
cwd=ROOT,
)
self.assertEqual(restore.returncode, 0, msg=restore.stdout + restore.stderr)
restored_hermes = self.restore_root / ".hermes"
self.assertTrue(restored_hermes.exists())
self.assertEqual((restored_hermes / "config.yaml").read_text(), "model: local-first\n")
self.assertEqual((restored_hermes / "sessions" / "session.jsonl").read_text(), '{"role":"assistant","content":"hello"}\n')
self.assertEqual((restored_hermes / "cron" / "jobs.json").read_text(), '{"jobs": 1}\n')
self.assertEqual((restored_hermes / "state.db").read_bytes(), b"sqlite-state")
def test_backup_requires_remote_target(self) -> None:
backup = subprocess.run(
["bash", str(BACKUP_SCRIPT)],
capture_output=True,
text=True,
env=self._env(include_remote=False),
cwd=ROOT,
)
self.assertNotEqual(backup.returncode, 0)
self.assertIn("BACKUP_NAS_TARGET or BACKUP_S3_URI", backup.stdout + backup.stderr)
if __name__ == "__main__":
unittest.main(verbosity=2)

View File

@@ -1,61 +0,0 @@
from pathlib import Path
GENOME = Path('GENOME.md')
def read_genome() -> str:
assert GENOME.exists(), 'GENOME.md must exist at repo root'
return GENOME.read_text(encoding='utf-8')
def test_genome_exists():
assert GENOME.exists(), 'GENOME.md must exist at repo root'
def test_genome_has_required_sections():
text = read_genome()
for heading in [
'# GENOME.md — timmy-config',
'## Project Overview',
'## Architecture Diagram',
'## Entry Points and Data Flow',
'## Key Abstractions',
'## API Surface',
'## Test Coverage Gaps',
'## Security Considerations',
'## Performance Characteristics',
]:
assert heading in text
def test_genome_contains_mermaid_diagram():
text = read_genome()
assert '```mermaid' in text
assert 'graph TD' in text or 'flowchart TD' in text
def test_genome_mentions_core_timmy_config_files():
text = read_genome()
for token in [
'deploy.sh',
'config.yaml',
'gitea_client.py',
'orchestration.py',
'tasks.py',
'bin/',
'playbooks/',
'training/',
]:
assert token in text
def test_genome_explains_sidecar_boundary():
text = read_genome()
assert 'sidecar' in text.lower()
assert 'Hermes' in text
assert 'timmy-home' in text
def test_genome_is_substantial():
text = read_genome()
assert len(text) >= 5000