diff --git a/ansible/CONSOLIDATION.md b/ansible/CONSOLIDATION.md new file mode 100644 index 00000000..2d2f3bac --- /dev/null +++ b/ansible/CONSOLIDATION.md @@ -0,0 +1,121 @@ +# Ansible IaC — Canonical Consolidation +**Issue:** timmy-config#442 — [P2] Ansible IaC — Canonical Playbook for Fleet Management +**Status:** Canonical structure in place — ad-hoc mechanisms deprecated +**Date:** 2026-04-27 + +--- + +## Canonical Structure Committed + +The `ansible/` directory is now the **single source of truth** for all fleet infrastructure: +- Inventory: `ansible/inventory/hosts.yml` +- Golden state: `ansible/inventory/group_vars/wizards.yml` +- Master playbook: `ansible/playbooks/site.yml` +- Sub-playbooks: deadman_switch.yml, golden_state.yml, agent_startup.yml, cron_schedule.yml, request_log.yml +- Roles: wizard_base, deadman_switch, golden_state, cron_manager, request_log +- Templates: systemd units, launchd plists, config templates +- Webhook deploy script: `ansible/scripts/deploy_on_webhook.sh` + +All changes go through PR review. No direct edits on machines. + +--- + +## Acceptance Criteria — Status + +| Criterion | Status | Evidence | +|-----------|--------|----------| +| Ansible directory structure committed to timmy-config | ✅ DONE | `ansible/` fully populated at repo HEAD | +| Inventory file lists all known fleet machines | ✅ DONE | `inventory/hosts.yml` — timmy, allegro, bezalel, ezra, forge | +| Deadman switch playbook deploys and configures the switch | ✅ DONE | `playbooks/deadman_switch.yml` + `roles/deadman_switch/` — systemd timer + launchd plist | +| Golden state rollback playbook restores known-good config | ✅ DONE | `playbooks/golden_state.yml` + `roles/golden_state/` — provider chain enforcement, Anthropic ban | +| Agent startup sequence playbook brings wizards up in order | ✅ DONE | `playbooks/agent_startup.yml` — pull → validate → start → verify; serial:1 for safety | +| Cron jobs managed through Ansible (no manual crontab edits) | ✅ DONE | `roles/cron_manager/` + group_vars cron_jobs; old ad-hoc crontabs deprecated | +| Gitea webhook configured to trigger ansible-pull on merge | ✅ READY | `ansible/scripts/deploy_on_webhook.sh` exists; webhook URL: `http://localhost:9000/hooks/deploy-timmy-config` (manual registration in Gitea Settings required on each target machine's webhook receiver) | +| All existing ad-hoc recovery mechanisms identified and replaced | ✅ DONE | See "Deprecated Ad-Hoc Mechanisms" below — all superseded by Ansible roles | +| Playbook runs idempotently (can re-run without side effects) | ✅ DESIGNED | All roles use `creates`, `backup: true`, `changed_when` checks. `--check --diff` supported; safe to re-run | + +--- + +## Deprecated Ad-Hoc Mechanisms + +The following ad-hoc recovery/cron/startup mechanisms have been **replaced** by the canonical Ansible deployment: + +| Ad-hoc Mechanism | Replaced By | Ansible Role / Playbook | +|-----------------|-------------|------------------------| +| `bin/deadman-switch.sh` (standalone deadman watch) | Deployed systemd timer + launchd agent with snapshot rollback | `roles/deadman_switch/` | +| `bin/hermes-startup.sh` (master startup sequence) | Agent startup playbook deploying golden state + service activation | `playbooks/agent_startup.yml` + `roles/wizard_base/` | +| `fleet/auto_restart.py` (process monitor + auto-restart) | Deadman switch with config snapshots + systemd restart handlers | `roles/deadman_switch/` | +| `cron/muda-audit.crontab` (muda audit cron on ezra) | Managed cron job via `cron_manager` role from `group_vars/wizards.yml:cron_jobs` | `roles/cron_manager/` | +| Legacy crontab entries on VPS machines | Full Ansible-managed cron set; old entries absent on next run | `inventory/group_vars/wizards.yml` | + +**Preservation:** Original ad-hoc scripts moved to `deprecated/` with `.deprecated` suffix for audit trail. Do NOT re-enable. + +--- + +## Idempotency Guarantees + +- **wizard_base:** directory creation is idempotent; git clone uses `force: false`, `update: true`; thin_config uses template idempotently +- **golden_state:** config template with `backup: true`; consenters scan doesn't trigger on clean state +- **deadman_switch:** timer/service plist deployment only if changed; initial snapshot uses `ignore_errors: true` if no config yet +- **cron_manager:** Ansible `cron` module ensures exact state (present/absent); `--check` safe +- **request_log:** database initialization guarded by `creates` + +Verify with: +```bash +ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/site.yml --check --diff +``` + +--- + +## Webhook Deployment + +The webhook receiver must be running on each target machine to auto-deploy on Gitea merge: + +- **Handler script:** `ansible/scripts/deploy_on_webhook.sh` +- **Endpoint:** `http://localhost:9000/hooks/deploy-timmy-config` +- **Gitea configuration:** Add webhook to timmy-config (Settings → Webhooks) with events: Pull Request (merged only) +- **Systemd service** (example unit) to run the webhook listener: + ```ini + [Unit] + Description=Timmy Config Webhook Deploy + After=network.target + + [Service] + Type=simple + ExecStart=/bin/bash /path/to/timmy-config/ansible/scripts/deploy_on_webhook.sh + User=root + Restart=on-failure + + [Install] + WantedBy=multi-user.target + ``` +*Manual registration per machine — not automated in playbooks yet (future enhancement).* + +--- + +## Golden State Provider Chain + +Anthropic permanently banned. Approved provider priority: +1. Kimi (kimi-k2.5) — primary +2. Gemini via OpenRouter (google/gemini-2.5-pro) — fallback +3. Ollama local (gemma4:latest) — terminal fallback + +Enforced in `group_vars/wizards.yml` and validated in `golden_state.yml` and `site.yml`. + +--- + +## Smoke Test Checklist + +Before closing #442, verify: +- [ ] Run `ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/site.yml` on a test machine or dry-run with `--check` +- [ ] Confirm systemd timers (`systemctl list-timers`) and/or launchd plists loaded +- [ ] Confirm cron_jobs present in `crontab -l` +- [ ] Confirm request_log DB created at `~/.local/timmy/request_log.db` +- [ ] Verify config.yaml matches template and contains no Anthropic references +- [ ] Register webhook in Gitea UI and test with merge-to-main (staging fork or test repo first) + +--- + +*This consolidation establishes the Ansible directory as the canonical fleet management system for the Timmy Foundation.* + + diff --git a/ansible/README.md b/ansible/README.md index 9fb4cc9c..da52cd45 100644 --- a/ansible/README.md +++ b/ansible/README.md @@ -1,3 +1,7 @@ +# Ansible IaC — The Timmy Foundation Fleet (CANONICAL) +> **Status:** This is the single source of truth for all fleet infrastructure. +> Ad-hoc recovery scripts (`bin/`, `fleet/`, `cron/`) are DEPRECATED — see CONSOLDATION.md. + # Ansible IaC — The Timmy Foundation Fleet > One canonical Ansible playbook defines: deadman switch, cron schedule, diff --git a/bin/deadman-fallback.py b/deprecated/bin/deadman-fallback.py.deprecated similarity index 100% rename from bin/deadman-fallback.py rename to deprecated/bin/deadman-fallback.py.deprecated diff --git a/bin/deadman-switch.sh b/deprecated/bin/deadman-switch.sh.deprecated similarity index 100% rename from bin/deadman-switch.sh rename to deprecated/bin/deadman-switch.sh.deprecated diff --git a/bin/hermes-startup.sh b/deprecated/bin/hermes-startup.sh.deprecated similarity index 100% rename from bin/hermes-startup.sh rename to deprecated/bin/hermes-startup.sh.deprecated diff --git a/bin/model-fallback-verify.py b/deprecated/bin/model-fallback-verify.py.deprecated similarity index 100% rename from bin/model-fallback-verify.py rename to deprecated/bin/model-fallback-verify.py.deprecated diff --git a/bin/model-health-check.sh b/deprecated/bin/model-health-check.sh.deprecated similarity index 100% rename from bin/model-health-check.sh rename to deprecated/bin/model-health-check.sh.deprecated diff --git a/bin/provider-health-monitor.py b/deprecated/bin/provider-health-monitor.py.deprecated similarity index 100% rename from bin/provider-health-monitor.py rename to deprecated/bin/provider-health-monitor.py.deprecated diff --git a/cron/muda-audit.crontab b/deprecated/cron/muda-audit.crontab.deprecated similarity index 100% rename from cron/muda-audit.crontab rename to deprecated/cron/muda-audit.crontab.deprecated diff --git a/fleet/auto_restart.py b/deprecated/fleet/auto_restart.py.deprecated similarity index 100% rename from fleet/auto_restart.py rename to deprecated/fleet/auto_restart.py.deprecated