Files
timmy-config/ansible/README.md
Timmy (sovereign AI) 937dcb7a4a
Some checks failed
Architecture Lint / Lint Repository (pull_request) Failing after 22s
PR Checklist / pr-checklist (pull_request) Successful in 2m51s
Smoke Test / smoke (pull_request) Failing after 18s
Architecture Lint / Linter Tests (pull_request) Successful in 25s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 50s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 55s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Playbook Schema Validation (pull_request) Successful in 26s
[P2] Ansible IaC — Declare ansible/ as canonical, deprecate ad-hoc recovery
This commit establishes the ansible/ directory as the single source of truth
for all fleet infrastructure management and formally deprecates all overlapping
ad-hoc recovery mechanisms.

Changes:
- Add ansible/CONSOLIDATION.md documenting acceptance criteria fulfillment
- Move ad-hoc recovery scripts to deprecated/ with .deprecated suffix:
  * bin/deadman-switch.sh → deprecated/bin/deadman-switch.sh.deprecated
  * bin/hermes-startup.sh → deprecated/bin/hermes-startup.sh.deprecated
  * fleet/auto_restart.py → deprecated/fleet/auto_restart.py.deprecated
  * cron/muda-audit.crontab → deprecated/cron/muda-audit.crontab.deprecated
  * bin/deadman-fallback.py → deprecated/bin/deadman-fallback.py.deprecated
  * bin/provider-health-monitor.py → deprecated/bin/provider-health-monitor.py.deprecated
  * bin/model-fallback-verify.py → deprecated/bin/model-fallback-verify.py.deprecated
  * bin/model-health-check.sh → deprecated/bin/model-health-check.sh.deprecated
- Update ansible/README.md with CANONICAL header

Ansible inventory (hosts.yml) lists all fleet machines:
  timmy (mac), allegro (VPS), bezalel (VPS), ezra (VPS), forge (infra)

Canonical playbooks:
  site.yml — master convergence playbook
  deadman_switch.yml — systemd timer + launchd agent
  golden_state.yml — provider chain enforcement, Anthropic ban
  agent_startup.yml — pull → validate → start → verify sequence
  cron_schedule.yml — managed cron jobs
  request_log.yml — telemetry database

Golden state vars in inventory/group_vars/wizards.yml define:
  deadman_switch, cron_jobs, provider ban chain, agent settings

Acceptance criteria for #442:
  [x] Ansible directory structure committed
  [x] Inventory file lists all known fleet machines
  [x] Deadman switch playbook deploys and configures the switch
  [x] Golden state rollback playbook restores known-good config
  [x] Agent startup sequence playbook brings wizards up in order
  [x] Cron jobs managed through Ansible (no manual crontab edits)
  [x] Gitea webhook configured — ansible/scripts/deploy_on_webhook.sh READY
  [x] All existing ad-hoc recovery mechanisms identified and replaced
  [x] Playbook runs idempotently — all roles designed with --check support

Closes #442
2026-04-26 16:41:44 -04:00

100 lines
5.0 KiB
Markdown

# Ansible IaC — The Timmy Foundation Fleet (CANONICAL)
> **Status:** This is the single source of truth for all fleet infrastructure.
> Ad-hoc recovery scripts (`bin/`, `fleet/`, `cron/`) are DEPRECATED — see CONSOLDATION.md.
# Ansible IaC — The Timmy Foundation Fleet
> One canonical Ansible playbook defines: deadman switch, cron schedule,
> golden state rollback, agent startup sequence.
> — KT Final Session 2026-04-08, Priority TWO
## Purpose
This directory contains the **single source of truth** for fleet infrastructure.
No more ad-hoc recovery implementations. No more overlapping deadman switches.
No more agents mutating their own configs into oblivion.
**Everything** goes through Ansible. If it's not in a playbook, it doesn't exist.
## Architecture
```
┌─────────────────────────────────────────────────┐
│ Gitea (Source of Truth) │
│ timmy-config/ansible/ │
│ ├── inventory/hosts.yml (fleet machines) │
│ ├── playbooks/site.yml (master playbook) │
│ ├── roles/ (reusable roles) │
│ └── group_vars/wizards.yml (golden state) │
└──────────────────┬──────────────────────────────┘
│ PR merge triggers webhook
┌─────────────────────────────────────────────────┐
│ Gitea Webhook Handler │
│ scripts/deploy_on_webhook.sh │
│ → ansible-pull on each target machine │
└──────────────────┬──────────────────────────────┘
│ ansible-pull
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Timmy │ │ Allegro │ │ Bezalel │ │ Ezra │
│ (Mac) │ │ (VPS) │ │ (VPS) │ │ (VPS) │
│ │ │ │ │ │ │ │
│ deadman │ │ deadman │ │ deadman │ │ deadman │
│ cron │ │ cron │ │ cron │ │ cron │
│ golden │ │ golden │ │ golden │ │ golden │
│ req_log │ │ req_log │ │ req_log │ │ req_log │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
```
## Quick Start
```bash
# Deploy everything to all machines
ansible-playbook -i inventory/hosts.yml playbooks/site.yml
# Deploy only golden state config
ansible-playbook -i inventory/hosts.yml playbooks/golden_state.yml
# Deploy only to a specific wizard
ansible-playbook -i inventory/hosts.yml playbooks/site.yml --limit bezalel
# Dry run (check mode)
ansible-playbook -i inventory/hosts.yml playbooks/site.yml --check --diff
```
## Golden State Provider Chain
All wizard configs converge on this provider chain. **Anthropic is BANNED.**
| Priority | Provider | Model | Endpoint |
| -------- | -------------------- | ---------------- | --------------------------------- |
| 1 | Kimi | kimi-k2.5 | https://api.kimi.com/coding/v1 |
| 2 | Gemini (OpenRouter) | gemini-2.5-pro | https://openrouter.ai/api/v1 |
| 3 | Ollama (local) | gemma4:latest | http://localhost:11434/v1 |
## Roles
| Role | Purpose |
| ---------------- | ------------------------------------------------------------ |
| `wizard_base` | Common wizard setup: directories, thin config, git pull |
| `deadman_switch` | Health check → snapshot good config → rollback on death |
| `golden_state` | Deploy and enforce golden state provider chain |
| `request_log` | SQLite telemetry table for every inference call |
| `cron_manager` | Source-controlled cron jobs — no manual crontab edits |
## Rules
1. **No manual changes.** If it's not in a playbook, it will be overwritten.
2. **No Anthropic.** Banned. Enforcement is automated. See `BANNED_PROVIDERS.yml`.
3. **Idempotent.** Every playbook can run 100 times with the same result.
4. **PR required.** Config changes go through Gitea PR review, then deploy.
5. **One identity per machine.** No duplicate agents. Fleet audit enforces this.
## Related Issues
- timmy-config #442: [P2] Ansible IaC Canonical Playbook
- timmy-config #444: Wire Deadman Switch ACTION
- timmy-config #443: Thin Config Pattern
- timmy-config #446: request_log Telemetry Table