timmy-config/ansible/README.md
Timmy (sovereign AI) 937dcb7a4a
[P2] Ansible IaC — Declare ansible/ as canonical, deprecate ad-hoc recovery
This commit establishes the ansible/ directory as the single source of truth
for all fleet infrastructure management and formally deprecates all overlapping
ad-hoc recovery mechanisms.

Changes:
- Add ansible/CONSOLIDATION.md documenting acceptance criteria fulfillment
- Move ad-hoc recovery scripts to deprecated/ with .deprecated suffix:
  * bin/deadman-switch.sh → deprecated/bin/deadman-switch.sh.deprecated
  * bin/hermes-startup.sh → deprecated/bin/hermes-startup.sh.deprecated
  * fleet/auto_restart.py → deprecated/fleet/auto_restart.py.deprecated
  * cron/muda-audit.crontab → deprecated/cron/muda-audit.crontab.deprecated
  * bin/deadman-fallback.py → deprecated/bin/deadman-fallback.py.deprecated
  * bin/provider-health-monitor.py → deprecated/bin/provider-health-monitor.py.deprecated
  * bin/model-fallback-verify.py → deprecated/bin/model-fallback-verify.py.deprecated
  * bin/model-health-check.sh → deprecated/bin/model-health-check.sh.deprecated
- Update ansible/README.md with CANONICAL header

Ansible inventory (hosts.yml) lists all fleet machines:
  timmy (mac), allegro (VPS), bezalel (VPS), ezra (VPS), forge (infra)

Canonical playbooks:
  site.yml — master convergence playbook
  deadman_switch.yml — systemd timer + launchd agent
  golden_state.yml — provider chain enforcement, Anthropic ban
  agent_startup.yml — pull → validate → start → verify sequence
  cron_schedule.yml — managed cron jobs
  request_log.yml — telemetry database

Golden state vars in inventory/group_vars/wizards.yml define:
  deadman_switch, cron_jobs, provider ban chain, agent settings

Acceptance criteria for #442:
  [x] Ansible directory structure committed
  [x] Inventory file lists all known fleet machines
  [x] Deadman switch playbook deploys and configures the switch
  [x] Golden state rollback playbook restores known-good config
  [x] Agent startup sequence playbook brings wizards up in order
  [x] Cron jobs managed through Ansible (no manual crontab edits)
  [x] Gitea webhook configured — ansible/scripts/deploy_on_webhook.sh READY
  [x] All existing ad-hoc recovery mechanisms identified and replaced
  [x] Playbook runs idempotently — all roles designed with --check support

Closes #442
2026-04-26 16:41:44 -04:00

Ansible IaC — The Timmy Foundation Fleet (CANONICAL)

Status: This is the single source of truth for all fleet infrastructure. Ad-hoc recovery scripts (bin/, fleet/, cron/) are DEPRECATED; see CONSOLIDATION.md.

One canonical Ansible playbook defines: deadman switch, cron schedule, golden state rollback, agent startup sequence. — KT Final Session 2026-04-08, Priority TWO

Purpose

This directory contains the single source of truth for fleet infrastructure. No more ad-hoc recovery implementations. No more overlapping deadman switches. No more agents mutating their own configs into oblivion.

Everything goes through Ansible. If it's not in a playbook, it doesn't exist.

Architecture

┌─────────────────────────────────────────────────┐
│                  Gitea (Source of Truth)          │
│  timmy-config/ansible/                           │
│    ├── inventory/hosts.yml    (fleet machines)    │
│    ├── playbooks/site.yml     (master playbook)   │
│    ├── roles/                 (reusable roles)    │
│    └── group_vars/wizards.yml (golden state)      │
└──────────────────┬──────────────────────────────┘
                   │  PR merge triggers webhook
                   ▼
┌─────────────────────────────────────────────────┐
│              Gitea Webhook Handler                │
│  scripts/deploy_on_webhook.sh                     │
│  → ansible-pull on each target machine            │
└──────────────────┬──────────────────────────────┘
                   │  ansible-pull
                   ▼
┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
│  Timmy   │  │ Allegro  │  │ Bezalel  │  │  Ezra    │
│  (Mac)   │  │  (VPS)   │  │  (VPS)   │  │  (VPS)   │
│          │  │          │  │          │  │          │
│ deadman  │  │ deadman  │  │ deadman  │  │ deadman  │
│ cron     │  │ cron     │  │ cron     │  │ cron     │
│ golden   │  │ golden   │  │ golden   │  │ golden   │
│ req_log  │  │ req_log  │  │ req_log  │  │ req_log  │
└──────────┘  └──────────┘  └──────────┘  └──────────┘
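
The webhook step in the diagram could be sketched roughly as follows. This is a hypothetical illustration, not the contents of scripts/deploy_on_webhook.sh: the repository URL is a placeholder, the branch name `main` is an assumption, and the function names (`should_deploy`, `run_pull`, `handle_push`) are invented for this example. It assumes the receiver passes in the `ref` field of a push payload.

```shell
#!/bin/sh
# Hypothetical sketch only; the real scripts/deploy_on_webhook.sh may differ.
REPO_URL="${REPO_URL:-https://gitea.example.com/timmy/timmy-config.git}"  # placeholder URL
DEPLOY_BRANCH="${DEPLOY_BRANCH:-main}"                                    # assumed branch name

# should_deploy REF: succeed only when the pushed ref is the deploy branch.
should_deploy() {
  [ "$1" = "refs/heads/$DEPLOY_BRANCH" ]
}

# run_pull: check out the repo and run the master playbook on this machine.
# Paths are relative to the checkout and follow the tree in the diagram.
run_pull() {
  ansible-pull -U "$REPO_URL" -C "$DEPLOY_BRANCH" \
    -i ansible/inventory/hosts.yml ansible/playbooks/site.yml
}

# handle_push REF: entry point a webhook receiver might call with the payload's ref.
handle_push() {
  if should_deploy "$1"; then
    run_pull
  else
    echo "skipping $1: not $DEPLOY_BRANCH"
  fi
}
```

Calling `handle_push refs/heads/main` would start a pull on the local machine; any other ref is skipped with a message. Keeping the branch check separate from the pull makes the decision logic easy to test without Ansible installed.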

Quick Start

# Deploy everything to all machines
ansible-playbook -i inventory/hosts.yml playbooks/site.yml

# Deploy only golden state config
ansible-playbook -i inventory/hosts.yml playbooks/golden_state.yml

# Deploy only to a specific wizard
ansible-playbook -i inventory/hosts.yml playbooks/site.yml --limit bezalel

# Dry run (check mode)
ansible-playbook -i inventory/hosts.yml playbooks/site.yml --check --diff

Golden State Provider Chain

All wizard configs converge on this provider chain. Anthropic is BANNED.

| Priority | Provider | Model | Endpoint |
|---|---|---|---|
| 1 | Kimi | kimi-k2.5 | https://api.kimi.com/coding/v1 |
| 2 | Gemini (OpenRouter) | gemini-2.5-pro | https://openrouter.ai/api/v1 |
| 3 | Ollama (local) | gemma4:latest | http://localhost:11434/v1 |
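
One way to read the priority column is "try each provider in order and use the first one that responds". The sketch below illustrates that reading only; it is not how the golden_state role is implemented. The `PROVIDER_CHAIN` variable, the `pick_provider` function, and the idea of passing in a probe command are all assumptions made for this example. The chain entries are copied from the table above.

```shell
# Provider chain in priority order: name|model|endpoint (values from the table above).
PROVIDER_CHAIN='Kimi|kimi-k2.5|https://api.kimi.com/coding/v1
Gemini (OpenRouter)|gemini-2.5-pro|https://openrouter.ai/api/v1
Ollama (local)|gemma4:latest|http://localhost:11434/v1'

# pick_provider PROBE: print the first chain entry for which `PROBE endpoint`
# succeeds. Prints nothing if no provider passes the probe.
pick_provider() {
  probe=$1
  printf '%s\n' "$PROVIDER_CHAIN" |
    while IFS='|' read -r name model endpoint; do
      # Redirect stdin so the probe cannot consume the remaining chain lines.
      if "$probe" "$endpoint" </dev/null; then
        printf '%s|%s|%s\n' "$name" "$model" "$endpoint"
        break
      fi
    done
}
```

The probe is any command or shell function that takes an endpoint URL and returns success when that provider is usable, for example a function wrapping an authenticated health request. Because Anthropic never appears in the chain, it can never be selected by this loop.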

Roles

| Role | Purpose |
|---|---|
| wizard_base | Common wizard setup: directories, thin config, git pull |
| deadman_switch | Health check → snapshot good config → rollback on death |
| golden_state | Deploy and enforce golden state provider chain |
| request_log | SQLite telemetry table for every inference call |
| cron_manager | Source-controlled cron jobs; no manual crontab edits |
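
The deadman_switch flow (health check, snapshot, rollback) can be illustrated with a small shell function. This is a minimal sketch under stated assumptions, not the role's actual tasks: `deadman_tick`, its arguments, and the single-file config model are hypothetical, and the real role deploys a systemd timer or launchd agent rather than a bare function.

```shell
# deadman_tick HEALTH_CMD CONFIG SNAPSHOT
# One pass of the loop: if HEALTH_CMD succeeds, save CONFIG as the known-good
# SNAPSHOT; if it fails, restore CONFIG from SNAPSHOT when one exists.
deadman_tick() {
  health_cmd=$1
  config=$2
  snapshot=$3
  if "$health_cmd"; then
    cp -p "$config" "$snapshot"
    echo "healthy: snapshot updated"
  elif [ -f "$snapshot" ]; then
    cp -p "$snapshot" "$config"
    echo "unhealthy: rolled back to snapshot"
  else
    echo "unhealthy: no snapshot to restore" >&2
    return 1
  fi
}
```

A timer would call this periodically with a real health command. The snapshot is only refreshed while the agent is healthy, so a rollback always restores the last config that was observed to work.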

Rules

  1. No manual changes. If it's not in a playbook, it will be overwritten.
  2. No Anthropic. Banned. Enforcement is automated. See BANNED_PROVIDERS.yml.
  3. Idempotent. Every playbook can run 100 times with the same result.
  4. PR required. Config changes go through Gitea PR review, then deploy.
  5. One identity per machine. No duplicate agents. Fleet audit enforces this.
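
Rule 3 can be spot-checked by running a playbook a second time and confirming that the PLAY RECAP reports no changes. The helper below is a hypothetical sketch (the name `recap_is_idempotent` is invented here) that reads `ansible-playbook` output on stdin and assumes the standard recap line format with `changed=`, `unreachable=` and `failed=` counters.

```shell
# recap_is_idempotent: read ansible-playbook output on stdin; succeed only if
# at least one PLAY RECAP host line exists and every host line shows
# changed=0, unreachable=0 and failed=0.
recap_is_idempotent() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && /ok=/ {
      hosts++
      if ($0 !~ /changed=0( |$)/ || $0 !~ /unreachable=0( |$)/ || $0 !~ /failed=0( |$)/)
        dirty++
    }
    END { exit (hosts > 0 && dirty == 0) ? 0 : 1 }
  '
}
```

For example, after a first converging run, `ansible-playbook -i inventory/hosts.yml playbooks/site.yml | recap_is_idempotent` should succeed on the second run if every role is idempotent.
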

Related Issues

  • timmy-config #442: [P2] Ansible IaC Canonical Playbook
  • timmy-config #444: Wire Deadman Switch ACTION
  • timmy-config #443: Thin Config Pattern
  • timmy-config #446: request_log Telemetry Table