[P2] Ansible IaC — Canonical Fleet Playbook #449

Merged
Rockachopa merged 3 commits from ansible-iac into main 2026-04-10 09:37:45 +00:00

Summary

Implements the Ansible Infrastructure as Code story from KT 2026-04-08 (Priority TWO).

One canonical Ansible playbook defines the entire fleet lifecycle. No more ad-hoc recovery. No more overlapping deadman switches. No more agents mutating their own configs into oblivion.

What's Included (28 files, 1571 lines)

Playbooks

  • site.yml — Master playbook, runs everything, validates no banned providers
  • golden_state.yml — Deploy golden state provider chain to all wizards
  • deadman_switch.yml — Deploy deadman switch ACTION (snapshot + rollback + restart)
  • agent_startup.yml — Resurrect wizards from checked-in configs (serial, one at a time)
  • cron_schedule.yml — Source-controlled cron jobs (no manual crontab edits)
  • request_log.yml — Deploy telemetry table to all machines
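agent_startup.yml is described as resurrecting wizards serially. The commit message gives the per-host sequence as pull→validate→start→verify. The Python sketch below models that ordering. It assumes the rollout halts at the first failing step instead of moving on to the next host. In Ansible this is typically expressed with `serial: 1`, possibly combined with `max_fail_percentage: 0`. The step functions are hypothetical stand-ins for Ansible tasks.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical step functions: each takes a host name and returns True on success.
# In the actual playbook these would be Ansible tasks, not Python calls.
StepFn = Callable[[str], bool]

# Per-host order taken from the commit message: pull -> validate -> start -> verify.
STEP_ORDER = ("pull", "validate", "start", "verify")


def serial_startup(
    hosts: List[str], steps: Dict[str, StepFn]
) -> Tuple[List[str], Optional[Tuple[str, str]]]:
    """Bring hosts up one at a time and stop the rollout at the first failure.

    Returns (hosts fully started, (failed host, failed step) or None).
    """
    started: List[str] = []
    for host in hosts:
        for step in STEP_ORDER:
            if not steps[step](host):
                # Assumption: halt everything so one broken config cannot cascade.
                return started, (host, step)
        started.append(host)
    return started, None
```

The sketch stops the whole rollout rather than skipping the failed host. That keeps one bad config from being resurrected across the fleet. The PR does not say whether the real playbook behaves this way.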

Roles

  • wizard_base — Common setup: directories, thin config, git pull, SOUL.md
  • golden_state — Deploy and enforce golden state (Kimi → Gemini → Ollama)
  • deadman_switch — Health check → snapshot → rollback → restart (systemd + launchd)
  • request_log — SQLite telemetry table with schema, indexes, and analytics views
  • cron_manager — All cron jobs from group_vars, removes legacy jobs
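The request_log role is described as creating a SQLite table with a schema, indexes, and analytics views. This description does not include the exact schema. The columns, index names, and the `provider_stats` view below are therefore hypothetical. The sketch only shows how such a table can be created idempotently with Python's standard `sqlite3` module, so re-running the role is harmless.

```python
import sqlite3

# Hypothetical schema: column, index, and view names are illustrative assumptions,
# not the ones defined by the request_log role.
SCHEMA = """
CREATE TABLE IF NOT EXISTS request_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    ts          TEXT    NOT NULL DEFAULT (datetime('now')),
    wizard      TEXT    NOT NULL,
    provider    TEXT    NOT NULL,
    model       TEXT    NOT NULL,
    latency_ms  INTEGER,
    ok          INTEGER NOT NULL CHECK (ok IN (0, 1))
);
CREATE INDEX IF NOT EXISTS idx_request_log_ts ON request_log (ts);
CREATE INDEX IF NOT EXISTS idx_request_log_provider ON request_log (provider, model);
CREATE VIEW IF NOT EXISTS provider_stats AS
    SELECT provider,
           model,
           COUNT(*)        AS calls,
           SUM(ok)         AS successes,
           AVG(latency_ms) AS avg_latency_ms
    FROM request_log
    GROUP BY provider, model;
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the table, indexes, and view; IF NOT EXISTS makes this idempotent."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_call(conn: sqlite3.Connection, wizard: str, provider: str,
             model: str, latency_ms: int, ok: bool) -> None:
    """Record one inference call (parameterized to avoid SQL injection)."""
    conn.execute(
        "INSERT INTO request_log (wizard, provider, model, latency_ms, ok) "
        "VALUES (?, ?, ?, ?, ?)",
        (wizard, provider, model, latency_ms, int(ok)),
    )
    conn.commit()
```

For example, two logged Kimi calls with latencies of 820 ms and 1180 ms, one failed, appear in `provider_stats` as `('kimi', 'kimi-k2.5', 2, 1, 1000.0)`.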

Infrastructure

  • inventory/hosts.yml — Fleet inventory (Timmy, Allegro, Bezalel, Ezra)
  • group_vars/wizards.yml — Golden state variables, deadman config, cron schedule
  • scripts/deploy_on_webhook.sh — Gitea webhook → ansible-pull on merge
  • scripts/validate_config.py — Pre-deploy config validation (rejects banned providers)
  • BANNED_PROVIDERS.yml — Anthropic ban manifest
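scripts/validate_config.py is described as rejecting banned providers before deploy. Below is a minimal sketch of that kind of check. It assumes the config has already been parsed into Python dicts and lists, for example with a YAML loader. It also assumes a simple case-insensitive substring match on keys and string values. The function name `find_banned` and the hard-coded term list are illustrative, not the script's actual contents.

```python
from typing import Any, Iterable, List

# Assumption: the real list lives in BANNED_PROVIDERS.yml, whose format isn't
# shown in this PR, so a hard-coded tuple stands in for it here.
BANNED_TERMS = ("anthropic",)


def find_banned(config: Any, banned: Iterable[str] = BANNED_TERMS,
                path: str = "$") -> List[str]:
    """Return the path of every key or string value that mentions a banned term."""
    terms = [t.lower() for t in banned]
    hits: List[str] = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}"
            # Check the key itself (e.g. an "anthropic_api_key" entry) ...
            if any(t in str(key).lower() for t in terms):
                hits.append(child)
            # ... then recurse into its value.
            hits.extend(find_banned(value, terms, child))
    elif isinstance(config, list):
        for i, item in enumerate(config):
            hits.extend(find_banned(item, terms, f"{path}[{i}]"))
    elif isinstance(config, str) and any(t in config.lower() for t in terms):
        hits.append(path)
    return hits
```

A validator built on this would exit non-zero whenever the returned list is non-empty. That lets the deploy fail loudly instead of shipping the config.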

Templates

  • thin_config.yml.j2 — Immutable local pointer to upstream (P4)
  • wizard_config.yaml.j2 — Golden state wizard config
  • deadman_action.sh.j2 — The deadman switch ACTION script
  • deadman_switch.service.j2 / .timer.j2 — systemd units for VPS
  • deadman_switch.plist.j2 — launchd for Mac
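The deadman switch is described as health check → snapshot → rollback → restart. It is driven by a systemd timer on the VPS hosts and a launchd job on the Mac. One tick of that loop can be sketched as follows. Several details are assumptions made for illustration:

- the health-check and restart callables;
- the file paths;
- the choice to restart even when no snapshot exists.

The actual deadman_action.sh.j2 is a shell template whose contents aren't shown here.

```python
import shutil
from pathlib import Path
from typing import Callable


def deadman_tick(config: Path, snapshot: Path,
                 is_healthy: Callable[[], bool],
                 restart: Callable[[], None]) -> str:
    """One timer tick of a deadman switch (illustrative sketch).

    Healthy:   copy the current config to the snapshot (last known good).
    Unhealthy: restore the snapshot if one exists, then restart the agent.
    """
    if is_healthy():
        snapshot.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(config, snapshot)
        return "snapshot"
    if snapshot.exists():
        shutil.copy2(snapshot, config)  # roll back to the last known good config
        restart()
        return "rollback+restart"
    # Assumption: with nothing to roll back to, a plain restart is the only action left.
    restart()
    return "restart"
```

Snapshotting only on a healthy tick means an agent that rewrites its own config into a broken state never overwrites the last good copy.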

Golden State Provider Chain

| Priority | Provider | Model | Status |
|---|---|---|---|
| 1 | Kimi | kimi-k2.5 | Primary |
| 2 | Gemini via OpenRouter | gemini-2.5-pro | Fallback |
| 3 | Ollama (local) | gemma4:latest | Terminal fallback |

Anthropic: PERMANENTLY BANNED — Ansible FAILS if any Anthropic reference is found.
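The table reads as an ordered fallback chain: try Kimi first, then Gemini via OpenRouter, then local Ollama. Below is a minimal Python sketch of that ordering. It relies on two assumptions that do not come from the PR:

- a hypothetical `call(provider, model, prompt)` client that raises on failure;
- lowercase provider identifiers (`kimi`, `openrouter`, `ollama`).

A banned provider in the chain raises immediately instead of being skipped. This mirrors the rule above that the run fails on any Anthropic reference.

```python
from typing import Callable, List, Tuple

# Golden state chain from the table above, in priority order.
# Assumption: these lowercase provider identifiers are illustrative.
CHAIN: List[Tuple[str, str]] = [
    ("kimi", "kimi-k2.5"),             # primary
    ("openrouter", "gemini-2.5-pro"),  # Gemini via OpenRouter
    ("ollama", "gemma4:latest"),       # local terminal fallback
]
BANNED = {"anthropic"}

# Hypothetical client signature: call(provider, model, prompt) -> reply; raises on failure.
CallFn = Callable[[str, str, str], str]


def complete(prompt: str, call: CallFn,
             chain: List[Tuple[str, str]] = CHAIN) -> Tuple[str, str]:
    """Try each provider in order; return (provider, reply) from the first success."""
    errors: List[str] = []
    for provider, model in chain:
        if provider.lower() in BANNED:
            # Fail hard rather than silently skipping a banned entry.
            raise ValueError(f"banned provider in chain: {provider}")
        try:
            return provider, call(provider, model, prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{provider}/{model}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```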

Addresses

  • timmy-config #442 (Ansible IaC)
  • timmy-config #443 (Thin Config Pattern)
  • timmy-config #444 (Deadman Switch ACTION)
  • timmy-config #446 (request_log Telemetry)
  • timmy-config #445 (Provider Fallback Chain): partially addressed, config side only

KT References

  • KT Final 2026-04-08: Priority TWO (Ansible IaC), Priority FOUR (Thin Config)
  • KT Bezalel 2026-04-08: Priorities #1-#5 (deadman, fallback, telemetry, resurrect, MiMo)
  • KT Supplementary 2026-04-08: Config Stability, Fleet Status
perplexity added this to the KT-2026-04-08: Infrastructure Stabilization milestone 2026-04-09 22:26:05 +00:00
perplexity added 1 commit 2026-04-09 22:26:05 +00:00
feat(ansible): Canonical IaC playbook for fleet management
Some checks failed
PR Checklist / pr-checklist (pull_request) Failing after 1m27s
7ec45642eb
Implements the Ansible Infrastructure as Code story from KT 2026-04-08.

One canonical Ansible playbook defines:
- Deadman switch (snapshot good config on health, rollback+restart on death)
- Golden state config deployment (Anthropic BANNED, Kimi→Gemini→Ollama)
- Cron schedule (source-controlled, no manual crontab edits)
- Agent startup sequence (pull→validate→start→verify)
- request_log telemetry table (every inference call logged)
- Thin config pattern (immutable local pointer to upstream)
- Gitea webhook handler (deploy on merge)
- Config validator (rejects banned providers)

Fleet inventory: Timmy (Mac), Allegro (VPS), Bezalel (VPS), Ezra (VPS)

Roles: wizard_base, golden_state, deadman_switch, request_log, cron_manager

Addresses: timmy-config #442, #443, #444, #445, #446
References: KT Final 2026-04-08 P2, KT Bezalel 2026-04-08 #1-#5
Rockachopa reviewed 2026-04-10 03:41:18 +00:00
Rockachopa left a comment

Auto-approved: clean diff, no conflicts, mergeable.

Rockachopa scheduled this pull request to auto merge when all checks succeed 2026-04-10 03:41:18 +00:00
Timmy added 1 commit 2026-04-10 09:36:26 +00:00
Merge branch 'main' into ansible-iac
Some checks failed
PR Checklist / pr-checklist (pull_request) Failing after 1m40s
e124ff8b05
Timmy added 1 commit 2026-04-10 09:37:40 +00:00
Merge branch 'main' into ansible-iac
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 8s
Architecture Lint / Lint Repository (pull_request) Failing after 6s
PR Checklist / pr-checklist (pull_request) Failing after 1m43s
544f2a9729
Rockachopa merged commit d8f88bed16 into main 2026-04-10 09:37:45 +00:00
Rockachopa referenced this issue from a commit 2026-04-10 09:37:46 +00:00