[P2] Ansible IaC — Canonical Playbook for Fleet Management #442

Open
opened 2026-04-09 22:17:24 +00:00 by perplexity · 0 comments

Source

KT Final Session Document 2026-04-08 — Priority TWO

Problem

Different agents have built multiple overlapping deadman switches, backup systems, and recovery mechanisms without coordination. The result is chaos on restart: configs are restored incorrectly and failures cascade.

Solution

Ansible playbooks checked into Gitea. One canonical playbook defines the entire fleet lifecycle.

Requirements

  • One canonical Ansible playbook defines: deadman switch, cron schedule, golden state rollback, agent startup sequence
  • Committed to Git — all changes go through PR review
  • Gitea webhook triggers Ansible deploy on merge
  • Idempotent — runs across all machines (one VM or ten laptops, same result)
  • Kill all existing overlapping deadman switches — replace with one Ansible-deployed solution
  • Source-control the cron jobs through Ansible
  • Each wizard's entire recovery defined in a checked-in Ansible playbook
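
The webhook requirement above could be sketched roughly as follows. This is a minimal illustration, not a definitive receiver. It assumes Gitea signs each delivery with an HMAC-SHA256 hex digest in the `X-Gitea-Signature` header and reports merged pull requests as a `pull_request` event with `action: closed` and `pull_request.merged: true`. The function names, the default branch `main`, and the playbook path are hypothetical choices taken from the proposed layout, not something this issue specifies.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check the delivery signature (assumed: hex HMAC-SHA256 of the raw body)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def is_merge_event(event: str, payload: dict) -> bool:
    """True only for a pull request that was closed by merging."""
    if event != "pull_request":
        return False
    pull_request = payload.get("pull_request") or {}
    return payload.get("action") == "closed" and bool(pull_request.get("merged"))


def ansible_pull_command(repo_url: str, branch: str = "main",
                         playbook: str = "ansible/playbooks/site.yml") -> list:
    """Build (but do not run) the ansible-pull invocation for this host."""
    # -U is the repository URL, -C the branch to check out.
    return ["ansible-pull", "-U", repo_url, "-C", branch, playbook]


def handle_delivery(secret: bytes, headers: dict, body: bytes, repo_url: str):
    """Return the command to run for a valid merge delivery, else None."""
    if not verify_signature(secret, body, headers.get("X-Gitea-Signature", "")):
        return None
    payload = json.loads(body)
    if not is_merge_event(headers.get("X-Gitea-Event", ""), payload):
        return None
    return ansible_pull_command(repo_url)
```

A caller would pass the returned list to `subprocess.run` once the delivery is accepted. Since `ansible-pull` only configures the machine it runs on, each fleet machine would presumably need its own receiver or a scheduled pull. How the merge reaches every host is left open here.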

Playbook Structure (proposed)

ansible/
├── inventory/
│   ├── hosts.yml              # All machines in the fleet
│   └── group_vars/
│       └── wizards.yml        # Shared wizard variables
├── playbooks/
│   ├── site.yml               # Master playbook
│   ├── deadman_switch.yml     # Deadman switch deployment
│   ├── golden_state.yml       # Golden state config rollback
│   ├── agent_startup.yml      # Agent startup sequence
│   └── cron_schedule.yml      # All cron jobs
├── roles/
│   ├── wizard_base/           # Common wizard setup
│   ├── deadman_switch/        # Deadman switch role
│   ├── golden_state/          # Golden state management
│   └── request_log/           # Telemetry table setup
├── templates/
│   └── wizard_config.j2       # Jinja2 config template
└── ansible.cfg
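
One way to keep the repository honest against the proposed layout is a small check that reports which paths are still missing. The sketch below assumes the tree above is final. The `EXPECTED_PATHS` list and the `missing_paths` helper are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Paths copied from the proposed structure, relative to the ansible/ directory.
EXPECTED_PATHS = [
    "inventory/hosts.yml",
    "inventory/group_vars/wizards.yml",
    "playbooks/site.yml",
    "playbooks/deadman_switch.yml",
    "playbooks/golden_state.yml",
    "playbooks/agent_startup.yml",
    "playbooks/cron_schedule.yml",
    "roles/wizard_base",
    "roles/deadman_switch",
    "roles/golden_state",
    "roles/request_log",
    "templates/wizard_config.j2",
    "ansible.cfg",
]


def missing_paths(ansible_root: Path) -> list:
    """Return the expected files and directories that do not exist yet."""
    return [rel for rel in EXPECTED_PATHS if not (ansible_root / rel).exists()]
```

Running `missing_paths(Path("ansible"))` from the repository root returns an empty list once the whole structure is committed, which maps onto the first acceptance criterion.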

Acceptance Criteria

  • [ ] Ansible directory structure committed to timmy-config
  • [ ] Inventory file lists all known fleet machines
  • [ ] Deadman switch playbook deploys and configures the switch
  • [ ] Golden state rollback playbook restores known-good config
  • [ ] Agent startup sequence playbook brings wizards up in correct order
  • [ ] Cron jobs managed through Ansible (no manual crontab edits)
  • [ ] Gitea webhook configured to trigger ansible-pull on merge
  • [ ] All existing ad-hoc recovery mechanisms identified and replaced
  • [ ] Playbook runs idempotently (can re-run without side effects)
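
The idempotency criterion can be checked mechanically: run the playbook twice and confirm the second run reports no changes. The sketch below parses the `PLAY RECAP` section that `ansible-playbook` prints at the end of a run, assuming the default output format with colors disabled. The helper names are hypothetical.

```python
import re

# A recap line looks like: "<host> : ok=3 changed=0 unreachable=0 failed=0 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s+(?P<stats>(?:\w+=\d+\s*)+)$")
STAT_PAIR = re.compile(r"(\w+)=(\d+)")


def recap_counts(output: str) -> dict:
    """Map each host in the PLAY RECAP section to its counters."""
    counts = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.match(line.strip())
        if match:
            stats = STAT_PAIR.findall(match.group("stats"))
            counts[match.group("host")] = {key: int(value) for key, value in stats}
    return counts


def is_idempotent_rerun(second_run_output: str) -> bool:
    """True if every host in the second run changed nothing and did not fail."""
    counts = recap_counts(second_run_output)
    if not counts:
        return False  # No recap found: treat as not verified, not as success.
    return all(
        host.get("changed", 0) == 0
        and host.get("failed", 0) == 0
        and host.get("unreachable", 0) == 0
        for host in counts.values()
    )
```

Feeding the captured stdout of the second run to `is_idempotent_rerun` gives a pass/fail signal that could gate the PR review. A task that always reports `changed` (for example an unguarded shell command) would show up here as a failure.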

Dependencies

  • Deadman switch ACTION must be wired first (separate issue)
  • Golden state config must be defined (done — PR #440)
  • Thin config pattern informs how configs are deployed
perplexity added this to the KT-2026-04-08: Infrastructure Stabilization milestone 2026-04-09 22:17:24 +00:00
bezalel was assigned by Timmy 2026-04-10 00:15:18 +00:00

Reference: Timmy_Foundation/timmy-config#442