- Add DEPRECATION headers to overlapping ad-hoc recovery scripts: - config_drift.py, config_drift_detector.py → golden_state role - self_healing.py → deadman_switch role - reset_pipeline_state.py → agent_startup playbook - provision_wizard.py → site.yml + wizard_base role - cron_audit.py, cron-audit-662.py → cron_manager role - Add tests/test_ansible_structure.py (19 pytest checks) validating the complete canonical ansible/ structure required by #442 - Update docs/CANONICAL_SERVICES.md with explicit supersession table mapping each ad-hoc mechanism to its Ansible replacement - ansible/README.md: document webhook setup as required step This establishes ansible/ as the single canonical source of truth for fleet lifecycle management and formally retires the overlapping ad-hoc recovery mechanisms. Closes #442
5.8 KiB
Canonical Fleet Services
Last updated: 2026-04-28 (audit #880)
Parent: #478
Scope: Local cron jobs, launchd agents, daemon scripts, and watchdog processes in Timmy's sovereign fleet.
This document is the source-of-truth inventory of what services are intentionally running and what has been deliberately removed. It is not a live diagnostic — for that, see
docs/automation-inventory.md(launchd) andscripts/cron-audit-662.py(cron health).
Quick state summary
| Layer | Total | Canonical | Dead / superseded | Action taken |
|---|---|---|---|---|
| Hermes cron jobs | 8 → 6 | 6 | 2 (Triage Heartbeat, PR Review Sweep) | Removed from cron/jobs.json |
| VPS crontab jobs | 25 | 25 | 0 | Untouched (per #880 hard rule) |
| launchd agents | 5 (live) | 5 | 3 quarantined in 2026-04-04 cleanup | Documented only |
| daemon/watchdog | see automation-inventory.md | — | — | — |
Hermes cron jobs (source: cron/jobs.json)
These are managed by the Hermes cron system (~/.hermes/cron/jobs.json). Jobs marked REMOVED have been excised from source control as dead, superseded, or non-canonical.
| Name | Schedule | Enabled | Owner | Purpose | Status |
|---|---|---|---|---|---|
| Health Monitor | every 5m | yes | Ops | Ollama/disk/memory/GPU health check | ✅ Canonical |
| Muda Audit | 0 21 * * 0 (Sun) | yes | Ezra | Weekly fleet audit (fleet/muda-audit.sh) |
✅ Canonical |
| Kaizen Retro | daily 07:30 | yes | Ezra | Post-burn retrospective (scripts/kaizen_retro.py) |
✅ Canonical |
| Overnight R&D Loop | nightly 22:00 EDT | yes | Research | Deep dive papers, tool-use training data | ✅ Canonical |
| Autonomous Cron Supervisor | every 7m | yes | Timmy | Monitors dev/timmy tmux sessions (tmux-supervisor) |
✅ Canonical |
| Hermes Philosophy Loop | every 1440m | no | Timmy | Draft — issues to hermes-agent | ⏸️ Disabled (draft) |
| Triage Heartbeat | every 15m | no | Dashboard | Scan & auto-assign issues | ❌ REMOVED — dashboard repo frozen, loops redirected to the-nexus |
| PR Review Sweep | every 30m | no | Dashboard | Review diffs, merge passing PRs | ❌ REMOVED — dashboard repo frozen, loops redirected to the-nexus |
Removal rationale (issue #880): Triage Heartbeat and PR Review Sweep were dashboard-era jobs paused on 2026-04-04 with the explicit reason: "Dashboard repo frozen - loops redirected to the-nexus." They have been superseded by the-nexus coordinator flows and pose state-rot risk if accidentally re-enabled. They are deleted from cron/jobs.json.
VPS crontab jobs
Per the hard rule in #880, VPS-specific crontab entries are NOT modified in this issue. They remain as-is in cron/vps/*-crontab-backup.txt.
Allegro (7 jobs) — model download guard, heartbeat daemon, burn-mode loops, dead-man monitor
Ezra (8 jobs) — burn-mode, gitea/awareness loops, kt compiler, mempalace nightly, dispatch
Bezalel (8 jobs) — nightly watch, act runner daemon, backups, heartbeat, secret guard, ultraplan
See individual files for accurate listings:
cron/vps/allegro-crontab-backup.txtcron/vps/ezra-crontab-backup.txtcron/vps/bezalel-crontab-backup.txt
Launchd agents (macOS local)
Fully documented in docs/automation-inventory.md.
| Name | Plist | Interval | Status |
|---|---|---|---|
| ai.hermes.gateway | ~/Library/LaunchAgents/ai.hermes.gateway.plist |
KeepAlive | ✅ Active |
| ai.hermes.gateway-fenrir | ~/Library/LaunchAgents/ai.hermes.gateway-fenrir.plist |
KeepAlive | ✅ Active |
| ai.timmy.kimi-heartbeat | ~/Library/LaunchAgents/ai.timmy.kimi-heartbeat.plist |
300s | ✅ Active |
| ai.timmy.claudemax-watchdog | ~/Library/LaunchAgents/ai.timmy.claudemax-watchdog.plist |
300s | ✅ Active |
| (quarantined legacy) | — | — | ❌ Moved 2026-04-04 |
Daemons / tmux watchdogs
Long-running autonomous processes managed by launchd or tmux supervisors. Status is not tracked here — see live diagnostics or the automation-inventory for details.
autonomous-cron-supervisor(Hermes cron job above triggers this)tmux-supervisor— monitors dev/timmy tmux panesclaudemax-watchdog— watches Claude loop quotaburn-modeloops on each VPS (via crontab)
Ad-hoc recovery mechanisms — superseded by Ansible IaC (#442)
The following ad-hoc recovery/drift/healing scripts have been superseded by the canonical Ansible deployment. They remain in the repo for historical reference only. Do not use. New recovery must go through Ansible.
| Ad-hoc script | Superseded by | Reason |
|---|---|---|
scripts/config_drift.py |
ansible/roles/golden_state |
Drift detection is unnecessary — golden_state enforces convergence |
scripts/config_drift_detector.py |
ansible/roles/golden_state |
Duplicate — use golden_state playbook |
scripts/self_healing.py |
ansible/roles/deadman_switch |
Self-healing is the deadman switch ACTION |
scripts/reset_pipeline_state.py |
ansible/playbooks/agent_startup.yml |
Wizard resurrection is agent_startup |
scripts/provision_wizard.py |
ansible/roles/wizard_base |
Provisioning is wizard_base + site.yml |
scripts/cron_audit.py / cron-audit-662.py |
ansible/roles/cron_manager |
Cron jobs are declared in Ansible, not audited ad-hoc |
DEPRECATION NOTICE (2026-04-29, STEP35): All scripts above now contain inline DEPRECATION headers pointing to their Ansible replacements. They will be removed after fleet verification.
Change log
| Date | Change | By |
|---|---|---|
| 2026-04-28 | Removed Triage Heartbeat & PR Review Sweep from cron/jobs.json (issue #880) |
STEP35 audit |