- Add DEPRECATION headers to the overlapping ad-hoc recovery scripts:
  - config_drift.py, config_drift_detector.py → golden_state role
  - self_healing.py → deadman_switch role
  - reset_pipeline_state.py → agent_startup playbook
  - provision_wizard.py → site.yml + wizard_base role
  - cron_audit.py, cron-audit-662.py → cron_manager role
- Add tests/test_ansible_structure.py (19 pytest checks) validating
  the complete canonical ansible/ structure required by #442
- Update docs/CANONICAL_SERVICES.md with an explicit supersession table
  mapping each ad-hoc mechanism to its Ansible replacement
- ansible/README.md: document webhook setup as a required step
This establishes ansible/ as the single canonical source of truth
for fleet lifecycle management and formally retires the overlapping
ad-hoc recovery mechanisms.
Closes #442
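The supersession mapping and the structure checks above can be sketched as two helper functions. This is a minimal illustration under stated assumptions, not the contents of tests/test_ansible_structure.py: it guesses that the scripts live in `scripts/`, that each role is a directory under `ansible/roles/`, that agent_startup is `ansible/playbooks/agent_startup.yml`, and that a header counts if the word DEPRECATION appears in a script's first ten lines. The names `SUPERSEDED`, `missing_replacements`, and `scripts_without_header` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# HYPOTHETICAL layout: the script names come from the commit message, but the
# ansible/ paths and the scripts/ location are assumptions, not the confirmed
# structure required by #442.
SUPERSEDED: dict[str, tuple[str, ...]] = {
    "config_drift.py": ("ansible/roles/golden_state",),
    "config_drift_detector.py": ("ansible/roles/golden_state",),
    "self_healing.py": ("ansible/roles/deadman_switch",),
    "reset_pipeline_state.py": ("ansible/playbooks/agent_startup.yml",),
    "provision_wizard.py": ("ansible/site.yml", "ansible/roles/wizard_base"),
    "cron_audit.py": ("ansible/roles/cron_manager",),
    "cron-audit-662.py": ("ansible/roles/cron_manager",),
}

HEADER_MARKER = "DEPRECATION"  # assumed marker text
HEADER_WINDOW = 10  # only the first few lines count as a "header"


def missing_replacements(root: Path) -> list[str]:
    """Return every canonical ansible/ path (sorted) that does not exist under root."""
    wanted = sorted({path for paths in SUPERSEDED.values() for path in paths})
    return [path for path in wanted if not (root / path).exists()]


def scripts_without_header(root: Path, scripts_dir: str = "scripts") -> list[str]:
    """Return superseded scripts that still exist but lack a deprecation header."""
    offenders = []
    for name in SUPERSEDED:
        script = root / scripts_dir / name
        if not script.is_file():
            continue  # a script that was deleted outright needs no header
        head = script.read_text(encoding="utf-8").splitlines()[:HEADER_WINDOW]
        if not any(HEADER_MARKER in line for line in head):
            offenders.append(name)
    return offenders
```

Each helper could back one pytest assertion, for example `assert missing_replacements(repo_root) == []`, where `repo_root` is a hypothetical path to the repository checkout.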

- Make --dry-run the default mode (no changes are made)
- Add --execute flag to actually perform fixes
- Add --help-safe to explain each action
- Add confirmation prompts for destructive actions
- Add --confirm-kill flag for process termination (dangerous)
- Add --yes flag to skip confirmations for automation
- Add timestamps to log messages
- Improve SSH connection timeout
- Preserve existing functionality while making the script safe by default
Addresses issue #435
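The safe-by-default behaviour above can be sketched with `argparse` and `logging`. This is a minimal illustration, not the script from #435: the logger name, the `ACTIONS` catalogue, and the helpers `confirm`, `run_action`, and `kill_process` are hypothetical, and SSH handling is omitted because the commit message does not state the new timeout.

```python
import argparse
import logging

# Timestamped log lines, as the commit message describes.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
log = logging.getLogger("fleet_fix")  # hypothetical logger name

# HYPOTHETICAL action catalogue: names and descriptions are illustrative only.
ACTIONS = {
    "restart_service": ("Restart a stalled service on a remote host", True),
    "clear_stale_lock": ("Remove a stale lock file", True),
    "report_status": ("Print host status; changes nothing", False),
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Safe-by-default recovery tool (sketch)")
    mode = parser.add_mutually_exclusive_group()
    # --dry-run is accepted for clarity; dry-run is already the default.
    mode.add_argument("--dry-run", action="store_true", help="report only (default)")
    mode.add_argument("--execute", action="store_true", help="actually perform fixes")
    parser.add_argument("--help-safe", action="store_true", help="explain each action and exit")
    parser.add_argument("--confirm-kill", action="store_true",
                        help="allow process termination (dangerous)")
    parser.add_argument("--yes", action="store_true",
                        help="skip confirmation prompts (for automation)")
    return parser


def confirm(prompt: str, assume_yes: bool, input_fn=input) -> bool:
    """Ask before a destructive step unless --yes was given."""
    if assume_yes:
        return True
    return input_fn(f"{prompt} [y/N] ").strip().lower() in {"y", "yes"}


def run_action(name, destructive, args, perform, input_fn=input) -> str:
    """Run one fix, honouring dry-run mode and confirmation prompts."""
    if not args.execute:
        log.info("[dry-run] would %s", name)
        return "dry-run"
    if destructive and not confirm(f"About to {name}. Continue?", args.yes, input_fn):
        log.info("skipped %s (not confirmed)", name)
        return "declined"
    perform()
    log.info("done: %s", name)
    return "done"


def kill_process(pid, args, perform, input_fn=input) -> str:
    """In execute mode, process termination also requires --confirm-kill."""
    if args.execute and not args.confirm_kill:
        log.warning("refusing to kill pid %s without --confirm-kill", pid)
        return "refused"
    return run_action(f"terminate process {pid}", True, args, perform, input_fn)


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if args.help_safe:
        for name, (desc, destructive) in ACTIONS.items():
            print(f"{name}: {desc} [{'destructive' if destructive else 'read-only'}]")
        return 0
    log.info("mode: %s", "EXECUTE" if args.execute else "DRY-RUN (no changes made)")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Giving `--dry-run` and `--execute` separate `store_true` destinations keeps the mutually exclusive group effective, so passing both is rejected with a usage error rather than the last flag silently winning.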