Commit Graph

6 Commits

Author SHA1 Message Date
STEP35 FREE BURN
d913be508b fix(#442): deprecate ad-hoc recovery scripts; add ansible structure tests
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 22s
Smoke Test / smoke (pull_request) Failing after 21s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 57s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 13s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 14s
Validate Config / Shell Script Lint (pull_request) Failing after 53s
Validate Config / Playbook Schema Validation (pull_request) Successful in 25s
Architecture Lint / Lint Repository (pull_request) Failing after 20s
PR Checklist / pr-checklist (pull_request) Failing after 3m32s
- Add DEPRECATION headers to overlapping ad-hoc recovery scripts:
  - config_drift.py, config_drift_detector.py → golden_state role
  - self_healing.py → deadman_switch role
  - reset_pipeline_state.py → agent_startup playbook
  - provision_wizard.py → site.yml + wizard_base role
  - cron_audit.py, cron-audit-662.py → cron_manager role

- Add tests/test_ansible_structure.py (19 pytest checks) validating
  the complete canonical ansible/ structure required by #442

- Update docs/CANONICAL_SERVICES.md with explicit supersession table
  mapping each ad-hoc mechanism to its Ansible replacement

- ansible/README.md: document webhook setup as required step

This establishes ansible/ as the single canonical source of truth
for fleet lifecycle management and formally retires the overlapping
ad-hoc recovery mechanisms.

Closes #442
2026-04-29 01:41:34 -04:00
Alexander Whitestone
b49a0abf39 fix: restore self-healing runtime checks
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 9s
PR Checklist / pr-checklist (pull_request) Failing after 1m17s
Smoke Test / smoke (pull_request) Failing after 7s
Validate Config / YAML Lint (pull_request) Failing after 6s
Validate Config / JSON Validate (pull_request) Successful in 6s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 8s
Validate Config / Shell Script Lint (pull_request) Successful in 15s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 5s
Validate Config / Playbook Schema Validation (pull_request) Successful in 8s
Architecture Lint / Lint Repository (pull_request) Failing after 8s
2026-04-12 10:53:55 -04:00
Alexander Whitestone
19aa0830f4 Harden Gemini scripts with verified SSH trust
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 8s
Validate Config / YAML Lint (pull_request) Failing after 5s
Validate Config / JSON Validate (pull_request) Successful in 6s
PR Checklist / pr-checklist (pull_request) Failing after 1m11s
Smoke Test / smoke (pull_request) Failing after 7s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 9s
Validate Config / Shell Script Lint (pull_request) Successful in 15s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 5s
Validate Config / Playbook Schema Validation (pull_request) Successful in 7s
Architecture Lint / Lint Repository (pull_request) Failing after 6s
2026-04-11 15:13:15 -04:00
Alexander Whitestone
06031d923f test(scripts): lock self_healing safe CLI behavior (#435)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 8s
PR Checklist / pr-checklist (pull_request) Failing after 1m18s
Architecture Lint / Lint Repository (pull_request) Failing after 8s
2026-04-10 21:11:47 -04:00
Alexander Whitestone
179833148f feat(scripts/self_healing.py): safe-by-default with dry-run support
All checks were successful
PR Checklist / pr-checklist (pull_request) Successful in 1m14s
- Add --dry-run as default mode (no changes made)
- Add --execute flag to actually perform fixes
- Add --help-safe to explain each action
- Add confirmation prompts for destructive actions
- Add --confirm-kill flag for process termination (dangerous)
- Add --yes flag to skip confirmations for automation
- Add timestamps to log messages
- Improve SSH connection timeout
- Maintain existing functionality while making it safe by default

Addresses issue #435
2026-04-09 12:49:39 -04:00
e8302b4af2 feat: implement self_healing.py 2026-04-08 11:40:25 +00:00