feat(scripts/self_healing.py): safe-by-default with dry-run support #450
Addresses issue #435.
Auto-approved: clean diff, no conflicts, mergeable.
Branch is behind base and has conflicts after update. Needs manual rebase/merge conflict resolution.
Perplexity Review — PR #450
Verdict: Approve (minor notes)
Strong safety-first refactor. The original self_healing.py would restart services and delete logs without asking; this PR adds dry-run-by-default, confirmation prompts, and tiered escalation. Good pattern for fleet infrastructure scripts.
Findings
Dry-run by default: correct approach. --execute is required for any changes. The redundant --dry-run flag for explicitness is fine.
Confirmation hierarchy: three tiers (dry-run → execute+confirm → execute+confirm-kill). Well-designed escalation path.
StrictHostKeyChecking=no still present: line 38 still uses -o StrictHostKeyChecking=no in run_remote(). This was flagged in #452's audit as a security concern. Consider migrating to ssh_trust.py once that PR is recreated, or at minimum add a TODO comment.
Bare except: clauses: lines 64, 90, 104 use bare except: pass. These silently swallow all exceptions, including KeyboardInterrupt. Use except Exception: at minimum, or log the error.
rm -rf /var/log/*.gz: line 87 runs a glob-based rm -rf via SSH. While gated behind confirmation, this is still aggressive. Consider using find /var/log -name '*.gz' -mtime +1 -delete for safer targeted cleanup.
Process kill placeholder: line 120 is a placeholder comment. Fine for now, but should be tracked as a follow-up if this script is meant to actually perform process management.
SSH timeout bump: 10s→15s is reasonable for cross-host connections. The added ConnectTimeout=5 is good.
CI status: PR checklist and Lint Repository failing; branch is out-of-date. Needs rebase before merge.
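A rough sketch of how the bare except and log cleanup findings might be addressed. This is not code from the PR: safe_call, build_ssh_argv, and CLEANUP_CMD are hypothetical names used for illustration, and the real run_remote() may be structured differently.

```python
import logging

logger = logging.getLogger("self_healing")

# The find expression suggested in the review, as a replacement for the
# glob-based rm -rf. CLEANUP_CMD is a hypothetical name.
CLEANUP_CMD = "find /var/log -name '*.gz' -mtime +1 -delete"


def build_ssh_argv(host, command):
    """Build an ssh argument list with the connect timeout noted in the review.

    Hypothetical helper: it deliberately leaves out StrictHostKeyChecking=no,
    on the assumption that host keys are managed elsewhere.
    """
    return ["ssh", "-o", "ConnectTimeout=5", host, command]


def safe_call(label, fn, *args):
    """Call fn(*args); log and return None on ordinary errors.

    `except Exception` does not catch KeyboardInterrupt or SystemExit,
    so Ctrl-C still stops the script, unlike a bare `except: pass`.
    """
    try:
        return fn(*args)
    except Exception as exc:
        logger.warning("%s failed: %s", label, exc)
        return None
```

With this shape, a failed remote step is logged rather than silently dropped, and an operator interrupt still propagates.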
No blockers. Rebase to resolve conflicts and this is merge-ready.
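For reference, the three-tier escalation described in the findings can be sketched with argparse. This is an illustration under assumptions, not the PR's implementation: --execute and --dry-run come from the review, while the --confirm-kill flag name, the mutually exclusive grouping, and resolve_tier are hypothetical.

```python
import argparse


def build_parser():
    """Parser sketch: dry-run is the default; --execute opts in to changes."""
    parser = argparse.ArgumentParser(description="self-healing sketch")
    mode = parser.add_mutually_exclusive_group()
    # Redundant but explicit: dry-run is already the default behavior.
    mode.add_argument("--dry-run", action="store_true",
                      help="report planned actions only (default)")
    mode.add_argument("--execute", action="store_true",
                      help="apply changes, with confirmation prompts")
    # Hypothetical flag name for the highest tier.
    parser.add_argument("--confirm-kill", action="store_true",
                        help="also allow process kills (requires --execute)")
    return parser


def resolve_tier(args):
    """Map parsed flags to one of the three tiers named in the review."""
    if not args.execute:
        return "dry-run"
    if args.confirm_kill:
        return "execute+confirm-kill"
    return "execute+confirm"
```

Resolving the tier in one place keeps the default safe: any flag combination without --execute falls back to dry-run.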