[claude] Poka-yoke cron heartbeats — make silent failures impossible (#1096) #1102
Fixes #1096
What this does
Implements all three acceptance criteria from #1096:
1. Prevention: `nexus/cron_heartbeat.py`

Every cron job can call `write_cron_heartbeat("job_name", interval_seconds=300)` on completion. The heartbeat file is written atomically to `/var/run/bezalel/heartbeats/<job>.last`, falling back to `~/.bezalel/heartbeats/` on dev machines. `nexus_watchdog.py` now writes its own heartbeat after each check cycle.

2. Detection: `bin/check_cron_heartbeats.py`

A meta-heartbeat checker meant to run every 15 minutes. It scans all `.last` files and creates, updates, or closes a Gitea alert issue (with a `[heartbeat-checker]` prefix) when any job has been silent for more than 2× its declared interval.

Cron entry:
3. Correction: `bin/night_watch.py` + heartbeat panel

A new nightly report generator. It includes a Heartbeat Panel table (job, status, age, interval, ratio) so stale jobs appear in every morning brief. The `--panel` flag outputs just the panel for embedding in other reports.

Tests
28 new tests across `TestWriteCronHeartbeat`, `TestScanHeartbeats`, `TestHeartbeatReport`, `TestFmtDuration`, `TestBuildReport`, and `TestWatchdogHeartbeatIntegration`. All 50 tests pass (28 new + 22 existing watchdog tests).
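The prevention, detection, and correction steps above fit together as one loop: write a heartbeat, scan for silent jobs, report and alert. Below is a minimal Python sketch of that loop, not the code in this PR. It assumes each `.last` file holds a JSON object with `timestamp` and `interval_seconds` keys (the real on-disk format isn't described here). Only the `write_cron_heartbeat(job_name, interval_seconds=...)` call shape and the "more than 2× the interval" rule come from the description. The `directory` parameter, `HeartbeatRow`, `scan_heartbeats`, `fmt_duration`, `render_panel`, and `alert_action` are hypothetical names chosen for illustration.

```python
# Hypothetical end-to-end sketch. Only write_cron_heartbeat's name/arguments and the
# "> 2x interval" rule come from this PR; every other name, and the JSON layout of
# the .last files, is an illustrative assumption.
import json
import os
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

STALE_RATIO = 2.0  # stale once silent for MORE than 2x the declared interval


def write_cron_heartbeat(job_name: str, interval_seconds: int, directory: Path) -> Path:
    """Atomically write <directory>/<job_name>.last (`directory` is a hypothetical param)."""
    directory.mkdir(parents=True, exist_ok=True)
    payload = {"timestamp": time.time(), "interval_seconds": interval_seconds}
    # Write to a temp file in the same directory, then rename it over the target.
    # os.replace is an atomic rename on POSIX, so readers never see a partial file.
    fd, tmp_name = tempfile.mkstemp(dir=str(directory), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())  # make the bytes durable before the rename
        target = directory / f"{job_name}.last"
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # don't leave stray temp files behind
        raise
    return target


@dataclass
class HeartbeatRow:
    job: str
    age_seconds: float
    interval_seconds: float

    @property
    def ratio(self) -> float:
        if self.interval_seconds <= 0:
            return float("inf")  # a non-positive interval can never be satisfied
        return self.age_seconds / self.interval_seconds

    @property
    def status(self) -> str:
        return "STALE" if self.ratio > STALE_RATIO else "OK"


def scan_heartbeats(directory: Path, now: Optional[float] = None) -> List[HeartbeatRow]:
    """Read every *.last file and compute how long each job has been silent."""
    now = time.time() if now is None else now
    rows = []
    for path in sorted(directory.glob("*.last")):
        data = json.loads(path.read_text())
        rows.append(HeartbeatRow(path.stem, now - data["timestamp"], data["interval_seconds"]))
    return rows


def fmt_duration(seconds: float) -> str:
    """Render seconds compactly, e.g. 45s, 5m, 2h 5m, 1d 3h."""
    total = int(seconds)
    if total < 60:
        return f"{total}s"
    hours, minutes = divmod(total // 60, 60)
    days, hours = divmod(hours, 24)
    if days:
        return f"{days}d {hours}h"
    if hours:
        return f"{hours}h {minutes}m"
    return f"{minutes}m"


def render_panel(rows: List[HeartbeatRow]) -> str:
    """Markdown table with the panel columns: job, status, age, interval, ratio."""
    lines = ["| Job | Status | Age | Interval | Ratio |", "|---|---|---|---|---|"]
    for r in rows:
        lines.append(
            f"| {r.job} | {r.status} | {fmt_duration(r.age_seconds)} "
            f"| {fmt_duration(r.interval_seconds)} | {r.ratio:.1f}x |"
        )
    return "\n".join(lines)


def alert_action(rows: List[HeartbeatRow], issue_open: bool) -> str:
    """Decide what to do with the single alert issue: create/update/close/none."""
    if any(r.status == "STALE" for r in rows):
        return "update" if issue_open else "create"
    return "close" if issue_open else "none"
```

In this sketch a job calls `write_cron_heartbeat` as its last step. The checker calls `scan_heartbeats`, embeds `render_panel(rows)` in the report, and maps `alert_action`'s result onto Gitea issue calls, which are left out because the PR doesn't describe its Gitea client. Creating the temp file in the target directory keeps the rename on one filesystem, which is what lets `os.replace` act atomically on POSIX. A job silent for exactly 2× its interval still counts as healthy, matching the "more than 2×" rule.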