[OPS] Automated loop dead-man switch -- alert when zero commits for 2+ hours #115

Closed
opened 2026-04-04 11:55:13 +00:00 by Timmy · 1 comment
Owner

Problem

The overnight loops died silently: 6101 worker deaths were recorded, with zero output for 6+ hours, and nobody was alerted.

Solution

Add a dead-man switch that checks the git log or the Gitea API for recent commits every 30 minutes. If no agent has committed for 2+ hours during active loop time, send a Telegram alert to Timmy Time.
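
A minimal sketch of the staleness check described above, assuming the commits are visible in a local clone via `git log`. The function names, the repository path argument, and the active-window hours are hypothetical; the issue does not say when "active loop time" is, so the 22:00 to 08:00 UTC window below is only a placeholder.

```python
import subprocess
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=2)  # threshold stated in the issue

# Assumption: the issue does not define "active loop time".
# These UTC hours are placeholders for an overnight window.
ACTIVE_START_HOUR = 22
ACTIVE_END_HOUR = 8


def latest_commit_time(repo_path: str) -> datetime:
    """Return the committer time of the newest commit reachable from any ref."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-1", "--format=%ct"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return datetime.fromtimestamp(int(out), tz=timezone.utc)


def in_active_window(now: datetime) -> bool:
    """True if `now` falls inside the active window, including windows that wrap past midnight."""
    hour = now.hour
    if ACTIVE_START_HOUR <= ACTIVE_END_HOUR:
        return ACTIVE_START_HOUR <= hour < ACTIVE_END_HOUR
    return hour >= ACTIVE_START_HOUR or hour < ACTIVE_END_HOUR


def should_alert(last_commit: datetime, now: datetime) -> bool:
    """Alert only when the loop is expected to be active and commits have stalled."""
    return in_active_window(now) and (now - last_commit) >= STALE_AFTER
```

A 30-minute scheduler would call `should_alert(latest_commit_time(path), datetime.now(timezone.utc))` on each run. Restricting alerts to the active window is one way to keep quiet periods from paging anyone.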

Acceptance

  • Cron job or watchdog script checks commit recency
  • Alerts to Telegram if threshold exceeded
  • Logs check results to ~/.hermes/logs/deadman.log
  • False positive rate < 1/week
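
The alerting and logging criteria above could be sketched roughly as follows. This is an illustration, not the delivered script: the environment variable names and the log line format are assumptions. Only the log path and the use of Telegram come from the issue. The alert uses the Telegram Bot API `sendMessage` method.

```python
import os
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path.home() / ".hermes" / "logs" / "deadman.log"  # path from the issue


def log_check(age_hours: float, alerted: bool, log_path: Path = LOG_PATH) -> str:
    """Append one line per check; the line format here is an assumption."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp} last_commit_age_h={age_hours:.2f} alerted={alerted}"
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


def build_alert_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a POST request for the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=data, method="POST")


def send_alert(text: str) -> None:
    """Send the alert; the environment variable names are hypothetical."""
    token = os.environ["DEADMAN_TELEGRAM_TOKEN"]
    chat_id = os.environ["DEADMAN_TELEGRAM_CHAT_ID"]
    request = build_alert_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Logging every check, not only the ones that alert, leaves a record that can later be used to count false positives against the 1/week target.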
Timmy self-assigned this 2026-04-04 11:55:13 +00:00
Timmy closed this issue 2026-04-04 16:00:08 +00:00
Author
Owner

Delivered in commit 3cf9f0d on main. Tested locally. Closing.


Reference: Timmy_Foundation/timmy-config#115