[ANDON] 行灯 — Real-time signal light. Instant Telegram alerts on any failure. #347

Open
opened 2026-04-07 14:30:10 +00:00 by Timmy · 0 comments
Owner

Part of Epic: #345

Principle

Andon is a signal cord or light that any worker can pull to make a problem visible to EVERYONE. Problems don't hide in log files. They show up on a board that the whole factory can see.

Our Problem

Dead API keys, ghost services, death-cycling loops — all buried in log files. Alexander had to ask for status. Problems should come to HIM, not the other way around.

Implementation: andon-alert.sh

A single alert script that any other script in the fleet can call:

andon-alert.sh "CRITICAL" "Groq API key expired" "agent-loop"
andon-alert.sh "WARNING" "Gemini rate limited 10 times in a row" "gemini-loop"
andon-alert.sh "HALT" "Jidoka quality gate triggered" "jidoka-gate"

Severity levels:

  • INFO: logged only
  • WARNING: logged + Telegram
  • CRITICAL: logged + Telegram + loop paused
  • HALT: logged + Telegram + loop killed + flag file
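
The severity table above could be dispatched roughly as follows. This is a minimal sketch, not the final script: the state directory, the `.paused` / `.halt` / `.pid` file convention, the `ANDON_DRY_RUN` switch, and the `TELEGRAM_BOT_TOKEN` / `TELEGRAM_CHAT_ID` variable names are all assumptions made for illustration. It also assumes each loop checks for its own `.paused` file and writes its PID to a `.pid` file, which this issue does not specify.

```shell
#!/bin/sh
# Sketch of andon-alert.sh. Paths, file names, and env var names are hypothetical.
ANDON_DIR="${ANDON_DIR:-/tmp/andon}"            # assumed state directory
ANDON_LOG="${ANDON_LOG:-$ANDON_DIR/andon.log}"  # assumed log file

# Map a severity name to a rank so levels can be compared numerically.
andon_rank() {
  case "$1" in
    INFO) echo 0 ;;
    WARNING) echo 1 ;;
    CRITICAL) echo 2 ;;
    HALT) echo 3 ;;
    *) return 1 ;;
  esac
}

# Post text through the Telegram Bot API sendMessage method.
# Prints instead of sending when ANDON_DRY_RUN=1 or credentials are unset.
andon_telegram() {
  if [ "${ANDON_DRY_RUN:-0}" = "1" ] || [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "telegram(dry-run): $1"
    return 0
  fi
  curl -fsS -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=$1" >/dev/null
}

andon_alert() {
  local severity="$1" message="$2" src="$3" rank
  rank="$(andon_rank "$severity")" || { echo "unknown severity: $severity" >&2; return 2; }
  mkdir -p "$ANDON_DIR"
  # Every level is logged.
  printf '%s [%s] %s: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$severity" "$src" "$message" >> "$ANDON_LOG"
  # WARNING and above: Telegram.
  if [ "$rank" -ge 1 ]; then andon_telegram "[$severity] $src: $message"; fi
  # CRITICAL: ask the loop to pause via a marker file it is assumed to poll.
  if [ "$rank" -eq 2 ]; then touch "$ANDON_DIR/$src.paused"; fi
  # HALT: kill the loop if it left a pid file, then leave a flag file.
  if [ "$rank" -eq 3 ]; then
    if [ -f "$ANDON_DIR/$src.pid" ]; then kill "$(cat "$ANDON_DIR/$src.pid")" 2>/dev/null || true; fi
    touch "$ANDON_DIR/$src.halt"
  fi
}

# When invoked with severity, message, and source, raise the alert.
if [ "$#" -ge 3 ]; then andon_alert "$@"; fi
```

Marker files keep the alert script decoupled from the loops: the script never needs to know how a loop is implemented, only where to leave the signal.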

Wire into: loop startup (key validation fails), loop runtime (3 consecutive failures), watchdog (restart happened), jidoka (quality halt).
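
For the "3 consecutive failures" hook, one possible wiring is a small wrapper that a loop uses around each unit of work. This is only a sketch: `run_with_andon`, `ANDON_ALERT_CMD`, and `ANDON_FAIL_THRESHOLD` are hypothetical names, and using CRITICAL for this case is an assumption rather than something the issue decides.

```shell
#!/bin/sh
# Hypothetical wrapper: count consecutive failures and raise an alert
# when the streak reaches the threshold.
ANDON_ALERT_CMD="${ANDON_ALERT_CMD:-andon-alert.sh}"  # assumed to be on PATH
ANDON_FAIL_THRESHOLD="${ANDON_FAIL_THRESHOLD:-3}"
andon_fail_count=0

# Usage: run_with_andon <lane> <command> [args...]
run_with_andon() {
  local lane="$1"
  shift
  if "$@"; then
    andon_fail_count=0          # any success resets the streak
    return 0
  fi
  andon_fail_count=$((andon_fail_count + 1))
  # Alert exactly once, when the streak first hits the threshold,
  # so a long outage does not send one message per iteration.
  if [ "$andon_fail_count" -eq "$ANDON_FAIL_THRESHOLD" ]; then
    "$ANDON_ALERT_CMD" "CRITICAL" "$andon_fail_count consecutive failures" "$lane"
  fi
  return 1
}
```

A loop body would then call something like `run_with_andon "gemini-loop" ./do-one-task.sh`, where `do-one-task.sh` stands in for whatever the loop actually runs.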

The burn-monitor cron becomes the "Andon board" — it should show RED/YELLOW/GREEN per lane, not just counts.
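
A per-lane board could be derived from the alert log itself. The sketch below assumes a log format of `<timestamp> [SEVERITY] <lane>: <message>` and a color mapping of INFO to GREEN, WARNING to YELLOW, and CRITICAL or HALT to RED; neither is fixed by this issue, and `andon_board` is a hypothetical name. It reports the worst severity seen per lane in whatever log it is given, so a real board would likely feed it only a recent window.

```shell
#!/bin/sh
# Hypothetical board: print "<lane><TAB><COLOR>" for each lane in a log file.
andon_board() {
  awk '
    {
      sev = $2;  gsub(/^\[|\]$/, "", sev)   # "[WARNING]" -> "WARNING"
      lane = $3; sub(/:$/, "", lane)        # "lane-a:"   -> "lane-a"
      if (sev == "HALT" || sev == "CRITICAL") rank = 2
      else if (sev == "WARNING") rank = 1
      else rank = 0
      # Keep the worst rank seen for each lane.
      if (!(lane in worst) || rank > worst[lane]) worst[lane] = rank
    }
    END {
      color[0] = "GREEN"; color[1] = "YELLOW"; color[2] = "RED"
      for (lane in worst) printf "%s\t%s\n", lane, color[worst[lane]]
    }
  ' "$1" | sort
}
```

The burn-monitor cron could call `andon_board` on the log and include its output alongside the existing counts.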

Acceptance Criteria

  • andon-alert.sh accepts severity, message, source
  • WARNING and above post to Telegram immediately
  • CRITICAL pauses the offending loop
  • HALT kills the offending loop and creates a flag file
  • Burn monitor shows color-coded status per lane
  • Any script in the fleet can call andon-alert.sh to raise visibility
Timmy self-assigned this 2026-04-07 14:30:10 +00:00

Reference: Timmy_Foundation/timmy-config#347