Files
timmy-config/bin/start-loops.sh
Alexander Payne a4261d2dff
Some checks failed
Smoke Test / smoke (pull_request) Failing after 20s
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 17s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 57s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Deploy Script Dry Run (pull_request) Successful in 11s
Validate Config / Playbook Schema Validation (pull_request) Successful in 25s
Validate Config / Shell Script Lint (pull_request) Failing after 1m3s
Validate Config / Cron Syntax Check (pull_request) Successful in 13s
Architecture Lint / Lint Repository (pull_request) Failing after 21s
PR Checklist / pr-checklist (pull_request) Successful in 5m4s
feat(andon): implement real-time signal light system (#347)
- Add bin/andon-alert.sh with severity levels: INFO, WARNING, CRITICAL, HALT
- Wire into agent-loop: CRITICAL on 3 consecutive failures and dispatch key validation errors
- Wire into start-loops: CRITICAL on model health check failure
- Wire into pane-watchdog: WARNING on successful restart, CRITICAL on max restart exhaustion
- Creates ~/.hermes/andon-{flags,critical,halt} directories for state

The Andon system makes failures visible immediately via Telegram alerts
and local flag files, turning the factory floor into a transparent
operations center.

Closes #347
2026-04-27 09:41:57 -04:00

103 lines
4.2 KiB
Bash
Executable File

#!/usr/bin/env bash
# start-loops.sh — Start all Hermes agent loops (orchestrator + workers)
# Validates model health, cleans stale state, launches loops with nohup.
# Part of Gitea issue #126.
#
# Usage: start-loops.sh
set -euo pipefail
HERMES_BIN="$HOME/.hermes/bin"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
LOG_DIR="$HOME/.hermes/logs"
CLAUDE_LOCKS="$LOG_DIR/claude-locks"
GEMINI_LOCKS="$LOG_DIR/gemini-locks"
mkdir -p "$LOG_DIR" "$CLAUDE_LOCKS" "$GEMINI_LOCKS"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] START-LOOPS: $*"
}
# ── 1. Model health check ────────────────────────────────────────────
log "Running model health check..."
if ! bash "$SCRIPT_DIR/model-health-check.sh"; then
log "FATAL: Model health check failed. Aborting loop startup."
# ANDON Alert: CRITICAL — loop startup blocked
if [ -x "$SCRIPT_DIR/andon-alert.sh" ]; then
"$SCRIPT_DIR/andon-alert.sh" "CRITICAL" "Model health check failed — loops not started" "startup" 2>/dev/null || true
fi
exit 1
fi
log "Model health check passed."
# ── 2. Kill stale loop processes ──────────────────────────────────────
log "Killing stale loop processes..."
for proc_name in claude-loop gemini-loop timmy-orchestrator; do
pids=$(pgrep -f "${proc_name}\\.sh" 2>/dev/null || true)
if [ -n "$pids" ]; then
log " Killing stale $proc_name PIDs: $pids"
echo "$pids" | xargs kill 2>/dev/null || true
sleep 1
# Force-kill any survivors
pids=$(pgrep -f "${proc_name}\\.sh" 2>/dev/null || true)
if [ -n "$pids" ]; then
echo "$pids" | xargs kill -9 2>/dev/null || true
fi
else
log " No stale $proc_name found."
fi
done
# ── 3. Clear lock directories ────────────────────────────────────────
log "Clearing lock dirs..."
rm -rf "${CLAUDE_LOCKS:?}"/*
rm -rf "${GEMINI_LOCKS:?}"/*
log " Cleared $CLAUDE_LOCKS and $GEMINI_LOCKS"
# ── 4. Launch loops with nohup ───────────────────────────────────────
log "Launching timmy-orchestrator..."
nohup bash "$HERMES_BIN/timmy-orchestrator.sh" \
>> "$LOG_DIR/timmy-orchestrator-nohup.log" 2>&1 &
ORCH_PID=$!
log " timmy-orchestrator PID: $ORCH_PID"
log "Launching claude-loop (5 workers)..."
nohup bash "$HERMES_BIN/claude-loop.sh" 5 \
>> "$LOG_DIR/claude-loop-nohup.log" 2>&1 &
CLAUDE_PID=$!
log " claude-loop PID: $CLAUDE_PID"
log "Launching gemini-loop (3 workers)..."
nohup bash "$HERMES_BIN/gemini-loop.sh" 3 \
>> "$LOG_DIR/gemini-loop-nohup.log" 2>&1 &
GEMINI_PID=$!
log " gemini-loop PID: $GEMINI_PID"
# ── 5. PID summary ───────────────────────────────────────────────────
log "Waiting 3s for processes to settle..."
sleep 3
echo ""
echo "═══════════════════════════════════════════════════"
echo " HERMES LOOP STATUS"
echo "═══════════════════════════════════════════════════"
printf " %-25s %s\n" "PROCESS" "PID / STATUS"
echo "───────────────────────────────────────────────────"
for entry in "timmy-orchestrator:$ORCH_PID" "claude-loop:$CLAUDE_PID" "gemini-loop:$GEMINI_PID"; do
name="${entry%%:*}"
pid="${entry##*:}"
if kill -0 "$pid" 2>/dev/null; then
printf " %-25s %s\n" "$name" "$pid ✓ running"
else
printf " %-25s %s\n" "$name" "$pid ✗ DEAD"
fi
done
echo "───────────────────────────────────────────────────"
echo " Logs: $LOG_DIR/*-nohup.log"
echo "═══════════════════════════════════════════════════"
echo ""
log "All loops launched."