timmy-config/bin/start-loops.sh
Alexander Payne a027249618
feat(andon): add andon-alert system with burn monitor

Implements real-time signal light (行燈) for instant visibility of failures:
- bin/andon-alert.sh: accepts severity (INFO/WARNING/CRITICAL/HALT), logs, sends Telegram for WARNING+, pauses loops on CRITICAL, kills + flag on HALT.
- Integrated into model-health-check (startup failures), agent-loop (3 consecutive failures), and claudemax-watchdog (restart events).
- New burn-monitor.sh displays color-coded per-lane status from andon state.
- Any fleet script can call andon-alert.sh to raise visibility.

Closes #347
2026-04-30 10:17:12 -04:00
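
The severity ladder described in the commit message can be pictured as a simple dispatch table. The sketch below is only an illustration of that description, not the contents of bin/andon-alert.sh: the function name `andon_actions` and the action words it prints are hypothetical, and whether HALT also pauses loops before killing them is not stated in the source, so it is left out here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: NOT the real bin/andon-alert.sh.
# Prints the actions the commit message associates with each severity.
andon_actions() {
    local severity="${1:-}"
    case "$severity" in
        INFO)     echo "log" ;;                    # log only
        WARNING)  echo "log notify" ;;             # Telegram starts at WARNING
        CRITICAL) echo "log notify pause" ;;       # pauses loops
        HALT)     echo "log notify kill flag" ;;   # kills loops and sets a flag
        *)
            echo "unknown severity: $severity" >&2
            return 2
            ;;
    esac
}
```

For example, `andon_actions CRITICAL` prints `log notify pause`, which matches the level that start-loops.sh raises when the model health check fails.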

100 lines
4.1 KiB
Bash
Executable File

#!/usr/bin/env bash
# start-loops.sh — Start all Hermes agent loops (orchestrator + workers)
# Validates model health, cleans stale state, launches loops with nohup.
# Part of Gitea issue #126.
#
# Usage: start-loops.sh
set -euo pipefail
HERMES_BIN="$HOME/.hermes/bin"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
LOG_DIR="$HOME/.hermes/logs"
CLAUDE_LOCKS="$LOG_DIR/claude-locks"
GEMINI_LOCKS="$LOG_DIR/gemini-locks"
mkdir -p "$LOG_DIR" "$CLAUDE_LOCKS" "$GEMINI_LOCKS"
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] START-LOOPS: $*"
}
# ── 1. Model health check ────────────────────────────────────────────
log "Running model health check..."
if ! bash "$SCRIPT_DIR/model-health-check.sh"; then
    log "FATAL: Model health check failed. Aborting loop startup."
    "$SCRIPT_DIR/andon-alert.sh" CRITICAL "Model health check failed — API key invalid or unreachable" "model-health"
    exit 1
fi
log "Model health check passed."
# ── 2. Kill stale loop processes ──────────────────────────────────────
log "Killing stale loop processes..."
for proc_name in claude-loop gemini-loop timmy-orchestrator; do
    pids=$(pgrep -f "${proc_name}\\.sh" 2>/dev/null || true)
    if [ -n "$pids" ]; then
        log "  Killing stale $proc_name PIDs: $pids"
        echo "$pids" | xargs kill 2>/dev/null || true
        sleep 1
        # Force-kill any survivors
        pids=$(pgrep -f "${proc_name}\\.sh" 2>/dev/null || true)
        if [ -n "$pids" ]; then
            echo "$pids" | xargs kill -9 2>/dev/null || true
        fi
    else
        log "  No stale $proc_name found."
    fi
done
# ── 3. Clear lock directories ────────────────────────────────────────
log "Clearing lock dirs..."
rm -rf "${CLAUDE_LOCKS:?}"/*
rm -rf "${GEMINI_LOCKS:?}"/*
log " Cleared $CLAUDE_LOCKS and $GEMINI_LOCKS"
# ── 4. Launch loops with nohup ───────────────────────────────────────
log "Launching timmy-orchestrator..."
nohup bash "$HERMES_BIN/timmy-orchestrator.sh" \
    >> "$LOG_DIR/timmy-orchestrator-nohup.log" 2>&1 &
ORCH_PID=$!
log " timmy-orchestrator PID: $ORCH_PID"
log "Launching claude-loop (5 workers)..."
nohup bash "$HERMES_BIN/claude-loop.sh" 5 \
    >> "$LOG_DIR/claude-loop-nohup.log" 2>&1 &
CLAUDE_PID=$!
log " claude-loop PID: $CLAUDE_PID"
log "Launching gemini-loop (3 workers)..."
nohup bash "$HERMES_BIN/gemini-loop.sh" 3 \
    >> "$LOG_DIR/gemini-loop-nohup.log" 2>&1 &
GEMINI_PID=$!
log " gemini-loop PID: $GEMINI_PID"
# ── 5. PID summary ───────────────────────────────────────────────────
log "Waiting 3s for processes to settle..."
sleep 3
echo ""
echo "═══════════════════════════════════════════════════"
echo " HERMES LOOP STATUS"
echo "═══════════════════════════════════════════════════"
printf " %-25s %s\n" "PROCESS" "PID / STATUS"
echo "───────────────────────────────────────────────────"
for entry in "timmy-orchestrator:$ORCH_PID" "claude-loop:$CLAUDE_PID" "gemini-loop:$GEMINI_PID"; do
    name="${entry%%:*}"
    pid="${entry##*:}"
    if kill -0 "$pid" 2>/dev/null; then
        printf "  %-25s %s\n" "$name" "$pid ✓ running"
    else
        printf "  %-25s %s\n" "$name" "$pid ✗ DEAD"
    fi
done
echo "───────────────────────────────────────────────────"
echo " Logs: $LOG_DIR/*-nohup.log"
echo "═══════════════════════════════════════════════════"
echo ""
log "All loops launched."
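
The status table above prints DEAD for a loop that exited during the 3-second settle window, but the script still logs "All loops launched." and exits 0. One possible way to surface that to callers is sketched below, assuming a hypothetical helper named `check_pids` (not part of start-loops.sh) that takes the same `name:pid` entries and returns non-zero when any PID is gone. Note that `kill -0` also fails when the caller lacks permission to signal the process, so this sketch treats "not signalable" as dead.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of start-loops.sh.
# Prints one line per dead entry and returns non-zero if any
# PID in the given name:pid list cannot be signalled.
check_pids() {
    local entry name pid dead=0
    for entry in "$@"; do
        name="${entry%%:*}"   # text before the first colon
        pid="${entry##*:}"    # text after the last colon
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "DEAD: $name ($pid)"
            dead=$((dead + 1))
        fi
    done
    [ "$dead" -eq 0 ]
}
```

A caller could then write something like `check_pids "claude-loop:$CLAUDE_PID" || exit 1` after the summary, so that a wrapper or CI job sees the failure in the exit status rather than only in the printed table.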