feat: parallel workers for all agents, dynamic scaling, self-healing watchdog
- claude-loop: 7 workers default, scales up to 21, 5s cooldown
- gemini-loop: rewritten as parallel worker system (3→12), multi-repo,
auto-clone, correct CLI flags (-p/--yolo), bash 3.2 compatible
- loop-watchdog: monitors all loops every 2min, auto-restarts dead loops,
kills zombies, files Gitea issues for unfixable problems
- ops-helpers: added ops-wake-watchdog, ops-kill-watchdog
- All scripts use file-based PID tracking (bash 3.2 safe)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 19:22:18 -04:00
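The commit message says all scripts use file-based PID tracking so they stay bash 3.2 safe (bash 3.2 has no associative arrays, so a PID-per-name map cannot live in a shell variable). A minimal sketch of that idea follows, assuming one PID file per worker. The helper names `pid_track`, `pid_alive`, `pid_forget` and the `PID_DIR` location are hypothetical and are not taken from the scripts in this commit.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the functions used by the real loops.
# One small file per worker stands in for an associative array.
PID_DIR="${PID_DIR:-${TMPDIR:-/tmp}/pid-sketch.$$}"
mkdir -p "$PID_DIR"

# Record the PID ($2) of a named worker ($1) in its own file.
pid_track() {
  echo "$2" > "$PID_DIR/$1.pid"
}

# Succeed only if the worker has a PID file and that process still exists.
pid_alive() {
  local pid
  pid=$(cat "$PID_DIR/$1.pid" 2>/dev/null) || return 1
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Drop the PID file once the worker is known to be gone.
pid_forget() {
  rm -f "$PID_DIR/$1.pid"
}
```

A caller might run `pid_track worker-1 "$!"` right after backgrounding a worker and poll `pid_alive worker-1` from a monitor. `kill -0` sends no signal; it only reports whether the process can be signalled.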
#!/usr/bin/env bash
2026-03-22 22:03:58 -04:00
# loop-watchdog.sh — Self-healing monitor for agent loops
# Single instance enforced via pidfile. Checks every 2 minutes.
# Restarts dead loops. Files Gitea issues for persistent failures.
# Does NOT kill git processes — workers handle their own cleanup.

set -uo pipefail

LOG_DIR="$HOME/.hermes/logs"
LOG="$LOG_DIR/watchdog.log"
PIDFILE="$LOG_DIR/watchdog.pid"
ISSUE_LOG="$LOG_DIR/watchdog-issues.json"
GITEA_URL="http://143.198.27.163:3000"
ADMIN_TOKEN=$(cat "$HOME/.config/gitea/token" 2>/dev/null)
ISSUE_REPO="rockachopa/hermes-agent"
CHECK_INTERVAL=120

mkdir -p "$LOG_DIR"
[ -f "$ISSUE_LOG" ] || echo '{}' > "$ISSUE_LOG"

# === Single instance guard ===
if [ -f "$PIDFILE" ]; then
  old_pid=$(cat "$PIDFILE")
  if kill -0 "$old_pid" 2>/dev/null; then
    echo "Watchdog already running (PID $old_pid)" >&2
    exit 0
  fi
fi
echo $$ > "$PIDFILE"
trap 'rm -f "$PIDFILE"' EXIT

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] WATCHDOG: $*" >> "$LOG"
}

# File a Gitea issue. Deduplicates within 6 hours.
file_issue() {
  local issue_key="$1"
  local title="$2"
  local body="$3"
  local assignee="${4:-claude}"

  # Pass the log path and key as arguments rather than interpolating them
  # into the Python source, so quotes in either value cannot break the code.
  local should_file
  should_file=$(python3 -c "
import json, sys, time
path, key = sys.argv[1], sys.argv[2]
try:
    with open(path) as f: filed = json.load(f)
except (OSError, ValueError): filed = {}
entry = filed.get(key, {})
if entry and entry.get('until', 0) > time.time():
    print('no')
else:
    filed[key] = {'until': time.time() + 21600}
    with open(path, 'w') as f: json.dump(filed, f, indent=2)
    print('yes')
" "$ISSUE_LOG" "$issue_key" 2>/dev/null)

  [ "$should_file" != "yes" ] && return 0

  log "FILING ISSUE: $title"
2026-03-22 21:35:56 -04:00
  local tmpfile="/tmp/watchdog-issue-$$.json"
  python3 -c "
import json, sys
with open('$tmpfile', 'w') as f:
    json.dump({
        'title': '[watchdog] ' + sys.argv[1],
        'body': sys.argv[2] + '\n\n---\n*Auto-filed by loop-watchdog*',
        'assignees': [sys.argv[3]],
    }, f)
" "$title" "$body" "$assignee" 2>/dev/null

  if [ -f "$tmpfile" ]; then
    curl -sf -X POST "${GITEA_URL}/api/v1/repos/${ISSUE_REPO}/issues" \
      -H "Authorization: token ${ADMIN_TOKEN}" \
      -H "Content-Type: application/json" \
      -d @"$tmpfile" >/dev/null 2>&1 || log "WARN: Failed to file issue: $title"
    rm -f "$tmpfile"
  fi
}

# Check if a loop is alive. Restart if dead.
check_loop() {
  local name="$1"
  local grep_pat="$2"
  local wake_cmd="$3"
  local log_file="$4"

  local pid
  pid=$(pgrep -f "$grep_pat" 2>/dev/null | head -n 1)

  if [ -z "$pid" ]; then
    log "$name loop DOWN — restarting..."
    eval "$wake_cmd"
    sleep 5
    pid=$(pgrep -f "$grep_pat" 2>/dev/null | head -n 1)
    if [ -z "$pid" ]; then
      # $'\n' expands to a real newline; "\n" inside double quotes would
      # stay a literal backslash-n in the issue body.
      file_issue "${name}-loop-dead" \
        "${name} loop won't start" \
        "The ${name} agent loop failed to start."$'\n'"Command: ${wake_cmd}"$'\n'"Last log: $(tail -n 10 "$log_file" 2>/dev/null)" \
        "claude"
    else
      log "$name loop restarted (PID $pid)"
    fi
    return
  fi

  # Check for stalls (no log activity > 30 min, and not idle)
  if [ -f "$log_file" ]; then
    local last_mod now stale
    # Try GNU stat first: on Linux, "stat -f %m" prints filesystem info to
    # stdout before failing, which would corrupt the captured value.
    last_mod=$(stat -c %Y "$log_file" 2>/dev/null || stat -f %m "$log_file" 2>/dev/null || echo 0)
    now=$(date +%s)
    stale=$(( now - last_mod ))

    if [ "$stale" -gt 1800 ]; then
      local last_line
      last_line=$(tail -n 1 "$log_file" 2>/dev/null)
      # Idle is fine, truly stuck is not
      if ! echo "$last_line" | grep -Eq "Queue empty|Waiting|idle"; then
        log "$name loop stale (${stale}s) — restarting"
        pkill -f "$grep_pat" 2>/dev/null
        sleep 2
        eval "$wake_cmd"
      fi
    fi

    # Check for crash loops (10+ failures in last 50 lines)
    local recent_failures
    recent_failures=$(tail -n 50 "$log_file" 2>/dev/null | grep -Ec "FAILED:|ERROR:" || true)
    if [ "$recent_failures" -ge 10 ]; then
      local errors
      errors=$(tail -n 50 "$log_file" 2>/dev/null | grep -E "FAILED:|ERROR:" | tail -n 5)
      # $'\n' expands to a real newline between the heading and the errors.
      file_issue "${name}-crash-loop" \
        "${name} in crash loop (${recent_failures} failures in 50 lines)" \
        "Recent errors:"$'\n'"${errors}" \
        "claude"
    fi
  fi
}

check_gitea() {
  if ! curl -sf --max-time 5 "${GITEA_URL}/api/v1/version" >/dev/null 2>&1; then
    log "Gitea UNREACHABLE"
    file_issue "gitea-down" "Gitea unreachable" \
      "Gitea at ${GITEA_URL} not responding. Check VPS 143.198.27.163." "claude"
  fi
}

check_disk() {
  local dir_count
  dir_count=$(ls -1d "$HOME/worktrees"/claude-w* 2>/dev/null | wc -l | tr -d ' ')
  if [ "${dir_count:-0}" -gt 40 ]; then
    log "WARN: $dir_count clone dirs — cleaning old ones"
    # Delete clone dirs not modified in the last hour
    find "$HOME/worktrees" -maxdepth 1 -name "claude-w*" -type d -mmin +60 -exec rm -rf {} \; 2>/dev/null
  fi
}

# === MAIN ===
log "=== Watchdog Started (PID $$) ==="

while true; do
  check_gitea

  check_loop "claude" "claude-loop.sh" \
    "nohup bash ~/.hermes/bin/claude-loop.sh 10 >> ~/.hermes/logs/claude-loop.log 2>&1 &" \
    "$LOG_DIR/claude-loop.log"

  check_loop "kimi" "kimi-loop.sh" \
    "nohup bash ~/.hermes/bin/kimi-loop.sh >> ~/.hermes/logs/kimi-loop.log 2>&1 &" \
    "$LOG_DIR/kimi-loop.log"

  # Gemini disabled — no API key configured
  # check_loop "gemini" ...

  check_disk

  sleep "$CHECK_INTERVAL"
done