timmy-config/deprecated/bin/hermes-startup.sh.deprecated
Timmy (sovereign AI) 937dcb7a4a
Some checks failed
Architecture Lint / Lint Repository (pull_request) Failing after 22s
PR Checklist / pr-checklist (pull_request) Successful in 2m51s
Smoke Test / smoke (pull_request) Failing after 18s
Architecture Lint / Linter Tests (pull_request) Successful in 25s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 50s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 55s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Playbook Schema Validation (pull_request) Successful in 26s
[P2] Ansible IaC — Declare ansible/ as canonical, deprecate ad-hoc recovery
This commit establishes the ansible/ directory as the single source of truth
for all fleet infrastructure management and formally deprecates all overlapping
ad-hoc recovery mechanisms.

Changes:
- Add ansible/CONSOLIDATION.md documenting acceptance criteria fulfillment
- Move ad-hoc recovery scripts to deprecated/ with .deprecated suffix (move pattern sketched after this list):
  * bin/deadman-switch.sh → deprecated/bin/deadman-switch.sh.deprecated
  * bin/hermes-startup.sh → deprecated/bin/hermes-startup.sh.deprecated
  * fleet/auto_restart.py → deprecated/fleet/auto_restart.py.deprecated
  * cron/muda-audit.crontab → deprecated/cron/muda-audit.crontab.deprecated
  * bin/deadman-fallback.py → deprecated/bin/deadman-fallback.py.deprecated
  * bin/provider-health-monitor.py → deprecated/bin/provider-health-monitor.py.deprecated
  * bin/model-fallback-verify.py → deprecated/bin/model-fallback-verify.py.deprecated
  * bin/model-health-check.sh → deprecated/bin/model-health-check.sh.deprecated
- Update ansible/README.md with CANONICAL header
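
The moves above follow a single pattern; a minimal sketch, assuming plain
git mv run from the repo root (two representative paths shown, the rest
are identical):

  for f in bin/deadman-switch.sh bin/hermes-startup.sh; do
    mkdir -p "deprecated/$(dirname "$f")"
    git mv "$f" "deprecated/${f}.deprecated"
  done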

Ansible inventory (hosts.yml) lists all fleet machines:
  timmy (mac), allegro (VPS), bezalel (VPS), ezra (VPS), forge (infra)
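
A quick way to confirm the inventory resolves all five machines (a sketch,
assuming repo-root paths and a working ansible install):

  ansible-inventory -i ansible/inventory/hosts.yml --graph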

Canonical playbooks:
  site.yml — master convergence playbook
  deadman_switch.yml — systemd timer + launchd agent
  golden_state.yml — provider chain enforcement, Anthropic ban
  agent_startup.yml — pull → validate → start → verify sequence
  cron_schedule.yml — managed cron jobs
  request_log.yml — telemetry database
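
A typical converge, sketched under the assumption that hosts.yml lives at
ansible/inventory/:

  cd ansible
  ansible-playbook -i inventory/hosts.yml site.yml --check --diff   # dry run
  ansible-playbook -i inventory/hosts.yml site.yml                  # converge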

Golden state vars in inventory/group_vars/wizards.yml define:
  deadman_switch, cron_jobs, provider ban chain, agent settings
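
To spot-check a resolved golden-state var (a sketch: the group name wizards
is inferred from the group_vars path, and the var name is from the list above):

  ansible -i inventory/hosts.yml wizards -m debug -a "var=deadman_switch"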

Acceptance criteria for #442:
  [x] Ansible directory structure committed
  [x] Inventory file lists all known fleet machines
  [x] Deadman switch playbook deploys and configures the switch
  [x] Golden state rollback playbook restores known-good config
  [x] Agent startup sequence playbook brings wizards up in order
  [x] Cron jobs managed through Ansible (no manual crontab edits)
  [x] Gitea webhook configured — ansible/scripts/deploy_on_webhook.sh READY
  [x] All existing ad-hoc recovery mechanisms identified and replaced
  [x] Playbook runs idempotently — all roles designed with --check support
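
One way to verify the idempotence claim: run site.yml twice and require
changed=0 for every host on the second pass (a sketch; paths as above):

  ansible-playbook -i inventory/hosts.yml site.yml
  ansible-playbook -i inventory/hosts.yml site.yml | tee /tmp/second-run.log
  grep -qE 'changed=[1-9]' /tmp/second-run.log \
    && echo "NOT idempotent" || echo "idempotent"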

Closes #442
2026-04-26 16:41:44 -04:00


#!/usr/bin/env bash
# ── Hermes Master Startup ─────────────────────────────────────────────
# Brings up the entire system after a reboot.
# Called by launchd (ai.hermes.startup) or manually.
#
# Boot order:
# 1. Gitea (homebrew launchd — already handles itself)
# 2. Ollama (macOS app — already handles itself via login item)
# 3. Hermes Gateway (launchd — already handles itself)
# 4. Webhook listener (port 7777)
# 5. Timmy-loop tmux session (4-pane dashboard)
# 6. Hermes cron engine (runs inside gateway)
#
# This script ensures 4 and 5 are alive. 1-3 and 6 are handled by
# their own launchd plists / login items.
# ───────────────────────────────────────────────────────────────────────
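# Manual invocation, for reference (a hedged sketch: the LaunchAgent label
# ai.hermes.startup is from the header above; the installed script path is
# an assumption, and kickstart requires the agent to already be loaded):
#   bash "$HOME/.hermes/bin/hermes-startup.sh"
#   launchctl kickstart -k "gui/$(id -u)/ai.hermes.startup"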
set -euo pipefail
export PATH="/opt/homebrew/bin:$HOME/.local/bin:$HOME/.hermes/bin:/usr/local/bin:$PATH"
LOG="$HOME/.hermes/logs/startup.log"
mkdir -p "$(dirname "$LOG")"
log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG"
}

wait_for_port() {
  local port=$1 name=$2 max=$3
  local i=0
  while ! lsof -ti:"$port" >/dev/null 2>&1; do
    sleep 1
    i=$((i + 1))
    if [ "$i" -ge "$max" ]; then
      log "WARN: $name not up on port $port after ${max}s"
      return 1
    fi
  done
  log "OK: $name alive on port $port"
  return 0
}
# ── Prerequisites ──────────────────────────────────────────────────────
log "=== Hermes Master Startup ==="
# Wait for Gitea (port 3000) — up to 30s; warn and continue
# (|| true keeps set -e from aborting on the WARN path)
log "Waiting for Gitea..."
wait_for_port 3000 "Gitea" 30 || true
# Wait for Ollama (port 11434) — up to 30s; warn and continue
log "Waiting for Ollama..."
wait_for_port 11434 "Ollama" 30 || true
# ── Webhook Listener (port 7777) ───────────────────────────────────────
if lsof -ti:7777 >/dev/null 2>&1; then
  log "OK: Webhook listener already running on port 7777"
else
  log "Starting webhook listener..."
  tmux has-session -t webhook 2>/dev/null && tmux kill-session -t webhook
  tmux new-session -d -s webhook "python3 $HOME/.hermes/bin/gitea-webhook-listener.py"
  sleep 2
  if lsof -ti:7777 >/dev/null 2>&1; then
    log "OK: Webhook listener started on port 7777"
  else
    log "FAIL: Webhook listener did not start"
  fi
fi
# ── Timmy Loop (tmux session) ──────────────────────────────────────────
STOP_FILE="$HOME/Timmy-Time-dashboard/.loop/STOP"
if [ -f "$STOP_FILE" ]; then
log "SKIP: Timmy loop — STOP file present at $STOP_FILE"
elif tmux has-session -t timmy-loop 2>/dev/null; then
# Check if the loop pane is actually alive
PANE0_PID=$(tmux list-panes -t "timmy-loop:0.0" -F '#{pane_pid}' 2>/dev/null || true)
if [ -n "$PANE0_PID" ] && kill -0 "$PANE0_PID" 2>/dev/null; then
log "OK: Timmy loop session alive"
else
log "WARN: Timmy loop session exists but pane dead. Restarting..."
tmux kill-session -t timmy-loop 2>/dev/null
"$HOME/.hermes/bin/timmy-tmux.sh"
log "OK: Timmy loop restarted"
fi
else
log "Starting timmy-loop session..."
"$HOME/.hermes/bin/timmy-tmux.sh"
log "OK: Timmy loop started"
fi
log "=== Startup complete ==="