Tighten Hermes cutover and export checks

Alexander Whitestone
2026-03-27 17:35:07 -04:00
parent 2384908be7
commit d72ae92189
5 changed files with 106 additions and 39 deletions

@@ -1,7 +1,7 @@
 # DEPRECATED — Bash Loop Scripts Removed
 **Date:** 2026-03-25
-**Reason:** Replaced by sovereign-orchestration (SQLite + Python single-process executor)
+**Reason:** Replaced by Hermes + timmy-config sidecar orchestration
 ## What was removed
 - claude-loop.sh, gemini-loop.sh, agent-loop.sh
@@ -9,14 +9,15 @@
 - nexus-merge-bot.sh, claudemax-watchdog.sh, timmy-loopstat.sh
 ## What replaces them
-**Repo:** Timmy_Foundation/sovereign-orchestration
-**Entry point:** `python3 src/sovereign_executor.py --workers 3 --poll 30`
-**Features:** SQLite task queue, crash recovery, dedup, playbooks, MCP server
-**Issues:** #29 (fix imports), #30 (deploy as service)
+**Harness:** Hermes
+**Overlay repo:** Timmy_Foundation/timmy-config
+**Entry points:** `orchestration.py`, `tasks.py`, `deploy.sh`
+**Features:** Huey + SQLite scheduling, local-model health checks, session export, DPO artifact staging
 ## Why
 The bash loops crash-looped, produced zero work after relaunch, had no crash
-recovery, no dedup, and required 8 separate scripts. The Python executor is
-one process with SQLite durability.
+recovery, no durable export path, and required too many ad hoc scripts. The
+Hermes sidecar keeps orchestration close to Timmy's actual config and training
+surfaces.
-Do NOT recreate bash loops. If the executor is broken, fix the executor.
+Do NOT recreate bash loops. If orchestration is broken, fix the Hermes sidecar.

@@ -14,11 +14,12 @@ timmy-config/
 ├── DEPRECATED.md ← What was removed and why
 ├── config.yaml ← Hermes harness configuration
 ├── channel_directory.json ← Platform channel mappings
-├── bin/ ← Utility scripts (NOT loops — see below)
+├── bin/ ← Live utility scripts (NOT deprecated loops)
 │ ├── hermes-startup.sh ← Hermes boot sequence
 │ ├── agent-dispatch.sh ← Manual agent dispatch
 │ ├── ops-panel.sh ← Ops dashboard panel
 │ ├── ops-gitea.sh ← Gitea ops helpers
+│ ├── pipeline-freshness.sh ← Session/export drift check
 │ └── timmy-status.sh ← Status check
 ├── memories/ ← Persistent memory YAML
 ├── skins/ ← UI skins (timmy skin)
@@ -39,10 +40,14 @@ If a file answers "who is Timmy?" or "how does Hermes host him?", it belongs
 here. If it answers "what has Timmy done or learned?" it belongs in
 `timmy-home`.
+The scripts in `bin/` are live operational helpers for the Hermes sidecar.
+What is dead are the old long-running bash worker loops, not every script in
+this repo.
 ## Orchestration: Huey
 All orchestration (triage, PR review, dispatch) runs via [Huey](https://github.com/coleifer/huey) with SQLite.
-`orchestration.py` (6 lines) + `tasks.py` (~70 lines) replace the entire sovereign-orchestration repo (3,846 lines).
+`orchestration.py` + `tasks.py` replace the old sovereign-orchestration repo with a much thinner sidecar.
 ```bash
 pip install huey

bin/pipeline-freshness.sh Executable file
@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+set -euo pipefail
+SESSIONS_DIR="$HOME/.hermes/sessions"
+EXPORT_DIR="$HOME/.timmy/training-data/dpo-pairs"
+latest_session=$(find "$SESSIONS_DIR" -maxdepth 1 -name 'session_*.json' -type f -print 2>/dev/null | sort | tail -n 1)
+latest_export=$(find "$EXPORT_DIR" -maxdepth 1 -name 'session_*.json' -type f -print 2>/dev/null | sort | tail -n 1)
+echo "latest_session=${latest_session:-none}"
+echo "latest_export=${latest_export:-none}"
+if [ -z "${latest_session:-}" ]; then
+    echo "status=ok"
+    echo "reason=no sessions yet"
+    exit 0
+fi
+if [ -z "${latest_export:-}" ]; then
+    echo "status=lagging"
+    echo "reason=no exports yet"
+    exit 1
+fi
+session_mtime=$(stat -f '%m' "$latest_session")
+export_mtime=$(stat -f '%m' "$latest_export")
+lag_minutes=$(( (session_mtime - export_mtime) / 60 ))
+if [ "$lag_minutes" -lt 0 ]; then
+    lag_minutes=0
+fi
+echo "lag_minutes=$lag_minutes"
+if [ "$lag_minutes" -gt 300 ]; then
+    echo "status=lagging"
+    echo "reason=exports more than 5 hours behind sessions"
+    exit 1
+fi
+echo "status=ok"
+echo "reason=exports within freshness window"
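
The freshness rule in this script does not depend on `stat` itself (the `-f '%m'` form is the BSD/macOS syntax; GNU `stat` uses `-c '%Y'`). As a minimal sketch, the same decision can be written as a pure function over two modification times. The name `freshness_status` and the tuple it returns are hypothetical and not part of this commit; only the 300-minute window and the status/reason strings come from the script.

```python
from typing import Optional, Tuple

# Same 5-hour threshold the shell script uses.
FRESHNESS_WINDOW_MINUTES = 300


def freshness_status(
    session_mtime: Optional[float],
    export_mtime: Optional[float],
) -> Tuple[str, str, Optional[int]]:
    """Return (status, reason, lag_minutes) for the newest session/export pair.

    Hypothetical helper: mtimes are epoch seconds, or None when no file exists.
    """
    if session_mtime is None:
        return ("ok", "no sessions yet", None)
    if export_mtime is None:
        return ("lagging", "no exports yet", None)
    # Clamp at zero: an export newer than the session is not negative lag.
    lag_minutes = max(0, int(session_mtime - export_mtime) // 60)
    if lag_minutes > FRESHNESS_WINDOW_MINUTES:
        return ("lagging", "exports more than 5 hours behind sessions", lag_minutes)
    return ("ok", "exports within freshness window", lag_minutes)
```

A caller could map `"lagging"` to exit code 1 and `"ok"` to 0 to mirror the script's behavior.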

@@ -3,7 +3,7 @@
 # This is the canonical way to deploy Timmy's configuration.
 # Hermes-agent is the engine. timmy-config is the driver's seat.
 #
-# Usage: ./deploy.sh [--restart-loops]
+# Usage: ./deploy.sh
 set -euo pipefail
@@ -74,24 +74,10 @@ done
 chmod +x "$HERMES_HOME/bin/"*.sh "$HERMES_HOME/bin/"*.py 2>/dev/null || true
 log "bin/ -> $HERMES_HOME/bin/"
-# === Restart loops if requested ===
-if [ "${1:-}" = "--restart-loops" ]; then
-    log "Killing existing loops..."
-    pkill -f 'claude-loop.sh' 2>/dev/null || true
-    pkill -f 'gemini-loop.sh' 2>/dev/null || true
-    pkill -f 'timmy-orchestrator.sh' 2>/dev/null || true
-    sleep 2
-    log "Clearing stale locks..."
-    rm -rf "$HERMES_HOME/logs/claude-locks/"* 2>/dev/null || true
-    rm -rf "$HERMES_HOME/logs/gemini-locks/"* 2>/dev/null || true
-    log "Relaunching loops..."
-    nohup bash "$HERMES_HOME/bin/timmy-orchestrator.sh" >> "$HERMES_HOME/logs/timmy-orchestrator.log" 2>&1 &
-    nohup bash "$HERMES_HOME/bin/claude-loop.sh" 2 >> "$HERMES_HOME/logs/claude-loop.log" 2>&1 &
-    nohup bash "$HERMES_HOME/bin/gemini-loop.sh" 1 >> "$HERMES_HOME/logs/gemini-loop.log" 2>&1 &
-    sleep 1
-    log "Loops relaunched."
+if [ "${1:-}" != "" ]; then
+    echo "ERROR: deploy.sh no longer accepts legacy loop flags." >&2
+    echo "Deploy the sidecar only. Do not relaunch deprecated bash loops." >&2
+    exit 1
 fi
 log "Deploy complete. timmy-config applied to $HERMES_HOME/"

@@ -26,6 +26,13 @@ NET_LINE_LIMIT = 10
 HEARTBEAT_MODEL = "hermes4:14b"
 FALLBACK_MODEL = "hermes3:8b"
+LOCAL_PROVIDER_BASE_URL = "http://localhost:8081/v1"
+LOCAL_PROVIDER_MODEL = HEARTBEAT_MODEL
+def newest_file(directory, pattern):
+    files = sorted(directory.glob(pattern))
+    return files[-1] if files else None
 def hermes_local(prompt, model=None, caller_tag=None, toolsets=None):
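
`newest_file` returns the lexicographically last match, which is the most recent file only when the file names embed a sortable timestamp. If that does not hold for `session_*.json`, ordering by modification time is one alternative. The sketch below is an illustration under that assumption; the name `newest_file_by_mtime` is hypothetical and does not appear in this commit.

```python
from pathlib import Path
from typing import Optional


def newest_file_by_mtime(directory: Path, pattern: str) -> Optional[Path]:
    """Return the matching regular file with the latest mtime, or None.

    Hypothetical variant of newest_file(): it ignores file names entirely
    and compares st_mtime, so naming schemes do not affect the result.
    """
    files = [path for path in directory.glob(pattern) if path.is_file()]
    return max(files, key=lambda path: path.stat().st_mtime, default=None)
```

The trade-off is that modification times can change when files are copied or touched, so name-based ordering may still be preferable where names are reliably chronological.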
@@ -322,26 +329,32 @@ def session_export():
 @huey.periodic_task(crontab(minute="*/5")) # every 5 minutes
 def model_health():
-    """Check Ollama is running, a model is loaded, inference responds."""
+    """Check the active local inference surface and export freshness."""
     checks = {}
+    models_url = f"{LOCAL_PROVIDER_BASE_URL}/models"
+    chat_url = f"{LOCAL_PROVIDER_BASE_URL}/chat/completions"
-    # 1. Is Ollama process running?
+    checks["provider"] = "local-llama.cpp"
+    checks["provider_base_url"] = LOCAL_PROVIDER_BASE_URL
+    checks["provider_model"] = LOCAL_PROVIDER_MODEL
+    # 1. Is the local inference process running?
     try:
         result = subprocess.run(
-            ["pgrep", "-f", "ollama"],
+            ["pgrep", "-f", "llama-server|ollama"],
             capture_output=True, timeout=5
         )
-        checks["ollama_running"] = result.returncode == 0
+        checks["local_inference_running"] = result.returncode == 0
     except Exception:
-        checks["ollama_running"] = False
+        checks["local_inference_running"] = False
-    # 2. Can we hit the API?
+    # 2. Can we hit the configured API?
     try:
         import urllib.request
-        req = urllib.request.Request("http://localhost:11434/api/tags")
+        req = urllib.request.Request(models_url)
         with urllib.request.urlopen(req, timeout=5) as resp:
             data = json.loads(resp.read())
-        models = [m["name"] for m in data.get("models", [])]
+        models = [m.get("id", "?") for m in data.get("data", [])]
         checks["models_loaded"] = models
         checks["api_responding"] = True
     except Exception as e:
@@ -352,13 +365,13 @@ def model_health():
     if checks.get("api_responding"):
         try:
             payload = json.dumps({
-                "model": "hermes3:8b",
+                "model": LOCAL_PROVIDER_MODEL,
                 "messages": [{"role": "user", "content": "ping"}],
                 "max_tokens": 5,
                 "stream": False,
             }).encode()
             req = urllib.request.Request(
-                "http://localhost:11434/v1/chat/completions",
+                chat_url,
                 data=payload,
                 headers={"Content-Type": "application/json"},
             )
@@ -368,6 +381,26 @@ def model_health():
             checks["inference_ok"] = False
             checks["inference_error"] = str(e)
+    # 4. Is session export keeping up with new Hermes sessions?
+    sessions_dir = HERMES_HOME / "sessions"
+    export_dir = TIMMY_HOME / "training-data" / "dpo-pairs"
+    latest_session = newest_file(sessions_dir, "session_*.json")
+    latest_export = newest_file(export_dir, "session_*.json")
+    checks["latest_session"] = latest_session.name if latest_session else None
+    checks["latest_export"] = latest_export.name if latest_export else None
+    if latest_session and latest_export:
+        session_mtime = latest_session.stat().st_mtime
+        export_mtime = latest_export.stat().st_mtime
+        lag_minutes = max(0, int((session_mtime - export_mtime) // 60))
+        checks["export_lag_minutes"] = lag_minutes
+        checks["export_fresh"] = lag_minutes <= 300
+    elif latest_session and not latest_export:
+        checks["export_lag_minutes"] = None
+        checks["export_fresh"] = False
+    else:
+        checks["export_lag_minutes"] = 0
+        checks["export_fresh"] = True
     # Write health status to a file for other tools to read
     health_file = HERMES_HOME / "model_health.json"
     checks["timestamp"] = datetime.now(timezone.utc).isoformat()
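
Since the health status is written to `model_health.json` for other tools to read, a consumer might gate on it roughly as follows. This is a sketch under assumptions: the file is taken to be a single JSON object holding the `checks` dict, and the function names `evaluate_checks` and `read_health`, along with the rule that all three flags must be true, are illustrative rather than part of this commit.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple

# Keys set by model_health() in the diff above; treated as required here.
REQUIRED_FLAGS = ("api_responding", "inference_ok", "export_fresh")


def evaluate_checks(checks: Dict[str, Any]) -> Tuple[bool, List[str]]:
    """Return (healthy, failing_keys); a missing key counts as failing."""
    failing = [key for key in REQUIRED_FLAGS if checks.get(key) is not True]
    return (not failing, failing)


def read_health(health_file: Path) -> Tuple[bool, List[str]]:
    """Load a model_health.json-style file and evaluate it.

    An unreadable, malformed, or non-object file is reported as unhealthy.
    """
    try:
        checks = json.loads(health_file.read_text())
    except (OSError, ValueError) as exc:
        return (False, [f"cannot read health file: {exc}"])
    if not isinstance(checks, dict):
        return (False, ["health file is not a JSON object"])
    return evaluate_checks(checks)
```

Treating a missing key as a failure keeps the reader conservative: `inference_ok` is only set when the API responded, so an absent value should not be read as healthy.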