Tighten Hermes cutover and export checks

2026-03-27 17:35:07 -04:00
parent 2384908be7
commit d72ae92189
5 changed files with 106 additions and 39 deletions
--- a/DEPRECATED.md
+++ b/DEPRECATED.md
@@ -1,7 +1,7 @@
 # DEPRECATED — Bash Loop Scripts Removed
 **Date:** 2026-03-25
-**Reason:** Replaced by sovereign-orchestration (SQLite + Python single-process executor)
+**Reason:** Replaced by Hermes + timmy-config sidecar orchestration
 ## What was removed
 - claude-loop.sh, gemini-loop.sh, agent-loop.sh
@@ -9,14 +9,15 @@
 - nexus-merge-bot.sh, claudemax-watchdog.sh, timmy-loopstat.sh
 ## What replaces them
-**Repo:** Timmy_Foundation/sovereign-orchestration
+**Harness:** Hermes
-**Entry point:** `python3 src/sovereign_executor.py --workers 3 --poll 30`
+**Overlay repo:** Timmy_Foundation/timmy-config
-**Features:** SQLite task queue, crash recovery, dedup, playbooks, MCP server
+**Entry points:** `orchestration.py`, `tasks.py`, `deploy.sh`
-**Issues:** #29 (fix imports), #30 (deploy as service)
+**Features:** Huey + SQLite scheduling, local-model health checks, session export, DPO artifact staging
 ## Why
 The bash loops crash-looped, produced zero work after relaunch, had no crash
-recovery, no dedup, and required 8 separate scripts. The Python executor is
+recovery, no durable export path, and required too many ad hoc scripts. The
-one process with SQLite durability.
+Hermes sidecar keeps orchestration close to Timmy's actual config and training
 surfaces.
-Do NOT recreate bash loops. If the executor is broken, fix the executor.
+Do NOT recreate bash loops. If orchestration is broken, fix the Hermes sidecar.
--- a/README.md
+++ b/README.md
@@ -14,11 +14,12 @@ timmy-config/
 ├── DEPRECATED.md              ← What was removed and why
 ├── config.yaml                ← Hermes harness configuration
 ├── channel_directory.json     ← Platform channel mappings
-├── bin/                       ← Utility scripts (NOT loops — see below)
+├── bin/                       ← Live utility scripts (NOT deprecated loops)
 │   ├── hermes-startup.sh      ← Hermes boot sequence
 │   ├── agent-dispatch.sh      ← Manual agent dispatch
 │   ├── ops-panel.sh           ← Ops dashboard panel
 │   ├── ops-gitea.sh           ← Gitea ops helpers
 │   ├── pipeline-freshness.sh  ← Session/export drift check
 │   └── timmy-status.sh        ← Status check
 ├── memories/                  ← Persistent memory YAML
 ├── skins/                     ← UI skins (timmy skin)
@@ -39,10 +40,14 @@ If a file answers "who is Timmy?" or "how does Hermes host him?", it belongs
 here. If it answers "what has Timmy done or learned?" it belongs in
 `timmy-home`.
+The scripts in `bin/` are live operational helpers for the Hermes sidecar.
+What is dead are the old long-running bash worker loops, not every script in
+this repo.
 ## Orchestration: Huey
 All orchestration (triage, PR review, dispatch) runs via [Huey](https://github.com/coleifer/huey) with SQLite.
-`orchestration.py` (6 lines) + `tasks.py` (~70 lines) replace the entire sovereign-orchestration repo (3,846 lines).
+`orchestration.py` + `tasks.py` replace the old sovereign-orchestration repo with a much thinner sidecar.
 ```bash
 pip install huey
--- a/bin/pipeline-freshness.sh
+++ b/bin/pipeline-freshness.sh
@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+set -euo pipefail
+SESSIONS_DIR="$HOME/.hermes/sessions"
+EXPORT_DIR="$HOME/.timmy/training-data/dpo-pairs"
+latest_session=$(find "$SESSIONS_DIR" -maxdepth 1 -name 'session_*.json' -type f -print 2>/dev/null | sort | tail -n 1)
+latest_export=$(find "$EXPORT_DIR" -maxdepth 1 -name 'session_*.json' -type f -print 2>/dev/null | sort | tail -n 1)
+echo "latest_session=${latest_session:-none}"
+echo "latest_export=${latest_export:-none}"
+if [ -z "${latest_session:-}" ]; then
+  echo "status=ok"
+  echo "reason=no sessions yet"
+  exit 0
+fi
+if [ -z "${latest_export:-}" ]; then
+  echo "status=lagging"
+  echo "reason=no exports yet"
+  exit 1
+fi
+session_mtime=$(stat -f '%m' "$latest_session")
+export_mtime=$(stat -f '%m' "$latest_export")
+lag_minutes=$(( (session_mtime - export_mtime) / 60 ))
+if [ "$lag_minutes" -lt 0 ]; then
+  lag_minutes=0
+fi
+echo "lag_minutes=$lag_minutes"
+if [ "$lag_minutes" -gt 300 ]; then
+  echo "status=lagging"
+  echo "reason=exports more than 5 hours behind sessions"
+  exit 1
+fi
+echo "status=ok"
+echo "reason=exports within freshness window"
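Review note: the new script reads modification times with `stat -f '%m'`, which is the BSD/macOS form (GNU coreutils spells it `stat -c '%Y'`). As a rough, platform-neutral illustration of the same three-branch check, here is a minimal Python sketch. The names `newest`, `freshness`, and `FRESHNESS_WINDOW_MINUTES` are hypothetical and not part of this commit; only the 300-minute threshold and the status/reason strings are taken from the script.

```python
from pathlib import Path
from typing import Optional, Tuple

# Same threshold as the script: exports may trail sessions by up to 5 hours.
FRESHNESS_WINDOW_MINUTES = 300


def newest(directory: Path, pattern: str = "session_*.json") -> Optional[Path]:
    """Return the lexicographically last match, similar to `sort | tail -n 1`."""
    if not directory.is_dir():
        return None
    files = sorted(p for p in directory.glob(pattern) if p.is_file())
    return files[-1] if files else None


def freshness(
    latest_session: Optional[Path], latest_export: Optional[Path]
) -> Tuple[str, str, Optional[int]]:
    """Return (status, reason, lag_minutes) following the script's branches."""
    if latest_session is None:
        return "ok", "no sessions yet", 0
    if latest_export is None:
        return "lagging", "no exports yet", None
    lag_seconds = latest_session.stat().st_mtime - latest_export.stat().st_mtime
    # Clamp to zero so an export newer than the session never reports negative lag.
    lag_minutes = max(0, int(lag_seconds // 60))
    if lag_minutes > FRESHNESS_WINDOW_MINUTES:
        return "lagging", "exports more than 5 hours behind sessions", lag_minutes
    return "ok", "exports within freshness window", lag_minutes
```

A caller could pass `newest(Path.home() / ".hermes" / "sessions")` and `newest(Path.home() / ".timmy" / "training-data" / "dpo-pairs")`, mirroring the `SESSIONS_DIR` and `EXPORT_DIR` values in the script, and map a `"lagging"` status to a non-zero exit code.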
--- a/deploy.sh
+++ b/deploy.sh
@@ -3,7 +3,7 @@
 # This is the canonical way to deploy Timmy's configuration.
 # Hermes-agent is the engine. timmy-config is the driver's seat.
 #
-# Usage: ./deploy.sh [--restart-loops]
+# Usage: ./deploy.sh
 set -euo pipefail
@@ -74,24 +74,10 @@ done
 chmod +x "$HERMES_HOME/bin/"*.sh "$HERMES_HOME/bin/"*.py 2>/dev/null || true
 log "bin/ -> $HERMES_HOME/bin/"
-# === Restart loops if requested ===
+if [ "${1:-}" != "" ]; then
-if [ "${1:-}" = "--restart-loops" ]; then
+  echo "ERROR: deploy.sh no longer accepts legacy loop flags." >&2
-  log "Killing existing loops..."
+  echo "Deploy the sidecar only. Do not relaunch deprecated bash loops." >&2
-  pkill -f 'claude-loop.sh' 2>/dev/null || true
+  exit 1
-  pkill -f 'gemini-loop.sh' 2>/dev/null || true
-  pkill -f 'timmy-orchestrator.sh' 2>/dev/null || true
-  sleep 2
-  log "Clearing stale locks..."
-  rm -rf "$HERMES_HOME/logs/claude-locks/"* 2>/dev/null || true
-  rm -rf "$HERMES_HOME/logs/gemini-locks/"* 2>/dev/null || true
-  log "Relaunching loops..."
-  nohup bash "$HERMES_HOME/bin/timmy-orchestrator.sh" >> "$HERMES_HOME/logs/timmy-orchestrator.log" 2>&1 &
-  nohup bash "$HERMES_HOME/bin/claude-loop.sh" 2 >> "$HERMES_HOME/logs/claude-loop.log" 2>&1 &
-  nohup bash "$HERMES_HOME/bin/gemini-loop.sh" 1 >> "$HERMES_HOME/logs/gemini-loop.log" 2>&1 &
-  sleep 1
-  log "Loops relaunched."
 fi
 log "Deploy complete. timmy-config applied to $HERMES_HOME/"
--- a/tasks.py
+++ b/tasks.py
@@ -26,6 +26,13 @@ NET_LINE_LIMIT = 10
 HEARTBEAT_MODEL = "hermes4:14b"
 FALLBACK_MODEL = "hermes3:8b"
+LOCAL_PROVIDER_BASE_URL = "http://localhost:8081/v1"
+LOCAL_PROVIDER_MODEL = HEARTBEAT_MODEL
+def newest_file(directory, pattern):
+    files = sorted(directory.glob(pattern))
+    return files[-1] if files else None
 def hermes_local(prompt, model=None, caller_tag=None, toolsets=None):
@@ -322,26 +329,32 @@ def session_export():
 @huey.periodic_task(crontab(minute="*/5"))  # every 5 minutes
 def model_health():
-    """Check Ollama is running, a model is loaded, inference responds."""
+    """Check the active local inference surface and export freshness."""
    checks = {}
+    models_url = f"{LOCAL_PROVIDER_BASE_URL}/models"
+    chat_url = f"{LOCAL_PROVIDER_BASE_URL}/chat/completions"
-    # 1. Is Ollama process running?
+    checks["provider"] = "local-llama.cpp"
+    checks["provider_base_url"] = LOCAL_PROVIDER_BASE_URL
+    checks["provider_model"] = LOCAL_PROVIDER_MODEL
+    # 1. Is the local inference process running?
    try:
        result = subprocess.run(
-            ["pgrep", "-f", "ollama"],
+            ["pgrep", "-f", "llama-server|ollama"],
            capture_output=True, timeout=5
        )
-        checks["ollama_running"] = result.returncode == 0
+        checks["local_inference_running"] = result.returncode == 0
    except Exception:
-        checks["ollama_running"] = False
+        checks["local_inference_running"] = False
-    # 2. Can we hit the API?
+    # 2. Can we hit the configured API?
    try:
        import urllib.request
-        req = urllib.request.Request("http://localhost:11434/api/tags")
+        req = urllib.request.Request(models_url)
        with urllib.request.urlopen(req, timeout=5) as resp:
            data = json.loads(resp.read())
-            models = [m["name"] for m in data.get("models", [])]
+            models = [m.get("id", "?") for m in data.get("data", [])]
            checks["models_loaded"] = models
            checks["api_responding"] = True
    except Exception as e:
@@ -352,13 +365,13 @@ def model_health():
    if checks.get("api_responding"):
        try:
            payload = json.dumps({
-                "model": "hermes3:8b",
+                "model": LOCAL_PROVIDER_MODEL,
                "messages": [{"role": "user", "content": "ping"}],
                "max_tokens": 5,
                "stream": False,
            }).encode()
            req = urllib.request.Request(
-                "http://localhost:11434/v1/chat/completions",
+                chat_url,
                data=payload,
                headers={"Content-Type": "application/json"},
            )
@@ -368,6 +381,26 @@ def model_health():
            checks["inference_ok"] = False
            checks["inference_error"] = str(e)
+    # 4. Is session export keeping up with new Hermes sessions?
+    sessions_dir = HERMES_HOME / "sessions"
+    export_dir = TIMMY_HOME / "training-data" / "dpo-pairs"
+    latest_session = newest_file(sessions_dir, "session_*.json")
+    latest_export = newest_file(export_dir, "session_*.json")
+    checks["latest_session"] = latest_session.name if latest_session else None
+    checks["latest_export"] = latest_export.name if latest_export else None
+    if latest_session and latest_export:
+        session_mtime = latest_session.stat().st_mtime
+        export_mtime = latest_export.stat().st_mtime
+        lag_minutes = max(0, int((session_mtime - export_mtime) // 60))
+        checks["export_lag_minutes"] = lag_minutes
+        checks["export_fresh"] = lag_minutes <= 300
+    elif latest_session and not latest_export:
+        checks["export_lag_minutes"] = None
+        checks["export_fresh"] = False
+    else:
+        checks["export_lag_minutes"] = 0
+        checks["export_fresh"] = True
    # Write health status to a file for other tools to read
    health_file = HERMES_HOME / "model_health.json"
    checks["timestamp"] = datetime.now(timezone.utc).isoformat()
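
Review note: the `model_health` hunks above change the model listing from the old `/api/tags` shape (`models[].name`) to the `/models` shape under `LOCAL_PROVIDER_BASE_URL` (`data[].id`). If a fallback to the old shape were ever wanted, a tolerant parser might look like the sketch below. `extract_model_ids` is a hypothetical helper, not something this commit adds, and the two payload shapes are inferred only from the removed and added lines of this diff.

```python
from typing import Any, Dict, List


def extract_model_ids(payload: Dict[str, Any]) -> List[str]:
    """Collect model identifiers from either listing shape seen in the diff."""
    # Shape read by the new code: {"data": [{"id": "..."}]}
    data = payload.get("data")
    if isinstance(data, list):
        return [m.get("id", "?") for m in data if isinstance(m, dict)]
    # Shape read by the removed code: {"models": [{"name": "..."}]}
    models = payload.get("models")
    if isinstance(models, list):
        return [m.get("name", "?") for m in models if isinstance(m, dict)]
    # Unknown shape: report no models rather than raising.
    return []
```

The committed code reads only the `data[].id` shape, which keeps the health check tied to the single configured provider; the sketch simply shows how both shapes differ.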