Tighten Hermes cutover and export checks
```diff
@@ -1,7 +1,7 @@
 # DEPRECATED — Bash Loop Scripts Removed
 
 **Date:** 2026-03-25
-**Reason:** Replaced by sovereign-orchestration (SQLite + Python single-process executor)
+**Reason:** Replaced by Hermes + timmy-config sidecar orchestration
 
 ## What was removed
 - claude-loop.sh, gemini-loop.sh, agent-loop.sh
@@ -9,14 +9,15 @@
 - nexus-merge-bot.sh, claudemax-watchdog.sh, timmy-loopstat.sh
 
 ## What replaces them
-**Repo:** Timmy_Foundation/sovereign-orchestration
-**Entry point:** `python3 src/sovereign_executor.py --workers 3 --poll 30`
-**Features:** SQLite task queue, crash recovery, dedup, playbooks, MCP server
-**Issues:** #29 (fix imports), #30 (deploy as service)
+**Harness:** Hermes
+**Overlay repo:** Timmy_Foundation/timmy-config
+**Entry points:** `orchestration.py`, `tasks.py`, `deploy.sh`
+**Features:** Huey + SQLite scheduling, local-model health checks, session export, DPO artifact staging
 
 ## Why
 The bash loops crash-looped, produced zero work after relaunch, had no crash
-recovery, no dedup, and required 8 separate scripts. The Python executor is
-one process with SQLite durability.
+recovery, no durable export path, and required too many ad hoc scripts. The
+Hermes sidecar keeps orchestration close to Timmy's actual config and training
+surfaces.
 
-Do NOT recreate bash loops. If the executor is broken, fix the executor.
+Do NOT recreate bash loops. If orchestration is broken, fix the Hermes sidecar.
```
```diff
@@ -14,11 +14,12 @@ timmy-config/
 ├── DEPRECATED.md ← What was removed and why
 ├── config.yaml ← Hermes harness configuration
 ├── channel_directory.json ← Platform channel mappings
-├── bin/ ← Utility scripts (NOT loops — see below)
+├── bin/ ← Live utility scripts (NOT deprecated loops)
 │   ├── hermes-startup.sh ← Hermes boot sequence
 │   ├── agent-dispatch.sh ← Manual agent dispatch
 │   ├── ops-panel.sh ← Ops dashboard panel
 │   ├── ops-gitea.sh ← Gitea ops helpers
+│   ├── pipeline-freshness.sh ← Session/export drift check
 │   └── timmy-status.sh ← Status check
 ├── memories/ ← Persistent memory YAML
 ├── skins/ ← UI skins (timmy skin)
@@ -39,10 +40,14 @@ If a file answers "who is Timmy?" or "how does Hermes host him?", it belongs
 here. If it answers "what has Timmy done or learned?" it belongs in
 `timmy-home`.
 
+The scripts in `bin/` are live operational helpers for the Hermes sidecar.
+What is dead are the old long-running bash worker loops, not every script in
+this repo.
+
 ## Orchestration: Huey
 
 All orchestration (triage, PR review, dispatch) runs via [Huey](https://github.com/coleifer/huey) with SQLite.
-`orchestration.py` (6 lines) + `tasks.py` (~70 lines) replace the entire sovereign-orchestration repo (3,846 lines).
+`orchestration.py` + `tasks.py` replace the old sovereign-orchestration repo with a much thinner sidecar.
 
 ```bash
 pip install huey
```
bin/pipeline-freshness.sh (executable file, 42 changed lines)
```diff
@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+SESSIONS_DIR="$HOME/.hermes/sessions"
+EXPORT_DIR="$HOME/.timmy/training-data/dpo-pairs"
+
+latest_session=$(find "$SESSIONS_DIR" -maxdepth 1 -name 'session_*.json' -type f -print 2>/dev/null | sort | tail -n 1)
+latest_export=$(find "$EXPORT_DIR" -maxdepth 1 -name 'session_*.json' -type f -print 2>/dev/null | sort | tail -n 1)
+
+echo "latest_session=${latest_session:-none}"
+echo "latest_export=${latest_export:-none}"
+
+if [ -z "${latest_session:-}" ]; then
+  echo "status=ok"
+  echo "reason=no sessions yet"
+  exit 0
+fi
+
+if [ -z "${latest_export:-}" ]; then
+  echo "status=lagging"
+  echo "reason=no exports yet"
+  exit 1
+fi
+
+session_mtime=$(stat -f '%m' "$latest_session")
+export_mtime=$(stat -f '%m' "$latest_export")
+lag_minutes=$(( (session_mtime - export_mtime) / 60 ))
+if [ "$lag_minutes" -lt 0 ]; then
+  lag_minutes=0
+fi
+
+echo "lag_minutes=$lag_minutes"
+
+if [ "$lag_minutes" -gt 300 ]; then
+  echo "status=lagging"
+  echo "reason=exports more than 5 hours behind sessions"
+  exit 1
+fi
+
+echo "status=ok"
+echo "reason=exports within freshness window"
```
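The script as committed has three portability caveats. First, `stat -f '%m'` is the BSD/macOS spelling; GNU coreutils uses `stat -c '%Y'`. Second, `sort | tail -n 1` picks the lexically last file name, which is the newest file only if session names sort chronologically. Third, under `set -euo pipefail` a missing sessions or export directory makes `find` exit non-zero, so the assignment aborts the script before any `status=` line is printed. The following is a minimal Python sketch of the same check. It assumes the same directories, the same `session_*.json` pattern, and the same 300-minute threshold, but picks the newest file by modification time and treats a missing directory as empty. `newest_by_mtime`, `classify_export_lag`, and `check_freshness` are hypothetical names, not part of this commit.

```python
from pathlib import Path
from typing import Optional, Tuple

# Threshold mirrored from bin/pipeline-freshness.sh: more than 5 hours behind is "lagging".
MAX_LAG_MINUTES = 300


def newest_by_mtime(directory: Path, pattern: str = "session_*.json") -> Optional[Path]:
    """Return the most recently modified regular file matching pattern, or None.

    Path.glob yields nothing for a missing directory, so a missing
    directory is treated the same as an empty one.
    """
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    # Compare modification times rather than names (unlike `sort | tail -n 1`).
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def classify_export_lag(session_mtime: Optional[float],
                        export_mtime: Optional[float]) -> Tuple[str, Optional[int]]:
    """Apply the script's rules to two mtimes given in seconds since the epoch."""
    if session_mtime is None:
        return "ok", 0            # no sessions yet
    if export_mtime is None:
        return "lagging", None    # sessions exist but nothing has been exported
    # A negative lag (export newer than session) is clamped to 0, as in the script.
    lag_minutes = max(0, int((session_mtime - export_mtime) // 60))
    status = "lagging" if lag_minutes > MAX_LAG_MINUTES else "ok"
    return status, lag_minutes


def check_freshness(sessions_dir: Path, export_dir: Path) -> Tuple[str, Optional[int]]:
    """Find the newest session and export files and classify the lag between them."""
    latest_session = newest_by_mtime(sessions_dir)
    latest_export = newest_by_mtime(export_dir)
    return classify_export_lag(
        latest_session.stat().st_mtime if latest_session else None,
        latest_export.stat().st_mtime if latest_export else None,
    )
```

For example, `check_freshness(Path.home() / ".hermes" / "sessions", Path.home() / ".timmy" / "training-data" / "dpo-pairs")` returns a `(status, lag_minutes)` pair that follows the script's rules. The one difference is that `lag_minutes` is `None` when no export exists yet.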
deploy.sh (24 changed lines)
```diff
@@ -3,7 +3,7 @@
 # This is the canonical way to deploy Timmy's configuration.
 # Hermes-agent is the engine. timmy-config is the driver's seat.
 #
-# Usage: ./deploy.sh [--restart-loops]
+# Usage: ./deploy.sh
 
 set -euo pipefail
 
@@ -74,24 +74,10 @@ done
 chmod +x "$HERMES_HOME/bin/"*.sh "$HERMES_HOME/bin/"*.py 2>/dev/null || true
 log "bin/ -> $HERMES_HOME/bin/"
 
-# === Restart loops if requested ===
-if [ "${1:-}" = "--restart-loops" ]; then
-    log "Killing existing loops..."
-    pkill -f 'claude-loop.sh' 2>/dev/null || true
-    pkill -f 'gemini-loop.sh' 2>/dev/null || true
-    pkill -f 'timmy-orchestrator.sh' 2>/dev/null || true
-    sleep 2
-
-    log "Clearing stale locks..."
-    rm -rf "$HERMES_HOME/logs/claude-locks/"* 2>/dev/null || true
-    rm -rf "$HERMES_HOME/logs/gemini-locks/"* 2>/dev/null || true
-
-    log "Relaunching loops..."
-    nohup bash "$HERMES_HOME/bin/timmy-orchestrator.sh" >> "$HERMES_HOME/logs/timmy-orchestrator.log" 2>&1 &
-    nohup bash "$HERMES_HOME/bin/claude-loop.sh" 2 >> "$HERMES_HOME/logs/claude-loop.log" 2>&1 &
-    nohup bash "$HERMES_HOME/bin/gemini-loop.sh" 1 >> "$HERMES_HOME/logs/gemini-loop.log" 2>&1 &
-    sleep 1
-    log "Loops relaunched."
+if [ "${1:-}" != "" ]; then
+    echo "ERROR: deploy.sh no longer accepts legacy loop flags." >&2
+    echo "Deploy the sidecar only. Do not relaunch deprecated bash loops." >&2
+    exit 1
 fi
 
 log "Deploy complete. timmy-config applied to $HERMES_HOME/"
```
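`deploy.sh` now rejects every argument and no longer runs `pkill`. As a result, a loop started before the cutover keeps running until someone stops it by hand. A hypothetical post-deploy check (not part of this commit) could look for such leftovers. The sketch below assumes `pgrep` is on `PATH`. It uses only the script names visible in this diff, and DEPRECATED.md's list continues on a line the diff does not show, so the list may be incomplete. The PID lookup is injectable so the logic can be exercised without live processes.

```python
import subprocess
from typing import Callable, Dict, List

# Script names visible in DEPRECATED.md and in the loop block removed from deploy.sh.
LEGACY_LOOP_SCRIPTS = (
    "claude-loop.sh",
    "gemini-loop.sh",
    "agent-loop.sh",
    "timmy-orchestrator.sh",
    "nexus-merge-bot.sh",
    "claudemax-watchdog.sh",
    "timmy-loopstat.sh",
)


def pgrep_pattern(script: str) -> str:
    """Escape the dots so pgrep's regex matches the literal file name."""
    return script.replace(".", r"\.")


def pgrep_pids(script: str) -> List[int]:
    """PIDs whose full command line matches the script name.

    pgrep prints one PID per line and exits 1 when nothing matches, so an
    empty list means no match (or a pgrep error). Raises FileNotFoundError
    if pgrep is not installed.
    """
    result = subprocess.run(
        ["pgrep", "-f", pgrep_pattern(script)],
        capture_output=True, text=True, check=False,
    )
    return [int(pid) for pid in result.stdout.split()]


def running_legacy_loops(
    find_pids: Callable[[str], List[int]] = pgrep_pids,
) -> Dict[str, List[int]]:
    """Map each deprecated loop script that still has live processes to its PIDs."""
    found: Dict[str, List[int]] = {}
    for script in LEGACY_LOOP_SCRIPTS:
        pids = find_pids(script)
        if pids:
            found[script] = pids
    return found
```

An empty dict from `running_legacy_loops()` means none of the listed scripts appears in any process command line.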
tasks.py (53 changed lines)
```diff
@@ -26,6 +26,13 @@ NET_LINE_LIMIT = 10
 
 HEARTBEAT_MODEL = "hermes4:14b"
 FALLBACK_MODEL = "hermes3:8b"
+LOCAL_PROVIDER_BASE_URL = "http://localhost:8081/v1"
+LOCAL_PROVIDER_MODEL = HEARTBEAT_MODEL
+
+
+def newest_file(directory, pattern):
+    files = sorted(directory.glob(pattern))
+    return files[-1] if files else None
 
 
 def hermes_local(prompt, model=None, caller_tag=None, toolsets=None):
@@ -322,26 +329,32 @@ def session_export():
 
 @huey.periodic_task(crontab(minute="*/5")) # every 5 minutes
 def model_health():
-    """Check Ollama is running, a model is loaded, inference responds."""
+    """Check the active local inference surface and export freshness."""
     checks = {}
+    models_url = f"{LOCAL_PROVIDER_BASE_URL}/models"
+    chat_url = f"{LOCAL_PROVIDER_BASE_URL}/chat/completions"
 
-    # 1. Is Ollama process running?
+    checks["provider"] = "local-llama.cpp"
+    checks["provider_base_url"] = LOCAL_PROVIDER_BASE_URL
+    checks["provider_model"] = LOCAL_PROVIDER_MODEL
+
+    # 1. Is the local inference process running?
     try:
         result = subprocess.run(
-            ["pgrep", "-f", "ollama"],
+            ["pgrep", "-f", "llama-server|ollama"],
             capture_output=True, timeout=5
         )
-        checks["ollama_running"] = result.returncode == 0
+        checks["local_inference_running"] = result.returncode == 0
     except Exception:
-        checks["ollama_running"] = False
+        checks["local_inference_running"] = False
 
-    # 2. Can we hit the API?
+    # 2. Can we hit the configured API?
     try:
         import urllib.request
-        req = urllib.request.Request("http://localhost:11434/api/tags")
+        req = urllib.request.Request(models_url)
         with urllib.request.urlopen(req, timeout=5) as resp:
             data = json.loads(resp.read())
-            models = [m["name"] for m in data.get("models", [])]
+            models = [m.get("id", "?") for m in data.get("data", [])]
             checks["models_loaded"] = models
             checks["api_responding"] = True
     except Exception as e:
@@ -352,13 +365,13 @@ def model_health():
     if checks.get("api_responding"):
         try:
             payload = json.dumps({
-                "model": "hermes3:8b",
+                "model": LOCAL_PROVIDER_MODEL,
                 "messages": [{"role": "user", "content": "ping"}],
                 "max_tokens": 5,
                 "stream": False,
             }).encode()
             req = urllib.request.Request(
-                "http://localhost:11434/v1/chat/completions",
+                chat_url,
                 data=payload,
                 headers={"Content-Type": "application/json"},
             )
@@ -368,6 +381,26 @@ def model_health():
             checks["inference_ok"] = False
             checks["inference_error"] = str(e)
 
+    # 4. Is session export keeping up with new Hermes sessions?
+    sessions_dir = HERMES_HOME / "sessions"
+    export_dir = TIMMY_HOME / "training-data" / "dpo-pairs"
+    latest_session = newest_file(sessions_dir, "session_*.json")
+    latest_export = newest_file(export_dir, "session_*.json")
+    checks["latest_session"] = latest_session.name if latest_session else None
+    checks["latest_export"] = latest_export.name if latest_export else None
+    if latest_session and latest_export:
+        session_mtime = latest_session.stat().st_mtime
+        export_mtime = latest_export.stat().st_mtime
+        lag_minutes = max(0, int((session_mtime - export_mtime) // 60))
+        checks["export_lag_minutes"] = lag_minutes
+        checks["export_fresh"] = lag_minutes <= 300
+    elif latest_session and not latest_export:
+        checks["export_lag_minutes"] = None
+        checks["export_fresh"] = False
+    else:
+        checks["export_lag_minutes"] = 0
+        checks["export_fresh"] = True
+
     # Write health status to a file for other tools to read
     health_file = HERMES_HOME / "model_health.json"
     checks["timestamp"] = datetime.now(timezone.utc).isoformat()
```
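The health check now talks to an OpenAI-compatible server on port 8081: the `local-llama.cpp` provider, where llama.cpp's `llama-server` serves `/v1/models` and `/v1/chat/completions`. The model list therefore comes from `data[].id` rather than Ollama's `/api/tags` `models[].name`. The `pgrep` pattern `llama-server|ollama` still matches an Ollama process, so `local_inference_running` can be true while nothing listens on the configured URL. The check that actually exercises `LOCAL_PROVIDER_BASE_URL` is `api_responding`. The following small sketch assumes the listing parser should tolerate either shape in case the provider is switched back. `endpoint_urls`, `parse_model_ids`, and `ping_body` are hypothetical helpers, not functions in `tasks.py`.

```python
import json
from typing import Any, Dict, List

# Base URL copied from LOCAL_PROVIDER_BASE_URL in tasks.py.
LOCAL_PROVIDER_BASE_URL = "http://localhost:8081/v1"


def endpoint_urls(base_url: str = LOCAL_PROVIDER_BASE_URL) -> Dict[str, str]:
    """Derive the two endpoints model_health() probes from one base URL."""
    base = base_url.rstrip("/")
    return {"models": f"{base}/models", "chat": f"{base}/chat/completions"}


def parse_model_ids(listing: Dict[str, Any]) -> List[str]:
    """Read model names from either listing format.

    OpenAI-compatible /v1/models (llama-server): {"data": [{"id": ...}, ...]}
    Ollama /api/tags:                            {"models": [{"name": ...}, ...]}
    """
    if "data" in listing:
        return [m.get("id", "?") for m in listing["data"]]
    return [m.get("name", "?") for m in listing.get("models", [])]


def ping_body(model: str) -> bytes:
    """The minimal non-streaming chat request model_health() sends as an inference check."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 5,
        "stream": False,
    }).encode()
```

Keeping the URL derivation in one place mirrors the commit's move from two hard-coded `localhost:11434` URLs to a single `LOCAL_PROVIDER_BASE_URL`.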