config: update bin/timmy-dashboard,config.yaml,docs/local-model-integration-sketch.md,tasks.py

This commit is contained in:
Alexander Whitestone
2026-03-26 17:00:22 -04:00
parent 7468a6d063
commit 49020b34d9
4 changed files with 811 additions and 15 deletions

@@ -0,0 +1,438 @@
# Local Model Integration Sketch v2
# Hermes4-14B in the Heartbeat Loop — No New Telemetry
## Principle
No new inference layer. Huey tasks call `hermes chat -q` pointed at
Ollama. Hermes handles sessions, token tracking, cost logging.
The dashboard reads what Hermes already stores.

---
## Why Not Ollama Directly?
Ollama is fine as a serving backend. The issue isn't Ollama; it's that
calling Ollama directly with urllib bypasses the harness. The harness
already tracks sessions, tokens, model/provider, and platform. Building a
second telemetry layer means owning code we don't need.

Ollama as a named provider isn't wired into the `--provider` flag yet,
but routing works via env vars:

    HERMES_MODEL="hermes4:14b" \
    HERMES_PROVIDER="custom" \
    HERMES_BASE_URL="http://localhost:11434/v1" \
    hermes chat -q "prompt here" -Q

This creates a tracked session, logs tokens, and returns the response.
That's our local inference call.
### Alternatives to Ollama for serving:

- **llama.cpp server**: lighter, no Python, raw HTTP. Good for single
  model serving. Less convenient for model switching.
- **vLLM**: best throughput, but needs NVIDIA GPU. Not for M3 Mac.
- **MLX serving**: native Apple Silicon, but no OpenAI-compat API yet.
  MLX is for training, not serving (our current policy).
- **llamafile**: single binary, portable. Good for distribution.

Verdict: Ollama is fine. It's the standard OpenAI-compat local server
on Mac. The issue was never Ollama; it was bypassing the harness.

---
## 1. The Call Pattern
One function in tasks.py that all Huey tasks use:
```python
import json
import os
import subprocess

HERMES_BIN = "hermes"
LOCAL_ENV = {
    "HERMES_MODEL": "hermes4:14b",
    "HERMES_PROVIDER": "custom",
    "HERMES_BASE_URL": "http://localhost:11434/v1",
}


def hermes_local(prompt, caller_tag=None, max_retries=2):
    """Call hermes with local Ollama model. Returns response text.

    Every call creates a hermes session with full telemetry.
    caller_tag gets prepended to prompt for searchability.
    Returns None if every attempt fails or times out.
    """
    env = os.environ.copy()
    env.update(LOCAL_ENV)

    tagged_prompt = prompt
    if caller_tag:
        tagged_prompt = f"[{caller_tag}] {prompt}"

    for attempt in range(max_retries + 1):
        try:
            result = subprocess.run(
                [HERMES_BIN, "chat", "-q", tagged_prompt, "-Q", "-t", "none"],
                capture_output=True, text=True,
                timeout=120, env=env,
            )
            if result.returncode == 0 and result.stdout.strip():
                # Strip the session_id line from -Q output
                lines = result.stdout.strip().split("\n")
                response_lines = [l for l in lines if not l.startswith("session_id:")]
                return "\n".join(response_lines).strip()
        except subprocess.TimeoutExpired:
            if attempt == max_retries:
                return None
            continue
    return None
```
Notes:
- `-t none` disables all toolsets — the heartbeat model shouldn't
have terminal/file access. Pure reasoning only.
- `-Q` quiet mode suppresses banner/spinner, gives clean output.
- Every call creates a session in Hermes session store. Searchable,
exportable, countable.
- The `[caller_tag]` prefix lets you filter sessions by which Huey
task generated them: `hermes sessions list | grep heartbeat`
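
The wrapper above discards the `session_id:` line. If a tick record should point back at its Hermes session, the same parsing step could keep the ID instead. A minimal sketch, assuming the `-Q` output carries one line of the form `session_id: <value>` (the exact format is an assumption here, and `split_quiet_output` is a hypothetical helper name):

```python
def split_quiet_output(stdout):
    """Split `hermes chat -Q` output into (session_id, response_text).

    Assumes the session ID arrives on its own line, prefixed with
    "session_id:". Returns None for the ID when no such line exists.
    """
    session_id = None
    response_lines = []
    for line in stdout.strip().split("\n"):
        if line.startswith("session_id:"):
            # Keep the value after the prefix instead of dropping the line
            session_id = line[len("session_id:"):].strip() or None
        else:
            response_lines.append(line)
    return session_id, "\n".join(response_lines).strip()
```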
---
## 2. Heartbeat DECIDE Phase
Replace the hardcoded if/else with a model call:
```python
# In heartbeat_tick(), replace the DECIDE + ACT section:

# DECIDE: let hermes4:14b reason about what to do
decide_prompt = f"""System state at {now.isoformat()}:
{json.dumps(perception, indent=2)}
Previous tick: {last_tick.get('tick_id', 'none')}
You are the heartbeat monitor. Based on this state:
1. List any actions needed (alerts, restarts, escalations). Empty if all OK.
2. Rate severity: ok, warning, or critical.
3. One sentence of reasoning.
Respond ONLY with JSON:
{{"actions": [], "severity": "ok", "reasoning": "..."}}"""

decision = None
try:
    raw = hermes_local(decide_prompt, caller_tag="heartbeat_tick")
    if raw:
        # The model might wrap the JSON in markdown or spread it over
        # several lines, so parse from the first "{" to the last "}".
        start, end = raw.find("{"), raw.rfind("}")
        if start != -1 and end > start:
            decision = json.loads(raw[start:end + 1])
except Exception:
    # Covers json.JSONDecodeError and any failure inside hermes_local()
    decision = None

# Fallback to hardcoded logic if model fails or is down
if decision is None:
    actions = []
    if not perception.get("gitea_alive"):
        actions.append("ALERT: Gitea unreachable")
    health = perception.get("model_health", {})
    if isinstance(health, dict) and not health.get("ollama_running"):
        actions.append("ALERT: Ollama not running")
    decision = {
        "actions": actions,
        "severity": "fallback",
        "reasoning": "model unavailable, used hardcoded checks"
    }

tick_record["decision"] = decision
actions = decision.get("actions", [])
```
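
Parsed JSON is not necessarily a usable decision: the model can return an unexpected severity or a non-list `actions` field. A small validation step before storing the decision might look like the sketch below. `validate_decision` is a hypothetical helper, and the allowed severities are taken from the prompt above; anything that fails the check returns `None` so the hardcoded fallback can run.

```python
ALLOWED_SEVERITIES = {"ok", "warning", "critical"}  # the values the prompt asks for


def validate_decision(candidate):
    """Return a cleaned decision dict, or None if the shape is wrong."""
    if not isinstance(candidate, dict):
        return None
    actions = candidate.get("actions", [])
    severity = candidate.get("severity")
    reasoning = candidate.get("reasoning", "")
    # actions must be a list of plain strings
    if not isinstance(actions, list) or not all(isinstance(a, str) for a in actions):
        return None
    # severity must be one of the three values named in the prompt
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        return None
    if not isinstance(reasoning, str):
        return None
    return {"actions": actions, "severity": severity, "reasoning": reasoning}
```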
---
## 3. DPO Candidate Collection
No new database. Hermes sessions ARE the DPO candidates.
Every `hermes_local()` call creates a session. To extract DPO pairs:
```bash
# Export all local-model sessions
hermes sessions export --output /tmp/local-sessions.jsonl
# Filter for heartbeat decisions
grep "heartbeat_tick" /tmp/local-sessions.jsonl > heartbeat_decisions.jsonl
```
The existing `session_export` Huey task (runs every 4h) already extracts
user→assistant pairs. It just needs to be aware that some sessions are
now local-model decisions instead of human conversations.
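
The `grep` filter above matches raw text, so it also keeps lines that are not valid JSON. A Python equivalent could drop those at the same time. This is only a sketch: it assumes the export holds one JSON object per line and that the `[heartbeat_tick]` prefix added by `hermes_local()` appears somewhere in the serialized session. The real export schema is not shown here, and `filter_sessions` is a hypothetical name.

```python
import json


def filter_sessions(lines, tag="[heartbeat_tick]"):
    """Yield parsed sessions whose raw line mentions the caller tag."""
    for line in lines:
        line = line.strip()
        if not line or tag not in line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt export lines
```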
For DPO annotation, add a simple review script:
```python
# review_decisions.py: reads heartbeat tick logs, shows model decisions,
# asks Alexander to mark chosen/rejected
# Writes annotations back to the tick log files
import json
from pathlib import Path

TICK_DIR = Path.home() / ".timmy" / "heartbeat"

for log_file in sorted(TICK_DIR.glob("ticks_*.jsonl")):
    for line in log_file.read_text().strip().split("\n"):
        if not line.strip():
            continue  # empty file or blank line
        tick = json.loads(line)
        decision = tick.get("decision", {})
        if decision.get("severity") == "fallback":
            continue  # skip fallback entries

        print(f"\n--- Tick {tick['tick_id']} ---")
        print(f"Perception: {json.dumps(tick['perception'], indent=2)}")
        print(f"Decision: {json.dumps(decision, indent=2)}")

        rating = input("Rate (c=chosen, r=rejected, s=skip): ").strip()
        if rating in ("c", "r"):
            tick["dpo_label"] = "chosen" if rating == "c" else "rejected"
            # write back... (append to annotated file)
```
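
The write-back step is left open above. One possible shape, assuming labels go to a separate append-only JSONL file rather than into the tick logs themselves (the `append_annotation` helper and whatever file path is passed to it are hypothetical). Note that the dashboard in section 4 counts `dpo_label` inside the tick logs, so it would need to read this file as well.

```python
import json
from pathlib import Path


def append_annotation(tick, label, out_path):
    """Append one labeled tick to a JSONL file and return the stored record."""
    if label not in ("chosen", "rejected"):
        raise ValueError(f"unknown label: {label!r}")
    record = dict(tick, dpo_label=label)  # copy, so the caller's dict is untouched
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```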
---
## 4. Dashboard — Reads Hermes Data
```python
#!/usr/bin/env python3
"""Timmy Model Dashboard: reads from Hermes, owns nothing."""

import json
import os
import subprocess
import sys
import time
import urllib.request
from datetime import datetime
from pathlib import Path

HERMES_HOME = Path.home() / ".hermes"
TIMMY_HOME = Path.home() / ".timmy"


def get_ollama_models():
    """What's available in Ollama."""
    try:
        req = urllib.request.Request("http://localhost:11434/api/tags")
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read()).get("models", [])
    except Exception:
        return []


def get_loaded_models():
    """What's actually in VRAM right now."""
    try:
        req = urllib.request.Request("http://localhost:11434/api/ps")
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read()).get("models", [])
    except Exception:
        return []


def get_huey_status():
    try:
        r = subprocess.run(["pgrep", "-f", "huey_consumer"],
                           capture_output=True, timeout=5)
        return r.returncode == 0
    except Exception:
        return False


def get_hermes_sessions(hours=24):
    """Read session metadata from Hermes session store."""
    # NOTE: `hours` is accepted but not applied yet; all sessions are returned.
    sessions_file = HERMES_HOME / "sessions" / "sessions.json"
    if not sessions_file.exists():
        return []
    try:
        data = json.loads(sessions_file.read_text())
        return list(data.values())
    except Exception:
        return []


def get_heartbeat_ticks(date_str=None):
    """Read today's heartbeat ticks."""
    if not date_str:
        date_str = datetime.now().strftime("%Y%m%d")
    tick_file = TIMMY_HOME / "heartbeat" / f"ticks_{date_str}.jsonl"
    if not tick_file.exists():
        return []
    ticks = []
    for line in tick_file.read_text().strip().split("\n"):
        try:
            ticks.append(json.loads(line))
        except Exception:
            continue
    return ticks


def render(hours=24):
    models = get_ollama_models()
    loaded = get_loaded_models()
    huey = get_huey_status()
    sessions = get_hermes_sessions(hours)
    ticks = get_heartbeat_ticks()

    loaded_names = {m.get("name", "") for m in loaded}

    print("\033[2J\033[H")
    print("=" * 70)
    print(" TIMMY MODEL DASHBOARD")
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f" {now} | Huey: {'UP' if huey else 'DOWN'} | Ollama models: {len(models)}")
    print("=" * 70)

    # DEPLOYMENTS
    print("\n LOCAL MODELS")
    print(" " + "-" * 55)
    for m in models:
        name = m.get("name", "?")
        size_gb = m.get("size", 0) / 1e9
        status = "IN VRAM" if name in loaded_names else "on disk"
        print(f" {name:35s} {size_gb:5.1f}GB {status}")
    if not models:
        print(" (Ollama not responding)")

    # HERMES SESSION ACTIVITY
    # Count sessions by platform/provider
    print("\n HERMES SESSIONS (recent)")
    print(" " + "-" * 55)
    local_sessions = [s for s in sessions
                      if "localhost" in str(s.get("origin", {}))]
    cli_sessions = [s for s in sessions
                    if s.get("platform") == "cli" or s.get("origin", {}).get("platform") == "cli"]
    total_tokens = sum(s.get("total_tokens", 0) for s in sessions)
    print(f" Total sessions: {len(sessions)}")
    print(f" CLI sessions: {len(cli_sessions)}")
    print(f" Total tokens: {total_tokens:,}")

    # HEARTBEAT STATUS
    print(f"\n HEARTBEAT ({len(ticks)} ticks today)")
    print(" " + "-" * 55)
    if ticks:
        last = ticks[-1]
        decision = last.get("decision", {})
        severity = decision.get("severity", "unknown")
        reasoning = decision.get("reasoning", "no model decision yet")
        print(f" Last tick: {last.get('tick_id', '?')}")
        print(f" Severity: {severity}")
        print(f" Reasoning: {reasoning[:60]}")

        # Count model vs fallback decisions
        model_decisions = sum(1 for t in ticks
                              if t.get("decision", {}).get("severity") != "fallback")
        fallback = len(ticks) - model_decisions
        print(f" Model decisions: {model_decisions} | Fallback: {fallback}")

        # DPO labels if any
        labeled = sum(1 for t in ticks if "dpo_label" in t)
        if labeled:
            chosen = sum(1 for t in ticks if t.get("dpo_label") == "chosen")
            rejected = sum(1 for t in ticks if t.get("dpo_label") == "rejected")
            print(f" DPO labeled: {labeled} (chosen: {chosen}, rejected: {rejected})")
    else:
        print(" (no ticks today)")

    # ACTIVE LOOPS
    print("\n ACTIVE LOOPS USING LOCAL MODELS")
    print(" " + "-" * 55)
    print(" heartbeat_tick 10m hermes4:14b DECIDE phase")
    print(" (future) 15m hermes4:14b issue triage")
    print(" (future) daily timmy:v0.1 morning report")

    print("\n NON-LOCAL LOOPS (Gemini/Grok API)")
    print(" " + "-" * 55)
    print(" gemini_worker 20m gemini-2.5-pro aider")
    print(" grok_worker 20m grok-3-fast opencode")
    print(" cross_review 30m both PR review")

    print("\n" + "=" * 70)


if __name__ == "__main__":
    watch = "--watch" in sys.argv
    hours = 24
    for a in sys.argv[1:]:
        if a.startswith("--hours="):
            hours = int(a.split("=")[1])
    if watch:
        while True:
            render(hours)
            time.sleep(30)
    else:
        render(hours)
```
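
`get_hermes_sessions()` accepts `hours` but returns every session. If the session records carry a timestamp, the window could be applied as in the sketch below. It rests on a stated assumption: the `updated_at` field name and its ISO 8601 format are hypothetical, since the layout of `sessions.json` is not shown here. Records without a usable timestamp are kept rather than silently dropped.

```python
from datetime import datetime, timedelta


def filter_recent(sessions, hours=24, now=None, field="updated_at"):
    """Keep sessions whose timestamp falls within the last `hours` hours."""
    now = now or datetime.now()
    cutoff = now - timedelta(hours=hours)
    recent = []
    for s in sessions:
        try:
            is_recent = datetime.fromisoformat(s[field]) >= cutoff
        except (KeyError, TypeError, ValueError):
            # Missing, malformed, or incomparable timestamp: keep it visible
            is_recent = True
        if is_recent:
            recent.append(s)
    return recent
```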
---
## 5. Implementation Steps
### Step 1: Add hermes_local() to tasks.py
- One function, ~20 lines
- Calls `hermes chat -q` with Ollama env vars
- All telemetry comes from Hermes for free
### Step 2: Wire heartbeat_tick DECIDE phase
- Replace 6 lines of if/else with hermes_local() call
- Keep hardcoded fallback when model is down
- Decision stored in tick record for DPO review
### Step 3: Fix the MCP server warning
- The orchestration MCP server path is broken — harmless but noisy
- Either fix the path or remove from config
### Step 4: Drop model_dashboard.py in timmy-config/bin/
- Reads Ollama API, Hermes sessions, heartbeat ticks
- No new data stores — just views over existing ones
- `python3 model_dashboard.py --watch` for live view
### Step 5: Expand to more Huey tasks
- triage_issues: model reads issue, picks agent
- good_morning_report: model writes the "From Timmy" section
- Each expansion is just calling hermes_local() with a different prompt
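
As an illustration of how thin each expansion can stay, here is a sketch of the triage case. Everything specific in it is an assumption: the agent names are borrowed from the worker loops listed in the dashboard, `pick_agent` is a hypothetical helper, and the model call is passed in as `ask` so that `hermes_local()` or a test stub can be supplied by the caller.

```python
AGENTS = ("gemini", "grok")  # assumed agent names, for illustration only


def pick_agent(issue_title, issue_body, ask, agents=AGENTS):
    """Ask the model to pick an agent; return None on an unusable answer."""
    prompt = (
        f"Issue: {issue_title}\n\n{issue_body}\n\n"
        f"Pick exactly one agent from: {', '.join(agents)}.\n"
        "Respond ONLY with the agent name."
    )
    answer = ask(prompt)
    if not answer:
        return None  # model down or empty reply: caller decides the fallback
    choice = answer.strip().lower()
    return choice if choice in agents else None


# Possible wiring, with hermes_local() as defined in section 1:
# pick_agent(title, body, lambda p: hermes_local(p, caller_tag="triage_issues"))
```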
---
## What Gets Hotfixed in Hermes Config
If `hermes insights` is broken (the cache_read_tokens column error),
that needs a fix. The dashboard falls back to reading sessions.json
directly, but insights would be the better data source.

The `providers.ollama` section in config.yaml exists but isn't wired
to the `--provider` flag. Filing this upstream or patching locally would
let us do `hermes chat -q "..." --provider ollama` cleanly instead
of relying on env vars. Not blocking: env vars work today.

---
## What This Owns
- hermes_local() — 20-line wrapper around a subprocess call
- model_dashboard.py — read-only views over existing data
- review_decisions.py — optional DPO annotation CLI
## What This Does NOT Own
- Inference. Ollama does that.
- Telemetry. Hermes does that.
- Session storage. Hermes does that.
- Token counting. Hermes does that.
- Training pipeline. Already exists in timmy-config/training/.