config: update bin/timmy-dashboard,config.yaml,docs/local-model-integration-sketch.md,tasks.py
docs/local-model-integration-sketch.md (new file, 438 lines)
@@ -0,0 +1,438 @@
# Local Model Integration Sketch v2
# Hermes4-14B in the Heartbeat Loop — No New Telemetry

## Principle

No new inference layer. Huey tasks call `hermes chat -q` pointed at
Ollama. Hermes handles sessions, token tracking, cost logging.
The dashboard reads what Hermes already stores.

---

## Why Not Ollama Directly?

Ollama is fine as a serving backend. The issue isn't Ollama — it's that
calling Ollama directly with urllib bypasses the harness. The harness
already tracks sessions, tokens, model/provider, and platform. Building a
second telemetry layer is owning code we don't need.

Ollama as a named provider isn't wired into the --provider flag yet,
but routing works via env vars:

    HERMES_MODEL="hermes4:14b" \
    HERMES_PROVIDER="custom" \
    HERMES_BASE_URL="http://localhost:11434/v1" \
    hermes chat -q "prompt here" -Q

This creates a tracked session, logs tokens, and returns the response.
That's our local inference call.

### Alternatives to Ollama for serving

- **llama.cpp server** — lighter, no Python, raw HTTP. Good for single-model
  serving. Less convenient for model switching.
- **vLLM** — best throughput, but needs an NVIDIA GPU. Not for the M3 Mac.
- **MLX serving** — native Apple Silicon, but no OpenAI-compatible API yet.
  MLX is for training, not serving (our current policy).
- **llamafile** — single binary, portable. Good for distribution.

Verdict: Ollama is fine. It's the standard OpenAI-compatible local server
on Mac. The issue was never Ollama — it was bypassing the harness.

---

## 1. The Call Pattern

One function in tasks.py that all Huey tasks use:

```python
import json
import os
import subprocess

HERMES_BIN = "hermes"
LOCAL_ENV = {
    "HERMES_MODEL": "hermes4:14b",
    "HERMES_PROVIDER": "custom",
    "HERMES_BASE_URL": "http://localhost:11434/v1",
}


def hermes_local(prompt, caller_tag=None, max_retries=2):
    """Call hermes with local Ollama model. Returns response text.

    Every call creates a hermes session with full telemetry.
    caller_tag gets prepended to prompt for searchability.
    """
    env = os.environ.copy()
    env.update(LOCAL_ENV)

    tagged_prompt = prompt
    if caller_tag:
        tagged_prompt = f"[{caller_tag}] {prompt}"

    for attempt in range(max_retries + 1):
        try:
            result = subprocess.run(
                [HERMES_BIN, "chat", "-q", tagged_prompt, "-Q", "-t", "none"],
                capture_output=True, text=True,
                timeout=120, env=env,
            )
            if result.returncode == 0 and result.stdout.strip():
                # Strip the session_id line from -Q output
                lines = result.stdout.strip().split("\n")
                response_lines = [ln for ln in lines if not ln.startswith("session_id:")]
                return "\n".join(response_lines).strip()
        except subprocess.TimeoutExpired:
            if attempt == max_retries:
                return None
            continue
    return None
```

Notes:

- `-t none` disables all toolsets — the heartbeat model shouldn't have
  terminal/file access. Pure reasoning only.
- `-Q` quiet mode suppresses the banner/spinner and gives clean output.
- Every call creates a session in the Hermes session store. Searchable,
  exportable, countable.
- The `[caller_tag]` prefix lets you filter sessions by which Huey task
  generated them: `hermes sessions list | grep heartbeat`. A usage sketch
  follows below.

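For orientation, a minimal sketch of how another Huey task would use the wrapper. The `huey` instance name and the `summarize_queue` task are illustrative assumptions; only `hermes_local()` and its `caller_tag` convention come from the code above.

```python
from huey import crontab

# Hypothetical periodic task for illustration; assumes this lives in tasks.py,
# next to hermes_local() and the Huey instance (named `huey` here).
@huey.periodic_task(crontab(minute="*/15"))
def summarize_queue():
    summary = hermes_local(
        "Summarize the current work queue in three bullet points.",
        caller_tag="summarize_queue",  # makes these sessions greppable later
    )
    if summary is None:
        return None  # Ollama down or the call timed out; skip this cycle
    return summary  # the interesting telemetry lives in Hermes either way
```
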
---

## 2. Heartbeat DECIDE Phase

Replace the hardcoded if/else with a model call:

```python
# In heartbeat_tick(), replace the DECIDE + ACT section:

# DECIDE: let hermes4:14b reason about what to do
decide_prompt = f"""System state at {now.isoformat()}:

{json.dumps(perception, indent=2)}

Previous tick: {last_tick.get('tick_id', 'none')}

You are the heartbeat monitor. Based on this state:
1. List any actions needed (alerts, restarts, escalations). Empty if all OK.
2. Rate severity: ok, warning, or critical.
3. One sentence of reasoning.

Respond ONLY with JSON:
{{"actions": [], "severity": "ok", "reasoning": "..."}}"""

decision = None
try:
    raw = hermes_local(decide_prompt, caller_tag="heartbeat_tick")
    if raw:
        # Try to parse JSON from the response.
        # The model might wrap it in markdown, so extract the JSON line first.
        for line in raw.split("\n"):
            line = line.strip()
            if line.startswith("{"):
                decision = json.loads(line)
                break
        if not decision:
            decision = json.loads(raw)
except Exception:  # covers json.JSONDecodeError and anything else the parse throws
    decision = None

# Fallback to hardcoded logic if the model fails or is down
if decision is None:
    actions = []
    if not perception.get("gitea_alive"):
        actions.append("ALERT: Gitea unreachable")
    health = perception.get("model_health", {})
    if isinstance(health, dict) and not health.get("ollama_running"):
        actions.append("ALERT: Ollama not running")
    decision = {
        "actions": actions,
        "severity": "fallback",
        "reasoning": "model unavailable, used hardcoded checks",
    }

tick_record["decision"] = decision
actions = decision.get("actions", [])
```

---

## 3. DPO Candidate Collection

No new database. Hermes sessions ARE the DPO candidates.

Every `hermes_local()` call creates a session. To extract DPO pairs:

```bash
# Export all local-model sessions
hermes sessions export --output /tmp/local-sessions.jsonl

# Filter for heartbeat decisions
grep "heartbeat_tick" /tmp/local-sessions.jsonl > heartbeat_decisions.jsonl
```

The existing `session_export` Huey task (runs every 4h) already extracts
user→assistant pairs. It just needs to be aware that some sessions are
now local-model decisions instead of human conversations.

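One way to make that awareness concrete. This is a sketch only: the export schema isn't pinned down here, so the `prompt` key and the tag list are assumptions to adjust against what `session_export` actually emits.

```python
import json
from pathlib import Path

EXPORT = Path("/tmp/local-sessions.jsonl")
# Tags used by hermes_local() callers; extend as new Huey tasks come online.
LOCAL_TAGS = ("heartbeat_tick", "triage_issues", "good_morning_report")


def split_exported_pairs(path=EXPORT):
    """Partition exported pairs into local-model decisions vs. human conversations.

    Assumes one JSON object per line with the user turn under a "prompt" key.
    """
    local, human = [], []
    for line in path.read_text().strip().splitlines():
        pair = json.loads(line)
        prompt = str(pair.get("prompt", ""))
        # hermes_local() prepends "[caller_tag] " to every prompt, so the tag is searchable
        if any(prompt.startswith(f"[{tag}]") for tag in LOCAL_TAGS):
            local.append(pair)
        else:
            human.append(pair)
    return local, human
```
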
For DPO annotation, add a simple review script:

```python
# review_decisions.py — reads heartbeat tick logs, shows model decisions,
# asks Alexander to mark chosen/rejected.
# Writes annotations back to the tick log files.

import json
from pathlib import Path

TICK_DIR = Path.home() / ".timmy" / "heartbeat"

for log_file in sorted(TICK_DIR.glob("ticks_*.jsonl")):
    for line in log_file.read_text().strip().split("\n"):
        tick = json.loads(line)
        decision = tick.get("decision", {})
        if decision.get("severity") == "fallback":
            continue  # skip fallback entries

        print(f"\n--- Tick {tick['tick_id']} ---")
        print(f"Perception: {json.dumps(tick['perception'], indent=2)}")
        print(f"Decision: {json.dumps(decision, indent=2)}")

        rating = input("Rate (c=chosen, r=rejected, s=skip): ").strip()
        if rating in ("c", "r"):
            tick["dpo_label"] = "chosen" if rating == "c" else "rejected"
            # write back... (append to annotated file)
```

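Downstream of that annotation, labeled ticks can be flattened into preference records for the existing training pipeline. A sketch under assumptions: the output path is hypothetical, and pairing chosen vs. rejected responses into actual DPO pairs is left to timmy-config/training/.

```python
import json
from pathlib import Path

TICK_DIR = Path.home() / ".timmy" / "heartbeat"
OUT = TICK_DIR / "dpo_candidates.jsonl"  # assumed output location

records = []
for log_file in sorted(TICK_DIR.glob("ticks_*.jsonl")):
    for line in log_file.read_text().strip().split("\n"):
        tick = json.loads(line)
        if "dpo_label" not in tick:
            continue  # only ticks annotated by review_decisions.py
        records.append({
            "prompt": json.dumps(tick.get("perception", {}), indent=2),
            "response": json.dumps(tick.get("decision", {}), indent=2),
            "label": tick["dpo_label"],  # "chosen" or "rejected"
        })

OUT.write_text("\n".join(json.dumps(r) for r in records) + "\n")
print(f"Wrote {len(records)} labeled records to {OUT}")
```
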
---

## 4. Dashboard — Reads Hermes Data

```python
#!/usr/bin/env python3
"""Timmy Model Dashboard — reads from Hermes, owns nothing."""

import json
import subprocess
import sys
import time
import urllib.request
from datetime import datetime
from pathlib import Path

HERMES_HOME = Path.home() / ".hermes"
TIMMY_HOME = Path.home() / ".timmy"


def get_ollama_models():
    """What's available in Ollama."""
    try:
        req = urllib.request.Request("http://localhost:11434/api/tags")
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read()).get("models", [])
    except Exception:
        return []


def get_loaded_models():
    """What's actually in VRAM right now."""
    try:
        req = urllib.request.Request("http://localhost:11434/api/ps")
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read()).get("models", [])
    except Exception:
        return []


def get_huey_status():
    """Is the Huey consumer process running?"""
    try:
        r = subprocess.run(["pgrep", "-f", "huey_consumer"],
                           capture_output=True, timeout=5)
        return r.returncode == 0
    except Exception:
        return False


def get_hermes_sessions(hours=24):
    """Read session metadata from the Hermes session store.

    (hours is accepted for future filtering; currently all sessions are returned.)
    """
    sessions_file = HERMES_HOME / "sessions" / "sessions.json"
    if not sessions_file.exists():
        return []
    try:
        data = json.loads(sessions_file.read_text())
        return list(data.values())
    except Exception:
        return []


def get_heartbeat_ticks(date_str=None):
    """Read today's heartbeat ticks."""
    if not date_str:
        date_str = datetime.now().strftime("%Y%m%d")
    tick_file = TIMMY_HOME / "heartbeat" / f"ticks_{date_str}.jsonl"
    if not tick_file.exists():
        return []
    ticks = []
    for line in tick_file.read_text().strip().split("\n"):
        try:
            ticks.append(json.loads(line))
        except Exception:
            continue
    return ticks


def render(hours=24):
    models = get_ollama_models()
    loaded = get_loaded_models()
    huey = get_huey_status()
    sessions = get_hermes_sessions(hours)
    ticks = get_heartbeat_ticks()

    loaded_names = {m.get("name", "") for m in loaded}

    print("\033[2J\033[H")
    print("=" * 70)
    print(" TIMMY MODEL DASHBOARD")
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f" {now} | Huey: {'UP' if huey else 'DOWN'} | Ollama models: {len(models)}")
    print("=" * 70)

    # LOCAL MODELS
    print("\n LOCAL MODELS")
    print(" " + "-" * 55)
    for m in models:
        name = m.get("name", "?")
        size_gb = m.get("size", 0) / 1e9
        status = "IN VRAM" if name in loaded_names else "on disk"
        print(f" {name:35s} {size_gb:5.1f}GB  {status}")
    if not models:
        print(" (Ollama not responding)")

    # HERMES SESSION ACTIVITY
    # Count sessions by platform/provider
    print(f"\n HERMES SESSIONS (recent)")
    print(" " + "-" * 55)
    local_sessions = [s for s in sessions
                      if "localhost" in str(s.get("origin", {}))]
    cli_sessions = [s for s in sessions
                    if s.get("platform") == "cli"
                    or s.get("origin", {}).get("platform") == "cli"]

    total_tokens = sum(s.get("total_tokens", 0) for s in sessions)
    print(f" Total sessions:    {len(sessions)}")
    print(f" CLI sessions:      {len(cli_sessions)}")
    print(f" Local (localhost): {len(local_sessions)}")
    print(f" Total tokens:      {total_tokens:,}")

    # HEARTBEAT STATUS
    print(f"\n HEARTBEAT ({len(ticks)} ticks today)")
    print(" " + "-" * 55)
    if ticks:
        last = ticks[-1]
        decision = last.get("decision", {})
        severity = decision.get("severity", "unknown")
        reasoning = decision.get("reasoning", "no model decision yet")
        print(f" Last tick: {last.get('tick_id', '?')}")
        print(f" Severity:  {severity}")
        print(f" Reasoning: {reasoning[:60]}")

        # Count model vs fallback decisions
        model_decisions = sum(1 for t in ticks
                              if t.get("decision", {}).get("severity") != "fallback")
        fallback = len(ticks) - model_decisions
        print(f" Model decisions: {model_decisions} | Fallback: {fallback}")

        # DPO labels if any
        labeled = sum(1 for t in ticks if "dpo_label" in t)
        if labeled:
            chosen = sum(1 for t in ticks if t.get("dpo_label") == "chosen")
            rejected = sum(1 for t in ticks if t.get("dpo_label") == "rejected")
            print(f" DPO labeled: {labeled} (chosen: {chosen}, rejected: {rejected})")
    else:
        print(" (no ticks today)")

    # ACTIVE LOOPS
    print(f"\n ACTIVE LOOPS USING LOCAL MODELS")
    print(" " + "-" * 55)
    print(" heartbeat_tick   10m    hermes4:14b    DECIDE phase")
    print(" (future)         15m    hermes4:14b    issue triage")
    print(" (future)         daily  timmy:v0.1     morning report")

    print(f"\n NON-LOCAL LOOPS (Gemini/Grok API)")
    print(" " + "-" * 55)
    print(" gemini_worker    20m    gemini-2.5-pro  aider")
    print(" grok_worker      20m    grok-3-fast     opencode")
    print(" cross_review     30m    both            PR review")

    print("\n" + "=" * 70)


if __name__ == "__main__":
    watch = "--watch" in sys.argv
    hours = 24
    for a in sys.argv[1:]:
        if a.startswith("--hours="):
            hours = int(a.split("=")[1])
    if watch:
        while True:
            render(hours)
            time.sleep(30)
    else:
        render(hours)
```

---

## 5. Implementation Steps

### Step 1: Add hermes_local() to tasks.py
- One function, ~20 lines
- Calls `hermes chat -q` with the Ollama env vars
- All telemetry comes from Hermes for free

### Step 2: Wire the heartbeat_tick DECIDE phase
- Replace the six lines of if/else with a hermes_local() call
- Keep the hardcoded fallback for when the model is down
- Decision stored in the tick record for DPO review

### Step 3: Fix the MCP server warning
- The orchestration MCP server path is broken — harmless but noisy
- Either fix the path or remove it from config

### Step 4: Drop model_dashboard.py in timmy-config/bin/
- Reads the Ollama API, Hermes sessions, heartbeat ticks
- No new data stores — just views over existing ones
- `python3 model_dashboard.py --watch` for a live view

### Step 5: Expand to more Huey tasks
- triage_issues: model reads the issue, picks an agent (sketched below)
- good_morning_report: model writes the "From Timmy" section
- Each expansion is just calling hermes_local() with a different prompt

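A sketch of the triage_issues expansion, under assumptions: `fetch_open_issues()` and `assign_to_agent()` stand in for whatever Gitea helpers already exist, the issue dict keys are illustrative, and the two agent names simply mirror the gemini_worker/grok_worker loops shown on the dashboard. Only `hermes_local()` comes from Step 1.

```python
import json

def triage_issue(issue):
    """Ask the local model which worker should take an issue. Returns a dict or None."""
    prompt = f"""Issue #{issue['number']}: {issue['title']}

{issue.get('body', '')[:1000]}

Pick one agent: "gemini" (aider, larger refactors) or "grok" (opencode, quick fixes).
Respond ONLY with JSON: {{"agent": "...", "reasoning": "..."}}"""
    raw = hermes_local(prompt, caller_tag="triage_issues")
    if not raw:
        return None  # model down; leave the issue untriaged this cycle
    try:
        # Extract the JSON object even if the model wraps it in prose or markdown
        return json.loads(raw[raw.index("{"): raw.rindex("}") + 1])
    except ValueError:  # covers json.JSONDecodeError and a missing brace
        return None


# Hypothetical wiring inside the existing triage loop:
# for issue in fetch_open_issues():
#     verdict = triage_issue(issue)
#     if verdict:
#         assign_to_agent(issue, verdict["agent"])
```
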
---

## What Gets Hotfixed in Hermes Config

If `hermes insights` is broken (the cache_read_tokens column error),
that needs a fix. The dashboard falls back to reading sessions.json
directly, but insights would be the better data source.

The `providers.ollama` section in config.yaml exists but isn't wired
to the --provider flag. Filing this upstream or patching locally would
let us do `hermes chat -q "..." --provider ollama` cleanly instead
of relying on env vars. Not blocking — env vars work today.

---

## What This Owns

- hermes_local() — 20-line wrapper around a subprocess call
- model_dashboard.py — read-only views over existing data
- review_decisions.py — optional DPO annotation CLI

## What This Does NOT Own

- Inference. Ollama does that.
- Telemetry. Hermes does that.
- Session storage. Hermes does that.
- Token counting. Hermes does that.
- Training pipeline. Already exists in timmy-config/training/.