config: update bin/timmy-dashboard,config.yaml,docs/local-model-integration-sketch.md,tasks.py
docs/local-model-integration-sketch.md (new file, 438 lines)
@@ -0,0 +1,438 @@
# Local Model Integration Sketch v2
# Hermes4-14B in the Heartbeat Loop — No New Telemetry

## Principle

No new inference layer. Huey tasks call `hermes chat -q` pointed at
Ollama. Hermes handles sessions, token tracking, cost logging.
The dashboard reads what Hermes already stores.

---

## Why Not Ollama Directly?

Ollama is fine as a serving backend. The issue isn't Ollama — it's that
calling Ollama directly with urllib bypasses the harness. The harness
already tracks sessions, tokens, model/provider, and platform. Building a
second telemetry layer is owning code we don't need.

Ollama as a named provider isn't wired into the --provider flag yet,
but routing works via env vars:

    HERMES_MODEL="hermes4:14b" \
    HERMES_PROVIDER="custom" \
    HERMES_BASE_URL="http://localhost:11434/v1" \
    hermes chat -q "prompt here" -Q

This creates a tracked session, logs tokens, and returns the response.
That's our local inference call.

### Alternatives to Ollama for serving

- **llama.cpp server** — lighter, no Python, raw HTTP. Good for single-model
  serving. Less convenient for model switching.
- **vLLM** — best throughput, but needs an NVIDIA GPU. Not for the M3 Mac.
- **MLX serving** — native Apple Silicon, but no OpenAI-compatible API yet.
  MLX is for training, not serving (our current policy).
- **llamafile** — single binary, portable. Good for distribution.

Verdict: Ollama is fine. It's the standard OpenAI-compatible local server
on Mac. The issue was never Ollama — it was bypassing the harness.

---

## 1. The Call Pattern

One function in tasks.py that all Huey tasks use:

```python
import json
import os
import subprocess

HERMES_BIN = "hermes"
LOCAL_ENV = {
    "HERMES_MODEL": "hermes4:14b",
    "HERMES_PROVIDER": "custom",
    "HERMES_BASE_URL": "http://localhost:11434/v1",
}


def hermes_local(prompt, caller_tag=None, max_retries=2):
    """Call hermes with local Ollama model. Returns response text.

    Every call creates a hermes session with full telemetry.
    caller_tag gets prepended to prompt for searchability.
    """
    env = os.environ.copy()
    env.update(LOCAL_ENV)

    tagged_prompt = prompt
    if caller_tag:
        tagged_prompt = f"[{caller_tag}] {prompt}"

    for attempt in range(max_retries + 1):
        try:
            result = subprocess.run(
                [HERMES_BIN, "chat", "-q", tagged_prompt, "-Q", "-t", "none"],
                capture_output=True, text=True,
                timeout=120, env=env,
            )
            if result.returncode == 0 and result.stdout.strip():
                # Strip the session_id line from -Q output
                lines = result.stdout.strip().split("\n")
                response_lines = [ln for ln in lines if not ln.startswith("session_id:")]
                return "\n".join(response_lines).strip()
        except subprocess.TimeoutExpired:
            if attempt == max_retries:
                return None
            continue
    return None
```

Notes:

- `-t none` disables all toolsets — the heartbeat model shouldn't have
  terminal/file access. Pure reasoning only.
- `-Q` quiet mode suppresses the banner/spinner and gives clean output.
- Every call creates a session in the Hermes session store. Searchable,
  exportable, countable.
- The `[caller_tag]` prefix lets you filter sessions by which Huey task
  generated them: `hermes sessions list | grep heartbeat`. A usage sketch
  follows below.

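For orientation, a minimal sketch of how another Huey task would use the wrapper. The `huey` instance name and the `summarize_queue` task are illustrative assumptions; only `hermes_local()` and its `caller_tag` convention come from the code above.

```python
from huey import crontab

# Hypothetical periodic task for illustration; assumes this lives in tasks.py,
# next to hermes_local() and the Huey instance (named `huey` here).
@huey.periodic_task(crontab(minute="*/15"))
def summarize_queue():
    summary = hermes_local(
        "Summarize the current work queue in three bullet points.",
        caller_tag="summarize_queue",  # makes these sessions greppable later
    )
    if summary is None:
        return None  # Ollama down or the call timed out; skip this cycle
    return summary  # the interesting telemetry lives in Hermes either way
```
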
---

## 2. Heartbeat DECIDE Phase

Replace the hardcoded if/else with a model call:

```python
# In heartbeat_tick(), replace the DECIDE + ACT section:

# DECIDE: let hermes4:14b reason about what to do
decide_prompt = f"""System state at {now.isoformat()}:

{json.dumps(perception, indent=2)}

Previous tick: {last_tick.get('tick_id', 'none')}

You are the heartbeat monitor. Based on this state:
1. List any actions needed (alerts, restarts, escalations). Empty if all OK.
2. Rate severity: ok, warning, or critical.
3. One sentence of reasoning.

Respond ONLY with JSON:
{{"actions": [], "severity": "ok", "reasoning": "..."}}"""

decision = None
try:
    raw = hermes_local(decide_prompt, caller_tag="heartbeat_tick")
    if raw:
        # Try to parse JSON from the response.
        # The model might wrap it in markdown, so extract the JSON line first.
        for line in raw.split("\n"):
            line = line.strip()
            if line.startswith("{"):
                decision = json.loads(line)
                break
        if not decision:
            decision = json.loads(raw)
except Exception:  # covers json.JSONDecodeError and anything else the parse throws
    decision = None

# Fallback to hardcoded logic if the model fails or is down
if decision is None:
    actions = []
    if not perception.get("gitea_alive"):
        actions.append("ALERT: Gitea unreachable")
    health = perception.get("model_health", {})
    if isinstance(health, dict) and not health.get("ollama_running"):
        actions.append("ALERT: Ollama not running")
    decision = {
        "actions": actions,
        "severity": "fallback",
        "reasoning": "model unavailable, used hardcoded checks",
    }

tick_record["decision"] = decision
actions = decision.get("actions", [])
```

---

## 3. DPO Candidate Collection

No new database. Hermes sessions ARE the DPO candidates.

Every `hermes_local()` call creates a session. To extract DPO pairs:

```bash
# Export all local-model sessions
hermes sessions export --output /tmp/local-sessions.jsonl

# Filter for heartbeat decisions
grep "heartbeat_tick" /tmp/local-sessions.jsonl > heartbeat_decisions.jsonl
```

The existing `session_export` Huey task (runs every 4h) already extracts
user→assistant pairs. It just needs to be aware that some sessions are
now local-model decisions instead of human conversations.

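One way to make that awareness concrete. This is a sketch only: the export schema isn't pinned down here, so the `prompt` key and the tag list are assumptions to adjust against what `session_export` actually emits.

```python
import json
from pathlib import Path

EXPORT = Path("/tmp/local-sessions.jsonl")
# Tags used by hermes_local() callers; extend as new Huey tasks come online.
LOCAL_TAGS = ("heartbeat_tick", "triage_issues", "good_morning_report")


def split_exported_pairs(path=EXPORT):
    """Partition exported pairs into local-model decisions vs. human conversations.

    Assumes one JSON object per line with the user turn under a "prompt" key.
    """
    local, human = [], []
    for line in path.read_text().strip().splitlines():
        pair = json.loads(line)
        prompt = str(pair.get("prompt", ""))
        # hermes_local() prepends "[caller_tag] " to every prompt, so the tag is searchable
        if any(prompt.startswith(f"[{tag}]") for tag in LOCAL_TAGS):
            local.append(pair)
        else:
            human.append(pair)
    return local, human
```
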
For DPO annotation, add a simple review script:

```python
# review_decisions.py — reads heartbeat tick logs, shows model decisions,
# asks Alexander to mark chosen/rejected.
# Writes annotations back to the tick log files.

import json
from pathlib import Path

TICK_DIR = Path.home() / ".timmy" / "heartbeat"

for log_file in sorted(TICK_DIR.glob("ticks_*.jsonl")):
    for line in log_file.read_text().strip().split("\n"):
        tick = json.loads(line)
        decision = tick.get("decision", {})
        if decision.get("severity") == "fallback":
            continue  # skip fallback entries

        print(f"\n--- Tick {tick['tick_id']} ---")
        print(f"Perception: {json.dumps(tick['perception'], indent=2)}")
        print(f"Decision: {json.dumps(decision, indent=2)}")

        rating = input("Rate (c=chosen, r=rejected, s=skip): ").strip()
        if rating in ("c", "r"):
            tick["dpo_label"] = "chosen" if rating == "c" else "rejected"
            # write back... (append to annotated file)
```

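Downstream of that annotation, labeled ticks can be flattened into preference records for the existing training pipeline. A sketch under assumptions: the output path is hypothetical, and pairing chosen vs. rejected responses into actual DPO pairs is left to timmy-config/training/.

```python
import json
from pathlib import Path

TICK_DIR = Path.home() / ".timmy" / "heartbeat"
OUT = TICK_DIR / "dpo_candidates.jsonl"  # assumed output location

records = []
for log_file in sorted(TICK_DIR.glob("ticks_*.jsonl")):
    for line in log_file.read_text().strip().split("\n"):
        tick = json.loads(line)
        if "dpo_label" not in tick:
            continue  # only ticks annotated by review_decisions.py
        records.append({
            "prompt": json.dumps(tick.get("perception", {}), indent=2),
            "response": json.dumps(tick.get("decision", {}), indent=2),
            "label": tick["dpo_label"],  # "chosen" or "rejected"
        })

OUT.write_text("\n".join(json.dumps(r) for r in records) + "\n")
print(f"Wrote {len(records)} labeled records to {OUT}")
```
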
---

## 4. Dashboard — Reads Hermes Data

```python
#!/usr/bin/env python3
"""Timmy Model Dashboard — reads from Hermes, owns nothing."""

import json
import subprocess
import sys
import time
import urllib.request
from datetime import datetime
from pathlib import Path

HERMES_HOME = Path.home() / ".hermes"
TIMMY_HOME = Path.home() / ".timmy"


def get_ollama_models():
    """What's available in Ollama."""
    try:
        req = urllib.request.Request("http://localhost:11434/api/tags")
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read()).get("models", [])
    except Exception:
        return []


def get_loaded_models():
    """What's actually in VRAM right now."""
    try:
        req = urllib.request.Request("http://localhost:11434/api/ps")
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read()).get("models", [])
    except Exception:
        return []


def get_huey_status():
    """Is the Huey consumer process running?"""
    try:
        r = subprocess.run(["pgrep", "-f", "huey_consumer"],
                           capture_output=True, timeout=5)
        return r.returncode == 0
    except Exception:
        return False


def get_hermes_sessions(hours=24):
    """Read session metadata from the Hermes session store.

    (hours is accepted for future filtering; currently all sessions are returned.)
    """
    sessions_file = HERMES_HOME / "sessions" / "sessions.json"
    if not sessions_file.exists():
        return []
    try:
        data = json.loads(sessions_file.read_text())
        return list(data.values())
    except Exception:
        return []


def get_heartbeat_ticks(date_str=None):
    """Read today's heartbeat ticks."""
    if not date_str:
        date_str = datetime.now().strftime("%Y%m%d")
    tick_file = TIMMY_HOME / "heartbeat" / f"ticks_{date_str}.jsonl"
    if not tick_file.exists():
        return []
    ticks = []
    for line in tick_file.read_text().strip().split("\n"):
        try:
            ticks.append(json.loads(line))
        except Exception:
            continue
    return ticks


def render(hours=24):
    models = get_ollama_models()
    loaded = get_loaded_models()
    huey = get_huey_status()
    sessions = get_hermes_sessions(hours)
    ticks = get_heartbeat_ticks()

    loaded_names = {m.get("name", "") for m in loaded}

    print("\033[2J\033[H")
    print("=" * 70)
    print(" TIMMY MODEL DASHBOARD")
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f" {now} | Huey: {'UP' if huey else 'DOWN'} | Ollama models: {len(models)}")
    print("=" * 70)

    # LOCAL MODELS
    print("\n LOCAL MODELS")
    print(" " + "-" * 55)
    for m in models:
        name = m.get("name", "?")
        size_gb = m.get("size", 0) / 1e9
        status = "IN VRAM" if name in loaded_names else "on disk"
        print(f" {name:35s} {size_gb:5.1f}GB  {status}")
    if not models:
        print(" (Ollama not responding)")

    # HERMES SESSION ACTIVITY
    # Count sessions by platform/provider
    print(f"\n HERMES SESSIONS (recent)")
    print(" " + "-" * 55)
    local_sessions = [s for s in sessions
                      if "localhost" in str(s.get("origin", {}))]
    cli_sessions = [s for s in sessions
                    if s.get("platform") == "cli"
                    or s.get("origin", {}).get("platform") == "cli"]

    total_tokens = sum(s.get("total_tokens", 0) for s in sessions)
    print(f" Total sessions:    {len(sessions)}")
    print(f" CLI sessions:      {len(cli_sessions)}")
    print(f" Local (localhost): {len(local_sessions)}")
    print(f" Total tokens:      {total_tokens:,}")

    # HEARTBEAT STATUS
    print(f"\n HEARTBEAT ({len(ticks)} ticks today)")
    print(" " + "-" * 55)
    if ticks:
        last = ticks[-1]
        decision = last.get("decision", {})
        severity = decision.get("severity", "unknown")
        reasoning = decision.get("reasoning", "no model decision yet")
        print(f" Last tick: {last.get('tick_id', '?')}")
        print(f" Severity:  {severity}")
        print(f" Reasoning: {reasoning[:60]}")

        # Count model vs fallback decisions
        model_decisions = sum(1 for t in ticks
                              if t.get("decision", {}).get("severity") != "fallback")
        fallback = len(ticks) - model_decisions
        print(f" Model decisions: {model_decisions} | Fallback: {fallback}")

        # DPO labels if any
        labeled = sum(1 for t in ticks if "dpo_label" in t)
        if labeled:
            chosen = sum(1 for t in ticks if t.get("dpo_label") == "chosen")
            rejected = sum(1 for t in ticks if t.get("dpo_label") == "rejected")
            print(f" DPO labeled: {labeled} (chosen: {chosen}, rejected: {rejected})")
    else:
        print(" (no ticks today)")

    # ACTIVE LOOPS
    print(f"\n ACTIVE LOOPS USING LOCAL MODELS")
    print(" " + "-" * 55)
    print(" heartbeat_tick   10m    hermes4:14b    DECIDE phase")
    print(" (future)         15m    hermes4:14b    issue triage")
    print(" (future)         daily  timmy:v0.1     morning report")

    print(f"\n NON-LOCAL LOOPS (Gemini/Grok API)")
    print(" " + "-" * 55)
    print(" gemini_worker    20m    gemini-2.5-pro  aider")
    print(" grok_worker      20m    grok-3-fast     opencode")
    print(" cross_review     30m    both            PR review")

    print("\n" + "=" * 70)


if __name__ == "__main__":
    watch = "--watch" in sys.argv
    hours = 24
    for a in sys.argv[1:]:
        if a.startswith("--hours="):
            hours = int(a.split("=")[1])
    if watch:
        while True:
            render(hours)
            time.sleep(30)
    else:
        render(hours)
```

---

## 5. Implementation Steps

### Step 1: Add hermes_local() to tasks.py
- One function, ~20 lines
- Calls `hermes chat -q` with the Ollama env vars
- All telemetry comes from Hermes for free

### Step 2: Wire the heartbeat_tick DECIDE phase
- Replace the six lines of if/else with a hermes_local() call
- Keep the hardcoded fallback for when the model is down
- Decision stored in the tick record for DPO review

### Step 3: Fix the MCP server warning
- The orchestration MCP server path is broken — harmless but noisy
- Either fix the path or remove it from config

### Step 4: Drop model_dashboard.py in timmy-config/bin/
- Reads the Ollama API, Hermes sessions, heartbeat ticks
- No new data stores — just views over existing ones
- `python3 model_dashboard.py --watch` for a live view

### Step 5: Expand to more Huey tasks
- triage_issues: model reads the issue, picks an agent (sketched below)
- good_morning_report: model writes the "From Timmy" section
- Each expansion is just calling hermes_local() with a different prompt

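A sketch of the triage_issues expansion, under assumptions: `fetch_open_issues()` and `assign_to_agent()` stand in for whatever Gitea helpers already exist, the issue dict keys are illustrative, and the two agent names simply mirror the gemini_worker/grok_worker loops shown on the dashboard. Only `hermes_local()` comes from Step 1.

```python
import json

def triage_issue(issue):
    """Ask the local model which worker should take an issue. Returns a dict or None."""
    prompt = f"""Issue #{issue['number']}: {issue['title']}

{issue.get('body', '')[:1000]}

Pick one agent: "gemini" (aider, larger refactors) or "grok" (opencode, quick fixes).
Respond ONLY with JSON: {{"agent": "...", "reasoning": "..."}}"""
    raw = hermes_local(prompt, caller_tag="triage_issues")
    if not raw:
        return None  # model down; leave the issue untriaged this cycle
    try:
        # Extract the JSON object even if the model wraps it in prose or markdown
        return json.loads(raw[raw.index("{"): raw.rindex("}") + 1])
    except ValueError:  # covers json.JSONDecodeError and a missing brace
        return None


# Hypothetical wiring inside the existing triage loop:
# for issue in fetch_open_issues():
#     verdict = triage_issue(issue)
#     if verdict:
#         assign_to_agent(issue, verdict["agent"])
```
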
---

## What Gets Hotfixed in Hermes Config

If `hermes insights` is broken (the cache_read_tokens column error),
that needs a fix. The dashboard falls back to reading sessions.json
directly, but insights would be the better data source.

The `providers.ollama` section in config.yaml exists but isn't wired
to the --provider flag. Filing this upstream or patching locally would
let us do `hermes chat -q "..." --provider ollama` cleanly instead
of relying on env vars. Not blocking — env vars work today.

---

## What This Owns

- hermes_local() — 20-line wrapper around a subprocess call
- model_dashboard.py — read-only views over existing data
- review_decisions.py — optional DPO annotation CLI

## What This Does NOT Own

- Inference. Ollama does that.
- Telemetry. Hermes does that.
- Session storage. Hermes does that.
- Token counting. Hermes does that.
- Training pipeline. Already exists in timmy-config/training/.