# Local Model Integration Sketch v2

# Hermes4-14B in the Heartbeat Loop — No New Telemetry

## Principle

No new inference layer. Huey tasks call `hermes chat -q` pointed at Ollama. Hermes handles sessions, token tracking, cost logging. The dashboard reads what Hermes already stores.

---
## Why Not Ollama Directly?

Ollama is fine as a serving backend. The issue isn't Ollama — it's that calling Ollama directly with urllib bypasses the harness. The harness already tracks sessions, tokens, model/provider, and platform. Building a second telemetry layer is owning code we don't need.

Ollama as a named provider isn't wired into the `--provider` flag yet, but routing works via env vars:
```bash
HERMES_MODEL="hermes4:14b" \
HERMES_PROVIDER="custom" \
HERMES_BASE_URL="http://localhost:11434/v1" \
hermes chat -q "prompt here" -Q
```
This creates a tracked session, logs tokens, and returns the response. That's our local inference call.
### Alternatives to Ollama for serving

- **llama.cpp server** — lighter, no Python, raw HTTP. Good for single-model serving. Less convenient for model switching.
- **vLLM** — best throughput, but needs an NVIDIA GPU. Not for the M3 Mac.
- **MLX serving** — native Apple Silicon, but no OpenAI-compatible API yet. MLX is for training, not serving (our current policy).
- **llamafile** — single binary, portable. Good for distribution.

Verdict: Ollama is fine. It's the standard OpenAI-compatible local server on the Mac. The issue was never Ollama — it was bypassing the harness.

---
## 1. The Call Pattern

One function in `tasks.py` that all Huey tasks use:
```python
import json
import os
import subprocess

HERMES_BIN = "hermes"
LOCAL_ENV = {
    "HERMES_MODEL": "hermes4:14b",
    "HERMES_PROVIDER": "custom",
    "HERMES_BASE_URL": "http://localhost:11434/v1",
}


def hermes_local(prompt, caller_tag=None, max_retries=2):
    """Call hermes with the local Ollama model. Returns response text, or None.

    Every call creates a hermes session with full telemetry.
    caller_tag gets prepended to the prompt for searchability.
    """
    env = os.environ.copy()
    env.update(LOCAL_ENV)

    tagged_prompt = f"[{caller_tag}] {prompt}" if caller_tag else prompt

    for attempt in range(max_retries + 1):
        try:
            result = subprocess.run(
                [HERMES_BIN, "chat", "-q", tagged_prompt, "-Q", "-t", "none"],
                capture_output=True, text=True,
                timeout=120, env=env,
            )
            if result.returncode == 0 and result.stdout.strip():
                # Strip the session_id line from -Q output
                lines = result.stdout.strip().split("\n")
                response_lines = [l for l in lines if not l.startswith("session_id:")]
                return "\n".join(response_lines).strip()
        except subprocess.TimeoutExpired:
            if attempt == max_retries:
                return None
            continue
    return None
```
Notes:

- `-t none` disables all toolsets — the heartbeat model shouldn't have terminal/file access. Pure reasoning only.
- `-Q` quiet mode suppresses banner/spinner, gives clean output.
- Every call creates a session in the Hermes session store. Searchable, exportable, countable.
- The `[caller_tag]` prefix lets you filter sessions by which Huey task generated them: `hermes sessions list | grep heartbeat`

---
## 2. Heartbeat DECIDE Phase

Replace the hardcoded if/else with a model call:
```python
# In heartbeat_tick(), replace the DECIDE + ACT section:

# DECIDE: let hermes4:14b reason about what to do
decide_prompt = f"""System state at {now.isoformat()}:

{json.dumps(perception, indent=2)}

Previous tick: {last_tick.get('tick_id', 'none')}

You are the heartbeat monitor. Based on this state:
1. List any actions needed (alerts, restarts, escalations). Empty if all OK.
2. Rate severity: ok, warning, or critical.
3. One sentence of reasoning.

Respond ONLY with JSON:
{{"actions": [], "severity": "ok", "reasoning": "..."}}"""

decision = None
try:
    raw = hermes_local(decide_prompt, caller_tag="heartbeat_tick")
    if raw:
        # The model might wrap the JSON in markdown, so look for the
        # first line that starts a JSON object before trying to parse
        # the whole response.
        for line in raw.split("\n"):
            line = line.strip()
            if line.startswith("{"):
                decision = json.loads(line)
                break
        if not decision:
            decision = json.loads(raw)
except Exception:
    # json.JSONDecodeError, subprocess failures, anything else — fall back
    decision = None

# Fallback to hardcoded logic if the model fails or is down
if decision is None:
    actions = []
    if not perception.get("gitea_alive"):
        actions.append("ALERT: Gitea unreachable")
    health = perception.get("model_health", {})
    if isinstance(health, dict) and not health.get("ollama_running"):
        actions.append("ALERT: Ollama not running")
    decision = {
        "actions": actions,
        "severity": "fallback",
        "reasoning": "model unavailable, used hardcoded checks",
    }

tick_record["decision"] = decision
actions = decision.get("actions", [])
```
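The line-by-line scan above only catches JSON that the model emits on a single line. A slightly more tolerant extractor could be swapped in — a sketch, assuming the first `{...}` span in the response is the payload (the function name `extract_json` is illustrative, not an existing helper):

```python
import json
import re

def extract_json(raw):
    """Pull the first JSON object out of a model response.

    Tolerates markdown fences and surrounding prose; returns None
    when nothing parses.
    """
    # Grab from the first '{' to the last '}' — good enough for a
    # single-object response, even when wrapped in a ```json fence.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

raw = 'Sure:\n```json\n{"actions": [], "severity": "ok", "reasoning": "all healthy"}\n```'
print(extract_json(raw)["severity"])  # → ok
```

This would replace both the line scan and the whole-response fallback in the try block above.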
---
## 3. DPO Candidate Collection

No new database. Hermes sessions ARE the DPO candidates.

Every `hermes_local()` call creates a session. To extract DPO pairs:
```bash
# Export all local-model sessions
hermes sessions export --output /tmp/local-sessions.jsonl

# Filter for heartbeat decisions
grep "heartbeat_tick" /tmp/local-sessions.jsonl > heartbeat_decisions.jsonl
```
The existing `session_export` Huey task (runs every 4h) already extracts user→assistant pairs. It just needs to be aware that some sessions are now local-model decisions instead of human conversations.

For DPO annotation, add a simple review script:
```python
# review_decisions.py — reads heartbeat tick logs, shows model decisions,
# asks Alexander to mark chosen/rejected.
# Writes annotations back to the tick log files.

import json
from pathlib import Path

TICK_DIR = Path.home() / ".timmy" / "heartbeat"

for log_file in sorted(TICK_DIR.glob("ticks_*.jsonl")):
    for line in log_file.read_text().strip().split("\n"):
        tick = json.loads(line)
        decision = tick.get("decision", {})
        if decision.get("severity") == "fallback":
            continue  # skip fallback entries

        print(f"\n--- Tick {tick['tick_id']} ---")
        print(f"Perception: {json.dumps(tick['perception'], indent=2)}")
        print(f"Decision: {json.dumps(decision, indent=2)}")

        rating = input("Rate (c=chosen, r=rejected, s=skip): ").strip()
        if rating in ("c", "r"):
            tick["dpo_label"] = "chosen" if rating == "c" else "rejected"
            # write back... (append to annotated file)
```
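Once ticks carry `dpo_label` annotations, turning them into training records is mechanical. A minimal sketch — the output field names (`prompt`/`response`/`label`) are assumptions and should match whatever the pipeline in timmy-config/training/ expects:

```python
import json

def ticks_to_dpo(ticks):
    """Map annotated tick records to labeled DPO-style examples.

    Only ticks that went through review (i.e. have a dpo_label)
    become training data; everything else is skipped.
    """
    pairs = []
    for tick in ticks:
        if "dpo_label" not in tick:
            continue
        pairs.append({
            "prompt": json.dumps(tick["perception"]),   # what the model saw
            "response": json.dumps(tick["decision"]),   # what it decided
            "label": tick["dpo_label"],                 # "chosen" or "rejected"
        })
    return pairs

# Example: one annotated tick, one unlabeled
sample = [
    {"perception": {"gitea_alive": True}, "decision": {"severity": "ok"}, "dpo_label": "chosen"},
    {"perception": {"gitea_alive": False}, "decision": {"severity": "ok"}},
]
print(len(ticks_to_dpo(sample)))  # → 1
```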
---

## 4. Dashboard — Reads Hermes Data
```python
#!/usr/bin/env python3
"""Timmy Model Dashboard — reads from Hermes, owns nothing."""

import json
import subprocess
import sys
import time
import urllib.request
from datetime import datetime
from pathlib import Path

HERMES_HOME = Path.home() / ".hermes"
TIMMY_HOME = Path.home() / ".timmy"


def get_ollama_models():
    """What's available in Ollama."""
    try:
        req = urllib.request.Request("http://localhost:11434/api/tags")
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read()).get("models", [])
    except Exception:
        return []


def get_loaded_models():
    """What's actually in VRAM right now."""
    try:
        req = urllib.request.Request("http://localhost:11434/api/ps")
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read()).get("models", [])
    except Exception:
        return []


def get_huey_status():
    try:
        r = subprocess.run(["pgrep", "-f", "huey_consumer"],
                           capture_output=True, timeout=5)
        return r.returncode == 0
    except Exception:
        return False


def get_hermes_sessions(hours=24):
    """Read session metadata from the Hermes session store."""
    sessions_file = HERMES_HOME / "sessions" / "sessions.json"
    if not sessions_file.exists():
        return []
    try:
        data = json.loads(sessions_file.read_text())
        return list(data.values())
    except Exception:
        return []


def get_heartbeat_ticks(date_str=None):
    """Read today's heartbeat ticks."""
    if not date_str:
        date_str = datetime.now().strftime("%Y%m%d")
    tick_file = TIMMY_HOME / "heartbeat" / f"ticks_{date_str}.jsonl"
    if not tick_file.exists():
        return []
    ticks = []
    for line in tick_file.read_text().strip().split("\n"):
        try:
            ticks.append(json.loads(line))
        except Exception:
            continue
    return ticks


def render(hours=24):
    models = get_ollama_models()
    loaded = get_loaded_models()
    huey = get_huey_status()
    sessions = get_hermes_sessions(hours)
    ticks = get_heartbeat_ticks()

    loaded_names = {m.get("name", "") for m in loaded}

    print("\033[2J\033[H")
    print("=" * 70)
    print(" TIMMY MODEL DASHBOARD")
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f" {now} | Huey: {'UP' if huey else 'DOWN'} | Ollama models: {len(models)}")
    print("=" * 70)

    # LOCAL MODELS
    print("\n LOCAL MODELS")
    print(" " + "-" * 55)
    for m in models:
        name = m.get("name", "?")
        size_gb = m.get("size", 0) / 1e9
        status = "IN VRAM" if name in loaded_names else "on disk"
        print(f" {name:35s} {size_gb:5.1f}GB {status}")
    if not models:
        print(" (Ollama not responding)")

    # HERMES SESSION ACTIVITY — count sessions by origin/platform
    print("\n HERMES SESSIONS (recent)")
    print(" " + "-" * 55)
    local_sessions = [s for s in sessions
                      if "localhost" in str(s.get("origin", {}))]
    cli_sessions = [s for s in sessions
                    if s.get("platform") == "cli" or s.get("origin", {}).get("platform") == "cli"]

    total_tokens = sum(s.get("total_tokens", 0) for s in sessions)
    print(f" Total sessions: {len(sessions)}")
    print(f" CLI sessions: {len(cli_sessions)}")
    print(f" Local-model sessions: {len(local_sessions)}")
    print(f" Total tokens: {total_tokens:,}")

    # HEARTBEAT STATUS
    print(f"\n HEARTBEAT ({len(ticks)} ticks today)")
    print(" " + "-" * 55)
    if ticks:
        last = ticks[-1]
        decision = last.get("decision", {})
        severity = decision.get("severity", "unknown")
        reasoning = decision.get("reasoning", "no model decision yet")
        print(f" Last tick: {last.get('tick_id', '?')}")
        print(f" Severity: {severity}")
        print(f" Reasoning: {reasoning[:60]}")

        # Count model vs fallback decisions
        model_decisions = sum(1 for t in ticks
                              if t.get("decision", {}).get("severity") != "fallback")
        fallback = len(ticks) - model_decisions
        print(f" Model decisions: {model_decisions} | Fallback: {fallback}")

        # DPO labels if any
        labeled = sum(1 for t in ticks if "dpo_label" in t)
        if labeled:
            chosen = sum(1 for t in ticks if t.get("dpo_label") == "chosen")
            rejected = sum(1 for t in ticks if t.get("dpo_label") == "rejected")
            print(f" DPO labeled: {labeled} (chosen: {chosen}, rejected: {rejected})")
    else:
        print(" (no ticks today)")

    # ACTIVE LOOPS
    print("\n ACTIVE LOOPS USING LOCAL MODELS")
    print(" " + "-" * 55)
    print(" heartbeat_tick   10m    hermes4:14b    DECIDE phase")
    print(" (future)         15m    hermes4:14b    issue triage")
    print(" (future)         daily  timmy:v0.1     morning report")

    print("\n NON-LOCAL LOOPS (Gemini/Grok API)")
    print(" " + "-" * 55)
    print(" gemini_worker    20m    gemini-2.5-pro   aider")
    print(" grok_worker      20m    grok-3-fast      opencode")
    print(" cross_review     30m    both             PR review")

    print("\n" + "=" * 70)


if __name__ == "__main__":
    watch = "--watch" in sys.argv
    hours = 24
    for a in sys.argv[1:]:
        if a.startswith("--hours="):
            hours = int(a.split("=")[1])
    if watch:
        while True:
            render(hours)
            time.sleep(30)
    else:
        render(hours)
```
---

## 5. Implementation Steps

### Step 1: Add hermes_local() to tasks.py

- One function, ~20 lines
- Calls `hermes chat -q` with Ollama env vars
- All telemetry comes from Hermes for free

### Step 2: Wire heartbeat_tick DECIDE phase

- Replace 6 lines of if/else with a hermes_local() call
- Keep the hardcoded fallback for when the model is down
- Decision stored in the tick record for DPO review

### Step 3: Fix the MCP server warning

- The orchestration MCP server path is broken — harmless but noisy
- Either fix the path or remove it from config

### Step 4: Drop model_dashboard.py in timmy-config/bin/

- Reads the Ollama API, Hermes sessions, heartbeat ticks
- No new data stores — just views over existing ones
- `python3 model_dashboard.py --watch` for a live view

### Step 5: Expand to more Huey tasks

- triage_issues: model reads the issue, picks an agent
- good_morning_report: model writes the "From Timmy" section
- Each expansion is just calling hermes_local() with a different prompt
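The Step 5 expansions all reduce to prompt construction plus the same parse-and-fallback pattern as heartbeat_tick. A sketch of the triage prompt builder — the `build_triage_prompt` name and the issue dict shape (`number`/`title`/`body`) are assumptions, to be matched against whatever the Gitea client actually returns:

```python
def build_triage_prompt(issue):
    """Construct the triage prompt for a hermes_local() call.

    The issue dict shape is a hypothetical stand-in for the Gitea
    issue payload.
    """
    return (
        f"Issue #{issue['number']}: {issue['title']}\n\n"
        f"{issue['body']}\n\n"
        "Which agent should handle this: gemini_worker or grok_worker?\n"
        'Respond ONLY with JSON: {"agent": "...", "reasoning": "..."}'
    )

# Inside the Huey task this becomes one call per open issue:
#   raw = hermes_local(build_triage_prompt(issue), caller_tag="triage_issues")
#   ...then parse the JSON and fall back to a default agent on failure.
issue = {"number": 42, "title": "Dashboard crashes", "body": "Traceback ..."}
print(build_triage_prompt(issue).splitlines()[0])  # → Issue #42: Dashboard crashes
```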
---

## What Gets Hotfixed in Hermes Config

If `hermes insights` is broken (the `cache_read_tokens` column error), that needs a fix. The dashboard falls back to reading sessions.json directly, but insights would be the better data source.

The `providers.ollama` section in config.yaml exists but isn't wired to the `--provider` flag. Filing this upstream or patching locally would let us do `hermes chat -q "..." --provider ollama` cleanly instead of relying on env vars. Not blocking — env vars work today.
||
|
|
---

## What This Owns

- hermes_local() — a 20-line wrapper around a subprocess call
- model_dashboard.py — read-only views over existing data
- review_decisions.py — optional DPO annotation CLI

## What This Does NOT Own

- Inference. Ollama does that.
- Telemetry. Hermes does that.
- Session storage. Hermes does that.
- Token counting. Hermes does that.
- Training pipeline. Already exists in timmy-config/training/.