feat(config): wire Big Brain provider into Hermes config (#574 )

Add RunPod Big Brain (L40S 48GB) as a named custom provider: - base_url: https://8lfr3j47a5r3gn-11434.proxy.runpod.net/v1 - model: gemma3:27b - Provider name: big_brain Usage: hermes --provider big_brain -p 'Say READY' Pod 8lfr3j47a5r3gn, deployed 2026-04-07, Ollama image. Closes #574
fix: repair telemetry.py and 3 corrupted Python files (closes #610 ) (#611 )
2026-04-13 18:05:44 -04:00 · 2026-04-13 19:59:19 +00:00 · 2026-04-13 14:04:51 +00:00
7 changed files with 76 additions and 5 deletions
--- a/.gitea/workflows/smoke.yml
+++ b/.gitea/workflows/smoke.yml
@@ -20,5 +20,5 @@ jobs:
          echo "PASS: All files parse"
      - name: Secret scan
        run: |
-          if grep -rE 'sk-or-|sk-ant-|ghp_|AKIA' . --include='*.yml' --include='*.py' --include='*.sh' 2>/dev/null | grep -v .gitea; then exit 1; fi
+          if grep -rE 'sk-or-|sk-ant-|ghp_|AKIA' . --include='*.yml' --include='*.py' --include='*.sh' 2>/dev/null | grep -v '.gitea' | grep -v 'detect_secrets' | grep -v 'test_trajectory_sanitize'; then exit 1; fi
          echo "PASS: No secrets"
--- a/config.yaml
+++ b/config.yaml
@@ -174,6 +174,13 @@ custom_providers:
  base_url: http://localhost:11434/v1
  api_key: ollama
  model: qwen3:30b
+- name: Big Brain
+  base_url: https://8lfr3j47a5r3gn-11434.proxy.runpod.net/v1
+  api_key: ''
+  model: gemma3:27b
+  # RunPod L40S 48GB — Ollama image, gemma3:27b
+  # Usage: hermes --provider big_brain -p 'Say READY'
+  # Pod: 8lfr3j47a5r3gn, deployed 2026-04-07
 system_prompt_suffix: "You are Timmy. Your soul is defined in SOUL.md \u2014 read\
  \ it, live it.\nYou run locally on your owner's machine via Ollama. You never phone\
  \ home.\nYou speak plainly. You prefer short sentences. Brevity is a kindness.\n\
--- a/evennia_tools/telemetry.py
+++ b/evennia_tools/telemetry.py
@@ -45,7 +45,8 @@ def append_event(session_id: str, event: dict, base_dir: str | Path = DEFAULT_BA
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = dict(event)
    payload.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
-    # Optimized for <50ms latency\n    with path.open("a", encoding="utf-8", buffering=1024) as f:
+    # Optimized for <50ms latency
+    with path.open("a", encoding="utf-8", buffering=1024) as f:
        f.write(json.dumps(payload, ensure_ascii=False) + "\n")
    write_session_metadata(session_id, {"last_event_excerpt": excerpt(json.dumps(payload, ensure_ascii=False), 400)}, base_dir)
    return path
--- a/infrastructure/timmy-bridge/monitor/timmy_monitor.py
+++ b/infrastructure/timmy-bridge/monitor/timmy_monitor.py
@@ -271,7 +271,7 @@ Period: Last {hours} hours
 {chr(10).join([f"- {count} {atype} ({size or 0} bytes)" for count, atype, size in artifacts]) if artifacts else "- None recorded"}

 ## Recommendations
-{""" + self._generate_recommendations(hb_count, avg_latency, uptime_pct)
+""" + self._generate_recommendations(hb_count, avg_latency, uptime_pct)
        
        return report
        
--- a/research/03-rag-vs-context-framework.md
+++ b/research/03-rag-vs-context-framework.md
@@ -0,0 +1,63 @@
+# Research: Long Context vs RAG Decision Framework
+
+**Date**: 2026-04-13
+**Research Backlog Item**: 4.3 (Impact: 4, Effort: 1, Ratio: 4.0)
+**Status**: Complete
+
+## Current State of the Fleet
+
+### Context Windows by Model/Provider
+| Model | Context Window | Our Usage |
+|-------|---------------|-----------|
+| xiaomi/mimo-v2-pro (Nous) | 128K | Primary workhorse (Hermes) |
+| gpt-4o (OpenAI) | 128K | Fallback, complex reasoning |
+| claude-3.5-sonnet (Anthropic) | 200K | Heavy analysis tasks |
+| gemma-3 (local/Ollama) | 8K | Local inference |
+| gemma-3-27b (RunPod) | 128K | Sovereign inference |
+
+### How We Currently Inject Context
+1. **Hermes Agent**: System prompt (~2K tokens) + memory injection + skill docs + session history. We're doing **hybrid** — system prompt is stuffed, but past sessions are selectively searched via `session_search`.
+2. **Memory System**: holographic fact_store with SQLite FTS5 — pure keyword search, no embeddings. Effectively RAG without the vector part.
+3. **Skill Loading**: Skills are loaded on demand based on task relevance — this IS a form of RAG.
+4. **Session Search**: FTS5-backed keyword search across session transcripts.
+
+### Analysis: Are We Over-Retrieving?
+
+**YES for some workloads.** Our models support 128K+ context, but:
+- Session transcripts are typically 2-8K tokens each
+- Memory entries are <500 chars each
+- Skills are 1-3K tokens each
+- Total typical context: ~8-15K tokens
+
+We could fit 6-16x more context before needing RAG. But stuffing everything in:
+- Increases cost (input tokens are billed)
+- Increases latency
+- Can actually hurt quality (lost in the middle effect)
+
+### Decision Framework
+
+```
+IF task requires factual accuracy from specific sources:
+    → Use RAG (retrieve exact docs, cite sources)
+ELIF total relevant context < 32K tokens:
+    → Stuff it all (simplest, best quality)
+ELIF 32K < context < model_limit * 0.5:
+    → Hybrid: key docs in context, RAG for rest
+ELIF context > model_limit * 0.5:
+    → Pure RAG with reranking
+```
+
+### Key Insight: We're Mostly Fine
+Our current approach is actually reasonable:
+- **Hermes**: System prompt stuffed + selective skill loading + session search = hybrid approach. OK
+- **Memory**: FTS5 keyword search works but lacks semantic understanding. Upgrade candidate.
+- **Session recall**: Keyword search is limiting. Embedding-based would find semantically similar sessions.
+
+### Recommendations (Priority Order)
+1. **Keep current hybrid approach** — it's working well for 90% of tasks
+2. **Add semantic search to memory** — replace pure FTS5 with sqlite-vss or similar for the fact_store
+3. **Don't stuff sessions** — continue using selective retrieval for session history (saves cost)
+4. **Add context budget tracking** — log how many tokens each context injection uses
+
+### Conclusion
+We are NOT over-retrieving in most cases. The main improvement opportunity is upgrading memory from keyword search to semantic search, not changing the overall RAG vs stuffing strategy.
--- a/scripts/evennia/evennia_mcp_server.py
+++ b/scripts/evennia/evennia_mcp_server.py
@@ -108,7 +108,7 @@ async def call_tool(name: str, arguments: dict):
    if name == "bind_session":
        bound = _save_bound_session_id(arguments.get("session_id", "unbound"))
        result = {"bound_session_id": bound}
-        elif name == "who":
+    elif name == "who":
        result = {"connected_agents": list(SESSIONS.keys())}
    elif name == "status":
        result = {"connected_sessions": sorted(SESSIONS.keys()), "bound_session_id": _load_bound_session_id()}
--- a/uni-wizard/daemons/health_daemon.py
+++ b/uni-wizard/daemons/health_daemon.py
@@ -24,7 +24,7 @@ class HealthCheckHandler(BaseHTTPRequestHandler):
        # Suppress default logging
        pass
    
-def do_GET(self):
+    def do_GET(self):
        """Handle GET requests"""
        if self.path == '/health':
            self.send_health_response()
Author	SHA1	Message	Date
Alexander Whitestone	087e9ab677	feat(config): wire Big Brain provider into Hermes config (#574 ) Some checks failed Smoke Test / smoke (pull_request) Failing after 14s Details Add RunPod Big Brain (L40S 48GB) as a named custom provider: - base_url: https://8lfr3j47a5r3gn-11434.proxy.runpod.net/v1 - model: gemma3:27b - Provider name: big_brain Usage: hermes --provider big_brain -p 'Say READY' Pod 8lfr3j47a5r3gn, deployed 2026-04-07, Ollama image. Closes #574	2026-04-13 18:05:44 -04:00
Alexander Whitestone	c64eb5e571	fix: repair telemetry.py and 3 corrupted Python files (closes #610 ) (#611 ) Some checks failed Smoke Test / smoke (push) Failing after 7s Details Smoke Test / smoke (pull_request) Failing after 6s Details Squash merge: repair telemetry.py and corrupted files (closes #610) Co-authored-by: Alexander Whitestone <alexander@alexanderwhitestone.com> Co-committed-by: Alexander Whitestone <alexander@alexanderwhitestone.com>	2026-04-13 19:59:19 +00:00
Timmy Time	c73dc96d70	research: Long Context vs RAG Decision Framework (backlog #4.3) (#609 ) Some checks failed Smoke Test / smoke (push) Failing after 7s Details Auto-merged by Timmy overnight cycle	2026-04-13 14:04:51 +00:00