feat: Frontier Local Agenda — Gemma Scout & Local RAG (#227)

Co-authored-by: Google AI Agent <gemini@hermes.local>
Co-committed-by: Google AI Agent <gemini@hermes.local>
2026-04-05 21:38:56 +00:00
parent 2b308f300a
commit 317140efcf
2 changed files with 38 additions and 3 deletions
--- a/FRONTIER_LOCAL.md
+++ b/FRONTIER_LOCAL.md
@@ -0,0 +1,30 @@
+
+# The Frontier Local Agenda: Technical Standards v1.0
+
+This document defines the "Frontier Local" agenda: the technical strategy for running sovereign, high-performance intelligence on consumer hardware.
+
+## 1. The Multi-Layered Mind (MLM)
+We do not rely on a single "God Model." We use a hierarchy of local intelligence:
+
+- **Reflex Layer (Gemma 2B):** Near-instant tactical decisions, input classification, and simple acknowledgments. Target latency: <100 ms.
+- **Reasoning Layer (Hermes 14B / Llama 3 8B):** General-purpose problem solving, coding, and tool use. Target latency: <1 s.
+- **Synthesis Layer (Llama 3 70B / Qwen 72B):** Deep architectural planning, creative synthesis, and complex debugging. Target latency: <5 s.
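
The hierarchy above can be sketched as a small lookup table. This is a minimal illustration, not part of the commit: the `Layer` class, the `LAYERS` tuple, and `most_capable_within` are hypothetical names, and the latency figures are the targets listed above rather than measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    models: tuple[str, ...]
    latency_target_s: float  # target from the list above, not a benchmark


# Ordered from cheapest to most capable (hypothetical table).
LAYERS = (
    Layer("reflex", ("Gemma 2B",), 0.1),
    Layer("reasoning", ("Hermes 14B", "Llama 3 8B"), 1.0),
    Layer("synthesis", ("Llama 3 70B", "Qwen 72B"), 5.0),
)


def most_capable_within(budget_s: float) -> Layer:
    """Return the most capable layer whose latency target fits the budget."""
    fitting = [layer for layer in LAYERS if layer.latency_target_s <= budget_s]
    if not fitting:
        raise ValueError(f"no layer meets a {budget_s}s latency budget")
    return fitting[-1]
```

For example, `most_capable_within(1.0)` selects the reasoning layer, while a 0.5 s budget falls back to the reflex layer.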
+
+## 2. Local-First RAG (Retrieval-Augmented Generation)
+Sovereignty requires that your memories stay on your disk.
+
+- **Embedding:** Use `nomic-embed-text` or `all-minilm` locally via Ollama.
+- **Vector Store:** Use a local instance of ChromaDB or LanceDB.
+- **Privacy:** Zero data leaves the local network for indexing or retrieval.
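
A rough sketch of the retrieval path described above, under stated assumptions: `ollama_embed` calls Ollama's native `/api/embeddings` endpoint on the default local port (the config in this commit points at the OpenAI-compatible `/v1` base URL instead), and `LocalMemory` is a hypothetical in-memory stand-in for ChromaDB or LanceDB, used here only to keep the example self-contained.

```python
import json
import math
import urllib.request
from typing import Callable, Sequence

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def ollama_embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Embed text with a local Ollama server; nothing leaves the machine."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LocalMemory:
    """Hypothetical in-memory stand-in for a local vector store."""

    def __init__(self, embed: Callable[[str], Sequence[float]]):
        self._embed = embed
        self._items: list[tuple[str, Sequence[float]]] = []

    def add(self, text: str) -> None:
        self._items.append((text, self._embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = self._embed(query)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

With a running Ollama server, `LocalMemory(ollama_embed)` would index and query entirely on the local machine; any other embedding callable can be swapped in for testing.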
+
+## 3. Speculative Decoding
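+Where supported by the harness (e.g., llama.cpp), use Gemma 2B as a draft model for the larger Hermes/Llama models, targeting 2x-3x faster token generation. The draft and target models generally need compatible tokenizers, and the actual speedup depends on how often the target model accepts the draft's tokens.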
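
To make the idea concrete, here is a toy sketch of greedy speculative decoding. It is not how llama.cpp implements it: the "models" are plain Python callables (`target`, `draft`) that map a token context to the next token id, and verification runs one position at a time, whereas a real harness checks all drafted positions in a single batched forward pass and can also emit a bonus token when the whole draft is accepted.

```python
from typing import Callable, List

# A greedy "model": takes the token context, returns the next token id.
NextToken = Callable[[List[int]], int]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new: int,
    k: int = 4,
) -> List[int]:
    """Generate max_new tokens; output matches target-only greedy decoding."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, goal - len(out))):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2. The target verifies the proposal position by position.
        for token in proposal:
            expected = target(out)
            out.append(expected)  # always keep the target's own choice
            if expected != token:
                break  # first mismatch: discard the rest of the draft
    return out
```

Because every emitted token is the target's own choice, quality is unchanged; the gain comes only from how many drafted tokens are accepted per verification step.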
+
+## 4. The "Gemma Scout" Protocol
+Gemma 2B is our "Scout." It pre-processes every user request to:
+1. Detect PII (Personally Identifiable Information) for redaction.
+2. Decide whether the request needs the "Reasoning Layer" or can be handled by the "Reflex Layer."
+3. Extract keywords for local memory retrieval.
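
The three steps above could be prototyped as follows. This is a heuristic stand-in, not the protocol itself: in the agenda Gemma 2B performs these steps, whereas the sketch uses regular expressions and a stopword list so it runs without a model. The `scout` function, `ScoutReport`, and the regex patterns are hypothetical, and reusing the `max_simple_chars` / `max_simple_words` values from `config.yaml` as the routing rule is an assumption.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
PLACEHOLDER_RE = re.compile(r"\[(?:EMAIL|PHONE)\]")
STOPWORDS = frozenset({"the", "and", "for", "with", "about", "that", "this", "from"})

# Assumption: thresholds mirror smart_model_routing in config.yaml.
MAX_SIMPLE_CHARS = 400
MAX_SIMPLE_WORDS = 75


@dataclass
class ScoutReport:
    redacted: str
    has_pii: bool
    layer: str
    keywords: list[str] = field(default_factory=list)


def scout(request: str) -> ScoutReport:
    # Step 1: detect and redact PII.
    redacted = PHONE_RE.sub("[PHONE]", EMAIL_RE.sub("[EMAIL]", request))
    has_pii = redacted != request
    # Step 2: short requests stay on the reflex layer, the rest escalate.
    simple = (
        len(request) <= MAX_SIMPLE_CHARS
        and len(request.split()) <= MAX_SIMPLE_WORDS
    )
    layer = "reflex" if simple else "reasoning"
    # Step 3: extract unique keywords (order preserved) for memory retrieval.
    text = PLACEHOLDER_RE.sub(" ", redacted).lower()
    words = re.findall(r"[a-z][a-z0-9_-]{2,}", text)
    keywords = list(dict.fromkeys(w for w in words if w not in STOPWORDS))
    return ScoutReport(redacted, has_pii, layer, keywords)
```

Running the redaction before keyword extraction keeps PII out of the retrieval query as well as out of the prompt passed to later layers.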
+
+---
+*Intelligence is a utility. Sovereignty is a right. The Frontier is Local.*
--- a/config.yaml
+++ b/config.yaml
@@ -20,7 +20,12 @@ terminal:
  modal_image: nikolaik/python-nodejs:python3.11-nodejs20
  daytona_image: nikolaik/python-nodejs:python3.11-nodejs20
  container_cpu: 1
-  container_memory: 5120
+  container_embeddings:
+  provider: ollama
+  model: nomic-embed-text
+  base_url: http://localhost:11434/v1
+
+memory: 5120
  container_disk: 51200
  container_persistent: true
  docker_volumes: []
@@ -43,8 +48,8 @@ compression:
  summary_base_url: ''
 smart_model_routing:
  enabled: true
-  max_simple_chars: 200
-  max_simple_words: 35
+  max_simple_chars: 400
+  max_simple_words: 75
  cheap_model:
    provider: 'ollama'
    model: 'gemma2:2b'