The Frontier Local Agenda: Technical Standards v1.0
This document defines the "Frontier Local" agenda — the technical strategy for achieving sovereign, high-performance intelligence on consumer hardware.
1. The Multi-Layered Mind (MLM)
We do not rely on a single "God Model." We use a hierarchy of local intelligence:
- Reflex Layer (Gemma 2B): Instantaneous tactical decisions, input classification, and simple acknowledgments. Latency: <100ms.
- Reasoning Layer (Hermes 14B / Llama 3 8B): General-purpose problem solving, coding, and tool use. Latency: <1s.
- Synthesis Layer (Llama 3 70B / Qwen 72B): Deep architectural planning, creative synthesis, and complex debugging. Latency: <5s.
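The layered routing above can be sketched as a small dispatch table. This is a minimal, hypothetical sketch: the model tags, keyword heuristics, and thresholds are illustrative assumptions, not part of the spec.

```python
# Hypothetical Multi-Layered Mind router. Model names and the
# complexity heuristic are illustrative stand-ins, not a fixed spec.
LAYERS = {
    "reflex":    {"model": "gemma:2b",   "latency_budget_ms": 100},
    "reasoning": {"model": "hermes:14b", "latency_budget_ms": 1000},
    "synthesis": {"model": "llama3:70b", "latency_budget_ms": 5000},
}

def route(request: str) -> str:
    """Pick a layer by a crude complexity heuristic (keywords + length)."""
    words = request.lower().split()
    if any(k in words for k in ("architecture", "design", "debug")):
        return "synthesis"
    if len(words) > 8 or any(k in words for k in ("code", "write", "explain")):
        return "reasoning"
    return "reflex"  # short, simple inputs stay on the fast path
```

In practice the routing decision itself would come from the Reflex Layer (see the "Gemma Scout" protocol below), with a heuristic like this as a fallback.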
2. Local-First RAG (Retrieval Augmented Generation)
Sovereignty requires that your memories stay on your disk.
- Embedding: Use nomic-embed-text or all-minilm locally via Ollama.
- Vector Store: Use a local instance of ChromaDB or LanceDB.
- Privacy: Zero data leaves the local network for indexing or retrieval.
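The retrieval loop reduces to: embed locally, store locally, rank by cosine similarity. The sketch below keeps everything in process with a toy bag-of-characters embedding standing in for nomic-embed-text, and a plain list standing in for ChromaDB/LanceDB; both substitutions are assumptions for illustration only.

```python
import math

def embed(text: str) -> list[float]:
    """Toy bag-of-characters embedding (stand-in for nomic-embed-text)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class LocalStore:
    """Minimal in-memory store (stand-in for ChromaDB/LanceDB).
    Nothing ever leaves the process: sovereign by construction."""
    def __init__(self):
        self.docs = []  # (text, vector) pairs

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def query(self, text: str, k: int = 1) -> list[str]:
        qv = embed(text)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [t for t, _ in ranked[:k]]
```

Swapping in real components means calling Ollama's local embedding endpoint instead of `embed()` and a ChromaDB collection instead of `LocalStore`; the privacy property holds as long as both run on the local network.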
3. Speculative Decoding
Where supported by the harness (e.g., llama.cpp), use Gemma 2B as a draft model for larger Hermes/Llama models to achieve 2x-3x speedups in token generation.
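With llama.cpp, pairing a draft and target model is a matter of CLI flags. A sketch of the invocation is below; the GGUF file names are hypothetical, and the exact flag names vary across llama.cpp versions, so check `llama-cli --help` for your build.

```shell
# Speculative decoding sketch: Gemma 2B drafts tokens, Hermes 14B verifies.
# File names are placeholders; flag spellings differ between versions.
llama-cli -m hermes-14b.Q4_K_M.gguf \
          -md gemma-2b.Q4_K_M.gguf \
          -p "Explain speculative decoding in one paragraph."
```

The speedup comes from the target model verifying several drafted tokens in one forward pass, so gains are largest when the draft model agrees with the target often.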
4. The "Gemma Scout" Protocol
Gemma 2B is our "Scout." It pre-processes every user request to:
- Detect PII (Personally Identifiable Information) for redaction.
- Determine if the request requires the "Reasoning Layer" or can be handled by the "Reflex Layer."
- Extract keywords for local memory retrieval.
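The three Scout duties can be sketched as a single pre-processing pass. In the real pipeline Gemma 2B performs these steps; here plain regexes and word heuristics stand in, and the PII pattern, stopword list, and routing threshold are all illustrative assumptions.

```python
import re

# Hypothetical "Gemma Scout" pre-processor. Regexes and heuristics are
# stand-ins for the Gemma 2B model's actual classifications.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
STOPWORDS = {"the", "a", "an", "to", "my", "please", "is", "of", "and"}

def scout(request: str) -> dict:
    # 1. Redact PII (email addresses only, for brevity).
    redacted = EMAIL_RE.sub("[REDACTED_EMAIL]", request)
    # 2. Extract keywords for local memory retrieval.
    words = re.findall(r"[a-z0-9']+", redacted.lower())
    keywords = [w for w in words if w not in STOPWORDS and len(w) > 3]
    # 3. Route: richer requests escalate to the Reasoning Layer.
    layer = "reasoning" if len(keywords) > 2 else "reflex"
    return {"redacted": redacted, "layer": layer, "keywords": keywords}
```

Because the Scout runs on the Reflex Layer's sub-100ms budget, every request gets scrubbed and routed before a larger model ever sees it.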
Intelligence is a utility. Sovereignty is a right. The Frontier is Local.