The Frontier Local Agenda: Technical Standards v1.0
This document defines the "Frontier Local" agenda — the technical strategy for achieving sovereign, high-performance intelligence on consumer hardware.
1. The Multi-Layered Mind (MLM)
We do not rely on a single "God Model." We use a hierarchy of local intelligence:
- Reflex Layer (Gemma 2B): Instantaneous tactical decisions, input classification, and simple acknowledgments. Latency: <100ms.
- Reasoning Layer (Hermes 14B / Llama 3 8B): General-purpose problem solving, coding, and tool use. Latency: <1s.
- Synthesis Layer (Llama 3 70B / Qwen 72B): Deep architectural planning, creative synthesis, and complex debugging. Latency: <5s.
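The layered routing above can be sketched as a small dispatch table. This is a minimal, hypothetical sketch: the model tags, keyword heuristics, and thresholds are illustrative assumptions, not part of the spec.

```python
# Hypothetical Multi-Layered Mind router. Model names and the
# complexity heuristic are illustrative stand-ins, not a fixed spec.
LAYERS = {
    "reflex":    {"model": "gemma:2b",   "latency_budget_ms": 100},
    "reasoning": {"model": "hermes:14b", "latency_budget_ms": 1000},
    "synthesis": {"model": "llama3:70b", "latency_budget_ms": 5000},
}

def route(request: str) -> str:
    """Pick a layer by a crude complexity heuristic (keywords + length)."""
    words = request.lower().split()
    if any(k in words for k in ("architecture", "design", "debug")):
        return "synthesis"
    if len(words) > 8 or any(k in words for k in ("code", "write", "explain")):
        return "reasoning"
    return "reflex"  # short, simple inputs stay on the fast path
```

In practice the routing decision itself would come from the Reflex Layer (see the "Gemma Scout" protocol below), with a heuristic like this as a fallback.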
2. Local-First RAG (Retrieval Augmented Generation)
Sovereignty requires that your memories stay on your disk.
- Embedding: Use nomic-embed-text or all-minilm locally via Ollama.
- Vector Store: Use a local instance of ChromaDB or LanceDB.
- Privacy: Zero data leaves the local network for indexing or retrieval.
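The retrieval loop reduces to: embed locally, store locally, rank by cosine similarity. The sketch below keeps everything in process with a toy bag-of-characters embedding standing in for nomic-embed-text, and a plain list standing in for ChromaDB/LanceDB; both substitutions are assumptions for illustration only.

```python
import math

def embed(text: str) -> list[float]:
    """Toy bag-of-characters embedding (stand-in for nomic-embed-text)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class LocalStore:
    """Minimal in-memory store (stand-in for ChromaDB/LanceDB).
    Nothing ever leaves the process: sovereign by construction."""
    def __init__(self):
        self.docs = []  # (text, vector) pairs

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def query(self, text: str, k: int = 1) -> list[str]:
        qv = embed(text)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [t for t, _ in ranked[:k]]
```

Swapping in real components means calling Ollama's local embedding endpoint instead of `embed()` and a ChromaDB collection instead of `LocalStore`; the privacy property holds as long as both run on the local network.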
3. Speculative Decoding
Where supported by the harness (e.g., llama.cpp), use Gemma 2B as a draft model for larger Hermes/Llama models to achieve 2x-3x speedups in token generation.
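With llama.cpp, pairing a draft and target model is a matter of CLI flags. A sketch of the invocation is below; the GGUF file names are hypothetical, and the exact flag names vary across llama.cpp versions, so check `llama-cli --help` for your build.

```shell
# Speculative decoding sketch: Gemma 2B drafts tokens, Hermes 14B verifies.
# File names are placeholders; flag spellings differ between versions.
llama-cli -m hermes-14b.Q4_K_M.gguf \
          -md gemma-2b.Q4_K_M.gguf \
          -p "Explain speculative decoding in one paragraph."
```

The speedup comes from the target model verifying several drafted tokens in one forward pass, so gains are largest when the draft model agrees with the target often.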
4. The "Gemma Scout" Protocol
Gemma 2B is our "Scout." It pre-processes every user request to:
- Detect PII (Personally Identifiable Information) for redaction.
- Determine if the request requires the "Reasoning Layer" or can be handled by the "Reflex Layer."
- Extract keywords for local memory retrieval.
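The three Scout duties can be sketched as a single pre-processing pass. In the real pipeline Gemma 2B performs these steps; here plain regexes and word heuristics stand in, and the PII pattern, stopword list, and routing threshold are all illustrative assumptions.

```python
import re

# Hypothetical "Gemma Scout" pre-processor. Regexes and heuristics are
# stand-ins for the Gemma 2B model's actual classifications.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
STOPWORDS = {"the", "a", "an", "to", "my", "please", "is", "of", "and"}

def scout(request: str) -> dict:
    # 1. Redact PII (email addresses only, for brevity).
    redacted = EMAIL_RE.sub("[REDACTED_EMAIL]", request)
    # 2. Extract keywords for local memory retrieval.
    words = re.findall(r"[a-z0-9']+", redacted.lower())
    keywords = [w for w in words if w not in STOPWORDS and len(w) > 3]
    # 3. Route: richer requests escalate to the Reasoning Layer.
    layer = "reasoning" if len(keywords) > 2 else "reflex"
    return {"redacted": redacted, "layer": layer, "keywords": keywords}
```

Because the Scout runs on the Reflex Layer's sub-100ms budget, every request gets scrubbed and routed before a larger model ever sees it.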
Intelligence is a utility. Sovereignty is a right. The Frontier is Local.