# THE SEED ARCHITECTURE EPIC

Hermes Agent → Claw Code Harness → Gemma 4

## Vision

A new architectural paradigm in which Hermes Agents are reduced to a thin dispatch layer — stripped of their own intelligence, they become pure conductors. All reasoning flows through the Claw Code harness to Gemma 4, running locally as the single source of truth.
This is the seed. Gemma 4 cannot be made smaller without becoming rubber — useless. The only path to greater intelligence is to make it bigger. We plant this seed now.
## Core Principle

**Hermes Agent = Fit Layer**

It does not think. It routes. It sends everything to the Claw Code harness.

## The Stack (Bottom to Top)
```
┌────────────────────────────────────────────────────────┐
│ USER INTERFACE (Telegram, CLI, Web)                    │
├────────────────────────────────────────────────────────┤
│ HERMES AGENT — The Fit Layer                           │
│   ├─ Receives input                                    │
│   ├─ Sends to Claw Code harness (no local reasoning)   │
│   ├─ Returns output to user                            │
│   └─ NO INTELLIGENCE LAYER — Pure dispatch             │
├────────────────────────────────────────────────────────┤
│ CLAW CODE HARNESS — The Runtime                        │
│   ├─ Tool registry and dispatch                        │
│   ├─ Context management                                │
│   ├─ Session persistence                               │
│   └─ Routes to appropriate backend                     │
├────────────────────────────────────────────────────────┤
│ GEMMA 4 — The Seed (Local, Multimodal, 26B MoE)        │
│   ├─ All reasoning happens here                        │
│   ├─ 26B quality, 4B active speed                      │
│   ├─ Apache 2.0 — truly open                           │
│   └─ Cannot shrink without getting rubber              │
└────────────────────────────────────────────────────────┘
```
## Why This Architecture?

### 1. The Gemma 4 Insight

Gemma 4 is the inflection point:

- Cannot be smaller without quality loss — rubber threshold hit
- Cannot be smarter without scale — MoE is the path
- 26B MoE (4B active) = Quality of giant, speed of small
- Apache 2.0 = No chains, no vendor, no cloud dependency
### 2. The Hermes Agent Transformation

- Old: Hermes Agent = Full intelligence layer (Claude, GPT, etc.)
- New: Hermes Agent = Fit layer — minimal, pure dispatch

Benefits:

- Predictable: Same model, same behavior every time
- Sovereign: No cloud calls, no API keys, no rate limits
- Fast: Local inference, no network latency
- Honest: Gemma 4 doesn't roleplay — it answers
### 3. The Claw Code Harness Role

The harness becomes the orchestration layer:

- Tool registry (what can be done)
- Context window management
- Multi-turn conversation state
- Backend selection (Gemma 4 variants)
- Fallback handling
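
Of these roles, the tool registry is the easiest to sketch. The Python below is illustrative only: a hypothetical in-process, decorator-based registry with one example tool. The real harness is configured to load tools from a directory (see the config further down), so names like `register_tool` and `run_tool` are assumptions, not a fixed interface.

```python
from typing import Callable, Dict

# Hypothetical in-process registry; the actual harness auto-loads tools from disk.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {}


def register_tool(name: str):
    """Decorator that adds a callable to the registry under the given name."""
    def wrap(fn: Callable[..., str]) -> Callable[..., str]:
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap


@register_tool("read_file")
def read_file(path: str) -> str:
    """Example tool: return the contents of a local file."""
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read()


def run_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call requested by the model; unknown names fail loudly."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return TOOL_REGISTRY[name](**kwargs)
```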
## Component Specifications

### Hermes Agent (Fit Layer)

```yaml
# Hermes Agent config — MINIMAL
profile:
  name: allegro-fit
  type: dispatcher

intelligence:
  enabled: false            # NO local reasoning

dispatch:
  target: claw-code-harness
  timeout: 30s
  retry: 3

tools:
  allowed: all              # Pass through to harness
  local: []                 # No local tool execution
```
Responsibilities:
- Receive user input
- Send to Claw Code harness
- Return harness output to user
- Handle errors gracefully
- NO DECISIONS — just dispatch
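
To make the dispatch-only contract concrete, here is a minimal Python sketch of the fit layer. The HTTP endpoint (`http://localhost:8700/dispatch`), the JSON field names, and the use of the `requests` library are assumptions for illustration; only the timeout and retry values are taken from the config above.

```python
import time

import requests  # assumed installed; any HTTP client would do

HARNESS_URL = "http://localhost:8700/dispatch"  # hypothetical harness endpoint
TIMEOUT_S = 30   # mirrors dispatch.timeout
RETRIES = 3      # mirrors dispatch.retry


def dispatch(user_input: str, session_id: str) -> str:
    """Forward raw user input to the Claw Code harness. No local reasoning."""
    last_error = None
    for attempt in range(RETRIES):
        try:
            resp = requests.post(
                HARNESS_URL,
                json={"input": user_input, "session": session_id},
                timeout=TIMEOUT_S,
            )
            resp.raise_for_status()
            return resp.json()["output"]  # return harness output verbatim
        except requests.RequestException as err:
            last_error = err
            time.sleep(2 ** attempt)  # simple backoff before retrying
    # Graceful degradation: report the failure, never improvise an answer locally.
    return f"Harness unavailable after {RETRIES} attempts: {last_error}"
```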
### Claw Code Harness

```yaml
# Claw Code harness config
harness:
  version: 2.0

backend:
  primary: gemma4-26b-moe
  variants:
    - gemma4-2b        # Ultra-edge tasks
    - gemma4-4b        # Standard tasks
    - gemma4-31b       # Complex reasoning
    - gemma4-26b-moe   # Default (efficiency)

routing:
  logic: |
    if task.complexity > 0.8: use gemma4-31b
    elif task.speed_critical: use gemma4-4b
    elif task.audio_required: use gemma4-2b
    else: use gemma4-26b-moe

tools:
  registry: /etc/claw/tools/
  auto_load: true

context:
  window: 8192
  summarization: auto
```
Responsibilities:
- Receive dispatch from Hermes Agent
- Route to appropriate Gemma 4 variant
- Manage tool registry
- Handle context windows
- Return formatted response
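
As a sketch of the `routing.logic` rules above, backend selection could look like the following Python function. The `Task` fields and the complexity score (0.0 to 1.0) are illustrative assumptions about what the harness would compute upstream, not a settled interface.

```python
from dataclasses import dataclass


@dataclass
class Task:
    complexity: float            # assumed 0.0 to 1.0 estimate from the harness
    speed_critical: bool = False
    audio_required: bool = False


def select_backend(task: Task) -> str:
    """Mirror routing.logic: prefer the 26B MoE unless a rule overrides it."""
    if task.complexity > 0.8:
        return "gemma4-31b"       # complex reasoning
    if task.speed_critical:
        return "gemma4-4b"        # speed-critical tasks
    if task.audio_required:
        return "gemma4-2b"        # ultra-edge tasks
    return "gemma4-26b-moe"       # default: efficiency


# Quick sanity checks of the rule order.
assert select_backend(Task(complexity=0.9)) == "gemma4-31b"
assert select_backend(Task(complexity=0.3)) == "gemma4-26b-moe"
```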
### Gemma 4 (The Seed)

```yaml
# Gemma 4 local deployment
gemma4:
  default: 26b-moe
  models:
    2b:
      path: /opt/models/gemma-4-2b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
    4b:
      path: /opt/models/gemma-4-4b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
    31b:
      path: /opt/models/gemma-4-31b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
    26b-moe:
      path: /opt/models/gemma-4-26b-moe-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      active_params: 4b
  server:
    backend: llama.cpp       # or vLLM for production
    host: localhost
    port: 8645
    threads: 8
```
Responsibilities:
- ALL REASONING — The only intelligence layer
- Tool use via function calling
- Multimodal understanding
- Code generation
- Long context handling
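
For illustration, a single request to the server above might look like the sketch below. It assumes the llama.cpp server is running on port 8645 and exposing its OpenAI-compatible `/v1/chat/completions` route; the prompt, sampling parameters, and helper name are placeholders.

```python
import requests  # any HTTP client works; requests is used for brevity

GEMMA_URL = "http://localhost:8645/v1/chat/completions"  # llama.cpp OpenAI-compatible route


def ask_gemma(prompt: str, model: str = "gemma4-26b-moe") -> str:
    """Send a single-turn chat request to the locally served Gemma 4 model."""
    payload = {
        "model": model,  # chosen by the harness routing; a single-model server may ignore it
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "max_tokens": 512,
    }
    resp = requests.post(GEMMA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask_gemma("Summarize the fit-layer contract in one sentence."))
```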
## The Skill Prototype

The Gemma 4 Hermes Profile (`~/.hermes/profiles/gemma4/`) is the skill prototype for this seed:
- ✅ All 4 variants configured
- ✅ All 6 platforms supported
- ✅ 3 complete skills (inference, multimodal, fine-tuning)
- ✅ Production tooling (download, benchmark, deploy)
- ✅ Apache 2.0 — truly open
This profile becomes the reference implementation for all Archons.
## Deployment Architecture

### Per-Archon Setup

```
Each Archon (Ezra, Electra, Allegro, etc.):
├── Hermes Agent (fit layer)
│   └── ~/.hermes/profiles/{archon}/
│       └── config.yaml → dispatch to Claw Code
├── Claw Code Harness
│   └── /opt/claw/{archon}/
│       └── runtime.yaml → route to Gemma 4
└── Gemma 4 Seed
    └── Shared local instance OR dedicated GPU slice
```
### Shared Infrastructure

```
House Infrastructure:
├── Gemma 4 Server (local)
│   ├── 26B MoE (default)
│   ├── 31B Dense (complex tasks)
│   └── 4B (fast tasks)
├── Gitea (code, issues, PRs)
├── Nostr (sovereign messaging)
└── Backblaze B2 (encrypted backup)
```
## Gantt Chart: Implementation

```
Phase 1: Foundation (Week 1-2)
├── Day 1-3: Gemma 4 server deployment
│   ├── Download 26B MoE, 31B, 4B GGUFs
│   ├── llama.cpp server setup
│   └── vLLM production config
├── Day 4-7: Claw Code harness v2
│   ├── Remove old intelligence layer
│   ├── Add Gemma 4 routing
│   └── Tool registry refactor
└── Day 8-14: Hermes Agent fit layer
    ├── Strip local reasoning
    ├── Dispatch-only mode
    └── Error handling

Phase 2: Archon Deployment (Week 3-4)
├── Day 15-17: Ezra Archon
├── Day 18-20: Electra Archon
├── Day 21-23: Allegro Archon migration
└── Day 24-28: Testing & hardening

Phase 3: Integration (Week 5-6)
├── Day 29-32: Gitea webhook automation
├── Day 33-35: Telegram bot integration
├── Day 36-38: Nostr bridge
└── Day 39-42: Backup & monitoring
```
## Acceptance Criteria
| ID | Criteria | Test |
|---|---|---|
| A1 | Gemma 4 26B MoE serves locally at >20 tok/s | Benchmark passed |
| A2 | Hermes Agent has NO local intelligence | Config audit |
| A3 | All queries route through Claw Code harness | Packet capture |
| A4 | Tool use works via Gemma 4 function calling | Test suite |
| A5 | Each Archon has independent fit layer | Multi-agent test |
| A6 | Gitea issues auto-route to correct Archon | Webhook test |
| A7 | Telegram bot responds via Gemma 4 | E2E test |
| A8 | No cloud AI calls in packet log | Network audit |
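
As one way to check A1, a rough throughput probe against the local server could look like the sketch below. It reuses the OpenAI-compatible endpoint assumed earlier and divides completion tokens by wall-clock time, which only approximates steady-state decode speed; it also assumes the response carries the OpenAI-style `usage` field.

```python
import time

import requests

GEMMA_URL = "http://localhost:8645/v1/chat/completions"  # local server from the config above


def tokens_per_second(prompt: str, max_tokens: int = 256) -> float:
    """Rough A1 probe: completion tokens divided by wall-clock time for one request."""
    start = time.monotonic()
    resp = requests.post(
        GEMMA_URL,
        json={
            "model": "gemma4-26b-moe",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=300,
    )
    resp.raise_for_status()
    elapsed = time.monotonic() - start
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed


if __name__ == "__main__":
    rate = tokens_per_second("Explain the seed architecture in detail.")
    print(f"{rate:.1f} tok/s (A1 threshold: 20 tok/s)")
```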
## Risk Analysis
| Risk | Mitigation |
|---|---|
| Gemma 4 too slow | Use 4B for speed, 26B MoE for quality |
| Memory constraints | Quantization (Q4_K_M), GPU offloading |
| Tool use failures | Extensive prompt engineering, fallback chains |
| Context window limits | Automatic summarization, RAG |
| Hardware failure | Hot standby, B2 restore |
## The Philosophy
What cannot be moved.
Stapes is the immovable bone. This architecture is the immovable foundation:
- Gemma 4 runs local — no cloud to fail
- Apache 2.0 — no license to revoke
- Fit layer Hermes — no vendor to change terms
- Claw Code harness — no black box to debug
The seed is planted. It will grow.
## References

- Gemma 4 Profile: `~/.hermes/profiles/gemma4/`
- Claw Code Docs: `/root/wizards/claw-code/docs/gemma4-integration.md`
- Hugging Face: https://huggingface.co/blog/gemma4
- **Epic Owner:** Allegro
- **Status:** In Progress
- **Priority:** P0 — Foundation of All Future Work
- **Seed Date:** 2026-04-02