File: `timmy-config/wizards/allegro/ARCHITECTURE-SEED-EPIC.md`
Commit `e6c5129a94` (Timmy Time, 2026-04-02 20:12:21 +00:00): feat: resurrect Bezalel with Gemma 4 + llama.cpp stack

- Add EPIC.md with resurrection plan
- Create Hermes profile with Bezalel persona
- Add llama-server.sh for Gemma 4 inference
- Update start_bezalel.sh with stack checks
- Add README with quick start guide

Backend: llama.cpp | Model: Gemma 4 26B MoE (Apache 2.0) | Frontend: Hermes profile

No OpenAI. No cloud. Pure sovereign stack.

# THE SEED ARCHITECTURE EPIC
## Hermes Agent → Claw Code Harness → Gemma 4
---
## Vision
A new architectural paradigm where **Hermes Agents become a thin dispatch layer** — stripped of their own intelligence, they become pure conductors. All reasoning flows through the **Claw Code harness** to **Gemma 4**, running locally as the single source of truth.
This is the seed. Gemma 4 cannot be made smaller without becoming rubber — useless. The only path to greater intelligence is to make it bigger. We plant this seed now.
---
## Core Principle
> **Hermes Agent = Fit Layer**
>
> It does not think. It routes. It sends everything to the Claw Code harness.
---
## The Stack (Bottom to Top)
```
┌─────────────────────────────────────────────────────────────┐
│ USER INTERFACE (Telegram, CLI, Web)                         │
├─────────────────────────────────────────────────────────────┤
│ HERMES AGENT — The Fit Layer                                │
│ ├─ Receives input                                           │
│ ├─ Sends to Claw Code harness (no local reasoning)          │
│ ├─ Returns output to user                                   │
│ └─ NO INTELLIGENCE LAYER — Pure dispatch                    │
├─────────────────────────────────────────────────────────────┤
│ CLAW CODE HARNESS — The Runtime                             │
│ ├─ Tool registry and dispatch                               │
│ ├─ Context management                                       │
│ ├─ Session persistence                                      │
│ └─ Routes to appropriate backend                            │
├─────────────────────────────────────────────────────────────┤
│ GEMMA 4 — The Seed (Local, Multimodal, 26B MoE)             │
│ ├─ All reasoning happens here                               │
│ ├─ 26B quality, 4B active speed                             │
│ ├─ Apache 2.0 — truly open                                  │
│ └─ Cannot shrink without getting rubber                     │
└─────────────────────────────────────────────────────────────┘
```
---
## Why This Architecture?
### 1. The Gemma 4 Insight
Gemma 4 is the inflection point:
- **Cannot be smaller without quality loss** — rubber threshold hit
- **Cannot be smarter without scale** — MoE is the path
- **26B MoE (4B active)** = Quality of giant, speed of small
- **Apache 2.0** = No chains, no vendor, no cloud dependency
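The 26B figure also has a practical footprint. As a back-of-envelope check (the ~4.85 bits-per-weight average for Q4_K_M is an assumption for illustration, not a figure from this epic):

```python
# Rough weight-memory estimate for the 26B model at Q4_K_M quantization.
# Assumption: Q4_K_M averages roughly 4.85 bits per weight; MoE counts
# all experts, so the full 26B must fit even though only ~4B are active.
params = 26e9
bits_per_weight = 4.85
weights_gb = params * bits_per_weight / 8 / 1e9
print(round(weights_gb, 1))  # -> 15.8 (GB for weights alone, before KV cache)
```

The MoE trade is visible here: memory cost scales with total parameters, while per-token compute scales with the ~4B active ones.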
### 2. The Hermes Agent Transformation
Old: Hermes Agent = Full intelligence layer (Claude, GPT, etc.)
New: Hermes Agent = **Fit layer** — minimal, pure dispatch
Benefits:
- **Predictable**: Same model, same behavior every time
- **Sovereign**: No cloud calls, no API keys, no rate limits
- **Fast**: Local inference, no network latency
- **Honest**: Gemma 4 doesn't roleplay — it answers
### 3. The Claw Code Harness Role
The harness becomes the **orchestration layer**:
- Tool registry (what can be done)
- Context window management
- Multi-turn conversation state
- Backend selection (Gemma 4 variants)
- Fallback handling
---
## Component Specifications
### Hermes Agent (Fit Layer)
```yaml
# Hermes Agent config — MINIMAL
profile:
  name: allegro-fit
  type: dispatcher
intelligence:
  enabled: false            # NO local reasoning
dispatch:
  target: claw-code-harness
  timeout: 30s
  retry: 3
tools:
  allowed: all              # Pass through to harness
  local: []                 # No local tool execution
```
**Responsibilities:**
1. Receive user input
2. Send to Claw Code harness
3. Return harness output to user
4. Handle errors gracefully
5. **NO DECISIONS** — just dispatch
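The fit layer is small enough to sketch in full. A minimal dispatcher, assuming a hypothetical HTTP endpoint on the harness (the URL, payload shape, and `post` hook are illustrative, not part of this epic):

```python
import json
import urllib.request

HARNESS_URL = "http://localhost:8080/dispatch"  # hypothetical harness endpoint


def _http_post(payload, timeout=30):
    """Default transport: POST JSON to the harness and parse the reply."""
    req = urllib.request.Request(
        HARNESS_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def dispatch(user_input, post=_http_post, retries=3):
    """Fit layer: no reasoning, no tools — forward input, return output."""
    last_err = None
    for _ in range(retries):
        try:
            return post({"input": user_input})["output"]
        except Exception as err:  # handle errors gracefully (step 4)
            last_err = err
    return f"[dispatch failed after {retries} tries: {last_err}]"
```

The `post` parameter exists so the transport can be swapped or stubbed in tests; everything else is deliberately dumb, per the fit-layer principle.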
### Claw Code Harness
```yaml
# Claw Code harness config
harness:
  version: 2.0
  backend:
    primary: gemma4-26b-moe
    variants:
      - gemma4-2b       # Ultra-edge tasks
      - gemma4-4b       # Standard tasks
      - gemma4-31b      # Complex reasoning
      - gemma4-26b-moe  # Default (efficiency)
  routing:
    logic: |
      if task.complexity > 0.8: use gemma4-31b
      elif task.speed_critical: use gemma4-4b
      elif task.audio_required: use gemma4-2b
      else: use gemma4-26b-moe
  tools:
    registry: /etc/claw/tools/
    auto_load: true
  context:
    window: 8192
    summarization: auto
```
**Responsibilities:**
1. Receive dispatch from Hermes Agent
2. Route to appropriate Gemma 4 variant
3. Manage tool registry
4. Handle context windows
5. Return formatted response
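The routing block in the config above is pseudocode; one way it might be realized (the task field names are assumptions taken from that block, not a real Claw Code API):

```python
def select_variant(task):
    """Mirror of the harness routing pseudocode — order matters."""
    if task.get("complexity", 0.0) > 0.8:
        return "gemma4-31b"       # complex reasoning
    if task.get("speed_critical", False):
        return "gemma4-4b"        # latency-sensitive
    if task.get("audio_required", False):
        return "gemma4-2b"        # ultra-edge / audio
    return "gemma4-26b-moe"       # default (efficiency)
```

Because complexity is tested first, a task that is both complex and speed-critical still goes to the 31B model, matching the config's if/elif order.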
### Gemma 4 (The Seed)
```yaml
# Gemma 4 local deployment
gemma4:
  default: 26b-moe
  models:
    2b:
      path: /opt/models/gemma-4-2b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
    4b:
      path: /opt/models/gemma-4-4b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
    31b:
      path: /opt/models/gemma-4-31b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
    26b-moe:
      path: /opt/models/gemma-4-26b-moe-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      active_params: 4b
  server:
    backend: llama.cpp  # or vLLM for production
    host: localhost
    port: 8645
    threads: 8
```
**Responsibilities:**
1. ALL REASONING — The only intelligence layer
2. Tool use via function calling
3. Multimodal understanding
4. Code generation
5. Long context handling
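Under the settings above, the llama.cpp backend might be brought up like this (flag names are `llama-server`'s; `-ngl 99` stands in for the config's `gpu_layers: -1` "offload everything" convention, and the model path and port are the ones assumed in this epic):

```shell
# Serve the default 26B MoE model on the configured port
llama-server \
  -m /opt/models/gemma-4-26b-moe-Q4_K_M.gguf \
  --host localhost --port 8645 \
  -c 8192 -t 8 -ngl 99

# Smoke test via the OpenAI-compatible endpoint llama-server exposes
curl -s http://localhost:8645/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello."}]}'
```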
---
## The Skill Prototype
The Gemma 4 Hermes Profile (`~/.hermes/profiles/gemma4/`) is the **skill prototype** for this seed:
- ✅ All 4 variants configured
- ✅ All 6 platforms supported
- ✅ 3 complete skills (inference, multimodal, fine-tuning)
- ✅ Production tooling (download, benchmark, deploy)
- ✅ Apache 2.0 — truly open
This profile becomes the **reference implementation** for all Archons.
---
## Deployment Architecture
### Per-Archon Setup
```
Each Archon (Ezra, Electra, Allegro, etc.):
├── Hermes Agent (fit layer)
│   └── ~/.hermes/profiles/{archon}/
│       └── config.yaml → dispatch to Claw Code
├── Claw Code Harness
│   └── /opt/claw/{archon}/
│       └── runtime.yaml → route to Gemma 4
└── Gemma 4 Seed
    └── Shared local instance OR dedicated GPU slice
```
### Shared Infrastructure
```
House Infrastructure:
├── Gemma 4 Server (local)
│   ├── 26B MoE (default)
│   ├── 31B Dense (complex tasks)
│   └── 4B (fast tasks)
├── Gitea (code, issues, PRs)
├── Nostr (sovereign messaging)
└── Backblaze B2 (encrypted backup)
```
---
## Implementation Timeline
```
Phase 1: Foundation (Weeks 1-2)
├── Days 1-3: Gemma 4 server deployment
│   ├── Download 26B MoE, 31B, 4B GGUFs
│   ├── llama.cpp server setup
│   └── vLLM production config
├── Days 4-7: Claw Code harness v2
│   ├── Remove old intelligence layer
│   ├── Add Gemma 4 routing
│   └── Tool registry refactor
└── Days 8-14: Hermes Agent fit layer
    ├── Strip local reasoning
    ├── Dispatch-only mode
    └── Error handling

Phase 2: Archon Deployment (Weeks 3-4)
├── Days 15-17: Ezra Archon
├── Days 18-20: Electra Archon
├── Days 21-23: Allegro Archon migration
└── Days 24-28: Testing & hardening

Phase 3: Integration (Weeks 5-6)
├── Days 29-32: Gitea webhook automation
├── Days 33-35: Telegram bot integration
├── Days 36-38: Nostr bridge
└── Days 39-42: Backup & monitoring
```
---
## Acceptance Criteria
| ID | Criterion | Test |
|----|----------|------|
| A1 | Gemma 4 26B MoE serves locally at >20 tok/s | Benchmark passed |
| A2 | Hermes Agent has NO local intelligence | Config audit |
| A3 | All queries route through Claw Code harness | Packet capture |
| A4 | Tool use works via Gemma 4 function calling | Test suite |
| A5 | Each Archon has independent fit layer | Multi-agent test |
| A6 | Gitea issues auto-route to correct Archon | Webhook test |
| A7 | Telegram bot responds via Gemma 4 | E2E test |
| A8 | No cloud AI calls in packet log | Network audit |
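A1's >20 tok/s threshold reduces to a stopwatch around one generate call. A sketch (the `generate` callable is a placeholder for a client of the local server, not a real API):

```python
import time


def measure_tps(generate, prompt):
    """Return generated tokens per second for a single call.
    `generate` must return (text, completion_token_count)."""
    start = time.perf_counter()
    _, n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed


def passes_a1(tps, threshold=20.0):
    """A1: local serving must exceed 20 tok/s."""
    return tps >= threshold
```

In practice the A1 benchmark should use a long generation, so that prompt processing and server warm-up do not dominate the measurement.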
---
## Risk Analysis
| Risk | Mitigation |
|------|------------|
| Gemma 4 too slow | Use 4B for speed, 26B MoE for quality |
| Memory constraints | Quantization (Q4_K_M), GPU offloading |
| Tool use failures | Extensive prompt engineering, fallback chains |
| Context window limits | Automatic summarization, RAG |
| Hardware failure | Hot standby, B2 restore |
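The "automatic summarization" mitigation for context limits might look like this in the harness (the token counter and the summary stub are placeholders; a real harness would ask the model itself for the summary):

```python
def compact_history(turns, count_tokens, window=8192, reserve=1024, keep_recent=4):
    """Fold older turns into a single stub once the next turn risks overflow.
    `reserve` leaves headroom for the upcoming generation."""
    total = sum(count_tokens(t) for t in turns)
    if total + reserve <= window or len(turns) <= keep_recent:
        return turns  # still fits, or nothing old enough to fold
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [f"[summary of {len(older)} earlier turns]"] + recent
```

Keeping the most recent turns verbatim preserves short-range coherence while the folded prefix bounds total context growth.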
---
## The Philosophy
> **What cannot be moved.**
Stapes is the immovable bone. This architecture is the immovable foundation:
- Gemma 4 runs local — no cloud to fail
- Apache 2.0 — no license to revoke
- Fit layer Hermes — no vendor to change terms
- Claw Code harness — no black box to debug
The seed is planted. It will grow.
---
## References
- Gemma 4 Profile: `~/.hermes/profiles/gemma4/`
- Claw Code Docs: `/root/wizards/claw-code/docs/gemma4-integration.md`
- Hugging Face: https://huggingface.co/blog/gemma4
---
**Epic Owner:** Allegro
**Status:** In Progress
**Priority:** P0 — Foundation of All Future Work
**Seed Date:** 2026-04-02