# THE SEED ARCHITECTURE EPIC

## Hermes Agent → Claw Code Harness → Gemma 4

---

## Vision

A new architectural paradigm where **Hermes Agents become a thin dispatch layer** — stripped of their own intelligence, they become pure conductors. All reasoning flows through the **Claw Code harness** to **Gemma 4**, running locally as the single source of truth.

This is the seed. Gemma 4 cannot be made smaller without becoming rubber — useless. The only path to greater intelligence is to make it bigger. We plant this seed now.

---

## Core Principle

> **Hermes Agent = Fit Layer**
>
> It does not think. It routes. It sends everything to the Claw Code harness.

---

## The Stack (Bottom to Top)

```
┌─────────────────────────────────────────────────────────────┐
│             USER INTERFACE (Telegram, CLI, Web)              │
├─────────────────────────────────────────────────────────────┤
│  HERMES AGENT — The Fit Layer                                 │
│    ├─ Receives input                                          │
│    ├─ Sends to Claw Code harness (no local reasoning)         │
│    ├─ Returns output to user                                  │
│    └─ NO INTELLIGENCE LAYER — Pure dispatch                   │
├─────────────────────────────────────────────────────────────┤
│  CLAW CODE HARNESS — The Runtime                              │
│    ├─ Tool registry and dispatch                              │
│    ├─ Context management                                      │
│    ├─ Session persistence                                     │
│    └─ Routes to appropriate backend                           │
├─────────────────────────────────────────────────────────────┤
│  GEMMA 4 — The Seed (Local, Multimodal, 26B MoE)              │
│    ├─ All reasoning happens here                              │
│    ├─ 26B quality, 4B active speed                            │
│    ├─ Apache 2.0 — truly open                                 │
│    └─ Cannot shrink without getting rubber                    │
└─────────────────────────────────────────────────────────────┘
```

---

## Why This Architecture?

### 1. The Gemma 4 Insight

Gemma 4 is the inflection point:

- **Cannot be smaller without quality loss** — the rubber threshold is hit
- **Cannot be smarter without scale** — MoE is the path
- **26B MoE (4B active)** = quality of a giant, speed of a small model
- **Apache 2.0** = no chains, no vendor, no cloud dependency

### 2. The Hermes Agent Transformation

- Old: Hermes Agent = Full intelligence layer (Claude, GPT, etc.)
- New: Hermes Agent = **Fit layer** — minimal, pure dispatch

Benefits:

- **Predictable**: Same model, same behavior every time
- **Sovereign**: No cloud calls, no API keys, no rate limits
- **Fast**: Local inference, no network latency
- **Honest**: Gemma 4 doesn't roleplay — it answers

### 3. The Claw Code Harness Role

The harness becomes the **orchestration layer** (sketched below):

- Tool registry (what can be done)
- Context window management
- Multi-turn conversation state
- Backend selection (Gemma 4 variants)
- Fallback handling
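
A minimal sketch of that orchestration loop, as one way it could look. `Session`, `handle_turn`, the `call_backend` callable, and the fallback order are illustrative assumptions, not the actual Claw Code API:

```python
# Sketch of one harness turn: multi-turn state, backend selection, fallback.
# All names here are illustrative; only the variant names come from this epic.
from dataclasses import dataclass, field
from typing import Callable

FALLBACK_CHAIN = ["gemma4-26b-moe", "gemma4-31b", "gemma4-4b"]  # assumed order

@dataclass
class Session:
    history: list = field(default_factory=list)  # multi-turn conversation state

def handle_turn(
    session: Session,
    user_input: str,
    call_backend: Callable[[str, list], str],  # (variant, messages) -> reply
) -> str:
    """Record the turn, try each Gemma 4 variant in order, return the first reply."""
    session.history.append({"role": "user", "content": user_input})
    last_error = None
    for variant in FALLBACK_CHAIN:           # backend selection with fallback
        try:
            reply = call_backend(variant, session.history)
            session.history.append({"role": "assistant", "content": reply})
            return reply
        except Exception as err:             # backend down, timeout, bad output
            last_error = err
    raise RuntimeError(f"all Gemma 4 variants failed: {last_error}")
```
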
---

## Component Specifications

### Hermes Agent (Fit Layer)

```yaml
# Hermes Agent config — MINIMAL
profile:
  name: allegro-fit
  type: dispatcher

intelligence:
  enabled: false  # NO local reasoning

dispatch:
  target: claw-code-harness
  timeout: 30s
  retry: 3

tools:
  allowed: all  # Pass through to harness
  local: []     # No local tool execution
```

**Responsibilities** (dispatch sketch below):

1. Receive user input
2. Send to Claw Code harness
3. Return harness output to user
4. Handle errors gracefully
5. **NO DECISIONS** — just dispatch
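
A minimal sketch of the fit layer under these assumptions: the harness is reachable over local HTTP (the URL and JSON shape below are hypothetical), and the agent only forwards and relays:

```python
# Fit-layer sketch: receive, forward to the harness, return. No reasoning here.
# HARNESS_URL and the request/response fields are assumptions for illustration.
import json
import urllib.request

HARNESS_URL = "http://localhost:8700/dispatch"  # hypothetical harness endpoint

def dispatch(user_input: str, timeout_s: float = 30.0, retries: int = 3) -> str:
    """Forward one user message to the Claw Code harness and relay its reply."""
    payload = json.dumps({"input": user_input}).encode("utf-8")
    request = urllib.request.Request(
        HARNESS_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    last_error = None
    for _ in range(retries):                 # mirrors dispatch.retry: 3
        try:
            with urllib.request.urlopen(request, timeout=timeout_s) as resp:
                return json.loads(resp.read())["output"]
        except OSError as err:               # connection refused, timeout, ...
            last_error = err
    return f"[fit layer] harness unreachable: {last_error}"  # graceful failure
```

The timeout and retry count mirror the `dispatch` block in the config above; everything else is plumbing, which is the point.
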

### Claw Code Harness

```yaml
# Claw Code harness config
harness:
  version: 2.0

backend:
  primary: gemma4-26b-moe
  variants:
    - gemma4-2b       # Ultra-edge tasks
    - gemma4-4b       # Standard tasks
    - gemma4-31b      # Complex reasoning
    - gemma4-26b-moe  # Default (efficiency)

routing:
  logic: |
    if task.complexity > 0.8: use gemma4-31b
    elif task.speed_critical: use gemma4-4b
    elif task.audio_required: use gemma4-2b
    else: use gemma4-26b-moe

tools:
  registry: /etc/claw/tools/
  auto_load: true

context:
  window: 8192
  summarization: auto
```

**Responsibilities** (routing sketch below):

1. Receive dispatch from Hermes Agent
2. Route to appropriate Gemma 4 variant
3. Manage tool registry
4. Handle context windows
5. Return formatted response
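
The `routing.logic` block above is pseudocode. One direct way to express it, assuming a task record that carries the fields the config names:

```python
# Direct translation of routing.logic above; the Task fields are assumed.
from dataclasses import dataclass

@dataclass
class Task:
    complexity: float            # 0.0 to 1.0, however the harness scores it
    speed_critical: bool = False
    audio_required: bool = False

def select_variant(task: Task) -> str:
    """Pick the Gemma 4 variant exactly as routing.logic describes."""
    if task.complexity > 0.8:
        return "gemma4-31b"      # complex reasoning
    if task.speed_critical:
        return "gemma4-4b"       # latency-sensitive tasks
    if task.audio_required:
        return "gemma4-2b"       # ultra-edge path
    return "gemma4-26b-moe"      # default: MoE efficiency

# Example: select_variant(Task(complexity=0.9)) -> "gemma4-31b"
```
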

### Gemma 4 (The Seed)

```yaml
# Gemma 4 local deployment
gemma4:
  default: 26b-moe

  models:
    2b:
      path: /opt/models/gemma-4-2b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192

    4b:
      path: /opt/models/gemma-4-4b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192

    31b:
      path: /opt/models/gemma-4-31b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192

    26b-moe:
      path: /opt/models/gemma-4-26b-moe-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      active_params: 4b

  server:
    backend: llama.cpp  # or vllm for production
    host: localhost
    port: 8645
    threads: 8
```

**Responsibilities** (client sketch below):

1. ALL REASONING — The only intelligence layer
2. Tool use via function calling
3. Multimodal understanding
4. Code generation
5. Long context handling
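
A minimal client for the server block above, assuming it is started with llama.cpp's `llama-server`, which exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the configured host and port (adjust if vLLM or another backend is used):

```python
# Minimal chat call against the local Gemma 4 server configured above.
# Endpoint shape assumes llama.cpp's OpenAI-compatible server API.
import json
import urllib.request

GEMMA_URL = "http://localhost:8645/v1/chat/completions"  # host/port from config

def ask_gemma(prompt: str, max_tokens: int = 512) -> str:
    """Send one chat turn to the local Gemma 4 instance and return its reply."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }).encode("utf-8")
    request = urllib.request.Request(
        GEMMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]
```
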
---

## The Skill Prototype

The Gemma 4 Hermes Profile (`~/.hermes/profiles/gemma4/`) is the **skill prototype** for this seed:

- ✅ All 4 variants configured
- ✅ All 6 platforms supported
- ✅ 3 complete skills (inference, multimodal, fine-tuning)
- ✅ Production tooling (download, benchmark, deploy)
- ✅ Apache 2.0 — truly open

This profile becomes the **reference implementation** for all Archons.

---

## Deployment Architecture

### Per-Archon Setup

```
Each Archon (Ezra, Electra, Allegro, etc.):
├── Hermes Agent (fit layer)
│   └── ~/.hermes/profiles/{archon}/
│       └── config.yaml → dispatch to Claw Code
├── Claw Code Harness
│   └── /opt/claw/{archon}/
│       └── runtime.yaml → route to Gemma 4
└── Gemma 4 Seed
    └── Shared local instance OR dedicated GPU slice
```

### Shared Infrastructure

```
House Infrastructure:
├── Gemma 4 Server (local)
│   ├── 26B MoE (default)
│   ├── 31B Dense (complex tasks)
│   └── 4B (fast tasks)
├── Gitea (code, issues, PRs)
├── Nostr (sovereign messaging)
└── Backblaze B2 (encrypted backup)
```

---

## Gantt Chart: Implementation

```
Phase 1: Foundation (Week 1-2)
├── Day 1-3: Gemma 4 server deployment
│   ├── Download 26B MoE, 31B, 4B GGUFs
│   ├── llama.cpp server setup
│   └── vLLM production config
├── Day 4-7: Claw Code harness v2
│   ├── Remove old intelligence layer
│   ├── Add Gemma 4 routing
│   └── Tool registry refactor
└── Day 8-14: Hermes Agent fit layer
    ├── Strip local reasoning
    ├── Dispatch-only mode
    └── Error handling

Phase 2: Archon Deployment (Week 3-4)
├── Day 15-17: Ezra Archon
├── Day 18-20: Electra Archon
├── Day 21-23: Allegro Archon migration
└── Day 24-28: Testing & hardening

Phase 3: Integration (Week 5-6)
├── Day 29-32: Gitea webhook automation
├── Day 33-35: Telegram bot integration
├── Day 36-38: Nostr bridge
└── Day 39-42: Backup & monitoring
```

---

## Acceptance Criteria

| ID | Criterion | Test |
|----|-----------|------|
| A1 | Gemma 4 26B MoE serves locally at >20 tok/s | Throughput benchmark (sketch below) |
| A2 | Hermes Agent has NO local intelligence | Config audit |
| A3 | All queries route through Claw Code harness | Packet capture |
| A4 | Tool use works via Gemma 4 function calling | Test suite |
| A5 | Each Archon has independent fit layer | Multi-agent test |
| A6 | Gitea issues auto-route to correct Archon | Webhook test |
| A7 | Telegram bot responds via Gemma 4 | E2E test |
| A8 | No cloud AI calls in packet log | Network audit |
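
A rough way to check A1 against the local server, assuming the OpenAI-compatible endpoint shown earlier and that the response reports `usage.completion_tokens` (llama.cpp's `llama-server` does). Prompt processing time is included in the measurement, so treat the result as a lower bound:

```python
# Rough A1 check: generated tokens per second against the local Gemma 4 server.
import json
import time
import urllib.request

URL = "http://localhost:8645/v1/chat/completions"  # host/port from the config

def tokens_per_second(prompt: str = "Explain MoE routing in three sentences.") -> float:
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")
    request = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=300) as resp:
        data = json.loads(resp.read())
    elapsed = time.monotonic() - start
    return data["usage"]["completion_tokens"] / elapsed

if __name__ == "__main__":
    rate = tokens_per_second()
    print(f"{rate:.1f} tok/s -> {'PASS' if rate > 20 else 'FAIL'} (A1 threshold: 20)")
```
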
---

## Risk Analysis

| Risk | Mitigation |
|------|------------|
| Gemma 4 too slow | Use 4B for speed, 26B MoE for quality |
| Memory constraints | Quantization (Q4_K_M), GPU offloading |
| Tool use failures | Extensive prompt engineering, fallback chains |
| Context window limits | Automatic summarization, RAG (see sketch below) |
| Hardware failure | Hot standby, B2 restore |
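
One simple shape the automatic-summarization mitigation could take, shown as a sketch. The 4-characters-per-token estimate and the `summarize` callable (e.g. another call into Gemma 4) are assumptions:

```python
# Naive context-window guard: collapse old turns into a summary near the limit.
from typing import Callable

CONTEXT_WINDOW = 8192       # matches context.window in the harness config
RESERVED_FOR_REPLY = 1024   # leave room for the model's answer

def estimate_tokens(messages: list) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def fit_context(messages: list, summarize: Callable[[list], str]) -> list:
    """Keep recent turns verbatim; replace older ones with a single summary."""
    if estimate_tokens(messages) <= CONTEXT_WINDOW - RESERVED_FOR_REPLY:
        return messages
    older, recent = messages[:-4], messages[-4:]   # keep the last few turns
    summary = {"role": "system",
               "content": "Summary of earlier conversation: " + summarize(older)}
    return [summary] + recent
```
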
---

## The Philosophy

> **What cannot be moved.**

Stapes is the immovable bone. This architecture is the immovable foundation:

- Gemma 4 runs local — no cloud to fail
- Apache 2.0 — no license to revoke
- Fit layer Hermes — no vendor to change terms
- Claw Code harness — no black box to debug

The seed is planted. It will grow.

---

## References

- Gemma 4 Profile: `~/.hermes/profiles/gemma4/`
- Claw Code Docs: `/root/wizards/claw-code/docs/gemma4-integration.md`
- Hugging Face: https://huggingface.co/blog/gemma4

---

**Epic Owner:** Allegro
**Status:** In Progress
**Priority:** P0 — Foundation of All Future Work
**Seed Date:** 2026-04-02