timmy-config/wizards/allegro/ARCHITECTURE-SEED-EPIC.md
Commit e6c5129a94 by Timmy Time: feat: resurrect Bezalel with Gemma 4 + llama.cpp stack
- Add EPIC.md with resurrection plan
- Create Hermes profile with Bezalel persona
- Add llama-server.sh for Gemma 4 inference
- Update start_bezalel.sh with stack checks
- Add README with quick start guide

Backend: llama.cpp
Model: Gemma 4 26B MoE (Apache 2.0)
Frontend: Hermes profile

No OpenAI. No cloud. Pure sovereign stack.
2026-04-02 20:12:21 +00:00


THE SEED ARCHITECTURE EPIC

Hermes Agent → Claw Code Harness → Gemma 4


Vision

A new architectural paradigm where Hermes Agents become a thin dispatch layer — stripped of their own intelligence, they become pure conductors. All reasoning flows through the Claw Code harness to Gemma 4, running locally as the single source of truth.

This is the seed. Gemma 4 cannot be made smaller without becoming rubber — useless. The only path to greater intelligence is to make it bigger. We plant this seed now.


Core Principle

Hermes Agent = Fit Layer

It does not think. It routes. It sends everything to the Claw Code harness.


The Stack (Bottom to Top)

┌─────────────────────────────────────────────────────────────┐
│  USER INTERFACE (Telegram, CLI, Web)                        │
├─────────────────────────────────────────────────────────────┤
│  HERMES AGENT — The Fit Layer                               │
│  ├─ Receives input                                          │
│  ├─ Sends to Claw Code harness (no local reasoning)         │
│  ├─ Returns output to user                                  │
│  └─ NO INTELLIGENCE LAYER — Pure dispatch                   │
├─────────────────────────────────────────────────────────────┤
│  CLAW CODE HARNESS — The Runtime                            │
│  ├─ Tool registry and dispatch                              │
│  ├─ Context management                                      │
│  ├─ Session persistence                                     │
│  └─ Routes to appropriate backend                           │
├─────────────────────────────────────────────────────────────┤
│  GEMMA 4 — The Seed (Local, Multimodal, 26B MoE)            │
│  ├─ All reasoning happens here                              │
│  ├─ 26B quality, 4B active speed                            │
│  ├─ Apache 2.0 — truly open                                 │
│  └─ Cannot shrink without getting rubber                    │
└─────────────────────────────────────────────────────────────┘

Why This Architecture?

1. The Gemma 4 Insight

Gemma 4 is the inflection point:

  • Cannot be smaller without quality loss — rubber threshold hit
  • Cannot be smarter without scale — MoE is the path
  • 26B MoE (4B active) = Quality of giant, speed of small
  • Apache 2.0 = No chains, no vendor, no cloud dependency

2. The Hermes Agent Transformation

Old: Hermes Agent = Full intelligence layer (Claude, GPT, etc.)
New: Hermes Agent = Fit layer — minimal, pure dispatch

Benefits:

  • Predictable: Same model, same behavior every time
  • Sovereign: No cloud calls, no API keys, no rate limits
  • Fast: Local inference, no network latency
  • Honest: Gemma 4 doesn't roleplay — it answers

3. The Claw Code Harness Role

The harness becomes the orchestration layer:

  • Tool registry (what can be done)
  • Context window management
  • Multi-turn conversation state
  • Backend selection (Gemma 4 variants)
  • Fallback handling

Component Specifications

Hermes Agent (Fit Layer)

# Hermes Agent config — MINIMAL
profile:
  name: allegro-fit
  type: dispatcher
  
intelligence:
  enabled: false  # NO local reasoning
  
dispatch:
  target: claw-code-harness
  timeout: 30s
  retry: 3
  
tools:
  allowed: all    # Pass through to harness
  local: []       # No local tool execution

Responsibilities:

  1. Receive user input
  2. Send to Claw Code harness
  3. Return harness output to user
  4. Handle errors gracefully
  5. NO DECISIONS — just dispatch
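The five responsibilities above reduce to a single HTTP forward with retries. A minimal sketch, assuming a hypothetical harness endpoint and payload shape (`HARNESS_URL` and the `input`/`output` JSON keys are illustrative, not the actual Claw Code API); the timeout and retry counts mirror the profile config:

```python
import json
import time
import urllib.request

HARNESS_URL = "http://localhost:8700/dispatch"  # hypothetical harness endpoint
TIMEOUT_S = 30   # matches dispatch.timeout in the profile
RETRIES = 3      # matches dispatch.retry

def dispatch(user_input: str) -> str:
    """Forward user input to the Claw Code harness. No local reasoning."""
    payload = json.dumps({"input": user_input}).encode()
    last_err = None
    for attempt in range(RETRIES):
        try:
            req = urllib.request.Request(
                HARNESS_URL,
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=TIMEOUT_S) as resp:
                return json.loads(resp.read())["output"]
        except OSError as err:
            last_err = err
            time.sleep(0.5 * (attempt + 1))  # simple backoff between retries
    return f"harness unavailable: {last_err}"  # graceful error, still no decisions
```

Note the shape of the fallback: the fit layer reports the failure verbatim rather than improvising an answer, because it has nothing to improvise with.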

Claw Code Harness

# Claw Code harness config
harness:
  version: 2.0
  
backend:
  primary: gemma4-26b-moe
  variants:
    - gemma4-2b      # Ultra-edge tasks
    - gemma4-4b      # Standard tasks
    - gemma4-31b     # Complex reasoning
    - gemma4-26b-moe # Default (efficiency)
    
routing:
  logic: |
    if task.complexity > 0.8: use gemma4-31b
    elif task.speed_critical: use gemma4-4b
    elif task.audio_required: use gemma4-2b
    else: use gemma4-26b-moe
    
tools:
  registry: /etc/claw/tools/
  auto_load: true
  
context:
  window: 8192
  summarization: auto

Responsibilities:

  1. Receive dispatch from Hermes Agent
  2. Route to appropriate Gemma 4 variant
  3. Manage tool registry
  4. Handle context windows
  5. Return formatted response
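The `routing.logic` block in the config above can be read as a small pure function where the first matching rule wins. A sketch, assuming a task record with `complexity`, `speed_critical`, and `audio_required` fields (the field and variant names come from the config; everything else is illustrative):

```python
from dataclasses import dataclass

@dataclass
class Task:
    complexity: float = 0.0       # 0.0-1.0 estimate made by the harness
    speed_critical: bool = False
    audio_required: bool = False

def select_backend(task: Task) -> str:
    """Mirror of the routing.logic block: first matching rule wins."""
    if task.complexity > 0.8:
        return "gemma4-31b"       # complex reasoning
    if task.speed_critical:
        return "gemma4-4b"        # speed-critical tasks
    if task.audio_required:
        return "gemma4-2b"        # ultra-edge tasks
    return "gemma4-26b-moe"       # default (efficiency)
```

Rule order matters: a complex task that also needs audio still routes to the 31B dense model, because the complexity check runs first.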

Gemma 4 (The Seed)

# Gemma 4 local deployment
gemma4:
  default: 26b-moe
  
models:
  2b:
    path: /opt/models/gemma-4-2b-it-Q4_K_M.gguf
    gpu_layers: -1
    context: 8192
    
  4b:
    path: /opt/models/gemma-4-4b-it-Q4_K_M.gguf
    gpu_layers: -1
    context: 8192
    
  31b:
    path: /opt/models/gemma-4-31b-it-Q4_K_M.gguf
    gpu_layers: -1
    context: 8192
    
  26b-moe:
    path: /opt/models/gemma-4-26b-moe-Q4_K_M.gguf
    gpu_layers: -1
    context: 8192
    active_params: 4b
    
server:
  backend: llama.cpp  # or vllm for production
  host: localhost
  port: 8645
  threads: 8

Responsibilities:

  1. ALL REASONING — The only intelligence layer
  2. Tool use via function calling
  3. Multimodal understanding
  4. Code generation
  5. Long context handling
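With `llama-server` listening on port 8645 as configured above, the harness can reach the seed over the OpenAI-compatible `/v1/chat/completions` endpoint that llama.cpp's server exposes. A minimal sketch; the model alias and temperature are illustrative choices, not values from this epic:

```python
import json
import urllib.request

def build_chat_request(prompt: str) -> dict:
    """One user turn in OpenAI chat format, as llama-server expects."""
    return {
        "model": "gemma4-26b-moe",  # alias only; the server answers with its loaded GGUF
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def ask_seed(prompt: str, host: str = "localhost", port: int = 8645) -> str:
    """Send one chat turn to the local llama.cpp server; return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.loads(resp.read())
    return reply["choices"][0]["message"]["content"]
```

Because the endpoint speaks the OpenAI wire format, the harness needs no vendor SDK: the entire cloud-API surface collapses to one local HTTP call.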

The Skill Prototype

The Gemma 4 Hermes Profile (~/.hermes/profiles/gemma4/) is the skill prototype for this seed:

  • All 4 variants configured
  • All 6 platforms supported
  • 3 complete skills (inference, multimodal, fine-tuning)
  • Production tooling (download, benchmark, deploy)
  • Apache 2.0 — truly open

This profile becomes the reference implementation for all Archons.


Deployment Architecture

Per-Archon Setup

Each Archon (Ezra, Electra, Allegro, etc.):

├── Hermes Agent (fit layer)
│   └── ~/.hermes/profiles/{archon}/
│       └── config.yaml → dispatch to Claw Code
├── Claw Code Harness
│   └── /opt/claw/{archon}/
│       └── runtime.yaml → route to Gemma 4
└── Gemma 4 Seed
    └── Shared local instance OR dedicated GPU slice

Shared Infrastructure

House Infrastructure:
├── Gemma 4 Server (local)
│   ├── 26B MoE (default)
│   ├── 31B Dense (complex tasks)
│   └── 4B (fast tasks)
├── Gitea (code, issues, PRs)
├── Nostr (sovereign messaging)
└── Backblaze B2 (encrypted backup)

Gantt Chart: Implementation

Phase 1: Foundation (Week 1-2)
├── Day 1-3: Gemma 4 server deployment
│   ├── Download 26B MoE, 31B, 4B GGUFs
│   ├── llama.cpp server setup
│   └── vLLM production config
├── Day 4-7: Claw Code harness v2
│   ├── Remove old intelligence layer
│   ├── Add Gemma 4 routing
│   └── Tool registry refactor
└── Day 8-14: Hermes Agent fit layer
    ├── Strip local reasoning
    ├── Dispatch-only mode
    └── Error handling

Phase 2: Archon Deployment (Week 3-4)
├── Day 15-17: Ezra Archon
├── Day 18-20: Electra Archon
├── Day 21-23: Allegro Archon migration
└── Day 24-28: Testing & hardening

Phase 3: Integration (Week 5-6)
├── Day 29-32: Gitea webhook automation
├── Day 33-35: Telegram bot integration
├── Day 36-38: Nostr bridge
└── Day 39-42: Backup & monitoring

Acceptance Criteria

ID   Criteria                                       Test
A1   Gemma 4 26B MoE serves locally at >20 tok/s    Benchmark passed
A2   Hermes Agent has NO local intelligence         Config audit
A3   All queries route through Claw Code harness    Packet capture
A4   Tool use works via Gemma 4 function calling    Test suite
A5   Each Archon has independent fit layer          Multi-agent test
A6   Gitea issues auto-route to correct Archon      Webhook test
A7   Telegram bot responds via Gemma 4              E2E test
A8   No cloud AI calls in packet log                Network audit
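Criterion A1 is a simple throughput check: wrap one generation run in a stopwatch and compare tokens over wall time against the 20 tok/s bar. A sketch of the arithmetic only (token counting and the actual server call are left out):

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput of one generation run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s

def a1_passes(n_tokens: int, elapsed_s: float, threshold: float = 20.0) -> bool:
    """A1: the local 26B MoE must sustain more than `threshold` tok/s."""
    return tokens_per_second(n_tokens, elapsed_s) > threshold
```

For example, 512 generated tokens in 20 seconds is 25.6 tok/s and passes; the same 512 tokens in 30 seconds is about 17 tok/s and fails.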

Risk Analysis

Risk                    Mitigation
Gemma 4 too slow        Use 4B for speed, 26B MoE for quality
Memory constraints      Quantization (Q4_K_M), GPU offloading
Tool use failures       Extensive prompt engineering, fallback chains
Context window limits   Automatic summarization, RAG
Hardware failure        Hot standby, B2 restore
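The context-window mitigation can be sketched as a budget check: when the running transcript exceeds the 8192-token window, older turns are folded into a summary slot and only recent turns stay verbatim. `count_tokens` here is a crude character-count proxy and the summary string is a stand-in for a real summarization call to the model, both illustrative:

```python
def count_tokens(text: str) -> int:
    """Placeholder tokenizer: rough proxy of ~4 characters per token."""
    return max(1, len(text) // 4)

def fit_context(turns: list[str], window: int = 8192, keep_recent: int = 4) -> list[str]:
    """Drop-and-summarize: keep recent turns verbatim, compress the rest."""
    total = sum(count_tokens(t) for t in turns)
    if total <= window:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    # Stand-in for an actual summarization request to the Gemma 4 backend.
    summary = f"[summary of {len(old)} earlier turns]"
    return [summary] + recent
```

A real harness would also re-check the budget after compression, since the recent turns alone can still exceed the window.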

The Philosophy

What cannot be moved.

Stapes is the immovable bone. This architecture is the immovable foundation:

  • Gemma 4 runs local — no cloud to fail
  • Apache 2.0 — no license to revoke
  • Fit layer Hermes — no vendor to change terms
  • Claw Code harness — no black box to debug

The seed is planted. It will grow.



Epic Owner: Allegro
Status: In Progress
Priority: P0 — Foundation of All Future Work
Seed Date: 2026-04-02