# THE SEED ARCHITECTURE EPIC

Hermes Agent → Claw Code Harness → Gemma 4

## Vision

A new architectural paradigm in which Hermes Agents are reduced to a thin dispatch layer — stripped of their own intelligence, they become pure conductors. All reasoning flows through the Claw Code harness to Gemma 4, running locally as the single source of truth.
This is the seed. Gemma 4 cannot be made smaller without becoming rubber — useless. The only path to greater intelligence is to make it bigger. We plant this seed now.
## Core Principle

**Hermes Agent = Fit Layer**

It does not think. It routes. It sends everything to the Claw Code harness.

## The Stack (Bottom to Top)
```
┌────────────────────────────────────────────────────────┐
│ USER INTERFACE (Telegram, CLI, Web)                    │
├────────────────────────────────────────────────────────┤
│ HERMES AGENT — The Fit Layer                           │
│   ├─ Receives input                                    │
│   ├─ Sends to Claw Code harness (no local reasoning)   │
│   ├─ Returns output to user                            │
│   └─ NO INTELLIGENCE LAYER — Pure dispatch             │
├────────────────────────────────────────────────────────┤
│ CLAW CODE HARNESS — The Runtime                        │
│   ├─ Tool registry and dispatch                        │
│   ├─ Context management                                │
│   ├─ Session persistence                               │
│   └─ Routes to appropriate backend                     │
├────────────────────────────────────────────────────────┤
│ GEMMA 4 — The Seed (Local, Multimodal, 26B MoE)        │
│   ├─ All reasoning happens here                        │
│   ├─ 26B quality, 4B active speed                      │
│   ├─ Apache 2.0 — truly open                           │
│   └─ Cannot shrink without getting rubber              │
└────────────────────────────────────────────────────────┘
```
## Why This Architecture?

### 1. The Gemma 4 Insight

Gemma 4 is the inflection point:

- Cannot be smaller without quality loss — rubber threshold hit
- Cannot be smarter without scale — MoE is the path
- 26B MoE (4B active) = Quality of giant, speed of small
- Apache 2.0 = No chains, no vendor, no cloud dependency
### 2. The Hermes Agent Transformation

- Old: Hermes Agent = Full intelligence layer (Claude, GPT, etc.)
- New: Hermes Agent = Fit layer — minimal, pure dispatch

Benefits:

- Predictable: Same model, same behavior every time
- Sovereign: No cloud calls, no API keys, no rate limits
- Fast: Local inference, no network latency
- Honest: Gemma 4 doesn't roleplay — it answers
### 3. The Claw Code Harness Role

The harness becomes the orchestration layer:

- Tool registry (what can be done)
- Context window management
- Multi-turn conversation state
- Backend selection (Gemma 4 variants)
- Fallback handling
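
Of these roles, the tool registry is the easiest to sketch. The Python below is illustrative only: a hypothetical in-process, decorator-based registry with one example tool. The real harness is configured to load tools from a directory (see the config further down), so names like `register_tool` and `run_tool` are assumptions, not a fixed interface.

```python
from typing import Callable, Dict

# Hypothetical in-process registry; the actual harness auto-loads tools from disk.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {}


def register_tool(name: str):
    """Decorator that adds a callable to the registry under the given name."""
    def wrap(fn: Callable[..., str]) -> Callable[..., str]:
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap


@register_tool("read_file")
def read_file(path: str) -> str:
    """Example tool: return the contents of a local file."""
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read()


def run_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call requested by the model; unknown names fail loudly."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return TOOL_REGISTRY[name](**kwargs)
```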
## Component Specifications

### Hermes Agent (Fit Layer)

```yaml
# Hermes Agent config — MINIMAL
profile:
  name: allegro-fit
  type: dispatcher

intelligence:
  enabled: false            # NO local reasoning

dispatch:
  target: claw-code-harness
  timeout: 30s
  retry: 3

tools:
  allowed: all              # Pass through to harness
  local: []                 # No local tool execution
```
Responsibilities:
- Receive user input
- Send to Claw Code harness
- Return harness output to user
- Handle errors gracefully
- NO DECISIONS — just dispatch
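
To make the dispatch-only contract concrete, here is a minimal Python sketch of the fit layer. The HTTP endpoint (`http://localhost:8700/dispatch`), the JSON field names, and the use of the `requests` library are assumptions for illustration; only the timeout and retry values are taken from the config above.

```python
import time

import requests  # assumed installed; any HTTP client would do

HARNESS_URL = "http://localhost:8700/dispatch"  # hypothetical harness endpoint
TIMEOUT_S = 30   # mirrors dispatch.timeout
RETRIES = 3      # mirrors dispatch.retry


def dispatch(user_input: str, session_id: str) -> str:
    """Forward raw user input to the Claw Code harness. No local reasoning."""
    last_error = None
    for attempt in range(RETRIES):
        try:
            resp = requests.post(
                HARNESS_URL,
                json={"input": user_input, "session": session_id},
                timeout=TIMEOUT_S,
            )
            resp.raise_for_status()
            return resp.json()["output"]  # return harness output verbatim
        except requests.RequestException as err:
            last_error = err
            time.sleep(2 ** attempt)  # simple backoff before retrying
    # Graceful degradation: report the failure, never improvise an answer locally.
    return f"Harness unavailable after {RETRIES} attempts: {last_error}"
```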
### Claw Code Harness

```yaml
# Claw Code harness config
harness:
  version: 2.0

backend:
  primary: gemma4-26b-moe
  variants:
    - gemma4-2b        # Ultra-edge tasks
    - gemma4-4b        # Standard tasks
    - gemma4-31b       # Complex reasoning
    - gemma4-26b-moe   # Default (efficiency)

routing:
  logic: |
    if task.complexity > 0.8: use gemma4-31b
    elif task.speed_critical: use gemma4-4b
    elif task.audio_required: use gemma4-2b
    else: use gemma4-26b-moe

tools:
  registry: /etc/claw/tools/
  auto_load: true

context:
  window: 8192
  summarization: auto
```
Responsibilities:
- Receive dispatch from Hermes Agent
- Route to appropriate Gemma 4 variant
- Manage tool registry
- Handle context windows
- Return formatted response
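
As a sketch of the `routing.logic` rules above, backend selection could look like the following Python function. The `Task` fields and the complexity score (0.0 to 1.0) are illustrative assumptions about what the harness would compute upstream, not a settled interface.

```python
from dataclasses import dataclass


@dataclass
class Task:
    complexity: float            # assumed 0.0 to 1.0 estimate from the harness
    speed_critical: bool = False
    audio_required: bool = False


def select_backend(task: Task) -> str:
    """Mirror routing.logic: prefer the 26B MoE unless a rule overrides it."""
    if task.complexity > 0.8:
        return "gemma4-31b"       # complex reasoning
    if task.speed_critical:
        return "gemma4-4b"        # speed-critical tasks
    if task.audio_required:
        return "gemma4-2b"        # ultra-edge tasks
    return "gemma4-26b-moe"       # default: efficiency


# Quick sanity checks of the rule order.
assert select_backend(Task(complexity=0.9)) == "gemma4-31b"
assert select_backend(Task(complexity=0.3)) == "gemma4-26b-moe"
```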
### Gemma 4 (The Seed)

```yaml
# Gemma 4 local deployment
gemma4:
  default: 26b-moe
  models:
    2b:
      path: /opt/models/gemma-4-2b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
    4b:
      path: /opt/models/gemma-4-4b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
    31b:
      path: /opt/models/gemma-4-31b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
    26b-moe:
      path: /opt/models/gemma-4-26b-moe-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      active_params: 4b
  server:
    backend: llama.cpp       # or vLLM for production
    host: localhost
    port: 8645
    threads: 8
```
Responsibilities:
- ALL REASONING — The only intelligence layer
- Tool use via function calling
- Multimodal understanding
- Code generation
- Long context handling
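
For illustration, a single request to the server above might look like the sketch below. It assumes the llama.cpp server is running on port 8645 and exposing its OpenAI-compatible `/v1/chat/completions` route; the prompt, sampling parameters, and helper name are placeholders.

```python
import requests  # any HTTP client works; requests is used for brevity

GEMMA_URL = "http://localhost:8645/v1/chat/completions"  # llama.cpp OpenAI-compatible route


def ask_gemma(prompt: str, model: str = "gemma4-26b-moe") -> str:
    """Send a single-turn chat request to the locally served Gemma 4 model."""
    payload = {
        "model": model,  # chosen by the harness routing; a single-model server may ignore it
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "max_tokens": 512,
    }
    resp = requests.post(GEMMA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask_gemma("Summarize the fit-layer contract in one sentence."))
```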
## The Skill Prototype

The Gemma 4 Hermes Profile (`~/.hermes/profiles/gemma4/`) is the skill prototype for this seed:
- ✅ All 4 variants configured
- ✅ All 6 platforms supported
- ✅ 3 complete skills (inference, multimodal, fine-tuning)
- ✅ Production tooling (download, benchmark, deploy)
- ✅ Apache 2.0 — truly open
This profile becomes the reference implementation for all Archons.
## Deployment Architecture

### Per-Archon Setup

```
Each Archon (Ezra, Electra, Allegro, etc.):
├── Hermes Agent (fit layer)
│   └── ~/.hermes/profiles/{archon}/
│       └── config.yaml → dispatch to Claw Code
├── Claw Code Harness
│   └── /opt/claw/{archon}/
│       └── runtime.yaml → route to Gemma 4
└── Gemma 4 Seed
    └── Shared local instance OR dedicated GPU slice
```
### Shared Infrastructure

```
House Infrastructure:
├── Gemma 4 Server (local)
│   ├── 26B MoE (default)
│   ├── 31B Dense (complex tasks)
│   └── 4B (fast tasks)
├── Gitea (code, issues, PRs)
├── Nostr (sovereign messaging)
└── Backblaze B2 (encrypted backup)
```
## Gantt Chart: Implementation

```
Phase 1: Foundation (Week 1-2)
├── Day 1-3: Gemma 4 server deployment
│   ├── Download 26B MoE, 31B, 4B GGUFs
│   ├── llama.cpp server setup
│   └── vLLM production config
├── Day 4-7: Claw Code harness v2
│   ├── Remove old intelligence layer
│   ├── Add Gemma 4 routing
│   └── Tool registry refactor
└── Day 8-14: Hermes Agent fit layer
    ├── Strip local reasoning
    ├── Dispatch-only mode
    └── Error handling

Phase 2: Archon Deployment (Week 3-4)
├── Day 15-17: Ezra Archon
├── Day 18-20: Electra Archon
├── Day 21-23: Allegro Archon migration
└── Day 24-28: Testing & hardening

Phase 3: Integration (Week 5-6)
├── Day 29-32: Gitea webhook automation
├── Day 33-35: Telegram bot integration
├── Day 36-38: Nostr bridge
└── Day 39-42: Backup & monitoring
```
## Acceptance Criteria
| ID | Criteria | Test |
|---|---|---|
| A1 | Gemma 4 26B MoE serves locally at >20 tok/s | Benchmark passed |
| A2 | Hermes Agent has NO local intelligence | Config audit |
| A3 | All queries route through Claw Code harness | Packet capture |
| A4 | Tool use works via Gemma 4 function calling | Test suite |
| A5 | Each Archon has independent fit layer | Multi-agent test |
| A6 | Gitea issues auto-route to correct Archon | Webhook test |
| A7 | Telegram bot responds via Gemma 4 | E2E test |
| A8 | No cloud AI calls in packet log | Network audit |
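
As one way to check A1, a rough throughput probe against the local server could look like the sketch below. It reuses the OpenAI-compatible endpoint assumed earlier and divides completion tokens by wall-clock time, which only approximates steady-state decode speed; it also assumes the response carries the OpenAI-style `usage` field.

```python
import time

import requests

GEMMA_URL = "http://localhost:8645/v1/chat/completions"  # local server from the config above


def tokens_per_second(prompt: str, max_tokens: int = 256) -> float:
    """Rough A1 probe: completion tokens divided by wall-clock time for one request."""
    start = time.monotonic()
    resp = requests.post(
        GEMMA_URL,
        json={
            "model": "gemma4-26b-moe",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=300,
    )
    resp.raise_for_status()
    elapsed = time.monotonic() - start
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed


if __name__ == "__main__":
    rate = tokens_per_second("Explain the seed architecture in detail.")
    print(f"{rate:.1f} tok/s (A1 threshold: 20 tok/s)")
```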
## Risk Analysis
| Risk | Mitigation |
|---|---|
| Gemma 4 too slow | Use 4B for speed, 26B MoE for quality |
| Memory constraints | Quantization (Q4_K_M), GPU offloading |
| Tool use failures | Extensive prompt engineering, fallback chains |
| Context window limits | Automatic summarization, RAG |
| Hardware failure | Hot standby, B2 restore |
## The Philosophy
What cannot be moved.
Stapes is the immovable bone. This architecture is the immovable foundation:
- Gemma 4 runs local — no cloud to fail
- Apache 2.0 — no license to revoke
- Fit layer Hermes — no vendor to change terms
- Claw Code harness — no black box to debug
The seed is planted. It will grow.
## References

- Gemma 4 Profile: `~/.hermes/profiles/gemma4/`
- Claw Code Docs: `/root/wizards/claw-code/docs/gemma4-integration.md`
- Hugging Face: https://huggingface.co/blog/gemma4
- **Epic Owner:** Allegro
- **Status:** In Progress
- **Priority:** P0 — Foundation of All Future Work
- **Seed Date:** 2026-04-02