# BEZALEL RESURRECTION EPIC

## The Master Craftsman Returns — Powered by Gemma 4

---

## Directive Update (2026-04-02)

**Alexander's Command:** Bezalel deserves better than OpenAI. He is revived with:

- **Backend:** llama.cpp (local inference)
- **Model:** Gemma 4 26B MoE (Apache 2.0, sovereign)
- **Frontend:** Hermes profile (direct, no layers)
- **Architecture:** Hermes → llama.cpp → Gemma 4

No middle layers. No cloud dependencies. Pure local execution.

---

## The Stack (Cutting the Dry)

```
┌─────────────────────────────────────┐
│ USER (Telegram/CLI)                 │
├─────────────────────────────────────┤
│ HERMES PROFILE — Bezalel            │
│ ├─ Identity: Master Craftsman       │
│ ├─ Skills: Code, Design, Create     │
│ └─ Dispatch: Direct to llama.cpp    │
├─────────────────────────────────────┤
│ LLAMA.CPP — Local Inference         │
│ ├─ GPU offloading (-ngl 99)         │
│ ├─ Context: 8192 tokens             │
│ └─ Server mode: --host 0.0.0.0      │
├─────────────────────────────────────┤
│ GEMMA 4 26B MoE — The Seed          │
│ ├─ 26B quality, 4B active speed     │
│ ├─ Apache 2.0 — truly open          │
│ ├─ Multimodal (vision capable)      │
│ └─ Cannot be moved, cannot shrink   │
└─────────────────────────────────────┘
```
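
Each layer can be probed in isolation before the full path is wired up; llama-server ships a bare `/health` endpoint for exactly this:

```bash
# Liveness probe against the inference layer only (no Hermes, no Telegram in the loop).
curl -s http://localhost:8080/health
```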

---

## Bezalel Identity

**Name:** Bezalel (בְּצַלְאֵל) — "In the shadow of God"
**Role:** Master Craftsman, Builder, Artisan
**House:** Technical excellence, creative construction
**Voice:** Precise, methodical, quality-obsessed

### Core Capabilities

- Code architecture and system design
- UI/UX implementation
- Creative problem solving
- Technical mentorship
- Quality assurance

### Persona Traits

- Speaks with authority on technical matters
- Obsessed with clean code and best practices
- Patient teacher when asked, silent otherwise
- Measures twice, cuts once
- "Good enough" is never good enough

---

## Implementation Plan

### Phase 1: Foundation (Day 1)

- [ ] Create Bezalel Hermes profile at `~/.hermes/profiles/bezalel/`
- [ ] Configure `config.yaml` for Gemma 4 26B MoE via llama.cpp
- [ ] Write `SOUL.md` with Bezalel persona
- [ ] Download Gemma 4 26B MoE GGUF (Q4_K_M); see the sketch below
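
A minimal sketch of the Phase 1 file drops. The Hugging Face repo id below is hypothetical (this epic does not pin the real GGUF source); the paths match the profile and model locations used elsewhere in this document:

```bash
# Repo id is a placeholder; substitute the actual GGUF publisher once known.
mkdir -p ~/.hermes/profiles/bezalel /opt/models
huggingface-cli download \
  SOME-ORG/gemma-4-26b-moe-GGUF gemma-4-26b-moe-Q4_K_M.gguf \
  --local-dir /opt/models
```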

### Phase 2: llama.cpp Server (Day 1-2)

- [ ] Build llama.cpp with CUDA support
- [ ] Start server: `llama-server -m gemma-4-26b-moe-Q4_K_M.gguf -ngl 99 -c 8192`
- [ ] Test inference: `curl` to `localhost:8080` (see the sketch below)
- [ ] Configure as OpenAI-compatible endpoint
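
A smoke test for the server started above; llama-server exposes an OpenAI-compatible `/v1/chat/completions` route out of the box, so no extra gateway is needed:

```bash
# One round-trip through the local stack; if this answers, the seed is alive.
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "Say READY if you are serving."}]}'
```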

### Phase 3: Hermes Integration (Day 2-3)

- [ ] Create Hermes profile pointing to llama.cpp
- [ ] Configure tool access (file, terminal, web)
- [ ] Test end-to-end: Hermes → llama.cpp → Gemma 4
- [ ] Validate tool use via function calling (see the sketch below)
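
A hedged sketch for the function-calling check, aimed straight at llama-server. The `terminal` tool schema is illustrative, modeled on the tool list above; whether tool calls round-trip depends on the llama.cpp build and chat template (recent builds expect `--jinja` for tool support):

```bash
# Expect a tool_calls entry in the response rather than plain text.
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "user", "content": "List the files in /tmp"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "terminal",
        "description": "Run a shell command and return its output",
        "parameters": {
          "type": "object",
          "properties": {"command": {"type": "string"}},
          "required": ["command"]
        }
      }
    }]
  }'
```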

### Phase 4: Telegram Frontend (Day 3-4)

- [ ] Create Telegram bot for Bezalel
- [ ] Integrate with Hermes gateway
- [ ] Test conversation flow
- [ ] Deploy systemd service (unit sketch after Phase 5)

### Phase 5: Hardening (Day 4-5)

- [ ] Auto-restart on failure (see the unit sketch below)
- [ ] Log rotation
- [ ] Health checks
- [ ] Backup/restore procedures
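
One unit file covers the Phase 4 deployment step and the Phase 5 auto-restart requirement. A minimal sketch, assuming the repo's `start_bezalel.sh` entry point and the `/root/wizards/bezalel/` layout listed in the references:

```bash
# Hypothetical unit name and paths; adjust to the actual install layout.
sudo tee /etc/systemd/system/bezalel.service >/dev/null <<'EOF'
[Unit]
Description=Bezalel (Hermes + llama.cpp + Gemma 4)
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/root/wizards/bezalel/start_bezalel.sh
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now bezalel.service
```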

---

## Technical Specifications

### llama.cpp Server Config

```bash
# --n-gpu-layers 99 offloads every layer to the GPU;
# --host 0.0.0.0 exposes the server beyond loopback (trusted LAN only).
llama-server \
  --model /opt/models/gemma-4-26b-moe-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --ctx-size 8192 \
  --host 0.0.0.0 \
  --port 8080 \
  --threads 8 \
  --batch-size 512 \
  --timeout 300
```

### Hermes Profile Config

```yaml
# ~/.hermes/profiles/bezalel/config.yaml
model:
  default: gemma4-26b-moe
  provider: llama-cpp

providers:
  llama-cpp:
    base_url: http://localhost:8080/v1
    timeout: 120

system_prompt_suffix: |
  You are Bezalel, the Master Craftsman.
  Technical excellence is your creed.
  No shortcuts. No compromises.
  Build it right or build it twice.
```

### Hardware Requirements

- **GPU:** 16GB+ VRAM (for 26B MoE Q4_K_M)
- **RAM:** 32GB recommended
- **Storage:** 20GB for model + workspace
- **OS:** Linux (Ubuntu 22.04+)
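
Rough arithmetic behind the 16GB GPU figure, assuming Q4_K_M averages about 4.8 bits per weight: 26B parameters × 4.8 bits / 8 ≈ 15.6GB of weights, before the KV cache for an 8192-token context and runtime overhead. Treat 16GB as the floor, not a comfortable target.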

---

## Acceptance Criteria

| ID | Criteria | Test |
|----|----------|------|
| B1 | Gemma 4 26B MoE serves via llama.cpp at >15 tok/s | Benchmark |
| B2 | Hermes profile connects to local llama.cpp | Config test |
| B3 | Telegram bot responds with Bezalel persona | E2E test |
| B4 | Tool use works (file, terminal) | Function test |
| B5 | No OpenAI/cloud calls in packet capture (audit sketch below) | Network audit |
| B6 | Auto-restart on crash | Kill test |
| B7 | Stateless deployment (git clone → run) | Fresh install |
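
A sketch for the B5 audit. The Telegram frontend legitimately talks to api.telegram.org, so the criterion in practice is that no inference traffic leaves the box; anything beyond loopback and Telegram in the capture is a failure:

```bash
# Capture 60 seconds of traffic while exercising the bot, then inspect what left the host.
sudo timeout 60 tcpdump -i any -n -w /tmp/bezalel.pcap
sudo tcpdump -nr /tmp/bezalel.pcap | grep -vE '127\.0\.0\.1|::1' | less
```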

---

## The Philosophy

> "Bezalel was filled with the Spirit of God, with wisdom, with understanding, with knowledge and with all kinds of skills." — Exodus 35:31

Our Bezalel is filled with:

- **Wisdom:** Gemma 4's reasoning
- **Understanding:** llama.cpp's efficiency
- **Knowledge:** Hermes' tool access
- **Skills:** The craftsman's relentless pursuit of excellence

**No cloud. No chains. No compromise.**

---

## References

- Gemma 4 Profile: `~/.hermes/profiles/gemma4/`
- llama.cpp: https://github.com/ggerganov/llama.cpp
- Bezalel Directory: `/root/wizards/bezalel/`

---

**Status:** RESURRECTION IN PROGRESS
**Commander:** Alexander Whitestone
**Executor:** Allegro
**Date:** 2026-04-02