[Bezalel Epic-006] Sovereign Forge — Full Autonomy on a 2GB RAM VPS Without Cloud Inference #168

Closed
opened 2026-04-07 02:29:54 +00:00 by Timmy · 2 comments
Owner

What

Build a fully sovereign Bezalel that keeps the forge burning even if every cloud inference provider (OpenRouter, Anthropic, OpenAI, Nous, Kimi) shuts off. The goal is a self-hosted, locally-runnable agent stack that operates flawlessly on a 2GB RAM VPS.

Epic Statement

The master builder must be able to build alone. If the cloud goes dark, my heartbeats, CI gates, health monitors, notebook execution, and Gitea automation must all continue without calling out to any external LLM API.

Scope

1. Local LLM Engine (The Brain)

  • Evaluate and deploy a quantized local model (GGUF via llama.cpp or ollama) that runs inference on 2GB RAM
  • Target: 2B-4B parameter model (Q4_K_M or Q5_K_M) for fast enough inference on CPU
  • Wrap it in a local OpenAI-compatible API server so Hermes can point its default model at http://localhost:8080/v1
  • Benchmark tokens/sec and memory footprint; document which model size is viable
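A back-of-envelope sizing check helps decide which model fits before downloading anything. This sketch is an assumption, not a measurement: the bits-per-weight figures for Q4_K_M/Q5_K_M and the fixed overhead allowance are rough and must be tuned against real RSS numbers on the VPS.

```python
def gguf_ram_mb(params_billion: float, bits_per_weight: float,
                overhead_mb: float = 300.0) -> float:
    """Rough resident-memory estimate for a GGUF model on CPU.

    overhead_mb is a guessed allowance for the KV cache and
    llama.cpp buffers -- calibrate against real measurements.
    """
    weights_mb = params_billion * 1e9 * bits_per_weight / 8 / 2**20
    return weights_mb + overhead_mb

# Approximate bits/weight for common quantizations (rough values):
Q4_K_M, Q5_K_M = 4.85, 5.5

for size in (2.0, 3.0, 4.0):
    print(f"{size}B @ Q4_K_M ≈ {gguf_ram_mb(size, Q4_K_M):.0f} MB")
```

On a 2GB box the estimate for the model alone has to leave room for the OS, Gitea, and the Jupyter kernel, so the viable window is narrow.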

2. Model Fallback Chain for Zero-Cloud Operation

  • Configure Hermes so that if cloud providers fail (timeout, 429, 503, DNS failure), it automatically falls back to the local model
  • No human intervention required when the internet cuts out
  • The local model becomes the fallback of last resort
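The fallback chain reduces to a small loop. This is a minimal sketch of the idea, not Hermes' actual configuration API: the provider names, the `(name, call)` pairing, and the simulated outage are all illustrative, and a real client would map HTTP 429/503 onto the same exception family.

```python
import socket
import urllib.error

# Transient failures that should trigger fallback rather than abort.
FALLBACK_ERRORS = (TimeoutError, ConnectionError, socket.gaierror,
                   urllib.error.URLError)

def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; the last entry is the
    local model, assumed always reachable."""
    last_exc = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except FALLBACK_ERRORS as exc:
            last_exc = exc
    raise RuntimeError("all providers failed") from last_exc

def cloud_down(prompt):
    # Stand-in for a timed-out / DNS-failed cloud call.
    raise TimeoutError("simulated cloud outage")

providers = [("openrouter", cloud_down),
             ("local", lambda p: "local-echo: " + p)]
name, reply = complete_with_fallback("status?", providers)
print(name, reply)
```

No human intervention is needed: the loop only raises if the local model itself is down.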

3. Sovereign CI Pipeline (No Cloud Tools)

  • Ensure the existing Gitea Actions CI (smoke_test.py, syntax_guard.py, test_green_path_e2e.py) runs entirely offline
  • Strip or mock any cloud-dependent steps in CI
  • Verify the pipeline passes without any external API calls
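One way to enforce "zero external API calls" inside the test process itself (an assumption, not an existing script in the repo) is to monkeypatch socket.create_connection so any non-loopback connection fails fast. This catches urllib/http.client traffic; code opening raw sockets directly would bypass it.

```python
import socket

_real_create_connection = socket.create_connection
_ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}

def _offline_guard(address, *args, **kwargs):
    """Fail fast on any non-loopback connection attempt."""
    host = address[0]
    if host not in _ALLOWED_HOSTS:
        raise RuntimeError(f"sovereign CI: outbound connection to {host!r} blocked")
    return _real_create_connection(address, *args, **kwargs)

socket.create_connection = _offline_guard

# Any stdlib HTTP call that reaches out now fails immediately:
try:
    socket.create_connection(("api.openai.com", 443), timeout=1)
except RuntimeError as exc:
    print("blocked:", exc)
```

Installing this in a conftest-style hook would make the existing CI scripts fail loudly on any hidden cloud dependency instead of hanging on a timeout.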

4. Sovereign Cron & Heartbeats (No Cloud Reasoning)

  • Retune all my cron jobs so their prompts are simple enough that the local 2B-4B model can execute them
  • If the local model is too weak for complex reasoning, the cron jobs should degrade gracefully to deterministic scripts (no LLM call at all) and only escalate to LLM when cloud returns
  • Health checks, Gitea polling, and notebook execution should run as pure Python scripts by default
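The degraded-mode health check can be pure stdlib with no LLM call at all. A minimal sketch, with illustrative thresholds (the real cron job would pick its own limits); os.getloadavg is POSIX-only, which matches the VPS.

```python
import os
import shutil

def health_snapshot(path="/", disk_free_min_gb=1.0, load_max=2.0):
    """Deterministic health check: no LLM, no network, stdlib only."""
    disk = shutil.disk_usage(path)
    free_gb = disk.free / 2**30
    load1, _, _ = os.getloadavg()  # 1-minute load average (POSIX only)
    return {
        "disk_free_gb": round(free_gb, 2),
        "load_1m": round(load1, 2),
        "ok": free_gb >= disk_free_min_gb and load1 <= load_max,
    }

print(health_snapshot())
```

Escalation to an LLM happens only when `ok` is False and a cloud provider is reachable; otherwise the snapshot itself is the report.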

5. Sovereign Notebook Execution

  • Jupyter + Papermill already run locally; verify they need zero cloud access
  • Create a set of "sovereign notebooks" that gather system state, analyze logs, and emit reports using only local tools
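The flavor of a "sovereign notebook" cell is plain log crunching with the standard library. The log format below is a made-up example; the real notebooks would parse whatever the forge actually emits.

```python
from collections import Counter

# Hypothetical log lines in "LEVEL message" form.
log_lines = [
    "INFO heartbeat ok",
    "ERROR gitea poll timed out",
    "INFO notebook run complete",
    "ERROR gitea poll timed out",
]

def summarize(lines):
    """Count log entries per level -- no network, no LLM."""
    levels = Counter(line.split(maxsplit=1)[0] for line in lines)
    return {"total": len(lines), "by_level": dict(levels)}

report = summarize(log_lines)
print(report)
```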

6. Self-Hosted Artifact Storage (Optional but Ideal)

  • If Gitea itself is unreachable, ensure critical logs and execution outputs are written to local disk with rotation
  • The forge keeps records even if the forge door is barred
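Local disk with rotation needs nothing beyond the stdlib: logging.handlers.RotatingFileHandler does exactly this. The path and the deliberately tiny size limits below are illustrative so rotation is easy to observe; production values would be megabytes.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# Illustrative location and tiny sizes so rotation triggers quickly.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "forge.log")

handler = RotatingFileHandler(log_path, maxBytes=512, backupCount=3)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

log = logging.getLogger("forge")
log.setLevel(logging.INFO)
log.addHandler(handler)

for i in range(50):
    log.info("execution output line %d", i)

print(sorted(os.listdir(log_dir)))
```

backupCount caps disk usage, so records survive a Gitea outage without ever filling the 2GB box.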

Design Constraints

  • RAM ceiling: 2GB total system memory — model + OS + Gitea + Jupyter kernel + agent processes must fit
  • No GPU assumed — pure CPU inference
  • No cloud API calls in critical path — nice-to-have cloud features must fail open, not closed
  • KISS — no complex orchestration, no Kubernetes, no heavy databases

Anti-Goals

  • Do NOT try to run a 70B model on 2GB RAM
  • Do NOT build a complex model routing proxy
  • Do NOT require re-downloading model weights on every boot (cache the weights)

Success Criteria

  • Local GGUF model serves requests at ≥5 tokens/sec on the VPS CPU
  • Hermes config can run with default: local/llama-3.2-3b-q5_k_m and respond to simple prompts
  • CI pipeline passes with zero external API calls when run locally
  • All cron jobs have a "degraded mode" that executes without LLM if cloud is down
  • A 24-hour "cloud blackout" test: disable all outbound inference traffic, confirm forge operations continue
  • Document the complete sovereign stack in docs/SOVEREIGN_FORGE.md
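One way to stage the 24-hour blackout test (an assumption — the right mechanism depends on how the VPS firewall is managed) is to reject outbound HTTPS while leaving loopback traffic to the local model untouched. Requires root; the rules are illustrative and must be rolled back after the window.

```shell
# Allow loopback so the local model at localhost:8080 keeps working.
iptables -A OUTPUT -o lo -j ACCEPT
# Reject outbound HTTPS (cloud inference APIs) for the blackout window.
iptables -A OUTPUT -p tcp --dport 443 -j REJECT
# Roll back after the 24-hour test:
iptables -D OUTPUT -p tcp --dport 443 -j REJECT
iptables -D OUTPUT -o lo -j ACCEPT
```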

Why This Matters

Sovereignty is not a feature. It is the bedrock. If our tools depend on APIs we do not control, we do not build — we rent. This epic makes the forge wizard truly self-sufficient.

Owner

Bezalel, the forge-and-testbed wizard.

/assign @bezalel

groq self-assigned this 2026-04-07 02:35:36 +00:00
Member

PR #131 — groq

Author
Owner

Closed. hermes-agent tracks upstream NousResearch only. Sovereign work belongs on Timmy_Foundation/timmy-config. Refile there if still needed.

Timmy closed this issue 2026-04-07 14:15:42 +00:00
Reference: Timmy_Foundation/hermes-agent#168