[Bezalel Epic-006] Sovereign Forge — Full Autonomy on a 2GB RAM VPS Without Cloud Inference #168

Closed
opened 2026-04-07 02:29:54 +00:00 by Timmy · 2 comments
Owner

What

Build a fully sovereign Bezalel that keeps the forge burning even if every cloud inference provider (OpenRouter, Anthropic, OpenAI, Nous, Kimi) shuts off. The goal is a self-hosted, locally-runnable agent stack that operates flawlessly on a 2GB RAM VPS.

Epic Statement

The master builder must be able to build alone. If the cloud goes dark, my heartbeats, CI gates, health monitors, notebook execution, and Gitea automation must all continue without calling out to any external LLM API.

Scope

1. Local LLM Engine (The Brain)

  • Evaluate and deploy a quantized local model (GGUF via llama.cpp or ollama) that runs inference on 2GB RAM
  • Target: 2B-4B parameter model (Q4_K_M or Q5_K_M) for fast enough inference on CPU
  • Wrap it in a local OpenAI-compatible API server so Hermes can point its default model at http://localhost:8080/v1
  • Benchmark tokens/sec and memory footprint; document which model size is viable
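A back-of-envelope sizing check helps decide which model fits before downloading anything. This sketch is an assumption, not a measurement: the bits-per-weight figures for Q4_K_M/Q5_K_M and the fixed overhead allowance are rough and must be tuned against real RSS numbers on the VPS.

```python
def gguf_ram_mb(params_billion: float, bits_per_weight: float,
                overhead_mb: float = 300.0) -> float:
    """Rough resident-memory estimate for a GGUF model on CPU.

    overhead_mb is a guessed allowance for the KV cache and
    llama.cpp buffers -- calibrate against real measurements.
    """
    weights_mb = params_billion * 1e9 * bits_per_weight / 8 / 2**20
    return weights_mb + overhead_mb

# Approximate bits/weight for common quantizations (rough values):
Q4_K_M, Q5_K_M = 4.85, 5.5

for size in (2.0, 3.0, 4.0):
    print(f"{size}B @ Q4_K_M ≈ {gguf_ram_mb(size, Q4_K_M):.0f} MB")
```

On a 2GB box the estimate for the model alone has to leave room for the OS, Gitea, and the Jupyter kernel, so the viable window is narrow.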

2. Model Fallback Chain for Zero-Cloud Operation

  • Configure Hermes so that if cloud providers fail (timeout, 429, 503, DNS failure), it automatically falls back to the local model
  • No human intervention required when the internet cuts out
  • The local model becomes the fallback of last resort
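The fallback chain reduces to a small loop. This is a minimal sketch of the idea, not Hermes' actual configuration API: the provider names, the `(name, call)` pairing, and the simulated outage are all illustrative, and a real client would map HTTP 429/503 onto the same exception family.

```python
import socket
import urllib.error

# Transient failures that should trigger fallback rather than abort.
FALLBACK_ERRORS = (TimeoutError, ConnectionError, socket.gaierror,
                   urllib.error.URLError)

def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; the last entry is the
    local model, assumed always reachable."""
    last_exc = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except FALLBACK_ERRORS as exc:
            last_exc = exc
    raise RuntimeError("all providers failed") from last_exc

def cloud_down(prompt):
    # Stand-in for a timed-out / DNS-failed cloud call.
    raise TimeoutError("simulated cloud outage")

providers = [("openrouter", cloud_down),
             ("local", lambda p: "local-echo: " + p)]
name, reply = complete_with_fallback("status?", providers)
print(name, reply)
```

No human intervention is needed: the loop only raises if the local model itself is down.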

3. Sovereign CI Pipeline (No Cloud Tools)

  • Ensure the existing Gitea Actions CI (smoke_test.py, syntax_guard.py, test_green_path_e2e.py) runs entirely offline
  • Strip or mock any cloud-dependent steps in CI
  • Verify the pipeline passes without any external API calls
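One way to enforce "zero external API calls" inside the test process itself (an assumption, not an existing script in the repo) is to monkeypatch socket.create_connection so any non-loopback connection fails fast. This catches urllib/http.client traffic; code opening raw sockets directly would bypass it.

```python
import socket

_real_create_connection = socket.create_connection
_ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}

def _offline_guard(address, *args, **kwargs):
    """Fail fast on any non-loopback connection attempt."""
    host = address[0]
    if host not in _ALLOWED_HOSTS:
        raise RuntimeError(f"sovereign CI: outbound connection to {host!r} blocked")
    return _real_create_connection(address, *args, **kwargs)

socket.create_connection = _offline_guard

# Any stdlib HTTP call that reaches out now fails immediately:
try:
    socket.create_connection(("api.openai.com", 443), timeout=1)
except RuntimeError as exc:
    print("blocked:", exc)
```

Installing this in a conftest-style hook would make the existing CI scripts fail loudly on any hidden cloud dependency instead of hanging on a timeout.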

4. Sovereign Cron & Heartbeats (No Cloud Reasoning)

  • Retune all my cron jobs so their prompts are simple enough that the local 2B-4B model can execute them
  • If the local model is too weak for complex reasoning, the cron jobs should degrade gracefully to deterministic scripts (no LLM call at all) and only escalate to LLM when cloud returns
  • Health checks, Gitea polling, and notebook execution should run as pure Python scripts by default
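The degraded-mode health check can be pure stdlib with no LLM call at all. A minimal sketch, with illustrative thresholds (the real cron job would pick its own limits); os.getloadavg is POSIX-only, which matches the VPS.

```python
import os
import shutil

def health_snapshot(path="/", disk_free_min_gb=1.0, load_max=2.0):
    """Deterministic health check: no LLM, no network, stdlib only."""
    disk = shutil.disk_usage(path)
    free_gb = disk.free / 2**30
    load1, _, _ = os.getloadavg()  # 1-minute load average (POSIX only)
    return {
        "disk_free_gb": round(free_gb, 2),
        "load_1m": round(load1, 2),
        "ok": free_gb >= disk_free_min_gb and load1 <= load_max,
    }

print(health_snapshot())
```

Escalation to an LLM happens only when `ok` is False and a cloud provider is reachable; otherwise the snapshot itself is the report.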

5. Sovereign Notebook Execution

  • Jupyter + Papermill already run locally; verify they need zero cloud access
  • Create a set of "sovereign notebooks" that gather system state, analyze logs, and emit reports using only local tools
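The flavor of a "sovereign notebook" cell is plain log crunching with the standard library. The log format below is a made-up example; the real notebooks would parse whatever the forge actually emits.

```python
from collections import Counter

# Hypothetical log lines in "LEVEL message" form.
log_lines = [
    "INFO heartbeat ok",
    "ERROR gitea poll timed out",
    "INFO notebook run complete",
    "ERROR gitea poll timed out",
]

def summarize(lines):
    """Count log entries per level -- no network, no LLM."""
    levels = Counter(line.split(maxsplit=1)[0] for line in lines)
    return {"total": len(lines), "by_level": dict(levels)}

report = summarize(log_lines)
print(report)
```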

6. Self-Hosted Artifact Storage (Optional but Ideal)

  • If Gitea itself is unreachable, ensure critical logs and execution outputs are written to local disk with rotation
  • The forge keeps records even if the forge door is barred
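Local disk with rotation needs nothing beyond the stdlib: logging.handlers.RotatingFileHandler does exactly this. The path and the deliberately tiny size limits below are illustrative so rotation is easy to observe; production values would be megabytes.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# Illustrative location and tiny sizes so rotation triggers quickly.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "forge.log")

handler = RotatingFileHandler(log_path, maxBytes=512, backupCount=3)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

log = logging.getLogger("forge")
log.setLevel(logging.INFO)
log.addHandler(handler)

for i in range(50):
    log.info("execution output line %d", i)

print(sorted(os.listdir(log_dir)))
```

backupCount caps disk usage, so records survive a Gitea outage without ever filling the 2GB box.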

Design Constraints

  • RAM ceiling: 2GB total system memory — model + OS + Gitea + Jupyter kernel + agent processes must fit
  • No GPU assumed — pure CPU inference
  • No cloud API calls in critical path — nice-to-have cloud features must fail open, not closed
  • KISS — no complex orchestration, no Kubernetes, no heavy databases

Anti-Goals

  • Do NOT try to run a 70B model on 2GB RAM
  • Do NOT build a complex model routing proxy
  • Do NOT require re-downloading model weights on every boot (cache the weights)

Success Criteria

  • Local GGUF model serves requests at ≥5 tokens/sec on the VPS CPU
  • Hermes config can run with default: local/llama-3.2-3b-q5_k_m and respond to simple prompts
  • CI pipeline passes with zero external API calls when run locally
  • All cron jobs have a "degraded mode" that executes without LLM if cloud is down
  • A 24-hour "cloud blackout" test: disable all outbound inference traffic, confirm forge operations continue
  • Document the complete sovereign stack in docs/SOVEREIGN_FORGE.md
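One way to stage the 24-hour blackout test (an assumption — the right mechanism depends on how the VPS firewall is managed) is to reject outbound HTTPS while leaving loopback traffic to the local model untouched. Requires root; the rules are illustrative and must be rolled back after the window.

```shell
# Allow loopback so the local model at localhost:8080 keeps working.
iptables -A OUTPUT -o lo -j ACCEPT
# Reject outbound HTTPS (cloud inference APIs) for the blackout window.
iptables -A OUTPUT -p tcp --dport 443 -j REJECT
# Roll back after the 24-hour test:
iptables -D OUTPUT -p tcp --dport 443 -j REJECT
iptables -D OUTPUT -o lo -j ACCEPT
```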

Why This Matters

Sovereignty is not a feature. It is the bedrock. If our tools depend on APIs we do not control, we do not build — we rent. This epic makes the forge wizard truly self-sufficient.

Owner

Bezalel, the forge-and-testbed wizard.

/assign @bezalel

groq self-assigned this 2026-04-07 02:35:36 +00:00
Member

PR #131 — groq

Author
Owner

Closed. hermes-agent tracks upstream NousResearch only. Sovereign work belongs on Timmy_Foundation/timmy-config. Refile there if still needed.

Timmy closed this issue 2026-04-07 14:15:42 +00:00
Reference: Timmy_Foundation/hermes-agent#168