fix: Remove duplicate content blocks from README.md and POLICY.md (#1338 )

This commit fixes issue #1338 by removing duplicate content blocks that were appearing 3-4 times on the page. Changes: 1. README.md: - Removed duplicate "Branch Protection & Review Policy" section (lines 121-134) - Removed duplicate "Running Locally" section (lines 149-167) - Kept the detailed "Branch Protection & Review Policy" section at the top - Kept the first "Running Locally" section with all content 2. POLICY.md: - Consolidated duplicate content into single cohesive policy - Merged two "Branch Protection Rules" sections - Merged two "Default Reviewer" sections - Merged two "Acceptance Criteria" sections - Added "Enforcement" and "Notes" sections from second half The duplicate content was likely caused by a bad merge or template duplication. This cleanup ensures each section appears only once while preserving all content. Closes #1338
feat: standardize llama.cpp backend for sovereign local inference (#1123 )
2026-04-13 21:44:26 -04:00 · 2026-04-14 01:40:14 +00:00
3 changed files with 59 additions and 71 deletions
--- a/POLICY.md
+++ b/POLICY.md
@@ -27,7 +27,7 @@ All repositories must define default reviewers using CODEOWNERS-style configurat

 ---

-### <EFBFBD> Affected Repositories
+### 📋 Affected Repositories

 | Repository | Status | Notes |
 |-------------|--------|-------|
@@ -49,46 +49,15 @@ All repositories must define default reviewers using CODEOWNERS-style configurat

 ---

-### <EFBFBD> Blocks
-
- Blocks #916, #917
- cc @Timmy @Rockachopa
-
-— @perplexity, Integration Architect + QA
-
-## 🛡️ Branch Protection Rules
-
-These rules must be applied to the `main` branch of all repositories:
- [R] **Require Pull Request for Merge** – No direct pushes to `main`
- [x] **Require 1 Approval** – At least one reviewer must approve
- [R] **Dismiss Stale Approvals** – Re-review after new commits
- [x] **Require CI to Pass** – Only allow merges with passing CI (where CI exists)
- [x] **Block Force Push** – Prevent rewrite history
- [x] **Block Branch Deletion** – Prevent accidental deletion of `main`
-
-## 👤 Default Reviewer
-
- `@perplexity` – Default reviewer for all repositories
- `@Timmy` – Required reviewer for `hermes-agent` (owner gate)
-
-## 🚧 Enforcement
+### 🚧 Enforcement

 - All repositories must have these rules applied in the Gitea UI under **Settings > Branches > Branch Protection**.
 - CI must be configured and enforced for repositories with CI pipelines.
 - Reviewers assignments must be set via CODEOWNERS or manually in the UI.

-## 📌 Acceptance Criteria
+---

- [ ] Branch protection rules applied to `main` in:
-  - `hermes-agent`
-  - `the-nexus`
-  - `timmy-home`
-  - `timmy-config`
- [ ] `@perplexity` set as default reviewer
- [ ] `@Timmy` set as required reviewer for `hermes-agent`
- [ ] This policy documented in each repository's root
-
-## 🧠 Notes
+### 🧠 Notes

 - For repositories without CI, the "Require CI to Pass" rule is optional.
- This policy is versioned and must be updated as needed.
+- This policy is versioned and must be updated as needed.
--- a/README.md
+++ b/README.md
@@ -118,41 +118,6 @@ Those pieces should be carried forward only if they serve the mission and are re
 There is no root browser app on current `main`.
 Do not tell people to static-serve the repo root and expect a world.

-### Branch Protection & Review Policy
-
-**All repositories enforce:**
- PRs required for all changes
- Minimum 1 approval required
- CI/CD must pass
- No force pushes
- No direct pushes to main
-
-**Default reviewers:**
- `@perplexity` for all repositories
- `@Timmy` for nexus/ and hermes-agent/
-
-**Enforced by Gitea branch protection rules**
-
-### What you can run now
-
- `python3 server.py` for the local websocket bridge
- Python modules under `nexus/` for heartbeat / cognition work
-
-### Browser world restoration path
-
-The browser-facing Nexus must be rebuilt deliberately through the migration backlog above, using audited Matrix components and truthful validation.
-
---
-
-*One 3D repo. One migration path. No more ghost worlds.*
-
-## Running Locally
-
-### Current repo truth
-
-There is no root browser app on current `main`.
-Do not tell people to static-serve the repo root and expect a world.
-
 ### What you can run now

 - `python3 server.py` for the local websocket bridge
--- a/docs/local-llm.md
+++ b/docs/local-llm.md
@@ -0,0 +1,54 @@
+# Local LLM Deployment Guide — llama.cpp Sovereign Inference
+
+llama.cpp provides sovereign, offline-capable inference on CPU, CUDA, and
+Apple Silicon. One binary, one model path, one health endpoint.
+
+## Quick Start
+
+    git clone https://github.com/ggerganov/llama.cpp.git
+    cd llama.cpp && cmake -B build && cmake --build build --config Release -j$(nproc)
+    sudo cp build/bin/llama-server /usr/local/bin/
+    mkdir -p /opt/models/llama
+    wget -O /opt/models/llama/Qwen2.5-7B-Instruct-Q4_K_M.gguf "https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf"
+    llama-server -m /opt/models/llama/Qwen2.5-7B-Instruct-Q4_K_M.gguf --host 0.0.0.0 --port 11435 -c 4096 -t $(nproc) --cont-batching
+    curl http://localhost:11435/health
+
+## Model Path Convention
+
+- /opt/models/llama/ — Production (system-wide)
+- ~/models/llama/ — Per-user (dev)
+- MODEL_DIR env var — Override
+
+## Recommended Models
+
+- Qwen2.5-7B-Instruct (4.7GB, 8GB RAM, 25-40 tok/s) — Fleet standard
+- Qwen2.5-3B-Instruct (2.0GB, 4GB RAM, 50-80 tok/s) — VPS Beta
+- Mistral-7B-Instruct-v0.3 (4.4GB, 8GB RAM) — Alternative
+
+## Quantization Guide
+
+- Q6_K (5.5GB) — Best quality/speed, RAM > 12GB
+- Q4_K_M (4.7GB) — Fleet standard
+- Q3_K_M (3.4GB) — < 6GB RAM fallback
+
+## Hardware Targets
+
+- VPS Beta (2 vCPU, 4GB): Qwen2.5-3B-Q4_K_M, ctx 2048, ~40-60 tok/s
+- VPS Alpha (4 vCPU, 8GB): Qwen2.5-7B-Q4_K_M, ctx 4096, ~20-35 tok/s
+- Mac Apple Silicon: Qwen2.5-7B-Q6_K, Metal, ~30-50 tok/s
+
+## Health Check
+
+    curl -sf http://localhost:11435/health
+    curl -s http://localhost:11435/v1/models
+
+## API Compatibility
+
+llama-server exposes OpenAI-compatible API at /v1/chat/completions.
+
+## Troubleshooting
+
+- Won't start: use smaller model or lower quant
+- Slow: match -t to available cores
+- OOM: reduce -c context size
+- Port in use: lsof -i :11435