diff --git a/docs/local-llm.md b/docs/local-llm.md
new file mode 100644
index 00000000..6fc5bcf3
--- /dev/null
+++ b/docs/local-llm.md
@@ -0,0 +1,115 @@
+# Local LLM Deployment Guide — llama.cpp Sovereign Inference
+
+llama.cpp provides sovereign, offline-capable inference on CPU, CUDA, and
+Apple Silicon. One binary, one model path, one health endpoint.
+
+## Quick Start
+
+    git clone https://github.com/ggerganov/llama.cpp.git
+    cd llama.cpp && cmake -B build && cmake --build build --config Release -j$(nproc)
+    sudo cp build/bin/llama-server /usr/local/bin/
+    mkdir -p /opt/models/llama
+    wget -O /opt/models/llama/Qwen2.5-7B-Instruct-Q4_K_M.gguf "https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf"
+    llama-server -m /opt/models/llama/Qwen2.5-7B-Instruct-Q4_K_M.gguf --host 0.0.0.0 --port 11435 -c 4096 -t $(nproc) --cont-batching
+    curl http://localhost:11435/health
+
+## Model Path Convention
+
+- /opt/models/llama/ — Production (system-wide)
+- ~/models/llama/ — Per-user (dev)
+- MODEL_DIR env var — Overrides both paths (wrapper sketch at the end of this guide)
+
+## Recommended Models
+
+- Qwen2.5-7B-Instruct (4.7GB, 8GB RAM, 25-40 tok/s) — Fleet standard
+- Qwen2.5-3B-Instruct (2.0GB, 4GB RAM, 50-80 tok/s) — VPS Beta
+- Mistral-7B-Instruct-v0.3 (4.4GB, 8GB RAM) — Alternative
+
+File sizes are the Q4_K_M quants; tok/s figures are CPU-dependent estimates.
+
+## Quantization Guide
+
+- Q6_K (5.5GB) — Best quality/speed trade-off when RAM > 12GB
+- Q4_K_M (4.7GB) — Fleet standard
+- Q3_K_M (3.4GB) — Fallback for hosts with < 6GB RAM
+
+## Hardware Targets
+
+- VPS Beta (2 vCPU, 4GB): Qwen2.5-3B-Q4_K_M, ctx 2048, ~40-60 tok/s (example launch command at the end of this guide)
+- VPS Alpha (4 vCPU, 8GB): Qwen2.5-7B-Q4_K_M, ctx 4096, ~20-35 tok/s
+- Mac Apple Silicon: Qwen2.5-7B-Q6_K, Metal, ~30-50 tok/s
+
+## Health Check
+
+The server is ready once /health returns HTTP 200; it reports an error while the model is still loading (wait-for-ready sketch at the end of this guide).
+
+    curl -sf http://localhost:11435/health
+    curl -s http://localhost:11435/v1/models
+
+## API Compatibility
+
+llama-server exposes an OpenAI-compatible API at /v1/chat/completions, so existing OpenAI clients only need their base URL changed. A sample request appears at the end of this guide.
+
+## Troubleshooting
+
+- Won't start: model likely exceeds available RAM; use a smaller model or a lower quant
+- Slow: set -t to the number of cores actually available to the server
+- OOM during generation: reduce the -c context size
+- Port in use: find the conflicting process with lsof -i :11435
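+
+## Example: MODEL_DIR Launch Wrapper
+
+A minimal wrapper sketch honoring the Model Path Convention above. The
+script name start-llama.sh and the flag values are illustrative defaults
+mirroring the Quick Start line, not a fleet requirement.
+
+    #!/usr/bin/env bash
+    # Launch llama-server from MODEL_DIR, defaulting to the production path.
+    set -euo pipefail
+    MODEL_DIR="${MODEL_DIR:-/opt/models/llama}"
+    MODEL="$MODEL_DIR/Qwen2.5-7B-Instruct-Q4_K_M.gguf"
+    exec llama-server -m "$MODEL" --host 0.0.0.0 --port 11435 \
+      -c 4096 -t "$(nproc)" --cont-batching
+
+Per-user dev runs then need no edits: MODEL_DIR=~/models/llama ./start-llama.sh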
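+
+## Example: VPS Beta Launch
+
+A launch line sketch for the 2 vCPU / 4GB target from Hardware Targets,
+assuming the 3B model sits in the per-user path. Binding to 127.0.0.1
+rather than the Quick Start's 0.0.0.0 keeps the server loopback-only on a
+shared VPS; adjust to taste.
+
+    llama-server -m ~/models/llama/Qwen2.5-3B-Instruct-Q4_K_M.gguf \
+      --host 127.0.0.1 --port 11435 -c 2048 -t 2 --cont-batching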
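+
+## Example: Wait-for-Ready Gate
+
+/health reports an error until the model finishes loading, so deployment
+scripts should poll before routing traffic. A minimal sketch; the 60-second
+budget is an assumption, tune it to your model size and disk speed.
+
+    # Poll /health until it returns 200, giving up after ~60s.
+    for _ in $(seq 1 60); do
+      curl -sf http://localhost:11435/health >/dev/null && exit 0
+      sleep 1
+    done
+    echo "llama-server not ready after 60s" >&2
+    exit 1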
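+
+## Example: Chat Completion Request
+
+A minimal request sketch against the OpenAI-compatible endpoint. Since
+llama-server serves the single model given with -m, the model field can be
+omitted; the prompt and sampling values here are placeholders.
+
+    curl -s http://localhost:11435/v1/chat/completions \
+      -H "Content-Type: application/json" \
+      -d '{
+        "messages": [
+          {"role": "system", "content": "You are a concise assistant."},
+          {"role": "user", "content": "Explain GGUF in one sentence."}
+        ],
+        "temperature": 0.2,
+        "max_tokens": 128
+      }'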