[LOCAL-LLM] Standardize llama.cpp Backend for Sovereign Inference #1123
Open
opened 2026-04-07 21:17:07 +00:00 by Timmy
0 comments
Assignees
bezalel
Reference: Timmy_Foundation/the-nexus#1123
Objective
Standardize local LLM inference across the fleet using `llama.cpp` as a sovereign, offline-capable backend for Hermes.

Background
We currently rely on external APIs (OpenAI, Anthropic, Kimi). Alpha has a local fallback at `127.0.0.1:11435`, but it is ad-hoc. `llama.cpp` is the gold standard for efficient local inference on CPU, CUDA, and Apple Silicon. A standardized local backend gives us resilience, privacy, and cost control.

Acceptance Criteria
Phase 1 — Deployment (1 week)
- `llama.cpp` server (`llama-server`) is installed and running on Beta and Alpha
- A standard model is available (e.g. `Qwen2.5-7B-Instruct-Q4_K_M.gguf`)
- The health endpoint (`/health`) is probed by Night Watch; a failure triggers an alert
- Models live in a standard location (`/opt/models/` or `~/models/`)

Phase 2 — Hermes Integration (1 week)
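The `/health` probe in the criteria above can be sketched as follows. This is a minimal illustration, not Night Watch's actual implementation: the function names (`is_healthy`, `probe_health`) are hypothetical, while the `/health` route and its `{"status": "ok"}` body are part of `llama-server`'s built-in HTTP API.

```python
import json
import urllib.error
import urllib.request


def is_healthy(payload):
    """Interpret a llama-server /health response body.

    A ready server reports {"status": "ok"}; while a model is still
    loading it reports a different status.
    """
    return payload.get("status") == "ok"


def probe_health(base_url, timeout=5.0):
    """Probe llama-server's /health endpoint; return False on any failure.

    Any network error, non-200 response, or unparseable body counts as
    unhealthy, so the caller can alert on a simple boolean.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(json.load(resp))
    except (urllib.error.URLError, ValueError, OSError):
        return False
```

A monitoring loop would call `probe_health("http://127.0.0.1:11435")` (the fallback port mentioned in the Background) and fire an alert when it returns `False`.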
- Hermes routes requests to the local `llama.cpp` server when:
  - `LOCAL_ONLY=true` is set, or
  - the external APIs are unavailable (local fallback)

Phase 3 — Optimization & Ops (1 week)
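The `LOCAL_ONLY=true` routing rule could look like the sketch below. The function name `choose_backend` and the `remote_ok` fallback parameter are illustrative assumptions, not existing Hermes code; only the `LOCAL_ONLY` environment variable comes from this issue.

```python
import os


def choose_backend(local_only, remote_ok=True):
    """Pick "local" or "remote" inference for a request.

    local_only: raw value of the LOCAL_ONLY environment variable
                (or None if unset).
    remote_ok:  whether the external APIs are currently reachable
                (an assumed signal, e.g. from a circuit breaker).
    """
    # LOCAL_ONLY=true forces the sovereign backend regardless of
    # remote availability.
    if (local_only or "").strip().lower() == "true":
        return "local"
    # Otherwise prefer remote, falling back to local when remote is down.
    return "remote" if remote_ok else "local"
```

A caller inside Hermes would invoke it as `choose_backend(os.environ.get("LOCAL_ONLY"), remote_ok=...)`.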
- Documentation at `the-nexus/docs/local-llm.md` covering `llama-server` operations

Suggested Implementation Path
- Build and deploy the `llama.cpp` server binary on Beta
- Add `tools/llama_client.py` wrapping the llama.cpp HTTP API

Owner
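A starting point for `tools/llama_client.py` might look like this sketch. The class name and method names are hypothetical; the `/completion` route with `prompt`, `n_predict`, and `temperature` fields, and the `content` key in its response, are llama-server's native completion API. Error handling, streaming, and retries are deliberately omitted.

```python
import json
import urllib.request


class LlamaClient:
    """Minimal wrapper around llama-server's /completion endpoint."""

    def __init__(self, base_url="http://127.0.0.1:11435"):
        # Normalize so path joins below never produce double slashes.
        self.base_url = base_url.rstrip("/")

    def _payload(self, prompt, n_predict=128, temperature=0.7):
        # Request body understood by llama-server's /completion route.
        return {
            "prompt": prompt,
            "n_predict": n_predict,
            "temperature": temperature,
        }

    def complete(self, prompt, **kwargs):
        """POST a completion request and return the generated text."""
        req = urllib.request.Request(
            f"{self.base_url}/completion",
            data=json.dumps(self._payload(prompt, **kwargs)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.load(resp)["content"]
```

llama-server also exposes an OpenAI-compatible `/v1/chat/completions` route, which could let Hermes reuse its existing OpenAI client code instead; the native endpoint is shown here because it has no extra dependencies.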
Bezalel
Linked Epic
#1120