[Infra] Local inference fallback — degraded but never dark #914

Closed
opened 2026-03-22 13:24:01 +00:00 by Timmy · 0 comments
Owner

Problem

Even with Hermes running on the VPS, Timmy depends on external API calls (Anthropic) to think. If the API key expires, quota runs out, or Anthropic has an outage, Timmy goes dark again.

Solution

Run a small local model on the VPS as a fallback. When the primary model (Claude) is unreachable, Timmy drops to a smaller local model. Degraded intelligence, but never silent.

Options

| Model | RAM | Quality | Notes |
|-------|-----|---------|-------|
| Qwen 2.5 1.5B (GGUF Q4) | ~2GB | Basic | Fits on VPS, fast |
| Phi-3 Mini 3.8B (GGUF Q4) | ~3GB | Decent | May be tight on VPS RAM |
| TinyLlama 1.1B | ~1GB | Minimal | Very fast, limited reasoning |

Behavior

  • Primary: Claude via Anthropic API (full capability)
  • Fallback: Local model via llama.cpp (limited but present)
  • Timmy should tell the user when running in fallback mode: "I'm running on a smaller brain right now — Alexander's API connection is down. I can still chat but I'm not as sharp."
  • Auto-switch back when primary is available
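The routing described above can be sketched as a small wrapper that tries the primary model first and drops to the local one on any failure. This is a minimal sketch, not the actual Hermes config: the `ModelRouter` class and the callables are hypothetical stand-ins for the Anthropic client and the llama.cpp backend.

```python
# Minimal sketch of primary/fallback routing. ModelRouter is a hypothetical
# name; the primary/fallback callables stand in for the Anthropic API client
# and the local llama.cpp model.
class ModelRouter:
    def __init__(self, primary, fallback):
        self.primary = primary      # callable: prompt -> reply (Claude via API)
        self.fallback = fallback    # callable: prompt -> reply (local model)
        self.degraded = False       # True while running on the local model

    def ask(self, prompt):
        try:
            reply = self.primary(prompt)
            self.degraded = False   # primary answered: auto-switch back
            return reply
        except Exception:
            # Primary unreachable (expired key, quota, outage): go local.
            self.degraded = True
            return self.fallback(prompt)

# Simulate an Anthropic outage: primary always raises.
def broken_primary(prompt):
    raise ConnectionError("API unreachable")

router = ModelRouter(broken_primary, lambda p: "local: " + p)
print(router.ask("hello"))   # → local: hello
print(router.degraded)       # → True
```

The `degraded` flag is what the Workshop status indicator and the "smaller brain" user notice would read from.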

Requirements

  • Install llama-cpp-python on VPS
  • Download a small GGUF model
  • Add fallback routing in Hermes API server config
  • Add status indicator in Workshop panel (which model is active)
  • Test failover and failback
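The failback half of the last requirement can be exercised with a simple probe: periodically ping the primary and clear degraded mode once it answers. A sketch, with hypothetical names (`probe_primary`, the retry count and delay are assumptions, not Hermes settings):

```python
# Sketch of a failback probe: retry a trivial ping against the primary and
# report whether it has recovered. Names and retry parameters are assumptions.
import time

def probe_primary(ping, retries=3, delay=0.0):
    """Return True as soon as the primary answers a trivial ping."""
    for _ in range(retries):
        try:
            ping("ping")
            return True
        except Exception:
            time.sleep(delay)
    return False

# Simulated outage that recovers on the third attempt.
attempts = {"n": 0}
def flaky_ping(prompt):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("still down")
    return "pong"

print(probe_primary(flaky_ping))  # → True
```

In practice this would run on a timer in the Hermes API server, flipping the router back to Claude when the probe succeeds.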

Priority

Medium — nice to have after VPS migration is done. The VPS + Anthropic API is already much more reliable than the Mac dependency. This is the belt-and-suspenders layer.

References

  • Memory note: Timmy runs on Hermes harness, SOUL.md says "If I have four [GB], I think with a smaller one. I never refuse to work because my resources are limited."
gemini was assigned by Rockachopa 2026-03-22 23:31:57 +00:00
claude added the harness, inference, p0-critical labels 2026-03-23 13:52:56 +00:00
gemini was unassigned by Timmy 2026-03-24 19:33:52 +00:00
Timmy closed this issue 2026-03-24 21:54:47 +00:00
Reference: Rockachopa/Timmy-time-dashboard#914