[claude] feat: add vLLM as alternative inference backend (#1281) #1300
Closed
claude
wants to merge 1 commit from
claude/issue-1281 into main
1 Commit
| Author | SHA1 | Message | Date |
|---|---|---|---|
| claude | 28d1905df4 | feat: add vLLM as alternative inference backend (#1281) | |

feat: add vLLM as alternative inference backend (#1281)

Adds vLLM (a high-throughput, OpenAI-compatible inference server) as a selectable backend alongside the existing Ollama and vllm-mlx backends. vLLM's continuous batching gives 3-10x throughput for agentic workloads.

Changes:
- `config.py`: add `vllm` to the `timmy_model_backend` Literal; add `vllm_url` / `vllm_model` settings (`VLLM_URL` / `VLLM_MODEL` env vars)
- `cascade.py`: add a `vllm` provider type with `_check_provider_available` (hits `/health`) and `_call_vllm` (OpenAI-compatible completions)
- `providers.yaml`: add a disabled-by-default `vllm-local` provider (priority 3, port 8001); bump the OpenAI/Anthropic backup priorities to 4/5
- `health.py`: add `_check_vllm` / `_check_vllm_sync` with a 30-second TTL cache; `/health` and `/health/sovereignty` reflect vLLM status when it is the active backend
- `docker-compose.yml`: add a `vllm` service behind the `vllm` profile (commented-out GPU passthrough template included); add a `vllm-cache` volume
- `CLAUDE.md`: add a vLLM row to the Service Fallback Matrix
- tests: 26 new unit tests covering availability checks, `_call_vllm`, `providers.yaml` validation, config options, and the health helpers

Graceful fallback: if vLLM is unavailable, the cascade router automatically falls back to Ollama; the app never crashes.

Fixes #1281

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
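A minimal sketch of the `config.py` change described above, assuming the project uses pydantic-settings for configuration; the setting names and env vars come from the commit message, but the defaults and the settings framework are assumptions, not taken from the diff.

```python
# Hypothetical sketch: backend selection plus the new vLLM settings.
# Field names map to VLLM_URL / VLLM_MODEL env vars (pydantic-settings
# matches field names to env vars case-insensitively).
from typing import Literal
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # "vllm" added alongside the existing backends
    timmy_model_backend: Literal["ollama", "vllm-mlx", "vllm"] = "ollama"
    vllm_url: str = "http://localhost:8001"  # assumed default; port 8001 per providers.yaml
    vllm_model: str = ""                     # set via VLLM_MODEL
```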
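The `cascade.py` helpers could look roughly like the sketch below. The function names and endpoints (`/health`, OpenAI-compatible completions) follow the commit message and vLLM's documented OpenAI-compatible server; the payload shape, timeouts, and error handling are illustrative assumptions.

```python
# Hypothetical sketch of the cascade-side vLLM provider helpers.
import requests

VLLM_URL = "http://localhost:8001"   # from vllm_url in config.py
VLLM_MODEL = "my-model"              # from vllm_model in config.py


def _check_provider_available(timeout: float = 2.0) -> bool:
    """Return True if the vLLM server answers its /health endpoint."""
    try:
        return requests.get(f"{VLLM_URL}/health", timeout=timeout).status_code == 200
    except requests.RequestException:
        return False


def _call_vllm(prompt: str, timeout: float = 60.0) -> str:
    """Send an OpenAI-compatible chat completion request to the vLLM server."""
    payload = {
        "model": VLLM_MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    resp = requests.post(f"{VLLM_URL}/v1/chat/completions", json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

If `_check_provider_available` returns False, the cascade router skips the vLLM provider and falls through to Ollama, which is the graceful-fallback behaviour the commit message describes.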
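For the `health.py` side, a 30-second TTL cache around the probe keeps `/health` cheap. The sketch below assumes a module-level cache dict; only the `_check_vllm_sync` name and the 30-second TTL come from the commit message.

```python
# Hypothetical sketch of the cached synchronous vLLM health probe.
import time

import requests

_VLLM_HEALTH_TTL = 30.0  # seconds, per the commit message
_vllm_health_cache = {"checked_at": 0.0, "healthy": False}


def _check_vllm_sync(base_url: str = "http://localhost:8001") -> bool:
    """Probe vLLM's /health endpoint, caching the result for 30 seconds."""
    now = time.monotonic()
    if now - _vllm_health_cache["checked_at"] < _VLLM_HEALTH_TTL:
        return _vllm_health_cache["healthy"]
    try:
        healthy = requests.get(f"{base_url}/health", timeout=2.0).status_code == 200
    except requests.RequestException:
        healthy = False
    _vllm_health_cache.update(checked_at=now, healthy=healthy)
    return healthy
```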