feat: integrate AirLLM as optional high-performance backend

Adds the `bigbrain` optional dependency group (airllm>=2.9.0) and a complete second inference path that runs 8B / 70B / 405B Llama models locally via layer-by-layer loading: no GPU required, no cloud, fully sovereign.

Key changes:
- src/timmy/backends.py: TimmyAirLLMAgent (same print_response interface as Agno Agent); auto-selects AirLLMMLX on Apple Silicon, AutoModel (PyTorch) everywhere else
- src/timmy/agent.py: _resolve_backend() routing with explicit override, env-config, and 'auto' Apple-Silicon detection
- src/timmy/cli.py: --backend / --model-size flags on all commands
- src/config.py: timmy_model_backend + airllm_model_size settings
- src/timmy/prompts.py: mentions AirLLM "even bigger brains, still fully sovereign"
- pyproject.toml: bigbrain optional dep; wheel includes updated
- .env.example: TIMMY_MODEL_BACKEND + AIRLLM_MODEL_SIZE docs
- tests/conftest.py: stubs 'airllm' module so tests run without GPU
- tests/test_backends.py: 13 new tests covering helpers + TimmyAirLLMAgent
- tests/test_agent.py: 7 new tests for backend routing
- README.md: Big Brain section with one-line install
- activate_self_tdd.sh: bootstrap script (venv + install + tests + watchdog + dashboard); --big-brain flag

All 61 tests pass. Self-TDD watchdog unaffected.

https://claude.ai/code/session_01DMjQ5qMZ8iHeyix1j3GS7c
2026-02-21 16:53:16 +00:00
parent 7619407b63
commit 19af4ae540
12 changed files with 601 additions and 13 deletions
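
For orientation, the backend routing described in the commit message (explicit override, then configured value, then 'auto' detection) might look roughly like the sketch below. This is not the code from src/timmy/agent.py, which is not shown in this excerpt: the names `resolve_backend_sketch` and `is_apple_silicon`, their parameters, and the precedence of the explicit override over the configured value are all assumptions made for illustration. The only behavior taken from the source is that "auto" picks AirLLM on Apple Silicon when airllm is installed and falls back to Ollama otherwise.

```python
import platform
from typing import Optional

# The three values allowed by the timmy_model_backend setting in the diff.
VALID_BACKENDS = ("ollama", "airllm", "auto")


def is_apple_silicon(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Hypothetical helper: True on macOS running on an arm64 CPU.

    The arguments exist only so the check can be exercised without a Mac;
    by default the values come from the standard `platform` module.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


def resolve_backend_sketch(
    requested: Optional[str],
    configured: str = "ollama",
    *,
    airllm_installed: bool,
    apple_silicon: bool,
) -> str:
    """Hypothetical routing: returns "ollama" or "airllm".

    Assumed precedence: an explicit request (e.g. a CLI flag) wins over the
    configured value; "auto" is resolved last.
    """
    choice = requested or configured
    if choice not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {choice!r}")
    if choice != "auto":
        # "ollama" and "airllm" are honored as given.
        return choice
    # "auto": AirLLM only on Apple Silicon with airllm installed, else Ollama.
    return "airllm" if (apple_silicon and airllm_installed) else "ollama"
```

Passing `airllm_installed` and `apple_silicon` in as plain booleans keeps the decision itself free of imports and hardware probing, which is one way routing tests could run on any machine.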
--- a/src/config.py
+++ b/src/config.py
@@ -1,3 +1,5 @@
+from typing import Literal
+
 from pydantic_settings import BaseSettings, SettingsConfigDict


@@ -11,6 +13,18 @@ class Settings(BaseSettings):
    # Set DEBUG=true to enable /docs and /redoc (disabled by default)
    debug: bool = False

+    # ── AirLLM / backend selection ───────────────────────────────────────────
+    # "ollama"  — always use Ollama (default, safe everywhere)
+    # "airllm"  — always use AirLLM (requires pip install ".[bigbrain]")
+    # "auto"    — use AirLLM on Apple Silicon if airllm is installed,
+    #             fall back to Ollama otherwise
+    timmy_model_backend: Literal["ollama", "airllm", "auto"] = "ollama"
+
+    # AirLLM model size when backend is airllm or auto.
+    # Larger = smarter, but needs more RAM / disk.
+    # 8b  ~16 GB  |  70b  ~140 GB  |  405b  ~810 GB
+    airllm_model_size: Literal["8b", "70b", "405b"] = "70b"
+
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",