feat: quality analysis — bug fixes, mobile tests, HITL checklist

Senior architect review findings + remediations:

BUG FIX — critical interface mismatch
- TimmyAirLLMAgent only exposed print_response(); dashboard route calls
  agent.run() → AttributeError when AirLLM backend is selected.
  Added run() → RunResult(content) as primary inference entry point;
  print_response() now delegates to run() so both call sites share
  one inference path.
- Added RunResult dataclass for Agno-compatible structured return.
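The contract described above can be sketched as follows. This is an illustrative stub (the inference body is a placeholder, not the project's actual AirLLM code), showing only the delegation pattern:

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Minimal Agno-compatible structured result."""
    content: str


class TimmyAirLLMAgent:
    """Stub: the real class routes inference through AirLLM."""

    def run(self, message: str, *, stream: bool = False) -> RunResult:
        # Primary inference entry point used by the dashboard route.
        response = f"(inference for: {message})"  # placeholder for model output
        return RunResult(content=response)

    def print_response(self, message: str, *, stream: bool = True) -> None:
        # CLI path delegates to run() so both call sites share one inference path.
        print(self.run(message, stream=stream).content)
```

With this shape, `agent.run()` no longer raises AttributeError when the AirLLM backend is selected, and the CLI path reuses the same code.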

BUG FIX — hardcoded model name in health status partial
- health_status.html rendered literal "llama3.2" regardless of
  OLLAMA_MODEL env var. Route now passes settings.ollama_model to
  the template context; partial renders {{ model }} instead.
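A minimal sketch of the shape of this fix, assuming a dataclass-style settings object and Jinja2-style templates; apart from `OLLAMA_MODEL` and `settings.ollama_model`, all names (and the default model string) are illustrative:

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    # Reads the OLLAMA_MODEL env var; the fallback default here is a guess.
    ollama_model: str = field(
        default_factory=lambda: os.environ.get("OLLAMA_MODEL", "llama3.2")
    )


settings = Settings()


def health_status_context() -> dict:
    # The route passes the configured model into the template context so the
    # partial can render {{ model }} instead of a hardcoded literal.
    return {"model": settings.ollama_model}
```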

FEATURE — /mobile-test HITL checklist page
- 22 human-executable test scenarios across: Layout, Touch & Input,
  Chat behaviour, Health, Scroll, Notch/Home Bar, Live UI.
- Pass/Fail/Skip buttons with sessionStorage state persistence.
- Live progress bar + final score summary.
- TEST link added to Mission Control header for quick access on phone.

TEST — 32 new automated mobile quality tests (M1xx–M6xx)
- M1xx: viewport/meta tags (8 tests)
- M2xx: touch target sizing — 44 px min-height, touch-action: manipulation (4 tests)
- M3xx: iOS zoom prevention, autocapitalize, enterkeyhint (5 tests)
- M4xx: HTMX robustness — hx-sync drop, disabled-elt, polling (5 tests)
- M5xx: safe-area insets, overscroll, dvh units (5 tests)
- M6xx: AirLLM interface contract — run(), RunResult, delegation (5 tests)
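As one example of the flavour of these checks, an M1xx-style viewport assertion might look like the stdlib-only sketch below; the HTML fixture and the check itself are illustrative, not the suite's actual code:

```python
from html.parser import HTMLParser


class ViewportFinder(HTMLParser):
    """Collects the content attribute of <meta name="viewport">, if present."""

    def __init__(self):
        super().__init__()
        self.viewport_content = None

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == "meta" and d.get("name") == "viewport":
            self.viewport_content = d.get("content", "")


# Illustrative fixture standing in for a rendered page.
PAGE = (
    '<html><head>'
    '<meta name="viewport" content="width=device-width, initial-scale=1">'
    '</head><body></body></html>'
)

finder = ViewportFinder()
finder.feed(PAGE)
assert finder.viewport_content is not None, "viewport meta tag missing"
assert "width=device-width" in finder.viewport_content
```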

Total test count: 61 → 93 (all passing).

https://claude.ai/code/session_01RBuRCBXZNkAQQXXGiJNDmt
Author: Claude
Date: 2026-02-21 17:21:47 +00:00
parent 7499690e10
commit c8aa6a5fbb
9 changed files with 958 additions and 12 deletions


@@ -1,14 +1,16 @@
 """AirLLM backend — only imported when the airllm extra is installed.
 Provides TimmyAirLLMAgent: a drop-in replacement for an Agno Agent that
-exposes the same print_response(message, stream) surface while routing
-inference through AirLLM. On Apple Silicon (arm64 Darwin) the MLX backend
-is selected automatically; everywhere else AutoModel (PyTorch) is used.
+exposes both the run(message, stream) → RunResult interface used by the
+dashboard and the print_response(message, stream) interface used by the CLI.
+On Apple Silicon (arm64 Darwin) the MLX backend is selected automatically;
+everywhere else AutoModel (PyTorch) is used.
 No cloud. No telemetry. Sats are sovereignty, boss.
 """
 import platform
+from dataclasses import dataclass
 from typing import Literal
 from timmy.prompts import TIMMY_SYSTEM_PROMPT
@@ -23,6 +25,12 @@ _AIRLLM_MODELS: dict[str, str] = {
 ModelSize = Literal["8b", "70b", "405b"]
 
+@dataclass
+class RunResult:
+    """Minimal Agno-compatible run result — carries the model's response text."""
+    content: str
 
 def is_apple_silicon() -> bool:
     """Return True when running on an M-series Mac (arm64 Darwin)."""
     return platform.system() == "Darwin" and platform.machine() == "arm64"
@@ -38,7 +46,11 @@ def airllm_available() -> bool:
 class TimmyAirLLMAgent:
-    """Thin AirLLM wrapper with the same print_response interface as Agno Agent.
+    """Thin AirLLM wrapper compatible with both dashboard and CLI call sites.
+
+    Exposes:
+        run(message, stream) → RunResult(content=...)    [dashboard]
+        print_response(message, stream) → None           [CLI]
 
     Maintains a rolling 10-turn in-memory history so Timmy remembers the
     conversation within a session — no SQLite needed at this layer.
@@ -64,12 +76,11 @@ class TimmyAirLLMAgent:
     # ── public interface (mirrors Agno Agent) ────────────────────────────────
 
-    def print_response(self, message: str, *, stream: bool = True) -> None:
-        """Run inference, update history, and render the response to stdout.
+    def run(self, message: str, *, stream: bool = False) -> RunResult:
+        """Run inference and return a structured result (matches Agno Agent.run()).
 
-        `stream` is accepted for API compatibility but AirLLM generates the
-        full output in one pass — the result is still printed as soon as it
-        is ready.
+        `stream` is accepted for API compatibility; AirLLM always generates
+        the full output in one pass.
         """
         prompt = self._build_prompt(message)
@@ -97,7 +108,12 @@ class TimmyAirLLMAgent:
         self._history.append(f"User: {message}")
         self._history.append(f"Timmy: {response}")
-        self._render(response)
+        return RunResult(content=response)
+
+    def print_response(self, message: str, *, stream: bool = True) -> None:
+        """Run inference and render the response to stdout (CLI interface)."""
+        result = self.run(message, stream=stream)
+        self._render(result.content)
 
     # ── private helpers ──────────────────────────────────────────────────────