feat: quality analysis — bug fixes, mobile tests, HITL checklist

Senior architect review findings + remediations:

BUG FIX — critical interface mismatch
- TimmyAirLLMAgent only exposed print_response(); dashboard route calls
  agent.run() → AttributeError when AirLLM backend is selected.
  Added run() → RunResult(content) as primary inference entry point;
  print_response() now delegates to run() so both call sites share
  one inference path.
- Added RunResult dataclass for Agno-compatible structured return.
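The contract described above can be sketched as follows. This is an illustrative stub (the inference body is a placeholder, not the project's actual AirLLM code), showing only the delegation pattern:

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Minimal Agno-compatible structured result."""
    content: str


class TimmyAirLLMAgent:
    """Stub: the real class routes inference through AirLLM."""

    def run(self, message: str, *, stream: bool = False) -> RunResult:
        # Primary inference entry point used by the dashboard route.
        response = f"(inference for: {message})"  # placeholder for model output
        return RunResult(content=response)

    def print_response(self, message: str, *, stream: bool = True) -> None:
        # CLI path delegates to run() so both call sites share one inference path.
        print(self.run(message, stream=stream).content)
```

With this shape, `agent.run()` no longer raises AttributeError when the AirLLM backend is selected, and the CLI path reuses the same code.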

BUG FIX — hardcoded model name in health status partial
- health_status.html rendered literal "llama3.2" regardless of
  OLLAMA_MODEL env var. Route now passes settings.ollama_model to
  the template context; partial renders {{ model }} instead.
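A minimal sketch of the shape of this fix, assuming a dataclass-style settings object and Jinja2-style templates; apart from `OLLAMA_MODEL` and `settings.ollama_model`, all names (and the default model string) are illustrative:

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    # Reads the OLLAMA_MODEL env var; the fallback default here is a guess.
    ollama_model: str = field(
        default_factory=lambda: os.environ.get("OLLAMA_MODEL", "llama3.2")
    )


settings = Settings()


def health_status_context() -> dict:
    # The route passes the configured model into the template context so the
    # partial can render {{ model }} instead of a hardcoded literal.
    return {"model": settings.ollama_model}
```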

FEATURE — /mobile-test HITL checklist page
- 22 human-executable test scenarios across: Layout, Touch & Input,
  Chat behaviour, Health, Scroll, Notch/Home Bar, Live UI.
- Pass/Fail/Skip buttons with sessionStorage state persistence.
- Live progress bar + final score summary.
- TEST link added to Mission Control header for quick access on phone.

TEST — 32 new automated mobile quality tests (M1xx–M6xx)
- M1xx: viewport/meta tags (8 tests)
- M2xx: touch target sizing — 44 px min-height, touch-action: manipulation (4 tests)
- M3xx: iOS zoom prevention, autocapitalize, enterkeyhint (5 tests)
- M4xx: HTMX robustness — hx-sync drop, disabled-elt, polling (5 tests)
- M5xx: safe-area insets, overscroll, dvh units (5 tests)
- M6xx: AirLLM interface contract — run(), RunResult, delegation (5 tests)
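As one example of the flavour of these checks, an M1xx-style viewport assertion might look like the stdlib-only sketch below; the HTML fixture and the check itself are illustrative, not the suite's actual code:

```python
from html.parser import HTMLParser


class ViewportFinder(HTMLParser):
    """Collects the content attribute of <meta name="viewport">, if present."""

    def __init__(self):
        super().__init__()
        self.viewport_content = None

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == "meta" and d.get("name") == "viewport":
            self.viewport_content = d.get("content", "")


# Illustrative fixture standing in for a rendered page.
PAGE = (
    '<html><head>'
    '<meta name="viewport" content="width=device-width, initial-scale=1">'
    '</head><body></body></html>'
)

finder = ViewportFinder()
finder.feed(PAGE)
assert finder.viewport_content is not None, "viewport meta tag missing"
assert "width=device-width" in finder.viewport_content
```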

Total test count: 61 → 93 (all passing).

https://claude.ai/code/session_01RBuRCBXZNkAQQXXGiJNDmt
Author: Claude
Date: 2026-02-21 17:21:47 +00:00
parent 7499690e10
commit c8aa6a5fbb
9 changed files with 958 additions and 12 deletions


@@ -1,14 +1,16 @@
 """AirLLM backend — only imported when the airllm extra is installed.
 Provides TimmyAirLLMAgent: a drop-in replacement for an Agno Agent that
-exposes the same print_response(message, stream) surface while routing
-inference through AirLLM. On Apple Silicon (arm64 Darwin) the MLX backend
-is selected automatically; everywhere else AutoModel (PyTorch) is used.
+exposes both the run(message, stream) → RunResult interface used by the
+dashboard and the print_response(message, stream) interface used by the CLI.
+On Apple Silicon (arm64 Darwin) the MLX backend is selected automatically;
+everywhere else AutoModel (PyTorch) is used.
 No cloud. No telemetry. Sats are sovereignty, boss.
 """
 import platform
+from dataclasses import dataclass
 from typing import Literal
 from timmy.prompts import TIMMY_SYSTEM_PROMPT
@@ -23,6 +25,12 @@ _AIRLLM_MODELS: dict[str, str] = {
 ModelSize = Literal["8b", "70b", "405b"]
 
+@dataclass
+class RunResult:
+    """Minimal Agno-compatible run result — carries the model's response text."""
+    content: str
 
 def is_apple_silicon() -> bool:
     """Return True when running on an M-series Mac (arm64 Darwin)."""
     return platform.system() == "Darwin" and platform.machine() == "arm64"
@@ -38,7 +46,11 @@ def airllm_available() -> bool:
 class TimmyAirLLMAgent:
-    """Thin AirLLM wrapper with the same print_response interface as Agno Agent.
+    """Thin AirLLM wrapper compatible with both dashboard and CLI call sites.
+
+    Exposes:
+        run(message, stream) → RunResult(content=...)    [dashboard]
+        print_response(message, stream) → None           [CLI]
 
     Maintains a rolling 10-turn in-memory history so Timmy remembers the
     conversation within a session — no SQLite needed at this layer.
@@ -64,12 +76,11 @@ class TimmyAirLLMAgent:
     # ── public interface (mirrors Agno Agent) ────────────────────────────────
 
-    def print_response(self, message: str, *, stream: bool = True) -> None:
-        """Run inference, update history, and render the response to stdout.
+    def run(self, message: str, *, stream: bool = False) -> RunResult:
+        """Run inference and return a structured result (matches Agno Agent.run()).
 
-        `stream` is accepted for API compatibility but AirLLM generates the
-        full output in one pass — the result is still printed as soon as it
-        is ready.
+        `stream` is accepted for API compatibility; AirLLM always generates
+        the full output in one pass.
         """
         prompt = self._build_prompt(message)
@@ -97,7 +108,12 @@ class TimmyAirLLMAgent:
         self._history.append(f"User: {message}")
         self._history.append(f"Timmy: {response}")
-        self._render(response)
+        return RunResult(content=response)
+
+    def print_response(self, message: str, *, stream: bool = True) -> None:
+        """Run inference and render the response to stdout (CLI interface)."""
+        result = self.run(message, stream=stream)
+        self._render(result.content)
 
     # ── private helpers ──────────────────────────────────────────────────────