7 Commits

Author SHA1 Message Date
Alexander Whitestone
ae3bb1cc21 feat: code quality audit + autoresearch integration + infra hardening (#150) 2026-03-08 12:50:44 -04:00
Alexander Whitestone
2b97da9e9c Add pre-commit hook enforcing 30s test suite time limit (#132) 2026-03-05 19:45:38 -05:00
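
One way a time-limited test hook like the one described in this commit could work is sketched below. The helper name `run_with_time_limit` is hypothetical, and the real hook in the repository may be implemented differently (for example as a shell script); only the 30-second limit comes from the commit message.

```python
import subprocess
import sys
from typing import Sequence


def run_with_time_limit(cmd: Sequence[str], limit_seconds: float) -> int:
    """Run cmd; return its exit code, or 1 if it exceeds the time limit.

    Hypothetical helper for illustration, not the project's actual hook.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # when the timeout elapses.
        completed = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        print(f"Command exceeded the {limit_seconds}s limit", file=sys.stderr)
        return 1
    return completed.returncode
```

A hook script could then end with `sys.exit(run_with_time_limit([sys.executable, "-m", "pytest", "-q"], 30))`, so that a slow or failing suite blocks the commit.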
Alexander Whitestone
89cfe1be0d fix: Docker-first test suite, UX improvements, and bug fixes (#100)
Dashboard UX:
- Restructure nav from 22 flat links to 6 core + MORE dropdown
- Add mobile nav section labels (Core, Intelligence, Agents, System, Commerce)
- Defer marked.js and dompurify.js loading, consolidate CDN to jsdelivr
- Optimize font weights (drop unused 300/500), bump style.css cache buster
- Remove duplicate HTMX load triggers from sidebar and health panels

Bug fixes:
- Fix Timmy showing OFFLINE by registering after swarm recovery sweep
- Fix ThinkingEngine await bug with asyncio.run_coroutine_threadsafe
- Fix chat auto-scroll by calling scrollChat() after history partial loads
- Add missing /voice/button page and /voice/command endpoint
- Fix Grok api_key="" treated as falsy falling through to env key
- Fix self_modify PROJECT_ROOT using settings.repo_root instead of __file__
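
The Grok `api_key=""` fix in the list above concerns a common Python pitfall: `a or b` treats an empty string as missing. The following is a minimal sketch of the difference, assuming hypothetical helper names (`resolve_api_key_buggy`, `resolve_api_key`) and the `XAI_API_KEY` variable mentioned elsewhere in this log; the project's actual code may look different.

```python
import os
from typing import Optional


def resolve_api_key_buggy(api_key: Optional[str] = None) -> str:
    # `or` treats "" as falsy, so an explicit empty key silently
    # falls through to the environment value.
    return api_key or os.environ.get("XAI_API_KEY", "")


def resolve_api_key(api_key: Optional[str] = None) -> str:
    # Fall back to the environment only when no key was passed at all.
    if api_key is None:
        return os.environ.get("XAI_API_KEY", "")
    return api_key
```

With the `is None` check, a caller who passes `api_key=""` gets the empty key back instead of the environment key.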

Docker test infrastructure:
- Bind-mount hands/, docker/, Dockerfiles, and compose files into test container
- Add fontconfig + fonts-dejavu-core for creative/assembler TextClip tests
- Initialize minimal git repo in Dockerfile.test for GitSafety compatibility
- Fix introspection and path resolution tests for Docker /app context

All 1863 tests pass in Docker (0 failures, 77 skipped).

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 22:14:37 -05:00
Alexander Whitestone
ab014dc5c6 feat: add timmy interview command for structured agent initialization (#87) 2026-02-28 09:35:44 -05:00
Claude
17059bc0ea feat: add Grok (xAI) as opt-in premium backend with monetization
- Add GrokBackend class in src/timmy/backends.py with full sync/async
  support, health checks, usage stats, and cost estimation in sats
- Add consult_grok tool to Timmy's toolkit for proactive Grok queries
- Extend cascade router with Grok provider type for failover chain
- Add Grok Mode toggle card to Mission Control dashboard (HTMX live)
- Add "Ask Grok" button on chat input for direct Grok queries
- Add /grok/* routes: status, toggle, chat, stats endpoints
- Integrate Lightning invoice generation for Grok usage monetization
- Add GROK_ENABLED, XAI_API_KEY, GROK_DEFAULT_MODEL, GROK_MAX_SATS_PER_QUERY,
  GROK_FREE config settings via pydantic-settings
- Update .env.example and docker-compose.yml with Grok env vars
- Add 21 tests covering backend, tools, and route endpoints (all green)

Local-first ethos preserved: Grok is premium augmentation only,
disabled by default, and Lightning-payable when enabled.

https://claude.ai/code/session_01FygwN8wS8J6WGZ8FPb7XGV
2026-02-27 01:12:51 +00:00
Claude
c8aa6a5fbb feat: quality analysis — bug fixes, mobile tests, HITL checklist
Senior architect review findings + remediations:

BUG FIX — critical interface mismatch
- TimmyAirLLMAgent only exposed print_response(); dashboard route calls
  agent.run() → AttributeError when AirLLM backend is selected.
  Added run() → RunResult(content) as primary inference entry point;
  print_response() now delegates to run() so both call sites share
  one inference path.
- Added RunResult dataclass for Agno-compatible structured return.
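
The delegation described above can be sketched as follows. `SketchAgent` and `_generate` are hypothetical stand-ins, not the real `TimmyAirLLMAgent`; only the `run()`, `print_response()`, and `RunResult(content)` names come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    # Structured return value; `content` holds the generated text.
    content: str


class SketchAgent:
    """Hypothetical stand-in for TimmyAirLLMAgent (not the real class)."""

    def _generate(self, message: str) -> str:
        # Placeholder for the actual model call.
        return f"echo: {message}"

    def run(self, message: str) -> RunResult:
        # Primary inference entry point.
        return RunResult(content=self._generate(message))

    def print_response(self, message: str) -> None:
        # Delegates to run() so both call sites share one inference path.
        print(self.run(message).content)
```

Because `print_response()` only prints what `run()` returns, a caller such as a dashboard route can use `agent.run(...)` and read `.content` without a second code path to keep in sync.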

BUG FIX — hardcoded model name in health status partial
- health_status.html rendered literal "llama3.2" regardless of
  OLLAMA_MODEL env var. Route now passes settings.ollama_model to
  the template context; partial renders {{ model }} instead.

FEATURE — /mobile-test HITL checklist page
- 22 human-executable test scenarios across: Layout, Touch & Input,
  Chat behaviour, Health, Scroll, Notch/Home Bar, Live UI.
- Pass/Fail/Skip buttons with sessionStorage state persistence.
- Live progress bar + final score summary.
- TEST link added to Mission Control header for quick access on phone.

TEST — 32 new automated mobile quality tests (M1xx–M6xx)
- M1xx: viewport/meta tags (8 tests)
- M2xx: touch target sizing — 44 px min-height, manipulation (4 tests)
- M3xx: iOS zoom prevention, autocapitalize, enterkeyhint (5 tests)
- M4xx: HTMX robustness — hx-sync drop, disabled-elt, polling (5 tests)
- M5xx: safe-area insets, overscroll, dvh units (5 tests)
- M6xx: AirLLM interface contract — run(), RunResult, delegation (5 tests)

Total test count: 61 → 93 (all passing).

https://claude.ai/code/session_01RBuRCBXZNkAQQXXGiJNDmt
2026-02-21 17:21:47 +00:00
Claude
19af4ae540 feat: integrate AirLLM as optional high-performance backend
Adds the `bigbrain` optional dependency group (airllm>=2.9.0) and a
complete second inference path that runs 8B / 70B / 405B Llama models
locally via layer-by-layer loading — no GPU required, no cloud, fully
sovereign.

Key changes:
- src/timmy/backends.py   — TimmyAirLLMAgent (same print_response interface
                            as Agno Agent); auto-selects AirLLMMLX on Apple
                            Silicon, AutoModel (PyTorch) everywhere else
- src/timmy/agent.py      — _resolve_backend() routing with explicit override,
                            env-config, and 'auto' Apple-Silicon detection
- src/timmy/cli.py        — --backend / --model-size flags on all commands
- src/config.py           — timmy_model_backend + airllm_model_size settings
- src/timmy/prompts.py    — mentions AirLLM "even bigger brains, still fully
                            sovereign"
- pyproject.toml          — bigbrain optional dep; wheel includes updated
- .env.example            — TIMMY_MODEL_BACKEND + AIRLLM_MODEL_SIZE docs
- tests/conftest.py       — stubs 'airllm' module so tests run without GPU
- tests/test_backends.py  — 13 new tests covering helpers + TimmyAirLLMAgent
- tests/test_agent.py     — 7 new tests for backend routing
- README.md               — Big Brain section with one-line install
- activate_self_tdd.sh    — bootstrap script (venv + install + tests +
                            watchdog + dashboard); --big-brain flag
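
The routing order named for `_resolve_backend()` (explicit override, then configuration, then 'auto' detection) might look roughly like the sketch below. The function `resolve_backend`, the backend names "ollama" and "airllm", and the rule that 'auto' picks AirLLM on Apple Silicon are all assumptions made for illustration; the commit message does not spell them out.

```python
import platform
from typing import Optional


def is_apple_silicon() -> bool:
    # macOS on an ARM64 machine.
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def resolve_backend(
    requested: Optional[str],
    configured: str = "auto",
    apple_silicon: Optional[bool] = None,
) -> str:
    """Hypothetical sketch: an explicit request beats config, 'auto' detects."""
    choice = requested if requested is not None else configured
    if choice != "auto":
        return choice
    if apple_silicon is None:
        apple_silicon = is_apple_silicon()
    # Assumed rule: 'auto' prefers AirLLM on Apple Silicon, else Ollama.
    return "airllm" if apple_silicon else "ollama"
```

Passing `apple_silicon` explicitly keeps the detection step testable on any machine.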

All 61 tests pass. Self-TDD watchdog unaffected.

https://claude.ai/code/session_01DMjQ5qMZ8iHeyix1j3GS7c
2026-02-21 16:53:16 +00:00