Commit Graph

41 Commits

Author SHA1 Message Date
2f623826bd cleanup: delete dead modules — ~7,900 lines removed
Closes #22, Closes #23

Deleted: brain/, swarm/, openfang/, paperclip/, cascade_adapter,
memory_migrate, agents/timmy.py, dead routes + all corresponding tests.

Updated pyproject.toml, app.py, loop_qa.py for removed imports.
2026-03-14 09:49:24 -04:00
Trip T
78167675f2 feat: replace custom Gitea client with MCP servers
Replace the bespoke GiteaHand httpx client and tools_gitea.py wrappers
with official MCP tool servers (gitea-mcp + filesystem MCP), wired into
Agno via MCPTools. Switch all session functions to async (arun/acontinue_run)
so MCP tools auto-connect. Delete ~1070 lines of custom Gitea code.

- Create src/timmy/mcp_tools.py with MCP factories + standalone issue bridge
- Wire MCPTools into agent.py tool list (Gitea + filesystem)
- Switch session.py chat/chat_with_tools/continue_chat to async
- Update all callers (dashboard routes, Discord vendor, CLI, thinking engine)
- Add gitea_token fallback from ~/.config/gitea/token
- Add MCP session cleanup to app shutdown hook
- Update tool_safety.py for MCP tool names
- 11 new tests, all 1417 passing, coverage 74.2%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:40:32 -04:00
Trip T
f6a6c0f62e feat: upgrade to qwen3.5, self-hosted Gitea CI, optimize Docker image
Model upgrade:
- qwen2.5:14b → qwen3.5:latest across config, tools, and docs
- Added qwen3.5 to multimodal model registry

Self-hosted Gitea CI:
- .gitea/workflows/tests.yml: lint + test jobs via act_runner
- Unified Dockerfile: pre-baked deps from poetry.lock for fast CI
- sitepackages=true in tox for ~2s dep resolution (was ~40s)
- OLLAMA_URL set to dead port in CI to prevent real LLM calls

Test isolation fixes:
- Smoke test fixture mocks create_timmy (was hitting real Ollama)
- WebSocket sends initial_state before joining broadcast pool (race fix)
- Tests use settings.ollama_model/url instead of hardcoded values
- skip_ci marker for Ollama-dependent tests, excluded in CI tox envs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:36:42 -04:00
Alexander Whitestone
36fc10097f Claude/angry cerf (#173)
* feat: set qwen3.5:latest as default model

- Make qwen3.5:latest the primary default model for faster inference
- Move llama3.1:8b-instruct to fallback chain
- Update text fallback chain to prioritize qwen3.5:latest

Retains full backward compatibility via cascade fallback.

* test: remove ~55 brittle, duplicate, and useless tests

Audit of all 100 test files identified tests that provided no real
regression protection. Removed:

- 4 files deleted entirely: test_setup_script (always skipped),
  test_csrf_bypass (tautological assertions), test_input_validation
  (accepts 200-500 status codes), test_security_regression (fragile
  source-pattern checks redundant with rendering tests)
- Duplicate test classes (TestToolTracking, TestCalculatorExtended)
- Mock-only tests that just verify mock wiring, not behavior
- Structurally broken tests (TestCreateToolFunctions patches after import)
- Empty/pass-body tests and meaningless assertions (len > 20)
- Flaky subprocess tests (aider tool calling real binary)

All 1328 remaining tests pass. Net: -699 lines, zero coverage loss.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent test pollution from autoresearch_enabled mutation

test_autoresearch_perplexity.py was setting settings.autoresearch_enabled = True
but never restoring it in the finally block — polluting subsequent tests.
When pytest-randomly ordered it before test_experiments_page_shows_disabled_when_off,
the victim test saw enabled=True and failed to find "Disabled" in the page.

Fix both sides:
- Restore autoresearch_enabled in the finally block (root cause)
- Mock settings explicitly in the victim test (defense in depth)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:55:27 -04:00
Alexander Whitestone
9d78eb31d1 ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX

- Restructure desktop nav from 8+ flat links + overflow dropdown into
  5 grouped dropdowns (Core, Agents, Intel, System, More) matching
  the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
  notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
  notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
  showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
  Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
  disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html

https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h

* fix(security): move auth-gate credentials to environment variables

Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.

https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h

* refactor(tooling): migrate from black+isort+bandit to ruff

Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.

https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
Alexander Whitestone
622a6a9204 polish: extract inline CSS, add connection status, panel macro, favicon, ollama cache, toast system (#164)
Major:
- Extract all inline <style> blocks from 22 Jinja2 templates into
  static/css/mission-control.css — single cacheable stylesheet
- Add tox lint check that fails on inline <style> in templates

Minor:
1. Connection status indicator in topbar (green/amber/red dot) reflecting
   WebSocket + Ollama reachability, with auto-reconnect
2. Jinja2 {% macro panel(title) %} in macros.html — eliminates repeated
   .card.mc-panel markup; index.html converted as example
3. SVG favicon (purple T + orange dot)
4. 30-second TTL cache on _check_ollama() to avoid blocking the event loop
   on every health poll (asyncio.to_thread was already in place)
5. Toast notification system (McToast.show) for transient status messages —
   wired into connection status for Ollama/WebSocket state changes

Enforcement:
- CLAUDE.md updated with conventions 11-14 (no inline CSS, use panel macro,
  use toasts, never block the event loop)
- tox lint + pre-push environments now fail on inline <style> blocks

https://claude.ai/code/session_014FQ785MQdyJQ4BAXrRSo9w

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 09:52:57 -04:00
Alexander Whitestone
2a5f317a12 fix: implement @csrf_exempt decorator support in CSRFMiddleware (#159) 2026-03-10 15:26:40 -04:00
Alexander Whitestone
904a7c564e feat: migrate to Agno native HITL tool confirmation flow (#158)
Replace the homebrew regex-based tool extraction and manual dispatch
(tool_executor.py) with Agno's built-in Human-In-The-Loop confirmation:

- Toolkit(requires_confirmation_tools=...) marks dangerous tools
- agent.run() returns RunOutput with status=paused when confirmation needed
- RunRequirement.confirm()/reject() + agent.continue_run() resumes execution

Dashboard and Discord vendor both use the native flow. DuckDuckGo import
isolated so its absence doesn't kill all tools. Test stubs cleaned up
(agno is a real dependency, only truly optional packages stubbed).

1384 tests pass in parallel (~14s).

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:54:04 -04:00
Alexander Whitestone
fe484ad7b6 Fix input validation for chat and memory routes (#155) 2026-03-09 09:36:16 -04:00
Alexander Whitestone
8dbce25183 fix: handle concurrent table creation race in SQLite (#151) 2026-03-08 13:27:11 -04:00
Alexander Whitestone
ae3bb1cc21 feat: code quality audit + autoresearch integration + infra hardening (#150) 2026-03-08 12:50:44 -04:00
Alexander Whitestone
4bc53a43f9 fix: Round 4 bug fixes — 8 dashboard bugs + git blocker + Discord regression (#146)
* chore: stop tracking runtime-generated self-modify reports

These 65 files in data/self_modify_reports/ are auto-generated at
runtime and already listed in .gitignore. Tracking them caused
conflicts when pulling from main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 8 dashboard bugs from Round 4 testing report

- Fix Ollama timeout regression: request_timeout → timeout (agno API)
- Add Bootstrap JS to base.html (fixes creative UI tab switching)
- Send initial_state on Swarm Live WebSocket connect
- Add /api/queue/status endpoint (stops 404 log spam from chat panel)
- Populate agent tools from registry on /tools page
- Add notification bell dropdown with /api/notifications endpoint
- All 1157 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add 17 e2e tests covering all Round 4 bug fixes

Covers: /calm 200, /api/queue/status JSON, Bootstrap JS presence,
Swarm Live WebSocket initial_state, agent tools populated on /tools,
/api/notifications endpoint, Ollama timeout param, full task lifecycle,
and smoke test for all 15 dashboard pages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 23:48:20 -05:00
Alexander Whitestone
e36a1dc939 fix: resolve 6 dashboard bugs and rebuild Task Queue + Work Orders (#144) (#144)
Round 2+3 bug fix batch:

1. Ollama timeout: Add request_timeout=300 to prevent socket read errors
   on complex 30-60s prompts (production crash fix)

2. Memory API: Create missing HTMX partial templates (memory_facts.html,
   memory_results.html) so Save/Search buttons work

3. CALM page: Add create_tables() call so SQLAlchemy tables exist on
   first request (was returning HTTP 500)

4. Task Queue: Full SQLite-backed rebuild with CRUD endpoints, HTMX
   partials, and action buttons (approve/veto/pause/cancel/retry)

5. Work Orders: Full SQLite-backed rebuild with submit/approve/reject/
   execute pipeline and HTMX polling partials

6. Memory READ tool: Add memory_read function so Timmy stops calling
   read_file when trying to recall stored facts

Also: Close GitHub issues #115, #114, #112, #110 as won't-fix.
Comment on #107 confirming prune_memories() already wired to startup.

Tests: 33 new tests across 4 test files, all passing.
Full suite: 1155 passed, 2 pre-existing failures (hands_shell).

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 23:21:30 -05:00
Alexander Whitestone
b615595100 refactor: centralize config & harden security (#141)
* feat: upgrade primary model from llama3.1:8b to qwen2.5:14b

- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests

qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.

* security: centralize config, harden uploads, fix silent exceptions

- Add 9 pydantic Settings fields (skip_embeddings, disable_csrf,
  rqlite_url, brain_source, brain_db_path, csrf_cookie_secure,
  chat_api_max_body_bytes, timmy_test_mode) to centralize env-var access
- Migrate 8 os.environ.get() calls across 5 source files to use
  `from config import settings` per project convention
- Add path traversal defense-in-depth to file upload endpoint
- Add 1MB request body size limit to chat API
- Make CSRF cookie secure flag configurable via settings
- Replace 2 silent `except: pass` blocks with debug logging in session.py
- Remove unused `import os` from brain/memory.py and csrf.py
- Update 5 CSRF test fixtures to patch settings instead of os.environ

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 18:49:37 -05:00
Alexander Whitestone
cdd3e1a90b feat: upgrade primary model from llama3.1:8b to qwen2.5:14b (#140)
- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests

qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.

Co-authored-by: Trip T <trip@local>
2026-03-07 18:20:34 -05:00
Alexander Whitestone
480b8d324e security: fix CSRF bypass vulnerabilities via strict path matching and normalization (#138) 2026-03-07 06:45:32 -05:00
Alexander Whitestone
39461858a0 SEC: Fix CSRF bypass via path traversal in exempt routes (#135) 2026-03-06 09:00:56 -05:00
Alexander Whitestone
87dc5eadfe Wire orchestrator pipe into task runner + pipe-verifying integration tests (#134) 2026-03-06 01:20:14 -05:00
Alexander Whitestone
2b97da9e9c Add pre-commit hook enforcing 30s test suite time limit (#132) 2026-03-05 19:45:38 -05:00
Alexander Whitestone
b8ff534ad8 Security: Enhance CSRF protection with form field support and stricter validation (#128) 2026-03-05 07:04:30 -05:00
AlexanderWhitestone
5e8766cef0 Fix build issues, implement missing routes, and stabilize e2e tests for production readiness 2026-03-04 17:15:46 -05:00
Alexander Whitestone
425e7da380 Claude/remove persona system f vgt m (#126)
* Remove persona system, identity, and all Timmy references

Strip the codebase to pure orchestration logic:

- Delete TIMMY_IDENTITY.md and memory/self/identity.md
- Gut brain/identity.py to no-op stubs (empty returns)
- Remove all system prompts reinforcing Timmy's character, faith,
  sovereignty, sign-off ("Sir, affirmative"), and agent roster
- Replace identity-laden prompts with generic local-AI-assistant prompts
- Remove "You work for Timmy" from all sub-agent system prompts
- Rename PersonaTools → AgentTools, PERSONA_TOOLKITS → AGENT_TOOLKITS
- Replace "timmy" agent ID with "orchestrator" across routes, marketplace,
  tools catalog, and orchestrator class
- Strip Timmy references from config comments, templates, telegram bot,
  chat API, and dashboard UI
- Delete tests/brain/test_identity.py entirely
- Fix all test assertions that checked for persona identity content

729 tests pass (2 pre-existing failures in test_calm.py unrelated).

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

* Add Taskosaur (PM + AI task execution) to docker-compose

Spins up Taskosaur alongside the dashboard on `docker compose up`:
- postgres:16-alpine (port 5432, Taskosaur DB)
- redis:7-alpine (Bull queue backend)
- taskosaur (ports 3000 API / 3001 UI)
- dashboard now depends_on taskosaur healthy
- TASKOSAUR_API_URL injected into dashboard environment

Dashboard can reach Taskosaur at http://taskosaur:3000/api on the
internal network. Frontend UI accessible at http://localhost:3001.

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-04 12:00:49 -05:00
Alexander Whitestone
548a3f980d Test: add input validation tests for form handlers (#125) 2026-03-04 07:58:58 -05:00
Alexander Whitestone
15c7ee5d1e Refactor: migrate inline middleware to dedicated classes (#123) 2026-03-04 07:58:39 -05:00
Alexander Whitestone
eb2b34876b security: implement rate limiting for chat API and fix calm tests (#122) 2026-03-03 08:16:36 -05:00
Alexander Whitestone
584eeb679e Operation Darling Purge: slim to wealth core (-33,783 lines) (#121) 2026-03-02 13:17:38 -05:00
Alexander Whitestone
f694eff0a4 fix: align calm imports with project conventions (#120) 2026-03-02 12:04:30 -05:00
AlexanderWhitestone
d080e67faf feat: Implement Minimum Viable Calm (MVC) feature and initial tests 2026-03-02 11:46:40 -05:00
Alexander Whitestone
62ef1120a4 Memory Unification + Canonical Identity: -11,074 lines of homebrew (#119) 2026-03-02 09:58:07 -05:00
Alexander Whitestone
b1552b9f64 Fix font resolution in creative module and add security headers to dashboard (#103) 2026-03-01 11:50:34 -05:00
Alexander Whitestone
3a8496a3f1 feat: add security middleware suite - CSRF, security headers, and request logging (#102)
Implements three security middleware components with full test coverage:

- CSRF Protection: Token generation/validation, safe method allowlist,
  auto-exempt webhooks, constant-time comparison for timing attack prevention

- Security Headers: X-Content-Type-Options, X-Frame-Options, CSP,
  Permissions-Policy, Referrer-Policy, HSTS (production)

- Request Logging: Method/path/status/duration logging with correlation IDs,
  configurable path exclusions, X-Forwarded-For support

Also fixes Discord test isolation issue where settings.discord_token
was not being properly reset between tests.

New files:
- src/dashboard/middleware/{csrf,security_headers,request_logging}.py
- tests/dashboard/middleware/test_{csrf,security_headers,request_logging}.py

Addresses design review recommendations R3, R8, R9, R4.

All tests pass: 1950 passed, 40 skipped

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-02-28 23:21:09 -05:00
Alexander Whitestone
6e67c3b421 feat: add bug report ingestion pipeline with Forge dispatch (#99)
Replace the stub `handle_bug_report` handler with a real implementation
that logs a decision trail and dispatches code_fix tasks to Forge for
automated fixing. Add `POST /api/bugs/submit` endpoint and `timmy
ingest-report` CLI command so AI test runners (Comet) can submit
structured bug reports without manual copy-paste.

- POST /api/bugs/submit: accepts JSON reports, creates bug_report tasks
- timmy ingest-report: CLI for file/stdin JSON ingestion with --dry-run
- handle_bug_report: logs decision trail to event_log, dispatches
  code_fix task to Forge with parent_task_id linking back to the bug
- 18 TDD tests covering endpoint, handler, and CLI

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:15:53 -05:00
Alexander Whitestone
2e92838033 fix: restore real-time chat responses via WebSocket (#98)
The chat WebSocket return path was broken by two bugs that prevented
Timmy's responses from appearing in the live chat feed:

1. Frontend checked msg.type instead of msg.event for 'timmy_response'
   events — the WSEvent dataclass uses 'event' as the field name.
2. Frontend accessed msg.response instead of msg.data.response — the
   response payload is nested in the data field.

Additional fixes:
- Queue acknowledgment ("Message queued...") no longer logged as an
  agent message in chat history; the real response is logged by the
  task processor when it completes, eliminating duplicate messages.
- Chat message template now carries data-task-id so the WS handler
  can find and replace the placeholder with the actual response.
- appendMessage() uses DOM APIs (textContent) instead of innerHTML
  for safer content insertion before markdown rendering.
- Fixed chat_message.html script targeting when queue-status div is
  present between the agent message and the inline script.

https://claude.ai/code/session_011cJfexqBBuGhSRQU8qwKcR

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-28 20:22:47 -05:00
Alexander Whitestone
d7d7a5a80a audit: clean Docker architecture, consolidate test fixtures, add containerized test runner (#94) 2026-02-28 16:11:58 -05:00
Alexander Whitestone
da5745db48 Fix dashboard tests and add SECURITY.md audit report (#84) 2026-02-28 06:59:15 -05:00
Alexander Whitestone
aa3263bc3b feat: automatic error feedback loop with bug report tracker (#80)
Errors and uncaught exceptions are now automatically captured, deduplicated,
persisted to a rotating log file, and filed as bug report tasks in the
existing task queue — giving Timmy a sovereign, local issue tracker with
zero new dependencies.

- Add RotatingFileHandler writing errors to logs/errors.log (5MB rotate, 5 backups)
- Add error capture module with stack-trace hashing and 5-min dedup window
- Add FastAPI exception middleware + global exception handler
- Instrument all background loops (briefing, thinking, task processor) with capture_error()
- Extend task queue with bug_report task type and auto-approve rule
- Fix auto-approve type matching (was ignoring task_type field entirely)
- Add /bugs dashboard page and /api/bugs JSON endpoints
- Add ERROR_CAPTURED and BUG_REPORT_CREATED event types for real-time feed
- Add BUGS nav link to desktop and mobile navigation
- Add 16 tests covering error capture, deduplication, and bug report routes

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 19:51:37 -05:00
Alexander Whitestone
5e60a6453b feat: wire mobile app to real Timmy backend via JSON REST API (#73)
Add /api/chat, /api/upload, and /api/chat/history endpoints to the
FastAPI dashboard so the Expo mobile app talks directly to Timmy's
brain (Ollama) instead of a non-existent Node.js server.

Backend:
- New src/dashboard/routes/chat_api.py with 4 endpoints
- Mount /uploads/ for serving chat attachments
- Same context injection and session management as HTMX chat

Mobile app fixes:
- Point API base URL at port 8000 (FastAPI) instead of 3000
- Create lib/_core/theme.ts (was referenced but never created)
- Fix shared/types.ts (remove broken drizzle/errors re-exports)
- Remove broken server/chat.ts and 1,235-line template README
- Clean package.json (remove express, mysql2, drizzle, tRPC deps)
- Remove debug console.log from theme-provider

Tests: 13 new tests covering all API endpoints (all passing).

https://claude.ai/code/session_01XqErDoh2rVsPY8oTj21Lz2

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-26 23:58:53 -05:00
Alexander Whitestone
18ed6232f9 feat: Timmy fixes and improvements (#72)
* test: remove hardcoded sleeps, add pytest-timeout

- Replace fixed time.sleep() calls with intelligent polling or WebDriverWait
- Add pytest-timeout dependency and --timeout=30 to prevent hangs
- Fixes test flakiness and improves test suite speed

* feat: add Aider AI tool to Forge's toolkit

- Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist
- Register tool in Forge's code toolkit
- Add functional tests for the Aider tool

* config: add opencode.json with local Ollama provider for sovereign AI

* feat: Timmy fixes and improvements

## Bug Fixes
- Fix read_file path resolution: add ~ expansion, proper relative path handling
- Add repo_root to config.py with auto-detection from .git location
- Fix hardcoded llama3.2 - now dynamic from settings.ollama_model

## Timmy's Requests
- Add communication protocol to AGENTS.md (read context first, explain changes)
- Create DECISIONS.md for architectural decision documentation
- Add reasoning guidance to system prompts (step-by-step, state uncertainty)
- Update tests to reflect correct model name (llama3.1:8b-instruct)

## Testing
- All 177 dashboard tests pass
- All 32 prompt/tool tests pass

---------

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-02-26 23:39:13 -05:00
Claude
3b7fcc5ebc feat: add in-browser local model support for iPhone via WebLLM
Enable Timmy to run directly on iPhone by loading a small LLM into
the browser via WebGPU (Safari 26+ / iOS 26+). No server connection
required — fully sovereign, fully offline.

New files:
- static/local_llm.js: WebLLM wrapper with model catalogue, WebGPU
  detection, streaming chat, and progress callbacks
- templates/mobile_local.html: Mobile-optimized UI with model
  selector, download progress, LOCAL/SERVER badge, and chat
- tests/dashboard/test_local_models.py: 31 tests covering routes,
  config, template UX, JS asset, and XSS prevention

Changes:
- config.py: browser_model_enabled, browser_model_id,
  browser_model_fallback settings
- routes/mobile.py: /mobile/local page, /mobile/local-models API
- base.html: LOCAL AI nav link

Supported models: SmolLM2-360M (~200MB), Qwen2.5-0.5B (~350MB),
SmolLM2-1.7B (~1GB), Llama-3.2-1B (~700MB). Falls back to
server-side Ollama when local model is unavailable.

https://claude.ai/code/session_01Cqkvr4sZbED7T3iDu1rwSD
2026-02-27 00:03:05 +00:00
Claude
9f4c809f70 refactor: Phase 2b — consolidate 28 modules into 14 packages
Complete the module consolidation planned in REFACTORING_PLAN.md:

Modules merged:
- work_orders/ + task_queue/ → swarm/ (subpackages)
- self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages)
- tools/ → creative/tools/
- chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new)
- ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new)
- agents/ + agent_core/ + memory/ → timmy/ (subpackages)

Updated across codebase:
- 66 source files: import statements rewritten
- 13 test files: import + patch() target strings rewritten
- pyproject.toml: wheel includes (28→14), entry points updated
- CLAUDE.md: singleton paths, module map, entry points table
- AGENTS.md: file convention updates
- REFACTORING_PLAN.md: execution status, success metrics

Extras:
- Module-level CLAUDE.md added to 6 key packages (Phase 6.2)
- Zero test regressions: 1462 tests passing

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:07:41 +00:00
Claude
4e11dd2490 refactor: Phase 3 — reorganize tests into module-mirroring subdirectories
Move 97 test files from flat tests/ into 13 subdirectories:
  tests/dashboard/   (8 files — routes, mobile, mission control)
  tests/swarm/       (17 files — coordinator, docker, routing, tasks)
  tests/timmy/       (12 files — agent, backends, CLI, tools)
  tests/self_coding/  (14 files — git safety, indexer, self-modify)
  tests/lightning/   (3 files — L402, LND, interface)
  tests/creative/    (8 files — assembler, director, image/music/video)
  tests/integrations/ (10 files — chat bridge, telegram, voice, websocket)
  tests/mcp/         (4 files — bootstrap, discovery, executor)
  tests/spark/       (3 files — engine, tools, events)
  tests/hands/       (3 files — registry, oracle, phase5)
  tests/scripture/   (1 file)
  tests/infrastructure/ (3 files — router cascade, API)
  tests/security/    (3 files — XSS, regression)

Fix Path(__file__) reference in test_mobile_scenarios.py for new depth.
Add __init__.py to all test subdirectories.

Tests: 1503 passed, 9 failed (pre-existing), 53 errors (pre-existing)

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:21:28 +00:00