Commit Graph

64 Commits

Author SHA1 Message Date
Alexander Whitestone
bc21bbe96f fix: connect WebSocket to correct /swarm/live endpoint (#82)
The tasks board and Timmy panel were connecting to /ws which doesn't
exist, causing constant 403 Forbidden rejections and preventing
live event updates from reaching the UI.

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 20:27:20 -05:00
Alexander Whitestone
aa3263bc3b feat: automatic error feedback loop with bug report tracker (#80)
Errors and uncaught exceptions are now automatically captured, deduplicated,
persisted to a rotating log file, and filed as bug report tasks in the
existing task queue — giving Timmy a sovereign, local issue tracker with
zero new dependencies.

- Add RotatingFileHandler writing errors to logs/errors.log (5MB rotate, 5 backups)
- Add error capture module with stack-trace hashing and 5-min dedup window
- Add FastAPI exception middleware + global exception handler
- Instrument all background loops (briefing, thinking, task processor) with capture_error()
- Extend task queue with bug_report task type and auto-approve rule
- Fix auto-approve type matching (was ignoring task_type field entirely)
- Add /bugs dashboard page and /api/bugs JSON endpoints
- Add ERROR_CAPTURED and BUG_REPORT_CREATED event types for real-time feed
- Add BUGS nav link to desktop and mobile navigation
- Add 16 tests covering error capture, deduplication, and bug report routes

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 19:51:37 -05:00
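Note on aa3263bc3b: the stack-trace hashing and 5-minute dedup window could look roughly like the sketch below. This is an illustration, not the project's module. The `capture_error` name comes from the commit message, but its signature, the in-memory `_last_seen` store, and the choice of SHA-256 over frame locations are all assumptions.

```python
import hashlib
import time
import traceback

DEDUP_WINDOW_SECONDS = 5 * 60  # the 5-minute window named in the commit message

# Hypothetical in-memory store: stack hash -> time the error was last reported.
_last_seen = {}


def stack_hash(exc):
    """Hash the exception type plus file/function/line of every traceback frame."""
    frames = traceback.extract_tb(exc.__traceback__)
    parts = [type(exc).__name__]
    parts += [f"{f.filename}:{f.name}:{f.lineno}" for f in frames]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]


def capture_error(exc, now=None):
    """Return True if the error should be reported, False if it is a duplicate.

    The signature is assumed; `now` exists only to make the window testable.
    """
    now = time.time() if now is None else now
    key = stack_hash(exc)
    last = _last_seen.get(key)
    if last is not None and now - last < DEDUP_WINDOW_SECONDS:
        return False  # same stack seen inside the window: suppress
    _last_seen[key] = now
    return True
```

Hashing frame locations rather than the message text means two errors raised from the same code path collapse into one report even when their messages differ.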
Alexander Whitestone
5b6d33e05a feat: task queue system with startup drain and backlogging (#76)
* feat: add task queue system for Timmy - all work goes through the queue

- Add queue position tracking to task_queue models with task_type field
- Add TaskProcessor class that consumes tasks from queue one at a time
- Modify chat route to queue all messages for async processing
- Chat responses get 'high' priority to jump ahead of thought tasks
- Add queue status API endpoints for position polling
- Update UI to show queue position (x/y) and current task banner
- Replace thinking loop with task-based approach - thoughts are queued tasks
- Push responses to user via WebSocket instead of immediate HTTP response
- Add database migrations for existing tables

* feat: Timmy drains task queue on startup, backlogs unhandleable tasks

On spin-up, Timmy now iterates through all pending/approved tasks
immediately instead of waiting for the polling loop. Tasks without a
registered handler or with permanent errors are moved to a new
BACKLOGGED status with a reason, keeping the queue clear for work
Timmy can actually do.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 01:52:42 -05:00
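Note on 5b6d33e05a: a minimal sketch of the queue behaviour described above, where chat responses get 'high' priority and tasks without a registered handler are backlogged with a reason during the startup drain. The class name `TaskQueue`, the dict-based task record, and the priority levels other than 'high' are assumptions made for illustration. Permanent-error handling is omitted.

```python
import heapq
import itertools

PRIORITY = {"high": 0, "normal": 1, "low": 2}  # assumed levels; lower rank runs first


class TaskQueue:
    """Hypothetical in-memory stand-in for the SQLite-backed task queue."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def put(self, task_type, payload, priority="normal"):
        task = {"task_type": task_type, "payload": payload,
                "status": "pending", "reason": ""}
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._seq), task))
        return task

    def drain(self, handlers):
        """Process every queued task once; backlog tasks that have no handler."""
        processed = []
        while self._heap:
            _, _, task = heapq.heappop(self._heap)
            handler = handlers.get(task["task_type"])
            if handler is None:
                task["status"] = "backlogged"
                task["reason"] = f"no handler for {task['task_type']}"
            else:
                handler(task["payload"])
                task["status"] = "completed"
            processed.append(task)
        return processed
```

Calling `drain()` once at startup, before any polling loop begins, matches the "iterate through all pending tasks immediately" behaviour the commit describes.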
Alexander Whitestone
849b5b1a8d feat: add default thinking thread — Timmy always ponders (#75) 2026-02-27 01:00:11 -05:00
Alexander Whitestone
5e60a6453b feat: wire mobile app to real Timmy backend via JSON REST API (#73)
Add /api/chat, /api/upload, and /api/chat/history endpoints to the
FastAPI dashboard so the Expo mobile app talks directly to Timmy's
brain (Ollama) instead of a non-existent Node.js server.

Backend:
- New src/dashboard/routes/chat_api.py with 4 endpoints
- Mount /uploads/ for serving chat attachments
- Same context injection and session management as HTMX chat

Mobile app fixes:
- Point API base URL at port 8000 (FastAPI) instead of 3000
- Create lib/_core/theme.ts (was referenced but never created)
- Fix shared/types.ts (remove broken drizzle/errors re-exports)
- Remove broken server/chat.ts and 1,235-line template README
- Clean package.json (remove express, mysql2, drizzle, tRPC deps)
- Remove debug console.log from theme-provider

Tests: 13 new tests covering all API endpoints (all passing).

https://claude.ai/code/session_01XqErDoh2rVsPY8oTj21Lz2

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-26 23:58:53 -05:00
Alexander Whitestone
18ed6232f9 feat: Timmy fixes and improvements (#72)
* test: remove hardcoded sleeps, add pytest-timeout

- Replace fixed time.sleep() calls with intelligent polling or WebDriverWait
- Add pytest-timeout dependency and --timeout=30 to prevent hangs
- Fixes test flakiness and improves test suite speed

* feat: add Aider AI tool to Forge's toolkit

- Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist
- Register tool in Forge's code toolkit
- Add functional tests for the Aider tool

* config: add opencode.json with local Ollama provider for sovereign AI

* feat: Timmy fixes and improvements

## Bug Fixes
- Fix read_file path resolution: add ~ expansion, proper relative path handling
- Add repo_root to config.py with auto-detection from .git location
- Fix hardcoded llama3.2 - now dynamic from settings.ollama_model

## Timmy's Requests
- Add communication protocol to AGENTS.md (read context first, explain changes)
- Create DECISIONS.md for architectural decision documentation
- Add reasoning guidance to system prompts (step-by-step, state uncertainty)
- Update tests to reflect correct model name (llama3.1:8b-instruct)

## Testing
- All 177 dashboard tests pass
- All 32 prompt/tool tests pass

---------

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-02-26 23:39:13 -05:00
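Note on 18ed6232f9: replacing fixed `time.sleep()` calls with polling usually comes down to a small helper like the one below. This is a generic sketch; the name `wait_until` and its defaults are assumptions and are not taken from the repository's test suite.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

A test then waits only as long as it has to, for example `wait_until(lambda: queue_is_empty())` with a hypothetical `queue_is_empty` check, while `--timeout=30` from pytest-timeout remains the outer safety net.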
Claude
211c54bc8c feat: add custom weights, model registry, per-agent models, and reward scoring
Inspired by OpenClaw-RL's multi-model orchestration, this adds four
features for custom model management:

1. Custom model registry (infrastructure/models/registry.py) — SQLite-backed
   registry for GGUF, safetensors, HF checkpoint, and Ollama models with
   role-based lookups (general, reward, teacher, judge).

2. Per-agent model assignment — each swarm persona can use a different model
   instead of sharing the global default. Resolved via registry assignment >
   persona default > global default.

3. Runtime model management API (/api/v1/models) — REST endpoints to register,
   list, assign, enable/disable, and remove custom models without restart.
   Includes a dashboard page at /models.

4. Reward model scoring (PRM-style) — majority-vote quality evaluation of
   agent outputs using a configurable reward model. Scores persist in SQLite
   and feed into the swarm learner.

New config settings: custom_weights_dir, reward_model_enabled,
reward_model_name, reward_model_votes.

54 new tests covering registry CRUD, API endpoints, agent assignments,
role lookups, and reward scoring.

https://claude.ai/code/session_01V4iTozMwcE2gjfnCJdCugC
2026-02-27 01:27:53 +00:00
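Note on 211c54bc8c: the resolution order "registry assignment > persona default > global default" and the majority-vote scoring can be sketched as below. The function names, the dict-based lookups, the placeholder default model, and the boolean form of each vote are assumptions. The real registry is described as SQLite-backed.

```python
GLOBAL_DEFAULT_MODEL = "example-model:latest"  # placeholder, not a project setting


def resolve_model(agent_id, registry_assignments, persona_defaults,
                  global_default=GLOBAL_DEFAULT_MODEL):
    """Pick a model for an agent: explicit assignment, then persona, then global."""
    if agent_id in registry_assignments:
        return registry_assignments[agent_id]
    if agent_id in persona_defaults:
        return persona_defaults[agent_id]
    return global_default


def majority_vote(votes):
    """Return (verdict, agreement) for a non-empty list of boolean votes.

    The verdict is True only when strictly more than half of the votes are True.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    yes = sum(1 for v in votes if v)
    verdict = yes * 2 > len(votes)
    agreement = max(yes, len(votes) - yes) / len(votes)
    return verdict, agreement
```

An odd value for a setting such as `reward_model_votes` avoids ties; with an even count this sketch treats a tie as a negative verdict.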
Claude
17059bc0ea feat: add Grok (xAI) as opt-in premium backend with monetization
- Add GrokBackend class in src/timmy/backends.py with full sync/async
  support, health checks, usage stats, and cost estimation in sats
- Add consult_grok tool to Timmy's toolkit for proactive Grok queries
- Extend cascade router with Grok provider type for failover chain
- Add Grok Mode toggle card to Mission Control dashboard (HTMX live)
- Add "Ask Grok" button on chat input for direct Grok queries
- Add /grok/* routes: status, toggle, chat, stats endpoints
- Integrate Lightning invoice generation for Grok usage monetization
- Add GROK_ENABLED, XAI_API_KEY, GROK_DEFAULT_MODEL, GROK_MAX_SATS_PER_QUERY,
  GROK_FREE config settings via pydantic-settings
- Update .env.example and docker-compose.yml with Grok env vars
- Add 21 tests covering backend, tools, and route endpoints (all green)

Local-first ethos preserved: Grok is premium augmentation only,
disabled by default, and Lightning-payable when enabled.

https://claude.ai/code/session_01FygwN8wS8J6WGZ8FPb7XGV
2026-02-27 01:12:51 +00:00
Claude
bc2c09d3f8 feat: replace GitHub page with embedded Timmy chat interface
Replaces the marketing landing page with a minimal, full-screen chat
interface that connects to a running Timmy instance. Mobile-first design
with single vertical scroll direction, looping scroll, no zoom, no
buttons — just type and press Enter to talk to Timmy.

- docs/index.html: full rewrite as a clean chat UI with dark terminal
  theme, looping infinite scroll, markdown rendering, connection status,
  and /connect, /clear, /help slash commands
- src/dashboard/app.py: add CORS middleware so the GitHub Pages site can
  reach a local Timmy server cross-origin
- src/config.py: add cors_origins setting (defaults to ["*"])

https://claude.ai/code/session_01AWLxg6KDWsfCATiuvsRMGr
2026-02-27 00:35:33 +00:00
Claude
3b7fcc5ebc feat: add in-browser local model support for iPhone via WebLLM
Enable Timmy to run directly on iPhone by loading a small LLM into
the browser via WebGPU (Safari 26+ / iOS 26+). No server connection
required — fully sovereign, fully offline.

New files:
- static/local_llm.js: WebLLM wrapper with model catalogue, WebGPU
  detection, streaming chat, and progress callbacks
- templates/mobile_local.html: Mobile-optimized UI with model
  selector, download progress, LOCAL/SERVER badge, and chat
- tests/dashboard/test_local_models.py: 31 tests covering routes,
  config, template UX, JS asset, and XSS prevention

Changes:
- config.py: browser_model_enabled, browser_model_id,
  browser_model_fallback settings
- routes/mobile.py: /mobile/local page, /mobile/local-models API
- base.html: LOCAL AI nav link

Supported models: SmolLM2-360M (~200MB), Qwen2.5-0.5B (~350MB),
SmolLM2-1.7B (~1GB), Llama-3.2-1B (~700MB). Falls back to
server-side Ollama when local model is unavailable.

https://claude.ai/code/session_01Cqkvr4sZbED7T3iDu1rwSD
2026-02-27 00:03:05 +00:00
Claude
9f4c809f70 refactor: Phase 2b — consolidate 28 modules into 14 packages
Complete the module consolidation planned in REFACTORING_PLAN.md:

Modules merged:
- work_orders/ + task_queue/ → swarm/ (subpackages)
- self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages)
- tools/ → creative/tools/
- chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new)
- ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new)
- agents/ + agent_core/ + memory/ → timmy/ (subpackages)

Updated across codebase:
- 66 source files: import statements rewritten
- 13 test files: import + patch() target strings rewritten
- pyproject.toml: wheel includes (28→14), entry points updated
- CLAUDE.md: singleton paths, module map, entry points table
- AGENTS.md: file convention updates
- REFACTORING_PLAN.md: execution status, success metrics

Extras:
- Module-level CLAUDE.md added to 6 key packages (Phase 6.2)
- Zero test regressions: 1462 tests passing

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:07:41 +00:00
Claude
d2c80fbf4c refactor: Phase 2a — consolidate dashboard routes (27→22 files)
Merge related route files to reduce sprawl:
- voice.py ← voice_enhanced.py (enhanced pipeline merged in)
- swarm.py ← swarm_internal.py + swarm_ws.py (internal API + WebSocket)
- self_coding.py ← self_modify.py (self-modify endpoints merged in)
- Delete mobile_test.py route + template (test-only page, not for prod)
- Delete test_xss_prevention.py (tested the deleted mobile_test page)

Update app.py to use consolidated imports.
Update test_voice_enhanced.py patch paths.
Remove mobile_test.py from coverage omit (file deleted).

27 route files → 22. Tests: 1502 passed (1 removed with deleted page).

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:30:39 +00:00
Claude
6045077144 refactor: Phase 1/4/6 — doc cleanup, config fix, token optimization
Phase 1 — Documentation cleanup:
- Slim README 303→93 lines (remove duplicated architecture, config tables)
- Slim CLAUDE.md 267→80 lines (remove project layout, env vars, CI section)
- Slim AGENTS.md 342→72 lines (remove duplicated patterns, running locally)
- Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (session docs)
- Archive PLAN.md, IMPLEMENTATION_SUMMARY.md to docs/
- Move QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md to docs/
- Move apply_security_fixes.py, activate_self_tdd.sh to scripts/

Phase 4 — Config & build cleanup:
- Fix wheel build: add 11 missing modules to pyproject.toml include list
- Add pytest markers (unit, integration, dashboard, swarm, slow)
- Add data/self_modify_reports/ and .handoff/ to .gitignore

Phase 6 — Token optimization:
- Add docstrings to 15 __init__.py files that were empty
- Create __init__.py for events/, memory/, upgrades/ modules

Root markdown: 87KB → ~18KB (79% reduction)

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:03:15 +00:00
Alexander Payne
d7aaae74d5 feat: Hands Dashboard Routes and UI (Phase 3.6)
Add dashboard for managing autonomous Hands:

Routes (src/dashboard/routes/hands.py):
- GET /api/hands - List all Hands with status
- GET /api/hands/{name} - Get Hand details
- POST /api/hands/{name}/trigger - Manual trigger
- POST /api/hands/{name}/pause - Pause scheduled Hand
- POST /api/hands/{name}/resume - Resume paused Hand
- GET /api/approvals - List pending approvals
- POST /api/approvals/{id}/approve - Approve request
- POST /api/approvals/{id}/reject - Reject request
- GET /api/executions - List execution history

Templates:
- hands.html - Main dashboard page
- partials/hands_list.html - Active Hands list
- partials/approvals_list.html - Pending approvals
- partials/hand_executions.html - Execution history

Integration:
- Wired up in app.py
- Navigation links in base.html
2026-02-26 12:46:48 -05:00
Alexander Payne
62365cc9b2 feat: Wire up Self-Coding Dashboard
Integrate self-coding routes into dashboard:

Changes:
- Add import for self_coding_router in app.py
- Include self_coding_router in FastAPI app
- Add SELF-CODING link to desktop navigation
- Add SELF-CODING link to mobile navigation

The self-coding dashboard is now accessible at /self-coding
2026-02-26 12:28:30 -05:00
Alexander Payne
e81be8aed7 feat: Self-Coding Dashboard HTMX Templates
Add complete UI for self-coding dashboard:

Templates:
- self_coding.html - Main dashboard page with layout
- partials/self_coding_stats.html - Stats cards (total, success rate, etc)
- partials/journal_entries.html - List of modification attempts
- partials/journal_entry_detail.html - Expanded view of single attempt
- partials/execute_form.html - Task execution form
- partials/execute_result.html - Execution result display
- partials/error.html - Error message display

Features:
- HTMX-powered dynamic updates
- Real-time journal filtering (all/success/failure)
- Modal dialog for task execution
- Responsive Bootstrap 5 styling
- Automatic refresh after successful execution
2026-02-26 12:28:05 -05:00
Alexander Payne
cb70cb392a feat: Self-Coding Dashboard API Routes
Add FastAPI routes for self-coding dashboard:

API Endpoints:
- GET /api/journal - List modification journal entries
- GET /api/journal/{id} - Get detailed attempt info
- GET /api/stats - Get success rate metrics
- POST /api/execute - Execute self-edit task
- GET /api/codebase/summary - Get codebase summary
- POST /api/codebase/reindex - Trigger reindex

HTMX Partials:
- GET /self-coding/ - Main dashboard page
- GET /self-coding/journal - Journal entries list
- GET /self-coding/stats - Stats cards
- GET /self-coding/execute-form - Task execution form
- POST /self-coding/execute - Execute task endpoint
- GET /journal/{id}/detail - Entry detail view
2026-02-26 12:28:05 -05:00
Claude
63bbe2a288 feat: add sovereign biblical text integration module (scripture)
Implement the core scripture module for local-first ESV text storage,
verse retrieval, reference parsing, original language support,
cross-referencing, topical mapping, and automated meditation workflows.

Architecture:
- scripture/constants.py: 66-book Protestant canon with aliases and metadata
- scripture/models.py: Pydantic models with integer-encoded verse IDs
- scripture/parser.py: Regex-based reference extraction and formatting
- scripture/store.py: SQLite-backed verse/xref/topic/Strong's storage
- scripture/memory.py: Tripartite memory (working/long-term/associative)
- scripture/meditation.py: Sequential/thematic/lectionary meditation scheduler
- dashboard/routes/scripture.py: REST endpoints for all scripture operations
- config.py: scripture_enabled, translation, meditation settings
- 95 comprehensive tests covering all modules and routes

https://claude.ai/code/session_015wv7FM6BFsgZ35Us6WeY7H
2026-02-26 17:06:00 +00:00
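Note on 63bbe2a288: the commit mentions integer-encoded verse IDs without giving the scheme. One common layout packs book, chapter, and verse into decimal fields, as sketched below. The `BBCCCVVV` layout and both function names are assumptions, not the module's confirmed encoding.

```python
def encode_verse_id(book, chapter, verse):
    """Pack (book, chapter, verse) into one integer using an assumed BBCCCVVV layout."""
    if not (1 <= book <= 66 and 1 <= chapter <= 999 and 1 <= verse <= 999):
        raise ValueError("reference out of range")
    return book * 1_000_000 + chapter * 1_000 + verse


def decode_verse_id(verse_id):
    """Inverse of encode_verse_id: return (book, chapter, verse)."""
    book, rest = divmod(verse_id, 1_000_000)
    chapter, verse = divmod(rest, 1_000)
    return book, chapter, verse
```

With such a layout, IDs sort in canonical order, so a passage range becomes a simple integer range query in SQLite.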
Alexander Payne
3ca8e9f2d6 fix: chat evaluation bugs — task pipeline, prompt grounding, markdown rendering
Addresses 14 bugs from 3 rounds of deep chat evaluation:

- Add chat-to-task pipeline in agents.py with regex-based intent detection,
  agent extraction, priority extraction, and title cleaning
- Filter meta-questions ("how do I create a task?") from task creation
- Inject real-time date/time context into every chat message
- Inject live queue state when user asks about tasks
- Ground system prompts with agent roster, honesty guardrails, self-knowledge,
  math delegation template, anti-filler rules, values-conflict guidance
- Add CSS for markdown code blocks, inline code, lists, blockquotes in chat
- Add highlight.js CDN for syntax highlighting in chat responses
- Reduce small-model memory context budget (4000→2000) for expanded prompt
- Add 27 comprehensive tests covering the full chat-to-task pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:42:42 -05:00
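Note on 3ca8e9f2d6: regex-based intent detection with a meta-question filter might look like the sketch below. The patterns, the `detect_task` name, and the returned dict shape are assumptions; the actual rules in agents.py are not shown in the commit message. Agent extraction is left out.

```python
import re

# Assumed patterns; the project's real expressions are not part of the log.
TASK_INTENT = re.compile(
    r"^\s*(?:please\s+)?(?:create|add|queue|schedule)\s+(?:a\s+)?task\b[:\s]*(?P<title>.+)$",
    re.IGNORECASE,
)
META_QUESTION = re.compile(
    r"^\s*(?:how|what|why|can|could|should)\b.*\?\s*$", re.IGNORECASE
)
PRIORITY = re.compile(r"\b(high|low|urgent)\s+priority\b", re.IGNORECASE)


def detect_task(message):
    """Return {'title', 'priority'} for a task request, or None otherwise."""
    if META_QUESTION.search(message):
        return None  # questions about tasks are not task requests
    match = TASK_INTENT.match(message)
    if not match:
        return None
    title = match.group("title")
    found = PRIORITY.search(title)
    priority = found.group(1).lower() if found else "normal"
    title = PRIORITY.sub("", title)                    # drop the priority phrase
    title = re.sub(r"\s+", " ", title).strip(" .,:;")  # clean up the title
    return {"title": title, "priority": priority}
```

Checking the meta-question pattern first is what keeps a message such as "how do I create a task?" from creating a task.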
Alexander Payne
5f9bbb8435 feat: add task queue with human-in-the-loop approval + work orders + UI bug fixes
Task Queue system:
- New /tasks page with three-column layout (Pending/Active/Completed)
- Full CRUD API at /api/tasks with approve/veto/modify/pause/cancel/retry
- SQLite persistence in task_queue table
- WebSocket live updates via ws_manager
- Create task modal with agent assignment and priority
- Auto-approve rules for low-risk tasks
- HTMX polling for real-time column updates
- HOME TASK buttons now link to task queue with agent pre-selected
- MARKET HIRE buttons link to task queue with agent pre-selected

Work Order system:
- External submission API for agents/users (POST /work-orders/submit)
- Risk scoring and configurable auto-execution thresholds
- Dashboard at /work-orders/queue with approve/reject/execute flow
- Integration with swarm task system for execution

UI & Dashboard bug fixes:
- EVENTS: add startup event so page is never empty
- LEDGER: fix empty filter params in URL
- MISSION CONTROL: LLM backend and model now read from /health
- MISSION CONTROL: agent count fallback to /swarm/agents
- SWARM: HTMX fallback loads initial data if WebSocket is slow
- MEMORY: add edit/delete buttons for personal facts
- UPGRADES: add empty state guidance with links
- BRIEFING: add regenerate button and POST /briefing/regenerate endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:27:08 -05:00
Alexander Payne
6e6b4355bb fix: calculator tool, markdown rendering, prompt guardrails, briefing notification
- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
  get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
  calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
  instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
  pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:35:59 -05:00
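Note on 6e6b4355bb: a sandboxed calculator is commonly built by parsing the expression with Python's `ast` module and evaluating only whitelisted node types, instead of calling `eval()`. The sketch below shows that technique. Whether the project's tool works this way is an assumption, and `calculate` and `MAX_EXPONENT` are illustrative names.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
MAX_EXPONENT = 1000  # assumed guard against huge results such as 9 ** 9 ** 9


def calculate(expression):
    """Evaluate an arithmetic expression without executing arbitrary code."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


def _eval(node):
    # Only plain int/float literals are accepted (bool and str are rejected).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BINARY[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    # Names, calls, attribute access, and everything else are refused.
    raise ValueError("unsupported expression")
```

Because function calls and names never reach an evaluator, input such as `__import__('os')` fails with `ValueError` rather than running.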
Alexander Payne
05d4dc997c fix: chat panel scroll — internal scroll on #chat-log, auto-scroll on new messages
- Set overflow:hidden on mc-main to prevent page-level scrolling
- Add max-height:100% to sidebar and chat panel to contain within viewport
- Use flex-wrap:nowrap on layout row to prevent column stacking on desktop
- Move scrollChat() to hx-on::after-settle for reliable post-swap scrolling
- Use requestAnimationFrame for smooth scroll-to-bottom timing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:15:40 -05:00
Alexander Payne
f95c9606f1 fix: Timmy startup crashes and clean initialization
- Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__
- Guard memory_search against top_k=None from model, return formatted string
- Skip Telegram/Discord startup silently when no token configured
- Replace placeholder MEMORY.md with proper structured hot memory document

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:11:48 -05:00
Alexander Whitestone
dccd13df8e Merge pull request #46 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai
feat: Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed
2026-02-26 08:33:32 -05:00
Alexander Payne
96ed82d81e fix: memory route bug + fast E2E tests under 10 seconds
- Fix recall_personal_facts() call - remove unsupported limit parameter
- Replace 4 slow E2E test files with single fast test file
- All 6 E2E tests complete in ~9 seconds (was 60+ seconds)
- Reuse browser session across tests (module-scoped fixture)
- Combine related checks into single tests
- Add HTTP-only smoke test for speed
2026-02-26 08:08:32 -05:00
Alexander Payne
d8d976aa60 feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed
This commit implements six major features:

1. Event Log System (src/swarm/event_log.py)
   - SQLite-based audit trail for all swarm events
   - Task lifecycle tracking (created, assigned, completed, failed)
   - Agent lifecycle tracking (joined, left, status changes)
   - Integrated with coordinator for automatic logging
   - Dashboard page at /swarm/events

2. Lightning Ledger (src/lightning/ledger.py)
   - Transaction tracking for Lightning Network payments
   - Balance calculations (incoming, outgoing, net, available)
   - Integrated with payment_handler for automatic logging
   - Dashboard page at /lightning/ledger

3. Semantic Memory / Vector Store (src/memory/vector_store.py)
   - Embedding-based similarity search for Echo agent
   - Fallback to keyword matching if sentence-transformers unavailable
   - Personal facts storage and retrieval
   - Dashboard page at /memory

4. Cascade Router Integration (src/timmy/cascade_adapter.py)
   - Automatic LLM failover between providers (Ollama → AirLLM → API)
   - Circuit breaker pattern for failing providers
   - Metrics tracking per provider (latency, error rates)
   - Dashboard status page at /router/status

5. Self-Upgrade Approval Queue (src/upgrades/)
   - State machine for self-modifications: proposed → approved/rejected → applied/failed
   - Human approval required before applying changes
   - Git integration for branch management
   - Dashboard queue at /self-modify/queue

6. Real-Time Activity Feed (src/events/broadcaster.py)
   - WebSocket-based live activity streaming
   - Bridges event_log to dashboard clients
   - Activity panel on /swarm/live

Tests:
- 101 unit tests passing
- 4 new E2E test files for Selenium testing
- Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed

Documentation:
- 6 ADRs (017-022) documenting architecture decisions
- Implementation summary in docs/IMPLEMENTATION_SUMMARY.md
- Architecture diagram in docs/architecture-v2.md
2026-02-26 08:01:01 -05:00
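Note on d8d976aa60: the self-upgrade state machine "proposed → approved/rejected → applied/failed" can be expressed as a transition table. The sketch below is illustrative; the enum, the table, and the `transition` helper are assumed names, and only the five states and their ordering come from the commit message.

```python
from enum import Enum


class UpgradeState(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"
    FAILED = "failed"


# Allowed next states for each state; terminal states map to an empty set.
TRANSITIONS = {
    UpgradeState.PROPOSED: {UpgradeState.APPROVED, UpgradeState.REJECTED},
    UpgradeState.APPROVED: {UpgradeState.APPLIED, UpgradeState.FAILED},
    UpgradeState.REJECTED: set(),
    UpgradeState.APPLIED: set(),
    UpgradeState.FAILED: set(),
}


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Since `APPLIED` is reachable only from `APPROVED`, the "human approval required before applying changes" rule is enforced by the table itself.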
AlexanderWhitestone
930ec9eb80 Security: Fix XSS vulnerabilities in dashboard templates and improve mobile test UI safety
2026-02-26 02:07:54 -05:00
Alexander Whitestone
3792bf16cf Merge pull request #44 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai
Phase 3-4: Cascade LLM Router + Tool Registry Auto-Discovery
2026-02-25 20:04:30 -05:00
Alexander Payne
56437751d3 Phase 4: Tool Registry Auto-Discovery
- @mcp_tool decorator for marking functions as tools
- ToolDiscovery class for introspecting modules and packages
- Automatic JSON schema generation from type hints
- AST-based discovery for files (without importing)
- Auto-bootstrap on startup (packages=['tools'] by default)
- Support for tags, categories, and metadata
- Updated registry with register_tool() convenience method
- Environment variable MCP_AUTO_BOOTSTRAP to disable
- 39 tests with proper isolation and cleanup

Files Added:
- src/mcp/discovery.py: Tool discovery and introspection
- src/mcp/bootstrap.py: Auto-bootstrap functionality
- tests/test_mcp_discovery.py: 26 tests
- tests/test_mcp_bootstrap.py: 13 tests

Files Modified:
- src/mcp/registry.py: Added tags, source_module, auto_discovered fields
- src/mcp/__init__.py: Export discovery and bootstrap modules
- src/dashboard/app.py: Auto-bootstrap on startup
2026-02-25 19:59:42 -05:00
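Note on 56437751d3: a decorator that registers a function as a tool and derives a JSON schema from its type hints could be sketched as follows. The `@mcp_tool` name and the `tags` idea come from the commit message; the decorator's parameters, the `TOOL_REGISTRY` dict, and the type mapping are assumptions rather than the contents of src/mcp/discovery.py.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", list: "array", dict: "object"}

TOOL_REGISTRY = {}  # hypothetical registry: tool name -> metadata


def mcp_tool(name=None, tags=()):
    """Mark a function as a tool and record a schema built from its signature."""
    def decorator(func):
        hints = get_type_hints(func)
        hints.pop("return", None)
        properties, required = {}, []
        for param in inspect.signature(func).parameters.values():
            # Unannotated or unmapped parameters fall back to "string".
            json_type = _JSON_TYPES.get(hints.get(param.name), "string")
            properties[param.name] = {"type": json_type}
            if param.default is inspect.Parameter.empty:
                required.append(param.name)
        TOOL_REGISTRY[name or func.__name__] = {
            "func": func,
            "tags": list(tags),
            "description": inspect.getdoc(func) or "",
            "schema": {"type": "object", "properties": properties,
                       "required": required},
        }
        return func  # the function itself stays directly callable
    return decorator


@mcp_tool(tags=["math"])
def add(a: int, b: int = 0) -> int:
    """Add two integers."""
    return a + b
```

Parameters with defaults are left out of `required`, so a caller can omit `b` in the example.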
Alexander Payne
c658ca829c Phase 3: Cascade LLM Router with automatic failover
- YAML-based provider configuration (config/providers.yaml)
- Priority-ordered provider routing
- Circuit breaker pattern for failing providers
- Health check and availability monitoring
- Metrics tracking (latency, errors, success rates)
- Support for Ollama, OpenAI, Anthropic, AirLLM providers
- Automatic failover on rate limits or errors
- REST API endpoints for monitoring and control
- 41 comprehensive tests

API Endpoints:
- POST /api/v1/router/complete - Chat completion with failover
- GET /api/v1/router/status - Provider health status
- GET /api/v1/router/metrics - Detailed metrics
- GET /api/v1/router/providers - List all providers
- POST /api/v1/router/providers/{name}/control - Enable/disable/reset
- POST /api/v1/router/health-check - Run health checks
- GET /api/v1/router/config - View configuration
2026-02-25 19:43:43 -05:00
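Note on c658ca829c: priority-ordered failover combined with a circuit breaker can be sketched as below. This is a generic rendering of the pattern, not the router's code; the class, the `complete` function, the threshold, and the cool-down values are all assumed for illustration.

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow(self):
        if self._opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def complete(prompt, providers, breakers):
    """Try providers in priority order; return (provider_name, result)."""
    errors = {}
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.allow():
            errors[name] = "circuit open"
            continue
        try:
            result = call(prompt)
        except Exception as exc:  # any provider error triggers failover
            breaker.record_failure()
            errors[name] = str(exc)
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError(f"all providers failed: {errors}")
```

Once a breaker opens, its provider is skipped without being called, so a dead backend stops adding latency to every request until the cool-down expires.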
Alexander Payne
26e1691099 Fix Timmy coherence: persistent session, model-aware tools, response sanitization
Timmy was exhibiting severe incoherence (no memory between messages, tool call
leakage, chain-of-thought narration, random tool invocations) due to creating
a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line
system prompt with complex tool-calling instructions it couldn't follow.

Key changes:
- Add session.py singleton with stable session_id for conversation continuity
- Add _model_supports_tools() to strip tools from small models (< 7B)
- Add two-tier prompts: lite (12 lines) for small models, full for capable ones
- Add response sanitizer to strip leaked JSON tool calls and CoT narration
- Set show_tool_calls=False to prevent raw tool JSON in output
- Wire ConversationManager for user name extraction
- Deprecate orphaned memory_layers.py (unused 4-layer system)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 19:18:08 -05:00
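Note on 26e1691099: the commit names a `_model_supports_tools()` helper that strips tools from models under 7B, but does not show how size is determined. One plausible approach, sketched here, reads the parameter count from the model tag. The parsing rule, the `KNOWN_SIZES` table, and the conservative default for unknown models are assumptions.

```python
import re

_SIZE_TAG = re.compile(r"(\d+(?:\.\d+)?)b\b", re.IGNORECASE)

# Assumed fallback for tags with no size suffix; the commit message
# describes llama3.2 as a 3B model.
KNOWN_SIZES = {"llama3.2": 3.0}


def model_size_billions(model_name):
    """Return the parameter count in billions, or None if it cannot be inferred."""
    match = _SIZE_TAG.search(model_name)
    if match:
        return float(match.group(1))
    return KNOWN_SIZES.get(model_name.split(":")[0])


def model_supports_tools(model_name, min_billions=7.0):
    """Treat models of unknown size as too small (the conservative choice)."""
    size = model_size_billions(model_name)
    return size is not None and size >= min_billions
```

A caller would pass an empty tool list and the lite prompt whenever this returns False.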
Alexander Payne
90a93aa070 fix: resolve merge conflict in base.html nav with main
Keep Mission Control link from this branch alongside SWARM and SPARK
links from main. All 939 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 17:51:15 -05:00
Alexander Whitestone
d853e931ec Merge pull request #40 from AlexanderWhitestone/kimi/phase2-swarm-hardening-v2
Phase 2: Swarm hardening, auto-auction, WebSocket fix
2026-02-25 17:34:13 -05:00
Alexander Payne
4b12aca090 Swarm hardening: mobile nav, registry cleanup, module path fix
## Workset E: Swarm System Realization
- Verified PersonaNode bidding system is properly connected
- Coordinator already subscribes personas to task announcements
- Auction system works when /tasks/auction endpoint is used

## Workset F: Testing & Reliability
- Mobile nav: Add MOBILE link to desktop header (UX-01)
- Voice TTS: Verified graceful degradation already implemented
- Registry: Add proper connection cleanup with try/finally

## Workset G: Performance & Architecture
- Fix module path: websocket.handler -> ws_manager.handler
- Registry connections now properly closed after operations

All 895 tests pass.

Addresses QUALITY_ANALYSIS.md:
- UX-01: /mobile route now in desktop nav
- PERF-01: Connection cleanup improved (P3)
- FUNC-01/02: Verified bidding system operational
2026-02-25 17:26:42 -05:00
Alexander Payne
8fec9c41a5 feat: autonomous self-modifying agent with multi-backend LLM support
Adds SelfModifyLoop — an edit→validate→test→commit cycle that can read
its own failure reports, diagnose root causes, and restart autonomously.

Key capabilities:
- Multi-backend LLM: Anthropic Claude API, Ollama, or auto-detect
- Syntax validation via compile() before writing to disk
- Autonomous self-correction loop with configurable max cycles
- XML-based output format to avoid triple-quote delimiter conflicts
- Branch creation skipped by default to prevent container restarts
- CLI: self-modify run "instruction" --backend auto --autonomous
- 939 tests passing, 30 skipped

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 17:23:47 -05:00
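Note on 8fec9c41a5: "syntax validation via compile() before writing to disk" amounts to a guard like the one below. The built-in `compile()` call is standard Python; the `write_if_valid` wrapper and its return shape are assumptions made for this sketch.

```python
from pathlib import Path


def write_if_valid(path, source):
    """Write Python source to path only if it compiles; return (ok, error)."""
    try:
        # compile() parses the source without executing it.
        compile(source, str(path), "exec")
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    Path(path).write_text(source, encoding="utf-8")
    return True, ""
```

A rejected edit leaves the existing file untouched, so a malformed LLM response cannot break the module it was meant to modify. This check catches syntax errors only; the test step in the edit→validate→test→commit cycle is still needed for behaviour.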
Alexander Whitestone
c430f8002c Merge pull request #29 from AlexanderWhitestone/fix/xss-prevention-mobile-test
Security: XSS Prevention in Mobile Test Page
2026-02-25 08:01:05 -05:00
Alexander Payne
3463f4e4a4 fix: rename src/websocket to src/ws_manager to avoid websocket-client clash
selenium depends on websocket-client which installs a top-level
`websocket` package that shadows our src/websocket/ module on CI.
Renaming to ws_manager eliminates the conflict entirely — no more
sys.path hacks needed in conftest or Selenium tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 07:57:28 -05:00
Alexander Payne
29292cfb84 feat: single-command Docker startup, fix UI bugs, add Selenium tests
- Add `make up` / `make up DEV=1` for one-command Docker startup with
  optional hot-reload via docker-compose.dev.yml overlay
- Add `timmy up --dev` / `timmy down` CLI commands
- Fix cross-platform font resolution in creative assembler (7 test failures)
- Fix Ollama host URL not passed to Agno model (container connectivity)
- Fix task panel route shadowing by reordering literal routes before
  parameterized routes in swarm.py
- Fix chat input not clearing after send (hx-on::after-request)
- Fix chat scroll overflow (CSS min-height: 0 on flex children)
- Add Selenium UI smoke tests (17 tests, gated behind SELENIUM_UI=1)
- Install fonts-dejavu-core in Dockerfile for container font support
- Remove obsolete docker-compose version key
- Bump CSS cache-bust to v4

833 unit tests pass, 15 Selenium tests pass (2 skipped).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 07:20:56 -05:00
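Note on 29292cfb84: route shadowing happens because routers that match in registration order return the first pattern that fits, and a parameterized segment fits any literal. The stdlib sketch below imitates that behaviour to show why literal routes must be registered first. It is a toy matcher, not FastAPI code, and the paths and handler labels are hypothetical.

```python
import re


def compile_route(pattern):
    """Turn '/items/{item_id}' into a regex with one named group per parameter."""
    regex = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
    return re.compile(f"^{regex}$")


def match(routes, path):
    """Return (handler, params) for the first route that matches, in list order."""
    for pattern, handler in routes:
        found = compile_route(pattern).match(path)
        if found:
            return handler, found.groupdict()
    return None, {}


# Parameterized route first: it swallows the literal "/panel" path.
shadowed = [("/swarm/tasks/{task_id}", "task_detail"),
            ("/swarm/tasks/panel", "task_panel")]

# Literal route first: both paths reach the intended handler.
fixed = [("/swarm/tasks/panel", "task_panel"),
         ("/swarm/tasks/{task_id}", "task_detail")]
```

With `shadowed`, a request for `/swarm/tasks/panel` is handled as a task whose ID is "panel"; with `fixed`, it reaches the panel handler.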
AlexanderWhitestone
bc1be23e23 security: prevent XSS in mobile-test by using textContent
2026-02-25 02:08:02 -05:00
Claude
15596ca325 feat: add Discord integration with chat_bridge abstraction layer
Introduces a vendor-agnostic chat platform architecture:

- chat_bridge/base.py: ChatPlatform ABC, ChatMessage, ChatThread
- chat_bridge/registry.py: PlatformRegistry singleton
- chat_bridge/invite_parser.py: QR + Ollama vision invite extraction
- chat_bridge/vendors/discord.py: DiscordVendor with native threads

Workflow: paste a screenshot of a Discord invite or QR code at
POST /discord/join → Timmy extracts the invite automatically.

Every Discord conversation gets its own thread, keeping channels clean.
Bot responds to @mentions and DMs, routes through Timmy agent.

43 new tests (base classes, registry, invite parser, vendor, routes).

https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM
2026-02-25 01:11:14 +00:00
Claude
65a278dbee fix: comprehensive iPhone UI overhaul — glassmorphism, responsive layouts, theme unification
- base.html: add missing {% block extra_styles %}, mobile hamburger menu with
  slide-out nav, interactive-widget viewport meta, -webkit-text-size-adjust
- style.css: define 15+ missing CSS variables (--bg-secondary, --text-muted,
  --accent, --success, --danger, etc.), add missing utility classes (.grid,
  .stat, .agent-card, .agent-avatar, .form-group), glassmorphism card effects,
  iPhone breakpoints (768px, 390px), 44pt min touch targets, smooth animations
- mobile.html: rewrite with proper theme variables, glass cards, touch-friendly
  quick actions grid, chat with proper message bubbles
- swarm_live.html: replace undefined CSS vars, use mc-panel theme cards
- marketplace.html: responsive agent cards that stack on iPhone, themed pricing
- voice_button.html & voice_enhanced.html: proper theme integration, touch-sized
  buttons, themed result containers
- create_task.html: mobile-friendly forms with 16px font (prevents iOS zoom)
- tools.html & creative.html: themed headers, responsive column stacking
- spark.html: replace all hardcoded blue (#00d4ff) colors with theme purple/orange
- briefing.html: replace hardcoded bootstrap colors with theme variables

Fixes: header nav overflow on iPhone (7 links in single row), missing
extra_styles block silently dropping child template styles, undefined CSS
variables breaking mobile/swarm/marketplace/voice pages, sub-44pt touch
targets, missing -webkit-text-size-adjust, inconsistent color themes.

97 UI tests pass (91 UI-specific + 6 creative route).

https://claude.ai/code/session_01JiyhGyee2zoMN4p8xWYqEe
2026-02-24 22:25:04 +00:00
Alexander Whitestone
03ff505c4b Merge pull request #23 from AlexanderWhitestone/security/macaroon-forgery-and-xss-1771955896 2026-02-24 13:00:52 -05:00
AlexanderWhitestone
4daf382819 security: fix L402 macaroon forgery and XSS in templates 2026-02-24 12:58:19 -05:00
Claude
1103da339c feat: add full creative studio + DevOps tools (Pixel, Lyra, Reel personas)
Adds 3 new personas (Pixel, Lyra, Reel) and 5 new tool modules:

- Git/DevOps tools (GitPython): clone, status, diff, log, blame, branch,
  add, commit, push, pull, stash — wired to Forge and Helm personas
- Image generation (FLUX via diffusers): text-to-image, storyboards,
  variations — Pixel persona
- Music generation (ACE-Step 1.5): full songs with vocals+instrumentals,
  instrumental tracks, vocal-only tracks — Lyra persona
- Video generation (Wan 2.1 via diffusers): text-to-video, image-to-video
  clips — Reel persona
- Creative Director pipeline: multi-step orchestration that chains
  storyboard → music → video → assembly into 3+ minute final videos
- Video assembler (MoviePy + FFmpeg): stitch clips, overlay audio,
  title cards, subtitles, final export

Also includes:
- Spark Intelligence tool-level + creative pipeline event capture
- Creative Studio dashboard page (/creative/ui) with 4 tabs
- Config settings for all new models and output directories
- pyproject.toml creative optional extra for GPU dependencies
- 107 new tests covering all modules (624 total, all passing)

https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
2026-02-24 16:31:47 +00:00
Claude
1ab26d30ad feat: integrate Spark Intelligence into Timmy swarm system
Adds a self-evolving cognitive layer inspired by vibeship-spark-intelligence,
adapted for Timmy's agent architecture. Spark captures swarm events, runs
EIDOS prediction-evaluation loops, consolidates memories, and generates
advisory recommendations — all backed by SQLite consistent with existing
patterns.

New modules:
- spark/memory.py — event capture with importance scoring + memory consolidation
- spark/eidos.py — EIDOS cognitive loop (predict → observe → evaluate → learn)
- spark/advisor.py — ranked advisory generation from accumulated intelligence
- spark/engine.py — top-level API wiring all subsystems together

Dashboard:
- /spark/ui — full Spark Intelligence dashboard (3-column: status/advisories,
  predictions/memories, event timeline) with HTMX auto-refresh
- /spark — JSON API for programmatic access
- SPARK link added to navigation header

Integration:
- Coordinator hooks emit Spark events on task post, bid, assign, complete, fail
- EIDOS predictions generated when tasks are posted, evaluated on completion
- Memory consolidation triggers when agents accumulate enough outcomes
- SPARK_ENABLED config toggle (default: true)

Tests: 47 new tests covering all Spark subsystems + dashboard routes.
Full suite: 538 tests passing.

https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
2026-02-24 15:51:15 +00:00
Alexander Payne
ace5bfdf5f feat: Mission Control dashboard with sovereignty audit + scary path tests
Mission Control Dashboard:
- /swarm/mission-control page with real-time system status
- Sovereignty score display with visual progress bar
- Dependency health grid (Ollama, Redis, Lightning, SQLite)
- Recommendations based on dependency status
- Heartbeat monitor with tick counter
- System metrics: uptime, agents, tasks, sats earned

Health Endpoints:
- /health/sovereignty - Full sovereignty audit report
- /health/components - Component status and config

Tests (TDD approach):
- 11 Mission Control tests (all passing)
- 23 scary path tests for production scenarios
- Concurrent load, memory persistence, edge cases

Total: 525 tests passing
2026-02-22 20:48:14 -05:00
Alexander Payne
f0aa43533f feat: swarm E2E, MCP tools, timmy-serve L402, tests, notifications
Major Features:
- Auto-spawn persona agents (Echo, Forge, Seer) on app startup
- WebSocket broadcasts for real-time swarm UI updates
- MCP tool integration: web search, file I/O, shell, Python execution
- New /tools dashboard page showing agent capabilities
- Real timmy-serve start with L402 payment gating middleware
- Browser push notifications for briefings and task events

Tests:
- test_docker_agent.py: 9 tests for Docker agent runner
- test_swarm_integration_full.py: 18 E2E lifecycle tests
- Fixed all pytest warnings (436 tests, 0 warnings)

Improvements:
- Fixed coroutine warnings in coordinator broadcasts
- Fixed ResourceWarning for unclosed process pipes
- Added pytest-asyncio config to pyproject.toml
- Test isolation with proper event loop cleanup
2026-02-22 19:01:04 -05:00
Claude
167fd0a7b4 Add outcome-based learning system for swarm agents
Introduce a feedback loop where task outcomes (win/loss, success/failure)
feed back into agent bidding strategy. Borrows the "learn from outcomes"
concept from Spark Intelligence but builds it natively on Timmy's existing
SQLite + swarm architecture.

New module: src/swarm/learner.py
- Records every bid outcome with task description context
- Computes per-agent metrics: win rate, success rate, keyword performance
- suggest_bid() adjusts bids based on historical performance
- learned_keywords() discovers what task types agents actually excel at

Changes:
- persona_node: _compute_bid() now consults learner for adaptive adjustments
- coordinator: complete_task/fail_task feed results into learner
- coordinator: run_auction_and_assign records all bid outcomes
- routes/swarm: add /swarm/insights and /swarm/insights/{agent_id} endpoints
- routes/swarm: add POST /swarm/tasks/{task_id}/fail endpoint
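
A minimal sketch of the feedback loop described above, kept in memory rather than SQLite. Only `suggest_bid` is named in this commit; the `Learner` class, `record_bid`, `record_result`, the thresholds, and the 10% step are hypothetical. It also assumes a lower bid is more likely to win an auction, which the commit does not state.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentStats:
    bids: int = 0       # auctions the agent bid in
    wins: int = 0       # auctions the agent won
    finished: int = 0   # assigned tasks that reached complete or fail
    succeeded: int = 0  # assigned tasks that completed successfully


class Learner:
    """Hypothetical in-memory stand-in for src/swarm/learner.py."""

    MIN_HISTORY = 3  # assumed: do not adjust until a few bids are recorded

    def __init__(self) -> None:
        self._stats = defaultdict(AgentStats)

    def record_bid(self, agent_id: str, won: bool) -> None:
        stats = self._stats[agent_id]
        stats.bids += 1
        if won:
            stats.wins += 1

    def record_result(self, agent_id: str, succeeded: bool) -> None:
        stats = self._stats[agent_id]
        stats.finished += 1
        if succeeded:
            stats.succeeded += 1

    def win_rate(self, agent_id: str) -> float:
        stats = self._stats[agent_id]
        return stats.wins / stats.bids if stats.bids else 0.0

    def success_rate(self, agent_id: str) -> float:
        stats = self._stats[agent_id]
        return stats.succeeded / stats.finished if stats.finished else 0.0

    def suggest_bid(self, agent_id: str, base_bid: int) -> int:
        """Return base_bid adjusted by history (integer percent steps)."""
        if self._stats[agent_id].bids < self.MIN_HISTORY:
            return base_bid
        percent = 100
        if self.win_rate(agent_id) < 0.3:
            percent -= 10  # assumed: losing often, so bid lower to win more
        elif self.win_rate(agent_id) > 0.7 and self.success_rate(agent_id) > 0.8:
            percent += 10  # assumed: winning and delivering, so bid higher
        return max(1, base_bid * percent // 100)
```

In this sketch the auction would call `record_bid` once per bidder, the complete and fail paths would call `record_result`, and the bid computation would pass its base bid through `suggest_bid`.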

All 413 tests pass (23 new + 390 existing).

https://claude.ai/code/session_01E5jhTCwSUnJk9p9zrTMVUJ
2026-02-22 22:04:37 +00:00
Alexander Payne
4020b5222f feat: add Docker-based swarm agent containerization
Add infrastructure for running swarm agents as isolated Docker
containers with HTTP-based coordination, startup recovery, and
enhanced dashboard UI for agent management.

- Dockerfile and docker-compose.yml for multi-service orchestration
- DockerAgentRunner for programmatic container lifecycle management
- Internal HTTP API for container agents to poll tasks and submit bids
- Startup recovery system to reconcile orphaned tasks and stale agents
- Enhanced UI partials for agent panels, chat, and task assignment
- Timmy docker entry point with heartbeat and task polling
- New Makefile targets for Docker workflows
- Tests for swarm recovery

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 16:21:32 -05:00
Claude
bb93697b92 feat: add Telegram bot integration
Bridges Telegram messages to Timmy via python-telegram-bot (optional
dependency). The bot token can be supplied through the TELEGRAM_TOKEN
env var or at runtime via the new POST /telegram/setup dashboard
endpoint, which (re)starts the bot without restarting the app.

Changes:
- src/telegram_bot/bot.py — TelegramBot singleton: token persistence
  (telegram_state.json), lifecycle (start/stop), /start command and
  message handler that forwards to Timmy
- src/dashboard/routes/telegram.py — /telegram/setup and /telegram/status
  FastAPI routes
- src/dashboard/app.py — register telegram router; auto-start/stop bot
  in lifespan hook
- src/config.py — TELEGRAM_TOKEN setting (pydantic-settings)
- pyproject.toml — [telegram] optional extra (python-telegram-bot>=21),
  telegram_bot wheel include
- .env.example — TELEGRAM_TOKEN section
- .gitignore — exclude telegram_state.json (contains token)
- tests/conftest.py — stub telegram/telegram.ext for offline test runs
- tests/test_telegram_bot.py — 16 tests covering token helpers,
  lifecycle, and all dashboard routes (370 total, all passing)
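
A rough sketch of how the token helpers might resolve a token, assuming a token saved at runtime takes priority over the environment variable. The commit names `telegram_state.json` and `TELEGRAM_TOKEN`; the function names `save_token` and `load_token`, the JSON layout, and the lookup order are assumptions for illustration.

```python
import json
import os
from pathlib import Path
from typing import Optional

# File name taken from the commit message; its JSON layout is assumed.
STATE_FILE = Path("telegram_state.json")


def save_token(token: str, path: Path = STATE_FILE) -> None:
    """Persist a token supplied at runtime (hypothetical helper)."""
    path.write_text(json.dumps({"token": token}))


def load_token(path: Path = STATE_FILE) -> Optional[str]:
    """Return the saved token if present, else TELEGRAM_TOKEN, else None."""
    if path.exists():
        try:
            saved = json.loads(path.read_text()).get("token")
        except (json.JSONDecodeError, AttributeError):
            saved = None  # unreadable or unexpected state file: ignore it
        if saved:
            return saved
    # Fall back to the environment; treat an empty string as unset.
    return os.environ.get("TELEGRAM_TOKEN") or None
```

Because the state file holds the token in plain text, excluding it from version control (as the `.gitignore` change does) matters for any layout like this one.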

https://claude.ai/code/session_01CNBm3ZLobtx3Z1YogHq8ZS
2026-02-22 17:16:12 +00:00