Commit Graph

100 Commits

Author SHA1 Message Date
Alexander Whitestone
e5190b248a CI/CD Optimization: Guard Rails, Pre-commit Checks, and Test Fixes (#90)
* CI/CD Optimization: Guard Rails, Black Linting, and Pre-commit Hooks

- Fixed all test collection errors (Selenium imports, fixture paths, syntax)
- Implemented pre-commit hooks with Black formatting and isort
- Created comprehensive Makefile with test targets (unit, integration, functional, e2e)
- Added pytest.ini with marker definitions for test categorization
- Established guard rails to prevent future collection errors
- Wrapped optional dependencies (Selenium, MoviePy) in try-except blocks
- Added conftest_markers for automatic test categorization

This ensures a smooth development stream with:
- Fast feedback loops (pre-commit checks before push)
- Consistent code formatting (Black)
- Reliable CI/CD (no collection errors, proper test isolation)
- Clear test organization (unit, integration, functional, E2E)

* Fix CI/CD test failures:
- Export templates from dashboard.app
- Fix model name assertion in test_agent.py
- Fix platform-agnostic path resolution in test_path_resolution.py
- Skip Docker tests in test_docker_deployment.py if docker not available
- Fix test_model_fallback_chain logic in test_ollama_integration.py

* Add preventative pre-commit checks and Docker test skipif decorators:
- Create pre_commit_checks.py script for common CI failures
- Add skipif decorators to Docker tests
- Improve test robustness for CI environments
2026-02-28 11:36:50 -05:00
Alexander Whitestone
a5fd680428 feat: microservices refactoring with TDD and Docker optimization (#88)
## Summary
Complete refactoring of Timmy Time from monolithic architecture to microservices
using Test-Driven Development (TDD) and optimized Docker builds.

## Changes

### Core Improvements
- Optimized dashboard startup: moved blocking tasks to async background processes
- Fixed model fallback logic in agent configuration
- Enhanced test fixtures with comprehensive conftest.py

### Microservices Architecture
- Created separate Dockerfiles for dashboard, Ollama, and agent services
- Implemented docker-compose.microservices.yml for service orchestration
- Added health checks and non-root user execution for security
- Multi-stage Docker builds for lean, fast images

### Testing
- Added E2E tests for dashboard responsiveness
- Added E2E tests for Ollama integration
- Added E2E tests for microservices architecture validation
- All 36 tests passing, 8 skipped (environment-specific)

### Documentation
- Created comprehensive final report
- Generated issue resolution plan
- Added interview transcript demonstrating core agent functionality

### New Modules
- skill_absorption.py: Dynamic skill loading and integration system for Timmy

## Test Results
 36 passed, 8 skipped, 6 warnings
 All microservices tests passing
 Dashboard responsiveness verified
 Ollama integration validated

## Files Added/Modified
- docker/: Multi-stage Dockerfiles for all services
- tests/e2e/: Comprehensive E2E test suite
- src/timmy/skill_absorption.py: Skill absorption system
- src/dashboard/app.py: Optimized startup logic
- tests/conftest.py: Enhanced test fixtures
- docker-compose.microservices.yml: Service orchestration

## Breaking Changes
None - all changes are backward compatible

## Next Steps
- Integrate skill absorption system into agent workflow
- Test with microservices-tdd-refactor skill
- Deploy to production with docker-compose orchestration
2026-02-28 11:07:19 -05:00
Alexander Whitestone
ab014dc5c6 feat: add timmy interview command for structured agent initialization (#87) 2026-02-28 09:35:44 -05:00
Alexander Whitestone
add3f7a07a fix: register task_request handler and fix Docker 403 errors on macOS (#86) 2026-02-28 07:39:31 -05:00
Alexander Whitestone
3426761894 fix: unblock task queue — auto-approve all tasks, recycle zombie runners (#85)
The task queue was completely stuck: 82 tasks trapped in pending_approval,
4 zombie tasks frozen in running, and the worker loop unable to process
anything. This removes the approval gate as the default and adds startup
recovery for orphaned tasks.

- Auto-approve all tasks by default; only task_type="escalation" requires
  human review (and escalations never block the processor)
- Add reconcile_zombie_tasks() to reset RUNNING→APPROVED on startup
- Use in-memory _current_task for concurrency check instead of DB status
  so stale RUNNING rows from a crash can't block new work
- Update get_next_pending_task to only query APPROVED tasks
- Update all callsites (chat route, API, form) to match new defaults

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 06:57:51 -05:00
Alexander Whitestone
bc21bbe96f fix: connect WebSocket to correct /swarm/live endpoint (#82)
The tasks board and Timmy panel were connecting to /ws which doesn't
exist, causing constant 403 Forbidden rejections and preventing
live event updates from reaching the UI.

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 20:27:20 -05:00
Alexander Whitestone
aa3263bc3b feat: automatic error feedback loop with bug report tracker (#80)
Errors and uncaught exceptions are now automatically captured, deduplicated,
persisted to a rotating log file, and filed as bug report tasks in the
existing task queue — giving Timmy a sovereign, local issue tracker with
zero new dependencies.

- Add RotatingFileHandler writing errors to logs/errors.log (5MB rotate, 5 backups)
- Add error capture module with stack-trace hashing and 5-min dedup window
- Add FastAPI exception middleware + global exception handler
- Instrument all background loops (briefing, thinking, task processor) with capture_error()
- Extend task queue with bug_report task type and auto-approve rule
- Fix auto-approve type matching (was ignoring task_type field entirely)
- Add /bugs dashboard page and /api/bugs JSON endpoints
- Add ERROR_CAPTURED and BUG_REPORT_CREATED event types for real-time feed
- Add BUGS nav link to desktop and mobile navigation
- Add 16 tests covering error capture, deduplication, and bug report routes

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 19:51:37 -05:00
Alexander Whitestone
5b6d33e05a feat: task queue system with startup drain and backlogging (#76)
* feat: add task queue system for Timmy - all work goes through the queue

- Add queue position tracking to task_queue models with task_type field
- Add TaskProcessor class that consumes tasks from queue one at a time
- Modify chat route to queue all messages for async processing
- Chat responses get 'high' priority to jump ahead of thought tasks
- Add queue status API endpoints for position polling
- Update UI to show queue position (x/y) and current task banner
- Replace thinking loop with task-based approach - thoughts are queued tasks
- Push responses to user via WebSocket instead of immediate HTTP response
- Add database migrations for existing tables

* feat: Timmy drains task queue on startup, backlogs unhandleable tasks

On spin-up, Timmy now iterates through all pending/approved tasks
immediately instead of waiting for the polling loop. Tasks without a
registered handler or with permanent errors are moved to a new
BACKLOGGED status with a reason, keeping the queue clear for work
Timmy can actually do.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 01:52:42 -05:00
Alexander Whitestone
849b5b1a8d feat: add default thinking thread — Timmy always ponders (#75) 2026-02-27 01:00:11 -05:00
Alexander Whitestone
a975a845c5 feat: Timmy system introspection, delegation, and session logging (#74)
* test: remove hardcoded sleeps, add pytest-timeout

- Replace fixed time.sleep() calls with intelligent polling or WebDriverWait
- Add pytest-timeout dependency and --timeout=30 to prevent hangs
- Fixes test flakiness and improves test suite speed

* feat: add Aider AI tool to Forge's toolkit

- Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist
- Register tool in Forge's code toolkit
- Add functional tests for the Aider tool

* config: add opencode.json with local Ollama provider for sovereign AI

* feat: Timmy fixes and improvements

## Bug Fixes
- Fix read_file path resolution: add ~ expansion, proper relative path handling
- Add repo_root to config.py with auto-detection from .git location
- Fix hardcoded llama3.2 - now dynamic from settings.ollama_model

## Timmy's Requests
- Add communication protocol to AGENTS.md (read context first, explain changes)
- Create DECISIONS.md for architectural decision documentation
- Add reasoning guidance to system prompts (step-by-step, state uncertainty)
- Update tests to reflect correct model name (llama3.1:8b-instruct)

## Testing
- All 177 dashboard tests pass
- All 32 prompt/tool tests pass

* feat: Timmy system introspection, delegation, and session logging

## System Introspection (Sovereign Self-Knowledge)
- Add get_system_info() tool - Timmy can now query his runtime environment
- Add check_ollama_health() - verify Ollama status
- Add get_memory_status() - check memory tier status
- True introspection vs hardcoded prompts

## Path Resolution Fix
- Fix all toolkits to use settings.repo_root consistently
- Now uses Path(settings.repo_root) instead of Path.cwd()

## Inter-Agent Delegation
- Add delegate_task() tool - Timmy can dispatch to Seer, Forge, Echo, etc.
- Add list_swarm_agents() - query available agents

## Session Logging
- Add SessionLogger for comprehensive interaction logging
- Records messages, tool calls, errors, decisions
- Writes to /logs/session_{date}.jsonl

## Tests
- Add tests for introspection tools
- Add tests for delegation tools
- Add tests for session logging
- Add tests for path resolution
- All 18 new tests pass
- All 177 dashboard tests pass

---------

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-02-27 00:11:53 -05:00
Alexander Whitestone
5e60a6453b feat: wire mobile app to real Timmy backend via JSON REST API (#73)
Add /api/chat, /api/upload, and /api/chat/history endpoints to the
FastAPI dashboard so the Expo mobile app talks directly to Timmy's
brain (Ollama) instead of a non-existent Node.js server.

Backend:
- New src/dashboard/routes/chat_api.py with 4 endpoints
- Mount /uploads/ for serving chat attachments
- Same context injection and session management as HTMX chat

Mobile app fixes:
- Point API base URL at port 8000 (FastAPI) instead of 3000
- Create lib/_core/theme.ts (was referenced but never created)
- Fix shared/types.ts (remove broken drizzle/errors re-exports)
- Remove broken server/chat.ts and 1,235-line template README
- Clean package.json (remove express, mysql2, drizzle, tRPC deps)
- Remove debug console.log from theme-provider

Tests: 13 new tests covering all API endpoints (all passing).

https://claude.ai/code/session_01XqErDoh2rVsPY8oTj21Lz2

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-26 23:58:53 -05:00
Alexander Whitestone
18ed6232f9 feat: Timmy fixes and improvements (#72)
* test: remove hardcoded sleeps, add pytest-timeout

- Replace fixed time.sleep() calls with intelligent polling or WebDriverWait
- Add pytest-timeout dependency and --timeout=30 to prevent hangs
- Fixes test flakiness and improves test suite speed

* feat: add Aider AI tool to Forge's toolkit

- Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist
- Register tool in Forge's code toolkit
- Add functional tests for the Aider tool

* config: add opencode.json with local Ollama provider for sovereign AI

* feat: Timmy fixes and improvements

## Bug Fixes
- Fix read_file path resolution: add ~ expansion, proper relative path handling
- Add repo_root to config.py with auto-detection from .git location
- Fix hardcoded llama3.2 - now dynamic from settings.ollama_model

## Timmy's Requests
- Add communication protocol to AGENTS.md (read context first, explain changes)
- Create DECISIONS.md for architectural decision documentation
- Add reasoning guidance to system prompts (step-by-step, state uncertainty)
- Update tests to reflect correct model name (llama3.1:8b-instruct)

## Testing
- All 177 dashboard tests pass
- All 32 prompt/tool tests pass

---------

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-02-26 23:39:13 -05:00
Alexander Whitestone
a5765c33b6 feat: add Aider AI tool to Forge's toolkit (#70)
* test: remove hardcoded sleeps, add pytest-timeout

- Replace fixed time.sleep() calls with intelligent polling or WebDriverWait
- Add pytest-timeout dependency and --timeout=30 to prevent hangs
- Fixes test flakiness and improves test suite speed

* feat: add Aider AI tool to Forge's toolkit

- Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist
- Register tool in Forge's code toolkit
- Add functional tests for the Aider tool

---------

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-02-26 23:17:19 -05:00
Alexander Payne
72a58f1f49 feat: Multi-modal support with automatic model fallback
- Add MultiModalManager with capability detection for vision/audio/tools
- Define fallback chains: vision (llama3.2:3b -> llava:7b -> moondream)
                       tools (llama3.1:8b-instruct -> qwen2.5:7b)
- Update CascadeRouter to detect content type and select appropriate models
- Add model pulling with automatic fallback in agent creation
- Update providers.yaml with multi-modal model configurations
- Update OllamaAdapter to use model resolution with vision support

Tests: All 96 infrastructure tests pass
2026-02-26 22:29:44 -05:00
Alexander Payne
a85661274c Merge main into feature/model-upgrade-llama3.1 with conflict resolution 2026-02-26 22:19:44 -05:00
Claude
211c54bc8c feat: add custom weights, model registry, per-agent models, and reward scoring
Inspired by OpenClaw-RL's multi-model orchestration, this adds four
features for custom model management:

1. Custom model registry (infrastructure/models/registry.py) — SQLite-backed
   registry for GGUF, safetensors, HF checkpoint, and Ollama models with
   role-based lookups (general, reward, teacher, judge).

2. Per-agent model assignment — each swarm persona can use a different model
   instead of sharing the global default. Resolved via registry assignment >
   persona default > global default.

3. Runtime model management API (/api/v1/models) — REST endpoints to register,
   list, assign, enable/disable, and remove custom models without restart.
   Includes a dashboard page at /models.

4. Reward model scoring (PRM-style) — majority-vote quality evaluation of
   agent outputs using a configurable reward model. Scores persist in SQLite
   and feed into the swarm learner.

New config settings: custom_weights_dir, reward_model_enabled,
reward_model_name, reward_model_votes.

54 new tests covering registry CRUD, API endpoints, agent assignments,
role lookups, and reward scoring.

https://claude.ai/code/session_01V4iTozMwcE2gjfnCJdCugC
2026-02-27 01:27:53 +00:00
Claude
17059bc0ea feat: add Grok (xAI) as opt-in premium backend with monetization
- Add GrokBackend class in src/timmy/backends.py with full sync/async
  support, health checks, usage stats, and cost estimation in sats
- Add consult_grok tool to Timmy's toolkit for proactive Grok queries
- Extend cascade router with Grok provider type for failover chain
- Add Grok Mode toggle card to Mission Control dashboard (HTMX live)
- Add "Ask Grok" button on chat input for direct Grok queries
- Add /grok/* routes: status, toggle, chat, stats endpoints
- Integrate Lightning invoice generation for Grok usage monetization
- Add GROK_ENABLED, XAI_API_KEY, GROK_DEFAULT_MODEL, GROK_MAX_SATS_PER_QUERY,
  GROK_FREE config settings via pydantic-settings
- Update .env.example and docker-compose.yml with Grok env vars
- Add 21 tests covering backend, tools, and route endpoints (all green)

Local-first ethos preserved: Grok is premium augmentation only,
disabled by default, and Lightning-payable when enabled.

https://claude.ai/code/session_01FygwN8wS8J6WGZ8FPb7XGV
2026-02-27 01:12:51 +00:00
Claude
bc2c09d3f8 feat: replace GitHub page with embedded Timmy chat interface
Replaces the marketing landing page with a minimal, full-screen chat
interface that connects to a running Timmy instance. Mobile-first design
with single vertical scroll direction, looping scroll, no zoom, no
buttons — just type and press Enter to talk to Timmy.

- docs/index.html: full rewrite as a clean chat UI with dark terminal
  theme, looping infinite scroll, markdown rendering, connection status,
  and /connect, /clear, /help slash commands
- src/dashboard/app.py: add CORS middleware so the GitHub Pages site can
  reach a local Timmy server cross-origin
- src/config.py: add cors_origins setting (defaults to ["*"])

https://claude.ai/code/session_01AWLxg6KDWsfCATiuvsRMGr
2026-02-27 00:35:33 +00:00
Claude
3b7fcc5ebc feat: add in-browser local model support for iPhone via WebLLM
Enable Timmy to run directly on iPhone by loading a small LLM into
the browser via WebGPU (Safari 26+ / iOS 26+). No server connection
required — fully sovereign, fully offline.

New files:
- static/local_llm.js: WebLLM wrapper with model catalogue, WebGPU
  detection, streaming chat, and progress callbacks
- templates/mobile_local.html: Mobile-optimized UI with model
  selector, download progress, LOCAL/SERVER badge, and chat
- tests/dashboard/test_local_models.py: 31 tests covering routes,
  config, template UX, JS asset, and XSS prevention

Changes:
- config.py: browser_model_enabled, browser_model_id,
  browser_model_fallback settings
- routes/mobile.py: /mobile/local page, /mobile/local-models API
- base.html: LOCAL AI nav link

Supported models: SmolLM2-360M (~200MB), Qwen2.5-0.5B (~350MB),
SmolLM2-1.7B (~1GB), Llama-3.2-1B (~700MB). Falls back to
server-side Ollama when local model is unavailable.

https://claude.ai/code/session_01Cqkvr4sZbED7T3iDu1rwSD
2026-02-27 00:03:05 +00:00
Claude
89e677e5cc chore: remove accidentally tracked self_modify_reports
These test artifacts are already in .gitignore (data/self_modify_reports/)
but were included because they landed in src/data/ during test runs.

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:07:59 +00:00
Claude
9f4c809f70 refactor: Phase 2b — consolidate 28 modules into 14 packages
Complete the module consolidation planned in REFACTORING_PLAN.md:

Modules merged:
- work_orders/ + task_queue/ → swarm/ (subpackages)
- self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages)
- tools/ → creative/tools/
- chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new)
- ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new)
- agents/ + agent_core/ + memory/ → timmy/ (subpackages)

Updated across codebase:
- 66 source files: import statements rewritten
- 13 test files: import + patch() target strings rewritten
- pyproject.toml: wheel includes (28→14), entry points updated
- CLAUDE.md: singleton paths, module map, entry points table
- AGENTS.md: file convention updates
- REFACTORING_PLAN.md: execution status, success metrics

Extras:
- Module-level CLAUDE.md added to 6 key packages (Phase 6.2)
- Zero test regressions: 1462 tests passing

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:07:41 +00:00
Claude
d2c80fbf4c refactor: Phase 2a — consolidate dashboard routes (27→22 files)
Merge related route files to reduce sprawl:
- voice.py ← voice_enhanced.py (enhanced pipeline merged in)
- swarm.py ← swarm_internal.py + swarm_ws.py (internal API + WebSocket)
- self_coding.py ← self_modify.py (self-modify endpoints merged in)
- Delete mobile_test.py route + template (test-only page, not for prod)
- Delete test_xss_prevention.py (tested the deleted mobile_test page)

Update app.py to use consolidated imports.
Update test_voice_enhanced.py patch paths.
Remove mobile_test.py from coverage omit (file deleted).

27 route files → 22. Tests: 1502 passed (1 removed with deleted page).

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:30:39 +00:00
Claude
6045077144 refactor: Phase 1/4/6 — doc cleanup, config fix, token optimization
Phase 1 — Documentation cleanup:
- Slim README 303→93 lines (remove duplicated architecture, config tables)
- Slim CLAUDE.md 267→80 lines (remove project layout, env vars, CI section)
- Slim AGENTS.md 342→72 lines (remove duplicated patterns, running locally)
- Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (session docs)
- Archive PLAN.md, IMPLEMENTATION_SUMMARY.md to docs/
- Move QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md to docs/
- Move apply_security_fixes.py, activate_self_tdd.sh to scripts/

Phase 4 — Config & build cleanup:
- Fix wheel build: add 11 missing modules to pyproject.toml include list
- Add pytest markers (unit, integration, dashboard, swarm, slow)
- Add data/self_modify_reports/ and .handoff/ to .gitignore

Phase 6 — Token optimization:
- Add docstrings to 15 __init__.py files that were empty
- Create __init__.py for events/, memory/, upgrades/ modules

Root markdown: 87KB → ~18KB (79% reduction)

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:03:15 +00:00
Alexander Payne
d9e556d4c1 fix: Upgrade model to llama3.1:8b-instruct + fix git tool cwd
Change 1: Model Upgrade (Primary Fix)
- Changed default model from llama3.2 to llama3.1:8b-instruct
- llama3.1:8b-instruct is fine-tuned for reliable tool/function calling
- llama3.2 (3B) consistently hallucinated tool output in testing
- Added fallback to qwen2.5:14b if primary unavailable

Change 2: Structured Output Foundation
- Enhanced session init to load real data on first message
- Preparation for JSON schema enforcement

Change 3: Git Tool Working Directory Fix
- Rewrote git_tools.py to use subprocess with cwd=REPO_ROOT
- REPO_ROOT auto-detected at module load time
- All git commands now run from correct directory

Change 4: Session Init with Git Log
- _session_init() reads git log --oneline -15 on first message
- Recent commits prepended to system prompt
- Timmy can now answer 'what's new?' from actual commit data

Change 5: Documentation
- Updated README with new model requirement
- Added CHANGELOG_2025-02-27.md

User must run: ollama pull llama3.1:8b-instruct

All 18 git tool tests pass.
2026-02-26 13:42:36 -05:00
Alexander Payne
d7aaae74d5 feat: Hands Dashboard Routes and UI (Phase 3.6)
Add dashboard for managing autonomous Hands:

Routes (src/dashboard/routes/hands.py):
- GET /api/hands - List all Hands with status
- GET /api/hands/{name} - Get Hand details
- POST /api/hands/{name}/trigger - Manual trigger
- POST /api/hands/{name}/pause - Pause scheduled Hand
- POST /api/hands/{name}/resume - Resume paused Hand
- GET /api/approvals - List pending approvals
- POST /api/approvals/{id}/approve - Approve request
- POST /api/approvals/{id}/reject - Reject request
- GET /api/executions - List execution history

Templates:
- hands.html - Main dashboard page
- partials/hands_list.html - Active Hands list
- partials/approvals_list.html - Pending approvals
- partials/hand_executions.html - Execution history

Integration:
- Wired up in app.py
- Navigation links in base.html
2026-02-26 12:46:48 -05:00
Alexander Payne
73cf780656 feat: HandRunner and hands module init (Phase 3.5)
Add HandRunner for executing Hands:

- hands/runner.py: Hand execution engine
  - Load SYSTEM.md and SKILL.md files
  - Inject domain expertise into LLM context
  - Check and handle approval gates
  - Execute tool loop with LLM
  - Deliver output to dashboard/channel/file
  - Log execution records

- hands/__init__.py: Module exports
  - Export all public classes and models
  - Usage documentation

The HandRunner completes the core Hands infrastructure.
2026-02-26 12:43:40 -05:00
Alexander Payne
8a952f6818 feat: Hands Infrastructure - Models, Registry, Scheduler (Phase 3.1-3.3)
Add core Hands infrastructure:

- hands/models.py: Pydantic models for HAND.toml schema
  - HandConfig: Complete hand configuration
  - HandState: Runtime state tracking
  - HandExecution: Execution records
  - ApprovalRequest: Approval queue entries

- hands/registry.py: HandRegistry for loading and indexing
  - Load Hands from hands/ directory
  - Parse HAND.toml manifests
  - SQLite indexing for fast lookup
  - Approval queue management
  - Execution history logging

- hands/scheduler.py: APScheduler-based scheduling
  - Cron and interval triggers
  - Job management (schedule, pause, resume, unschedule)
  - Hand execution wrapper
  - Manual trigger support
2026-02-26 12:41:52 -05:00
Alexander Payne
62365cc9b2 feat: Wire up Self-Coding Dashboard
Integrate self-coding routes into dashboard:

Changes:
- Add import for self_coding_router in app.py
- Include self_coding_router in FastAPI app
- Add SELF-CODING link to desktop navigation
- Add SELF-CODING link to mobile navigation

The self-coding dashboard is now accessible at /self-coding
2026-02-26 12:28:30 -05:00
Alexander Payne
e81be8aed7 feat: Self-Coding Dashboard HTMX Templates
Add complete UI for self-coding dashboard:

Templates:
- self_coding.html - Main dashboard page with layout
- partials/self_coding_stats.html - Stats cards (total, success rate, etc)
- partials/journal_entries.html - List of modification attempts
- partials/journal_entry_detail.html - Expanded view of single attempt
- partials/execute_form.html - Task execution form
- partials/execute_result.html - Execution result display
- partials/error.html - Error message display

Features:
- HTMX-powered dynamic updates
- Real-time journal filtering (all/success/failure)
- Modal dialog for task execution
- Responsive Bootstrap 5 styling
- Automatic refresh after successful execution
2026-02-26 12:28:05 -05:00
Alexander Payne
cb70cb392a feat: Self-Coding Dashboard API Routes
Add FastAPI routes for self-coding dashboard:

API Endpoints:
- GET /api/journal - List modification journal entries
- GET /api/journal/{id} - Get detailed attempt info
- GET /api/stats - Get success rate metrics
- POST /api/execute - Execute self-edit task
- GET /api/codebase/summary - Get codebase summary
- POST /api/codebase/reindex - Trigger reindex

HTMX Partials:
- GET /self-coding/ - Main dashboard page
- GET /self-coding/journal - Journal entries list
- GET /self-coding/stats - Stats cards
- GET /self-coding/execute-form - Task execution form
- POST /self-coding/execute - Execute task endpoint
- GET /journal/{id}/detail - Entry detail view
2026-02-26 12:28:05 -05:00
Alexander Payne
49ca4dad43 feat: Self-Edit MCP Tool (Phase 2.1)
Implements the Self-Edit MCP Tool that orchestrates the self-coding foundation:

## Core Features

1. **SelfEditTool** (src/tools/self_edit.py)
   - Complete self-modification orchestrator
   - Pre-flight safety checks (clean repo, on main branch)
   - Context gathering (codebase indexer + modification journal)
   - Feature branch creation (timmy/self-edit/{timestamp})
   - LLM-based edit planning with fallback
   - Safety constraint validation
   - Aider integration (preferred) with fallback to direct editing
   - Automatic test execution via pytest
   - Commit on success, rollback on failure
   - Modification journaling with reflections

2. **Safety Constraints**
   - Max 3 files per commit
   - Max 100 lines changed
   - Protected files list (self-edit tool, foundation services)
   - Only modify files with test coverage
   - Max 3 retries on failure
   - Requires user confirmation (MCP tool registration)

3. **Execution Backends**
   - Aider integration: --auto-test --test-cmd pytest --yes --no-git
   - Direct editing fallback: LLM-based file modification with AST validation
   - Automatic backend selection based on availability

## Test Coverage

- 19 new tests covering:
  - Basic functionality (initialization, preflight checks)
  - Edit planning (with/without LLM)
  - Safety validation (file limits, protected files)
  - Execution flow (success and failure paths)
  - Error handling (exceptions, LLM failures)
  - MCP registration

## Usage

    from tools.self_edit import register_self_edit_tool
    from mcp.registry import tool_registry

    # Register with MCP
    register_self_edit_tool(tool_registry, llm_adapter)

Phase 2.2 will add Dashboard API endpoints and UI.
2026-02-26 12:28:05 -05:00
Claude
63bbe2a288 feat: add sovereign biblical text integration module (scripture)
Implement the core scripture module for local-first ESV text storage,
verse retrieval, reference parsing, original language support,
cross-referencing, topical mapping, and automated meditation workflows.

Architecture:
- scripture/constants.py: 66-book Protestant canon with aliases and metadata
- scripture/models.py: Pydantic models with integer-encoded verse IDs
- scripture/parser.py: Regex-based reference extraction and formatting
- scripture/store.py: SQLite-backed verse/xref/topic/Strong's storage
- scripture/memory.py: Tripartite memory (working/long-term/associative)
- scripture/meditation.py: Sequential/thematic/lectionary meditation scheduler
- dashboard/routes/scripture.py: REST endpoints for all scripture operations
- config.py: scripture_enabled, translation, meditation settings
- 95 comprehensive tests covering all modules and routes

https://claude.ai/code/session_015wv7FM6BFsgZ35Us6WeY7H
2026-02-26 17:06:00 +00:00
Alexander Payne
431cf3e020 merge: resolve conflicts with main, keep comprehensive chat pipeline
Resolved merge conflicts in agents.py and test_task_queue.py:
- Keep full chat-to-task pipeline (agent/priority extraction, question
  filtering, context injection) over simpler main version
- Incorporate test_briefing_task_queue_summary from main
- All 64 task queue tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:47:34 -05:00
Alexander Payne
3ca8e9f2d6 fix: chat evaluation bugs — task pipeline, prompt grounding, markdown rendering
Addresses 14 bugs from 3 rounds of deep chat evaluation:

- Add chat-to-task pipeline in agents.py with regex-based intent detection,
  agent extraction, priority extraction, and title cleaning
- Filter meta-questions ("how do I create a task?") from task creation
- Inject real-time date/time context into every chat message
- Inject live queue state when user asks about tasks
- Ground system prompts with agent roster, honesty guardrails, self-knowledge,
  math delegation template, anti-filler rules, values-conflict guidance
- Add CSS for markdown code blocks, inline code, lists, blockquotes in chat
- Add highlight.js CDN for syntax highlighting in chat responses
- Reduce small-model memory context budget (4000→2000) for expanded prompt
- Add 27 comprehensive tests covering the full chat-to-task pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:42:42 -05:00
Alexander Whitestone
32ad43a61a Merge pull request #51 from AlexanderWhitestone/feature/task-queue-and-ui-fixes
feat: wire chat-to-task-queue and briefing integration
2026-02-26 11:31:25 -05:00
Alexander Payne
18bc64b36d feat: Self-Coding Foundation (Phase 1)
Implements the foundational infrastructure for Timmy's self-modification capability:

## New Services

1. **GitSafety** (src/self_coding/git_safety.py)
   - Atomic git operations with rollback capability
   - Snapshot/restore for safe experimentation
   - Feature branch management (timmy/self-edit/{timestamp})
   - Merge to main only after tests pass

2. **CodebaseIndexer** (src/self_coding/codebase_indexer.py)
   - AST-based parsing of Python source files
   - Extracts classes, functions, imports, docstrings
   - Builds dependency graph for blast radius analysis
   - SQLite storage with hash-based incremental indexing
   - get_summary() for LLM context (<4000 tokens)
   - get_relevant_files() for task-based file discovery

3. **ModificationJournal** (src/self_coding/modification_journal.py)
   - Persistent log of all self-modification attempts
   - Tracks outcomes: success, failure, rollback
   - find_similar() for learning from past attempts
   - Success rate metrics and recent failure tracking
   - Supports vector embeddings (Phase 2)

4. **ReflectionService** (src/self_coding/reflection.py)
   - LLM-powered analysis of modification attempts
   - Generates lessons learned from successes and failures
   - Fallback templates when LLM unavailable
   - Supports context from similar past attempts

## Test Coverage

- 104 new tests across 7 test files
- 95% code coverage on self_coding module
- Green path tests: full workflow integration
- Red path tests: errors, rollbacks, edge cases
- Safety constraint tests: test coverage requirements, protected files

## Usage

    from self_coding import GitSafety, CodebaseIndexer, ModificationJournal

    git = GitSafety(repo_path=/path/to/repo)
    indexer = CodebaseIndexer(repo_path=/path/to/repo)
    journal = ModificationJournal()

Phase 2 will build the Self-Edit MCP Tool that orchestrates these services.
2026-02-26 11:08:05 -05:00
Alexander Payne
bc9089ef96 feat: wire chat-to-task-queue and briefing integration
- Chat messages like "add X to the queue" or "create a task" are
  intercepted and create a task_queue entry with pending_approval
  status instead of going through to the LLM
- Briefing engine now gathers task queue stats (pending, running,
  completed, failed) and includes them in the morning briefing prompt
- 7 new tests covering detection patterns, chat integration, and
  briefing summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:33:14 -05:00
Alexander Payne
5f9bbb8435 feat: add task queue with human-in-the-loop approval + work orders + UI bug fixes
Task Queue system:
- New /tasks page with three-column layout (Pending/Active/Completed)
- Full CRUD API at /api/tasks with approve/veto/modify/pause/cancel/retry
- SQLite persistence in task_queue table
- WebSocket live updates via ws_manager
- Create task modal with agent assignment and priority
- Auto-approve rules for low-risk tasks
- HTMX polling for real-time column updates
- HOME TASK buttons now link to task queue with agent pre-selected
- MARKET HIRE buttons link to task queue with agent pre-selected

Work Order system:
- External submission API for agents/users (POST /work-orders/submit)
- Risk scoring and configurable auto-execution thresholds
- Dashboard at /work-orders/queue with approve/reject/execute flow
- Integration with swarm task system for execution

UI & Dashboard bug fixes:
- EVENTS: add startup event so page is never empty
- LEDGER: fix empty filter params in URL
- MISSION CONTROL: LLM backend and model now read from /health
- MISSION CONTROL: agent count fallback to /swarm/agents
- SWARM: HTMX fallback loads initial data if WebSocket is slow
- MEMORY: add edit/delete buttons for personal facts
- UPGRADES: add empty state guidance with links
- BRIEFING: add regenerate button and POST /briefing/regenerate endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:27:08 -05:00
Alexander Payne
6e6b4355bb fix: calculator tool, markdown rendering, prompt guardrails, briefing notification
- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
  get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
  calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
  instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
  pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:35:59 -05:00
Alexander Payne
05d4dc997c fix: chat panel scroll — internal scroll on #chat-log, auto-scroll on new messages
- Set overflow:hidden on mc-main to prevent page-level scrolling
- Add max-height:100% to sidebar and chat panel to contain within viewport
- Use flex-wrap:nowrap on layout row to prevent column stacking on desktop
- Move scrollChat() to hx-on::after-settle for reliable post-swap scrolling
- Use requestAnimationFrame for smooth scroll-to-bottom timing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:15:40 -05:00
Alexander Payne
f95c9606f1 fix: Timmy startup crashes and clean initialization
- Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__
- Guard memory_search against top_k=None from model, return formatted string
- Skip Telegram/Discord startup silently when no token configured
- Replace placeholder MEMORY.md with proper structured hot memory document

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:11:48 -05:00
Alexander Whitestone
dccd13df8e Merge pull request #46 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai
feat: Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed
2026-02-26 08:33:32 -05:00
Alexander Payne
96ed82d81e fix: memory route bug + fast E2E tests under 10 seconds
- Fix recall_personal_facts() call - remove unsupported limit parameter
- Replace 4 slow E2E test files with single fast test file
- All 6 E2E tests complete in ~9 seconds (was 60+ seconds)
- Reuse browser session across tests (module-scoped fixture)
- Combine related checks into single tests
- Add HTTP-only smoke test for speed
2026-02-26 08:08:32 -05:00
Alexander Payne
d8d976aa60 feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed
This commit implements six major features:

1. Event Log System (src/swarm/event_log.py)
   - SQLite-based audit trail for all swarm events
   - Task lifecycle tracking (created, assigned, completed, failed)
   - Agent lifecycle tracking (joined, left, status changes)
   - Integrated with coordinator for automatic logging
   - Dashboard page at /swarm/events

2. Lightning Ledger (src/lightning/ledger.py)
   - Transaction tracking for Lightning Network payments
   - Balance calculations (incoming, outgoing, net, available)
   - Integrated with payment_handler for automatic logging
   - Dashboard page at /lightning/ledger

3. Semantic Memory / Vector Store (src/memory/vector_store.py)
   - Embedding-based similarity search for Echo agent
   - Fallback to keyword matching if sentence-transformers unavailable
   - Personal facts storage and retrieval
   - Dashboard page at /memory

4. Cascade Router Integration (src/timmy/cascade_adapter.py)
   - Automatic LLM failover between providers (Ollama → AirLLM → API)
   - Circuit breaker pattern for failing providers
   - Metrics tracking per provider (latency, error rates)
   - Dashboard status page at /router/status

5. Self-Upgrade Approval Queue (src/upgrades/)
   - State machine for self-modifications: proposed → approved/rejected → applied/failed
   - Human approval required before applying changes
   - Git integration for branch management
   - Dashboard queue at /self-modify/queue

6. Real-Time Activity Feed (src/events/broadcaster.py)
   - WebSocket-based live activity streaming
   - Bridges event_log to dashboard clients
   - Activity panel on /swarm/live

Tests:
- 101 unit tests passing
- 4 new E2E test files for Selenium testing
- Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed

Documentation:
- 6 ADRs (017-022) documenting architecture decisions
- Implementation summary in docs/IMPLEMENTATION_SUMMARY.md
- Architecture diagram in docs/architecture-v2.md
2026-02-26 08:01:01 -05:00
AlexanderWhitestone
930ec9eb80 Security: Fix XSS vulnerabilities in dashboard templates and improve mobile test UI safety 2026-02-26 02:07:54 -05:00
Alexander Payne
8d85f95ee5 Fix router disabled provider check + comprehensive functional tests
Fixes:
- Router now properly skips disabled providers in complete() method
- Fixed avg_latency calculation comment in tests (now correctly documents behavior)

New Test Suites:
- tests/test_functional_router.py: 10 functional tests for router
- tests/test_functional_mcp.py: 15 functional tests for MCP discovery/bootstrap
- tests/test_integration_full.py: 14 end-to-end integration tests

Total: 39 new functional/integration tests

All 144 tests passing (105 router/mcp + 39 functional/integration)
2026-02-25 20:22:51 -05:00
Alexander Whitestone
3792bf16cf Merge pull request #44 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai
Phase 3-4: Cascade LLM Router + Tool Registry Auto-Discovery
2026-02-25 20:04:30 -05:00
Alexander Payne
56437751d3 Phase 4: Tool Registry Auto-Discovery
- @mcp_tool decorator for marking functions as tools
- ToolDiscovery class for introspecting modules and packages
- Automatic JSON schema generation from type hints
- AST-based discovery for files (without importing)
- Auto-bootstrap on startup (packages=['tools'] by default)
- Support for tags, categories, and metadata
- Updated registry with register_tool() convenience method
- Environment variable MCP_AUTO_BOOTSTRAP to disable
- 39 tests with proper isolation and cleanup

Files Added:
- src/mcp/discovery.py: Tool discovery and introspection
- src/mcp/bootstrap.py: Auto-bootstrap functionality
- tests/test_mcp_discovery.py: 26 tests
- tests/test_mcp_bootstrap.py: 13 tests

Files Modified:
- src/mcp/registry.py: Added tags, source_module, auto_discovered fields
- src/mcp/__init__.py: Export discovery and bootstrap modules
- src/dashboard/app.py: Auto-bootstrap on startup
2026-02-25 19:59:42 -05:00
Alexander Payne
c658ca829c Phase 3: Cascade LLM Router with automatic failover
- YAML-based provider configuration (config/providers.yaml)
- Priority-ordered provider routing
- Circuit breaker pattern for failing providers
- Health check and availability monitoring
- Metrics tracking (latency, errors, success rates)
- Support for Ollama, OpenAI, Anthropic, AirLLM providers
- Automatic failover on rate limits or errors
- REST API endpoints for monitoring and control
- 41 comprehensive tests

API Endpoints:
- POST /api/v1/router/complete - Chat completion with failover
- GET /api/v1/router/status - Provider health status
- GET /api/v1/router/metrics - Detailed metrics
- GET /api/v1/router/providers - List all providers
- POST /api/v1/router/providers/{name}/control - Enable/disable/reset
- POST /api/v1/router/health-check - Run health checks
- GET /api/v1/router/config - View configuration
2026-02-25 19:43:43 -05:00
Alexander Payne
a719c7538d Implement MCP system, Event Bus, and Sub-Agents
## 1. MCP (Model Context Protocol) Implementation

### Registry (src/mcp/registry.py)
- Tool registration with JSON schemas
- Dynamic tool discovery
- Health tracking per tool
- Metrics collection (latency, error rates)
- @register_tool decorator for easy registration

### Server (src/mcp/server.py)
- MCPServer class implementing MCP protocol
- MCPHTTPServer for FastAPI integration
- Standard endpoints: list_tools, call_tool, get_schema

### Schemas (src/mcp/schemas/base.py)
- create_tool_schema() helper
- Common parameter types
- Standard return types

### Bootstrap (src/mcp/bootstrap.py)
- Automatic tool module loading
- Status reporting

## 2. MCP-Compliant Tools (src/tools/)

| Tool | Purpose | Category |
|------|---------|----------|
| web_search | DuckDuckGo search | research |
| read_file | File reading | files |
| write_file | File writing (confirmation) | files |
| list_directory | Directory listing | files |
| python | Python code execution | code |
| memory_search | Vector memory search | memory |

All tools have proper schemas, error handling, and MCP registration.

## 3. Event Bus (src/events/bus.py)

- Async publish/subscribe pattern
- Pattern matching with wildcards (agent.task.*)
- Event history tracking
- Concurrent handler execution
- Module-level singleton for system-wide use

## 4. Sub-Agents (src/agents/)

All agents inherit from BaseAgent with:
- Agno Agent integration
- MCP tool registry access
- Event bus connectivity
- Structured logging

### Agent Roster

| Agent | Role | Tools | Purpose |
|-------|------|-------|---------|
| Seer | Research | web_search, read_file, memory_search | Information gathering |
| Forge | Code | python, write_file, read_file | Code generation |
| Quill | Writing | write_file, read_file, memory_search | Content creation |
| Echo | Memory | memory_search, read_file, write_file | Context retrieval |
| Helm | Routing | memory_search | Task routing decisions |
| Timmy | Orchestrator | All tools | Coordination & user interface |

### Timmy Orchestrator
- Analyzes user requests
- Routes to appropriate sub-agent
- Handles direct queries
- Manages swarm coordination
- create_timmy_swarm() factory function

## 5. Integration

All components wired together:
- Tools auto-register on import
- Agents connect to event bus
- MCP server provides HTTP API
- Ready for dashboard integration

## Tests
- All 973 existing tests pass
- New components tested manually
- Import verification successful

Next steps: Cascade Router, Self-Upgrade Loop, Dashboard integration
2026-02-25 19:26:24 -05:00