Timmy-time-dashboard

Archived

forked from Rockachopa/Timmy-time-dashboard

Author	SHA1	Message	Date
Alexander Whitestone	da5745db48	Fix dashboard tests and add SECURITY.md audit report (#84)	2026-02-28 06:59:15 -05:00
Alexander Whitestone	3426761894	fix: unblock task queue — auto-approve all tasks, recycle zombie runners (#85 ) The task queue was completely stuck: 82 tasks trapped in pending_approval, 4 zombie tasks frozen in running, and the worker loop unable to process anything. This removes the approval gate as the default and adds startup recovery for orphaned tasks. - Auto-approve all tasks by default; only task_type="escalation" requires human review (and escalations never block the processor) - Add reconcile_zombie_tasks() to reset RUNNING→APPROVED on startup - Use in-memory _current_task for concurrency check instead of DB status so stale RUNNING rows from a crash can't block new work - Update get_next_pending_task to only query APPROVED tasks - Update all callsites (chat route, API, form) to match new defaults Co-authored-by: Alexander Payne <apayne@MM.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 06:57:51 -05:00
Alexander Whitestone	aa3263bc3b	feat: automatic error feedback loop with bug report tracker (#80 ) Errors and uncaught exceptions are now automatically captured, deduplicated, persisted to a rotating log file, and filed as bug report tasks in the existing task queue — giving Timmy a sovereign, local issue tracker with zero new dependencies. - Add RotatingFileHandler writing errors to logs/errors.log (5MB rotate, 5 backups) - Add error capture module with stack-trace hashing and 5-min dedup window - Add FastAPI exception middleware + global exception handler - Instrument all background loops (briefing, thinking, task processor) with capture_error() - Extend task queue with bug_report task type and auto-approve rule - Fix auto-approve type matching (was ignoring task_type field entirely) - Add /bugs dashboard page and /api/bugs JSON endpoints - Add ERROR_CAPTURED and BUG_REPORT_CREATED event types for real-time feed - Add BUGS nav link to desktop and mobile navigation - Add 16 tests covering error capture, deduplication, and bug report routes Co-authored-by: Alexander Payne <apayne@MM.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 19:51:37 -05:00
Alexander Whitestone	6545b7e26a	test: add inbox zero functional tests for task queue processor (#79 ) 11 tests verify that the TaskProcessor drains all queued tasks to completion — the core behavior needed for Timmy's stream of consciousness. Tests cover: single/batch/burst processing, priority ordering, mixed task types, failure recovery, timestamp tracking, and a loop-based inbox zero assertion. Adds an `isolated_task_db` fixture to functional conftest that gives each test a fresh temporary SQLite database via pytest's tmp_path. Co-authored-by: Alexander Payne <apayne@MM.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 02:19:02 -05:00
Alexander Whitestone	5b6d33e05a	feat: task queue system with startup drain and backlogging (#76 ) * feat: add task queue system for Timmy - all work goes through the queue - Add queue position tracking to task_queue models with task_type field - Add TaskProcessor class that consumes tasks from queue one at a time - Modify chat route to queue all messages for async processing - Chat responses get 'high' priority to jump ahead of thought tasks - Add queue status API endpoints for position polling - Update UI to show queue position (x/y) and current task banner - Replace thinking loop with task-based approach - thoughts are queued tasks - Push responses to user via WebSocket instead of immediate HTTP response - Add database migrations for existing tables * feat: Timmy drains task queue on startup, backlogs unhandleable tasks On spin-up, Timmy now iterates through all pending/approved tasks immediately instead of waiting for the polling loop. Tasks without a registered handler or with permanent errors are moved to a new BACKLOGGED status with a reason, keeping the queue clear for work Timmy can actually do. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Alexander Payne <apayne@MM.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 01:52:42 -05:00
Alexander Whitestone	849b5b1a8d	feat: add default thinking thread — Timmy always ponders (#75 )	2026-02-27 01:00:11 -05:00
Alexander Whitestone	a975a845c5	feat: Timmy system introspection, delegation, and session logging (#74) * test: remove hardcoded sleeps, add pytest-timeout - Replace fixed time.sleep() calls with intelligent polling or WebDriverWait - Add pytest-timeout dependency and --timeout=30 to prevent hangs - Fixes test flakiness and improves test suite speed * feat: add Aider AI tool to Forge's toolkit - Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist - Register tool in Forge's code toolkit - Add functional tests for the Aider tool * config: add opencode.json with local Ollama provider for sovereign AI * feat: Timmy fixes and improvements ## Bug Fixes - Fix read_file path resolution: add ~ expansion, proper relative path handling - Add repo_root to config.py with auto-detection from .git location - Fix hardcoded llama3.2 - now dynamic from settings.ollama_model ## Timmy's Requests - Add communication protocol to AGENTS.md (read context first, explain changes) - Create DECISIONS.md for architectural decision documentation - Add reasoning guidance to system prompts (step-by-step, state uncertainty) - Update tests to reflect correct model name (llama3.1:8b-instruct) ## Testing - All 177 dashboard tests pass - All 32 prompt/tool tests pass * feat: Timmy system introspection, delegation, and session logging ## System Introspection (Sovereign Self-Knowledge) - Add get_system_info() tool - Timmy can now query his runtime environment - Add check_ollama_health() - verify Ollama status - Add get_memory_status() - check memory tier status - True introspection vs hardcoded prompts ## Path Resolution Fix - Fix all toolkits to use settings.repo_root consistently - Now uses Path(settings.repo_root) instead of Path.cwd() ## Inter-Agent Delegation - Add delegate_task() tool - Timmy can dispatch to Seer, Forge, Echo, etc. - Add list_swarm_agents() - query available agents ## Session Logging - Add SessionLogger for comprehensive interaction logging - Records messages, tool calls, errors, decisions - Writes to /logs/session_{date}.jsonl ## Tests - Add tests for introspection tools - Add tests for delegation tools - Add tests for session logging - Add tests for path resolution - All 18 new tests pass - All 177 dashboard tests pass --------- Co-authored-by: Alexander Payne <apayne@MM.local>	2026-02-27 00:11:53 -05:00
Alexander Whitestone	5e60a6453b	feat: wire mobile app to real Timmy backend via JSON REST API (#73 ) Add /api/chat, /api/upload, and /api/chat/history endpoints to the FastAPI dashboard so the Expo mobile app talks directly to Timmy's brain (Ollama) instead of a non-existent Node.js server. Backend: - New src/dashboard/routes/chat_api.py with 4 endpoints - Mount /uploads/ for serving chat attachments - Same context injection and session management as HTMX chat Mobile app fixes: - Point API base URL at port 8000 (FastAPI) instead of 3000 - Create lib/_core/theme.ts (was referenced but never created) - Fix shared/types.ts (remove broken drizzle/errors re-exports) - Remove broken server/chat.ts and 1,235-line template README - Clean package.json (remove express, mysql2, drizzle, tRPC deps) - Remove debug console.log from theme-provider Tests: 13 new tests covering all API endpoints (all passing). https://claude.ai/code/session_01XqErDoh2rVsPY8oTj21Lz2 Co-authored-by: Claude <noreply@anthropic.com>	2026-02-26 23:58:53 -05:00
Alexander Whitestone	18ed6232f9	feat: Timmy fixes and improvements (#72 ) * test: remove hardcoded sleeps, add pytest-timeout - Replace fixed time.sleep() calls with intelligent polling or WebDriverWait - Add pytest-timeout dependency and --timeout=30 to prevent hangs - Fixes test flakiness and improves test suite speed * feat: add Aider AI tool to Forge's toolkit - Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist - Register tool in Forge's code toolkit - Add functional tests for the Aider tool * config: add opencode.json with local Ollama provider for sovereign AI * feat: Timmy fixes and improvements ## Bug Fixes - Fix read_file path resolution: add ~ expansion, proper relative path handling - Add repo_root to config.py with auto-detection from .git location - Fix hardcoded llama3.2 - now dynamic from settings.ollama_model ## Timmy's Requests - Add communication protocol to AGENTS.md (read context first, explain changes) - Create DECISIONS.md for architectural decision documentation - Add reasoning guidance to system prompts (step-by-step, state uncertainty) - Update tests to reflect correct model name (llama3.1:8b-instruct) ## Testing - All 177 dashboard tests pass - All 32 prompt/tool tests pass --------- Co-authored-by: Alexander Payne <apayne@MM.local>	2026-02-26 23:39:13 -05:00
Alexander Whitestone	a5765c33b6	feat: add Aider AI tool to Forge's toolkit (#70 ) * test: remove hardcoded sleeps, add pytest-timeout - Replace fixed time.sleep() calls with intelligent polling or WebDriverWait - Add pytest-timeout dependency and --timeout=30 to prevent hangs - Fixes test flakiness and improves test suite speed * feat: add Aider AI tool to Forge's toolkit - Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist - Register tool in Forge's code toolkit - Add functional tests for the Aider tool --------- Co-authored-by: Alexander Payne <apayne@MM.local>	2026-02-26 23:17:19 -05:00
Alexander Whitestone	51140fb7f0	test: remove hardcoded sleeps, add pytest-timeout (#69) - Replace fixed time.sleep() calls with intelligent polling or WebDriverWait - Add pytest-timeout dependency and --timeout=30 to prevent hangs - Fixes test flakiness and improves test suite speed Co-authored-by: Alexander Payne <apayne@MM.local>	2026-02-26 22:52:36 -05:00
Claude	eb501c43da	fix: resolve 8 test failures from missing requests stub and wrong python path - Add `requests` to conftest.py module stubs so patch("requests.post") works in reward scoring tests without the package installed - Use sys.executable instead of bare "python" in git safety tests so the subprocess finds pytest from the venv rather than system python https://claude.ai/code/session_012Ye9nyFEiw2QQfx4bZeDmn	2026-02-27 02:06:45 +00:00
Claude	21846f3897	fix: disable gpg signing in test git fixtures and skip root-only permission test Test fixtures that create temporary git repos now set commit.gpgsign=false to avoid failures in environments with global commit signing configured. The permission error test is skipped when running as root since file permissions don't apply to the root user. https://claude.ai/code/session_018u1fAx2GihSGctYS64tD4H	2026-02-27 01:52:47 +00:00
Claude	211c54bc8c	feat: add custom weights, model registry, per-agent models, and reward scoring Inspired by OpenClaw-RL's multi-model orchestration, this adds four features for custom model management: 1. Custom model registry (infrastructure/models/registry.py) — SQLite-backed registry for GGUF, safetensors, HF checkpoint, and Ollama models with role-based lookups (general, reward, teacher, judge). 2. Per-agent model assignment — each swarm persona can use a different model instead of sharing the global default. Resolved via registry assignment > persona default > global default. 3. Runtime model management API (/api/v1/models) — REST endpoints to register, list, assign, enable/disable, and remove custom models without restart. Includes a dashboard page at /models. 4. Reward model scoring (PRM-style) — majority-vote quality evaluation of agent outputs using a configurable reward model. Scores persist in SQLite and feed into the swarm learner. New config settings: custom_weights_dir, reward_model_enabled, reward_model_name, reward_model_votes. 54 new tests covering registry CRUD, API endpoints, agent assignments, role lookups, and reward scoring. https://claude.ai/code/session_01V4iTozMwcE2gjfnCJdCugC	2026-02-27 01:27:53 +00:00
Claude	17059bc0ea	feat: add Grok (xAI) as opt-in premium backend with monetization - Add GrokBackend class in src/timmy/backends.py with full sync/async support, health checks, usage stats, and cost estimation in sats - Add consult_grok tool to Timmy's toolkit for proactive Grok queries - Extend cascade router with Grok provider type for failover chain - Add Grok Mode toggle card to Mission Control dashboard (HTMX live) - Add "Ask Grok" button on chat input for direct Grok queries - Add /grok/* routes: status, toggle, chat, stats endpoints - Integrate Lightning invoice generation for Grok usage monetization - Add GROK_ENABLED, XAI_API_KEY, GROK_DEFAULT_MODEL, GROK_MAX_SATS_PER_QUERY, GROK_FREE config settings via pydantic-settings - Update .env.example and docker-compose.yml with Grok env vars - Add 21 tests covering backend, tools, and route endpoints (all green) Local-first ethos preserved: Grok is premium augmentation only, disabled by default, and Lightning-payable when enabled. https://claude.ai/code/session_01FygwN8wS8J6WGZ8FPb7XGV	2026-02-27 01:12:51 +00:00
Claude	3b7fcc5ebc	feat: add in-browser local model support for iPhone via WebLLM Enable Timmy to run directly on iPhone by loading a small LLM into the browser via WebGPU (Safari 26+ / iOS 26+). No server connection required — fully sovereign, fully offline. New files: - static/local_llm.js: WebLLM wrapper with model catalogue, WebGPU detection, streaming chat, and progress callbacks - templates/mobile_local.html: Mobile-optimized UI with model selector, download progress, LOCAL/SERVER badge, and chat - tests/dashboard/test_local_models.py: 31 tests covering routes, config, template UX, JS asset, and XSS prevention Changes: - config.py: browser_model_enabled, browser_model_id, browser_model_fallback settings - routes/mobile.py: /mobile/local page, /mobile/local-models API - base.html: LOCAL AI nav link Supported models: SmolLM2-360M (~200MB), Qwen2.5-0.5B (~350MB), SmolLM2-1.7B (~1GB), Llama-3.2-1B (~700MB). Falls back to server-side Ollama when local model is unavailable. https://claude.ai/code/session_01Cqkvr4sZbED7T3iDu1rwSD	2026-02-27 00:03:05 +00:00
Claude	9f4c809f70	refactor: Phase 2b — consolidate 28 modules into 14 packages Complete the module consolidation planned in REFACTORING_PLAN.md: Modules merged: - work_orders/ + task_queue/ → swarm/ (subpackages) - self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages) - tools/ → creative/tools/ - chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new) - ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new) - agents/ + agent_core/ + memory/ → timmy/ (subpackages) Updated across codebase: - 66 source files: import statements rewritten - 13 test files: import + patch() target strings rewritten - pyproject.toml: wheel includes (28→14), entry points updated - CLAUDE.md: singleton paths, module map, entry points table - AGENTS.md: file convention updates - REFACTORING_PLAN.md: execution status, success metrics Extras: - Module-level CLAUDE.md added to 6 key packages (Phase 6.2) - Zero test regressions: 1462 tests passing https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk	2026-02-26 22:07:41 +00:00
Claude	d2c80fbf4c	refactor: Phase 2a — consolidate dashboard routes (27→22 files) Merge related route files to reduce sprawl: - voice.py ← voice_enhanced.py (enhanced pipeline merged in) - swarm.py ← swarm_internal.py + swarm_ws.py (internal API + WebSocket) - self_coding.py ← self_modify.py (self-modify endpoints merged in) - Delete mobile_test.py route + template (test-only page, not for prod) - Delete test_xss_prevention.py (tested the deleted mobile_test page) Update app.py to use consolidated imports. Update test_voice_enhanced.py patch paths. Remove mobile_test.py from coverage omit (file deleted). 27 route files → 22. Tests: 1502 passed (1 removed with deleted page). https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN	2026-02-26 21:30:39 +00:00
Claude	4e11dd2490	refactor: Phase 3 — reorganize tests into module-mirroring subdirectories Move 97 test files from flat tests/ into 13 subdirectories: tests/dashboard/ (8 files — routes, mobile, mission control) tests/swarm/ (17 files — coordinator, docker, routing, tasks) tests/timmy/ (12 files — agent, backends, CLI, tools) tests/self_coding/ (14 files — git safety, indexer, self-modify) tests/lightning/ (3 files — L402, LND, interface) tests/creative/ (8 files — assembler, director, image/music/video) tests/integrations/ (10 files — chat bridge, telegram, voice, websocket) tests/mcp/ (4 files — bootstrap, discovery, executor) tests/spark/ (3 files — engine, tools, events) tests/hands/ (3 files — registry, oracle, phase5) tests/scripture/ (1 file) tests/infrastructure/ (3 files — router cascade, API) tests/security/ (3 files — XSS, regression) Fix Path(__file__) reference in test_mobile_scenarios.py for new depth. Add __init__.py to all test subdirectories. Tests: 1503 passed, 9 failed (pre-existing), 53 errors (pre-existing) https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN	2026-02-26 21:21:28 +00:00
Alexander Payne	7b26922339	test: Phase 5 Hands tests Add comprehensive tests for new Hands: TestScoutHand: - Directory structure, TOML validity, SYSTEM.md - Registry loading TestScribeHand: - Same validation pattern TestLedgerHand: - Same validation pattern TestWeaverHand: - Same validation pattern TestPhase5Schedules: - Scout: hourly (0 * * * *) - Scribe: daily 9am (0 9 * * *) - Ledger: every 6 hours (0 */6 * * *) - Weaver: Sunday 10am (0 10 * * 0) TestPhase5ApprovalGates: - All 4 Hands have approval gates TestAllHandsLoad: - All 6 Hands load together 25 tests total, all passing.	2026-02-26 13:08:48 -05:00
Alexander Payne	7508ef13c1	test: Oracle and Sentinel Hands tests (Phase 4) Add validation tests for the first two autonomous Hands: TestOracleHand: - Directory structure exists - HAND.toml is valid TOML with correct config - SYSTEM.md exists with proper content - Skills directory populated - Loads correctly in HandRegistry TestSentinelHand: - Same validation pattern as Oracle TestHandSchedules: - Oracle runs twice daily (7am, 7pm UTC) - Sentinel runs every 15 minutes TestHandApprovalGates: - Both Hands have approval gates configured - Safety model enforced 14 tests total, all passing.	2026-02-26 12:57:41 -05:00
Alexander Payne	a1d00da2de	test: Hands infrastructure tests (Phase 3) Add comprehensive test suite for Hands framework: TestHandRegistry: - Load all Hands from directory - Get Hand by name (with not-found handling) - Get scheduled vs all Hands - State management (status updates) - Approval queue operations TestHandScheduler: - Scheduler initialization - Schedule Hand with cron - Get scheduled jobs list - Manual trigger execution TestHandRunner: - Load system prompts from SYSTEM.md - Load skills from skills/ directory - Build execution prompts TestHandConfig: - HandConfig creation and validation - Cron schedule validation TestHandModels: - HandStatus enum values - HandState serialization to dict 17 tests total, all passing.	2026-02-26 12:49:06 -05:00
Alexander Payne	4d3995012a	test: Self-Coding Dashboard Tests Add tests for dashboard routes: - Page routes (main page, journal partial, stats partial, execute form) - API routes (journal list/detail, stats, codebase summary/reindex) - Execute endpoints (API and HTMX) - Navigation integration (link in header) Tests verify endpoints return correct status codes and content types.	2026-02-26 12:28:30 -05:00
Alexander Payne	49ca4dad43	feat: Self-Edit MCP Tool (Phase 2.1) Implements the Self-Edit MCP Tool that orchestrates the self-coding foundation: ## Core Features 1. SelfEditTool (src/tools/self_edit.py) - Complete self-modification orchestrator - Pre-flight safety checks (clean repo, on main branch) - Context gathering (codebase indexer + modification journal) - Feature branch creation (timmy/self-edit/{timestamp}) - LLM-based edit planning with fallback - Safety constraint validation - Aider integration (preferred) with fallback to direct editing - Automatic test execution via pytest - Commit on success, rollback on failure - Modification journaling with reflections 2. Safety Constraints - Max 3 files per commit - Max 100 lines changed - Protected files list (self-edit tool, foundation services) - Only modify files with test coverage - Max 3 retries on failure - Requires user confirmation (MCP tool registration) 3. Execution Backends - Aider integration: --auto-test --test-cmd pytest --yes --no-git - Direct editing fallback: LLM-based file modification with AST validation - Automatic backend selection based on availability ## Test Coverage - 19 new tests covering: - Basic functionality (initialization, preflight checks) - Edit planning (with/without LLM) - Safety validation (file limits, protected files) - Execution flow (success and failure paths) - Error handling (exceptions, LLM failures) - MCP registration ## Usage from tools.self_edit import register_self_edit_tool from mcp.registry import tool_registry # Register with MCP register_self_edit_tool(tool_registry, llm_adapter) Phase 2.2 will add Dashboard API endpoints and UI.	2026-02-26 12:28:05 -05:00
Claude	63bbe2a288	feat: add sovereign biblical text integration module (scripture) Implement the core scripture module for local-first ESV text storage, verse retrieval, reference parsing, original language support, cross-referencing, topical mapping, and automated meditation workflows. Architecture: - scripture/constants.py: 66-book Protestant canon with aliases and metadata - scripture/models.py: Pydantic models with integer-encoded verse IDs - scripture/parser.py: Regex-based reference extraction and formatting - scripture/store.py: SQLite-backed verse/xref/topic/Strong's storage - scripture/memory.py: Tripartite memory (working/long-term/associative) - scripture/meditation.py: Sequential/thematic/lectionary meditation scheduler - dashboard/routes/scripture.py: REST endpoints for all scripture operations - config.py: scripture_enabled, translation, meditation settings - 95 comprehensive tests covering all modules and routes https://claude.ai/code/session_015wv7FM6BFsgZ35Us6WeY7H	2026-02-26 17:06:00 +00:00
Alexander Payne	431cf3e020	merge: resolve conflicts with main, keep comprehensive chat pipeline Resolved merge conflicts in agents.py and test_task_queue.py: - Keep full chat-to-task pipeline (agent/priority extraction, question filtering, context injection) over simpler main version - Incorporate test_briefing_task_queue_summary from main - All 64 task queue tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 11:47:34 -05:00
Alexander Payne	3ca8e9f2d6	fix: chat evaluation bugs — task pipeline, prompt grounding, markdown rendering Addresses 14 bugs from 3 rounds of deep chat evaluation: - Add chat-to-task pipeline in agents.py with regex-based intent detection, agent extraction, priority extraction, and title cleaning - Filter meta-questions ("how do I create a task?") from task creation - Inject real-time date/time context into every chat message - Inject live queue state when user asks about tasks - Ground system prompts with agent roster, honesty guardrails, self-knowledge, math delegation template, anti-filler rules, values-conflict guidance - Add CSS for markdown code blocks, inline code, lists, blockquotes in chat - Add highlight.js CDN for syntax highlighting in chat responses - Reduce small-model memory context budget (4000→2000) for expanded prompt - Add 27 comprehensive tests covering the full chat-to-task pipeline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 11:42:42 -05:00
Alexander Whitestone	32ad43a61a	Merge pull request #51 from AlexanderWhitestone/feature/task-queue-and-ui-fixes feat: wire chat-to-task-queue and briefing integration	2026-02-26 11:31:25 -05:00
Alexander Payne	18bc64b36d	feat: Self-Coding Foundation (Phase 1) Implements the foundational infrastructure for Timmy's self-modification capability: ## New Services 1. GitSafety (src/self_coding/git_safety.py) - Atomic git operations with rollback capability - Snapshot/restore for safe experimentation - Feature branch management (timmy/self-edit/{timestamp}) - Merge to main only after tests pass 2. CodebaseIndexer (src/self_coding/codebase_indexer.py) - AST-based parsing of Python source files - Extracts classes, functions, imports, docstrings - Builds dependency graph for blast radius analysis - SQLite storage with hash-based incremental indexing - get_summary() for LLM context (<4000 tokens) - get_relevant_files() for task-based file discovery 3. ModificationJournal (src/self_coding/modification_journal.py) - Persistent log of all self-modification attempts - Tracks outcomes: success, failure, rollback - find_similar() for learning from past attempts - Success rate metrics and recent failure tracking - Supports vector embeddings (Phase 2) 4. ReflectionService (src/self_coding/reflection.py) - LLM-powered analysis of modification attempts - Generates lessons learned from successes and failures - Fallback templates when LLM unavailable - Supports context from similar past attempts ## Test Coverage - 104 new tests across 7 test files - 95% code coverage on self_coding module - Green path tests: full workflow integration - Red path tests: errors, rollbacks, edge cases - Safety constraint tests: test coverage requirements, protected files ## Usage from self_coding import GitSafety, CodebaseIndexer, ModificationJournal git = GitSafety(repo_path="/path/to/repo") indexer = CodebaseIndexer(repo_path="/path/to/repo") journal = ModificationJournal() Phase 2 will build the Self-Edit MCP Tool that orchestrates these services.	2026-02-26 11:08:05 -05:00
Alexander Payne	bc9089ef96	feat: wire chat-to-task-queue and briefing integration - Chat messages like "add X to the queue" or "create a task" are intercepted and create a task_queue entry with pending_approval status instead of going through to the LLM - Briefing engine now gathers task queue stats (pending, running, completed, failed) and includes them in the morning briefing prompt - 7 new tests covering detection patterns, chat integration, and briefing summary Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 10:33:14 -05:00
Alexander Payne	5f9bbb8435	feat: add task queue with human-in-the-loop approval + work orders + UI bug fixes Task Queue system: - New /tasks page with three-column layout (Pending/Active/Completed) - Full CRUD API at /api/tasks with approve/veto/modify/pause/cancel/retry - SQLite persistence in task_queue table - WebSocket live updates via ws_manager - Create task modal with agent assignment and priority - Auto-approve rules for low-risk tasks - HTMX polling for real-time column updates - HOME TASK buttons now link to task queue with agent pre-selected - MARKET HIRE buttons link to task queue with agent pre-selected Work Order system: - External submission API for agents/users (POST /work-orders/submit) - Risk scoring and configurable auto-execution thresholds - Dashboard at /work-orders/queue with approve/reject/execute flow - Integration with swarm task system for execution UI & Dashboard bug fixes: - EVENTS: add startup event so page is never empty - LEDGER: fix empty filter params in URL - MISSION CONTROL: LLM backend and model now read from /health - MISSION CONTROL: agent count fallback to /swarm/agents - SWARM: HTMX fallback loads initial data if WebSocket is slow - MEMORY: add edit/delete buttons for personal facts - UPGRADES: add empty state guidance with links - BRIEFING: add regenerate button and POST /briefing/regenerate endpoint Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 10:27:08 -05:00
Alexander Payne	6e6b4355bb	fix: calculator tool, markdown rendering, prompt guardrails, briefing notification - Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions get exact answers instead of LLM hallucinations - Update system prompts (lite + full) to instruct Timmy to always use the calculator and never attempt multi-digit math in his head - Add self-contradiction guard to both prompts ("commit to your facts") - Render Timmy's chat responses as markdown via marked.js + DOMPurify instead of raw escaped text - Suppress empty briefing notification on startup when there are 0 pending approval items - Add calculator to session response sanitizer regex - 18 new calculator tests, 2 updated briefing notification tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 09:35:59 -05:00
Alexander Payne	f95c9606f1	fix: Timmy startup crashes and clean initialization - Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__ - Guard memory_search against top_k=None from model, return formatted string - Skip Telegram/Discord startup silently when no token configured - Replace placeholder MEMORY.md with proper structured hot memory document Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 09:11:48 -05:00
Alexander Whitestone	dccd13df8e	Merge pull request #46 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai feat: Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed	2026-02-26 08:33:32 -05:00
Alexander Payne	06a15bb3f2	test: add missing fixtures for functional tests Add fixtures required by functional test suite: - docker_stack: Docker container test URL (skips if FUNCTIONAL_DOCKER != 1) - serve_client: FastAPI TestClient for timmy-serve app - tdd_runner: Alias for self_tdd_runner Fixes CI errors in test_docker_swarm.py, test_l402_flow.py, test_cli.py	2026-02-26 08:30:04 -05:00
Alexander Payne	96ed82d81e	fix: memory route bug + fast E2E tests under 10 seconds - Fix recall_personal_facts() call - remove unsupported limit parameter - Replace 4 slow E2E test files with single fast test file - All 6 E2E tests complete in ~9 seconds (was 60+ seconds) - Reuse browser session across tests (module-scoped fixture) - Combine related checks into single tests - Add HTTP-only smoke test for speed	2026-02-26 08:08:32 -05:00
Alexander Payne	d8d976aa60	feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed This commit implements six major features: 1. Event Log System (src/swarm/event_log.py) - SQLite-based audit trail for all swarm events - Task lifecycle tracking (created, assigned, completed, failed) - Agent lifecycle tracking (joined, left, status changes) - Integrated with coordinator for automatic logging - Dashboard page at /swarm/events 2. Lightning Ledger (src/lightning/ledger.py) - Transaction tracking for Lightning Network payments - Balance calculations (incoming, outgoing, net, available) - Integrated with payment_handler for automatic logging - Dashboard page at /lightning/ledger 3. Semantic Memory / Vector Store (src/memory/vector_store.py) - Embedding-based similarity search for Echo agent - Fallback to keyword matching if sentence-transformers unavailable - Personal facts storage and retrieval - Dashboard page at /memory 4. Cascade Router Integration (src/timmy/cascade_adapter.py) - Automatic LLM failover between providers (Ollama → AirLLM → API) - Circuit breaker pattern for failing providers - Metrics tracking per provider (latency, error rates) - Dashboard status page at /router/status 5. Self-Upgrade Approval Queue (src/upgrades/) - State machine for self-modifications: proposed → approved/rejected → applied/failed - Human approval required before applying changes - Git integration for branch management - Dashboard queue at /self-modify/queue 6. Real-Time Activity Feed (src/events/broadcaster.py) - WebSocket-based live activity streaming - Bridges event_log to dashboard clients - Activity panel on /swarm/live Tests: - 101 unit tests passing - 4 new E2E test files for Selenium testing - Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed Documentation: - 6 ADRs (017-022) documenting architecture decisions - Implementation summary in docs/IMPLEMENTATION_SUMMARY.md - Architecture diagram in docs/architecture-v2.md	2026-02-26 08:01:01 -05:00
AlexanderWhitestone	930ec9eb80	Security: Fix XSS vulnerabilities in dashboard templates and improve mobile test UI safety	2026-02-26 02:07:54 -05:00
Alexander Payne	8d85f95ee5	Fix router disabled provider check + comprehensive functional tests Fixes: - Router now properly skips disabled providers in complete() method - Fixed avg_latency calculation comment in tests (now correctly documents behavior) New Test Suites: - tests/test_functional_router.py: 10 functional tests for router - tests/test_functional_mcp.py: 15 functional tests for MCP discovery/bootstrap - tests/test_integration_full.py: 14 end-to-end integration tests Total: 39 new functional/integration tests All 144 tests passing (105 router/mcp + 39 functional/integration)	2026-02-25 20:22:51 -05:00
Alexander Whitestone	3792bf16cf	Merge pull request #44 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai Phase 3-4: Cascade LLM Router + Tool Registry Auto-Discovery	2026-02-25 20:04:30 -05:00
Alexander Payne	56437751d3	Phase 4: Tool Registry Auto-Discovery - @mcp_tool decorator for marking functions as tools - ToolDiscovery class for introspecting modules and packages - Automatic JSON schema generation from type hints - AST-based discovery for files (without importing) - Auto-bootstrap on startup (packages=['tools'] by default) - Support for tags, categories, and metadata - Updated registry with register_tool() convenience method - Environment variable MCP_AUTO_BOOTSTRAP to disable - 39 tests with proper isolation and cleanup Files Added: - src/mcp/discovery.py: Tool discovery and introspection - src/mcp/bootstrap.py: Auto-bootstrap functionality - tests/test_mcp_discovery.py: 26 tests - tests/test_mcp_bootstrap.py: 13 tests Files Modified: - src/mcp/registry.py: Added tags, source_module, auto_discovered fields - src/mcp/__init__.py: Export discovery and bootstrap modules - src/dashboard/app.py: Auto-bootstrap on startup	2026-02-25 19:59:42 -05:00
Alexander Payne	c658ca829c	Phase 3: Cascade LLM Router with automatic failover - YAML-based provider configuration (config/providers.yaml) - Priority-ordered provider routing - Circuit breaker pattern for failing providers - Health check and availability monitoring - Metrics tracking (latency, errors, success rates) - Support for Ollama, OpenAI, Anthropic, AirLLM providers - Automatic failover on rate limits or errors - REST API endpoints for monitoring and control - 41 comprehensive tests API Endpoints: - POST /api/v1/router/complete - Chat completion with failover - GET /api/v1/router/status - Provider health status - GET /api/v1/router/metrics - Detailed metrics - GET /api/v1/router/providers - List all providers - POST /api/v1/router/providers/{name}/control - Enable/disable/reset - POST /api/v1/router/health-check - Run health checks - GET /api/v1/router/config - View configuration	2026-02-25 19:43:43 -05:00
Alexander Payne	26e1691099	Fix Timmy coherence: persistent session, model-aware tools, response sanitization Timmy was exhibiting severe incoherence (no memory between messages, tool call leakage, chain-of-thought narration, random tool invocations) due to creating a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line system prompt with complex tool-calling instructions it couldn't follow. Key changes: - Add session.py singleton with stable session_id for conversation continuity - Add _model_supports_tools() to strip tools from small models (< 7B) - Add two-tier prompts: lite (12 lines) for small models, full for capable ones - Add response sanitizer to strip leaked JSON tool calls and CoT narration - Set show_tool_calls=False to prevent raw tool JSON in output - Wire ConversationManager for user name extraction - Deprecate orphaned memory_layers.py (unused 4-layer system) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 19:18:08 -05:00
Alexander Payne	7838df19b0	Implement three-tier memory architecture (Hot/Vault/Handoff) This commit replaces the previous memory_layers.py with a proper three-tier memory system as specified by the user: ## Tier 1 — Hot Memory (MEMORY.md) - Single flat file always loaded into system context - Contains: current status, standing rules, agent roster, key decisions - ~300 lines max, pruned monthly - Managed by HotMemory class ## Tier 2 — Structured Vault (memory/) - Directory with three namespaces: • self/ — identity.md, user_profile.md, methodology.md • notes/ — session logs, AARs, research • aar/ — post-task retrospectives - Markdown format, Obsidian-compatible - Append-only, date-stamped - Managed by VaultMemory class ## Handoff Protocol - last-session-handoff.md written at session end - Contains: summary, key decisions, open items, next steps - Auto-loaded at next session start - Maintains continuity across resets ## Implementation ### New Files: - src/timmy/memory_system.py — Core memory system - MEMORY.md — Hot memory template - memory/self/*.md — Identity, user profile, methodology ### Modified: - src/timmy/agent.py — Integrated with memory system - create_timmy() injects memory context - TimmyWithMemory class with automatic fact extraction - tests/test_agent.py — Updated for memory context ## Key Principles - Hot memory = small and curated - Vault = append-only, never delete - Handoffs = continuity mechanism - Flat files = human-readable, portable ## Usage All 973 tests pass.	2026-02-25 18:17:43 -05:00
Alexander Payne	625806daf5	Fine-tune Timmy's conversational AI with memory layers ## Enhanced System Prompt - Detailed tool usage guidelines with explicit examples - Clear DO and DON'T examples for tool selection - Memory system documentation - Conversation flow guidelines - Context awareness instructions ## Memory Layer System (NEW) Implemented 3-layer memory architecture: 1. WORKING MEMORY (src/timmy/memory_layers.py) - Immediate context (last 20 messages) - Topic tracking - Tool call tracking - Fast, ephemeral 2. SHORT-TERM MEMORY (Agno SQLite) - Recent conversations (100) - Persists across restarts - Managed by Agno Agent 3. LONG-TERM MEMORY (src/timmy/memory_layers.py) - Facts about user (name, preferences) - SQLite storage in data/memory/ - Auto-extraction from conversations - User profile generation ## Memory Manager (NEW) - Central coordinator for all memory layers - Context injection into prompts - Fact extraction and storage - Session management ## TimmyWithMemory Class (NEW) - Wrapper around Agno Agent with explicit memory - Auto-injects user context from LTM - Tracks exchanges across all layers - Simple chat() interface ## Agent Configuration - Increased num_history_runs: 10 -> 20 - Better conversational context retention ## Tests - All 973 tests pass - Fixed test expectations for new config - Fixed module path in test_scary_paths.py ## Files Added/Modified - src/timmy/prompts.py - Enhanced with memory and tool guidance - src/timmy/agent.py - Added TimmyWithMemory class - src/timmy/memory_layers.py - NEW memory system - src/timmy/conversation.py - NEW conversation manager - tests/ - Updated for new config	2026-02-25 18:07:44 -05:00
Alexander Payne	90a93aa070	fix: resolve merge conflict in base.html nav with main Keep Mission Control link from this branch alongside SWARM and SPARK links from main. All 939 tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 17:51:15 -05:00
Alexander Whitestone	d853e931ec	Merge pull request #40 from AlexanderWhitestone/kimi/phase2-swarm-hardening-v2 Phase 2: Swarm hardening, auto-auction, WebSocket fix	2026-02-25 17:34:13 -05:00
Alexander Payne	fc326421b1	fix: update integration tests for auto-auction behavior The POST /swarm/tasks endpoint now triggers an automatic auction via asyncio.create_task. Tests must allow tasks to be in bidding, assigned, or failed status since the background auction may resolve before the follow-up GET query. All 895 tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 17:28:41 -05:00
Alexander Payne	8fec9c41a5	feat: autonomous self-modifying agent with multi-backend LLM support Adds SelfModifyLoop — an edit→validate→test→commit cycle that can read its own failure reports, diagnose root causes, and restart autonomously. Key capabilities: - Multi-backend LLM: Anthropic Claude API, Ollama, or auto-detect - Syntax validation via compile() before writing to disk - Autonomous self-correction loop with configurable max cycles - XML-based output format to avoid triple-quote delimiter conflicts - Branch creation skipped by default to prevent container restarts - CLI: self-modify run "instruction" --backend auto --autonomous - 939 tests passing, 30 skipped Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 17:23:47 -05:00
Alexander Whitestone	c430f8002c	Merge pull request #29 from AlexanderWhitestone/fix/xss-prevention-mobile-test Security: XSS Prevention in Mobile Test Page	2026-02-25 08:01:05 -05:00
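
Several commits above (c658ca829c, 8d85f95ee5, d8d976aa60) describe a cascade router that tries providers in priority order, skips disabled providers in complete(), and applies a circuit breaker to failing ones. The following is a minimal sketch of that general pattern, not the repository's code: the names `Provider`, `CascadeRouterSketch`, `failure_threshold`, and `cooldown_s` are hypothetical, and the threshold and cooldown values are arbitrary assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Provider:
    """Hypothetical provider record; not the repository's data model."""
    name: str
    call: Callable[[str], str]
    enabled: bool = True
    failures: int = 0
    opened_at: Optional[float] = None  # when the circuit opened; None while closed


class CascadeRouterSketch:
    """Tries providers in list order and skips disabled or tripped ones."""

    def __init__(
        self,
        providers: List[Provider],
        failure_threshold: int = 2,   # assumed value
        cooldown_s: float = 30.0,     # assumed value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.providers = providers
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock

    def _available(self, p: Provider) -> bool:
        if not p.enabled:
            return False
        if p.opened_at is None:
            return True
        # Half-open: permit one trial call once the cooldown has elapsed.
        return self.clock() - p.opened_at >= self.cooldown_s

    def complete(self, prompt: str) -> Tuple[str, str]:
        errors = []
        for p in self.providers:
            if not self._available(p):
                continue
            try:
                result = p.call(prompt)
            except Exception as exc:
                p.failures += 1
                if p.failures >= self.failure_threshold:
                    p.opened_at = self.clock()  # open (or re-open) the circuit
                errors.append(f"{p.name}: {exc}")
                continue
            # Success closes the circuit and resets the failure count.
            p.failures = 0
            p.opened_at = None
            return p.name, result
        raise RuntimeError("no provider succeeded: " + "; ".join(errors))
```

In this sketch a provider that fails `failure_threshold` times in a row is skipped until `cooldown_s` has passed, after which a single trial call decides whether the circuit closes again. Injecting `clock` keeps the cooldown logic testable without sleeping.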

88 Commits