Timmy-time-dashboard

Archived

forked from Rockachopa/Timmy-time-dashboard

Author	SHA1	Message	Date
Claude	3b7fcc5ebc	feat: add in-browser local model support for iPhone via WebLLM Enable Timmy to run directly on iPhone by loading a small LLM into the browser via WebGPU (Safari 26+ / iOS 26+). No server connection required — fully sovereign, fully offline. New files: - static/local_llm.js: WebLLM wrapper with model catalogue, WebGPU detection, streaming chat, and progress callbacks - templates/mobile_local.html: Mobile-optimized UI with model selector, download progress, LOCAL/SERVER badge, and chat - tests/dashboard/test_local_models.py: 31 tests covering routes, config, template UX, JS asset, and XSS prevention Changes: - config.py: browser_model_enabled, browser_model_id, browser_model_fallback settings - routes/mobile.py: /mobile/local page, /mobile/local-models API - base.html: LOCAL AI nav link Supported models: SmolLM2-360M (~200MB), Qwen2.5-0.5B (~350MB), SmolLM2-1.7B (~1GB), Llama-3.2-1B (~700MB). Falls back to server-side Ollama when local model is unavailable. https://claude.ai/code/session_01Cqkvr4sZbED7T3iDu1rwSD	2026-02-27 00:03:05 +00:00
Claude	9f4c809f70	refactor: Phase 2b — consolidate 28 modules into 14 packages Complete the module consolidation planned in REFACTORING_PLAN.md: Modules merged: - work_orders/ + task_queue/ → swarm/ (subpackages) - self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages) - tools/ → creative/tools/ - chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new) - ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new) - agents/ + agent_core/ + memory/ → timmy/ (subpackages) Updated across codebase: - 66 source files: import statements rewritten - 13 test files: import + patch() target strings rewritten - pyproject.toml: wheel includes (28→14), entry points updated - CLAUDE.md: singleton paths, module map, entry points table - AGENTS.md: file convention updates - REFACTORING_PLAN.md: execution status, success metrics Extras: - Module-level CLAUDE.md added to 6 key packages (Phase 6.2) - Zero test regressions: 1462 tests passing https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk	2026-02-26 22:07:41 +00:00
Claude	d2c80fbf4c	refactor: Phase 2a — consolidate dashboard routes (27→22 files) Merge related route files to reduce sprawl: - voice.py ← voice_enhanced.py (enhanced pipeline merged in) - swarm.py ← swarm_internal.py + swarm_ws.py (internal API + WebSocket) - self_coding.py ← self_modify.py (self-modify endpoints merged in) - Delete mobile_test.py route + template (test-only page, not for prod) - Delete test_xss_prevention.py (tested the deleted mobile_test page) Update app.py to use consolidated imports. Update test_voice_enhanced.py patch paths. Remove mobile_test.py from coverage omit (file deleted). 27 route files → 22. Tests: 1502 passed (1 removed with deleted page). https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN	2026-02-26 21:30:39 +00:00
Claude	4e11dd2490	refactor: Phase 3 — reorganize tests into module-mirroring subdirectories Move 97 test files from flat tests/ into 13 subdirectories: tests/dashboard/ (8 files — routes, mobile, mission control) tests/swarm/ (17 files — coordinator, docker, routing, tasks) tests/timmy/ (12 files — agent, backends, CLI, tools) tests/self_coding/ (14 files — git safety, indexer, self-modify) tests/lightning/ (3 files — L402, LND, interface) tests/creative/ (8 files — assembler, director, image/music/video) tests/integrations/ (10 files — chat bridge, telegram, voice, websocket) tests/mcp/ (4 files — bootstrap, discovery, executor) tests/spark/ (3 files — engine, tools, events) tests/hands/ (3 files — registry, oracle, phase5) tests/scripture/ (1 file) tests/infrastructure/ (3 files — router cascade, API) tests/security/ (3 files — XSS, regression) Fix Path(__file__) reference in test_mobile_scenarios.py for new depth. Add __init__.py to all test subdirectories. Tests: 1503 passed, 9 failed (pre-existing), 53 errors (pre-existing) https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN	2026-02-26 21:21:28 +00:00
Alexander Payne	7b26922339	test: Phase 5 Hands tests Add comprehensive tests for new Hands: TestScoutHand: - Directory structure, TOML validity, SYSTEM.md - Registry loading TestScribeHand: - Same validation pattern TestLedgerHand: - Same validation pattern TestWeaverHand: - Same validation pattern TestPhase5Schedules: - Scout: hourly (0 * * * *) - Scribe: daily 9am (0 9 * * *) - Ledger: every 6 hours (0 */6 * * *) - Weaver: Sunday 10am (0 10 * * 0) TestPhase5ApprovalGates: - All 4 Hands have approval gates TestAllHandsLoad: - All 6 Hands load together 25 tests total, all passing.	2026-02-26 13:08:48 -05:00
Alexander Payne	7508ef13c1	test: Oracle and Sentinel Hands tests (Phase 4) Add validation tests for the first two autonomous Hands: TestOracleHand: - Directory structure exists - HAND.toml is valid TOML with correct config - SYSTEM.md exists with proper content - Skills directory populated - Loads correctly in HandRegistry TestSentinelHand: - Same validation pattern as Oracle TestHandSchedules: - Oracle runs twice daily (7am, 7pm UTC) - Sentinel runs every 15 minutes TestHandApprovalGates: - Both Hands have approval gates configured - Safety model enforced 14 tests total, all passing.	2026-02-26 12:57:41 -05:00
Alexander Payne	a1d00da2de	test: Hands infrastructure tests (Phase 3) Add comprehensive test suite for Hands framework: TestHandRegistry: - Load all Hands from directory - Get Hand by name (with not-found handling) - Get scheduled vs all Hands - State management (status updates) - Approval queue operations TestHandScheduler: - Scheduler initialization - Schedule Hand with cron - Get scheduled jobs list - Manual trigger execution TestHandRunner: - Load system prompts from SYSTEM.md - Load skills from skills/ directory - Build execution prompts TestHandConfig: - HandConfig creation and validation - Cron schedule validation TestHandModels: - HandStatus enum values - HandState serialization to dict 17 tests total, all passing.	2026-02-26 12:49:06 -05:00
Alexander Payne	4d3995012a	test: Self-Coding Dashboard Tests Add tests for dashboard routes: - Page routes (main page, journal partial, stats partial, execute form) - API routes (journal list/detail, stats, codebase summary/reindex) - Execute endpoints (API and HTMX) - Navigation integration (link in header) Tests verify endpoints return correct status codes and content types.	2026-02-26 12:28:30 -05:00
Alexander Payne	49ca4dad43	feat: Self-Edit MCP Tool (Phase 2.1) Implements the Self-Edit MCP Tool that orchestrates the self-coding foundation: ## Core Features 1. SelfEditTool (src/tools/self_edit.py) - Complete self-modification orchestrator - Pre-flight safety checks (clean repo, on main branch) - Context gathering (codebase indexer + modification journal) - Feature branch creation (timmy/self-edit/{timestamp}) - LLM-based edit planning with fallback - Safety constraint validation - Aider integration (preferred) with fallback to direct editing - Automatic test execution via pytest - Commit on success, rollback on failure - Modification journaling with reflections 2. Safety Constraints - Max 3 files per commit - Max 100 lines changed - Protected files list (self-edit tool, foundation services) - Only modify files with test coverage - Max 3 retries on failure - Requires user confirmation (MCP tool registration) 3. Execution Backends - Aider integration: --auto-test --test-cmd pytest --yes --no-git - Direct editing fallback: LLM-based file modification with AST validation - Automatic backend selection based on availability ## Test Coverage - 19 new tests covering: - Basic functionality (initialization, preflight checks) - Edit planning (with/without LLM) - Safety validation (file limits, protected files) - Execution flow (success and failure paths) - Error handling (exceptions, LLM failures) - MCP registration ## Usage from tools.self_edit import register_self_edit_tool from mcp.registry import tool_registry # Register with MCP register_self_edit_tool(tool_registry, llm_adapter) Phase 2.2 will add Dashboard API endpoints and UI.	2026-02-26 12:28:05 -05:00
Claude	63bbe2a288	feat: add sovereign biblical text integration module (scripture) Implement the core scripture module for local-first ESV text storage, verse retrieval, reference parsing, original language support, cross-referencing, topical mapping, and automated meditation workflows. Architecture: - scripture/constants.py: 66-book Protestant canon with aliases and metadata - scripture/models.py: Pydantic models with integer-encoded verse IDs - scripture/parser.py: Regex-based reference extraction and formatting - scripture/store.py: SQLite-backed verse/xref/topic/Strong's storage - scripture/memory.py: Tripartite memory (working/long-term/associative) - scripture/meditation.py: Sequential/thematic/lectionary meditation scheduler - dashboard/routes/scripture.py: REST endpoints for all scripture operations - config.py: scripture_enabled, translation, meditation settings - 95 comprehensive tests covering all modules and routes https://claude.ai/code/session_015wv7FM6BFsgZ35Us6WeY7H	2026-02-26 17:06:00 +00:00
Alexander Payne	431cf3e020	merge: resolve conflicts with main, keep comprehensive chat pipeline Resolved merge conflicts in agents.py and test_task_queue.py: - Keep full chat-to-task pipeline (agent/priority extraction, question filtering, context injection) over simpler main version - Incorporate test_briefing_task_queue_summary from main - All 64 task queue tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 11:47:34 -05:00
Alexander Payne	3ca8e9f2d6	fix: chat evaluation bugs — task pipeline, prompt grounding, markdown rendering Addresses 14 bugs from 3 rounds of deep chat evaluation: - Add chat-to-task pipeline in agents.py with regex-based intent detection, agent extraction, priority extraction, and title cleaning - Filter meta-questions ("how do I create a task?") from task creation - Inject real-time date/time context into every chat message - Inject live queue state when user asks about tasks - Ground system prompts with agent roster, honesty guardrails, self-knowledge, math delegation template, anti-filler rules, values-conflict guidance - Add CSS for markdown code blocks, inline code, lists, blockquotes in chat - Add highlight.js CDN for syntax highlighting in chat responses - Reduce small-model memory context budget (4000→2000) for expanded prompt - Add 27 comprehensive tests covering the full chat-to-task pipeline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 11:42:42 -05:00
Alexander Whitestone	32ad43a61a	Merge pull request #51 from AlexanderWhitestone/feature/task-queue-and-ui-fixes feat: wire chat-to-task-queue and briefing integration	2026-02-26 11:31:25 -05:00
Alexander Payne	18bc64b36d	feat: Self-Coding Foundation (Phase 1) Implements the foundational infrastructure for Timmy's self-modification capability: ## New Services 1. GitSafety (src/self_coding/git_safety.py) - Atomic git operations with rollback capability - Snapshot/restore for safe experimentation - Feature branch management (timmy/self-edit/{timestamp}) - Merge to main only after tests pass 2. CodebaseIndexer (src/self_coding/codebase_indexer.py) - AST-based parsing of Python source files - Extracts classes, functions, imports, docstrings - Builds dependency graph for blast radius analysis - SQLite storage with hash-based incremental indexing - get_summary() for LLM context (<4000 tokens) - get_relevant_files() for task-based file discovery 3. ModificationJournal (src/self_coding/modification_journal.py) - Persistent log of all self-modification attempts - Tracks outcomes: success, failure, rollback - find_similar() for learning from past attempts - Success rate metrics and recent failure tracking - Supports vector embeddings (Phase 2) 4. ReflectionService (src/self_coding/reflection.py) - LLM-powered analysis of modification attempts - Generates lessons learned from successes and failures - Fallback templates when LLM unavailable - Supports context from similar past attempts ## Test Coverage - 104 new tests across 7 test files - 95% code coverage on self_coding module - Green path tests: full workflow integration - Red path tests: errors, rollbacks, edge cases - Safety constraint tests: test coverage requirements, protected files ## Usage from self_coding import GitSafety, CodebaseIndexer, ModificationJournal git = GitSafety(repo_path="/path/to/repo") indexer = CodebaseIndexer(repo_path="/path/to/repo") journal = ModificationJournal() Phase 2 will build the Self-Edit MCP Tool that orchestrates these services.	2026-02-26 11:08:05 -05:00
Alexander Payne	bc9089ef96	feat: wire chat-to-task-queue and briefing integration - Chat messages like "add X to the queue" or "create a task" are intercepted and create a task_queue entry with pending_approval status instead of going through to the LLM - Briefing engine now gathers task queue stats (pending, running, completed, failed) and includes them in the morning briefing prompt - 7 new tests covering detection patterns, chat integration, and briefing summary Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 10:33:14 -05:00
Alexander Payne	5f9bbb8435	feat: add task queue with human-in-the-loop approval + work orders + UI bug fixes Task Queue system: - New /tasks page with three-column layout (Pending/Active/Completed) - Full CRUD API at /api/tasks with approve/veto/modify/pause/cancel/retry - SQLite persistence in task_queue table - WebSocket live updates via ws_manager - Create task modal with agent assignment and priority - Auto-approve rules for low-risk tasks - HTMX polling for real-time column updates - HOME TASK buttons now link to task queue with agent pre-selected - MARKET HIRE buttons link to task queue with agent pre-selected Work Order system: - External submission API for agents/users (POST /work-orders/submit) - Risk scoring and configurable auto-execution thresholds - Dashboard at /work-orders/queue with approve/reject/execute flow - Integration with swarm task system for execution UI & Dashboard bug fixes: - EVENTS: add startup event so page is never empty - LEDGER: fix empty filter params in URL - MISSION CONTROL: LLM backend and model now read from /health - MISSION CONTROL: agent count fallback to /swarm/agents - SWARM: HTMX fallback loads initial data if WebSocket is slow - MEMORY: add edit/delete buttons for personal facts - UPGRADES: add empty state guidance with links - BRIEFING: add regenerate button and POST /briefing/regenerate endpoint Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 10:27:08 -05:00
Alexander Payne	6e6b4355bb	fix: calculator tool, markdown rendering, prompt guardrails, briefing notification - Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions get exact answers instead of LLM hallucinations - Update system prompts (lite + full) to instruct Timmy to always use the calculator and never attempt multi-digit math in his head - Add self-contradiction guard to both prompts ("commit to your facts") - Render Timmy's chat responses as markdown via marked.js + DOMPurify instead of raw escaped text - Suppress empty briefing notification on startup when there are 0 pending approval items - Add calculator to session response sanitizer regex - 18 new calculator tests, 2 updated briefing notification tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 09:35:59 -05:00
Alexander Payne	f95c9606f1	fix: Timmy startup crashes and clean initialization - Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__ - Guard memory_search against top_k=None from model, return formatted string - Skip Telegram/Discord startup silently when no token configured - Replace placeholder MEMORY.md with proper structured hot memory document Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 09:11:48 -05:00
Alexander Whitestone	dccd13df8e	Merge pull request #46 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai feat: Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed	2026-02-26 08:33:32 -05:00
Alexander Payne	06a15bb3f2	test: add missing fixtures for functional tests Add fixtures required by functional test suite: - docker_stack: Docker container test URL (skips if FUNCTIONAL_DOCKER != 1) - serve_client: FastAPI TestClient for timmy-serve app - tdd_runner: Alias for self_tdd_runner Fixes CI errors in test_docker_swarm.py, test_l402_flow.py, test_cli.py	2026-02-26 08:30:04 -05:00
Alexander Payne	96ed82d81e	fix: memory route bug + fast E2E tests under 10 seconds - Fix recall_personal_facts() call - remove unsupported limit parameter - Replace 4 slow E2E test files with single fast test file - All 6 E2E tests complete in ~9 seconds (was 60+ seconds) - Reuse browser session across tests (module-scoped fixture) - Combine related checks into single tests - Add HTTP-only smoke test for speed	2026-02-26 08:08:32 -05:00
Alexander Payne	d8d976aa60	feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed This commit implements six major features: 1. Event Log System (src/swarm/event_log.py) - SQLite-based audit trail for all swarm events - Task lifecycle tracking (created, assigned, completed, failed) - Agent lifecycle tracking (joined, left, status changes) - Integrated with coordinator for automatic logging - Dashboard page at /swarm/events 2. Lightning Ledger (src/lightning/ledger.py) - Transaction tracking for Lightning Network payments - Balance calculations (incoming, outgoing, net, available) - Integrated with payment_handler for automatic logging - Dashboard page at /lightning/ledger 3. Semantic Memory / Vector Store (src/memory/vector_store.py) - Embedding-based similarity search for Echo agent - Fallback to keyword matching if sentence-transformers unavailable - Personal facts storage and retrieval - Dashboard page at /memory 4. Cascade Router Integration (src/timmy/cascade_adapter.py) - Automatic LLM failover between providers (Ollama → AirLLM → API) - Circuit breaker pattern for failing providers - Metrics tracking per provider (latency, error rates) - Dashboard status page at /router/status 5. Self-Upgrade Approval Queue (src/upgrades/) - State machine for self-modifications: proposed → approved/rejected → applied/failed - Human approval required before applying changes - Git integration for branch management - Dashboard queue at /self-modify/queue 6. Real-Time Activity Feed (src/events/broadcaster.py) - WebSocket-based live activity streaming - Bridges event_log to dashboard clients - Activity panel on /swarm/live Tests: - 101 unit tests passing - 4 new E2E test files for Selenium testing - Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed Documentation: - 6 ADRs (017-022) documenting architecture decisions - Implementation summary in docs/IMPLEMENTATION_SUMMARY.md - Architecture diagram in docs/architecture-v2.md	2026-02-26 08:01:01 -05:00
AlexanderWhitestone	930ec9eb80	Security: Fix XSS vulnerabilities in dashboard templates and improve mobile test UI safety	2026-02-26 02:07:54 -05:00
Alexander Payne	8d85f95ee5	Fix router disabled provider check + comprehensive functional tests Fixes: - Router now properly skips disabled providers in complete() method - Fixed avg_latency calculation comment in tests (now correctly documents behavior) New Test Suites: - tests/test_functional_router.py: 10 functional tests for router - tests/test_functional_mcp.py: 15 functional tests for MCP discovery/bootstrap - tests/test_integration_full.py: 14 end-to-end integration tests Total: 39 new functional/integration tests All 144 tests passing (105 router/mcp + 39 functional/integration)	2026-02-25 20:22:51 -05:00
Alexander Whitestone	3792bf16cf	Merge pull request #44 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai Phase 3-4: Cascade LLM Router + Tool Registry Auto-Discovery	2026-02-25 20:04:30 -05:00
Alexander Payne	56437751d3	Phase 4: Tool Registry Auto-Discovery - @mcp_tool decorator for marking functions as tools - ToolDiscovery class for introspecting modules and packages - Automatic JSON schema generation from type hints - AST-based discovery for files (without importing) - Auto-bootstrap on startup (packages=['tools'] by default) - Support for tags, categories, and metadata - Updated registry with register_tool() convenience method - Environment variable MCP_AUTO_BOOTSTRAP to disable - 39 tests with proper isolation and cleanup Files Added: - src/mcp/discovery.py: Tool discovery and introspection - src/mcp/bootstrap.py: Auto-bootstrap functionality - tests/test_mcp_discovery.py: 26 tests - tests/test_mcp_bootstrap.py: 13 tests Files Modified: - src/mcp/registry.py: Added tags, source_module, auto_discovered fields - src/mcp/__init__.py: Export discovery and bootstrap modules - src/dashboard/app.py: Auto-bootstrap on startup	2026-02-25 19:59:42 -05:00
Alexander Payne	c658ca829c	Phase 3: Cascade LLM Router with automatic failover - YAML-based provider configuration (config/providers.yaml) - Priority-ordered provider routing - Circuit breaker pattern for failing providers - Health check and availability monitoring - Metrics tracking (latency, errors, success rates) - Support for Ollama, OpenAI, Anthropic, AirLLM providers - Automatic failover on rate limits or errors - REST API endpoints for monitoring and control - 41 comprehensive tests API Endpoints: - POST /api/v1/router/complete - Chat completion with failover - GET /api/v1/router/status - Provider health status - GET /api/v1/router/metrics - Detailed metrics - GET /api/v1/router/providers - List all providers - POST /api/v1/router/providers/{name}/control - Enable/disable/reset - POST /api/v1/router/health-check - Run health checks - GET /api/v1/router/config - View configuration	2026-02-25 19:43:43 -05:00
Alexander Payne	26e1691099	Fix Timmy coherence: persistent session, model-aware tools, response sanitization Timmy was exhibiting severe incoherence (no memory between messages, tool call leakage, chain-of-thought narration, random tool invocations) due to creating a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line system prompt with complex tool-calling instructions it couldn't follow. Key changes: - Add session.py singleton with stable session_id for conversation continuity - Add _model_supports_tools() to strip tools from small models (< 7B) - Add two-tier prompts: lite (12 lines) for small models, full for capable ones - Add response sanitizer to strip leaked JSON tool calls and CoT narration - Set show_tool_calls=False to prevent raw tool JSON in output - Wire ConversationManager for user name extraction - Deprecate orphaned memory_layers.py (unused 4-layer system) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 19:18:08 -05:00
Alexander Payne	7838df19b0	Implement three-tier memory architecture (Hot/Vault/Handoff) This commit replaces the previous memory_layers.py with a proper three-tier memory system as specified by the user: ## Tier 1 — Hot Memory (MEMORY.md) - Single flat file always loaded into system context - Contains: current status, standing rules, agent roster, key decisions - ~300 lines max, pruned monthly - Managed by HotMemory class ## Tier 2 — Structured Vault (memory/) - Directory with three namespaces: • self/ — identity.md, user_profile.md, methodology.md • notes/ — session logs, AARs, research • aar/ — post-task retrospectives - Markdown format, Obsidian-compatible - Append-only, date-stamped - Managed by VaultMemory class ## Handoff Protocol - last-session-handoff.md written at session end - Contains: summary, key decisions, open items, next steps - Auto-loaded at next session start - Maintains continuity across resets ## Implementation ### New Files: - src/timmy/memory_system.py — Core memory system - MEMORY.md — Hot memory template - memory/self/*.md — Identity, user profile, methodology ### Modified: - src/timmy/agent.py — Integrated with memory system - create_timmy() injects memory context - TimmyWithMemory class with automatic fact extraction - tests/test_agent.py — Updated for memory context ## Key Principles - Hot memory = small and curated - Vault = append-only, never delete - Handoffs = continuity mechanism - Flat files = human-readable, portable ## Usage All 973 tests pass.	2026-02-25 18:17:43 -05:00
Alexander Payne	625806daf5	Fine-tune Timmy's conversational AI with memory layers ## Enhanced System Prompt - Detailed tool usage guidelines with explicit examples - Clear DO and DON'T examples for tool selection - Memory system documentation - Conversation flow guidelines - Context awareness instructions ## Memory Layer System (NEW) Implemented 3-layer memory architecture: 1. WORKING MEMORY (src/timmy/memory_layers.py) - Immediate context (last 20 messages) - Topic tracking - Tool call tracking - Fast, ephemeral 2. SHORT-TERM MEMORY (Agno SQLite) - Recent conversations (100) - Persists across restarts - Managed by Agno Agent 3. LONG-TERM MEMORY (src/timmy/memory_layers.py) - Facts about user (name, preferences) - SQLite storage in data/memory/ - Auto-extraction from conversations - User profile generation ## Memory Manager (NEW) - Central coordinator for all memory layers - Context injection into prompts - Fact extraction and storage - Session management ## TimmyWithMemory Class (NEW) - Wrapper around Agno Agent with explicit memory - Auto-injects user context from LTM - Tracks exchanges across all layers - Simple chat() interface ## Agent Configuration - Increased num_history_runs: 10 -> 20 - Better conversational context retention ## Tests - All 973 tests pass - Fixed test expectations for new config - Fixed module path in test_scary_paths.py ## Files Added/Modified - src/timmy/prompts.py - Enhanced with memory and tool guidance - src/timmy/agent.py - Added TimmyWithMemory class - src/timmy/memory_layers.py - NEW memory system - src/timmy/conversation.py - NEW conversation manager - tests/ - Updated for new config	2026-02-25 18:07:44 -05:00
Alexander Payne	90a93aa070	fix: resolve merge conflict in base.html nav with main Keep Mission Control link from this branch alongside SWARM and SPARK links from main. All 939 tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 17:51:15 -05:00
Alexander Whitestone	d853e931ec	Merge pull request #40 from AlexanderWhitestone/kimi/phase2-swarm-hardening-v2 Phase 2: Swarm hardening, auto-auction, WebSocket fix	2026-02-25 17:34:13 -05:00
Alexander Payne	fc326421b1	fix: update integration tests for auto-auction behavior The POST /swarm/tasks endpoint now triggers an automatic auction via asyncio.create_task. Tests must allow tasks to be in bidding, assigned, or failed status since the background auction may resolve before the follow-up GET query. All 895 tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 17:28:41 -05:00
Alexander Payne	8fec9c41a5	feat: autonomous self-modifying agent with multi-backend LLM support Adds SelfModifyLoop — an edit→validate→test→commit cycle that can read its own failure reports, diagnose root causes, and restart autonomously. Key capabilities: - Multi-backend LLM: Anthropic Claude API, Ollama, or auto-detect - Syntax validation via compile() before writing to disk - Autonomous self-correction loop with configurable max cycles - XML-based output format to avoid triple-quote delimiter conflicts - Branch creation skipped by default to prevent container restarts - CLI: self-modify run "instruction" --backend auto --autonomous - 939 tests passing, 30 skipped Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 17:23:47 -05:00
Alexander Whitestone	c430f8002c	Merge pull request #29 from AlexanderWhitestone/fix/xss-prevention-mobile-test Security: XSS Prevention in Mobile Test Page	2026-02-25 08:01:05 -05:00
Alexander Payne	3463f4e4a4	fix: rename src/websocket to src/ws_manager to avoid websocket-client clash selenium depends on websocket-client which installs a top-level `websocket` package that shadows our src/websocket/ module on CI. Renaming to ws_manager eliminates the conflict entirely — no more sys.path hacks needed in conftest or Selenium tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 07:57:28 -05:00
Alexander Payne	e483748816	fix: resolve websocket-client shadowing src/websocket on CI selenium depends on websocket-client which installs a top-level `websocket` package that shadows our src/websocket/ module. Ensure src/ is inserted at the front of sys.path in conftest so the project module wins the import race. Fixes collection errors for test_websocket.py and test_websocket_extended.py on GitHub Actions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 07:32:57 -05:00
Alexander Payne	29292cfb84	feat: single-command Docker startup, fix UI bugs, add Selenium tests - Add `make up` / `make up DEV=1` for one-command Docker startup with optional hot-reload via docker-compose.dev.yml overlay - Add `timmy up --dev` / `timmy down` CLI commands - Fix cross-platform font resolution in creative assembler (7 test failures) - Fix Ollama host URL not passed to Agno model (container connectivity) - Fix task panel route shadowing by reordering literal routes before parameterized routes in swarm.py - Fix chat input not clearing after send (hx-on::after-request) - Fix chat scroll overflow (CSS min-height: 0 on flex children) - Add Selenium UI smoke tests (17 tests, gated behind SELENIUM_UI=1) - Install fonts-dejavu-core in Dockerfile for container font support - Remove obsolete docker-compose version key - Bump CSS cache-bust to v4 833 unit tests pass, 15 Selenium tests pass (2 skipped). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 07:20:56 -05:00
AlexanderWhitestone	bc1be23e23	security: prevent XSS in mobile-test by using textContent	2026-02-25 02:08:02 -05:00
Claude	78cf91697c	feat: add functional Ollama chat tests with containerised LLM Add an ollama service (behind --profile ollama) to the test compose stack and a new test suite that verifies real LLM inference end-to-end: - docker-compose.test.yml: add ollama/ollama service with health check, make OLLAMA_URL and OLLAMA_MODEL configurable via env vars - tests/functional/test_ollama_chat.py: session-scoped fixture that brings up Ollama + dashboard, pulls qwen2.5:0.5b (~400MB, CPU-only), and runs chat/history/multi-turn tests against the live stack - Makefile: add `make test-ollama` target Run with: make test-ollama (or FUNCTIONAL_DOCKER=1 pytest tests/functional/test_ollama_chat.py -v) https://claude.ai/code/session_01NTEzfRHSZQCfkfypxgyHKk	2026-02-25 02:44:36 +00:00
Claude	15596ca325	feat: add Discord integration with chat_bridge abstraction layer Introduces a vendor-agnostic chat platform architecture: - chat_bridge/base.py: ChatPlatform ABC, ChatMessage, ChatThread - chat_bridge/registry.py: PlatformRegistry singleton - chat_bridge/invite_parser.py: QR + Ollama vision invite extraction - chat_bridge/vendors/discord.py: DiscordVendor with native threads Workflow: paste a screenshot of a Discord invite or QR code at POST /discord/join → Timmy extracts the invite automatically. Every Discord conversation gets its own thread, keeping channels clean. Bot responds to @mentions and DMs, routes through Timmy agent. 43 new tests (base classes, registry, invite parser, vendor, routes). https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM	2026-02-25 01:11:14 +00:00
Claude	2c419a777d	fix: skip Docker tests gracefully when daemon is unavailable The docker_stack fixture now checks `docker info` before attempting `compose up`. If the daemon isn't reachable, tests skip instead of erroring with pytest.fail. https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM	2026-02-25 00:49:06 +00:00
Claude	c91e02e7c5	test: add functional test suite with real fixtures, no mocking Three-tier functional test infrastructure: - CLI tests via Typer CliRunner (timmy, timmy-serve, self-tdd) - Dashboard integration tests with real TestClient, real SQLite, real coordinator (no patch/mock — Ollama offline = graceful degradation) - Docker compose container-level tests (gated by FUNCTIONAL_DOCKER=1) - End-to-end L402 payment flow with real mock-lightning backend 42 new tests (8 Docker tests skipped without FUNCTIONAL_DOCKER=1). All 849 tests pass. https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM	2026-02-25 00:46:22 +00:00
Claude	3e51434b4b	test: add 157 functional tests covering 8 low-coverage modules Analyze test coverage (75.3% → 85.4%) and add functional test suites for the major gaps identified: - test_agent_core.py: Full coverage for agent_core/interface.py (0→100%) and agent_core/ollama_adapter.py (0→100%) — data classes, factories, abstract enforcement, perceive/reason/act/recall workflow, effect logging - test_docker_runner.py: Full coverage for swarm/docker_runner.py (0→100%) — container spawn/stop/list lifecycle with mocked subprocess - test_timmy_tools.py: Tool usage tracking, persona toolkit mapping, catalog generation, graceful degradation without Agno - test_routes_tools.py: /tools page, API stats endpoint, and WebSocket /swarm/live connect/disconnect/send lifecycle (41→82%) - test_voice_tts_functional.py: VoiceTTS init, speak, volume clamping, voice listing, graceful degradation (41→94%) - test_watchdog_functional.py: _run_tests, watch loop state transitions, regression detection, KeyboardInterrupt (47→97%) - test_lnd_backend.py: LND init from params/env, grpc stub enforcement, method-level BackendNotAvailableError, settle returns False (25→61%) - test_swarm_routes_functional.py: Agent spawn/stop, task CRUD, auction, insights, UI partials, error paths (63→92%) https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM	2026-02-24 23:36:50 +00:00
Claude	65a278dbee	fix: comprehensive iPhone UI overhaul — glassmorphism, responsive layouts, theme unification - base.html: add missing {% block extra_styles %}, mobile hamburger menu with slide-out nav, interactive-widget viewport meta, -webkit-text-size-adjust - style.css: define 15+ missing CSS variables (--bg-secondary, --text-muted, --accent, --success, --danger, etc.), add missing utility classes (.grid, .stat, .agent-card, .agent-avatar, .form-group), glassmorphism card effects, iPhone breakpoints (768px, 390px), 44pt min touch targets, smooth animations - mobile.html: rewrite with proper theme variables, glass cards, touch-friendly quick actions grid, chat with proper message bubbles - swarm_live.html: replace undefined CSS vars, use mc-panel theme cards - marketplace.html: responsive agent cards that stack on iPhone, themed pricing - voice_button.html & voice_enhanced.html: proper theme integration, touch-sized buttons, themed result containers - create_task.html: mobile-friendly forms with 16px font (prevents iOS zoom) - tools.html & creative.html: themed headers, responsive column stacking - spark.html: replace all hardcoded blue (#00d4ff) colors with theme purple/orange - briefing.html: replace hardcoded bootstrap colors with theme variables Fixes: header nav overflow on iPhone (7 links in single row), missing extra_styles block silently dropping child template styles, undefined CSS variables breaking mobile/swarm/marketplace/voice pages, sub-44pt touch targets, missing -webkit-text-size-adjust, inconsistent color themes. 97 UI tests pass (91 UI-specific + 6 creative route). https://claude.ai/code/session_01JiyhGyee2zoMN4p8xWYqEe	2026-02-24 22:25:04 +00:00
Alexander Whitestone	7018a756b3	Merge pull request #22 from AlexanderWhitestone/claude/audit-timmy-dashboard-ft27r	2026-02-24 14:18:29 -05:00
Claude	96c9f1b02f	fix: address audit low-hanging fruit — docs accuracy, auction timing, stubs, tests - Docs: "No Cloud" → "No Cloud AI" (frontend uses CDN for Bootstrap/HTMX/fonts) - Docs: "600+" → "640+" tests, "20+" → "58" endpoints (actual counts) - Docs: LND described as "scaffolded" not "gRPC-ready"; remove "agents earn sats" - Fix auction timing: coordinator sleep(0) → sleep(AUCTION_DURATION_SECONDS) - agent_core: implement remember() with dedup/eviction, communicate() via swarm comms - Tests: add CLI tests for chat, think, and backend/model-size forwarding (647 passing) https://claude.ai/code/session_01SZTwAkTg6v4ybv8g9NLxqN	2026-02-24 18:29:21 +00:00
Alexander Whitestone	03ff505c4b	Merge pull request #23 from AlexanderWhitestone/security/macaroon-forgery-and-xss-1771955896	2026-02-24 13:00:52 -05:00
AlexanderWhitestone	4daf382819	security: fix L402 macaroon forgery and XSS in templates	2026-02-24 12:58:19 -05:00
Claude	832478f0d0	fix: serve_chat endpoint bug, stale docs, and license mismatch - Fix /serve/chat AttributeError: split Request and ChatRequest params so auth headers are read from HTTP request, not Pydantic body - Add regression tests for the serve_chat endpoint bug - Add agent_core and lightning to pyproject.toml wheel includes - Replace Apache 2.0 LICENSE with MIT to match pyproject.toml - Update test count from "228" to "600+" across README, docs, AGENTS.md - Add 5 missing subsystems to README table (Spark, Creative, Tools, Telegram, agent_core/lightning) - Update AGENTS.md project structure with 6 missing modules - Mark completed v2 roadmap items (personas, MCP tools) in AGENTS.md https://claude.ai/code/session_01GMiccXbo77GkV3TA69x6KS	2026-02-24 17:18:29 +00:00

73 Commits