Timmy-time-dashboard

Archived

forked from Rockachopa/Timmy-time-dashboard

Author	SHA1	Message	Date
Alexander Whitestone	07f2c1b41e	fix: wire up tick engine scheduler + add journal + systemd timer (#163)	2026-03-11 08:47:57 -04:00
Alexander Whitestone	1de97619e8	fix: restore ollama as default backend to fix broken build (#161)	2026-03-10 18:17:47 -04:00
Manus AI	755b7e7658	feat: update default backend to AirLLM and optimize for Mac M3 36GB	2026-03-10 18:04:04 -04:00
Alexander Whitestone	904a7c564e	feat: migrate to Agno native HITL tool confirmation flow (#158) Replace the homebrew regex-based tool extraction and manual dispatch (tool_executor.py) with Agno's built-in Human-In-The-Loop confirmation: - Toolkit(requires_confirmation_tools=...) marks dangerous tools - agent.run() returns RunOutput with status=paused when confirmation needed - RunRequirement.confirm()/reject() + agent.continue_run() resumes execution Dashboard and Discord vendor both use the native flow. DuckDuckGo import isolated so its absence doesn't kill all tools. Test stubs cleaned up (agno is a real dependency, only truly optional packages stubbed). 1384 tests pass in parallel (~14s). Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 21:54:04 -04:00
Alexander Whitestone	574031a55c	fix: remove invalid show_tool_calls kwarg crashing Agent init (#157) * fix: remove invalid show_tool_calls kwarg crashing Agent init (regression) show_tool_calls was removed in `f95c960` (Feb 26) because agno 2.5.x doesn't accept it, then reintroduced in `fd0ede0` (Mar 8) without runtime testing — mocked tests hid the breakage. Replace the bogus assertion with a regression guard and an allowlist test that catches unknown kwargs before they reach production. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: auto-install git hooks, add black/isort to dev deps - Add .githooks/ with portable pre-commit hook (macOS + Linux) - make install now auto-activates hooks via core.hooksPath - Add black and isort to poetry dev group (were only in CI via raw pip) - Fix black formatting on 2 files flagged by CI - Fix test_autoresearch_perplexity patching wrong module path Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 15:01:00 -04:00
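The allowlist test mentioned in #157 can be sketched with the standard-library `inspect.signature`. This is a minimal sketch under stated assumptions. `Agent` below is a hypothetical stand-in class, not agno's real `Agent`. `unknown_kwargs` is an illustrative helper name and is not code from this repository.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, Set


class Agent:
    """Hypothetical stand-in for a third-party Agent class (not agno's real signature)."""

    def __init__(self, model: str, tools: Optional[List[Any]] = None, tool_call_limit: int = 10) -> None:
        self.model = model
        self.tools = tools or []
        self.tool_call_limit = tool_call_limit


def unknown_kwargs(target: Callable[..., Any], kwargs: Dict[str, Any]) -> Set[str]:
    """Return the keyword names that target(**kwargs) would reject."""
    params = list(inspect.signature(target).parameters.values())
    # A **kwargs catch-all accepts any name, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return set()
    accepted = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return set(kwargs) - accepted


# Guard check: flag any config kwarg the constructor does not declare.
agent_kwargs = {"model": "qwen2.5:14b", "tool_call_limit": 5, "show_tool_calls": True}
print(sorted(unknown_kwargs(Agent, agent_kwargs)))  # ['show_tool_calls']
```

This check only catches problems when the constructor has no `**kwargs` catch-all. The log does not say whether agno's `Agent` has one.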
Alexander Whitestone	82fb2417e3	feat: enable SQLite WAL mode for all databases (AGI ticket #1) (#153)	2026-03-08 16:07:02 -04:00
Alexander Whitestone	ae3bb1cc21	feat: code quality audit + autoresearch integration + infra hardening (#150)	2026-03-08 12:50:44 -04:00
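Here is a minimal sketch of enabling WAL on a SQLite connection with Python's `sqlite3` module. The helper name `connect_wal` and the file name `tasks.db` are illustrative assumptions. The repository's own connection code is not shown in this log.

```python
import sqlite3
import tempfile
from pathlib import Path


def connect_wal(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite database file and switch it to write-ahead logging."""
    conn = sqlite3.connect(db_path)
    # The pragma returns the journal mode actually in effect after the request.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = connect_wal(Path(tmp) / "tasks.db")
    print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # wal
    conn.close()
```

WAL mode is stored in the database file, so it persists across connections. Re-issuing the pragma on every connect is harmless. In-memory databases cannot use WAL, which is why the sketch uses a real file.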
Alexander Whitestone	fd0ede0d51	feat: auto-escalation system + agentic loop fixes (#149) (#149) Wire up automatic error-to-task escalation and fix the agentic loop stopping after the first tool call. Auto-escalation: - Add swarm.task_queue.models with create_task() bridge to existing task queue SQLite DB - Add swarm.event_log with EventType enum, log_event(), and SQLite persistence + WebSocket broadcast - Wire capture_error() into request logging middleware so unhandled HTTP exceptions auto-create [BUG] tasks with stack traces, git context, and push notifications (5-min dedup window) Agentic loop (Round 11 Bug #1): - Wrap agent_chat() in asyncio.to_thread() to stop blocking the event loop (fixes Discord heartbeat warnings) - Enable Agno's native multi-turn tool chaining via show_tool_calls and tool_call_limit on the Agent config - Strengthen multi-step continuation prompts with explicit examples Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 03:11:14 -04:00
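The 5-minute dedup window for auto-created [BUG] tasks can be illustrated with a small in-memory deduplicator. This is a hypothetical sketch, not the behavior of the real `capture_error()`. The class name, the fingerprint (exception type plus message) and the injectable clock are all assumptions.

```python
import hashlib
import time
from typing import Callable, Dict


class ErrorDeduplicator:
    """Suppress repeat escalations of the same error inside a time window (hypothetical)."""

    def __init__(self, window_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self.clock = clock
        self._last_escalated: Dict[str, float] = {}

    @staticmethod
    def fingerprint(exc: BaseException) -> str:
        # Identical type + message collapse to one key.
        raw = f"{type(exc).__name__}:{exc}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def should_escalate(self, exc: BaseException) -> bool:
        """True if a task should be created; suppressed repeats do not extend the window."""
        key = self.fingerprint(exc)
        now = self.clock()
        last = self._last_escalated.get(key)
        if last is not None and now - last < self.window_seconds:
            return False
        self._last_escalated[key] = now
        return True


fake_now = [0.0]
dedup = ErrorDeduplicator(clock=lambda: fake_now[0])
print(dedup.should_escalate(ValueError("boom")))  # True
fake_now[0] = 120.0
print(dedup.should_escalate(ValueError("boom")))  # False (inside 5 minutes)
fake_now[0] = 301.0
print(dedup.should_escalate(ValueError("boom")))  # True (window elapsed)
```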
Alexander Whitestone	7792ae745f	feat: agentic loop for multi-step tasks + regression fixes (#148) * fix: name extraction blocklist, memory preview escaping, and gitignore cleanup - Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state words like "Sending" that were incorrectly captured as user names - Collapse whitespace in get_memory_status() preview so newlines survive JSON serialization without showing raw \n escape sequences - Broaden .gitignore from specific memory/self/user_profile.md to memory/self/ and untrack memory/self/methodology.md (runtime-edited file) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: catch Ollama connection errors in session.py + add 71 smoke tests - Wrap agent.run() in session.py with try/except so Ollama connection failures return a graceful fallback message instead of dumping raw tracebacks to Docker logs - Add tests/test_smoke.py with 71 tests covering every GET route: core pages, feature pages, JSON APIs, and a parametrized no-500 sweep — catches import errors, template failures, and schema mismatches that unit tests miss Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: agentic loop for multi-step tasks + Round 10 regression fixes Agentic loop (Parts 1-4): - Add multi-step chaining instructions to system prompt - New agentic_loop.py with plan→execute→adapt→summarize flow - Register plan_and_execute tool for background task execution - Add max_agent_steps config setting (default: 10) - Discord fix: 300s timeout, typing indicator, send error handling - 16 new unit + e2e tests for agentic loop Round 10 regressions (R1-R5, P1): - R1: Fix literal \n escape sequences in tool responses - R2: Chat timeout/error feedback in agent panel - R3: /hands infinite spinner → static empty states - R4: /self-coding infinite spinner → static stats + journal - R5: /grok/status raw JSON → HTML dashboard template - P1: VETO confirmation dialog on task cards Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: briefing route 500 in CI when agno is MagicMock stub _call_agent() returned a MagicMock instead of a string when agno is stubbed in tests, causing SQLite "Error binding parameter 4" on save. Ensure the return value is always an actual string. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: briefing route 500 in CI — graceful degradation at route level When agno is stubbed with MagicMock in CI, agent.run() returns a MagicMock instead of raising — so the exception handler never fires and a MagicMock propagates as the summary to SQLite, which can't bind it. Fix: catch at the route level and return a fallback Briefing object. This follows the project's graceful degradation pattern — the briefing page always renders, even when the backend is completely unavailable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 01:46:29 -05:00
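The `_NAME_BLOCKLIST` fix can be sketched as a regex match followed by a blocklist check. The blocklist entries and the phrase pattern below are hypothetical, because the log names `_NAME_BLOCKLIST` and `extract_user_name()` but does not show their contents.

```python
import re
from typing import Optional

# Hypothetical entries; the real _NAME_BLOCKLIST contents are not shown in the log.
_NAME_BLOCKLIST = {"sending", "typing", "loading", "thinking", "working", "going", "here", "back", "fine"}

# Hypothetical phrase pattern for self-introductions.
_NAME_PATTERN = re.compile(r"\b(?:my name is|call me|i am|i'm)\s+([A-Za-z][A-Za-z'-]*)", re.IGNORECASE)


def extract_user_name(message: str) -> Optional[str]:
    """Return a plausible user name from a chat message, or None."""
    match = _NAME_PATTERN.search(message)
    if match is None:
        return None
    candidate = match.group(1)
    # Reject gerunds and UI-state words ("I'm sending...") that the pattern also captures.
    if candidate.lower() in _NAME_BLOCKLIST:
        return None
    return candidate[0].upper() + candidate[1:]


print(extract_user_name("My name is Alexander"))      # Alexander
print(extract_user_name("I'm sending the file now"))  # None
```

A blocklist is used here rather than a blanket "-ing" suffix rule, because a suffix rule would also reject real names such as Irving.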
Alexander Whitestone	b8e0f4539f	fix: Discord memory bug — add session continuity + 6 memory system fixes (#147) Discord created a new agent per message with no conversation history, causing Timmy to lose context between messages (the "yes" bug). Now uses a singleton agent with per-channel/thread session_id, matching the dashboard's session.py pattern. Also applies _clean_response() to strip hallucinated tool-call JSON from Discord output. Additional fixes: - get_system_context() no longer clears the handoff file (was destroying session context on every agent creation) - Orchestrator uses HotMemory.read() to auto-create MEMORY.md if missing - vector_store DB_PATH anchored to __file__ instead of relative CWD - brain/schema.py: removed invalid .load dot-commands from INIT_SQL - tools_intro: fixed wrong table name 'vectors' → 'chunks' in tier3 check Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 00:20:38 -05:00
Alexander Whitestone	248af9ed03	fix: dashboard bugs and clean up build artifacts (#145) * chore: stop tracking runtime-generated self-modify reports These 65 files in data/self_modify_reports/ are auto-generated at runtime and already listed in .gitignore. Tracking them caused conflicts when pulling from main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve 8 dashboard bugs from Round 4 testing report - Fix Ollama timeout regression: request_timeout → timeout (agno API) - Add Bootstrap JS to base.html (fixes creative UI tab switching) - Send initial_state on Swarm Live WebSocket connect - Add /api/queue/status endpoint (stops 404 log spam from chat panel) - Populate agent tools from registry on /tools page - Add notification bell dropdown with /api/notifications endpoint - All 1157 tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 23:44:56 -05:00
Alexander Whitestone	e36a1dc939	fix: resolve 6 dashboard bugs and rebuild Task Queue + Work Orders (#144) (#144) Round 2+3 bug fix batch: 1. Ollama timeout: Add request_timeout=300 to prevent socket read errors on complex 30-60s prompts (production crash fix) 2. Memory API: Create missing HTMX partial templates (memory_facts.html, memory_results.html) so Save/Search buttons work 3. CALM page: Add create_tables() call so SQLAlchemy tables exist on first request (was returning HTTP 500) 4. Task Queue: Full SQLite-backed rebuild with CRUD endpoints, HTMX partials, and action buttons (approve/veto/pause/cancel/retry) 5. Work Orders: Full SQLite-backed rebuild with submit/approve/reject/execute pipeline and HTMX polling partials 6. Memory READ tool: Add memory_read function so Timmy stops calling read_file when trying to recall stored facts Also: Close GitHub issues #115, #114, #112, #110 as won't-fix. Comment on #107 confirming prune_memories() already wired to startup. Tests: 33 new tests across 4 test files, all passing. Full suite: 1155 passed, 2 pre-existing failures (hands_shell). Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 23:21:30 -05:00
Alexander Whitestone	b8164e46b0	fix: remove dead swarm imports, add memory_write tool, and auto-prune on startup (#143) - Replace dead `from swarm` imports in tools_delegation and tools_intro with working implementations sourced from _PERSONAS - Add `memory_write` tool so the agent can actually persist memories when users ask it to remember something - Enhance `memory_search` to search both vault files AND the runtime vector store for cross-channel recall (Discord/web/Telegram) - Add memory management config: memory_prune_days, memory_prune_keep_facts, memory_vault_max_mb - Auto-prune old vector store entries and warn on vault size at startup - Update tests for new delegation agent list (mace removed) Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 22:34:30 -05:00
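Startup auto-pruning driven by `memory_prune_days` and `memory_prune_keep_facts` might look like the sketch below. The `chunks` table name comes from the tools_intro fix in #147. Its `created_at` (Unix seconds) and `kind` columns are hypothetical, as is this signature for `prune_memories()`.

```python
import sqlite3
import time
from typing import Optional


def prune_memories(
    conn: sqlite3.Connection,
    prune_days: int,
    keep_facts: bool = True,
    now: Optional[float] = None,
) -> int:
    """Delete vector-store rows older than prune_days and return how many were removed.

    Assumes a hypothetical schema: chunks(created_at REAL, kind TEXT, ...).
    """
    cutoff = (time.time() if now is None else now) - prune_days * 86_400
    sql = "DELETE FROM chunks WHERE created_at < ?"
    if keep_facts:
        # Personal facts survive pruning regardless of age.
        sql += " AND kind != 'fact'"
    cur = conn.execute(sql, (cutoff,))
    conn.commit()
    return cur.rowcount
```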
Alexander Whitestone	b615595100	refactor: centralize config & harden security (#141) * feat: upgrade primary model from llama3.1:8b to qwen2.5:14b - Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning - llama3.1:8b-instruct becomes fallback - Update .env default and README quick start - Fix hardcoded model assertions in tests qwen2.5:14b provides significantly better multi-step reasoning and tool calling reliability while still running locally on modest hardware. The 8B model remains as automatic fallback. * security: centralize config, harden uploads, fix silent exceptions - Add 9 pydantic Settings fields (skip_embeddings, disable_csrf, rqlite_url, brain_source, brain_db_path, csrf_cookie_secure, chat_api_max_body_bytes, timmy_test_mode) to centralize env-var access - Migrate 8 os.environ.get() calls across 5 source files to use `from config import settings` per project convention - Add path traversal defense-in-depth to file upload endpoint - Add 1MB request body size limit to chat API - Make CSRF cookie secure flag configurable via settings - Replace 2 silent `except: pass` blocks with debug logging in session.py - Remove unused `import os` from brain/memory.py and csrf.py - Update 5 CSRF test fixtures to patch settings instead of os.environ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 18:49:37 -05:00
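The upload hardening and body-size limit in #141 can be sketched with `pathlib`. This needs Python 3.9+ for `Path.is_relative_to`. The helper names are hypothetical. The log says "1MB" without saying whether that means 10**6 or 2**20 bytes, so the constant is an assumption.

```python
from pathlib import Path

# Assumed value: "1MB" in the log could mean 10**6 or 2**20 bytes.
MAX_CHAT_BODY_BYTES = 1_000_000


def safe_upload_path(upload_dir: Path, filename: str) -> Path:
    """Map a client-supplied filename to a path guaranteed to stay inside upload_dir."""
    base = Path(upload_dir).resolve()
    # First layer: drop any directory components from the client-supplied name.
    candidate = (base / Path(filename).name).resolve()
    # Second layer (defense in depth): re-check after resolving ".." and symlinks.
    if candidate == base or not candidate.is_relative_to(base):
        raise ValueError(f"rejected upload filename: {filename!r}")
    return candidate


def check_body_size(body: bytes, limit: int = MAX_CHAT_BODY_BYTES) -> None:
    """Reject request bodies larger than the configured limit."""
    if len(body) > limit:
        raise ValueError(f"request body of {len(body)} bytes exceeds the {limit}-byte limit")
```

For example, `"../../etc/passwd"` is reduced to `passwd` inside the upload directory. A bare `".."` or an empty name is rejected outright.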
Alexander Whitestone	fb97625404	Consolidate architecture: flatten agents, kill Redis/Celery, thin routes (#133)	2026-03-05 20:27:02 -05:00
Alexander Whitestone	2b97da9e9c	Add pre-commit hook enforcing 30s test suite time limit (#132)	2026-03-05 19:45:38 -05:00
Alexander Whitestone	aff3edb06a	Audit cleanup: security fixes, code reduction, test hygiene (#131)	2026-03-05 18:56:52 -05:00
Alexander Whitestone	e8f1dea3ec	Remove unused deps from poetry build, speed test suite to ~16s (#130)	2026-03-05 18:07:59 -05:00
Alexander Whitestone	f2dacf4ee0	Integrate Celery task queue for background task processing (#129)	2026-03-05 12:09:51 -05:00
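A time-budgeted test run like the one the #132 hook enforces can be sketched with `subprocess.run(timeout=...)`. The function name is hypothetical, and the actual hook may be a shell script rather than Python.

```python
import subprocess
import sys
import time
from typing import List

TIME_LIMIT_SECONDS = 30.0


def run_tests_with_budget(cmd: List[str], limit: float = TIME_LIMIT_SECONDS) -> int:
    """Run the test command; return 1 if it exceeds the time budget, else its exit code."""
    start = time.monotonic()
    try:
        # subprocess.run kills the child process when the timeout expires.
        result = subprocess.run(cmd, timeout=limit)
    except subprocess.TimeoutExpired:
        print(f"pre-commit: test suite exceeded the {limit:g}s budget", file=sys.stderr)
        return 1
    elapsed = time.monotonic() - start
    print(f"pre-commit: tests finished in {elapsed:.1f}s", file=sys.stderr)
    return result.returncode
```

A hook script would end with `sys.exit(run_tests_with_budget([sys.executable, "-m", "pytest", "-q"]))`. A non-zero exit aborts the commit. The pytest flags shown are an assumption.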
AlexanderWhitestone	5e8766cef0	Fix build issues, implement missing routes, and stabilize e2e tests for production readiness	2026-03-04 17:15:46 -05:00
Alexander Whitestone	425e7da380	Claude/remove persona system f vgt m (#126) * Remove persona system, identity, and all Timmy references Strip the codebase to pure orchestration logic: - Delete TIMMY_IDENTITY.md and memory/self/identity.md - Gut brain/identity.py to no-op stubs (empty returns) - Remove all system prompts reinforcing Timmy's character, faith, sovereignty, sign-off ("Sir, affirmative"), and agent roster - Replace identity-laden prompts with generic local-AI-assistant prompts - Remove "You work for Timmy" from all sub-agent system prompts - Rename PersonaTools → AgentTools, PERSONA_TOOLKITS → AGENT_TOOLKITS - Replace "timmy" agent ID with "orchestrator" across routes, marketplace, tools catalog, and orchestrator class - Strip Timmy references from config comments, templates, telegram bot, chat API, and dashboard UI - Delete tests/brain/test_identity.py entirely - Fix all test assertions that checked for persona identity content 729 tests pass (2 pre-existing failures in test_calm.py unrelated). https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy * Add Taskosaur (PM + AI task execution) to docker-compose Spins up Taskosaur alongside the dashboard on `docker compose up`: - postgres:16-alpine (port 5432, Taskosaur DB) - redis:7-alpine (Bull queue backend) - taskosaur (ports 3000 API / 3001 UI) - dashboard now depends_on taskosaur healthy - TASKOSAUR_API_URL injected into dashboard environment Dashboard can reach Taskosaur at http://taskosaur:3000/api on the internal network. Frontend UI accessible at http://localhost:3001. https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-03-04 12:00:49 -05:00
Alexander Whitestone	584eeb679e	Operation Darling Purge: slim to wealth core (-33,783 lines) (#121)	2026-03-02 13:17:38 -05:00
Alexander Whitestone	62ef1120a4	Memory Unification + Canonical Identity: -11,074 lines of homebrew (#119)	2026-03-02 09:58:07 -05:00
Alexander Whitestone	6eefcabc97	feat: Phase 1 autonomy upgrades — introspection, heartbeat, source tagging, Discord auto-detect (#101) UC-01: Live System Introspection Tool - Add get_task_queue_status(), get_agent_roster(), get_live_system_status() to timmy/tools_intro with graceful degradation - Enhanced get_memory_status() with line counts, section headers, vault directory listing, semantic memory row count, self-coding journal stats - Register system_status MCP tool (creative/tools/system_status.py) - Add system_status to Timmy's tool list + Hard Rule #7 UC-02: Fix Offline Status Bug - Add registry.heartbeat() calls in task_processor run_loop() and process_single_task() so health endpoint reflects actual agent status - health.py now consults swarm registry instead of Ollama connectivity UC-03: Message Source Tagging - Add source field to Message dataclass (default "browser") - Tag all message_log.append() calls: browser, api, system - Include source in /api/chat/history response UC-04: Discord Token Auto-Detection & Docker Fix - Add _discord_token_watcher() background coroutine that polls every 30s for DISCORD_TOKEN in env vars, .env file, or state file - Add --extras discord to all three Dockerfiles (main, dashboard, test) All 26 Phase 1 tests pass in Docker (make test-docker). Full suite: 1889 passed, 77 skipped, 0 failed. Co-authored-by: Alexander Payne <apayne@MM.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 22:49:24 -05:00
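The `_discord_token_watcher()` behavior in UC-04 (poll every 30s for a token in env vars, a .env file, or a state file) can be sketched as an asyncio loop. This is an illustration under assumptions: the lookup order, the plain-text state-file format, the `on_token` callback and the function names are all hypothetical.

```python
import asyncio
import os
from pathlib import Path
from typing import Awaitable, Callable, Optional


def find_discord_token(env_file: Path, state_file: Path) -> Optional[str]:
    """Check the environment, then a .env file, then a state file (assumed order)."""
    token = os.environ.get("DISCORD_TOKEN")
    if token:
        return token
    if env_file.is_file():
        for line in env_file.read_text().splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip() == "DISCORD_TOKEN" and value.strip():
                return value.strip().strip("'\"")
    if state_file.is_file():
        # Assumed format: the state file holds only the raw token.
        value = state_file.read_text().strip()
        if value:
            return value
    return None


async def discord_token_watcher(
    env_file: Path,
    state_file: Path,
    on_token: Callable[[str], Awaitable[None]],
    interval: float = 30.0,
) -> None:
    """Poll until a token appears, hand it to on_token once, then stop."""
    while True:
        token = find_discord_token(env_file, state_file)
        if token:
            await on_token(token)
            return
        await asyncio.sleep(interval)
```

Running it as a background task (`asyncio.create_task(...)`) lets the dashboard start without Discord and connect the bot later, once a token is supplied.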
Alexander Whitestone	89cfe1be0d	fix: Docker-first test suite, UX improvements, and bug fixes (#100) Dashboard UX: - Restructure nav from 22 flat links to 6 core + MORE dropdown - Add mobile nav section labels (Core, Intelligence, Agents, System, Commerce) - Defer marked.js and dompurify.js loading, consolidate CDN to jsdelivr - Optimize font weights (drop unused 300/500), bump style.css cache buster - Remove duplicate HTMX load triggers from sidebar and health panels Bug fixes: - Fix Timmy showing OFFLINE by registering after swarm recovery sweep - Fix ThinkingEngine await bug with asyncio.run_coroutine_threadsafe - Fix chat auto-scroll by calling scrollChat() after history partial loads - Add missing /voice/button page and /voice/command endpoint - Fix Grok api_key="" treated as falsy falling through to env key - Fix self_modify PROJECT_ROOT using settings.repo_root instead of __file__ Docker test infrastructure: - Bind-mount hands/, docker/, Dockerfiles, and compose files into test container - Add fontconfig + fonts-dejavu-core for creative/assembler TextClip tests - Initialize minimal git repo in Dockerfile.test for GitSafety compatibility - Fix introspection and path resolution tests for Docker /app context All 1863 tests pass in Docker (0 failures, 77 skipped). Co-authored-by: Alexander Payne <apayne@MM.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 22:14:37 -05:00
Alexander Whitestone	6e67c3b421	feat: add bug report ingestion pipeline with Forge dispatch (#99) Replace the stub `handle_bug_report` handler with a real implementation that logs a decision trail and dispatches code_fix tasks to Forge for automated fixing. Add `POST /api/bugs/submit` endpoint and `timmy ingest-report` CLI command so AI test runners (Comet) can submit structured bug reports without manual copy-paste. - POST /api/bugs/submit: accepts JSON reports, creates bug_report tasks - timmy ingest-report: CLI for file/stdin JSON ingestion with --dry-run - handle_bug_report: logs decision trail to event_log, dispatches code_fix task to Forge with parent_task_id linking back to the bug - 18 TDD tests covering endpoint, handler, and CLI Co-authored-by: Alexander Payne <apayne@MM.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:15:53 -05:00
Alexander Whitestone	a5fd680428	feat: microservices refactoring with TDD and Docker optimization (#88) ## Summary Complete refactoring of Timmy Time from monolithic architecture to microservices using Test-Driven Development (TDD) and optimized Docker builds. ## Changes ### Core Improvements - Optimized dashboard startup: moved blocking tasks to async background processes - Fixed model fallback logic in agent configuration - Enhanced test fixtures with comprehensive conftest.py ### Microservices Architecture - Created separate Dockerfiles for dashboard, Ollama, and agent services - Implemented docker-compose.microservices.yml for service orchestration - Added health checks and non-root user execution for security - Multi-stage Docker builds for lean, fast images ### Testing - Added E2E tests for dashboard responsiveness - Added E2E tests for Ollama integration - Added E2E tests for microservices architecture validation - All 36 tests passing, 8 skipped (environment-specific) ### Documentation - Created comprehensive final report - Generated issue resolution plan - Added interview transcript demonstrating core agent functionality ### New Modules - skill_absorption.py: Dynamic skill loading and integration system for Timmy ## Test Results ✅ 36 passed, 8 skipped, 6 warnings ✅ All microservices tests passing ✅ Dashboard responsiveness verified ✅ Ollama integration validated ## Files Added/Modified - docker/: Multi-stage Dockerfiles for all services - tests/e2e/: Comprehensive E2E test suite - src/timmy/skill_absorption.py: Skill absorption system - src/dashboard/app.py: Optimized startup logic - tests/conftest.py: Enhanced test fixtures - docker-compose.microservices.yml: Service orchestration ## Breaking Changes None - all changes are backward compatible ## Next Steps - Integrate skill absorption system into agent workflow - Test with microservices-tdd-refactor skill - Deploy to production with docker-compose orchestration	2026-02-28 11:07:19 -05:00
Alexander Whitestone	ab014dc5c6	feat: add `timmy interview` command for structured agent initialization (#87)	2026-02-28 09:35:44 -05:00
Alexander Whitestone	849b5b1a8d	feat: add default thinking thread — Timmy always ponders (#75)	2026-02-27 01:00:11 -05:00
Alexander Whitestone	a975a845c5	feat: Timmy system introspection, delegation, and session logging (#74) * test: remove hardcoded sleeps, add pytest-timeout - Replace fixed time.sleep() calls with intelligent polling or WebDriverWait - Add pytest-timeout dependency and --timeout=30 to prevent hangs - Fixes test flakiness and improves test suite speed * feat: add Aider AI tool to Forge's toolkit - Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist - Register tool in Forge's code toolkit - Add functional tests for the Aider tool * config: add opencode.json with local Ollama provider for sovereign AI * feat: Timmy fixes and improvements ## Bug Fixes - Fix read_file path resolution: add ~ expansion, proper relative path handling - Add repo_root to config.py with auto-detection from .git location - Fix hardcoded llama3.2 - now dynamic from settings.ollama_model ## Timmy's Requests - Add communication protocol to AGENTS.md (read context first, explain changes) - Create DECISIONS.md for architectural decision documentation - Add reasoning guidance to system prompts (step-by-step, state uncertainty) - Update tests to reflect correct model name (llama3.1:8b-instruct) ## Testing - All 177 dashboard tests pass - All 32 prompt/tool tests pass * feat: Timmy system introspection, delegation, and session logging ## System Introspection (Sovereign Self-Knowledge) - Add get_system_info() tool - Timmy can now query his runtime environment - Add check_ollama_health() - verify Ollama status - Add get_memory_status() - check memory tier status - True introspection vs hardcoded prompts ## Path Resolution Fix - Fix all toolkits to use settings.repo_root consistently - Now uses Path(settings.repo_root) instead of Path.cwd() ## Inter-Agent Delegation - Add delegate_task() tool - Timmy can dispatch to Seer, Forge, Echo, etc. - Add list_swarm_agents() - query available agents ## Session Logging - Add SessionLogger for comprehensive interaction logging - Records messages, tool calls, errors, decisions - Writes to /logs/session_{date}.jsonl ## Tests - Add tests for introspection tools - Add tests for delegation tools - Add tests for session logging - Add tests for path resolution - All 18 new tests pass - All 177 dashboard tests pass --------- Co-authored-by: Alexander Payne <apayne@MM.local>	2026-02-27 00:11:53 -05:00
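The SessionLogger described in #74 writes one JSON object per interaction to `session_{date}.jsonl`. Below is a minimal sketch under assumptions. The `record()` method, the field names and the use of UTC dates are hypothetical; only the file naming and the kinds of events recorded come from the log.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


class SessionLogger:
    """Append-only JSONL session log, one file per UTC day (hypothetical sketch)."""

    def __init__(self, log_dir: Path) -> None:
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def record(self, kind: str, **fields: Any) -> Path:
        """Write one event (message, tool_call, error, decision, ...) as a JSON line."""
        now = datetime.now(timezone.utc)
        path = self.log_dir / f"session_{now:%Y-%m-%d}.jsonl"
        entry = {"ts": now.isoformat(), "type": kind, **fields}
        with path.open("a", encoding="utf-8") as fh:
            # default=str keeps non-JSON values (paths, exceptions) from crashing the logger.
            fh.write(json.dumps(entry, default=str) + "\n")
        return path
```

JSON Lines keeps each event independent. A crash mid-session loses at most the line being written, and the file can be read back with one `json.loads` per line.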
Alexander Whitestone	18ed6232f9	feat: Timmy fixes and improvements (#72) * test: remove hardcoded sleeps, add pytest-timeout - Replace fixed time.sleep() calls with intelligent polling or WebDriverWait - Add pytest-timeout dependency and --timeout=30 to prevent hangs - Fixes test flakiness and improves test suite speed * feat: add Aider AI tool to Forge's toolkit - Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist - Register tool in Forge's code toolkit - Add functional tests for the Aider tool * config: add opencode.json with local Ollama provider for sovereign AI * feat: Timmy fixes and improvements ## Bug Fixes - Fix read_file path resolution: add ~ expansion, proper relative path handling - Add repo_root to config.py with auto-detection from .git location - Fix hardcoded llama3.2 - now dynamic from settings.ollama_model ## Timmy's Requests - Add communication protocol to AGENTS.md (read context first, explain changes) - Create DECISIONS.md for architectural decision documentation - Add reasoning guidance to system prompts (step-by-step, state uncertainty) - Update tests to reflect correct model name (llama3.1:8b-instruct) ## Testing - All 177 dashboard tests pass - All 32 prompt/tool tests pass --------- Co-authored-by: Alexander Payne <apayne@MM.local>	2026-02-26 23:39:13 -05:00
Alexander Whitestone	a5765c33b6	feat: add Aider AI tool to Forge's toolkit (#70) * test: remove hardcoded sleeps, add pytest-timeout - Replace fixed time.sleep() calls with intelligent polling or WebDriverWait - Add pytest-timeout dependency and --timeout=30 to prevent hangs - Fixes test flakiness and improves test suite speed * feat: add Aider AI tool to Forge's toolkit - Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist - Register tool in Forge's code toolkit - Add functional tests for the Aider tool --------- Co-authored-by: Alexander Payne <apayne@MM.local>	2026-02-26 23:17:19 -05:00
Alexander Payne	72a58f1f49	feat: Multi-modal support with automatic model fallback - Add MultiModalManager with capability detection for vision/audio/tools - Define fallback chains: vision (llama3.2:3b -> llava:7b -> moondream) tools (llama3.1:8b-instruct -> qwen2.5:7b) - Update CascadeRouter to detect content type and select appropriate models - Add model pulling with automatic fallback in agent creation - Update providers.yaml with multi-modal model configurations - Update OllamaAdapter to use model resolution with vision support Tests: All 96 infrastructure tests pass	2026-02-26 22:29:44 -05:00
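The fallback chains listed in this commit can be resolved by walking each chain in order and taking the first installed model. The chain contents are copied from the commit message. The function name is hypothetical, and the sketch assumes installed model names (for example from Ollama's local model list) match the chain entries exactly.

```python
from typing import Dict, Iterable, List, Optional

# Chains from the commit message; earlier entries are preferred.
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "vision": ["llama3.2:3b", "llava:7b", "moondream"],
    "tools": ["llama3.1:8b-instruct", "qwen2.5:7b"],
}


def resolve_model(capability: str, available: Iterable[str]) -> Optional[str]:
    """Return the first model in the capability's chain that is installed, else None."""
    installed = set(available)
    for model in FALLBACK_CHAINS.get(capability, []):
        if model in installed:
            return model
    return None


print(resolve_model("vision", ["llava:7b", "moondream"]))  # llava:7b
```

Returning `None` rather than raising leaves the caller free to pull a model or fall back to a text-only path. Per the commit, agent creation adds model pulling with automatic fallback.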
Alexander Payne	a85661274c	Merge main into feature/model-upgrade-llama3.1 with conflict resolution	2026-02-26 22:19:44 -05:00
Claude	17059bc0ea	feat: add Grok (xAI) as opt-in premium backend with monetization - Add GrokBackend class in src/timmy/backends.py with full sync/async support, health checks, usage stats, and cost estimation in sats - Add consult_grok tool to Timmy's toolkit for proactive Grok queries - Extend cascade router with Grok provider type for failover chain - Add Grok Mode toggle card to Mission Control dashboard (HTMX live) - Add "Ask Grok" button on chat input for direct Grok queries - Add /grok/* routes: status, toggle, chat, stats endpoints - Integrate Lightning invoice generation for Grok usage monetization - Add GROK_ENABLED, XAI_API_KEY, GROK_DEFAULT_MODEL, GROK_MAX_SATS_PER_QUERY, GROK_FREE config settings via pydantic-settings - Update .env.example and docker-compose.yml with Grok env vars - Add 21 tests covering backend, tools, and route endpoints (all green) Local-first ethos preserved: Grok is premium augmentation only, disabled by default, and Lightning-payable when enabled. https://claude.ai/code/session_01FygwN8wS8J6WGZ8FPb7XGV	2026-02-27 01:12:51 +00:00
Claude	9f4c809f70	refactor: Phase 2b — consolidate 28 modules into 14 packages Complete the module consolidation planned in REFACTORING_PLAN.md: Modules merged: - work_orders/ + task_queue/ → swarm/ (subpackages) - self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages) - tools/ → creative/tools/ - chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new) - ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new) - agents/ + agent_core/ + memory/ → timmy/ (subpackages) Updated across codebase: - 66 source files: import statements rewritten - 13 test files: import + patch() target strings rewritten - pyproject.toml: wheel includes (28→14), entry points updated - CLAUDE.md: singleton paths, module map, entry points table - AGENTS.md: file convention updates - REFACTORING_PLAN.md: execution status, success metrics Extras: - Module-level CLAUDE.md added to 6 key packages (Phase 6.2) - Zero test regressions: 1462 tests passing https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk	2026-02-26 22:07:41 +00:00
Claude	6045077144	refactor: Phase 1/4/6 — doc cleanup, config fix, token optimization Phase 1 — Documentation cleanup: - Slim README 303→93 lines (remove duplicated architecture, config tables) - Slim CLAUDE.md 267→80 lines (remove project layout, env vars, CI section) - Slim AGENTS.md 342→72 lines (remove duplicated patterns, running locally) - Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (session docs) - Archive PLAN.md, IMPLEMENTATION_SUMMARY.md to docs/ - Move QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md to docs/ - Move apply_security_fixes.py, activate_self_tdd.sh to scripts/ Phase 4 — Config & build cleanup: - Fix wheel build: add 11 missing modules to pyproject.toml include list - Add pytest markers (unit, integration, dashboard, swarm, slow) - Add data/self_modify_reports/ and .handoff/ to .gitignore Phase 6 — Token optimization: - Add docstrings to 15 __init__.py files that were empty - Create __init__.py for events/, memory/, upgrades/ modules Root markdown: 87KB → ~18KB (79% reduction) https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN	2026-02-26 21:03:15 +00:00
Alexander Payne	431cf3e020	merge: resolve conflicts with main, keep comprehensive chat pipeline Resolved merge conflicts in agents.py and test_task_queue.py: - Keep full chat-to-task pipeline (agent/priority extraction, question filtering, context injection) over simpler main version - Incorporate test_briefing_task_queue_summary from main - All 64 task queue tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 11:47:34 -05:00
Alexander Payne	3ca8e9f2d6	fix: chat evaluation bugs — task pipeline, prompt grounding, markdown rendering Addresses 14 bugs from 3 rounds of deep chat evaluation: - Add chat-to-task pipeline in agents.py with regex-based intent detection, agent extraction, priority extraction, and title cleaning - Filter meta-questions ("how do I create a task?") from task creation - Inject real-time date/time context into every chat message - Inject live queue state when user asks about tasks - Ground system prompts with agent roster, honesty guardrails, self-knowledge, math delegation template, anti-filler rules, values-conflict guidance - Add CSS for markdown code blocks, inline code, lists, blockquotes in chat - Add highlight.js CDN for syntax highlighting in chat responses - Reduce small-model memory context budget (4000→2000) for expanded prompt - Add 27 comprehensive tests covering the full chat-to-task pipeline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 11:42:42 -05:00
Alexander Payne	bc9089ef96	feat: wire chat-to-task-queue and briefing integration - Chat messages like "add X to the queue" or "create a task" are intercepted and create a task_queue entry with pending_approval status instead of going through to the LLM - Briefing engine now gathers task queue stats (pending, running, completed, failed) and includes them in the morning briefing prompt - 7 new tests covering detection patterns, chat integration, and briefing summary Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 10:33:14 -05:00
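The chat-to-task interception in this commit, together with the meta-question filter added in `3ca8e9f2d6`, can be illustrated with two regexes. The patterns below are hypothetical because the log does not show the real ones in agents.py. Only the example phrasings and the `pending_approval` status come from the commit messages.

```python
import re
from typing import Optional

# Hypothetical patterns; the real agents.py regexes are not shown in the log.
_TASK_PATTERNS = [
    re.compile(r"\badd (?P<title>.+?) to the (?:task )?queue\b", re.IGNORECASE),
    re.compile(r"\bcreate a task(?: to| for|:)?\s*(?P<title>.*)", re.IGNORECASE),
]
# Questions about the feature itself should reach the LLM instead of creating tasks.
_META_QUESTION = re.compile(r"^\s*(?:how|where) (?:do|can|would|should) (?:i|we)\b", re.IGNORECASE)


def detect_task_intent(message: str) -> Optional[dict]:
    """Return a task draft if the message asks to queue work, else None."""
    if _META_QUESTION.match(message):
        return None
    for pattern in _TASK_PATTERNS:
        match = pattern.search(message)
        if match:
            title = match.group("title").strip(" .!:") or message.strip()
            return {"title": title, "status": "pending_approval"}
    return None
```

Unmatched messages return `None` and pass through to the LLM as normal chat.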
Alexander Payne	6e6b4355bb	fix: calculator tool, markdown rendering, prompt guardrails, briefing notification - Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions get exact answers instead of LLM hallucinations - Update system prompts (lite + full) to instruct Timmy to always use the calculator and never attempt multi-digit math in his head - Add self-contradiction guard to both prompts ("commit to your facts") - Render Timmy's chat responses as markdown via marked.js + DOMPurify instead of raw escaped text - Suppress empty briefing notification on startup when there are 0 pending approval items - Add calculator to session response sanitizer regex - 18 new calculator tests, 2 updated briefing notification tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 09:35:59 -05:00
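A sandboxed calculator of the kind this commit adds can be built on Python's `ast` module: parse the expression and evaluate only whitelisted numeric nodes, never calling `eval()`. The function names and the exponent bounds are assumptions. The log does not say how the real tool is sandboxed.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; any other node in the parse tree is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST) -> Number:
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    # bool is a subclass of int, so compare exact types to reject True/False.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = _evaluate(node.left), _evaluate(node.right)
        # Bound exponentiation so inputs like 9 ** 9 ** 9 cannot exhaust CPU or memory.
        if isinstance(node.op, ast.Pow) and (abs(left) > 1_000_000 or abs(right) > 100):
            raise ValueError("exponentiation operands too large")
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def calculate(expression: str) -> Number:
    """Evaluate a plain arithmetic expression without eval()."""
    return _evaluate(ast.parse(expression, mode="eval"))


print(calculate("1234 * 5678"))  # 7006652
```

Function calls, names and attribute access all fall through to the final `ValueError`, so input such as `__import__('os')` is rejected rather than executed.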
Alexander Payne	f95c9606f1	fix: Timmy startup crashes and clean initialization - Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__ - Guard memory_search against top_k=None from model, return formatted string - Skip Telegram/Discord startup silently when no token configured - Replace placeholder MEMORY.md with proper structured hot memory document Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 09:11:48 -05:00
Alexander Whitestone	dccd13df8e	Merge pull request #46 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai feat: Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed	2026-02-26 08:33:32 -05:00
Alexander Payne	d8d976aa60	feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed This commit implements six major features: 1. Event Log System (src/swarm/event_log.py) - SQLite-based audit trail for all swarm events - Task lifecycle tracking (created, assigned, completed, failed) - Agent lifecycle tracking (joined, left, status changes) - Integrated with coordinator for automatic logging - Dashboard page at /swarm/events 2. Lightning Ledger (src/lightning/ledger.py) - Transaction tracking for Lightning Network payments - Balance calculations (incoming, outgoing, net, available) - Integrated with payment_handler for automatic logging - Dashboard page at /lightning/ledger 3. Semantic Memory / Vector Store (src/memory/vector_store.py) - Embedding-based similarity search for Echo agent - Fallback to keyword matching if sentence-transformers unavailable - Personal facts storage and retrieval - Dashboard page at /memory 4. Cascade Router Integration (src/timmy/cascade_adapter.py) - Automatic LLM failover between providers (Ollama → AirLLM → API) - Circuit breaker pattern for failing providers - Metrics tracking per provider (latency, error rates) - Dashboard status page at /router/status 5. Self-Upgrade Approval Queue (src/upgrades/) - State machine for self-modifications: proposed → approved/rejected → applied/failed - Human approval required before applying changes - Git integration for branch management - Dashboard queue at /self-modify/queue 6. Real-Time Activity Feed (src/events/broadcaster.py) - WebSocket-based live activity streaming - Bridges event_log to dashboard clients - Activity panel on /swarm/live Tests: - 101 unit tests passing - 4 new E2E test files for Selenium testing - Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed Documentation: - 6 ADRs (017-022) documenting architecture decisions - Implementation summary in docs/IMPLEMENTATION_SUMMARY.md - Architecture diagram in docs/architecture-v2.md	2026-02-26 08:01:01 -05:00
Alexander Payne	26e1691099	Fix Timmy coherence: persistent session, model-aware tools, response sanitization Timmy was exhibiting severe incoherence (no memory between messages, tool call leakage, chain-of-thought narration, random tool invocations) due to creating a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line system prompt with complex tool-calling instructions it couldn't follow. Key changes: - Add session.py singleton with stable session_id for conversation continuity - Add _model_supports_tools() to strip tools from small models (< 7B) - Add two-tier prompts: lite (12 lines) for small models, full for capable ones - Add response sanitizer to strip leaked JSON tool calls and CoT narration - Set show_tool_calls=False to prevent raw tool JSON in output - Wire ConversationManager for user name extraction - Deprecate orphaned memory_layers.py (unused 4-layer system) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 19:18:08 -05:00
Alexander Payne	16b65b28e8	Add Tier 3: Semantic Memory (vector search) Completes the three-tier memory architecture: ## Tier 3 — Semantic Search - Vector embeddings over all vault files - Similarity-based retrieval - memory_search tool for agents - Fallback to hash-based embeddings if transformers unavailable ## Implementation - src/timmy/semantic_memory.py — Core semantic memory - Chunking strategy: paragraphs → sentences - SQLite storage for vectors - cosine_similarity for ranking ## Integration - Added memory_search to create_full_toolkit() - Updated prompts with memory_search examples - Tool triggers: past conversations, reminders ## Features - Automatic vault indexing - Source file tracking (re-indexes on change) - Similarity scoring - Context retrieval for queries ## Usage All 973 tests pass.	2026-02-25 18:25:20 -05:00
Alexander Payne	7838df19b0	Implement three-tier memory architecture (Hot/Vault/Handoff) This commit replaces the previous memory_layers.py with a proper three-tier memory system as specified by the user: ## Tier 1 — Hot Memory (MEMORY.md) - Single flat file always loaded into system context - Contains: current status, standing rules, agent roster, key decisions - ~300 lines max, pruned monthly - Managed by HotMemory class ## Tier 2 — Structured Vault (memory/) - Directory with three namespaces: • self/ — identity.md, user_profile.md, methodology.md • notes/ — session logs, AARs, research • aar/ — post-task retrospectives - Markdown format, Obsidian-compatible - Append-only, date-stamped - Managed by VaultMemory class ## Handoff Protocol - last-session-handoff.md written at session end - Contains: summary, key decisions, open items, next steps - Auto-loaded at next session start - Maintains continuity across resets ## Implementation ### New Files: - src/timmy/memory_system.py — Core memory system - MEMORY.md — Hot memory template - memory/self/*.md — Identity, user profile, methodology ### Modified: - src/timmy/agent.py — Integrated with memory system - create_timmy() injects memory context - TimmyWithMemory class with automatic fact extraction - tests/test_agent.py — Updated for memory context ## Key Principles - Hot memory = small and curated - Vault = append-only, never delete - Handoffs = continuity mechanism - Flat files = human-readable, portable ## Usage All 973 tests pass.	2026-02-25 18:17:43 -05:00
Alexander Payne	625806daf5	Fine-tune Timmy's conversational AI with memory layers ## Enhanced System Prompt - Detailed tool usage guidelines with explicit examples - Clear DO and DON'T examples for tool selection - Memory system documentation - Conversation flow guidelines - Context awareness instructions ## Memory Layer System (NEW) Implemented 3-layer memory architecture: 1. WORKING MEMORY (src/timmy/memory_layers.py) - Immediate context (last 20 messages) - Topic tracking - Tool call tracking - Fast, ephemeral 2. SHORT-TERM MEMORY (Agno SQLite) - Recent conversations (100) - Persists across restarts - Managed by Agno Agent 3. LONG-TERM MEMORY (src/timmy/memory_layers.py) - Facts about user (name, preferences) - SQLite storage in data/memory/ - Auto-extraction from conversations - User profile generation ## Memory Manager (NEW) - Central coordinator for all memory layers - Context injection into prompts - Fact extraction and storage - Session management ## TimmyWithMemory Class (NEW) - Wrapper around Agno Agent with explicit memory - Auto-injects user context from LTM - Tracks exchanges across all layers - Simple chat() interface ## Agent Configuration - Increased num_history_runs: 10 -> 20 - Better conversational context retention ## Tests - All 973 tests pass - Fixed test expectations for new config - Fixed module path in test_scary_paths.py ## Files Added/Modified - src/timmy/prompts.py - Enhanced with memory and tool guidance - src/timmy/agent.py - Added TimmyWithMemory class - src/timmy/memory_layers.py - NEW memory system - src/timmy/conversation.py - NEW conversation manager - tests/ - Updated for new config	2026-02-25 18:07:44 -05:00
Alexander Payne	4961c610f2	Security, privacy, and agent intelligence hardening ## Security (Workset A) - XSS: Verified templates use safe DOM methods (textContent, createElement) - Secrets: Fail-fast in production mode when L402 secrets not set - Environment mode: Add TIMMY_ENV (development|production) validation ## Privacy (Workset C) - Add telemetry_enabled config (default: False for sovereign AI) - Pass telemetry setting to Agno Agent - Update .env.example with TELEMETRY_ENABLED and TIMMY_ENV docs ## Agent Intelligence (Workset D) - Enhanced TIMMY_SYSTEM_PROMPT with: - Tool usage guidelines (when to use, when not to) - Memory awareness documentation - Operating mode documentation - Help reduce unnecessary tool calls for simple queries All 895 tests pass. Telemetry disabled by default aligns with sovereign AI vision.	2026-02-25 15:32:19 -05:00
Alexander Payne	1bc2cdcb2e	Fix Agno Toolkit API compatibility issues - Change Toolkit.add_tool() to Toolkit.register() (method was renamed in Agno) - Fix PythonTools method: python -> run_python_code - Fix FileTools method: write_file -> save_file - Fix FileTools base_dir parameter: str -> Path object - Fix Agent tools parameter: pass Toolkit wrapped in list These fixes resolve critical startup errors that prevented Timmy agent from initializing: - AttributeError: 'Toolkit' object has no attribute 'add_tool' - AttributeError: 'PythonTools' object has no attribute 'python' - TypeError: 'Toolkit' object is not iterable All 895 tests pass after these changes. Quality review: Agent now fully functional with working inference, memory, and self-awareness capabilities.	2026-02-25 14:11:13 -05:00
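Commits 3ca8e9f2d6 and bc9089ef96 describe regex-based intent detection that turns chat messages into task_queue entries with pending_approval status, while filtering out meta-questions such as "how do I create a task?". The log does not include the actual patterns, so the following is only a minimal sketch under assumptions: detect_task, the regexes, and the "normal"/"high" priority values are hypothetical and are not the code in agents.py.

```python
import re
from typing import Dict, Optional

# Hypothetical request patterns; the real agents.py patterns are not shown in the log.
_TASK_PATTERNS = [
    re.compile(r"\b(?:add|put)\s+(?P<title>.+?)\s+(?:to|on|in)\s+the\s+(?:task\s+)?queue\b", re.IGNORECASE),
    re.compile(r"\bcreate\s+a\s+task\s*(?:to|for|:)?\s*(?P<title>.+)", re.IGNORECASE),
]
# Questions about the feature itself ("How do I create a task?") must not create tasks.
_META_QUESTION = re.compile(r"^\s*(?:how\s+(?:do|can|would)\s+i|what\s+is)\b.*\?\s*$", re.IGNORECASE)
# Hypothetical priority markers, also stripped from the title during cleaning.
_URGENT = re.compile(r"\b(?:urgent|asap|high\s+priority)\b", re.IGNORECASE)


def detect_task(message: str) -> Optional[Dict[str, str]]:
    """Return a queue entry if the chat message requests a task, else None."""
    if _META_QUESTION.match(message):
        return None
    for pattern in _TASK_PATTERNS:
        match = pattern.search(message)
        if match:
            raw_title = match.group("title").strip(" .!?")
            # Title cleaning: drop priority words and collapse whitespace.
            title = " ".join(_URGENT.sub("", raw_title).split())
            if not title:
                return None
            return {
                "title": title,
                "priority": "high" if _URGENT.search(message) else "normal",
                "status": "pending_approval",
            }
    return None
```

With these assumed patterns, "add urgent fix login to the queue" yields a high-priority entry titled "fix login", while "how do I create a task?" falls through to the LLM.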
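Commit 6e6b4355bb adds a sandboxed calculator tool so arithmetic gets exact answers. One common way to sandbox arithmetic in Python is to parse the expression with the standard ast module and evaluate only whitelisted operator nodes instead of calling eval(). The sketch below assumes that approach; calculate, the operator whitelist, and the exponent cap are hypothetical and may differ from the actual tool.

```python
import ast
import operator

# Whitelisted operators; any other AST node (calls, names, attributes) is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_MAX_EXPONENT = 1000  # hypothetical cap so "9 ** 9 ** 9" cannot hang the process


def _evaluate(node: ast.AST):
    """Recursively evaluate a whitelisted arithmetic AST node."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = _evaluate(node.left), _evaluate(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > _MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"unsupported element: {type(node).__name__}")


def calculate(expression: str) -> str:
    """Hypothetical calculator tool: exact arithmetic without eval()."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError) as exc:
        return f"Error: {exc}"
```

For example, calculate("123456 * 789") returns "97406784", and an input such as "__import__('os')" is rejected because a function call is not a whitelisted node.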
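Commits 16b65b28e8 and f95c9606f1 mention cosine-similarity ranking, a hash-based embedding fallback when transformers are unavailable, and a guard against top_k=None from the model. A minimal sketch of that ranking step, assuming a hashing-trick bag-of-words embedding; the real hashing scheme, chunking, and SQLite storage are not shown in the log. hash_embedding, memory_search, and DEFAULT_TOP_K are illustrative names, and this version returns scored pairs rather than a formatted string.

```python
import hashlib
import math
import re
from typing import List, Optional, Tuple

DEFAULT_TOP_K = 3  # hypothetical default


def hash_embedding(text: str, dim: int = 256) -> List[float]:
    """Hashing-trick bag-of-words vector, L2-normalised (assumed fallback scheme)."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def memory_search(query: str, chunks: List[str],
                  top_k: Optional[int] = DEFAULT_TOP_K) -> List[Tuple[float, str]]:
    """Rank vault chunks by similarity to the query, highest first."""
    if top_k is None:  # a model may pass top_k=None; fall back to the default
        top_k = DEFAULT_TOP_K
    query_vec = hash_embedding(query)
    scored = [(cosine_similarity(query_vec, hash_embedding(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Hashed vectors only capture shared words, not meaning, which is why a fallback like this would be used only when a real embedding model cannot be loaded.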
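Commit d8d976aa60 describes automatic LLM failover between providers (Ollama, then AirLLM, then an API) with a circuit breaker for failing providers. The sketch below shows the general pattern under assumptions: CircuitBreaker, CascadeRouter, the failure threshold of 3, and the 30-second cool-down are hypothetical and not taken from src/timmy/cascade_adapter.py. Providers are plain callables so the example runs without any LLM backend, and per-provider latency metrics are omitted.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CircuitBreaker:
    """Per-provider breaker: opens after repeated failures, retries after a cool-down."""
    failure_threshold: int = 3
    reset_timeout: float = 30.0  # seconds
    failures: int = 0
    opened_at: Optional[float] = None

    def allows(self, now: float) -> bool:
        # Closed breaker, or open long enough to permit a half-open trial call.
        return self.opened_at is None or now - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # open (or re-open after a failed trial)


class CascadeRouter:
    """Try providers in priority order, skipping any whose breaker is open."""

    def __init__(self, providers: Dict[str, Callable[[str], str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.providers = providers  # dict insertion order is used as priority
        self.breakers = {name: CircuitBreaker() for name in providers}
        self.clock = clock

    def complete(self, prompt: str) -> str:
        errors: List[str] = []
        for name, call in self.providers.items():
            breaker = self.breakers[name]
            now = self.clock()
            if not breaker.allows(now):
                errors.append(f"{name}: circuit open")
                continue
            try:
                result = call(prompt)
            except Exception as exc:  # any provider error counts as a failure
                breaker.record_failure(now)
                errors.append(f"{name}: {exc}")
                continue
            breaker.record_success()
            return result
        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Injecting the clock keeps the breaker testable: once a provider's breaker opens, later requests skip it entirely until the cool-down elapses, instead of paying its timeout on every message.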
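Commit d8d976aa60 also describes the self-upgrade approval queue as a state machine: proposed, then approved or rejected, then applied or failed. A minimal sketch of that transition table, assuming an Enum-based design; UpgradeState, Upgrade, and ALLOWED_TRANSITIONS are hypothetical names, and the git branch handling is left out.

```python
from enum import Enum
from typing import Dict, List, Set


class UpgradeState(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"
    FAILED = "failed"


# proposed -> approved/rejected -> applied/failed; rejected, applied, failed are terminal.
ALLOWED_TRANSITIONS: Dict[UpgradeState, Set[UpgradeState]] = {
    UpgradeState.PROPOSED: {UpgradeState.APPROVED, UpgradeState.REJECTED},
    UpgradeState.APPROVED: {UpgradeState.APPLIED, UpgradeState.FAILED},
    UpgradeState.REJECTED: set(),
    UpgradeState.APPLIED: set(),
    UpgradeState.FAILED: set(),
}


class Upgrade:
    """A proposed self-modification that must be approved before it is applied."""

    def __init__(self, description: str) -> None:
        self.description = description
        self.state = UpgradeState.PROPOSED
        self.history: List[UpgradeState] = [self.state]

    def transition(self, new_state: UpgradeState) -> None:
        """Move to new_state, or raise ValueError if the table forbids it."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)
```

Because PROPOSED has no direct edge to APPLIED, any attempt to apply a change without an approval step raises ValueError, which is one way to enforce the "human approval required" rule.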

60 Commits