Timmy-time-dashboard

Archived

forked from Rockachopa/Timmy-time-dashboard

Author	SHA1	Message	Date
Alexander Whitestone	36fc10097f	Claude/angry cerf (#173) * feat: set qwen3.5:latest as default model - Make qwen3.5:latest the primary default model for faster inference - Move llama3.1:8b-instruct to fallback chain - Update text fallback chain to prioritize qwen3.5:latest Retains full backward compatibility via cascade fallback. * test: remove ~55 brittle, duplicate, and useless tests Audit of all 100 test files identified tests that provided no real regression protection. Removed: - 4 files deleted entirely: test_setup_script (always skipped), test_csrf_bypass (tautological assertions), test_input_validation (accepts 200-500 status codes), test_security_regression (fragile source-pattern checks redundant with rendering tests) - Duplicate test classes (TestToolTracking, TestCalculatorExtended) - Mock-only tests that just verify mock wiring, not behavior - Structurally broken tests (TestCreateToolFunctions patches after import) - Empty/pass-body tests and meaningless assertions (len > 20) - Flaky subprocess tests (aider tool calling real binary) All 1328 remaining tests pass. Net: -699 lines, zero coverage loss. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent test pollution from autoresearch_enabled mutation test_autoresearch_perplexity.py was setting settings.autoresearch_enabled = True but never restoring it in the finally block — polluting subsequent tests. When pytest-randomly ordered it before test_experiments_page_shows_disabled_when_off, the victim test saw enabled=True and failed to find "Disabled" in the page. Fix both sides: - Restore autoresearch_enabled in the finally block (root cause) - Mock settings explicitly in the victim test (defense in depth) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 16:55:27 -04:00
Alexander Whitestone	9d78eb31d1	ruff (#169) * polish: streamline nav, extract inline styles, improve tablet UX - Restructure desktop nav from 8+ flat links + overflow dropdown into 5 grouped dropdowns (Core, Agents, Intel, System, More) matching the mobile menu structure to reduce decision fatigue - Extract all inline styles from mission_control.html and base.html notification elements into mission-control.css with semantic classes - Replace JS-built innerHTML with secure DOM construction in notification loader and chat history - Add CONNECTING state to connection indicator (amber) instead of showing OFFLINE before WebSocket connects - Add tablet breakpoint (1024px) with larger touch targets for Apple Pencil / stylus use and safe-area padding for iPad toolbar - Add active-link highlighting in desktop dropdown menus - Rename "Mission Control" page title to "System Overview" to disambiguate from the chat home page - Add "Home — Timmy Time" page title to index.html https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h * fix(security): move auth-gate credentials to environment variables Hardcoded username, password, and HMAC secret in auth-gate.py replaced with os.environ lookups. Startup now refuses to run if any variable is unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example. https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h * refactor(tooling): migrate from black+isort+bandit to ruff Replace three separate linting/formatting tools with a single ruff invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs), .pre-commit-config.yaml, and CI workflow. Fixes all ruff errors including unused imports, missing raise-from, and undefined names. Ruff config maps existing bandit skips to equivalent S-rules. https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-03-11 12:23:35 -04:00
Alexander Whitestone	904a7c564e	feat: migrate to Agno native HITL tool confirmation flow (#158) Replace the homebrew regex-based tool extraction and manual dispatch (tool_executor.py) with Agno's built-in Human-In-The-Loop confirmation: - Toolkit(requires_confirmation_tools=...) marks dangerous tools - agent.run() returns RunOutput with status=paused when confirmation needed - RunRequirement.confirm()/reject() + agent.continue_run() resumes execution Dashboard and Discord vendor both use the native flow. DuckDuckGo import isolated so its absence doesn't kill all tools. Test stubs cleaned up (agno is a real dependency, only truly optional packages stubbed). 1384 tests pass in parallel (~14s). Co-authored-by: Trip T <trip@local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 21:54:04 -04:00
Alexander Whitestone	ae3bb1cc21	feat: code quality audit + autoresearch integration + infra hardening (#150)	2026-03-08 12:50:44 -04:00
Alexander Whitestone	3f06e7231d	Improve test coverage from 63.6% to 73.4% and fix test infrastructure (#137)	2026-03-06 13:21:05 -05:00
AlexanderWhitestone	bbe975ec54	feat: add TDD setup script and functional tests for Sovereign Agent Stack	2026-03-05 21:54:24 -05:00
AlexanderWhitestone	5e8766cef0	Fix build issues, implement missing routes, and stabilize e2e tests for production readiness	2026-03-04 17:15:46 -05:00
Alexander Whitestone	584eeb679e	Operation Darling Purge: slim to wealth core (-33,783 lines) (#121)	2026-03-02 13:17:38 -05:00
Alexander Whitestone	e5190b248a	CI/CD Optimization: Guard Rails, Pre-commit Checks, and Test Fixes (#90) * CI/CD Optimization: Guard Rails, Black Linting, and Pre-commit Hooks - Fixed all test collection errors (Selenium imports, fixture paths, syntax) - Implemented pre-commit hooks with Black formatting and isort - Created comprehensive Makefile with test targets (unit, integration, functional, e2e) - Added pytest.ini with marker definitions for test categorization - Established guard rails to prevent future collection errors - Wrapped optional dependencies (Selenium, MoviePy) in try-except blocks - Added conftest_markers for automatic test categorization This ensures a smooth development stream with: - Fast feedback loops (pre-commit checks before push) - Consistent code formatting (Black) - Reliable CI/CD (no collection errors, proper test isolation) - Clear test organization (unit, integration, functional, E2E) * Fix CI/CD test failures: - Export templates from dashboard.app - Fix model name assertion in test_agent.py - Fix platform-agnostic path resolution in test_path_resolution.py - Skip Docker tests in test_docker_deployment.py if docker not available - Fix test_model_fallback_chain logic in test_ollama_integration.py * Add preventative pre-commit checks and Docker test skipif decorators: - Create pre_commit_checks.py script for common CI failures - Add skipif decorators to Docker tests - Improve test robustness for CI environments	2026-02-28 11:36:50 -05:00
Alexander Whitestone	3426761894	fix: unblock task queue — auto-approve all tasks, recycle zombie runners (#85) The task queue was completely stuck: 82 tasks trapped in pending_approval, 4 zombie tasks frozen in running, and the worker loop unable to process anything. This removes the approval gate as the default and adds startup recovery for orphaned tasks. - Auto-approve all tasks by default; only task_type="escalation" requires human review (and escalations never block the processor) - Add reconcile_zombie_tasks() to reset RUNNING→APPROVED on startup - Use in-memory _current_task for concurrency check instead of DB status so stale RUNNING rows from a crash can't block new work - Update get_next_pending_task to only query APPROVED tasks - Update all callsites (chat route, API, form) to match new defaults Co-authored-by: Alexander Payne <apayne@MM.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 06:57:51 -05:00
Alexander Whitestone	6545b7e26a	test: add inbox zero functional tests for task queue processor (#79) 11 tests verify that the TaskProcessor drains all queued tasks to completion — the core behavior needed for Timmy's stream of consciousness. Tests cover: single/batch/burst processing, priority ordering, mixed task types, failure recovery, timestamp tracking, and a loop-based inbox zero assertion. Adds an `isolated_task_db` fixture to functional conftest that gives each test a fresh temporary SQLite database via pytest's tmp_path. Co-authored-by: Alexander Payne <apayne@MM.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 02:19:02 -05:00
Alexander Whitestone	51140fb7f0	test: remove hardcoded sleeps, add pytest-timeout (#69) - Replace fixed time.sleep() calls with intelligent polling or WebDriverWait - Add pytest-timeout dependency and --timeout=30 to prevent hangs - Fixes test flakiness and improves test suite speed Co-authored-by: Alexander Payne <apayne@MM.local>	2026-02-26 22:52:36 -05:00
Claude	9f4c809f70	refactor: Phase 2b — consolidate 28 modules into 14 packages Complete the module consolidation planned in REFACTORING_PLAN.md: Modules merged: - work_orders/ + task_queue/ → swarm/ (subpackages) - self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages) - tools/ → creative/tools/ - chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new) - ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new) - agents/ + agent_core/ + memory/ → timmy/ (subpackages) Updated across codebase: - 66 source files: import statements rewritten - 13 test files: import + patch() target strings rewritten - pyproject.toml: wheel includes (28→14), entry points updated - CLAUDE.md: singleton paths, module map, entry points table - AGENTS.md: file convention updates - REFACTORING_PLAN.md: execution status, success metrics Extras: - Module-level CLAUDE.md added to 6 key packages (Phase 6.2) - Zero test regressions: 1462 tests passing https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk	2026-02-26 22:07:41 +00:00
Alexander Payne	06a15bb3f2	test: add missing fixtures for functional tests Add fixtures required by functional test suite: - docker_stack: Docker container test URL (skips if FUNCTIONAL_DOCKER != 1) - serve_client: FastAPI TestClient for timmy-serve app - tdd_runner: Alias for self_tdd_runner Fixes CI errors in test_docker_swarm.py, test_l402_flow.py, test_cli.py	2026-02-26 08:30:04 -05:00
Alexander Payne	96ed82d81e	fix: memory route bug + fast E2E tests under 10 seconds - Fix recall_personal_facts() call - remove unsupported limit parameter - Replace 4 slow E2E test files with single fast test file - All 6 E2E tests complete in ~9 seconds (was 60+ seconds) - Reuse browser session across tests (module-scoped fixture) - Combine related checks into single tests - Add HTTP-only smoke test for speed	2026-02-26 08:08:32 -05:00
Alexander Payne	d8d976aa60	feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed This commit implements six major features: 1. Event Log System (src/swarm/event_log.py) - SQLite-based audit trail for all swarm events - Task lifecycle tracking (created, assigned, completed, failed) - Agent lifecycle tracking (joined, left, status changes) - Integrated with coordinator for automatic logging - Dashboard page at /swarm/events 2. Lightning Ledger (src/lightning/ledger.py) - Transaction tracking for Lightning Network payments - Balance calculations (incoming, outgoing, net, available) - Integrated with payment_handler for automatic logging - Dashboard page at /lightning/ledger 3. Semantic Memory / Vector Store (src/memory/vector_store.py) - Embedding-based similarity search for Echo agent - Fallback to keyword matching if sentence-transformers unavailable - Personal facts storage and retrieval - Dashboard page at /memory 4. Cascade Router Integration (src/timmy/cascade_adapter.py) - Automatic LLM failover between providers (Ollama → AirLLM → API) - Circuit breaker pattern for failing providers - Metrics tracking per provider (latency, error rates) - Dashboard status page at /router/status 5. Self-Upgrade Approval Queue (src/upgrades/) - State machine for self-modifications: proposed → approved/rejected → applied/failed - Human approval required before applying changes - Git integration for branch management - Dashboard queue at /self-modify/queue 6. Real-Time Activity Feed (src/events/broadcaster.py) - WebSocket-based live activity streaming - Bridges event_log to dashboard clients - Activity panel on /swarm/live Tests: - 101 unit tests passing - 4 new E2E test files for Selenium testing - Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed Documentation: - 6 ADRs (017-022) documenting architecture decisions - Implementation summary in docs/IMPLEMENTATION_SUMMARY.md - Architecture diagram in docs/architecture-v2.md	2026-02-26 08:01:01 -05:00
Alexander Payne	3463f4e4a4	fix: rename src/websocket to src/ws_manager to avoid websocket-client clash selenium depends on websocket-client which installs a top-level `websocket` package that shadows our src/websocket/ module on CI. Renaming to ws_manager eliminates the conflict entirely — no more sys.path hacks needed in conftest or Selenium tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 07:57:28 -05:00
Alexander Payne	29292cfb84	feat: single-command Docker startup, fix UI bugs, add Selenium tests - Add `make up` / `make up DEV=1` for one-command Docker startup with optional hot-reload via docker-compose.dev.yml overlay - Add `timmy up --dev` / `timmy down` CLI commands - Fix cross-platform font resolution in creative assembler (7 test failures) - Fix Ollama host URL not passed to Agno model (container connectivity) - Fix task panel route shadowing by reordering literal routes before parameterized routes in swarm.py - Fix chat input not clearing after send (hx-on::after-request) - Fix chat scroll overflow (CSS min-height: 0 on flex children) - Add Selenium UI smoke tests (17 tests, gated behind SELENIUM_UI=1) - Install fonts-dejavu-core in Dockerfile for container font support - Remove obsolete docker-compose version key - Bump CSS cache-bust to v4 833 unit tests pass, 15 Selenium tests pass (2 skipped). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 07:20:56 -05:00
Claude	78cf91697c	feat: add functional Ollama chat tests with containerised LLM Add an ollama service (behind --profile ollama) to the test compose stack and a new test suite that verifies real LLM inference end-to-end: - docker-compose.test.yml: add ollama/ollama service with health check, make OLLAMA_URL and OLLAMA_MODEL configurable via env vars - tests/functional/test_ollama_chat.py: session-scoped fixture that brings up Ollama + dashboard, pulls qwen2.5:0.5b (~400MB, CPU-only), and runs chat/history/multi-turn tests against the live stack - Makefile: add `make test-ollama` target Run with: make test-ollama (or FUNCTIONAL_DOCKER=1 pytest tests/functional/test_ollama_chat.py -v) https://claude.ai/code/session_01NTEzfRHSZQCfkfypxgyHKk	2026-02-25 02:44:36 +00:00
Claude	2c419a777d	fix: skip Docker tests gracefully when daemon is unavailable The docker_stack fixture now checks `docker info` before attempting `compose up`. If the daemon isn't reachable, tests skip instead of erroring with pytest.fail. https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM	2026-02-25 00:49:06 +00:00
Claude	c91e02e7c5	test: add functional test suite with real fixtures, no mocking Three-tier functional test infrastructure: - CLI tests via Typer CliRunner (timmy, timmy-serve, self-tdd) - Dashboard integration tests with real TestClient, real SQLite, real coordinator (no patch/mock — Ollama offline = graceful degradation) - Docker compose container-level tests (gated by FUNCTIONAL_DOCKER=1) - End-to-end L402 payment flow with real mock-lightning backend 42 new tests (8 Docker tests skipped without FUNCTIONAL_DOCKER=1). All 849 tests pass. https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM	2026-02-25 00:46:22 +00:00

21 Commits
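
Commit 36fc10097f describes a test-pollution fix: a test set settings.autoresearch_enabled = True and never restored it, so a later test saw the wrong value. The root-cause half of that fix (restore the value in a finally block) might look roughly like the sketch below. This is an illustration only, not the repository's code: the Settings class, the override_setting helper, and run_test_that_enables_autoresearch are hypothetical stand-ins, and only the attribute name autoresearch_enabled comes from the commit message.

```python
from contextlib import contextmanager


class Settings:
    """Hypothetical stand-in for the project's settings object."""

    def __init__(self):
        self.autoresearch_enabled = False


settings = Settings()


@contextmanager
def override_setting(obj, name, value):
    """Temporarily set obj.<name>, restoring the original value on exit."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs even if the test body raises, so later tests see the old value.
        setattr(obj, name, original)


def run_test_that_enables_autoresearch():
    """Hypothetical test body that needs the flag switched on."""
    with override_setting(settings, "autoresearch_enabled", True):
        return settings.autoresearch_enabled
```

With this shape the flag is on only inside the with block, so test ordering (for example under pytest-randomly) no longer changes what a later test observes. In a real pytest suite the built-in monkeypatch fixture offers the same restore-on-teardown behavior.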