Commit Graph

340 Commits

Author SHA1 Message Date
Alexander Whitestone
904a7c564e feat: migrate to Agno native HITL tool confirmation flow (#158)
Replace the homebrew regex-based tool extraction and manual dispatch
(tool_executor.py) with Agno's built-in Human-In-The-Loop confirmation:

- Toolkit(requires_confirmation_tools=...) marks dangerous tools
- agent.run() returns RunOutput with status=paused when confirmation needed
- RunRequirement.confirm()/reject() + agent.continue_run() resumes execution

Dashboard and Discord vendor both use the native flow. DuckDuckGo import
isolated so its absence doesn't kill all tools. Test stubs cleaned up
(agno is a real dependency, only truly optional packages stubbed).

1384 tests pass in parallel (~14s).

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:54:04 -04:00
Alexander Whitestone
574031a55c fix: remove invalid show_tool_calls kwarg crashing Agent init (#157)
* fix: remove invalid show_tool_calls kwarg crashing Agent init (regression)

show_tool_calls was removed in f95c960 (Feb 26) because agno 2.5.x
doesn't accept it, then reintroduced in fd0ede0 (Mar 8) without
runtime testing — mocked tests hid the breakage.

Replace the bogus assertion with a regression guard and an allowlist
test that catches unknown kwargs before they reach production.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: auto-install git hooks, add black/isort to dev deps

- Add .githooks/ with portable pre-commit hook (macOS + Linux)
- make install now auto-activates hooks via core.hooksPath
- Add black and isort to poetry dev group (were only in CI via raw pip)
- Fix black formatting on 2 files flagged by CI
- Fix test_autoresearch_perplexity patching wrong module path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:01:00 -04:00
Alexander Whitestone
21e2ae427a Add test plan for autoresearch with perplexity metric (#154) 2026-03-09 09:36:26 -04:00
Alexander Whitestone
fe484ad7b6 Fix input validation for chat and memory routes (#155) 2026-03-09 09:36:16 -04:00
Alexander Whitestone
82fb2417e3 feat: enable SQLite WAL mode for all databases (AGI ticket #1) (#153) 2026-03-08 16:07:02 -04:00
Alexander Whitestone
8dbce25183 fix: handle concurrent table creation race in SQLite (#151) 2026-03-08 13:27:11 -04:00
Alexander Whitestone
ae3bb1cc21 feat: code quality audit + autoresearch integration + infra hardening (#150) 2026-03-08 12:50:44 -04:00
Alexander Whitestone
fd0ede0d51 feat: auto-escalation system + agentic loop fixes (#149) (#149)
Wire up automatic error-to-task escalation and fix the agentic loop
stopping after the first tool call.

Auto-escalation:
- Add swarm.task_queue.models with create_task() bridge to existing
  task queue SQLite DB
- Add swarm.event_log with EventType enum, log_event(), and SQLite
  persistence + WebSocket broadcast
- Wire capture_error() into request logging middleware so unhandled
  HTTP exceptions auto-create [BUG] tasks with stack traces, git
  context, and push notifications (5-min dedup window)

Agentic loop (Round 11 Bug #1):
- Wrap agent_chat() in asyncio.to_thread() to stop blocking the
  event loop (fixes Discord heartbeat warnings)
- Enable Agno's native multi-turn tool chaining via show_tool_calls
  and tool_call_limit on the Agent config
- Strengthen multi-step continuation prompts with explicit examples

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 03:11:14 -04:00
Alexander Whitestone
7792ae745f feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup

- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
  words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
  JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
  and untrack memory/self/methodology.md (runtime-edited file)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: catch Ollama connection errors in session.py + add 71 smoke tests

- Wrap agent.run() in session.py with try/except so Ollama connection
  failures return a graceful fallback message instead of dumping raw
  tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
  core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
  — catches import errors, template failures, and schema mismatches
  that unit tests miss

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: agentic loop for multi-step tasks + Round 10 regression fixes

Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop

Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI when agno is MagicMock stub

_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI — graceful degradation at route level

When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.

Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
Alexander Whitestone
b8e0f4539f fix: Discord memory bug — add session continuity + 6 memory system fixes (#147)
Discord created a new agent per message with no conversation history,
causing Timmy to lose context between messages (the "yes" bug). Now uses
a singleton agent with per-channel/thread session_id, matching the
dashboard's session.py pattern. Also applies _clean_response() to strip
hallucinated tool-call JSON from Discord output.

Additional fixes:
- get_system_context() no longer clears the handoff file (was destroying
  session context on every agent creation)
- Orchestrator uses HotMemory.read() to auto-create MEMORY.md if missing
- vector_store DB_PATH anchored to __file__ instead of relative CWD
- brain/schema.py: removed invalid .load dot-commands from INIT_SQL
- tools_intro: fixed wrong table name 'vectors' → 'chunks' in tier3 check

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 00:20:38 -05:00
Alexander Whitestone
4bc53a43f9 fix: Round 4 bug fixes — 8 dashboard bugs + git blocker + Discord regression (#146)
* chore: stop tracking runtime-generated self-modify reports

These 65 files in data/self_modify_reports/ are auto-generated at
runtime and already listed in .gitignore. Tracking them caused
conflicts when pulling from main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 8 dashboard bugs from Round 4 testing report

- Fix Ollama timeout regression: request_timeout → timeout (agno API)
- Add Bootstrap JS to base.html (fixes creative UI tab switching)
- Send initial_state on Swarm Live WebSocket connect
- Add /api/queue/status endpoint (stops 404 log spam from chat panel)
- Populate agent tools from registry on /tools page
- Add notification bell dropdown with /api/notifications endpoint
- All 1157 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add 17 e2e tests covering all Round 4 bug fixes

Covers: /calm 200, /api/queue/status JSON, Bootstrap JS presence,
Swarm Live WebSocket initial_state, agent tools populated on /tools,
/api/notifications endpoint, Ollama timeout param, full task lifecycle,
and smoke test for all 15 dashboard pages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 23:48:20 -05:00
Alexander Whitestone
248af9ed03 fix: dashboard bugs and clean up build artifacts (#145)
* chore: stop tracking runtime-generated self-modify reports

These 65 files in data/self_modify_reports/ are auto-generated at
runtime and already listed in .gitignore. Tracking them caused
conflicts when pulling from main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 8 dashboard bugs from Round 4 testing report

- Fix Ollama timeout regression: request_timeout → timeout (agno API)
- Add Bootstrap JS to base.html (fixes creative UI tab switching)
- Send initial_state on Swarm Live WebSocket connect
- Add /api/queue/status endpoint (stops 404 log spam from chat panel)
- Populate agent tools from registry on /tools page
- Add notification bell dropdown with /api/notifications endpoint
- All 1157 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 23:44:56 -05:00
Alexander Whitestone
e36a1dc939 fix: resolve 6 dashboard bugs and rebuild Task Queue + Work Orders (#144) (#144)
Round 2+3 bug fix batch:

1. Ollama timeout: Add request_timeout=300 to prevent socket read errors
   on complex 30-60s prompts (production crash fix)

2. Memory API: Create missing HTMX partial templates (memory_facts.html,
   memory_results.html) so Save/Search buttons work

3. CALM page: Add create_tables() call so SQLAlchemy tables exist on
   first request (was returning HTTP 500)

4. Task Queue: Full SQLite-backed rebuild with CRUD endpoints, HTMX
   partials, and action buttons (approve/veto/pause/cancel/retry)

5. Work Orders: Full SQLite-backed rebuild with submit/approve/reject/
   execute pipeline and HTMX polling partials

6. Memory READ tool: Add memory_read function so Timmy stops calling
   read_file when trying to recall stored facts

Also: Close GitHub issues #115, #114, #112, #110 as won't-fix.
Comment on #107 confirming prune_memories() already wired to startup.

Tests: 33 new tests across 4 test files, all passing.
Full suite: 1155 passed, 2 pre-existing failures (hands_shell).

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 23:21:30 -05:00
Alexander Whitestone
b8164e46b0 fix: remove dead swarm imports, add memory_write tool, and auto-prune on startup (#143)
- Replace dead `from swarm` imports in tools_delegation and tools_intro
  with working implementations sourced from _PERSONAS
- Add `memory_write` tool so the agent can actually persist memories
  when users ask it to remember something
- Enhance `memory_search` to search both vault files AND the runtime
  vector store for cross-channel recall (Discord/web/Telegram)
- Add memory management config: memory_prune_days, memory_prune_keep_facts,
  memory_vault_max_mb
- Auto-prune old vector store entries and warn on vault size at startup
- Update tests for new delegation agent list (mace removed)

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 22:34:30 -05:00
Alexander Whitestone
b615595100 refactor: centralize config & harden security (#141)
* feat: upgrade primary model from llama3.1:8b to qwen2.5:14b

- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests

qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.

* security: centralize config, harden uploads, fix silent exceptions

- Add 9 pydantic Settings fields (skip_embeddings, disable_csrf,
  rqlite_url, brain_source, brain_db_path, csrf_cookie_secure,
  chat_api_max_body_bytes, timmy_test_mode) to centralize env-var access
- Migrate 8 os.environ.get() calls across 5 source files to use
  `from config import settings` per project convention
- Add path traversal defense-in-depth to file upload endpoint
- Add 1MB request body size limit to chat API
- Make CSRF cookie secure flag configurable via settings
- Replace 2 silent `except: pass` blocks with debug logging in session.py
- Remove unused `import os` from brain/memory.py and csrf.py
- Update 5 CSRF test fixtures to patch settings instead of os.environ

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 18:49:37 -05:00
Alexander Whitestone
cdd3e1a90b feat: upgrade primary model from llama3.1:8b to qwen2.5:14b (#140)
- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests

qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.

Co-authored-by: Trip T <trip@local>
2026-03-07 18:20:34 -05:00
Alexander Whitestone
480b8d324e security: fix CSRF bypass vulnerabilities via strict path matching and normalization (#138) 2026-03-07 06:45:32 -05:00
Alexander Whitestone
3f06e7231d Improve test coverage from 63.6% to 73.4% and fix test infrastructure (#137) 2026-03-06 13:21:05 -05:00
Alexander Whitestone
3b322d185c feat: add Shell and Git execution hands for Timmy (#136) 2026-03-06 09:01:24 -05:00
Alexander Whitestone
39461858a0 SEC: Fix CSRF bypass via path traversal in exempt routes (#135) 2026-03-06 09:00:56 -05:00
Alexander Whitestone
87dc5eadfe Wire orchestrator pipe into task runner + pipe-verifying integration tests (#134) 2026-03-06 01:20:14 -05:00
AlexanderWhitestone
bbe975ec54 feat: add TDD setup script and functional tests for Sovereign Agent Stack 2026-03-05 21:54:24 -05:00
Alexander Whitestone
fb97625404 Consolidate architecture: flatten agents, kill Redis/Celery, thin routes (#133) 2026-03-05 20:27:02 -05:00
Alexander Whitestone
2b97da9e9c Add pre-commit hook enforcing 30s test suite time limit (#132) 2026-03-05 19:45:38 -05:00
Alexander Whitestone
aff3edb06a Audit cleanup: security fixes, code reduction, test hygiene (#131) 2026-03-05 18:56:52 -05:00
Alexander Whitestone
e8f1dea3ec Remove unused deps from poetry build, speed test suite to ~16s (#130) 2026-03-05 18:07:59 -05:00
Alexander Whitestone
f2dacf4ee0 Integrate Celery task queue for background task processing (#129) 2026-03-05 12:09:51 -05:00
Alexander Whitestone
b8ff534ad8 Security: Enhance CSRF protection with form field support and stricter validation (#128) 2026-03-05 07:04:30 -05:00
AlexanderWhitestone
5e8766cef0 Fix build issues, implement missing routes, and stabilize e2e tests for production readiness 2026-03-04 17:15:46 -05:00
Alexander Whitestone
425e7da380 Claude/remove persona system f vgt m (#126)
* Remove persona system, identity, and all Timmy references

Strip the codebase to pure orchestration logic:

- Delete TIMMY_IDENTITY.md and memory/self/identity.md
- Gut brain/identity.py to no-op stubs (empty returns)
- Remove all system prompts reinforcing Timmy's character, faith,
  sovereignty, sign-off ("Sir, affirmative"), and agent roster
- Replace identity-laden prompts with generic local-AI-assistant prompts
- Remove "You work for Timmy" from all sub-agent system prompts
- Rename PersonaTools → AgentTools, PERSONA_TOOLKITS → AGENT_TOOLKITS
- Replace "timmy" agent ID with "orchestrator" across routes, marketplace,
  tools catalog, and orchestrator class
- Strip Timmy references from config comments, templates, telegram bot,
  chat API, and dashboard UI
- Delete tests/brain/test_identity.py entirely
- Fix all test assertions that checked for persona identity content

729 tests pass (2 pre-existing failures in test_calm.py unrelated).

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

* Add Taskosaur (PM + AI task execution) to docker-compose

Spins up Taskosaur alongside the dashboard on `docker compose up`:
- postgres:16-alpine (port 5432, Taskosaur DB)
- redis:7-alpine (Bull queue backend)
- taskosaur (ports 3000 API / 3001 UI)
- dashboard now depends_on taskosaur healthy
- TASKOSAUR_API_URL injected into dashboard environment

Dashboard can reach Taskosaur at http://taskosaur:3000/api on the
internal network. Frontend UI accessible at http://localhost:3001.

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-04 12:00:49 -05:00
Alexander Whitestone
548a3f980d Test: add input validation tests for form handlers (#125) 2026-03-04 07:58:58 -05:00
Alexander Whitestone
d1f2ae3ed4 Security: fix XSS vulnerabilities in health and grok routes (#124) 2026-03-04 07:58:49 -05:00
Alexander Whitestone
15c7ee5d1e Refactor: migrate inline middleware to dedicated classes (#123) 2026-03-04 07:58:39 -05:00
Alexander Whitestone
eb2b34876b security: implement rate limiting for chat API and fix calm tests (#122) 2026-03-03 08:16:36 -05:00
Alexander Whitestone
584eeb679e Operation Darling Purge: slim to wealth core (-33,783 lines) (#121) 2026-03-02 13:17:38 -05:00
Alexander Whitestone
f694eff0a4 fix: align calm imports with project conventions (#120) 2026-03-02 12:04:30 -05:00
AlexanderWhitestone
d080e67faf feat: Implement Minimum Viable Calm (MVC) feature and initial tests 2026-03-02 11:46:40 -05:00
Alexander Whitestone
62ef1120a4 Memory Unification + Canonical Identity: -11,074 lines of homebrew (#119) 2026-03-02 09:58:07 -05:00
Alexander Whitestone
b1552b9f64 Fix font resolution in creative module and add security headers to dashboard (#103) 2026-03-01 11:50:34 -05:00
Alexander Whitestone
3a8496a3f1 feat: add security middleware suite - CSRF, security headers, and request logging (#102)
Implements three security middleware components with full test coverage:

- CSRF Protection: Token generation/validation, safe method allowlist,
  auto-exempt webhooks, constant-time comparison for timing attack prevention

- Security Headers: X-Content-Type-Options, X-Frame-Options, CSP,
  Permissions-Policy, Referrer-Policy, HSTS (production)

- Request Logging: Method/path/status/duration logging with correlation IDs,
  configurable path exclusions, X-Forwarded-For support

Also fixes Discord test isolation issue where settings.discord_token
was not being properly reset between tests.

New files:
- src/dashboard/middleware/{csrf,security_headers,request_logging}.py
- tests/dashboard/middleware/test_{csrf,security_headers,request_logging}.py

Addresses design review recommendations R3, R8, R9, R4.

All tests pass: 1950 passed, 40 skipped

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-02-28 23:21:09 -05:00
Alexander Whitestone
6eefcabc97 feat: Phase 1 autonomy upgrades — introspection, heartbeat, source tagging, Discord auto-detect (#101)
UC-01: Live System Introspection Tool
- Add get_task_queue_status(), get_agent_roster(), get_live_system_status()
  to timmy/tools_intro with graceful degradation
- Enhanced get_memory_status() with line counts, section headers, vault
  directory listing, semantic memory row count, self-coding journal stats
- Register system_status MCP tool (creative/tools/system_status.py)
- Add system_status to Timmy's tool list + Hard Rule #7

UC-02: Fix Offline Status Bug
- Add registry.heartbeat() calls in task_processor run_loop() and
  process_single_task() so health endpoint reflects actual agent status
- health.py now consults swarm registry instead of Ollama connectivity

UC-03: Message Source Tagging
- Add source field to Message dataclass (default "browser")
- Tag all message_log.append() calls: browser, api, system
- Include source in /api/chat/history response

UC-04: Discord Token Auto-Detection & Docker Fix
- Add _discord_token_watcher() background coroutine that polls every 30s
  for DISCORD_TOKEN in env vars, .env file, or state file
- Add --extras discord to all three Dockerfiles (main, dashboard, test)

All 26 Phase 1 tests pass in Docker (make test-docker).
Full suite: 1889 passed, 77 skipped, 0 failed.

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 22:49:24 -05:00
Alexander Whitestone
89cfe1be0d fix: Docker-first test suite, UX improvements, and bug fixes (#100)
Dashboard UX:
- Restructure nav from 22 flat links to 6 core + MORE dropdown
- Add mobile nav section labels (Core, Intelligence, Agents, System, Commerce)
- Defer marked.js and dompurify.js loading, consolidate CDN to jsdelivr
- Optimize font weights (drop unused 300/500), bump style.css cache buster
- Remove duplicate HTMX load triggers from sidebar and health panels

Bug fixes:
- Fix Timmy showing OFFLINE by registering after swarm recovery sweep
- Fix ThinkingEngine await bug with asyncio.run_coroutine_threadsafe
- Fix chat auto-scroll by calling scrollChat() after history partial loads
- Add missing /voice/button page and /voice/command endpoint
- Fix Grok api_key="" treated as falsy falling through to env key
- Fix self_modify PROJECT_ROOT using settings.repo_root instead of __file__

Docker test infrastructure:
- Bind-mount hands/, docker/, Dockerfiles, and compose files into test container
- Add fontconfig + fonts-dejavu-core for creative/assembler TextClip tests
- Initialize minimal git repo in Dockerfile.test for GitSafety compatibility
- Fix introspection and path resolution tests for Docker /app context

All 1863 tests pass in Docker (0 failures, 77 skipped).

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 22:14:37 -05:00
Alexander Whitestone
6e67c3b421 feat: add bug report ingestion pipeline with Forge dispatch (#99)
Replace the stub `handle_bug_report` handler with a real implementation
that logs a decision trail and dispatches code_fix tasks to Forge for
automated fixing. Add `POST /api/bugs/submit` endpoint and `timmy
ingest-report` CLI command so AI test runners (Comet) can submit
structured bug reports without manual copy-paste.

- POST /api/bugs/submit: accepts JSON reports, creates bug_report tasks
- timmy ingest-report: CLI for file/stdin JSON ingestion with --dry-run
- handle_bug_report: logs decision trail to event_log, dispatches
  code_fix task to Forge with parent_task_id linking back to the bug
- 18 TDD tests covering endpoint, handler, and CLI

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:15:53 -05:00
Alexander Whitestone
2e92838033 fix: restore real-time chat responses via WebSocket (#98)
The chat WebSocket return path was broken by two bugs that prevented
Timmy's responses from appearing in the live chat feed:

1. Frontend checked msg.type instead of msg.event for 'timmy_response'
   events — the WSEvent dataclass uses 'event' as the field name.
2. Frontend accessed msg.response instead of msg.data.response — the
   response payload is nested in the data field.

Additional fixes:
- Queue acknowledgment ("Message queued...") no longer logged as an
  agent message in chat history; the real response is logged by the
  task processor when it completes, eliminating duplicate messages.
- Chat message template now carries data-task-id so the WS handler
  can find and replace the placeholder with the actual response.
- appendMessage() uses DOM APIs (textContent) instead of innerHTML
  for safer content insertion before markdown rendering.
- Fixed chat_message.html script targeting when queue-status div is
  present between the agent message and the inline script.

https://claude.ai/code/session_011cJfexqBBuGhSRQU8qwKcR

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-28 20:22:47 -05:00
Alexander Whitestone
b7c89d1101 feat: dockerize OpenFang as vendored tool runtime sidecar (#96) 2026-02-28 19:27:48 -05:00
Alexander Whitestone
d7d7a5a80a audit: clean Docker architecture, consolidate test fixtures, add containerized test runner (#94) 2026-02-28 16:11:58 -05:00
Alexander Whitestone
1e19164379 fix: resolve portal startup hangs with non-blocking init (#93)
* fix: resolve portal startup hangs with non-blocking init

- Add socket_connect_timeout/socket_timeout (3s) to Redis connection in
  SwarmComms to prevent infinite hangs when Redis is unreachable
- Defer reconcile_on_startup() from SwarmCoordinator.__init__() to an
  explicit initialize() call during app lifespan, unblocking the
  module-level singleton creation
- Make Ollama health checks non-blocking via asyncio.to_thread() so they
  don't freeze the event loop for 2s per call
- Fix _check_redis() to reuse coordinator's SwarmComms singleton instead
  of creating a new connection on every health check
- Move discord bot platform registration from lifespan critical path
  into background task to avoid heavy import before yield
- Increase Docker healthcheck start_period from 10s/15s to 30s to give
  the app adequate time to complete startup

https://claude.ai/code/session_016t5jNBYsUAQuyoR7sXe7Ux

* fix: disable commit signing in git_tools test fixture

The git_repo fixture inherits global gpgsign config, causing git_commit
to fail when the signing server rejects unsigned source context.
Disable signing in the temp repo's local config.

https://claude.ai/code/session_016t5jNBYsUAQuyoR7sXe7Ux

* fix: add dev extras for pip-based CI install

The CI workflow runs `pip install -e ".[dev]"` but after the Poetry
migration there was no `dev` extra defined — only a Poetry dev group.
This caused pytest to not be installed, resulting in exit code 127
(command not found) on every CI run.

Add a pip-compatible `dev` extra that mirrors the Poetry dev group
so both `pip install -e ".[dev]"` and `poetry install` work.

https://claude.ai/code/session_016t5jNBYsUAQuyoR7sXe7Ux

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-28 15:01:48 -05:00
Alexander Whitestone
ca0c42398b feat: migrate to Poetry, fix Docker build, and resolve 6 UI/backend bugs (#92)
Migrate from Hatchling to Poetry for dependency management, fixing the
Docker build failure caused by .dockerignore excluding README.md that
Hatchling needed for metadata. Poetry export strategy bypasses this
entirely. Creative extras removed from main build (separate service).

Docker changes:
- Multi-stage builds with poetry export → pip install
- BuildKit cache mounts for faster rebuilds
- All 3 Dockerfiles updated (root, dashboard, agent)

Bug fixes from tester audit:
- TaskStatus/TaskPriority case-insensitive enum parsing
- scrollChat() upgraded to requestAnimationFrame, removed duplicate
- Desktop/mobile nav items synced in base.html
- HTMX pointed to direct htmx.min.js URL
- Removed unused highlight.js and bootstrap.bundle.min.js
- Registered missing escalation/external task handlers in app.py

Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 13:12:14 -05:00
Alexander Whitestone
e5190b248a CI/CD Optimization: Guard Rails, Pre-commit Checks, and Test Fixes (#90)
* CI/CD Optimization: Guard Rails, Black Linting, and Pre-commit Hooks

- Fixed all test collection errors (Selenium imports, fixture paths, syntax)
- Implemented pre-commit hooks with Black formatting and isort
- Created comprehensive Makefile with test targets (unit, integration, functional, e2e)
- Added pytest.ini with marker definitions for test categorization
- Established guard rails to prevent future collection errors
- Wrapped optional dependencies (Selenium, MoviePy) in try-except blocks
- Added conftest_markers for automatic test categorization

This ensures a smooth development stream with:
- Fast feedback loops (pre-commit checks before push)
- Consistent code formatting (Black)
- Reliable CI/CD (no collection errors, proper test isolation)
- Clear test organization (unit, integration, functional, E2E)

* Fix CI/CD test failures:
- Export templates from dashboard.app
- Fix model name assertion in test_agent.py
- Fix platform-agnostic path resolution in test_path_resolution.py
- Skip Docker tests in test_docker_deployment.py if docker not available
- Fix test_model_fallback_chain logic in test_ollama_integration.py

* Add preventative pre-commit checks and Docker test skipif decorators:
- Create pre_commit_checks.py script for common CI failures
- Add skipif decorators to Docker tests
- Improve test robustness for CI environments
2026-02-28 11:36:50 -05:00
Alexander Whitestone
a5fd680428 feat: microservices refactoring with TDD and Docker optimization (#88)
## Summary
Complete refactoring of Timmy Time from monolithic architecture to microservices
using Test-Driven Development (TDD) and optimized Docker builds.

## Changes

### Core Improvements
- Optimized dashboard startup: moved blocking tasks to async background processes
- Fixed model fallback logic in agent configuration
- Enhanced test fixtures with comprehensive conftest.py

### Microservices Architecture
- Created separate Dockerfiles for dashboard, Ollama, and agent services
- Implemented docker-compose.microservices.yml for service orchestration
- Added health checks and non-root user execution for security
- Multi-stage Docker builds for lean, fast images

### Testing
- Added E2E tests for dashboard responsiveness
- Added E2E tests for Ollama integration
- Added E2E tests for microservices architecture validation
- All 36 tests passing, 8 skipped (environment-specific)

### Documentation
- Created comprehensive final report
- Generated issue resolution plan
- Added interview transcript demonstrating core agent functionality

### New Modules
- skill_absorption.py: Dynamic skill loading and integration system for Timmy

## Test Results
 36 passed, 8 skipped, 6 warnings
 All microservices tests passing
 Dashboard responsiveness verified
 Ollama integration validated

## Files Added/Modified
- docker/: Multi-stage Dockerfiles for all services
- tests/e2e/: Comprehensive E2E test suite
- src/timmy/skill_absorption.py: Skill absorption system
- src/dashboard/app.py: Optimized startup logic
- tests/conftest.py: Enhanced test fixtures
- docker-compose.microservices.yml: Service orchestration

## Breaking Changes
None - all changes are backward compatible

## Next Steps
- Integrate skill absorption system into agent workflow
- Test with microservices-tdd-refactor skill
- Deploy to production with docker-compose orchestration
2026-02-28 11:07:19 -05:00