Commit Graph

11 Commits

Author SHA1 Message Date
Trip T
f6a6c0f62e feat: upgrade to qwen3.5, self-hosted Gitea CI, optimize Docker image
All checks were successful
Tests / lint (pull_request) Successful in 2s
Tests / test (pull_request) Successful in 32s
Model upgrade:
- qwen2.5:14b → qwen3.5:latest across config, tools, and docs
- Added qwen3.5 to multimodal model registry

Self-hosted Gitea CI:
- .gitea/workflows/tests.yml: lint + test jobs via act_runner
- Unified Dockerfile: pre-baked deps from poetry.lock for fast CI
- sitepackages=true in tox for ~2s dep resolution (was ~40s)
- OLLAMA_URL set to an unreachable port in CI to prevent real LLM calls
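The CI isolation above works because the Ollama address is read from configuration, so CI can point it at an address where nothing listens. Below is a minimal sketch that assumes a hypothetical `Settings` dataclass reading `OLLAMA_URL` and `OLLAMA_MODEL` from the environment. The project's real settings class and variable names may differ. `http://localhost:11434` is Ollama's usual local default.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; fields mirror settings.ollama_url/ollama_model."""

    ollama_url: str
    ollama_model: str

    @classmethod
    def from_env(cls) -> "Settings":
        # Environment variables win; defaults match a local developer setup.
        # In CI, OLLAMA_URL would be exported as an unreachable address so any
        # accidental LLM call fails fast instead of reaching a real model.
        return cls(
            ollama_url=os.environ.get("OLLAMA_URL", "http://localhost:11434"),
            ollama_model=os.environ.get("OLLAMA_MODEL", "qwen3.5:latest"),
        )
```

Tests that read `Settings.from_env()` instead of hardcoding a URL or model tag pick up the CI override automatically.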

Test isolation fixes:
- Smoke test fixture mocks create_timmy (was hitting real Ollama)
- WebSocket sends initial_state before joining broadcast pool (race fix)
- Tests use settings.ollama_model/url instead of hardcoded values
- skip_ci marker for Ollama-dependent tests, excluded in CI tox envs
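The WebSocket ordering fix can be modeled in a few lines of asyncio. This is a minimal sketch. `ConnectionManager`, `FakeSocket`, and the `{"event": ..., "data": ...}` message shape are illustrative assumptions, not the dashboard's actual classes.

```python
import asyncio
from typing import Any, Dict, List


class FakeSocket:
    """Stand-in for a WebSocket; records messages in the order they were sent."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, Any]] = []

    async def send_json(self, message: Dict[str, Any]) -> None:
        await asyncio.sleep(0)  # real sends await I/O, letting other tasks run
        self.sent.append(message)


class ConnectionManager:
    """Hypothetical broadcast pool showing the initial_state ordering fix."""

    def __init__(self) -> None:
        self.active: List[FakeSocket] = []

    async def connect(self, ws: FakeSocket, state: Dict[str, Any]) -> None:
        # Send the snapshot before joining the pool. Joining first could let a
        # broadcast from another task reach this client ahead of initial_state.
        await ws.send_json({"event": "initial_state", "data": state})
        self.active.append(ws)

    async def broadcast(self, message: Dict[str, Any]) -> None:
        for ws in list(self.active):
            await ws.send_json(message)
```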

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:36:42 -04:00
Alexander Whitestone
904a7c564e feat: migrate to Agno native HITL tool confirmation flow (#158)
Replace the homebrew regex-based tool extraction and manual dispatch
(tool_executor.py) with Agno's built-in Human-In-The-Loop confirmation:

- Toolkit(requires_confirmation_tools=...) marks dangerous tools
- agent.run() returns RunOutput with status=paused when confirmation needed
- RunRequirement.confirm()/reject() + agent.continue_run() resumes execution
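The pause/confirm/resume shape described above can be shown without Agno itself. This is a toy model. `MiniAgent`, `PendingCall`, and `RunResult` are hypothetical stand-ins that mirror only the control flow: mark dangerous tools, pause, confirm or reject, then continue. They are not Agno's classes or signatures.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


@dataclass
class PendingCall:
    """A dangerous tool call held until a human decides (hypothetical)."""

    tool: str
    args: Dict[str, Any]
    approved: Optional[bool] = None  # None means no decision yet

    def confirm(self) -> None:
        self.approved = True

    def reject(self) -> None:
        self.approved = False


@dataclass
class RunResult:
    status: str  # "paused" or "completed"
    pending: List[PendingCall] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


class MiniAgent:
    """Toy agent: tools listed in requires_confirmation pause the run."""

    def __init__(self, tools: Dict[str, Callable[..., str]],
                 requires_confirmation: Set[str]) -> None:
        self.tools = tools
        self.requires_confirmation = requires_confirmation

    def run(self, calls: List[Tuple[str, Dict[str, Any]]]) -> RunResult:
        # Safe calls execute immediately; dangerous ones are deferred.
        result = RunResult(status="completed")
        for name, args in calls:
            if name in self.requires_confirmation:
                result.pending.append(PendingCall(name, args))
            else:
                result.outputs.append(self.tools[name](**args))
        if result.pending:
            result.status = "paused"
        return result

    def continue_run(self, result: RunResult) -> RunResult:
        # Execute approved calls; rejected or undecided calls are recorded as rejected.
        for call in result.pending:
            if call.approved:
                result.outputs.append(self.tools[call.tool](**call.args))
            else:
                result.outputs.append(f"rejected: {call.tool}")
        result.pending = []
        result.status = "completed"
        return result
```

This sketch treats an undecided call as rejected, which is a conservative default. The commit does not say how the real integration handles that case.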

Dashboard and Discord vendor both use the native flow. DuckDuckGo import
isolated so its absence doesn't kill all tools. Test stubs cleaned up
(agno is a real dependency; only truly optional packages are stubbed).
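Isolating an optional import usually means guarding each tool's import separately rather than importing everything at module top level. Below is a minimal sketch of that pattern. `load_optional_tools` and its spec format are hypothetical. The real change may simply wrap the DuckDuckGo import in its own `try/except ImportError`.

```python
import importlib
from typing import Callable, Dict, Tuple


def load_optional_tools(specs: Dict[str, Tuple[str, str]]) -> Dict[str, Callable]:
    """Import each tool independently; a missing package skips only that tool.

    specs maps tool name -> (module path, attribute name).
    """
    tools: Dict[str, Callable] = {}
    for name, (module_path, attr) in specs.items():
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # e.g. the DuckDuckGo search package is not installed
            continue
        tools[name] = getattr(module, attr)
    return tools
```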

1384 tests pass in parallel (~14s).

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:54:04 -04:00
Alexander Whitestone
ae3bb1cc21 feat: code quality audit + autoresearch integration + infra hardening (#150) 2026-03-08 12:50:44 -04:00
Alexander Whitestone
cdd3e1a90b feat: upgrade primary model from llama3.1:8b to qwen2.5:14b (#140)
- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests

qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.
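The automatic fallback could work roughly as follows. This is a minimal sketch that assumes the caller already has the set of locally installed model tags, for example from `ollama list`. `choose_model` is hypothetical, and this commit does not show the project's actual fallback logic.

```python
from typing import Set


def choose_model(available: Set[str], primary: str = "qwen2.5:14b",
                 fallback: str = "llama3.1:8b-instruct") -> str:
    """Return the primary model if installed, else the fallback."""
    if primary in available:
        return primary
    if fallback in available:
        return fallback
    raise LookupError(f"neither {primary!r} nor {fallback!r} is available")
```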

Co-authored-by: Trip T <trip@local>
2026-03-07 18:20:34 -05:00
Alexander Whitestone
2b97da9e9c Add pre-commit hook enforcing 30s test suite time limit (#132) 2026-03-05 19:45:38 -05:00
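A hook of this kind can be as small as a wrapper that runs the suite under a timeout. This is a minimal sketch that assumes the hook is a Python script invoking pytest. `run_with_limit` is hypothetical, and the actual hook in #132 may be a shell script or use a different mechanism.

```python
import subprocess
import sys
from typing import List

TIME_LIMIT_S = 30.0


def run_with_limit(cmd: List[str], limit: float = TIME_LIMIT_S) -> int:
    """Run cmd; return its exit code, or 1 if it exceeds the time limit."""
    try:
        # subprocess.run kills the child if the timeout expires.
        return subprocess.run(cmd, timeout=limit).returncode
    except subprocess.TimeoutExpired:
        print(f"test suite exceeded {limit}s limit", file=sys.stderr)
        return 1
```

As a hook, the script would end with `sys.exit(run_with_limit([sys.executable, "-m", "pytest", "-q"]))`. A non-zero exit blocks the commit.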
Alexander Whitestone
425e7da380 Claude/remove persona system f vgt m (#126)
* Remove persona system, identity, and all Timmy references

Strip the codebase to pure orchestration logic:

- Delete TIMMY_IDENTITY.md and memory/self/identity.md
- Gut brain/identity.py to no-op stubs (empty returns)
- Remove all system prompts reinforcing Timmy's character, faith,
  sovereignty, sign-off ("Sir, affirmative"), and agent roster
- Replace identity-laden prompts with generic local-AI-assistant prompts
- Remove "You work for Timmy" from all sub-agent system prompts
- Rename PersonaTools → AgentTools, PERSONA_TOOLKITS → AGENT_TOOLKITS
- Replace "timmy" agent ID with "orchestrator" across routes, marketplace,
  tools catalog, and orchestrator class
- Strip Timmy references from config comments, templates, telegram bot,
  chat API, and dashboard UI
- Delete tests/brain/test_identity.py entirely
- Fix all test assertions that checked for persona identity content

729 tests pass (2 pre-existing, unrelated failures in test_calm.py).

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

* Add Taskosaur (PM + AI task execution) to docker-compose

Spins up Taskosaur alongside the dashboard on `docker compose up`:
- postgres:16-alpine (port 5432, Taskosaur DB)
- redis:7-alpine (Bull queue backend)
- taskosaur (ports 3000 API / 3001 UI)
- dashboard now depends_on taskosaur healthy
- TASKOSAUR_API_URL injected into dashboard environment

Dashboard can reach Taskosaur at http://taskosaur:3000/api on the
internal network. Frontend UI accessible at http://localhost:3001.

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-04 12:00:49 -05:00
Alexander Whitestone
584eeb679e Operation Darling Purge: slim to wealth core (-33,783 lines) (#121) 2026-03-02 13:17:38 -05:00
Alexander Whitestone
2e92838033 fix: restore real-time chat responses via WebSocket (#98)
The chat WebSocket return path was broken by two bugs that prevented
Timmy's responses from appearing in the live chat feed:

1. Frontend checked msg.type instead of msg.event for 'timmy_response'
   events — the WSEvent dataclass uses 'event' as the field name.
2. Frontend accessed msg.response instead of msg.data.response — the
   response payload is nested in the data field.
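Both frontend bugs come down to the shape of the event envelope. Below is a minimal Python sketch of that shape. Only the `event` and `data` fields come from the commit message, and the real `WSEvent` may carry more fields. `extract_response` is a hypothetical Python stand-in for the JavaScript handler.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class WSEvent:
    """Event envelope: the type lives in `event`, the payload in `data`."""

    event: str
    data: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def extract_response(raw: str) -> Optional[str]:
    """Client-side read mirroring the fixed frontend: msg.event, msg.data.response."""
    msg = json.loads(raw)
    if msg.get("event") != "timmy_response":  # not msg["type"]
        return None
    return msg.get("data", {}).get("response")  # not msg["response"]
```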

Additional fixes:
- Queue acknowledgment ("Message queued...") no longer logged as an
  agent message in chat history; the real response is logged by the
  task processor when it completes, eliminating duplicate messages.
- Chat message template now carries data-task-id so the WS handler
  can find and replace the placeholder with the actual response.
- appendMessage() uses DOM APIs (textContent) instead of innerHTML
  for safer content insertion before markdown rendering.
- Fixed chat_message.html script targeting when queue-status div is
  present between the agent message and the inline script.

https://claude.ai/code/session_011cJfexqBBuGhSRQU8qwKcR

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-28 20:22:47 -05:00
Alexander Whitestone
da5745db48 Fix dashboard tests and add SECURITY.md audit report (#84) 2026-02-28 06:59:15 -05:00
Alexander Whitestone
18ed6232f9 feat: Timmy fixes and improvements (#72)
* test: remove hardcoded sleeps, add pytest-timeout

- Replace fixed time.sleep() calls with condition-based polling or WebDriverWait
- Add pytest-timeout dependency and --timeout=30 to prevent hangs
- Fixes test flakiness and improves test suite speed
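Outside the browser tests, replacing a fixed sleep with polling usually means waiting on a condition with a deadline. This is a minimal sketch. `wait_until` is a hypothetical helper. Selenium-driven tests would use `WebDriverWait(driver, timeout).until(...)` instead.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Poll predicate until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test then asserts `wait_until(lambda: job.done)`, not `time.sleep(2)`. It returns as soon as the condition holds and fails only after the full timeout.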

* feat: add Aider AI tool to Forge's toolkit

- Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assistance
- Register tool in Forge's code toolkit
- Add functional tests for the Aider tool

* config: add opencode.json with local Ollama provider for sovereign AI

* feat: Timmy fixes and improvements

## Bug Fixes
- Fix read_file path resolution: add ~ expansion, proper relative path handling
- Add repo_root to config.py with auto-detection from .git location
- Replace hardcoded llama3.2 with the configured settings.ollama_model
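The path fixes above can be sketched with `pathlib`. This minimal sketch assumes `repo_root` is the nearest ancestor directory containing `.git`. `find_repo_root` and `resolve_user_path` are hypothetical names, not the project's `config.py` or `read_file` code.

```python
from pathlib import Path


def find_repo_root(start: Path) -> Path:
    """Walk up from start until a directory containing .git is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    return start  # no .git found: fall back to the starting directory


def resolve_user_path(raw: str, repo_root: Path) -> Path:
    """Expand ~ and anchor relative paths at repo_root."""
    path = Path(raw).expanduser()
    if not path.is_absolute():
        path = repo_root / path
    return path.resolve()
```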

## Timmy's Requests
- Add communication protocol to AGENTS.md (read context first, explain changes)
- Create DECISIONS.md for architectural decision documentation
- Add reasoning guidance to system prompts (step-by-step, state uncertainty)
- Update tests to reflect correct model name (llama3.1:8b-instruct)

## Testing
- All 177 dashboard tests pass
- All 32 prompt/tool tests pass

---------

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-02-26 23:39:13 -05:00
Claude
4e11dd2490 refactor: Phase 3 — reorganize tests into module-mirroring subdirectories
Move 97 test files from flat tests/ into 13 subdirectories:
  tests/dashboard/      (8 files — routes, mobile, mission control)
  tests/swarm/          (17 files — coordinator, docker, routing, tasks)
  tests/timmy/          (12 files — agent, backends, CLI, tools)
  tests/self_coding/    (14 files — git safety, indexer, self-modify)
  tests/lightning/      (3 files — L402, LND, interface)
  tests/creative/       (8 files — assembler, director, image/music/video)
  tests/integrations/   (10 files — chat bridge, telegram, voice, websocket)
  tests/mcp/            (4 files — bootstrap, discovery, executor)
  tests/spark/          (3 files — engine, tools, events)
  tests/hands/          (3 files — registry, oracle, phase5)
  tests/scripture/      (1 file)
  tests/infrastructure/ (3 files — router cascade, API)
  tests/security/       (3 files — XSS, regression)

Fix Path(__file__) reference in test_mobile_scenarios.py for the new directory depth.
Add __init__.py to all test subdirectories.
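A depth-specific reference such as `Path(__file__).parent.parent` needs one more level once a file moves a directory deeper. One way to avoid repeating that fix is to locate the enclosing `tests/` directory by name. This is a minimal sketch. `repo_root_for` is a hypothetical helper, and the commit does not say which approach it used.

```python
from pathlib import PurePath


def repo_root_for(test_file: PurePath) -> PurePath:
    """Return the parent of the nearest enclosing tests/ directory."""
    for parent in test_file.parents:
        if parent.name == "tests":
            return parent.parent
    raise ValueError(f"{test_file} is not under a tests/ directory")
```

In a test module it would be called as `repo_root_for(Path(__file__).resolve())`. The result stays the same whether the file sits in `tests/` or `tests/dashboard/`.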

Tests: 1503 passed, 9 failed (pre-existing), 53 errors (pre-existing)

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:21:28 +00:00