Commit Graph

678 Commits

Author SHA1 Message Date
Alexander Whitestone
21e2ae427a Add test plan for autoresearch with perplexity metric (#154) 2026-03-09 09:36:26 -04:00
Alexander Whitestone
fe484ad7b6 Fix input validation for chat and memory routes (#155) 2026-03-09 09:36:16 -04:00
Alexander Whitestone
82fb2417e3 feat: enable SQLite WAL mode for all databases (AGI ticket #1) (#153) 2026-03-08 16:07:02 -04:00
Alexander Whitestone
11ba21418a docs: sovereign AGI research — architecture analysis + Ghost Core integration (#152) 2026-03-08 15:00:10 -04:00
Alexander Whitestone
8dbce25183 fix: handle concurrent table creation race in SQLite (#151) 2026-03-08 13:27:11 -04:00
Alexander Whitestone
ae3bb1cc21 feat: code quality audit + autoresearch integration + infra hardening (#150) 2026-03-08 12:50:44 -04:00
Alexander Whitestone
fd0ede0d51 feat: auto-escalation system + agentic loop fixes (#149)
Wire up automatic error-to-task escalation and fix the agentic loop
stopping after the first tool call.

Auto-escalation:
- Add swarm.task_queue.models with create_task() bridge to existing
  task queue SQLite DB
- Add swarm.event_log with EventType enum, log_event(), and SQLite
  persistence + WebSocket broadcast
- Wire capture_error() into request logging middleware so unhandled
  HTTP exceptions auto-create [BUG] tasks with stack traces, git
  context, and push notifications (5-min dedup window)

Agentic loop (Round 11 Bug #1):
- Wrap agent_chat() in asyncio.to_thread() to stop blocking the
  event loop (fixes Discord heartbeat warnings)
- Enable Agno's native multi-turn tool chaining via show_tool_calls
  and tool_call_limit on the Agent config
- Strengthen multi-step continuation prompts with explicit examples

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 03:11:14 -04:00
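Note on fd0ede0d51: a minimal sketch of the asyncio.to_thread() change described above. The function bodies are hypothetical stand-ins (the real agent_chat() is not shown in this log); the point is only that a blocking call moved to a worker thread no longer stalls other coroutines such as a heartbeat.

```python
import asyncio
import time


def agent_chat(message: str) -> str:
    """Hypothetical stand-in for the blocking agent call."""
    time.sleep(0.2)  # simulates a slow, synchronous model request
    return f"reply to: {message}"


async def heartbeat(ticks: list) -> None:
    """Record three timestamps; this only advances while the loop is free."""
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def handle_message(message: str):
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks))
    # Run the blocking call in a worker thread so the event loop keeps
    # servicing the heartbeat task in the meantime.
    reply = await asyncio.to_thread(agent_chat, message)
    await beat
    return reply, len(ticks)


reply, tick_count = asyncio.run(handle_message("hello"))
```

Calling agent_chat() directly inside the coroutine would hold the loop for the full duration of the call, which is the kind of stall the commit associates with the Discord heartbeat warnings.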
Alexander Whitestone
7792ae745f feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup

- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
  words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so the text passes
  through JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
  and untrack memory/self/methodology.md (runtime-edited file)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: catch Ollama connection errors in session.py + add 71 smoke tests

- Wrap agent.run() in session.py with try/except so Ollama connection
  failures return a graceful fallback message instead of dumping raw
  tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
  core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
  — catches import errors, template failures, and schema mismatches
  that unit tests miss

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: agentic loop for multi-step tasks + Round 10 regression fixes

Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop

Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI when agno is MagicMock stub

_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI — graceful degradation at route level

When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.

Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
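Note on 7792ae745f: the MagicMock failure mode described above can be sketched as follows. The names call_agent, EchoAgent, and the fallback text are illustrative assumptions, not the project's code; the sketch only shows why an exception handler alone is not enough when the stub returns instead of raising.

```python
from unittest.mock import MagicMock

FALLBACK = "Briefing unavailable."  # hypothetical fallback text


def call_agent(agent, prompt: str) -> str:
    """Return the agent's reply, guaranteed to be a real str."""
    try:
        result = agent.run(prompt)
    except Exception:
        return FALLBACK
    # Some agent results wrap the text in a .content attribute (assumption).
    content = getattr(result, "content", result)
    # A MagicMock stub does not raise; it returns another MagicMock,
    # so the type has to be checked explicitly before saving to SQLite.
    return content if isinstance(content, str) else FALLBACK


class EchoAgent:
    """Hypothetical working agent used for comparison."""

    def run(self, prompt: str) -> str:
        return f"summary of {prompt}"


stub_reply = call_agent(MagicMock(), "today")
real_reply = call_agent(EchoAgent(), "today")
```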
Alexander Whitestone
b8e0f4539f fix: Discord memory bug — add session continuity + 6 memory system fixes (#147)
Discord created a new agent per message with no conversation history,
causing Timmy to lose context between messages (the "yes" bug). Now uses
a singleton agent with per-channel/thread session_id, matching the
dashboard's session.py pattern. Also applies _clean_response() to strip
hallucinated tool-call JSON from Discord output.

Additional fixes:
- get_system_context() no longer clears the handoff file (was destroying
  session context on every agent creation)
- Orchestrator uses HotMemory.read() to auto-create MEMORY.md if missing
- vector_store DB_PATH anchored to __file__ instead of relative CWD
- brain/schema.py: removed invalid .load dot-commands from INIT_SQL
- tools_intro: fixed wrong table name 'vectors' → 'chunks' in tier3 check

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 00:20:38 -05:00
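Note on b8e0f4539f: a rough sketch of the singleton-agent, per-channel/thread session_id pattern the message describes. SessionAgent, get_agent, and session_id_for are hypothetical names, and the history store is a plain dict rather than whatever the real agent uses.

```python
from collections import defaultdict
from typing import Optional


class SessionAgent:
    """Hypothetical agent that keeps one message history per session_id."""

    def __init__(self) -> None:
        self._history = defaultdict(list)

    def run(self, message: str, session_id: str) -> list:
        self._history[session_id].append(message)
        # Return a copy of everything seen so far in this session.
        return list(self._history[session_id])


_agent: Optional[SessionAgent] = None


def get_agent() -> SessionAgent:
    """Create the agent once and reuse it for every incoming message."""
    global _agent
    if _agent is None:
        _agent = SessionAgent()
    return _agent


def session_id_for(channel_id: int, thread_id: Optional[int] = None) -> str:
    """Derive a stable session key from the channel and optional thread."""
    if thread_id is None:
        return f"discord:{channel_id}"
    return f"discord:{channel_id}:{thread_id}"


first = get_agent().run("Shall I continue?", session_id_for(1))
second = get_agent().run("yes", session_id_for(1))
other = get_agent().run("hi", session_id_for(1, thread_id=7))
```

With a fresh agent per message, the second call would see only "yes" and none of the preceding context.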
Alexander Whitestone
4bc53a43f9 fix: Round 4 bug fixes — 8 dashboard bugs + git blocker + Discord regression (#146)
* chore: stop tracking runtime-generated self-modify reports

These 65 files in data/self_modify_reports/ are auto-generated at
runtime and already listed in .gitignore. Tracking them caused
conflicts when pulling from main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 8 dashboard bugs from Round 4 testing report

- Fix Ollama timeout regression: request_timeout → timeout (agno API)
- Add Bootstrap JS to base.html (fixes creative UI tab switching)
- Send initial_state on Swarm Live WebSocket connect
- Add /api/queue/status endpoint (stops 404 log spam from chat panel)
- Populate agent tools from registry on /tools page
- Add notification bell dropdown with /api/notifications endpoint
- All 1157 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add 17 e2e tests covering all Round 4 bug fixes

Covers: /calm 200, /api/queue/status JSON, Bootstrap JS presence,
Swarm Live WebSocket initial_state, agent tools populated on /tools,
/api/notifications endpoint, Ollama timeout param, full task lifecycle,
and smoke test for all 15 dashboard pages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 23:48:20 -05:00
Alexander Whitestone
248af9ed03 fix: dashboard bugs and clean up build artifacts (#145)
* chore: stop tracking runtime-generated self-modify reports

These 65 files in data/self_modify_reports/ are auto-generated at
runtime and already listed in .gitignore. Tracking them caused
conflicts when pulling from main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 8 dashboard bugs from Round 4 testing report

- Fix Ollama timeout regression: request_timeout → timeout (agno API)
- Add Bootstrap JS to base.html (fixes creative UI tab switching)
- Send initial_state on Swarm Live WebSocket connect
- Add /api/queue/status endpoint (stops 404 log spam from chat panel)
- Populate agent tools from registry on /tools page
- Add notification bell dropdown with /api/notifications endpoint
- All 1157 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 23:44:56 -05:00
Alexander Whitestone
e36a1dc939 fix: resolve 6 dashboard bugs and rebuild Task Queue + Work Orders (#144)
Round 2+3 bug fix batch:

1. Ollama timeout: Add request_timeout=300 to prevent socket read errors
   on complex 30-60s prompts (production crash fix)

2. Memory API: Create missing HTMX partial templates (memory_facts.html,
   memory_results.html) so Save/Search buttons work

3. CALM page: Add create_tables() call so SQLAlchemy tables exist on
   first request (was returning HTTP 500)

4. Task Queue: Full SQLite-backed rebuild with CRUD endpoints, HTMX
   partials, and action buttons (approve/veto/pause/cancel/retry)

5. Work Orders: Full SQLite-backed rebuild with submit/approve/reject/
   execute pipeline and HTMX polling partials

6. Memory READ tool: Add memory_read function so Timmy stops calling
   read_file when trying to recall stored facts

Also: Close GitHub issues #115, #114, #112, #110 as won't-fix.
Comment on #107 confirming prune_memories() already wired to startup.

Tests: 33 new tests across 4 test files, all passing.
Full suite: 1155 passed, 2 pre-existing failures (hands_shell).

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 23:21:30 -05:00
Alexander Whitestone
b8164e46b0 fix: remove dead swarm imports, add memory_write tool, and auto-prune on startup (#143)
- Replace dead `from swarm` imports in tools_delegation and tools_intro
  with working implementations sourced from _PERSONAS
- Add `memory_write` tool so the agent can actually persist memories
  when users ask it to remember something
- Enhance `memory_search` to search both vault files AND the runtime
  vector store for cross-channel recall (Discord/web/Telegram)
- Add memory management config: memory_prune_days, memory_prune_keep_facts,
  memory_vault_max_mb
- Auto-prune old vector store entries and warn on vault size at startup
- Update tests for new delegation agent list (mace removed)

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 22:34:30 -05:00
Alexander Whitestone
3bf7187482 Clean up generated files and fix 6 dashboard bugs (#142)
* chore: gitignore local/generated files and remove from tracking

Remove user-specific files (MEMORY.md, user_profile.md, prompts.py)
from source control. Add patterns for credentials, backups, and
generated content to .gitignore.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 6 dashboard bugs — chat, /bugs, /swarm/events, WebSocket, marketplace, sidebar

1. Chat non-functional: CSRF middleware silently blocked HTMX POSTs.
   Added CSRF token transmission via hx-headers in base.html.

2. /bugs → 500: Route missing template vars (total, stats, filter_status).

3. /swarm/events → 500: Called .event_type.value on a plain str
   (SparkEvent.event_type is str, not enum). Also fixed timestamp
   and source field mismatches in the template.

4. WebSocket reconnect loop: No WS endpoint existed at /swarm/live,
   only an HTTP GET. Added @router.websocket("/live") using ws_manager.

5. Marketplace "Agent not found": The nav link to /marketplace/ui matched
   the /{agent_id} catch-all route. Added an explicit /marketplace/ui
   route with enriched template context.

6. Agents sidebar "LOADING...": /swarm/agents/sidebar endpoint was
   missing. Added route returning the existing sidebar partial.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: restore src/timmy/prompts.py to source control

prompts.py is imported by timmy.agent and is production code,
not a user-local file. Re-add to tracking and remove from .gitignore.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 21:26:41 -05:00
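Note on 3bf7187482: bug 3 above (calling .value on a plain str) can be illustrated with a small helper that accepts either form. EventType and event_type_label are hypothetical; the actual fix may simply have dropped the .value access.

```python
from enum import Enum


class EventType(Enum):
    """Hypothetical enum, used only for comparison with plain strings."""

    TASK_CREATED = "task_created"


def event_type_label(event_type) -> str:
    """Return a display label for an Enum member or a plain string."""
    if isinstance(event_type, Enum):
        return str(event_type.value)
    # Plain strings have no .value attribute; use them as they are.
    return str(event_type)


from_enum = event_type_label(EventType.TASK_CREATED)
from_str = event_type_label("task_created")
```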
Alexander Whitestone
b615595100 refactor: centralize config & harden security (#141)
* feat: upgrade primary model from llama3.1:8b to qwen2.5:14b

- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests

qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.

* security: centralize config, harden uploads, fix silent exceptions

- Add 9 pydantic Settings fields (skip_embeddings, disable_csrf,
  rqlite_url, brain_source, brain_db_path, csrf_cookie_secure,
  chat_api_max_body_bytes, timmy_test_mode) to centralize env-var access
- Migrate 8 os.environ.get() calls across 5 source files to use
  `from config import settings` per project convention
- Add path traversal defense-in-depth to file upload endpoint
- Add 1MB request body size limit to chat API
- Make CSRF cookie secure flag configurable via settings
- Replace 2 silent `except: pass` blocks with debug logging in session.py
- Remove unused `import os` from brain/memory.py and csrf.py
- Update 5 CSRF test fixtures to patch settings instead of os.environ

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 18:49:37 -05:00
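Note on b615595100: one common shape for path traversal defense-in-depth on an upload endpoint, sketched with pathlib. The function name and the upload directory are assumptions; the commit does not show the real implementation.

```python
from pathlib import Path


def safe_upload_path(upload_dir: Path, filename: str) -> Path:
    """Map a client-supplied filename to a path inside upload_dir."""
    base = upload_dir.resolve()
    # First layer: drop any directory components the client supplied.
    candidate = (base / Path(filename).name).resolve()
    # Second layer: verify containment after resolution, in case the
    # remaining name is "..", empty, or otherwise escapes the directory.
    if base not in candidate.parents:
        raise ValueError(f"unsafe upload filename: {filename!r}")
    return candidate


uploads = Path("/tmp/uploads")  # hypothetical upload directory
stripped = safe_upload_path(uploads, "../../etc/passwd")

try:
    safe_upload_path(uploads, "..")
    rejected = False
except ValueError:
    rejected = True
```

The first example is reduced to a file named passwd inside the upload directory; the second is rejected outright.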
Alexander Whitestone
cdd3e1a90b feat: upgrade primary model from llama3.1:8b to qwen2.5:14b (#140)
- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests

qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.

Co-authored-by: Trip T <trip@local>
2026-03-07 18:20:34 -05:00
Alexander Whitestone
39f2eb418a Remove stale references from documentation across 9 files (#139) 2026-03-07 07:28:14 -05:00
Alexander Whitestone
480b8d324e security: fix CSRF bypass vulnerabilities via strict path matching and normalization (#138) 2026-03-07 06:45:32 -05:00
Alexander Whitestone
3f06e7231d Improve test coverage from 63.6% to 73.4% and fix test infrastructure (#137) 2026-03-06 13:21:05 -05:00
AlexanderWhitestone
23f744f296 v8: Hermes + Paperclip, Tailscale-only, systemd, backups, UFW lockdown 2026-03-06 12:48:54 -05:00
Alexander Whitestone
3b322d185c feat: add Shell and Git execution hands for Timmy (#136) 2026-03-06 09:01:24 -05:00
Alexander Whitestone
39461858a0 SEC: Fix CSRF bypass via path traversal in exempt routes (#135) 2026-03-06 09:00:56 -05:00
Alexander Whitestone
87dc5eadfe Wire orchestrator pipe into task runner + pipe-verifying integration tests (#134) 2026-03-06 01:20:14 -05:00
AlexanderWhitestone
d10cff333a Add standalone auth-gate.py and nginx config for reference 2026-03-05 23:49:56 -05:00
AlexanderWhitestone
22b0ec1d67 v7: Paperclip only — stripped OpenFang and Obsidian vault 2026-03-05 23:47:46 -05:00
AlexanderWhitestone
9348c29658 v6: local_trusted mode + nginx reverse proxy + cookie auth gate
Key changes from v5:
- Paperclip runs in local_trusted mode on 127.0.0.1:3100 (not 0.0.0.0)
- Nginx reverse proxy on port 80 passes Host:localhost to bypass Vite allowedHosts
- Cookie-based auth gate (Python) — login once, 7-day session cookie
- Zombie process cleanup before start (kills stale node on ports 3100-3110)
- Auto-stops Docker Caddy if it conflicts on port 80
- Persistent secrets file (.secrets) so auth tokens survive restarts
- Added restart command and improved status output with port checks
- Auth credentials configurable via AUTH_USER/AUTH_PASS env vars
2026-03-05 23:34:20 -05:00
AlexanderWhitestone
3f186a1d57 Fix: Automatically allow VPS IP in PAPERCLIP_ALLOWED_HOSTNAMES 2026-03-05 22:19:27 -05:00
AlexanderWhitestone
d7c23d015d Add final VPS-ready setup script with system Postgres and 0.0.0.0 binding 2026-03-05 22:10:33 -05:00
AlexanderWhitestone
bbe975ec54 feat: add TDD setup script and functional tests for Sovereign Agent Stack 2026-03-05 21:54:24 -05:00
Alexander Whitestone
fb97625404 Consolidate architecture: flatten agents, kill Redis/Celery, thin routes (#133) 2026-03-05 20:27:02 -05:00
Alexander Whitestone
2b97da9e9c Add pre-commit hook enforcing 30s test suite time limit (#132) 2026-03-05 19:45:38 -05:00
Alexander Whitestone
aff3edb06a Audit cleanup: security fixes, code reduction, test hygiene (#131) 2026-03-05 18:56:52 -05:00
Alexander Whitestone
e8f1dea3ec Remove unused deps from poetry build, speed test suite to ~16s (#130) 2026-03-05 18:07:59 -05:00
Alexander Whitestone
f2dacf4ee0 Integrate Celery task queue for background task processing (#129) 2026-03-05 12:09:51 -05:00
Alexander Whitestone
b8ff534ad8 Security: Enhance CSRF protection with form field support and stricter validation (#128) 2026-03-05 07:04:30 -05:00
Alexander Whitestone
a18099a06f Fix build issues, implement missing routes, and stabilize e2e tests (#127)
* Fix build issues, implement missing routes, and stabilize e2e tests for production readiness

* Remove accidentally committed log file
2026-03-04 17:31:01 -05:00
AlexanderWhitestone
7d9dc56f11 Remove accidentally committed log file 2026-03-04 17:16:01 -05:00
AlexanderWhitestone
5e8766cef0 Fix build issues, implement missing routes, and stabilize e2e tests for production readiness 2026-03-04 17:15:46 -05:00
Alexander Whitestone
425e7da380 Claude/remove persona system f vgt m (#126)
* Remove persona system, identity, and all Timmy references

Strip the codebase to pure orchestration logic:

- Delete TIMMY_IDENTITY.md and memory/self/identity.md
- Gut brain/identity.py to no-op stubs (empty returns)
- Remove all system prompts reinforcing Timmy's character, faith,
  sovereignty, sign-off ("Sir, affirmative"), and agent roster
- Replace identity-laden prompts with generic local-AI-assistant prompts
- Remove "You work for Timmy" from all sub-agent system prompts
- Rename PersonaTools → AgentTools, PERSONA_TOOLKITS → AGENT_TOOLKITS
- Replace "timmy" agent ID with "orchestrator" across routes, marketplace,
  tools catalog, and orchestrator class
- Strip Timmy references from config comments, templates, telegram bot,
  chat API, and dashboard UI
- Delete tests/brain/test_identity.py entirely
- Fix all test assertions that checked for persona identity content

729 tests pass (2 pre-existing failures in test_calm.py unrelated).

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

* Add Taskosaur (PM + AI task execution) to docker-compose

Spins up Taskosaur alongside the dashboard on `docker compose up`:
- postgres:16-alpine (port 5432, Taskosaur DB)
- redis:7-alpine (Bull queue backend)
- taskosaur (ports 3000 API / 3001 UI)
- dashboard now depends_on taskosaur healthy
- TASKOSAUR_API_URL injected into dashboard environment

Dashboard can reach Taskosaur at http://taskosaur:3000/api on the
internal network. Frontend UI accessible at http://localhost:3001.

https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-04 12:00:49 -05:00
Alexander Whitestone
548a3f980d Test: add input validation tests for form handlers (#125) 2026-03-04 07:58:58 -05:00
Alexander Whitestone
d1f2ae3ed4 Security: fix XSS vulnerabilities in health and grok routes (#124) 2026-03-04 07:58:49 -05:00
Alexander Whitestone
15c7ee5d1e Refactor: migrate inline middleware to dedicated classes (#123) 2026-03-04 07:58:39 -05:00
Alexander Whitestone
eb2b34876b security: implement rate limiting for chat API and fix calm tests (#122) 2026-03-03 08:16:36 -05:00
Alexander Whitestone
584eeb679e Operation Darling Purge: slim to wealth core (-33,783 lines) (#121) 2026-03-02 13:17:38 -05:00
Alexander Whitestone
f694eff0a4 fix: align calm imports with project conventions (#120) 2026-03-02 12:04:30 -05:00
AlexanderWhitestone
d080e67faf feat: Implement Minimum Viable Calm (MVC) feature and initial tests 2026-03-02 11:46:40 -05:00
Alexander Whitestone
62ef1120a4 Memory Unification + Canonical Identity: -11,074 lines of homebrew (#119) 2026-03-02 09:58:07 -05:00
Alexander Whitestone
785440ac31 Security: XSS Prevention in Mission Control Dashboard (#117)
* security: prevent XSS in mission control dashboard by using textContent and DOM manipulation instead of innerHTML

* docs: document XSS prevention decision in DECISIONS.md
2026-03-02 07:31:27 -05:00
Alexander Whitestone
f7c574e0b2 feat: distributed brain architecture with rqlite and local embeddings (#118)
- Add new brain module with rqlite-based distributed memory and task queue
- Implement BrainClient for memory operations (store, recall, search)
- Implement DistributedWorker for continuous task processing
- Add local embeddings via sentence-transformers (all-MiniLM-L6-v2)
  - No OpenAI dependency, runs 100% local on CPU
  - 384-dim embeddings, 80MB model download
- Deprecate persona system (swarm/personas.py, persona_node.py)
- Deprecate hands system (hands/__init__.py, routes)
- Update marketplace, tools, hands routes for brain integration
- Add sentence-transformers and numpy to dependencies
- All changes backward compatible with deprecation warnings

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-03-02 07:31:15 -05:00
Alexander Whitestone
d9bb26b9c5 feat: add security middleware suite - CSRF, security headers, and request logging (#116)
Implements three security middleware components with full test coverage:

- CSRF Protection: Token generation/validation, safe method allowlist,
  auto-exempt webhooks, constant-time comparison for timing attack prevention

- Security Headers: X-Content-Type-Options, X-Frame-Options, CSP,
  Permissions-Policy, Referrer-Policy, HSTS (production)

- Request Logging: Method/path/status/duration logging with correlation IDs,
  configurable path exclusions, X-Forwarded-For support

Also fixes Discord test isolation issue where settings.discord_token
was not being properly reset between tests.

New files:
- src/dashboard/middleware/{csrf,security_headers,request_logging}.py
- tests/dashboard/middleware/test_{csrf,security_headers,request_logging}.py

Addresses design review recommendations R3, R8, R9, R4.

All tests pass: 1950 passed, 40 skipped

Co-authored-by: Alexander Payne <apayne@MM.local>
2026-03-01 20:57:31 -05:00
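Note on d9bb26b9c5: the constant-time comparison mentioned for CSRF tokens is usually done with the standard library, as in this sketch. The function names are illustrative; only secrets.token_urlsafe() and hmac.compare_digest() are real stdlib calls.

```python
import hmac
import secrets


def generate_csrf_token() -> str:
    """Create an unpredictable, URL-safe token."""
    return secrets.token_urlsafe(32)


def tokens_match(cookie_token: str, submitted_token: str) -> bool:
    """Compare two tokens without leaking where they first differ."""
    if not cookie_token or not submitted_token:
        return False
    # compare_digest takes time independent of the position of the first
    # mismatching byte, unlike ==, which can return early.
    return hmac.compare_digest(cookie_token.encode(), submitted_token.encode())


token = generate_csrf_token()
same = tokens_match(token, token)
different = tokens_match(token, generate_csrf_token())
missing = tokens_match(token, "")
```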