- .timmy_gitea_token checked before legacy ~/.config/gitea/token
- Token created for Timmy user (id=2) with write collaborator perms
- .timmy_gitea_token added to .gitignore
1. gitea-mcp → gitea-mcp-server (brew binary name). Fixes Timmy's
Gitea triage — MCP server can now be found on PATH.
2. Mark test_returns_dict_with_expected_keys as @pytest.mark.slow —
it runs pytest recursively and always exceeds the 30s timeout.
3. Fix ruff F841 lint in test_cli.py (unused result= variable).
All `except Exception:` now catch as `except Exception as exc:` with
appropriate logging (warning for critical paths, debug for graceful degradation).
Added logger setup to 4 files that lacked it:
- src/timmy/memory/vector_store.py
- src/dashboard/middleware/csrf.py
- src/dashboard/middleware/security_headers.py
- src/spark/memory.py
31 files changed across timmy core, dashboard, infrastructure, integrations.
Zero bare excepts remain. 1340 tests passing.
- Add ollama_num_ctx setting (default 4096) to config.py
- Pass num_ctx option to Ollama in agent.py and agents/base.py
- Add OLLAMA_NUM_CTX to .env.example with usage docs
- Add context_window note in providers.yaml
- Fix mock_settings in test_agent.py for new attribute
- qwen3:30b with 4096 ctx uses ~19GB vs 45GB default
Move hardcoded model fallback lists from module-level constants into
settings.fallback_models and settings.vision_fallback_models (pydantic
Settings fields). Can now be overridden via env vars
FALLBACK_MODELS / VISION_FALLBACK_MODELS or config/providers.yaml.
Removed:
- OLLAMA_MODEL_PRIMARY / OLLAMA_MODEL_FALLBACK from config.py
- DEFAULT_MODEL_FALLBACKS / VISION_MODEL_FALLBACKS from agent.py
get_effective_ollama_model() and _resolve_model_with_fallback() now
walk the configurable chains instead of hardcoded constants.
5 new tests guard the configurable behavior and prevent regression
to hardcoded constants.
Give Timmy the ability to file Gitea issues when he notices bugs,
stale state, or improvement opportunities in his own codebase.
Components:
- GiteaHand async API client (infrastructure/hands/gitea.py)
- Token auth with ~/.config/gitea/token fallback
- Create/list/close issues, dedup by title similarity
- Graceful degradation when Gitea unreachable
- Tool functions (timmy/tools_gitea.py)
- create_gitea_issue: file issues with dedup + work order bridge
- list_gitea_issues: check existing backlog
- Classified as SAFE (no confirmation needed)
- Thinking post-hook (_maybe_file_issues in thinking.py)
- Every 20 thoughts, LLM classifies recent thoughts for actionable items
- Auto-files bugs/improvements to Gitea with dedup
- Bridges to local work order system for dashboard tracking
- Config: gitea_url, gitea_token, gitea_repo, gitea_enabled,
gitea_timeout, thinking_issue_every
All 1426 tests pass, 74.17% coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consolidates 3 separate memory databases (semantic_memory.db, swarm.db
memory_entries, brain.db) into a single data/memory.db with facts,
chunks, and episodes tables.
Key changes:
- Add unified schema (timmy/memory/unified.py) with 3 core tables
- Redirect vector_store.py and semantic_memory.py to memory.db
- Add thought distillation: every Nth thought extracts lasting facts
- Enrich agent context with known facts in system prompt
- Add memory_forget tool for removing outdated memories
- Unify embeddings: vector_store delegates to semantic_memory.embed_text
- Bridge spark events to unified event log
- Add pruning for thoughts and events with configurable retention
- Add data migration script (timmy/memory_migrate.py)
- Deprecate brain.memory in favor of unified system
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Model upgrade:
- qwen2.5:14b → qwen3.5:latest across config, tools, and docs
- Added qwen3.5 to multimodal model registry
Self-hosted Gitea CI:
- .gitea/workflows/tests.yml: lint + test jobs via act_runner
- Unified Dockerfile: pre-baked deps from poetry.lock for fast CI
- sitepackages=true in tox for ~2s dep resolution (was ~40s)
- OLLAMA_URL set to dead port in CI to prevent real LLM calls
Test isolation fixes:
- Smoke test fixture mocks create_timmy (was hitting real Ollama)
- WebSocket sends initial_state before joining broadcast pool (race fix)
- Tests use settings.ollama_model/url instead of hardcoded values
- skip_ci marker for Ollama-dependent tests, excluded in CI tox envs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* polish: streamline nav, extract inline styles, improve tablet UX
- Restructure desktop nav from 8+ flat links + overflow dropdown into
5 grouped dropdowns (Core, Agents, Intel, System, More) matching
the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* fix(security): move auth-gate credentials to environment variables
Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
* refactor(tooling): migrate from black+isort+bandit to ruff
Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.
https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h
---------
Co-authored-by: Claude <noreply@anthropic.com>
* fix: reserve red for real errors, reduce log noise, allow Tailscale access
- Add _ColorFormatter: red = ERROR/CRITICAL only, yellow = WARNING, green = INFO
- Override uvicorn's default colors to use our scheme
- Downgrade discord "not installed" from ERROR to WARNING (optional dep)
- Downgrade DuckDuckGo unavailable from INFO to DEBUG
- Stop discord token watcher retry loop when discord.py not installed
- Add configurable trusted_hosts setting; dev mode allows all hosts
- Exclude .claude/ from uvicorn reload watcher (worktree isolation)
- Fix pre-commit hook: use tox -e unit, bump timeout to 60s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: auto-format with black
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: pre-commit hook auto-formats with black+isort before testing
Formatting should never block a commit — just fix it automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Replace the homebrew regex-based tool extraction and manual dispatch
(tool_executor.py) with Agno's built-in Human-In-The-Loop confirmation:
- Toolkit(requires_confirmation_tools=...) marks dangerous tools
- agent.run() returns RunOutput with status=paused when confirmation needed
- RunRequirement.confirm()/reject() + agent.continue_run() resumes execution
Dashboard and Discord vendor both use the native flow. DuckDuckGo import
isolated so its absence doesn't kill all tools. Test stubs cleaned up
(agno is a real dependency, only truly optional packages stubbed).
1384 tests pass in parallel (~14s).
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup
- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
and untrack memory/self/methodology.md (runtime-edited file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: catch Ollama connection errors in session.py + add 71 smoke tests
- Wrap agent.run() in session.py with try/except so Ollama connection
failures return a graceful fallback message instead of dumping raw
tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
— catches import errors, template failures, and schema mismatches
that unit tests miss
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: agentic loop for multi-step tasks + Round 10 regression fixes
Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop
Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI when agno is MagicMock stub
_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: briefing route 500 in CI — graceful degradation at route level
When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.
Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Replace dead `from swarm` imports in tools_delegation and tools_intro
with working implementations sourced from _PERSONAS
- Add `memory_write` tool so the agent can actually persist memories
when users ask it to remember something
- Enhance `memory_search` to search both vault files AND the runtime
vector store for cross-channel recall (Discord/web/Telegram)
- Add memory management config: memory_prune_days, memory_prune_keep_facts,
memory_vault_max_mb
- Auto-prune old vector store entries and warn on vault size at startup
- Update tests for new delegation agent list (mace removed)
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: upgrade primary model from llama3.1:8b to qwen2.5:14b
- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests
qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.
* security: centralize config, harden uploads, fix silent exceptions
- Add 9 pydantic Settings fields (skip_embeddings, disable_csrf,
rqlite_url, brain_source, brain_db_path, csrf_cookie_secure,
chat_api_max_body_bytes, timmy_test_mode) to centralize env-var access
- Migrate 8 os.environ.get() calls across 5 source files to use
`from config import settings` per project convention
- Add path traversal defense-in-depth to file upload endpoint
- Add 1MB request body size limit to chat API
- Make CSRF cookie secure flag configurable via settings
- Replace 2 silent `except: pass` blocks with debug logging in session.py
- Remove unused `import os` from brain/memory.py and csrf.py
- Update 5 CSRF test fixtures to patch settings instead of os.environ
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests
qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.
Co-authored-by: Trip T <trip@local>
* Remove persona system, identity, and all Timmy references
Strip the codebase to pure orchestration logic:
- Delete TIMMY_IDENTITY.md and memory/self/identity.md
- Gut brain/identity.py to no-op stubs (empty returns)
- Remove all system prompts reinforcing Timmy's character, faith,
sovereignty, sign-off ("Sir, affirmative"), and agent roster
- Replace identity-laden prompts with generic local-AI-assistant prompts
- Remove "You work for Timmy" from all sub-agent system prompts
- Rename PersonaTools → AgentTools, PERSONA_TOOLKITS → AGENT_TOOLKITS
- Replace "timmy" agent ID with "orchestrator" across routes, marketplace,
tools catalog, and orchestrator class
- Strip Timmy references from config comments, templates, telegram bot,
chat API, and dashboard UI
- Delete tests/brain/test_identity.py entirely
- Fix all test assertions that checked for persona identity content
729 tests pass (2 pre-existing failures in test_calm.py unrelated).
https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy
* Add Taskosaur (PM + AI task execution) to docker-compose
Spins up Taskosaur alongside the dashboard on `docker compose up`:
- postgres:16-alpine (port 5432, Taskosaur DB)
- redis:7-alpine (Bull queue backend)
- taskosaur (ports 3000 API / 3001 UI)
- dashboard now depends_on taskosaur healthy
- TASKOSAUR_API_URL injected into dashboard environment
Dashboard can reach Taskosaur at http://taskosaur:3000/api on the
internal network. Frontend UI accessible at http://localhost:3001.
https://claude.ai/code/session_01LjQGUE6nk9W9674zaxrYxy
---------
Co-authored-by: Claude <noreply@anthropic.com>
Errors and uncaught exceptions are now automatically captured, deduplicated,
persisted to a rotating log file, and filed as bug report tasks in the
existing task queue — giving Timmy a sovereign, local issue tracker with
zero new dependencies.
- Add RotatingFileHandler writing errors to logs/errors.log (5MB rotate, 5 backups)
- Add error capture module with stack-trace hashing and 5-min dedup window
- Add FastAPI exception middleware + global exception handler
- Instrument all background loops (briefing, thinking, task processor) with capture_error()
- Extend task queue with bug_report task type and auto-approve rule
- Fix auto-approve type matching (was ignoring task_type field entirely)
- Add /bugs dashboard page and /api/bugs JSON endpoints
- Add ERROR_CAPTURED and BUG_REPORT_CREATED event types for real-time feed
- Add BUGS nav link to desktop and mobile navigation
- Add 16 tests covering error capture, deduplication, and bug report routes
Co-authored-by: Alexander Payne <apayne@MM.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* test: remove hardcoded sleeps, add pytest-timeout
- Replace fixed time.sleep() calls with intelligent polling or WebDriverWait
- Add pytest-timeout dependency and --timeout=30 to prevent hangs
- Fixes test flakiness and improves test suite speed
* feat: add Aider AI tool to Forge's toolkit
- Add Aider tool that calls local Ollama (qwen2.5:14b) for AI coding assist
- Register tool in Forge's code toolkit
- Add functional tests for the Aider tool
* config: add opencode.json with local Ollama provider for sovereign AI
* feat: Timmy fixes and improvements
## Bug Fixes
- Fix read_file path resolution: add ~ expansion, proper relative path handling
- Add repo_root to config.py with auto-detection from .git location
- Fix hardcoded llama3.2 - now dynamic from settings.ollama_model
## Timmy's Requests
- Add communication protocol to AGENTS.md (read context first, explain changes)
- Create DECISIONS.md for architectural decision documentation
- Add reasoning guidance to system prompts (step-by-step, state uncertainty)
- Update tests to reflect correct model name (llama3.1:8b-instruct)
## Testing
- All 177 dashboard tests pass
- All 32 prompt/tool tests pass
---------
Co-authored-by: Alexander Payne <apayne@MM.local>
Inspired by OpenClaw-RL's multi-model orchestration, this adds four
features for custom model management:
1. Custom model registry (infrastructure/models/registry.py) — SQLite-backed
registry for GGUF, safetensors, HF checkpoint, and Ollama models with
role-based lookups (general, reward, teacher, judge).
2. Per-agent model assignment — each swarm persona can use a different model
instead of sharing the global default. Resolved via registry assignment >
persona default > global default.
3. Runtime model management API (/api/v1/models) — REST endpoints to register,
list, assign, enable/disable, and remove custom models without restart.
Includes a dashboard page at /models.
4. Reward model scoring (PRM-style) — majority-vote quality evaluation of
agent outputs using a configurable reward model. Scores persist in SQLite
and feed into the swarm learner.
New config settings: custom_weights_dir, reward_model_enabled,
reward_model_name, reward_model_votes.
54 new tests covering registry CRUD, API endpoints, agent assignments,
role lookups, and reward scoring.
https://claude.ai/code/session_01V4iTozMwcE2gjfnCJdCugC