- Add src/timmy/workspace.py: WorkspaceMonitor tracks correspondence.md
line count and inbox file list via data/workspace_state.json
- Wire workspace checks into _gather_system_snapshot() so Timmy sees
new workspace activity in his thinking context
- Add 'workspace' seed type for workspace-triggered reflections
- Add _check_workspace() post-hook to mark items as seen after processing
- 16 tests covering detection, mark_seen, persistence, edge cases
- .timmy_gitea_token checked before legacy ~/.config/gitea/token
- Token created for Timmy user (id=2) with write collaborator perms
- .timmy_gitea_token added to .gitignore
1. gitea-mcp → gitea-mcp-server (brew binary name). Fixes Timmy's
Gitea triage — MCP server can now be found on PATH.
2. Mark test_returns_dict_with_expected_keys as @pytest.mark.slow —
it runs pytest recursively and always exceeds the 30s timeout.
3. Fix ruff F841 lint in test_cli.py (unused result= variable).
Three optimizations to the agentic loop:
1. Cache loop agent as singleton (avoid repeated warmups)
2. Sliding window for step context (last 2 results, not all)
3. Replace summary LLM call with deterministic summary
Saves 1 full LLM inference call per agentic loop invocation
(30-60s on local models) and reduces context window pressure.
Also fixes pre-existing test_cli.py repl test bugs (missing result= assignment).
Timmy can now delegate coding tasks to Kimi CLI (262K context).
Includes timeout handling, workdir validation, output truncation.
Sovereign division of labor — Timmy plans, Kimi codes.
Timmy can now run his own test suite via the run_self_tests() tool.
Supports 'fast' (unit only), 'full', or specific path scopes.
Returns structured results with pass/fail counts.
Sovereign self-verification — a fundamental capability.
- Replace flat file list with layered architecture map (config→agent→prompt→tool→memory→interface)
- Add SELF-MODIFICATION section: Timmy knows he can edit his own config and code
- Remove false limitation 'cannot modify own source code'
- Update tests to match new section headers, add self-modification tests
Closes#81 (reasoning depth)
Closes#86 (self-modification awareness)
[loop-cycle-11]
_maybe_consolidate() now checks get_memories(subject=agent_id)
before storing. Skips if a memory of the same type (pattern/anomaly)
was created within the last hour. Prevents duplicate consolidation
entries on repeated task completion/failure events.
Also restructured branching: neutral success rates (0.3-0.8) now
return early instead of falling through.
9 new tests. 1465 total passing.
Adds SELF-KNOWLEDGE section to both SYSTEM_PROMPT_LITE and
SYSTEM_PROMPT_FULL with:
- Codebase map (all src/timmy/ modules with descriptions)
- Current capabilities list (grounded, not generic)
- Known limitations (real gaps, not LLM platitudes)
Lite prompt gets condensed version; full prompt gets detailed.
Timmy can now answer 'what does tool_safety.py do?' and give
grounded answers about his actual limitations.
10 new tests. 1456 total passing.
Changed message param from str to list[str] in chat() and route() commands.
Words are joined with spaces, so 'timmy chat hello how are you' works without
quoting. Single-word messages still work as before.
- chat(): message: list[str], joined to full_message
- route(): message: list[str], joined to full_message
- 7 new tests in test_cli_multiword.py
Closes#26
Timmy can now introspect which session he's running in (cli, dashboard, loop).
- Add {session_id} placeholder to both lite and full system prompts
- get_system_prompt() accepts session_id param (default: 'unknown')
- create_timmy() accepts session_id param, forwards to prompt
- CLI chat/think/status pass their session_id to create_timmy()
- session.py passes _DEFAULT_SESSION_ID to create_timmy()
- 7 new tests in test_session_identity.py
- Updated 2 existing CLI test mocks
Closes#64
- Replace substring matching with word-boundary regex in route_request()
- "fix the bug" now correctly routes to coder
- Multi-word patterns match if all words appear (any order)
- Add "timmy route" CLI command for debugging routing
- Add route_request_with_match() for pattern visibility
- Expand routing keywords in agents.yaml
- 22 new routing tests, all passing
All `except Exception:` now catch as `except Exception as exc:` with
appropriate logging (warning for critical paths, debug for graceful degradation).
Added logger setup to 4 files that lacked it:
- src/timmy/memory/vector_store.py
- src/dashboard/middleware/csrf.py
- src/dashboard/middleware/security_headers.py
- src/spark/memory.py
31 files changed across timmy core, dashboard, infrastructure, integrations.
Zero bare excepts remain. 1340 tests passing.
- Add ollama_num_ctx setting (default 4096) to config.py
- Pass num_ctx option to Ollama in agent.py and agents/base.py
- Add OLLAMA_NUM_CTX to .env.example with usage docs
- Add context_window note in providers.yaml
- Fix mock_settings in test_agent.py for new attribute
- qwen3:30b with 4096 ctx uses ~19GB vs 45GB default
- BaseAgent.run(): catch httpx.ConnectError/ReadError/ConnectionError,
log 'Ollama disconnected: <error>' at ERROR level, then re-raise
- session.py: distinguish Ollama disconnects from other errors in
chat(), chat_with_tools(), continue_chat() — return specific message
'Ollama appears to be disconnected' instead of generic error
- 11 new tests covering all disconnect paths
Add _warmup_model() that sends a minimal generation request (1 token)
before returning the Agent. 60s timeout handles cold VRAM loads.
Warns but does not abort if warmup fails.
Closes#82
Remove DuckDuckGoTools import, all web_search registrations across 4 toolkit
factories, catalog entry, safety classification, prompt references, and
session regex. Total: -41 lines of dead code.
consult_grok is functional (grok_enabled=True, API key set) and opt-in,
so it stays — but Timmy never calls it autonomously, which is correct
sovereign behavior (no cloud calls unless user permits).
Closes#87
_get_ollama_model() used prefix match (startswith) on /api/tags,
causing qwen3:30b to match qwen3.5:latest. Now:
1. Queries /api/ps (loaded models) first — most accurate
2. Falls back to /api/tags with exact name match
3. Reports actual running model, not just configured one
Updated test_get_system_info_contains_model to not assume model==config.
Fixes#77. 5 regression tests added.
Add config/allowlist.yaml — YAML-driven gate that auto-approves bounded
tool calls when no human is present.
When Timmy runs with --autonomous or stdin is not a terminal, tool calls
are checked against allowlist: matched → auto-approved, else → rejected.
Changes:
- config/allowlist.yaml: shell prefixes, deny patterns, path rules
- tool_safety.py: is_allowlisted() checks tools against YAML rules
- cli.py: --autonomous flag, _is_interactive() detection
- 44 new allowlist tests, 8 updated CLI tests
Closes#69
Move hardcoded model fallback lists from module-level constants into
settings.fallback_models and settings.vision_fallback_models (pydantic
Settings fields). Can now be overridden via env vars
FALLBACK_MODELS / VISION_FALLBACK_MODELS or config/providers.yaml.
Removed:
- OLLAMA_MODEL_PRIMARY / OLLAMA_MODEL_FALLBACK from config.py
- DEFAULT_MODEL_FALLBACKS / VISION_MODEL_FALLBACKS from agent.py
get_effective_ollama_model() and _resolve_model_with_fallback() now
walk the configurable chains instead of hardcoded constants.
5 new tests guard the configurable behavior and prevent regression
to hardcoded constants.
Closes#71: Timmy was responding with elaborate markdown formatting
(tables, headers, emoji, bullet lists) for simple questions.
Root causes fixed:
1. Agno Agent markdown=True flag explicitly told the model to format
responses as markdown. Set to False in both agent.py and agents/base.py.
2. SYSTEM_PROMPT_FULL used ## and ### markdown headers, bold (**), and
numbered lists — teaching by example that markdown is expected.
Rewritten to plain text with labeled sections.
3. Brevity instructions were buried at the bottom of the full prompt.
Moved to immediately after the opening line as 'VOICE AND BREVITY'
with explicit override priority.
4. Orchestrator prompt in agents.yaml was silent on response style.
Added 'Voice: brief, plain, direct' with concrete examples.
The full prompt is now 41 lines shorter (124 → 83). The prompt itself
practices the brevity it preaches.
SOUL.md alignment:
- 'Brevity is a kindness' — now front-loaded in both base and agent prompt
- 'I do not fill silence with noise' — explicit in both tiers
- 'I speak plainly. I prefer short sentences.' — structural enforcement
4 new tests guard against regression:
- test_full_prompt_brevity_first: brevity section before tools/memory
- test_full_prompt_no_markdown_headers: no ## or ### in prompt text
- test_full_prompt_plain_text_brevity: 'plain text' instruction present
- test_lite_prompt_brevity: lite tier also instructs brevity
Add post-generation similarity check to ThinkingEngine.think_once().
Problem: Timmy's thinking engine generates repetitive thoughts because
small local models ignore 'don't repeat' instructions in the prompt.
The same observation ('still no chat messages', 'Alexander's name is in
profile') would appear 14+ times in a single day's journal.
Fix: After generating a thought, compare it against the last 5 thoughts
using SequenceMatcher. If similarity >= 0.6, retry with a new seed up to
2 times. If all retries produce repetitive content, discard rather than
store. Uses stdlib difflib — no new dependencies.
Changes:
- thinking.py: Add _is_too_similar() method with SequenceMatcher
- thinking.py: Wrap generation in retry loop with dedup check
- test_thinking.py: 7 new tests covering exact match, near match,
different thoughts, retry behavior, and max-retry discard
+96/-20 lines in thinking.py, +87 lines in tests.
Replace in-memory MessageLog with SQLite-backed implementation.
Same API surface (append/all/clear/len) so zero caller changes needed.
- data/chat.db stores messages with role, content, timestamp, source
- Lazy DB connection (opened on first use, not at import time)
- Retention policy: oldest messages pruned when count > 500
- New .recent(limit) method for efficient last-N queries
- Thread-safe with explicit locking
- WAL mode for concurrent read performance
- Test isolation: conftest redirects DB to tmp_path per test
- 8 new tests: persistence, retention, concurrency, source field
Closes#46
Allows specifying a named session for conversation persistence.
Use cases:
- Autonomous loops can have their own session (e.g. --session-id loop)
- Multiple users/agents can maintain separate conversations
- Testing different conversation threads without polluting the default
Precedence: --session-id > --new > default 'cli' session
Fixes#52
- Replace eval() in calculator() with _safe_eval() that walks the AST
and only permits: numeric constants, arithmetic ops (+,-,*,/,//,%,**),
unary +/-, math module access, and whitelisted builtins (abs, round,
min, max)
- Reject all other syntax: imports, attribute access on non-math objects,
lambdas, comprehensions, string literals, etc.
- Add 39 tests covering arithmetic, precedence, math functions,
allowed builtins, error handling, and 14 injection prevention cases