- BaseAgent.run(): catch httpx.ConnectError/ReadError/ConnectionError,
log 'Ollama disconnected: <error>' at ERROR level, then re-raise
- session.py: distinguish Ollama disconnects from other errors in
chat(), chat_with_tools(), continue_chat() — return specific message
'Ollama appears to be disconnected' instead of generic error
- 11 new tests covering all disconnect paths
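The catch-log-reraise pattern might look roughly like this (the helper name is illustrative; the real code also catches httpx.ConnectError and httpx.ReadError, omitted here to keep the sketch dependency-free):

```python
import logging

logger = logging.getLogger(__name__)

def run_with_disconnect_logging(call, disconnect_excs=(ConnectionError,)):
    # In BaseAgent.run() the tuple would also include httpx.ConnectError/ReadError.
    try:
        return call()
    except disconnect_excs as exc:
        logger.error("Ollama disconnected: %s", exc)
        raise  # re-raise so callers (session.py) can map it to a user-facing message
```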
Wrap numpy and voice_loop imports in try/except with pytestmark skipif.
Tests skip cleanly instead of raising ImportError when numpy is not in dev deps.
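A sketch of the guarded-import header (the flag name is illustrative):

```python
import pytest

# Guard the optional dependency; voice tests need numpy, dev installs may not.
try:
    import numpy  # noqa: F401
    HAS_NUMPY = True
except ImportError:
    HAS_NUMPY = False

# Skip every test in this module when the dependency is absent.
pytestmark = pytest.mark.skipif(not HAS_NUMPY, reason="numpy not installed")
```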
Closes #48
Add _warmup_model() that sends a minimal generation request (1 token)
before returning the Agent. 60s timeout handles cold VRAM loads.
Warns but does not abort if warmup fails.
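A rough sketch of that warmup call against Ollama's /api/generate endpoint (helper names and the prompt text are assumptions; stdlib urllib stands in for whatever HTTP client the code uses):

```python
import json
import urllib.request

WARMUP_TIMEOUT = 60  # seconds; cold VRAM loads can be slow

def build_warmup_payload(model):
    # Minimal generation: a single token, non-streaming.
    return {"model": model, "prompt": "hi", "stream": False,
            "options": {"num_predict": 1}}

def warmup_model(model, base_url="http://localhost:11434"):
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_warmup_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        urllib.request.urlopen(req, timeout=WARMUP_TIMEOUT)
    except OSError as exc:
        # Warn but do not abort: the agent is still usable, just slower on first call.
        print(f"warning: model warmup failed: {exc}")
```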
Closes #82
Remove DuckDuckGoTools import, all web_search registrations across 4 toolkit
factories, catalog entry, safety classification, prompt references, and
session regex. Total: -41 lines of dead code.
consult_grok is functional (grok_enabled=True, API key set) and opt-in,
so it stays — but Timmy never calls it autonomously, which is correct
sovereign behavior (no cloud calls unless user permits).
Closes #87
_get_ollama_model() used prefix match (startswith) on /api/tags,
causing qwen3:30b to match qwen3.5:latest. Now:
1. Queries /api/ps (loaded models) first — most accurate
2. Falls back to /api/tags with exact name match
3. Reports actual running model, not just configured one
Updated test_get_system_info_contains_model to not assume model==config.
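A minimal sketch of the new resolution order (function and parameter names assumed):

```python
def resolve_model(configured, loaded, available):
    # loaded: model names from /api/ps; available: names from /api/tags.
    # 1. Prefer whatever is actually running.
    if loaded:
        return loaded[0]
    # 2. Fall back to the catalog, but only on an exact name match --
    #    startswith let qwen3:30b shadow-match qwen3.5:latest.
    if configured in available:
        return configured
    return None
```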
Fixes #77. 5 regression tests added.
Add config/allowlist.yaml — YAML-driven gate that auto-approves bounded
tool calls when no human is present.
When Timmy runs with --autonomous or stdin is not a terminal, tool calls
are checked against allowlist: matched → auto-approved, else → rejected.
Changes:
- config/allowlist.yaml: shell prefixes, deny patterns, path rules
- tool_safety.py: is_allowlisted() checks tools against YAML rules
- cli.py: --autonomous flag, _is_interactive() detection
- 44 new allowlist tests, 8 updated CLI tests
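The gate could be sketched roughly like this (rule keys mirror the shape described for config/allowlist.yaml but are assumptions; a dict literal stands in for the YAML load):

```python
import fnmatch

# Illustrative rules; the real ones live in config/allowlist.yaml.
RULES = {
    "shell_prefixes": ["git status", "ls"],
    "deny_patterns": ["*rm -rf*", "*sudo*"],
}

def is_allowlisted(tool, args):
    if tool != "shell":
        return False
    cmd = args.get("command", "")
    # Deny patterns win over any prefix match.
    if any(fnmatch.fnmatch(cmd, pat) for pat in RULES["deny_patterns"]):
        return False
    return any(cmd.startswith(prefix) for prefix in RULES["shell_prefixes"])
```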
Closes #69
Move hardcoded model fallback lists from module-level constants into
settings.fallback_models and settings.vision_fallback_models (pydantic
Settings fields). Can now be overridden via env vars
FALLBACK_MODELS / VISION_FALLBACK_MODELS or config/providers.yaml.
Removed:
- OLLAMA_MODEL_PRIMARY / OLLAMA_MODEL_FALLBACK from config.py
- DEFAULT_MODEL_FALLBACKS / VISION_MODEL_FALLBACKS from agent.py
get_effective_ollama_model() and _resolve_model_with_fallback() now
walk the configurable chains instead of hardcoded constants.
5 new tests guard the configurable behavior and prevent regression
to hardcoded constants.
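The fallback walk might look roughly like this (a plain list stands in for settings.fallback_models; names are illustrative):

```python
def resolve_model_with_fallback(configured, fallbacks, available):
    # Try the configured model first, then walk the configurable chain in order.
    for name in [configured, *fallbacks]:
        if name in available:
            return name
    raise RuntimeError("no model from the fallback chain is available")
```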
Closes #71: Timmy was responding with elaborate markdown formatting
(tables, headers, emoji, bullet lists) for simple questions.
Root causes fixed:
1. Agno Agent markdown=True flag explicitly told the model to format
responses as markdown. Set to False in both agent.py and agents/base.py.
2. SYSTEM_PROMPT_FULL used ## and ### markdown headers, bold (**), and
numbered lists — teaching by example that markdown is expected.
Rewritten to plain text with labeled sections.
3. Brevity instructions were buried at the bottom of the full prompt.
Moved to immediately after the opening line as 'VOICE AND BREVITY'
with explicit override priority.
4. Orchestrator prompt in agents.yaml was silent on response style.
Added 'Voice: brief, plain, direct' with concrete examples.
The full prompt is now 41 lines shorter (124 → 83). The prompt itself
practices the brevity it preaches.
SOUL.md alignment:
- 'Brevity is a kindness' — now front-loaded in both base and agent prompt
- 'I do not fill silence with noise' — explicit in both tiers
- 'I speak plainly. I prefer short sentences.' — structural enforcement
4 new tests guard against regression:
- test_full_prompt_brevity_first: brevity section before tools/memory
- test_full_prompt_no_markdown_headers: no ## or ### in prompt text
- test_full_prompt_plain_text_brevity: 'plain text' instruction present
- test_lite_prompt_brevity: lite tier also instructs brevity
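The header and ordering guards might look like this (the prompt text here is a stand-in for the real constant):

```python
SAMPLE_PROMPT = """You are Timmy.
VOICE AND BREVITY: Speak plainly, in plain text. Short sentences.
TOOLS: ...
MEMORY: ...
"""

def has_markdown_headers(prompt):
    # Fails if any line reintroduces ## or ### markdown headers.
    return any(line.startswith(("## ", "### ")) for line in prompt.splitlines())

def brevity_before_tools(prompt):
    # Brevity section must precede the tools section.
    return prompt.index("VOICE AND BREVITY") < prompt.index("TOOLS")
```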
Add post-generation similarity check to ThinkingEngine.think_once().
Problem: Timmy's thinking engine generates repetitive thoughts because
small local models ignore 'don't repeat' instructions in the prompt.
The same observation ('still no chat messages', 'Alexander's name is in
profile') would appear 14+ times in a single day's journal.
Fix: After generating a thought, compare it against the last 5 thoughts
using SequenceMatcher. If similarity >= 0.6, retry with a new seed up to
2 times. If all retries produce repetitive content, discard rather than
store. Uses stdlib difflib — no new dependencies.
Changes:
- thinking.py: Add _is_too_similar() method with SequenceMatcher
- thinking.py: Wrap generation in retry loop with dedup check
- test_thinking.py: 7 new tests covering exact match, near match,
different thoughts, retry behavior, and max-retry discard
+96/-20 lines in thinking.py, +87 lines in tests.
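The similarity check can be sketched as follows (threshold and window match the values above; the function name mirrors the private method):

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.6
HISTORY_WINDOW = 5

def is_too_similar(thought, recent):
    # Compare the new thought against the last few stored ones.
    # stdlib difflib only -- no new dependencies.
    return any(
        SequenceMatcher(None, thought.lower(), prior.lower()).ratio()
        >= SIMILARITY_THRESHOLD
        for prior in recent[-HISTORY_WINDOW:]
    )
```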
Replace in-memory MessageLog with SQLite-backed implementation.
Same API surface (append/all/clear/len) so zero caller changes needed.
- data/chat.db stores messages with role, content, timestamp, source
- Lazy DB connection (opened on first use, not at import time)
- Retention policy: oldest messages pruned when count > 500
- New .recent(limit) method for efficient last-N queries
- Thread-safe with explicit locking
- WAL mode for concurrent read performance
- Test isolation: conftest redirects DB to tmp_path per test
- 8 new tests: persistence, retention, concurrency, source field
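A condensed sketch of the SQLite-backed log (schema and field names are assumptions; an in-memory database stands in for data/chat.db):

```python
import sqlite3
import threading

class MessageLog:
    MAX_MESSAGES = 500  # retention cap: oldest rows pruned beyond this

    def __init__(self, path=":memory:"):
        self._path = path
        self._conn = None  # lazy: opened on first use, not at import time
        self._lock = threading.Lock()

    def _db(self):
        if self._conn is None:
            self._conn = sqlite3.connect(self._path, check_same_thread=False)
            self._conn.execute("PRAGMA journal_mode=WAL")
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY, role TEXT, content TEXT, "
                "ts REAL, source TEXT)"
            )
        return self._conn

    def append(self, role, content, ts=0.0, source="cli"):
        with self._lock:
            db = self._db()
            db.execute(
                "INSERT INTO messages (role, content, ts, source) VALUES (?,?,?,?)",
                (role, content, ts, source),
            )
            # Retention: keep only the newest MAX_MESSAGES rows.
            db.execute(
                "DELETE FROM messages WHERE id NOT IN "
                "(SELECT id FROM messages ORDER BY id DESC LIMIT ?)",
                (self.MAX_MESSAGES,),
            )
            db.commit()

    def recent(self, limit):
        with self._lock:
            rows = self._db().execute(
                "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?",
                (limit,),
            ).fetchall()
        return rows[::-1]  # oldest-first, like the in-memory log

    def __len__(self):
        with self._lock:
            return self._db().execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```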
Closes #46
Add a --session-id flag for specifying a named session for conversation persistence.
Use cases:
- Autonomous loops can have their own session (e.g. --session-id loop)
- Multiple users/agents can maintain separate conversations
- Testing different conversation threads without polluting the default
Precedence: --session-id > --new > default 'cli' session
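A hypothetical resolver for that precedence (the real CLI wiring may differ; the --new behavior shown is an assumption):

```python
import uuid

def resolve_session_id(session_id=None, new=False):
    # Precedence: explicit --session-id > --new > default 'cli' session.
    if session_id:
        return session_id
    if new:
        return uuid.uuid4().hex  # assumption: --new mints a fresh session id
    return "cli"
```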
Fixes #52
- Replace eval() in calculator() with _safe_eval() that walks the AST
and only permits: numeric constants, arithmetic ops (+,-,*,/,//,%,**),
unary +/-, math module access, and whitelisted builtins (abs, round,
min, max)
- Reject all other syntax: imports, attribute access on non-math objects,
lambdas, comprehensions, string literals, etc.
- Add 39 tests covering arithmetic, precedence, math functions,
allowed builtins, error handling, and 14 injection prevention cases
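A minimal sketch of the AST-walking evaluator (the real _safe_eval may differ in detail):

```python
import ast
import math
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
            ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
            ast.Mod: operator.mod, ast.Pow: operator.pow}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_BUILTINS = {"abs": abs, "round": round, "min": min, "max": max}

def safe_eval(expr):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value  # numeric literals only -- no strings
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Call) and not node.keywords:
            func = node.func
            if isinstance(func, ast.Name) and func.id in _BUILTINS:
                return _BUILTINS[func.id](*[walk(a) for a in node.args])
            if (isinstance(func, ast.Attribute)
                    and isinstance(func.value, ast.Name) and func.value.id == "math"):
                return getattr(math, func.attr)(*[walk(a) for a in node.args])
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name) and node.value.id == "math"):
            return getattr(math, node.attr)  # constants like math.pi
        # Everything else -- imports, other attributes, lambdas, strings -- is rejected.
        raise ValueError("disallowed expression")
    return walk(ast.parse(expr, mode="eval"))
```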
Three fixes from real-world testing:
1. Event loop: replaced asyncio.run() with a persistent loop so
Agno's MCP sessions survive across conversation turns. No more
'Event loop is closed' errors on turn 2+.
2. Markdown stripping: voice preamble tells Timmy to respond in
natural spoken language, plus _strip_markdown() as a safety net
removes **bold**, *italic*, bullets, headers, code fences, etc.
TTS no longer reads 'asterisk asterisk'.
3. MCP noise: _suppress_mcp_noise() quiets mcp/agno/httpx loggers
during voice mode so the terminal shows clean transcript only.
32 tests (12 new for markdown stripping + persistent loop).
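The markdown safety net for TTS could look roughly like this (regex details are illustrative, not the exact _strip_markdown implementation):

```python
import re

def strip_markdown(text):
    # Safety net before TTS: drop common markdown so the voice never
    # reads 'asterisk asterisk' aloud.
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)         # code fences
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)     # headers
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                   # bold
    text = re.sub(r"\*(.+?)\*", r"\1", text)                       # italic
    text = re.sub(r"^\s*[-*+]\s+", "", text, flags=re.MULTILINE)   # bullets
    return text.strip()
```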
Replace repeated asyncio.run() calls with a single event loop that
persists across all interview questions. The old approach created and
destroyed loops per question, orphaning MCP stdio transports and
causing 'Event loop is closed' errors on ~50% of questions.
Also adds clean shutdown: closes MCP sessions before closing the loop.
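The persistent-loop shape, sketched (class and method names are assumptions; the real shutdown also closes MCP sessions before the loop):

```python
import asyncio

class LoopRunner:
    # One event loop for the whole run, instead of asyncio.run() per question.
    def __init__(self):
        self._loop = asyncio.new_event_loop()

    def run(self, coro):
        # Reusing the same loop keeps MCP stdio transports alive between turns.
        return self._loop.run_until_complete(coro)

    def close(self):
        # Clean shutdown: drain async generators, then close the loop.
        self._loop.run_until_complete(self._loop.shutdown_asyncgens())
        self._loop.close()
```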
Ref #36
- memory_system.py: fix regex replacement in update_user_profile()
Used a lambda instead of a raw replacement string so backslashes and
group references in the new text are treated literally, preventing corruption
- memory_system.py: add guards to update_section() for empty/oversized writes
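The lambda-replacement fix above can be illustrated like this (profile text and pattern are made up):

```python
import re

profile = "Name: unknown"
new_value = r"Alexander \1 $100"  # user text may contain regex metacharacters

# As a raw replacement string, re.sub would expand \1 to the captured group,
# silently corrupting the write. A lambda returns the value verbatim:
safe = re.sub(r"Name: (.*)", lambda m: "Name: " + new_value, profile)
```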
Ref #39