Commit Graph

213 Commits

Author SHA1 Message Date
b12e29b92e fix: dedup memory consolidation with existing memory search (#105)
_maybe_consolidate() now checks get_memories(subject=agent_id)
before storing. Skips if a memory of the same type (pattern/anomaly)
was created within the last hour. Prevents duplicate consolidation
entries on repeated task completion/failure events.

Also restructured branching: neutral success rates (0.3-0.8) now
return early instead of falling through.

9 new tests. 1465 total passing.
2026-03-14 20:04:18 -04:00
ffae5aa7c6 feat: add codebase self-knowledge to system prompts (#78, #80)
Adds SELF-KNOWLEDGE section to both SYSTEM_PROMPT_LITE and
SYSTEM_PROMPT_FULL with:
- Codebase map (all src/timmy/ modules with descriptions)
- Current capabilities list (grounded, not generic)
- Known limitations (real gaps, not LLM platitudes)

Lite prompt gets condensed version; full prompt gets detailed.
Timmy can now answer 'what does tool_safety.py do?' and give
grounded answers about his actual limitations.

10 new tests. 1456 total passing.
2026-03-14 19:58:10 -04:00
0204ecc520 Merge pull request '[loop-cycle-9] fix: CLI multi-word messages (#26)' (#107) from fix/cli-multiword-messages into main 2026-03-14 19:48:28 -04:00
9171d93ef9 fix: CLI chat accepts multi-word messages without quotes
Changed message param from str to list[str] in chat() and route() commands.
Words are joined with spaces, so 'timmy chat hello how are you' works without
quoting. Single-word messages still work as before.
- chat(): message: list[str], joined to full_message
- route(): message: list[str], joined to full_message
- 7 new tests in test_cli_multiword.py

Closes #26
2026-03-14 19:43:52 -04:00
f8f3b9b81f feat: inject session_id into system prompt for session identity awareness
Timmy can now introspect which session he's running in (cli, dashboard, loop).
- Add {session_id} placeholder to both lite and full system prompts
- get_system_prompt() accepts session_id param (default: 'unknown')
- create_timmy() accepts session_id param, forwards to prompt
- CLI chat/think/status pass their session_id to create_timmy()
- session.py passes _DEFAULT_SESSION_ID to create_timmy()
- 7 new tests in test_session_identity.py
- Updated 2 existing CLI test mocks

Closes #64
2026-03-14 19:43:11 -04:00
4b553fa0ed Merge pull request 'fix: word-boundary routing + debug route command (#31)' (#102) from fix/routing-patterns into main 2026-03-14 19:24:16 -04:00
342b9a9d84 Merge pull request 'feat: JSON status endpoints for briefing, memory, swarm (#49, #50)' (#101) from fix/api-consistency into main 2026-03-14 19:24:15 -04:00
b3809f5246 feat: add JSON status endpoints for briefing, memory, swarm (#49, #50) 2026-03-14 19:23:32 -04:00
67497133fd fix: word-boundary routing + debug route command (#31)
- Replace substring matching with word-boundary regex in route_request()
- "fix the bug" now correctly routes to coder
- Multi-word patterns match if all words appear (any order)
- Add "timmy route" CLI command for debugging routing
- Add route_request_with_match() for pattern visibility
- Expand routing keywords in agents.yaml
- 22 new routing tests, all passing
2026-03-14 19:21:30 -04:00
c1ec43c59f Merge pull request '[loop-cycle-8] fix: replace 59 bare except clauses with proper logging (#25)' (#99) from fix/bare-except-clauses into main 2026-03-14 19:08:40 -04:00
fdc5b861ca fix: replace 59 bare except clauses with proper logging (#25)
All `except Exception:` now catch as `except Exception as exc:` with
appropriate logging (warning for critical paths, debug for graceful degradation).

Added logger setup to 4 files that lacked it:
- src/timmy/memory/vector_store.py
- src/dashboard/middleware/csrf.py
- src/dashboard/middleware/security_headers.py
- src/spark/memory.py

31 files changed across timmy core, dashboard, infrastructure, integrations.
Zero bare excepts remain. 1340 tests passing.
2026-03-14 19:07:14 -04:00
9c59b386d8 feat: add OLLAMA_NUM_CTX config to cap context window (#83)
- Add ollama_num_ctx setting (default 4096) to config.py
- Pass num_ctx option to Ollama in agent.py and agents/base.py
- Add OLLAMA_NUM_CTX to .env.example with usage docs
- Add context_window note in providers.yaml
- Fix mock_settings in test_agent.py for new attribute
- qwen3:30b with 4096 ctx uses ~19GB vs 45GB default
2026-03-14 18:54:43 -04:00
bce6e7d030 fix: log Ollama disconnections with specific error handling (#92)
- BaseAgent.run(): catch httpx.ConnectError/ReadError/ConnectionError,
  log 'Ollama disconnected: <error>' at ERROR level, then re-raise
- session.py: distinguish Ollama disconnects from other errors in
  chat(), chat_with_tools(), continue_chat() — return specific message
  'Ollama appears to be disconnected' instead of generic error
- 11 new tests covering all disconnect paths
2026-03-14 18:40:15 -04:00
8a14bbb3e0 Merge pull request '[loop-cycle-5] fix: warmup model on cold load (#82)' (#95) from fix/warmup-cold-model into main 2026-03-14 18:26:48 -04:00
86956bd057 fix: warmup model on cold load to prevent first-request disconnect
Add _warmup_model() that sends a minimal generation request (1 token)
before returning the Agent. 60s timeout handles cold VRAM loads.
Warns but does not abort if warmup fails.

Closes #82
2026-03-14 18:24:00 -04:00
b3a1e0ce36 fix: prune dead web_search tool — ddgs never installed (#87)
Remove DuckDuckGoTools import, all web_search registrations across 4 toolkit
factories, catalog entry, safety classification, prompt references, and
session regex. Total: -41 lines of dead code.

consult_grok is functional (grok_enabled=True, API key set) and opt-in,
so it stays — but Timmy never calls it autonomously, which is correct
sovereign behavior (no cloud calls unless user permits).

Closes #87
2026-03-14 18:13:51 -04:00
7132b42ff3 fix: model introspection uses exact match, queries /api/ps first
_get_ollama_model() used prefix match (startswith) on /api/tags,
causing qwen3:30b to match qwen3.5:latest. Now:
1. Queries /api/ps (loaded models) first — most accurate
2. Falls back to /api/tags with exact name match
3. Reports actual running model, not just configured one

Updated test_get_system_info_contains_model to not assume model==config.

Fixes #77. 5 regression tests added.
2026-03-14 18:03:59 -04:00
74e426c63b [loop-cycle-2] fix: suppress confirmation tool WARNING spam (#79) (#89) 2026-03-14 17:54:58 -04:00
09fcf956ec Merge pull request '[loop-cycle-1] feat: tool allowlist for autonomous operation (#69)' (#88) from fix/tool-allowlist-autonomous into main 2026-03-14 17:41:56 -04:00
d28e2f4a7e [loop-cycle-1] feat: tool allowlist for autonomous operation (#69)
Add config/allowlist.yaml — YAML-driven gate that auto-approves bounded
tool calls when no human is present.

When Timmy runs with --autonomous or stdin is not a terminal, tool calls
are checked against allowlist: matched → auto-approved, else → rejected.

Changes:
  - config/allowlist.yaml: shell prefixes, deny patterns, path rules
  - tool_safety.py: is_allowlisted() checks tools against YAML rules
  - cli.py: --autonomous flag, _is_interactive() detection
  - 44 new allowlist tests, 8 updated CLI tests

Closes #69
2026-03-14 17:39:48 -04:00
94cd1a9840 fix: make model fallback chains configurable (#53)
Move hardcoded model fallback lists from module-level constants into
settings.fallback_models and settings.vision_fallback_models (pydantic
Settings fields). Can now be overridden via env vars
FALLBACK_MODELS / VISION_FALLBACK_MODELS or config/providers.yaml.

Removed:
- OLLAMA_MODEL_PRIMARY / OLLAMA_MODEL_FALLBACK from config.py
- DEFAULT_MODEL_FALLBACKS / VISION_MODEL_FALLBACKS from agent.py

get_effective_ollama_model() and _resolve_model_with_fallback() now
walk the configurable chains instead of hardcoded constants.

5 new tests guard the configurable behavior and prevent regression
to hardcoded constants.
2026-03-14 17:26:47 -04:00
061c8f6628 fix: brevity tuning — plain text prompts, markdown=False, front-loaded brevity
Closes #71: Timmy was responding with elaborate markdown formatting
(tables, headers, emoji, bullet lists) for simple questions.

Root causes fixed:
1. Agno Agent markdown=True flag explicitly told the model to format
   responses as markdown. Set to False in both agent.py and agents/base.py.
2. SYSTEM_PROMPT_FULL used ## and ### markdown headers, bold (**), and
   numbered lists — teaching by example that markdown is expected.
   Rewritten to plain text with labeled sections.
3. Brevity instructions were buried at the bottom of the full prompt.
   Moved to immediately after the opening line as 'VOICE AND BREVITY'
   with explicit override priority.
4. Orchestrator prompt in agents.yaml was silent on response style.
   Added 'Voice: brief, plain, direct' with concrete examples.

The full prompt is now 41 lines shorter (124 → 83). The prompt itself
practices the brevity it preaches.

SOUL.md alignment:
- 'Brevity is a kindness' — now front-loaded in both base and agent prompt
- 'I do not fill silence with noise' — explicit in both tiers
- 'I speak plainly. I prefer short sentences.' — structural enforcement

4 new tests guard against regression:
- test_full_prompt_brevity_first: brevity section before tools/memory
- test_full_prompt_no_markdown_headers: no ## or ### in prompt text
- test_full_prompt_plain_text_brevity: 'plain text' instruction present
- test_lite_prompt_brevity: lite tier also instructs brevity
2026-03-14 17:15:56 -04:00
rockachopa
927e25cc40 Merge pull request 'fix: replace print() with proper logging (#29, #51)' (#59) from fix/print-to-logging into main 2026-03-14 16:50:04 -04:00
64fd1d9829 voice: reinforce brevity at top of system prompt 2026-03-14 16:32:47 -04:00
f0b0e2f202 fix: WebSocket 403 spam and missing /swarm endpoints
- CSRF middleware now skips WebSocket upgrade requests (they don't carry tokens)
- Added /swarm/live WebSocket endpoint wired to ws_manager singleton
- Added /swarm/agents/sidebar HTMX partial (was 404 on every dashboard poll)

Stops hundreds of 403 Forbidden + 404 log lines per minute.
2026-03-14 16:29:59 -04:00
b30b5c6b57 [loop-cycle-6] Break thinking rumination loop — semantic dedup (#38)
Add post-generation similarity check to ThinkingEngine.think_once().

Problem: Timmy's thinking engine generates repetitive thoughts because
small local models ignore 'don't repeat' instructions in the prompt.
The same observation ('still no chat messages', 'Alexander's name is in
profile') would appear 14+ times in a single day's journal.

Fix: After generating a thought, compare it against the last 5 thoughts
using SequenceMatcher. If similarity >= 0.6, retry with a new seed up to
2 times. If all retries produce repetitive content, discard rather than
store. Uses stdlib difflib — no new dependencies.

Changes:
- thinking.py: Add _is_too_similar() method with SequenceMatcher
- thinking.py: Wrap generation in retry loop with dedup check
- test_thinking.py: 7 new tests covering exact match, near match,
  different thoughts, retry behavior, and max-retry discard

+96/-20 lines in thinking.py, +87 lines in tests.
2026-03-14 16:21:16 -04:00
79edfd1106 feat: persist chat history in SQLite — survives server restarts
Replace in-memory MessageLog with SQLite-backed implementation.
Same API surface (append/all/clear/len) so zero caller changes needed.

- data/chat.db stores messages with role, content, timestamp, source
- Lazy DB connection (opened on first use, not at import time)
- Retention policy: oldest messages pruned when count > 500
- New .recent(limit) method for efficient last-N queries
- Thread-safe with explicit locking
- WAL mode for concurrent read performance
- Test isolation: conftest redirects DB to tmp_path per test
- 8 new tests: persistence, retention, concurrency, source field

Closes #46
2026-03-14 16:09:26 -04:00
f426df5b42 feat: add --session-id option to timmy chat CLI
Allows specifying a named session for conversation persistence.
Use cases:
- Autonomous loops can have their own session (e.g. --session-id loop)
- Multiple users/agents can maintain separate conversations
- Testing different conversation threads without polluting the default

Precedence: --session-id > --new > default 'cli' session
2026-03-14 16:05:00 -04:00
70d5dc5ce1 fix: replace eval() with AST-walking safe evaluator in calculator
Fixes #52

- Replace eval() in calculator() with _safe_eval() that walks the AST
  and only permits: numeric constants, arithmetic ops (+,-,*,/,//,%,**),
  unary +/-, math module access, and whitelisted builtins (abs, round,
  min, max)
- Reject all other syntax: imports, attribute access on non-math objects,
  lambdas, comprehensions, string literals, etc.
- Add 39 tests covering arithmetic, precedence, math functions,
  allowed builtins, error handling, and 14 injection prevention cases
2026-03-14 15:51:35 -04:00
db129bbe16 fix: replace print() with proper logging (#29, #51) 2026-03-14 15:07:07 -04:00
591954891a fix: sanitize dynamic innerHTML in templates (#47) 2026-03-14 15:07:00 -04:00
bb287b2c73 fix: sanitize WebSocket data in HTML templates (XSS #47) 2026-03-14 15:01:48 -04:00
efb1feafc9 fix: replace print() with proper logging (#29, #51) 2026-03-14 15:01:34 -04:00
fa838b0063 fix: clean shutdown — silence MCP async-generator teardown noise
Swallow anyio cancel-scope RuntimeError and BaseExceptionGroup
from MCP stdio_client generators during GC on voice loop exit.
Custom unraisablehook + loop exception handler + warnings filter.
2026-03-14 14:12:05 -04:00
782218aa2c fix: voice loop — persistent event loop, markdown stripping, MCP noise
Three fixes from real-world testing:

1. Event loop: replaced asyncio.run() with a persistent loop so
   Agno's MCP sessions survive across conversation turns. No more
   'Event loop is closed' errors on turn 2+.

2. Markdown stripping: voice preamble tells Timmy to respond in
   natural spoken language, plus _strip_markdown() as a safety net
   removes **bold**, *italic*, bullets, headers, code fences, etc.
   TTS no longer reads 'asterisk asterisk'.

3. MCP noise: _suppress_mcp_noise() quiets mcp/agno/httpx loggers
   during voice mode so the terminal shows clean transcript only.

32 tests (12 new for markdown stripping + persistent loop).
2026-03-14 14:05:24 -04:00
dbadfc425d feat: sovereign voice loop — timmy voice command
Adds fully local listen-think-speak voice interface.
STT: Whisper, LLM: Ollama, TTS: Piper. No cloud, no network.

- src/timmy/voice_loop.py: VoiceLoop with VAD, Whisper, Piper
- src/timmy/cli.py: new voice command
- pyproject.toml: voice extras updated
- 20 new tests
2026-03-14 13:58:56 -04:00
8ecc0b1780 fix: fact distillation — block garbage and secrets, improve dedup
- Rewrite distillation prompt with explicit GOOD/BAD examples
  Good: user preferences, project decisions, learned knowledge
  Bad: meta-observations, internal state, credentials
- Add security filter: block facts containing token/password/secret/key patterns
- Add meta-observation filter: block self-referential 'my thinking' facts
- Lower dedup threshold 0.9 -> 0.75 to catch paraphrased duplicates

Ref #40
2026-03-14 13:00:30 -04:00
b222b28856 fix: use persistent event loop in interview command
Replace repeated asyncio.run() calls with a single event loop that
persists across all interview questions. The old approach created and
destroyed loops per question, orphaning MCP stdio transports and
causing 'Event loop is closed' errors on ~50% of questions.

Also adds clean shutdown: closes MCP sessions before closing the loop.

Ref #36
2026-03-14 12:58:11 -04:00
58ddf55282 fix: regex corruption in update_user_profile + hot memory write guards
- memory_system.py: fix regex replacement in update_user_profile()
  Used lambda instead of raw replacement string to prevent corruption
- memory_system.py: add guards to update_section() for empty/oversized writes

Ref #39
2026-03-14 12:55:02 -04:00
2f623826bd cleanup: delete dead modules — ~7,900 lines removed
Closes #22, Closes #23

Deleted: brain/, swarm/, openfang/, paperclip/, cascade_adapter,
memory_migrate, agents/timmy.py, dead routes + all corresponding tests.

Updated pyproject.toml, app.py, loop_qa.py for removed imports.
2026-03-14 09:49:24 -04:00
Trip T
78167675f2 feat: replace custom Gitea client with MCP servers
Replace the bespoke GiteaHand httpx client and tools_gitea.py wrappers
with official MCP tool servers (gitea-mcp + filesystem MCP), wired into
Agno via MCPTools. Switch all session functions to async (arun/acontinue_run)
so MCP tools auto-connect. Delete ~1070 lines of custom Gitea code.

- Create src/timmy/mcp_tools.py with MCP factories + standalone issue bridge
- Wire MCPTools into agent.py tool list (Gitea + filesystem)
- Switch session.py chat/chat_with_tools/continue_chat to async
- Update all callers (dashboard routes, Discord vendor, CLI, thinking engine)
- Add gitea_token fallback from ~/.config/gitea/token
- Add MCP session cleanup to app shutdown hook
- Update tool_safety.py for MCP tool names
- 11 new tests, all 1417 passing, coverage 74.2%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:40:32 -04:00
Trip T
41d6ebaf6a feat: CLI session persistence + tool confirmation gate
- Chat sessions persist across `timmy chat` invocations via Agno SQLite
  (session_id="cli"), fixing context amnesia between turns
- Dangerous tools (shell, write_file, etc.) now prompt for approval in CLI
  instead of silently exiting — uses typer.confirm() + Agno continue_run
- --new flag starts a fresh conversation when needed
- Improved _maybe_file_issues prompt for engineer-quality issue bodies
  (what's happening, expected behavior, suggested fix, acceptance criteria)
- think/status commands also pass session_id for continuity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:55:56 -04:00
Trip T
350e6f54ff fix: prevent "Event loop is closed" on repeated Gitea API calls
The httpx AsyncClient was cached across asyncio.run() boundaries.
Each asyncio.run() creates and closes a new event loop, leaving the
cached client's connections on a dead loop.  Second+ calls would fail
with "Event loop is closed".

Fix: create a fresh client per request and close it in a finally block.
No more cross-loop client reuse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:40:39 -04:00
Trip T
7163b15300 feat: add Gitea issue creation — Timmy's self-improvement channel
Give Timmy the ability to file Gitea issues when he notices bugs,
stale state, or improvement opportunities in his own codebase.

Components:
- GiteaHand async API client (infrastructure/hands/gitea.py)
  - Token auth with ~/.config/gitea/token fallback
  - Create/list/close issues, dedup by title similarity
  - Graceful degradation when Gitea unreachable
- Tool functions (timmy/tools_gitea.py)
  - create_gitea_issue: file issues with dedup + work order bridge
  - list_gitea_issues: check existing backlog
  - Classified as SAFE (no confirmation needed)
- Thinking post-hook (_maybe_file_issues in thinking.py)
  - Every 20 thoughts, LLM classifies recent thoughts for actionable items
  - Auto-files bugs/improvements to Gitea with dedup
  - Bridges to local work order system for dashboard tracking
- Config: gitea_url, gitea_token, gitea_repo, gitea_enabled,
  gitea_timeout, thinking_issue_every

All 1426 tests pass, 74.17% coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 18:36:06 -04:00
Trip T
b2f12ca97c feat: consolidate memory into unified memory.db with 4-type model
Consolidates 3 separate memory databases (semantic_memory.db, swarm.db
memory_entries, brain.db) into a single data/memory.db with facts,
chunks, and episodes tables.

Key changes:
- Add unified schema (timmy/memory/unified.py) with 3 core tables
- Redirect vector_store.py and semantic_memory.py to memory.db
- Add thought distillation: every Nth thought extracts lasting facts
- Enrich agent context with known facts in system prompt
- Add memory_forget tool for removing outdated memories
- Unify embeddings: vector_store delegates to semantic_memory.embed_text
- Bridge spark events to unified event log
- Add pruning for thoughts and events with configurable retention
- Add data migration script (timmy/memory_migrate.py)
- Deprecate brain.memory in favor of unified system

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:23:18 -04:00
Trip T
bc38fee817 feat: add DB Explorer for read-only SQLite inspection
Adds /db-explorer page and JSON API to browse all 15 SQLite databases
in data/. Sidebar lists databases with sizes, clicking one renders all
tables as scrollable data tables with row truncation at 200.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 10:41:13 -04:00
Trip T
d42c574d26 feat: add Loop QA self-testing framework
Structured self-test framework that probes 6 capabilities (tool use,
multistep planning, memory read/write, self-coding, lightning econ) in
round-robin. Reuses existing infra: event_log for persistence,
create_task() for upgrade proposals, capture_error() for crash handling,
and in-memory circuit breaker for failure tracking.

- src/timmy/loop_qa.py: Capability enum, 6 async probes, orchestrator
- src/dashboard/routes/loop_qa.py: JSON + HTMX health endpoints
- HTMX partial polls every 30s on the health panel
- Background scheduler in app.py lifespan
- 25 tests covering probes, orchestrator, health snapshot, routes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 22:33:16 -04:00
Trip T
f1e909b1e3 feat: enrich thinking engine — anti-loop, anti-confabulation, grounding
Rewrite _THINKING_PROMPT with strict rules: 2-3 sentence limit,
anti-confabulation (only reference real data), anti-repetition.

- Add _pick_seed_type() with recent-type dedup (excludes last 3)
- Add _gather_system_snapshot() for real-time grounding (time, thought
  count, chat activity, task queue)
- Improve _build_continuity_context() with anti-repetition header and
  100-char truncation
- Fix journal + memory timestamps to include local timezone
- 12 new TDD tests covering all improvements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 21:47:28 -04:00
Trip T
f8dadeec59 feat: tick prompt arg + fix name extraction learning verbs as names
Add optional prompt argument to `timmy tick` so custom journal
prompts can be passed from the CLI (seed_type="prompted").

Fix extract_user_name() learning verbs as names (e.g. "Serving").
Now requires the candidate word to start with a capital letter in
the original message, rejects common verb suffixes (-ing, -tion,
etc.), and deduplicates the naive regex in TimmyWithMemory to use
the fixed ConversationManager.extract_user_name() instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 21:11:53 -04:00
Trip T
6a7875e05f feat: heartbeat memory hooks — pre-recall and post-update
Wire MEMORY.md + soul.md into the thinking loop so each heartbeat
is grounded in identity and recent context, breaking repetitive loops.

Pre-hook: _load_memory_context() reads hot memory first (changes each
cycle) then soul.md (stable identity), truncated to 1500 chars.

Post-hook: _update_memory() writes a "Last Reflection" section to
MEMORY.md after each thought so the next cycle has fresh context.

soul.md is read-only from the heartbeat — never modified by it.
All hooks degrade gracefully and never crash the heartbeat.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 20:54:13 -04:00