Commit Graph

380 Commits

Author SHA1 Message Date
e6bde2f907 chore: remove dead taskosaur/postgres/redis services, fix root user (#32)
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 53s
- Remove taskosaur, postgres, redis services (zero Python references)
- Remove postgres-data, redis-data volumes
- Remove taskosaur env vars from dashboard and .env.example
- Change user: "0:0" to user: "" (override per-environment)
- Update header comments to reflect actual services
- celery-worker/openfang remain behind profiles
- Net: -93 lines of dead config
2026-03-14 18:52:44 -04:00
b01c1cb582 Merge pull request '[loop-cycle-6] fix: Ollama disconnect logging and error handling (#92)' (#96) from fix/ollama-disconnect-logging into main
All checks were successful
Tests / lint (push) Successful in 3s
Tests / test (push) Successful in 59s
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 47s
2026-03-14 18:41:25 -04:00
bce6e7d030 fix: log Ollama disconnections with specific error handling (#92)
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 1m4s
- BaseAgent.run(): catch httpx.ConnectError/ReadError/ConnectionError,
  log 'Ollama disconnected: <error>' at ERROR level, then re-raise
- session.py: distinguish Ollama disconnects from other errors in
  chat(), chat_with_tools(), continue_chat() — return specific message
  'Ollama appears to be disconnected' instead of generic error
- 11 new tests covering all disconnect paths
2026-03-14 18:40:15 -04:00
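The disconnect handling described in bce6e7d030 might look roughly like the sketch below. It is an illustration, not the project's code: `run_with_disconnect_logging`, `chat_reply`, and the generic fallback message are hypothetical names, and only the built-in `ConnectionError` is caught so the example needs no third-party packages (the commit also names `httpx.ConnectError` and `httpx.ReadError`).

```python
import logging

logger = logging.getLogger("timmy.agent")

# Assumption: the real tuple also lists httpx.ConnectError and httpx.ReadError.
DISCONNECT_ERRORS = (ConnectionError,)


def run_with_disconnect_logging(call):
    """Run call(); log a disconnect at ERROR level, then re-raise it."""
    try:
        return call()
    except DISCONNECT_ERRORS as exc:
        logger.error("Ollama disconnected: %s", exc)
        raise


def chat_reply(call):
    """Session-level wrapper: give disconnects their own user-facing message."""
    try:
        return run_with_disconnect_logging(call)
    except DISCONNECT_ERRORS:
        return "Ollama appears to be disconnected"
    except Exception:
        # Hypothetical generic message; the original wording is not in the log.
        return "Something went wrong"
```

Re-raising in the lower layer keeps the log entry close to the failure, while the session layer decides what the user sees.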
8a14bbb3e0 Merge pull request '[loop-cycle-5] fix: warmup model on cold load (#82)' (#95) from fix/warmup-cold-model into main
All checks were successful
Tests / lint (push) Successful in 3s
Tests / test (push) Successful in 56s
2026-03-14 18:26:48 -04:00
d1a8b16cd7 Merge pull request '[loop-cycle-5] test: skip voice_loop tests when numpy missing (#48)' (#94) from fix/skip-voice-tests-no-numpy into main
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-14 18:26:40 -04:00
bf30d26dd1 test: skip voice_loop tests gracefully when numpy unavailable
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 49s
Wrap numpy and voice_loop imports in try/except with pytestmark skipif.
Tests skip cleanly instead of ImportError when numpy not in dev deps.

Closes #48
2026-03-14 18:24:56 -04:00
86956bd057 fix: warmup model on cold load to prevent first-request disconnect
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 16s
Add _warmup_model() that sends a minimal generation request (1 token)
before returning the Agent. 60s timeout handles cold VRAM loads.
Warns but does not abort if warmup fails.

Closes #82
2026-03-14 18:24:00 -04:00
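A minimal sketch of the warmup step from 86956bd057, assuming the Ollama call is injected as a callable. The `generate` parameter and its keyword arguments are hypothetical stand-ins for whatever client the project uses; only the one-token request, the 60s timeout, and the warn-but-continue behavior come from the commit message.

```python
import logging

logger = logging.getLogger("timmy.warmup")


def warmup_model(generate, model, timeout=60.0):
    """Send a one-token request so the model is loaded before first real use.

    Returns True if the warmup call succeeded, False otherwise. A failure is
    logged as a warning and never propagated to the caller.
    """
    try:
        generate(model=model, prompt="hi", max_tokens=1, timeout=timeout)
        return True
    except Exception as exc:
        logger.warning("Model warmup failed for %s: %s", model, exc)
        return False
```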
23ed2b2791 Merge pull request '[loop-cycle-4] fix: prune dead web_search tool (#87)' (#93) from fix/prune-dead-web-search into main
Some checks failed
Tests / lint (push) Successful in 2s
Tests / test (push) Failing after 14s
2026-03-14 18:15:25 -04:00
b3a1e0ce36 fix: prune dead web_search tool — ddgs never installed (#87)
Some checks failed
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Failing after 17s
Remove DuckDuckGoTools import, all web_search registrations across 4 toolkit
factories, catalog entry, safety classification, prompt references, and
session regex. Total: -41 lines of dead code.

consult_grok is functional (grok_enabled=True, API key set) and opt-in,
so it stays — but Timmy never calls it autonomously, which is correct
sovereign behavior (no cloud calls unless user permits).

Closes #87
2026-03-14 18:13:51 -04:00
7ff012883a Merge pull request '[loop-cycle-3] fix: model introspection prefix-match collision (#77)' (#91) from fix/model-introspection-prefix-match into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 12s
2026-03-14 18:04:40 -04:00
7132b42ff3 fix: model introspection uses exact match, queries /api/ps first
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 13s
_get_ollama_model() used prefix match (startswith) on /api/tags,
causing qwen3:30b to match qwen3.5:latest. Now:
1. Queries /api/ps (loaded models) first — most accurate
2. Falls back to /api/tags with exact name match
3. Reports actual running model, not just configured one

Updated test_get_system_info_contains_model to not assume model==config.

Fixes #77. 5 regression tests added.
2026-03-14 18:03:59 -04:00
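The prefix collision fixed in 7132b42ff3 can be reproduced with plain lists. In this sketch `prefix_match` assumes the old code compared the part of the name before the colon, which is a guess at how `qwen3:30b` came to match `qwen3.5:latest`; `resolve_running_model` takes the names returned by /api/ps and /api/tags as arguments rather than querying Ollama, and both function names are hypothetical.

```python
def prefix_match(configured, installed):
    """Old behavior (assumed): first installed name sharing the base prefix."""
    base = configured.split(":")[0]
    return next((name for name in installed if name.startswith(base)), None)


def resolve_running_model(configured, loaded, installed):
    """New behavior: prefer loaded models, then an exact installed match.

    loaded    -- model names as reported by /api/ps
    installed -- model names as reported by /api/tags
    """
    if loaded:
        # Report what is actually running, even if it differs from config.
        return configured if configured in loaded else loaded[0]
    if configured in installed:
        return configured
    return None
```

With `installed = ["qwen3.5:latest", "qwen3:30b"]`, the prefix version returns the wrong model for `qwen3:30b`, while the exact-match version returns the configured name.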
1f09323e09 Merge pull request '[loop-cycle-2] test: regression tests for confirmation warning spam (#79)' (#90) from fix/confirmation-warning-spam into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 15s
2026-03-14 17:55:16 -04:00
74e426c63b [loop-cycle-2] fix: suppress confirmation tool WARNING spam (#79) (#89)
Some checks failed
Tests / test (push) Has been cancelled
Tests / lint (push) Has been cancelled
2026-03-14 17:54:58 -04:00
586c8e3a75 fix: remove unused variable lint warning
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 15s
2026-03-14 17:54:27 -04:00
e09ca203dc Merge pull request '[loop-cycle-1] feat: tool allowlist for autonomous operation (#69)' (#88) from fix/tool-allowlist-autonomous into main
2026-03-14 17:53:16 -04:00
09fcf956ec Merge pull request '[loop-cycle-1] feat: tool allowlist for autonomous operation (#69)' (#88) from fix/tool-allowlist-autonomous into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 13s
2026-03-14 17:41:56 -04:00
d28e2f4a7e [loop-cycle-1] feat: tool allowlist for autonomous operation (#69)
Some checks failed
Tests / lint (pull_request) Successful in 4s
Tests / test (pull_request) Failing after 13s
Add config/allowlist.yaml — YAML-driven gate that auto-approves bounded
tool calls when no human is present.

When Timmy runs with --autonomous or stdin is not a terminal, tool calls
are checked against allowlist: matched → auto-approved, else → rejected.

Changes:
  - config/allowlist.yaml: shell prefixes, deny patterns, path rules
  - tool_safety.py: is_allowlisted() checks tools against YAML rules
  - cli.py: --autonomous flag, _is_interactive() detection
  - 44 new allowlist tests, 8 updated CLI tests

Closes #69
2026-03-14 17:39:48 -04:00
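The gate described in d28e2f4a7e could be sketched as below. The rule contents are invented for illustration and held in a dict instead of being loaded from config/allowlist.yaml; `decide` is a hypothetical helper, and the real `is_allowlisted()` signature may differ.

```python
import re

# Hypothetical rules; the real ones live in config/allowlist.yaml.
RULES = {
    "shell": {
        "allow_prefixes": ["git status", "ls", "pytest"],
        "deny_patterns": [r"rm\s+-rf", r"\|\s*sh\b"],
    }
}


def is_allowlisted(tool, command, rules=RULES):
    """True if the command matches an allowed prefix and no deny pattern."""
    rule = rules.get(tool)
    if rule is None:
        return False
    if any(re.search(p, command) for p in rule.get("deny_patterns", [])):
        return False
    # Match whole words so that "ls" does not also allow "lsof".
    return any(
        command == prefix or command.startswith(prefix + " ")
        for prefix in rule.get("allow_prefixes", [])
    )


def decide(tool, command, autonomous, stdin_is_tty):
    """Auto-approve or reject when no human is present, otherwise ask."""
    if autonomous or not stdin_is_tty:
        return "approve" if is_allowlisted(tool, command) else "reject"
    return "ask"
```

Checking deny patterns before prefixes means a command such as `ls; rm -rf /` is rejected even though it starts with an allowed prefix.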
0b0251f702 Merge pull request '[loop-cycle-13] fix: configurable model fallback chains (#53)' (#76) from fix/configurable-fallback-models into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 13s
2026-03-14 17:28:34 -04:00
94cd1a9840 fix: make model fallback chains configurable (#53)
Some checks failed
Tests / lint (pull_request) Successful in 4s
Tests / test (pull_request) Failing after 13s
Move hardcoded model fallback lists from module-level constants into
settings.fallback_models and settings.vision_fallback_models (pydantic
Settings fields). Can now be overridden via env vars
FALLBACK_MODELS / VISION_FALLBACK_MODELS or config/providers.yaml.

Removed:
- OLLAMA_MODEL_PRIMARY / OLLAMA_MODEL_FALLBACK from config.py
- DEFAULT_MODEL_FALLBACKS / VISION_MODEL_FALLBACKS from agent.py

get_effective_ollama_model() and _resolve_model_with_fallback() now
walk the configurable chains instead of hardcoded constants.

5 new tests guard the configurable behavior and prevent regression
to hardcoded constants.
2026-03-14 17:26:47 -04:00
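Walking a configurable chain, as 94cd1a9840 describes, might look like this sketch. The comma-separated format for `FALLBACK_MODELS` is an assumption (the commit does not say how the env var is encoded), and `parse_models`, `load_fallbacks`, and this simplified `resolve_model_with_fallback` are illustrative, not the project's implementations.

```python
import os


def parse_models(env_value, default):
    """Parse a comma-separated list of model names (assumed format)."""
    if not env_value:
        return list(default)
    return [name.strip() for name in env_value.split(",") if name.strip()]


def load_fallbacks(default):
    """Read the chain from FALLBACK_MODELS, falling back to a default list."""
    return parse_models(os.environ.get("FALLBACK_MODELS"), default)


def resolve_model_with_fallback(primary, fallbacks, available):
    """Return the first model in [primary, *fallbacks] that is available."""
    for name in [primary, *fallbacks]:
        if name in available:
            return name
    return None
```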
f097784de8 Merge pull request '[loop-cycle-12] fix: brevity tuning — Timmy speaks plainly (#71)' (#75) from fix/brevity-tuning into main
Some checks failed
Tests / lint (push) Successful in 37s
Tests / test (push) Failing after 38s
2026-03-14 17:18:06 -04:00
061c8f6628 fix: brevity tuning — plain text prompts, markdown=False, front-loaded brevity
Some checks failed
Tests / lint (pull_request) Successful in 15s
Tests / test (pull_request) Failing after 24s
Closes #71: Timmy was responding with elaborate markdown formatting
(tables, headers, emoji, bullet lists) for simple questions.

Root causes fixed:
1. Agno Agent markdown=True flag explicitly told the model to format
   responses as markdown. Set to False in both agent.py and agents/base.py.
2. SYSTEM_PROMPT_FULL used ## and ### markdown headers, bold (**), and
   numbered lists — teaching by example that markdown is expected.
   Rewritten to plain text with labeled sections.
3. Brevity instructions were buried at the bottom of the full prompt.
   Moved to immediately after the opening line as 'VOICE AND BREVITY'
   with explicit override priority.
4. Orchestrator prompt in agents.yaml was silent on response style.
   Added 'Voice: brief, plain, direct' with concrete examples.

The full prompt is now 41 lines shorter (124 → 83). The prompt itself
practices the brevity it preaches.

SOUL.md alignment:
- 'Brevity is a kindness' — now front-loaded in both base and agent prompt
- 'I do not fill silence with noise' — explicit in both tiers
- 'I speak plainly. I prefer short sentences.' — structural enforcement

4 new tests guard against regression:
- test_full_prompt_brevity_first: brevity section before tools/memory
- test_full_prompt_no_markdown_headers: no ## or ### in prompt text
- test_full_prompt_plain_text_brevity: 'plain text' instruction present
- test_lite_prompt_brevity: lite tier also instructs brevity
2026-03-14 17:15:56 -04:00
3c671de446 Merge pull request '[loop-cycle-9] fix: thinking engine skips MCP tools to avoid cancel-scope errors (#72)' (#74) from fix/thinking-mcp-cancel-scope into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 15s
2026-03-14 16:51:07 -04:00
rockachopa
927e25cc40 Merge pull request 'fix: replace print() with proper logging (#29, #51)' (#59) from fix/print-to-logging into main
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 18s
2026-03-14 16:50:04 -04:00
rockachopa
2d2b566e58 Merge pull request 'fix: replace print() with proper logging (#29, #51)' (#59) from fix/print-to-logging into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 13s
2026-03-14 16:34:48 -04:00
64fd1d9829 voice: reinforce brevity at top of system prompt
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 14s
2026-03-14 16:32:47 -04:00
f0b0e2f202 fix: WebSocket 403 spam and missing /swarm endpoints
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 16s
- CSRF middleware now skips WebSocket upgrade requests (they don't carry tokens)
- Added /swarm/live WebSocket endpoint wired to ws_manager singleton
- Added /swarm/agents/sidebar HTMX partial (was 404 on every dashboard poll)

Stops hundreds of 403 Forbidden + 404 log lines per minute.
2026-03-14 16:29:59 -04:00
b30b5c6b57 [loop-cycle-6] Break thinking rumination loop — semantic dedup (#38)
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 25s
Add post-generation similarity check to ThinkingEngine.think_once().

Problem: Timmy's thinking engine generates repetitive thoughts because
small local models ignore 'don't repeat' instructions in the prompt.
The same observation ('still no chat messages', 'Alexander's name is in
profile') would appear 14+ times in a single day's journal.

Fix: After generating a thought, compare it against the last 5 thoughts
using SequenceMatcher. If similarity >= 0.6, retry with a new seed up to
2 times. If all retries produce repetitive content, discard rather than
store. Uses stdlib difflib — no new dependencies.

Changes:
- thinking.py: Add _is_too_similar() method with SequenceMatcher
- thinking.py: Wrap generation in retry loop with dedup check
- test_thinking.py: 7 new tests covering exact match, near match,
  different thoughts, retry behavior, and max-retry discard

+96/-20 lines in thinking.py, +87 lines in tests.
2026-03-14 16:21:16 -04:00
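The dedup loop from b30b5c6b57 can be sketched with the numbers given in the message: compare against the last 5 thoughts, treat a ratio of 0.6 or more as a repeat, retry up to 2 times, then discard. The `generate` callable and this standalone `think_once` are hypothetical; lowercasing before comparison is also an assumption.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.6
RECENT_WINDOW = 5
MAX_RETRIES = 2


def is_too_similar(candidate, recent):
    """True if candidate resembles any of the most recent thoughts."""
    return any(
        SequenceMatcher(None, candidate.lower(), old.lower()).ratio()
        >= SIMILARITY_THRESHOLD
        for old in recent[-RECENT_WINDOW:]
    )


def think_once(generate, history):
    """Generate a thought, retrying on repeats; return None if all repeat.

    generate(attempt) is a stand-in for the model call with a new seed.
    """
    for attempt in range(1 + MAX_RETRIES):
        thought = generate(attempt)
        if not is_too_similar(thought, history):
            history.append(thought)
            return thought
    return None  # every attempt was repetitive, so nothing is stored
```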
rockachopa
0d61b709da Merge pull request '[loop-cycle-5] Persist chat history in SQLite (#46)' (#63) from fix/issue-46-chat-persistence into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 14s
2026-03-14 16:10:55 -04:00
79edfd1106 feat: persist chat history in SQLite — survives server restarts
Some checks failed
Tests / lint (pull_request) Successful in 2s
Tests / test (pull_request) Failing after 13s
Replace in-memory MessageLog with SQLite-backed implementation.
Same API surface (append/all/clear/len) so zero caller changes needed.

- data/chat.db stores messages with role, content, timestamp, source
- Lazy DB connection (opened on first use, not at import time)
- Retention policy: oldest messages pruned when count > 500
- New .recent(limit) method for efficient last-N queries
- Thread-safe with explicit locking
- WAL mode for concurrent read performance
- Test isolation: conftest redirects DB to tmp_path per test
- 8 new tests: persistence, retention, concurrency, source field

Closes #46
2026-03-14 16:09:26 -04:00
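A reduced sketch of the SQLite-backed log from 79edfd1106, covering the lazy connection, WAL mode, locking, retention, and `.recent(limit)`. The schema and method bodies are assumptions built from the bullet points; the real class also exposes `all` and `clear`.

```python
import sqlite3
import threading
import time


class MessageLog:
    def __init__(self, path="data/chat.db", max_messages=500):
        self._path = path
        self._max = max_messages
        self._lock = threading.Lock()
        self._conn = None  # opened on first use, not at import time

    def _db(self):
        if self._conn is None:
            self._conn = sqlite3.connect(self._path, check_same_thread=False)
            self._conn.execute("PRAGMA journal_mode=WAL")
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, role TEXT, "
                "content TEXT, timestamp REAL, source TEXT)"
            )
        return self._conn

    def append(self, role, content, source="cli"):
        with self._lock:
            db = self._db()
            db.execute(
                "INSERT INTO messages (role, content, timestamp, source) "
                "VALUES (?, ?, ?, ?)",
                (role, content, time.time(), source),
            )
            # Retention: keep only the newest max_messages rows.
            db.execute(
                "DELETE FROM messages WHERE id NOT IN "
                "(SELECT id FROM messages ORDER BY id DESC LIMIT ?)",
                (self._max,),
            )
            db.commit()

    def recent(self, limit):
        """Return the last `limit` messages, oldest first."""
        with self._lock:
            rows = self._db().execute(
                "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?",
                (limit,),
            ).fetchall()
        return rows[::-1]

    def __len__(self):
        with self._lock:
            return self._db().execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Passing `":memory:"` or a path under `tmp_path` keeps tests isolated from the real database file.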
rockachopa
013a2cc330 Merge pull request 'feat: add --session-id to timmy chat CLI' (#62) from fix/cli-session-id into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 14s
2026-03-14 16:06:16 -04:00
f426df5b42 feat: add --session-id option to timmy chat CLI
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 15s
Allows specifying a named session for conversation persistence.
Use cases:
- Autonomous loops can have their own session (e.g. --session-id loop)
- Multiple users/agents can maintain separate conversations
- Testing different conversation threads without polluting the default

Precedence: --session-id > --new > default 'cli' session
2026-03-14 16:05:00 -04:00
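The precedence rule in f426df5b42 fits in a few lines. In this sketch `resolve_session_id` is a hypothetical helper, and generating a random ID for `--new` is an assumption, since the log does not say how a new session is named.

```python
import uuid

DEFAULT_SESSION = "cli"


def resolve_session_id(session_id=None, new=False):
    """Apply the precedence --session-id > --new > default 'cli' session."""
    if session_id:
        return session_id
    if new:
        return uuid.uuid4().hex  # assumed naming scheme for fresh sessions
    return DEFAULT_SESSION
```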
rockachopa
bef4fc1024 Merge pull request '[loop-cycle-4] Push event system coverage to ≥80% on all modules' (#61) from fix/issue-45-event-coverage into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 14s
2026-03-14 16:02:27 -04:00
9535dd86de test: push event system coverage to ≥80% on all three modules
Some checks failed
Tests / lint (pull_request) Successful in 4s
Tests / test (pull_request) Failing after 16s
Add 3 targeted tests for infrastructure/error_capture.py:
- test_stale_entries_pruned: exercises dedup cache pruning (line 61)
- test_git_context_fallback_on_failure: exercises exception path (lines 90-91)
- test_returns_none_when_feedback_disabled: exercises early return (line 112)

Coverage results (63 tests, all passing):
- error_capture.py: 75.6% → 80.0%
- broadcaster.py: 93.9% (unchanged)
- bus.py: 92.9% (unchanged)
- Total: 88.1% → 89.4%

Closes #45
2026-03-14 16:01:05 -04:00
70d5dc5ce1 fix: replace eval() with AST-walking safe evaluator in calculator
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 14s
Fixes #52

- Replace eval() in calculator() with _safe_eval() that walks the AST
  and only permits: numeric constants, arithmetic ops (+,-,*,/,//,%,**),
  unary +/-, math module access, and whitelisted builtins (abs, round,
  min, max)
- Reject all other syntax: imports, attribute access on non-math objects,
  lambdas, comprehensions, string literals, etc.
- Add 39 tests covering arithmetic, precedence, math functions,
  allowed builtins, error handling, and 14 injection prevention cases
2026-03-14 15:51:35 -04:00
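An AST-walking evaluator of the kind 70d5dc5ce1 describes could be sketched as follows. This is an illustration of the technique under the rules listed in the commit body, not the project's `_safe_eval()`; it does not bound exponent size, so an expression like a very large power can still be slow.

```python
import ast
import math
import operator

_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_BUILTINS = {"abs": abs, "round": round, "min": min, "max": max}


def safe_eval(expr):
    """Evaluate an arithmetic expression without calling eval()."""
    return _eval(ast.parse(expr, mode="eval").body)


def _eval(node):
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.Attribute):
        # Only public attributes of the math module, e.g. math.pi, math.sqrt.
        if (isinstance(node.value, ast.Name) and node.value.id == "math"
                and not node.attr.startswith("_")):
            return getattr(math, node.attr)
        raise ValueError("attribute access is only allowed on math")
    if isinstance(node, ast.Call):
        if node.keywords:
            raise ValueError("keyword arguments are not allowed")
        func = node.func
        if isinstance(func, ast.Name) and func.id in _BUILTINS:
            fn = _BUILTINS[func.id]
        elif isinstance(func, ast.Attribute):
            fn = _eval(func)
        else:
            raise ValueError("function is not whitelisted")
        return fn(*[_eval(arg) for arg in node.args])
    raise ValueError(f"unsupported syntax: {type(node).__name__}")
```

Anything not matched by one of the branches, including names, lambdas, comprehensions, and string literals, falls through to the final `ValueError`.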
rockachopa
122d07471e Merge pull request 'fix: sanitize dynamic innerHTML in HTML templates (#47)' (#58) from fix/xss-sanitize into main
Some checks failed
Tests / lint (push) Successful in 4s
Tests / test (push) Failing after 12s
2026-03-14 15:45:11 -04:00
rockachopa
3d110098d1 Merge pull request 'feat: Add Kimi agent workspace with development scaffolding' (#44) from kimi/agent-workspace-init into main
Some checks failed
Tests / lint (push) Successful in 2s
Tests / test (push) Failing after 14s
Reviewed-on: http://localhost:3000/rockachopa/Timmy-time-dashboard/pulls/44
2026-03-14 15:09:04 -04:00
db129bbe16 fix: replace print() with proper logging (#29, #51)
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 14s
2026-03-14 15:07:07 -04:00
591954891a fix: sanitize dynamic innerHTML in templates (#47)
Some checks failed
Tests / lint (pull_request) Successful in 2s
Tests / test (pull_request) Failing after 15s
2026-03-14 15:07:00 -04:00
bb287b2c73 fix: sanitize WebSocket data in HTML templates (XSS #47)
2026-03-14 15:01:48 -04:00
efb1feafc9 fix: replace print() with proper logging (#29, #51)
2026-03-14 15:01:34 -04:00
6233a8ccd6 feat: Add Kimi agent workspace with development scaffolding
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 13s
Create the Kimi (Moonshot AI) agent workspace per AGENTS.md conventions:

Workspace Structure:
- .kimi/AGENTS.md - Workspace guide and conventions
- .kimi/README.md - Quick reference documentation
- .kimi/CHECKPOINT.md - Session state tracking
- .kimi/TODO.md - Task list for upcoming work
- .kimi/notes/ - Working notes directory
- .kimi/plans/ - Plan documents
- .kimi/worktrees/ - Git worktrees (reserved)

Development Scripts:
- scripts/bootstrap.sh - One-time workspace setup (venv, deps, .env)
- scripts/resume.sh - Quick status check + resume prompt
- scripts/dev.sh - Development helpers (status, test, lint, format, clean, nuke)

Features:
- Validates Python 3.11+, venv, deps, .env, git config
- Provides quick status on git, tests, Ollama, dashboard
- Commands for testing, linting, formatting, cleaning

Per AGENTS.md:
- Kimi is Build Tier for large-context feature drops
- Follows existing project patterns
- No changes to source code - workspace only
2026-03-14 14:30:38 -04:00
fa838b0063 fix: clean shutdown — silence MCP async-generator teardown noise
Some checks failed
Tests / lint (push) Successful in 2s
Tests / test (push) Failing after 13s
Swallow anyio cancel-scope RuntimeError and BaseExceptionGroup
from MCP stdio_client generators during GC on voice loop exit.
Custom unraisablehook + loop exception handler + warnings filter.
2026-03-14 14:12:05 -04:00
782218aa2c fix: voice loop — persistent event loop, markdown stripping, MCP noise
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 12s
Three fixes from real-world testing:

1. Event loop: replaced asyncio.run() with a persistent loop so
   Agno's MCP sessions survive across conversation turns. No more
   'Event loop is closed' errors on turn 2+.

2. Markdown stripping: voice preamble tells Timmy to respond in
   natural spoken language, plus _strip_markdown() as a safety net
   removes **bold**, *italic*, bullets, headers, code fences, etc.
   TTS no longer reads 'asterisk asterisk'.

3. MCP noise: _suppress_mcp_noise() quiets mcp/agno/httpx loggers
   during voice mode so the terminal shows clean transcript only.

32 tests (12 new for markdown stripping + persistent loop).
2026-03-14 14:05:24 -04:00
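The markdown safety net from 782218aa2c might be approximated with a handful of regular expressions. The patterns below are a guess at what `_strip_markdown()` covers (code fences, inline code, headers, bullets, bold, italic) and will not handle every markdown construct.

```python
import re


def strip_markdown(text):
    """Remove common markdown markers so TTS reads plain sentences."""
    text = re.sub(r"`{3}.*?`{3}", "", text, flags=re.DOTALL)   # fenced code
    text = re.sub(r"`([^`]*)`", r"\1", text)                   # inline code
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)  # headers
    # Bullets are handled before emphasis so "* item" is not read as italic.
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)               # bold
    text = re.sub(r"\*(.+?)\*", r"\1", text)                   # italic
    return re.sub(r"[ \t]+", " ", text).strip()
```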
dbadfc425d feat: sovereign voice loop — timmy voice command
Some checks failed
Tests / lint (push) Successful in 4s
Tests / test (push) Failing after 14s
Adds fully local listen-think-speak voice interface.
STT: Whisper, LLM: Ollama, TTS: Piper. No cloud, no network.

- src/timmy/voice_loop.py: VoiceLoop with VAD, Whisper, Piper
- src/timmy/cli.py: new voice command
- pyproject.toml: voice extras updated
- 20 new tests
2026-03-14 13:58:56 -04:00
d770d66150 Merge pull request 'fix: fact distillation — block garbage and secrets, improve dedup' (#43) from fix/fact-distillation into main
All checks were successful
Tests / lint (push) Successful in 2s
Tests / test (push) Successful in 33s
hermes/v0.1
2026-03-14 13:00:59 -04:00
8ecc0b1780 fix: fact distillation — block garbage and secrets, improve dedup
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 33s
- Rewrite distillation prompt with explicit GOOD/BAD examples
  Good: user preferences, project decisions, learned knowledge
  Bad: meta-observations, internal state, credentials
- Add security filter: block facts containing token/password/secret/key patterns
- Add meta-observation filter: block self-referential 'my thinking' facts
- Lower dedup threshold 0.9 -> 0.75 to catch paraphrased duplicates

Ref #40
2026-03-14 13:00:30 -04:00
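The filters added in 8ecc0b1780 could be sketched as below. The regular expressions are illustrative guesses at the "token/password/secret/key" and "my thinking" patterns, and using `difflib.SequenceMatcher` for the 0.75 dedup threshold is an assumption; the commit does not say which similarity measure is used.

```python
import re
from difflib import SequenceMatcher

# Match the keyword unless it is embedded in a longer word such as "monkey".
SECRET_PATTERN = re.compile(
    r"(?<![a-z])(token|password|secret|key)s?(?![a-z])", re.IGNORECASE
)
META_PATTERN = re.compile(r"\bmy (thinking|thoughts)\b", re.IGNORECASE)
DEDUP_THRESHOLD = 0.75


def should_store(fact, existing):
    """Reject secrets, meta-observations, and near-duplicates of known facts."""
    if SECRET_PATTERN.search(fact) or META_PATTERN.search(fact):
        return False
    return all(
        SequenceMatcher(None, fact.lower(), old.lower()).ratio() < DEDUP_THRESHOLD
        for old in existing
    )
```

Lowering the threshold from 0.9 to 0.75 makes paraphrases such as "User prefers dark mode" and "The user prefers dark mode" count as duplicates.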
60631a7ad1 Merge pull request 'fix: persistent event loop in CLI interview — no more Event loop is closed' (#42) from fix/cli-event-loop into main
All checks were successful
Tests / lint (push) Successful in 3s
Tests / test (push) Successful in 33s
2026-03-14 12:58:46 -04:00
b222b28856 fix: use persistent event loop in interview command
All checks were successful
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 32s
Replace repeated asyncio.run() calls with a single event loop that
persists across all interview questions. The old approach created and
destroyed loops per question, orphaning MCP stdio transports and
causing 'Event loop is closed' errors on ~50% of questions.

Also adds clean shutdown: closes MCP sessions before closing the loop.

Ref #36
2026-03-14 12:58:11 -04:00
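The persistent-loop pattern from b222b28856 can be shown with the standard library alone. Here `ask` is a placeholder coroutine for the real agent call and `close_sessions` is a hypothetical hook for closing MCP sessions; the point is that one loop serves every question and is closed only once, after cleanup.

```python
import asyncio


async def ask(question):
    """Placeholder for the real async agent call."""
    await asyncio.sleep(0)
    return f"answer to {question}"


def run_interview(questions, close_sessions=None):
    """Run all questions on a single event loop instead of asyncio.run() each."""
    loop = asyncio.new_event_loop()
    try:
        return [loop.run_until_complete(ask(q)) for q in questions]
    finally:
        if close_sessions is not None:
            # Clean shutdown: close long-lived sessions before the loop goes away.
            loop.run_until_complete(close_sessions())
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
```

Calling `asyncio.run()` per question creates and closes a fresh loop each time, which is what left transports bound to a loop that no longer existed.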
f19b52a4dc Merge pull request 'fix: corrupted memory state + regex bug in update_user_profile' (#41) from fix/corrupted-memory-state into main
All checks were successful
Tests / lint (push) Successful in 2s
Tests / test (push) Successful in 33s
2026-03-14 12:56:52 -04:00
58ddf55282 fix: regex corruption in update_user_profile + hot memory write guards
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 36s
- memory_system.py: fix regex replacement in update_user_profile()
  Used lambda instead of raw replacement string to prevent corruption
- memory_system.py: add guards to update_section() for empty/oversized writes

Ref #39
2026-03-14 12:55:02 -04:00
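The regex fix in 58ddf55282 relies on a property of `re.sub`: a replacement string has its backslash escapes interpreted, while a callable's return value is inserted literally. The sketch below uses a hypothetical `Name:` profile line and hypothetical function names to show the difference.

```python
import re

PROFILE_LINE = re.compile(r"^Name:.*$", re.MULTILINE)


def update_name_unsafe(profile, new_name):
    """String replacement: backslash sequences in new_name get interpreted."""
    return PROFILE_LINE.sub("Name: " + new_name, profile)


def update_name(profile, new_name):
    """Callable replacement: new_name is inserted exactly as given."""
    return PROFILE_LINE.sub(lambda match: "Name: " + new_name, profile)
```

With a value containing a backslash followed by `n`, the unsafe version writes a real newline into the profile, and a value containing `\1` raises `re.error` because the pattern has no groups.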