Commit Graph

19 Commits

Author SHA1 Message Date
4b617cfcd0 fix: deep focus mode — single-problem context for Timmy (#409)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 02:54:19 -04:00
c1af9e3905 [loop-cycle-154] refactor: extract _annotate_confidence helper — DRY 3x duplication (#369) (#376) 2026-03-18 22:01:51 -04:00
560aed78c3 fix: add cognitive state as observable signal for Matrix avatar (#358)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-18 21:37:17 -04:00
22e0d2d4b3 [loop-cycle-66] fix: replace language-model with inference-backend in error messages (#334) 2026-03-18 20:27:06 -04:00
rockachopa
96e7961a0e fix: make confidence visible to users when below 0.7 threshold (#259)
Co-authored-by: rockachopa <alexpaynex@gmail.com>
Co-committed-by: rockachopa <alexpaynex@gmail.com>
2026-03-15 19:36:52 -04:00
8c63dabd9d [loop-cycle-57] fix: wire confidence estimation into chat flow (#231) (#232) 2026-03-15 13:58:35 -04:00
a50af74ea2 [loop-cycle-56] fix: resolve 5 lint errors on main (#203) (#224) 2026-03-15 13:40:40 -04:00
b3840238cb [loop-cycle-52] feat: response audit trail with inputs, confidence, errors (#144) (#191) 2026-03-15 12:34:48 -04:00
f8f3b9b81f feat: inject session_id into system prompt for session identity awareness
Timmy can now introspect which session he's running in (cli, dashboard, loop).
- Add {session_id} placeholder to both lite and full system prompts
- get_system_prompt() accepts session_id param (default: 'unknown')
- create_timmy() accepts session_id param, forwards to prompt
- CLI chat/think/status pass their session_id to create_timmy()
- session.py passes _DEFAULT_SESSION_ID to create_timmy()
- 7 new tests in test_session_identity.py
- Updated 2 existing CLI test mocks

Closes #64
2026-03-14 19:43:11 -04:00
bce6e7d030 fix: log Ollama disconnections with specific error handling (#92)
- BaseAgent.run(): catch httpx.ConnectError/ReadError/ConnectionError,
  log 'Ollama disconnected: <error>' at ERROR level, then re-raise
- session.py: distinguish Ollama disconnects from other errors in
  chat(), chat_with_tools(), continue_chat() — return specific message
  'Ollama appears to be disconnected' instead of generic error
- 11 new tests covering all disconnect paths
2026-03-14 18:40:15 -04:00
b3a1e0ce36 fix: prune dead web_search tool — ddgs never installed (#87)
Remove DuckDuckGoTools import, all web_search registrations across 4 toolkit
factories, catalog entry, safety classification, prompt references, and
session regex. Total: -41 lines of dead code.

consult_grok is functional (grok_enabled=True, API key set) and opt-in,
so it stays — but Timmy never calls it autonomously, which is correct
sovereign behavior (no cloud calls unless user permits).

Closes #87
2026-03-14 18:13:51 -04:00
Trip T
78167675f2 feat: replace custom Gitea client with MCP servers
Replace the bespoke GiteaHand httpx client and tools_gitea.py wrappers
with official MCP tool servers (gitea-mcp + filesystem MCP), wired into
Agno via MCPTools. Switch all session functions to async (arun/acontinue_run)
so MCP tools auto-connect. Delete ~1070 lines of custom Gitea code.

- Create src/timmy/mcp_tools.py with MCP factories + standalone issue bridge
- Wire MCPTools into agent.py tool list (Gitea + filesystem)
- Switch session.py chat/chat_with_tools/continue_chat to async
- Update all callers (dashboard routes, Discord vendor, CLI, thinking engine)
- Add gitea_token fallback from ~/.config/gitea/token
- Add MCP session cleanup to app shutdown hook
- Update tool_safety.py for MCP tool names
- 11 new tests, all 1417 passing, coverage 74.2%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:40:32 -04:00
Alexander Whitestone
9d78eb31d1 ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX

- Restructure desktop nav from 8+ flat links + overflow dropdown into
  5 grouped dropdowns (Core, Agents, Intel, System, More) matching
  the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
  notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
  notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
  showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
  Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
  disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html

https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h

* fix(security): move auth-gate credentials to environment variables

Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.

https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h

* refactor(tooling): migrate from black+isort+bandit to ruff

Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.

https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
Alexander Whitestone
904a7c564e feat: migrate to Agno native HITL tool confirmation flow (#158)
Replace the homebrew regex-based tool extraction and manual dispatch
(tool_executor.py) with Agno's built-in Human-In-The-Loop confirmation:

- Toolkit(requires_confirmation_tools=...) marks dangerous tools
- agent.run() returns RunOutput with status=paused when confirmation needed
- RunRequirement.confirm()/reject() + agent.continue_run() resumes execution

Dashboard and Discord vendor both use the native flow. DuckDuckGo import
isolated so its absence doesn't kill all tools. Test stubs cleaned up
(agno is a real dependency, only truly optional packages stubbed).

1384 tests pass in parallel (~14s).

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:54:04 -04:00
Alexander Whitestone
ae3bb1cc21 feat: code quality audit + autoresearch integration + infra hardening (#150) 2026-03-08 12:50:44 -04:00
Alexander Whitestone
7792ae745f feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup

- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
  words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
  JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
  and untrack memory/self/methodology.md (runtime-edited file)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: catch Ollama connection errors in session.py + add 71 smoke tests

- Wrap agent.run() in session.py with try/except so Ollama connection
  failures return a graceful fallback message instead of dumping raw
  tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
  core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
  — catches import errors, template failures, and schema mismatches
  that unit tests miss

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: agentic loop for multi-step tasks + Round 10 regression fixes

Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop

Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI when agno is MagicMock stub

_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI — graceful degradation at route level

When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.

Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00
Alexander Whitestone
b615595100 refactor: centralize config & harden security (#141)
* feat: upgrade primary model from llama3.1:8b to qwen2.5:14b

- Swap OLLAMA_MODEL_PRIMARY to qwen2.5:14b for better reasoning
- llama3.1:8b-instruct becomes fallback
- Update .env default and README quick start
- Fix hardcoded model assertions in tests

qwen2.5:14b provides significantly better multi-step reasoning
and tool calling reliability while still running locally on
modest hardware. The 8B model remains as automatic fallback.

* security: centralize config, harden uploads, fix silent exceptions

- Add 9 pydantic Settings fields (skip_embeddings, disable_csrf,
  rqlite_url, brain_source, brain_db_path, csrf_cookie_secure,
  chat_api_max_body_bytes, timmy_test_mode) to centralize env-var access
- Migrate 8 os.environ.get() calls across 5 source files to use
  `from config import settings` per project convention
- Add path traversal defense-in-depth to file upload endpoint
- Add 1MB request body size limit to chat API
- Make CSRF cookie secure flag configurable via settings
- Replace 2 silent `except: pass` blocks with debug logging in session.py
- Remove unused `import os` from brain/memory.py and csrf.py
- Update 5 CSRF test fixtures to patch settings instead of os.environ

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 18:49:37 -05:00
Alexander Payne
6e6b4355bb fix: calculator tool, markdown rendering, prompt guardrails, briefing notification
- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
  get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
  calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
  instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
  pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:35:59 -05:00
Alexander Payne
26e1691099 Fix Timmy coherence: persistent session, model-aware tools, response sanitization
Timmy was exhibiting severe incoherence (no memory between messages, tool call
leakage, chain-of-thought narration, random tool invocations) due to creating
a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line
system prompt with complex tool-calling instructions it couldn't follow.

Key changes:
- Add session.py singleton with stable session_id for conversation continuity
- Add _model_supports_tools() to strip tools from small models (< 7B)
- Add two-tier prompts: lite (12 lines) for small models, full for capable ones
- Add response sanitizer to strip leaked JSON tool calls and CoT narration
- Set show_tool_calls=False to prevent raw tool JSON in output
- Wire ConversationManager for user name extraction
- Deprecate orphaned memory_layers.py (unused 4-layer system)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 19:18:08 -05:00