Commit Graph

101 Commits

Author SHA1 Message Date
f8f3b9b81f feat: inject session_id into system prompt for session identity awareness
Some checks failed
Tests / lint (pull_request) Successful in 4s
Tests / test (pull_request) Failing after 58s
Timmy can now introspect which session he's running in (cli, dashboard, loop).
- Add {session_id} placeholder to both lite and full system prompts
- get_system_prompt() accepts session_id param (default: 'unknown')
- create_timmy() accepts session_id param, forwards to prompt
- CLI chat/think/status pass their session_id to create_timmy()
- session.py passes _DEFAULT_SESSION_ID to create_timmy()
- 7 new tests in test_session_identity.py
- Updated 2 existing CLI test mocks

Closes #64
2026-03-14 19:43:11 -04:00
4b553fa0ed Merge pull request 'fix: word-boundary routing + debug route command (#31)' (#102) from fix/routing-patterns into main
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / lint (pull_request) Successful in 2s
Tests / test (pull_request) Successful in 42s
2026-03-14 19:24:16 -04:00
67497133fd fix: word-boundary routing + debug route command (#31)
All checks were successful
Tests / lint (pull_request) Successful in 2s
Tests / test (pull_request) Successful in 42s
- Replace substring matching with word-boundary regex in route_request()
- "fix the bug" now correctly routes to coder
- Multi-word patterns match if all words appear (any order)
- Add "timmy route" CLI command for debugging routing
- Add route_request_with_match() for pattern visibility
- Expand routing keywords in agents.yaml
- 22 new routing tests, all passing
2026-03-14 19:21:30 -04:00
c1ec43c59f Merge pull request '[loop-cycle-8] fix: replace 59 bare except clauses with proper logging (#25)' (#99) from fix/bare-except-clauses into main
All checks were successful
Tests / lint (push) Successful in 4s
Tests / test (push) Successful in 44s
2026-03-14 19:08:40 -04:00
fdc5b861ca fix: replace 59 bare except clauses with proper logging (#25)
All checks were successful
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 44s
All `except Exception:` now catch as `except Exception as exc:` with
appropriate logging (warning for critical paths, debug for graceful degradation).

Added logger setup to 4 files that lacked it:
- src/timmy/memory/vector_store.py
- src/dashboard/middleware/csrf.py
- src/dashboard/middleware/security_headers.py
- src/spark/memory.py

31 files changed across timmy core, dashboard, infrastructure, integrations.
Zero bare excepts remain. 1340 tests passing.
2026-03-14 19:07:14 -04:00
9c59b386d8 feat: add OLLAMA_NUM_CTX config to cap context window (#83)
All checks were successful
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 43s
- Add ollama_num_ctx setting (default 4096) to config.py
- Pass num_ctx option to Ollama in agent.py and agents/base.py
- Add OLLAMA_NUM_CTX to .env.example with usage docs
- Add context_window note in providers.yaml
- Fix mock_settings in test_agent.py for new attribute
- qwen3:30b with 4096 ctx uses ~19GB vs 45GB default
2026-03-14 18:54:43 -04:00
bce6e7d030 fix: log Ollama disconnections with specific error handling (#92)
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 1m4s
- BaseAgent.run(): catch httpx.ConnectError/ReadError/ConnectionError,
  log 'Ollama disconnected: <error>' at ERROR level, then re-raise
- session.py: distinguish Ollama disconnects from other errors in
  chat(), chat_with_tools(), continue_chat() — return specific message
  'Ollama appears to be disconnected' instead of generic error
- 11 new tests covering all disconnect paths
2026-03-14 18:40:15 -04:00
8a14bbb3e0 Merge pull request '[loop-cycle-5] fix: warmup model on cold load (#82)' (#95) from fix/warmup-cold-model into main
All checks were successful
Tests / lint (push) Successful in 3s
Tests / test (push) Successful in 56s
2026-03-14 18:26:48 -04:00
86956bd057 fix: warmup model on cold load to prevent first-request disconnect
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 16s
Add _warmup_model() that sends a minimal generation request (1 token)
before returning the Agent. 60s timeout handles cold VRAM loads.
Warns but does not abort if warmup fails.

Closes #82
2026-03-14 18:24:00 -04:00
b3a1e0ce36 fix: prune dead web_search tool — ddgs never installed (#87)
Some checks failed
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Failing after 17s
Remove DuckDuckGoTools import, all web_search registrations across 4 toolkit
factories, catalog entry, safety classification, prompt references, and
session regex. Total: -41 lines of dead code.

consult_grok is functional (grok_enabled=True, API key set) and opt-in,
so it stays — but Timmy never calls it autonomously, which is correct
sovereign behavior (no cloud calls unless user permits).

Closes #87
2026-03-14 18:13:51 -04:00
7132b42ff3 fix: model introspection uses exact match, queries /api/ps first
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 13s
_get_ollama_model() used prefix match (startswith) on /api/tags,
causing qwen3:30b to match qwen3.5:latest. Now:
1. Queries /api/ps (loaded models) first — most accurate
2. Falls back to /api/tags with exact name match
3. Reports actual running model, not just configured one

Updated test_get_system_info_contains_model to not assume model==config.

Fixes #77. 5 regression tests added.
2026-03-14 18:03:59 -04:00
74e426c63b [loop-cycle-2] fix: suppress confirmation tool WARNING spam (#79) (#89)
Some checks failed
Tests / test (push) Has been cancelled
Tests / lint (push) Has been cancelled
2026-03-14 17:54:58 -04:00
09fcf956ec Merge pull request '[loop-cycle-1] feat: tool allowlist for autonomous operation (#69)' (#88) from fix/tool-allowlist-autonomous into main
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 13s
2026-03-14 17:41:56 -04:00
d28e2f4a7e [loop-cycle-1] feat: tool allowlist for autonomous operation (#69)
Some checks failed
Tests / lint (pull_request) Successful in 4s
Tests / test (pull_request) Failing after 13s
Add config/allowlist.yaml — YAML-driven gate that auto-approves bounded
tool calls when no human is present.

When Timmy runs with --autonomous or stdin is not a terminal, tool calls
are checked against allowlist: matched → auto-approved, else → rejected.

Changes:
  - config/allowlist.yaml: shell prefixes, deny patterns, path rules
  - tool_safety.py: is_allowlisted() checks tools against YAML rules
  - cli.py: --autonomous flag, _is_interactive() detection
  - 44 new allowlist tests, 8 updated CLI tests

Closes #69
2026-03-14 17:39:48 -04:00
94cd1a9840 fix: make model fallback chains configurable (#53)
Some checks failed
Tests / lint (pull_request) Successful in 4s
Tests / test (pull_request) Failing after 13s
Move hardcoded model fallback lists from module-level constants into
settings.fallback_models and settings.vision_fallback_models (pydantic
Settings fields). Can now be overridden via env vars
FALLBACK_MODELS / VISION_FALLBACK_MODELS or config/providers.yaml.

Removed:
- OLLAMA_MODEL_PRIMARY / OLLAMA_MODEL_FALLBACK from config.py
- DEFAULT_MODEL_FALLBACKS / VISION_MODEL_FALLBACKS from agent.py

get_effective_ollama_model() and _resolve_model_with_fallback() now
walk the configurable chains instead of hardcoded constants.

5 new tests guard the configurable behavior and prevent regression
to hardcoded constants.
2026-03-14 17:26:47 -04:00
061c8f6628 fix: brevity tuning — plain text prompts, markdown=False, front-loaded brevity
Some checks failed
Tests / lint (pull_request) Successful in 15s
Tests / test (pull_request) Failing after 24s
Closes #71: Timmy was responding with elaborate markdown formatting
(tables, headers, emoji, bullet lists) for simple questions.

Root causes fixed:
1. Agno Agent markdown=True flag explicitly told the model to format
   responses as markdown. Set to False in both agent.py and agents/base.py.
2. SYSTEM_PROMPT_FULL used ## and ### markdown headers, bold (**), and
   numbered lists — teaching by example that markdown is expected.
   Rewritten to plain text with labeled sections.
3. Brevity instructions were buried at the bottom of the full prompt.
   Moved to immediately after the opening line as 'VOICE AND BREVITY'
   with explicit override priority.
4. Orchestrator prompt in agents.yaml was silent on response style.
   Added 'Voice: brief, plain, direct' with concrete examples.

The full prompt is now 41 lines shorter (124 → 83). The prompt itself
practices the brevity it preaches.

SOUL.md alignment:
- 'Brevity is a kindness' — now front-loaded in both base and agent prompt
- 'I do not fill silence with noise' — explicit in both tiers
- 'I speak plainly. I prefer short sentences.' — structural enforcement

4 new tests guard against regression:
- test_full_prompt_brevity_first: brevity section before tools/memory
- test_full_prompt_no_markdown_headers: no ## or ### in prompt text
- test_full_prompt_plain_text_brevity: 'plain text' instruction present
- test_lite_prompt_brevity: lite tier also instructs brevity
2026-03-14 17:15:56 -04:00
rockachopa
927e25cc40 Merge pull request 'fix: replace print() with proper logging (#29, #51)' (#59) from fix/print-to-logging into main
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 18s
2026-03-14 16:50:04 -04:00
64fd1d9829 voice: reinforce brevity at top of system prompt
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 14s
2026-03-14 16:32:47 -04:00
b30b5c6b57 [loop-cycle-6] Break thinking rumination loop — semantic dedup (#38)
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 25s
Add post-generation similarity check to ThinkingEngine.think_once().

Problem: Timmy's thinking engine generates repetitive thoughts because
small local models ignore 'don't repeat' instructions in the prompt.
The same observation ('still no chat messages', 'Alexander's name is in
profile') would appear 14+ times in a single day's journal.

Fix: After generating a thought, compare it against the last 5 thoughts
using SequenceMatcher. If similarity >= 0.6, retry with a new seed up to
2 times. If all retries produce repetitive content, discard rather than
store. Uses stdlib difflib — no new dependencies.

Changes:
- thinking.py: Add _is_too_similar() method with SequenceMatcher
- thinking.py: Wrap generation in retry loop with dedup check
- test_thinking.py: 7 new tests covering exact match, near match,
  different thoughts, retry behavior, and max-retry discard

+96/-20 lines in thinking.py, +87 lines in tests.
2026-03-14 16:21:16 -04:00
f426df5b42 feat: add --session-id option to timmy chat CLI
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 15s
Allows specifying a named session for conversation persistence.
Use cases:
- Autonomous loops can have their own session (e.g. --session-id loop)
- Multiple users/agents can maintain separate conversations
- Testing different conversation threads without polluting the default

Precedence: --session-id > --new > default 'cli' session
2026-03-14 16:05:00 -04:00
70d5dc5ce1 fix: replace eval() with AST-walking safe evaluator in calculator
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 14s
Fixes #52

- Replace eval() in calculator() with _safe_eval() that walks the AST
  and only permits: numeric constants, arithmetic ops (+,-,*,/,//,%,**),
  unary +/-, math module access, and whitelisted builtins (abs, round,
  min, max)
- Reject all other syntax: imports, attribute access on non-math objects,
  lambdas, comprehensions, string literals, etc.
- Add 39 tests covering arithmetic, precedence, math functions,
  allowed builtins, error handling, and 14 injection prevention cases
2026-03-14 15:51:35 -04:00
db129bbe16 fix: replace print() with proper logging (#29, #51)
Some checks failed
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Failing after 14s
2026-03-14 15:07:07 -04:00
fa838b0063 fix: clean shutdown — silence MCP async-generator teardown noise
Some checks failed
Tests / lint (push) Successful in 2s
Tests / test (push) Failing after 13s
Swallow anyio cancel-scope RuntimeError and BaseExceptionGroup
from MCP stdio_client generators during GC on voice loop exit.
Custom unraisablehook + loop exception handler + warnings filter.
2026-03-14 14:12:05 -04:00
782218aa2c fix: voice loop — persistent event loop, markdown stripping, MCP noise
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Failing after 12s
Three fixes from real-world testing:

1. Event loop: replaced asyncio.run() with a persistent loop so
   Agno's MCP sessions survive across conversation turns. No more
   'Event loop is closed' errors on turn 2+.

2. Markdown stripping: voice preamble tells Timmy to respond in
   natural spoken language, plus _strip_markdown() as a safety net
   removes **bold**, *italic*, bullets, headers, code fences, etc.
   TTS no longer reads 'asterisk asterisk'.

3. MCP noise: _suppress_mcp_noise() quiets mcp/agno/httpx loggers
   during voice mode so the terminal shows clean transcript only.

32 tests (12 new for markdown stripping + persistent loop).
2026-03-14 14:05:24 -04:00
dbadfc425d feat: sovereign voice loop — timmy voice command
Some checks failed
Tests / lint (push) Successful in 4s
Tests / test (push) Failing after 14s
Adds fully local listen-think-speak voice interface.
STT: Whisper, LLM: Ollama, TTS: Piper. No cloud, no network.

- src/timmy/voice_loop.py: VoiceLoop with VAD, Whisper, Piper
- src/timmy/cli.py: new voice command
- pyproject.toml: voice extras updated
- 20 new tests
2026-03-14 13:58:56 -04:00
8ecc0b1780 fix: fact distillation — block garbage and secrets, improve dedup
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 33s
- Rewrite distillation prompt with explicit GOOD/BAD examples
  Good: user preferences, project decisions, learned knowledge
  Bad: meta-observations, internal state, credentials
- Add security filter: block facts containing token/password/secret/key patterns
- Add meta-observation filter: block self-referential 'my thinking' facts
- Lower dedup threshold 0.9 -> 0.75 to catch paraphrased duplicates

Ref #40
2026-03-14 13:00:30 -04:00
b222b28856 fix: use persistent event loop in interview command
All checks were successful
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 32s
Replace repeated asyncio.run() calls with a single event loop that
persists across all interview questions. The old approach created and
destroyed loops per question, orphaning MCP stdio transports and
causing 'Event loop is closed' errors on ~50% of questions.

Also adds clean shutdown: closes MCP sessions before closing the loop.

Ref #36
2026-03-14 12:58:11 -04:00
58ddf55282 fix: regex corruption in update_user_profile + hot memory write guards
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 36s
- memory_system.py: fix regex replacement in update_user_profile()
  Used lambda instead of raw replacement string to prevent corruption
- memory_system.py: add guards to update_section() for empty/oversized writes

Ref #39
2026-03-14 12:55:02 -04:00
2f623826bd cleanup: delete dead modules — ~7,900 lines removed
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 1m30s
Closes #22, Closes #23

Deleted: brain/, swarm/, openfang/, paperclip/, cascade_adapter,
memory_migrate, agents/timmy.py, dead routes + all corresponding tests.

Updated pyproject.toml, app.py, loop_qa.py for removed imports.
2026-03-14 09:49:24 -04:00
Trip T
78167675f2 feat: replace custom Gitea client with MCP servers
All checks were successful
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 29s
Replace the bespoke GiteaHand httpx client and tools_gitea.py wrappers
with official MCP tool servers (gitea-mcp + filesystem MCP), wired into
Agno via MCPTools. Switch all session functions to async (arun/acontinue_run)
so MCP tools auto-connect. Delete ~1070 lines of custom Gitea code.

- Create src/timmy/mcp_tools.py with MCP factories + standalone issue bridge
- Wire MCPTools into agent.py tool list (Gitea + filesystem)
- Switch session.py chat/chat_with_tools/continue_chat to async
- Update all callers (dashboard routes, Discord vendor, CLI, thinking engine)
- Add gitea_token fallback from ~/.config/gitea/token
- Add MCP session cleanup to app shutdown hook
- Update tool_safety.py for MCP tool names
- 11 new tests, all 1417 passing, coverage 74.2%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:40:32 -04:00
Trip T
41d6ebaf6a feat: CLI session persistence + tool confirmation gate
All checks were successful
Tests / lint (pull_request) Successful in 5s
Tests / test (pull_request) Successful in 31s
- Chat sessions persist across `timmy chat` invocations via Agno SQLite
  (session_id="cli"), fixing context amnesia between turns
- Dangerous tools (shell, write_file, etc.) now prompt for approval in CLI
  instead of silently exiting — uses typer.confirm() + Agno continue_run
- --new flag starts a fresh conversation when needed
- Improved _maybe_file_issues prompt for engineer-quality issue bodies
  (what's happening, expected behavior, suggested fix, acceptance criteria)
- think/status commands also pass session_id for continuity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:55:56 -04:00
Trip T
7163b15300 feat: add Gitea issue creation — Timmy's self-improvement channel
All checks were successful
Tests / lint (pull_request) Successful in 4s
Tests / test (pull_request) Successful in 31s
Give Timmy the ability to file Gitea issues when he notices bugs,
stale state, or improvement opportunities in his own codebase.

Components:
- GiteaHand async API client (infrastructure/hands/gitea.py)
  - Token auth with ~/.config/gitea/token fallback
  - Create/list/close issues, dedup by title similarity
  - Graceful degradation when Gitea unreachable
- Tool functions (timmy/tools_gitea.py)
  - create_gitea_issue: file issues with dedup + work order bridge
  - list_gitea_issues: check existing backlog
  - Classified as SAFE (no confirmation needed)
- Thinking post-hook (_maybe_file_issues in thinking.py)
  - Every 20 thoughts, LLM classifies recent thoughts for actionable items
  - Auto-files bugs/improvements to Gitea with dedup
  - Bridges to local work order system for dashboard tracking
- Config: gitea_url, gitea_token, gitea_repo, gitea_enabled,
  gitea_timeout, thinking_issue_every

All 1426 tests pass, 74.17% coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 18:36:06 -04:00
Trip T
b2f12ca97c feat: consolidate memory into unified memory.db with 4-type model
All checks were successful
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 30s
Consolidates 3 separate memory databases (semantic_memory.db, swarm.db
memory_entries, brain.db) into a single data/memory.db with facts,
chunks, and episodes tables.

Key changes:
- Add unified schema (timmy/memory/unified.py) with 3 core tables
- Redirect vector_store.py and semantic_memory.py to memory.db
- Add thought distillation: every Nth thought extracts lasting facts
- Enrich agent context with known facts in system prompt
- Add memory_forget tool for removing outdated memories
- Unify embeddings: vector_store delegates to semantic_memory.embed_text
- Bridge spark events to unified event log
- Add pruning for thoughts and events with configurable retention
- Add data migration script (timmy/memory_migrate.py)
- Deprecate brain.memory in favor of unified system

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:23:18 -04:00
Trip T
d42c574d26 feat: add Loop QA self-testing framework
All checks were successful
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 29s
Structured self-test framework that probes 6 capabilities (tool use,
multistep planning, memory read/write, self-coding, lightning econ) in
round-robin. Reuses existing infra: event_log for persistence,
create_task() for upgrade proposals, capture_error() for crash handling,
and in-memory circuit breaker for failure tracking.

- src/timmy/loop_qa.py: Capability enum, 6 async probes, orchestrator
- src/dashboard/routes/loop_qa.py: JSON + HTMX health endpoints
- HTMX partial polls every 30s on the health panel
- Background scheduler in app.py lifespan
- 25 tests covering probes, orchestrator, health snapshot, routes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 22:33:16 -04:00
Trip T
f1e909b1e3 feat: enrich thinking engine — anti-loop, anti-confabulation, grounding
All checks were successful
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 30s
Rewrite _THINKING_PROMPT with strict rules: 2-3 sentence limit,
anti-confabulation (only reference real data), anti-repetition.

- Add _pick_seed_type() with recent-type dedup (excludes last 3)
- Add _gather_system_snapshot() for real-time grounding (time, thought
  count, chat activity, task queue)
- Improve _build_continuity_context() with anti-repetition header and
  100-char truncation
- Fix journal + memory timestamps to include local timezone
- 12 new TDD tests covering all improvements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 21:47:28 -04:00
Trip T
f8dadeec59 feat: tick prompt arg + fix name extraction learning verbs as names
All checks were successful
Tests / lint (pull_request) Successful in 4s
Tests / test (pull_request) Successful in 36s
Add optional prompt argument to `timmy tick` so custom journal
prompts can be passed from the CLI (seed_type="prompted").

Fix extract_user_name() learning verbs as names (e.g. "Serving").
Now requires the candidate word to start with a capital letter in
the original message, rejects common verb suffixes (-ing, -tion,
etc.), and deduplicates the naive regex in TimmyWithMemory to use
the fixed ConversationManager.extract_user_name() instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 21:11:53 -04:00
Trip T
6a7875e05f feat: heartbeat memory hooks — pre-recall and post-update
All checks were successful
Tests / lint (pull_request) Successful in 4s
Tests / test (pull_request) Successful in 32s
Wire MEMORY.md + soul.md into the thinking loop so each heartbeat
is grounded in identity and recent context, breaking repetitive loops.

Pre-hook: _load_memory_context() reads hot memory first (changes each
cycle) then soul.md (stable identity), truncated to 1500 chars.

Post-hook: _update_memory() writes a "Last Reflection" section to
MEMORY.md after each thought so the next cycle has fresh context.

soul.md is read-only from the heartbeat — never modified by it.
All hooks degrade gracefully and never crash the heartbeat.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 20:54:13 -04:00
Trip T
f6a6c0f62e feat: upgrade to qwen3.5, self-hosted Gitea CI, optimize Docker image
All checks were successful
Tests / lint (pull_request) Successful in 2s
Tests / test (pull_request) Successful in 32s
Model upgrade:
- qwen2.5:14b → qwen3.5:latest across config, tools, and docs
- Added qwen3.5 to multimodal model registry

Self-hosted Gitea CI:
- .gitea/workflows/tests.yml: lint + test jobs via act_runner
- Unified Dockerfile: pre-baked deps from poetry.lock for fast CI
- sitepackages=true in tox for ~2s dep resolution (was ~40s)
- OLLAMA_URL set to dead port in CI to prevent real LLM calls

Test isolation fixes:
- Smoke test fixture mocks create_timmy (was hitting real Ollama)
- WebSocket sends initial_state before joining broadcast pool (race fix)
- Tests use settings.ollama_model/url instead of hardcoded values
- skip_ci marker for Ollama-dependent tests, excluded in CI tox envs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:36:42 -04:00
Alexander Whitestone
9d78eb31d1 ruff (#169)
* polish: streamline nav, extract inline styles, improve tablet UX

- Restructure desktop nav from 8+ flat links + overflow dropdown into
  5 grouped dropdowns (Core, Agents, Intel, System, More) matching
  the mobile menu structure to reduce decision fatigue
- Extract all inline styles from mission_control.html and base.html
  notification elements into mission-control.css with semantic classes
- Replace JS-built innerHTML with secure DOM construction in
  notification loader and chat history
- Add CONNECTING state to connection indicator (amber) instead of
  showing OFFLINE before WebSocket connects
- Add tablet breakpoint (1024px) with larger touch targets for
  Apple Pencil / stylus use and safe-area padding for iPad toolbar
- Add active-link highlighting in desktop dropdown menus
- Rename "Mission Control" page title to "System Overview" to
  disambiguate from the chat home page
- Add "Home — Timmy Time" page title to index.html

https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h

* fix(security): move auth-gate credentials to environment variables

Hardcoded username, password, and HMAC secret in auth-gate.py replaced
with os.environ lookups. Startup now refuses to run if any variable is
unset. Added AUTH_GATE_SECRET/USER/PASS to .env.example.

https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h

* refactor(tooling): migrate from black+isort+bandit to ruff

Replace three separate linting/formatting tools with a single ruff
invocation. Updates tox.ini (lint, format, pre-push, pre-commit envs),
.pre-commit-config.yaml, and CI workflow. Fixes all ruff errors
including unused imports, missing raise-from, and undefined names.
Ruff config maps existing bandit skips to equivalent S-rules.

https://claude.ai/code/session_015uPUoKyYa8M2UAcyk5Gt6h

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-11 12:23:35 -04:00
Alexander Whitestone
b028b768c9 enhance: diversify and deepen thinking engine prompts (#167)
Add sovereignty and observation seed types, expand creative metaphors,
improve swarm seeds with reflective prompts, and update the thinking
prompt to encourage grounded, specific, varied inner thoughts.

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 11:08:41 -04:00
Alexander Whitestone
c41e3e1e15 fix: clean up logging colors, reduce noise, enable Tailscale access (#166)
* fix: reserve red for real errors, reduce log noise, allow Tailscale access

- Add _ColorFormatter: red = ERROR/CRITICAL only, yellow = WARNING, green = INFO
- Override uvicorn's default colors to use our scheme
- Downgrade discord "not installed" from ERROR to WARNING (optional dep)
- Downgrade DuckDuckGo unavailable from INFO to DEBUG
- Stop discord token watcher retry loop when discord.py not installed
- Add configurable trusted_hosts setting; dev mode allows all hosts
- Exclude .claude/ from uvicorn reload watcher (worktree isolation)
- Fix pre-commit hook: use tox -e unit, bump timeout to 60s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: auto-format with black

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: pre-commit hook auto-formats with black+isort before testing

Formatting should never block a commit — just fix it automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 10:37:20 -04:00
Alexander Whitestone
07f2c1b41e fix: wire up tick engine scheduler + add journal + systemd timer (#163) 2026-03-11 08:47:57 -04:00
Alexander Whitestone
1de97619e8 fix: restore ollama as default backend to fix broken build (#161) 2026-03-10 18:17:47 -04:00
Manus AI
755b7e7658 feat: update default backend to AirLLM and optimize for Mac M3 36GB 2026-03-10 18:04:04 -04:00
Alexander Whitestone
904a7c564e feat: migrate to Agno native HITL tool confirmation flow (#158)
Replace the homebrew regex-based tool extraction and manual dispatch
(tool_executor.py) with Agno's built-in Human-In-The-Loop confirmation:

- Toolkit(requires_confirmation_tools=...) marks dangerous tools
- agent.run() returns RunOutput with status=paused when confirmation needed
- RunRequirement.confirm()/reject() + agent.continue_run() resumes execution

Dashboard and Discord vendor both use the native flow. DuckDuckGo import
isolated so its absence doesn't kill all tools. Test stubs cleaned up
(agno is a real dependency, only truly optional packages stubbed).

1384 tests pass in parallel (~14s).

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:54:04 -04:00
Alexander Whitestone
574031a55c fix: remove invalid show_tool_calls kwarg crashing Agent init (#157)
* fix: remove invalid show_tool_calls kwarg crashing Agent init (regression)

show_tool_calls was removed in f95c960 (Feb 26) because agno 2.5.x
doesn't accept it, then reintroduced in fd0ede0 (Mar 8) without
runtime testing — mocked tests hid the breakage.

Replace the bogus assertion with a regression guard and an allowlist
test that catches unknown kwargs before they reach production.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: auto-install git hooks, add black/isort to dev deps

- Add .githooks/ with portable pre-commit hook (macOS + Linux)
- make install now auto-activates hooks via core.hooksPath
- Add black and isort to poetry dev group (were only in CI via raw pip)
- Fix black formatting on 2 files flagged by CI
- Fix test_autoresearch_perplexity patching wrong module path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:01:00 -04:00
Alexander Whitestone
82fb2417e3 feat: enable SQLite WAL mode for all databases (AGI ticket #1) (#153) 2026-03-08 16:07:02 -04:00
Alexander Whitestone
ae3bb1cc21 feat: code quality audit + autoresearch integration + infra hardening (#150) 2026-03-08 12:50:44 -04:00
Alexander Whitestone
fd0ede0d51 feat: auto-escalation system + agentic loop fixes (#149) (#149)
Wire up automatic error-to-task escalation and fix the agentic loop
stopping after the first tool call.

Auto-escalation:
- Add swarm.task_queue.models with create_task() bridge to existing
  task queue SQLite DB
- Add swarm.event_log with EventType enum, log_event(), and SQLite
  persistence + WebSocket broadcast
- Wire capture_error() into request logging middleware so unhandled
  HTTP exceptions auto-create [BUG] tasks with stack traces, git
  context, and push notifications (5-min dedup window)

Agentic loop (Round 11 Bug #1):
- Wrap agent_chat() in asyncio.to_thread() to stop blocking the
  event loop (fixes Discord heartbeat warnings)
- Enable Agno's native multi-turn tool chaining via show_tool_calls
  and tool_call_limit on the Agent config
- Strengthen multi-step continuation prompts with explicit examples

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 03:11:14 -04:00
Alexander Whitestone
7792ae745f feat: agentic loop for multi-step tasks + regression fixes (#148)
* fix: name extraction blocklist, memory preview escaping, and gitignore cleanup

- Add _NAME_BLOCKLIST to extract_user_name() to reject gerunds and UI-state
  words like "Sending" that were incorrectly captured as user names
- Collapse whitespace in get_memory_status() preview so newlines survive
  JSON serialization without showing raw \n escape sequences
- Broaden .gitignore from specific memory/self/user_profile.md to memory/self/
  and untrack memory/self/methodology.md (runtime-edited file)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: catch Ollama connection errors in session.py + add 71 smoke tests

- Wrap agent.run() in session.py with try/except so Ollama connection
  failures return a graceful fallback message instead of dumping raw
  tracebacks to Docker logs
- Add tests/test_smoke.py with 71 tests covering every GET route:
  core pages, feature pages, JSON APIs, and a parametrized no-500 sweep
  — catches import errors, template failures, and schema mismatches
  that unit tests miss

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: agentic loop for multi-step tasks + Round 10 regression fixes

Agentic loop (Parts 1-4):
- Add multi-step chaining instructions to system prompt
- New agentic_loop.py with plan→execute→adapt→summarize flow
- Register plan_and_execute tool for background task execution
- Add max_agent_steps config setting (default: 10)
- Discord fix: 300s timeout, typing indicator, send error handling
- 16 new unit + e2e tests for agentic loop

Round 10 regressions (R1-R5, P1):
- R1: Fix literal \n escape sequences in tool responses
- R2: Chat timeout/error feedback in agent panel
- R3: /hands infinite spinner → static empty states
- R4: /self-coding infinite spinner → static stats + journal
- R5: /grok/status raw JSON → HTML dashboard template
- P1: VETO confirmation dialog on task cards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI when agno is MagicMock stub

_call_agent() returned a MagicMock instead of a string when agno is
stubbed in tests, causing SQLite "Error binding parameter 4" on save.
Ensure the return value is always an actual string.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: briefing route 500 in CI — graceful degradation at route level

When agno is stubbed with MagicMock in CI, agent.run() returns a
MagicMock instead of raising — so the exception handler never fires
and a MagicMock propagates as the summary to SQLite, which can't
bind it.

Fix: catch at the route level and return a fallback Briefing object.
This follows the project's graceful degradation pattern — the briefing
page always renders, even when the backend is completely unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Trip T <trip@local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:46:29 -05:00