Resolved merge conflicts in agents.py and test_task_queue.py:
- Keep full chat-to-task pipeline (agent/priority extraction, question
filtering, context injection) over simpler main version
- Incorporate test_briefing_task_queue_summary from main
- All 64 task queue tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses 14 bugs from 3 rounds of deep chat evaluation:
- Add chat-to-task pipeline in agents.py with regex-based intent detection,
agent extraction, priority extraction, and title cleaning
- Filter meta-questions ("how do I create a task?") from task creation
- Inject real-time date/time context into every chat message
- Inject live queue state when user asks about tasks
- Ground system prompts with agent roster, honesty guardrails, self-knowledge,
math delegation template, anti-filler rules, values-conflict guidance
- Add CSS for markdown code blocks, inline code, lists, blockquotes in chat
- Add highlight.js CDN for syntax highlighting in chat responses
- Reduce small-model memory context budget (4000→2000) for expanded prompt
- Add 27 comprehensive tests covering the full chat-to-task pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Chat messages like "add X to the queue" or "create a task" are
intercepted and create a task_queue entry with pending_approval
status instead of going through to the LLM
- Briefing engine now gathers task queue stats (pending, running,
completed, failed) and includes them in the morning briefing prompt
- 7 new tests covering detection patterns, chat integration, and
briefing summary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__
- Guard memory_search against top_k=None from model, return formatted string
- Skip Telegram/Discord startup silently when no token configured
- Replace placeholder MEMORY.md with proper structured hot memory document
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Timmy was exhibiting severe incoherence (no memory between messages, tool call
leakage, chain-of-thought narration, random tool invocations) due to creating
a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line
system prompt with complex tool-calling instructions it couldn't follow.
Key changes:
- Add session.py singleton with stable session_id for conversation continuity
- Add _model_supports_tools() to strip tools from small models (< 7B)
- Add two-tier prompts: lite (12 lines) for small models, full for capable ones
- Add response sanitizer to strip leaked JSON tool calls and CoT narration
- Set show_tool_calls=False to prevent raw tool JSON in output
- Wire ConversationManager for user name extraction
- Deprecate orphaned memory_layers.py (unused 4-layer system)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit replaces the previous memory_layers.py with a proper three-tier
memory system as specified by the user:
## Tier 1 — Hot Memory (MEMORY.md)
- Single flat file always loaded into system context
- Contains: current status, standing rules, agent roster, key decisions
- ~300 lines max, pruned monthly
- Managed by HotMemory class
## Tier 2 — Structured Vault (memory/)
- Directory with three namespaces:
• self/ — identity.md, user_profile.md, methodology.md
• notes/ — session logs, AARs, research
• aar/ — post-task retrospectives
- Markdown format, Obsidian-compatible
- Append-only, date-stamped
- Managed by VaultMemory class
## Handoff Protocol
- last-session-handoff.md written at session end
- Contains: summary, key decisions, open items, next steps
- Auto-loaded at next session start
- Maintains continuity across resets
## Implementation
### New Files:
- src/timmy/memory_system.py — Core memory system
- MEMORY.md — Hot memory template
- memory/self/*.md — Identity, user profile, methodology
### Modified:
- src/timmy/agent.py — Integrated with memory system
- create_timmy() injects memory context
- TimmyWithMemory class with automatic fact extraction
- tests/test_agent.py — Updated for memory context
## Key Principles
- Hot memory = small and curated
- Vault = append-only, never delete
- Handoffs = continuity mechanism
- Flat files = human-readable, portable
## Usage
All 973 tests pass.
## Security (Workset A)
- XSS: Verified templates use safe DOM methods (textContent, createElement)
- Secrets: Fail-fast in production mode when L402 secrets not set
- Environment mode: Add TIMMY_ENV (development|production) validation
## Privacy (Workset C)
- Add telemetry_enabled config (default: False for sovereign AI)
- Pass telemetry setting to Agno Agent
- Update .env.example with TELEMETRY_ENABLED and TIMMY_ENV docs
## Agent Intelligence (Workset D)
- Enhanced TIMMY_SYSTEM_PROMPT with:
- Tool usage guidelines (when to use, when not to)
- Memory awareness documentation
- Operating mode documentation
- Help reduce unnecessary tool calls for simple queries
All 895 tests pass.
Telemetry disabled by default aligns with sovereign AI vision.
- Change Toolkit.add_tool() to Toolkit.register() (method was renamed in Agno)
- Fix PythonTools method: python -> run_python_code
- Fix FileTools method: write_file -> save_file
- Fix FileTools base_dir parameter: str -> Path object
- Fix Agent tools parameter: pass Toolkit wrapped in list
These fixes resolve critical startup errors that prevented Timmy agent from initializing:
- AttributeError: 'Toolkit' object has no attribute 'add_tool'
- AttributeError: 'PythonTools' object has no attribute 'python'
- TypeError: 'Toolkit' object is not iterable
All 895 tests pass after these changes.
Quality review: Agent now fully functional with working inference, memory,
and self-awareness capabilities.
Adds 3 new personas (Pixel, Lyra, Reel) and 5 new tool modules:
- Git/DevOps tools (GitPython): clone, status, diff, log, blame, branch,
add, commit, push, pull, stash — wired to Forge and Helm personas
- Image generation (FLUX via diffusers): text-to-image, storyboards,
variations — Pixel persona
- Music generation (ACE-Step 1.5): full songs with vocals+instrumentals,
instrumental tracks, vocal-only tracks — Lyra persona
- Video generation (Wan 2.1 via diffusers): text-to-video, image-to-video
clips — Reel persona
- Creative Director pipeline: multi-step orchestration that chains
storyboard → music → video → assembly into 3+ minute final videos
- Video assembler (MoviePy + FFmpeg): stitch clips, overlay audio,
title cards, subtitles, final export
Also includes:
- Spark Intelligence tool-level + creative pipeline event capture
- Creative Studio dashboard page (/creative/ui) with 4 tabs
- Config settings for all new models and output directories
- pyproject.toml creative optional extra for GPU dependencies
- 107 new tests covering all modules (624 total, all passing)
https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
Add infrastructure for running swarm agents as isolated Docker
containers with HTTP-based coordination, startup recovery, and
enhanced dashboard UI for agent management.
- Dockerfile and docker-compose.yml for multi-service orchestration
- DockerAgentRunner for programmatic container lifecycle management
- Internal HTTP API for container agents to poll tasks and submit bids
- Startup recovery system to reconcile orphaned tasks and stale agents
- Enhanced UI partials for agent panels, chat, and task assignment
- Timmy docker entry point with heartbeat and task polling
- New Makefile targets for Docker workflows
- Tests for swarm recovery
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements the Morning Briefing and Approval Queue feature — the first step
from tool to companion. Timmy now shows up before the owner asks.
New modules
-----------
• src/timmy/approvals.py — ApprovalItem dataclass, GOLDEN_TIMMY governance
constant, full SQLite CRUD (create / list / approve / reject / expire).
Items auto-expire after 7 days if not actioned.
• src/timmy/briefing.py — BriefingEngine that queries swarm activity and
chat history, calls Timmy's Agno agent for a prose summary, and caches
the result in SQLite (~/.timmy/briefings.db). get_or_generate() skips
regeneration if a fresh briefing (< 30 min) already exists.
New routes (src/dashboard/routes/briefing.py)
----------------------------------------------
GET /briefing — full briefing page
GET /briefing/approvals — HTMX partial: pending approval cards
POST /briefing/approvals/{id}/approve — approve via HTMX (no page reload)
POST /briefing/approvals/{id}/reject — reject via HTMX (no page reload)
New templates
-------------
• briefing.html — clean, mobile-first prose layout (max 680px)
• partials/approval_cards.html — list of approval cards
• partials/approval_card_single.html — single approval card with
Approve/Reject HTMX buttons
App wiring (src/dashboard/app.py)
----------------------------------
• Added asynccontextmanager lifespan with _briefing_scheduler background task.
Generates a briefing at startup and every 6 hours; skips if fresh.
Push notification hook (src/notifications/push.py)
---------------------------------------------------
• notify_briefing_ready(briefing) — logs + triggers local notifier.
Placeholder for APNs/Pushover wiring later.
Navigation
----------
• Added BRIEFING link to the header nav in base.html.
Tests
-----
• tests/test_approvals.py — 17 tests: GOLDEN_TIMMY, CRUD, expiry, ordering
• tests/test_briefing.py — 22 tests: dataclass, freshness, cache round-trip,
generate/get_or_generate, push notification hook
354 tests, 354 passing.
https://claude.ai/code/session_01D7p5w91KX3grBeioGiiGy8
TIMMY_STATUS_PROMPT was defined in timmy/prompts.py and covered by
tests, but never wired into the application. The CLI status command
was passing a hardcoded inline string instead. Replace the inline
string with the canonical prompt and add two CLI tests that verify
the correct prompt is used.
https://claude.ai/code/session_01DMjQ5qMZ8iHeyix1j3GS7c
Config (src/config.py):
- pydantic-settings Settings class: OLLAMA_URL, OLLAMA_MODEL, DEBUG
- Reads from .env (gitignored) with sane defaults
- settings singleton imported by health.py and agent.py
Removes two hardcodes:
- health.py: OLLAMA_URL="http://localhost:11434" → settings.ollama_url
- agent.py: Ollama(id="llama3.2") → settings.ollama_model
app.py:
- logging.basicConfig at INFO — requests/errors now visible in terminal
- docs_url/redoc_url gated on settings.debug (off by default)
pyproject.toml:
- pydantic-settings>=2.0.0 added to main dependencies
- hatch wheel config updated to include src/config.py
.env.example: documents all three env vars with inline comments
.gitignore: add !.env.example negation so the template gets committed
.github/workflows/tests.yml: runs pytest --cov on every push/PR
(ubuntu-latest, Python 3.11, pip cache)
All 27 tests pass.
https://claude.ai/code/session_01M4L3R98N5fgXFZRvV8X9b6