- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
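One way to sandbox the calculator described above is to walk the parsed expression tree and allow only numeric literals and arithmetic operators. This is a minimal sketch, not the tool shipped in this commit: `safe_calculate`, the operator whitelist, and the exponent cap are assumptions made for illustration.

```python
import ast
import operator

# Hypothetical whitelist; the real tool's operator set is not shown in the commit.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str):
    """Evaluate an arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Only plain int/float literals are accepted (no strings, bools, complex).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            left, right = _eval(node.left), _eval(node.right)
            # Assumed guard against huge exponents that would hang the process.
            if isinstance(node.op, ast.Pow) and abs(right) > 1000:
                raise ValueError("exponent too large")
            return _BINARY_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. are rejected outright.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))
```

Because function calls and names never reach an evaluator, an input such as `__import__('os')` raises `ValueError` instead of executing.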
- Set overflow:hidden on mc-main to prevent page-level scrolling
- Add max-height:100% to sidebar and chat panel to contain within viewport
- Use flex-wrap:nowrap on layout row to prevent column stacking on desktop
- Move scrollChat() to hx-on::after-settle for reliable post-swap scrolling
- Use requestAnimationFrame for smooth scroll-to-bottom timing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__
- Guard memory_search against top_k=None from model, return formatted string
- Skip Telegram/Discord startup silently when no token configured
- Replace placeholder MEMORY.md with proper structured hot memory document
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix recall_personal_facts() call - remove unsupported limit parameter
- Replace 4 slow E2E test files with single fast test file
- All 6 E2E tests complete in ~9 seconds (was 60+ seconds)
- Reuse browser session across tests (module-scoped fixture)
- Combine related checks into single tests
- Add HTTP-only smoke test for speed
- YAML-based provider configuration (config/providers.yaml)
- Priority-ordered provider routing
- Circuit breaker pattern for failing providers
- Health check and availability monitoring
- Metrics tracking (latency, errors, success rates)
- Support for Ollama, OpenAI, Anthropic, AirLLM providers
- Automatic failover on rate limits or errors
- REST API endpoints for monitoring and control
- 41 comprehensive tests
API Endpoints:
- POST /api/v1/router/complete - Chat completion with failover
- GET /api/v1/router/status - Provider health status
- GET /api/v1/router/metrics - Detailed metrics
- GET /api/v1/router/providers - List all providers
- POST /api/v1/router/providers/{name}/control - Enable/disable/reset
- POST /api/v1/router/health-check - Run health checks
- GET /api/v1/router/config - View configuration
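The priority-ordered routing and circuit breaker listed above can be sketched roughly as follows. The class and function names, the tuple layout, and the default thresholds are hypothetical; the router's actual implementation is not shown in this commit.

```python
import time


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; retries after `recovery_seconds`."""

    def __init__(self, failure_threshold=3, recovery_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow_request(self):
        if self._opened_at is None:
            return True
        # Half-open: let a trial request through once the cool-down has elapsed.
        return self._clock() - self._opened_at >= self.recovery_seconds

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def complete_with_failover(providers, prompt):
    """Try providers in ascending priority order, skipping any with an open circuit.

    `providers` is a list of (priority, name, call, breaker) tuples (assumed layout).
    """
    errors = {}
    for _, name, call, breaker in sorted(providers, key=lambda p: p[0]):
        if not breaker.allow_request():
            continue
        try:
            result = call(prompt)
        except Exception as exc:
            breaker.record_failure()
            errors[name] = exc
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError(f"all providers failed or are unavailable: {errors}")
```

Injecting the clock keeps the breaker testable without sleeping.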
Timmy was exhibiting severe incoherence (no memory between messages, tool call
leakage, chain-of-thought narration, random tool invocations) due to creating
a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line
system prompt with complex tool-calling instructions it couldn't follow.
Key changes:
- Add session.py singleton with stable session_id for conversation continuity
- Add _model_supports_tools() to strip tools from small models (< 7B)
- Add two-tier prompts: lite (12 lines) for small models, full for capable ones
- Add response sanitizer to strip leaked JSON tool calls and CoT narration
- Set show_tool_calls=False to prevent raw tool JSON in output
- Wire ConversationManager for user name extraction
- Deprecate orphaned memory_layers.py (unused 4-layer system)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
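The size gate behind `_model_supports_tools()` might look something like the sketch below. It assumes the parameter count can be read from an Ollama-style tag suffix (for example `:3b`), with a small lookup table for tags that carry no size; the table entries and the function body are illustrative, not the code from this commit.

```python
import re

# Hypothetical defaults for model names that carry no size suffix.
_KNOWN_SIZES_B = {"llama3.2": 3.0, "llama3.1": 8.0}
_MIN_TOOL_PARAMS_B = 7.0


def model_supports_tools(model_name: str) -> bool:
    """Return True only when the model is known to have at least 7B parameters."""
    base, _, tag = model_name.lower().partition(":")
    match = re.search(r"(\d+(?:\.\d+)?)b\b", tag)
    if match:
        size_b = float(match.group(1))
    else:
        size_b = _KNOWN_SIZES_B.get(base)
    # Unknown models are treated as not tool-capable to stay on the safe side.
    return size_b is not None and size_b >= _MIN_TOOL_PARAMS_B
```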
This commit replaces the previous memory_layers.py with a proper three-tier
memory system as specified by the user:
## Tier 1 — Hot Memory (MEMORY.md)
- Single flat file always loaded into system context
- Contains: current status, standing rules, agent roster, key decisions
- ~300 lines max, pruned monthly
- Managed by HotMemory class
## Tier 2 — Structured Vault (memory/)
- Directory with three namespaces:
• self/ — identity.md, user_profile.md, methodology.md
• notes/ — session logs, AARs, research
• aar/ — post-task retrospectives
- Markdown format, Obsidian-compatible
- Append-only, date-stamped
- Managed by VaultMemory class
## Handoff Protocol
- last-session-handoff.md written at session end
- Contains: summary, key decisions, open items, next steps
- Auto-loaded at next session start
- Maintains continuity across resets
## Implementation
### New Files:
- src/timmy/memory_system.py — Core memory system
- MEMORY.md — Hot memory template
- memory/self/*.md — Identity, user profile, methodology
### Modified:
- src/timmy/agent.py — Integrated with memory system
  - create_timmy() injects memory context
  - TimmyWithMemory class with automatic fact extraction
- tests/test_agent.py — Updated for memory context
## Key Principles
- Hot memory = small and curated
- Vault = append-only, never delete
- Handoffs = continuity mechanism
- Flat files = human-readable, portable
## Usage
All 973 tests pass.
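The append-only, date-stamped vault described above could be approximated as below. `VaultSketch` and its entry format are stand-ins chosen for illustration; they are not the `VaultMemory` class from `src/timmy/memory_system.py`.

```python
from datetime import date
from pathlib import Path


class VaultSketch:
    """Illustrative stand-in for an append-only markdown vault."""

    NAMESPACES = ("self", "notes", "aar")

    def __init__(self, root):
        self.root = Path(root)

    def append(self, namespace, name, text, today=None):
        if namespace not in self.NAMESPACES:
            raise ValueError(f"unknown namespace: {namespace}")
        stamp = (today or date.today()).isoformat()
        path = self.root / namespace / f"{name}.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Mode "a" only adds to the end of the file; earlier entries stay untouched.
        with path.open("a", encoding="utf-8") as handle:
            handle.write(f"\n## {stamp}\n{text.rstrip()}\n")
        return path
```

Exposing no update or delete method is what keeps the "never delete" principle enforceable at the API level.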
Keep Mission Control link from this branch alongside SWARM and SPARK
links from main. All 939 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Workset E: Swarm System Realization
- Verified PersonaNode bidding system is properly connected
- Coordinator already subscribes personas to task announcements
- Auction system works when /tasks/auction endpoint is used
## Workset F: Testing & Reliability
- Mobile nav: Add MOBILE link to desktop header (UX-01)
- Voice TTS: Verified graceful degradation already implemented
- Registry: Add proper connection cleanup with try/finally
## Workset G: Performance & Architecture
- Fix module path: websocket.handler -> ws_manager.handler
- Registry connections now properly closed after operations
All 895 tests pass.
Addresses QUALITY_ANALYSIS.md:
- UX-01: /mobile route now in desktop nav
- PERF-01: Connection cleanup improved (P3)
- FUNC-01/02: Verified bidding system operational
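The try/finally cleanup mentioned for the registry follows a common pattern, sketched here with `sqlite3`. The `agents` table and the `list_agents` helper are assumptions for illustration only.

```python
import sqlite3


def list_agents(db_path):
    """Return all rows from a hypothetical `agents` table, always closing the connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS agents (id TEXT PRIMARY KEY, name TEXT)"
        )
        return conn.execute("SELECT id, name FROM agents ORDER BY id").fetchall()
    finally:
        # Runs on both the return path and the exception path.
        conn.close()
```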
Adds SelfModifyLoop — an edit→validate→test→commit cycle that can read
its own failure reports, diagnose root causes, and restart autonomously.
Key capabilities:
- Multi-backend LLM: Anthropic Claude API, Ollama, or auto-detect
- Syntax validation via compile() before writing to disk
- Autonomous self-correction loop with configurable max cycles
- XML-based output format to avoid triple-quote delimiter conflicts
- Branch creation skipped by default to prevent container restarts
- CLI: self-modify run "instruction" --backend auto --autonomous
- 939 tests passing, 30 skipped
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
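Validating with `compile()` before touching the disk can be illustrated in a few lines. The helper name `write_if_valid` and its return shape are hypothetical; only the use of the built-in `compile()` is taken from the commit text.

```python
from pathlib import Path


def write_if_valid(path, source):
    """Write `source` to `path` only if it compiles; return (ok, error_message)."""
    try:
        # compile() parses the source without executing it.
        compile(source, str(path), "exec")
    except SyntaxError as exc:
        return False, f"{exc.msg} (line {exc.lineno})"
    Path(path).write_text(source, encoding="utf-8")
    return True, None
```

A failed check leaves the existing file untouched, so a bad edit cannot break the running code.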
## Security (Workset A)
- XSS: Verified templates use safe DOM methods (textContent, createElement)
- Secrets: Fail-fast in production mode when L402 secrets not set
- Environment mode: Add TIMMY_ENV (development|production) validation
## Privacy (Workset C)
- Add telemetry_enabled config (default: False for sovereign AI)
- Pass telemetry setting to Agno Agent
- Update .env.example with TELEMETRY_ENABLED and TIMMY_ENV docs
## Agent Intelligence (Workset D)
- Enhanced TIMMY_SYSTEM_PROMPT with:
  - Tool usage guidelines (when to use, when not to)
  - Memory awareness documentation
  - Operating mode documentation
- Helps reduce unnecessary tool calls for simple queries
All 895 tests pass.
Telemetry disabled by default aligns with sovereign AI vision.
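The TIMMY_ENV validation and production fail-fast behaviour might be sketched as follows. The placeholder default secret and the function name are assumptions; the project's real settings live in pydantic-settings and are not reproduced here.

```python
import os

_ALLOWED_ENVS = ("development", "production")
# Hypothetical placeholder; the project's real default value is not shown.
_DEFAULT_SECRET = "change-me"


def check_startup_config(environ=os.environ):
    """Validate TIMMY_ENV and refuse to start in production with default secrets."""
    env = environ.get("TIMMY_ENV", "development")
    if env not in _ALLOWED_ENVS:
        raise ValueError(f"TIMMY_ENV must be one of {_ALLOWED_ENVS}, got {env!r}")
    if env == "production":
        for name in ("L402_HMAC_SECRET", "L402_MACAROON_SECRET"):
            if environ.get(name, _DEFAULT_SECRET) == _DEFAULT_SECRET:
                raise RuntimeError(f"{name} must be set in production")
    return env
```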
- Change Toolkit.add_tool() to Toolkit.register() (method was renamed in Agno)
- Fix PythonTools method: python -> run_python_code
- Fix FileTools method: write_file -> save_file
- Fix FileTools base_dir parameter: str -> Path object
- Fix Agent tools parameter: pass Toolkit wrapped in list
These fixes resolve critical startup errors that prevented the Timmy agent from initializing:
- AttributeError: 'Toolkit' object has no attribute 'add_tool'
- AttributeError: 'PythonTools' object has no attribute 'python'
- TypeError: 'Toolkit' object is not iterable
All 895 tests pass after these changes.
Quality review: Agent now fully functional with working inference, memory,
and self-awareness capabilities.
- config.py: add L402_HMAC_SECRET, L402_MACAROON_SECRET, LIGHTNING_BACKEND
to pydantic-settings with startup warnings for default secrets
- l402_proxy.py, mock_backend.py, factory.py: migrate from os.environ.get()
to `from config import settings` per project convention
- Makefile: `make install-creative` now auto-installs PyTorch nightly with
Metal (MPS) support on Apple Silicon instead of just printing a note
- activate_self_tdd.sh: add PID file (.watchdog.pid) and EXIT trap so
Ctrl-C cleanly stops both the dashboard and the watchdog process
- .gitignore: add .watchdog.pid
https://claude.ai/code/session_01A81E5HMxZEPxzv2acNo35u
selenium depends on websocket-client which installs a top-level
`websocket` package that shadows our src/websocket/ module on CI.
Renaming to ws_manager eliminates the conflict entirely — no more
sys.path hacks needed in conftest or Selenium tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduces a vendor-agnostic chat platform architecture:
- chat_bridge/base.py: ChatPlatform ABC, ChatMessage, ChatThread
- chat_bridge/registry.py: PlatformRegistry singleton
- chat_bridge/invite_parser.py: QR + Ollama vision invite extraction
- chat_bridge/vendors/discord.py: DiscordVendor with native threads
Workflow: paste a screenshot of a Discord invite or QR code at
POST /discord/join → Timmy extracts the invite automatically.
Every Discord conversation gets its own thread, keeping channels clean.
Bot responds to @mentions and DMs, routes through Timmy agent.
43 new tests (base classes, registry, invite parser, vendor, routes).
https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM
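The shape of the vendor-agnostic layer can be sketched as an abstract base class plus a registry. Only the class names `ChatPlatform`, `ChatMessage`, and `PlatformRegistry` come from the commit; the fields, method signatures, and the `EchoVendor` example are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ChatMessage:
    platform: str
    author: str
    content: str


class ChatPlatform(ABC):
    """Interface every vendor adapter implements (method set is assumed)."""

    name: str

    @abstractmethod
    def send(self, channel_id: str, text: str) -> ChatMessage:
        ...


class PlatformRegistry:
    def __init__(self):
        self._platforms = {}

    def register(self, platform: ChatPlatform) -> None:
        self._platforms[platform.name] = platform

    def get(self, name: str) -> ChatPlatform:
        return self._platforms[name]


# A module-level instance is one simple way to get singleton behaviour.
registry = PlatformRegistry()


class EchoVendor(ChatPlatform):
    """Hypothetical vendor used only to exercise the interface."""

    name = "echo"

    def send(self, channel_id: str, text: str) -> ChatMessage:
        return ChatMessage(platform=self.name, author="timmy", content=text)
```

Callers depend only on `ChatPlatform`, so adding another vendor means registering one more subclass.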
- Fix /serve/chat AttributeError: split Request and ChatRequest params
so auth headers are read from HTTP request, not Pydantic body
- Add regression tests for the serve_chat endpoint bug
- Add agent_core and lightning to pyproject.toml wheel includes
- Replace Apache 2.0 LICENSE with MIT to match pyproject.toml
- Update test count from "228" to "600+" across README, docs, AGENTS.md
- Add 5 missing subsystems to README table (Spark, Creative, Tools,
Telegram, agent_core/lightning)
- Update AGENTS.md project structure with 6 missing modules
- Mark completed v2 roadmap items (personas, MCP tools) in AGENTS.md
https://claude.ai/code/session_01GMiccXbo77GkV3TA69x6KS
Build real PNG, WAV, and MP4 fixtures (no AI models) and exercise the
full assembler and Creative Director pipeline end-to-end. Fix MoviePy v2
crossfade API (vfx.CrossFadeIn) and font resolution (DejaVu-Sans).
14 new integration tests — 638 total, all passing.
https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
Adds 3 new personas (Pixel, Lyra, Reel) and 5 new tool modules:
- Git/DevOps tools (GitPython): clone, status, diff, log, blame, branch,
add, commit, push, pull, stash — wired to Forge and Helm personas
- Image generation (FLUX via diffusers): text-to-image, storyboards,
variations — Pixel persona
- Music generation (ACE-Step 1.5): full songs with vocals+instrumentals,
instrumental tracks, vocal-only tracks — Lyra persona
- Video generation (Wan 2.1 via diffusers): text-to-video, image-to-video
clips — Reel persona
- Creative Director pipeline: multi-step orchestration that chains
storyboard → music → video → assembly into 3+ minute final videos
- Video assembler (MoviePy + FFmpeg): stitch clips, overlay audio,
title cards, subtitles, final export
Also includes:
- Spark Intelligence tool-level + creative pipeline event capture
- Creative Studio dashboard page (/creative/ui) with 4 tabs
- Config settings for all new models and output directories
- pyproject.toml creative optional extra for GPU dependencies
- 107 new tests covering all modules (624 total, all passing)
https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
Mission Control Dashboard:
- /swarm/mission-control page with real-time system status
- Sovereignty score display with visual progress bar
- Dependency health grid (Ollama, Redis, Lightning, SQLite)
- Recommendations based on dependency status
- Heartbeat monitor with tick counter
- System metrics: uptime, agents, tasks, sats earned
Health Endpoints:
- /health/sovereignty - Full sovereignty audit report
- /health/components - Component status and config
Tests (TDD approach):
- 11 Mission Control tests (all passing)
- 23 scary path tests for production scenarios
- Concurrent load, memory persistence, edge cases
Total: 525 tests passing
Introduce a feedback loop where task outcomes (win/loss, success/failure)
feed back into agent bidding strategy. Borrows the "learn from outcomes"
concept from Spark Intelligence but builds it natively on Timmy's existing
SQLite + swarm architecture.
New module: src/swarm/learner.py
- Records every bid outcome with task description context
- Computes per-agent metrics: win rate, success rate, keyword performance
- suggest_bid() adjusts bids based on historical performance
- learned_keywords() discovers what task types agents actually excel at
Changes:
- persona_node: _compute_bid() now consults learner for adaptive adjustments
- coordinator: complete_task/fail_task feed results into learner
- coordinator: run_auction_and_assign records all bid outcomes
- routes/swarm: add /swarm/insights and /swarm/insights/{agent_id} endpoints
- routes/swarm: add POST /swarm/tasks/{task_id}/fail endpoint
All 413 tests pass (23 new + 390 existing).
https://claude.ai/code/session_01E5jhTCwSUnJk9p9zrTMVUJ
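An adaptive adjustment in the spirit of `suggest_bid()` is sketched below. The formula, its coefficients, and the assumption that the lowest bid wins the auction are illustrative guesses, not the logic in `src/swarm/learner.py`.

```python
def suggest_bid(base_bid, wins, losses, successes, failures):
    """Adjust a bid from historical outcomes (hypothetical formula).

    Assumes the lowest bid wins: agents that rarely win bid lower,
    and agents that often fail tasks bid higher to back off.
    """
    total_bids = wins + losses
    if total_bids == 0:
        return base_bid  # no history yet, keep the base bid

    win_rate = wins / total_bids
    completed = successes + failures
    success_rate = successes / completed if completed else 1.0

    # Ranges from -20% (never wins) to +20% (always wins).
    competitiveness = 0.4 * (win_rate - 0.5)
    # Up to +30% when every completed task failed.
    failure_penalty = 0.3 * (1.0 - success_rate)

    return max(1, round(base_bid * (1.0 + competitiveness + failure_penalty)))
```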
Add infrastructure for running swarm agents as isolated Docker
containers with HTTP-based coordination, startup recovery, and
enhanced dashboard UI for agent management.
- Dockerfile and docker-compose.yml for multi-service orchestration
- DockerAgentRunner for programmatic container lifecycle management
- Internal HTTP API for container agents to poll tasks and submit bids
- Startup recovery system to reconcile orphaned tasks and stale agents
- Enhanced UI partials for agent panels, chat, and task assignment
- Timmy docker entry point with heartbeat and task polling
- New Makefile targets for Docker workflows
- Tests for swarm recovery
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bridges Telegram messages to Timmy via python-telegram-bot (optional
dependency). The bot token can be supplied through the TELEGRAM_TOKEN
env var or at runtime via the new POST /telegram/setup dashboard
endpoint, which (re)starts the bot without a restart.
Changes:
- src/telegram_bot/bot.py — TelegramBot singleton: token persistence
(telegram_state.json), lifecycle (start/stop), /start command and
message handler that forwards to Timmy
- src/dashboard/routes/telegram.py — /telegram/setup and /telegram/status
FastAPI routes
- src/dashboard/app.py — register telegram router; auto-start/stop bot
in lifespan hook
- src/config.py — TELEGRAM_TOKEN setting (pydantic-settings)
- pyproject.toml — [telegram] optional extra (python-telegram-bot>=21),
telegram_bot wheel include
- .env.example — TELEGRAM_TOKEN section
- .gitignore — exclude telegram_state.json (contains token)
- tests/conftest.py — stub telegram/telegram.ext for offline test runs
- tests/test_telegram_bot.py — 16 tests covering token helpers,
lifecycle, and all dashboard routes (370 total, all passing)
https://claude.ai/code/session_01CNBm3ZLobtx3Z1YogHq8ZS
Implements the Morning Briefing and Approval Queue feature — the first step
from tool to companion. Timmy now shows up before the owner asks.
New modules
-----------
• src/timmy/approvals.py — ApprovalItem dataclass, GOLDEN_TIMMY governance
constant, full SQLite CRUD (create / list / approve / reject / expire).
Items auto-expire after 7 days if not actioned.
• src/timmy/briefing.py — BriefingEngine that queries swarm activity and
chat history, calls Timmy's Agno agent for a prose summary, and caches
the result in SQLite (~/.timmy/briefings.db). get_or_generate() skips
regeneration if a fresh briefing (< 30 min) already exists.
New routes (src/dashboard/routes/briefing.py)
----------------------------------------------
GET /briefing — full briefing page
GET /briefing/approvals — HTMX partial: pending approval cards
POST /briefing/approvals/{id}/approve — approve via HTMX (no page reload)
POST /briefing/approvals/{id}/reject — reject via HTMX (no page reload)
New templates
-------------
• briefing.html — clean, mobile-first prose layout (max 680px)
• partials/approval_cards.html — list of approval cards
• partials/approval_card_single.html — single approval card with
Approve/Reject HTMX buttons
App wiring (src/dashboard/app.py)
----------------------------------
• Added asynccontextmanager lifespan with _briefing_scheduler background task.
Generates a briefing at startup and every 6 hours; skips if fresh.
Push notification hook (src/notifications/push.py)
---------------------------------------------------
• notify_briefing_ready(briefing) — logs + triggers local notifier.
Placeholder for APNs/Pushover wiring later.
Navigation
----------
• Added BRIEFING link to the header nav in base.html.
Tests
-----
• tests/test_approvals.py — 17 tests: GOLDEN_TIMMY, CRUD, expiry, ordering
• tests/test_briefing.py — 22 tests: dataclass, freshness, cache round-trip,
generate/get_or_generate, push notification hook
354 tests, 354 passing.
https://claude.ai/code/session_01D7p5w91KX3grBeioGiiGy8
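The freshness check behind `get_or_generate()` can be illustrated with an in-memory cache. This sketch replaces the SQLite store with instance attributes and uses a hypothetical `BriefingCache` class; only the 30-minute window is taken from the commit.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW = timedelta(minutes=30)


class BriefingCache:
    """In-memory stand-in for the SQLite-backed briefing cache."""

    def __init__(self, generate, now=lambda: datetime.now(timezone.utc)):
        self._generate = generate
        self._now = now
        self._cached = None
        self._generated_at = None

    def get_or_generate(self):
        current = self._now()
        is_fresh = (
            self._cached is not None
            and current - self._generated_at < FRESHNESS_WINDOW
        )
        if is_fresh:
            return self._cached
        # Stale or missing: regenerate and remember when it was produced.
        self._cached = self._generate()
        self._generated_at = current
        return self._cached
```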
v2.0.0 Exodus — three roadmap items implemented in one PR:
**1. Agent Personas (Echo, Mace, Helm, Seer, Forge, Quill)**
- src/swarm/personas.py — PERSONAS dict with role, description, capabilities,
rate_sats, bid_base/jitter, and preferred_keywords for each of the 6 agents
- src/swarm/persona_node.py — PersonaNode extends SwarmNode with capability-
aware bidding: bids lower when the task description contains a preferred
keyword (specialist advantage), higher otherwise (off-spec inflation)
- SwarmCoordinator.spawn_persona(persona_id) — registers the persona in the
SQLite registry with its full capabilities string and wires it into the
shared AuctionManager via comms subscription
**2. Bid History Persistence (prerequisite for marketplace stats)**
- src/swarm/stats.py — bid_history table in data/swarm.db:
record_bid(), mark_winner(), get_agent_stats(), get_all_agent_stats()
- coordinator.run_auction_and_assign() now calls swarm_stats.mark_winner()
when a winner is chosen, so tasks_won/total_earned survive restarts
- spawn_persona() records each bid for stats tracking
**3. Marketplace Frontend wired to real data**
- /marketplace/ui — new HTML route renders marketplace.html with live
registry status (idle/busy/offline/planned) and cumulative bid stats
- /marketplace JSON endpoint enriched with same registry+stats data
- marketplace.html — fixed field names (rate_sats, tasks_completed,
total_earned), added role subtitle, comma-split capabilities string,
FREE label for Timmy, "planned_count" display
- base.html — added MARKET nav link pointing to /marketplace/ui
Tests: 315 passed (87 new) covering personas, persona_node, stats CRUD,
marketplace UI route, and enriched catalog data.
https://claude.ai/code/session_013CPPgLc589wfdS8LDNuarL
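The capability-aware bidding described for PersonaNode reduces to a keyword check that scales the base bid. The multipliers and the `compute_bid` signature below are assumptions; the commit only states that specialists bid lower and off-spec agents bid higher.

```python
import random


def compute_bid(task_description, bid_base, bid_jitter, preferred_keywords, rng=random):
    """Return a bid in sats: discounted for specialty tasks, inflated otherwise."""
    description = task_description.lower()
    is_specialty = any(k.lower() in description for k in preferred_keywords)
    # Hypothetical multipliers for specialist advantage vs. off-spec inflation.
    multiplier = 0.5 if is_specialty else 1.5
    jitter = rng.randint(-bid_jitter, bid_jitter)
    return max(1, int(bid_base * multiplier) + jitter)
```

With `bid_jitter=0` the result is deterministic, which is convenient for tests.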
- payment_handler.py: warn when L402_HMAC_SECRET uses default value
- l402_proxy.py: warn when L402_MACAROON_SECRET uses default value
- .env.example: document L402_HMAC_SECRET, L402_MACAROON_SECRET, and
LIGHTNING_BACKEND with generation instructions
These warnings ensure operators are alerted before deploying with
insecure default secrets.
- Fix swarm_live.html WebSocket URL from /swarm/ws to /swarm/live
(matching the actual endpoint in swarm_ws.py)
- Update handleMessage() to process individual swarm events
(agent_joined, task_posted, bid_submitted, task_assigned, etc.)
in addition to bulk state snapshots
- Add refreshStats() helper that fetches /swarm REST endpoint to
update stat counters after each event
- Add GET /swarm/live page route to render the swarm_live.html template
- Add SWARM and MOBILE navigation links to base.html header
(fixes UX-01: /mobile route not in desktop nav)
- Add spawn_in_process_agent() to SwarmCoordinator: creates lightweight
SwarmNode instances that share the coordinator's comms layer and
AuctionManager, enabling synchronous bid submission
- Fix post_task() to open the auction BEFORE announcing via comms so
in-process agent callbacks can submit bids into an open auction
- Fix run_auction_and_assign() to close an already-open auction instead
of re-opening (which would discard bids)
- Add POST /swarm/tasks/auction route for atomic task+auction flow
- Add 7 integration tests (TDD) covering the full lifecycle:
spawn → post → auction → assign → complete
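The ordering fix above matters because in-process agents bid synchronously from inside the announcement callback. A stripped-down sketch, with hypothetical `Auction` and `Coordinator` classes and an assumed lowest-bid-wins rule, shows the flow:

```python
class Auction:
    def __init__(self):
        self._open = set()
        self.bids = {}

    def open(self, task_id):
        self._open.add(task_id)
        self.bids[task_id] = []

    def submit_bid(self, task_id, agent_id, amount):
        # Bids for an auction that is not open are rejected.
        if task_id not in self._open:
            return False
        self.bids[task_id].append((amount, agent_id))
        return True

    def close(self, task_id):
        # Closing keeps the collected bids; re-opening would have reset them.
        self._open.discard(task_id)
        bids = self.bids.get(task_id, [])
        return min(bids)[1] if bids else None


class Coordinator:
    def __init__(self):
        self.auction = Auction()
        self.subscribers = []

    def post_task(self, task_id, description):
        # Open first, so synchronous callbacks land in an open auction.
        self.auction.open(task_id)
        for callback in self.subscribers:
            callback(task_id, description)

    def run_auction_and_assign(self, task_id):
        return self.auction.close(task_id)
```

If `post_task()` announced before opening, every `submit_bid()` call made from a callback would return `False` and the auction would close with no winner.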
Replace all innerHTML string interpolation with safe DOM methods
(createElement, textContent, appendChild) to prevent script injection
from user chat messages and WebSocket agent data.
Fixes: XSS-01, XSS-02