Adds SelfModifyLoop — an edit→validate→test→commit cycle that can read
its own failure reports, diagnose root causes, and restart autonomously.
Key capabilities:
- Multi-backend LLM: Anthropic Claude API, Ollama, or auto-detect
- Syntax validation via compile() before writing to disk
- Autonomous self-correction loop with configurable max cycles
- XML-based output format to avoid triple-quote delimiter conflicts
- Branch creation skipped by default to prevent container restarts
- CLI: self-modify run "instruction" --backend auto --autonomous
- 939 tests passing, 30 skipped
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
selenium depends on websocket-client which installs a top-level
`websocket` package that shadows our src/websocket/ module on CI.
Renaming to ws_manager eliminates the conflict entirely — no more
sys.path hacks needed in conftest or Selenium tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
selenium depends on websocket-client which installs a top-level
`websocket` package that shadows our src/websocket/ module. Ensure
src/ is inserted at the front of sys.path in conftest so the project
module wins the import race. Fixes collection errors for
test_websocket.py and test_websocket_extended.py on GitHub Actions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add an ollama service (behind --profile ollama) to the test compose stack
and a new test suite that verifies real LLM inference end-to-end:
- docker-compose.test.yml: add ollama/ollama service with health check,
make OLLAMA_URL and OLLAMA_MODEL configurable via env vars
- tests/functional/test_ollama_chat.py: session-scoped fixture that
brings up Ollama + dashboard, pulls qwen2.5:0.5b (~400MB, CPU-only),
and runs chat/history/multi-turn tests against the live stack
- Makefile: add `make test-ollama` target
Run with: make test-ollama (or FUNCTIONAL_DOCKER=1 pytest tests/functional/test_ollama_chat.py -v)
https://claude.ai/code/session_01NTEzfRHSZQCfkfypxgyHKk
Introduces a vendor-agnostic chat platform architecture:
- chat_bridge/base.py: ChatPlatform ABC, ChatMessage, ChatThread
- chat_bridge/registry.py: PlatformRegistry singleton
- chat_bridge/invite_parser.py: QR + Ollama vision invite extraction
- chat_bridge/vendors/discord.py: DiscordVendor with native threads
Workflow: paste a screenshot of a Discord invite or QR code at
POST /discord/join → Timmy extracts the invite automatically.
Every Discord conversation gets its own thread, keeping channels clean.
Bot responds to @mentions and DMs, routes through Timmy agent.
43 new tests (base classes, registry, invite parser, vendor, routes).
https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM
Three-tier functional test infrastructure:
- CLI tests via Typer CliRunner (timmy, timmy-serve, self-tdd)
- Dashboard integration tests with real TestClient, real SQLite, real
coordinator (no patch/mock — Ollama offline = graceful degradation)
- Docker compose container-level tests (gated by FUNCTIONAL_DOCKER=1)
- End-to-end L402 payment flow with real mock-lightning backend
42 new tests (8 Docker tests skipped without FUNCTIONAL_DOCKER=1).
All 849 tests pass.
https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM
- Fix /serve/chat AttributeError: split Request and ChatRequest params
so auth headers are read from HTTP request, not Pydantic body
- Add regression tests for the serve_chat endpoint bug
- Add agent_core and lightning to pyproject.toml wheel includes
- Replace Apache 2.0 LICENSE with MIT to match pyproject.toml
- Update test count from "228" to "600+" across README, docs, AGENTS.md
- Add 5 missing subsystems to README table (Spark, Creative, Tools,
Telegram, agent_core/lightning)
- Update AGENTS.md project structure with 6 missing modules
- Mark completed v2 roadmap items (personas, MCP tools) in AGENTS.md
https://claude.ai/code/session_01GMiccXbo77GkV3TA69x6KS
Build real PNG, WAV, and MP4 fixtures (no AI models) and exercise the
full assembler and Creative Director pipeline end-to-end. Fix MoviePy v2
crossfade API (vfx.CrossFadeIn) and font resolution (DejaVu-Sans).
14 new integration tests — 638 total, all passing.
https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
Adds 3 new personas (Pixel, Lyra, Reel) and 5 new tool modules:
- Git/DevOps tools (GitPython): clone, status, diff, log, blame, branch,
add, commit, push, pull, stash — wired to Forge and Helm personas
- Image generation (FLUX via diffusers): text-to-image, storyboards,
variations — Pixel persona
- Music generation (ACE-Step 1.5): full songs with vocals+instrumentals,
instrumental tracks, vocal-only tracks — Lyra persona
- Video generation (Wan 2.1 via diffusers): text-to-video, image-to-video
clips — Reel persona
- Creative Director pipeline: multi-step orchestration that chains
storyboard → music → video → assembly into 3+ minute final videos
- Video assembler (MoviePy + FFmpeg): stitch clips, overlay audio,
title cards, subtitles, final export
Also includes:
- Spark Intelligence tool-level + creative pipeline event capture
- Creative Studio dashboard page (/creative/ui) with 4 tabs
- Config settings for all new models and output directories
- pyproject.toml creative optional extra for GPU dependencies
- 107 new tests covering all modules (624 total, all passing)
https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
Mission Control Dashboard:
- /swarm/mission-control page with real-time system status
- Sovereignty score display with visual progress bar
- Dependency health grid (Ollama, Redis, Lightning, SQLite)
- Recommendations based on dependency status
- Heartbeat monitor with tick counter
- System metrics: uptime, agents, tasks, sats earned
Health Endpoints:
- /health/sovereignty - Full sovereignty audit report
- /health/components - Component status and config
Tests (TDD approach):
- 11 Mission Control tests (all passing)
- 23 scary path tests for production scenarios
- Concurrent load, memory persistence, edge cases
Total: 525 tests passing
Add full pytest-cov configuration with fail_under=60% threshold,
HTML/XML report targets, and proper exclude_lines. Fix websocket
history test to use public broadcast() API instead of manually
manipulating internals. Audit confirmed 491 tests at 71.2% coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce a feedback loop where task outcomes (win/loss, success/failure)
feed back into agent bidding strategy. Borrows the "learn from outcomes"
concept from Spark Intelligence but builds it natively on Timmy's existing
SQLite + swarm architecture.
New module: src/swarm/learner.py
- Records every bid outcome with task description context
- Computes per-agent metrics: win rate, success rate, keyword performance
- suggest_bid() adjusts bids based on historical performance
- learned_keywords() discovers what task types agents actually excel at
Changes:
- persona_node: _compute_bid() now consults learner for adaptive adjustments
- coordinator: complete_task/fail_task feed results into learner
- coordinator: run_auction_and_assign records all bid outcomes
- routes/swarm: add /swarm/insights and /swarm/insights/{agent_id} endpoints
- routes/swarm: add POST /swarm/tasks/{task_id}/fail endpoint
All 413 tests pass (23 new + 390 existing).
https://claude.ai/code/session_01E5jhTCwSUnJk9p9zrTMVUJ
Add infrastructure for running swarm agents as isolated Docker
containers with HTTP-based coordination, startup recovery, and
enhanced dashboard UI for agent management.
- Dockerfile and docker-compose.yml for multi-service orchestration
- DockerAgentRunner for programmatic container lifecycle management
- Internal HTTP API for container agents to poll tasks and submit bids
- Startup recovery system to reconcile orphaned tasks and stale agents
- Enhanced UI partials for agent panels, chat, and task assignment
- Timmy docker entry point with heartbeat and task polling
- New Makefile targets for Docker workflows
- Tests for swarm recovery
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bridges Telegram messages to Timmy via python-telegram-bot (optional
dependency). The bot token can be supplied through the TELEGRAM_TOKEN
env var or at runtime via the new POST /telegram/setup dashboard
endpoint, which (re)starts the bot without a restart.
Changes:
- src/telegram_bot/bot.py — TelegramBot singleton: token persistence
(telegram_state.json), lifecycle (start/stop), /start command and
message handler that forwards to Timmy
- src/dashboard/routes/telegram.py — /telegram/setup and /telegram/status
FastAPI routes
- src/dashboard/app.py — register telegram router; auto-start/stop bot
in lifespan hook
- src/config.py — TELEGRAM_TOKEN setting (pydantic-settings)
- pyproject.toml — [telegram] optional extra (python-telegram-bot>=21),
telegram_bot wheel include
- .env.example — TELEGRAM_TOKEN section
- .gitignore — exclude telegram_state.json (contains token)
- tests/conftest.py — stub telegram/telegram.ext for offline test runs
- tests/test_telegram_bot.py — 16 tests covering token helpers,
lifecycle, and all dashboard routes (370 total, all passing)
https://claude.ai/code/session_01CNBm3ZLobtx3Z1YogHq8ZS
Implements the Morning Briefing and Approval Queue feature — the first step
from tool to companion. Timmy now shows up before the owner asks.
New modules
-----------
• src/timmy/approvals.py — ApprovalItem dataclass, GOLDEN_TIMMY governance
constant, full SQLite CRUD (create / list / approve / reject / expire).
Items auto-expire after 7 days if not actioned.
• src/timmy/briefing.py — BriefingEngine that queries swarm activity and
chat history, calls Timmy's Agno agent for a prose summary, and caches
the result in SQLite (~/.timmy/briefings.db). get_or_generate() skips
regeneration if a fresh briefing (< 30 min) already exists.
New routes (src/dashboard/routes/briefing.py)
----------------------------------------------
GET /briefing — full briefing page
GET /briefing/approvals — HTMX partial: pending approval cards
POST /briefing/approvals/{id}/approve — approve via HTMX (no page reload)
POST /briefing/approvals/{id}/reject — reject via HTMX (no page reload)
New templates
-------------
• briefing.html — clean, mobile-first prose layout (max 680px)
• partials/approval_cards.html — list of approval cards
• partials/approval_card_single.html — single approval card with
Approve/Reject HTMX buttons
App wiring (src/dashboard/app.py)
----------------------------------
• Added asynccontextmanager lifespan with _briefing_scheduler background task.
Generates a briefing at startup and every 6 hours; skips if fresh.
Push notification hook (src/notifications/push.py)
---------------------------------------------------
• notify_briefing_ready(briefing) — logs + triggers local notifier.
Placeholder for APNs/Pushover wiring later.
Navigation
----------
• Added BRIEFING link to the header nav in base.html.
Tests
-----
• tests/test_approvals.py — 17 tests: GOLDEN_TIMMY, CRUD, expiry, ordering
• tests/test_briefing.py — 22 tests: dataclass, freshness, cache round-trip,
generate/get_or_generate, push notification hook
354 tests, 354 passing.
https://claude.ai/code/session_01D7p5w91KX3grBeioGiiGy8
v2.0.0 Exodus — three roadmap items implemented in one PR:
**1. Agent Personas (Echo, Mace, Helm, Seer, Forge, Quill)**
- src/swarm/personas.py — PERSONAS dict with role, description, capabilities,
rate_sats, bid_base/jitter, and preferred_keywords for each of the 6 agents
- src/swarm/persona_node.py — PersonaNode extends SwarmNode with capability-
aware bidding: bids lower when the task description contains a preferred
keyword (specialist advantage), higher otherwise (off-spec inflation)
- SwarmCoordinator.spawn_persona(persona_id) — registers the persona in the
SQLite registry with its full capabilities string and wires it into the
shared AuctionManager via comms subscription
**2. Bid History Persistence (prerequisite for marketplace stats)**
- src/swarm/stats.py — bid_history table in data/swarm.db:
record_bid(), mark_winner(), get_agent_stats(), get_all_agent_stats()
- coordinator.run_auction_and_assign() now calls swarm_stats.mark_winner()
when a winner is chosen, so tasks_won/total_earned survive restarts
- spawn_persona() records each bid for stats tracking
**3. Marketplace Frontend wired to real data**
- /marketplace/ui — new HTML route renders marketplace.html with live
registry status (idle/busy/offline/planned) and cumulative bid stats
- /marketplace JSON endpoint enriched with same registry+stats data
- marketplace.html — fixed field names (rate_sats, tasks_completed,
total_earned), added role subtitle, comma-split capabilities string,
FREE label for Timmy, "planned_count" display
- base.html — added MARKET nav link pointing to /marketplace/ui
Tests: 315 passed (87 new) covering personas, persona_node, stats CRUD,
marketplace UI route, and enriched catalog data.
https://claude.ai/code/session_013CPPgLc589wfdS8LDNuarL
- Add 8 tests for timmy_serve/cli.py (start, invoice, status commands)
covering default args, custom args, and output validation
- Add 8 tests for voice_enhanced route covering all intent types
(status, help, swarm, voice, chat fallback) plus error handling
- Add 17 tests for websocket/handler.py covering broadcast to
multiple clients, dead connection cleanup, history trimming,
connect/disconnect, and all convenience broadcast methods
- Add 4 tests for the new GET /swarm/live page route
Total new tests: 37
- Add spawn_in_process_agent() to SwarmCoordinator: creates lightweight
SwarmNode instances that share the coordinator's comms layer and
AuctionManager, enabling synchronous bid submission
- Fix post_task() to open the auction BEFORE announcing via comms so
in-process agent callbacks can submit bids into an open auction
- Fix run_auction_and_assign() to close an already-open auction instead
of re-opening (which would discard bids)
- Add POST /swarm/tasks/auction route for atomic task+auction flow
- Add 7 integration tests (TDD) covering the full lifecycle:
spawn → post → auction → assign → complete
- Add QUALITY_ANALYSIS.md — 10-point architect review covering
architecture coherence, completeness (~35-40% vs vision), mobile UX,
security, test coverage, code quality, and DX
- Fix P0 XSS: mobile.html chat input now uses DOM textContent instead
of innerHTML string interpolation with raw user input
- Fix P0 XSS: swarm_live.html agent/auction rendering rewritten with
safe DOM methods (_t/_el helpers) — no more ${agent.name} in innerHTML
- Add M7xx test category (4 new tests) covering XSS prevention assertions;
total suite now 232 passing (was 228)
- HITL session guide included in analysis with step-by-step phone test
instructions and critical scenario priority ordering
https://claude.ai/code/session_0183Nzcy7TMqjrAopnTtygds
Adds `src/self_tdd/watchdog.py` with a `_run_tests()` function that
shells out to pytest and a `watch` command that polls on a configurable
interval, printing green on recovery and full short-traceback output on
regression. No files are modified and no commits are made automatically.
Usage:
self-tdd watch # default 60s interval
self-tdd watch -i 15 # poll every 15s
Also adds 6 unit tests and wires the `self-tdd` entry point +
`src/self_tdd` wheel include into pyproject.toml.
https://claude.ai/code/session_01DMjQ5qMZ8iHeyix1j3GS7c
TIMMY_STATUS_PROMPT was defined in timmy/prompts.py and covered by
tests, but never wired into the application. The CLI status command
was passing a hardcoded inline string instead. Replace the inline
string with the canonical prompt and add two CLI tests that verify
the correct prompt is used.
https://claude.ai/code/session_01DMjQ5qMZ8iHeyix1j3GS7c
- Add dashboard/store.py: MessageLog dataclass singleton tracking
user/agent/error messages for the lifetime of the server process
- agents.py: write each chat turn to MessageLog; add GET and DELETE
/agents/timmy/history routes returning the history.html partial
- partials/history.html: render stored messages by role (YOU / TIMMY /
SYSTEM); falls back to the Mission Control init message when empty
- index.html: chat-log loads history via hx-get on page start; new
CLEAR button in panel header sends hx-delete to reset the log
- style.css: add .mc-btn-clear (muted, red-on-hover for the header)
- tests: autouse reset_message_log fixture in conftest; 5 new history
tests covering empty state, recording, offline errors, clear, and
post-clear state → 32 tests total, all passing
https://claude.ai/code/session_01KZMfwBpLuiv6x9GbzTqbys