Resolved merge conflicts in agents.py and test_task_queue.py:
- Keep full chat-to-task pipeline (agent/priority extraction, question
filtering, context injection) over simpler main version
- Incorporate test_briefing_task_queue_summary from main
- All 64 task queue tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses 14 bugs from 3 rounds of deep chat evaluation:
- Add chat-to-task pipeline in agents.py with regex-based intent detection,
agent extraction, priority extraction, and title cleaning
- Filter meta-questions ("how do I create a task?") from task creation
- Inject real-time date/time context into every chat message
- Inject live queue state when user asks about tasks
- Ground system prompts with agent roster, honesty guardrails, self-knowledge,
math delegation template, anti-filler rules, values-conflict guidance
- Add CSS for markdown code blocks, inline code, lists, blockquotes in chat
- Add highlight.js CDN for syntax highlighting in chat responses
- Reduce small-model memory context budget (4000→2000) for expanded prompt
- Add 27 comprehensive tests covering the full chat-to-task pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Chat messages like "add X to the queue" or "create a task" are
intercepted and create a task_queue entry with pending_approval
status instead of going through to the LLM
- Briefing engine now gathers task queue stats (pending, running,
completed, failed) and includes them in the morning briefing prompt
- 7 new tests covering detection patterns, chat integration, and
briefing summary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Task Queue system:
- New /tasks page with three-column layout (Pending/Active/Completed)
- Full CRUD API at /api/tasks with approve/veto/modify/pause/cancel/retry
- SQLite persistence in task_queue table
- WebSocket live updates via ws_manager
- Create task modal with agent assignment and priority
- Auto-approve rules for low-risk tasks
- HTMX polling for real-time column updates
- HOME TASK buttons now link to task queue with agent pre-selected
- MARKET HIRE buttons link to task queue with agent pre-selected
Work Order system:
- External submission API for agents/users (POST /work-orders/submit)
- Risk scoring and configurable auto-execution thresholds
- Dashboard at /work-orders/queue with approve/reject/execute flow
- Integration with swarm task system for execution
UI & Dashboard bug fixes:
- EVENTS: add startup event so page is never empty
- LEDGER: fix empty filter params in URL
- MISSION CONTROL: LLM backend and model now read from /health
- MISSION CONTROL: agent count fallback to /swarm/agents
- SWARM: HTMX fallback loads initial data if WebSocket is slow
- MEMORY: add edit/delete buttons for personal facts
- UPGRADES: add empty state guidance with links
- BRIEFING: add regenerate button and POST /briefing/regenerate endpoint
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Set overflow:hidden on mc-main to prevent page-level scrolling
- Add max-height:100% to sidebar and chat panel to contain within viewport
- Use flex-wrap:nowrap on layout row to prevent column stacking on desktop
- Move scrollChat() to hx-on::after-settle for reliable post-swap scrolling
- Use requestAnimationFrame for smooth scroll-to-bottom timing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__
- Guard memory_search against top_k=None from model, return formatted string
- Skip Telegram/Discord startup silently when no token configured
- Replace placeholder MEMORY.md with proper structured hot memory document
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix recall_personal_facts() call - remove unsupported limit parameter
- Replace 4 slow E2E test files with single fast test file
- All 6 E2E tests complete in ~9 seconds (was 60+ seconds)
- Reuse browser session across tests (module-scoped fixture)
- Combine related checks into single tests
- Add HTTP-only smoke test for speed
- YAML-based provider configuration (config/providers.yaml)
- Priority-ordered provider routing
- Circuit breaker pattern for failing providers
- Health check and availability monitoring
- Metrics tracking (latency, errors, success rates)
- Support for Ollama, OpenAI, Anthropic, AirLLM providers
- Automatic failover on rate limits or errors
- REST API endpoints for monitoring and control
- 41 comprehensive tests
API Endpoints:
- POST /api/v1/router/complete - Chat completion with failover
- GET /api/v1/router/status - Provider health status
- GET /api/v1/router/metrics - Detailed metrics
- GET /api/v1/router/providers - List all providers
- POST /api/v1/router/providers/{name}/control - Enable/disable/reset
- POST /api/v1/router/health-check - Run health checks
- GET /api/v1/router/config - View configuration
Timmy was exhibiting severe incoherence (no memory between messages, tool call
leakage, chain-of-thought narration, random tool invocations) due to creating
a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line
system prompt with complex tool-calling instructions it couldn't follow.
Key changes:
- Add session.py singleton with stable session_id for conversation continuity
- Add _model_supports_tools() to strip tools from small models (< 7B)
- Add two-tier prompts: lite (12 lines) for small models, full for capable ones
- Add response sanitizer to strip leaked JSON tool calls and CoT narration
- Set show_tool_calls=False to prevent raw tool JSON in output
- Wire ConversationManager for user name extraction
- Deprecate orphaned memory_layers.py (unused 4-layer system)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit replaces the previous memory_layers.py with a proper three-tier
memory system as specified by the user:
## Tier 1 — Hot Memory (MEMORY.md)
- Single flat file always loaded into system context
- Contains: current status, standing rules, agent roster, key decisions
- ~300 lines max, pruned monthly
- Managed by HotMemory class
## Tier 2 — Structured Vault (memory/)
- Directory with three namespaces:
• self/ — identity.md, user_profile.md, methodology.md
• notes/ — session logs, AARs, research
• aar/ — post-task retrospectives
- Markdown format, Obsidian-compatible
- Append-only, date-stamped
- Managed by VaultMemory class
## Handoff Protocol
- last-session-handoff.md written at session end
- Contains: summary, key decisions, open items, next steps
- Auto-loaded at next session start
- Maintains continuity across resets
## Implementation
### New Files:
- src/timmy/memory_system.py — Core memory system
- MEMORY.md — Hot memory template
- memory/self/*.md — Identity, user profile, methodology
### Modified:
- src/timmy/agent.py — Integrated with memory system
- create_timmy() injects memory context
- TimmyWithMemory class with automatic fact extraction
- tests/test_agent.py — Updated for memory context
## Key Principles
- Hot memory = small and curated
- Vault = append-only, never delete
- Handoffs = continuity mechanism
- Flat files = human-readable, portable
## Usage
All 973 tests pass.
Keep Mission Control link from this branch alongside SWARM and SPARK
links from main. All 939 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Workset E: Swarm System Realization
- Verified PersonaNode bidding system is properly connected
- Coordinator already subscribes personas to task announcements
- Auction system works when /tasks/auction endpoint is used
## Workset F: Testing & Reliability
- Mobile nav: Add MOBILE link to desktop header (UX-01)
- Voice TTS: Verified graceful degradation already implemented
- Registry: Add proper connection cleanup with try/finally
## Workset G: Performance & Architecture
- Fix module path: websocket.handler -> ws_manager.handler
- Registry connections now properly closed after operations
All 895 tests pass.
Addresses QUALITY_ANALYSIS.md:
- UX-01: /mobile route now in desktop nav
- PERF-01: Connection cleanup improved (P3)
- FUNC-01/02: Verified bidding system operational
Adds SelfModifyLoop — an edit→validate→test→commit cycle that can read
its own failure reports, diagnose root causes, and restart autonomously.
Key capabilities:
- Multi-backend LLM: Anthropic Claude API, Ollama, or auto-detect
- Syntax validation via compile() before writing to disk
- Autonomous self-correction loop with configurable max cycles
- XML-based output format to avoid triple-quote delimiter conflicts
- Branch creation skipped by default to prevent container restarts
- CLI: self-modify run "instruction" --backend auto --autonomous
- 939 tests passing, 30 skipped
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Security (Workset A)
- XSS: Verified templates use safe DOM methods (textContent, createElement)
- Secrets: Fail-fast in production mode when L402 secrets not set
- Environment mode: Add TIMMY_ENV (development|production) validation
## Privacy (Workset C)
- Add telemetry_enabled config (default: False for sovereign AI)
- Pass telemetry setting to Agno Agent
- Update .env.example with TELEMETRY_ENABLED and TIMMY_ENV docs
## Agent Intelligence (Workset D)
- Enhanced TIMMY_SYSTEM_PROMPT with:
- Tool usage guidelines (when to use, when not to)
- Memory awareness documentation
- Operating mode documentation
- Help reduce unnecessary tool calls for simple queries
All 895 tests pass.
Telemetry disabled by default aligns with sovereign AI vision.
- Change Toolkit.add_tool() to Toolkit.register() (method was renamed in Agno)
- Fix PythonTools method: python -> run_python_code
- Fix FileTools method: write_file -> save_file
- Fix FileTools base_dir parameter: str -> Path object
- Fix Agent tools parameter: pass Toolkit wrapped in list
These fixes resolve critical startup errors that prevented Timmy agent from initializing:
- AttributeError: 'Toolkit' object has no attribute 'add_tool'
- AttributeError: 'PythonTools' object has no attribute 'python'
- TypeError: 'Toolkit' object is not iterable
All 895 tests pass after these changes.
Quality review: Agent now fully functional with working inference, memory,
and self-awareness capabilities.
- config.py: add L402_HMAC_SECRET, L402_MACAROON_SECRET, LIGHTNING_BACKEND
to pydantic-settings with startup warnings for default secrets
- l402_proxy.py, mock_backend.py, factory.py: migrate from os.environ.get()
to `from config import settings` per project convention
- Makefile: `make install-creative` now auto-installs PyTorch nightly with
Metal (MPS) support on Apple Silicon instead of just printing a note
- activate_self_tdd.sh: add PID file (.watchdog.pid) and EXIT trap so
Ctrl-C cleanly stops both the dashboard and the watchdog process
- .gitignore: add .watchdog.pid
https://claude.ai/code/session_01A81E5HMxZEPxzv2acNo35u
selenium depends on websocket-client which installs a top-level
`websocket` package that shadows our src/websocket/ module on CI.
Renaming to ws_manager eliminates the conflict entirely — no more
sys.path hacks needed in conftest or Selenium tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduces a vendor-agnostic chat platform architecture:
- chat_bridge/base.py: ChatPlatform ABC, ChatMessage, ChatThread
- chat_bridge/registry.py: PlatformRegistry singleton
- chat_bridge/invite_parser.py: QR + Ollama vision invite extraction
- chat_bridge/vendors/discord.py: DiscordVendor with native threads
Workflow: paste a screenshot of a Discord invite or QR code at
POST /discord/join → Timmy extracts the invite automatically.
Every Discord conversation gets its own thread, keeping channels clean.
Bot responds to @mentions and DMs, routes through Timmy agent.
43 new tests (base classes, registry, invite parser, vendor, routes).
https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM
- Fix /serve/chat AttributeError: split Request and ChatRequest params
so auth headers are read from HTTP request, not Pydantic body
- Add regression tests for the serve_chat endpoint bug
- Add agent_core and lightning to pyproject.toml wheel includes
- Replace Apache 2.0 LICENSE with MIT to match pyproject.toml
- Update test count from "228" to "600+" across README, docs, AGENTS.md
- Add 5 missing subsystems to README table (Spark, Creative, Tools,
Telegram, agent_core/lightning)
- Update AGENTS.md project structure with 6 missing modules
- Mark completed v2 roadmap items (personas, MCP tools) in AGENTS.md
https://claude.ai/code/session_01GMiccXbo77GkV3TA69x6KS
Build real PNG, WAV, and MP4 fixtures (no AI models) and exercise the
full assembler and Creative Director pipeline end-to-end. Fix MoviePy v2
crossfade API (vfx.CrossFadeIn) and font resolution (DejaVu-Sans).
14 new integration tests — 638 total, all passing.
https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
Adds 3 new personas (Pixel, Lyra, Reel) and 5 new tool modules:
- Git/DevOps tools (GitPython): clone, status, diff, log, blame, branch,
add, commit, push, pull, stash — wired to Forge and Helm personas
- Image generation (FLUX via diffusers): text-to-image, storyboards,
variations — Pixel persona
- Music generation (ACE-Step 1.5): full songs with vocals+instrumentals,
instrumental tracks, vocal-only tracks — Lyra persona
- Video generation (Wan 2.1 via diffusers): text-to-video, image-to-video
clips — Reel persona
- Creative Director pipeline: multi-step orchestration that chains
storyboard → music → video → assembly into 3+ minute final videos
- Video assembler (MoviePy + FFmpeg): stitch clips, overlay audio,
title cards, subtitles, final export
Also includes:
- Spark Intelligence tool-level + creative pipeline event capture
- Creative Studio dashboard page (/creative/ui) with 4 tabs
- Config settings for all new models and output directories
- pyproject.toml creative optional extra for GPU dependencies
- 107 new tests covering all modules (624 total, all passing)
https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
Mission Control Dashboard:
- /swarm/mission-control page with real-time system status
- Sovereignty score display with visual progress bar
- Dependency health grid (Ollama, Redis, Lightning, SQLite)
- Recommendations based on dependency status
- Heartbeat monitor with tick counter
- System metrics: uptime, agents, tasks, sats earned
Health Endpoints:
- /health/sovereignty - Full sovereignty audit report
- /health/components - Component status and config
Tests (TDD approach):
- 11 Mission Control tests (all passing)
- 23 scary path tests for production scenarios
- Concurrent load, memory persistence, edge cases
Total: 525 tests passing
Introduce a feedback loop where task outcomes (win/loss, success/failure)
feed back into agent bidding strategy. Borrows the "learn from outcomes"
concept from Spark Intelligence but builds it natively on Timmy's existing
SQLite + swarm architecture.
New module: src/swarm/learner.py
- Records every bid outcome with task description context
- Computes per-agent metrics: win rate, success rate, keyword performance
- suggest_bid() adjusts bids based on historical performance
- learned_keywords() discovers what task types agents actually excel at
Changes:
- persona_node: _compute_bid() now consults learner for adaptive adjustments
- coordinator: complete_task/fail_task feed results into learner
- coordinator: run_auction_and_assign records all bid outcomes
- routes/swarm: add /swarm/insights and /swarm/insights/{agent_id} endpoints
- routes/swarm: add POST /swarm/tasks/{task_id}/fail endpoint
All 413 tests pass (23 new + 390 existing).
https://claude.ai/code/session_01E5jhTCwSUnJk9p9zrTMVUJ
Add infrastructure for running swarm agents as isolated Docker
containers with HTTP-based coordination, startup recovery, and
enhanced dashboard UI for agent management.
- Dockerfile and docker-compose.yml for multi-service orchestration
- DockerAgentRunner for programmatic container lifecycle management
- Internal HTTP API for container agents to poll tasks and submit bids
- Startup recovery system to reconcile orphaned tasks and stale agents
- Enhanced UI partials for agent panels, chat, and task assignment
- Timmy docker entry point with heartbeat and task polling
- New Makefile targets for Docker workflows
- Tests for swarm recovery
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>