Resolved merge conflicts in agents.py and test_task_queue.py:
- Keep full chat-to-task pipeline (agent/priority extraction, question
filtering, context injection) over simpler main version
- Incorporate test_briefing_task_queue_summary from main
- All 64 task queue tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses 14 bugs from 3 rounds of deep chat evaluation:
- Add chat-to-task pipeline in agents.py with regex-based intent detection,
agent extraction, priority extraction, and title cleaning
- Filter meta-questions ("how do I create a task?") from task creation
- Inject real-time date/time context into every chat message
- Inject live queue state when user asks about tasks
- Ground system prompts with agent roster, honesty guardrails, self-knowledge,
math delegation template, anti-filler rules, values-conflict guidance
- Add CSS for markdown code blocks, inline code, lists, blockquotes in chat
- Add highlight.js CDN for syntax highlighting in chat responses
- Reduce small-model memory context budget (4000→2000) for expanded prompt
- Add 27 comprehensive tests covering the full chat-to-task pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Chat messages like "add X to the queue" or "create a task" are
intercepted and create a task_queue entry with pending_approval
status instead of going through to the LLM
- Briefing engine now gathers task queue stats (pending, running,
completed, failed) and includes them in the morning briefing prompt
- 7 new tests covering detection patterns, chat integration, and
briefing summary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Task Queue system:
- New /tasks page with three-column layout (Pending/Active/Completed)
- Full CRUD API at /api/tasks with approve/veto/modify/pause/cancel/retry
- SQLite persistence in task_queue table
- WebSocket live updates via ws_manager
- Create task modal with agent assignment and priority
- Auto-approve rules for low-risk tasks
- HTMX polling for real-time column updates
- HOME TASK buttons now link to task queue with agent pre-selected
- MARKET HIRE buttons link to task queue with agent pre-selected
Work Order system:
- External submission API for agents/users (POST /work-orders/submit)
- Risk scoring and configurable auto-execution thresholds
- Dashboard at /work-orders/queue with approve/reject/execute flow
- Integration with swarm task system for execution
UI & Dashboard bug fixes:
- EVENTS: add startup event so page is never empty
- LEDGER: fix empty filter params in URL
- MISSION CONTROL: LLM backend and model now read from /health
- MISSION CONTROL: agent count fallback to /swarm/agents
- SWARM: HTMX fallback loads initial data if WebSocket is slow
- MEMORY: add edit/delete buttons for personal facts
- UPGRADES: add empty state guidance with links
- BRIEFING: add regenerate button and POST /briefing/regenerate endpoint
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Set overflow:hidden on mc-main to prevent page-level scrolling
- Add max-height:100% to sidebar and chat panel to contain within viewport
- Use flex-wrap:nowrap on layout row to prevent column stacking on desktop
- Move scrollChat() to hx-on::after-settle for reliable post-swap scrolling
- Use requestAnimationFrame for smooth scroll-to-bottom timing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__
- Guard memory_search against top_k=None from model, return formatted string
- Skip Telegram/Discord startup silently when no token configured
- Replace placeholder MEMORY.md with proper structured hot memory document
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add fixtures required by functional test suite:
- docker_stack: Docker container test URL (skips if FUNCTIONAL_DOCKER != 1)
- serve_client: FastAPI TestClient for timmy-serve app
- tdd_runner: Alias for self_tdd_runner
Fixes CI errors in test_docker_swarm.py, test_l402_flow.py, test_cli.py
- Fix recall_personal_facts() call - remove unsupported limit parameter
- Replace 4 slow E2E test files with single fast test file
- All 6 E2E tests complete in ~9 seconds (was 60+ seconds)
- Reuse browser session across tests (module-scoped fixture)
- Combine related checks into single tests
- Add HTTP-only smoke test for speed
- YAML-based provider configuration (config/providers.yaml)
- Priority-ordered provider routing
- Circuit breaker pattern for failing providers
- Health check and availability monitoring
- Metrics tracking (latency, errors, success rates)
- Support for Ollama, OpenAI, Anthropic, AirLLM providers
- Automatic failover on rate limits or errors
- REST API endpoints for monitoring and control
- 41 comprehensive tests
API Endpoints:
- POST /api/v1/router/complete - Chat completion with failover
- GET /api/v1/router/status - Provider health status
- GET /api/v1/router/metrics - Detailed metrics
- GET /api/v1/router/providers - List all providers
- POST /api/v1/router/providers/{name}/control - Enable/disable/reset
- POST /api/v1/router/health-check - Run health checks
- GET /api/v1/router/config - View configuration
Timmy was exhibiting severe incoherence (no memory between messages, tool call
leakage, chain-of-thought narration, random tool invocations) due to creating
a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line
system prompt with complex tool-calling instructions it couldn't follow.
Key changes:
- Add session.py singleton with stable session_id for conversation continuity
- Add _model_supports_tools() to strip tools from small models (< 7B)
- Add two-tier prompts: lite (12 lines) for small models, full for capable ones
- Add response sanitizer to strip leaked JSON tool calls and CoT narration
- Set show_tool_calls=False to prevent raw tool JSON in output
- Wire ConversationManager for user name extraction
- Deprecate orphaned memory_layers.py (unused 4-layer system)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit replaces the previous memory_layers.py with a proper three-tier
memory system as specified by the user:
## Tier 1 — Hot Memory (MEMORY.md)
- Single flat file always loaded into system context
- Contains: current status, standing rules, agent roster, key decisions
- ~300 lines max, pruned monthly
- Managed by HotMemory class
## Tier 2 — Structured Vault (memory/)
- Directory with three namespaces:
• self/ — identity.md, user_profile.md, methodology.md
• notes/ — session logs, AARs, research
• aar/ — post-task retrospectives
- Markdown format, Obsidian-compatible
- Append-only, date-stamped
- Managed by VaultMemory class
## Handoff Protocol
- last-session-handoff.md written at session end
- Contains: summary, key decisions, open items, next steps
- Auto-loaded at next session start
- Maintains continuity across resets
## Implementation
### New Files:
- src/timmy/memory_system.py — Core memory system
- MEMORY.md — Hot memory template
- memory/self/*.md — Identity, user profile, methodology
### Modified:
- src/timmy/agent.py — Integrated with memory system
- create_timmy() injects memory context
- TimmyWithMemory class with automatic fact extraction
- tests/test_agent.py — Updated for memory context
## Key Principles
- Hot memory = small and curated
- Vault = append-only, never delete
- Handoffs = continuity mechanism
- Flat files = human-readable, portable
## Usage
All 973 tests pass.
Keep Mission Control link from this branch alongside SWARM and SPARK
links from main. All 939 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep both L402/privacy settings from security hardening PR and
self-modification settings. All 939 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The POST /swarm/tasks endpoint now triggers an automatic auction via
asyncio.create_task. Tests must allow tasks to be in bidding, assigned,
or failed status since the background auction may resolve before the
follow-up GET query.
All 895 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Workset E: Swarm System Realization
- Verified PersonaNode bidding system is properly connected
- Coordinator already subscribes personas to task announcements
- Auction system works when /tasks/auction endpoint is used
## Workset F: Testing & Reliability
- Mobile nav: Add MOBILE link to desktop header (UX-01)
- Voice TTS: Verified graceful degradation already implemented
- Registry: Add proper connection cleanup with try/finally
## Workset G: Performance & Architecture
- Fix module path: websocket.handler -> ws_manager.handler
- Registry connections now properly closed after operations
All 895 tests pass.
Addresses QUALITY_ANALYSIS.md:
- UX-01: /mobile route now in desktop nav
- PERF-01: Connection cleanup improved (P3)
- FUNC-01/02: Verified bidding system operational
Adds SelfModifyLoop — an edit→validate→test→commit cycle that can read
its own failure reports, diagnose root causes, and restart autonomously.
Key capabilities:
- Multi-backend LLM: Anthropic Claude API, Ollama, or auto-detect
- Syntax validation via compile() before writing to disk
- Autonomous self-correction loop with configurable max cycles
- XML-based output format to avoid triple-quote delimiter conflicts
- Branch creation skipped by default to prevent container restarts
- CLI: self-modify run "instruction" --backend auto --autonomous
- 939 tests passing, 30 skipped
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds SelfModifyLoop — an edit→validate→test→commit cycle that can read
its own failure reports, diagnose root causes, and restart autonomously.
Key capabilities:
- Multi-backend LLM: Anthropic Claude API, Ollama, or auto-detect
- Syntax validation via compile() before writing to disk
- Autonomous self-correction loop with configurable max cycles
- XML-based output format to avoid triple-quote delimiter conflicts
- Branch creation skipped by default to prevent container restarts
- CLI: self-modify run "instruction" --backend auto --autonomous
- 939 tests passing, 30 skipped
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Agno Toolkit API fix (1bc2cdc) wasn't taking effect because
Python was loading stale __pycache__/*.pyc files with the old
add_tool() calls. Now `make nuke` clears all bytecache, and
`make dev` sets PYTHONDONTWRITEBYTECODE=1 to prevent .pyc
creation during development.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Security (Workset A)
- XSS: Verified templates use safe DOM methods (textContent, createElement)
- Secrets: Fail-fast in production mode when L402 secrets not set
- Environment mode: Add TIMMY_ENV (development|production) validation
## Privacy (Workset C)
- Add telemetry_enabled config (default: False for sovereign AI)
- Pass telemetry setting to Agno Agent
- Update .env.example with TELEMETRY_ENABLED and TIMMY_ENV docs
## Agent Intelligence (Workset D)
- Enhanced TIMMY_SYSTEM_PROMPT with:
- Tool usage guidelines (when to use, when not to)
- Memory awareness documentation
- Operating mode documentation
- Help reduce unnecessary tool calls for simple queries
All 895 tests pass.
Telemetry disabled by default aligns with sovereign AI vision.