Commit Graph

69 Commits

Author SHA1 Message Date
Claude
63bbe2a288 feat: add sovereign biblical text integration module (scripture)
Implement the core scripture module for local-first ESV text storage,
verse retrieval, reference parsing, original language support,
cross-referencing, topical mapping, and automated meditation workflows.

Architecture:
- scripture/constants.py: 66-book Protestant canon with aliases and metadata
- scripture/models.py: Pydantic models with integer-encoded verse IDs
- scripture/parser.py: Regex-based reference extraction and formatting
- scripture/store.py: SQLite-backed verse/xref/topic/Strong's storage
- scripture/memory.py: Tripartite memory (working/long-term/associative)
- scripture/meditation.py: Sequential/thematic/lectionary meditation scheduler
- dashboard/routes/scripture.py: REST endpoints for all scripture operations
- config.py: scripture_enabled, translation, meditation settings
- 95 comprehensive tests covering all modules and routes
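
One plausible reading of "integer-encoded verse IDs" is a packed book/chapter/verse integer. The sketch below assumes a BBCCCVVV layout; the layout and the names `VerseRef`, `encode_verse_id`, and `decode_verse_id` are illustrative and not taken from scripture/models.py.

```python
from typing import NamedTuple


class VerseRef(NamedTuple):
    book: int     # 1..66, position in the 66-book canon
    chapter: int
    verse: int


def encode_verse_id(ref: VerseRef) -> int:
    """Pack a reference into one integer (assumed layout: BBCCCVVV)."""
    if not (1 <= ref.book <= 66 and 1 <= ref.chapter <= 999 and 1 <= ref.verse <= 999):
        raise ValueError(f"out-of-range reference: {ref}")
    return ref.book * 1_000_000 + ref.chapter * 1_000 + ref.verse


def decode_verse_id(verse_id: int) -> VerseRef:
    """Invert encode_verse_id."""
    book, rest = divmod(verse_id, 1_000_000)
    chapter, verse = divmod(rest, 1_000)
    return VerseRef(book, chapter, verse)
```

With this layout, sorting by ID gives canonical reading order, and a passage range becomes a simple integer range query in SQLite.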

https://claude.ai/code/session_015wv7FM6BFsgZ35Us6WeY7H
2026-02-26 17:06:00 +00:00
Alexander Payne
431cf3e020 merge: resolve conflicts with main, keep comprehensive chat pipeline
Resolved merge conflicts in agents.py and test_task_queue.py:
- Keep full chat-to-task pipeline (agent/priority extraction, question
  filtering, context injection) over simpler main version
- Incorporate test_briefing_task_queue_summary from main
- All 64 task queue tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:47:34 -05:00
Alexander Payne
3ca8e9f2d6 fix: chat evaluation bugs — task pipeline, prompt grounding, markdown rendering
Addresses 14 bugs from 3 rounds of deep chat evaluation:

- Add chat-to-task pipeline in agents.py with regex-based intent detection,
  agent extraction, priority extraction, and title cleaning
- Filter meta-questions ("how do I create a task?") from task creation
- Inject real-time date/time context into every chat message
- Inject live queue state when user asks about tasks
- Ground system prompts with agent roster, honesty guardrails, self-knowledge,
  math delegation template, anti-filler rules, values-conflict guidance
- Add CSS for markdown code blocks, inline code, lists, blockquotes in chat
- Add highlight.js CDN for syntax highlighting in chat responses
- Reduce small-model memory context budget (4000→2000) for expanded prompt
- Add 27 comprehensive tests covering the full chat-to-task pipeline
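
The regex-based intent detection and meta-question filter described above might look roughly like the following. The patterns and the `extract_task` helper are hypothetical, not the ones in agents.py; the `pending_approval` status is borrowed from the earlier chat-to-task commit (bc9089ef96).

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only.
_META_QUESTION = re.compile(
    r"^\s*how (?:do|can|would) (?:i|you|we)\b.*\?\s*$", re.IGNORECASE
)
_TASK_INTENT = re.compile(
    r"\b(?:add (?P<queued>.+?) to the (?:task )?queue"
    r"|create a task(?: to| for|:)?\s+(?P<created>.+))",
    re.IGNORECASE,
)
_PRIORITY = re.compile(r",?\s*\b(urgent|high|low) priority\b", re.IGNORECASE)


def extract_task(message: str) -> Optional[dict]:
    """Return a task dict if the message asks for one, else None."""
    if _META_QUESTION.match(message):
        return None  # "how do I create a task?" is a question, not a request
    intent = _TASK_INTENT.search(message)
    if intent is None:
        return None
    raw_title = intent.group("queued") or intent.group("created")
    priority_match = _PRIORITY.search(message)
    priority = priority_match.group(1).lower() if priority_match else "normal"
    # Title cleaning: drop the priority phrase and stray punctuation.
    title = _PRIORITY.sub("", raw_title).strip(" .,!?")
    if not title:
        return None
    return {"title": title, "priority": priority, "status": "pending_approval"}
```

Checking the meta-question filter before the intent pattern keeps questions about the feature from creating tasks.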

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:42:42 -05:00
Alexander Whitestone
32ad43a61a Merge pull request #51 from AlexanderWhitestone/feature/task-queue-and-ui-fixes
feat: wire chat-to-task-queue and briefing integration
2026-02-26 11:31:25 -05:00
Alexander Payne
18bc64b36d feat: Self-Coding Foundation (Phase 1)
Implements the foundational infrastructure for Timmy's self-modification capability:

## New Services

1. **GitSafety** (src/self_coding/git_safety.py)
   - Atomic git operations with rollback capability
   - Snapshot/restore for safe experimentation
   - Feature branch management (timmy/self-edit/{timestamp})
   - Merge to main only after tests pass

2. **CodebaseIndexer** (src/self_coding/codebase_indexer.py)
   - AST-based parsing of Python source files
   - Extracts classes, functions, imports, docstrings
   - Builds dependency graph for blast radius analysis
   - SQLite storage with hash-based incremental indexing
   - get_summary() for LLM context (<4000 tokens)
   - get_relevant_files() for task-based file discovery

3. **ModificationJournal** (src/self_coding/modification_journal.py)
   - Persistent log of all self-modification attempts
   - Tracks outcomes: success, failure, rollback
   - find_similar() for learning from past attempts
   - Success rate metrics and recent failure tracking
   - Supports vector embeddings (Phase 2)

4. **ReflectionService** (src/self_coding/reflection.py)
   - LLM-powered analysis of modification attempts
   - Generates lessons learned from successes and failures
   - Fallback templates when LLM unavailable
   - Supports context from similar past attempts
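
As a rough illustration of the CodebaseIndexer ideas above (AST-based extraction plus hash-based incremental indexing), the following uses only the standard library. `index_source` and `needs_reindex` are assumed names, not the module's actual API, and the SQLite storage is left out.

```python
import ast
import hashlib
from typing import Optional


def index_source(source: str) -> dict:
    """Extract classes, functions, imports, and the module docstring."""
    tree = ast.parse(source)
    classes, functions, imports = [], [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            classes.append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or "")
    return {
        "sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "classes": sorted(classes),
        "functions": sorted(functions),
        "imports": sorted(set(imports)),
        "docstring": ast.get_docstring(tree),
    }


def needs_reindex(source: str, stored_hash: Optional[str]) -> bool:
    """Incremental indexing: re-parse only when the content hash changed."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest() != stored_hash
```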

## Test Coverage

- 104 new tests across 7 test files
- 95% code coverage on self_coding module
- Green path tests: full workflow integration
- Red path tests: errors, rollbacks, edge cases
- Safety constraint tests: test coverage requirements, protected files

## Usage

    from self_coding import GitSafety, CodebaseIndexer, ModificationJournal

    # Replace /path/to/repo with the repository root.
    git = GitSafety(repo_path="/path/to/repo")
    indexer = CodebaseIndexer(repo_path="/path/to/repo")
    journal = ModificationJournal()

Phase 2 will build the Self-Edit MCP Tool that orchestrates these services.
2026-02-26 11:08:05 -05:00
Alexander Payne
bc9089ef96 feat: wire chat-to-task-queue and briefing integration
- Chat messages like "add X to the queue" or "create a task" are
  intercepted and create a task_queue entry with pending_approval
  status instead of going through to the LLM
- Briefing engine now gathers task queue stats (pending, running,
  completed, failed) and includes them in the morning briefing prompt
- 7 new tests covering detection patterns, chat integration, and
  briefing summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:33:14 -05:00
Alexander Payne
5f9bbb8435 feat: add task queue with human-in-the-loop approval + work orders + UI bug fixes
Task Queue system:
- New /tasks page with three-column layout (Pending/Active/Completed)
- Full CRUD API at /api/tasks with approve/veto/modify/pause/cancel/retry
- SQLite persistence in task_queue table
- WebSocket live updates via ws_manager
- Create task modal with agent assignment and priority
- Auto-approve rules for low-risk tasks
- HTMX polling for real-time column updates
- HOME TASK buttons now link to task queue with agent pre-selected
- MARKET HIRE buttons link to task queue with agent pre-selected

Work Order system:
- External submission API for agents/users (POST /work-orders/submit)
- Risk scoring and configurable auto-execution thresholds
- Dashboard at /work-orders/queue with approve/reject/execute flow
- Integration with swarm task system for execution

UI & Dashboard bug fixes:
- EVENTS: add startup event so page is never empty
- LEDGER: fix empty filter params in URL
- MISSION CONTROL: LLM backend and model now read from /health
- MISSION CONTROL: agent count fallback to /swarm/agents
- SWARM: HTMX fallback loads initial data if WebSocket is slow
- MEMORY: add edit/delete buttons for personal facts
- UPGRADES: add empty state guidance with links
- BRIEFING: add regenerate button and POST /briefing/regenerate endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:27:08 -05:00
Alexander Payne
6e6b4355bb fix: calculator tool, markdown rendering, prompt guardrails, briefing notification
- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
  get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
  calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
  instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
  pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests
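
A common way to sandbox a calculator in Python is to parse the expression with `ast` and evaluate only whitelisted node types. The sketch below shows that approach; it is not necessarily how Timmy's tool is built, and `safe_calculate` with its exponent cap is an assumption.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str):
    """Evaluate plain arithmetic; reject names, calls, and attributes."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            left, right = _eval(node.left), _eval(node.right)
            # Arbitrary cap so "9 ** 9 ** 9" cannot exhaust memory.
            if isinstance(node.op, ast.Pow) and abs(right) > 1000:
                raise ValueError("exponent too large")
            return _BINARY_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))
```

Because nothing is passed to `eval()`, an input such as `__import__('os')` is rejected at the `ast.Call` node instead of being executed.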

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:35:59 -05:00
Alexander Payne
05d4dc997c fix: chat panel scroll — internal scroll on #chat-log, auto-scroll on new messages
- Set overflow:hidden on mc-main to prevent page-level scrolling
- Add max-height:100% to sidebar and chat panel to contain within viewport
- Use flex-wrap:nowrap on layout row to prevent column stacking on desktop
- Move scrollChat() to hx-on::after-settle for reliable post-swap scrolling
- Use requestAnimationFrame for smooth scroll-to-bottom timing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:15:40 -05:00
Alexander Payne
f95c9606f1 fix: Timmy startup crashes and clean initialization
- Remove show_tool_calls kwarg (not in Agno 2.5.3), which crashed Agent.__init__
- Guard memory_search against top_k=None from model, return formatted string
- Skip Telegram/Discord startup silently when no token configured
- Replace placeholder MEMORY.md with proper structured hot memory document

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:11:48 -05:00
Alexander Whitestone
dccd13df8e Merge pull request #46 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai
feat: Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed
2026-02-26 08:33:32 -05:00
Alexander Payne
96ed82d81e fix: memory route bug + fast E2E tests under 10 seconds
- Fix recall_personal_facts() call - remove unsupported limit parameter
- Replace 4 slow E2E test files with single fast test file
- All 6 E2E tests complete in ~9 seconds (was 60+ seconds)
- Reuse browser session across tests (module-scoped fixture)
- Combine related checks into single tests
- Add HTTP-only smoke test for speed
2026-02-26 08:08:32 -05:00
Alexander Payne
d8d976aa60 feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed
This commit implements six major features:

1. Event Log System (src/swarm/event_log.py)
   - SQLite-based audit trail for all swarm events
   - Task lifecycle tracking (created, assigned, completed, failed)
   - Agent lifecycle tracking (joined, left, status changes)
   - Integrated with coordinator for automatic logging
   - Dashboard page at /swarm/events

2. Lightning Ledger (src/lightning/ledger.py)
   - Transaction tracking for Lightning Network payments
   - Balance calculations (incoming, outgoing, net, available)
   - Integrated with payment_handler for automatic logging
   - Dashboard page at /lightning/ledger

3. Semantic Memory / Vector Store (src/memory/vector_store.py)
   - Embedding-based similarity search for Echo agent
   - Fallback to keyword matching if sentence-transformers unavailable
   - Personal facts storage and retrieval
   - Dashboard page at /memory

4. Cascade Router Integration (src/timmy/cascade_adapter.py)
   - Automatic LLM failover between providers (Ollama → AirLLM → API)
   - Circuit breaker pattern for failing providers
   - Metrics tracking per provider (latency, error rates)
   - Dashboard status page at /router/status

5. Self-Upgrade Approval Queue (src/upgrades/)
   - State machine for self-modifications: proposed → approved/rejected → applied/failed
   - Human approval required before applying changes
   - Git integration for branch management
   - Dashboard queue at /self-modify/queue

6. Real-Time Activity Feed (src/events/broadcaster.py)
   - WebSocket-based live activity streaming
   - Bridges event_log to dashboard clients
   - Activity panel on /swarm/live
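
The approval state machine from item 5 (proposed → approved/rejected → applied/failed) can be sketched as an enum plus a transition table. The names `UpgradeState` and `transition` are illustrative, not taken from src/upgrades/.

```python
from enum import Enum


class UpgradeState(str, Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"
    FAILED = "failed"


# Only a human-approved proposal may go on to be applied (or fail).
ALLOWED_TRANSITIONS = {
    UpgradeState.PROPOSED: {UpgradeState.APPROVED, UpgradeState.REJECTED},
    UpgradeState.APPROVED: {UpgradeState.APPLIED, UpgradeState.FAILED},
    UpgradeState.REJECTED: set(),
    UpgradeState.APPLIED: set(),
    UpgradeState.FAILED: set(),
}


def transition(current: UpgradeState, target: UpgradeState) -> UpgradeState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```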

Tests:
- 101 unit tests passing
- 4 new E2E test files for Selenium testing
- Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed

Documentation:
- 6 ADRs (017-022) documenting architecture decisions
- Implementation summary in docs/IMPLEMENTATION_SUMMARY.md
- Architecture diagram in docs/architecture-v2.md
2026-02-26 08:01:01 -05:00
AlexanderWhitestone
930ec9eb80 Security: Fix XSS vulnerabilities in dashboard templates and improve mobile test UI safety
2026-02-26 02:07:54 -05:00
Alexander Payne
8d85f95ee5 Fix router disabled provider check + comprehensive functional tests
Fixes:
- Router now properly skips disabled providers in complete() method
- Fixed avg_latency calculation comment in tests (now correctly documents behavior)

New Test Suites:
- tests/test_functional_router.py: 10 functional tests for router
- tests/test_functional_mcp.py: 15 functional tests for MCP discovery/bootstrap
- tests/test_integration_full.py: 14 end-to-end integration tests

Total: 39 new functional/integration tests

All 144 tests passing (105 router/mcp + 39 functional/integration)
2026-02-25 20:22:51 -05:00
Alexander Whitestone
3792bf16cf Merge pull request #44 from AlexanderWhitestone/feature/memory-layers-and-conversational-ai
Phase 3-4: Cascade LLM Router + Tool Registry Auto-Discovery
2026-02-25 20:04:30 -05:00
Alexander Payne
56437751d3 Phase 4: Tool Registry Auto-Discovery
- @mcp_tool decorator for marking functions as tools
- ToolDiscovery class for introspecting modules and packages
- Automatic JSON schema generation from type hints
- AST-based discovery for files (without importing)
- Auto-bootstrap on startup (packages=['tools'] by default)
- Support for tags, categories, and metadata
- Updated registry with register_tool() convenience method
- Environment variable MCP_AUTO_BOOTSTRAP to disable
- 39 tests with proper isolation and cleanup
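
A decorator that derives a JSON schema from type hints could look roughly like this. It is a minimal sketch, not the code in src/mcp/discovery.py: the `TOOLS` dict, the type mapping, and the `word_count` demo tool are all assumptions.

```python
import inspect
from typing import Callable, get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    list: "array",
    dict: "object",
}
TOOLS: dict = {}  # stand-in for the real registry


def mcp_tool(name=None, tags=()):
    """Register a function as a tool and build its parameter schema."""

    def decorator(func: Callable) -> Callable:
        hints = get_type_hints(func)
        properties, required = {}, []
        for param in inspect.signature(func).parameters.values():
            json_type = _JSON_TYPES.get(hints.get(param.name), "string")
            properties[param.name] = {"type": json_type}
            if param.default is inspect.Parameter.empty:
                required.append(param.name)
        TOOLS[name or func.__name__] = {
            "function": func,
            "tags": list(tags),
            "schema": {"type": "object", "properties": properties, "required": required},
        }
        return func

    return decorator


@mcp_tool(tags=["text"])
def word_count(text: str, unique: bool = False) -> int:
    """Count whitespace-separated words."""
    words = text.split()
    return len(set(words)) if unique else len(words)
```

Parameters without a default land in `required`; unannotated parameters fall back to `"string"` in this sketch.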

Files Added:
- src/mcp/discovery.py: Tool discovery and introspection
- src/mcp/bootstrap.py: Auto-bootstrap functionality
- tests/test_mcp_discovery.py: 26 tests
- tests/test_mcp_bootstrap.py: 13 tests

Files Modified:
- src/mcp/registry.py: Added tags, source_module, auto_discovered fields
- src/mcp/__init__.py: Export discovery and bootstrap modules
- src/dashboard/app.py: Auto-bootstrap on startup
2026-02-25 19:59:42 -05:00
Alexander Payne
c658ca829c Phase 3: Cascade LLM Router with automatic failover
- YAML-based provider configuration (config/providers.yaml)
- Priority-ordered provider routing
- Circuit breaker pattern for failing providers
- Health check and availability monitoring
- Metrics tracking (latency, errors, success rates)
- Support for Ollama, OpenAI, Anthropic, AirLLM providers
- Automatic failover on rate limits or errors
- REST API endpoints for monitoring and control
- 41 comprehensive tests
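
Priority-ordered failover with a per-provider circuit breaker can be sketched as follows. The `Provider` dataclass, the thresholds, and the `complete` function are illustrative assumptions, not the router's real classes or the contents of config/providers.yaml.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]
    failure_threshold: int = 3      # consecutive failures before opening
    cooldown_seconds: float = 30.0  # how long the circuit stays open
    failures: int = 0
    opened_at: Optional[float] = None

    def available(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, allow a trial call (half-open state).
        return now - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


def complete(providers: List[Provider], prompt: str) -> Tuple[str, str]:
    """Try providers in priority order; skip those with an open circuit."""
    errors = []
    for provider in providers:
        if not provider.available(time.monotonic()):
            errors.append(f"{provider.name}: circuit open")
            continue
        try:
            reply = provider.call(prompt)
        except Exception as exc:
            provider.record_failure(time.monotonic())
            errors.append(f"{provider.name}: {exc}")
            continue
        provider.record_success()
        return provider.name, reply
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Demo: the primary always fails, so every request falls through to backup.
attempts = {"primary": 0}

def _always_down(prompt: str) -> str:
    attempts["primary"] += 1
    raise ConnectionError("connection refused")

def _echo(prompt: str) -> str:
    return f"echo: {prompt}"

providers = [Provider("primary", _always_down), Provider("backup", _echo)]
results = [complete(providers, "ping") for _ in range(4)]
```

In the demo the primary is attempted three times; on the fourth request its circuit is open and it is skipped without being called.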

API Endpoints:
- POST /api/v1/router/complete - Chat completion with failover
- GET /api/v1/router/status - Provider health status
- GET /api/v1/router/metrics - Detailed metrics
- GET /api/v1/router/providers - List all providers
- POST /api/v1/router/providers/{name}/control - Enable/disable/reset
- POST /api/v1/router/health-check - Run health checks
- GET /api/v1/router/config - View configuration
2026-02-25 19:43:43 -05:00
Alexander Payne
a719c7538d Implement MCP system, Event Bus, and Sub-Agents
## 1. MCP (Model Context Protocol) Implementation

### Registry (src/mcp/registry.py)
- Tool registration with JSON schemas
- Dynamic tool discovery
- Health tracking per tool
- Metrics collection (latency, error rates)
- @register_tool decorator for easy registration

### Server (src/mcp/server.py)
- MCPServer class implementing MCP protocol
- MCPHTTPServer for FastAPI integration
- Standard endpoints: list_tools, call_tool, get_schema

### Schemas (src/mcp/schemas/base.py)
- create_tool_schema() helper
- Common parameter types
- Standard return types

### Bootstrap (src/mcp/bootstrap.py)
- Automatic tool module loading
- Status reporting

## 2. MCP-Compliant Tools (src/tools/)

| Tool | Purpose | Category |
|------|---------|----------|
| web_search | DuckDuckGo search | research |
| read_file | File reading | files |
| write_file | File writing (confirmation) | files |
| list_directory | Directory listing | files |
| python | Python code execution | code |
| memory_search | Vector memory search | memory |

All tools have proper schemas, error handling, and MCP registration.

## 3. Event Bus (src/events/bus.py)

- Async publish/subscribe pattern
- Pattern matching with wildcards (agent.task.*)
- Event history tracking
- Concurrent handler execution
- Module-level singleton for system-wide use
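
An async publish/subscribe bus with wildcard topics can be approximated with `fnmatch` and `asyncio.gather`. This is a sketch under that assumption; the class and method names are not taken from src/events/bus.py.

```python
import asyncio
from fnmatch import fnmatchcase


class EventBus:
    def __init__(self) -> None:
        self._subscriptions = []  # (pattern, async handler) pairs
        self.history = []         # every published (topic, payload)

    def subscribe(self, pattern: str, handler) -> None:
        self._subscriptions.append((pattern, handler))

    async def publish(self, topic: str, payload: dict) -> int:
        """Run all matching handlers concurrently; return how many ran."""
        self.history.append((topic, payload))
        handlers = [h for p, h in self._subscriptions if fnmatchcase(topic, p)]
        if handlers:
            await asyncio.gather(*(h(topic, payload) for h in handlers))
        return len(handlers)


async def _demo() -> list:
    bus = EventBus()
    seen = []

    async def on_task(topic: str, payload: dict) -> None:
        seen.append(topic)

    bus.subscribe("agent.task.*", on_task)
    await bus.publish("agent.task.created", {"id": 1})
    await bus.publish("agent.joined", {"name": "Seer"})
    return seen


seen_topics = asyncio.run(_demo())
```

Note that `fnmatch`'s `*` also matches dots, so `agent.task.*` would match `agent.task.created.retry` here; a stricter bus would match segment by segment.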

## 4. Sub-Agents (src/agents/)

All agents inherit from BaseAgent with:
- Agno Agent integration
- MCP tool registry access
- Event bus connectivity
- Structured logging

### Agent Roster

| Agent | Role | Tools | Purpose |
|-------|------|-------|---------|
| Seer | Research | web_search, read_file, memory_search | Information gathering |
| Forge | Code | python, write_file, read_file | Code generation |
| Quill | Writing | write_file, read_file, memory_search | Content creation |
| Echo | Memory | memory_search, read_file, write_file | Context retrieval |
| Helm | Routing | memory_search | Task routing decisions |
| Timmy | Orchestrator | All tools | Coordination & user interface |

### Timmy Orchestrator
- Analyzes user requests
- Routes to appropriate sub-agent
- Handles direct queries
- Manages swarm coordination
- create_timmy_swarm() factory function

## 5. Integration

All components wired together:
- Tools auto-register on import
- Agents connect to event bus
- MCP server provides HTTP API
- Ready for dashboard integration

## Tests
- All 973 existing tests pass
- New components tested manually
- Import verification successful

Next steps: Cascade Router, Self-Upgrade Loop, Dashboard integration
2026-02-25 19:26:24 -05:00
Alexander Payne
26e1691099 Fix Timmy coherence: persistent session, model-aware tools, response sanitization
Timmy was exhibiting severe incoherence (no memory between messages, tool call
leakage, chain-of-thought narration, random tool invocations) due to creating
a brand new agent per HTTP request and giving a 3B model (llama3.2) a 73-line
system prompt with complex tool-calling instructions it couldn't follow.

Key changes:
- Add session.py singleton with stable session_id for conversation continuity
- Add _model_supports_tools() to strip tools from small models (< 7B)
- Add two-tier prompts: lite (12 lines) for small models, full for capable ones
- Add response sanitizer to strip leaked JSON tool calls and CoT narration
- Set show_tool_calls=False to prevent raw tool JSON in output
- Wire ConversationManager for user name extraction
- Deprecate orphaned memory_layers.py (unused 4-layer system)
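
The commit does not say how `_model_supports_tools()` decides a model's size. One possible approach, shown here purely as an assumption, is to read a size tag such as `:8b` from the model name and fall back to a small lookup table (only llama3.2 at 3B is taken from the message above).

```python
import re
from typing import Optional

# Sizes in billions of parameters for names without a size tag.
_KNOWN_SIZES_B = {"llama3.2": 3.0}
_SIZE_TAG = re.compile(r"(\d+(?:\.\d+)?)b\b", re.IGNORECASE)


def model_size_billions(model_name: str) -> Optional[float]:
    """Best-effort parameter count from a name like 'qwen2.5:14b'."""
    match = _SIZE_TAG.search(model_name)
    if match:
        return float(match.group(1))
    return _KNOWN_SIZES_B.get(model_name.split(":")[0].lower())


def model_supports_tools(model_name: str, min_size_b: float = 7.0) -> bool:
    """Unknown sizes are treated as too small, which is the safe default."""
    size = model_size_billions(model_name)
    return size is not None and size >= min_size_b
```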

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 19:18:08 -05:00
Alexander Payne
16b65b28e8 Add Tier 3: Semantic Memory (vector search)
Completes the three-tier memory architecture:

## Tier 3 — Semantic Search
- Vector embeddings over all vault files
- Similarity-based retrieval
- memory_search tool for agents
- Fallback to hash-based embeddings if transformers unavailable

## Implementation
- src/timmy/semantic_memory.py — Core semantic memory
- Chunking strategy: paragraphs → sentences
- SQLite storage for vectors
- cosine_similarity for ranking
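
To illustrate the ranking step and the hash-based fallback, here is a standard-library sketch. The hashing scheme (signed feature hashing over tokens) and the `hash_embedding` name are assumptions; the commit only states that a hash-based fallback exists.

```python
import hashlib
import math
import re
from typing import List


def hash_embedding(text: str, dims: int = 64) -> List[float]:
    """Map tokens to signed buckets; a crude stand-in for real embeddings."""
    vector = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dims
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[index] += sign
    return vector


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Hash embeddings only capture token overlap, so they behave like fuzzy keyword matching, not true semantic search.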

## Integration
- Added memory_search to create_full_toolkit()
- Updated prompts with memory_search examples
- Tool triggers: past conversations, reminders

## Features
- Automatic vault indexing
- Source file tracking (re-indexes on change)
- Similarity scoring
- Context retrieval for queries

## Usage

All 973 tests pass.
2026-02-25 18:25:20 -05:00
Alexander Payne
7838df19b0 Implement three-tier memory architecture (Hot/Vault/Handoff)
This commit replaces the previous memory_layers.py with a proper three-tier
memory system as specified by the user:

## Tier 1 — Hot Memory (MEMORY.md)
- Single flat file always loaded into system context
- Contains: current status, standing rules, agent roster, key decisions
- ~300 lines max, pruned monthly
- Managed by HotMemory class

## Tier 2 — Structured Vault (memory/)
- Directory with three namespaces:
  • self/ — identity.md, user_profile.md, methodology.md
  • notes/ — session logs, AARs, research
  • aar/ — post-task retrospectives
- Markdown format, Obsidian-compatible
- Append-only, date-stamped
- Managed by VaultMemory class

## Handoff Protocol
- last-session-handoff.md written at session end
- Contains: summary, key decisions, open items, next steps
- Auto-loaded at next session start
- Maintains continuity across resets

## Implementation

### New Files:
- src/timmy/memory_system.py — Core memory system
- MEMORY.md — Hot memory template
- memory/self/*.md — Identity, user profile, methodology

### Modified:
- src/timmy/agent.py — Integrated with memory system
  - create_timmy() injects memory context
  - TimmyWithMemory class with automatic fact extraction
- tests/test_agent.py — Updated for memory context

## Key Principles
- Hot memory = small and curated
- Vault = append-only, never delete
- Handoffs = continuity mechanism
- Flat files = human-readable, portable

## Usage

All 973 tests pass.
2026-02-25 18:17:43 -05:00
Alexander Payne
625806daf5 Fine-tune Timmy's conversational AI with memory layers
## Enhanced System Prompt
- Detailed tool usage guidelines with explicit examples
- Clear DO and DON'T examples for tool selection
- Memory system documentation
- Conversation flow guidelines
- Context awareness instructions

## Memory Layer System (NEW)
Implemented 3-layer memory architecture:

1. WORKING MEMORY (src/timmy/memory_layers.py)
   - Immediate context (last 20 messages)
   - Topic tracking
   - Tool call tracking
   - Fast, ephemeral

2. SHORT-TERM MEMORY (Agno SQLite)
   - Recent conversations (100)
   - Persists across restarts
   - Managed by Agno Agent

3. LONG-TERM MEMORY (src/timmy/memory_layers.py)
   - Facts about user (name, preferences)
   - SQLite storage in data/memory/
   - Auto-extraction from conversations
   - User profile generation

## Memory Manager (NEW)
- Central coordinator for all memory layers
- Context injection into prompts
- Fact extraction and storage
- Session management

## TimmyWithMemory Class (NEW)
- Wrapper around Agno Agent with explicit memory
- Auto-injects user context from LTM
- Tracks exchanges across all layers
- Simple chat() interface

## Agent Configuration
- Increased num_history_runs: 10 -> 20
- Better conversational context retention

## Tests
- All 973 tests pass
- Fixed test expectations for new config
- Fixed module path in test_scary_paths.py

## Files Added/Modified
- src/timmy/prompts.py - Enhanced with memory and tool guidance
- src/timmy/agent.py - Added TimmyWithMemory class
- src/timmy/memory_layers.py - NEW memory system
- src/timmy/conversation.py - NEW conversation manager
- tests/ - Updated for new config
2026-02-25 18:07:44 -05:00
Alexander Payne
90a93aa070 fix: resolve merge conflict in base.html nav with main
Keep Mission Control link from this branch alongside SWARM and SPARK
links from main. All 939 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 17:51:15 -05:00
Alexander Whitestone
d853e931ec Merge pull request #40 from AlexanderWhitestone/kimi/phase2-swarm-hardening-v2
Phase 2: Swarm hardening, auto-auction, WebSocket fix
2026-02-25 17:34:13 -05:00
Alexander Payne
4b12aca090 Swarm hardening: mobile nav, registry cleanup, module path fix
## Workset E: Swarm System Realization
- Verified PersonaNode bidding system is properly connected
- Coordinator already subscribes personas to task announcements
- Auction system works when /tasks/auction endpoint is used

## Workset F: Testing & Reliability
- Mobile nav: Add MOBILE link to desktop header (UX-01)
- Voice TTS: Verified graceful degradation already implemented
- Registry: Add proper connection cleanup with try/finally

## Workset G: Performance & Architecture
- Fix module path: websocket.handler -> ws_manager.handler
- Registry connections now properly closed after operations

All 895 tests pass.

Addresses QUALITY_ANALYSIS.md:
- UX-01: /mobile route now in desktop nav
- PERF-01: Connection cleanup improved (P3)
- FUNC-01/02: Verified bidding system operational
2026-02-25 17:26:42 -05:00
Alexander Payne
8fec9c41a5 feat: autonomous self-modifying agent with multi-backend LLM support
Adds SelfModifyLoop — an edit→validate→test→commit cycle that can read
its own failure reports, diagnose root causes, and restart autonomously.

Key capabilities:
- Multi-backend LLM: Anthropic Claude API, Ollama, or auto-detect
- Syntax validation via compile() before writing to disk
- Autonomous self-correction loop with configurable max cycles
- XML-based output format to avoid triple-quote delimiter conflicts
- Branch creation skipped by default to prevent container restarts
- CLI: self-modify run "instruction" --backend auto --autonomous
- 939 tests passing, 30 skipped
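
The "syntax validation via compile() before writing to disk" step can be sketched as below. The helper names are hypothetical; only the use of the built-in `compile()` comes from the commit message.

```python
import tempfile
from pathlib import Path
from typing import Optional


def validate_python_source(source: str, filename: str = "<proposed-edit>") -> Optional[str]:
    """Return an error description, or None if the source compiles."""
    try:
        compile(source, filename, "exec")  # parses and compiles, never runs
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    except ValueError as exc:  # e.g. null bytes on some Python versions
        return f"{filename}: {exc}"
    return None


def write_if_valid(path: Path, source: str) -> bool:
    """Write the file only when the new content is syntactically valid."""
    if validate_python_source(source, str(path)) is not None:
        return False
    path.write_text(source, encoding="utf-8")
    return True


# Demo: a broken edit never reaches disk, a valid one does.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "module.py"
    rejected = write_if_valid(target, "def broken(:\n")
    exists_after_reject = target.exists()
    accepted = write_if_valid(target, "x = 1\n")
    written = target.read_text(encoding="utf-8")
```

`compile()` only catches syntax errors; the test step of the edit→validate→test→commit cycle is still needed for anything semantic.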

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 17:23:47 -05:00
Alexander Payne
4961c610f2 Security, privacy, and agent intelligence hardening
## Security (Workset A)
- XSS: Verified templates use safe DOM methods (textContent, createElement)
- Secrets: Fail-fast in production mode when L402 secrets not set
- Environment mode: Add TIMMY_ENV (development|production) validation

## Privacy (Workset C)
- Add telemetry_enabled config (default: False for sovereign AI)
- Pass telemetry setting to Agno Agent
- Update .env.example with TELEMETRY_ENABLED and TIMMY_ENV docs

## Agent Intelligence (Workset D)
- Enhanced TIMMY_SYSTEM_PROMPT with:
  - Tool usage guidelines (when to use, when not to)
  - Memory awareness documentation
  - Operating mode documentation
- Help reduce unnecessary tool calls for simple queries

All 895 tests pass.
Telemetry disabled by default aligns with sovereign AI vision.
2026-02-25 15:32:19 -05:00
Alexander Payne
1bc2cdcb2e Fix Agno Toolkit API compatibility issues
- Change Toolkit.add_tool() to Toolkit.register() (method was renamed in Agno)
- Fix PythonTools method: python -> run_python_code
- Fix FileTools method: write_file -> save_file
- Fix FileTools base_dir parameter: str -> Path object
- Fix Agent tools parameter: pass Toolkit wrapped in list

These fixes resolve critical startup errors that prevented Timmy agent from initializing:
- AttributeError: 'Toolkit' object has no attribute 'add_tool'
- AttributeError: 'PythonTools' object has no attribute 'python'
- TypeError: 'Toolkit' object is not iterable

All 895 tests pass after these changes.

Quality review: Agent now fully functional with working inference, memory,
and self-awareness capabilities.
2026-02-25 14:11:13 -05:00
Claude
2e7f3d1b29 feat: centralize L402 config, automate Metal install, fix watchdog cleanup
- config.py: add L402_HMAC_SECRET, L402_MACAROON_SECRET, LIGHTNING_BACKEND
  to pydantic-settings with startup warnings for default secrets
- l402_proxy.py, mock_backend.py, factory.py: migrate from os.environ.get()
  to `from config import settings` per project convention
- Makefile: `make install-creative` now auto-installs PyTorch nightly with
  Metal (MPS) support on Apple Silicon instead of just printing a note
- activate_self_tdd.sh: add PID file (.watchdog.pid) and EXIT trap so
  Ctrl-C cleanly stops both the dashboard and the watchdog process
- .gitignore: add .watchdog.pid

https://claude.ai/code/session_01A81E5HMxZEPxzv2acNo35u
2026-02-25 18:19:22 +00:00
Alexander Whitestone
c430f8002c Merge pull request #29 from AlexanderWhitestone/fix/xss-prevention-mobile-test
Security: XSS Prevention in Mobile Test Page
2026-02-25 08:01:05 -05:00
Alexander Payne
3463f4e4a4 fix: rename src/websocket to src/ws_manager to avoid websocket-client clash
selenium depends on websocket-client which installs a top-level
`websocket` package that shadows our src/websocket/ module on CI.
Renaming to ws_manager eliminates the conflict entirely — no more
sys.path hacks needed in conftest or Selenium tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 07:57:28 -05:00
Alexander Payne
29292cfb84 feat: single-command Docker startup, fix UI bugs, add Selenium tests
- Add `make up` / `make up DEV=1` for one-command Docker startup with
  optional hot-reload via docker-compose.dev.yml overlay
- Add `timmy up --dev` / `timmy down` CLI commands
- Fix cross-platform font resolution in creative assembler (7 test failures)
- Fix Ollama host URL not passed to Agno model (container connectivity)
- Fix task panel route shadowing by reordering literal routes before
  parameterized routes in swarm.py
- Fix chat input not clearing after send (hx-on::after-request)
- Fix chat scroll overflow (CSS min-height: 0 on flex children)
- Add Selenium UI smoke tests (17 tests, gated behind SELENIUM_UI=1)
- Install fonts-dejavu-core in Dockerfile for container font support
- Remove obsolete docker-compose version key
- Bump CSS cache-bust to v4

833 unit tests pass, 15 Selenium tests pass (2 skipped).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 07:20:56 -05:00
AlexanderWhitestone
bc1be23e23 security: prevent XSS in mobile-test by using textContent
2026-02-25 02:08:02 -05:00
Claude
15596ca325 feat: add Discord integration with chat_bridge abstraction layer
Introduces a vendor-agnostic chat platform architecture:

- chat_bridge/base.py: ChatPlatform ABC, ChatMessage, ChatThread
- chat_bridge/registry.py: PlatformRegistry singleton
- chat_bridge/invite_parser.py: QR + Ollama vision invite extraction
- chat_bridge/vendors/discord.py: DiscordVendor with native threads

Workflow: paste a screenshot of a Discord invite or QR code at
POST /discord/join → Timmy extracts the invite automatically.

Every Discord conversation gets its own thread, keeping channels clean.
Bot responds to @mentions and DMs, routes through Timmy agent.

43 new tests (base classes, registry, invite parser, vendor, routes).

https://claude.ai/code/session_01WU4h3cQQiouMwmgYmAgkMM
2026-02-25 01:11:14 +00:00
Claude
65a278dbee fix: comprehensive iPhone UI overhaul — glassmorphism, responsive layouts, theme unification
- base.html: add missing {% block extra_styles %}, mobile hamburger menu with
  slide-out nav, interactive-widget viewport meta, -webkit-text-size-adjust
- style.css: define 15+ missing CSS variables (--bg-secondary, --text-muted,
  --accent, --success, --danger, etc.), add missing utility classes (.grid,
  .stat, .agent-card, .agent-avatar, .form-group), glassmorphism card effects,
  iPhone breakpoints (768px, 390px), 44pt min touch targets, smooth animations
- mobile.html: rewrite with proper theme variables, glass cards, touch-friendly
  quick actions grid, chat with proper message bubbles
- swarm_live.html: replace undefined CSS vars, use mc-panel theme cards
- marketplace.html: responsive agent cards that stack on iPhone, themed pricing
- voice_button.html & voice_enhanced.html: proper theme integration, touch-sized
  buttons, themed result containers
- create_task.html: mobile-friendly forms with 16px font (prevents iOS zoom)
- tools.html & creative.html: themed headers, responsive column stacking
- spark.html: replace all hardcoded blue (#00d4ff) colors with theme purple/orange
- briefing.html: replace hardcoded bootstrap colors with theme variables

Fixes: header nav overflow on iPhone (7 links in single row), missing
extra_styles block silently dropping child template styles, undefined CSS
variables breaking mobile/swarm/marketplace/voice pages, sub-44pt touch
targets, missing -webkit-text-size-adjust, inconsistent color themes.

97 UI tests pass (91 UI-specific + 6 creative route).

https://claude.ai/code/session_01JiyhGyee2zoMN4p8xWYqEe
2026-02-24 22:25:04 +00:00
Alexander Whitestone
7018a756b3 Merge pull request #22 from AlexanderWhitestone/claude/audit-timmy-dashboard-ft27r
2026-02-24 14:18:29 -05:00
Claude
96c9f1b02f fix: address audit low-hanging fruit — docs accuracy, auction timing, stubs, tests
- Docs: "No Cloud" → "No Cloud AI" (frontend uses CDN for Bootstrap/HTMX/fonts)
- Docs: "600+" → "640+" tests, "20+" → "58" endpoints (actual counts)
- Docs: LND described as "scaffolded" not "gRPC-ready"; remove "agents earn sats"
- Fix auction timing: coordinator sleep(0) → sleep(AUCTION_DURATION_SECONDS)
- agent_core: implement remember() with dedup/eviction, communicate() via swarm comms
- Tests: add CLI tests for chat, think, and backend/model-size forwarding (647 passing)

https://claude.ai/code/session_01SZTwAkTg6v4ybv8g9NLxqN
2026-02-24 18:29:21 +00:00
Alexander Whitestone
03ff505c4b Merge pull request #23 from AlexanderWhitestone/security/macaroon-forgery-and-xss-1771955896
2026-02-24 13:00:52 -05:00
AlexanderWhitestone
4daf382819 security: fix L402 macaroon forgery and XSS in templates
2026-02-24 12:58:19 -05:00
Claude
832478f0d0 fix: serve_chat endpoint bug, stale docs, and license mismatch
- Fix /serve/chat AttributeError: split Request and ChatRequest params
  so auth headers are read from HTTP request, not Pydantic body
- Add regression tests for the serve_chat endpoint bug
- Add agent_core and lightning to pyproject.toml wheel includes
- Replace Apache 2.0 LICENSE with MIT to match pyproject.toml
- Update test count from "228" to "600+" across README, docs, AGENTS.md
- Add 5 missing subsystems to README table (Spark, Creative, Tools,
  Telegram, agent_core/lightning)
- Update AGENTS.md project structure with 6 missing modules
- Mark completed v2 roadmap items (personas, MCP tools) in AGENTS.md

https://claude.ai/code/session_01GMiccXbo77GkV3TA69x6KS
2026-02-24 17:18:29 +00:00
Claude
b098b00959 test: add integration tests with real media for music video pipeline
Build real PNG, WAV, and MP4 fixtures (no AI models) and exercise the
full assembler and Creative Director pipeline end-to-end.  Fix MoviePy v2
crossfade API (vfx.CrossFadeIn) and font resolution (DejaVu-Sans).

14 new integration tests — 638 total, all passing.

https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
2026-02-24 16:48:14 +00:00
Claude
1103da339c feat: add full creative studio + DevOps tools (Pixel, Lyra, Reel personas)
Adds 3 new personas (Pixel, Lyra, Reel) and 5 new tool modules:

- Git/DevOps tools (GitPython): clone, status, diff, log, blame, branch,
  add, commit, push, pull, stash — wired to Forge and Helm personas
- Image generation (FLUX via diffusers): text-to-image, storyboards,
  variations — Pixel persona
- Music generation (ACE-Step 1.5): full songs with vocals+instrumentals,
  instrumental tracks, vocal-only tracks — Lyra persona
- Video generation (Wan 2.1 via diffusers): text-to-video, image-to-video
  clips — Reel persona
- Creative Director pipeline: multi-step orchestration that chains
  storyboard → music → video → assembly into 3+ minute final videos
- Video assembler (MoviePy + FFmpeg): stitch clips, overlay audio,
  title cards, subtitles, final export

Also includes:
- Spark Intelligence tool-level + creative pipeline event capture
- Creative Studio dashboard page (/creative/ui) with 4 tabs
- Config settings for all new models and output directories
- pyproject.toml creative optional extra for GPU dependencies
- 107 new tests covering all modules (624 total, all passing)

https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
2026-02-24 16:31:47 +00:00
Claude
1ab26d30ad feat: integrate Spark Intelligence into Timmy swarm system
Adds a self-evolving cognitive layer inspired by vibeship-spark-intelligence,
adapted for Timmy's agent architecture. Spark captures swarm events, runs
EIDOS prediction-evaluation loops, consolidates memories, and generates
advisory recommendations — all backed by SQLite consistent with existing
patterns.

New modules:
- spark/memory.py — event capture with importance scoring + memory consolidation
- spark/eidos.py — EIDOS cognitive loop (predict → observe → evaluate → learn)
- spark/advisor.py — ranked advisory generation from accumulated intelligence
- spark/engine.py — top-level API wiring all subsystems together

Dashboard:
- /spark/ui — full Spark Intelligence dashboard (3-column: status/advisories,
  predictions/memories, event timeline) with HTMX auto-refresh
- /spark — JSON API for programmatic access
- SPARK link added to navigation header

Integration:
- Coordinator hooks emit Spark events on task post, bid, assign, complete, fail
- EIDOS predictions generated when tasks are posted, evaluated on completion
- Memory consolidation triggers when agents accumulate enough outcomes
- SPARK_ENABLED config toggle (default: true)

Tests: 47 new tests covering all Spark subsystems + dashboard routes.
Full suite: 538 tests passing.

https://claude.ai/code/session_01KJm6jQkNi3aA3yoQJn636c
2026-02-24 15:51:15 +00:00
Alexander Payne
ace5bfdf5f feat: Mission Control dashboard with sovereignty audit + scary path tests
Mission Control Dashboard:
- /swarm/mission-control page with real-time system status
- Sovereignty score display with visual progress bar
- Dependency health grid (Ollama, Redis, Lightning, SQLite)
- Recommendations based on dependency status
- Heartbeat monitor with tick counter
- System metrics: uptime, agents, tasks, sats earned

Health Endpoints:
- /health/sovereignty - Full sovereignty audit report
- /health/components - Component status and config

Tests (TDD approach):
- 11 Mission Control tests (all passing)
- 23 scary path tests for production scenarios
- Concurrent load, memory persistence, edge cases

Total: 525 tests passing
2026-02-22 20:48:14 -05:00
Alexander Payne
14072f9bb5 feat: MCP tools integration for swarm agents
ToolExecutor:
- Persona-specific toolkit selection (forge gets code tools, echo gets search)
- Tool inference from task keywords (search→web_search, code→python)
- LLM-powered reasoning about tool selection
- Graceful degradation when Agno unavailable
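
Keyword-based tool inference, as described in the list above, might look roughly like the following. Only the two keyword-to-tool pairs come from the commit message; `KEYWORD_TOOLS`, `infer_tools`, and the prefix-matching rule are assumptions made for this sketch.

```python
from typing import Optional

# Both pairs come from the commit message; storing them in a dict is an assumption.
KEYWORD_TOOLS = {
    "search": "web_search",
    "code": "python",
}


def infer_tools(task_description: str, toolkit: Optional[set] = None) -> list:
    """Return tool names whose trigger keyword appears in the task description.

    If `toolkit` is given, only tools in that persona's toolkit are returned.
    """
    words = task_description.lower().split()
    tools = []
    for keyword, tool in KEYWORD_TOOLS.items():
        # Hypothetical rule: a word matches if it starts with the keyword.
        matched = any(word.startswith(keyword) for word in words)
        allowed = toolkit is None or tool in toolkit
        if matched and allowed and tool not in tools:
            tools.append(tool)
    return tools
```

Passing a persona's toolkit as a filter keeps inference from suggesting a tool the assigned agent does not have.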

PersonaNode Updates:
- Subscribe to swarm:events for task assignments
- Execute tasks using ToolExecutor when assigned
- Complete tasks via comms.complete_task()
- Track current_task for status monitoring

Tests:
- 19 new tests for tool execution
- All 6 personas covered
- Tool inference verification
- Edge cases (no toolkit, unknown tasks)

Total: 491 tests passing
2026-02-22 20:33:26 -05:00
Alexander Payne
c5df954d44 feat: Lightning interface, swarm routing, sovereignty audit, embodiment prep
Lightning Backend Interface:
- Abstract LightningBackend with pluggable implementations
- MockBackend for development (auto-settle invoices)
- LndBackend stub with gRPC integration path documented
- Backend factory for runtime selection via LIGHTNING_BACKEND env
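
A pluggable backend with an environment-driven factory could be sketched as below. The class names and the `LIGHTNING_BACKEND` variable come from the commit message; the method names (`create_invoice`, `check_payment`), the `Invoice` fields, `get_backend`, and the `"mock"` default are assumptions for illustration.

```python
import os
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invoice:
    payment_hash: str
    amount_sats: int
    settled: bool = False


class LightningBackend(ABC):
    """Abstract interface; method names here are hypothetical."""

    @abstractmethod
    def create_invoice(self, amount_sats: int) -> Invoice: ...

    @abstractmethod
    def check_payment(self, payment_hash: str) -> bool: ...


class MockBackend(LightningBackend):
    """Development backend: every invoice is settled as soon as it is created."""

    def __init__(self) -> None:
        self._invoices = {}

    def create_invoice(self, amount_sats: int) -> Invoice:
        invoice = Invoice(uuid.uuid4().hex, amount_sats, settled=True)
        self._invoices[invoice.payment_hash] = invoice
        return invoice

    def check_payment(self, payment_hash: str) -> bool:
        invoice = self._invoices.get(payment_hash)
        return invoice is not None and invoice.settled


class LndBackend(LightningBackend):
    """Stub: a real implementation would talk to LND over gRPC."""

    def create_invoice(self, amount_sats: int) -> Invoice:
        raise NotImplementedError("LND gRPC integration not implemented")

    def check_payment(self, payment_hash: str) -> bool:
        raise NotImplementedError("LND gRPC integration not implemented")


_BACKENDS = {"mock": MockBackend, "lnd": LndBackend}


def get_backend(name: Optional[str] = None) -> LightningBackend:
    """Select a backend by name, falling back to the LIGHTNING_BACKEND env var."""
    chosen = (name or os.environ.get("LIGHTNING_BACKEND", "mock")).lower()
    if chosen not in _BACKENDS:
        raise ValueError(f"Unknown Lightning backend: {chosen!r}")
    return _BACKENDS[chosen]()
```

Callers depend only on `LightningBackend`, so swapping the mock for a real node becomes a configuration change rather than a code change.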

Intelligent Swarm Routing:
- CapabilityManifest for agent skill declarations
- Task scoring based on keywords + capabilities + bid price
- RoutingDecision audit logging to SQLite
- Agent stats tracking (wins, consideration rate)
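
One way to combine keywords, capabilities, and bid price into a single routing score is sketched here. `CapabilityManifest` is named in the commit message, but its fields, the `score_agent` function, the 0.4/0.4/0.2 weights, and the `max_bid_sats` cap are all assumptions, not the project's actual formula.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityManifest:
    """An agent's declared skills; field names are hypothetical."""
    agent_id: str
    keywords: set = field(default_factory=set)
    capabilities: set = field(default_factory=set)


def score_agent(
    manifest: CapabilityManifest,
    task_description: str,
    required_capabilities: set,
    bid_sats: int,
    max_bid_sats: int = 100,
) -> float:
    """Score an agent for a task in [0, 1]; higher is better."""
    words = set(task_description.lower().split())
    # Fraction of the agent's declared keywords that appear in the task.
    keyword_score = len(words & manifest.keywords) / max(len(manifest.keywords), 1)
    # Fraction of the required capabilities the agent provides.
    if required_capabilities:
        capability_score = (
            len(required_capabilities & manifest.capabilities)
            / len(required_capabilities)
        )
    else:
        capability_score = 1.0
    # Cheaper bids score higher; bids above the cap score zero on price.
    price_score = 1.0 - min(bid_sats, max_bid_sats) / max_bid_sats
    # Illustrative weights only.
    return 0.4 * keyword_score + 0.4 * capability_score + 0.2 * price_score


forge = CapabilityManifest("forge", {"code", "refactor"}, {"python"})
echo = CapabilityManifest("echo", {"search", "research"}, {"web_search"})
task = "refactor the code in utils"
forge_score = score_agent(forge, task, {"python"}, bid_sats=50)
echo_score = score_agent(echo, task, {"python"}, bid_sats=10)
```

In this example `forge` outscores `echo` despite bidding higher, because the skill match outweighs the price difference under the assumed weights.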

Sovereignty Audit:
- Comprehensive audit report (docs/SOVEREIGNTY_AUDIT.md)
- 9.2/10 sovereignty score
- Documented all external dependencies and local alternatives

Substrate-Agnostic Agent Interface:
- TimAgent abstract base class
- Perception/Action/Memory/Communication types
- OllamaAdapter implementation
- Foundation for future embodiment (robot, VR)

Tests:
- 36 new tests for Lightning and routing
- 472 total tests passing
- Maintained zero-warning policy
2026-02-22 20:20:11 -05:00
Alexander Payne
f0aa43533f feat: swarm E2E, MCP tools, timmy-serve L402, tests, notifications
Major Features:
- Auto-spawn persona agents (Echo, Forge, Seer) on app startup
- WebSocket broadcasts for real-time swarm UI updates
- MCP tool integration: web search, file I/O, shell, Python execution
- New /tools dashboard page showing agent capabilities
- Real timmy-serve start with L402 payment gating middleware
- Browser push notifications for briefings and task events

Tests:
- test_docker_agent.py: 9 tests for Docker agent runner
- test_swarm_integration_full.py: 18 E2E lifecycle tests
- Fixed all pytest warnings (436 tests, 0 warnings)

Improvements:
- Fixed coroutine warnings in coordinator broadcasts
- Fixed ResourceWarning for unclosed process pipes
- Added pytest-asyncio config to pyproject.toml
- Test isolation with proper event loop cleanup
2026-02-22 19:01:04 -05:00
Claude
167fd0a7b4 Add outcome-based learning system for swarm agents
Introduce a feedback loop where task outcomes (win/loss, success/failure)
feed back into agent bidding strategy. Borrows the "learn from outcomes"
concept from Spark Intelligence but builds it natively on Timmy's existing
SQLite + swarm architecture.

New module: src/swarm/learner.py
- Records every bid outcome with task description context
- Computes per-agent metrics: win rate, success rate, keyword performance
- suggest_bid() adjusts bids based on historical performance
- learned_keywords() discovers what task types agents actually excel at
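
The feedback loop behind `suggest_bid()` could be sketched as follows. This in-memory `Learner` class, its thresholds, and the 20% adjustment are assumptions for illustration; the commit describes a SQLite-backed module whose actual formula is not shown here. The sketch also assumes that lower bids are more likely to win an auction.

```python
from collections import defaultdict


class Learner:
    """In-memory stand-in for an outcome-based bid learner (hypothetical)."""

    def __init__(self) -> None:
        # agent_id -> list of booleans, True when the agent won the auction
        self._wins = defaultdict(list)

    def record_outcome(self, agent_id: str, won: bool) -> None:
        self._wins[agent_id].append(won)

    def win_rate(self, agent_id: str) -> float:
        history = self._wins.get(agent_id, [])
        return sum(history) / len(history) if history else 0.0

    def suggest_bid(self, agent_id: str, base_bid: int, min_samples: int = 3) -> int:
        """Adjust `base_bid` using the agent's auction history."""
        history = self._wins.get(agent_id, [])
        if len(history) < min_samples:
            return base_bid  # not enough data to adjust
        rate = self.win_rate(agent_id)
        adjusted = float(base_bid)
        if rate < 0.2:
            adjusted *= 0.8  # rarely winning: bid lower to be more competitive
        elif rate > 0.8:
            adjusted *= 1.2  # winning almost always: room to bid higher
        return max(1, round(adjusted))


learner = Learner()
for _ in range(5):
    learner.record_outcome("echo", won=False)
    learner.record_outcome("forge", won=True)
```

The `min_samples` guard keeps a new agent's first few outcomes from swinging its bids before there is enough history to act on.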

Changes:
- persona_node: _compute_bid() now consults learner for adaptive adjustments
- coordinator: complete_task/fail_task feed results into learner
- coordinator: run_auction_and_assign records all bid outcomes
- routes/swarm: add /swarm/insights and /swarm/insights/{agent_id} endpoints
- routes/swarm: add POST /swarm/tasks/{task_id}/fail endpoint

All 413 tests pass (23 new + 390 existing).

https://claude.ai/code/session_01E5jhTCwSUnJk9p9zrTMVUJ
2026-02-22 22:04:37 +00:00
Alexander Payne
4020b5222f feat: add Docker-based swarm agent containerization
Add infrastructure for running swarm agents as isolated Docker
containers with HTTP-based coordination, startup recovery, and
enhanced dashboard UI for agent management.

- Dockerfile and docker-compose.yml for multi-service orchestration
- DockerAgentRunner for programmatic container lifecycle management
- Internal HTTP API for container agents to poll tasks and submit bids
- Startup recovery system to reconcile orphaned tasks and stale agents
- Enhanced UI partials for agent panels, chat, and task assignment
- Timmy docker entry point with heartbeat and task polling
- New Makefile targets for Docker workflows
- Tests for swarm recovery

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 16:21:32 -05:00