Commit Graph

182 Commits

Author SHA1 Message Date
Alexander Whitestone
e4d5ec5ed4 Merge pull request #62 from AlexanderWhitestone/claude/grok-backend-monetization-iVc5i 2026-02-26 20:26:15 -05:00
Claude
17059bc0ea feat: add Grok (xAI) as opt-in premium backend with monetization
- Add GrokBackend class in src/timmy/backends.py with full sync/async
  support, health checks, usage stats, and cost estimation in sats
- Add consult_grok tool to Timmy's toolkit for proactive Grok queries
- Extend cascade router with Grok provider type for failover chain
- Add Grok Mode toggle card to Mission Control dashboard (HTMX live)
- Add "Ask Grok" button on chat input for direct Grok queries
- Add /grok/* routes: status, toggle, chat, stats endpoints
- Integrate Lightning invoice generation for Grok usage monetization
- Add GROK_ENABLED, XAI_API_KEY, GROK_DEFAULT_MODEL, GROK_MAX_SATS_PER_QUERY,
  GROK_FREE config settings via pydantic-settings
- Update .env.example and docker-compose.yml with Grok env vars
- Add 21 tests covering backend, tools, and route endpoints (all green)

Local-first ethos preserved: Grok is premium augmentation only,
disabled by default, and Lightning-payable when enabled.

https://claude.ai/code/session_01FygwN8wS8J6WGZ8FPb7XGV
2026-02-27 01:12:51 +00:00
Alexander Whitestone
bb31f322e5 Merge pull request #61 from AlexanderWhitestone/claude/add-github-chat-interface-iZ0yN 2026-02-26 19:41:00 -05:00
Claude
bc2c09d3f8 feat: replace GitHub page with embedded Timmy chat interface
Replaces the marketing landing page with a minimal, full-screen chat
interface that connects to a running Timmy instance. Mobile-first design
with single vertical scroll direction, looping scroll, no zoom, no
buttons — just type and press Enter to talk to Timmy.

- docs/index.html: full rewrite as a clean chat UI with dark terminal
  theme, looping infinite scroll, markdown rendering, connection status,
  and /connect, /clear, /help slash commands
- src/dashboard/app.py: add CORS middleware so the GitHub Pages site can
  reach a local Timmy server cross-origin
- src/config.py: add cors_origins setting (defaults to ["*"])

https://claude.ai/code/session_01AWLxg6KDWsfCATiuvsRMGr
2026-02-27 00:35:33 +00:00
Alexander Whitestone
e0e2a2b9d8 Merge pull request #60 from AlexanderWhitestone/claude/local-models-iphone-EwXtC 2026-02-26 19:24:32 -05:00
Claude
3b7fcc5ebc feat: add in-browser local model support for iPhone via WebLLM
Enable Timmy to run directly on iPhone by loading a small LLM into
the browser via WebGPU (Safari 26+ / iOS 26+). No server connection
required — fully sovereign, fully offline.

New files:
- static/local_llm.js: WebLLM wrapper with model catalogue, WebGPU
  detection, streaming chat, and progress callbacks
- templates/mobile_local.html: Mobile-optimized UI with model
  selector, download progress, LOCAL/SERVER badge, and chat
- tests/dashboard/test_local_models.py: 31 tests covering routes,
  config, template UX, JS asset, and XSS prevention

Changes:
- config.py: browser_model_enabled, browser_model_id,
  browser_model_fallback settings
- routes/mobile.py: /mobile/local page, /mobile/local-models API
- base.html: LOCAL AI nav link

Supported models: SmolLM2-360M (~200MB), Qwen2.5-0.5B (~350MB),
SmolLM2-1.7B (~1GB), Llama-3.2-1B (~700MB). Falls back to
server-side Ollama when local model is unavailable.

https://claude.ai/code/session_01Cqkvr4sZbED7T3iDu1rwSD
2026-02-27 00:03:05 +00:00
Alexander Whitestone
528c86298a Merge pull request #59 from AlexanderWhitestone/claude/refactoring-phase-two-lhBGv 2026-02-26 18:37:24 -05:00
Claude
3adc18c208 chore: gitignore src/data/ (test runtime artifacts)
Test runs generate src/data/swarm.db and src/data/self_modify_reports/
which should not be tracked.

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:09:04 +00:00
Claude
89e677e5cc chore: remove accidentally tracked self_modify_reports
These test artifacts are already in .gitignore (data/self_modify_reports/)
but were included because they landed in src/data/ during test runs.

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:07:59 +00:00
Claude
9f4c809f70 refactor: Phase 2b — consolidate 28 modules into 14 packages
Complete the module consolidation planned in REFACTORING_PLAN.md:

Modules merged:
- work_orders/ + task_queue/ → swarm/ (subpackages)
- self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages)
- tools/ → creative/tools/
- chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new)
- ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new)
- agents/ + agent_core/ + memory/ → timmy/ (subpackages)

Updated across codebase:
- 66 source files: import statements rewritten
- 13 test files: import + patch() target strings rewritten
- pyproject.toml: wheel includes (28→14), entry points updated
- CLAUDE.md: singleton paths, module map, entry points table
- AGENTS.md: file convention updates
- REFACTORING_PLAN.md: execution status, success metrics

Extras:
- Module-level CLAUDE.md added to 6 key packages (Phase 6.2)
- Zero test regressions: 1462 tests passing

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:07:41 +00:00
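The import rewriting described in commit 9f4c809f70 could be approximated with a small script like the one below. This is only a sketch: the `MODULE_MOVES` table is derived from the "Modules merged" list, the dotted target paths (for example `swarm.task_queue`) are assumptions about where each subpackage landed, and `rewrite_imports` is a hypothetical helper, not code from the repository.

```python
import re

# Old top-level module -> assumed new dotted path (illustrative only).
MODULE_MOVES = {
    "work_orders": "swarm.work_orders",
    "task_queue": "swarm.task_queue",
    "self_modify": "self_coding.self_modify",
    "self_tdd": "self_coding.self_tdd",
    "upgrades": "self_coding.upgrades",
    "tools": "creative.tools",
    "chat_bridge": "integrations.chat_bridge",
    "telegram_bot": "integrations.telegram_bot",
    "shortcuts": "integrations.shortcuts",
    "voice": "integrations.voice",
    "ws_manager": "infrastructure.ws_manager",
    "notifications": "infrastructure.notifications",
    "events": "infrastructure.events",
    "router": "infrastructure.router",
    "agents": "timmy.agents",
    "agent_core": "timmy.agent_core",
    "memory": "timmy.memory",
}

# Matches the first module name after a leading "from" or "import".
_IMPORT_RE = re.compile(r"^([ \t]*(?:from|import)[ \t]+)([A-Za-z_]\w*)", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Rewrite the leading module of each import line using MODULE_MOVES."""

    def _replace(match: re.Match) -> str:
        prefix, module = match.group(1), match.group(2)
        return prefix + MODULE_MOVES.get(module, module)

    return _IMPORT_RE.sub(_replace, source)
```

A plain `import voice` would become `import integrations.voice`, which changes the bound name, and `patch()` target strings are not covered, so a real migration would still need manual review.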
Alexander Whitestone
24c3d33c3b Merge pull request #58 from AlexanderWhitestone/claude/plan-repo-refactoring-hgskF 2026-02-26 16:33:11 -05:00
Claude
f15559482b docs: Update REFACTORING_PLAN.md with execution status
Mark completed phases (1, 2a, 3, 4, 6) and document remaining work
(full module consolidation, package extraction) with guidance on
incremental execution approach.

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:32:18 +00:00
Claude
d2c80fbf4c refactor: Phase 2a — consolidate dashboard routes (27→22 files)
Merge related route files to reduce sprawl:
- voice.py ← voice_enhanced.py (enhanced pipeline merged in)
- swarm.py ← swarm_internal.py + swarm_ws.py (internal API + WebSocket)
- self_coding.py ← self_modify.py (self-modify endpoints merged in)
- Delete mobile_test.py route + template (test-only page, not for prod)
- Delete test_xss_prevention.py (tested the deleted mobile_test page)

Update app.py to use consolidated imports.
Update test_voice_enhanced.py patch paths.
Remove mobile_test.py from coverage omit (file deleted).

27 route files → 22. Tests: 1502 passed (1 removed with deleted page).

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:30:39 +00:00
Claude
4e11dd2490 refactor: Phase 3 — reorganize tests into module-mirroring subdirectories
Move 97 test files from flat tests/ into 13 subdirectories:
  tests/dashboard/   (8 files — routes, mobile, mission control)
  tests/swarm/       (17 files — coordinator, docker, routing, tasks)
  tests/timmy/       (12 files — agent, backends, CLI, tools)
  tests/self_coding/  (14 files — git safety, indexer, self-modify)
  tests/lightning/   (3 files — L402, LND, interface)
  tests/creative/    (8 files — assembler, director, image/music/video)
  tests/integrations/ (10 files — chat bridge, telegram, voice, websocket)
  tests/mcp/         (4 files — bootstrap, discovery, executor)
  tests/spark/       (3 files — engine, tools, events)
  tests/hands/       (3 files — registry, oracle, phase5)
  tests/scripture/   (1 file)
  tests/infrastructure/ (3 files — router cascade, API)
  tests/security/    (3 files — XSS, regression)

Fix Path(__file__) reference in test_mobile_scenarios.py for new depth.
Add __init__.py to all test subdirectories.

Tests: 1503 passed, 9 failed (pre-existing), 53 errors (pre-existing)

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:21:28 +00:00
Claude
6045077144 refactor: Phase 1/4/6 — doc cleanup, config fix, token optimization
Phase 1 — Documentation cleanup:
- Slim README 303→93 lines (remove duplicated architecture, config tables)
- Slim CLAUDE.md 267→80 lines (remove project layout, env vars, CI section)
- Slim AGENTS.md 342→72 lines (remove duplicated patterns, running locally)
- Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (session docs)
- Archive PLAN.md, IMPLEMENTATION_SUMMARY.md to docs/
- Move QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md to docs/
- Move apply_security_fixes.py, activate_self_tdd.sh to scripts/

Phase 4 — Config & build cleanup:
- Fix wheel build: add 11 missing modules to pyproject.toml include list
- Add pytest markers (unit, integration, dashboard, swarm, slow)
- Add data/self_modify_reports/ and .handoff/ to .gitignore

Phase 6 — Token optimization:
- Add docstrings to 15 __init__.py files that were empty
- Create __init__.py for events/, memory/, upgrades/ modules

Root markdown: 87KB → ~18KB (79% reduction)

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:03:15 +00:00
Claude
31760682f6 docs: Add comprehensive architectural refactoring plan
Full VP-engineering-level review of the codebase identifying 8 problems
(monolith sprawl, dashboard gravity well, doc entropy, test skeleton
bloat, unclear project boundaries, broken wheel build, dashboard
coupling, overscoped conftest) and proposing 6 phases of incremental
refactoring from low-risk doc cleanup to potential package extraction.

Key findings:
- 28 modules in src/, 11 missing from wheel build
- 87KB of root markdown with massive duplication
- 61 of 97 test files are empty skeletons (0 test functions)
- Dashboard routes: 27 files, 4,562 lines (gravity well)
- 4 autouse fixtures run on every test regardless of need

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 20:42:02 +00:00
Alexander Whitestone
f403d69bc1 Merge pull request #56 from AlexanderWhitestone/feature/hands-infrastructure-phase3
feat: Hands Infrastructure + 6 Autonomous Agents
2026-02-26 13:10:38 -05:00
Alexander Payne
9edcc627ea docs: Update README with all 6 Hands
Update Hands documentation to include Phase 5 additions:
- Scout: hourly OSINT monitoring
- Scribe: daily content production
- Ledger: 6-hour treasury tracking
- Weaver: weekly creative pipeline

Total: 6 autonomous Hands using existing agent framework.
2026-02-26 13:09:03 -05:00
Alexander Payne
7b26922339 test: Phase 5 Hands tests
Add comprehensive tests for new Hands:

TestScoutHand:
- Directory structure, TOML validity, SYSTEM.md
- Registry loading

TestScribeHand:
- Same validation pattern

TestLedgerHand:
- Same validation pattern

TestWeaverHand:
- Same validation pattern

TestPhase5Schedules:
- Scout: hourly (0 * * * *)
- Scribe: daily 9am (0 9 * * *)
- Ledger: every 6 hours (0 */6 * * *)
- Weaver: Sunday 10am (0 10 * * 0)

TestPhase5ApprovalGates:
- All 4 Hands have approval gates

TestAllHandsLoad:
- All 6 Hands load together

25 tests total, all passing.
2026-02-26 13:08:48 -05:00
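The four cron expressions checked by TestPhase5Schedules can be read with a minimal matcher like the one below. It is a sketch, not the scheduler the project uses (the log mentions APScheduler for that): it supports only `*`, `*/n`, and single integers, and the names `SCHEDULES`, `cron_matches`, and `due_hands` are hypothetical.

```python
from datetime import datetime

# Cron expressions copied from the Phase 5 schedule tests.
SCHEDULES = {
    "scout": "0 * * * *",     # hourly
    "scribe": "0 9 * * *",    # daily at 9am
    "ledger": "0 */6 * * *",  # every 6 hours
    "weaver": "0 10 * * 0",   # Sunday at 10am
}


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; handles '*', '*/n', and a single integer."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a 5-field cron expression fires at the given minute."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


def due_hands(when: datetime) -> list[str]:
    """List the Hands whose schedule fires at the given time."""
    return sorted(name for name, expr in SCHEDULES.items() if cron_matches(expr, when))
```

For example, `due_hands(datetime(2026, 3, 1, 10, 0))` falls on a Sunday at 10:00, so both the hourly Scout and the weekly Weaver are due.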
Alexander Payne
a8f44c159e feat: Phase 5 Additional Hands (Scout, Scribe, Ledger, Weaver)
Add 4 new autonomous Hands using existing agent framework:

Scout Hand (hands/scout/):
- OSINT monitoring every hour
- Monitors: HN, Reddit, RSS for Bitcoin/sovereign AI topics
- Uses: web_search, rss_fetch, sentiment analysis

Scribe Hand (hands/scribe/):
- Content production daily at 9am
- Produces: blog posts, docs, changelog
- Uses: file ops, git tools, codebase indexer

Ledger Hand (hands/ledger/):
- Treasury tracking every 6 hours
- Monitors: on-chain, Lightning balances, payment flows
- Uses: lightning_balance, onchain_balance, payment_audit

Weaver Hand (hands/weaver/):
- Creative pipeline weekly on Sundays
- Orchestrates: Pixel + Lyra + Reel for video production
- Uses: creative_director, project management tools

All Hands configured with:
- HAND.toml manifests with schedules
- SYSTEM.md prompts
- Approval gates for write actions
- Dashboard + Telegram output
2026-02-26 13:07:43 -05:00
Alexander Payne
b884884bad docs: Hands documentation in README (Phase 4)
Update README with Hands subsystem documentation:

- Add Hands to 'What's built' table
- New 'Hands — Autonomous Agents' section with:
  - Built-in Hands reference (Oracle, Sentinel)
  - Dashboard URL
  - HAND.toml example
- Update project layout to include src/hands/ and hands/
- Update roadmap: Exodus now includes Hands

Complete documentation for Phase 3-4 Hands infrastructure.
2026-02-26 12:58:21 -05:00
Alexander Payne
7508ef13c1 test: Oracle and Sentinel Hands tests (Phase 4)
Add validation tests for the first two autonomous Hands:

TestOracleHand:
- Directory structure exists
- HAND.toml is valid TOML with correct config
- SYSTEM.md exists with proper content
- Skills directory populated
- Loads correctly in HandRegistry

TestSentinelHand:
- Same validation pattern as Oracle

TestHandSchedules:
- Oracle runs twice daily (7am, 7pm UTC)
- Sentinel runs every 15 minutes

TestHandApprovalGates:
- Both Hands have approval gates configured
- Safety model enforced

14 tests total, all passing.
2026-02-26 12:57:41 -05:00
Alexander Payne
1ba03e4ce2 feat: Oracle and Sentinel Hands (Phase 4)
Add the first two autonomous Hands to validate infrastructure:

Oracle Hand (hands/oracle/):
- Bitcoin intelligence briefing, 2x daily (7am, 7pm)
- Monitors: price action, on-chain metrics, macro context
- Tools: mempool_fetch, fee_estimate, price_fetch, whale_alert
- Output: Dashboard + Telegram, markdown format
- Safety: Broadcast requires approval (5min auto)

Sentinel Hand (hands/sentinel/):
- System health monitoring, every 15 minutes
- Monitors: dashboard, agents, database, disk, memory
- Tools: system_stats, db_health, agent_status, disk_check
- Output: Dashboard + Telegram, JSON format
- Safety: Service restart requires approval (1min auto)

Both include:
- HAND.toml configuration with schedules
- SYSTEM.md with complete prompts
- skills/ directory with specialized knowledge
- Approval gates for write actions
2026-02-26 12:57:07 -05:00
Alexander Whitestone
536a371d48 Merge pull request #55 from AlexanderWhitestone/feature/hands-infrastructure-phase3
Feature/hands infrastructure phase3
2026-02-26 12:55:28 -05:00
Alexander Payne
a1d00da2de test: Hands infrastructure tests (Phase 3)
Add comprehensive test suite for Hands framework:

TestHandRegistry:
- Load all Hands from directory
- Get Hand by name (with not-found handling)
- Get scheduled vs all Hands
- State management (status updates)
- Approval queue operations

TestHandScheduler:
- Scheduler initialization
- Schedule Hand with cron
- Get scheduled jobs list
- Manual trigger execution

TestHandRunner:
- Load system prompts from SYSTEM.md
- Load skills from skills/ directory
- Build execution prompts

TestHandConfig:
- HandConfig creation and validation
- Cron schedule validation

TestHandModels:
- HandStatus enum values
- HandState serialization to dict

17 tests total, all passing.
2026-02-26 12:49:06 -05:00
Alexander Payne
d7aaae74d5 feat: Hands Dashboard Routes and UI (Phase 3.6)
Add dashboard for managing autonomous Hands:

Routes (src/dashboard/routes/hands.py):
- GET /api/hands - List all Hands with status
- GET /api/hands/{name} - Get Hand details
- POST /api/hands/{name}/trigger - Manual trigger
- POST /api/hands/{name}/pause - Pause scheduled Hand
- POST /api/hands/{name}/resume - Resume paused Hand
- GET /api/approvals - List pending approvals
- POST /api/approvals/{id}/approve - Approve request
- POST /api/approvals/{id}/reject - Reject request
- GET /api/executions - List execution history

Templates:
- hands.html - Main dashboard page
- partials/hands_list.html - Active Hands list
- partials/approvals_list.html - Pending approvals
- partials/hand_executions.html - Execution history

Integration:
- Wired up in app.py
- Navigation links in base.html
2026-02-26 12:46:48 -05:00
Alexander Payne
73cf780656 feat: HandRunner and hands module init (Phase 3.5)
Add HandRunner for executing Hands:

- hands/runner.py: Hand execution engine
  - Load SYSTEM.md and SKILL.md files
  - Inject domain expertise into LLM context
  - Check and handle approval gates
  - Execute tool loop with LLM
  - Deliver output to dashboard/channel/file
  - Log execution records

- hands/__init__.py: Module exports
  - Export all public classes and models
  - Usage documentation

The HandRunner completes the core Hands infrastructure.
2026-02-26 12:43:40 -05:00
Alexander Payne
8a952f6818 feat: Hands Infrastructure - Models, Registry, Scheduler (Phase 3.1-3.3)
Add core Hands infrastructure:

- hands/models.py: Pydantic models for HAND.toml schema
  - HandConfig: Complete hand configuration
  - HandState: Runtime state tracking
  - HandExecution: Execution records
  - ApprovalRequest: Approval queue entries

- hands/registry.py: HandRegistry for loading and indexing
  - Load Hands from hands/ directory
  - Parse HAND.toml manifests
  - SQLite indexing for fast lookup
  - Approval queue management
  - Execution history logging

- hands/scheduler.py: APScheduler-based scheduling
  - Cron and interval triggers
  - Job management (schedule, pause, resume, unschedule)
  - Hand execution wrapper
  - Manual trigger support
2026-02-26 12:41:52 -05:00
Alexander Whitestone
7de9db32ea Merge pull request #54 from AlexanderWhitestone/feature/self-coding-rebased
Feature/self coding rebased
2026-02-26 12:36:37 -05:00
Alexander Payne
4d3995012a test: Self-Coding Dashboard Tests
Add tests for dashboard routes:

- Page routes (main page, journal partial, stats partial, execute form)
- API routes (journal list/detail, stats, codebase summary/reindex)
- Execute endpoints (API and HTMX)
- Navigation integration (link in header)

Tests verify endpoints return correct status codes and content types.
2026-02-26 12:28:30 -05:00
Alexander Payne
62365cc9b2 feat: Wire up Self-Coding Dashboard
Integrate self-coding routes into dashboard:

Changes:
- Add import for self_coding_router in app.py
- Include self_coding_router in FastAPI app
- Add SELF-CODING link to desktop navigation
- Add SELF-CODING link to mobile navigation

The self-coding dashboard is now accessible at /self-coding
2026-02-26 12:28:30 -05:00
Alexander Payne
e81be8aed7 feat: Self-Coding Dashboard HTMX Templates
Add complete UI for self-coding dashboard:

Templates:
- self_coding.html - Main dashboard page with layout
- partials/self_coding_stats.html - Stats cards (total, success rate, etc)
- partials/journal_entries.html - List of modification attempts
- partials/journal_entry_detail.html - Expanded view of single attempt
- partials/execute_form.html - Task execution form
- partials/execute_result.html - Execution result display
- partials/error.html - Error message display

Features:
- HTMX-powered dynamic updates
- Real-time journal filtering (all/success/failure)
- Modal dialog for task execution
- Responsive Bootstrap 5 styling
- Automatic refresh after successful execution
2026-02-26 12:28:05 -05:00
Alexander Payne
cb70cb392a feat: Self-Coding Dashboard API Routes
Add FastAPI routes for self-coding dashboard:

API Endpoints:
- GET /api/journal - List modification journal entries
- GET /api/journal/{id} - Get detailed attempt info
- GET /api/stats - Get success rate metrics
- POST /api/execute - Execute self-edit task
- GET /api/codebase/summary - Get codebase summary
- POST /api/codebase/reindex - Trigger reindex

HTMX Partials:
- GET /self-coding/ - Main dashboard page
- GET /self-coding/journal - Journal entries list
- GET /self-coding/stats - Stats cards
- GET /self-coding/execute-form - Task execution form
- POST /self-coding/execute - Execute task endpoint
- GET /journal/{id}/detail - Entry detail view
2026-02-26 12:28:05 -05:00
Alexander Payne
49ca4dad43 feat: Self-Edit MCP Tool (Phase 2.1)
Implements the Self-Edit MCP Tool that orchestrates the self-coding foundation:

## Core Features

1. **SelfEditTool** (src/tools/self_edit.py)
   - Complete self-modification orchestrator
   - Pre-flight safety checks (clean repo, on main branch)
   - Context gathering (codebase indexer + modification journal)
   - Feature branch creation (timmy/self-edit/{timestamp})
   - LLM-based edit planning with fallback
   - Safety constraint validation
   - Aider integration (preferred) with fallback to direct editing
   - Automatic test execution via pytest
   - Commit on success, rollback on failure
   - Modification journaling with reflections

2. **Safety Constraints**
   - Max 3 files per commit
   - Max 100 lines changed
   - Protected files list (self-edit tool, foundation services)
   - Only modify files with test coverage
   - Max 3 retries on failure
   - Requires user confirmation (MCP tool registration)

3. **Execution Backends**
   - Aider integration: --auto-test --test-cmd pytest --yes --no-git
   - Direct editing fallback: LLM-based file modification with AST validation
   - Automatic backend selection based on availability

## Test Coverage

- 19 new tests covering:
  - Basic functionality (initialization, preflight checks)
  - Edit planning (with/without LLM)
  - Safety validation (file limits, protected files)
  - Execution flow (success and failure paths)
  - Error handling (exceptions, LLM failures)
  - MCP registration

## Usage

    from tools.self_edit import register_self_edit_tool
    from mcp.registry import tool_registry

    # Register with MCP
    register_self_edit_tool(tool_registry, llm_adapter)

Phase 2.2 will add Dashboard API endpoints and UI.
2026-02-26 12:28:05 -05:00
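The file, line, and protected-file limits listed under "Safety Constraints" in commit 49ca4dad43 might be checked along these lines. This is a sketch under assumptions: `PlannedEdit` and `validate_plan` are hypothetical names, the 100-line limit is treated as a total across all files, and the protected list contains only the one path the commit message names.

```python
from dataclasses import dataclass

MAX_FILES = 3            # "Max 3 files per commit"
MAX_LINES_CHANGED = 100  # "Max 100 lines changed" (assumed to be a total)
# Assumed protected list; the commit names the self-edit tool itself.
PROTECTED_FILES = frozenset({"src/tools/self_edit.py"})


@dataclass
class PlannedEdit:
    path: str
    lines_changed: int


def validate_plan(edits: list[PlannedEdit]) -> list[str]:
    """Return human-readable violations; an empty list means the plan passes."""
    violations = []
    if len(edits) > MAX_FILES:
        violations.append(f"too many files: {len(edits)} > {MAX_FILES}")
    total = sum(edit.lines_changed for edit in edits)
    if total > MAX_LINES_CHANGED:
        violations.append(f"too many lines changed: {total} > {MAX_LINES_CHANGED}")
    for edit in edits:
        if edit.path in PROTECTED_FILES:
            violations.append(f"protected file: {edit.path}")
    return violations
```

Returning a list of violations rather than raising on the first one lets a caller log every reason a plan was rejected, which may be useful for the modification journal.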
Alexander Whitestone
bb13052da2 Merge pull request #53 from AlexanderWhitestone/claude/sovereign-biblical-ai-design-0nuHW
Add scripture module: ESV text storage, parsing, and meditation
2026-02-26 12:16:56 -05:00
Claude
485b704145 chore: include pre-existing self-modify report artifacts
https://claude.ai/code/session_015wv7FM6BFsgZ35Us6WeY7H
2026-02-26 17:07:01 +00:00
Claude
63bbe2a288 feat: add sovereign biblical text integration module (scripture)
Implement the core scripture module for local-first ESV text storage,
verse retrieval, reference parsing, original language support,
cross-referencing, topical mapping, and automated meditation workflows.

Architecture:
- scripture/constants.py: 66-book Protestant canon with aliases and metadata
- scripture/models.py: Pydantic models with integer-encoded verse IDs
- scripture/parser.py: Regex-based reference extraction and formatting
- scripture/store.py: SQLite-backed verse/xref/topic/Strong's storage
- scripture/memory.py: Tripartite memory (working/long-term/associative)
- scripture/meditation.py: Sequential/thematic/lectionary meditation scheduler
- dashboard/routes/scripture.py: REST endpoints for all scripture operations
- config.py: scripture_enabled, translation, meditation settings
- 95 comprehensive tests covering all modules and routes

https://claude.ai/code/session_015wv7FM6BFsgZ35Us6WeY7H
2026-02-26 17:06:00 +00:00
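Commit 63bbe2a288 mentions integer-encoded verse IDs without giving the layout. One common scheme, shown here purely as an assumption, packs book, chapter, and verse into a single integer as BBCCCVVV; the function names are hypothetical and the module's actual encoding may differ.

```python
def encode_verse_id(book: int, chapter: int, verse: int) -> int:
    """Pack (book, chapter, verse) into one sortable integer (BBCCCVVV)."""
    if not (1 <= book <= 66 and 1 <= chapter <= 999 and 1 <= verse <= 999):
        raise ValueError("reference out of range")
    return book * 1_000_000 + chapter * 1_000 + verse


def decode_verse_id(verse_id: int) -> tuple[int, int, int]:
    """Unpack an integer verse ID back into (book, chapter, verse)."""
    book, rest = divmod(verse_id, 1_000_000)
    chapter, verse = divmod(rest, 1_000)
    return book, chapter, verse
```

With John as book 43 of the 66-book canon, John 3:16 encodes to `43003016`. Because the IDs sort in canonical order, a verse range becomes a simple integer range query in SQLite.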
Alexander Whitestone
166e9f7544 Merge pull request #52 from AlexanderWhitestone/fix/chat-eval-bugs
Fix chat evaluation bugs: task pipeline, prompt grounding, markdown rendering
2026-02-26 11:47:51 -05:00
Alexander Payne
431cf3e020 merge: resolve conflicts with main, keep comprehensive chat pipeline
Resolved merge conflicts in agents.py and test_task_queue.py:
- Keep full chat-to-task pipeline (agent/priority extraction, question
  filtering, context injection) over simpler main version
- Incorporate test_briefing_task_queue_summary from main
- All 64 task queue tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:47:34 -05:00
Alexander Payne
3ca8e9f2d6 fix: chat evaluation bugs — task pipeline, prompt grounding, markdown rendering
Addresses 14 bugs from 3 rounds of deep chat evaluation:

- Add chat-to-task pipeline in agents.py with regex-based intent detection,
  agent extraction, priority extraction, and title cleaning
- Filter meta-questions ("how do I create a task?") from task creation
- Inject real-time date/time context into every chat message
- Inject live queue state when user asks about tasks
- Ground system prompts with agent roster, honesty guardrails, self-knowledge,
  math delegation template, anti-filler rules, values-conflict guidance
- Add CSS for markdown code blocks, inline code, lists, blockquotes in chat
- Add highlight.js CDN for syntax highlighting in chat responses
- Reduce small-model memory context budget (4000→2000) for expanded prompt
- Add 27 comprehensive tests covering the full chat-to-task pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:42:42 -05:00
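The regex-based intent detection and meta-question filter from commit 3ca8e9f2d6 could look roughly like this. The patterns and the `detect_task` helper are illustrative assumptions; the real pipeline in agents.py also extracts agent and priority, which this sketch omits.

```python
import re
from typing import Optional

# Illustrative patterns only; the project's actual regexes are not shown in the log.
_TASK_PATTERNS = [
    re.compile(r"\badd\s+(?P<title>.+?)\s+to\s+the\s+queue\b", re.IGNORECASE),
    re.compile(r"\bcreate\s+a\s+task\b(?:\s+to\s+(?P<title>.+))?", re.IGNORECASE),
]

# Questions *about* tasks ("how do I create a task?") should not create one.
_META_QUESTION = re.compile(
    r"^\s*(?:how|what|why|can|could|should)\b.*\?\s*$", re.IGNORECASE
)


def detect_task(message: str) -> Optional[str]:
    """Return a cleaned task title if the message asks to queue a task, else None."""
    if _META_QUESTION.match(message):
        return None
    for pattern in _TASK_PATTERNS:
        match = pattern.search(message)
        if match:
            title = (match.group("title") or "").strip(" .!")
            return title or message.strip()
    return None
```

Checking the meta-question filter before the task patterns means a message such as "how do I create a task?" falls through to the LLM instead of creating a queue entry.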
Alexander Whitestone
32ad43a61a Merge pull request #51 from AlexanderWhitestone/feature/task-queue-and-ui-fixes
feat: wire chat-to-task-queue and briefing integration
2026-02-26 11:31:25 -05:00
Alexander Whitestone
13018ea04c Merge pull request #50 from AlexanderWhitestone/feature/self-coding-phase1-clean
feat: Self-Coding Foundation (Phase 1)
2026-02-26 11:30:41 -05:00
Alexander Payne
18bc64b36d feat: Self-Coding Foundation (Phase 1)
Implements the foundational infrastructure for Timmy's self-modification capability:

## New Services

1. **GitSafety** (src/self_coding/git_safety.py)
   - Atomic git operations with rollback capability
   - Snapshot/restore for safe experimentation
   - Feature branch management (timmy/self-edit/{timestamp})
   - Merge to main only after tests pass

2. **CodebaseIndexer** (src/self_coding/codebase_indexer.py)
   - AST-based parsing of Python source files
   - Extracts classes, functions, imports, docstrings
   - Builds dependency graph for blast radius analysis
   - SQLite storage with hash-based incremental indexing
   - get_summary() for LLM context (<4000 tokens)
   - get_relevant_files() for task-based file discovery

3. **ModificationJournal** (src/self_coding/modification_journal.py)
   - Persistent log of all self-modification attempts
   - Tracks outcomes: success, failure, rollback
   - find_similar() for learning from past attempts
   - Success rate metrics and recent failure tracking
   - Supports vector embeddings (Phase 2)

4. **ReflectionService** (src/self_coding/reflection.py)
   - LLM-powered analysis of modification attempts
   - Generates lessons learned from successes and failures
   - Fallback templates when LLM unavailable
   - Supports context from similar past attempts

## Test Coverage

- 104 new tests across 7 test files
- 95% code coverage on self_coding module
- Green path tests: full workflow integration
- Red path tests: errors, rollbacks, edge cases
- Safety constraint tests: test coverage requirements, protected files

## Usage

    from self_coding import GitSafety, CodebaseIndexer, ModificationJournal

    git = GitSafety(repo_path="/path/to/repo")
    indexer = CodebaseIndexer(repo_path="/path/to/repo")
    journal = ModificationJournal()

Phase 2 will build the Self-Edit MCP Tool that orchestrates these services.
2026-02-26 11:08:05 -05:00
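The hash-based incremental indexing mentioned for CodebaseIndexer in commit 18bc64b36d can be illustrated with a short sketch. It assumes SHA-256 content hashes and an in-memory dict in place of the SQLite table; `content_hash` and `plan_reindex` are hypothetical names, not the indexer's API.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest used to detect whether a file's contents changed."""
    return hashlib.sha256(data).hexdigest()


def plan_reindex(
    files: dict[str, bytes], stored_hashes: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare current file contents with stored hashes.

    Returns (paths to parse again, paths to drop from the index).
    """
    changed = sorted(
        path
        for path, data in files.items()
        if stored_hashes.get(path) != content_hash(data)
    )
    removed = sorted(path for path in stored_hashes if path not in files)
    return changed, removed
```

Only files whose hash differs from the stored value, or that have no stored hash yet, are re-parsed, so an unchanged codebase costs one hash per file rather than a full AST pass.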
Alexander Payne
bc9089ef96 feat: wire chat-to-task-queue and briefing integration
- Chat messages like "add X to the queue" or "create a task" are
  intercepted and create a task_queue entry with pending_approval
  status instead of going through to the LLM
- Briefing engine now gathers task queue stats (pending, running,
  completed, failed) and includes them in the morning briefing prompt
- 7 new tests covering detection patterns, chat integration, and
  briefing summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:33:14 -05:00
Alexander Whitestone
6c6b6f8a54 Merge pull request #49 from AlexanderWhitestone/feature/task-queue-and-ui-fixes
feat: task queue + work orders + UI bug fixes
2026-02-26 10:28:35 -05:00
Alexander Payne
5f9bbb8435 feat: add task queue with human-in-the-loop approval + work orders + UI bug fixes
Task Queue system:
- New /tasks page with three-column layout (Pending/Active/Completed)
- Full CRUD API at /api/tasks with approve/veto/modify/pause/cancel/retry
- SQLite persistence in task_queue table
- WebSocket live updates via ws_manager
- Create task modal with agent assignment and priority
- Auto-approve rules for low-risk tasks
- HTMX polling for real-time column updates
- HOME TASK buttons now link to task queue with agent pre-selected
- MARKET HIRE buttons link to task queue with agent pre-selected

Work Order system:
- External submission API for agents/users (POST /work-orders/submit)
- Risk scoring and configurable auto-execution thresholds
- Dashboard at /work-orders/queue with approve/reject/execute flow
- Integration with swarm task system for execution

UI & Dashboard bug fixes:
- EVENTS: add startup event so page is never empty
- LEDGER: fix empty filter params in URL
- MISSION CONTROL: LLM backend and model now read from /health
- MISSION CONTROL: agent count fallback to /swarm/agents
- SWARM: HTMX fallback loads initial data if WebSocket is slow
- MEMORY: add edit/delete buttons for personal facts
- UPGRADES: add empty state guidance with links
- BRIEFING: add regenerate button and POST /briefing/regenerate endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:27:08 -05:00
Alexander Whitestone
4e78f7102e Merge pull request #48 from AlexanderWhitestone/fix/timmy-startup-and-stability
fix: Timmy QA bugs — calculator, markdown, prompt guardrails, briefing
2026-02-26 09:44:33 -05:00
Alexander Payne
6e6b4355bb fix: calculator tool, markdown rendering, prompt guardrails, briefing notification
- Add sandboxed calculator tool to Timmy's toolkit so arithmetic questions
  get exact answers instead of LLM hallucinations
- Update system prompts (lite + full) to instruct Timmy to always use the
  calculator and never attempt multi-digit math in his head
- Add self-contradiction guard to both prompts ("commit to your facts")
- Render Timmy's chat responses as markdown via marked.js + DOMPurify
  instead of raw escaped text
- Suppress empty briefing notification on startup when there are 0
  pending approval items
- Add calculator to session response sanitizer regex
- 18 new calculator tests, 2 updated briefing notification tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:35:59 -05:00
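A sandboxed calculator of the kind described in commit 6e6b4355bb is often built by walking a parsed expression tree instead of calling `eval`. The sketch below takes that approach as an assumption about the technique; the log does not show the tool's actual implementation, and `calculate` is a hypothetical name.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_MAX_EXPONENT = 64  # arbitrary cap to avoid huge results


def _evaluate(node: ast.AST):
    """Recursively evaluate numbers and whitelisted operators only."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left, right = _evaluate(node.left), _evaluate(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > _MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def calculate(expression: str):
    """Evaluate an arithmetic expression without executing arbitrary code."""
    tree = ast.parse(expression, mode="eval")
    return _evaluate(tree.body)
```

Names, attribute access, and function calls all reach the final `raise`, so an input such as `__import__('os')` is rejected rather than executed.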
Alexander Whitestone
e6a7db7d80 Merge pull request #47 from AlexanderWhitestone/fix/timmy-startup-and-stability
fix: Timmy startup crashes and clean initialization
2026-02-26 09:21:23 -05:00
Alexander Payne
05d4dc997c fix: chat panel scroll — internal scroll on #chat-log, auto-scroll on new messages
- Set overflow:hidden on mc-main to prevent page-level scrolling
- Add max-height:100% to sidebar and chat panel to contain within viewport
- Use flex-wrap:nowrap on layout row to prevent column stacking on desktop
- Move scrollChat() to hx-on::after-settle for reliable post-swap scrolling
- Use requestAnimationFrame for smooth scroll-to-bottom timing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:15:40 -05:00