Commit Graph

186 Commits

Author SHA1 Message Date
Claude
eb501c43da fix: resolve 8 test failures from missing requests stub and wrong python path
- Add `requests` to conftest.py module stubs so patch("requests.post") works
  in reward scoring tests without the package installed
- Use sys.executable instead of bare "python" in git safety tests so the
  subprocess finds pytest from the venv rather than system python

https://claude.ai/code/session_012Ye9nyFEiw2QQfx4bZeDmn
2026-02-27 02:06:45 +00:00
Alexander Whitestone
c006609094 Merge pull request #65 from AlexanderWhitestone/claude/finish-and-submit-pr-fmxyC 2026-02-26 20:55:04 -05:00
Claude
21846f3897 fix: disable gpg signing in test git fixtures and skip root-only permission test
Test fixtures that create temporary git repos now set commit.gpgsign=false
to avoid failures in environments with global commit signing configured.
The permission error test is skipped when running as root since file
permissions don't apply to the root user.

https://claude.ai/code/session_018u1fAx2GihSGctYS64tD4H
2026-02-27 01:52:47 +00:00
Claude
211c54bc8c feat: add custom weights, model registry, per-agent models, and reward scoring
Inspired by OpenClaw-RL's multi-model orchestration, this adds four
features for custom model management:

1. Custom model registry (infrastructure/models/registry.py) — SQLite-backed
   registry for GGUF, safetensors, HF checkpoint, and Ollama models with
   role-based lookups (general, reward, teacher, judge).

2. Per-agent model assignment — each swarm persona can use a different model
   instead of sharing the global default. Resolved via registry assignment >
   persona default > global default.

3. Runtime model management API (/api/v1/models) — REST endpoints to register,
   list, assign, enable/disable, and remove custom models without restart.
   Includes a dashboard page at /models.

4. Reward model scoring (PRM-style) — majority-vote quality evaluation of
   agent outputs using a configurable reward model. Scores persist in SQLite
   and feed into the swarm learner.

New config settings: custom_weights_dir, reward_model_enabled,
reward_model_name, reward_model_votes.

54 new tests covering registry CRUD, API endpoints, agent assignments,
role lookups, and reward scoring.

https://claude.ai/code/session_01V4iTozMwcE2gjfnCJdCugC
2026-02-27 01:27:53 +00:00
Alexander Whitestone
e4d5ec5ed4 Merge pull request #62 from AlexanderWhitestone/claude/grok-backend-monetization-iVc5i 2026-02-26 20:26:15 -05:00
Claude
17059bc0ea feat: add Grok (xAI) as opt-in premium backend with monetization
- Add GrokBackend class in src/timmy/backends.py with full sync/async
  support, health checks, usage stats, and cost estimation in sats
- Add consult_grok tool to Timmy's toolkit for proactive Grok queries
- Extend cascade router with Grok provider type for failover chain
- Add Grok Mode toggle card to Mission Control dashboard (HTMX live)
- Add "Ask Grok" button on chat input for direct Grok queries
- Add /grok/* routes: status, toggle, chat, stats endpoints
- Integrate Lightning invoice generation for Grok usage monetization
- Add GROK_ENABLED, XAI_API_KEY, GROK_DEFAULT_MODEL, GROK_MAX_SATS_PER_QUERY,
  GROK_FREE config settings via pydantic-settings
- Update .env.example and docker-compose.yml with Grok env vars
- Add 21 tests covering backend, tools, and route endpoints (all green)

Local-first ethos preserved: Grok is premium augmentation only,
disabled by default, and Lightning-payable when enabled.

https://claude.ai/code/session_01FygwN8wS8J6WGZ8FPb7XGV
2026-02-27 01:12:51 +00:00
Alexander Whitestone
bb31f322e5 Merge pull request #61 from AlexanderWhitestone/claude/add-github-chat-interface-iZ0yN 2026-02-26 19:41:00 -05:00
Claude
bc2c09d3f8 feat: replace GitHub page with embedded Timmy chat interface
Replaces the marketing landing page with a minimal, full-screen chat
interface that connects to a running Timmy instance. Mobile-first design
with single vertical scroll direction, looping scroll, no zoom, no
buttons — just type and press Enter to talk to Timmy.

- docs/index.html: full rewrite as a clean chat UI with dark terminal
  theme, looping infinite scroll, markdown rendering, connection status,
  and /connect, /clear, /help slash commands
- src/dashboard/app.py: add CORS middleware so the GitHub Pages site can
  reach a local Timmy server cross-origin
- src/config.py: add cors_origins setting (defaults to ["*"])

https://claude.ai/code/session_01AWLxg6KDWsfCATiuvsRMGr
2026-02-27 00:35:33 +00:00
Alexander Whitestone
e0e2a2b9d8 Merge pull request #60 from AlexanderWhitestone/claude/local-models-iphone-EwXtC 2026-02-26 19:24:32 -05:00
Claude
3b7fcc5ebc feat: add in-browser local model support for iPhone via WebLLM
Enable Timmy to run directly on iPhone by loading a small LLM into
the browser via WebGPU (Safari 26+ / iOS 26+). No server connection
required — fully sovereign, fully offline.

New files:
- static/local_llm.js: WebLLM wrapper with model catalogue, WebGPU
  detection, streaming chat, and progress callbacks
- templates/mobile_local.html: Mobile-optimized UI with model
  selector, download progress, LOCAL/SERVER badge, and chat
- tests/dashboard/test_local_models.py: 31 tests covering routes,
  config, template UX, JS asset, and XSS prevention

Changes:
- config.py: browser_model_enabled, browser_model_id,
  browser_model_fallback settings
- routes/mobile.py: /mobile/local page, /mobile/local-models API
- base.html: LOCAL AI nav link

Supported models: SmolLM2-360M (~200MB), Qwen2.5-0.5B (~350MB),
SmolLM2-1.7B (~1GB), Llama-3.2-1B (~700MB). Falls back to
server-side Ollama when local model is unavailable.

https://claude.ai/code/session_01Cqkvr4sZbED7T3iDu1rwSD
2026-02-27 00:03:05 +00:00
Alexander Whitestone
528c86298a Merge pull request #59 from AlexanderWhitestone/claude/refactoring-phase-two-lhBGv 2026-02-26 18:37:24 -05:00
Claude
3adc18c208 chore: gitignore src/data/ (test runtime artifacts)
Test runs generate src/data/swarm.db and src/data/self_modify_reports/
which should not be tracked.

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:09:04 +00:00
Claude
89e677e5cc chore: remove accidentally tracked self_modify_reports
These test artifacts are already in .gitignore (data/self_modify_reports/)
but were included because they landed in src/data/ during test runs.

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:07:59 +00:00
Claude
9f4c809f70 refactor: Phase 2b — consolidate 28 modules into 14 packages
Complete the module consolidation planned in REFACTORING_PLAN.md:

Modules merged:
- work_orders/ + task_queue/ → swarm/ (subpackages)
- self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages)
- tools/ → creative/tools/
- chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new)
- ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new)
- agents/ + agent_core/ + memory/ → timmy/ (subpackages)

Updated across codebase:
- 66 source files: import statements rewritten
- 13 test files: import + patch() target strings rewritten
- pyproject.toml: wheel includes (28→14), entry points updated
- CLAUDE.md: singleton paths, module map, entry points table
- AGENTS.md: file convention updates
- REFACTORING_PLAN.md: execution status, success metrics

Extras:
- Module-level CLAUDE.md added to 6 key packages (Phase 6.2)
- Zero test regressions: 1462 tests passing

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
2026-02-26 22:07:41 +00:00
Alexander Whitestone
24c3d33c3b Merge pull request #58 from AlexanderWhitestone/claude/plan-repo-refactoring-hgskF 2026-02-26 16:33:11 -05:00
Claude
f15559482b docs: Update REFACTORING_PLAN.md with execution status
Mark completed phases (1, 2a, 3, 4, 6) and document remaining work
(full module consolidation, package extraction) with guidance on
incremental execution approach.

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:32:18 +00:00
Claude
d2c80fbf4c refactor: Phase 2a — consolidate dashboard routes (27→22 files)
Merge related route files to reduce sprawl:
- voice.py ← voice_enhanced.py (enhanced pipeline merged in)
- swarm.py ← swarm_internal.py + swarm_ws.py (internal API + WebSocket)
- self_coding.py ← self_modify.py (self-modify endpoints merged in)
- Delete mobile_test.py route + template (test-only page, not for prod)
- Delete test_xss_prevention.py (tested the deleted mobile_test page)

Update app.py to use consolidated imports.
Update test_voice_enhanced.py patch paths.
Remove mobile_test.py from coverage omit (file deleted).

27 route files → 22. Tests: 1502 passed (1 removed with deleted page).

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:30:39 +00:00
Claude
4e11dd2490 refactor: Phase 3 — reorganize tests into module-mirroring subdirectories
Move 97 test files from flat tests/ into 13 subdirectories:
  tests/dashboard/   (8 files — routes, mobile, mission control)
  tests/swarm/       (17 files — coordinator, docker, routing, tasks)
  tests/timmy/       (12 files — agent, backends, CLI, tools)
  tests/self_coding/  (14 files — git safety, indexer, self-modify)
  tests/lightning/   (3 files — L402, LND, interface)
  tests/creative/    (8 files — assembler, director, image/music/video)
  tests/integrations/ (10 files — chat bridge, telegram, voice, websocket)
  tests/mcp/         (4 files — bootstrap, discovery, executor)
  tests/spark/       (3 files — engine, tools, events)
  tests/hands/       (3 files — registry, oracle, phase5)
  tests/scripture/   (1 file)
  tests/infrastructure/ (3 files — router cascade, API)
  tests/security/    (3 files — XSS, regression)

Fix Path(__file__) reference in test_mobile_scenarios.py for new depth.
Add __init__.py to all test subdirectories.

Tests: 1503 passed, 9 failed (pre-existing), 53 errors (pre-existing)

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:21:28 +00:00
Claude
6045077144 refactor: Phase 1/4/6 — doc cleanup, config fix, token optimization
Phase 1 — Documentation cleanup:
- Slim README 303→93 lines (remove duplicated architecture, config tables)
- Slim CLAUDE.md 267→80 lines (remove project layout, env vars, CI section)
- Slim AGENTS.md 342→72 lines (remove duplicated patterns, running locally)
- Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (session docs)
- Archive PLAN.md, IMPLEMENTATION_SUMMARY.md to docs/
- Move QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md to docs/
- Move apply_security_fixes.py, activate_self_tdd.sh to scripts/

Phase 4 — Config & build cleanup:
- Fix wheel build: add 11 missing modules to pyproject.toml include list
- Add pytest markers (unit, integration, dashboard, swarm, slow)
- Add data/self_modify_reports/ and .handoff/ to .gitignore

Phase 6 — Token optimization:
- Add docstrings to 15 __init__.py files that were empty
- Create __init__.py for events/, memory/, upgrades/ modules

Root markdown: 87KB → ~18KB (79% reduction)

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:03:15 +00:00
Claude
31760682f6 docs: Add comprehensive architectural refactoring plan
Full VP-engineering-level review of the codebase identifying 8 problems
(monolith sprawl, dashboard gravity well, doc entropy, test skeleton
bloat, unclear project boundaries, broken wheel build, dashboard
coupling, overscoped conftest) and proposing 6 phases of incremental
refactoring from low-risk doc cleanup to potential package extraction.

Key findings:
- 28 modules in src/, 11 missing from wheel build
- 87KB of root markdown with massive duplication
- 61 of 97 test files are empty skeletons (0 test functions)
- Dashboard routes: 27 files, 4,562 lines (gravity well)
- 4 autouse fixtures run on every test regardless of need

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 20:42:02 +00:00
Alexander Whitestone
f403d69bc1 Merge pull request #56 from AlexanderWhitestone/feature/hands-infrastructure-phase3
feat: Hands Infrastructure + 6 Autonomous Agents
2026-02-26 13:10:38 -05:00
Alexander Payne
9edcc627ea docs: Update README with all 6 Hands
Update Hands documentation to include Phase 5 additions:
- Scout: hourly OSINT monitoring
- Scribe: daily content production
- Ledger: 6-hour treasury tracking
- Weaver: weekly creative pipeline

Total: 6 autonomous Hands using existing agent framework.
2026-02-26 13:09:03 -05:00
Alexander Payne
7b26922339 test: Phase 5 Hands tests
Add comprehensive tests for new Hands:

TestScoutHand:
- Directory structure, TOML validity, SYSTEM.md
- Registry loading

TestScribeHand:
- Same validation pattern

TestLedgerHand:
- Same validation pattern

TestWeaverHand:
- Same validation pattern

TestPhase5Schedules:
- Scout: hourly (0 * * * *)
- Scribe: daily 9am (0 9 * * *)
- Ledger: every 6 hours (0 */6 * * *)
- Weaver: Sunday 10am (0 10 * * 0)

TestPhase5ApprovalGates:
- All 4 Hands have approval gates

TestAllHandsLoad:
- All 6 Hands load together

25 tests total, all passing.
2026-02-26 13:08:48 -05:00
Alexander Payne
a8f44c159e feat: Phase 5 Additional Hands (Scout, Scribe, Ledger, Weaver)
Add 4 new autonomous Hands using existing agent framework:

Scout Hand (hands/scout/):
- OSINT monitoring every hour
- Monitors: HN, Reddit, RSS for Bitcoin/sovereign AI topics
- Uses: web_search, rss_fetch, sentiment analysis

Scribe Hand (hands/scribe/):
- Content production daily at 9am
- Produces: blog posts, docs, changelog
- Uses: file ops, git tools, codebase indexer

Ledger Hand (hands/ledger/):
- Treasury tracking every 6 hours
- Monitors: on-chain, Lightning balances, payment flows
- Uses: lightning_balance, onchain_balance, payment_audit

Weaver Hand (hands/weaver/):
- Creative pipeline weekly on Sundays
- Orchestrates: Pixel + Lyra + Reel for video production
- Uses: creative_director, project management tools

All Hands configured with:
- HAND.toml manifests with schedules
- SYSTEM.md prompts
- Approval gates for write actions
- Dashboard + Telegram output
2026-02-26 13:07:43 -05:00
Alexander Payne
b884884bad docs: Hands documentation in README (Phase 4)
Update README with Hands subsystem documentation:

- Add Hands to 'What's built' table
- New 'Hands — Autonomous Agents' section with:
  - Built-in Hands reference (Oracle, Sentinel)
  - Dashboard URL
  - HAND.toml example
- Update project layout to include src/hands/ and hands/
- Update roadmap: Exodus now includes Hands

Complete documentation for Phase 3-4 Hands infrastructure.
2026-02-26 12:58:21 -05:00
Alexander Payne
7508ef13c1 test: Oracle and Sentinel Hands tests (Phase 4)
Add validation tests for the first two autonomous Hands:

TestOracleHand:
- Directory structure exists
- HAND.toml is valid TOML with correct config
- SYSTEM.md exists with proper content
- Skills directory populated
- Loads correctly in HandRegistry

TestSentinelHand:
- Same validation pattern as Oracle

TestHandSchedules:
- Oracle runs twice daily (7am, 7pm UTC)
- Sentinel runs every 15 minutes

TestHandApprovalGates:
- Both Hands have approval gates configured
- Safety model enforced

14 tests total, all passing.
2026-02-26 12:57:41 -05:00
Alexander Payne
1ba03e4ce2 feat: Oracle and Sentinel Hands (Phase 4)
Add the first two autonomous Hands to validate infrastructure:

Oracle Hand (hands/oracle/):
- Bitcoin intelligence briefing, 2x daily (7am, 7pm)
- Monitors: price action, on-chain metrics, macro context
- Tools: mempool_fetch, fee_estimate, price_fetch, whale_alert
- Output: Dashboard + Telegram, markdown format
- Safety: Broadcast requires approval (5min auto)

Sentinel Hand (hands/sentinel/):
- System health monitoring, every 15 minutes
- Monitors: dashboard, agents, database, disk, memory
- Tools: system_stats, db_health, agent_status, disk_check
- Output: Dashboard + Telegram, JSON format
- Safety: Service restart requires approval (1min auto)

Both include:
- HAND.toml configuration with schedules
- SYSTEM.md with complete prompts
- skills/ directory with specialized knowledge
- Approval gates for write actions
2026-02-26 12:57:07 -05:00
Alexander Whitestone
536a371d48 Merge pull request #55 from AlexanderWhitestone/feature/hands-infrastructure-phase3
Feature/hands infrastructure phase3
2026-02-26 12:55:28 -05:00
Alexander Payne
a1d00da2de test: Hands infrastructure tests (Phase 3)
Add comprehensive test suite for Hands framework:

TestHandRegistry:
- Load all Hands from directory
- Get Hand by name (with not-found handling)
- Get scheduled vs all Hands
- State management (status updates)
- Approval queue operations

TestHandScheduler:
- Scheduler initialization
- Schedule Hand with cron
- Get scheduled jobs list
- Manual trigger execution

TestHandRunner:
- Load system prompts from SYSTEM.md
- Load skills from skills/ directory
- Build execution prompts

TestHandConfig:
- HandConfig creation and validation
- Cron schedule validation

TestHandModels:
- HandStatus enum values
- HandState serialization to dict

17 tests total, all passing.
2026-02-26 12:49:06 -05:00
Alexander Payne
d7aaae74d5 feat: Hands Dashboard Routes and UI (Phase 3.6)
Add dashboard for managing autonomous Hands:

Routes (src/dashboard/routes/hands.py):
- GET /api/hands - List all Hands with status
- GET /api/hands/{name} - Get Hand details
- POST /api/hands/{name}/trigger - Manual trigger
- POST /api/hands/{name}/pause - Pause scheduled Hand
- POST /api/hands/{name}/resume - Resume paused Hand
- GET /api/approvals - List pending approvals
- POST /api/approvals/{id}/approve - Approve request
- POST /api/approvals/{id}/reject - Reject request
- GET /api/executions - List execution history

Templates:
- hands.html - Main dashboard page
- partials/hands_list.html - Active Hands list
- partials/approvals_list.html - Pending approvals
- partials/hand_executions.html - Execution history

Integration:
- Wired up in app.py
- Navigation links in base.html
2026-02-26 12:46:48 -05:00
Alexander Payne
73cf780656 feat: HandRunner and hands module init (Phase 3.5)
Add HandRunner for executing Hands:

- hands/runner.py: Hand execution engine
  - Load SYSTEM.md and SKILL.md files
  - Inject domain expertise into LLM context
  - Check and handle approval gates
  - Execute tool loop with LLM
  - Deliver output to dashboard/channel/file
  - Log execution records

- hands/__init__.py: Module exports
  - Export all public classes and models
  - Usage documentation

The HandRunner completes the core Hands infrastructure.
2026-02-26 12:43:40 -05:00
Alexander Payne
8a952f6818 feat: Hands Infrastructure - Models, Registry, Scheduler (Phase 3.1-3.3)
Add core Hands infrastructure:

- hands/models.py: Pydantic models for HAND.toml schema
  - HandConfig: Complete hand configuration
  - HandState: Runtime state tracking
  - HandExecution: Execution records
  - ApprovalRequest: Approval queue entries

- hands/registry.py: HandRegistry for loading and indexing
  - Load Hands from hands/ directory
  - Parse HAND.toml manifests
  - SQLite indexing for fast lookup
  - Approval queue management
  - Execution history logging

- hands/scheduler.py: APScheduler-based scheduling
  - Cron and interval triggers
  - Job management (schedule, pause, resume, unschedule)
  - Hand execution wrapper
  - Manual trigger support
2026-02-26 12:41:52 -05:00
Alexander Whitestone
7de9db32ea Merge pull request #54 from AlexanderWhitestone/feature/self-coding-rebased
Feature/self coding rebased
2026-02-26 12:36:37 -05:00
Alexander Payne
4d3995012a test: Self-Coding Dashboard Tests
Add tests for dashboard routes:

- Page routes (main page, journal partial, stats partial, execute form)
- API routes (journal list/detail, stats, codebase summary/reindex)
- Execute endpoints (API and HTMX)
- Navigation integration (link in header)

Tests verify endpoints return correct status codes and content types.
2026-02-26 12:28:30 -05:00
Alexander Payne
62365cc9b2 feat: Wire up Self-Coding Dashboard
Integrate self-coding routes into dashboard:

Changes:
- Add import for self_coding_router in app.py
- Include self_coding_router in FastAPI app
- Add SELF-CODING link to desktop navigation
- Add SELF-CODING link to mobile navigation

The self-coding dashboard is now accessible at /self-coding
2026-02-26 12:28:30 -05:00
Alexander Payne
e81be8aed7 feat: Self-Coding Dashboard HTMX Templates
Add complete UI for self-coding dashboard:

Templates:
- self_coding.html - Main dashboard page with layout
- partials/self_coding_stats.html - Stats cards (total, success rate, etc)
- partials/journal_entries.html - List of modification attempts
- partials/journal_entry_detail.html - Expanded view of single attempt
- partials/execute_form.html - Task execution form
- partials/execute_result.html - Execution result display
- partials/error.html - Error message display

Features:
- HTMX-powered dynamic updates
- Real-time journal filtering (all/success/failure)
- Modal dialog for task execution
- Responsive Bootstrap 5 styling
- Automatic refresh after successful execution
2026-02-26 12:28:05 -05:00
Alexander Payne
cb70cb392a feat: Self-Coding Dashboard API Routes
Add FastAPI routes for self-coding dashboard:

API Endpoints:
- GET /api/journal - List modification journal entries
- GET /api/journal/{id} - Get detailed attempt info
- GET /api/stats - Get success rate metrics
- POST /api/execute - Execute self-edit task
- GET /api/codebase/summary - Get codebase summary
- POST /api/codebase/reindex - Trigger reindex

HTMX Partials:
- GET /self-coding/ - Main dashboard page
- GET /self-coding/journal - Journal entries list
- GET /self-coding/stats - Stats cards
- GET /self-coding/execute-form - Task execution form
- POST /self-coding/execute - Execute task endpoint
- GET /journal/{id}/detail - Entry detail view
2026-02-26 12:28:05 -05:00
Alexander Payne
49ca4dad43 feat: Self-Edit MCP Tool (Phase 2.1)
Implements the Self-Edit MCP Tool that orchestrates the self-coding foundation:

## Core Features

1. **SelfEditTool** (src/tools/self_edit.py)
   - Complete self-modification orchestrator
   - Pre-flight safety checks (clean repo, on main branch)
   - Context gathering (codebase indexer + modification journal)
   - Feature branch creation (timmy/self-edit/{timestamp})
   - LLM-based edit planning with fallback
   - Safety constraint validation
   - Aider integration (preferred) with fallback to direct editing
   - Automatic test execution via pytest
   - Commit on success, rollback on failure
   - Modification journaling with reflections

2. **Safety Constraints**
   - Max 3 files per commit
   - Max 100 lines changed
   - Protected files list (self-edit tool, foundation services)
   - Only modify files with test coverage
   - Max 3 retries on failure
   - Requires user confirmation (MCP tool registration)

3. **Execution Backends**
   - Aider integration: --auto-test --test-cmd pytest --yes --no-git
   - Direct editing fallback: LLM-based file modification with AST validation
   - Automatic backend selection based on availability

## Test Coverage

- 19 new tests covering:
  - Basic functionality (initialization, preflight checks)
  - Edit planning (with/without LLM)
  - Safety validation (file limits, protected files)
  - Execution flow (success and failure paths)
  - Error handling (exceptions, LLM failures)
  - MCP registration

## Usage

    from tools.self_edit import register_self_edit_tool
    from mcp.registry import tool_registry

    # Register with MCP
    register_self_edit_tool(tool_registry, llm_adapter)

Phase 2.2 will add Dashboard API endpoints and UI.
2026-02-26 12:28:05 -05:00
Alexander Whitestone
bb13052da2 Merge pull request #53 from AlexanderWhitestone/claude/sovereign-biblical-ai-design-0nuHW
Add scripture module: ESV text storage, parsing, and meditation
2026-02-26 12:16:56 -05:00
Claude
485b704145 chore: include pre-existing self-modify report artifacts
https://claude.ai/code/session_015wv7FM6BFsgZ35Us6WeY7H
2026-02-26 17:07:01 +00:00
Claude
63bbe2a288 feat: add sovereign biblical text integration module (scripture)
Implement the core scripture module for local-first ESV text storage,
verse retrieval, reference parsing, original language support,
cross-referencing, topical mapping, and automated meditation workflows.

Architecture:
- scripture/constants.py: 66-book Protestant canon with aliases and metadata
- scripture/models.py: Pydantic models with integer-encoded verse IDs
- scripture/parser.py: Regex-based reference extraction and formatting
- scripture/store.py: SQLite-backed verse/xref/topic/Strong's storage
- scripture/memory.py: Tripartite memory (working/long-term/associative)
- scripture/meditation.py: Sequential/thematic/lectionary meditation scheduler
- dashboard/routes/scripture.py: REST endpoints for all scripture operations
- config.py: scripture_enabled, translation, meditation settings
- 95 comprehensive tests covering all modules and routes

https://claude.ai/code/session_015wv7FM6BFsgZ35Us6WeY7H
2026-02-26 17:06:00 +00:00
Alexander Whitestone
166e9f7544 Merge pull request #52 from AlexanderWhitestone/fix/chat-eval-bugs
Fix chat evaluation bugs: task pipeline, prompt grounding, markdown rendering
2026-02-26 11:47:51 -05:00
Alexander Payne
431cf3e020 merge: resolve conflicts with main, keep comprehensive chat pipeline
Resolved merge conflicts in agents.py and test_task_queue.py:
- Keep full chat-to-task pipeline (agent/priority extraction, question
  filtering, context injection) over simpler main version
- Incorporate test_briefing_task_queue_summary from main
- All 64 task queue tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:47:34 -05:00
Alexander Payne
3ca8e9f2d6 fix: chat evaluation bugs — task pipeline, prompt grounding, markdown rendering
Addresses 14 bugs from 3 rounds of deep chat evaluation:

- Add chat-to-task pipeline in agents.py with regex-based intent detection,
  agent extraction, priority extraction, and title cleaning
- Filter meta-questions ("how do I create a task?") from task creation
- Inject real-time date/time context into every chat message
- Inject live queue state when user asks about tasks
- Ground system prompts with agent roster, honesty guardrails, self-knowledge,
  math delegation template, anti-filler rules, values-conflict guidance
- Add CSS for markdown code blocks, inline code, lists, blockquotes in chat
- Add highlight.js CDN for syntax highlighting in chat responses
- Reduce small-model memory context budget (4000→2000) for expanded prompt
- Add 27 comprehensive tests covering the full chat-to-task pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:42:42 -05:00
Alexander Whitestone
32ad43a61a Merge pull request #51 from AlexanderWhitestone/feature/task-queue-and-ui-fixes
feat: wire chat-to-task-queue and briefing integration
2026-02-26 11:31:25 -05:00
Alexander Whitestone
13018ea04c Merge pull request #50 from AlexanderWhitestone/feature/self-coding-phase1-clean
feat: Self-Coding Foundation (Phase 1)
2026-02-26 11:30:41 -05:00
Alexander Payne
18bc64b36d feat: Self-Coding Foundation (Phase 1)
Implements the foundational infrastructure for Timmy's self-modification capability:

## New Services

1. **GitSafety** (src/self_coding/git_safety.py)
   - Atomic git operations with rollback capability
   - Snapshot/restore for safe experimentation
   - Feature branch management (timmy/self-edit/{timestamp})
   - Merge to main only after tests pass

2. **CodebaseIndexer** (src/self_coding/codebase_indexer.py)
   - AST-based parsing of Python source files
   - Extracts classes, functions, imports, docstrings
   - Builds dependency graph for blast radius analysis
   - SQLite storage with hash-based incremental indexing
   - get_summary() for LLM context (<4000 tokens)
   - get_relevant_files() for task-based file discovery

3. **ModificationJournal** (src/self_coding/modification_journal.py)
   - Persistent log of all self-modification attempts
   - Tracks outcomes: success, failure, rollback
   - find_similar() for learning from past attempts
   - Success rate metrics and recent failure tracking
   - Supports vector embeddings (Phase 2)

4. **ReflectionService** (src/self_coding/reflection.py)
   - LLM-powered analysis of modification attempts
   - Generates lessons learned from successes and failures
   - Fallback templates when LLM unavailable
   - Supports context from similar past attempts

## Test Coverage

- 104 new tests across 7 test files
- 95% code coverage on self_coding module
- Green path tests: full workflow integration
- Red path tests: errors, rollbacks, edge cases
- Safety constraint tests: test coverage requirements, protected files

## Usage

    from self_coding import GitSafety, CodebaseIndexer, ModificationJournal

    git = GitSafety(repo_path=/path/to/repo)
    indexer = CodebaseIndexer(repo_path=/path/to/repo)
    journal = ModificationJournal()

Phase 2 will build the Self-Edit MCP Tool that orchestrates these services.
2026-02-26 11:08:05 -05:00
Alexander Payne
bc9089ef96 feat: wire chat-to-task-queue and briefing integration
- Chat messages like "add X to the queue" or "create a task" are
  intercepted and create a task_queue entry with pending_approval
  status instead of going through to the LLM
- Briefing engine now gathers task queue stats (pending, running,
  completed, failed) and includes them in the morning briefing prompt
- 7 new tests covering detection patterns, chat integration, and
  briefing summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:33:14 -05:00
Alexander Whitestone
6c6b6f8a54 Merge pull request #49 from AlexanderWhitestone/feature/task-queue-and-ui-fixes
feat: task queue + work orders + UI bug fixes
2026-02-26 10:28:35 -05:00
Alexander Payne
5f9bbb8435 feat: add task queue with human-in-the-loop approval + work orders + UI bug fixes
Task Queue system:
- New /tasks page with three-column layout (Pending/Active/Completed)
- Full CRUD API at /api/tasks with approve/veto/modify/pause/cancel/retry
- SQLite persistence in task_queue table
- WebSocket live updates via ws_manager
- Create task modal with agent assignment and priority
- Auto-approve rules for low-risk tasks
- HTMX polling for real-time column updates
- HOME TASK buttons now link to task queue with agent pre-selected
- MARKET HIRE buttons link to task queue with agent pre-selected

Work Order system:
- External submission API for agents/users (POST /work-orders/submit)
- Risk scoring and configurable auto-execution thresholds
- Dashboard at /work-orders/queue with approve/reject/execute flow
- Integration with swarm task system for execution

UI & Dashboard bug fixes:
- EVENTS: add startup event so page is never empty
- LEDGER: fix empty filter params in URL
- MISSION CONTROL: LLM backend and model now read from /health
- MISSION CONTROL: agent count fallback to /swarm/agents
- SWARM: HTMX fallback loads initial data if WebSocket is slow
- MEMORY: add edit/delete buttons for personal facts
- UPGRADES: add empty state guidance with links
- BRIEFING: add regenerate button and POST /briefing/regenerate endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:27:08 -05:00