docs: Add comprehensive architectural refactoring plan

Full VP-engineering-level review of the codebase identifying 8 problems (monolith sprawl, dashboard gravity well, doc entropy, test skeleton bloat, unclear project boundaries, broken wheel build, dashboard coupling, overscoped conftest) and proposing 6 phases of incremental refactoring from low-risk doc cleanup to potential package extraction. Key findings: - 28 modules in src/, 11 missing from wheel build - 87KB of root markdown with massive duplication - 61 of 97 test files are empty skeletons (0 test functions) - Dashboard routes: 27 files, 4,562 lines (gravity well) - 4 autouse fixtures run on every test regardless of need https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 20:42:02 +00:00
parent f403d69bc1
commit 31760682f6
1 changed files with 499 additions and 0 deletions
--- a/REFACTORING_PLAN.md
+++ b/REFACTORING_PLAN.md
@@ -0,0 +1,499 @@
+# Timmy Time — Architectural Refactoring Plan
+
+**Author:** Claude (VP Engineering review)
+**Date:** 2026-02-26
+**Branch:** `claude/plan-repo-refactoring-hgskF`
+
+---
+
+## Executive Summary
+
+The Timmy Time codebase has grown to **53K lines of Python** across **272
+files** (169 source + 103 test), **28 modules** in `src/`, **27 route files**,
+**49 templates**, **90 test files**, and **87KB of root-level markdown**. It
+works, but it's burning tokens, slowing down test runs, and making it hard to
+reason about change impact.
+
+This plan proposes **6 phases** of refactoring, ordered by impact and risk. Each
+phase is independently valuable — you can stop after any phase and still be
+better off.
+
+---
+
+## The Problems
+
+### 1. Monolith sprawl
+28 modules in `src/` with no grouping. Eleven modules aren't even included in
+the wheel build (`agents`, `events`, `hands`, `mcp`, `memory`, `router`,
+`self_coding`, `task_queue`, `tools`, `upgrades`, `work_orders`). Some are
+used by the dashboard routes but forgotten in `pyproject.toml`.
+
+### 2. Dashboard is the gravity well
+The dashboard has 27 route files (4,562 lines), 49 templates, and has become
+the integration point for everything. Every new feature = new route file + new
+template + new test file. This doesn't scale.
+
+### 3. Documentation entropy
+10 root-level `.md` files (87KB). README is 303 lines, CLAUDE.md is 267 lines,
+AGENTS.md is 342 lines — with massive content duplication between them. Plus
+PLAN.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md, MEMORY.md,
+IMPLEMENTATION_SUMMARY.md, QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md.
+Human eyes glaze over. AI assistants waste tokens reading redundant info.
+
+### 4. Test sprawl — and a skeleton problem
+97 test files, 19,600 lines — but **61 of those files (63%) are empty
+skeletons** with zero actual test functions. Only 36 files have real tests
+containing 471 test functions total. Many "large" test files (like
+`test_scripture.py` at 901 lines, `test_router_cascade.py` at 523 lines) are
+infrastructure-only — class definitions, imports, fixtures, but no assertions.
+The functional/E2E directory (`tests/functional/`) has 7 files and 0 working
+tests. Tests are flat in `tests/` with no organization. Running the full suite
+means loading every module, every mock, every fixture even when you only
+changed one thing.
+
+### 5. Unclear project boundaries
+Is this one project or several? The `timmy` CLI, `timmy-serve` API server,
+`self-tdd` watchdog, and `self-modify` CLI are four separate entry points that
+could be four separate packages. The `creative` extra needs PyTorch. The
+`lightning` module is a standalone payment system. These shouldn't live in the
+same test run.
+
+### 6. Wheel build doesn't match reality
+`pyproject.toml` includes 17 modules but `src/` has 28. The missing 11 modules
+are used by code that IS included (dashboard routes import from `hands`,
+`mcp`, `memory`, `work_orders`, etc.). The wheel would break at runtime.
+
+### 7. Dependency coupling through dashboard
+
+The dashboard is the hub that imports from 20+ modules. The dependency graph
+flows inward: `config` is the foundation (22 modules depend on it), `mcp` is
+widely used (12+ importers), `swarm` is referenced by 15+ modules. No true
+circular dependencies exist (the `timmy ↔ swarm` relationship uses lazy
+imports), but the dashboard pulls in everything, so changing any module can
+break the dashboard routes.
+
+### 8. Conftest does too much
+
+`tests/conftest.py` has 4 autouse fixtures that run on **every single test**:
+reset message log, reset coordinator state, clean database, cleanup event
+loops. Many tests don't need any of these. This adds overhead to the test
+suite and couples all tests to the swarm coordinator.
+
+---
+
+## Phase 1: Documentation Cleanup (Low Risk, High Impact)
+
+**Goal:** Cut root markdown from 87KB to ~20KB. Make README human-readable.
+Eliminate token waste.
+
+### 1.1 Slim the README
+
+Cut README.md from 303 lines to ~80 lines:
+
+```
+# Timmy Time — Mission Control
+
+Local-first sovereign AI agent system. Browser dashboard, Ollama inference,
+Bitcoin Lightning economics. No cloud AI.
+
+## Quick Start
+  make install && make dev  →  http://localhost:8000
+
+## What's Here
+  - Timmy Agent (Ollama/AirLLM)
+  - Mission Control Dashboard (FastAPI + HTMX)
+  - Swarm Coordinator (multi-agent auctions)
+  - Lightning Payments (L402 gating)
+  - Creative Studio (image/music/video)
+  - Self-Coding (codebase-aware self-modification)
+
+## Commands
+  make dev / make test / make docker-up / make help
+
+## Documentation
+  - Development guide: CLAUDE.md
+  - Architecture: docs/architecture-v2.md
+  - Agent conventions: AGENTS.md
+  - Config reference: .env.example
+```
+
+### 1.2 De-duplicate CLAUDE.md
+
+Remove content that duplicates README or AGENTS.md. CLAUDE.md should only
+contain what AI assistants need that isn't elsewhere:
+- Architecture patterns (singletons, config, HTMX, graceful degradation)
+- Testing conventions (conftest, fixtures, stubs)
+- Security-sensitive areas
+- Entry points table
+
+Target: 267 → ~130 lines.
+
+### 1.3 Archive or delete temporary docs
+
+| File | Action |
+|------|--------|
+| `MEMORY.md` | DELETE — session context, not permanent docs |
+| `WORKSET_PLAN.md` | DELETE — use GitHub Issues |
+| `WORKSET_PLAN_PHASE2.md` | DELETE — use GitHub Issues |
+| `PLAN.md` | MOVE to `docs/PLAN_ARCHIVE.md` |
+| `IMPLEMENTATION_SUMMARY.md` | MOVE to `docs/IMPLEMENTATION_ARCHIVE.md` |
+| `QUALITY_ANALYSIS.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` |
+| `QUALITY_REVIEW_REPORT.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` |
+
+**Result:** Root directory goes from 10 `.md` files to 3 (README, CLAUDE,
+AGENTS).
+
+### 1.4 Clean up .handoff/
+
+The `.handoff/` directory (CHECKPOINT.md, CONTINUE.md, TODO.md, scripts) is
+session-scoped context. Either gitignore it or move to `docs/handoff/`.
+
+---
+
+## Phase 2: Module Consolidation (Medium Risk, High Impact)
+
+**Goal:** Reduce 28 modules to ~12 by merging small, related modules into
+coherent packages. This directly reduces cognitive load and token consumption.
+
+### 2.1 Proposed module structure
+
+```
+src/
+  config.py                    # (keep as-is)
+
+  timmy/                       # Core agent — MERGE IN agents/, agent_core/, memory/
+    agent.py                   # Main Timmy agent
+    backends.py                # Ollama/AirLLM backends
+    cli.py                     # CLI entry point
+    orchestrator.py            # ← from agents/timmy.py
+    personas/                  # ← from agents/ (seer, helm, quill, echo, forge)
+    agent_core/                # ← from src/agent_core/ (becomes subpackage)
+    memory/                    # ← from src/memory/ (becomes subpackage)
+    prompts.py
+    ...
+
+  dashboard/                   # Web UI — CONSOLIDATE routes
+    app.py
+    store.py
+    routes/                    # See §2.2 for route consolidation
+    templates/
+
+  swarm/                       # Multi-agent system — MERGE IN task_queue/, work_orders/
+    coordinator.py
+    tasks.py                   # ← existing + task_queue/ models
+    work_orders/               # ← from src/work_orders/ (becomes subpackage)
+    ...
+
+  integrations/                # NEW — MERGE chat_bridge/, telegram_bot/, shortcuts/
+    chat_bridge/               # Discord, unified chat
+    telegram.py                # ← from telegram_bot/
+    shortcuts.py               # ← from shortcuts/
+    voice/                     # ← from src/voice/
+
+  lightning/                   # (keep as-is — standalone, security-sensitive)
+
+  self_coding/                 # MERGE IN self_modify/, self_tdd/, upgrades/
+    codebase_indexer.py
+    git_safety.py
+    modification_journal.py
+    self_modify/               # ← from src/self_modify/ (becomes subpackage)
+    watchdog.py                # ← from src/self_tdd/
+    upgrades/                  # ← from src/upgrades/
+
+  mcp/                         # (keep as-is — used across multiple modules)
+
+  spark/                       # (keep as-is)
+
+  creative/                    # MERGE IN tools/
+    director.py
+    assembler.py
+    tools/                     # ← from src/tools/ (becomes subpackage)
+
+  hands/                       # (keep as-is)
+
+  scripture/                   # (keep as-is — domain-specific)
+
+  infrastructure/              # NEW — MERGE ws_manager/, notifications/, events/, router/
+    ws_manager.py              # ← from ws_manager/handler.py (157 lines)
+    notifications.py           # ← from notifications/push.py (153 lines)
+    events.py                  # ← from events/ (354 lines)
+    router/                    # ← from src/router/ (cascade LLM router)
+```
+
+### 2.2 Dashboard route consolidation
+
+27 route files → ~12 by grouping related routes:
+
+| Current files | Merged into |
+|--------------|-------------|
+| `agents.py`, `briefing.py` | `agents.py` |
+| `swarm.py`, `swarm_internal.py`, `swarm_ws.py` | `swarm.py` |
+| `voice.py`, `voice_enhanced.py` | `voice.py` |
+| `mobile.py`, `mobile_test.py` | `mobile.py` (delete test page) |
+| `self_coding.py`, `self_modify.py` | `self_coding.py` |
+| `tasks.py`, `work_orders.py` | `tasks.py` |
+
+`mobile_test.py` (257 lines) is a test page route that's excluded from
+coverage — it should not ship in production.
+
+### 2.3 Fix the wheel build
+
+Update `pyproject.toml` `[tool.hatch.build.targets.wheel]` to include all
+modules that are actually imported. Currently 11 modules are missing from the
+build manifest.
+
+---
+
+## Phase 3: Test Reorganization (Medium Risk, Medium Impact)
+
+**Goal:** Organize tests to match module structure, enable selective test runs,
+reduce full-suite runtime.
+
+### 3.1 Mirror source structure in tests
+
+```
+tests/
+  conftest.py               # Global fixtures only
+  timmy/                    # Tests for timmy/ module
+    conftest.py             # Timmy-specific fixtures
+    test_agent.py
+    test_backends.py
+    test_cli.py
+    test_orchestrator.py
+    test_personas.py
+    test_memory.py
+  dashboard/
+    conftest.py             # Dashboard fixtures (client fixture)
+    test_routes_agents.py
+    test_routes_swarm.py
+    ...
+  swarm/
+    test_coordinator.py
+    test_tasks.py
+    test_work_orders.py
+  integrations/
+    test_chat_bridge.py
+    test_telegram.py
+    test_voice.py
+  self_coding/
+    test_git_safety.py
+    test_codebase_indexer.py
+    test_self_modify.py
+  ...
+```
+
+### 3.2 Add pytest marks for selective execution
+
+```python
+# pyproject.toml
+[tool.pytest.ini_options]
+markers = [
+    "unit: Unit tests (fast, no I/O)",
+    "integration: Integration tests (may use SQLite)",
+    "dashboard: Dashboard route tests",
+    "swarm: Swarm coordinator tests",
+    "slow: Tests that take >1 second",
+]
+```
+
+Usage:
+```bash
+make test                    # Run all tests
+pytest -m unit               # Fast unit tests only
+pytest -m dashboard          # Just dashboard tests
+pytest tests/swarm/          # Just swarm module tests
+pytest -m "not slow"         # Skip slow tests
+```
+
+### 3.3 Audit and clean skeleton test files
+
+61 test files are empty skeletons — they have imports, class definitions, and
+fixture setup but **zero test functions**. These add import overhead and create
+a false sense of coverage. For each skeleton file:
+
+1. If the module it tests is stable and well-covered elsewhere → **delete it**
+2. If the module genuinely needs tests → **implement the tests** or file an
+   issue
+3. If it's a duplicate (e.g., both `test_swarm.py` and
+   `test_swarm_integration.py` exist) → **consolidate**
+
+Notable skeletons to address:
+- `test_scripture.py` (901 lines, 0 tests) — massive infrastructure, no assertions
+- `test_router_cascade.py` (523 lines, 0 tests) — same pattern
+- `test_agent_core.py` (457 lines, 0 tests)
+- `test_self_modify.py` (451 lines, 0 tests)
+- All 7 files in `tests/functional/` (0 working tests)
+
+### 3.4 Split genuinely oversized test files
+
+For files that DO have tests but are too large:
+- `test_task_queue.py` (560 lines, 30 tests) → split by feature area
+- `test_mobile_scenarios.py` (339 lines, 36 tests) → split by scenario group
+
+Rule of thumb: No test file over 400 lines.
+
+---
+
+## Phase 4: Configuration & Build Cleanup (Low Risk, Medium Impact)
+
+### 4.1 Clean up pyproject.toml
+
+- Fix the wheel include list to match actual imports
+- Consider whether 4 separate CLI entry points belong in one package
+- Add `[project.urls]` for documentation, repository links
+- Review dependency pins — some are very loose (`>=1.0.0`)
+
+### 4.2 Consolidate Docker files
+
+4 docker-compose variants (default, dev, prod, test) is a lot. Consider:
+- `docker-compose.yml` (base)
+- `docker-compose.override.yml` (dev — auto-loaded by Docker)
+- `docker-compose.prod.yml` (production only)
+
+### 4.3 Clean up root directory
+
+Non-essential root files to move or delete:
+
+| File | Action |
+|------|--------|
+| `apply_security_fixes.py` | Move to `scripts/` or delete if one-time |
+| `activate_self_tdd.sh` | Move to `scripts/` |
+| `coverage.xml` | Gitignore (CI artifact) |
+| `data/self_modify_reports/` | Gitignore the contents |
+
+---
+
+## Phase 5: Consider Package Extraction (High Risk, High Impact)
+
+**Goal:** Evaluate whether some modules should be separate packages/repos.
+
+### 5.1 Candidates for extraction
+
+| Module | Why extract | Dependency direction |
+|--------|------------|---------------------|
+| `lightning/` | Standalone payment system, security-sensitive | Dashboard imports lightning |
+| `creative/` | Needs PyTorch, very different dependency profile | Dashboard imports creative |
+| `timmy-serve` | Separate process (port 8402), separate purpose | Shares config + timmy agent |
+| `self_coding/` + `self_modify/` | Self-contained self-modification system | Dashboard imports for routes |
+
+### 5.2 Monorepo approach (recommended over multi-repo)
+
+If splitting, use a monorepo with namespace packages:
+
+```
+packages/
+  timmy-core/          # Agent + memory + CLI
+  timmy-dashboard/     # FastAPI app
+  timmy-swarm/         # Coordinator + tasks
+  timmy-lightning/     # Payment system
+  timmy-creative/      # Creative tools (heavy deps)
+```
+
+Each package gets its own `pyproject.toml`, test suite, and can be installed
+independently. But they share the same repo, CI, and release cycle.
+
+**However:** This is high effort and may not be worth it unless the team
+grows or the dependency profiles diverge further. Consider this only after
+Phases 1-4 are done and the pain persists.
+
+---
+
+## Phase 6: Token Optimization for AI Development (Low Risk, High Impact)
+
+**Goal:** Reduce context window consumption when AI assistants work on this
+codebase.
+
+### 6.1 Lean CLAUDE.md (already covered in Phase 1)
+
+Every byte in CLAUDE.md is read by every AI interaction. Remove duplication.
+
+### 6.2 Module-level CLAUDE.md files
+
+Instead of one massive guide, put module-specific context where it's needed:
+
+```
+src/swarm/CLAUDE.md        # "This module is security-sensitive. Always..."
+src/lightning/CLAUDE.md    # "Never hard-code secrets. Use settings..."
+src/dashboard/CLAUDE.md   # "Routes return template partials for HTMX..."
+```
+
+AI assistants read these only when working in that directory.
+
+### 6.3 Standardize module docstrings
+
+Every `__init__.py` should have a one-line summary. AI assistants read these
+to understand module purpose without reading every file:
+
+```python
+"""Swarm — Multi-agent coordinator with auction-based task assignment."""
+```
+
+### 6.4 Reduce template duplication
+
+49 templates with repeated boilerplate. Consider Jinja2 macros for common
+patterns (card layouts, form groups, table rows).
+
+---
+
+## Prioritized Execution Order
+
+| Priority | Phase | Effort | Risk | Impact |
+|----------|-------|--------|------|--------|
+| **1** | Phase 1: Doc cleanup | 2-3 hours | Low | High — immediate token savings |
+| **2** | Phase 6: Token optimization | 1-2 hours | Low | High — ongoing AI efficiency |
+| **3** | Phase 4: Config/build cleanup | 1-2 hours | Low | Medium — hygiene |
+| **4** | Phase 2: Module consolidation | 4-8 hours | Medium | High — structural improvement |
+| **5** | Phase 3: Test reorganization | 3-5 hours | Medium | Medium — faster test cycles |
+| **6** | Phase 5: Package extraction | 8-16 hours | High | High — only if needed |
+
+---
+
+## Quick Wins (Can Do Right Now)
+
+1. Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (3 files, 0 risk)
+2. Move PLAN.md, IMPLEMENTATION_SUMMARY.md, quality docs to `docs/` (5 files)
+3. Slim README to ~80 lines
+4. Fix pyproject.toml wheel includes (11 missing modules)
+5. Gitignore `coverage.xml` and `data/self_modify_reports/`
+6. Delete `dashboard/routes/mobile_test.py` (test page in production routes)
+7. Delete or gut empty test skeletons (61 files with 0 tests — they waste CI
+   time and create noise)
+
+---
+
+## What NOT to Do
+
+- **Don't rewrite from scratch.** The code works. Refactor incrementally.
+- **Don't split into multiple repos.** Monorepo with packages (if needed) is
+  simpler for a small team.
+- **Don't change the tech stack.** FastAPI + HTMX + Jinja2 is fine. Don't add
+  React, Vue, or a SPA framework.
+- **Don't merge CLAUDE.md into README.** They serve different audiences.
+- **Don't remove test files** just to reduce count. Reorganize them.
+- **Don't break the singleton pattern.** It works for this scale.
+
+---
+
+## Success Metrics
+
+After refactoring:
+- Root `.md` files: 10 → 3
+- Root markdown size: 87KB → ~20KB
+- `src/` modules: 28 → ~12-15
+- Dashboard route files: 27 → ~12-15
+- Test files: organized in subdirectories matching source
+- Empty skeleton test files: 61 → 0 (either implemented or deleted)
+- Real test functions: 471 → 500+ (fill gaps in coverage)
+- `pytest -m unit` runs in <10 seconds
+- Wheel build includes all modules that are actually imported
+- AI assistant context consumption drops ~40%
+- Conftest autouse fixtures scoped to relevant test directories
+
+---
+
+## Next Steps
+
+1. Review this plan
+2. Pick a phase to start (recommended: Phase 1)
+3. Create a tracking issue for each phase
+4. Execute incrementally, keeping tests green at every step