From 31760682f6c5e91bae898fbd7f4bd75cf96b3e2f Mon Sep 17 00:00:00 2001 From: Claude Date: Thu, 26 Feb 2026 20:42:02 +0000 Subject: [PATCH] docs: Add comprehensive architectural refactoring plan Full VP-engineering-level review of the codebase identifying 8 problems (monolith sprawl, dashboard gravity well, doc entropy, test skeleton bloat, unclear project boundaries, broken wheel build, dashboard coupling, overscoped conftest) and proposing 6 phases of incremental refactoring from low-risk doc cleanup to potential package extraction. Key findings: - 28 modules in src/, 11 missing from wheel build - 87KB of root markdown with massive duplication - 61 of 97 test files are empty skeletons (0 test functions) - Dashboard routes: 27 files, 4,562 lines (gravity well) - 4 autouse fixtures run on every test regardless of need https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN --- REFACTORING_PLAN.md | 499 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 499 insertions(+) create mode 100644 REFACTORING_PLAN.md diff --git a/REFACTORING_PLAN.md b/REFACTORING_PLAN.md new file mode 100644 index 0000000..0398961 --- /dev/null +++ b/REFACTORING_PLAN.md @@ -0,0 +1,499 @@ +# Timmy Time — Architectural Refactoring Plan + +**Author:** Claude (VP Engineering review) +**Date:** 2026-02-26 +**Branch:** `claude/plan-repo-refactoring-hgskF` + +--- + +## Executive Summary + +The Timmy Time codebase has grown to **53K lines of Python** across **272 +files** (169 source + 103 test), **28 modules** in `src/`, **27 route files**, +**49 templates**, **90 test files**, and **87KB of root-level markdown**. It +works, but it's burning tokens, slowing down test runs, and making it hard to +reason about change impact. + +This plan proposes **6 phases** of refactoring, ordered by impact and risk. Each +phase is independently valuable — you can stop after any phase and still be +better off. + +--- + +## The Problems + +### 1. Monolith sprawl +28 modules in `src/` with no grouping. Eleven modules aren't even included in +the wheel build (`agents`, `events`, `hands`, `mcp`, `memory`, `router`, +`self_coding`, `task_queue`, `tools`, `upgrades`, `work_orders`). Some are +used by the dashboard routes but forgotten in `pyproject.toml`. + +### 2. Dashboard is the gravity well +The dashboard has 27 route files (4,562 lines), 49 templates, and has become +the integration point for everything. Every new feature = new route file + new +template + new test file. This doesn't scale. + +### 3. Documentation entropy +10 root-level `.md` files (87KB). README is 303 lines, CLAUDE.md is 267 lines, +AGENTS.md is 342 lines — with massive content duplication between them. Plus +PLAN.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md, MEMORY.md, +IMPLEMENTATION_SUMMARY.md, QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md. +Human eyes glaze over. AI assistants waste tokens reading redundant info. + +### 4. Test sprawl — and a skeleton problem +97 test files, 19,600 lines — but **61 of those files (63%) are empty +skeletons** with zero actual test functions. Only 36 files have real tests +containing 471 test functions total. Many "large" test files (like +`test_scripture.py` at 901 lines, `test_router_cascade.py` at 523 lines) are +infrastructure-only — class definitions, imports, fixtures, but no assertions. +The functional/E2E directory (`tests/functional/`) has 7 files and 0 working +tests. Tests are flat in `tests/` with no organization. Running the full suite +means loading every module, every mock, every fixture even when you only +changed one thing. + +### 5. Unclear project boundaries +Is this one project or several? The `timmy` CLI, `timmy-serve` API server, +`self-tdd` watchdog, and `self-modify` CLI are four separate entry points that +could be four separate packages. The `creative` extra needs PyTorch. The +`lightning` module is a standalone payment system. These shouldn't live in the +same test run. + +### 6. Wheel build doesn't match reality +`pyproject.toml` includes 17 modules but `src/` has 28. The missing 11 modules +are used by code that IS included (dashboard routes import from `hands`, +`mcp`, `memory`, `work_orders`, etc.). The wheel would break at runtime. + +### 7. Dependency coupling through dashboard + +The dashboard is the hub that imports from 20+ modules. The dependency graph +flows inward: `config` is the foundation (22 modules depend on it), `mcp` is +widely used (12+ importers), `swarm` is referenced by 15+ modules. No true +circular dependencies exist (the `timmy ↔ swarm` relationship uses lazy +imports), but the dashboard pulls in everything, so changing any module can +break the dashboard routes. + +### 8. Conftest does too much + +`tests/conftest.py` has 4 autouse fixtures that run on **every single test**: +reset message log, reset coordinator state, clean database, cleanup event +loops. Many tests don't need any of these. This adds overhead to the test +suite and couples all tests to the swarm coordinator. + +--- + +## Phase 1: Documentation Cleanup (Low Risk, High Impact) + +**Goal:** Cut root markdown from 87KB to ~20KB. Make README human-readable. +Eliminate token waste. + +### 1.1 Slim the README + +Cut README.md from 303 lines to ~80 lines: + +``` +# Timmy Time — Mission Control + +Local-first sovereign AI agent system. Browser dashboard, Ollama inference, +Bitcoin Lightning economics. No cloud AI. + +## Quick Start + make install && make dev → http://localhost:8000 + +## What's Here + - Timmy Agent (Ollama/AirLLM) + - Mission Control Dashboard (FastAPI + HTMX) + - Swarm Coordinator (multi-agent auctions) + - Lightning Payments (L402 gating) + - Creative Studio (image/music/video) + - Self-Coding (codebase-aware self-modification) + +## Commands + make dev / make test / make docker-up / make help + +## Documentation + - Development guide: CLAUDE.md + - Architecture: docs/architecture-v2.md + - Agent conventions: AGENTS.md + - Config reference: .env.example +``` + +### 1.2 De-duplicate CLAUDE.md + +Remove content that duplicates README or AGENTS.md. CLAUDE.md should only +contain what AI assistants need that isn't elsewhere: +- Architecture patterns (singletons, config, HTMX, graceful degradation) +- Testing conventions (conftest, fixtures, stubs) +- Security-sensitive areas +- Entry points table + +Target: 267 → ~130 lines. + +### 1.3 Archive or delete temporary docs + +| File | Action | +|------|--------| +| `MEMORY.md` | DELETE — session context, not permanent docs | +| `WORKSET_PLAN.md` | DELETE — use GitHub Issues | +| `WORKSET_PLAN_PHASE2.md` | DELETE — use GitHub Issues | +| `PLAN.md` | MOVE to `docs/PLAN_ARCHIVE.md` | +| `IMPLEMENTATION_SUMMARY.md` | MOVE to `docs/IMPLEMENTATION_ARCHIVE.md` | +| `QUALITY_ANALYSIS.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` | +| `QUALITY_REVIEW_REPORT.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` | + +**Result:** Root directory goes from 10 `.md` files to 3 (README, CLAUDE, +AGENTS). + +### 1.4 Clean up .handoff/ + +The `.handoff/` directory (CHECKPOINT.md, CONTINUE.md, TODO.md, scripts) is +session-scoped context. Either gitignore it or move to `docs/handoff/`. + +--- + +## Phase 2: Module Consolidation (Medium Risk, High Impact) + +**Goal:** Reduce 28 modules to ~12 by merging small, related modules into +coherent packages. This directly reduces cognitive load and token consumption. + +### 2.1 Proposed module structure + +``` +src/ + config.py # (keep as-is) + + timmy/ # Core agent — MERGE IN agents/, agent_core/, memory/ + agent.py # Main Timmy agent + backends.py # Ollama/AirLLM backends + cli.py # CLI entry point + orchestrator.py # ← from agents/timmy.py + personas/ # ← from agents/ (seer, helm, quill, echo, forge) + agent_core/ # ← from src/agent_core/ (becomes subpackage) + memory/ # ← from src/memory/ (becomes subpackage) + prompts.py + ... + + dashboard/ # Web UI — CONSOLIDATE routes + app.py + store.py + routes/ # See §2.2 for route consolidation + templates/ + + swarm/ # Multi-agent system — MERGE IN task_queue/, work_orders/ + coordinator.py + tasks.py # ← existing + task_queue/ models + work_orders/ # ← from src/work_orders/ (becomes subpackage) + ... + + integrations/ # NEW — MERGE chat_bridge/, telegram_bot/, shortcuts/ + chat_bridge/ # Discord, unified chat + telegram.py # ← from telegram_bot/ + shortcuts.py # ← from shortcuts/ + voice/ # ← from src/voice/ + + lightning/ # (keep as-is — standalone, security-sensitive) + + self_coding/ # MERGE IN self_modify/, self_tdd/, upgrades/ + codebase_indexer.py + git_safety.py + modification_journal.py + self_modify/ # ← from src/self_modify/ (becomes subpackage) + watchdog.py # ← from src/self_tdd/ + upgrades/ # ← from src/upgrades/ + + mcp/ # (keep as-is — used across multiple modules) + + spark/ # (keep as-is) + + creative/ # MERGE IN tools/ + director.py + assembler.py + tools/ # ← from src/tools/ (becomes subpackage) + + hands/ # (keep as-is) + + scripture/ # (keep as-is — domain-specific) + + infrastructure/ # NEW — MERGE ws_manager/, notifications/, events/, router/ + ws_manager.py # ← from ws_manager/handler.py (157 lines) + notifications.py # ← from notifications/push.py (153 lines) + events.py # ← from events/ (354 lines) + router/ # ← from src/router/ (cascade LLM router) +``` + +### 2.2 Dashboard route consolidation + +27 route files → ~12 by grouping related routes: + +| Current files | Merged into | +|--------------|-------------| +| `agents.py`, `briefing.py` | `agents.py` | +| `swarm.py`, `swarm_internal.py`, `swarm_ws.py` | `swarm.py` | +| `voice.py`, `voice_enhanced.py` | `voice.py` | +| `mobile.py`, `mobile_test.py` | `mobile.py` (delete test page) | +| `self_coding.py`, `self_modify.py` | `self_coding.py` | +| `tasks.py`, `work_orders.py` | `tasks.py` | + +`mobile_test.py` (257 lines) is a test page route that's excluded from +coverage — it should not ship in production. + +### 2.3 Fix the wheel build + +Update `pyproject.toml` `[tool.hatch.build.targets.wheel]` to include all +modules that are actually imported. Currently 11 modules are missing from the +build manifest. + +--- + +## Phase 3: Test Reorganization (Medium Risk, Medium Impact) + +**Goal:** Organize tests to match module structure, enable selective test runs, +reduce full-suite runtime. + +### 3.1 Mirror source structure in tests + +``` +tests/ + conftest.py # Global fixtures only + timmy/ # Tests for timmy/ module + conftest.py # Timmy-specific fixtures + test_agent.py + test_backends.py + test_cli.py + test_orchestrator.py + test_personas.py + test_memory.py + dashboard/ + conftest.py # Dashboard fixtures (client fixture) + test_routes_agents.py + test_routes_swarm.py + ... + swarm/ + test_coordinator.py + test_tasks.py + test_work_orders.py + integrations/ + test_chat_bridge.py + test_telegram.py + test_voice.py + self_coding/ + test_git_safety.py + test_codebase_indexer.py + test_self_modify.py + ... +``` + +### 3.2 Add pytest marks for selective execution + +```python +# pyproject.toml +[tool.pytest.ini_options] +markers = [ + "unit: Unit tests (fast, no I/O)", + "integration: Integration tests (may use SQLite)", + "dashboard: Dashboard route tests", + "swarm: Swarm coordinator tests", + "slow: Tests that take >1 second", +] +``` + +Usage: +```bash +make test # Run all tests +pytest -m unit # Fast unit tests only +pytest -m dashboard # Just dashboard tests +pytest tests/swarm/ # Just swarm module tests +pytest -m "not slow" # Skip slow tests +``` + +### 3.3 Audit and clean skeleton test files + +61 test files are empty skeletons — they have imports, class definitions, and +fixture setup but **zero test functions**. These add import overhead and create +a false sense of coverage. For each skeleton file: + +1. If the module it tests is stable and well-covered elsewhere → **delete it** +2. If the module genuinely needs tests → **implement the tests** or file an + issue +3. If it's a duplicate (e.g., both `test_swarm.py` and + `test_swarm_integration.py` exist) → **consolidate** + +Notable skeletons to address: +- `test_scripture.py` (901 lines, 0 tests) — massive infrastructure, no assertions +- `test_router_cascade.py` (523 lines, 0 tests) — same pattern +- `test_agent_core.py` (457 lines, 0 tests) +- `test_self_modify.py` (451 lines, 0 tests) +- All 7 files in `tests/functional/` (0 working tests) + +### 3.4 Split genuinely oversized test files + +For files that DO have tests but are too large: +- `test_task_queue.py` (560 lines, 30 tests) → split by feature area +- `test_mobile_scenarios.py` (339 lines, 36 tests) → split by scenario group + +Rule of thumb: No test file over 400 lines. + +--- + +## Phase 4: Configuration & Build Cleanup (Low Risk, Medium Impact) + +### 4.1 Clean up pyproject.toml + +- Fix the wheel include list to match actual imports +- Consider whether 4 separate CLI entry points belong in one package +- Add `[project.urls]` for documentation, repository links +- Review dependency pins — some are very loose (`>=1.0.0`) + +### 4.2 Consolidate Docker files + +4 docker-compose variants (default, dev, prod, test) is a lot. Consider: +- `docker-compose.yml` (base) +- `docker-compose.override.yml` (dev — auto-loaded by Docker) +- `docker-compose.prod.yml` (production only) + +### 4.3 Clean up root directory + +Non-essential root files to move or delete: + +| File | Action | +|------|--------| +| `apply_security_fixes.py` | Move to `scripts/` or delete if one-time | +| `activate_self_tdd.sh` | Move to `scripts/` | +| `coverage.xml` | Gitignore (CI artifact) | +| `data/self_modify_reports/` | Gitignore the contents | + +--- + +## Phase 5: Consider Package Extraction (High Risk, High Impact) + +**Goal:** Evaluate whether some modules should be separate packages/repos. + +### 5.1 Candidates for extraction + +| Module | Why extract | Dependency direction | +|--------|------------|---------------------| +| `lightning/` | Standalone payment system, security-sensitive | Dashboard imports lightning | +| `creative/` | Needs PyTorch, very different dependency profile | Dashboard imports creative | +| `timmy-serve` | Separate process (port 8402), separate purpose | Shares config + timmy agent | +| `self_coding/` + `self_modify/` | Self-contained self-modification system | Dashboard imports for routes | + +### 5.2 Monorepo approach (recommended over multi-repo) + +If splitting, use a monorepo with namespace packages: + +``` +packages/ + timmy-core/ # Agent + memory + CLI + timmy-dashboard/ # FastAPI app + timmy-swarm/ # Coordinator + tasks + timmy-lightning/ # Payment system + timmy-creative/ # Creative tools (heavy deps) +``` + +Each package gets its own `pyproject.toml`, test suite, and can be installed +independently. But they share the same repo, CI, and release cycle. + +**However:** This is high effort and may not be worth it unless the team +grows or the dependency profiles diverge further. Consider this only after +Phases 1-4 are done and the pain persists. + +--- + +## Phase 6: Token Optimization for AI Development (Low Risk, High Impact) + +**Goal:** Reduce context window consumption when AI assistants work on this +codebase. + +### 6.1 Lean CLAUDE.md (already covered in Phase 1) + +Every byte in CLAUDE.md is read by every AI interaction. Remove duplication. + +### 6.2 Module-level CLAUDE.md files + +Instead of one massive guide, put module-specific context where it's needed: + +``` +src/swarm/CLAUDE.md # "This module is security-sensitive. Always..." +src/lightning/CLAUDE.md # "Never hard-code secrets. Use settings..." +src/dashboard/CLAUDE.md # "Routes return template partials for HTMX..." +``` + +AI assistants read these only when working in that directory. + +### 6.3 Standardize module docstrings + +Every `__init__.py` should have a one-line summary. AI assistants read these +to understand module purpose without reading every file: + +```python +"""Swarm — Multi-agent coordinator with auction-based task assignment.""" +``` + +### 6.4 Reduce template duplication + +49 templates with repeated boilerplate. Consider Jinja2 macros for common +patterns (card layouts, form groups, table rows). + +--- + +## Prioritized Execution Order + +| Priority | Phase | Effort | Risk | Impact | +|----------|-------|--------|------|--------| +| **1** | Phase 1: Doc cleanup | 2-3 hours | Low | High — immediate token savings | +| **2** | Phase 6: Token optimization | 1-2 hours | Low | High — ongoing AI efficiency | +| **3** | Phase 4: Config/build cleanup | 1-2 hours | Low | Medium — hygiene | +| **4** | Phase 2: Module consolidation | 4-8 hours | Medium | High — structural improvement | +| **5** | Phase 3: Test reorganization | 3-5 hours | Medium | Medium — faster test cycles | +| **6** | Phase 5: Package extraction | 8-16 hours | High | High — only if needed | + +--- + +## Quick Wins (Can Do Right Now) + +1. Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (3 files, 0 risk) +2. Move PLAN.md, IMPLEMENTATION_SUMMARY.md, quality docs to `docs/` (5 files) +3. Slim README to ~80 lines +4. Fix pyproject.toml wheel includes (11 missing modules) +5. Gitignore `coverage.xml` and `data/self_modify_reports/` +6. Delete `dashboard/routes/mobile_test.py` (test page in production routes) +7. Delete or gut empty test skeletons (61 files with 0 tests — they waste CI + time and create noise) + +--- + +## What NOT to Do + +- **Don't rewrite from scratch.** The code works. Refactor incrementally. +- **Don't split into multiple repos.** Monorepo with packages (if needed) is + simpler for a small team. +- **Don't change the tech stack.** FastAPI + HTMX + Jinja2 is fine. Don't add + React, Vue, or a SPA framework. +- **Don't merge CLAUDE.md into README.** They serve different audiences. +- **Don't remove test files** just to reduce count. Reorganize them. +- **Don't break the singleton pattern.** It works for this scale. + +--- + +## Success Metrics + +After refactoring: +- Root `.md` files: 10 → 3 +- Root markdown size: 87KB → ~20KB +- `src/` modules: 28 → ~12-15 +- Dashboard route files: 27 → ~12-15 +- Test files: organized in subdirectories matching source +- Empty skeleton test files: 61 → 0 (either implemented or deleted) +- Real test functions: 471 → 500+ (fill gaps in coverage) +- `pytest -m unit` runs in <10 seconds +- Wheel build includes all modules that are actually imported +- AI assistant context consumption drops ~40% +- Conftest autouse fixtures scoped to relevant test directories + +--- + +## Next Steps + +1. Review this plan +2. Pick a phase to start (recommended: Phase 1) +3. Create a tracking issue for each phase +4. Execute incrementally, keeping tests green at every step