docs: Add comprehensive architectural refactoring plan
Full VP-engineering-level review of the codebase identifying 8 problems (monolith sprawl, dashboard gravity well, doc entropy, test skeleton bloat, unclear project boundaries, broken wheel build, dashboard coupling, overscoped conftest) and proposing 6 phases of incremental refactoring from low-risk doc cleanup to potential package extraction.

Key findings:
- 28 modules in src/, 11 missing from wheel build
- 87KB of root markdown with massive duplication
- 61 of 97 test files are empty skeletons (0 test functions)
- Dashboard routes: 27 files, 4,562 lines (gravity well)
- 4 autouse fixtures run on every test regardless of need

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
REFACTORING_PLAN.md
# Timmy Time: Architectural Refactoring Plan

**Author:** Claude (VP Engineering review)
**Date:** 2026-02-26
**Branch:** `claude/plan-repo-refactoring-hgskF`

---

## Executive Summary

The Timmy Time codebase has grown to **53K lines of Python** across **272
files** (169 source + 103 test), **28 modules** in `src/`, **27 route files**,
**49 templates**, **97 test files**, and **87KB of root-level markdown**. It
works, but it's burning tokens, slowing down test runs, and making it hard to
reason about change impact.

This plan proposes **6 phases** of refactoring, ordered by impact and risk. Each
phase is independently valuable: you can stop after any phase and still be
better off.

---

## The Problems

### 1. Monolith sprawl
28 modules in `src/` with no grouping. Eleven modules aren't even included in
the wheel build (`agents`, `events`, `hands`, `mcp`, `memory`, `router`,
`self_coding`, `task_queue`, `tools`, `upgrades`, `work_orders`). Some are
used by the dashboard routes but forgotten in `pyproject.toml`.

### 2. Dashboard is the gravity well
The dashboard has 27 route files (4,562 lines), 49 templates, and has become
the integration point for everything. Every new feature = new route file + new
template + new test file. This doesn't scale.

### 3. Documentation entropy
10 root-level `.md` files (87KB). README is 303 lines, CLAUDE.md is 267 lines,
AGENTS.md is 342 lines, with massive content duplication between them. Plus
PLAN.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md, MEMORY.md,
IMPLEMENTATION_SUMMARY.md, QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md.
Human eyes glaze over. AI assistants waste tokens reading redundant info.

### 4. Test sprawl and a skeleton problem
97 test files, 19,600 lines, but **61 of those files (63%) are empty
skeletons** with zero actual test functions. Only 36 files have real tests
containing 471 test functions total. Many "large" test files (like
`test_scripture.py` at 901 lines, `test_router_cascade.py` at 523 lines) are
infrastructure-only: class definitions, imports, fixtures, but no assertions.
The functional/E2E directory (`tests/functional/`) has 7 files and 0 working
tests. Tests are flat in `tests/` with no organization. Running the full suite
means loading every module, every mock, every fixture even when you only
changed one thing.

### 5. Unclear project boundaries
Is this one project or several? The `timmy` CLI, `timmy-serve` API server,
`self-tdd` watchdog, and `self-modify` CLI are four separate entry points that
could be four separate packages. The `creative` extra needs PyTorch. The
`lightning` module is a standalone payment system. These shouldn't live in the
same test run.

### 6. Wheel build doesn't match reality
`pyproject.toml` includes 17 modules but `src/` has 28. The missing 11 modules
are used by code that IS included (dashboard routes import from `hands`,
`mcp`, `memory`, `work_orders`, etc.). The wheel would break at runtime.

### 7. Dependency coupling through dashboard

The dashboard is the hub that imports from 20+ modules. The dependency graph
flows inward: `config` is the foundation (22 modules depend on it), `mcp` is
widely used (12+ importers), `swarm` is referenced by 15+ modules. No true
circular dependencies exist (the `timmy ↔ swarm` relationship uses lazy
imports), but the dashboard pulls in everything, so changing any module can
break the dashboard routes.
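
The fan-in figures above (how many modules depend on `config`, `mcp`, or `swarm`) can be re-measured as the refactoring proceeds. The following is a minimal sketch, assuming every top-level module sits directly under `src/` and is imported by its absolute name; the function names are illustrative and not part of the codebase. Because it walks the whole AST, it also counts lazy imports placed inside functions.

```python
import ast
from collections import defaultdict
from pathlib import Path


def imported_top_levels(source: str) -> set[str]:
    """Top-level names imported by one file (relative imports are skipped)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def dependents_by_module(src_dir: Path) -> dict[str, set[str]]:
    """Map each top-level module in src_dir to the other modules importing it."""
    # A "module" here is either a package directory or a single .py file.
    local = {
        child.stem
        for child in src_dir.iterdir()
        if child.suffix == ".py" or (child / "__init__.py").exists()
    }
    dependents = defaultdict(set)
    for path in src_dir.rglob("*.py"):
        owner = path.relative_to(src_dir).parts[0].removesuffix(".py")
        for name in imported_top_levels(path.read_text(encoding="utf-8")):
            if name in local and name != owner:
                dependents[name].add(owner)
    return dict(dependents)
```

Calling `dependents_by_module(Path("src"))` and sorting by `len(importers)` would give a ranking comparable to the counts quoted above, though the exact numbers depend on how lazy and conditional imports are treated.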

### 8. Conftest does too much

`tests/conftest.py` has 4 autouse fixtures that run on **every single test**:
reset message log, reset coordinator state, clean database, cleanup event
loops. Many tests don't need any of these. This adds overhead to the test
suite and couples all tests to the swarm coordinator.
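
One way to narrow the reach of an autouse fixture is to move it from the root `tests/conftest.py` into a directory-level `conftest.py`. The sketch below is illustrative only: the file location, the `_COORDINATOR_STATE` dict, and both function names are assumptions, since the real coordinator reset API is not shown in this plan.

```python
# tests/swarm/conftest.py  (hypothetical location)
import pytest

# Stand-in for the real coordinator state. The actual object and its reset
# API are not shown in this plan, so this dict is purely illustrative.
_COORDINATOR_STATE = {"tasks": [], "agents": {}}


def reset_coordinator_state(state: dict) -> None:
    """Put the (illustrative) coordinator state back to its empty form."""
    state["tasks"] = []
    state["agents"] = {}


@pytest.fixture(autouse=True)
def clean_coordinator():
    """Reset coordinator state around each test in this directory tree.

    pytest applies conftest.py fixtures only to tests in the same directory
    and below, so tests outside tests/swarm/ no longer pay for this reset.
    """
    reset_coordinator_state(_COORDINATOR_STATE)
    yield _COORDINATOR_STATE
    reset_coordinator_state(_COORDINATOR_STATE)
```

The fixture stays `autouse` for swarm tests, so existing tests there keep working unchanged, while tests elsewhere stop depending on the coordinator.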

---

## Phase 1: Documentation Cleanup (Low Risk, High Impact)

**Goal:** Cut root markdown from 87KB to ~20KB. Make README human-readable.
Eliminate token waste.

### 1.1 Slim the README

Cut README.md from 303 lines to ~80 lines:

```
# Timmy Time: Mission Control

Local-first sovereign AI agent system. Browser dashboard, Ollama inference,
Bitcoin Lightning economics. No cloud AI.

## Quick Start
make install && make dev → http://localhost:8000

## What's Here
- Timmy Agent (Ollama/AirLLM)
- Mission Control Dashboard (FastAPI + HTMX)
- Swarm Coordinator (multi-agent auctions)
- Lightning Payments (L402 gating)
- Creative Studio (image/music/video)
- Self-Coding (codebase-aware self-modification)

## Commands
make dev / make test / make docker-up / make help

## Documentation
- Development guide: CLAUDE.md
- Architecture: docs/architecture-v2.md
- Agent conventions: AGENTS.md
- Config reference: .env.example
```

### 1.2 De-duplicate CLAUDE.md

Remove content that duplicates README or AGENTS.md. CLAUDE.md should only
contain what AI assistants need that isn't elsewhere:
- Architecture patterns (singletons, config, HTMX, graceful degradation)
- Testing conventions (conftest, fixtures, stubs)
- Security-sensitive areas
- Entry points table

Target: 267 → ~130 lines.

### 1.3 Archive or delete temporary docs

| File | Action |
|------|--------|
| `MEMORY.md` | DELETE (session context, not permanent docs) |
| `WORKSET_PLAN.md` | DELETE (use GitHub Issues) |
| `WORKSET_PLAN_PHASE2.md` | DELETE (use GitHub Issues) |
| `PLAN.md` | MOVE to `docs/PLAN_ARCHIVE.md` |
| `IMPLEMENTATION_SUMMARY.md` | MOVE to `docs/IMPLEMENTATION_ARCHIVE.md` |
| `QUALITY_ANALYSIS.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` |
| `QUALITY_REVIEW_REPORT.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` |

**Result:** Root directory goes from 10 `.md` files to 3 (README, CLAUDE,
AGENTS).
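
To track progress against the ~20KB goal, the root markdown footprint can be measured with a few lines of Python. This is a rough sketch, not an existing script in the repository: the function names are illustrative, and it assumes the plan's "KB" figures mean 1024-byte units.

```python
from pathlib import Path

BYTES_PER_KB = 1024  # assumption: the plan's "KB" figures mean kibibytes


def markdown_sizes(root: Path) -> dict[str, int]:
    """Size in bytes of every *.md file directly under root (not recursive)."""
    return {path.name: path.stat().st_size for path in sorted(root.glob("*.md"))}


def total_kb(sizes: dict[str, int]) -> float:
    """Combined size of the given files in KB."""
    return sum(sizes.values()) / BYTES_PER_KB


def over_budget(sizes: dict[str, int], budget_kb: float = 20.0) -> bool:
    """True when the combined size exceeds the target budget."""
    return total_kb(sizes) > budget_kb
```

Running `over_budget(markdown_sizes(Path(".")))` from the repository root would report whether the root docs still exceed the target.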

### 1.4 Clean up .handoff/

The `.handoff/` directory (CHECKPOINT.md, CONTINUE.md, TODO.md, scripts) is
session-scoped context. Either gitignore it or move to `docs/handoff/`.

---

## Phase 2: Module Consolidation (Medium Risk, High Impact)

**Goal:** Reduce 28 modules to ~12 by merging small, related modules into
coherent packages. This directly reduces cognitive load and token consumption.

### 2.1 Proposed module structure

```
src/
  config.py                  # (keep as-is)

  timmy/                     # Core agent: MERGE IN agents/, agent_core/, memory/
    agent.py                 # Main Timmy agent
    backends.py              # Ollama/AirLLM backends
    cli.py                   # CLI entry point
    orchestrator.py          # ← from agents/timmy.py
    personas/                # ← from agents/ (seer, helm, quill, echo, forge)
    agent_core/              # ← from src/agent_core/ (becomes subpackage)
    memory/                  # ← from src/memory/ (becomes subpackage)
    prompts.py
    ...

  dashboard/                 # Web UI: CONSOLIDATE routes
    app.py
    store.py
    routes/                  # See §2.2 for route consolidation
    templates/

  swarm/                     # Multi-agent system: MERGE IN task_queue/, work_orders/
    coordinator.py
    tasks.py                 # ← existing + task_queue/ models
    work_orders/             # ← from src/work_orders/ (becomes subpackage)
    ...

  integrations/              # NEW: MERGE chat_bridge/, telegram_bot/, shortcuts/, voice/
    chat_bridge/             # Discord, unified chat
    telegram.py              # ← from telegram_bot/
    shortcuts.py             # ← from shortcuts/
    voice/                   # ← from src/voice/

  lightning/                 # (keep as-is; standalone, security-sensitive)

  self_coding/               # MERGE IN self_modify/, self_tdd/, upgrades/
    codebase_indexer.py
    git_safety.py
    modification_journal.py
    self_modify/             # ← from src/self_modify/ (becomes subpackage)
    watchdog.py              # ← from src/self_tdd/
    upgrades/                # ← from src/upgrades/

  mcp/                       # (keep as-is; used across multiple modules)

  spark/                     # (keep as-is)

  creative/                  # MERGE IN tools/
    director.py
    assembler.py
    tools/                   # ← from src/tools/ (becomes subpackage)

  hands/                     # (keep as-is)

  scripture/                 # (keep as-is; domain-specific)

  infrastructure/            # NEW: MERGE ws_manager/, notifications/, events/, router/
    ws_manager.py            # ← from ws_manager/handler.py (157 lines)
    notifications.py         # ← from notifications/push.py (153 lines)
    events.py                # ← from events/ (354 lines)
    router/                  # ← from src/router/ (cascade LLM router)
```
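
Moving modules this way breaks every existing import of the old path. One possible way to keep tests green during the transition, which this plan does not prescribe, is a temporary alias that serves the new module under the old name and emits a `DeprecationWarning`. In the sketch below, `alias_module` is an illustrative helper and the module names in the usage comment are hypothetical.

```python
import importlib
import sys
import warnings
from types import ModuleType


def alias_module(old_name: str, new_name: str) -> ModuleType:
    """Make `import old_name` resolve to the module found at new_name."""
    warnings.warn(
        f"'{old_name}' has moved to '{new_name}'; update your imports",
        DeprecationWarning,
        stacklevel=2,
    )
    module = importlib.import_module(new_name)
    # Later `import old_name` statements are served from sys.modules.
    sys.modules[old_name] = module
    return module


# Hypothetical usage at application or test start-up:
#     alias_module("notifications", "infrastructure.notifications")
```

This covers `import old_name` and `from old_name import x` only. Imports of old submodules (for example a former `handler` submodule) would still need to be updated by hand, and the alias should be removed once callers have migrated.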

### 2.2 Dashboard route consolidation

27 route files → ~12 by grouping related routes:

| Current files | Merged into |
|--------------|-------------|
| `agents.py`, `briefing.py` | `agents.py` |
| `swarm.py`, `swarm_internal.py`, `swarm_ws.py` | `swarm.py` |
| `voice.py`, `voice_enhanced.py` | `voice.py` |
| `mobile.py`, `mobile_test.py` | `mobile.py` (delete test page) |
| `self_coding.py`, `self_modify.py` | `self_coding.py` |
| `tasks.py`, `work_orders.py` | `tasks.py` |

`mobile_test.py` (257 lines) is a test page route that's excluded from
coverage, so it should not ship in production.

### 2.3 Fix the wheel build

Update `pyproject.toml` `[tool.hatch.build.targets.wheel]` to include all
modules that are actually imported. Currently 11 modules are missing from the
build manifest.
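
A small check can keep the build manifest from drifting again. The sketch below compares what is on disk under `src/` with what the wheel target declares. It assumes the project lists plain paths (no glob patterns) under Hatch's `include` or `packages` keys; which key the project actually uses is not stated here, and all function names are illustrative.

```python
from pathlib import Path, PurePosixPath


def load_pyproject(path: Path) -> dict:
    """Parse pyproject.toml (tomllib is in the standard library from 3.11)."""
    import tomllib

    with path.open("rb") as handle:
        return tomllib.load(handle)


def declared_modules(pyproject: dict) -> set[str]:
    """Module names listed for the Hatch wheel target."""
    wheel = (
        pyproject.get("tool", {})
        .get("hatch", {})
        .get("build", {})
        .get("targets", {})
        .get("wheel", {})
    )
    entries = list(wheel.get("include", [])) + list(wheel.get("packages", []))
    # "src/timmy" -> "timmy", "src/config.py" -> "config"
    return {PurePosixPath(entry).stem for entry in entries}


def modules_on_disk(src_dir: Path) -> set[str]:
    """Top-level packages and single-file modules found in src_dir."""
    found = set()
    for child in src_dir.iterdir():
        if child.is_dir() and (child / "__init__.py").exists():
            found.add(child.name)
        elif child.suffix == ".py":
            found.add(child.stem)
    return found


def missing_from_wheel(pyproject: dict, on_disk: set[str]) -> list[str]:
    """Modules that exist on disk but are absent from the wheel manifest."""
    return sorted(on_disk - declared_modules(pyproject))
```

Wired into CI as `missing_from_wheel(load_pyproject(Path("pyproject.toml")), modules_on_disk(Path("src")))`, a non-empty result could fail the build.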

---

## Phase 3: Test Reorganization (Medium Risk, Medium Impact)

**Goal:** Organize tests to match module structure, enable selective test runs,
reduce full-suite runtime.

### 3.1 Mirror source structure in tests

```
tests/
  conftest.py                # Global fixtures only
  timmy/                     # Tests for timmy/ module
    conftest.py              # Timmy-specific fixtures
    test_agent.py
    test_backends.py
    test_cli.py
    test_orchestrator.py
    test_personas.py
    test_memory.py
  dashboard/
    conftest.py              # Dashboard fixtures (client fixture)
    test_routes_agents.py
    test_routes_swarm.py
    ...
  swarm/
    test_coordinator.py
    test_tasks.py
    test_work_orders.py
  integrations/
    test_chat_bridge.py
    test_telegram.py
    test_voice.py
  self_coding/
    test_git_safety.py
    test_codebase_indexer.py
    test_self_modify.py
  ...
```

### 3.2 Add pytest marks for selective execution

```toml
# pyproject.toml
[tool.pytest.ini_options]
markers = [
    "unit: Unit tests (fast, no I/O)",
    "integration: Integration tests (may use SQLite)",
    "dashboard: Dashboard route tests",
    "swarm: Swarm coordinator tests",
    "slow: Tests that take >1 second",
]
```

Usage:
```bash
make test                    # Run all tests
pytest -m unit               # Fast unit tests only
pytest -m dashboard          # Just dashboard tests
pytest tests/swarm/          # Just swarm module tests
pytest -m "not slow"         # Skip slow tests
```
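
Once tests live in per-module directories, the `dashboard` and `swarm` markers do not have to be added to each test by hand. A possible approach, sketched below, is to apply them in a `pytest_collection_modifyitems` hook based on the test file's directory. The `DIRECTORY_MARKERS` mapping and the `marker_for` helper are hypothetical, and the hook assumes the test root directory is named `tests`.

```python
# tests/conftest.py  (sketch)
from pathlib import Path

# Hypothetical mapping from test subdirectory to a registered marker name.
DIRECTORY_MARKERS = {"dashboard": "dashboard", "swarm": "swarm"}


def marker_for(path: str, tests_root: str = "tests"):
    """Return the marker for a test file path, or None if none applies."""
    parts = Path(path).parts
    if tests_root in parts:
        index = parts.index(tests_root)
        if index + 1 < len(parts):
            return DIRECTORY_MARKERS.get(parts[index + 1])
    return None


def pytest_collection_modifyitems(config, items):
    """pytest hook: tag each collected test according to its directory."""
    for item in items:
        # item.path exists on newer pytest versions; fspath on older ones.
        location = getattr(item, "path", None) or item.fspath
        marker = marker_for(str(location))
        if marker:
            item.add_marker(marker)
```

Markers such as `unit` and `slow` describe behavior rather than location, so those would still be applied explicitly on the tests themselves.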

### 3.3 Audit and clean skeleton test files

61 test files are empty skeletons: they have imports, class definitions, and
fixture setup but **zero test functions**. These add import overhead and create
a false sense of coverage. For each skeleton file:

1. If the module it tests is stable and well-covered elsewhere → **delete it**
2. If the module genuinely needs tests → **implement the tests** or file an
   issue
3. If it's a duplicate (e.g., both `test_swarm.py` and
   `test_swarm_integration.py` exist) → **consolidate**

Notable skeletons to address:
- `test_scripture.py` (901 lines, 0 tests): massive infrastructure, no assertions
- `test_router_cascade.py` (523 lines, 0 tests): same pattern
- `test_agent_core.py` (457 lines, 0 tests)
- `test_self_modify.py` (451 lines, 0 tests)
- All 7 files in `tests/functional/` (0 working tests)

### 3.4 Split genuinely oversized test files

For files that DO have tests but are too large:
- `test_task_queue.py` (560 lines, 30 tests) → split by feature area
- `test_mobile_scenarios.py` (339 lines, 36 tests) → split by scenario group

Rule of thumb: No test file over 400 lines.

---

## Phase 4: Configuration & Build Cleanup (Low Risk, Medium Impact)

### 4.1 Clean up pyproject.toml

- Fix the wheel include list to match actual imports
- Consider whether 4 separate CLI entry points belong in one package
- Add `[project.urls]` for documentation, repository links
- Review dependency pins; some are very loose (`>=1.0.0`)

### 4.2 Consolidate Docker files

4 docker-compose variants (default, dev, prod, test) is a lot. Consider:
- `docker-compose.yml` (base)
- `docker-compose.override.yml` (dev; auto-loaded by Docker Compose)
- `docker-compose.prod.yml` (production only)

### 4.3 Clean up root directory

Non-essential root files to move or delete:

| File | Action |
|------|--------|
| `apply_security_fixes.py` | Move to `scripts/` or delete if one-time |
| `activate_self_tdd.sh` | Move to `scripts/` |
| `coverage.xml` | Gitignore (CI artifact) |
| `data/self_modify_reports/` | Gitignore the contents |

---

## Phase 5: Consider Package Extraction (High Risk, High Impact)

**Goal:** Evaluate whether some modules should be separate packages/repos.

### 5.1 Candidates for extraction

| Module | Why extract | Dependency direction |
|--------|------------|---------------------|
| `lightning/` | Standalone payment system, security-sensitive | Dashboard imports lightning |
| `creative/` | Needs PyTorch, very different dependency profile | Dashboard imports creative |
| `timmy-serve` | Separate process (port 8402), separate purpose | Shares config + timmy agent |
| `self_coding/` + `self_modify/` | Self-contained self-modification system | Dashboard imports for routes |

### 5.2 Monorepo approach (recommended over multi-repo)

If splitting, use a monorepo with namespace packages:

```
packages/
  timmy-core/        # Agent + memory + CLI
  timmy-dashboard/   # FastAPI app
  timmy-swarm/       # Coordinator + tasks
  timmy-lightning/   # Payment system
  timmy-creative/    # Creative tools (heavy deps)
```

Each package gets its own `pyproject.toml`, test suite, and can be installed
independently. But they share the same repo, CI, and release cycle.

**However:** This is high effort and may not be worth it unless the team
grows or the dependency profiles diverge further. Consider this only after
Phases 1-4 are done and the pain persists.

---

## Phase 6: Token Optimization for AI Development (Low Risk, High Impact)

**Goal:** Reduce context window consumption when AI assistants work on this
codebase.

### 6.1 Lean CLAUDE.md (already covered in Phase 1)

Every byte in CLAUDE.md is read by every AI interaction. Remove duplication.

### 6.2 Module-level CLAUDE.md files

Instead of one massive guide, put module-specific context where it's needed:

```
src/swarm/CLAUDE.md      # "This module is security-sensitive. Always..."
src/lightning/CLAUDE.md  # "Never hard-code secrets. Use settings..."
src/dashboard/CLAUDE.md  # "Routes return template partials for HTMX..."
```

AI assistants read these only when working in that directory.

### 6.3 Standardize module docstrings

Every `__init__.py` should have a one-line summary. AI assistants read these
to understand module purpose without reading every file:

```python
"""Swarm — Multi-agent coordinator with auction-based task assignment."""
```
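
Whether every package actually carries such a summary can be checked mechanically. This is a minimal sketch using the standard `ast` module; the function names are illustrative, and it only tests for the presence of a docstring, not for its length or quality.

```python
import ast
from pathlib import Path


def has_module_docstring(source: str) -> bool:
    """True when the module source begins with a docstring."""
    return ast.get_docstring(ast.parse(source)) is not None


def packages_missing_docstrings(src_dir: Path) -> list[Path]:
    """List every __init__.py under src_dir that lacks a module docstring."""
    return sorted(
        path
        for path in src_dir.rglob("__init__.py")
        if not has_module_docstring(path.read_text(encoding="utf-8"))
    )
```

`packages_missing_docstrings(Path("src"))` returns the files still to be fixed; an empty list means the convention holds everywhere.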

### 6.4 Reduce template duplication

49 templates with repeated boilerplate. Consider Jinja2 macros for common
patterns (card layouts, form groups, table rows).

---

## Prioritized Execution Order

| Priority | Phase | Effort | Risk | Impact |
|----------|-------|--------|------|--------|
| **1** | Phase 1: Doc cleanup | 2-3 hours | Low | High (immediate token savings) |
| **2** | Phase 6: Token optimization | 1-2 hours | Low | High (ongoing AI efficiency) |
| **3** | Phase 4: Config/build cleanup | 1-2 hours | Low | Medium (hygiene) |
| **4** | Phase 2: Module consolidation | 4-8 hours | Medium | High (structural improvement) |
| **5** | Phase 3: Test reorganization | 3-5 hours | Medium | Medium (faster test cycles) |
| **6** | Phase 5: Package extraction | 8-16 hours | High | High (only if needed) |

---

## Quick Wins (Can Do Right Now)

1. Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (3 files, 0 risk)
2. Move PLAN.md, IMPLEMENTATION_SUMMARY.md, quality docs to `docs/` (4 files)
3. Slim README to ~80 lines
4. Fix pyproject.toml wheel includes (11 missing modules)
5. Gitignore `coverage.xml` and `data/self_modify_reports/`
6. Delete `dashboard/routes/mobile_test.py` (test page in production routes)
7. Delete or gut empty test skeletons (61 files with 0 tests; they waste CI
   time and create noise)

---

## What NOT to Do

- **Don't rewrite from scratch.** The code works. Refactor incrementally.
- **Don't split into multiple repos.** Monorepo with packages (if needed) is
  simpler for a small team.
- **Don't change the tech stack.** FastAPI + HTMX + Jinja2 is fine. Don't add
  React, Vue, or a SPA framework.
- **Don't merge CLAUDE.md into README.** They serve different audiences.
- **Don't remove working test files** just to reduce count. Reorganize them;
  only the empty skeletons (§3.3) are candidates for deletion.
- **Don't break the singleton pattern.** It works for this scale.

---

## Success Metrics

After refactoring:
- Root `.md` files: 10 → 3
- Root markdown size: 87KB → ~20KB
- `src/` modules: 28 → ~12-15
- Dashboard route files: 27 → ~12-15
- Test files: organized in subdirectories matching source
- Empty skeleton test files: 61 → 0 (either implemented or deleted)
- Real test functions: 471 → 500+ (fill gaps in coverage)
- `pytest -m unit` runs in <10 seconds
- Wheel build includes all modules that are actually imported
- AI assistant context consumption drops ~40%
- Conftest autouse fixtures scoped to relevant test directories

---

## Next Steps

1. Review this plan
2. Pick a phase to start (recommended: Phase 1)
3. Create a tracking issue for each phase
4. Execute incrementally, keeping tests green at every step