docs: Add comprehensive architectural refactoring plan
Full VP-engineering-level review of the codebase identifying 8 problems
(monolith sprawl, dashboard gravity well, doc entropy, test skeleton
bloat, unclear project boundaries, broken wheel build, dashboard
coupling, overscoped conftest) and proposing 6 phases of incremental
refactoring from low-risk doc cleanup to potential package extraction.
Key findings:
- 28 modules in src/, 11 missing from wheel build
- 87KB of root markdown with massive duplication
- 61 of 97 test files are empty skeletons (0 test functions)
- Dashboard routes: 27 files, 4,562 lines (gravity well)
- 4 autouse fixtures run on every test regardless of need
https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 20:42:02 +00:00

# Timmy Time: Architectural Refactoring Plan

**Author:** Claude (VP Engineering review)
**Date:** 2026-02-26
**Branch:** `claude/plan-repo-refactoring-hgskF`

---

## Executive Summary

The Timmy Time codebase has grown to **53K lines of Python** across **272
files** (169 source + 103 test), **28 modules** in `src/`, **27 route files**,
**49 templates**, **97 test files**, and **87KB of root-level markdown**. It
works, but it's burning tokens, slowing down test runs, and making it hard to
reason about change impact.

This plan proposes **6 phases** of refactoring, ordered by impact and risk. Each
phase is independently valuable: you can stop after any phase and still be
better off.

---

## The Problems

### 1. Monolith sprawl

28 modules in `src/` with no grouping. Eleven modules aren't even included in
the wheel build (`agents`, `events`, `hands`, `mcp`, `memory`, `router`,
`self_coding`, `task_queue`, `tools`, `upgrades`, `work_orders`). Some are
used by the dashboard routes but were left out of `pyproject.toml`.

### 2. Dashboard is the gravity well

The dashboard has 27 route files (4,562 lines) and 49 templates, and has become
the integration point for everything. Every new feature adds a route file, a
template, and a test file. This doesn't scale.

### 3. Documentation entropy

10 root-level `.md` files (87KB). README is 303 lines, CLAUDE.md is 267 lines,
and AGENTS.md is 342 lines, with massive content duplication between them. On
top of these come PLAN.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md, MEMORY.md,
IMPLEMENTATION_SUMMARY.md, QUALITY_ANALYSIS.md, and QUALITY_REVIEW_REPORT.md.
Human eyes glaze over, and AI assistants waste tokens reading redundant info.

### 4. Test sprawl and a skeleton problem

97 test files, 19,600 lines, but **61 of those files (63%) are empty
skeletons** with zero actual test functions. Only 36 files have real tests,
containing 471 test functions in total. Many "large" test files (like
`test_scripture.py` at 901 lines, `test_router_cascade.py` at 523 lines) are
infrastructure-only: class definitions, imports, and fixtures, but no
assertions. The functional/E2E directory (`tests/functional/`) has 7 files and
0 working tests. Tests are flat in `tests/` with no organization. Running the
full suite means loading every module, every mock, and every fixture even when
you only changed one thing.

### 5. Unclear project boundaries

Is this one project or several? The `timmy` CLI, `timmy-serve` API server,
`self-tdd` watchdog, and `self-modify` CLI are four separate entry points that
could be four separate packages. The `creative` extra needs PyTorch. The
`lightning` module is a standalone payment system. These shouldn't live in the
same test run.

### 6. Wheel build doesn't match reality

`pyproject.toml` includes 17 modules but `src/` has 28. The missing 11 modules
are used by code that IS included (dashboard routes import from `hands`,
`mcp`, `memory`, `work_orders`, etc.). An installed wheel would fail with
import errors at runtime.

### 7. Dependency coupling through dashboard

The dashboard is the hub that imports from 20+ modules. The dependency graph
flows inward: `config` is the foundation (22 modules depend on it), `mcp` is
widely used (12+ importers), `swarm` is referenced by 15+ modules. No true
circular dependencies exist (the `timmy ↔ swarm` relationship uses lazy
imports), but the dashboard pulls in everything, so changing any module can
break the dashboard routes.

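
The importer counts above can be re-measured as the refactoring proceeds. The
following is a minimal sketch, not the method used for this review. It assumes
top-level modules sit directly under `src/` and are imported by their bare
name; `count_importers` is a hypothetical helper that does not exist in the
codebase.

```python
import ast
from collections import defaultdict
from pathlib import Path


def count_importers(src_dir: str) -> dict[str, int]:
    """Map each local top-level module to the number of other local
    top-level modules that import it (hypothetical helper)."""
    src = Path(src_dir)
    # Local top-level names: package directories and single-file modules.
    local = {p.stem for p in src.iterdir() if p.is_dir() or p.suffix == ".py"}
    importers: dict[str, set[str]] = defaultdict(set)
    for path in src.rglob("*.py"):
        rel = path.relative_to(src)
        # Owner is the first path component, or the stem for e.g. config.py.
        owner = rel.parts[0] if len(rel.parts) > 1 else rel.stem
        tree = ast.parse(path.read_text(encoding="utf-8"))
        # ast.walk also visits function bodies, so lazy imports are counted.
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names = [node.module]
            else:
                continue
            for name in names:
                target = name.split(".")[0]
                if target in local and target != owner:
                    importers[target].add(owner)
    return {name: len(owners) for name, owners in importers.items()}
```

Sorting the result by count gives a quick view of which modules are the
riskiest to change.
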
### 8. Conftest does too much

`tests/conftest.py` has 4 autouse fixtures that run on **every single test**:
reset message log, reset coordinator state, clean database, cleanup event
loops. Many tests don't need any of these. This adds overhead to the test
suite and couples all tests to the swarm coordinator.

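
One way to reduce this overhead is to make a fixture opt-in: drop
`autouse=True` so that only tests naming the fixture pay for it. The sketch
below assumes a hypothetical in-memory `MESSAGE_LOG` standing in for the real
message log; the fixture and test names are illustrative and are not the ones
in `tests/conftest.py`.

```python
import pytest

# Hypothetical stand-in for the shared message log.
MESSAGE_LOG: list[str] = []


def reset_message_log() -> None:
    """Empty the shared log in place."""
    MESSAGE_LOG.clear()


@pytest.fixture  # no autouse=True: runs only for tests that request it
def clean_message_log():
    reset_message_log()
    yield MESSAGE_LOG
    reset_message_log()


def test_log_starts_empty(clean_message_log):
    # Names the fixture as an argument, so the reset runs for this test.
    assert clean_message_log == []


def test_pure_arithmetic():
    # Requests no fixture, so no reset cost is paid here.
    assert 1 + 1 == 2
```
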

---

## Phase 1: Documentation Cleanup (Low Risk, High Impact)

**Goal:** Cut root markdown from 87KB to ~20KB. Make README human-readable.
Eliminate token waste.

### 1.1 Slim the README

Cut README.md from 303 lines to ~80 lines:

```
# Timmy Time: Mission Control

Local-first sovereign AI agent system. Browser dashboard, Ollama inference,
Bitcoin Lightning economics. No cloud AI.

## Quick Start
make install && make dev → http://localhost:8000

## What's Here
- Timmy Agent (Ollama/AirLLM)
- Mission Control Dashboard (FastAPI + HTMX)
- Swarm Coordinator (multi-agent auctions)
- Lightning Payments (L402 gating)
- Creative Studio (image/music/video)
- Self-Coding (codebase-aware self-modification)

## Commands
make dev / make test / make docker-up / make help

## Documentation
- Development guide: CLAUDE.md
- Architecture: docs/architecture-v2.md
- Agent conventions: AGENTS.md
- Config reference: .env.example
```

### 1.2 De-duplicate CLAUDE.md

Remove content that duplicates README or AGENTS.md. CLAUDE.md should only
contain what AI assistants need that isn't elsewhere:

- Architecture patterns (singletons, config, HTMX, graceful degradation)
- Testing conventions (conftest, fixtures, stubs)
- Security-sensitive areas
- Entry points table

Target: 267 → ~130 lines.

### 1.3 Archive or delete temporary docs

| File | Action |
|------|--------|
| `MEMORY.md` | DELETE (session context, not permanent docs) |
| `WORKSET_PLAN.md` | DELETE (use GitHub Issues) |
| `WORKSET_PLAN_PHASE2.md` | DELETE (use GitHub Issues) |
| `PLAN.md` | MOVE to `docs/PLAN_ARCHIVE.md` |
| `IMPLEMENTATION_SUMMARY.md` | MOVE to `docs/IMPLEMENTATION_ARCHIVE.md` |
| `QUALITY_ANALYSIS.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` |
| `QUALITY_REVIEW_REPORT.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` |

**Result:** Root directory goes from 10 `.md` files to 3 (README, CLAUDE,
AGENTS).

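
Progress toward the ~20KB target can be checked with a small script. This is a
minimal sketch under the assumption that only `.md` files directly in the
repository root count toward the budget; `root_markdown_report` is a
hypothetical helper name.

```python
from pathlib import Path


def root_markdown_report(repo_root: str) -> list[tuple[str, int, int]]:
    """Return (name, line_count, size_bytes) for each .md file directly in
    repo_root, largest first. Subdirectories such as docs/ are ignored."""
    rows = []
    for path in Path(repo_root).glob("*.md"):
        data = path.read_bytes()
        rows.append((path.name, len(data.splitlines()), len(data)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    report = root_markdown_report(".")
    for name, lines, size in report:
        print(f"{name:40} {lines:6} lines {size:8} bytes")
    print(f"total: {sum(row[2] for row in report)} bytes")
```
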
### 1.4 Clean up .handoff/

The `.handoff/` directory (CHECKPOINT.md, CONTINUE.md, TODO.md, scripts) is
session-scoped context. Either gitignore it or move it to `docs/handoff/`.

---

## Phase 2: Module Consolidation (Medium Risk, High Impact)

**Goal:** Reduce 28 modules to ~12 by merging small, related modules into
coherent packages. This directly reduces cognitive load and token consumption.

### 2.1 Module structure (partially implemented)

**Actual current structure (7 packages + config):**

```
src/                  # 7 packages (was 28)
  config.py           # Pydantic settings (foundation)
  timmy/              # Core agent + agents/ + agent_core/ + memory/
  dashboard/          # FastAPI web UI, routes, templates
  infrastructure/     # ws_manager/ + notifications/ + events/ + router/
  integrations/       # chat_bridge/ + telegram_bot/ + shortcuts/ + voice/
  spark/              # Event capture and advisory
  brain/              # Identity system, memory interface
  timmy_serve/        # API server
```

**Planned but never created:** `swarm/`, `self_coding/`, `creative/`,
`lightning/`, `mcp/`, `hands/`, `scripture/`

### 2.2 Dashboard route consolidation

27 route files → ~12 by grouping related routes:

| Current files | Merged into |
|--------------|-------------|
| `agents.py`, `briefing.py` | `agents.py` |
| `swarm.py`, `swarm_internal.py`, `swarm_ws.py` | `swarm.py` |
| `voice.py`, `voice_enhanced.py` | `voice.py` |
| `mobile.py`, `mobile_test.py` | `mobile.py` (delete test page) |
| `self_coding.py`, `self_modify.py` | `self_coding.py` |
| `tasks.py`, `work_orders.py` | `tasks.py` |

`mobile_test.py` (257 lines) is a test page route that's excluded from
coverage; it should not ship in production.

### 2.3 Fix the wheel build

Update `pyproject.toml` `[tool.hatch.build.targets.wheel]` to include all
modules that are actually imported. Currently 11 modules are missing from the
build manifest.

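
A check like the following could keep the manifest from drifting again. It is
a sketch only: the plan does not say which key under
`[tool.hatch.build.targets.wheel]` holds the module list, so the hypothetical
`missing_from_wheel` helper takes the declared entries as a plain list (for
example `["src/timmy", "src/dashboard"]`) and leaves reading the TOML to the
caller.

```python
from pathlib import Path


def missing_from_wheel(src_dir: str, declared: list[str]) -> list[str]:
    """Return packages found under src_dir that are absent from the declared
    wheel entries. Only directories with an __init__.py are considered, so
    single-file modules such as config.py are not checked here."""
    declared_names = {Path(entry).name for entry in declared}
    on_disk = {
        path.name
        for path in Path(src_dir).iterdir()
        if path.is_dir() and (path / "__init__.py").exists()
    }
    return sorted(on_disk - declared_names)
```

Running this in CI and failing on a non-empty result would surface a forgotten
module before a wheel is published.
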

---

## Phase 3: Test Reorganization (Medium Risk, Medium Impact)

**Goal:** Organize tests to match module structure, enable selective test runs,
reduce full-suite runtime.

### 3.1 Mirror source structure in tests

```
tests/
  conftest.py                  # Global fixtures only
  timmy/                       # Tests for timmy/ module
    conftest.py                # Timmy-specific fixtures
    test_agent.py
    test_backends.py
    test_cli.py
    test_orchestrator.py
    test_personas.py
    test_memory.py
  dashboard/
    conftest.py                # Dashboard fixtures (client fixture)
    test_routes_agents.py
    test_routes_swarm.py
    ...
  swarm/
    test_coordinator.py
    test_tasks.py
    test_work_orders.py
  integrations/
    test_chat_bridge.py
    test_telegram.py
    test_voice.py
  self_coding/
    test_git_safety.py
    test_codebase_indexer.py
    test_self_modify.py
  ...
```

### 3.2 Add pytest marks for selective execution

```toml
# pyproject.toml
[tool.pytest.ini_options]
markers = [
    "unit: Unit tests (fast, no I/O)",
    "integration: Integration tests (may use SQLite)",
    "dashboard: Dashboard route tests",
    "swarm: Swarm coordinator tests",
    "slow: Tests that take >1 second",
]
```

Usage:

```bash
make test              # Run all tests
pytest -m unit         # Fast unit tests only
pytest -m dashboard    # Just dashboard tests
pytest tests/swarm/    # Just swarm module tests
pytest -m "not slow"   # Skip slow tests
```

### 3.3 Audit and clean skeleton test files

61 test files are empty skeletons: they have imports, class definitions, and
fixture setup but **zero test functions**. These add import overhead and create
a false sense of coverage. For each skeleton file:

1. If the module it tests is stable and well-covered elsewhere → **delete it**
2. If the module genuinely needs tests → **implement the tests** or file an
   issue
3. If it's a duplicate (e.g., both `test_swarm.py` and
   `test_swarm_integration.py` exist) → **consolidate**

Notable skeletons to address:

- `test_scripture.py` (901 lines, 0 tests): massive infrastructure, no assertions
- `test_router_cascade.py` (523 lines, 0 tests): same pattern
- `test_agent_core.py` (457 lines, 0 tests)
- `test_self_modify.py` (451 lines, 0 tests)
- All 7 files in `tests/functional/` (0 working tests)

### 3.4 Split genuinely oversized test files

For files that DO have tests but are too large:

- `test_task_queue.py` (560 lines, 30 tests) → split by feature area
- `test_mobile_scenarios.py` (339 lines, 36 tests) → split by scenario group

Rule of thumb: No test file over 400 lines.

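
The 400-line rule of thumb is easy to enforce mechanically. A minimal sketch,
assuming test files match `test_*.py`; `oversized_test_files` is a
hypothetical helper and the default limit simply restates the rule above.

```python
from pathlib import Path

MAX_LINES = 400  # the rule of thumb stated above


def oversized_test_files(
    tests_dir: str, limit: int = MAX_LINES
) -> list[tuple[str, int]]:
    """Return (relative_path, line_count) for test files longer than limit,
    longest first."""
    root = Path(tests_dir)
    found = []
    for path in root.rglob("test_*.py"):
        with path.open(encoding="utf-8") as handle:
            lines = sum(1 for _ in handle)
        if lines > limit:
            found.append((path.relative_to(root).as_posix(), lines))
    return sorted(found, key=lambda item: item[1], reverse=True)
```
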

---

## Phase 4: Configuration & Build Cleanup (Low Risk, Medium Impact)

### 4.1 Clean up pyproject.toml

- Fix the wheel include list to match actual imports
- Consider whether 4 separate CLI entry points belong in one package
- Add `[project.urls]` for documentation and repository links
- Review dependency pins; some are very loose (`>=1.0.0`)

### 4.2 Consolidate Docker files

Four docker-compose variants (default, dev, prod, test) are a lot to maintain.
Consider:

- `docker-compose.yml` (base)
- `docker-compose.override.yml` (dev, auto-loaded by Docker Compose)
- `docker-compose.prod.yml` (production only)

### 4.3 Clean up root directory

Non-essential root files to move or delete:

| File | Action |
|------|--------|
| `apply_security_fixes.py` | Move to `scripts/` or delete if one-time |
| `activate_self_tdd.sh` | Move to `scripts/` |
| `coverage.xml` | Gitignore (CI artifact) |
| `data/self_modify_reports/` | Gitignore the contents |

---

## Phase 5: Consider Package Extraction (High Risk, High Impact)

**Goal:** Evaluate whether some modules should be separate packages/repos.

### 5.1 Candidates for extraction

| Module | Why extract | Dependency direction |
|--------|------------|---------------------|
| `lightning/` | Standalone payment system, security-sensitive | Dashboard imports lightning |
| `creative/` | Needs PyTorch, very different dependency profile | Dashboard imports creative |
| `timmy-serve` | Separate process (port 8402), separate purpose | Shares config + timmy agent |
| `self_coding/` + `self_modify/` | Self-contained self-modification system | Dashboard imports for routes |

### 5.2 Monorepo approach (recommended over multi-repo)

If splitting, use a monorepo with namespace packages:

```
packages/
  timmy-core/        # Agent + memory + CLI
  timmy-dashboard/   # FastAPI app
  timmy-swarm/       # Coordinator + tasks
  timmy-lightning/   # Payment system
  timmy-creative/    # Creative tools (heavy deps)
```

Each package gets its own `pyproject.toml` and test suite, and can be installed
independently. But they share the same repo, CI, and release cycle.

**However:** This is high effort and may not be worth it unless the team
grows or the dependency profiles diverge further. Consider this only after
Phases 1-4 are done and the pain persists.

---

## Phase 6: Token Optimization for AI Development (Low Risk, High Impact)

**Goal:** Reduce context window consumption when AI assistants work on this
codebase.

### 6.1 Lean CLAUDE.md (already covered in Phase 1)

Every byte in CLAUDE.md is read on every AI interaction. Remove duplication.

### 6.2 Module-level CLAUDE.md files

Instead of one massive guide, put module-specific context where it's needed:

```
src/swarm/CLAUDE.md      # "This module is security-sensitive. Always..."
src/lightning/CLAUDE.md  # "Never hard-code secrets. Use settings..."
src/dashboard/CLAUDE.md  # "Routes return template partials for HTMX..."
```

AI assistants read these only when working in that directory.

### 6.3 Standardize module docstrings

Every `__init__.py` should have a one-line summary. AI assistants read these
to understand module purpose without reading every file:

```python
"""Swarm — Multi-agent coordinator with auction-based task assignment."""
```

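
Coverage of this convention can be audited automatically. The following is a
sketch under the assumption that every package is a directory containing an
`__init__.py`; `packages_missing_docstring` is a hypothetical helper, not an
existing script in the repository.

```python
import ast
from pathlib import Path


def packages_missing_docstring(src_dir: str) -> list[str]:
    """Return packages under src_dir whose __init__.py has no module
    docstring, as POSIX-style paths relative to src_dir."""
    root = Path(src_dir)
    missing = []
    for init in root.rglob("__init__.py"):
        tree = ast.parse(init.read_text(encoding="utf-8"))
        # get_docstring returns None when the first statement is not a string.
        if not ast.get_docstring(tree):
            missing.append(init.parent.relative_to(root).as_posix())
    return sorted(missing)
```
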
### 6.4 Reduce template duplication

49 templates with repeated boilerplate. Consider Jinja2 macros for common
patterns (card layouts, form groups, table rows).

---

## Prioritized Execution Order

| Priority | Phase | Effort | Risk | Impact |
|----------|-------|--------|------|--------|
| **1** | Phase 1: Doc cleanup | 2-3 hours | Low | High (immediate token savings) |
| **2** | Phase 6: Token optimization | 1-2 hours | Low | High (ongoing AI efficiency) |
| **3** | Phase 4: Config/build cleanup | 1-2 hours | Low | Medium (hygiene) |
| **4** | Phase 2: Module consolidation | 4-8 hours | Medium | High (structural improvement) |
| **5** | Phase 3: Test reorganization | 3-5 hours | Medium | Medium (faster test cycles) |
| **6** | Phase 5: Package extraction | 8-16 hours | High | High (only if needed) |

---

## Quick Wins (Can Do Right Now)

1. Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (3 files, 0 risk)
2. Move PLAN.md, IMPLEMENTATION_SUMMARY.md, quality docs to `docs/` (4 files)
3. Slim README to ~80 lines
4. Fix pyproject.toml wheel includes (11 missing modules)
5. Gitignore `coverage.xml` and `data/self_modify_reports/`
6. Delete `dashboard/routes/mobile_test.py` (test page in production routes)
7. Delete or gut empty test skeletons (61 files with 0 tests; they waste CI
   time and create noise)

---

## What NOT to Do

- **Don't rewrite from scratch.** The code works. Refactor incrementally.
- **Don't split into multiple repos.** Monorepo with packages (if needed) is
  simpler for a small team.
- **Don't change the tech stack.** FastAPI + HTMX + Jinja2 is fine. Don't add
  React, Vue, or a SPA framework.
- **Don't merge CLAUDE.md into README.** They serve different audiences.
- **Don't remove test files that contain real tests** just to reduce the
  count. Reorganize them.
- **Don't break the singleton pattern.** It works for this scale.

---

## Success Metrics

| Metric | Original | Target | Current |
|--------|----------|--------|---------|
| Root `.md` files | 10 | 3 | 5 |
| Root markdown size | 87KB | ~20KB | ~28KB |
| `src/` modules | 28 | ~12-15 | **7 packages + config** |
| Dashboard routes | 27 | ~12-15 | 22 |
| Test organization | flat | mirrored | **mirrored** |
| Wheel modules | 17/28 | all | needs audit |
| Module-level docs | 0 | all key modules | needs audit |

---

## Execution Status

### Completed

- [x] **Phase 1: Doc cleanup**: README 303→93 lines, CLAUDE.md 267→80,
  AGENTS.md 342→72, deleted 3 session docs, archived 4 planning docs
- [x] **Phase 4: Config/build cleanup**: fixed 11 missing wheel modules, added
  pytest markers, updated .gitignore, moved scripts to scripts/
- [x] **Phase 6: Token optimization**: added docstrings to 15+ `__init__.py` files
- [x] **Phase 3: Test reorganization**: 97 test files organized into 13
  subdirectories mirroring source structure
- [x] **Phase 2a: Route consolidation**: 27 → 22 route files (merged voice,
  swarm internal/ws, self-modify; deleted mobile_test)
- [ ] **Phase 2b: Full module consolidation**: 28 → 14 modules. Partially
  completed. Some consolidations were applied:
  - `chat_bridge/` + `telegram_bot/` + `shortcuts/` + `voice/` → `integrations/` (done)
  - `ws_manager/` + `notifications/` + `events/` + `router/` → `infrastructure/` (done)
  - `agents/` + `agent_core/` + `memory/` → `timmy/` (done)
  - **Not completed:** `swarm/`, `self_coding/`, `creative/`, `lightning/` packages
    were never created. These modules do not exist in `src/`.
- [ ] **Phase 6.2: Module-level CLAUDE.md**: not completed. The referenced
  directories (`swarm/`, `self_coding/`, `creative/`, `lightning/`) do not exist.

### Remaining

- [ ] **Phase 5: Package extraction**: only if team grows or dep profiles diverge