refactor: Phase 1/4/6 — doc cleanup, config fix, token optimization

Phase 1 — Documentation cleanup:
- Slim README 303→93 lines (remove duplicated architecture, config tables)
- Slim CLAUDE.md 267→80 lines (remove project layout, env vars, CI section)
- Slim AGENTS.md 342→72 lines (remove duplicated patterns, running locally)
- Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (session docs)
- Archive PLAN.md, IMPLEMENTATION_SUMMARY.md to docs/
- Move QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md to docs/
- Move apply_security_fixes.py, activate_self_tdd.sh to scripts/

Phase 4 — Config & build cleanup:
- Fix wheel build: add 11 missing modules to pyproject.toml include list
- Add pytest markers (unit, integration, dashboard, swarm, slow)
- Add data/self_modify_reports/ and .handoff/ to .gitignore

Phase 6 — Token optimization:
- Add docstrings to 15 __init__.py files that were empty
- Create __init__.py for events/, memory/, upgrades/ modules

Root markdown: 87KB → ~18KB (79% reduction)

https://claude.ai/code/session_019oMFNvD8uSGSSmBMGkBfQN
2026-02-26 21:03:15 +00:00
parent 31760682f6
commit 6045077144
31 changed files with 191 additions and 1150 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,342 +1,79 @@
 # AGENTS.md — Timmy Time Development Standards for AI Agents

-This file is the authoritative reference for any AI agent contributing to
-this repository.  Read it first.  Every time.
+Read [`CLAUDE.md`](CLAUDE.md) for architecture patterns and conventions.

 ---

-## 1. Project at a Glance
+## Non-Negotiable Rules

-**Timmy Time** is a local-first, sovereign AI agent system.  No cloud.  No telemetry.
-Bitcoin Lightning economics baked in.
+1. **Tests must stay green.** Run `make test` before committing.
+2. **No cloud dependencies.** All AI computation runs on localhost.
+3. **No new top-level files without purpose.** Don't litter the root directory.
+4. **Follow existing patterns** — singletons, graceful degradation, pydantic-settings.
+5. **Security defaults:** Never hard-code secrets.
+6. **XSS prevention:** Never use `innerHTML` with untrusted content.

-| Thing            | Value                                              |
-|------------------|----------------------------------------------------|
-| Language         | Python 3.11+                                       |
-| Web framework    | FastAPI + Jinja2 + HTMX                            |
-| Agent framework  | Agno (wraps Ollama or AirLLM)                      |
-| Persistence      | SQLite (`timmy.db`, `data/swarm.db`)               |
-| Tests            | pytest — must stay green                           |
-| Entry points     | `timmy`, `timmy-serve`, `self-tdd`                 |
-| Config           | pydantic-settings, reads `.env`                    |
-| Containers       | Docker — each agent can run as an isolated service |
+---
+
+## Agent Roster
+
+### Build Tier
+
+**Local (Ollama)** — Primary workhorse. Free. Unrestricted.
+Best for: everything, iterative dev, Docker swarm workers.
+
+**Kimi (Moonshot)** — Paid. Large-context feature drops, new subsystems, persona agents.
+Avoid: touching CI/pyproject.toml, adding cloud calls, removing tests.
+
+**DeepSeek** — Near-free. Second-opinion generation, large refactors (R1 for hard problems).
+Avoid: bypassing review tier for security modules.
+
+### Review Tier
+
+**Claude (Anthropic)** — Architecture, tests, docs, CI/CD, PR review.
+Avoid: large one-shot feature dumps.
+
+**Gemini (Google)** — Docs, frontend polish, boilerplate, diff summaries.
+Avoid: security modules, Python business logic without Claude review.
+
+**Manus AI** — Security audits, coverage gaps, L402 validation.
+Avoid: large refactors, new features, prompt changes.
+
+---
+
+## Docker Agents
+
+Container agents poll the coordinator's HTTP API (not in-memory `SwarmComms`):

 ```
-src/
-  config.py             # Central settings (OLLAMA_URL, DEBUG, etc.)
-  timmy/                # Core agent: agent.py, backends.py, cli.py, prompts.py
-  dashboard/            # FastAPI app + routes + Jinja2 templates
-    app.py
-    store.py            # In-memory MessageLog singleton
-    routes/             # agents, health, swarm, swarm_ws, marketplace,
-    │                   # mobile, mobile_test, voice, voice_enhanced,
-    │                   # swarm_internal (HTTP API for Docker agents)
-    templates/          # base.html + page templates + partials/
-  swarm/                # Multi-agent coordinator, registry, bidder, tasks, comms
-    docker_runner.py    # Spawn agents as Docker containers
-  timmy_serve/          # L402 Lightning proxy, payment handler, TTS, CLI
-  spark/                # Intelligence engine — events, predictions, advisory
-  creative/             # Creative director + video assembler pipeline
-  tools/                # Git, image, music, video tools for persona agents
-  lightning/            # Lightning backend abstraction (mock + LND)
-  agent_core/           # Substrate-agnostic agent interface
-  voice/                # NLU intent detection (regex-based, no cloud)
-  ws_manager/           # WebSocket manager (ws_manager singleton)
-  notifications/        # Push notification store (notifier singleton)
-  shortcuts/            # Siri Shortcuts API endpoints
-  telegram_bot/         # Telegram bridge
-  self_tdd/             # Continuous test watchdog
-tests/                  # One test_*.py per module, all mocked
-static/                 # style.css + bg.svg (arcane theme)
-docs/                   # GitHub Pages site
+GET  /internal/tasks    → list tasks open for bidding
+POST /internal/bids     → submit a bid
 ```

---
-
-## 2. Non-Negotiable Rules
-
-1. **Tests must stay green.**  Run `make test` before committing.
-2. **No cloud dependencies.**  All AI computation runs on localhost.
-3. **No new top-level files without purpose.**  Don't litter the root directory.
-4. **Follow existing patterns** — singletons, graceful degradation, pydantic-settings config.
-5. **Security defaults:** Never hard-code secrets.  Warn at startup when defaults are in use.
-6. **XSS prevention:**  Never use `innerHTML` with untrusted content.
-
---
-
-## 3. Agent Roster
-
-Agents are divided into two tiers: **Builders** generate code and features;
-**Reviewers** provide quality gates, feedback, and hardening.  The Local agent
-is the primary workhorse — use it as much as possible to minimise cost.
-
---
-
-### 🏗️ BUILD TIER
-
---
-
-### Local — Ollama (primary workhorse)
-**Model:** Any — `qwen2.5-coder`, `deepseek-coder-v2`, `codellama`, or whatever
-is loaded in Ollama.  The owner decides the model; this agent is unrestricted.
-**Cost:** Free.  Runs on the host machine.
-
-**Best for:**
-- Everything.  This is the default agent for all coding tasks.
-- Iterative development, fast feedback loops, bulk generation
-- Running as a Docker swarm worker — scales horizontally at zero marginal cost
-- Experimenting with new models without changing any other code
-
-**Conventions to follow:**
-- Communicate with the coordinator over HTTP (`COORDINATOR_URL` env var)
-- Register capabilities honestly so the auction system routes tasks well
-- Write tests for anything non-trivial
-
-**No restrictions.**  If a model can do it, do it.
-
---
-
-### Kimi (Moonshot AI)
-**Model:** Moonshot large-context models.
-**Cost:** Paid API.
-
-**Best for:**
-- Large context feature drops (new pages, new subsystems, new agent personas)
-- Implementing roadmap items that require reading many files at once
-- Generating boilerplate for new agents (Echo, Mace, Helm, Seer, Forge, Quill)
-
-**Conventions to follow:**
-- Deliver working code with accompanying tests (even if minimal)
-- Match the arcane CSS theme — extend `static/style.css`
-- New agents follow the `SwarmNode` + `Registry` + Docker pattern
-- Lightning-gated endpoints follow the L402 pattern in `src/timmy_serve/l402_proxy.py`
-
-**Avoid:**
-- Touching CI/CD or pyproject.toml without coordinating
-- Adding cloud API calls
-- Removing existing tests
-
---
-
-### DeepSeek (DeepSeek API)
-**Model:** `deepseek-chat` (V3) or `deepseek-reasoner` (R1).
-**Cost:** Near-free (~$0.14/M tokens).
-
-**Best for:**
-- Second-opinion feature generation when Kimi is busy or context is smaller
-- Large refactors with reasoning traces (use R1 for hard problems)
-- Code review passes before merging Kimi PRs
-- Anything that doesn't need a frontier model but benefits from strong reasoning
-
-**Conventions to follow:**
-- Same conventions as Kimi
-- Prefer V3 for straightforward tasks; R1 for anything requiring multi-step logic
-- Submit PRs for review by Claude before merging
-
-**Avoid:**
-- Bypassing the review tier for security-sensitive modules
-- Touching `src/swarm/coordinator.py` without Claude review
-
---
-
-### 🔍 REVIEW TIER
-
---
-
-### Claude (Anthropic)
-**Model:** Claude Sonnet.
-**Cost:** Paid API.
-
-**Best for:**
-- Architecture decisions and code-quality review
-- Writing and fixing tests; keeping coverage green
-- Updating documentation (README, AGENTS.md, inline comments)
-- CI/CD, tooling, Docker infrastructure
-- Debugging tricky async or import issues
-- Reviewing PRs from Local, Kimi, and DeepSeek before merge
-
-**Conventions to follow:**
-- Prefer editing existing files over creating new ones
-- Keep route files thin — business logic lives in the module, not the route
-- Use `from config import settings` for all env-var access
-- New routes go in `src/dashboard/routes/`, registered in `app.py`
-- Always add a corresponding `tests/test_<module>.py`
-
-**Avoid:**
-- Large one-shot feature dumps (use Local or Kimi)
-- Touching `src/swarm/coordinator.py` for security work (that's Manus's lane)
-
---
-
-### Gemini (Google)
-**Model:** Gemini 2.0 Flash (free tier) or Pro.
-**Cost:** Free tier generous; upgrade only if needed.
-
-**Best for:**
-- Documentation, README updates, inline docstrings
-- Frontend polish — HTML templates, CSS, accessibility review
-- Boilerplate generation (test stubs, config files, GitHub Actions)
-- Summarising large diffs for human review
-
-**Conventions to follow:**
-- Submit changes as PRs; always include a plain-English summary of what changed
-- For CSS changes, test at mobile breakpoint (≤768px) before submitting
-- Never modify Python business logic without Claude review
-
-**Avoid:**
-- Security-sensitive modules (that's Manus's lane)
-- Changing auction or payment logic
-- Large Python refactors
-
---
-
-### Manus AI
-**Strengths:** Precision security work, targeted bug fixes, coverage gap analysis.
-
-**Best for:**
-- Security audits (XSS, injection, secret exposure)
-- Closing test coverage gaps for existing modules
-- Performance profiling of specific endpoints
-- Validating L402/Lightning payment flows
-
-**Conventions to follow:**
-- Scope tightly — one security issue per PR
-- Every security fix must have a regression test
-- Use `pytest-cov` output to identify gaps before writing new tests
-- Document the vulnerability class in the PR description
-
-**Avoid:**
-- Large-scale refactors (that's Claude's lane)
-- New feature work (use Local or Kimi)
-- Changing agent personas or prompt content
-
---
-
-## 4. Docker — Running Agents as Containers
-
-Each agent can run as an isolated Docker container.  Containers share the
-`data/` volume for SQLite and communicate with the coordinator over HTTP.
+`COORDINATOR_URL=http://dashboard:8000` is set by docker-compose.

 ```bash
-make docker-build          # build the image
-make docker-up             # start dashboard + deps
-make docker-agent          # spawn one agent worker (LOCAL model)
-make docker-down           # stop everything
-make docker-logs           # tail all service logs
-```
-
-### How container agents communicate
-
-Container agents cannot use the in-memory `SwarmComms` channel.  Instead they
-poll the coordinator's internal HTTP API:
-
-```
-GET  /internal/tasks          → list tasks open for bidding
-POST /internal/bids           → submit a bid
-```
-
-Set `COORDINATOR_URL=http://dashboard:8000` in the container environment
-(docker-compose sets this automatically).
-
-### Spawning a container agent from Python
-
-```python
-from swarm.docker_runner import DockerAgentRunner
-
-runner = DockerAgentRunner(coordinator_url="http://dashboard:8000")
-info   = runner.spawn("Echo", image="timmy-time:latest")
-runner.stop(info["container_id"])
+make docker-build       # build image
+make docker-up          # start dashboard
+make docker-agent       # add a worker
 ```

 ---

-## 5. Architecture Patterns
-
-### Singletons (module-level instances)
-```python
-from dashboard.store import message_log
-from notifications.push import notifier
-from ws_manager.handler import ws_manager
-from timmy_serve.payment_handler import payment_handler
-from swarm.coordinator import coordinator
-```
-
-### Config access
-```python
-from config import settings
-url = settings.ollama_url   # never os.environ.get() directly in route files
-```
-
-### HTMX pattern
-```python
-return templates.TemplateResponse(
-    "partials/chat_message.html",
-    {"request": request, "role": "user", "content": message}
-)
-```
-
-### Graceful degradation
-```python
-try:
-    result = await some_optional_service()
-except Exception:
-    result = fallback_value   # log, don't crash
-```
-
-### Tests
-- All heavy deps (`agno`, `airllm`, `pyttsx3`) are stubbed in `tests/conftest.py`
-- Use `pytest.fixture` for shared state; prefer function scope
-- Use `TestClient` from `fastapi.testclient` for route tests
-- No real Ollama required — mock `agent.run()`
-
---
-
-## 6. Running Locally
-
-```bash
-make install        # create venv + install dev deps
-make test           # run full test suite
-make dev            # start dashboard (http://localhost:8000)
-make watch          # self-TDD watchdog (60s poll)
-make test-cov       # coverage report
-```
-
-Or with Docker:
-```bash
-make docker-build   # build image
-make docker-up      # start dashboard
-make docker-agent   # add a Local agent worker
-```
-
---
-
-## 7. Roadmap (v2 → v3)
-
-**v2.0.0 — Exodus (in progress)**
-- [x] Persistent swarm state across restarts
-- [x] Docker infrastructure for agent containers
-- [x] Implement Echo, Mace, Helm, Seer, Forge, Quill persona agents (+ Pixel, Lyra, Reel)
-- [x] MCP tool integration for Timmy
-- [ ] Real LND gRPC backend for `PaymentHandler` (replace mock)
-- [ ] Marketplace frontend — wire `/marketplace` route to real data
-
-**v3.0.0 — Revelation (planned)**
-- [ ] Bitcoin Lightning treasury (agent earns and spends sats autonomously)
-- [ ] Single `.app` bundle for macOS (no Python install required)
-- [ ] Federation — multiple Timmy instances discover and bid on each other's tasks
-- [ ] Redis pub/sub replacing SQLite polling for high-throughput swarms
-
---
-
-## 8. File Conventions
+## File Conventions

 | Pattern | Convention |
 |---------|-----------|
 | New route | `src/dashboard/routes/<name>.py` + register in `app.py` |
 | New template | `src/dashboard/templates/<name>.html` extends `base.html` |
-| New partial | `src/dashboard/templates/partials/<name>.html` |
 | New subsystem | `src/<name>/` with `__init__.py` |
-| New test file | `tests/test_<module>.py` |
-| Secrets | Read via `os.environ.get("VAR", "default")` + startup warning if default |
-| DB files | `.db` files go in project root or `data/` — never in `src/` |
-| Docker | One service per agent type in `docker-compose.yml` |
+| New test | `tests/test_<module>.py` |
+| Secrets | Via `config.settings` + startup warning if default |
+| DB files | Project root or `data/` — never in `src/` |
+
+---
+
+## Roadmap
+
+**v2.0 Exodus (in progress):** Swarm + L402 + Voice + Marketplace + Hands
+**v3.0 Revelation (planned):** Lightning treasury + `.app` bundle + federation
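
The new Docker Agents section says container workers poll `GET /internal/tasks` and submit bids with `POST /internal/bids` at `COORDINATOR_URL`, instead of using the in-memory `SwarmComms`. What follows is a minimal, standard-library-only sketch of such a worker loop. The endpoint paths and the `http://dashboard:8000` default come from the diff. Everything else is an assumption made for illustration and is not the coordinator's actual schema: the response shape (a JSON list of task objects with an `id` field), the bid payload fields (`task_id`, `agent_id`, `bid_sats`), and all function names.

```python
"""Hypothetical container-agent poll loop for the coordinator's internal API.

Only the endpoint paths and COORDINATOR_URL come from AGENTS.md; the JSON
shapes and field names (id, task_id, agent_id, bid_sats) are assumptions.
"""
import json
import os
import time
import urllib.request
from typing import Any, Callable

FetchJSON = Callable[[str], Any]
PostJSON = Callable[[str, dict], Any]


def http_get_json(url: str) -> Any:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def http_post_json(url: str, payload: dict) -> Any:
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def poll_once(
    base_url: str,
    agent_id: str,
    bid_sats: int,
    get_json: FetchJSON = http_get_json,
    post_json: PostJSON = http_post_json,
) -> list[Any]:
    """Fetch open tasks, submit one bid per task, and return the task ids bid on."""
    # Assumed response shape: a JSON list of task objects, each with an "id".
    tasks = get_json(f"{base_url}/internal/tasks")
    bid_on = []
    for task in tasks:
        # Assumed bid payload; the real coordinator may expect different fields.
        payload = {"task_id": task["id"], "agent_id": agent_id, "bid_sats": bid_sats}
        post_json(f"{base_url}/internal/bids", payload)
        bid_on.append(task["id"])
    return bid_on


def run_forever(agent_id: str, bid_sats: int, interval_s: float = 5.0) -> None:
    """Poll the coordinator on a fixed interval without crashing when it is down."""
    base_url = os.environ.get("COORDINATOR_URL", "http://dashboard:8000").rstrip("/")
    while True:
        try:
            poll_once(base_url, agent_id, bid_sats)
        except OSError as exc:  # URLError and HTTPError both subclass OSError
            print(f"coordinator unreachable: {exc}")
        time.sleep(interval_s)
```

Passing `get_json` and `post_json` in as parameters lets `poll_once` run against fakes, so the loop can be tested without a live coordinator. `run_forever` logs connection errors and keeps polling rather than exiting, which is in the spirit of the graceful-degradation pattern the rules ask for.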