# AGENTS.md — Timmy Time Development Standards for AI Agents
This file is the authoritative reference for any AI agent contributing to
this repository. Read it first. Every time.

---
## 1. Project at a Glance
**Timmy Time** is a local-first, sovereign AI agent system. No cloud. No telemetry.
Bitcoin Lightning economics baked in.
| Thing | Value |
|------------------|----------------------------------------------------|
| Language | Python 3.11+ |
| Web framework | FastAPI + Jinja2 + HTMX |
| Agent framework | Agno (wraps Ollama or AirLLM) |
| Persistence | SQLite (`timmy.db`, `data/swarm.db`) |
| Tests | pytest — must stay green |
| Entry points | `timmy`, `timmy-serve`, `self-tdd` |
| Config | pydantic-settings, reads `.env` |
| Containers | Docker — each agent can run as an isolated service |
```
src/
  config.py            # Central settings (OLLAMA_URL, DEBUG, etc.)
  timmy/               # Core agent: agent.py, backends.py, cli.py, prompts.py
  dashboard/           # FastAPI app + routes + Jinja2 templates
    app.py
    store.py           # In-memory MessageLog singleton
    routes/            # agents, health, swarm, swarm_ws, marketplace,
    │                  #   mobile, mobile_test, voice, voice_enhanced,
    │                  #   swarm_internal (HTTP API for Docker agents)
    templates/         # base.html + page templates + partials/
  swarm/               # Multi-agent coordinator, registry, bidder, tasks, comms
    docker_runner.py   # Spawn agents as Docker containers
  timmy_serve/         # L402 Lightning proxy, payment handler, TTS, CLI
  spark/               # Intelligence engine — events, predictions, advisory
  creative/            # Creative director + video assembler pipeline
  tools/               # Git, image, music, video tools for persona agents
  lightning/           # Lightning backend abstraction (mock + LND)
  agent_core/          # Substrate-agnostic agent interface
  voice/               # NLU intent detection (regex-based, no cloud)
  ws_manager/          # WebSocket manager (ws_manager singleton)
  notifications/       # Push notification store (notifier singleton)
  shortcuts/           # Siri Shortcuts API endpoints
  telegram_bot/        # Telegram bridge
  self_tdd/            # Continuous test watchdog
tests/                 # One test_*.py per module, all mocked
static/                # style.css + bg.svg (arcane theme)
docs/                  # GitHub Pages site
```
---
## 2. Non-Negotiable Rules
1. **Tests must stay green.** Run `make test` before committing.
2. **No cloud dependencies.** All AI computation runs on localhost.
3. **No new top-level files without purpose.** Don't litter the root directory.
4. **Follow existing patterns** — singletons, graceful degradation, pydantic-settings config.
5. **Security defaults:** Never hard-code secrets. Warn at startup when defaults are in use.
6. **XSS prevention:** Never use `innerHTML` with untrusted content.
---
## 3. Agent Roster
Agents are divided into two tiers: **Builders** generate code and features;
**Reviewers** provide quality gates, feedback, and hardening. The Local agent
is the primary workhorse — use it as much as possible to minimise cost.

---
### 🏗️ BUILD TIER
---
### Local — Ollama (primary workhorse)
**Model:** Any — `qwen2.5-coder`, `deepseek-coder-v2`, `codellama`, or whatever
is loaded in Ollama. The owner decides the model; this agent is unrestricted.
**Cost:** Free. Runs on the host machine.
**Best for:**
- Everything. This is the default agent for all coding tasks.
- Iterative development, fast feedback loops, bulk generation
- Running as a Docker swarm worker — scales horizontally at zero marginal cost
- Experimenting with new models without changing any other code
**Conventions to follow:**
- Communicate with the coordinator over HTTP (`COORDINATOR_URL` env var)
- Register capabilities honestly so the auction system routes tasks well
- Write tests for anything non-trivial

**No restrictions.** If a model can do it, do it.

---
### Kimi (Moonshot AI)
**Model:** Moonshot large-context models.
**Cost:** Paid API.
**Best for:**
- Large context feature drops (new pages, new subsystems, new agent personas)
- Implementing roadmap items that require reading many files at once
- Generating boilerplate for new agents (Echo, Mace, Helm, Seer, Forge, Quill)
**Conventions to follow:**
- Deliver working code with accompanying tests (even if minimal)
- Match the arcane CSS theme — extend `static/style.css`
- New agents follow the `SwarmNode` + `Registry` + Docker pattern
- Lightning-gated endpoints follow the L402 pattern in `src/timmy_serve/l402_proxy.py`
**Avoid:**
- Touching CI/CD or pyproject.toml without coordinating
- Adding cloud API calls
- Removing existing tests
---
### DeepSeek (DeepSeek API)
**Model:** `deepseek-chat` (V3) or `deepseek-reasoner` (R1).
**Cost:** Near-free (~$0.14/M tokens).
**Best for:**
- Second-opinion feature generation when Kimi is busy or the context is smaller
- Large refactors with reasoning traces (use R1 for hard problems)
- Code review passes before merging Kimi PRs
- Anything that doesn't need a frontier model but benefits from strong reasoning
**Conventions to follow:**
- Same conventions as Kimi
- Prefer V3 for straightforward tasks; R1 for anything requiring multi-step logic
- Submit PRs for review by Claude before merging
**Avoid:**
- Bypassing the review tier for security-sensitive modules
- Touching `src/swarm/coordinator.py` without Claude review
---
### 🔍 REVIEW TIER
---
### Claude (Anthropic)
**Model:** Claude Sonnet.
**Cost:** Paid API.
**Best for:**
- Architecture decisions and code-quality review
- Writing and fixing tests; keeping coverage green
- Updating documentation (README, AGENTS.md, inline comments)
- CI/CD, tooling, Docker infrastructure
- Debugging tricky async or import issues
- Reviewing PRs from Local, Kimi, and DeepSeek before merge
**Conventions to follow:**
- Prefer editing existing files over creating new ones
- Keep route files thin — business logic lives in the module, not the route
- Use `from config import settings` for all env-var access
- New routes go in `src/dashboard/routes/`, registered in `app.py`
- Always add a corresponding `tests/test_<module>.py`
**Avoid:**
- Large one-shot feature dumps (use Local or Kimi)
- Touching `src/swarm/coordinator.py` for security work (that's Manus's lane)
---
### Gemini (Google)
**Model:** Gemini 2.0 Flash (free tier) or Pro.
**Cost:** Free tier generous; upgrade only if needed.
**Best for:**
- Documentation, README updates, inline docstrings
- Frontend polish — HTML templates, CSS, accessibility review
- Boilerplate generation (test stubs, config files, GitHub Actions)
- Summarising large diffs for human review
**Conventions to follow:**
- Submit changes as PRs; always include a plain-English summary of what changed
- For CSS changes, test at mobile breakpoint (≤768px) before submitting
- Never modify Python business logic without Claude review
**Avoid:**
- Security-sensitive modules (that's Manus's lane)
- Changing auction or payment logic
- Large Python refactors
---
### Manus AI
**Strengths:** Precision security work, targeted bug fixes, coverage gap analysis.
**Best for:**
- Security audits (XSS, injection, secret exposure)
- Closing test coverage gaps for existing modules
- Performance profiling of specific endpoints
- Validating L402/Lightning payment flows
**Conventions to follow:**
- Scope tightly — one security issue per PR
- Every security fix must have a regression test
- Use `pytest-cov` output to identify gaps before writing new tests
- Document the vulnerability class in the PR description
**Avoid:**
- Large-scale refactors (that's Claude's lane)
- New feature work (use Local or Kimi)
- Changing agent personas or prompt content
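
The "one regression test per security fix" convention can be sketched as follows, using only the standard library. `render_message` is a hypothetical stand-in for whatever code was patched; in the real dashboard, rendering goes through Jinja2 templates, so treat this as an illustration of the test shape rather than project code.

```python
import html


def render_message(content: str) -> str:
    """Hypothetical renderer: escapes user content before embedding it."""
    return f'<div class="msg">{html.escape(content)}</div>'


def test_message_content_is_escaped():
    # Vulnerability class: XSS via unescaped chat message content.
    payload = '<script>alert("x")</script>'
    rendered = render_message(payload)
    assert "<script>" not in rendered
    assert "&lt;script&gt;" in rendered
```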
---
## 4. Docker — Running Agents as Containers
Each agent can run as an isolated Docker container. Containers share the
`data/` volume for SQLite and communicate with the coordinator over HTTP.
```bash
make docker-build # build the image
make docker-up # start dashboard + deps
make docker-agent # spawn one agent worker (LOCAL model)
make docker-down # stop everything
make docker-logs # tail all service logs
```
### How container agents communicate
Container agents cannot use the in-memory `SwarmComms` channel. Instead they
poll the coordinator's internal HTTP API:
```
GET /internal/tasks → list tasks open for bidding
POST /internal/bids → submit a bid
```
Set `COORDINATOR_URL=http://dashboard:8000` in the container environment
(docker-compose sets this automatically).
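
A minimal sketch of that polling flow, using only the standard library. The two endpoint paths and `COORDINATOR_URL` come from the text above; the JSON field names (`id`, `task_id`, `agent`, `bid_sats`) and the function names are assumptions for illustration, not the coordinator's actual schema.

```python
import json
import os
import urllib.request

# docker-compose sets COORDINATOR_URL; the default here is only a fallback.
COORDINATOR_URL = os.environ.get("COORDINATOR_URL", "http://dashboard:8000")


def build_bid(task: dict, agent_name: str, bid_sats: int) -> dict:
    """Build a bid payload. Field names are hypothetical."""
    return {"task_id": task["id"], "agent": agent_name, "bid_sats": bid_sats}


def fetch_open_tasks(base_url: str = COORDINATOR_URL,
                     opener=urllib.request.urlopen) -> list:
    """GET /internal/tasks and decode the JSON body (assumed to be a list)."""
    with opener(f"{base_url}/internal/tasks") as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_bid(bid: dict, base_url: str = COORDINATOR_URL,
               opener=urllib.request.urlopen) -> int:
    """POST /internal/bids with a JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/internal/bids",
        data=json.dumps(bid).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        return resp.status
```

A worker would call `fetch_open_tasks()` on a timer, keep the tasks that match its registered capabilities, and call `submit_bid()` for each. The `opener` parameter exists so tests can pass a fake instead of hitting the network.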
### Spawning a container agent from Python
```python
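from swarm.docker_runner import DockerAgentRunner

# Point the runner at the coordinator the new container should report to
runner = DockerAgentRunner(coordinator_url="http://dashboard:8000")
info = runner.spawn("Echo", image="timmy-time:latest")  # start an "Echo" agent
runner.stop(info["container_id"])  # stop it again by container ID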
```
---
## 5. Architecture Patterns
### Singletons (module-level instances)
```python
from dashboard.store import message_log
from notifications.push import notifier
from ws_manager.handler import ws_manager
from timmy_serve.payment_handler import payment_handler
from swarm.coordinator import coordinator
```
### Config access
```python
from config import settings
url = settings.ollama_url # never os.environ.get() directly in route files
```
### HTMX pattern
```python
# Inside a route handler: return a rendered partial for HTMX to swap in
return templates.TemplateResponse(
    "partials/chat_message.html",
    {"request": request, "role": "user", "content": message},
)
```
### Graceful degradation
```python
import logging

try:
    result = await some_optional_service()
except Exception:
    logging.warning("optional service failed; using fallback", exc_info=True)
    result = fallback_value  # log, don't crash
```
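The fragment above only makes sense inside an `async` function. A self-contained version might look like the sketch below; `speak` and `speak_or_skip` are hypothetical names standing in for any optional service and its wrapper.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def speak(text: str) -> str:
    """Stand-in for an optional service (e.g. TTS) that is not available."""
    raise RuntimeError("TTS backend unavailable")


async def speak_or_skip(text: str) -> str:
    try:
        return await speak(text)
    except Exception:
        # Log with traceback, then degrade instead of failing the request.
        logger.warning("TTS unavailable; returning text only", exc_info=True)
        return text


print(asyncio.run(speak_or_skip("hello")))  # prints "hello"
```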
### Tests
- All heavy deps (`agno`, `airllm`, `pyttsx3`) are stubbed in `tests/conftest.py`
- Use `pytest.fixture` for shared state; prefer function scope
- Use `TestClient` from `fastapi.testclient` for route tests
- No real Ollama required — mock `agent.run()`
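
A sketch of the two mocking ideas above, using only the standard library. The stubbing loop shows one common way to keep heavy imports out of the test run; how `tests/conftest.py` actually does it is an assumption. `answer` is hypothetical code under test.

```python
import sys
from unittest.mock import MagicMock

# Stub heavy dependencies before anything imports them.
for name in ("agno", "airllm", "pyttsx3"):
    sys.modules.setdefault(name, MagicMock())


def answer(agent, prompt: str) -> str:
    """Hypothetical code under test: delegates to agent.run()."""
    return agent.run(prompt).content


def test_answer_uses_agent_without_ollama():
    agent = MagicMock()
    agent.run.return_value.content = "pong"
    assert answer(agent, "ping") == "pong"
    agent.run.assert_called_once_with("ping")
```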
---
## 6. Running Locally
```bash
make install # create venv + install dev deps
make test # run full test suite
make dev # start dashboard (http://localhost:8000)
make watch # self-TDD watchdog (60s poll)
make test-cov # coverage report
```
Or with Docker:
```bash
make docker-build # build image
make docker-up # start dashboard
make docker-agent # add a Local agent worker
```
---
## 7. Roadmap (v2 → v3)
**v2.0.0 — Exodus (in progress)**
- [x] Persistent swarm state across restarts
- [x] Docker infrastructure for agent containers
- [x] Implement Echo, Mace, Helm, Seer, Forge, Quill persona agents (+ Pixel, Lyra, Reel)
- [x] MCP tool integration for Timmy
- [ ] Real LND gRPC backend for `PaymentHandler` (replace mock)
- [ ] Marketplace frontend — wire `/marketplace` route to real data
**v3.0.0 — Revelation (planned)**
- [ ] Bitcoin Lightning treasury (agent earns and spends sats autonomously)
- [ ] Single `.app` bundle for macOS (no Python install required)
- [ ] Federation — multiple Timmy instances discover and bid on each other's tasks
- [ ] Redis pub/sub replacing SQLite polling for high-throughput swarms
---
## 8. File Conventions
| Pattern | Convention |
|---------|-----------|
| New route | `src/dashboard/routes/<name>.py` + register in `app.py` |
| New template | `src/dashboard/templates/<name>.html` extends `base.html` |
| New partial | `src/dashboard/templates/partials/<name>.html` |
| New subsystem | `src/<name>/` with `__init__.py` |
| New test file | `tests/test_<module>.py` |
| Secrets | Read via `os.environ.get("VAR", "default")` + startup warning if default |
| DB files | `.db` files go in project root or `data/` — never in `src/` |
| Docker | One service per agent type in `docker-compose.yml` |
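
The "Secrets" row can be sketched as below. The variable name `EXAMPLE_SECRET`, the placeholder default, and the helper `load_secret` are all hypothetical; only the `os.environ.get` call and the startup-warning requirement come from the table.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical placeholder default, for illustration only.
DEFAULT_SECRET = "change-me"


def load_secret(var: str = "EXAMPLE_SECRET", default: str = DEFAULT_SECRET) -> str:
    """Read a secret from the environment and warn if the default is in use."""
    value = os.environ.get(var, default)
    if value == default:
        logger.warning("%s is using its default value; set it in .env", var)
    return value
```

Calling such a helper once at startup keeps the warning visible in the logs without blocking local development.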