AGENTS.md — Timmy Time Development Standards for AI Agents
This file is the authoritative reference for any AI agent contributing to this repository. Read it first. Every time.
1. Project at a Glance
Timmy Time is a local-first, sovereign AI agent system. No cloud. No telemetry. Bitcoin Lightning economics baked in.
| Thing | Value |
|---|---|
| Language | Python 3.11+ |
| Web framework | FastAPI + Jinja2 + HTMX |
| Agent framework | Agno (wraps Ollama or AirLLM) |
| Persistence | SQLite (timmy.db, data/swarm.db) |
| Tests | pytest — must stay green |
| Entry points | timmy, timmy-serve, self-tdd |
| Config | pydantic-settings, reads .env |
| Containers | Docker — each agent can run as an isolated service |
src/
  config.py            # Central settings (OLLAMA_URL, DEBUG, etc.)
  timmy/               # Core agent: agent.py, backends.py, cli.py, prompts.py
  dashboard/           # FastAPI app + routes + Jinja2 templates
    app.py
    store.py           # In-memory MessageLog singleton
    routes/            # agents, health, swarm, swarm_ws, marketplace,
    │                  # mobile, mobile_test, voice, voice_enhanced,
    │                  # swarm_internal (HTTP API for Docker agents)
    templates/         # base.html + page templates + partials/
  swarm/               # Multi-agent coordinator, registry, bidder, tasks, comms
    docker_runner.py   # Spawn agents as Docker containers
  timmy_serve/         # L402 Lightning proxy, payment handler, TTS, CLI
  spark/               # Intelligence engine: events, predictions, advisory
  creative/            # Creative director + video assembler pipeline
  tools/               # Git, image, music, video tools for persona agents
  lightning/           # Lightning backend abstraction (mock + LND)
  agent_core/          # Substrate-agnostic agent interface
  voice/               # NLU intent detection (regex-based, no cloud)
  ws_manager/          # WebSocket manager (ws_manager singleton)
  notifications/       # Push notification store (notifier singleton)
  shortcuts/           # Siri Shortcuts API endpoints
  telegram_bot/        # Telegram bridge
  self_tdd/            # Continuous test watchdog
tests/                 # One test_*.py per module, all mocked
static/                # style.css + bg.svg (arcane theme)
docs/                  # GitHub Pages site
2. Non-Negotiable Rules
- Tests must stay green. Run `make test` before committing.
- No cloud dependencies. All AI computation runs on localhost.
- No new top-level files without purpose. Don't litter the root directory.
- Follow existing patterns — singletons, graceful degradation, pydantic-settings config.
- Security defaults: Never hard-code secrets. Warn at startup when defaults are in use.
- XSS prevention: Never use `innerHTML` with untrusted content.
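The security-defaults rule can be sketched as below. This is a minimal illustration rather than the project's actual startup code: the helper `load_secret`, the placeholder value `change-me`, and the variable name in the usage line are all hypothetical.

```python
import logging
import os

logger = logging.getLogger("startup")

# Hypothetical placeholder; real defaults live in the project's config.
DEFAULT_SECRET = "change-me"


def load_secret(name: str, default: str = DEFAULT_SECRET) -> str:
    """Read a secret from the environment and warn if the default is in use."""
    value = os.environ.get(name, default)
    if value == default:
        # Warn loudly at startup instead of silently running with a known value.
        logger.warning("%s is using its default value; set it in .env", name)
    return value
```

Calling `load_secret("SOME_SECRET")` once at startup keeps the secret out of the source tree while still making an unconfigured deployment visible in the logs.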
3. Agent Roster
Agents are divided into two tiers: Builders generate code and features; Reviewers provide quality gates, feedback, and hardening. The Local agent is the primary workhorse — use it as much as possible to minimise cost.
🏗️ BUILD TIER
Local — Ollama (primary workhorse)
Model: Any — qwen2.5-coder, deepseek-coder-v2, codellama, or whatever
is loaded in Ollama. The owner decides the model; this agent is unrestricted.
Cost: Free. Runs on the host machine.
Best for:
- Everything. This is the default agent for all coding tasks.
- Iterative development, fast feedback loops, bulk generation
- Running as a Docker swarm worker — scales horizontally at zero marginal cost
- Experimenting with new models without changing any other code
Conventions to follow:
- Communicate with the coordinator over HTTP (`COORDINATOR_URL` env var)
- Register capabilities honestly so the auction system routes tasks well
- Write tests for anything non-trivial
No restrictions. If a model can do it, do it.
Kimi (Moonshot AI)
Model: Moonshot large-context models. Cost: Paid API.
Best for:
- Large context feature drops (new pages, new subsystems, new agent personas)
- Implementing roadmap items that require reading many files at once
- Generating boilerplate for new agents (Echo, Mace, Helm, Seer, Forge, Quill)
Conventions to follow:
- Deliver working code with accompanying tests (even if minimal)
- Match the arcane CSS theme by extending `static/style.css`
- New agents follow the `SwarmNode` + `Registry` + Docker pattern
- Lightning-gated endpoints follow the L402 pattern in `src/timmy_serve/l402_proxy.py`
Avoid:
- Touching CI/CD or pyproject.toml without coordinating
- Adding cloud API calls
- Removing existing tests
DeepSeek (DeepSeek API)
Model: deepseek-chat (V3) or deepseek-reasoner (R1).
Cost: Near-free (~$0.14/M tokens).
Best for:
- Second-opinion feature generation when Kimi is busy or context is smaller
- Large refactors with reasoning traces (use R1 for hard problems)
- Code review passes before merging Kimi PRs
- Anything that doesn't need a frontier model but benefits from strong reasoning
Conventions to follow:
- Same conventions as Kimi
- Prefer V3 for straightforward tasks; R1 for anything requiring multi-step logic
- Submit PRs for review by Claude before merging
Avoid:
- Bypassing the review tier for security-sensitive modules
- Touching `src/swarm/coordinator.py` without Claude review
🔍 REVIEW TIER
Claude (Anthropic)
Model: Claude Sonnet. Cost: Paid API.
Best for:
- Architecture decisions and code-quality review
- Writing and fixing tests; keeping coverage green
- Updating documentation (README, AGENTS.md, inline comments)
- CI/CD, tooling, Docker infrastructure
- Debugging tricky async or import issues
- Reviewing PRs from Local, Kimi, and DeepSeek before merge
Conventions to follow:
- Prefer editing existing files over creating new ones
- Keep route files thin — business logic lives in the module, not the route
- Use `from config import settings` for all env-var access
- New routes go in `src/dashboard/routes/`, registered in `app.py`
- Always add a corresponding `tests/test_<module>.py`
Avoid:
- Large one-shot feature dumps (use Local or Kimi)
- Touching `src/swarm/coordinator.py` for security work (that's Manus's lane)
Gemini (Google)
Model: Gemini 2.0 Flash (free tier) or Pro. Cost: Free tier generous; upgrade only if needed.
Best for:
- Documentation, README updates, inline docstrings
- Frontend polish — HTML templates, CSS, accessibility review
- Boilerplate generation (test stubs, config files, GitHub Actions)
- Summarising large diffs for human review
Conventions to follow:
- Submit changes as PRs; always include a plain-English summary of what changed
- For CSS changes, test at mobile breakpoint (≤768px) before submitting
- Never modify Python business logic without Claude review
Avoid:
- Security-sensitive modules (that's Manus's lane)
- Changing auction or payment logic
- Large Python refactors
Manus AI
Strengths: Precision security work, targeted bug fixes, coverage gap analysis.
Best for:
- Security audits (XSS, injection, secret exposure)
- Closing test coverage gaps for existing modules
- Performance profiling of specific endpoints
- Validating L402/Lightning payment flows
Conventions to follow:
- Scope tightly — one security issue per PR
- Every security fix must have a regression test
- Use `pytest-cov` output to identify gaps before writing new tests
- Document the vulnerability class in the PR description
Avoid:
- Large-scale refactors (that's Claude's lane)
- New feature work (use Local or Kimi)
- Changing agent personas or prompt content
4. Docker — Running Agents as Containers
Each agent can run as an isolated Docker container. Containers share the
data/ volume for SQLite and communicate with the coordinator over HTTP.
make docker-build # build the image
make docker-up # start dashboard + deps
make docker-agent # spawn one agent worker (LOCAL model)
make docker-down # stop everything
make docker-logs # tail all service logs
How container agents communicate
Container agents cannot use the in-memory SwarmComms channel. Instead they
poll the coordinator's internal HTTP API:
GET /internal/tasks → list tasks open for bidding
POST /internal/bids → submit a bid
Set COORDINATOR_URL=http://dashboard:8000 in the container environment
(docker-compose sets this automatically).
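A container agent's side of this exchange might look like the sketch below, using only the standard library. The two endpoint paths and the `COORDINATOR_URL` variable come from this section; the bid payload fields (`task_id`, `agent_id`, `bid_sats`) and the helper names are assumptions, so check `swarm_internal` for the real schema.

```python
import json
import os
import urllib.request

# docker-compose sets COORDINATOR_URL; the fallback mirrors the value above.
COORDINATOR_URL = os.environ.get("COORDINATOR_URL", "http://dashboard:8000")


def build_list_tasks_request(base_url: str = COORDINATOR_URL) -> urllib.request.Request:
    """Build the GET request that lists tasks open for bidding."""
    return urllib.request.Request(f"{base_url}/internal/tasks", method="GET")


def build_bid_request(task_id: str, agent_id: str, bid_sats: int,
                      base_url: str = COORDINATOR_URL) -> urllib.request.Request:
    """Build the POST request that submits a bid (payload fields are assumed)."""
    body = json.dumps(
        {"task_id": task_id, "agent_id": agent_id, "bid_sats": bid_sats}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/internal/bids",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0):
    """Send a request to the coordinator and decode its JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the polling loop easy to unit-test without a running coordinator.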
Spawning a container agent from Python
from swarm.docker_runner import DockerAgentRunner
runner = DockerAgentRunner(coordinator_url="http://dashboard:8000")
info = runner.spawn("Echo", image="timmy-time:latest")
runner.stop(info["container_id"])
5. Architecture Patterns
Singletons (module-level instances)
from dashboard.store import message_log
from notifications.push import notifier
from ws_manager.handler import ws_manager
from timmy_serve.payment_handler import payment_handler
from swarm.coordinator import coordinator
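Each of the names imported above is an instance created once at module level. A minimal sketch of that pattern follows; the class body is a hypothetical stand-in and not the real `dashboard.store` code.

```python
from dataclasses import dataclass, field


@dataclass
class MessageLog:
    """In-memory chat log (hypothetical stand-in for dashboard.store)."""

    messages: list[dict] = field(default_factory=list)

    def append(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


# Module-level instance: every importer shares this one object,
# because Python caches a module after its first import.
message_log = MessageLog()
```

Importers use `from <module> import message_log` and never construct their own `MessageLog`, which is what keeps the state shared.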
Config access
from config import settings
url = settings.ollama_url # never os.environ.get() directly in route files
HTMX pattern
return templates.TemplateResponse(
    "partials/chat_message.html",
    {"request": request, "role": "user", "content": message},
)
Graceful degradation
try:
    result = await some_optional_service()
except Exception:
    result = fallback_value  # log, don't crash
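A runnable version of the same idea, with the logging step made explicit, might look like this. The names `fetch_advisory` and `advisory_or_fallback` are hypothetical; the failing service is simulated so the fallback path is visible.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def fetch_advisory() -> str:
    """Hypothetical optional service that happens to be unavailable."""
    raise ConnectionError("service offline")


async def advisory_or_fallback(fallback: str = "no advisory available") -> str:
    """Return the service result, or a fallback if the service fails."""
    try:
        return await fetch_advisory()
    except Exception:
        # Log the full traceback, then degrade instead of crashing.
        logger.exception("optional service failed; using fallback")
        return fallback


if __name__ == "__main__":
    print(asyncio.run(advisory_or_fallback()))
```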
Tests
- All heavy deps (`agno`, `airllm`, `pyttsx3`) are stubbed in `tests/conftest.py`
- Use `pytest.fixture` for shared state; prefer function scope
- Use `TestClient` from `fastapi.testclient` for route tests
- No real Ollama required: mock `agent.run()`
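Mocking `agent.run()` can be sketched with `unittest.mock` alone. The function `reply_to` is a hypothetical piece of code under test, and the return value is simplified to a plain string, which may not match what the real agent returns.

```python
from unittest.mock import MagicMock


def reply_to(agent, message: str) -> str:
    """Hypothetical code under test: forwards a message to the agent."""
    return agent.run(message)


def test_reply_uses_agent_run():
    # The mock stands in for a real agent, so no Ollama server is needed.
    agent = MagicMock()
    agent.run.return_value = "mocked reply"

    assert reply_to(agent, "hello") == "mocked reply"
    agent.run.assert_called_once_with("hello")
```

pytest collects `test_reply_uses_agent_run` by its `test_` prefix; the same mock can be handed out from a function-scoped fixture when several tests need it.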
6. Running Locally
make install # create venv + install dev deps
make test # run full test suite
make dev # start dashboard (http://localhost:8000)
make watch # self-TDD watchdog (60s poll)
make test-cov # coverage report
Or with Docker:
make docker-build # build image
make docker-up # start dashboard
make docker-agent # add a Local agent worker
7. Roadmap (v2 → v3)
v2.0.0 — Exodus (in progress)
- Persistent swarm state across restarts
- Docker infrastructure for agent containers
- Implement Echo, Mace, Helm, Seer, Forge, Quill persona agents (+ Pixel, Lyra, Reel)
- MCP tool integration for Timmy
- Real LND gRPC backend for `PaymentHandler` (replace mock)
- Marketplace frontend: wire the `/marketplace` route to real data
v3.0.0 — Revelation (planned)
- Bitcoin Lightning treasury (agent earns and spends sats autonomously)
- Single `.app` bundle for macOS (no Python install required)
- Federation: multiple Timmy instances discover and bid on each other's tasks
- Redis pub/sub replacing SQLite polling for high-throughput swarms
8. File Conventions
| Pattern | Convention |
|---|---|
| New route | src/dashboard/routes/<name>.py + register in app.py |
| New template | src/dashboard/templates/<name>.html extends base.html |
| New partial | src/dashboard/templates/partials/<name>.html |
| New subsystem | src/<name>/ with __init__.py |
| New test file | tests/test_<module>.py |
| Secrets | Read via os.environ.get("VAR", "default") + startup warning if default |
| DB files | .db files go in project root or data/ — never in src/ |
| Docker | One service per agent type in docker-compose.yml |