Merge pull request #58 from AlexanderWhitestone/claude/plan-repo-refactoring-hgskF

This commit is contained in:
Alexander Whitestone
2026-02-26 16:33:11 -05:00
committed by GitHub
145 changed files with 997 additions and 2169 deletions

6
.gitignore vendored

@@ -35,6 +35,12 @@ coverage.xml
htmlcov/
reports/
# Self-modify reports (auto-generated)
data/self_modify_reports/
# Handoff context (session-scoped)
.handoff/
# IDE
.idea/
.vscode/

375
AGENTS.md

@@ -1,342 +1,79 @@
# AGENTS.md — Timmy Time Development Standards for AI Agents
This file is the authoritative reference for any AI agent contributing to
this repository. Read it first. Every time.
Read [`CLAUDE.md`](CLAUDE.md) for architecture patterns and conventions.
---
## 1. Project at a Glance
## Non-Negotiable Rules
**Timmy Time** is a local-first, sovereign AI agent system. No cloud. No telemetry.
Bitcoin Lightning economics baked in.
1. **Tests must stay green.** Run `make test` before committing.
2. **No cloud dependencies.** All AI computation runs on localhost.
3. **No new top-level files without purpose.** Don't litter the root directory.
4. **Follow existing patterns** — singletons, graceful degradation, pydantic-settings.
5. **Security defaults:** Never hard-code secrets.
6. **XSS prevention:** Never use `innerHTML` with untrusted content.
| Thing | Value |
|------------------|----------------------------------------------------|
| Language | Python 3.11+ |
| Web framework | FastAPI + Jinja2 + HTMX |
| Agent framework | Agno (wraps Ollama or AirLLM) |
| Persistence | SQLite (`timmy.db`, `data/swarm.db`) |
| Tests | pytest — must stay green |
| Entry points | `timmy`, `timmy-serve`, `self-tdd` |
| Config | pydantic-settings, reads `.env` |
| Containers | Docker — each agent can run as an isolated service |
---
## Agent Roster
### Build Tier
**Local (Ollama)** — Primary workhorse. Free. Unrestricted.
Best for: everything, iterative dev, Docker swarm workers.
**Kimi (Moonshot)** — Paid. Large-context feature drops, new subsystems, persona agents.
Avoid: touching CI/pyproject.toml, adding cloud calls, removing tests.
**DeepSeek** — Near-free. Second-opinion generation, large refactors (R1 for hard problems).
Avoid: bypassing review tier for security modules.
### Review Tier
**Claude (Anthropic)** — Architecture, tests, docs, CI/CD, PR review.
Avoid: large one-shot feature dumps.
**Gemini (Google)** — Docs, frontend polish, boilerplate, diff summaries.
Avoid: security modules, Python business logic without Claude review.
**Manus AI** — Security audits, coverage gaps, L402 validation.
Avoid: large refactors, new features, prompt changes.
---
## Docker Agents
Container agents poll the coordinator's HTTP API (not in-memory `SwarmComms`):
```
src/
config.py # Central settings (OLLAMA_URL, DEBUG, etc.)
timmy/ # Core agent: agent.py, backends.py, cli.py, prompts.py
dashboard/ # FastAPI app + routes + Jinja2 templates
app.py
store.py # In-memory MessageLog singleton
routes/ # agents, health, swarm, swarm_ws, marketplace,
│ # mobile, mobile_test, voice, voice_enhanced,
│ # swarm_internal (HTTP API for Docker agents)
templates/ # base.html + page templates + partials/
swarm/ # Multi-agent coordinator, registry, bidder, tasks, comms
docker_runner.py # Spawn agents as Docker containers
timmy_serve/ # L402 Lightning proxy, payment handler, TTS, CLI
spark/ # Intelligence engine — events, predictions, advisory
creative/ # Creative director + video assembler pipeline
tools/ # Git, image, music, video tools for persona agents
lightning/ # Lightning backend abstraction (mock + LND)
agent_core/ # Substrate-agnostic agent interface
voice/ # NLU intent detection (regex-based, no cloud)
ws_manager/ # WebSocket manager (ws_manager singleton)
notifications/ # Push notification store (notifier singleton)
shortcuts/ # Siri Shortcuts API endpoints
telegram_bot/ # Telegram bridge
self_tdd/ # Continuous test watchdog
tests/ # One test_*.py per module, all mocked
static/ # style.css + bg.svg (arcane theme)
docs/                # GitHub Pages site
```
```
GET  /internal/tasks  → list tasks open for bidding
POST /internal/bids   → submit a bid
```
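A container-side client for these two endpoints might look like the sketch below. The payload shapes (`task_id`, `amount_sats`) are illustrative assumptions, not taken from the repository; only the endpoint paths and the `COORDINATOR_URL` variable come from this document.

```python
import json
import os
import urllib.request

# docker-compose sets COORDINATOR_URL=http://dashboard:8000 in the container env
COORDINATOR_URL = os.environ.get("COORDINATOR_URL", "http://dashboard:8000")

def fetch_open_tasks(base_url=COORDINATOR_URL, opener=urllib.request.urlopen):
    """GET /internal/tasks: list tasks currently open for bidding."""
    with opener(f"{base_url}/internal/tasks") as resp:
        return json.loads(resp.read())

def submit_bid(task_id, amount_sats, base_url=COORDINATOR_URL,
               opener=urllib.request.urlopen):
    """POST /internal/bids: submit a bid (field names are illustrative)."""
    body = json.dumps({"task_id": task_id, "amount_sats": amount_sats}).encode()
    request = urllib.request.Request(
        f"{base_url}/internal/bids", data=body,
        headers={"Content-Type": "application/json"},
    )
    with opener(request) as resp:
        return json.loads(resp.read())
```

The `opener` parameter is injected only so the functions can be exercised without a live coordinator.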
---
## 2. Non-Negotiable Rules
1. **Tests must stay green.** Run `make test` before committing.
2. **No cloud dependencies.** All AI computation runs on localhost.
3. **No new top-level files without purpose.** Don't litter the root directory.
4. **Follow existing patterns** — singletons, graceful degradation, pydantic-settings config.
5. **Security defaults:** Never hard-code secrets. Warn at startup when defaults are in use.
6. **XSS prevention:** Never use `innerHTML` with untrusted content.
---
## 3. Agent Roster
Agents are divided into two tiers: **Builders** generate code and features;
**Reviewers** provide quality gates, feedback, and hardening. The Local agent
is the primary workhorse — use it as much as possible to minimise cost.
---
### 🏗️ BUILD TIER
---
### Local — Ollama (primary workhorse)
**Model:** Any — `qwen2.5-coder`, `deepseek-coder-v2`, `codellama`, or whatever
is loaded in Ollama. The owner decides the model; this agent is unrestricted.
**Cost:** Free. Runs on the host machine.
**Best for:**
- Everything. This is the default agent for all coding tasks.
- Iterative development, fast feedback loops, bulk generation
- Running as a Docker swarm worker — scales horizontally at zero marginal cost
- Experimenting with new models without changing any other code
**Conventions to follow:**
- Communicate with the coordinator over HTTP (`COORDINATOR_URL` env var)
- Register capabilities honestly so the auction system routes tasks well
- Write tests for anything non-trivial
**No restrictions.** If a model can do it, do it.
---
### Kimi (Moonshot AI)
**Model:** Moonshot large-context models.
**Cost:** Paid API.
**Best for:**
- Large context feature drops (new pages, new subsystems, new agent personas)
- Implementing roadmap items that require reading many files at once
- Generating boilerplate for new agents (Echo, Mace, Helm, Seer, Forge, Quill)
**Conventions to follow:**
- Deliver working code with accompanying tests (even if minimal)
- Match the arcane CSS theme — extend `static/style.css`
- New agents follow the `SwarmNode` + `Registry` + Docker pattern
- Lightning-gated endpoints follow the L402 pattern in `src/timmy_serve/l402_proxy.py`
**Avoid:**
- Touching CI/CD or pyproject.toml without coordinating
- Adding cloud API calls
- Removing existing tests
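For orientation, the L402 pattern referenced above pairs an HTTP 402 challenge (macaroon plus Lightning invoice) with a macaroon-plus-preimage credential on retry. A minimal header-level sketch, with helper names that are assumptions rather than the actual functions in `src/timmy_serve/l402_proxy.py`:

```python
def l402_challenge(macaroon: str, invoice: str) -> dict:
    """Headers for a 402 response: the client pays the invoice, then retries."""
    return {"WWW-Authenticate": f'L402 macaroon="{macaroon}", invoice="{invoice}"'}

def parse_l402_authorization(header: str):
    """Parse 'L402 <macaroon>:<preimage>' from an Authorization header.

    Returns (macaroon, preimage), or None if the header is not an L402 credential.
    """
    scheme, _, credentials = header.partition(" ")
    if scheme != "L402" or ":" not in credentials:
        return None
    macaroon, _, preimage = credentials.partition(":")
    return macaroon, preimage
```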
---
### DeepSeek (DeepSeek API)
**Model:** `deepseek-chat` (V3) or `deepseek-reasoner` (R1).
**Cost:** Near-free (~$0.14/M tokens).
**Best for:**
- Second-opinion feature generation when Kimi is busy or context is smaller
- Large refactors with reasoning traces (use R1 for hard problems)
- Code review passes before merging Kimi PRs
- Anything that doesn't need a frontier model but benefits from strong reasoning
**Conventions to follow:**
- Same conventions as Kimi
- Prefer V3 for straightforward tasks; R1 for anything requiring multi-step logic
- Submit PRs for review by Claude before merging
**Avoid:**
- Bypassing the review tier for security-sensitive modules
- Touching `src/swarm/coordinator.py` without Claude review
---
### 🔍 REVIEW TIER
---
### Claude (Anthropic)
**Model:** Claude Sonnet.
**Cost:** Paid API.
**Best for:**
- Architecture decisions and code-quality review
- Writing and fixing tests; keeping coverage green
- Updating documentation (README, AGENTS.md, inline comments)
- CI/CD, tooling, Docker infrastructure
- Debugging tricky async or import issues
- Reviewing PRs from Local, Kimi, and DeepSeek before merge
**Conventions to follow:**
- Prefer editing existing files over creating new ones
- Keep route files thin — business logic lives in the module, not the route
- Use `from config import settings` for all env-var access
- New routes go in `src/dashboard/routes/`, registered in `app.py`
- Always add a corresponding `tests/test_<module>.py`
**Avoid:**
- Large one-shot feature dumps (use Local or Kimi)
- Touching `src/swarm/coordinator.py` for security work (that's Manus's lane)
---
### Gemini (Google)
**Model:** Gemini 2.0 Flash (free tier) or Pro.
**Cost:** Free tier generous; upgrade only if needed.
**Best for:**
- Documentation, README updates, inline docstrings
- Frontend polish — HTML templates, CSS, accessibility review
- Boilerplate generation (test stubs, config files, GitHub Actions)
- Summarising large diffs for human review
**Conventions to follow:**
- Submit changes as PRs; always include a plain-English summary of what changed
- For CSS changes, test at mobile breakpoint (≤768px) before submitting
- Never modify Python business logic without Claude review
**Avoid:**
- Security-sensitive modules (that's Manus's lane)
- Changing auction or payment logic
- Large Python refactors
---
### Manus AI
**Strengths:** Precision security work, targeted bug fixes, coverage gap analysis.
**Best for:**
- Security audits (XSS, injection, secret exposure)
- Closing test coverage gaps for existing modules
- Performance profiling of specific endpoints
- Validating L402/Lightning payment flows
**Conventions to follow:**
- Scope tightly — one security issue per PR
- Every security fix must have a regression test
- Use `pytest-cov` output to identify gaps before writing new tests
- Document the vulnerability class in the PR description
**Avoid:**
- Large-scale refactors (that's Claude's lane)
- New feature work (use Local or Kimi)
- Changing agent personas or prompt content
---
## 4. Docker — Running Agents as Containers
Each agent can run as an isolated Docker container. Containers share the
`data/` volume for SQLite and communicate with the coordinator over HTTP.
`COORDINATOR_URL=http://dashboard:8000` is set by docker-compose.
```bash
make docker-build # build the image
make docker-up # start dashboard + deps
make docker-agent # spawn one agent worker (LOCAL model)
make docker-down # stop everything
make docker-logs # tail all service logs
```
### How container agents communicate
Container agents cannot use the in-memory `SwarmComms` channel. Instead they
poll the coordinator's internal HTTP API:
```
GET /internal/tasks → list tasks open for bidding
POST /internal/bids → submit a bid
```
Set `COORDINATOR_URL=http://dashboard:8000` in the container environment
(docker-compose sets this automatically).
### Spawning a container agent from Python
```python
from swarm.docker_runner import DockerAgentRunner
runner = DockerAgentRunner(coordinator_url="http://dashboard:8000")
info = runner.spawn("Echo", image="timmy-time:latest")
runner.stop(info["container_id"])
```
---
## 5. Architecture Patterns
### Singletons (module-level instances)
```python
from dashboard.store import message_log
from notifications.push import notifier
from ws_manager.handler import ws_manager
from timmy_serve.payment_handler import payment_handler
from swarm.coordinator import coordinator
```
### Config access
```python
from config import settings
url = settings.ollama_url # never os.environ.get() directly in route files
```
### HTMX pattern
```python
return templates.TemplateResponse(
"partials/chat_message.html",
{"request": request, "role": "user", "content": message}
)
```
### Graceful degradation
```python
try:
result = await some_optional_service()
except Exception:
result = fallback_value # log, don't crash
```
### Tests
- All heavy deps (`agno`, `airllm`, `pyttsx3`) are stubbed in `tests/conftest.py`
- Use `pytest.fixture` for shared state; prefer function scope
- Use `TestClient` from `fastapi.testclient` for route tests
- No real Ollama required — mock `agent.run()`
---
## 6. Running Locally
```bash
make install # create venv + install dev deps
make test # run full test suite
make dev # start dashboard (http://localhost:8000)
make watch # self-TDD watchdog (60s poll)
make test-cov # coverage report
```
Or with Docker:
```bash
make docker-build # build image
make docker-up # start dashboard
make docker-agent # add a Local agent worker
```
---
## 7. Roadmap (v2 → v3)
**v2.0.0 — Exodus (in progress)**
- [x] Persistent swarm state across restarts
- [x] Docker infrastructure for agent containers
- [x] Implement Echo, Mace, Helm, Seer, Forge, Quill persona agents (+ Pixel, Lyra, Reel)
- [x] MCP tool integration for Timmy
- [ ] Real LND gRPC backend for `PaymentHandler` (replace mock)
- [ ] Marketplace frontend — wire `/marketplace` route to real data
**v3.0.0 — Revelation (planned)**
- [ ] Bitcoin Lightning treasury (agent earns and spends sats autonomously)
- [ ] Single `.app` bundle for macOS (no Python install required)
- [ ] Federation — multiple Timmy instances discover and bid on each other's tasks
- [ ] Redis pub/sub replacing SQLite polling for high-throughput swarms
---
## 8. File Conventions
## File Conventions
| Pattern | Convention |
|---------|-----------|
| New route | `src/dashboard/routes/<name>.py` + register in `app.py` |
| New template | `src/dashboard/templates/<name>.html` extends `base.html` |
| New partial | `src/dashboard/templates/partials/<name>.html` |
| New subsystem | `src/<name>/` with `__init__.py` |
| New test file | `tests/test_<module>.py` |
| Secrets | Read via `os.environ.get("VAR", "default")` + startup warning if default |
| DB files | `.db` files go in project root or `data/` — never in `src/` |
| Docker | One service per agent type in `docker-compose.yml` |
| New test | `tests/test_<module>.py` |
| Secrets | Via `config.settings` + startup warning if default |
| DB files | Project root or `data/` — never in `src/` |
---
## Roadmap
**v2.0 Exodus (in progress):** Swarm + L402 + Voice + Marketplace + Hands
**v3.0 Revelation (planned):** Lightning treasury + `.app` bundle + federation

227
CLAUDE.md

@@ -1,77 +1,9 @@
# CLAUDE.md — AI Assistant Guide for Timmy Time
This file provides context for AI assistants (Claude Code, Copilot, etc.)
working in this repository. Read this before making any changes.
**Tech stack:** Python 3.11+ · FastAPI · Jinja2 + HTMX · SQLite · Agno ·
Ollama · pydantic-settings · WebSockets · Docker
For multi-agent development standards and agent-specific conventions, see
[`AGENTS.md`](AGENTS.md).
---
## Project Summary
**Timmy Time** is a local-first, sovereign AI agent system with a browser-based
Mission Control dashboard. No cloud AI — all inference runs on localhost via
Ollama (or AirLLM for large models). Bitcoin Lightning economics are built in
for API access gating.
**Tech stack:** Python 3.11+ · FastAPI · Jinja2 + HTMX · SQLite · Agno (agent
framework) · Ollama · pydantic-settings · WebSockets · Docker
---
## Quick Reference Commands
```bash
# Setup
make install # Create venv + install dev deps
cp .env.example .env # Configure environment
# Development
make dev # Start dashboard at http://localhost:8000
make test # Run full test suite (no Ollama needed)
make test-cov # Tests + coverage report (terminal + XML)
make lint # Run ruff or flake8
# Docker
make docker-build # Build timmy-time:latest image
make docker-up # Start dashboard container
make docker-agent # Spawn one agent worker
make docker-down # Stop all containers
```
---
## Project Layout
```
src/
config.py # Central pydantic-settings (all env vars)
timmy/ # Core agent: agent.py, backends.py, cli.py, prompts.py
dashboard/ # FastAPI app + routes + Jinja2 templates
app.py # App factory, lifespan, router registration
store.py # In-memory MessageLog singleton
routes/ # One file per route group (agents, health, swarm, etc.)
templates/ # base.html + page templates + partials/
swarm/ # Multi-agent coordinator, registry, bidder, tasks, comms
coordinator.py # Central swarm orchestrator (security-sensitive)
docker_runner.py # Spawn agents as Docker containers
timmy_serve/ # L402 Lightning proxy, payment handler, TTS, CLI
spark/ # Intelligence engine — events, predictions, advisory
creative/ # Creative director + video assembler pipeline
tools/ # Git, image, music, video tools for persona agents
lightning/ # Lightning backend abstraction (mock + LND)
agent_core/ # Substrate-agnostic agent interface
voice/ # NLU intent detection (regex-based, local)
ws_manager/ # WebSocket connection manager (ws_manager singleton)
notifications/ # Push notification store (notifier singleton)
shortcuts/ # Siri Shortcuts API endpoints
telegram_bot/ # Telegram bridge
self_tdd/ # Continuous test watchdog
tests/ # One test_*.py per module, all mocked
static/ # style.css + bg.svg (dark arcane theme)
docs/ # GitHub Pages landing site
```
For agent roster and conventions, see [`AGENTS.md`](AGENTS.md).
---
@@ -79,32 +11,22 @@ docs/ # GitHub Pages landing site
### Config access
All configuration goes through `src/config.py` using pydantic-settings:
```python
from config import settings
url = settings.ollama_url # never use os.environ.get() directly in app code
```
Environment variables are read from `.env` automatically. See `.env.example` for
all available settings.
### Singletons
Core services are module-level singleton instances imported directly:
```python
from dashboard.store import message_log
from notifications.push import notifier
from ws_manager.handler import ws_manager
from timmy_serve.payment_handler import payment_handler
from swarm.coordinator import coordinator
```
### HTMX response pattern
Routes return Jinja2 template partials for HTMX requests:
```python
return templates.TemplateResponse(
"partials/chat_message.html",
@@ -115,147 +37,41 @@ return templates.TemplateResponse(
### Graceful degradation
Optional services (Ollama, Redis, AirLLM) degrade gracefully — log the error,
return a fallback, never crash:
```python
try:
result = await some_optional_service()
except Exception:
result = fallback_value
```
return a fallback, never crash.
### Route registration
New routes go in `src/dashboard/routes/<name>.py`, then register the router in
`src/dashboard/app.py`:
```python
from dashboard.routes.<name> import router as <name>_router
app.include_router(<name>_router)
```
New routes: `src/dashboard/routes/<name>.py`, registered in `src/dashboard/app.py`.
---
## Testing
### Running tests
```bash
make test # Quick run (pytest -q --tb=short)
make test # Quick run (no Ollama needed)
make test-cov # With coverage (term-missing + XML)
make test-cov-html # With HTML coverage report
```
No Ollama or external services needed — all heavy dependencies are mocked.
### Test conventions
- **One test file per module:** `tests/test_<module>.py`
- **Stubs in conftest:** `agno`, `airllm`, `pyttsx3`, `telegram` are stubbed in
`tests/conftest.py` using `sys.modules.setdefault()` so tests run without
those packages installed
- **Test mode:** `TIMMY_TEST_MODE=1` is set automatically in conftest to disable
auto-spawning of persona agents during tests
- **FastAPI testing:** Use the `client` fixture (wraps `TestClient`)
- **Database isolation:** SQLite files in `data/` are cleaned between tests;
coordinator state is reset via autouse fixtures
- **Async:** `asyncio_mode = "auto"` in pytest config — async test functions
are detected automatically
- **Coverage threshold:** CI fails if coverage drops below 60%
(`fail_under = 60` in `pyproject.toml`)
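The `sys.modules.setdefault()` stubbing described above amounts to registering empty placeholder modules before anything under `src/` is imported. A sketch of the pattern:

```python
# tests/conftest.py (sketch): runs before any src import
import os
import sys
import types

# Register empty placeholder modules so `import agno` etc. succeed even when
# the real packages are absent; setdefault keeps a real install if present.
for name in ("agno", "airllm", "pyttsx3", "telegram", "discord"):
    sys.modules.setdefault(name, types.ModuleType(name))

# Disable persona auto-spawn during tests.
os.environ.setdefault("TIMMY_TEST_MODE", "1")
```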
### Adding a new test
```python
# tests/test_my_feature.py
from fastapi.testclient import TestClient
def test_my_endpoint(client):
response = client.get("/my-endpoint")
assert response.status_code == 200
```
---
## CI/CD
GitHub Actions workflow (`.github/workflows/tests.yml`):
- Runs on every push and pull request to all branches
- Python 3.11, installs `.[dev]` dependencies
- Runs pytest with coverage + JUnit XML output
- Publishes test results as PR comments and check annotations
- Uploads coverage XML as a downloadable artifact (14-day retention)
- **Stubs in conftest:** `agno`, `airllm`, `pyttsx3`, `telegram`, `discord`
stubbed via `sys.modules.setdefault()` — tests run without those packages
- **Test mode:** `TIMMY_TEST_MODE=1` set automatically in conftest
- **FastAPI testing:** Use the `client` fixture
- **Async:** `asyncio_mode = "auto"` — async tests detected automatically
- **Coverage threshold:** 60% (`fail_under` in `pyproject.toml`)
---
## Key Conventions
1. **Tests must stay green.** Run `make test` before committing.
2. **No cloud AI dependencies.** All inference runs on localhost.
3. **No new top-level files without purpose.** Keep the root directory clean.
4. **Follow existing patterns** — singletons, graceful degradation,
pydantic-settings config.
5. **Security defaults:** Never hard-code secrets. Warn at startup when using
default values.
2. **No cloud AI dependencies.** All inference on localhost.
3. **Keep the root directory clean.** No new top-level files without purpose.
4. **Follow existing patterns** — singletons, graceful degradation, pydantic config.
5. **Security defaults:** Never hard-code secrets.
6. **XSS prevention:** Never use `innerHTML` with untrusted content.
7. **Keep routes thin** — business logic lives in the module, not the route.
8. **Prefer editing existing files** over creating new ones.
9. **Use `from config import settings`** for all env-var access.
10. **Every new module gets a test:** `tests/test_<module>.py`.
---
## Entry Points
Three CLI commands are installed via `pyproject.toml`:
| Command | Module | Purpose |
|---------|--------|---------|
| `timmy` | `src/timmy/cli.py` | Chat, think, status commands |
| `timmy-serve` | `src/timmy_serve/cli.py` | L402-gated API server (port 8402) |
| `self-tdd` | `src/self_tdd/watchdog.py` | Continuous test watchdog |
---
## Environment Variables
Key variables (full list in `.env.example`):
| Variable | Default | Purpose |
|----------|---------|---------|
| `OLLAMA_URL` | `http://localhost:11434` | Ollama host |
| `OLLAMA_MODEL` | `llama3.2` | Model served by Ollama |
| `DEBUG` | `false` | Enable `/docs` and `/redoc` |
| `TIMMY_MODEL_BACKEND` | `ollama` | `ollama` / `airllm` / `auto` |
| `AIRLLM_MODEL_SIZE` | `70b` | `8b` / `70b` / `405b` |
| `L402_HMAC_SECRET` | *(change in prod)* | HMAC signing for invoices |
| `L402_MACAROON_SECRET` | *(change in prod)* | Macaroon signing |
| `LIGHTNING_BACKEND` | `mock` | `mock` / `lnd` |
| `SPARK_ENABLED` | `true` | Enable Spark intelligence engine |
| `TELEGRAM_TOKEN` | *(empty)* | Telegram bot token |
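The real `src/config.py` uses pydantic-settings; the sketch below is a dependency-free stand-in for the same idea, with defaults mirroring the table above. The field names are assumptions about the actual `Settings` class:

```python
import os
from dataclasses import dataclass, field

def _env_bool(var: str, default: bool) -> bool:
    """Parse a boolean env var ('1', 'true', 'yes' count as True)."""
    return os.environ.get(var, str(default)).strip().lower() in ("1", "true", "yes")

@dataclass
class Settings:
    ollama_url: str = field(
        default_factory=lambda: os.environ.get("OLLAMA_URL", "http://localhost:11434"))
    ollama_model: str = field(
        default_factory=lambda: os.environ.get("OLLAMA_MODEL", "llama3.2"))
    debug: bool = field(default_factory=lambda: _env_bool("DEBUG", False))
    lightning_backend: str = field(
        default_factory=lambda: os.environ.get("LIGHTNING_BACKEND", "mock"))

settings = Settings()  # imported elsewhere as `from config import settings`
```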
---
## Persistence
- `timmy.db` — Agno agent memory (SQLite, project root)
- `data/swarm.db` — Swarm registry + tasks (SQLite, `data/` directory)
- All `.db` files are gitignored — never commit database files
---
## Docker
Containers share a `data/` volume for SQLite. Container agents communicate with
the coordinator over HTTP (not in-memory `SwarmComms`):
```
GET /internal/tasks → list tasks open for bidding
POST /internal/bids → submit a bid
```
`COORDINATOR_URL=http://dashboard:8000` is set automatically by docker-compose.
---
@@ -265,3 +81,14 @@ POST /internal/bids → submit a bid
- `src/timmy_serve/l402_proxy.py` — Lightning payment gating
- `src/lightning/` — payment backend abstraction
- Any file handling secrets or authentication tokens
---
## Entry Points
| Command | Module | Purpose |
|---------|--------|---------|
| `timmy` | `src/timmy/cli.py` | Chat, think, status |
| `timmy-serve` | `src/timmy_serve/cli.py` | L402-gated API server (port 8402) |
| `self-tdd` | `src/self_tdd/watchdog.py` | Continuous test watchdog |
| `self-modify` | `src/self_modify/cli.py` | Self-modification CLI |

MEMORY.md

@@ -7,34 +7,20 @@
## Current Status
**Agent State:** Operational
**Mode:** Development
**Model:** llama3.2 (local via Ollama)
**Backend:** Ollama on localhost:11434
**Dashboard:** http://localhost:8000
**Agent State:** Operational
**Mode:** Development
**Active Tasks:** 0
**Pending Decisions:** None
---
## Standing Rules
1. **Sovereignty First** — No cloud AI dependencies
1. **Sovereignty First** — No cloud dependencies
2. **Local-Only Inference** — Ollama on localhost
3. **Privacy by Design** — Telemetry disabled
4. **Tool Minimalism** — Use tools only when necessary
5. **Memory Discipline** — Write handoffs at session end
6. **Clean Output** — Never show JSON, tool calls, or function syntax
---
## System Architecture
**Memory Tiers:**
- Tier 1 (Hot): This file (MEMORY.md) — always in context
- Tier 2 (Vault): memory/ directory — notes, profiles, AARs
- Tier 3 (Semantic): Vector search over vault content
**Swarm Agents:** Echo (research), Forge (code), Seer (data)
**Dashboard Pages:** Briefing, Swarm, Spark, Market, Tools, Events, Ledger, Memory, Router, Upgrades, Creative
---
@@ -42,16 +28,13 @@
| Agent | Role | Status |
|-------|------|--------|
| Timmy | Core AI | Active |
| Echo | Research & Summarization | Active |
| Forge | Coding & Debugging | Active |
| Seer | Analytics & Prediction | Active |
| Timmy | Core | Active |
---
## User Profile
**Name:** (not set)
**Name:** (not set)
**Interests:** (to be learned)
---
@@ -64,8 +47,8 @@
## Pending Actions
- [ ] Learn user's name and preferences
- [ ] Learn user's name
---
*Prune date: 2026-03-25*
*Prune date: 2026-02-25*

310
README.md

@@ -2,184 +2,69 @@
[![Tests](https://github.com/AlexanderWhitestone/Timmy-time-dashboard/actions/workflows/tests.yml/badge.svg)](https://github.com/AlexanderWhitestone/Timmy-time-dashboard/actions/workflows/tests.yml)
A local-first, sovereign AI agent system. Talk to Timmy, watch his swarm, gate API access with Bitcoin Lightning — all from a browser, no cloud AI required.
A local-first, sovereign AI agent system. Talk to Timmy, watch his swarm, gate
API access with Bitcoin Lightning — all from a browser, no cloud AI required.
**[Live Docs →](https://alexanderwhitestone.github.io/Timmy-time-dashboard/)**
---
## What's built
## Quick Start
```bash
git clone https://github.com/AlexanderWhitestone/Timmy-time-dashboard.git
cd Timmy-time-dashboard
make install # create venv + install deps
cp .env.example .env # configure environment
ollama serve # separate terminal
ollama pull llama3.2
make dev # http://localhost:8000
make test # no Ollama needed
```
---
## What's Here
| Subsystem | Description |
|-----------|-------------|
| **Timmy Agent** | Agno-powered agent (Ollama default, AirLLM optional for 70B/405B) |
| **Mission Control** | FastAPI + HTMX dashboard — chat, health, swarm, marketplace |
| **Swarm** | Multi-agent coordinator — spawn agents, post tasks, run Lightning auctions |
| **L402 / Lightning** | Bitcoin Lightning payment gating for API access (mock backend; LND scaffolded) |
| **Spark Intelligence** | Event capture, predictions, memory consolidation, advisory engine |
| **Creative Studio** | Multi-persona creative pipeline — image, music, video generation |
| **Tools** | Git, image, music, and video tools accessible by persona agents |
| **Voice** | NLU intent detection + TTS (pyttsx3, no cloud) |
| **WebSocket** | Real-time swarm live feed |
| **Mobile** | Responsive layout with full iOS safe-area and touch support |
| **Telegram** | Bridge Telegram messages to Timmy |
| **Swarm** | Multi-agent coordinator — spawn agents, post tasks, Lightning auctions |
| **L402 / Lightning** | Bitcoin Lightning payment gating for API access |
| **Spark** | Event capture, predictions, memory consolidation, advisory |
| **Creative Studio** | Multi-persona pipeline — image, music, video generation |
| **Hands** | 6 autonomous scheduled agents — Oracle, Sentinel, Scout, Scribe, Ledger, Weaver |
| **CLI** | `timmy`, `timmy-serve`, `self-tdd` entry points |
| **Self-Coding** | Codebase-aware self-modification with git safety |
| **Integrations** | Telegram bridge, Siri Shortcuts, voice NLU, mobile layout |

**Full test suite, 100% passing.**
---
## Prerequisites
## Commands
```bash
make dev          # start dashboard (http://localhost:8000)
make test         # run all tests
make test-cov     # tests + coverage report
make lint         # run ruff/flake8
make docker-up    # start via Docker
make help         # see all commands
```
**Python 3.11+** required — verify with `python3 --version`; if missing: `brew install python@3.11`
**Ollama** — runs the local LLM
```bash
brew install ollama
# or download from https://ollama.com
```
**CLI tools:** `timmy`, `timmy-serve`, `self-tdd`, `self-modify`
---
## Quickstart
## Documentation
```bash
# 1. Clone
git clone https://github.com/AlexanderWhitestone/Timmy-time-dashboard.git
cd Timmy-time-dashboard
# 2. Install
make install
# or manually: python3 -m venv .venv && source .venv/bin/activate && pip install -e ".[dev]"
# 3. Start Ollama (separate terminal)
ollama serve
ollama pull llama3.2
# 4. Launch dashboard
make dev
# opens at http://localhost:8000
```
---
## Common commands
```bash
make test # run all tests (no Ollama needed)
make test-cov # test + coverage report
make dev # start dashboard (http://localhost:8000)
make watch # self-TDD watchdog (60s poll, alerts on regressions)
```
Or with the bootstrap script (creates venv, tests, watchdog, server in one shot):
```bash
bash activate_self_tdd.sh
bash activate_self_tdd.sh --big-brain # also installs AirLLM
```
---
## CLI
```bash
timmy chat "What is sovereignty?"
timmy think "Bitcoin and self-custody"
timmy status
timmy-serve start # L402-gated API server (port 8402)
timmy-serve invoice # generate a Lightning invoice
timmy-serve status
```
---
## Mobile access
The dashboard is fully mobile-optimized (iOS safe area, 44px touch targets, 16px
input to prevent zoom, momentum scroll).
```bash
# Bind to your local network
uvicorn dashboard.app:app --host 0.0.0.0 --port 8000 --reload
# Find your IP
ipconfig getifaddr en0 # Wi-Fi on macOS
```
Open `http://<your-ip>:8000` on your phone (same Wi-Fi network).
Mobile-specific routes:
- `/mobile` — single-column optimized layout
- `/mobile-test` — 21-scenario HITL test harness (layout, touch, scroll, notch)
---
## Hands — Autonomous Agents
Hands are scheduled, autonomous agents that run on cron schedules. Each Hand has a `HAND.toml` manifest, `SYSTEM.md` prompt, and optional `skills/` directory.
**Built-in Hands:**
| Hand | Schedule | Purpose |
|------|----------|---------|
| **Oracle** | 7am, 7pm UTC | Bitcoin intelligence — price, on-chain, macro analysis |
| **Sentinel** | Every 15 min | System health — dashboard, agents, database, resources |
| **Scout** | Every hour | OSINT monitoring — HN, Reddit, RSS for Bitcoin/sovereign AI |
| **Scribe** | Daily 9am | Content production — blog posts, docs, changelog |
| **Ledger** | Every 6 hours | Treasury tracking — Bitcoin/Lightning balances, payment audit |
| **Weaver** | Sunday 10am | Creative pipeline — orchestrates Pixel+Lyra+Reel for video |
**Dashboard:** `/hands` — manage, trigger, approve actions
**Example HAND.toml:**
```toml
[hand]
name = "oracle"
schedule = "0 7,19 * * *" # Twice daily
enabled = true
[tools]
required = ["mempool_fetch", "price_fetch"]
[approval_gates]
broadcast = { action = "broadcast", description = "Post to dashboard" }
[output]
dashboard = true
channel = "telegram"
```
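The `schedule` field uses standard 5-field cron syntax. A minimal sketch of how a scheduler might match the minute and hour fields (the real Hands scheduler's implementation may differ):

```python
def cron_field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', a list ('7,19'), a step ('*/15'), or a literal."""
    if field == "*":
        return True
    for part in field.split(","):
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches_time(expr: str, minute: int, hour: int) -> bool:
    """Check the minute and hour fields of a 5-field cron expression."""
    fields = expr.split()
    return (cron_field_matches(fields[0], minute)
            and cron_field_matches(fields[1], hour))
```

With Oracle's `"0 7,19 * * *"`, this matches 07:00 and 19:00 and nothing else.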
---
## AirLLM — big brain backend
Run 70B or 405B models locally with no GPU, using AirLLM's layer-by-layer loading.
Apple Silicon uses MLX automatically.
```bash
pip install ".[bigbrain]"
pip install "airllm[mlx]" # Apple Silicon only
timmy chat "Explain self-custody" --backend airllm --model-size 70b
```
Or set once in `.env`:
```bash
TIMMY_MODEL_BACKEND=auto
AIRLLM_MODEL_SIZE=70b
```
| Flag | Parameters | RAM needed |
|-------|-------------|------------|
| `8b` | 8 billion | ~16 GB |
| `70b` | 70 billion | ~140 GB |
| `405b`| 405 billion | ~810 GB |
| Document | Purpose |
|----------|---------|
| [CLAUDE.md](CLAUDE.md) | AI assistant development guide |
| [AGENTS.md](AGENTS.md) | Multi-agent development standards |
| [.env.example](.env.example) | Configuration reference |
| [docs/](docs/) | Architecture docs, ADRs, audits |
---
@@ -187,117 +72,26 @@ AIRLLM_MODEL_SIZE=70b
```bash
cp .env.example .env
# edit .env
```
| Variable | Default | Purpose |
|----------|---------|---------|
| `OLLAMA_URL` | `http://localhost:11434` | Ollama host |
| `OLLAMA_MODEL` | `llama3.2` | Model served by Ollama |
| `DEBUG` | `false` | Enable `/docs` and `/redoc` |
| `TIMMY_MODEL_BACKEND` | `ollama` | `ollama` \| `airllm` \| `auto` |
| `AIRLLM_MODEL_SIZE` | `70b` | `8b` \| `70b` \| `405b` |
| `L402_HMAC_SECRET` | *(default — change in prod)* | HMAC signing key for macaroons |
| `L402_MACAROON_SECRET` | *(default — change in prod)* | Macaroon secret |
| `LIGHTNING_BACKEND` | `mock` | `mock` (production-ready) \| `lnd` (scaffolded, not yet functional) |
---
## Architecture
```
Browser / Phone
│ HTTP + HTMX + WebSocket
┌─────────────────────────────────────────┐
│ FastAPI (dashboard.app) │
│ routes: agents, health, swarm, │
│ marketplace, voice, mobile │
└───┬─────────────┬──────────┬────────────┘
│ │ │
▼ ▼ ▼
Jinja2 Timmy Swarm
Templates Agent Coordinator
(HTMX) │ ├─ Registry (SQLite)
├─ Ollama ├─ AuctionManager (L402 bids)
└─ AirLLM ├─ SwarmComms (Redis / in-memory)
└─ SwarmManager (subprocess)
├── Voice NLU + TTS (pyttsx3, local)
├── WebSocket live feed (ws_manager)
├── L402 Lightning proxy (macaroon + invoice)
├── Push notifications (local + macOS native)
└── Siri Shortcuts API endpoints
Persistence: timmy.db (Agno memory), data/swarm.db (registry + tasks)
External: Ollama :11434, optional Redis, optional LND gRPC
```
---
## Project layout
```
src/
config.py # pydantic-settings — all env vars live here
timmy/ # Core agent (agent.py, backends.py, cli.py, prompts.py)
hands/ # Autonomous scheduled agents (registry, scheduler, runner)
dashboard/ # FastAPI app, routes, Jinja2 templates
swarm/ # Multi-agent: coordinator, registry, bidder, tasks, comms
timmy_serve/ # L402 proxy, payment handler, TTS, serve CLI
spark/ # Intelligence engine — events, predictions, advisory
creative/ # Creative director + video assembler pipeline
tools/ # Git, image, music, video tools for persona agents
lightning/ # Lightning backend abstraction (mock + LND)
agent_core/ # Substrate-agnostic agent interface
voice/ # NLU intent detection
ws_manager/ # WebSocket connection manager
notifications/ # Push notification store
shortcuts/ # Siri Shortcuts endpoints
telegram_bot/ # Telegram bridge
self_tdd/ # Continuous test watchdog
hands/ # Hand manifests — oracle/, sentinel/, etc.
tests/ # one test file per module, all mocked
static/style.css # Dark mission-control theme (JetBrains Mono)
docs/ # GitHub Pages landing page
AGENTS.md # AI agent development standards ← read this
.env.example # Environment variable reference
Makefile # Common dev commands
```
Key variables: `OLLAMA_URL`, `OLLAMA_MODEL`, `TIMMY_MODEL_BACKEND`,
`L402_HMAC_SECRET`, `LIGHTNING_BACKEND`, `DEBUG`. Full list in `.env.example`.
---
## Troubleshooting
**`ollama: command not found`** — install via `brew install ollama` or from ollama.com
**`connection refused` in chat** — run `ollama serve` in a separate terminal
**`ModuleNotFoundError: No module named 'sqlalchemy'`** — re-run install to pick up the updated `agno[sqlite]` dependency:
`make install`
**`ModuleNotFoundError: No module named 'dashboard'`** — activate the venv:
`source .venv/bin/activate && pip install -e ".[dev]"`
**Health panel shows DOWN** — Ollama isn't running; chat still responds but returns the offline error message
**L402 startup warnings** — set `L402_HMAC_SECRET` and `L402_MACAROON_SECRET` in
`.env` to silence them (required for production)
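One way to generate strong values for both secrets, using only the Python stdlib (appends to `.env`; adjust to taste):

```shell
# Generate two random 32-byte hex secrets and append them to .env
python3 -c 'import secrets; print("L402_HMAC_SECRET=" + secrets.token_hex(32))' >> .env
python3 -c 'import secrets; print("L402_MACAROON_SECRET=" + secrets.token_hex(32))' >> .env
```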
---
## For AI agents contributing to this repo
Read [`AGENTS.md`](AGENTS.md). It covers per-agent assignments, architecture
patterns, coding conventions, and the v2→v3 roadmap.
---
## Roadmap
| Version | Name | Status | Milestone |
|---------|------------|-------------|-----------|
| 1.0.0 | Genesis | ✅ Complete | Agno + Ollama + SQLite + Dashboard |
| 2.0.0 | Exodus | 🔄 In progress | Swarm + L402 + Voice + Marketplace + Hands |
| 3.0.0 | Revelation | 📋 Planned | Lightning treasury + single `.app` bundle |

REFACTORING_PLAN.md (new file, 519 lines)

@@ -0,0 +1,519 @@
# Timmy Time — Architectural Refactoring Plan
**Author:** Claude (VP Engineering review)
**Date:** 2026-02-26
**Branch:** `claude/plan-repo-refactoring-hgskF`
---
## Executive Summary
The Timmy Time codebase has grown to **53K lines of Python** across **272
files** (169 source + 103 test), **28 modules** in `src/`, **27 route files**,
**49 templates**, **90 test files**, and **87KB of root-level markdown**. It
works, but it's burning tokens, slowing down test runs, and making it hard to
reason about change impact.
This plan proposes **6 phases** of refactoring, ordered by impact and risk. Each
phase is independently valuable — you can stop after any phase and still be
better off.
---
## The Problems
### 1. Monolith sprawl
28 modules in `src/` with no grouping. Eleven modules aren't even included in
the wheel build (`agents`, `events`, `hands`, `mcp`, `memory`, `router`,
`self_coding`, `task_queue`, `tools`, `upgrades`, `work_orders`). Some are
used by the dashboard routes but forgotten in `pyproject.toml`.
### 2. Dashboard is the gravity well
The dashboard has 27 route files (4,562 lines), 49 templates, and has become
the integration point for everything. Every new feature = new route file + new
template + new test file. This doesn't scale.
### 3. Documentation entropy
10 root-level `.md` files (87KB). README is 303 lines, CLAUDE.md is 267 lines,
AGENTS.md is 342 lines — with massive content duplication between them. Plus
PLAN.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md, MEMORY.md,
IMPLEMENTATION_SUMMARY.md, QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md.
Human eyes glaze over. AI assistants waste tokens reading redundant info.
### 4. Test sprawl — and a skeleton problem
97 test files, 19,600 lines — but **61 of those files (63%) are empty
skeletons** with zero actual test functions. Only 36 files have real tests
containing 471 test functions total. Many "large" test files (like
`test_scripture.py` at 901 lines, `test_router_cascade.py` at 523 lines) are
infrastructure-only — class definitions, imports, fixtures, but no assertions.
The functional/E2E directory (`tests/functional/`) has 7 files and 0 working
tests. Tests are flat in `tests/` with no organization. Running the full suite
means loading every module, every mock, every fixture even when you only
changed one thing.
### 5. Unclear project boundaries
Is this one project or several? The `timmy` CLI, `timmy-serve` API server,
`self-tdd` watchdog, and `self-modify` CLI are four separate entry points that
could be four separate packages. The `creative` extra needs PyTorch. The
`lightning` module is a standalone payment system. These shouldn't live in the
same test run.
### 6. Wheel build doesn't match reality
`pyproject.toml` includes 17 modules but `src/` has 28. The missing 11 modules
are used by code that IS included (dashboard routes import from `hands`,
`mcp`, `memory`, `work_orders`, etc.). The wheel would break at runtime.
### 7. Dependency coupling through dashboard
The dashboard is the hub that imports from 20+ modules. The dependency graph
flows inward: `config` is the foundation (22 modules depend on it), `mcp` is
widely used (12+ importers), `swarm` is referenced by 15+ modules. No true
circular dependencies exist (the `timmy ↔ swarm` relationship uses lazy
imports), but the dashboard pulls in everything, so changing any module can
break the dashboard routes.
### 8. Conftest does too much
`tests/conftest.py` has 4 autouse fixtures that run on **every single test**:
reset message log, reset coordinator state, clean database, cleanup event
loops. Many tests don't need any of these. This adds overhead to the test
suite and couples all tests to the swarm coordinator.
---
## Phase 1: Documentation Cleanup (Low Risk, High Impact)
**Goal:** Cut root markdown from 87KB to ~20KB. Make README human-readable.
Eliminate token waste.
### 1.1 Slim the README
Cut README.md from 303 lines to ~80 lines:
```
# Timmy Time — Mission Control
Local-first sovereign AI agent system. Browser dashboard, Ollama inference,
Bitcoin Lightning economics. No cloud AI.
## Quick Start
make install && make dev → http://localhost:8000
## What's Here
- Timmy Agent (Ollama/AirLLM)
- Mission Control Dashboard (FastAPI + HTMX)
- Swarm Coordinator (multi-agent auctions)
- Lightning Payments (L402 gating)
- Creative Studio (image/music/video)
- Self-Coding (codebase-aware self-modification)
## Commands
make dev / make test / make docker-up / make help
## Documentation
- Development guide: CLAUDE.md
- Architecture: docs/architecture-v2.md
- Agent conventions: AGENTS.md
- Config reference: .env.example
```
### 1.2 De-duplicate CLAUDE.md
Remove content that duplicates README or AGENTS.md. CLAUDE.md should only
contain what AI assistants need that isn't elsewhere:
- Architecture patterns (singletons, config, HTMX, graceful degradation)
- Testing conventions (conftest, fixtures, stubs)
- Security-sensitive areas
- Entry points table
Target: 267 → ~130 lines.
### 1.3 Archive or delete temporary docs
| File | Action |
|------|--------|
| `MEMORY.md` | DELETE — session context, not permanent docs |
| `WORKSET_PLAN.md` | DELETE — use GitHub Issues |
| `WORKSET_PLAN_PHASE2.md` | DELETE — use GitHub Issues |
| `PLAN.md` | MOVE to `docs/PLAN_ARCHIVE.md` |
| `IMPLEMENTATION_SUMMARY.md` | MOVE to `docs/IMPLEMENTATION_ARCHIVE.md` |
| `QUALITY_ANALYSIS.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` |
| `QUALITY_REVIEW_REPORT.md` | CONSOLIDATE with `docs/QUALITY_AUDIT.md` |
**Result:** Root directory goes from 10 `.md` files to 3 (README, CLAUDE,
AGENTS).
### 1.4 Clean up .handoff/
The `.handoff/` directory (CHECKPOINT.md, CONTINUE.md, TODO.md, scripts) is
session-scoped context. Either gitignore it or move to `docs/handoff/`.
---
## Phase 2: Module Consolidation (Medium Risk, High Impact)
**Goal:** Reduce 28 modules to ~12 by merging small, related modules into
coherent packages. This directly reduces cognitive load and token consumption.
### 2.1 Proposed module structure
```
src/
config.py # (keep as-is)
timmy/ # Core agent — MERGE IN agents/, agent_core/, memory/
agent.py # Main Timmy agent
backends.py # Ollama/AirLLM backends
cli.py # CLI entry point
orchestrator.py # ← from agents/timmy.py
personas/ # ← from agents/ (seer, helm, quill, echo, forge)
agent_core/ # ← from src/agent_core/ (becomes subpackage)
memory/ # ← from src/memory/ (becomes subpackage)
prompts.py
...
dashboard/ # Web UI — CONSOLIDATE routes
app.py
store.py
routes/ # See §2.2 for route consolidation
templates/
swarm/ # Multi-agent system — MERGE IN task_queue/, work_orders/
coordinator.py
tasks.py # ← existing + task_queue/ models
work_orders/ # ← from src/work_orders/ (becomes subpackage)
...
integrations/ # NEW — MERGE chat_bridge/, telegram_bot/, shortcuts/
chat_bridge/ # Discord, unified chat
telegram.py # ← from telegram_bot/
shortcuts.py # ← from shortcuts/
voice/ # ← from src/voice/
lightning/ # (keep as-is — standalone, security-sensitive)
self_coding/ # MERGE IN self_modify/, self_tdd/, upgrades/
codebase_indexer.py
git_safety.py
modification_journal.py
self_modify/ # ← from src/self_modify/ (becomes subpackage)
watchdog.py # ← from src/self_tdd/
upgrades/ # ← from src/upgrades/
mcp/ # (keep as-is — used across multiple modules)
spark/ # (keep as-is)
creative/ # MERGE IN tools/
director.py
assembler.py
tools/ # ← from src/tools/ (becomes subpackage)
hands/ # (keep as-is)
scripture/ # (keep as-is — domain-specific)
infrastructure/ # NEW — MERGE ws_manager/, notifications/, events/, router/
ws_manager.py # ← from ws_manager/handler.py (157 lines)
notifications.py # ← from notifications/push.py (153 lines)
events.py # ← from events/ (354 lines)
router/ # ← from src/router/ (cascade LLM router)
```
### 2.2 Dashboard route consolidation
27 route files → ~12 by grouping related routes:
| Current files | Merged into |
|--------------|-------------|
| `agents.py`, `briefing.py` | `agents.py` |
| `swarm.py`, `swarm_internal.py`, `swarm_ws.py` | `swarm.py` |
| `voice.py`, `voice_enhanced.py` | `voice.py` |
| `mobile.py`, `mobile_test.py` | `mobile.py` (delete test page) |
| `self_coding.py`, `self_modify.py` | `self_coding.py` |
| `tasks.py`, `work_orders.py` | `tasks.py` |
`mobile_test.py` (257 lines) is a test page route that's excluded from
coverage — it should not ship in production.
### 2.3 Fix the wheel build
Update `pyproject.toml` `[tool.hatch.build.targets.wheel]` to include all
modules that are actually imported. Currently 11 modules are missing from the
build manifest.
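A sketch of what the corrected manifest could look like (module names taken from the layout in §2.1; verify each entry against actual imports before committing):

```toml
# pyproject.toml — illustrative; confirm every entry is really imported
[tool.hatch.build.targets.wheel]
sources = {"src" = ""}
include = [
    "src/config.py",
    "src/timmy",
    "src/dashboard",
    "src/swarm",
    "src/hands",        # currently missing from the manifest
    "src/mcp",          # currently missing
    "src/memory",       # currently missing
    "src/work_orders",  # currently missing
    # ...one entry per remaining imported package
]
```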
---
## Phase 3: Test Reorganization (Medium Risk, Medium Impact)
**Goal:** Organize tests to match module structure, enable selective test runs,
reduce full-suite runtime.
### 3.1 Mirror source structure in tests
```
tests/
conftest.py # Global fixtures only
timmy/ # Tests for timmy/ module
conftest.py # Timmy-specific fixtures
test_agent.py
test_backends.py
test_cli.py
test_orchestrator.py
test_personas.py
test_memory.py
dashboard/
conftest.py # Dashboard fixtures (client fixture)
test_routes_agents.py
test_routes_swarm.py
...
swarm/
test_coordinator.py
test_tasks.py
test_work_orders.py
integrations/
test_chat_bridge.py
test_telegram.py
test_voice.py
self_coding/
test_git_safety.py
test_codebase_indexer.py
test_self_modify.py
...
```
### 3.2 Add pytest marks for selective execution
```toml
# pyproject.toml
[tool.pytest.ini_options]
markers = [
"unit: Unit tests (fast, no I/O)",
"integration: Integration tests (may use SQLite)",
"dashboard: Dashboard route tests",
"swarm: Swarm coordinator tests",
"slow: Tests that take >1 second",
]
```
Usage:
```bash
make test # Run all tests
pytest -m unit # Fast unit tests only
pytest -m dashboard # Just dashboard tests
pytest tests/swarm/ # Just swarm module tests
pytest -m "not slow" # Skip slow tests
```
### 3.3 Audit and clean skeleton test files
61 test files are empty skeletons — they have imports, class definitions, and
fixture setup but **zero test functions**. These add import overhead and create
a false sense of coverage. For each skeleton file:
1. If the module it tests is stable and well-covered elsewhere → **delete it**
2. If the module genuinely needs tests → **implement the tests** or file an
issue
3. If it's a duplicate (e.g., both `test_swarm.py` and
`test_swarm_integration.py` exist) → **consolidate**
Notable skeletons to address:
- `test_scripture.py` (901 lines, 0 tests) — massive infrastructure, no assertions
- `test_router_cascade.py` (523 lines, 0 tests) — same pattern
- `test_agent_core.py` (457 lines, 0 tests)
- `test_self_modify.py` (451 lines, 0 tests)
- All 7 files in `tests/functional/` (0 working tests)
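Finding the skeletons is mechanical: `grep -L` prints files with *no* match. A self-contained demo on a throwaway directory (in the repo, point it at `tests/` instead):

```shell
# Build one skeleton file (class + imports, no test functions) and one real one
mkdir -p /tmp/skeleton_demo
printf 'import unittest\nclass TestFoo(unittest.TestCase):\n    pass\n' > /tmp/skeleton_demo/test_skeleton.py
printf 'def test_real():\n    assert True\n' > /tmp/skeleton_demo/test_real.py
# -L lists files that do NOT contain a test function definition
grep -rL 'def test_' /tmp/skeleton_demo --include='test_*.py'
# prints /tmp/skeleton_demo/test_skeleton.py
```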
### 3.4 Split genuinely oversized test files
For files that DO have tests but are too large:
- `test_task_queue.py` (560 lines, 30 tests) → split by feature area
- `test_mobile_scenarios.py` (339 lines, 36 tests) → split by scenario group
Rule of thumb: No test file over 400 lines.
---
## Phase 4: Configuration & Build Cleanup (Low Risk, Medium Impact)
### 4.1 Clean up pyproject.toml
- Fix the wheel include list to match actual imports
- Consider whether 4 separate CLI entry points belong in one package
- Add `[project.urls]` for documentation, repository links
- Review dependency pins — some are very loose (`>=1.0.0`)
### 4.2 Consolidate Docker files
4 docker-compose variants (default, dev, prod, test) is a lot. Consider:
- `docker-compose.yml` (base)
- `docker-compose.override.yml` (dev — auto-loaded by Docker)
- `docker-compose.prod.yml` (production only)
### 4.3 Clean up root directory
Non-essential root files to move or delete:
| File | Action |
|------|--------|
| `apply_security_fixes.py` | Move to `scripts/` or delete if one-time |
| `activate_self_tdd.sh` | Move to `scripts/` |
| `coverage.xml` | Gitignore (CI artifact) |
| `data/self_modify_reports/` | Gitignore the contents |
---
## Phase 5: Consider Package Extraction (High Risk, High Impact)
**Goal:** Evaluate whether some modules should be separate packages/repos.
### 5.1 Candidates for extraction
| Module | Why extract | Dependency direction |
|--------|------------|---------------------|
| `lightning/` | Standalone payment system, security-sensitive | Dashboard imports lightning |
| `creative/` | Needs PyTorch, very different dependency profile | Dashboard imports creative |
| `timmy-serve` | Separate process (port 8402), separate purpose | Shares config + timmy agent |
| `self_coding/` + `self_modify/` | Self-contained self-modification system | Dashboard imports for routes |
### 5.2 Monorepo approach (recommended over multi-repo)
If splitting, use a monorepo with namespace packages:
```
packages/
timmy-core/ # Agent + memory + CLI
timmy-dashboard/ # FastAPI app
timmy-swarm/ # Coordinator + tasks
timmy-lightning/ # Payment system
timmy-creative/ # Creative tools (heavy deps)
```
Each package gets its own `pyproject.toml`, test suite, and can be installed
independently. But they share the same repo, CI, and release cycle.
**However:** This is high effort and may not be worth it unless the team
grows or the dependency profiles diverge further. Consider this only after
Phases 1-4 are done and the pain persists.
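Under that layout, each package's manifest might look roughly like this (name, version, and pins are illustrative, not a commitment):

```toml
# packages/timmy-lightning/pyproject.toml — illustrative sketch
[project]
name = "timmy-lightning"
version = "2.0.0"
requires-python = ">=3.11"
dependencies = [
    "pydantic-settings>=2.0",
]

[tool.hatch.build.targets.wheel]
include = ["src/lightning"]
```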
---
## Phase 6: Token Optimization for AI Development (Low Risk, High Impact)
**Goal:** Reduce context window consumption when AI assistants work on this
codebase.
### 6.1 Lean CLAUDE.md (already covered in Phase 1)
Every byte in CLAUDE.md is read by every AI interaction. Remove duplication.
### 6.2 Module-level CLAUDE.md files
Instead of one massive guide, put module-specific context where it's needed:
```
src/swarm/CLAUDE.md # "This module is security-sensitive. Always..."
src/lightning/CLAUDE.md # "Never hard-code secrets. Use settings..."
src/dashboard/CLAUDE.md # "Routes return template partials for HTMX..."
```
AI assistants read these only when working in that directory.
### 6.3 Standardize module docstrings
Every `__init__.py` should have a one-line summary. AI assistants read these
to understand module purpose without reading every file:
```python
"""Swarm — Multi-agent coordinator with auction-based task assignment."""
```
### 6.4 Reduce template duplication
49 templates with repeated boilerplate. Consider Jinja2 macros for common
patterns (card layouts, form groups, table rows).
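For example, a shared card macro could replace the repeated markup (file and macro names are illustrative):

```jinja
{# templates/macros.html — hypothetical shared macros #}
{% macro card(title, body, status="ok") %}
<div class="card card-{{ status }}">
  <h3>{{ title }}</h3>
  <p>{{ body }}</p>
</div>
{% endmacro %}

{# In any template: #}
{% from "macros.html" import card %}
{{ card("Oracle", "Next run: 7pm UTC") }}
```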
---
## Prioritized Execution Order
| Priority | Phase | Effort | Risk | Impact |
|----------|-------|--------|------|--------|
| **1** | Phase 1: Doc cleanup | 2-3 hours | Low | High — immediate token savings |
| **2** | Phase 6: Token optimization | 1-2 hours | Low | High — ongoing AI efficiency |
| **3** | Phase 4: Config/build cleanup | 1-2 hours | Low | Medium — hygiene |
| **4** | Phase 2: Module consolidation | 4-8 hours | Medium | High — structural improvement |
| **5** | Phase 3: Test reorganization | 3-5 hours | Medium | Medium — faster test cycles |
| **6** | Phase 5: Package extraction | 8-16 hours | High | High — only if needed |
---
## Quick Wins (Can Do Right Now)
1. Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (3 files, 0 risk)
2. Move PLAN.md, IMPLEMENTATION_SUMMARY.md, quality docs to `docs/` (5 files)
3. Slim README to ~80 lines
4. Fix pyproject.toml wheel includes (11 missing modules)
5. Gitignore `coverage.xml` and `data/self_modify_reports/`
6. Delete `dashboard/routes/mobile_test.py` (test page in production routes)
7. Delete or gut empty test skeletons (61 files with 0 tests — they waste CI
time and create noise)
---
## What NOT to Do
- **Don't rewrite from scratch.** The code works. Refactor incrementally.
- **Don't split into multiple repos.** Monorepo with packages (if needed) is
simpler for a small team.
- **Don't change the tech stack.** FastAPI + HTMX + Jinja2 is fine. Don't add
React, Vue, or a SPA framework.
- **Don't merge CLAUDE.md into README.** They serve different audiences.
- **Don't remove test files** just to reduce count. Reorganize them.
- **Don't break the singleton pattern.** It works for this scale.
---
## Success Metrics
After refactoring:
- Root `.md` files: 10 → 3
- Root markdown size: 87KB → ~20KB
- `src/` modules: 28 → ~12-15
- Dashboard route files: 27 → ~12-15
- Test files: organized in subdirectories matching source
- Empty skeleton test files: 61 → 0 (either implemented or deleted)
- Real test functions: 471 → 500+ (fill gaps in coverage)
- `pytest -m unit` runs in <10 seconds
- Wheel build includes all modules that are actually imported
- AI assistant context consumption drops ~40%
- Conftest autouse fixtures scoped to relevant test directories
---
## Execution Status
### Completed
- [x] **Phase 1: Doc cleanup** — README 303→93 lines, CLAUDE.md 267→80,
AGENTS.md 342→72, deleted 3 session docs, archived 4 planning docs
- [x] **Phase 4: Config/build cleanup** — fixed 11 missing wheel modules, added
pytest markers, updated .gitignore, moved scripts to scripts/
- [x] **Phase 6: Token optimization** — added docstrings to 15+ __init__.py files
- [x] **Phase 3: Test reorganization** — 97 test files organized into 13
subdirectories mirroring source structure
- [x] **Phase 2a: Route consolidation** — 27 → 22 route files (merged voice,
swarm internal/ws, self-modify; deleted mobile_test)
### Remaining
- [ ] **Phase 2b: Full module consolidation** (28 → ~12 modules) — requires
updating hundreds of import statements. Should be done incrementally across
focused PRs, one module merge at a time. Candidates by import footprint:
- `work_orders/``swarm/work_orders/` (1 importer)
- `upgrades/``self_coding/upgrades/` (1 importer)
- `shortcuts/``integrations/shortcuts/` (1 importer)
- `events/``swarm/events/` (4 importers)
- `task_queue/``swarm/task_queue/` (3 importers)
- Larger merges: agents/ + agent_core/ + memory/ → timmy/ (many importers)
- [ ] **Phase 5: Package extraction** — only if team grows or dep profiles diverge
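One low-risk pattern for those incremental merges is a temporary module alias, so old import paths keep resolving while call sites migrate. A stdlib-only sketch, using `json` as a stand-in for a relocated module (`old_json` is a hypothetical legacy path):

```python
import importlib
import sys

def alias_module(old_name: str, new_name: str) -> None:
    """Point an old import path at a module's new location."""
    # Registering the module under the old name makes `import old_name`
    # resolve via sys.modules, no file on disk needed.
    sys.modules[old_name] = importlib.import_module(new_name)

# Stand-in demo: pretend `json` is a module that just moved.
alias_module("old_json", "json")
import old_json  # resolves through sys.modules

assert old_json.loads("[1, 2]") == [1, 2]
```

In practice `src/work_orders/__init__.py` would collapse to a one-line re-export from the new location, kept only until the last importer is updated.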

WORKSET_PLAN.md (deleted)

@@ -1,147 +0,0 @@
# Timmy Time — Workset Plan (Post-Quality Review)
**Date:** 2026-02-25
**Based on:** QUALITY_ANALYSIS.md + QUALITY_REVIEW_REPORT.md
---
## Executive Summary
This workset addresses critical security vulnerabilities, hardens the tool system for reliability, improves privacy alignment with the "sovereign AI" vision, and enhances agent intelligence.
---
## Workset A: Security Fixes (P0) 🔒
### A1: XSS Vulnerabilities (SEC-01)
**Priority:** P0 — Critical
**Files:** `mobile.html`, `swarm_live.html`
**Issues:**
- `mobile.html` line ~85 uses raw `innerHTML` with unsanitized user input
- `swarm_live.html` line ~72 uses `innerHTML` with WebSocket agent data
**Fix:** Replace `innerHTML` string interpolation with safe DOM methods (`textContent`, `createTextNode`, or DOMPurify if available).
### A2: Hardcoded Secrets (SEC-02)
**Priority:** P1 — High
**Files:** `l402_proxy.py`, `payment_handler.py`
**Issue:** Default secrets are production-safe strings instead of `None` with startup assertion.
**Fix:**
- Change defaults to `None`
- Add startup assertion requiring env vars to be set
- Fail fast with clear error message
---
## Workset B: Tool System Hardening ⚙️
### B1: SSL Certificate Fix
**Priority:** P1 — High
**File:** Web search via DuckDuckGo
**Issue:** `CERTIFICATE_VERIFY_FAILED` errors prevent web search from working.
**Fix Options:**
- Option 1: Use `certifi` package for proper certificate bundle
- Option 2: Add `verify_ssl=False` parameter (less secure, acceptable for local)
- Option 3: Document SSL fix in troubleshooting
### B2: Tool Usage Instructions
**Priority:** P2 — Medium
**File:** `prompts.py`
**Issue:** Agent makes unnecessary tool calls for simple questions.
**Fix:** Add tool usage instructions to system prompt:
- Only use tools when explicitly needed
- For simple chat/questions, respond directly
- Tools are for: web search, file operations, code execution
### B3: Tool Error Handling
**Priority:** P2 — Medium
**File:** `tools.py`
**Issue:** Tool failures show stack traces to user.
**Fix:** Add graceful error handling with user-friendly messages.
---
## Workset C: Privacy & Sovereignty 🛡️
### C1: Agno Telemetry (Privacy)
**Priority:** P2 — Medium
**File:** `agent.py`, `backends.py`
**Issue:** Agno sends telemetry to `os-api.agno.com` which conflicts with "sovereign" vision.
**Fix:**
- Add `telemetry_enabled=False` parameter to Agent
- Document how to disable for air-gapped deployments
- Consider environment variable `TIMMY_TELEMETRY=0`
### C2: Secrets Validation
**Priority:** P1 — High
**File:** `config.py`, startup
**Issue:** Default secrets used without warning in production.
**Fix:**
- Add production mode detection
- Fatal error if default secrets in production
- Clear documentation on generating secrets
---
## Workset D: Agent Intelligence 🧠
### D1: Enhanced System Prompt
**Priority:** P2 — Medium
**File:** `prompts.py`
**Enhancements:**
- Tool usage guidelines (when to use, when not to)
- Memory awareness ("You remember previous conversations")
- Self-knowledge (capabilities, limitations)
- Response style guidelines
### D2: Memory Improvements
**Priority:** P2 — Medium
**File:** `agent.py`
**Enhancements:**
- Increase history runs from 10 to 20 for better context
- Add memory summarization for very long conversations
- Persistent session tracking
---
## Execution Order
| Order | Workset | Task | Est. Time |
|-------|---------|------|-----------|
| 1 | A | XSS fixes | 30 min |
| 2 | A | Secrets hardening | 20 min |
| 3 | B | SSL certificate fix | 15 min |
| 4 | B | Tool instructions | 20 min |
| 5 | C | Telemetry disable | 15 min |
| 6 | C | Secrets validation | 20 min |
| 7 | D | Enhanced prompts | 30 min |
| 8 | — | Test everything | 30 min |
**Total: ~3 hours**
---
## Success Criteria
- [ ] No XSS vulnerabilities (verified by code review)
- [ ] Secrets fail fast in production
- [ ] Web search works without SSL errors
- [ ] Agent uses tools appropriately (not for simple chat)
- [ ] Telemetry disabled by default
- [ ] All 895+ tests pass
- [ ] New tests added for security fixes

WORKSET_PLAN_PHASE2.md (deleted)

@@ -1,133 +0,0 @@
# Timmy Time — Workset Plan Phase 2 (Functional Hardening)
**Date:** 2026-02-25
**Based on:** QUALITY_ANALYSIS.md remaining issues
---
## Executive Summary
This workset addresses the core functional gaps that prevent the swarm system from operating as designed. The swarm currently registers agents in the database but doesn't actually spawn processes or execute bids. This workset makes the swarm operational.
---
## Workset E: Swarm System Realization 🐝
### E1: Real Agent Process Spawning (FUNC-01)
**Priority:** P1 — High
**Files:** `swarm/agent_runner.py`, `swarm/coordinator.py`
**Issue:** `spawn_agent()` creates a database record but no Python process is actually launched.
**Fix:**
- Complete the `agent_runner.py` subprocess implementation
- Ensure spawned agents can communicate with coordinator
- Add proper lifecycle management (start, monitor, stop)
### E2: Working Auction System (FUNC-02)
**Priority:** P1 — High
**Files:** `swarm/bidder.py`, `swarm/persona_node.py`
**Issue:** Bidding system runs auctions but no actual agents submit bids.
**Fix:**
- Connect persona agents to the bidding system
- Implement automatic bid generation based on capabilities
- Ensure auction resolution assigns tasks to winners
### E3: Persona Agent Auto-Bidding
**Priority:** P1 — High
**Files:** `swarm/persona_node.py`, `swarm/coordinator.py`
**Fix:**
- Spawned persona agents should automatically bid on matching tasks
- Implement capability-based bid decisions
- Add bid amount calculation (base + jitter)
---
## Workset F: Testing & Reliability 🧪
### F1: WebSocket Reconnection Tests (TEST-01)
**Priority:** P2 — Medium
**Files:** `tests/test_websocket.py`
**Issue:** WebSocket tests don't cover reconnection logic or malformed payloads.
**Fix:**
- Add reconnection scenario tests
- Test malformed payload handling
- Test connection failure recovery
### F2: Voice TTS Graceful Degradation
**Priority:** P2 — Medium
**Files:** `timmy_serve/voice_tts.py`, `dashboard/routes/voice.py`
**Issue:** Voice routes fail without clear message when `pyttsx3` not installed.
**Fix:**
- Add graceful fallback message
- Return helpful error suggesting `pip install ".[voice]"`
- Don't crash, return 503 with instructions
### F3: Mobile Route Navigation
**Priority:** P2 — Medium
**Files:** `templates/base.html`
**Issue:** `/mobile` route not linked from desktop navigation.
**Fix:**
- Add mobile link to base template nav
- Make it easy to find mobile-optimized view
---
## Workset G: Performance & Architecture ⚡
### G1: SQLite Connection Pooling (PERF-01)
**Priority:** P3 — Low
**Files:** `swarm/registry.py`
**Issue:** New SQLite connection opened on every query.
**Fix:**
- Implement connection pooling or singleton pattern
- Reduce connection overhead
- Maintain thread safety
### G2: Development Experience
**Priority:** P2 — Medium
**Files:** `Makefile`, `README.md`
**Issue:** No single command to start full dev environment.
**Fix:**
- Add `make dev-full` that starts dashboard + Ollama check
- Add better startup validation
---
## Execution Order
| Order | Workset | Task | Est. Time |
|-------|---------|------|-----------|
| 1 | E | Persona auto-bidding system | 45 min |
| 2 | E | Fix auction resolution | 30 min |
| 3 | F | Voice graceful degradation | 20 min |
| 4 | F | Mobile nav link | 10 min |
| 5 | G | SQLite connection pooling | 30 min |
| 6 | — | Test everything | 30 min |
**Total: ~2.5 hours**
---
## Success Criteria
- [ ] Persona agents automatically bid on matching tasks
- [ ] Auctions resolve with actual winners
- [ ] Voice routes degrade gracefully without pyttsx3
- [ ] Mobile route accessible from desktop nav
- [ ] SQLite connections pooled/reused
- [ ] All 895+ tests pass
- [ ] New tests for bidding system

pyproject.toml

@@ -81,25 +81,35 @@ self-modify = "self_modify.cli:main"
[tool.hatch.build.targets.wheel]
sources = {"src" = ""}
include = [
"src/config.py",
"src/agent_core",
"src/agents",
"src/chat_bridge",
"src/creative",
"src/dashboard",
"src/events",
"src/hands",
"src/lightning",
"src/mcp",
"src/memory",
"src/notifications",
"src/router",
"src/scripture",
"src/self_coding",
"src/self_modify",
"src/self_tdd",
"src/shortcuts",
"src/spark",
"src/swarm",
"src/task_queue",
"src/telegram_bot",
"src/timmy",
"src/timmy_serve",
"src/tools",
"src/upgrades",
"src/voice",
"src/work_orders",
"src/ws_manager",
]
[tool.pytest.ini_options]
@@ -108,12 +118,18 @@ pythonpath = ["src", "tests"]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
addopts = "-v --tb=short"
markers = [
"unit: Unit tests (fast, no I/O)",
"integration: Integration tests (may use SQLite)",
"dashboard: Dashboard route tests",
"swarm: Swarm coordinator tests",
"slow: Tests that take >1 second",
]
[tool.coverage.run]
source = ["src"]
omit = [
"*/tests/*",
"src/dashboard/routes/mobile_test.py",
]
[tool.coverage.report]

src/agent_core/__init__.py (new)

@@ -0,0 +1 @@
"""Agent Core — Substrate-agnostic agent interface and base classes."""

src/dashboard/__init__.py (new)

@@ -0,0 +1 @@
"""Dashboard — FastAPI + HTMX Mission Control web application."""

src/dashboard/app.py

@@ -12,21 +12,17 @@ from fastapi.templating import Jinja2Templates
from config import settings
from dashboard.routes.agents import router as agents_router
from dashboard.routes.health import router as health_router
from dashboard.routes.swarm import router as swarm_router
from dashboard.routes.swarm import internal_router as swarm_internal_router
from dashboard.routes.marketplace import router as marketplace_router
from dashboard.routes.voice import router as voice_router
from dashboard.routes.voice_enhanced import router as voice_enhanced_router
from dashboard.routes.mobile import router as mobile_router
from dashboard.routes.swarm_ws import router as swarm_ws_router
from dashboard.routes.briefing import router as briefing_router
from dashboard.routes.telegram import router as telegram_router
from dashboard.routes.swarm_internal import router as swarm_internal_router
from dashboard.routes.tools import router as tools_router
from dashboard.routes.spark import router as spark_router
from dashboard.routes.creative import router as creative_router
from dashboard.routes.discord import router as discord_router
from dashboard.routes.self_modify import router as self_modify_router
from dashboard.routes.events import router as events_router
from dashboard.routes.ledger import router as ledger_router
from dashboard.routes.memory import router as memory_router
@@ -36,6 +32,7 @@ from dashboard.routes.work_orders import router as work_orders_router
from dashboard.routes.tasks import router as tasks_router
from dashboard.routes.scripture import router as scripture_router
from dashboard.routes.self_coding import router as self_coding_router
from dashboard.routes.self_coding import self_modify_router
from dashboard.routes.hands import router as hands_router
from router.api import router as cascade_router
@@ -131,7 +128,7 @@ async def lifespan(app: FastAPI):
logger.info("MCP auto-bootstrap: %d tools registered", len(registered))
except Exception as exc:
logger.warning("MCP auto-bootstrap failed: %s", exc)
# Initialise Spark Intelligence engine
from spark.engine import spark_engine
if spark_engine.enabled:
@@ -178,20 +175,18 @@ app.mount("/static", StaticFiles(directory=str(PROJECT_ROOT / "static")), name="
app.include_router(health_router)
app.include_router(agents_router)
app.include_router(mobile_test_router)
app.include_router(swarm_router)
app.include_router(swarm_internal_router)
app.include_router(marketplace_router)
app.include_router(voice_router)
app.include_router(voice_enhanced_router)
app.include_router(mobile_router)
app.include_router(swarm_ws_router)
app.include_router(briefing_router)
app.include_router(telegram_router)
app.include_router(swarm_internal_router)
app.include_router(tools_router)
app.include_router(spark_router)
app.include_router(creative_router)
app.include_router(discord_router)
app.include_router(self_coding_router)
app.include_router(self_modify_router)
app.include_router(events_router)
app.include_router(ledger_router)
@@ -201,7 +196,6 @@ app.include_router(upgrades_router)
app.include_router(work_orders_router)
app.include_router(tasks_router)
app.include_router(scripture_router)
app.include_router(self_coding_router)
app.include_router(hands_router)
app.include_router(cascade_router)
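The refactor above removes duplicate `include_router` calls (`swarm_internal_router` and `self_coding_router` each appeared twice in the old wiring). A minimal sketch of why that matters, using a hypothetical registry rather than the real FastAPI API:

```python
# Hypothetical registry illustrating duplicate router registration.
# FastAPI's include_router does not deduplicate, so registering the
# same router twice mounts its routes twice.
class RouterRegistry:
    def __init__(self) -> None:
        self.routers: list[str] = []

    def include_router(self, name: str) -> None:
        self.routers.append(name)

    def duplicates(self) -> set[str]:
        # Names registered more than once
        return {n for n in self.routers if self.routers.count(n) > 1}


registry = RouterRegistry()
for name in ["swarm", "self_coding", "swarm_internal", "self_coding"]:
    registry.include_router(name)

print(registry.duplicates())
```

Consolidating each route group into one module with one `include_router` call keeps this failure mode out of `main.py`.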

View File

@@ -0,0 +1 @@
"""Dashboard route modules — one file per route group."""

View File

@@ -1,257 +0,0 @@
"""Mobile HITL (Human-in-the-Loop) test checklist route.
GET /mobile-test — interactive checklist for a human tester on their phone.
Each scenario specifies what to do and what to observe. The tester marks
each one PASS / FAIL / SKIP. Results are stored in sessionStorage so they
survive page scrolling without hitting the server.
"""
from pathlib import Path
from fastapi import APIRouter, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
router = APIRouter(tags=["mobile-test"])
templates = Jinja2Templates(directory=str(Path(__file__).parent.parent / "templates"))
# ── Test scenarios ────────────────────────────────────────────────────────────
# Each dict: id, category, title, steps (list), expected
SCENARIOS = [
# Layout
{
"id": "L01",
"category": "Layout",
"title": "Sidebar renders as horizontal strip",
"steps": [
"Open the Mission Control page on your phone.",
"Look at the top section above the chat window.",
],
"expected": (
"AGENTS and SYSTEM HEALTH panels appear side-by-side in a "
"horizontally scrollable strip — not stacked vertically."
),
},
{
"id": "L02",
"category": "Layout",
"title": "Sidebar panels are horizontally scrollable",
"steps": [
"Swipe left/right on the AGENTS / SYSTEM HEALTH strip.",
],
"expected": "Both panels slide smoothly; no page scroll is triggered.",
},
{
"id": "L03",
"category": "Layout",
"title": "Chat panel fills ≥ 60 % of viewport height",
"steps": [
"Look at the TIMMY INTERFACE chat card below the strip.",
],
"expected": "The chat card occupies at least 60 % of the visible screen height.",
},
{
"id": "L04",
"category": "Layout",
"title": "Header stays fixed while chat scrolls",
"steps": [
"Send several messages until the chat overflows.",
"Scroll the chat log up and down.",
],
"expected": "The TIMMY TIME / MISSION CONTROL header remains pinned at the top.",
},
{
"id": "L05",
"category": "Layout",
"title": "No horizontal page overflow",
"steps": [
"Try swiping left or right anywhere on the page.",
],
"expected": "The page does not scroll horizontally; nothing is cut off.",
},
# Touch & Input
{
"id": "T01",
"category": "Touch & Input",
"title": "iOS does NOT zoom when tapping the input",
"steps": [
"Tap the message input field once.",
"Watch whether the browser zooms in.",
],
"expected": "The keyboard rises; the layout does NOT zoom in.",
},
{
"id": "T02",
"category": "Touch & Input",
"title": "Keyboard return key is labelled 'Send'",
"steps": [
"Tap the message input to open the iOS/Android keyboard.",
"Look at the return / action key in the bottom-right of the keyboard.",
],
"expected": "The key is labelled 'Send' (not 'Return' or 'Go').",
},
{
"id": "T03",
"category": "Touch & Input",
"title": "Send button is easy to tap (≥ 44 px tall)",
"steps": [
"Try tapping the SEND button with your thumb.",
],
"expected": "The button registers the tap reliably on the first attempt.",
},
{
"id": "T04",
"category": "Touch & Input",
"title": "SEND button disabled during in-flight request",
"steps": [
"Type a message and press SEND.",
"Immediately try to tap SEND again before a response arrives.",
],
"expected": "The button is visually disabled; no duplicate message is sent.",
},
{
"id": "T05",
"category": "Touch & Input",
"title": "Empty message cannot be submitted",
"steps": [
"Leave the input blank.",
"Tap SEND.",
],
"expected": "Nothing is submitted; the form shows a required-field indicator.",
},
{
"id": "T06",
"category": "Touch & Input",
"title": "CLEAR button shows confirmation dialog",
"steps": [
"Send at least one message.",
"Tap the CLEAR button in the top-right of the chat header.",
],
"expected": "A browser confirmation dialog appears before history is cleared.",
},
# Chat behaviour
{
"id": "C01",
"category": "Chat",
"title": "Chat auto-scrolls to the latest message",
"steps": [
"Scroll the chat log to the top.",
"Send a new message.",
],
"expected": "After the response arrives the chat automatically scrolls to the bottom.",
},
{
"id": "C02",
"category": "Chat",
"title": "Multi-turn conversation — Timmy remembers context",
"steps": [
"Send: 'My name is <your name>.'",
"Then send: 'What is my name?'",
],
"expected": "Timmy replies with your name, demonstrating conversation memory.",
},
{
"id": "C03",
"category": "Chat",
"title": "Loading indicator appears while waiting",
"steps": [
"Send a message and watch the SEND button.",
],
"expected": "A blinking cursor (▋) appears next to SEND while the response is loading.",
},
{
"id": "C04",
"category": "Chat",
"title": "Offline error is shown gracefully",
"steps": [
"Stop Ollama on your host machine (or disconnect from Wi-Fi temporarily).",
"Send a message from your phone.",
],
"expected": "A red 'Timmy is offline' error appears in the chat — no crash or spinner hang.",
},
# Health panel
{
"id": "H01",
"category": "Health",
"title": "Health panel shows Ollama UP when running",
"steps": [
"Ensure Ollama is running on your host.",
"Check the SYSTEM HEALTH panel.",
],
"expected": "OLLAMA badge shows green UP.",
},
{
"id": "H02",
"category": "Health",
"title": "Health panel auto-refreshes without reload",
"steps": [
"Start Ollama if it is not running.",
"Wait up to 35 seconds with the page open.",
],
"expected": "The OLLAMA badge flips from DOWN → UP automatically, without a page reload.",
},
# Scroll & overscroll
{
"id": "S01",
"category": "Scroll",
"title": "No rubber-band / bounce on the main page",
"steps": [
"Scroll to the very top of the page.",
"Continue pulling downward.",
],
"expected": "The page does not bounce or show a white gap — overscroll is suppressed.",
},
{
"id": "S02",
"category": "Scroll",
"title": "Chat log scrolls independently inside the card",
"steps": [
"Scroll inside the chat log area.",
],
"expected": "The chat log scrolls smoothly; the outer page does not move.",
},
# Safe area / notch
{
"id": "N01",
"category": "Notch / Home Bar",
"title": "Header clears the status bar / Dynamic Island",
"steps": [
"On a notched iPhone (Face ID), look at the top of the page.",
],
"expected": "The TIMMY TIME header text is not obscured by the notch or Dynamic Island.",
},
{
"id": "N02",
"category": "Notch / Home Bar",
"title": "Chat input not hidden behind home indicator",
"steps": [
"Tap the input field and look at the bottom of the screen.",
],
"expected": "The input row sits above the iPhone home indicator bar — nothing is cut off.",
},
# Clock
{
"id": "X01",
"category": "Live UI",
"title": "Clock updates every second",
"steps": [
"Look at the time display in the top-right of the header.",
"Watch for 3 seconds.",
],
"expected": "The time increments each second in HH:MM:SS format.",
},
]
@router.get("/mobile-test", response_class=HTMLResponse)
async def mobile_test(request: Request):
"""Interactive HITL mobile test checklist — open on your phone."""
categories: dict[str, list] = {}
for s in SCENARIOS:
categories.setdefault(s["category"], []).append(s)
return templates.TemplateResponse(
request,
"mobile_test.html",
{"scenarios": SCENARIOS, "categories": categories, "total": len(SCENARIOS)},
)
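The `categories` grouping in the deleted handler above is a standard `setdefault` bucket pattern. A standalone sketch with stand-in scenario data:

```python
# Group scenario dicts by their "category" key, preserving the order
# of first appearance — the same way the /mobile-test handler built
# its template context. Scenario data here is illustrative.
scenarios = [
    {"id": "L01", "category": "Layout"},
    {"id": "T01", "category": "Touch & Input"},
    {"id": "L02", "category": "Layout"},
]

categories: dict[str, list] = {}
for s in scenarios:
    categories.setdefault(s["category"], []).append(s)

print(list(categories))                        # category order of first appearance
print([s["id"] for s in categories["Layout"]])
```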

View File

@@ -5,17 +5,21 @@ API endpoints and HTMX views for the self-coding system:
- Stats dashboard
- Manual task execution
- Real-time status updates
- Self-modification loop (/self-modify/*)
"""
from __future__ import annotations
import asyncio
import logging
from typing import Optional
from fastapi import APIRouter, Form, Request
from fastapi import APIRouter, Form, HTTPException, Request
from fastapi.responses import HTMLResponse, JSONResponse
from pydantic import BaseModel
from config import settings
from self_coding import (
CodebaseIndexer,
ModificationJournal,
@@ -366,3 +370,59 @@ async def journal_entry_detail(request: Request, attempt_id: int):
"entry": entry,
},
)
# ── Self-Modification Routes (/self-modify/*) ───────────────────────────
self_modify_router = APIRouter(prefix="/self-modify", tags=["self-modify"])
@self_modify_router.post("/run")
async def run_self_modify(
instruction: str = Form(...),
target_files: str = Form(""),
dry_run: bool = Form(False),
speak_result: bool = Form(False),
):
"""Execute a self-modification loop."""
if not settings.self_modify_enabled:
raise HTTPException(403, "Self-modification is disabled")
from self_modify.loop import SelfModifyLoop, ModifyRequest
files = [f.strip() for f in target_files.split(",") if f.strip()]
request = ModifyRequest(
instruction=instruction,
target_files=files,
dry_run=dry_run,
)
loop = SelfModifyLoop()
result = await asyncio.to_thread(loop.run, request)
if speak_result and result.success:
try:
from timmy_serve.voice_tts import voice_tts
if voice_tts.available:
voice_tts.speak(
f"Code modification complete. "
f"{len(result.files_changed)} files changed. Tests passing."
)
except Exception:
pass
return {
"success": result.success,
"files_changed": result.files_changed,
"test_passed": result.test_passed,
"commit_sha": result.commit_sha,
"branch_name": result.branch_name,
"error": result.error,
"attempts": result.attempts,
}
@self_modify_router.get("/status")
async def self_modify_status():
"""Return whether self-modification is enabled."""
return {"enabled": settings.self_modify_enabled}
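`/self-modify/run` accepts `target_files` as a comma-separated form field and normalises it before building the `ModifyRequest`. The parsing step in isolation:

```python
def parse_target_files(raw: str) -> list[str]:
    # Split a comma-separated form field into file paths,
    # stripping whitespace and dropping empty segments —
    # mirrors the list comprehension in run_self_modify.
    return [f.strip() for f in raw.split(",") if f.strip()]


print(parse_target_files("src/a.py, src/b.py,, "))
```

An empty field yields an empty list, which the loop treats as "no file restriction".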

View File

@@ -1,71 +0,0 @@
"""Self-modification routes — /self-modify endpoints.
Exposes the edit-test-commit loop as a REST API. Gated by
``SELF_MODIFY_ENABLED`` (default False).
"""
import asyncio
import logging
from fastapi import APIRouter, Form, HTTPException
from config import settings
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/self-modify", tags=["self-modify"])
@router.post("/run")
async def run_self_modify(
instruction: str = Form(...),
target_files: str = Form(""),
dry_run: bool = Form(False),
speak_result: bool = Form(False),
):
"""Execute a self-modification loop.
Returns the ModifyResult as JSON.
"""
if not settings.self_modify_enabled:
raise HTTPException(403, "Self-modification is disabled")
from self_modify.loop import SelfModifyLoop, ModifyRequest
files = [f.strip() for f in target_files.split(",") if f.strip()]
request = ModifyRequest(
instruction=instruction,
target_files=files,
dry_run=dry_run,
)
loop = SelfModifyLoop()
result = await asyncio.to_thread(loop.run, request)
if speak_result and result.success:
try:
from timmy_serve.voice_tts import voice_tts
if voice_tts.available:
voice_tts.speak(
f"Code modification complete. "
f"{len(result.files_changed)} files changed. Tests passing."
)
except Exception:
pass
return {
"success": result.success,
"files_changed": result.files_changed,
"test_passed": result.test_passed,
"commit_sha": result.commit_sha,
"branch_name": result.branch_name,
"error": result.error,
"attempts": result.attempts,
}
@router.get("/status")
async def self_modify_status():
"""Return whether self-modification is enabled."""
return {"enabled": settings.self_modify_enabled}

View File

@@ -1,22 +1,28 @@
"""Swarm dashboard routes — /swarm/* endpoints.
"""Swarm dashboard routes — /swarm/*, /internal/*, and /swarm/live endpoints.
Provides REST endpoints for managing the swarm: listing agents,
spawning sub-agents, posting tasks, and viewing auction results.
spawning sub-agents, posting tasks, viewing auction results, Docker
container agent HTTP API, and WebSocket live feed.
"""
import asyncio
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
from fastapi import APIRouter, Form, HTTPException, Request
from fastapi import APIRouter, Form, HTTPException, Request, WebSocket, WebSocketDisconnect
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from pydantic import BaseModel
from swarm import learner as swarm_learner
from swarm import registry
from swarm.coordinator import coordinator
from swarm.tasks import TaskStatus, update_task
from ws_manager.handler import ws_manager
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/swarm", tags=["swarm"])
templates = Jinja2Templates(directory=str(Path(__file__).parent.parent / "templates"))
@@ -325,3 +331,92 @@ async def message_agent(agent_id: str, request: Request, message: str = Form(...
)
# ── Internal HTTP API (Docker container agents) ─────────────────────────
internal_router = APIRouter(prefix="/internal", tags=["internal"])
class BidRequest(BaseModel):
task_id: str
agent_id: str
bid_sats: int
capabilities: Optional[str] = ""
class BidResponse(BaseModel):
accepted: bool
task_id: str
agent_id: str
message: str
class TaskSummary(BaseModel):
task_id: str
description: str
status: str
@internal_router.get("/tasks", response_model=list[TaskSummary])
def list_biddable_tasks():
"""Return all tasks currently open for bidding."""
tasks = coordinator.list_tasks(status=TaskStatus.BIDDING)
return [
TaskSummary(
task_id=t.id,
description=t.description,
status=t.status.value,
)
for t in tasks
]
@internal_router.post("/bids", response_model=BidResponse)
def submit_bid(bid: BidRequest):
"""Accept a bid from a container agent."""
if bid.bid_sats <= 0:
raise HTTPException(status_code=422, detail="bid_sats must be > 0")
accepted = coordinator.auctions.submit_bid(
task_id=bid.task_id,
agent_id=bid.agent_id,
bid_sats=bid.bid_sats,
)
if accepted:
from swarm import stats as swarm_stats
swarm_stats.record_bid(bid.task_id, bid.agent_id, bid.bid_sats, won=False)
logger.info(
"Docker agent %s bid %d sats on task %s",
bid.agent_id, bid.bid_sats, bid.task_id,
)
return BidResponse(
accepted=True,
task_id=bid.task_id,
agent_id=bid.agent_id,
message="Bid accepted.",
)
return BidResponse(
accepted=False,
task_id=bid.task_id,
agent_id=bid.agent_id,
message="No open auction for this task — it may have already closed.",
)
# ── WebSocket live feed ──────────────────────────────────────────────────
@router.websocket("/live")
async def swarm_live(websocket: WebSocket):
"""WebSocket endpoint for live swarm event streaming."""
await ws_manager.connect(websocket)
try:
while True:
data = await websocket.receive_text()
logger.debug("WS received: %s", data[:100])
except WebSocketDisconnect:
ws_manager.disconnect(websocket)
except Exception as exc:
logger.error("WebSocket error: %s", exc)
ws_manager.disconnect(websocket)

View File

@@ -1,115 +0,0 @@
"""Internal swarm HTTP API — for Docker container agents.
Container agents can't use the in-memory SwarmComms channel, so they poll
these lightweight endpoints to participate in the auction system.
Routes
------
GET /internal/tasks
Returns all tasks currently in BIDDING status — the set an agent
can submit bids for.
POST /internal/bids
Accepts a bid from a container agent and feeds it into the in-memory
AuctionManager. The coordinator then closes auctions and assigns
winners exactly as it does for in-process agents.
These endpoints are intentionally unauthenticated because they are only
reachable inside the Docker swarm-net bridge network. Do not expose them
through a reverse-proxy to the public internet.
"""
import logging
from typing import Optional
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from swarm.coordinator import coordinator
from swarm.tasks import TaskStatus
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/internal", tags=["internal"])
# ── Request / response models ─────────────────────────────────────────────────
class BidRequest(BaseModel):
task_id: str
agent_id: str
bid_sats: int
capabilities: Optional[str] = ""
class BidResponse(BaseModel):
accepted: bool
task_id: str
agent_id: str
message: str
class TaskSummary(BaseModel):
task_id: str
description: str
status: str
# ── Routes ────────────────────────────────────────────────────────────────────
@router.get("/tasks", response_model=list[TaskSummary])
def list_biddable_tasks():
"""Return all tasks currently open for bidding.
Container agents should poll this endpoint and submit bids for any
tasks they are capable of handling.
"""
tasks = coordinator.list_tasks(status=TaskStatus.BIDDING)
return [
TaskSummary(
task_id=t.id,
description=t.description,
status=t.status.value,
)
for t in tasks
]
@router.post("/bids", response_model=BidResponse)
def submit_bid(bid: BidRequest):
"""Accept a bid from a container agent.
The bid is injected directly into the in-memory AuctionManager.
If no auction is open for the task (e.g. it already closed), the
bid is rejected gracefully — the agent should just move on.
"""
if bid.bid_sats <= 0:
raise HTTPException(status_code=422, detail="bid_sats must be > 0")
accepted = coordinator.auctions.submit_bid(
task_id=bid.task_id,
agent_id=bid.agent_id,
bid_sats=bid.bid_sats,
)
if accepted:
# Persist bid in stats table for marketplace analytics
from swarm import stats as swarm_stats
swarm_stats.record_bid(bid.task_id, bid.agent_id, bid.bid_sats, won=False)
logger.info(
"Docker agent %s bid %d sats on task %s",
bid.agent_id, bid.bid_sats, bid.task_id,
)
return BidResponse(
accepted=True,
task_id=bid.task_id,
agent_id=bid.agent_id,
message="Bid accepted.",
)
return BidResponse(
accepted=False,
task_id=bid.task_id,
agent_id=bid.agent_id,
message="No open auction for this task — it may have already closed.",
)

View File

@@ -1,33 +0,0 @@
"""Swarm WebSocket route — /swarm/live endpoint.
Provides a real-time WebSocket feed of swarm events for the live
dashboard view. Clients connect and receive JSON events as they
happen: agent joins, task posts, bids, assignments, completions.
"""
import logging
from fastapi import APIRouter, WebSocket, WebSocketDisconnect
from ws_manager.handler import ws_manager
logger = logging.getLogger(__name__)
router = APIRouter(tags=["swarm-ws"])
@router.websocket("/swarm/live")
async def swarm_live(websocket: WebSocket):
"""WebSocket endpoint for live swarm event streaming."""
await ws_manager.connect(websocket)
try:
while True:
# Keep the connection alive; client can also send commands
data = await websocket.receive_text()
# Echo back as acknowledgment (future: handle client commands)
logger.debug("WS received: %s", data[:100])
except WebSocketDisconnect:
ws_manager.disconnect(websocket)
except Exception as exc:
logger.error("WebSocket error: %s", exc)
ws_manager.disconnect(websocket)

View File

@@ -1,12 +1,17 @@
"""Voice routes — /voice/* endpoints.
"""Voice routes — /voice/* and /voice/enhanced/* endpoints.
Provides NLU intent detection and TTS control endpoints for the
voice interface.
Provides NLU intent detection, TTS control, and the full voice-to-action
pipeline (detect intent → execute → optionally speak).
"""
import logging
from fastapi import APIRouter, Form
from voice.nlu import detect_intent, extract_command
from timmy.agent import create_timmy
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/voice", tags=["voice"])
@@ -49,3 +54,103 @@ async def tts_speak(text: str = Form(...)):
return {"spoken": True, "text": text}
except Exception as exc:
return {"spoken": False, "reason": str(exc)}
# ── Enhanced voice pipeline ──────────────────────────────────────────────
@router.post("/enhanced/process")
async def process_voice_input(
text: str = Form(...),
speak_response: bool = Form(False),
):
"""Process a voice input: detect intent -> execute -> optionally speak.
This is the main entry point for voice-driven interaction with Timmy.
"""
intent = detect_intent(text)
response_text = None
error = None
try:
if intent.name == "status":
response_text = "Timmy is operational and running locally. All systems sovereign."
elif intent.name == "help":
response_text = (
"Available commands: chat with me, check status, "
"manage the swarm, create tasks, or adjust voice settings. "
"Everything runs locally — no cloud, no permission needed."
)
elif intent.name == "swarm":
from swarm.coordinator import coordinator
status = coordinator.status()
response_text = (
f"Swarm status: {status['agents']} agents registered, "
f"{status['agents_idle']} idle, {status['agents_busy']} busy. "
f"{status['tasks_total']} total tasks, "
f"{status['tasks_completed']} completed."
)
elif intent.name == "voice":
response_text = "Voice settings acknowledged. TTS is available for spoken responses."
elif intent.name == "code":
from config import settings as app_settings
if not app_settings.self_modify_enabled:
response_text = (
"Self-modification is disabled. "
"Set SELF_MODIFY_ENABLED=true to enable."
)
else:
import asyncio
from self_modify.loop import SelfModifyLoop, ModifyRequest
target_files = []
if "target_file" in intent.entities:
target_files = [intent.entities["target_file"]]
loop = SelfModifyLoop()
request = ModifyRequest(
instruction=text,
target_files=target_files,
)
result = await asyncio.to_thread(loop.run, request)
if result.success:
sha_short = result.commit_sha[:8] if result.commit_sha else "none"
response_text = (
f"Code modification complete. "
f"Changed {len(result.files_changed)} file(s). "
f"Tests passed. Committed as {sha_short} "
f"on branch {result.branch_name}."
)
else:
response_text = f"Code modification failed: {result.error}"
else:
# Default: chat with Timmy
agent = create_timmy()
run = agent.run(text, stream=False)
response_text = run.content if hasattr(run, "content") else str(run)
except Exception as exc:
error = f"Processing failed: {exc}"
logger.error("Voice processing error: %s", exc)
# Optionally speak the response
if speak_response and response_text:
try:
from timmy_serve.voice_tts import voice_tts
if voice_tts.available:
voice_tts.speak(response_text)
except Exception:
pass
return {
"intent": intent.name,
"confidence": intent.confidence,
"response": response_text,
"error": error,
"spoken": speak_response and response_text is not None,
}
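`process_voice_input` is a long if/elif dispatch on `intent.name` with plain chat as the fallback. The dispatch shape, reduced to a table of handlers (handler bodies are placeholders, not the real responses):

```python
# Dispatch-table sketch of the intent routing in /voice/enhanced/process.
# The real handlers call into the swarm coordinator, the self-modify
# loop, or the Timmy agent; these lambdas are stand-ins.
HANDLERS = {
    "status": lambda text: "status report",
    "help": lambda text: "command list",
    "swarm": lambda text: "swarm summary",
    "voice": lambda text: "voice settings",
    "code": lambda text: "self-modify result",
}


def route(intent_name: str, text: str) -> str:
    # Unknown intents fall through to plain chat, matching the
    # else-branch in the route handler.
    handler = HANDLERS.get(intent_name, lambda t: f"chat: {t}")
    return handler(text)


print(route("swarm", "how is the swarm?"))
print(route("poetry", "write me a haiku"))
```

A table like this would also make it easy to register new intents without growing the if/elif chain.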

View File

@@ -1,116 +0,0 @@
"""Enhanced voice routes — /voice/enhanced/* endpoints.
Combines NLU intent detection with Timmy agent execution to provide
a complete voice-to-action pipeline. Detects the intent, routes to
the appropriate handler, and optionally speaks the response.
"""
import logging
from typing import Optional
from fastapi import APIRouter, Form
from voice.nlu import detect_intent
from timmy.agent import create_timmy
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/voice/enhanced", tags=["voice-enhanced"])
@router.post("/process")
async def process_voice_input(
text: str = Form(...),
speak_response: bool = Form(False),
):
"""Process a voice input: detect intent → execute → optionally speak.
This is the main entry point for voice-driven interaction with Timmy.
"""
intent = detect_intent(text)
response_text = None
error = None
try:
if intent.name == "status":
response_text = "Timmy is operational and running locally. All systems sovereign."
elif intent.name == "help":
response_text = (
"Available commands: chat with me, check status, "
"manage the swarm, create tasks, or adjust voice settings. "
"Everything runs locally — no cloud, no permission needed."
)
elif intent.name == "swarm":
from swarm.coordinator import coordinator
status = coordinator.status()
response_text = (
f"Swarm status: {status['agents']} agents registered, "
f"{status['agents_idle']} idle, {status['agents_busy']} busy. "
f"{status['tasks_total']} total tasks, "
f"{status['tasks_completed']} completed."
)
elif intent.name == "voice":
response_text = "Voice settings acknowledged. TTS is available for spoken responses."
elif intent.name == "code":
from config import settings as app_settings
if not app_settings.self_modify_enabled:
response_text = (
"Self-modification is disabled. "
"Set SELF_MODIFY_ENABLED=true to enable."
)
else:
import asyncio
from self_modify.loop import SelfModifyLoop, ModifyRequest
target_files = []
if "target_file" in intent.entities:
target_files = [intent.entities["target_file"]]
loop = SelfModifyLoop()
request = ModifyRequest(
instruction=text,
target_files=target_files,
)
result = await asyncio.to_thread(loop.run, request)
if result.success:
sha_short = result.commit_sha[:8] if result.commit_sha else "none"
response_text = (
f"Code modification complete. "
f"Changed {len(result.files_changed)} file(s). "
f"Tests passed. Committed as {sha_short} "
f"on branch {result.branch_name}."
)
else:
response_text = f"Code modification failed: {result.error}"
else:
# Default: chat with Timmy
agent = create_timmy()
run = agent.run(text, stream=False)
response_text = run.content if hasattr(run, "content") else str(run)
except Exception as exc:
error = f"Processing failed: {exc}"
logger.error("Voice processing error: %s", exc)
# Optionally speak the response
if speak_response and response_text:
try:
from timmy_serve.voice_tts import voice_tts
if voice_tts.available:
voice_tts.speak(response_text)
except Exception:
pass
return {
"intent": intent.name,
"confidence": intent.confidence,
"response": response_text,
"error": error,
"spoken": speak_response and response_text is not None,
}

View File

@@ -1,422 +0,0 @@
{% extends "base.html" %}
{% block title %}Mobile Test — Timmy Time{% endblock %}
{% block content %}
<div class="container-fluid mc-content" style="height:auto; overflow:visible;">
<!-- ── Page header ─────────────────────────────────────────────────── -->
<div class="mt-hitl-header">
<div>
<span class="mt-title">// MOBILE TEST SUITE</span>
<span class="mt-sub">HUMAN-IN-THE-LOOP</span>
</div>
<div class="mt-score-wrap">
<span class="mt-score" id="score-display">0 / {{ total }}</span>
<span class="mt-score-label">PASSED</span>
</div>
</div>
<!-- ── Progress bar ────────────────────────────────────────────────── -->
<div class="mt-progress-wrap">
<div class="progress" style="height:6px; background:var(--bg-card); border-radius:3px;">
<div class="progress-bar mt-progress-bar"
id="progress-bar"
role="progressbar"
style="width:0%; background:var(--green);"
aria-valuenow="0" aria-valuemin="0" aria-valuemax="{{ total }}"></div>
</div>
<div class="mt-progress-legend">
<span><span class="mt-dot green"></span>PASS</span>
<span><span class="mt-dot red"></span>FAIL</span>
<span><span class="mt-dot amber"></span>SKIP</span>
<span><span class="mt-dot" style="background:var(--text-dim);box-shadow:none;"></span>PENDING</span>
</div>
</div>
<!-- ── Reset / Back ────────────────────────────────────────────────── -->
<div class="mt-actions">
<a href="/" class="mc-btn-clear">← MISSION CONTROL</a>
<button class="mc-btn-clear" onclick="resetAll()" style="border-color:var(--red);color:var(--red);">RESET ALL</button>
</div>
<!-- ── Scenario cards ──────────────────────────────────────────────── -->
{% for category, items in categories.items() %}
<div class="mt-category-label">{{ category | upper }}</div>
{% for s in items %}
<div class="card mc-panel mt-card" id="card-{{ s.id }}" data-scenario="{{ s.id }}">
<div class="card-header mc-panel-header d-flex justify-content-between align-items-center">
<div>
<span class="mt-id-badge" id="badge-{{ s.id }}">{{ s.id }}</span>
<span class="mt-scenario-title">{{ s.title }}</span>
</div>
<span class="mt-state-chip" id="chip-{{ s.id }}">PENDING</span>
</div>
<div class="card-body p-3">
<div class="mt-steps-label">STEPS</div>
<ol class="mt-steps">
{% for step in s.steps %}
<li>{{ step }}</li>
{% endfor %}
</ol>
<div class="mt-expected-label">EXPECTED</div>
<div class="mt-expected">{{ s.expected }}</div>
<div class="mt-btn-row">
<button class="mt-btn mt-btn-pass" onclick="mark('{{ s.id }}', 'pass')">✓ PASS</button>
<button class="mt-btn mt-btn-fail" onclick="mark('{{ s.id }}', 'fail')">✗ FAIL</button>
<button class="mt-btn mt-btn-skip" onclick="mark('{{ s.id }}', 'skip')">— SKIP</button>
</div>
</div>
</div>
{% endfor %}
{% endfor %}
<!-- ── Summary footer ──────────────────────────────────────────────── -->
<div class="card mc-panel mt-summary" id="summary">
<div class="card-header mc-panel-header">// SUMMARY</div>
<div class="card-body p-3" id="summary-body">
<p class="mt-summary-hint">Mark all scenarios above to see your final score.</p>
</div>
</div>
</div><!-- /container -->
<!-- ── Styles (scoped to this page) ────────────────────────────────────── -->
<style>
.mt-hitl-header {
display: flex;
justify-content: space-between;
align-items: flex-end;
padding: 16px 0 12px;
border-bottom: 1px solid var(--border);
margin-bottom: 12px;
}
.mt-title {
font-size: 14px;
font-weight: 700;
color: var(--text-bright);
letter-spacing: 0.18em;
display: block;
}
.mt-sub {
font-size: 10px;
color: var(--text-dim);
letter-spacing: 0.2em;
display: block;
margin-top: 2px;
}
.mt-score-wrap { text-align: right; }
.mt-score {
font-size: 22px;
font-weight: 700;
color: var(--green);
letter-spacing: 0.06em;
display: block;
}
.mt-score-label { font-size: 9px; color: var(--text-dim); letter-spacing: 0.2em; }
.mt-progress-wrap { margin-bottom: 10px; }
.mt-progress-legend {
display: flex;
gap: 16px;
font-size: 9px;
color: var(--text-dim);
letter-spacing: 0.12em;
margin-top: 6px;
}
.mt-dot {
display: inline-block;
width: 7px;
height: 7px;
border-radius: 50%;
margin-right: 4px;
vertical-align: middle;
}
.mt-dot.green { background: var(--green); box-shadow: 0 0 5px var(--green); }
.mt-dot.red { background: var(--red); box-shadow: 0 0 5px var(--red); }
.mt-dot.amber { background: var(--amber); box-shadow: 0 0 5px var(--amber); }
.mt-actions {
display: flex;
gap: 10px;
margin-bottom: 16px;
}
.mt-category-label {
font-size: 9px;
font-weight: 700;
color: var(--text-dim);
letter-spacing: 0.25em;
margin: 20px 0 8px;
padding-left: 2px;
}
.mt-card {
margin-bottom: 10px;
transition: border-color 0.2s;
}
.mt-card.state-pass { border-color: var(--green) !important; }
.mt-card.state-fail { border-color: var(--red) !important; }
.mt-card.state-skip { border-color: var(--amber) !important; opacity: 0.7; }
.mt-id-badge {
font-size: 9px;
font-weight: 700;
background: var(--border);
color: var(--text-dim);
border-radius: 2px;
padding: 2px 6px;
letter-spacing: 0.12em;
margin-right: 8px;
}
.mt-card.state-pass .mt-id-badge { background: var(--green-dim); color: var(--green); }
.mt-card.state-fail .mt-id-badge { background: var(--red-dim); color: var(--red); }
.mt-card.state-skip .mt-id-badge { background: var(--amber-dim); color: var(--amber); }
.mt-scenario-title {
font-size: 12px;
font-weight: 700;
color: var(--text-bright);
letter-spacing: 0.05em;
}
.mt-state-chip {
font-size: 9px;
font-weight: 700;
letter-spacing: 0.15em;
color: var(--text-dim);
padding: 2px 8px;
border: 1px solid var(--border);
border-radius: 2px;
white-space: nowrap;
}
.mt-card.state-pass .mt-state-chip { color: var(--green); border-color: var(--green); }
.mt-card.state-fail .mt-state-chip { color: var(--red); border-color: var(--red); }
.mt-card.state-skip .mt-state-chip { color: var(--amber); border-color: var(--amber); }
.mt-steps-label, .mt-expected-label {
font-size: 9px;
font-weight: 700;
color: var(--text-dim);
letter-spacing: 0.2em;
margin-bottom: 6px;
}
.mt-expected-label { margin-top: 12px; }
.mt-steps {
padding-left: 18px;
margin: 0;
font-size: 12px;
line-height: 1.8;
color: var(--text);
}
.mt-expected {
font-size: 12px;
line-height: 1.65;
color: var(--text-bright);
background: var(--bg-card);
border-left: 3px solid var(--border-glow);
padding: 8px 12px;
border-radius: 0 3px 3px 0;
}
.mt-btn-row {
display: flex;
gap: 8px;
margin-top: 14px;
}
.mt-btn {
flex: 1;
min-height: 44px;
border: 1px solid var(--border);
border-radius: 3px;
background: var(--bg-deep);
color: var(--text-dim);
font-family: var(--font);
font-size: 11px;
font-weight: 700;
letter-spacing: 0.12em;
cursor: pointer;
touch-action: manipulation;
transition: background 0.15s, color 0.15s, border-color 0.15s;
}
.mt-btn-pass:hover, .mt-btn-pass.active { background: var(--green-dim); color: var(--green); border-color: var(--green); }
.mt-btn-fail:hover, .mt-btn-fail.active { background: var(--red-dim); color: var(--red); border-color: var(--red); }
.mt-btn-skip:hover, .mt-btn-skip.active { background: var(--amber-dim); color: var(--amber); border-color: var(--amber); }
.mt-summary { margin-top: 24px; margin-bottom: 32px; }
.mt-summary-hint { color: var(--text-dim); font-size: 12px; margin: 0; }
.mt-summary-row {
display: flex;
justify-content: space-between;
align-items: center;
padding: 8px 0;
border-bottom: 1px solid var(--border);
font-size: 12px;
}
.mt-summary-row:last-child { border-bottom: none; }
.mt-summary-score { font-size: 28px; font-weight: 700; color: var(--green); margin: 12px 0 4px; }
.mt-summary-pct { font-size: 13px; color: var(--text-dim); }
@media (max-width: 768px) {
.mt-btn-row { gap: 6px; }
.mt-btn { font-size: 10px; padding: 0 4px; }
}
</style>
<!-- ── HITL State Machine (sessionStorage) ─────────────────────────────── -->
<script>
const TOTAL = {{ total }};
const KEY = "timmy-mobile-test-results";
function loadResults() {
try { return JSON.parse(sessionStorage.getItem(KEY) || "{}"); }
catch { return {}; }
}
function saveResults(r) {
sessionStorage.setItem(KEY, JSON.stringify(r));
}
function mark(id, state) {
const results = loadResults();
results[id] = state;
saveResults(results);
applyState(id, state);
updateScore(results);
updateSummary(results);
}
function applyState(id, state) {
const card = document.getElementById("card-" + id);
const chip = document.getElementById("chip-" + id);
const labels = { pass: "PASS", fail: "FAIL", skip: "SKIP" };
card.classList.remove("state-pass", "state-fail", "state-skip");
if (state) card.classList.add("state-" + state);
chip.textContent = state ? labels[state] : "PENDING";
// highlight active button
card.querySelectorAll(".mt-btn").forEach(btn => btn.classList.remove("active"));
const activeBtn = card.querySelector(".mt-btn-" + state);
if (activeBtn) activeBtn.classList.add("active");
}
function updateScore(results) {
const passed = Object.values(results).filter(v => v === "pass").length;
const decided = Object.values(results).filter(v => v !== undefined).length;
document.getElementById("score-display").textContent = passed + " / " + TOTAL;
const pct = TOTAL ? (decided / TOTAL) * 100 : 0;
const bar = document.getElementById("progress-bar");
bar.style.width = pct + "%";
// colour the bar by overall health
const failCount = Object.values(results).filter(v => v === "fail").length;
bar.style.background = failCount > 0
? "var(--red)"
: passed === TOTAL ? "var(--green)" : "var(--amber)";
}
function updateSummary(results) {
const passed = Object.values(results).filter(v => v === "pass").length;
const failed = Object.values(results).filter(v => v === "fail").length;
const skipped = Object.values(results).filter(v => v === "skip").length;
const decided = passed + failed + skipped;
const summaryBody = document.getElementById("summary-body");
if (decided < TOTAL) {
summaryBody.innerHTML = '';
const p = document.createElement('p');
p.className = 'mt-summary-hint';
p.textContent = (TOTAL - decided) + ' scenario(s) still pending.';
summaryBody.appendChild(p);
return;
}
const pct = TOTAL ? Math.round((passed / TOTAL) * 100) : 0;
const color = failed > 0 ? "var(--red)" : "var(--green)";
// Safely build summary UI using DOM API to avoid XSS from potentially untrusted variables
summaryBody.innerHTML = '';
const scoreDiv = document.createElement('div');
scoreDiv.className = 'mt-summary-score';
scoreDiv.style.color = color;
scoreDiv.textContent = passed + ' / ' + TOTAL;
summaryBody.appendChild(scoreDiv);
const pctDiv = document.createElement('div');
pctDiv.className = 'mt-summary-pct';
pctDiv.textContent = pct + '% pass rate';
summaryBody.appendChild(pctDiv);
const statsContainer = document.createElement('div');
statsContainer.style.marginTop = '16px';
const createRow = (label, value, colorVar) => {
const row = document.createElement('div');
row.className = 'mt-summary-row';
const labelSpan = document.createElement('span');
labelSpan.textContent = label;
const valSpan = document.createElement('span');
valSpan.style.color = 'var(--' + colorVar + ')';
valSpan.style.fontWeight = '700';
valSpan.textContent = value;
row.appendChild(labelSpan);
row.appendChild(valSpan);
return row;
};
statsContainer.appendChild(createRow('PASSED', passed, 'green'));
statsContainer.appendChild(createRow('FAILED', failed, 'red'));
statsContainer.appendChild(createRow('SKIPPED', skipped, 'amber'));
summaryBody.appendChild(statsContainer);
const statusMsg = document.createElement('p');
statusMsg.style.marginTop = '12px';
statusMsg.style.fontSize = '11px';
if (failed > 0) {
statusMsg.style.color = 'var(--red)';
statusMsg.textContent = '⚠ ' + failed + ' failure(s) need attention before release.';
} else {
statusMsg.style.color = 'var(--green)';
statusMsg.textContent = 'All tested scenarios passed — ship it.';
}
summaryBody.appendChild(statusMsg);
}
function resetAll() {
if (!confirm("Reset all test results?")) return;
sessionStorage.removeItem(KEY);
const results = {};
document.querySelectorAll("[data-scenario]").forEach(card => {
const id = card.dataset.scenario;
applyState(id, null);
});
updateScore(results);
const summaryBody = document.getElementById("summary-body");
summaryBody.innerHTML = '';
const p = document.createElement('p');
p.className = 'mt-summary-hint';
p.textContent = 'Mark all scenarios above to see your final score.';
summaryBody.appendChild(p);
}
// Restore saved state on load
(function init() {
const results = loadResults();
Object.entries(results).forEach(([id, state]) => applyState(id, state));
updateScore(results);
updateSummary(results);
})();
</script>
{% endblock %}

src/events/__init__.py (new file)
@@ -0,0 +1 @@
+"""Events — Domain event dispatch and subscription."""

src/memory/__init__.py (new file)
@@ -0,0 +1 @@
+"""Memory — Persistent conversation and knowledge memory."""

@@ -1 +1 @@
"""Notifications — Push notification store (notifier singleton)."""

@@ -0,0 +1 @@
+"""Self-Modify — Runtime self-modification with safety constraints."""

@@ -0,0 +1 @@
+"""Self-TDD — Continuous test watchdog with regression alerting."""

@@ -1 +1 @@
"""Shortcuts — Siri Shortcuts API endpoints."""

@@ -0,0 +1 @@
+"""Spark — Intelligence engine for events, predictions, and advisory."""

@@ -1 +1 @@
"""Swarm — Multi-agent coordinator with auction-based task assignment."""

@@ -0,0 +1 @@
+"""Telegram Bot — Bridge Telegram messages to Timmy."""

@@ -0,0 +1 @@
+"""Timmy — Core AI agent (Ollama/AirLLM backends, CLI, prompts)."""

@@ -1 +1 @@
"""Timmy Serve — L402 Lightning-gated API server (port 8402)."""

src/upgrades/__init__.py (new file)
@@ -0,0 +1 @@
+"""Upgrades — System upgrade queue and execution pipeline."""

@@ -1 +1 @@
"""Voice — NLU intent detection (regex-based, local, no cloud)."""

@@ -1 +1 @@
"""WebSocket Manager — Real-time connection handler (ws_manager singleton)."""

@@ -22,7 +22,7 @@ from unittest.mock import AsyncMock, MagicMock, patch
 def _css() -> str:
     """Read the main stylesheet."""
-    css_path = Path(__file__).parent.parent / "static" / "style.css"
+    css_path = Path(__file__).parent.parent.parent / "static" / "style.css"
     return css_path.read_text()

@@ -290,13 +290,13 @@ def test_M605_health_status_passes_model_to_template(client):
 def _mobile_html() -> str:
     """Read the mobile template source."""
-    path = Path(__file__).parent.parent / "src" / "dashboard" / "templates" / "mobile.html"
+    path = Path(__file__).parent.parent.parent / "src" / "dashboard" / "templates" / "mobile.html"
     return path.read_text()

 def _swarm_live_html() -> str:
     """Read the swarm live template source."""
-    path = Path(__file__).parent.parent / "src" / "dashboard" / "templates" / "swarm_live.html"
+    path = Path(__file__).parent.parent.parent / "src" / "dashboard" / "templates" / "swarm_live.html"
     return path.read_text()

tests/hands/__init__.py (new, empty file)

@@ -1,4 +1,4 @@
-"""Tests for dashboard/routes/voice_enhanced.py — enhanced voice processing."""
+"""Tests for enhanced voice processing (merged into dashboard/routes/voice.py)."""
 from unittest.mock import MagicMock, patch

@@ -56,7 +56,7 @@ class TestVoiceEnhancedProcess:
         mock_run.content = "Hello from Timmy!"
         mock_agent.run.return_value = mock_run
-        with patch("dashboard.routes.voice_enhanced.create_timmy", return_value=mock_agent):
+        with patch("dashboard.routes.voice.create_timmy", return_value=mock_agent):
             resp = client.post(
                 "/voice/enhanced/process",
                 data={"text": "tell me about Bitcoin", "speak_response": "false"},

@@ -69,7 +69,7 @@ class TestVoiceEnhancedProcess:
     def test_chat_fallback_error_handling(self, client):
         """When the agent raises, the error should be captured gracefully."""
         with patch(
-            "dashboard.routes.voice_enhanced.create_timmy",
+            "dashboard.routes.voice.create_timmy",
             side_effect=RuntimeError("Ollama offline"),
         ):
             resp = client.post(

tests/mcp/__init__.py (new, empty file)

Some files were not shown because too many files have changed in this diff.