# Test Coverage Analysis — Timmy Time Dashboard

**Date:** 2026-03-06
**Overall coverage:** 63.6% (7,996 statements, 2,910 missed)
**Threshold:** 60% (passes, but barely)
**Test suite:** 914 passed, 4 failed, 39 skipped, 5 errors — 35 seconds

---

## Current Coverage by Package

| Package | Approx. Coverage (range across modules) | Notes |
|---------|-----------------|-------|
| `spark/` | 90–98% | Best-covered package |
| `timmy_serve/` | 80–100% | Small package, well tested |
| `infrastructure/models/` | 42–97% | `registry` great, `multimodal` weak |
| `dashboard/middleware/` | 79–100% | Solid |
| `dashboard/routes/` | 36–100% | Highly uneven — some routes untested |
| `integrations/` | 51–100% | Paperclip well covered; Discord weak |
| `timmy/` | 0–100% | Several core modules at 0% |
| `brain/` | 0–75% | `client` and `worker` very low |
| `infrastructure/events/` | 0% | Completely untested |
| `infrastructure/error_capture.py` | 0% | Completely untested |

---

## Priority 1 — Zero-Coverage Modules (0%)

These modules have **no test coverage at all** and represent the biggest risk:

| Module | Stmts | Purpose |
|--------|-------|---------|
| `src/timmy/semantic_memory.py` | 187 | Semantic memory system — core agent feature |
| `src/timmy/agents/timmy.py` | 165 | Main Timmy agent class |
| `src/timmy/agents/base.py` | 57 | Base agent class |
| `src/timmy/interview.py` | 46 | Interview flow |
| `src/infrastructure/error_capture.py` | 91 | Error capture/reporting |
| `src/infrastructure/events/broadcaster.py` | 67 | Event broadcasting |
| `src/infrastructure/events/bus.py` | 74 | Event bus |
| `src/infrastructure/openfang/tools.py` | 41 | OpenFang tool definitions |
| `src/brain/schema.py` | 14 | Brain schema definitions |

**Recommendation:** `timmy/agents/timmy.py` (165 stmts) and `semantic_memory.py` (187 stmts) are the highest-value targets. The events subsystem (`broadcaster.py` + `bus.py` = 141 stmts) is critical infrastructure with zero tests.

---

## Priority 2 — Under-Tested Modules (<50%)

| Module | Cover | Stmts Missed | Purpose |
|--------|-------|-------------|---------|
| `brain/client.py` | 14.8% | 127 | Brain client — primary brain interface |
| `brain/worker.py` | 16.1% | 156 | Background brain worker |
| `brain/embeddings.py` | 35.0% | 26 | Embedding generation |
| `dashboard/routes/marketplace.py` | 36.4% | 21 | Marketplace routes |
| `timmy/approvals.py` | 39.1% | 42 | Approval workflow |
| `dashboard/routes/paperclip.py` | 41.1% | 96 | Paperclip dashboard routes |
| `infrastructure/hands/tools.py` | 41.3% | 27 | Tool execution |
| `infrastructure/models/multimodal.py` | 42.6% | 81 | Multimodal model support |
| `dashboard/routes/router.py` | 42.9% | 12 | Route registration |
| `timmy/cascade_adapter.py` | 43.2% | 25 | Cascade LLM adapter |
| `dashboard/routes/swarm.py` | 43.3% | 17 | Swarm routes |
| `timmy/tools_intro/__init__.py` | 44.7% | 84 | Tool introduction system |
| `timmy/tools.py` | 46.4% | 147 | Agent tool definitions |
| `timmy/cli.py` | 47.4% | 30 | CLI entry point |
| `timmy/conversation.py` | 48.5% | 34 | Conversation management |

**Recommendation:** `brain/client.py` + `brain/worker.py` together miss 283 statements and are the core of the brain/memory system. `timmy/tools.py` misses 147 statements and is the agent's tool registry — high impact.

---

## Priority 3 — Test Infrastructure Issues

### 3a. Broken Tests (4 failures)

All 4 failures are in `tests/test_setup_script.py`. The tests reference a hard-coded path, `/home/ubuntu/setup_timmy.sh`, which doesn't exist in this environment. These tests are environment-specific and should either:
- Be marked `@pytest.mark.skip_ci` or `@pytest.mark.functional` so default and CI runs can deselect them
- Use a fixture to locate the script relative to the project
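Here is a minimal sketch of the fixture option. It assumes the script is meant to be checked in at the repository root as `setup_timmy.sh`. That location is an assumption, because the tests only show the `/home/ubuntu/...` path. The fixture resolves the script from pytest's rootdir via `config.rootpath` (pytest 6.1+). `find_setup_script` and the `setup_script` fixture are hypothetical names.

```python
# tests/conftest.py (sketch): locate the setup script relative to the project
# instead of hard-coding /home/ubuntu/setup_timmy.sh.
from __future__ import annotations

from pathlib import Path

import pytest

# Assumption: the script is checked into the repository root under this name.
SETUP_SCRIPT_NAME = "setup_timmy.sh"


def find_setup_script(root: Path, name: str = SETUP_SCRIPT_NAME) -> Path | None:
    """Return root/name if it is an existing file, otherwise None."""
    candidate = Path(root) / name
    return candidate if candidate.is_file() else None


@pytest.fixture
def setup_script(request: pytest.FixtureRequest) -> Path:
    """Hypothetical fixture: resolve the script from pytest's rootdir and
    skip the requesting test (instead of failing it) when it is absent."""
    root = request.config.rootpath  # pytest's rootdir, normally where pyproject.toml lives
    script = find_setup_script(root)
    if script is None:
        pytest.skip(f"{SETUP_SCRIPT_NAME} not found under {root}")
    return script
```

A test then requests the fixture by name, for example `def test_has_shebang(setup_script): assert setup_script.read_text().startswith("#!")`. Wherever the script is absent, that test is reported as skipped rather than failed.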

### 3b. Collection Errors (5 errors)

`tests/functional/test_setup_prod.py` has the same problem: it references a non-existent script path. It should be guarded with a skip condition so the module is reported as skipped instead of erroring.
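The guard has to act at module level, before any test in the file runs. Here is a minimal sketch using a module-level `pytestmark`. `TIMMY_SETUP_SCRIPT` is a hypothetical environment variable, and the fallback path is a placeholder, not the path the module currently uses. If the module reads the script at import time, a mark cannot stop that code from running. In that case, put `pytest.skip(reason, allow_module_level=True)` above the offending code instead.

```python
# tests/functional/test_setup_prod.py (sketch): skip the whole module when the
# setup script is unavailable, instead of letting it error.
import os
from pathlib import Path

import pytest

# Hypothetical: TIMMY_SETUP_SCRIPT is an illustrative variable name, and the
# fallback is a placeholder, not the path the module currently uses.
SCRIPT_PATH = Path(os.environ.get("TIMMY_SETUP_SCRIPT", "setup_timmy.sh"))

# A module-level `pytestmark` applies the mark to every test in the file.
# The condition is evaluated once, when the module is imported.
pytestmark = pytest.mark.skipif(
    not SCRIPT_PATH.is_file(),
    reason=f"setup script not found: {SCRIPT_PATH}",
)


def test_script_is_not_empty():
    # Only runs when the skip condition above is False.
    assert SCRIPT_PATH.read_text()
```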

### 3c. pytest-xdist Conflicts with Coverage

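The `pyproject.toml` `addopts` includes `-n auto --dist worksteal` (pytest-xdist), and `make test-cov` also passes `--cov` flags. The coverage run fails with:
```
pytest: error: unrecognized arguments: -n --dist worksteal
```
pytest reports this when the `-n`/`--dist` options injected by `addopts` are parsed while the xdist plugin is not loaded, for example when it is disabled with `-p no:xdist`. pytest-cov itself can collect coverage from xdist workers.
**Fix:** Either:
- Remove `-n auto --dist worksteal` from `addopts` and add it only in the `make test` target
- Or use `-p no:xdist` in the coverage targets (current workaround). Note that `-p no:xdist` alone does not remove the `-n`/`--dist` options that `addopts` injects, so the coverage targets must also bypass those options.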

### 3d. Tox Configuration

`tox.ini` has `unit` and `integration` environments that run the **exact same command** — they're aliases. This is misleading:
- `unit` should run `-m unit` (fast, no I/O)
- `integration` should run `-m integration` (may use SQLite)
- Consider adding a `coverage` tox env
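The split only helps if tests actually carry `unit` or `integration` markers. One way to get there is to assign a default marker at collection time. The tox `unit` and `integration` environments would then run `pytest -m unit` and `pytest -m integration` respectively. The sketch uses a hypothetical directory-to-marker mapping and needs pytest 7+ for `item.path`. The real layout may need a different mapping, or explicit markers.

```python
# tests/conftest.py (sketch): give every test a category marker so that
# `pytest -m unit` and `pytest -m integration` select different sets.
from __future__ import annotations

from pathlib import PurePath

import pytest

# Hypothetical layout: tests under these directories get the matching marker;
# everything else defaults to "unit". Adjust to the real tree.
DIR_MARKERS = {"functional": "functional", "integration": "integration"}
DEFAULT_MARKER = "unit"
CATEGORY_MARKERS = {DEFAULT_MARKER, *DIR_MARKERS.values()}


def marker_for(path: str | PurePath) -> str:
    """Pick a category marker from the directories in a test file's path."""
    parts = PurePath(path).parts
    for directory, marker in DIR_MARKERS.items():
        if directory in parts:
            return marker
    return DEFAULT_MARKER


def pytest_configure(config):
    # Register the markers so --strict-markers does not reject them.
    for name in sorted(CATEGORY_MARKERS):
        config.addinivalue_line("markers", f"{name}: test category (auto-assigned)")


def pytest_collection_modifyitems(config, items):
    for item in items:
        # An explicit category marker on the test wins over the directory default.
        if any(item.get_closest_marker(name) for name in CATEGORY_MARKERS):
            continue
        item.add_marker(getattr(pytest.mark, marker_for(item.path)))
```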

### 3e. CI Workflow (`tests.yml`)

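- CI uses `pip install -e ".[dev]"`, but the project uses Poetry. Because pip ignores `poetry.lock`, CI may resolve different dependency versions than local development.
- CI doesn't pass marker filters, so it runs **all** tests, including those that may need Docker/Ollama
- No coverage enforcement in CI. The `fail_under=60` setting in pyproject.toml is only checked when a run collects coverage (e.g. via `--cov`), and passing `--cov-fail-under=60` in the CI command makes the threshold explicit.
- No caching of dependencies or Poetry virtualenvs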

---

## Priority 4 — Test Quality Gaps

### 4a. Missing Error-Path Testing

Many modules have happy-path tests but lack coverage for:
- **Graceful degradation paths**: The architecture mandates graceful degradation when Ollama/Redis/AirLLM are unavailable, but most fallback paths are untested (e.g., `cascade.py` lines 563–655)
- **`brain/client.py`**: Only 14.8% covered — connection failures, retries, and error handling are untested
- **`infrastructure/error_capture.py`**: 0% — the error capture system itself has no tests
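This report doesn't reproduce the fallback code in `cascade.py`, so the sketch below uses a hypothetical stand-in, `generate_with_fallback`, to show the shape of degradation tests. It covers three behaviors:
- An unreachable service, simulated by raising `ConnectionError` or `TimeoutError`, should hand off to the next backend.
- If every backend fails, the function should return a degraded default rather than raise.
- Unrelated bugs should still surface.

The fakes stand in for Ollama/Redis/AirLLM clients and are illustrative only.

```python
from __future__ import annotations

from typing import Callable

Backend = Callable[[str], str]


def generate_with_fallback(prompt: str, backends: list[Backend], default: str = "[offline]") -> str:
    """Hypothetical stand-in: try each backend in order, skipping ones that are
    unreachable, and return a degraded default if none respond."""
    for backend in backends:
        try:
            return backend(prompt)
        except (ConnectionError, TimeoutError):
            continue  # service unavailable: degrade to the next option
    return default


# Fakes that simulate each failure mode without a real service.
def ollama_down(prompt: str) -> str:
    raise ConnectionError("connection refused")


def ollama_slow(prompt: str) -> str:
    raise TimeoutError("request timed out")


def local_echo(prompt: str) -> str:
    return f"echo: {prompt}"


def test_falls_back_when_primary_is_unreachable():
    assert generate_with_fallback("hi", [ollama_down, local_echo]) == "echo: hi"


def test_returns_default_when_every_backend_fails():
    assert generate_with_fallback("hi", [ollama_down, ollama_slow]) == "[offline]"


def test_unexpected_errors_are_not_swallowed():
    def buggy(prompt: str) -> str:
        raise ValueError("bug, not an outage")

    try:
        generate_with_fallback("hi", [buggy, local_echo])
    except ValueError:
        return
    raise AssertionError("ValueError should propagate, not trigger a fallback")
```

The third test matters as much as the first two. A fallback that catches everything would hide real defects behind "degraded" behavior.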

### 4b. No Integration Tests for Events System

The `infrastructure/events/` package (`broadcaster.py` + `bus.py`) is 0% covered. This is the pub/sub backbone for the application. Tests should cover:
- Event subscription and dispatch
- Multiple subscribers
- Error handling in event handlers
- Async event broadcasting
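The real interface of `bus.py` isn't shown in this report, so the sketch below uses a hypothetical `EventBus` with `subscribe`/`publish` methods as a stand-in. It writes one test per bullet above. Handlers run concurrently through `asyncio.gather(..., return_exceptions=True)`. That is an assumed design, chosen so a failing handler is recorded instead of aborting the publish.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[[Any], Awaitable[None]]


class EventBus:
    """Hypothetical stand-in for infrastructure/events/bus.py; the real
    class and method names may differ."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self.errors: list[Exception] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    async def publish(self, topic: str, payload: Any) -> int:
        handlers = list(self._handlers.get(topic, ()))
        # Run handlers concurrently; return_exceptions=True collects handler
        # exceptions as results instead of raising the first one from publish().
        results = await asyncio.gather(*(h(payload) for h in handlers), return_exceptions=True)
        self.errors.extend(r for r in results if isinstance(r, Exception))
        return len(handlers)


def test_dispatch_to_multiple_subscribers():
    bus, seen = EventBus(), []

    async def a(payload):
        seen.append(("a", payload))

    async def b(payload):
        seen.append(("b", payload))

    bus.subscribe("task.created", a)
    bus.subscribe("task.created", b)
    assert asyncio.run(bus.publish("task.created", {"id": 1})) == 2
    assert sorted(name for name, _ in seen) == ["a", "b"]
    assert all(payload == {"id": 1} for _, payload in seen)
    # Topics without subscribers are a no-op.
    assert asyncio.run(bus.publish("task.deleted", {"id": 1})) == 0


def test_failing_handler_does_not_block_others():
    bus, seen = EventBus(), []

    async def bad(payload):
        raise RuntimeError("boom")

    async def good(payload):
        seen.append(payload)

    bus.subscribe("t", bad)
    bus.subscribe("t", good)
    asyncio.run(bus.publish("t", 42))
    assert seen == [42]
    assert [type(e) for e in bus.errors] == [RuntimeError]


def test_handlers_run_concurrently():
    bus, order = EventBus(), []

    async def slow(payload):
        await asyncio.sleep(0.01)
        order.append("slow")

    async def fast(payload):
        order.append("fast")

    bus.subscribe("t", slow)
    bus.subscribe("t", fast)
    asyncio.run(bus.publish("t", None))
    # A bus that awaited handlers one by one would give ["slow", "fast"].
    assert order == ["fast", "slow"]
```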

### 4c. Security Tests Are Thin

- `tests/security/` has only 3 files totaling ~140 lines
- `src/timmy_serve/l402_proxy.py` (payment gating, listed as security-sensitive) has no dedicated test file
- CSRF tests exist but bypass/traversal tests are minimal
- No tests for the `approvals.py` authorization workflow (39.1% covered)
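For the L402 proxy, the most valuable first tests pin down the gate itself: a request without a valid credential must get HTTP 402. In the L402 scheme, the client retries with `Authorization: L402 <macaroon>:<preimage>`. The sketch uses a hypothetical `l402_gate` function and `verify` callback. They stand in for whatever `l402_proxy.py` actually does to validate the macaroon and preimage. The scheme name is compared case-insensitively, as HTTP authentication schemes are (RFC 7235).

```python
from __future__ import annotations

from typing import Callable

Verifier = Callable[[str, str], bool]


def l402_gate(authorization: str | None, verify: Verifier) -> int:
    """Hypothetical stand-in for the gate in l402_proxy.py: return the HTTP
    status the proxy should send (200 = forward, 402 = payment required)."""
    if not authorization:
        return 402
    scheme, _, credential = authorization.partition(" ")
    if scheme.lower() != "l402":  # auth schemes are case-insensitive
        return 402
    macaroon, sep, preimage = credential.partition(":")
    if not (sep and macaroon and preimage):
        return 402
    return 200 if verify(macaroon, preimage) else 402


def test_missing_or_malformed_credentials_get_402():
    accept_all: Verifier = lambda macaroon, preimage: True
    for header in (None, "", "Bearer abc", "L402", "L402 macaroon-only",
                   "L402 :preimage", "L402 macaroon:"):
        assert l402_gate(header, accept_all) == 402, header


def test_preimage_is_checked():
    verify: Verifier = lambda macaroon, preimage: preimage == "good"
    assert l402_gate("L402 mac:good", verify) == 200
    assert l402_gate("L402 mac:bad", verify) == 402
```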

### 4d. Missing WebSocket Tests

WebSocket handler (`ws_manager/handler.py`) has 81.2% coverage, but the disconnect/reconnect and error paths (lines 132–147) aren't tested. For a real-time dashboard, WebSocket reliability is critical.
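Disconnect and reconnect paths can be tested without a real socket. The technique is to inject a fake connection whose send fails the way a dropped client would. Here is a sketch, assuming a connection manager that tracks active sockets and a Starlette/FastAPI-style `send_text` coroutine. `ConnectionManager` and `FakeWebSocket` are hypothetical names. The exceptions caught on send (`ConnectionError`, `RuntimeError`) are also an assumption, because what a closed socket raises depends on the framework.

```python
from __future__ import annotations

import asyncio


class ConnectionManager:
    """Hypothetical stand-in for the manager in ws_manager/handler.py."""

    def __init__(self) -> None:
        self.active: set = set()

    def connect(self, ws) -> None:
        self.active.add(ws)

    def disconnect(self, ws) -> None:
        self.active.discard(ws)

    async def broadcast(self, message: str) -> None:
        # Iterate over a snapshot because dead sockets are removed mid-loop.
        for ws in list(self.active):
            try:
                await ws.send_text(message)
            except (ConnectionError, RuntimeError):
                # Assumption: these are what a closed socket raises here.
                self.disconnect(ws)


class FakeWebSocket:
    """Records sent messages; optionally fails like a dropped client."""

    def __init__(self, broken: bool = False) -> None:
        self.broken = broken
        self.sent: list[str] = []

    async def send_text(self, message: str) -> None:
        if self.broken:
            raise ConnectionResetError("client went away")
        self.sent.append(message)


def test_broadcast_drops_dead_connections():
    mgr = ConnectionManager()
    alive, dead = FakeWebSocket(), FakeWebSocket(broken=True)
    mgr.connect(alive)
    mgr.connect(dead)
    asyncio.run(mgr.broadcast("tick"))
    assert alive.sent == ["tick"]
    assert mgr.active == {alive}


def test_reconnected_client_receives_new_messages_only():
    mgr, ws = ConnectionManager(), FakeWebSocket()
    mgr.connect(ws)
    mgr.disconnect(ws)
    asyncio.run(mgr.broadcast("missed"))
    mgr.connect(ws)
    asyncio.run(mgr.broadcast("back"))
    assert ws.sent == ["back"]
```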

### 4e. No Tests for `timmy/agents/` Subpackage

The Agno-based agent classes (`base.py`, `timmy.py`) are at 0% coverage (222 statements). These are stubbed in conftest but never actually exercised. Even with the Agno stub, the control flow and prompt construction logic should be tested.

---

## Priority 5 — Test Speed & Parallelism

| Metric | Value |
|--------|-------|
| Total wall time | ~35s (sequential) |
| Parallel (`-n auto`) | Would be ~10-15s |
| Slowest category | Functional tests (HTTP, Docker) |

**Observations:**
- 30-second timeout per test is generous — consider 10s for unit, 30s for integration
- The `--dist worksteal` strategy is good for uneven test durations
- 39 tests are skipped (mostly due to missing markers/env) — this is expected
- No test duration profiling is configured (consider `--durations=10`)

---

## Recommended Action Plan

### Quick Wins (High ROI, Low Effort)
1. **Fix the 4 broken tests** in `test_setup_script.py` (add skip guards)
2. **Fix xdist/coverage conflict** in `pyproject.toml` addopts
3. **Differentiate tox `unit` vs `integration`** environments
4. **Add `--durations=10`** to default addopts for profiling slow tests
5. **Add `--cov-fail-under=60`** to CI workflow to enforce the threshold

### Medium Effort, High Impact
6. **Test the events system** (`broadcaster.py` + `bus.py`) — 141 uncovered statements, critical infrastructure
7. **Test `timmy/agents/timmy.py`** — 165 uncovered statements, core agent
8. **Test `brain/client.py` and `brain/worker.py`** — 283 uncovered statements, core memory
9. **Test `timmy/tools.py`** error paths — 147 uncovered statements
10. **Test `error_capture.py`** — 91 uncovered statements, observability blind spot

### Longer Term
11. **Add graceful-degradation tests** — verify fallback behavior for all optional services
12. **Expand security test suite** — approvals, L402 proxy, input sanitization
13. **Add coverage tox environment** and enforce in CI
14. **Align CI with Poetry** — use `poetry install` instead of pip for consistent resolution
15. **Target 75% coverage** as the next threshold milestone (currently 63.6%)

---

## Reference Modules (Already Well-Tested)

These modules are at 95%+ and serve as good examples of testing patterns:

- `spark/eidos.py` — 98.3%
- `spark/memory.py` — 98.3%
- `infrastructure/models/registry.py` — 97.1%
- `timmy/agent_core/ollama_adapter.py` — 97.8%
- `timmy/agent_core/interface.py` — 100%
- `dashboard/middleware/security_headers.py` — 100%
- `dashboard/routes/agents.py` — 100%
- `timmy_serve/inter_agent.py` — 100%