# Test Coverage Analysis — Timmy Time Dashboard
**Date:** 2026-03-06
**Overall coverage:** 63.6% (7,996 statements, 2,910 missed)
**Threshold:** 60% (passes, but barely)
**Test suite:** 914 passed, 4 failed, 39 skipped, 5 errors in 35 seconds

---
## Current Coverage by Package
| Package | Approx. Coverage | Notes |
|---------|-----------------|-------|
| `spark/` | 90-98% | Best-covered package |
| `timmy_serve/` | 80-100% | Small package, well tested |
| `infrastructure/models/` | 42-97% | `registry` great, `multimodal` weak |
| `dashboard/middleware/` | 79-100% | Solid |
| `dashboard/routes/` | 36-100% | Highly uneven; some routes untested |
| `integrations/` | 51-100% | Paperclip well covered; Discord weak |
| `timmy/` | 0-100% | Several core modules at 0% |
| `brain/` | 0-75% | `client` and `worker` very low |
| `infrastructure/events/` | 0% | Completely untested |
| `infrastructure/error_capture.py` | 0% | Completely untested |
---
## Priority 1 — Zero-Coverage Modules (0%)
These modules have **no test coverage at all** and represent the biggest risk:
| Module | Stmts | Purpose |
|--------|-------|---------|
| `src/timmy/semantic_memory.py` | 187 | Semantic memory system — core agent feature |
| `src/timmy/agents/timmy.py` | 165 | Main Timmy agent class |
| `src/timmy/agents/base.py` | 57 | Base agent class |
| `src/timmy/interview.py` | 46 | Interview flow |
| `src/infrastructure/error_capture.py` | 91 | Error capture/reporting |
| `src/infrastructure/events/broadcaster.py` | 67 | Event broadcasting |
| `src/infrastructure/events/bus.py` | 74 | Event bus |
| `src/infrastructure/openfang/tools.py` | 41 | OpenFang tool definitions |
| `src/brain/schema.py` | 14 | Brain schema definitions |
**Recommendation:** `timmy/agents/timmy.py` (165 stmts) and `semantic_memory.py` (187 stmts) are the highest-value targets. The events subsystem (`broadcaster.py` + `bus.py` = 141 stmts) is critical infrastructure with zero tests.

---
## Priority 2 — Under-Tested Modules (<50%)
| Module | Cover | Stmts Missed | Purpose |
|--------|-------|-------------|---------|
| `brain/client.py` | 14.8% | 127 | Brain client — primary brain interface |
| `brain/worker.py` | 16.1% | 156 | Background brain worker |
| `brain/embeddings.py` | 35.0% | 26 | Embedding generation |
| `timmy/approvals.py` | 39.1% | 42 | Approval workflow |
| `dashboard/routes/marketplace.py` | 36.4% | 21 | Marketplace routes |
| `dashboard/routes/paperclip.py` | 41.1% | 96 | Paperclip dashboard routes |
| `infrastructure/hands/tools.py` | 41.3% | 27 | Tool execution |
| `infrastructure/models/multimodal.py` | 42.6% | 81 | Multimodal model support |
| `dashboard/routes/router.py` | 42.9% | 12 | Route registration |
| `dashboard/routes/swarm.py` | 43.3% | 17 | Swarm routes |
| `timmy/cascade_adapter.py` | 43.2% | 25 | Cascade LLM adapter |
| `timmy/tools_intro/__init__.py` | 44.7% | 84 | Tool introduction system |
| `timmy/tools.py` | 46.4% | 147 | Agent tool definitions |
| `timmy/cli.py` | 47.4% | 30 | CLI entry point |
| `timmy/conversation.py` | 48.5% | 34 | Conversation management |
**Recommendation:** `brain/client.py` + `brain/worker.py` together miss 283 statements and are the core of the brain/memory system. `timmy/tools.py` misses 147 statements and is the agent's tool registry, which makes it high impact.

---
## Priority 3 — Test Infrastructure Issues
### 3a. Broken Tests (4 failures)
All in `tests/test_setup_script.py` — tests reference `/home/ubuntu/setup_timmy.sh` which doesn't exist. These tests are environment-specific and should either:
- Be marked `@pytest.mark.skip_ci` or `@pytest.mark.functional`
- Use a fixture to locate the script relative to the project
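
The second option could look roughly like the sketch below. It assumes the script sits directly in pytest's root directory and that pytest 6.1 or newer is in use (for `pytestconfig.rootpath`); `locate_script` and the `setup_script` fixture are hypothetical names, not existing project code.

```python
from pathlib import Path
from typing import Optional

import pytest

SCRIPT_NAME = "setup_timmy.sh"  # name taken from the failing tests' hard-coded path


def locate_script(root: Path, name: str = SCRIPT_NAME) -> Optional[Path]:
    """Return root/name if that file exists, otherwise None."""
    candidate = root / name
    return candidate if candidate.is_file() else None


@pytest.fixture
def setup_script(pytestconfig) -> Path:
    """Return the script path, or skip the test when the script is absent."""
    # Assumption: the script lives directly in pytest's root directory.
    script = locate_script(pytestconfig.rootpath)
    if script is None:
        pytest.skip(f"{SCRIPT_NAME} not found under {pytestconfig.rootpath}")
    return script


def test_setup_script_has_shebang(setup_script: Path) -> None:
    assert setup_script.read_text().startswith("#!")
```

Tests that request `setup_script` are then reported as skipped, not failed, on machines where the script is missing.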
### 3b. Collection Errors (5 errors)
`tests/functional/test_setup_prod.py` — same issue, references a non-existent script path. Should be guarded with a skip condition.
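
A module-level skip condition is one way to add that guard. In this sketch the script path is a placeholder, since the real path is not given here, and the test body is only illustrative.

```python
from pathlib import Path

import pytest

# Placeholder path: substitute the script the functional tests actually need.
SETUP_SCRIPT = Path("path/to/setup_script.sh")

# `pytestmark` applies the marker to every test in this module. The condition
# is evaluated at import time without raising, so a missing script produces
# skips, not errors.
pytestmark = pytest.mark.skipif(
    not SETUP_SCRIPT.is_file(),
    reason=f"{SETUP_SCRIPT} not found; skipping production setup tests",
)


def test_setup_script_is_not_empty() -> None:
    assert SETUP_SCRIPT.stat().st_size > 0
```

The guard only helps if nothing else at module level touches the script; any import-time file access would need to move into a fixture or into the tests themselves.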
### 3c. pytest-xdist Conflicts with Coverage
The `pyproject.toml` `addopts` includes `-n auto --dist worksteal` (xdist), but `make test-cov` also passes `--cov` flags. This causes a conflict:
```
pytest: error: unrecognized arguments: -n --dist worksteal
```
**Fix:** Either:
- Remove `-n auto --dist worksteal` from `addopts` and add it only in `make test` target
- Or use `-p no:xdist` in the coverage targets (current workaround)
### 3d. Tox Configuration
`tox.ini` defines `unit` and `integration` environments that run the **exact same command**, so the two names are effectively aliases and the distinction is misleading. Suggested changes:
- `unit` should run `-m unit` (fast, no I/O)
- `integration` should run `-m integration` (may use SQLite)
- Consider adding a `coverage` tox env
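
Selecting by marker only works if the markers are registered and the tests carry them. A minimal sketch of registering both markers in a `conftest.py`, assuming they are not already declared in `pyproject.toml`; the descriptions reuse the wording of the bullets above.

```python
# conftest.py (sketch)


def pytest_configure(config):
    """Register the markers that `-m unit` and `-m integration` select on."""
    config.addinivalue_line("markers", "unit: fast tests with no I/O")
    config.addinivalue_line("markers", "integration: tests that may use SQLite")
```

With the markers registered, the two tox environments could run `pytest -m unit` and `pytest -m integration` respectively, and `--strict-markers` would flag misspelled marker names.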
### 3e. CI Workflow (`tests.yml`)
- CI uses `pip install -e ".[dev]"` but the project uses Poetry — dependency resolution may differ
- CI doesn't pass marker filters, so it runs **all** tests including those that may need Docker/Ollama
- No coverage enforcement in CI: `fail_under=60` in `pyproject.toml` is only checked when a run collects coverage and evaluates the threshold, for example via pytest-cov's `--cov-fail-under`
- No caching of Poetry virtualenvs
---
## Priority 4 — Test Quality Gaps
### 4a. Missing Error-Path Testing
Many modules have happy-path tests but lack coverage for:
- **Graceful degradation paths**: The architecture mandates graceful degradation when Ollama/Redis/AirLLM are unavailable, but most fallback paths are untested (e.g., `cascade.py` lines 563-655)
- **`brain/client.py`**: Only 14.8% covered — connection failures, retries, and error handling are untested
- **`infrastructure/error_capture.py`**: 0% — the error capture system itself has no tests
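
Fallback paths can be exercised without real services by injecting backends that fail on purpose. The sketch below is illustrative only: `BackendUnavailable` and `generate_with_fallback` are hypothetical stand-ins defined here so the example runs on its own, not the API of `cascade.py`.

```python
class BackendUnavailable(Exception):
    """Hypothetical error raised when an optional service cannot be reached."""


def generate_with_fallback(prompt, backends):
    """Try each (name, callable) pair in order and return the first success."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except BackendUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise BackendUnavailable("all backends failed: " + "; ".join(errors))


def test_falls_back_when_primary_is_down():
    def ollama_down(prompt):
        raise BackendUnavailable("connection refused")

    def local_echo(prompt):
        return f"echo: {prompt}"

    backends = [("ollama", ollama_down), ("echo", local_echo)]
    used, reply = generate_with_fallback("hi", backends)
    assert used == "echo"
    assert reply == "echo: hi"


def test_raises_when_every_backend_is_down():
    def down(prompt):
        raise BackendUnavailable("unreachable")

    try:
        generate_with_fallback("hi", [("ollama", down), ("airllm", down)])
    except BackendUnavailable as exc:
        # The combined error should name every backend that was tried.
        assert "ollama" in str(exc) and "airllm" in str(exc)
    else:
        raise AssertionError("expected BackendUnavailable")
```

The same shape applies to the real code: replace the stand-in with the project's entry point and substitute fakes for the service clients.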
### 4b. No Integration Tests for Events System
The `infrastructure/events/` package (`broadcaster.py` + `bus.py`) is 0% covered. This is the pub/sub backbone for the application. Tests should cover:
- Event subscription and dispatch
- Multiple subscribers
- Error handling in event handlers
- Async event broadcasting
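
The four cases above fit in a small test. A minimal sketch, assuming a bus that exposes `subscribe` and an awaitable `publish`; `TinyBus` is a hypothetical stand-in defined here so the example is self-contained, and the real `bus.py` API may differ.

```python
import asyncio
from collections import defaultdict


class TinyBus:
    """Hypothetical stand-in for an event bus; not the project's bus.py API."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    async def publish(self, topic, payload):
        """Run every handler for topic; return the exceptions they raised."""
        results = await asyncio.gather(
            *(handler(payload) for handler in self._handlers[topic]),
            return_exceptions=True,
        )
        return [r for r in results if isinstance(r, Exception)]


def test_failing_handler_does_not_block_other_subscribers():
    bus = TinyBus()
    received = []

    async def good(payload):
        received.append(payload)

    async def bad(payload):
        raise RuntimeError("handler blew up")

    bus.subscribe("task.created", bad)
    bus.subscribe("task.created", good)
    bus.subscribe("task.created", good)

    errors = asyncio.run(bus.publish("task.created", {"id": 1}))

    # Both healthy subscribers ran even though the first handler failed.
    assert received == [{"id": 1}, {"id": 1}]
    assert len(errors) == 1 and isinstance(errors[0], RuntimeError)
```

Using `asyncio.run` inside a plain test function avoids depending on an async test plugin; a project that already uses one could write the test as a coroutine instead.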
### 4c. Security Tests Are Thin
- `tests/security/` has only 3 files totaling ~140 lines
- `src/timmy_serve/l402_proxy.py` (payment gating, listed as security-sensitive) has no dedicated test file
- CSRF tests exist but bypass/traversal tests are minimal
- No tests for the `approvals.py` authorization workflow (39.1% covered)
### 4d. Missing WebSocket Tests
The WebSocket handler (`ws_manager/handler.py`) has 81.2% coverage, but the disconnect/reconnect and error paths (lines 132-147) aren't tested. For a real-time dashboard, WebSocket reliability is critical.
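
Disconnect handling can be tested with a fake socket and no network. The sketch below assumes a manager that broadcasts to a list of connections and drops those that fail; `FakeSocket` and `TinyManager` are hypothetical, since the real handler's interface is not shown here.

```python
import asyncio


class FakeSocket:
    """Test double: records sent messages, or fails as if the peer vanished."""

    def __init__(self, broken=False):
        self.broken = broken
        self.sent = []

    async def send_text(self, message):
        if self.broken:
            raise ConnectionError("peer disconnected")
        self.sent.append(message)


class TinyManager:
    """Hypothetical stand-in for the connection manager under test."""

    def __init__(self):
        self.connections = []

    async def broadcast(self, message):
        """Send to every connection and keep only the ones that succeeded."""
        alive = []
        for socket in self.connections:
            try:
                await socket.send_text(message)
                alive.append(socket)
            except ConnectionError:
                pass  # dead connection: leave it out of the live list
        self.connections = alive


def test_broadcast_drops_dead_connections():
    manager = TinyManager()
    healthy, dead = FakeSocket(), FakeSocket(broken=True)
    manager.connections = [dead, healthy]

    asyncio.run(manager.broadcast("tick"))

    assert healthy.sent == ["tick"]
    assert manager.connections == [healthy]
```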
### 4e. No Tests for `timmy/agents/` Subpackage
The Agno-based agent classes (`base.py`, `timmy.py`) are at 0% coverage (222 statements). These are stubbed in conftest but never actually exercised. Even with the Agno stub, the control flow and prompt construction logic should be tested.

---
## Priority 5 — Test Speed & Parallelism
| Metric | Value |
|--------|-------|
| Total wall time | ~35s (sequential) |
| Parallel (`-n auto`) | ~10-15s (estimated) |
| Slowest category | Functional tests (HTTP, Docker) |
**Observations:**
- 30-second timeout per test is generous — consider 10s for unit, 30s for integration
- The `--dist worksteal` strategy is good for uneven test durations
- 39 tests are skipped (mostly due to missing markers/env) — this is expected
- No test duration profiling is configured (consider `--durations=10`)
---
## Recommended Action Plan
### Quick Wins (High ROI, Low Effort)
1. **Fix the 4 broken tests** in `test_setup_script.py` (add skip guards)
2. **Fix xdist/coverage conflict** in `pyproject.toml` addopts
3. **Differentiate tox `unit` vs `integration`** environments
4. **Add `--durations=10`** to default addopts for profiling slow tests
5. **Add `--cov-fail-under=60`** to CI workflow to enforce the threshold
### Medium Effort, High Impact
6. **Test the events system** (`broadcaster.py` + `bus.py`) — 141 uncovered statements, critical infrastructure
7. **Test `timmy/agents/timmy.py`** — 165 uncovered statements, core agent
8. **Test `brain/client.py` and `brain/worker.py`** — 283 uncovered statements, core memory
9. **Test `timmy/tools.py`** error paths — 147 uncovered statements
10. **Test `error_capture.py`** — 91 uncovered statements, observability blind spot
### Longer Term
11. **Add graceful-degradation tests** — verify fallback behavior for all optional services
12. **Expand security test suite** — approvals, L402 proxy, input sanitization
13. **Add coverage tox environment** and enforce in CI
14. **Align CI with Poetry** — use `poetry install` instead of pip for consistent resolution
15. **Target 75% coverage** as the next threshold milestone (currently 63.6%)
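
The 75% milestone can be sanity-checked against the figures in this analysis. The sketch below recomputes the current percentage, then estimates the result if items 6-10 were to cover every statement listed for them. That is an optimistic upper bound, not a forecast. Under that assumption coverage reaches about 73.9%, which is 84 statements short of the 911 additional statements needed for 75%.

```python
# Figures taken from this analysis.
TOTAL_STATEMENTS = 7_996
MISSED_STATEMENTS = 2_910

# Uncovered statements named in action items 6-10.
MEDIUM_EFFORT_TARGETS = {
    "events (broadcaster.py + bus.py)": 141,
    "timmy/agents/timmy.py": 165,
    "brain/client.py + brain/worker.py": 283,
    "timmy/tools.py": 147,
    "error_capture.py": 91,
}


def coverage_pct(covered: int, total: int) -> float:
    """Coverage as a percentage, rounded to one decimal place."""
    return round(100 * covered / total, 1)


def statements_needed(target_pct: int, covered: int, total: int) -> int:
    """Additional covered statements required to reach target_pct."""
    required = -(-total * target_pct // 100)  # ceiling division on integers
    return max(0, required - covered)


covered_now = TOTAL_STATEMENTS - MISSED_STATEMENTS
gain = sum(MEDIUM_EFFORT_TARGETS.values())

print(coverage_pct(covered_now, TOTAL_STATEMENTS))           # 63.6
print(coverage_pct(covered_now + gain, TOTAL_STATEMENTS))    # 73.9
print(statements_needed(75, covered_now, TOTAL_STATEMENTS))  # 911
```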
---
## Coverage Floor Modules (Already Well-Tested)
These modules are at 95%+ and serve as good examples of testing patterns:
- `spark/eidos.py` — 98.3%
- `spark/memory.py` — 98.3%
- `infrastructure/models/registry.py` — 97.1%
- `timmy/agent_core/ollama_adapter.py` — 97.8%
- `timmy/agent_core/interface.py` — 100%
- `dashboard/middleware/security_headers.py` — 100%
- `dashboard/routes/agents.py` — 100%
- `timmy_serve/inter_agent.py` — 100%