forked from Rockachopa/Timmy-time-dashboard
196 lines
8.9 KiB
Markdown
196 lines
8.9 KiB
Markdown
|
|
# Test Coverage Analysis — Timmy Time Dashboard
|
|||
|
|
|
|||
|
|
**Date:** 2026-03-06
|
|||
|
|
**Overall coverage:** 63.6% (7,996 statements, 2,910 missed)
|
|||
|
|
**Threshold:** 60% (passes, but barely)
|
|||
|
|
**Test suite:** 914 passed, 4 failed, 39 skipped, 5 errors — 35 seconds
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Current Coverage by Package
|
|||
|
|
|
|||
|
|
| Package | Approx. Coverage | Notes |
|
|||
|
|
|---------|-----------------|-------|
|
|||
|
|
| `spark/` | 90–98% | Best-covered package |
|
|||
|
|
| `timmy_serve/` | 80–100% | Small package, well tested |
|
|||
|
|
| `infrastructure/models/` | 42–97% | `registry` great, `multimodal` weak |
|
|||
|
|
| `dashboard/middleware/` | 79–100% | Solid |
|
|||
|
|
| `dashboard/routes/` | 36–100% | Highly uneven — some routes untested |
|
|||
|
|
| `integrations/` | 51–100% | Paperclip well covered; Discord weak |
|
|||
|
|
| `timmy/` | 0–100% | Several core modules at 0% |
|
|||
|
|
| `brain/` | 0–75% | `client` and `worker` very low |
|
|||
|
|
| `infrastructure/events/` | 0% | Completely untested |
|
|||
|
|
| `infrastructure/error_capture.py` | 0% | Completely untested |
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Priority 1 — Zero-Coverage Modules (0%)
|
|||
|
|
|
|||
|
|
These modules have **no test coverage at all** and represent the biggest risk:
|
|||
|
|
|
|||
|
|
| Module | Stmts | Purpose |
|
|||
|
|
|--------|-------|---------|
|
|||
|
|
| `src/timmy/semantic_memory.py` | 187 | Semantic memory system — core agent feature |
|
|||
|
|
| `src/timmy/agents/timmy.py` | 165 | Main Timmy agent class |
|
|||
|
|
| `src/timmy/agents/base.py` | 57 | Base agent class |
|
|||
|
|
| `src/timmy/interview.py` | 46 | Interview flow |
|
|||
|
|
| `src/infrastructure/error_capture.py` | 91 | Error capture/reporting |
|
|||
|
|
| `src/infrastructure/events/broadcaster.py` | 67 | Event broadcasting |
|
|||
|
|
| `src/infrastructure/events/bus.py` | 74 | Event bus |
|
|||
|
|
| `src/infrastructure/openfang/tools.py` | 41 | OpenFang tool definitions |
|
|||
|
|
| `src/brain/schema.py` | 14 | Brain schema definitions |
|
|||
|
|
|
|||
|
|
**Recommendation:** `timmy/agents/timmy.py` (165 stmts) and `semantic_memory.py` (187 stmts) are the highest-value targets. The events subsystem (`broadcaster.py` + `bus.py` = 141 stmts) is critical infrastructure with zero tests.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Priority 2 — Under-Tested Modules (<50%)
|
|||
|
|
|
|||
|
|
| Module | Cover | Stmts Missed | Purpose |
|
|||
|
|
|--------|-------|-------------|---------|
|
|||
|
|
| `brain/client.py` | 14.8% | 127 | Brain client — primary brain interface |
|
|||
|
|
| `brain/worker.py` | 16.1% | 156 | Background brain worker |
|
|||
|
|
| `brain/embeddings.py` | 35.0% | 26 | Embedding generation |
|
|||
|
|
| `timmy/approvals.py` | 39.1% | 42 | Approval workflow |
|
|||
|
|
| `dashboard/routes/marketplace.py` | 36.4% | 21 | Marketplace routes |
|
|||
|
|
| `dashboard/routes/paperclip.py` | 41.1% | 96 | Paperclip dashboard routes |
|
|||
|
|
| `infrastructure/hands/tools.py` | 41.3% | 27 | Tool execution |
|
|||
|
|
| `infrastructure/models/multimodal.py` | 42.6% | 81 | Multimodal model support |
|
|||
|
|
| `dashboard/routes/router.py` | 42.9% | 12 | Route registration |
|
|||
|
|
| `dashboard/routes/swarm.py` | 43.3% | 17 | Swarm routes |
|
|||
|
|
| `timmy/cascade_adapter.py` | 43.2% | 25 | Cascade LLM adapter |
|
|||
|
|
| `timmy/tools_intro/__init__.py` | 44.7% | 84 | Tool introduction system |
|
|||
|
|
| `timmy/tools.py` | 46.4% | 147 | Agent tool definitions |
|
|||
|
|
| `timmy/cli.py` | 47.4% | 30 | CLI entry point |
|
|||
|
|
| `timmy/conversation.py` | 48.5% | 34 | Conversation management |
|
|||
|
|
|
|||
|
|
**Recommendation:** `brain/client.py` + `brain/worker.py` together miss 283 statements and are the core of the brain/memory system. `timmy/tools.py` misses 147 statements and is the agent's tool registry — high impact.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Priority 3 — Test Infrastructure Issues
|
|||
|
|
|
|||
|
|
### 3a. Broken Tests (4 failures)
|
|||
|
|
|
|||
|
|
All in `tests/test_setup_script.py` — tests reference `/home/ubuntu/setup_timmy.sh` which doesn't exist. These tests are environment-specific and should either:
|
|||
|
|
- Be marked `@pytest.mark.skip_ci` or `@pytest.mark.functional`
|
|||
|
|
- Use a fixture to locate the script relative to the project
|
|||
|
|
|
|||
|
|
### 3b. Collection Errors (5 errors)
|
|||
|
|
|
|||
|
|
`tests/functional/test_setup_prod.py` — same issue, references a non-existent script path. Should be guarded with a skip condition.
|
|||
|
|
|
|||
|
|
### 3c. pytest-xdist Conflicts with Coverage
|
|||
|
|
|
|||
|
|
The `pyproject.toml` `addopts` includes `-n auto --dist worksteal` (xdist), but `make test-cov` also passes `--cov` flags. This causes a conflict:
|
|||
|
|
```
|
|||
|
|
pytest: error: unrecognized arguments: -n --dist worksteal
|
|||
|
|
```
|
|||
|
|
**Fix:** Either:
|
|||
|
|
- Remove `-n auto --dist worksteal` from `addopts` and add it only in `make test` target
|
|||
|
|
- Or use `-p no:xdist` in the coverage targets (current workaround)
|
|||
|
|
|
|||
|
|
### 3d. Tox Configuration
|
|||
|
|
|
|||
|
|
`tox.ini` has `unit` and `integration` environments that run the **exact same command** — they're aliases. This is misleading:
|
|||
|
|
- `unit` should run `-m unit` (fast, no I/O)
|
|||
|
|
- `integration` should run `-m integration` (may use SQLite)
|
|||
|
|
- Consider adding a `coverage` tox env
|
|||
|
|
|
|||
|
|
### 3e. CI Workflow (`tests.yml`)
|
|||
|
|
|
|||
|
|
- CI uses `pip install -e ".[dev]"` but the project uses Poetry — dependency resolution may differ
|
|||
|
|
- CI doesn't pass marker filters, so it runs **all** tests including those that may need Docker/Ollama
|
|||
|
|
- No coverage enforcement in CI (the `fail_under=60` in pyproject.toml only works with `--cov-fail-under`)
|
|||
|
|
- No caching of Poetry virtualenvs
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Priority 4 — Test Quality Gaps
|
|||
|
|
|
|||
|
|
### 4a. Missing Error-Path Testing
|
|||
|
|
|
|||
|
|
Many modules have happy-path tests but lack coverage for:
|
|||
|
|
- **Graceful degradation paths**: The architecture mandates graceful degradation when Ollama/Redis/AirLLM are unavailable, but most fallback paths are untested (e.g., `cascade.py` lines 563–655)
|
|||
|
|
- **`brain/client.py`**: Only 14.8% covered — connection failures, retries, and error handling are untested
|
|||
|
|
- **`infrastructure/error_capture.py`**: 0% — the error capture system itself has no tests
|
|||
|
|
|
|||
|
|
### 4b. No Integration Tests for Events System
|
|||
|
|
|
|||
|
|
The `infrastructure/events/` package (`broadcaster.py` + `bus.py`) is 0% covered. This is the pub/sub backbone for the application. Tests should cover:
|
|||
|
|
- Event subscription and dispatch
|
|||
|
|
- Multiple subscribers
|
|||
|
|
- Error handling in event handlers
|
|||
|
|
- Async event broadcasting
|
|||
|
|
|
|||
|
|
### 4c. Security Tests Are Thin
|
|||
|
|
|
|||
|
|
- `tests/security/` has only 3 files totaling ~140 lines
|
|||
|
|
- `src/timmy_serve/l402_proxy.py` (payment gating, listed as security-sensitive) has no dedicated test file
|
|||
|
|
- CSRF tests exist but bypass/traversal tests are minimal
|
|||
|
|
- No tests for the `approvals.py` authorization workflow (39.1% covered)
|
|||
|
|
|
|||
|
|
### 4d. Missing WebSocket Tests
|
|||
|
|
|
|||
|
|
WebSocket handler (`ws_manager/handler.py`) has 81.2% coverage, but the disconnect/reconnect and error paths (lines 132–147) aren't tested. For a real-time dashboard, WebSocket reliability is critical.
|
|||
|
|
|
|||
|
|
### 4e. No Tests for `timmy/agents/` Subpackage
|
|||
|
|
|
|||
|
|
The Agno-based agent classes (`base.py`, `timmy.py`) are at 0% coverage (222 statements). These are stubbed in conftest but never actually exercised. Even with the Agno stub, the control flow and prompt construction logic should be tested.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Priority 5 — Test Speed & Parallelism
|
|||
|
|
|
|||
|
|
| Metric | Value |
|
|||
|
|
|--------|-------|
|
|||
|
|
| Total wall time | ~35s (sequential) |
|
|||
|
|
| Parallel (`-n auto`) | Would be ~10-15s |
|
|||
|
|
| Slowest category | Functional tests (HTTP, Docker) |
|
|||
|
|
|
|||
|
|
**Observations:**
|
|||
|
|
- 30-second timeout per test is generous — consider 10s for unit, 30s for integration
|
|||
|
|
- The `--dist worksteal` strategy is good for uneven test durations
|
|||
|
|
- 39 tests are skipped (mostly due to missing markers/env) — this is expected
|
|||
|
|
- No test duration profiling is configured (consider `--durations=10`)
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Recommended Action Plan
|
|||
|
|
|
|||
|
|
### Quick Wins (High ROI, Low Effort)
|
|||
|
|
1. **Fix the 4 broken tests** in `test_setup_script.py` (add skip guards)
|
|||
|
|
2. **Fix xdist/coverage conflict** in `pyproject.toml` addopts
|
|||
|
|
3. **Differentiate tox `unit` vs `integration`** environments
|
|||
|
|
4. **Add `--durations=10`** to default addopts for profiling slow tests
|
|||
|
|
5. **Add `--cov-fail-under=60`** to CI workflow to enforce the threshold
|
|||
|
|
|
|||
|
|
### Medium Effort, High Impact
|
|||
|
|
6. **Test the events system** (`broadcaster.py` + `bus.py`) — 141 uncovered statements, critical infrastructure
|
|||
|
|
7. **Test `timmy/agents/timmy.py`** — 165 uncovered statements, core agent
|
|||
|
|
8. **Test `brain/client.py` and `brain/worker.py`** — 283 uncovered statements, core memory
|
|||
|
|
9. **Test `timmy/tools.py`** error paths — 147 uncovered statements
|
|||
|
|
10. **Test `error_capture.py`** — 91 uncovered statements, observability blind spot
|
|||
|
|
|
|||
|
|
### Longer Term
|
|||
|
|
11. **Add graceful-degradation tests** — verify fallback behavior for all optional services
|
|||
|
|
12. **Expand security test suite** — approvals, L402 proxy, input sanitization
|
|||
|
|
13. **Add coverage tox environment** and enforce in CI
|
|||
|
|
14. **Align CI with Poetry** — use `poetry install` instead of pip for consistent resolution
|
|||
|
|
15. **Target 75% coverage** as the next threshold milestone (currently 63.6%)
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Coverage Floor Modules (Already Well-Tested)
|
|||
|
|
|
|||
|
|
These modules are at 95%+ and serve as good examples of testing patterns:
|
|||
|
|
|
|||
|
|
- `spark/eidos.py` — 98.3%
|
|||
|
|
- `spark/memory.py` — 98.3%
|
|||
|
|
- `infrastructure/models/registry.py` — 97.1%
|
|||
|
|
- `timmy/agent_core/ollama_adapter.py` — 97.8%
|
|||
|
|
- `timmy/agent_core/interface.py` — 100%
|
|||
|
|
- `dashboard/middleware/security_headers.py` — 100%
|
|||
|
|
- `dashboard/routes/agents.py` — 100%
|
|||
|
|
- `timmy_serve/inter_agent.py` — 100%
|