Files
the-nexus/docs/QUARANTINE_PROCESS.md
2026-04-07 14:38:49 +00:00

169 lines
4.2 KiB
Markdown

# Quarantine Process
**Poka-yoke principle:** a flaky or broken test must never silently rot in
place. Quarantine is the correction step in the
Prevention → Detection → Correction triad described in issue #1094.
---
## When to quarantine
Quarantine a test when **any** of the following are true:
| Signal | Source |
|--------|--------|
| `flake_detector.py` flags the test at < 95 % consistency | Automated |
| The test fails intermittently in CI over two consecutive runs | Manual observation |
| The test depends on infrastructure that is temporarily unavailable | Manual observation |
| You are fixing a bug and need to defer a related test | Developer judgement |
Do **not** use quarantine as a way to ignore tests indefinitely. The
quarantine directory is a **30-day time-box** — see the escalation rule below.
---
## Step-by-step workflow
### 1 File an issue
Open a Gitea issue with the title prefix `[FLAKY]` or `[BROKEN]`:
```
[FLAKY] test_foo_bar non-deterministically fails with assertion error
```
Note the issue number — you will need it in the next step.
### 2 Move the test file
Move (or copy) the test from `tests/` into `tests/quarantine/`.
```bash
git mv tests/test_my_thing.py tests/quarantine/test_my_thing.py
```
If only individual test functions are flaky, extract them into a new file in
`tests/quarantine/` rather than moving the whole module.
### 3 Annotate the test
Add the `@pytest.mark.quarantine` marker with the issue reference:
```python
import pytest
@pytest.mark.quarantine(reason="Flaky until #NNN is resolved")
def test_my_thing():
...
```
This satisfies the poka-yoke skip-enforcement rule: the test is allowed to
skip/be excluded because it is explicitly linked to a tracking issue.
### 4 Verify CI still passes
```bash
pytest # default run — quarantine tests are excluded
pytest --run-quarantine # optional: run quarantined tests explicitly
```
The main CI run must be green before merging.
### 5 Add to `.test-history.json` exclusions (optional)
If the flake detector is tracking the test, add it to the `quarantine_list` in
`.test-history.json` so it is excluded from the consistency report:
```json
{
"quarantine_list": [
"tests/quarantine/test_my_thing.py::test_my_thing"
]
}
```
---
## Escalation rule
If a quarantined test's tracking issue has had **no activity for 30 days**,
the next developer to touch that file must:
1. Attempt to fix and un-quarantine the test, **or**
2. Delete the test and close the issue with a comment explaining why, **or**
3. Leave a comment on the issue explaining the blocker and reset the 30-day
clock explicitly.
**A test may not stay in quarantine indefinitely without active attention.**
---
## Un-quarantining a test
When the underlying issue is resolved:
1. Remove `@pytest.mark.quarantine` from the test.
2. Move the file back from `tests/quarantine/` to `tests/`.
3. Run the full suite to confirm it passes consistently (at least 3 local runs).
4. Close the tracking issue.
5. Remove any entries from `.test-history.json`'s `quarantine_list`.
---
## Flake detector integration
The flake detector (`scripts/flake_detector.py`) is run after every CI test
execution. It reads `.test-report.json` (produced by `pytest --json-report`)
and updates `.test-history.json`.
**CI integration example (shell script or CI step):**
```bash
pytest --json-report --json-report-file=.test-report.json
python scripts/flake_detector.py
```
If the flake detector exits non-zero, the CI step fails and the output lists
the offending tests with their consistency percentages.
**Local usage:**
```bash
# After running tests with JSON report:
python scripts/flake_detector.py
# Just view current statistics without ingesting a new report:
python scripts/flake_detector.py --no-update
# Lower threshold for local dev:
python scripts/flake_detector.py --threshold 0.90
```
---
## Summary
```
Test fails intermittently
File [FLAKY] issue
git mv test → tests/quarantine/
Add @pytest.mark.quarantine(reason="#NNN")
Main CI green ✓
Fix the root cause (within 30 days)
git mv back → tests/
Remove quarantine marker
Close issue ✓
```