forked from Timmy_Foundation/the-nexus
This commit is contained in:
168
docs/QUARANTINE_PROCESS.md
Normal file
168
docs/QUARANTINE_PROCESS.md
Normal file
@@ -0,0 +1,168 @@
|
||||
# Quarantine Process
|
||||
|
||||
**Poka-yoke principle:** a flaky or broken test must never silently rot in
|
||||
place. Quarantine is the correction step in the
|
||||
Prevention → Detection → Correction triad described in issue #1094.
|
||||
|
||||
---
|
||||
|
||||
## When to quarantine
|
||||
|
||||
Quarantine a test when **any** of the following are true:
|
||||
|
||||
| Signal | Source |
|
||||
|--------|--------|
|
||||
| `flake_detector.py` flags the test at < 95 % consistency | Automated |
|
||||
| The test fails intermittently in CI over two consecutive runs | Manual observation |
|
||||
| The test depends on infrastructure that is temporarily unavailable | Manual observation |
|
||||
| You are fixing a bug and need to defer a related test | Developer judgement |
|
||||
|
||||
Do **not** use quarantine as a way to ignore tests indefinitely. The
|
||||
quarantine directory is a **30-day time-box** — see the escalation rule below.
|
||||
|
||||
---
|
||||
|
||||
## Step-by-step workflow
|
||||
|
||||
### 1 File an issue
|
||||
|
||||
Open a Gitea issue with the title prefix `[FLAKY]` or `[BROKEN]`:
|
||||
|
||||
```
|
||||
[FLAKY] test_foo_bar non-deterministically fails with assertion error
|
||||
```
|
||||
|
||||
Note the issue number — you will need it in the next step.
|
||||
|
||||
### 2 Move the test file
|
||||
|
||||
Move (or copy) the test from `tests/` into `tests/quarantine/`.
|
||||
|
||||
```bash
|
||||
git mv tests/test_my_thing.py tests/quarantine/test_my_thing.py
|
||||
```
|
||||
|
||||
If only individual test functions are flaky, extract them into a new file in
|
||||
`tests/quarantine/` rather than moving the whole module.
|
||||
|
||||
### 3 Annotate the test
|
||||
|
||||
Add the `@pytest.mark.quarantine` marker with the issue reference:
|
||||
|
||||
```python
|
||||
import pytest
|
||||
|
||||
@pytest.mark.quarantine(reason="Flaky until #NNN is resolved")
|
||||
def test_my_thing():
|
||||
...
|
||||
```
|
||||
|
||||
This satisfies the poka-yoke skip-enforcement rule: the test is allowed to
|
||||
skip/be excluded because it is explicitly linked to a tracking issue.
|
||||
|
||||
### 4 Verify CI still passes
|
||||
|
||||
```bash
|
||||
pytest # default run — quarantine tests are excluded
|
||||
pytest --run-quarantine # optional: run quarantined tests explicitly
|
||||
```
|
||||
|
||||
The main CI run must be green before merging.
|
||||
|
||||
### 5 Add to `.test-history.json` exclusions (optional)
|
||||
|
||||
If the flake detector is tracking the test, add it to the `quarantine_list` in
|
||||
`.test-history.json` so it is excluded from the consistency report:
|
||||
|
||||
```json
|
||||
{
|
||||
"quarantine_list": [
|
||||
"tests/quarantine/test_my_thing.py::test_my_thing"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Escalation rule
|
||||
|
||||
If a quarantined test's tracking issue has had **no activity for 30 days**,
|
||||
the next developer to touch that file must:
|
||||
|
||||
1. Attempt to fix and un-quarantine the test, **or**
|
||||
2. Delete the test and close the issue with a comment explaining why, **or**
|
||||
3. Leave a comment on the issue explaining the blocker and reset the 30-day
|
||||
clock explicitly.
|
||||
|
||||
**A test may not stay in quarantine indefinitely without active attention.**
|
||||
|
||||
---
|
||||
|
||||
## Un-quarantining a test
|
||||
|
||||
When the underlying issue is resolved:
|
||||
|
||||
1. Remove `@pytest.mark.quarantine` from the test.
|
||||
2. Move the file back from `tests/quarantine/` to `tests/`.
|
||||
3. Run the full suite to confirm it passes consistently (at least 3 local runs).
|
||||
4. Close the tracking issue.
|
||||
5. Remove any entries from `.test-history.json`'s `quarantine_list`.
|
||||
|
||||
---
|
||||
|
||||
## Flake detector integration
|
||||
|
||||
The flake detector (`scripts/flake_detector.py`) is run after every CI test
|
||||
execution. It reads `.test-report.json` (produced by `pytest --json-report`)
|
||||
and updates `.test-history.json`.
|
||||
|
||||
**CI integration example (shell script or CI step):**
|
||||
|
||||
```bash
|
||||
pytest --json-report --json-report-file=.test-report.json
|
||||
python scripts/flake_detector.py
|
||||
```
|
||||
|
||||
If the flake detector exits non-zero, the CI step fails and the output lists
|
||||
the offending tests with their consistency percentages.
|
||||
|
||||
**Local usage:**
|
||||
|
||||
```bash
|
||||
# After running tests with JSON report:
|
||||
python scripts/flake_detector.py
|
||||
|
||||
# Just view current statistics without ingesting a new report:
|
||||
python scripts/flake_detector.py --no-update
|
||||
|
||||
# Lower threshold for local dev:
|
||||
python scripts/flake_detector.py --threshold 0.90
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
```
|
||||
Test fails intermittently
|
||||
│
|
||||
▼
|
||||
File [FLAKY] issue
|
||||
│
|
||||
▼
|
||||
git mv test → tests/quarantine/
|
||||
│
|
||||
▼
|
||||
Add @pytest.mark.quarantine(reason="#NNN")
|
||||
│
|
||||
▼
|
||||
Main CI green ✓
|
||||
│
|
||||
▼
|
||||
Fix the root cause (within 30 days)
|
||||
│
|
||||
▼
|
||||
git mv back → tests/
|
||||
Remove quarantine marker
|
||||
Close issue ✓
|
||||
```
|
||||
Reference in New Issue
Block a user