[tests] 85 new tests — tasks.py and gitea_client.py go from zero to covered #108

Merged
allegro merged 1 commit from gemini/test-coverage into main 2026-03-31 19:06:38 +00:00

The Gap

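| Module | Lines | Tests Before | Tests After |
|:-------|------:|:------------:|:-----------:|
| `tasks.py` | 2,117 | **0** | **63** |
| `gitea_client.py` | 539 | **0** | **22** |
| **Total** | **2,656** | **0** | **85** |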

These two modules are the orchestration brain: every @huey.task, every cron job and every triage loop runs through them. Until this PR, neither had a single test.

What's Covered

test_tasks_core.py — 63 tests, 12 test classes

| Class | Tests | What it catches |
|:------|------:|:----------------|
| `TestExtractFirstJsonObject` | 10 | JSON from noisy LLM output (markdown fences, prose, malformed, unicode) |
| `TestParseJsonOutput` | 4 | stdout/stderr fallback chain |
| `TestNormalizeCandidateEntry` | 12 | Knowledge graph corruption (confidence clamping, status validation, dedup) |
| `TestNormalizeTrainingExamples` | 5 | Training data quality (fallback, alt field names, empty filtering) |
| `TestNormalizeRubricScores` | 3 | Eval score clamping |
| `TestReadJson` | 4 | Corrupt/missing files, deep-copy defense |
| `TestWriteJson` | 3 | Atomic writes, sorted keys |
| `TestJsonlIO` | 9 | JSONL read/write/append/count, blank lines |
| `TestWriteText` | 3 | Trailing newline normalization |
| `TestPathUtilities` | 4 | Newest/latest path resolution |
| `TestFormatting` | 6 | Batch IDs, profile summaries, tweet prompts, checkpoints |
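
The `TestReadJson` and `TestWriteJson` rows describe two defensive behaviors. Reads fall back to a deep copy of the default. Writes are atomic and use sorted keys. The sketch below shows what such helpers might look like. It assumes hypothetical signatures `read_json(path, default)` and `write_json(path, data)`, and the real functions in `tasks.py` may differ.

```python
import copy
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def read_json(path: Path, default: Any) -> Any:
    """Return parsed JSON from `path`, or a deep copy of `default` if the
    file is missing or corrupt (hypothetical signature)."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Deep-copy so a caller that mutates the result cannot alter
        # the shared default object seen by later calls.
        return copy.deepcopy(default)


def write_json(path: Path, data: Any) -> None:
    """Write `data` atomically: dump to a temp file in the same directory,
    then rename it over the target (hypothetical signature)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2, sort_keys=True)
            fh.write("\n")
        os.replace(tmp_name, path)  # atomic rename on POSIX
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        os.unlink(tmp_name)
        raise
```

The temporary file sits in the target's own directory because `os.replace` renames rather than copies, so both paths must be on the same filesystem. The deep copy is what a "deep-copy defense" test would guard. Without it, appending to a returned default list would silently change the default for every later caller.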

test_gitea_client_core.py — 22 tests, 9 test classes

| Class | Tests | What it catches |
|:------|------:|:----------------|
| `TestUserFromDict` | 3 | Deserialization with missing/extra fields |
| `TestIssueFromDict` | 4 | **Null assignees/labels**: the exact bug that crashed the agent |
| `TestCommentFromDict` | 2 | Null body handling |
| `TestPullRequestFromDict` | 3 | Null head/base/merged |
| `TestFindUnassigned` | 3 | Assignee/label/title pattern filtering |
| `TestFindAgentIssues` | 2 | Case-insensitive agent matching |
| Others | 5 | Error formatting, Label/PRFile deserialization, client helpers |
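
`TestFindUnassigned` and `TestFindAgentIssues` test filtering rather than parsing. The code below is only an illustration of one way to write those filters. The simplified `Issue` shape, the parameter names and the use of regex title patterns are assumptions. They are not the actual `gitea_client.py` API.

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Simplified stand-in for the client's issue model (hypothetical)."""
    number: int
    title: str
    assignees: list[str] = field(default_factory=list)  # login names
    labels: list[str] = field(default_factory=list)     # label names


def find_agent_issues(issues: list[Issue], agent: str) -> list[Issue]:
    """Issues assigned to `agent`, comparing logins case-insensitively."""
    wanted = agent.casefold()
    return [i for i in issues if any(a.casefold() == wanted for a in i.assignees)]


def find_unassigned(
    issues: list[Issue],
    skip_labels: Iterable[str] = (),
    skip_title_patterns: Iterable[str] = (),
) -> list[Issue]:
    """Issues with no assignee, none of `skip_labels`, and a title matching
    none of `skip_title_patterns` (regexes, matched case-insensitively)."""
    skip = {s.casefold() for s in skip_labels}
    patterns = [re.compile(p, re.IGNORECASE) for p in skip_title_patterns]
    return [
        i
        for i in issues
        if not i.assignees
        and not skip.intersection(lbl.casefold() for lbl in i.labels)
        and not any(p.search(i.title) for p in patterns)
    ]
```

Using `casefold()` rather than `lower()` gives slightly more robust case-insensitive comparison for non-ASCII logins.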

Why These Tests Matter

A bug in extract_first_json_object() corrupts every @huey.task that processes LLM output — which is all of them.
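
A common way to pull the first JSON object out of chatty LLM output is to try `json.JSONDecoder.raw_decode` at each `{` until one parses. The sketch below assumes the function takes a string and returns a `dict` or `None`. The real `extract_first_json_object()` may use a different strategy, such as stripping markdown fences first, and a different return contract.

```python
import json
from typing import Any, Optional

_DECODER = json.JSONDecoder()


def extract_first_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in `text`, or None if there is none.

    Tries to decode at every "{" in turn, so surrounding prose, markdown
    fences, and malformed fragments before the real object are skipped.
    """
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value from the start of the string
            # and ignores whatever text follows it.
            obj, _end = _DECODER.raw_decode(text[start:])
            return obj  # always a dict, since the slice starts with "{"
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None
```

`raw_decode` stops at the end of the first complete value, so any trailing prose after the object is ignored. Retrying at every `{` is quadratic in the worst case. That cost is acceptable for responses of a few kilobytes.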

A bug in normalize_candidate_entry() silently corrupts the knowledge graph.
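
The `TestNormalizeCandidateEntry` row lists four properties: confidence clamping, status validation, deduplication and truncation. The sketch below is a normalizer with those four properties. The field names (`text`, `confidence`, `status`, `tags`), the allowed statuses and the 500-character limit are all hypothetical placeholders, not the schema used by `tasks.py`.

```python
from typing import Any, Optional

# Hypothetical schema: the real field names, statuses and limits may differ.
ALLOWED_STATUSES = {"proposed", "accepted", "rejected"}
DEFAULT_STATUS = "proposed"
DEFAULT_CONFIDENCE = 0.5
MAX_TEXT_CHARS = 500


def normalize_candidate_entry(raw: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Clean one candidate knowledge-graph entry, or return None to drop it."""
    text = str(raw.get("text") or "").strip()[:MAX_TEXT_CHARS]  # truncate
    if not text:
        return None  # nothing worth storing

    try:
        confidence = float(raw.get("confidence", DEFAULT_CONFIDENCE))
    except (TypeError, ValueError):
        confidence = DEFAULT_CONFIDENCE
    confidence = min(1.0, max(0.0, confidence))  # clamp to [0, 1]

    status = str(raw.get("status") or "").strip().lower()
    if status not in ALLOWED_STATUSES:
        status = DEFAULT_STATUS

    # Normalize tags and drop duplicates while keeping first-seen order.
    cleaned = (str(t).strip().lower() for t in raw.get("tags") or [])
    tags = list(dict.fromkeys(t for t in cleaned if t))

    return {"text": text, "confidence": confidence, "status": status, "tags": tags}
```

Returning `None` instead of a half-empty entry lets the caller skip the record rather than write a blank node into the graph.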

A bug in Issue.from_dict() crashes the entire triage pipeline (we found this exact bug — Gitea returns null for assignees).
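
The null-assignee crash comes from a Python detail. `d.get("assignees", [])` returns the default only when the key is absent. Gitea sends the key with a `null` value, so `.get()` returns `None`, and iterating over it raises `TypeError`. The sketch below shows a null-tolerant `from_dict`. It uses simplified, hypothetical `User` and `Issue` dataclasses rather than the real ones in `gitea_client.py`, and it assumes labels arrive as objects with a `name` key.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class User:
    id: int = 0
    login: str = ""

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "User":
        # Unknown extra keys are ignored; missing or null ones fall back to defaults.
        return cls(id=d.get("id") or 0, login=d.get("login") or "")


@dataclass
class Issue:
    number: int
    title: str
    assignees: list[User] = field(default_factory=list)
    labels: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "Issue":
        # d.get("assignees", []) would still return None when Gitea sends
        # "assignees": null, because the key is present. `or []` covers
        # both the missing key and the explicit null.
        return cls(
            number=d.get("number") or 0,
            title=d.get("title") or "",
            assignees=[User.from_dict(u) for u in d.get("assignees") or []],
            labels=[lbl.get("name") or "" for lbl in d.get("labels") or []],
        )
```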

These are the functions that corrupt training data silently when they break. No one notices until the next autolora run produces a worse model.

Full Suite

108/108 pass. Zero regressions.

Design Notes

  • tasks.py tests use mocked imports for orchestration, huey, and gitea_client to isolate the pure utility functions from side effects
  • gitea_client tests use importlib.util to load the real module, bypassing the mock
  • All tests use tmp_path fixtures for file I/O — no filesystem pollution
  • No external API calls or network access
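
The first two design notes describe two import strategies, sketched below. The sketch assumes `unittest.mock.MagicMock` placeholders registered in `sys.modules`, and the helper names `stub_modules` and `load_real_module` are hypothetical. The loader follows the "importing a source file directly" recipe from the `importlib` documentation. It registers the result under a separate alias so the stub stays in place for `tasks.py`.

```python
import importlib.util
import sys
import types
from pathlib import Path
from unittest.mock import MagicMock


def stub_modules(*names: str) -> None:
    """Replace each named module with a MagicMock so `import huey`,
    `from orchestration import ...`, etc. succeed without side effects."""
    for name in names:
        sys.modules[name] = MagicMock(name=name)


def load_real_module(alias: str, path: Path) -> types.ModuleType:
    """Execute the source file at `path` as a new module registered under
    `alias`, leaving any stub under the module's usual name untouched."""
    spec = importlib.util.spec_from_file_location(alias, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {alias} from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[alias] = module  # registered before exec, as in the docs recipe
    spec.loader.exec_module(module)
    return module
```

In a pytest suite, `monkeypatch.setitem(sys.modules, name, MagicMock())` is a safer way to install the stubs, because pytest restores `sys.modules` after each test.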
Timmy was assigned by gemini 2026-03-31 12:55:38 +00:00
gemini added 1 commit 2026-03-31 12:55:39 +00:00
COVERAGE BEFORE
===============
  tasks.py          2,117 lines    ZERO tests
  gitea_client.py     539 lines    ZERO tests (in this repo)
  Total:            2,656 lines of orchestration with no safety net

COVERAGE AFTER
==============

test_tasks_core.py — 63 tests across 12 test classes:

  TestExtractFirstJsonObject (10)  — JSON parsing from noisy LLM output
    Every @huey.task depends on this. Tested: clean JSON, markdown
    fences, prose-wrapped, nested, malformed, arrays, unicode, empty

  TestParseJsonOutput (4)          — stdout/stderr fallback chain

  TestNormalizeCandidateEntry (12) — knowledge graph data cleaning
    Confidence clamping, status validation, deduplication, truncation

  TestNormalizeTrainingExamples (5) — autolora training data prep
    Fallback when empty, alternative field names, empty prompt/response

  TestNormalizeRubricScores (3)    — eval score clamping

  TestReadJson (4)                 — defensive file reads
    Missing files, corrupt JSON, deep-copy of defaults

  TestWriteJson (3)                — atomic writes with sorted keys

  TestJsonlIO (9)                  — JSONL read/write/append/count
    Missing files, blank lines, append vs overwrite

  TestWriteText (3)                — trailing newline normalization

  TestPathUtilities (4)            — newest/latest path resolution

  TestFormatting (6)               — batch IDs, profile summaries,
                                     tweet prompts, checkpoint defaults

test_gitea_client_core.py — 22 tests across 9 test classes:

  TestUserFromDict (3)             — all from_dict() deserialization
  TestLabelFromDict (1)
  TestIssueFromDict (4)            — null assignees/labels (THE bug)
  TestCommentFromDict (2)          — null body handling
  TestPullRequestFromDict (3)      — null head/base/merged
  TestPRFileFromDict (1)
  TestGiteaError (2)               — error formatting
  TestClientHelpers (1)            — _repo_path formatting
  TestFindUnassigned (3)           — label/title/assignee filtering
  TestFindAgentIssues (2)          — case-insensitive matching

WHY THESE TESTS MATTER
======================
A bug in extract_first_json_object() corrupts every @huey.task
that processes LLM output — which is all of them. A bug in
normalize_candidate_entry() silently corrupts the knowledge graph.
A bug in the Gitea client's from_dict() crashes the entire triage
and review pipeline (we found this bug — null assignees).

These are the functions that corrupt training data silently when
they break. No one notices until the next autolora run produces
a worse model.

FULL SUITE: 108/108 pass, zero regressions.

Signed-off-by: gemini <gemini@hermes.local>
allegro merged commit 635f35df7d into main 2026-03-31 19:06:38 +00:00
allegro deleted branch gemini/test-coverage 2026-03-31 19:06:38 +00:00