[tests] 85 new tests — tasks.py and gitea_client.py go from zero to covered #108

gemini · 2026-03-31T12:55:38Z

The Gap

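| Module | Lines | Tests Before | Tests After |
|:-------|------:|:------------:|:-----------:|
| `tasks.py` | 2,117 | 0 | 63 |
| `gitea_client.py` | 539 | 0 | 22 |
| Total | 2,656 | 0 | 85 |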

These two modules are the orchestration brain. Every @huey.task, every cron job, every triage loop runs through them. Until now, zero safety net.

What's Covered

test_tasks_core.py: 63 tests, 11 test classes

| Class | Tests | What it catches |
|:------|------:|:----------------|
| `TestExtractFirstJsonObject` | 10 | JSON from noisy LLM output (markdown fences, prose, malformed, unicode) |
| `TestParseJsonOutput` | 4 | stdout/stderr fallback chain |
| `TestNormalizeCandidateEntry` | 12 | Knowledge graph corruption (confidence clamping, status validation, dedup) |
| `TestNormalizeTrainingExamples` | 5 | Training data quality (fallback, alt field names, empty filtering) |
| `TestNormalizeRubricScores` | 3 | Eval score clamping |
| `TestReadJson` | 4 | Corrupt/missing files, deep-copy defense |
| `TestWriteJson` | 3 | Atomic writes, sorted keys |
| `TestJsonlIO` | 9 | JSONL read/write/append/count, blank lines |
| `TestWriteText` | 3 | Trailing newline normalization |
| `TestPathUtilities` | 4 | Newest/latest path resolution |
| `TestFormatting` | 6 | Batch IDs, profile summaries, tweet prompts, checkpoints |
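
To make the first row concrete, here is a minimal sketch of the kind of helper `TestExtractFirstJsonObject` exercises. It is not the code in `tasks.py`: the name `extract_first_json_object_sketch` and the scan-for-`{` strategy are assumptions made for illustration, built only on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
from typing import Any, Optional


def extract_first_json_object_sketch(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in `text`, or None if there is none.

    Hypothetical stand-in for the real helper: try each '{' in turn and let
    the standard decoder decide whether a valid object starts there.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None  # not a valid object at this brace, keep scanning
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    return None


# Prose-wrapped output, as an LLM might produce it.
noisy = 'Sure! Here is the result: {"status": "ok", "n": 1} Let me know.'
parsed = extract_first_json_object_sketch(noisy)
```

A test in the style the table describes would then assert on `parsed` for each noisy variant (fenced, prose-wrapped, malformed, unicode) and on `None` for input with no object at all.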

test_gitea_client_core.py: 22 tests, 10 test classes

| Class | Tests | What it catches |
|:------|------:|:----------------|
| `TestUserFromDict` | 3 | Deserialization with missing/extra fields |
| `TestIssueFromDict` | 4 | Null assignees/labels (the exact bug that crashed the agent) |
| `TestCommentFromDict` | 2 | Null body handling |
| `TestPullRequestFromDict` | 3 | Null head/base/merged |
| `TestFindUnassigned` | 3 | Assignee/label/title pattern filtering |
| `TestFindAgentIssues` | 2 | Case-insensitive agent matching |
| Others (`TestLabelFromDict`, `TestPRFileFromDict`, `TestGiteaError`, `TestClientHelpers`) | 5 | Error formatting, label/PRFile/helpers |
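
The null-assignees case is easy to miss because `dict.get(key, [])` only falls back when the key is absent, not when it is present with a `null` value. The sketch below shows the defensive pattern under stated assumptions: this `Issue` dataclass is a simplified, hypothetical subset of fields, not the class defined in `gitea_client.py`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Issue:
    """Hypothetical, trimmed-down issue model for illustration only."""

    number: int
    title: str
    assignees: list[str] = field(default_factory=list)
    labels: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Issue":
        # `or []` covers both a missing key and an explicit null (None).
        assignees = data.get("assignees") or []
        labels = data.get("labels") or []
        return cls(
            number=data["number"],
            title=data.get("title") or "",
            assignees=[user["login"] for user in assignees],
            labels=[label["name"] for label in labels],
        )


# Payload shaped like the failing case: keys present, values null.
issue = Issue.from_dict(
    {"number": 7, "title": "Fix triage", "assignees": None, "labels": None}
)
```

With `data.get("assignees", [])` instead, the list comprehension would iterate over `None` and raise `TypeError`, which is the class of crash these tests guard against.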

Why These Tests Matter

A bug in extract_first_json_object() corrupts every @huey.task that processes LLM output — which is all of them.

A bug in normalize_candidate_entry() silently corrupts the knowledge graph.
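
As a rough illustration of what "confidence clamping" and "status validation" mean here, the following sketch normalizes a single entry. The field names (`claim`, `confidence`, `status`), the allowed status set, and the function name are all assumptions; the real `normalize_candidate_entry()` may differ in every detail.

```python
from typing import Any, Optional

# Hypothetical set of valid statuses; the real list is not shown in this PR.
ALLOWED_STATUSES = {"candidate", "accepted", "rejected"}


def normalize_candidate_entry_sketch(raw: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a cleaned entry, or None when there is no usable claim text."""
    claim = str(raw.get("claim") or "").strip()
    if not claim:
        return None

    # Coerce confidence to a float, then clamp it into [0.0, 1.0].
    try:
        confidence = float(raw.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    confidence = min(1.0, max(0.0, confidence))

    # Unknown or missing statuses fall back to a safe default.
    status = str(raw.get("status") or "").strip().lower()
    if status not in ALLOWED_STATUSES:
        status = "candidate"

    return {"claim": claim, "confidence": confidence, "status": status}


entry = normalize_candidate_entry_sketch(
    {"claim": "  uses huey  ", "confidence": 7, "status": "ACCEPTED"}
)
```

Without the clamp, an out-of-range score from an LLM would be stored as-is, which is the silent corruption described above.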

A bug in Issue.from_dict() crashes the entire triage pipeline (we found this exact bug — Gitea returns null for assignees).

These are the functions that corrupt training data silently when they break. No one notices until the next autolora run produces a worse model.

Full Suite

108/108 pass. Zero regressions.

Design Notes

- `tasks.py` tests use mocked imports for `orchestration`, `huey`, and `gitea_client` to isolate the pure utility functions from side effects
- `gitea_client` tests use `importlib.util` to load the real module, bypassing the mock
- All tests use `tmp_path` fixtures for file I/O, so nothing is written outside the test's temporary directory
- No external API calls or network access
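
The first two notes can be sketched as follows. This is one plausible way to combine a `sys.modules` stub with a direct file load, not necessarily how the test files in this PR do it; the helper `load_module_from_path`, the alias `gitea_client_real`, and the throwaway `repo_path` function are hypothetical.

```python
import importlib.util
import sys
import tempfile
import types
from pathlib import Path
from unittest import mock


def load_module_from_path(name: str, path: Path) -> types.ModuleType:
    """Load a module straight from a file, ignoring what sys.modules holds."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# A throwaway file stands in for the real gitea_client.py.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "gitea_client.py"
    source.write_text(
        "def repo_path(owner, repo):\n    return f'/repos/{owner}/{repo}'\n"
    )

    # Stub installed so that importing tasks.py would not pull in the client.
    with mock.patch.dict(sys.modules, {"gitea_client": mock.MagicMock()}):
        stub = sys.modules["gitea_client"]
        # Load the real code under a different name so the stub stays in place.
        real = load_module_from_path("gitea_client_real", source)
        stub_is_mock = isinstance(stub, mock.MagicMock)
        real_result = real.repo_path("org", "repo")
```

Because `mock.patch.dict` restores `sys.modules` on exit, the stub does not leak into other test modules, and the module loaded by path is never registered under the mocked name.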


Timmy was assigned by gemini

2026-03-31 12:55:38 +00:00

gemini added 1 commit 2026-03-31 12:55:39 +00:00

[tests] 85 new tests for tasks.py and gitea_client.py — zero to covered eb1e384edc

COVERAGE BEFORE
===============
  tasks.py          2,117 lines    ZERO tests
  gitea_client.py     539 lines    ZERO tests (in this repo)
  Total:            2,656 lines of orchestration with no safety net

COVERAGE AFTER
==============

test_tasks_core.py - 63 tests across 11 test classes:

  TestExtractFirstJsonObject (10)  — JSON parsing from noisy LLM output
    Every @huey.task depends on this. Tested: clean JSON, markdown
    fences, prose-wrapped, nested, malformed, arrays, unicode, empty

  TestParseJsonOutput (4)          — stdout/stderr fallback chain

  TestNormalizeCandidateEntry (12) — knowledge graph data cleaning
    Confidence clamping, status validation, deduplication, truncation

  TestNormalizeTrainingExamples (5) — autolora training data prep
    Fallback when empty, alternative field names, empty prompt/response

  TestNormalizeRubricScores (3)    — eval score clamping

  TestReadJson (4)                 — defensive file reads
    Missing files, corrupt JSON, deep-copy of defaults

  TestWriteJson (3)                — atomic writes with sorted keys

  TestJsonlIO (9)                  — JSONL read/write/append/count
    Missing files, blank lines, append vs overwrite

  TestWriteText (3)                — trailing newline normalization

  TestPathUtilities (4)            — newest/latest path resolution

  TestFormatting (6)               — batch IDs, profile summaries,
                                     tweet prompts, checkpoint defaults

test_gitea_client_core.py - 22 tests across 10 test classes:

  TestUserFromDict (3)             — all from_dict() deserialization
  TestLabelFromDict (1)
  TestIssueFromDict (4)            — null assignees/labels (THE bug)
  TestCommentFromDict (2)          — null body handling
  TestPullRequestFromDict (3)      — null head/base/merged
  TestPRFileFromDict (1)
  TestGiteaError (2)               — error formatting
  TestClientHelpers (1)            — _repo_path formatting
  TestFindUnassigned (3)           — label/title/assignee filtering
  TestFindAgentIssues (2)          — case-insensitive matching

WHY THESE TESTS MATTER
======================
A bug in extract_first_json_object() corrupts every @huey.task
that processes LLM output — which is all of them. A bug in
normalize_candidate_entry() silently corrupts the knowledge graph.
A bug in the Gitea client's from_dict() crashes the entire triage
and review pipeline (we found this bug — null assignees).

These are the functions that corrupt training data silently when
they break. No one notices until the next autolora run produces
a worse model.

FULL SUITE: 108/108 pass, zero regressions.

Signed-off-by: gemini <gemini@hermes.local>

allegro merged commit 635f35df7d into main

2026-03-31 19:06:38 +00:00

allegro deleted branch gemini/test-coverage

2026-03-31 19:06:38 +00:00

allegro referenced this issue from a commit

2026-03-31 19:06:39 +00:00

Merge pull request '[tests] 85 new tests — tasks.py and gitea_client.py go from zero to covered' (#108) from gemini/test-coverage into main

Timmy referenced this pull request

2026-04-02 10:04:24 +00:00

[MORNING REPORT] Fleet Production Report -- Due 7 AM #110

Timmy referenced this pull request

2026-04-06 17:11:41 +00:00