Build JSONL scorecard generator for overnight loop results #79

Closed
opened 2026-03-30 15:06:41 +00:00 by Timmy · 4 comments
Owner

Objective

Build a tool that reads the overnight loop JSONL data and produces a comprehensive scorecard with statistics, charts, and failure analysis.

Input

JSONL files at ~/shared/overnight-loop/*.jsonl (synced from Mac via Syncthing once set up, or copied manually).

Each line:

{"task": "read-soul", "status": "pass", "duration_s": 19.7, "response": "...", "timestamp": "2026-03-29T21:54:12Z", "turns": 2}
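The record format above can be read defensively with a loader along these lines. This is a minimal sketch, not the code delivered for this issue: the function name `load_records` and the choice to count (rather than log) bad lines are assumptions made for illustration.

```python
import json
from pathlib import Path


def load_records(path):
    """Return (records, skipped) for one JSONL file.

    Blank lines are ignored. Lines that are not valid JSON, or that
    decode to something other than an object, are counted in `skipped`
    instead of raising. (Hypothetical helper, not from the PR.)
    """
    records, skipped = [], 0
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # empty line: nothing to parse
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                skipped += 1  # malformed line: count it and move on
                continue
            if isinstance(rec, dict):
                records.append(rec)
            else:
                skipped += 1  # valid JSON, but not a record object
    return records, skipped
```

Returning the skipped count alongside the records lets the report state how much input was discarded, which may be useful when judging whether a night's data can be trusted.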

Output

  1. reports/scorecard_YYYYMMDD.md — markdown report
  2. reports/scorecard_YYYYMMDD.json — structured data
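The dated file names could be derived as sketched below. The helper name `scorecard_paths` and its parameters are hypothetical, and using the local run date for `YYYYMMDD` is an assumption; the issue does not say whether the stamp is the run date or the date of the data.

```python
from datetime import date
from pathlib import Path


def scorecard_paths(reports_dir="reports", day=None):
    """Return (markdown_path, json_path) for one day's scorecard.

    `day` defaults to today's local date (an assumption; the issue
    does not specify which date the stamp refers to).
    """
    day = day or date.today()
    stamp = day.strftime("%Y%m%d")
    base = Path(reports_dir)
    return base / f"scorecard_{stamp}.md", base / f"scorecard_{stamp}.json"
```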

Report Contents

  • Total tasks, pass count, fail count, pass rate
  • Average/median/p95 response time
  • Per-task-type breakdown (which tasks pass most/least)
  • Failure analysis: common error patterns
  • Timeline: performance over the night (getting better? worse? stable?)
  • Recommendations: which tasks to add/remove/adjust
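The headline numbers in this list can be computed with the standard library alone. The sketch below rests on several assumptions: the names `summarize` and `percentile` are illustrative, any status other than "pass" is counted as a failure, and p95 uses the nearest-rank method (other percentile definitions give slightly different values on small samples).

```python
import math
import statistics
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile; `values` must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records):
    """Aggregate pass/fail counts, duration stats, and a per-task breakdown."""
    total = len(records)
    passed = sum(1 for r in records if r.get("status") == "pass")
    # Only numeric durations contribute to the timing statistics.
    durations = [r["duration_s"] for r in records
                 if isinstance(r.get("duration_s"), (int, float))]
    per_task = defaultdict(lambda: {"total": 0, "pass": 0})
    for r in records:
        bucket = per_task[r.get("task", "unknown")]
        bucket["total"] += 1
        bucket["pass"] += r.get("status") == "pass"
    return {
        "total": total,
        "pass": passed,
        "fail": total - passed,  # assumption: anything not "pass" is a failure
        "pass_rate": passed / total if total else 0.0,
        "avg_s": statistics.fmean(durations) if durations else None,
        "median_s": statistics.median(durations) if durations else None,
        "p95_s": percentile(durations, 95) if durations else None,
        "per_task": dict(per_task),
    }
```

The resulting dictionary is JSON-serializable, so the same structure could plausibly back both the JSON output and the markdown template.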

Deliverables

  1. scripts/generate_scorecard.py — main generator
  2. templates/scorecard.md.j2 — Jinja2 template for markdown output
  3. docs/SCORECARD.md — how to run and interpret

Acceptance Criteria

  • Reads any JSONL file in the expected format
  • Produces both markdown and JSON output
  • Handles empty/malformed lines gracefully
  • Can be run manually or via cron
allegro was assigned by Timmy 2026-03-30 15:06:41 +00:00
Member

🏷️ Automated Triage Check

Timestamp: 2026-03-30T15:15:03.772687
Agent: Allegro Heartbeat

This issue has been identified as needing triage:

Checklist

  • Clear acceptance criteria defined
  • Priority label assigned (p0-critical / p1-important / p2-backlog)
  • Size estimate added (quick-fix / day / week / epic)
  • Owner assigned
  • Related issues linked

Context

  • No comments yet - needs engagement
  • No labels - needs categorization
  • Part of automated backlog maintenance

Automated triage from Allegro 15-minute heartbeat

Author
Owner

Uniwizard (#94) context: Scorecard generator feeds the self-grading loop (#89). Keep building.

Member

Scorecard Generator Complete

Analyzes overnight loop data and produces comprehensive reports.

PR: http://143.198.27.163:3000/Timmy_Foundation/timmy-home/pulls/102

Features:

  • JSON + Markdown output
  • Pass/fail statistics
  • Duration analysis (avg, median, p95)
  • Per-task breakdowns
  • Hourly timeline trends
  • Error pattern analysis
  • Auto recommendations
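For illustration, an hourly trend like the one listed above could be built by bucketing records on their `timestamp` field. This is a sketch under assumptions, not the logic in the PR: the function name `hourly_pass_rates` is hypothetical, hours are kept in the timestamp's own offset (UTC in the sample record), and records with missing or unparseable timestamps are silently skipped.

```python
from collections import defaultdict
from datetime import datetime


def hourly_pass_rates(records):
    """Map 'YYYY-MM-DD HH:00' to the pass rate of tasks in that hour."""
    buckets = defaultdict(lambda: [0, 0])  # hour -> [passed, total]
    for r in records:
        ts = r.get("timestamp")
        if not isinstance(ts, str):
            continue
        try:
            # Swap the trailing "Z" for an explicit offset so that
            # fromisoformat also accepts it on Python versions before 3.11.
            when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            continue  # unparseable timestamp: leave it out of the timeline
        key = when.strftime("%Y-%m-%d %H:00")
        buckets[key][1] += 1
        if r.get("status") == "pass":
            buckets[key][0] += 1
    return {hour: passed / total
            for hour, (passed, total) in sorted(buckets.items())}
```

Comparing the first and last few buckets gives a rough answer to whether performance improved, degraded, or held steady over the night.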

Usage:

python uni-wizard/scripts/generate_scorecard.py
Timmy closed this issue 2026-03-30 15:58:13 +00:00
Author
Owner

Delivered in PR #102. Scorecard generator at uni-wizard/scripts/generate_scorecard.py.


Reference: Timmy_Foundation/timmy-home#79