# Workflow Scorecard

Updated: April 4, 2026

The old overnight `uni-wizard` scorecard is no longer the primary operational metric.
The current scorecard should measure whether Timmy's real workflow is healthy.

## What To Score

### Queue Health

- unassigned issue count
- PRs waiting on Timmy or Allegro review
- overloaded assignees
- duplicate issue / duplicate PR pressure
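
As a rough illustration, the queue signals above could be derived from a list of open issues. The sketch below is not tied to any confirmed schema: the `assignees` and `title` keys, the `queue_health` name, and the overload threshold are all assumptions made for the example, and matching titles is only a crude proxy for duplicate pressure.

```python
from collections import Counter

# Hypothetical threshold: the scorecard does not define "overloaded".
OVERLOAD_THRESHOLD = 5


def queue_health(issues, overload_threshold=OVERLOAD_THRESHOLD):
    """Summarize queue signals from a list of open-issue dicts.

    Each dict is assumed to carry "assignees" (a list of login strings,
    possibly empty or None) and "title". This shape is illustrative only.
    """
    # Issues with no assignee at all.
    unassigned = sum(1 for issue in issues if not issue.get("assignees"))

    # Open-issue count per assignee.
    load = Counter(
        login
        for issue in issues
        for login in (issue.get("assignees") or [])
    )
    overloaded = sorted(name for name, n in load.items() if n > overload_threshold)

    # Titles that appear more than once after trimming and lowercasing.
    titles = Counter(issue.get("title", "").strip().lower() for issue in issues)
    duplicate_titles = sorted(t for t, n in titles.items() if t and n > 1)

    return {
        "unassigned": unassigned,
        "overloaded": overloaded,
        "duplicate_titles": duplicate_titles,
    }
```

The function takes plain dicts rather than calling an API, so it can be fed from whatever export of the issue queue is available.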

### Runtime Health

- Hermes gateway reachable
- local provider responding
- latest heartbeat tick present
- model health reporting accurately
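
One way to check the "latest heartbeat tick present" signal is to read the last record of a JSONL tick file. This is a minimal sketch: the `latest_tick` helper is hypothetical, and it assumes only that the file holds one JSON value per line, not any particular field names.

```python
import json
from pathlib import Path


def latest_tick(path):
    """Return the last JSON record in a JSONL file, or None.

    None is returned when the file is missing, holds no non-empty lines,
    or its final line is not valid JSON (for example, a torn write).
    """
    p = Path(path).expanduser()
    if not p.is_file():
        return None

    last = None
    with p.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                last = line  # remember the most recent non-empty line

    if last is None:
        return None
    try:
        return json.loads(last)
    except json.JSONDecodeError:
        return None
```

Treating an unparseable final line as "no tick" is a deliberate choice here: a half-written record is a weak health signal rather than proof of a live heartbeat.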

### Learning Loop Health

- archive checkpoint advancing
- notes and knowledge artifacts being emitted
- DPO files growing
- freshness lag between sessions and exports
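
The freshness lag in the last item can be expressed as a simple time difference. The sketch below assumes the end time of the latest session and the time of the latest export are already known as `datetime` values; how those timestamps are obtained is not specified here, and the `freshness_lag` name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def freshness_lag(last_session_end, last_export):
    """Return how far exports trail sessions as a timedelta.

    Returns timedelta(0) when the export is at least as new as the session,
    and None when nothing has been exported yet (lag is undefined, not zero).
    """
    if last_export is None:
        return None
    return max(last_session_end - last_export, timedelta(0))


# Usage: a session that ended three hours after the last export.
session_end = datetime(2026, 4, 4, 18, 0, tzinfo=timezone.utc)
export_time = datetime(2026, 4, 4, 15, 0, tzinfo=timezone.utc)
lag = freshness_lag(session_end, export_time)
```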

## Suggested Daily Questions

1. Did review keep pace with execution today?
2. Did any builder receive work outside their lane?
3. Did Timmy spend time on judgment rather than routine queue cleanup?
4. Did the private learning pipeline produce usable artifacts?
5. Did any stale doc, helper, or default try to pull the system back into old habits?

## Useful Inputs

- `~/.timmy/heartbeat/ticks_YYYYMMDD.jsonl`
- `~/.timmy/metrics/local_YYYYMMDD.jsonl`
- `~/.timmy/twitter-archive/checkpoint.json`
- `~/.timmy/twitter-archive/metrics/progress.json`
- Gitea open PR queue
- Gitea unassigned issue queue
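
The file inputs above can be resolved for a given day with a small helper. This is a sketch under two assumptions: that `YYYYMMDD` is the zero-padded calendar date, and that the caller decides which time zone's date applies. The `daily_inputs` name and its dictionary keys are made up for the example.

```python
from datetime import date
from pathlib import Path

TIMMY_HOME = Path("~/.timmy")


def daily_inputs(day, home=TIMMY_HOME):
    """Build the scorecard's file input paths for one calendar day."""
    stamp = day.strftime("%Y%m%d")  # e.g. 20260404
    root = Path(home).expanduser()
    return {
        "heartbeat": root / "heartbeat" / f"ticks_{stamp}.jsonl",
        "metrics": root / "metrics" / f"local_{stamp}.jsonl",
        "checkpoint": root / "twitter-archive" / "checkpoint.json",
        "progress": root / "twitter-archive" / "metrics" / "progress.json",
    }


# Usage: paths for April 4, 2026.
paths = daily_inputs(date(2026, 4, 4))
```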

## Suggested Ratings

### Queue Discipline

- Strong: review and dispatch are keeping up, little duplicate churn
- Mixed: queue moves, but ambiguity or duplication is increasing
- Weak: review is backlogged or agents are being misrouted

### Runtime Reliability

- Strong: heartbeat, Hermes, and provider surfaces all healthy
- Mixed: intermittent downtime or weak health signals
- Weak: major surfaces untrusted or stale
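
If the three surfaces are each reduced to a healthy or unhealthy flag, the rating could be approximated as below. The mapping is an assumption, not a rule stated in this scorecard: it treats "all healthy" as Strong, "none healthy" as Weak, and anything in between as Mixed. A stricter reading might rate a single untrusted major surface as Weak. The `rate_runtime` name is hypothetical.

```python
def rate_runtime(heartbeat_ok, hermes_ok, provider_ok):
    """Map three boolean health checks to a Strong/Mixed/Weak rating."""
    checks = (heartbeat_ok, hermes_ok, provider_ok)
    if all(checks):
        return "Strong"
    if not any(checks):
        return "Weak"
    return "Mixed"
```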

### Learning Throughput

- Strong: checkpoint advances, DPO output accumulates, eval gates are visible
- Mixed: some artifacts land, but freshness or checkpointing lags
- Weak: sessions occur without export, or learning artifacts stall

## The Goal

The point of the scorecard is not to admire activity.
The point is to tell whether the system is becoming more reviewable, more sovereign, and more capable of learning from lived work.