Workflow Scorecard
Updated: April 4, 2026
The old overnight uni-wizard scorecard is no longer the primary operational metric.
The current scorecard should measure whether Timmy's real workflow is healthy.
What To Score
Queue Health
- unassigned issue count
- PRs waiting on Timmy or Allegro review
- overloaded assignees
- duplicate issue / duplicate PR pressure
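As a rough illustration, the queue signals above could be tallied from a list of open issues. This is a minimal sketch, not the project's tooling: it assumes each issue is a dict with a `title` and an `assignees` list of `{"login": ...}` entries (similar in shape to what an issue tracker API might return), and the `overload_threshold` default is a hypothetical value chosen only for the example.

```python
from collections import Counter


def queue_health(issues, overload_threshold=5):
    """Summarize queue pressure from a list of issue dicts.

    Assumed shape (hypothetical): {"title": str, "assignees": [{"login": str}, ...]}
    """
    unassigned = 0
    load = Counter()      # open issues per assignee login
    titles = Counter()    # normalized title -> occurrences

    for issue in issues:
        assignees = issue.get("assignees") or []
        if not assignees:
            unassigned += 1
        for person in assignees:
            load[person["login"]] += 1
        titles[issue.get("title", "").strip().lower()] += 1

    return {
        "unassigned": unassigned,
        # Assignees holding more issues than the (assumed) threshold.
        "overloaded": sorted(n for n, c in load.items() if c > overload_threshold),
        # Titles that appear more than once after normalization.
        "duplicate_titles": sorted(t for t, c in titles.items() if t and c > 1),
    }
```

Matching duplicates by normalized title is deliberately crude; it only flags exact repeats and would miss reworded duplicates.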
Runtime Health
- Hermes gateway reachable
- local provider responding
- latest heartbeat tick present
- model health reporting accurately
Learning Loop Health
- archive checkpoint advancing
- notes and knowledge artifacts being emitted
- DPO files growing
- freshness lag between sessions and exports
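One way to make "freshness lag" concrete is to compare the timestamp of the latest session with the timestamp of the latest export. The sketch below assumes both timestamps are already available as `datetime` values; how they are obtained (file modification times, fields inside the exports, or something else) is not specified here, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def freshness_lag(last_session: datetime, last_export: datetime) -> timedelta:
    """Return how far the latest export trails the latest session.

    The result is clamped at zero: an export newer than the last
    session means there is no lag.
    """
    return max(last_session - last_export, timedelta(0))
```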
Suggested Daily Questions
- Did review keep pace with execution today?
- Did any builder receive work outside their lane?
- Did Timmy spend time on judgment rather than routine queue cleanup?
- Did the private learning pipeline produce usable artifacts?
- Did any stale doc, helper, or default try to pull the system back into old habits?
Useful Inputs
- ~/.timmy/heartbeat/ticks_YYYYMMDD.jsonl
- ~/.timmy/metrics/local_YYYYMMDD.jsonl
- ~/.timmy/twitter-archive/checkpoint.json
- ~/.timmy/twitter-archive/metrics/progress.json
- Gitea open PR queue
- Gitea unassigned issue queue
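For the heartbeat input, a small check can confirm that today's tick file exists and holds parseable records. This is a sketch under assumptions: it treats the file as JSON Lines (one JSON value per line, as the `.jsonl` extension suggests) and makes no claim about which fields a tick contains. The helper names are hypothetical.

```python
import json
from datetime import date
from pathlib import Path


def tick_file(day: date, base=Path.home() / ".timmy" / "heartbeat") -> Path:
    """Build the path ticks_YYYYMMDD.jsonl for the given day."""
    return Path(base) / f"ticks_{day:%Y%m%d}.jsonl"


def count_valid_records(lines) -> int:
    """Count lines that parse as JSON, skipping blanks and malformed lines."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            continue
        count += 1
    return count


def count_ticks(path) -> int:
    """Return the number of valid records in a tick file, or 0 if it is missing."""
    path = Path(path)
    if not path.exists():
        return 0
    with path.open(encoding="utf-8") as fh:
        return count_valid_records(fh)
```

A count of zero for the current day would suggest that the "latest heartbeat tick present" signal is failing, though it cannot say why.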
Suggested Ratings
Queue Discipline
- Strong: review and dispatch are keeping up, little duplicate churn
- Mixed: queue moves, but ambiguity or duplication is increasing
- Weak: review is backlogged or agents are being misrouted
Runtime Reliability
- Strong: heartbeat, Hermes, and provider surfaces all healthy
- Mixed: intermittent downtime or weak health signals
- Weak: major surfaces untrusted or stale
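The runtime rating above could be approximated from three boolean health checks. The cutoffs here are an assumption made for illustration, not something the scorecard defines: all surfaces healthy maps to Strong, exactly one failing maps to Mixed, and two or more failing maps to Weak. Intermittent downtime, which the scorecard also counts as Mixed, is not modeled.

```python
def rate_runtime(heartbeat_ok: bool, hermes_ok: bool, provider_ok: bool) -> str:
    """Map three surface checks to a Strong / Mixed / Weak rating.

    Cutoffs are assumed for this sketch: 0 failing -> Strong,
    1 failing -> Mixed, 2 or more failing -> Weak.
    """
    failing = [heartbeat_ok, hermes_ok, provider_ok].count(False)
    if failing == 0:
        return "Strong"
    if failing == 1:
        return "Mixed"
    return "Weak"
```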
Learning Throughput
- Strong: checkpoint advances, DPO output accumulates, eval gates are visible
- Mixed: some artifacts land, but freshness or checkpointing lags
- Weak: sessions occur without export, or learning artifacts stall
The Goal
The point of the scorecard is not to admire activity. The point is to tell whether the system is becoming more reviewable, more sovereign, and more capable of learning from lived work.