
Workflow Scorecard

Updated: April 4, 2026

The old overnight uni-wizard scorecard is no longer the primary operational metric. The current scorecard should measure whether Timmy's actual day-to-day workflow is healthy.

What To Score

Queue Health

  • unassigned issue count
  • PRs waiting on Timmy or Allegro review
  • overloaded assignees
  • pressure from duplicate issues and duplicate PRs
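Below is a minimal sketch of how the queue signals above could be computed. It assumes the open issues and PRs have already been fetched from Gitea as a list of dicts shaped roughly like Gitea issue objects, with `title`, `assignees`, and `pull_request` fields. Several choices are hypothetical: the reviewer logins, the overload threshold, and the duplicate check, which only matches titles. A PR also counts as "waiting on review" only when a named reviewer is assigned to it.

```python
from collections import Counter

# Hypothetical values: adjust to the real reviewer logins and team capacity.
REVIEWERS = ("timmy", "allegro")
OVERLOAD_THRESHOLD = 5  # open items per assignee before flagging overload


def _logins(item):
    # Treat a missing or null "assignees" field as nobody assigned.
    return [a["login"] for a in (item.get("assignees") or [])]


def queue_health(items, reviewers=REVIEWERS, threshold=OVERLOAD_THRESHOLD):
    """Summarize queue health from a list of open issue/PR dicts."""
    unassigned = sum(1 for i in items if not _logins(i))
    # Assumption: a PR is "waiting on review" if a named reviewer is assigned.
    waiting = sum(
        1
        for i in items
        if i.get("pull_request")
        and any(login.lower() in reviewers for login in _logins(i))
    )
    load = Counter(login for i in items for login in _logins(i))
    overloaded = sorted(login for login, n in load.items() if n > threshold)
    # Crude duplicate signal: identical titles after normalizing case and spaces.
    titles = Counter(" ".join(i["title"].lower().split()) for i in items)
    duplicates = sorted(t for t, n in titles.items() if n > 1)
    return {
        "unassigned": unassigned,
        "waiting_on_review": waiting,
        "overloaded": overloaded,
        "duplicate_titles": duplicates,
    }
```

Matching on normalized titles only catches near-identical duplicates. Duplicates worded differently still need a human look.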

Runtime Health

  • Hermes gateway reachable
  • local provider responding
  • latest heartbeat tick present
  • model health reported accurately
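One way to sketch the "latest heartbeat tick present" check is to read the last record in the day's tick file and compare its age to a cutoff. The tick file is the one listed under Useful Inputs. This sketch assumes each tick line is a JSON object with an ISO-8601 `timestamp` field that carries a UTC offset. The field name and the 15-minute cutoff are both hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

MAX_TICK_AGE = timedelta(minutes=15)  # hypothetical freshness cutoff


def last_tick_time(lines, ts_field="timestamp"):
    """Return the timestamp of the last non-blank JSONL record, or None."""
    last = None
    for line in lines:
        if line.strip():
            last = line
    if last is None:
        return None
    # Expects an offset-aware ISO-8601 string, e.g. "2026-04-04T11:50:00+00:00".
    return datetime.fromisoformat(json.loads(last)[ts_field])


def tick_is_fresh(lines, now, max_age=MAX_TICK_AGE):
    """True if the newest tick is no older than max_age (now must be tz-aware)."""
    ts = last_tick_time(lines)
    return ts is not None and now - ts <= max_age


def heartbeat_ok(ticks_path, now=None, max_age=MAX_TICK_AGE):
    """Check a day's tick file; a missing file counts as no heartbeat."""
    now = now or datetime.now(timezone.utc)
    path = Path(ticks_path)
    if not path.exists():
        return False
    with path.open() as fh:
        return tick_is_fresh(fh, now, max_age)
```

A missing file and a stale tick both return `False`. Either way, the heartbeat surface should not be counted as healthy.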

Learning Loop Health

  • archive checkpoint advancing
  • notes and knowledge artifacts being emitted
  • DPO files growing
  • freshness lag between sessions and exports
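Most of the learning-loop signals reduce to comparing two snapshots taken at different times. Here is a sketch under several assumptions. It assumes checkpoint.json stores an increasing position under a hypothetical `offset` key. It treats DPO files as JSONL with one record per line. It also assumes timestamps are available for the latest session and the latest export.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path


def load_json(path):
    """Read a snapshot such as ~/.timmy/twitter-archive/checkpoint.json."""
    with Path(path).open() as fh:
        return json.load(fh)


def checkpoint_advanced(previous, current, key="offset"):
    """True if the archive checkpoint moved forward between two snapshots.

    `key` is a hypothetical field name; check what checkpoint.json really stores.
    """
    return current.get(key, 0) > previous.get(key, 0)


def count_records(lines):
    """Count non-blank JSONL records, e.g. in a DPO export file."""
    return sum(1 for line in lines if line.strip())


def freshness_lag(latest_session: datetime, latest_export: datetime) -> timedelta:
    """How far exports trail sessions; zero when exports are caught up."""
    return max(latest_session - latest_export, timedelta(0))
```

"DPO files growing" then means today's `count_records` exceeds yesterday's. A lag that keeps widening across days is the signal to watch, more than any single value.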

Suggested Daily Questions

  1. Did review keep pace with execution today?
  2. Did any builder receive work outside their lane?
  3. Did Timmy spend time on judgment rather than routine queue cleanup?
  4. Did the private learning pipeline produce usable artifacts?
  5. Did any stale doc, helper, or default try to pull the system back into old habits?

Useful Inputs

  • ~/.timmy/heartbeat/ticks_YYYYMMDD.jsonl
  • ~/.timmy/metrics/local_YYYYMMDD.jsonl
  • ~/.timmy/twitter-archive/checkpoint.json
  • ~/.timmy/twitter-archive/metrics/progress.json
  • Gitea open PR queue
  • Gitea unassigned issue queue
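Two of these inputs are date-stamped, so a small helper can resolve the day's paths and report which are missing before scoring starts. This is a sketch that assumes `YYYYMMDD` is the local calendar date. The dictionary keys are hypothetical labels. The Gitea queues are left out because they come from the API rather than the filesystem.

```python
from datetime import date
from pathlib import Path

TIMMY_HOME = Path.home() / ".timmy"


def daily_inputs(day: date, home: Path = TIMMY_HOME) -> dict:
    """Resolve the file-based scorecard inputs for one calendar day."""
    stamp = day.strftime("%Y%m%d")  # the YYYYMMDD part of the file names
    return {
        "heartbeat_ticks": home / "heartbeat" / f"ticks_{stamp}.jsonl",
        "local_metrics": home / "metrics" / f"local_{stamp}.jsonl",
        "archive_checkpoint": home / "twitter-archive" / "checkpoint.json",
        "archive_progress": home / "twitter-archive" / "metrics" / "progress.json",
    }


def missing_inputs(day: date, home: Path = TIMMY_HOME) -> list:
    """Names of inputs that do not exist, so a gap is visible rather than silent."""
    return sorted(name for name, p in daily_inputs(day, home).items() if not p.exists())
```

Calling `missing_inputs(date.today())` before the daily review makes a missing input show up as a finding. A skipped input would otherwise quietly pass as a healthy day.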

Suggested Ratings

Queue Discipline

  • Strong: review and dispatch are keeping up, with little duplicate churn
  • Mixed: queue moves, but ambiguity or duplication is increasing
  • Weak: review is backlogged or agents are being misrouted

Runtime Reliability

  • Strong: heartbeat, Hermes, and provider surfaces all healthy
  • Mixed: intermittent downtime or weak health signals
  • Weak: major surfaces untrusted or stale

Learning Throughput

  • Strong: checkpoint advances, DPO output accumulates, eval gates are visible
  • Mixed: some artifacts land, but freshness or checkpointing lags
  • Weak: sessions occur without export, or learning artifacts stall
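The three rating scales can be written as small rule functions. This is one possible encoding, not a prescribed one. The boolean inputs and surface-state strings are hypothetical, and every listed surface is treated as "major". The weakest matching condition wins, so a single stale surface or stalled artifact is never averaged away.

```python
from enum import Enum


class Rating(Enum):
    STRONG = "Strong"
    MIXED = "Mixed"
    WEAK = "Weak"


def rate_queue(*, review_backlogged, agents_misrouted,
               duplication_rising, ambiguity_rising):
    # Weak: review is backlogged or agents are being misrouted.
    if review_backlogged or agents_misrouted:
        return Rating.WEAK
    # Mixed: the queue moves, but ambiguity or duplication is increasing.
    if duplication_rising or ambiguity_rising:
        return Rating.MIXED
    # Strong: review and dispatch are keeping up with little duplicate churn.
    return Rating.STRONG


def rate_runtime(surfaces):
    """Rate runtime reliability from a {surface_name: state} mapping.

    States are hypothetical strings: "healthy", "intermittent",
    "weak_signal", "stale", or "untrusted".
    """
    states = set(surfaces.values())
    # Weak: any surface is untrusted or stale.
    if states & {"stale", "untrusted"}:
        return Rating.WEAK
    # Strong: every reported surface is healthy.
    if states == {"healthy"}:
        return Rating.STRONG
    # Mixed: intermittent downtime, weak signals, or no data at all.
    return Rating.MIXED


def rate_learning(*, checkpoint_advanced, dpo_accumulating, eval_gates_visible,
                  sessions_without_export, artifacts_stalled):
    # Weak: sessions occur without export, or learning artifacts stall.
    if sessions_without_export or artifacts_stalled:
        return Rating.WEAK
    # Strong: checkpoint advances, DPO output accumulates, eval gates visible.
    if checkpoint_advanced and dpo_accumulating and eval_gates_visible:
        return Rating.STRONG
    # Mixed: some artifacts land, but freshness or checkpointing lags.
    return Rating.MIXED
```

Keyword-only arguments keep each call readable at the review site. An empty surface map rates as Mixed rather than Strong, because missing data is itself a weak health signal.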

The Goal

The point of the scorecard is not to admire activity. The point is to tell whether the system is becoming more reviewable, more sovereign, and more capable of learning from lived work.