2.4 - Wolf leaderboard to routing table #213

Timmy · 2026-04-05T20:07:26Z

Timmy commented

2026-04-05 20:07:26 +00:00

wolf-scores.json generated at ~/.hermes/wolf-scores.json with 3 models:

- google/gemini-2.5-flash (score=1.0, 9.5s, tier=1)
- google/gemini-3-flash-preview (score=1.0, 4.69s, tier=1)
- google/gemini-2.5-flash-lite (score=1.0, 1.18s, tier=1)

Format: {model: {score, latency_seconds, tier, success}}
Ready for smart_model_routing consumption.

NOTE: Only 3 models evaluated — wolf has only run first-run.json. More wolf evaluations needed for comprehensive routing.
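Since the file is meant to be consumed by smart_model_routing, a quick shape check before consumption may be useful. The following is a minimal sketch, not part of wolf or hermes: `validate_scores`, `load_scores`, and `EXPECTED_FIELDS` are hypothetical names, and the expected types are an assumption inferred from the three entries reported above.

```python
import json
from pathlib import Path

# Field names come from the format line in this comment; the expected types
# are an assumption inferred from the three reported entries.
EXPECTED_FIELDS = {
    "score": float,
    "latency_seconds": float,
    "tier": int,
    "success": bool,
}


def _matches(value, expected):
    """Type check that accepts ints as floats and never treats bools as numbers."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return False
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_scores(scores):
    """Return a list of problems; an empty list means every entry has the expected shape."""
    problems = []
    for model, entry in scores.items():
        for field, expected in EXPECTED_FIELDS.items():
            if field not in entry:
                problems.append(f"{model}: missing {field}")
            elif not _matches(entry[field], expected):
                problems.append(
                    f"{model}: {field} has unexpected type {type(entry[field]).__name__}"
                )
    return problems


def load_scores(path=None):
    """Read the scores file; defaults to the location named in this issue."""
    if path is None:
        path = Path.home() / ".hermes" / "wolf-scores.json"
    return json.loads(Path(path).read_text())
```

Calling `validate_scores(load_scores())` on a machine that has the file should return an empty list if the file matches the format described here.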

Timmy closed this issue

2026-04-05 23:18:19 +00:00

Timmy commented

2026-04-05 23:18:19 +00:00

Shipped

~/.hermes/wolf-scores.json created with 3 evaluated models.

All 3 scored 1.0 (perfect), but first-run.json yielded only 3 models in total. Wolf needs to evaluate more models before the scores provide useful differentiation. The output format is ready for smart_model_routing to consume.

Output format

{
  "google/gemini-2.5-flash": {"score": 1.0, "latency_seconds": 9.5, "tier": 1, "success": true},
  "google/gemini-3-flash-preview": {"score": 1.0, "latency_seconds": 4.69, "tier": 1, "success": true},
  "google/gemini-2.5-flash-lite": {"score": 1.0, "latency_seconds": 1.18, "tier": 1, "success": true}
}
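As a rough illustration of how a consumer might read this format, here is a minimal sketch of a selection rule. It is not the actual smart_model_routing logic: `pick_model` is a hypothetical helper, and the ordering (lowest tier number first, then highest score, then lowest latency) is an assumption, including the guess that a lower tier number is preferred.

```python
import json

# The three entries reported in this issue.
SCORES = json.loads("""
{
  "google/gemini-2.5-flash": {"score": 1.0, "latency_seconds": 9.5, "tier": 1, "success": true},
  "google/gemini-3-flash-preview": {"score": 1.0, "latency_seconds": 4.69, "tier": 1, "success": true},
  "google/gemini-2.5-flash-lite": {"score": 1.0, "latency_seconds": 1.18, "tier": 1, "success": true}
}
""")


def pick_model(scores):
    """Return the preferred model name, or None if no model succeeded.

    Assumed ordering (hypothetical): lowest tier, then highest score,
    then lowest latency.
    """
    candidates = {name: entry for name, entry in scores.items() if entry["success"]}
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda name: (
            candidates[name]["tier"],
            -candidates[name]["score"],
            candidates[name]["latency_seconds"],
        ),
    )


print(pick_model(SCORES))  # prints google/gemini-2.5-flash-lite
```

Because all three models share the same score and tier, latency is the only field that separates them under this rule, which is why more evaluations are needed before the scores carry real routing signal.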

Next: assign more models to wolf for evaluation. Currently only 3 Gemini variants have been evaluated, which is not enough for smart routing across the fleet.

Reference: Timmy_Foundation/timmy-config#213