hermes-agent

Timmy_Foundation/hermes-agent

Commit Graph

Author	SHA1	Message	Date

Alexander Whitestone

f8f4678ee4

feat: benchmark local Ollama models against 50 tok/s threshold (#287)

Forge CI / smoke-and-build (pull_request) Failing after 1m24s

Add scripts/benchmark_local_models.py — tests all local Ollama models
against the 50 tok/s UX threshold (configurable via --threshold).

Features:
- Auto-discovers all pulled Ollama models, or tests only the ones you name
- Configurable rounds, max tokens, threshold
- Per-round timing with prompt_eval/eval token breakdown
- Human-readable table report with PASS/FAIL/ERROR status
- JSON output mode (--json) for CI integration
- Exits with code 1 if any model fails the threshold
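
The PASS/FAIL/ERROR logic in the list above can be sketched roughly as follows. This is not the code from scripts/benchmark_local_models.py; it is a minimal illustration that assumes each round is a response dict from Ollama's generate endpoint, which reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The helper names `tokens_per_second`, `classify`, and `exit_code` are hypothetical, and so are the choices to average across rounds, to treat a score equal to the threshold as a pass, and to let only FAIL (not ERROR) produce exit code 1.

```python
from statistics import StatisticsError, mean

NS_PER_SECOND = 1_000_000_000


def tokens_per_second(response: dict) -> float:
    """Generation throughput for one round, from Ollama timing fields."""
    eval_count = response["eval_count"]        # tokens generated
    eval_duration = response["eval_duration"]  # nanoseconds spent generating
    if eval_duration <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration / NS_PER_SECOND)


def classify(rounds: list[dict], threshold: float = 50.0) -> tuple[str, float]:
    """Average per-round throughput and compare it with the threshold.

    Assumption: a model passes when its average is >= threshold.
    """
    try:
        average = mean(tokens_per_second(r) for r in rounds)
    except (KeyError, ValueError, StatisticsError):
        # Missing timing fields, a zero duration, or no rounds at all.
        return "ERROR", 0.0
    return ("PASS" if average >= threshold else "FAIL"), average


def exit_code(statuses: list[str]) -> int:
    """Return 1 if any model failed the threshold, else 0.

    Assumption: ERROR alone does not change the exit code; the commit
    message only mentions failing the threshold.
    """
    return 1 if "FAIL" in statuses else 0
```

For example, a round that generates 100 tokens in one second (`eval_duration` of 1,000,000,000 ns) works out to 100 tok/s, which clears the default 50 tok/s threshold.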

Usage:
  python3 scripts/benchmark_local_models.py                 # all models, 3 rounds
  python3 scripts/benchmark_local_models.py --models qwen2.5:7b  # single model
  python3 scripts/benchmark_local_models.py --json          # CI output
  python3 scripts/benchmark_local_models.py --threshold 30  # custom threshold
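
A command-line interface matching the usage lines above could be declared with argparse along these lines. Only `--models`, `--json`, and `--threshold` appear in the commit message; the flag names `--rounds` and `--max-tokens`, the default of 128 tokens, and the choice to let `--models` accept several names are assumptions made for illustration, and the real script may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented usage (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Benchmark local Ollama models against a tok/s threshold"
    )
    # Documented flag; accepting more than one name is an assumption.
    parser.add_argument("--models", nargs="+", default=None,
                        help="models to test (default: all pulled models)")
    # Documented flag; 50 tok/s is the threshold named in the commit title.
    parser.add_argument("--threshold", type=float, default=50.0,
                        help="minimum tokens per second required to pass")
    # Hypothetical flag name; the usage comment mentions 3 rounds by default.
    parser.add_argument("--rounds", type=int, default=3,
                        help="benchmark rounds per model")
    # Hypothetical flag name and default value.
    parser.add_argument("--max-tokens", type=int, default=128,
                        help="maximum tokens to generate per round")
    # Documented flag for machine-readable output.
    parser.add_argument("--json", action="store_true",
                        help="emit JSON instead of the table report")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```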

Tested: gemma3:1b scores 141.8 tok/s (PASS).

Closes #287

2026-04-13 17:46:53 -04:00

1 commit