hermes-agent

teknium1 b4fbb6fe10 feat: add YC-Bench long-horizon agent benchmark environment

Adds eval-only benchmark for YC-Bench (collinear-ai/yc-bench), a
deterministic long-horizon benchmark where the agent acts as CEO of an
AI startup over a simulated 1-3 year run.

Key design decisions verified against the official yc-bench repo:
- Uses 'sim init' (NOT 'yc-bench run') to avoid starting a competing
  built-in agent loop
- Correct DB table names: 'companies' and 'sim_events'
- Correct 4 domains: research, inference, data_environment, training
- Penalty values are preset-dependent (not hardcoded in system prompt)
- Sequential evaluation (each run is 100-500 turns)
- Follows TerminalBench2 patterns: KeyboardInterrupt handling,
  cleanup_all_environments(), tqdm logging handler, streaming JSONL
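
A minimal sketch of how the sequential evaluation, streaming JSONL, and KeyboardInterrupt handling listed above might fit together. The names run_sequential_eval, run_one, and cleanup are hypothetical, not the actual hermes-agent or yc-bench API. cleanup stands in for something like cleanup_all_environments(), and the tqdm logging handler is left out.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def run_sequential_eval(
    run_ids: Iterable[str],
    run_one: Callable[[str], dict],   # hypothetical: executes one full benchmark run
    out_path: Path,
    cleanup: Callable[[], None],      # hypothetical stand-in for environment teardown
) -> int:
    """Evaluate runs one at a time, appending one JSON line per finished run."""
    completed = 0
    try:
        with out_path.open("a", encoding="utf-8") as fh:
            for run_id in run_ids:
                result = run_one(run_id)
                # Stream each result to disk immediately so an interrupt
                # never loses runs that already finished.
                fh.write(json.dumps({"run_id": run_id, **result}) + "\n")
                fh.flush()
                completed += 1
    except KeyboardInterrupt:
        # Ctrl-C: stop early and keep whatever is already in the JSONL file.
        pass
    finally:
        # Teardown runs on normal exit, on interrupt, and on errors.
        cleanup()
    return completed
```

Flushing after every record is what makes the JSONL file usable as a partial result when a long run is interrupted.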

yc-bench added as optional dependency: pip install "hermes-agent[yc-bench]"

Closes #340

2026-03-06 19:25:56 -08:00

| Name | Last commit | Date |
|---|---|---|
| tblite | feat: add OpenThoughts-TBLite evaluation script | 2026-03-04 12:55:56 +00:00 |
| terminalbench_2 | feat: add OpenThoughts-TBLite evaluation script | 2026-03-04 12:55:56 +00:00 |
| yc_bench | feat: add YC-Bench long-horizon agent benchmark environment | 2026-03-06 19:25:56 -08:00 |
| __init__.py | Add new environments and enhance tool context functionality | 2026-02-10 19:39:05 +00:00 |