evaluate() was calling _llm_judge twice per item — once via compute_reward and once directly — doubling the API cost for no benefit. It now extracts correctness from compute_reward's buffer instead.

Separately, compute_reward appends to the training metric buffers during eval, which would pollute the wandb training charts. evaluate() now rolls back buffer entries added during eval so training metrics stay clean.
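A minimal sketch of the pattern described above — snapshot buffer lengths before eval, reuse the correctness entries compute_reward already buffered (instead of a second judge call), then truncate the buffers back to their pre-eval lengths. All names here (Trainer, metric_buffers, the stand-in for _llm_judge) are hypothetical, not the actual implementation:

```python
class Trainer:
    def __init__(self):
        # training metric buffers that wandb charts are built from
        self.metric_buffers = {"correctness": []}

    def compute_reward(self, item):
        # stand-in for the _llm_judge call; appends to the training
        # buffers as a side effect, just like during training
        correct = float(len(item) % 2 == 0)
        self.metric_buffers["correctness"].append(correct)
        return correct

    def evaluate(self, items):
        # snapshot buffer lengths so eval-time entries can be rolled back
        start = {k: len(v) for k, v in self.metric_buffers.items()}

        rewards = [self.compute_reward(it) for it in items]

        # reuse correctness from the buffer instead of judging a second time
        correctness = self.metric_buffers["correctness"][start["correctness"]:]

        # roll back: truncate each buffer to its pre-eval length so
        # eval entries never leak into the training charts
        for k, n in start.items():
            del self.metric_buffers[k][n:]
        return rewards, correctness
```

The snapshot-and-truncate approach keeps compute_reward itself unchanged; only evaluate() needs to know that buffering happens as a side effect.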