Files
hermes-agent/environments/benchmarks
teknium 1b7bc299f3 Enhance TerminalBench2 environment with task filtering due to incompat with modal and logging improvements
- Updated task filter descriptions for clarity and added a new skip task feature to exclude incompatible tasks.
- Introduced a set of modal incompatible tasks to prevent execution errors in cloud environments.
- Implemented streaming JSONL logging for task results, preserving data even on interruptions.
- Refactored task evaluation logic to include skipped task reporting and improved error handling.
2026-02-12 05:36:45 +00:00
..