hermes-agent

Timmy_Foundation/hermes-agent

Commit Graph

Author	SHA1	Message	Date

teknium

ba3fea24f1

Enhance TerminalBench 2 configuration and evaluation handling

- Added task_timeout parameter to enforce a maximum wall-clock time for each task, automatically scoring as FAIL if exceeded.
- Introduced terminal_timeout and tool_pool_size parameters to improve command execution and concurrency management.
- Updated logging to provide detailed task execution times and timeout handling, enhancing overall monitoring.
- Removed outdated evaluate_config.yaml file to streamline configuration management.

2026-02-10 22:53:24 +00:00

teknium

ad042fdd68

Update terminalbench_2 configuration for enhanced performance and evaluation

- Increased max_token_length from 16000 to 32000 to allow for longer inputs.
- Adjusted agent_temperature from 0.6 to 0.8 for more varied responses.
- Extended test_timeout from 180 to 600 seconds to accommodate longer evaluations.
- Updated data directory path for saving evaluations to ensure proper organization.
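For reference, the three numeric changes listed in this commit can be laid out side by side. The sketch below is illustrative only: the class name TerminalBench2Settings and the helper changed_fields are hypothetical, and only the field names and values come from the commit message.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TerminalBench2Settings:  # hypothetical container, not the repo's config class
    max_token_length: int
    agent_temperature: float
    test_timeout: int  # seconds


# Values before and after the commit, as stated in its message.
BEFORE = TerminalBench2Settings(max_token_length=16000, agent_temperature=0.6, test_timeout=180)
AFTER = TerminalBench2Settings(max_token_length=32000, agent_temperature=0.8, test_timeout=600)


def changed_fields(old: TerminalBench2Settings, new: TerminalBench2Settings) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    old_d, new_d = asdict(old), asdict(new)
    return {k: (old_d[k], new_d[k]) for k in old_d if old_d[k] != new_d[k]}
```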

2026-02-10 19:48:41 +00:00

teknium

35ad3146a8

Add new environments and enhance tool context functionality

- Introduced new environments: Terminal Test Environment and SWE Environment, each with default configurations for testing and software engineering tasks.
- Added TerminalBench 2.0 evaluation environment with comprehensive setup for agentic LLMs, including task execution and verification.
- Enhanced ToolContext with methods for uploading and downloading files, ensuring binary-safe operations.
- Updated documentation across environments to reflect new features and usage instructions.
- Refactored existing environment configurations for consistency and clarity.
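The commit message does not say how ToolContext keeps file transfers binary-safe. One common approach, shown here purely as an assumption, is to base64-encode the bytes so they survive a text-only channel such as a shell command. The function names encode_for_transfer and decode_from_transfer are hypothetical and are not ToolContext methods.

```python
import base64


def encode_for_transfer(data: bytes) -> str:
    """Encode raw bytes as ASCII text so a text-only channel cannot corrupt them."""
    return base64.b64encode(data).decode("ascii")


def decode_from_transfer(payload: str) -> bytes:
    """Recover the original bytes from the base64 text produced above."""
    return base64.b64decode(payload.encode("ascii"))
```

Every byte value, including NUL and bytes that are not valid UTF-8, round-trips unchanged, which is the property "binary-safe" refers to.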

2026-02-10 19:39:05 +00:00

3 Commits