Commit Graph

4 Commits

Author SHA1 Message Date
teknium1
55a21fe37b docs: add Environments, Benchmarks & Data Generation guide
Comprehensive developer guide covering:
- Architecture (BaseEnv → HermesAgentBaseEnv → concrete envs)
- All three benchmarks (TerminalBench2, TBLite, YC-Bench)
- Training environments (TerminalTestEnv, HermesSweEnv)
- Core components (AgentLoop, ToolContext, Tool Call Parsers)
- Two-phase operation (Phase 1 OpenAI, Phase 2 VLLM)
- Running environments (evaluate, process, serve modes)
- Creating new environments (training + eval-only)
- Configuration reference and prerequisites

Also updates environments/README.md directory tree to include
TBLite and YC-Bench benchmarks.
2026-03-06 23:31:45 -08:00
teknium1
8481fdcf08 docs: complete Daytona backend documentation coverage
Update all remaining files that enumerate terminal backends to include
Daytona. Covers security docs (bypass info, backend comparison table),
environment variables reference (DAYTONA_API_KEY, TERMINAL_DAYTONA_IMAGE,
container resources header), AGENTS.md (architecture tree, config keys),
environments/README.md, hermes_base_env.py field description, and various
module docstrings.

Follow-up to PR #451 merge.
2026-03-06 03:37:05 -08:00
teknium1
9018e9dd70 refactor: update tool registration and documentation
- Enhanced tool registration process by implementing a self-registering mechanism in each tool file via `tools/registry.py`.
- Updated `model_tools.py` to serve as a thin orchestration layer, simplifying tool discovery and registration.
- Revised documentation to clarify the steps for adding new tools, emphasizing the importance of schema, handler, and registration consistency.
- Improved dependency resolution in environments by ensuring toolsets are queried from `tools/registry.py`.
2026-02-21 21:03:40 -08:00
teknium
35ad3146a8 Add new environments and enhance tool context functionality
- Introduced new environments: Terminal Test Environment and SWE Environment, each with default configurations for testing and software engineering tasks.
- Added TerminalBench 2.0 evaluation environment with comprehensive setup for agentic LLMs, including task execution and verification.
- Enhanced ToolContext with methods for uploading and downloading files, ensuring binary-safe operations.
- Updated documentation across environments to reflect new features and usage instructions.
- Refactored existing environment configurations for consistency and clarity.
2026-02-10 19:39:05 +00:00