hermes-agent

Author	SHA1	Message	Date
dmahan93	366de72a38	add a local vllm instance	2026-03-11 06:52:55 -07:00
dmahan93	13f5459670	fix: use ManagedServer for vLLM in TBLite eval + local_vllm config. TBLite eval was bypassing ManagedServer and calling ServerManager directly, which uses /v1/chat/completions, an endpoint not available on the atropos vllm_api_server (/generate only). Now uses _use_managed_server() to detect vLLM/SGLang backends and route through ManagedServer (Phase 2) with proper tool_parser and /generate endpoint. Falls back to Phase 1 for OpenAI endpoints. Also adds local_vllm.yaml config for running against a local vLLM server with Docker sandboxes.	2026-03-11 06:52:55 -07:00
dmahan93	ed27b826c5	feat: add eval_concurrency limit + Docker local config for TBLite. Add eval_concurrency config field with asyncio.Semaphore; add local.yaml config using Docker backend (sandboxed, no cloud costs); register docker_image alongside modal_image for backend flexibility; default: 8 parallel tasks for local runs.	2026-03-11 06:52:26 -07:00
teknium1	ee7fde6531	feat: add OpenThoughts-TBLite evaluation script. Introduced a new evaluation script for the OpenThoughts-TBLite environment, enabling users to run evaluations with customizable options. The script includes logging capabilities and real-time output, enhancing the evaluation process for terminal agents. This addition complements the existing benchmarking tools and improves usability for users.	2026-03-04 12:55:56 +00:00
teknium1	0ea6c34325	feat: add OpenThoughts-TBLite evaluation environment and configuration files. Introduced a new evaluation environment for OpenThoughts-TBLite, including the main evaluation script, configuration YAML, and README documentation. This environment provides a faster alternative to Terminal-Bench 2.0, featuring 100 difficulty-calibrated tasks for terminal agents. The setup allows for easy evaluation and configuration, enhancing the benchmarking capabilities for terminal agents.	2026-03-04 11:42:41 +00:00

5 Commits
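
Commit ed27b826c5 mentions an eval_concurrency field backed by asyncio.Semaphore, with a default of 8 parallel tasks. The repository code is not shown here, so the following is only a minimal sketch of that general pattern; EVAL_CONCURRENCY, run_task, and run_all are hypothetical names, and the sleep stands in for a real sandboxed evaluation.

```python
import asyncio

# Hypothetical constant; 8 is the default quoted in the commit message.
EVAL_CONCURRENCY = 8


async def run_task(task_id: int, sem: asyncio.Semaphore, state: dict) -> int:
    """Run one placeholder eval task while holding a semaphore slot."""
    async with sem:  # blocks while `limit` tasks are already running
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for the real evaluation work
        state["active"] -= 1
    return task_id


async def run_all(n_tasks: int, limit: int = EVAL_CONCURRENCY):
    """Launch all tasks at once; the semaphore caps how many run together."""
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_task(i, sem, state) for i in range(n_tasks))
    )
    return results, state["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(20))
    print(f"completed {len(results)} tasks, peak concurrency {peak}")
```

Scheduling every task up front and gating each one on the semaphore keeps the caller simple: results come back in submission order from asyncio.gather, and the limit can be changed without restructuring the loop.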