Add cleanup functionality for orphaned sandboxes in TerminalBench2EvalEnv

- Terminate any remaining sandboxes after evaluation by calling cleanup_all_environments(). Timed-out tasks leave orphaned thread pool workers that keep executing commands; the cleanup stops them.
- Print a message before and after the cleanup so users can see that it ran and finished.
teknium
2026-02-10 23:48:49 +00:00
parent 999a28062d
commit 85e629e915

@@ -826,6 +826,13 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
except Exception as e:
    print(f"Error logging evaluation results: {e}")
# Kill all remaining sandboxes. Timed-out tasks leave orphaned thread
# pool workers still executing commands -- cleanup_all stops them.
from tools.terminal_tool import cleanup_all_environments
print("\nCleaning up all sandboxes...")
cleanup_all_environments()
print("Done.")
# =========================================================================
# Wandb logging
# =========================================================================