fix: limit concurrent Modal sandbox creations to avoid deadlocks

- Add max_concurrent_tasks config (default 8) with semaphore in TB2 eval
- Pass cwd: /app via register_task_env_overrides for TB2 tasks
- Add /home/ to host path prefixes as safety net for container backends

When all 86 TerminalBench2 tasks fire simultaneously, each creates a Modal sandbox
via asyncio.run() inside a thread pool worker. Modal's blocking calls deadlock
when too many are created at once. The semaphore ensures max 8 concurrent creations.

Co-Authored-By: hermes-agent[bot] <hermes-agent[bot]@users.noreply.github.com>
This commit is contained in:
Blake Johnson
2026-03-07 21:34:06 +00:00
parent 306d92a9d7
commit c6df39955c
4 changed files with 32 additions and 3 deletions

View File

@@ -137,6 +137,10 @@ class ModalEnvironment(BaseEnvironment):
def cleanup(self):
"""Snapshot the filesystem (if persistent) then stop the sandbox."""
# Check if _inner was ever set (init may have failed)
if not hasattr(self, '_inner') or self._inner is None:
return
if self._persistent:
try:
sandbox = getattr(self._inner, 'deployment', None)