2026-02-21 22:31:43 -08:00
"""Singularity/Apptainer persistent container environment.

Security-hardened with --containall, --no-home, capability dropping.
Supports configurable resource limits and optional filesystem persistence
via writable overlay directories that survive across sessions.
"""

import logging
import os
import shutil
import subprocess
import threading
import uuid
from pathlib import Path

feat(environments): unified spawn-per-call execution layer

Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.

Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
  functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
  _save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
  migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)

Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why

2026-04-08 13:38:04 -07:00

from typing import Optional

from hermes_constants import get_hermes_home
from tools.environments.base import (
    BaseEnvironment,
    _load_json_store,
    _popen_bash,
    _save_json_store,
)

logger = logging.getLogger(__name__)

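The commit notes above say that remote backends track the working directory through in-band stdout markers. The following is a minimal sketch of that idea, not the project's implementation: the function name `run_with_cwd_marker` and the marker format are hypothetical, and it runs plain local `bash` rather than `apptainer exec`.

```python
import shlex
import subprocess
import uuid


def run_with_cwd_marker(command: str, cwd: str) -> tuple[str, str, int]:
    """Run `command` in a fresh bash process and report the final working directory.

    Hypothetical helper: the marker format is an assumption for illustration.
    """
    marker = f"__CWD_{uuid.uuid4().hex}__"
    script = "\n".join([
        f"cd {shlex.quote(cwd)} || exit 1",        # restore the previous CWD
        command,                                     # the user's command
        "__rc=$?",                                   # remember its exit status
        f"printf '\\n{marker}%s\\n' \"$PWD\"",      # emit the marker plus the new CWD
        "exit $__rc",
    ])
    proc = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
    output, sep, tail = proc.stdout.rpartition(f"\n{marker}")
    if not sep:
        # Marker missing (for example the command called `exit`): keep the old CWD.
        return proc.stdout, cwd, proc.returncode
    return output, tail.strip(), proc.returncode
```

Because each call starts a new shell, the caller would feed the returned directory back in as `cwd` on the next call. A random marker keeps ordinary command output from being mistaken for the CWD report.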
fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths

Several files resolved paths via Path.home() / ".hermes" or
os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME
environment variable. This broke isolation when running multiple
Hermes instances with distinct HERMES_HOME directories.

Replace all hardcoded paths with calls to get_hermes_home() from
hermes_cli.config, consistent with the rest of the codebase.

Files fixed:
- tools/process_registry.py (processes.json)
- gateway/pairing.py (pairing/)
- gateway/sticker_cache.py (sticker_cache.json)
- gateway/channel_directory.py (channel_directory.json, sessions.json)
- gateway/config.py (gateway.json, config.yaml, sessions_dir)
- gateway/mirror.py (sessions/)
- gateway/hooks.py (hooks/)
- gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/)
- gateway/platforms/whatsapp.py (whatsapp/session)
- gateway/delivery.py (cron/output)
- agent/auxiliary_client.py (auth.json)
- agent/prompt_builder.py (SOUL.md)
- cli.py (config.yaml, images/, pastes/, history)
- run_agent.py (logs/)
- tools/environments/base.py (sandboxes/)
- tools/environments/modal.py (modal_snapshots.json)
- tools/environments/singularity.py (singularity_snapshots.json)
- tools/tts_tool.py (audio_cache)
- hermes_cli/status.py (cron/jobs.json, sessions.json)
- hermes_cli/gateway.py (logs/, whatsapp session)
- hermes_cli/main.py (whatsapp/session)

Tests updated to use HERMES_HOME env var instead of patching Path.home().

Closes #892
(cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)

2026-03-11 07:31:41 +01:00
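The fix described above depends on a single resolver that prefers `HERMES_HOME` over the hardcoded `~/.hermes`. The sketch below shows how such a resolver might look. `resolve_hermes_home` is a hypothetical stand-in, and the real `get_hermes_home()` may behave differently (for example in how it expands or validates the path).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hermes_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the Hermes home directory, honouring HERMES_HOME when it is set.

    Hypothetical stand-in for get_hermes_home(); shown only to illustrate
    why per-instance isolation needs one shared resolver.
    """
    env = os.environ if env is None else env
    override = env.get("HERMES_HOME")
    if override:
        return Path(override).expanduser()
    # Fallback assumed from the commit notes: the legacy ~/.hermes location.
    return Path.home() / ".hermes"


# Two instances with distinct HERMES_HOME values get distinct snapshot stores.
store_a = resolve_hermes_home({"HERMES_HOME": "/tmp/hermes-a"}) / "singularity_snapshots.json"
store_b = resolve_hermes_home({"HERMES_HOME": "/tmp/hermes-b"}) / "singularity_snapshots.json"
```

Passing the environment as a parameter is only a testing convenience in this sketch; it avoids mutating `os.environ` in tests.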

_SNAPSHOT_STORE = get_hermes_home() / "singularity_snapshots.json"


def _find_singularity_executable() -> str:
    """Locate the apptainer or singularity CLI binary."""
    if shutil.which("apptainer"):
        return "apptainer"
    if shutil.which("singularity"):
        return "singularity"
    raise RuntimeError(
        "Neither 'apptainer' nor 'singularity' was found in PATH. "
        "Install Apptainer (https://apptainer.org/docs/admin/main/installation.html) "
        "or Singularity and ensure the CLI is available."
    )


def _ensure_singularity_available() -> str:
    """Preflight check: resolve the executable and verify it responds."""
    exe = _find_singularity_executable()
    try:
        result = subprocess.run(
            [exe, "version"], capture_output=True, text=True, timeout=10,
        )
    except FileNotFoundError:
        raise RuntimeError(
            f"Singularity backend selected but '{exe}' could not be executed."
        )
    except subprocess.TimeoutExpired:
        raise RuntimeError(f"'{exe} version' timed out.")

    if result.returncode != 0:
        stderr = result.stderr.strip()[:200]
        raise RuntimeError(f"'{exe} version' failed (exit code {result.returncode}): {stderr}")
    return exe


def _load_snapshots() -> dict:
    """Load the snapshot mapping from ``_SNAPSHOT_STORE``."""
    return _load_json_store(_SNAPSHOT_STORE)


def _save_snapshots(data: dict) -> None:
    """Persist the snapshot mapping to ``_SNAPSHOT_STORE``."""
    _save_json_store(_SNAPSHOT_STORE, data)
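
The two wrappers above delegate to `_load_json_store` and `_save_json_store`, whose bodies are not shown in this file. The sketch below shows one way such a pair could behave, assuming a "missing or corrupt file means empty dict" policy and an atomic write. The names `load_json_store` and `save_json_store` are hypothetical and do not describe the actual helpers in `tools.environments.base`.

```python
import json
import os
import tempfile
from pathlib import Path


def load_json_store(path: Path) -> dict:
    """Return the parsed mapping, or {} if the file is missing or unreadable."""
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):  # ValueError covers json.JSONDecodeError
        return {}
    return data if isinstance(data, dict) else {}


def save_json_store(path: Path, data: dict) -> None:
    """Write `data` atomically: temp file in the same directory, then rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(data, handle, indent=2)
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Writing to a temporary file and renaming it means a reader never observes a half-written store, which matters if several environments update the same JSON file.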


def _get_scratch_dir() -> Path:
    """Pick a host directory for sandbox data.

    Order of preference: ``TERMINAL_SCRATCH_DIR``, then a writable
    ``/scratch``, then the default sandbox directory.
    """
    custom_scratch = os.getenv("TERMINAL_SCRATCH_DIR")
    if custom_scratch:
        scratch_path = Path(custom_scratch)
        scratch_path.mkdir(parents=True, exist_ok=True)
        return scratch_path

    from tools.environments.base import get_sandbox_dir
    sandbox = get_sandbox_dir() / "singularity"

    scratch = Path("/scratch")
    if scratch.exists() and os.access(scratch, os.W_OK):
        user_scratch = scratch / os.getenv("USER", "hermes") / "hermes-agent"
        user_scratch.mkdir(parents=True, exist_ok=True)
        logger.info("Using /scratch for sandboxes: %s", user_scratch)
        return user_scratch

    sandbox.mkdir(parents=True, exist_ok=True)
    return sandbox


def _get_apptainer_cache_dir() -> Path:
    """Return the Apptainer cache directory, creating it if needed.

    Uses ``APPTAINER_CACHEDIR`` when set, otherwise ``.apptainer`` under
    the scratch directory.
    """
    cache_dir = os.getenv("APPTAINER_CACHEDIR")
    if cache_dir:
        cache_path = Path(cache_dir)
        cache_path.mkdir(parents=True, exist_ok=True)
        return cache_path
    scratch = _get_scratch_dir()
    cache_path = scratch / ".apptainer"
    cache_path.mkdir(parents=True, exist_ok=True)
    return cache_path


_sif_build_lock = threading.Lock()


def _get_or_build_sif(image: str, executable: str = "apptainer") -> str:
    """Return a cached SIF path for a ``docker://`` image, building it once.

    Existing ``.sif`` files and non-docker references are returned unchanged.
    If the build fails or times out, the original image reference is returned.
    """
    if image.endswith('.sif') and Path(image).exists():
        return image
    if not image.startswith('docker://'):
        return image

    image_name = image.replace('docker://', '').replace('/', '-').replace(':', '-')
    cache_dir = _get_apptainer_cache_dir()
    sif_path = cache_dir / f"{image_name}.sif"

    # Fast path: already built, no lock needed.
    if sif_path.exists():
        return str(sif_path)

    with _sif_build_lock:
        # Re-check under the lock: another thread may have built it meanwhile.
        if sif_path.exists():
            return str(sif_path)

        logger.info("Building SIF image (one-time setup)...")
        logger.info(" Source: %s", image)
        logger.info(" Target: %s", sif_path)

        tmp_dir = cache_dir / "tmp"
        tmp_dir.mkdir(parents=True, exist_ok=True)

        env = os.environ.copy()
        env["APPTAINER_TMPDIR"] = str(tmp_dir)
        env["APPTAINER_CACHEDIR"] = str(cache_dir)

        try:
            result = subprocess.run(
                [executable, "build", str(sif_path), image],
                capture_output=True, text=True, timeout=600, env=env,
            )
            if result.returncode != 0:
                logger.warning("SIF build failed, falling back to docker:// URL")
                logger.warning(" Error: %s", result.stderr[:500])
                return image
            logger.info("SIF image built successfully")
            return str(sif_path)
        except subprocess.TimeoutExpired:
            logger.warning("SIF build timed out, falling back to docker:// URL")
            # Remove any partially written image so it is not reused later.
            if sif_path.exists():
                sif_path.unlink()
            return image
        except Exception as e:
            logger.warning("SIF build error: %s, falling back to docker:// URL", e)
            return image
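
The cache file name in `_get_or_build_sif` comes from flattening the image reference. The snippet below isolates that one step so the mapping is easy to inspect. `sif_cache_name` is a hypothetical helper name; the replacement chain is copied from the function above.

```python
def sif_cache_name(image: str) -> str:
    """Flatten a docker:// reference into a single .sif file name."""
    flat = image.replace("docker://", "").replace("/", "-").replace(":", "-")
    return f"{flat}.sif"


print(sif_cache_name("docker://python:3.11-slim"))  # python-3.11-slim.sif
print(sif_cache_name("docker://org/app:v1"))        # org-app-v1.sif
print(sif_cache_name("docker://org-app:v1"))        # org-app-v1.sif
```

The last two references flatten to the same file name, so distinct images can in principle share a cache entry. Whether that matters depends on the images in use; a hash of the full reference would be one way to avoid it.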


class SingularityEnvironment(BaseEnvironment):
    """Hardened Singularity/Apptainer container with resource limits and persistence.

    Spawn-per-call: every execute() spawns a fresh ``apptainer exec ... bash -c`` process.
    Session snapshot preserves env vars across calls.
    CWD persists via in-band stdout markers.
    """

    def __init__(
        self,
        image: str,
        cwd: str = "~",
        timeout: int = 60,
        cpu: float = 0,
        memory: int = 0,
        disk: int = 0,
        persistent_filesystem: bool = False,
        task_id: str = "default",
    ):
        super().__init__(cwd=cwd, timeout=timeout)
        self.executable = _ensure_singularity_available()
        self.image = _get_or_build_sif(image, self.executable)
        self.instance_id = f"hermes_{uuid.uuid4().hex[:12]}"
        self._instance_started = False
        self._persistent = persistent_filesystem
        self._task_id = task_id
        self._overlay_dir: Optional[Path] = None
        self._cpu = cpu
        self._memory = memory

        if self._persistent:
            overlay_base = _get_scratch_dir() / "hermes-overlays"
            overlay_base.mkdir(parents=True, exist_ok=True)
            self._overlay_dir = overlay_base / f"overlay-{task_id}"
            self._overlay_dir.mkdir(parents=True, exist_ok=True)

        self._start_instance()
        self.init_session()

    def _start_instance(self):
        cmd = [self.executable, "instance", "start"]
        cmd.extend(["--containall", "--no-home"])

        if self._persistent and self._overlay_dir:
            cmd.extend(["--overlay", str(self._overlay_dir)])
        else:
            cmd.append("--writable-tmpfs")

feat: mount skills directory into all remote backends with live sync (#3890)

Skills with scripts/, templates/, and references/ subdirectories need
those files available inside sandboxed execution environments. Previously
the skills directory was missing entirely from remote backends.

Live sync — files stay current as credentials refresh and skills update:
- Docker/Singularity: bind mounts are inherently live (host changes
  visible immediately)
- Modal: _sync_files() runs before each command with mtime+size caching,
  pushing only changed credential and skill files (~13μs no-op overhead)
- SSH: rsync --safe-links before each command (naturally incremental)
- Daytona: _upload_if_changed() with mtime+size caching before each command

Security — symlink filtering:
- Docker/Singularity: sanitized temp copy when symlinks detected
- Modal/Daytona: iter_skills_files() skips symlinks
- SSH: rsync --safe-links skips symlinks pointing outside source tree
- Temp dir cleanup via atexit + reuse across calls

Non-root user support:
- SSH: detects remote home via echo $HOME, syncs to $HOME/.hermes/
- Daytona: detects sandbox home before sync, uploads to $HOME/.hermes/
- Docker/Modal/Singularity: run as root, /root/.hermes/ is correct

Also:
- credential_files.py: fix name/path key fallback in required_credential_files
- Singularity, SSH, Daytona: gained credential file support
- 14 tests covering symlink filtering, name/path fallback, iter_skills_files

2026-03-30 02:45:41 -07:00
|
|
|
try:
|
|
|
|
|
from tools.credential_files import get_credential_file_mounts, get_skills_directory_mount
|
|
|
|
|
for mount_entry in get_credential_file_mounts():
|
|
|
|
|
cmd.extend(["--bind", f"{mount_entry['host_path']}:{mount_entry['container_path']}:ro"])
|
2026-04-03 21:14:34 -07:00
|
|
|
for skills_mount in get_skills_directory_mount():
|
feat: mount skills directory into all remote backends with live sync (#3890)
Skills with scripts/, templates/, and references/ subdirectories need
those files available inside sandboxed execution environments. Previously
the skills directory was missing entirely from remote backends.
Live sync — files stay current as credentials refresh and skills update:
- Docker/Singularity: bind mounts are inherently live (host changes
visible immediately)
- Modal: _sync_files() runs before each command with mtime+size caching,
pushing only changed credential and skill files (~13μs no-op overhead)
- SSH: rsync --safe-links before each command (naturally incremental)
- Daytona: _upload_if_changed() with mtime+size caching before each command
Security — symlink filtering:
- Docker/Singularity: sanitized temp copy when symlinks detected
- Modal/Daytona: iter_skills_files() skips symlinks
- SSH: rsync --safe-links skips symlinks pointing outside source tree
- Temp dir cleanup via atexit + reuse across calls
Non-root user support:
- SSH: detects remote home via echo $HOME, syncs to $HOME/.hermes/
- Daytona: detects sandbox home before sync, uploads to $HOME/.hermes/
- Docker/Modal/Singularity: run as root, /root/.hermes/ is correct
Also:
- credential_files.py: fix name/path key fallback in required_credential_files
- Singularity, SSH, Daytona: gained credential file support
- 14 tests covering symlink filtering, name/path fallback, iter_skills_files
2026-03-30 02:45:41 -07:00
                cmd.extend(["--bind", f"{skills_mount['host_path']}:{skills_mount['container_path']}:ro"])
        except Exception as e:
            logger.debug("Singularity: could not load credential/skills mounts: %s", e)

2026-02-23 02:11:33 -08:00
        if self._memory > 0:
            cmd.extend(["--memory", f"{self._memory}M"])
        if self._cpu > 0:
            cmd.extend(["--cpus", str(self._cpu)])

        cmd.extend([str(self.image), self.instance_id])

2026-02-21 22:31:43 -08:00
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
            if result.returncode != 0:
                raise RuntimeError(f"Failed to start instance: {result.stderr}")
            self._instance_started = True
2026-04-08 13:38:04 -07:00
            logger.info("Singularity instance %s started (persistent=%s)",
2026-02-23 02:11:33 -08:00
                        self.instance_id, self._persistent)
2026-02-21 22:31:43 -08:00
        except subprocess.TimeoutExpired:
            raise RuntimeError("Instance start timed out")

2026-04-08 13:38:04 -07:00
    def _run_bash(self, cmd_string: str, *, login: bool = False,
                  timeout: int = 120,
                  stdin_data: str | None = None) -> subprocess.Popen:
        """Spawn a bash process inside the Singularity instance."""
2026-02-21 22:31:43 -08:00
        if not self._instance_started:
2026-04-08 13:38:04 -07:00
            raise RuntimeError("Singularity instance not started")
2026-03-08 17:46:11 +03:30
|
|
|
|
2026-04-08 13:38:04 -07:00
        cmd = [self.executable, "exec",
               f"instance://{self.instance_id}"]
        if login:
            cmd.extend(["bash", "-l", "-c", cmd_string])
2026-03-08 17:46:11 +03:30
        else:
2026-04-08 13:38:04 -07:00
            cmd.extend(["bash", "-c", cmd_string])
2026-02-21 22:31:43 -08:00
|
|
|
|
2026-04-08 13:38:04 -07:00
        return _popen_bash(cmd, stdin_data)
2026-02-21 22:31:43 -08:00

    def cleanup(self):
2026-04-08 13:38:04 -07:00
        """Stop the instance. If persistent, the overlay dir survives."""
2026-02-21 22:31:43 -08:00
        if self._instance_started:
            try:
                subprocess.run(
                    [self.executable, "instance", "stop", self.instance_id],
                    capture_output=True, text=True, timeout=30,
                )
                logger.info("Singularity instance %s stopped", self.instance_id)
            except Exception as e:
                logger.warning("Failed to stop Singularity instance %s: %s", self.instance_id, e)
            self._instance_started = False
2026-02-23 02:11:33 -08:00

        if self._persistent and self._overlay_dir:
            snapshots = _load_snapshots()
            snapshots[self._task_id] = str(self._overlay_dir)
            _save_snapshots(snapshots)