build_execute_code_schema(set()) produced "from hermes_tools import , ..."
in the code property description — invalid Python syntax shown to the model.
This triggers when a user enables only the code_execution toolset without
any of the sandbox-allowed tools (e.g. `hermes tools code_execution`),
because SANDBOX_ALLOWED_TOOLS & {"execute_code"} = empty set.
Also adds 29 unit tests covering build_execute_code_schema, environment
variable filtering, execute_code edge cases, and interrupt handling.
Authored by voteblake.
- Semaphore limits concurrent Modal sandbox creations to 8 (configurable)
to prevent thread pool deadlocks when 86+ tasks fire simultaneously
- Modal cleanup guard for failed init (prevents AttributeError)
- CWD override to /app for TB2 containers
- Add /home/ to host path validation for container backends
Authored by aydnOktay. Adds logging to skills_tool.py with specific
exception handling for file read errors (UnicodeDecodeError, PermissionError)
vs unexpected exceptions, replacing bare except-and-continue blocks.
Automatic filesystem snapshots before destructive file operations,
with user-facing rollback. Inspired by PR #559 (by @alireza78a).
Architecture:
- Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR
- CheckpointManager: take/list/restore, turn-scoped dedup, pruning
- Transparent — the LLM never sees it, no tool schema, no tokens
- Once per turn — only first write_file/patch triggers a snapshot
Integration:
- Config: checkpoints.enabled + checkpoints.max_snapshots
- CLI flag: hermes --checkpoints
- Trigger: run_agent.py _execute_tool_calls() before write_file/patch
- /rollback slash command in CLI + gateway (list, restore by number)
- Pre-rollback snapshot auto-created on restore (undo the undo)
Safety:
- Never blocks file operations — all errors silently logged
- Skips root dir, home dir, dirs >50K files
- Disables gracefully when git not installed
- Shadow repo completely isolated from project git
Tests: 35 new tests, all passing (2798 total suite)
Docs: feature page, config reference, CLI commands reference
PR #568 moved the close entirely to the finally block, but the success-path
close is needed to break the RPC thread out of accept() immediately. Without
it, rpc_thread.join(3) may block for up to 3 seconds if the child process
never connected. The finally-block close remains as a safety net for the
exception/error path (the actual fd leak fix).
Authored by alireza78a. Moves server_sock.close() into the finally block so
the socket fd is always cleaned up, even if an exception occurs between socket
creation and the success-path close.
Authored by aydnOktay. Adds _atomic_write_text() helper using tempfile.mkstemp()
+ os.replace() to prevent skill file corruption on crash/interrupt. All 7
write_text() calls in skill_manager_tool.py converted, including rollback writes
during security scans.
Authored by Himess. Three independent fixes:
- cron/jobs.py: respect HERMES_HOME env var (consistent with scheduler.py)
- gateway/run.py: add Platform.HOMEASSISTANT to toolset mappings
- tools/environments/daytona.py: use time.monotonic() for timeout deadline
Authored by Himess. Replaces split(':', 2) with regex that optionally
captures Windows drive-letter prefix in rg/grep output parsing. Fixes
search_files returning zero results on Windows where paths like
C:\path\file.py:42:content were misparsed by naive colon splitting.
No behavior change on Unix/Mac.
Authored by shitcoinsherpa. Adds encoding='utf-8' to all text-mode
open() calls in gateway/run.py, gateway/config.py, hermes_cli/config.py,
hermes_cli/main.py, and hermes_cli/status.py. Prevents encoding errors
on Windows where the default locale is not UTF-8.
Also fixed 4 additional open() calls in gateway/run.py that were added
after the PR branch was created.
The existing password prompt uses /dev/tty and termios to read input
with echo disabled. Neither exists on Windows.
On Windows, msvcrt.getwch() reads a single character from the console
without echoing it. This adds a Windows code path that uses getwch()
in a loop, collecting characters until Enter is pressed.
The Unix path using termios and /dev/tty is unchanged.
Authored by shitcoinsherpa. Imports winpty.PtyProcess on Windows instead
of ptyprocess.PtyProcess, and adds platform markers to the [pty] extra
so the correct package is installed automatically.
cap-drop ALL removes DAC_OVERRIDE, which root needs to write to
bind-mounted directories owned by the host user (uid 1000). This
broke persistent Docker sandboxes — the container couldn't write
to /workspace or /root.
Add back the minimum capabilities needed:
- DAC_OVERRIDE: root can write to bind-mounted dirs owned by host user
- CHOWN: package managers (pip, npm, apt) need to set file ownership
- FOWNER: needed for operations on files owned by other users
Still drops all other capabilities (NET_RAW, SYS_ADMIN, etc.) and
keeps no-new-privileges. Security boundary is the container itself.
Verified end-to-end: create files → destroy container → new container
with same task_id → files persist on host and are accessible in the
new container.
Validate that the username portion of ~username paths contains only
valid characters (alphanumeric, dot, hyphen, underscore) before passing
to shell echo for expansion. Previously, paths like '~; rm -rf /'
would be passed unquoted to self._exec(f'echo {path}'), allowing
arbitrary command execution.
The approach validates the username rather than using shlex.quote(),
which would prevent tilde expansion from working at all since
echo '~user' outputs the literal string instead of expanding it.
Added tests for injection blocking and valid ~username/path expansion.
Credit to @alireza78a for reporting (PR #442, issue #442).
Authored by aydnOktay. Improves URL validation with urlparse, adds exc_info
to error logs for full stack traces, and tightens type hints.
Resolved merge conflict in _handle_vision_analyze: kept PR's string formatting
with our AUXILIARY_VISION_MODEL env var logic.
Add MCP sampling/createMessage capability via SamplingHandler class.
Text-only sampling + tool use in sampling with governance (rate limits,
model whitelist, token caps, tool loop limits). Per-server audit metrics.
Based on concept from PR #366 by eren-karakus0. Restructured as class-based
design with bug fixes and tests using real MCP SDK types.
50 new tests, 2600 total passing.
Terminal output was already redacted via redact_sensitive_text() but
read_file and search_files returned raw content. Now both tools
redact secrets before returning results to the LLM.
Based on PR #372 by @teyrebaz33 (closes#363) — applied manually
due to branch conflicts with the current codebase.
Uses temp file + fsync + os.replace() to avoid corruption if the
process crashes mid-write. Cleans up temp file on failure, logs
errors at debug level.
Based on PR #335 by @aydnOktay — adapted for the current v2
manifest format (name:hash).
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
Complete Signal adapter using signal-cli daemon HTTP API.
Based on PR #268 by ibhagwan, rebuilt on current main with bug fixes.
Architecture:
- SSE streaming for inbound messages with exponential backoff (2s→60s)
- JSON-RPC 2.0 for outbound (send, typing, attachments, contacts)
- Health monitor detects stale SSE connections (120s threshold)
- Phone number redaction in all logs and global redact.py
Features:
- DM and group message support with separate access policies
- DM policies: pairing (default), allowlist, open
- Group policies: disabled (default), allowlist, open
- Attachment download with magic-byte type detection
- Typing indicators (8s refresh interval)
- 100MB attachment size limit, 8000 char message limit
- E.164 phone + UUID allowlist support
Integration:
- Platform.SIGNAL enum in gateway/config.py
- Signal in _is_user_authorized() allowlist maps (gateway/run.py)
- Adapter factory in _create_adapter() (gateway/run.py)
- user_id_alt/chat_id_alt fields in SessionSource for UUIDs
- send_message tool support via httpx JSON-RPC (not aiohttp)
- Interactive setup wizard in 'hermes gateway setup'
- Connectivity testing during setup (pings /api/v1/check)
- signal-cli detection and install guidance
Bug fixes from PR #268:
- Timestamp reads from envelope_data (not outer wrapper)
- Uses httpx consistently (not aiohttp in send_message tool)
- SIGNAL_DEBUG scoped to signal logger (not root)
- extract_images regex NOT modified (preserves group numbering)
- pairing.py NOT modified (no cross-platform side effects)
- No dual authorization (adapter defers to run.py for user auth)
- Wildcard uses set membership ('*' in set, not list equality)
- .zip default for PK magic bytes (not .docx)
No new Python dependencies — uses httpx (already core).
External requirement: signal-cli daemon (user-installed).
Tests: 30 new tests covering config, init, helpers, session source,
phone redaction, authorization, and send_message integration.
Co-authored-by: ibhagwan <ibhagwan@users.noreply.github.com>
All failure paths in _run_browser_command now log at WARNING level,
which means they automatically land in ~/.hermes/logs/errors.log
(the persistent error log captures WARNING+).
What's now logged:
- agent-browser CLI not found (warning)
- Session creation failure with task ID (warning)
- Command entry with socket_dir path and length (debug)
- Non-zero return code with stderr (warning)
- Non-JSON output from agent-browser (warning — version mismatch/crash)
- Command timeout with task ID and socket path (warning)
- Unexpected exceptions with full traceback (warning + exc_info)
- browser_vision: which model is used and screenshot size (debug)
- browser_vision: LLM analysis failure with full traceback (warning)
Also fixed: _get_vision_model() was called twice in browser_vision —
now called once and reused.
macOS sets TMPDIR to /var/folders/xx/.../T/ (~51 chars). Combined with
agent-browser session names, socket paths reach 121 chars — exceeding
the 104-byte macOS AF_UNIX limit. This causes 'Screenshot file was not
created' errors and silent browser_vision failures on macOS.
Fix: use /tmp/ on macOS (symlink to /private/tmp, sticky-bit protected).
On Linux, tempfile.gettempdir() already returns /tmp — no behavior change.
Changes in browser_tool.py:
- Add _socket_safe_tmpdir() helper — returns /tmp on macOS, gettempdir()
elsewhere
- Replace all 3 tempfile.gettempdir() calls for socket dirs
- Set mode=0o700 on socket dirs for privacy (was using default umask)
- Guard vision/text client init with try/except — a broken auxiliary
config no longer prevents the entire browser_tool module from importing
(which would disable all 10 browser tools, not just vision)
- Improve screenshot error messages with mode info and diagnostic hints
- Don't delete screenshots when LLM analysis fails — the capture was
valid, only the vision API call failed. Screenshots are still cleaned
up by the existing 24-hour _cleanup_old_screenshots mechanism.
Changes in code_execution_tool.py:
- Same /tmp fix for RPC socket path (was 103 chars on macOS — one char
from the 104-byte limit)
- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.
Add contextual [Hint: ...] suffixes to tool results where they save
real iterations:
- patch (no match): suggests read_file/search_files to verify content
before retrying — addresses the common pattern where the agent retries
with stale old_string instead of re-reading the file.
- search_files (truncated): provides explicit next offset and suggests
narrowing the search — clearer than relying on total_count inference.
Other hints proposed in #722 (terminal, web_search, web_extract,
browser_snapshot, search zero-results, search content-matches) were
evaluated and found to be low-value: either already covered by existing
mechanisms (read_file pagination, similar-files, schema descriptions)
or guidance the agent already follows from its own reasoning.
5 new tests covering hint presence/absence for both tools.
Previously, search_files would silently return 0 results when the
search path didn't exist (e.g., /root/.hermes/... when HOME is
/home/user). The path was passed to rg/grep/find which would fail
silently, and the empty stdout was parsed as 'no matches found'.
Changes:
- Add path existence check at the top of search() using test -e.
Returns SearchResult with a clear error message when path doesn't exist.
- Add exit code 2 checks in _search_with_rg() and _search_with_grep()
as secondary safety net for other error types (bad regex, permissions).
- Add 4 new tests covering: nonexistent path (content mode), nonexistent
path (files mode), existing path proceeds normally, rg error exit code.
Tests: 37 → 41 in test_file_operations.py, full suite 2330 passed.
These two files were creating bare OpenAI clients pointing at OpenRouter
without the HTTP-Referer / X-OpenRouter-Title / X-OpenRouter-Categories
headers that the rest of the codebase sends for app attribution.
- skills_guard.py: LLM audit client (always OpenRouter)
- trajectory_compressor.py: sync + async summarization clients
(guarded with 'openrouter' in base_url check since the endpoint
is user-configurable)
Updated the systemd unit generation to include the virtual environment and node modules in the PATH, improving the execution context for the hermes CLI. Additionally, added support for installing Playwright and its dependencies on Arch/Manjaro systems in the install script, ensuring a smoother setup process for browser tools.
Enhanced the environment setup for browser commands by ensuring the PATH variable includes standard directories, addressing potential issues with minimal PATH in systemd services. Additionally, updated the logging of stderr to use a warning level on failure for better visibility of errors. This change improves the robustness of subprocess execution in the browser tool.
Renamed _find_shell to _find_bash to clarify its purpose of specifically locating bash. Improved the shell detection logic to prioritize bash over the user's $SHELL, ensuring compatibility with the fence wrapper's syntax requirements. Added a backward compatibility alias for _find_shell to maintain existing imports in process_registry.py.
Updated the LocalEnvironment class to ensure the PATH variable includes standard directories. This change addresses issues with systemd services and terminal multiplexers that inherit a minimal PATH, improving the execution environment for subprocesses.
Updated the _find_shell function to improve shell detection on non-Windows systems. The function now checks for the existence of /usr/bin/bash and /bin/bash before falling back to /bin/sh, ensuring a more robust shell resolution process.
Skills can now declare runtime prerequisites (env vars, CLI binaries) via
YAML frontmatter. Skills with unmet prerequisites are excluded from the
system prompt so the agent never claims capabilities it can't deliver, and
skill_view() warns the agent about what's missing.
Three layers of defense:
- build_skills_system_prompt() filters out unavailable skills
- _find_all_skills() flags unmet prerequisites in metadata
- skill_view() returns prerequisites_warning with actionable details
Tagged 12 bundled skills that have hard runtime dependencies:
gif-search (TENOR_API_KEY), notion (NOTION_API_KEY), himalaya, imessage,
apple-notes, apple-reminders, openhue, duckduckgo-search, codebase-inspection,
blogwatcher, songsee, mcporter.
Closes#658Fixes#630
browser_vision now saves screenshots persistently to ~/.hermes/browser_screenshots/
and returns the screenshot_path in its JSON response. The model can include
MEDIA:<path> in its response to share screenshots as native photos.
Changes:
- browser_tool.py: Save screenshots persistently, return screenshot_path,
auto-cleanup files older than 24 hours, mkdir moved inside try/except
- telegram.py: Add send_image_file() — sends local images via bot.send_photo()
- discord.py: Add send_image_file() — sends local images via discord.File
- slack.py: Add send_image_file() — sends local images via files_upload_v2()
(WhatsApp already had send_image_file — no changes needed)
- prompt_builder.py: Updated Telegram hint to list image extensions,
added Discord and Slack MEDIA: platform hints
- browser.md: Document screenshot sharing and 24h cleanup
- send_file_integration_map.md: Updated to reflect send_image_file is now
implemented on Telegram/Discord/Slack
- test_send_image_file.py: 19 tests covering MEDIA: .png extraction,
send_image_file on all platforms, and screenshot cleanup
Partially addresses #466 (Phase 0: platform adapter gaps for send_image_file).
The web_extract_tool was stripping the 'url' key during its output
trimming step, but documentation in 3 places claimed it was present.
This caused KeyError when accessing result['url'] in execute_code
scripts, especially when extracting from multiple URLs.
Changes:
- web_tools.py: Add 'url' back to trimmed_results output
- code_execution_tool.py: Add 'title' to _TOOL_STUBS docstring and
_TOOL_DOC_LINES so docs match actual {url, title, content, error}
response format
Root cause: fal_client.AsyncClient uses @cached_property for its
httpx.AsyncClient, creating it once and caching forever. In the gateway,
the agent runs in a thread pool where _run_async() calls asyncio.run()
which creates a temporary event loop. The first call works, but
asyncio.run() closes that loop. On the next call, a new loop is created
but the cached httpx.AsyncClient still references the old closed loop,
causing 'Event loop is closed'.
Fix: Switch from async fal_client API (submit_async/handler.get with
await) to sync API (submit/handler.get). The sync API uses httpx.Client
which has no event loop dependency. Since the tool already runs in a
thread pool via the gateway, async adds no benefit here.
Changes:
- image_generate_tool: async def -> def
- _upscale_image: async def -> def
- fal_client.submit_async -> fal_client.submit
- await handler.get() -> handler.get()
- is_async=True -> is_async=False in registry
- Remove unused asyncio import
- Add max_concurrent_tasks config (default 8) with semaphore in TB2 eval
- Pass cwd: /app via register_task_env_overrides for TB2 tasks
- Add /home/ to host path prefixes as safety net for container backends
When all 86 TerminalBench2 tasks fire simultaneously, each creates a Modal sandbox
via asyncio.run() inside a thread pool worker. Modal's blocking calls deadlock
when too many are created at once. The semaphore ensures max 8 concurrent creations.
Co-Authored-By: hermes-agent[bot] <hermes-agent[bot]@users.noreply.github.com>
Enhanced the _run_single_child function by introducing max_tokens, reasoning_config, and prefill_messages parameters from the parent agent. This allows for more flexible configuration of child agents, improving their operational capabilities.
Eliminated the model parameter from the delegate_task function and its associated schema, defaulting to None for subagent calls. This change simplifies the function signature and enforces consistent behavior across task delegation.
Subagent tool calls now count toward the same session-wide iteration
limit as the parent agent. Previously, each subagent had its own
independent counter, so a parent with max_iterations=60 could spawn
3 subagents each doing 50 calls = 150 total tool calls unmetered.
Changes:
- IterationBudget: thread-safe shared counter (run_agent.py)
- consume(): try to use one iteration, returns False if exhausted
- refund(): give back one iteration (for execute_code turns)
- Thread-safe via Lock (subagents run in ThreadPoolExecutor)
- Parent creates the budget, children inherit it via delegate_tool.py
- execute_code turns are refunded (don't count against budget)
- Default raised from 60 → 90 to account for shared consumption
- Per-child cap (50) still applies as a safety valve
The per-child max_iterations (default 50) remains as a per-child
ceiling, but the shared budget is the hard session-wide limit.
A child stops at whichever comes first.
Add local browser mode as an automatic fallback when Browserbase
credentials are not configured. Uses the same agent-browser CLI with
--session (local Chromium) instead of --cdp (cloud Browserbase).
The agent-facing API is completely unchanged — all 10 browser_* tools
produce identical output in both modes. Auto-detection:
- BROWSERBASE_API_KEY set → cloud mode (existing behavior)
- No key → local mode (new, free, headless Chromium)
Changes:
- _is_local_mode(): auto-detect based on env vars
- _create_local_session(): lightweight session (no API call)
- _get_session_info(): branches on local vs cloud
- _run_browser_command(): --session in local, --cdp in cloud
- check_browser_requirements(): only needs agent-browser CLI in local mode
- _emergency_cleanup: CLI close in local, API release in cloud
- cleanup_browser/browser_close: skip BB API calls in local mode
- Registry: removed requires_env — check_fn handles both modes
Setup for local mode:
npm install -g agent-browser
agent-browser install # downloads Chromium
agent-browser install --with-deps # also installs system libs (Docker/Debian)
Closes#374 (Phase 1)
Add a 'platforms' field to SKILL.md frontmatter that restricts skills
to specific operating systems. Skills with platforms: [macos] only
appear in the system prompt, skills_list(), and slash commands on macOS.
Skills without the field load everywhere (backward compatible).
Implementation:
- skill_matches_platform() in tools/skills_tool.py — core filter
- Wired into all 3 discovery paths: prompt_builder.py, skills_tool.py,
skill_commands.py
- 28 new tests across 3 test files
New bundled Apple/macOS skills (all platforms: [macos]):
- imessage — Send/receive iMessages via imsg CLI
- apple-reminders — Manage Reminders via remindctl CLI
- apple-notes — Manage Notes via memo CLI
- findmy — Track devices/AirTags via AppleScript + screen capture
Docs updated: CONTRIBUTING.md, AGENTS.md, creating-skills.md,
skills.md (user guide)