hermes-agent

Author	SHA1	Message	Date
Teknium	745859babb	feat: env var passthrough for skills and user config (#2807 ) * feat: env var passthrough for skills and user config Skills that declare required_environment_variables now have those vars passed through to sandboxed execution environments (execute_code and terminal). Previously, execute_code stripped all vars containing KEY, TOKEN, SECRET, etc. and the terminal blocklist removed Hermes infrastructure vars — both blocked skill-declared env vars. Two passthrough sources: 1. Skill-scoped (automatic): when a skill is loaded via skill_view and declares required_environment_variables, vars that are present in the environment are registered in a session-scoped passthrough set. 2. Config-based (manual): terminal.env_passthrough in config.yaml lets users explicitly allowlist vars for non-skill use cases. Changes: - New module: tools/env_passthrough.py — shared passthrough registry - hermes_cli/config.py: add terminal.env_passthrough to DEFAULT_CONFIG - tools/skills_tool.py: register available skill env vars on load - tools/code_execution_tool.py: check passthrough before filtering - tools/environments/local.py: check passthrough in _sanitize_subprocess_env and _make_run_env - 19 new tests covering all layers * docs: add environment variable passthrough documentation Document the env var passthrough feature across four docs pages: - security.md: new 'Environment Variable Passthrough' section with full explanation, comparison table, and security considerations - code-execution.md: update security section, add passthrough subsection, fix comparison table - creating-skills.md: add tip about automatic sandbox passthrough - skills.md: add note about passthrough after secure setup docs Live-tested: launched interactive CLI, loaded a skill with required_environment_variables, verified TEST_SKILL_SECRET_KEY was accessible inside execute_code sandbox (value: passthrough-test-value-42).	2026-03-24 08:19:34 -07:00
Teknium	ad1bf16f28	chore: remove all remaining mini-swe-agent references Complete cleanup after dropping the mini-swe-agent submodule (PR #2804): - Remove MSWEA_SILENT_STARTUP and MSWEA_GLOBAL_CONFIG_DIR env var settings from cli.py, run_agent.py, hermes_cli/main.py, doctor.py - Remove mini-swe-agent health check from hermes doctor - Remove 'minisweagent' from logger suppression lists - Remove litellm/typer/platformdirs from requirements.txt - Remove mini-swe-agent install steps from install.ps1 (Windows) - Remove mini-swe-agent install steps from website docs - Update all stale comments/docstrings referencing mini-swe-agent in terminal_tool.py, tools/__init__.py, code_execution_tool.py, environments/README.md, environments/agent_loop.py - Remove mini_swe_runner from pyproject.toml py-modules (still exists as standalone script for RL training use) - Shrink test_minisweagent_path.py to empty stub The orphaned mini-swe-agent/ directory on disk needs manual removal: rm -rf mini-swe-agent/	2026-03-24 08:19:23 -07:00
Teknium	934fbe3c06	fix: strip ANSI at the source — clean terminal output before it reaches the model Root cause: terminal_tool, execute_code, and process_registry returned raw subprocess output with ANSI escape sequences intact. The model saw these in tool results and copied them into file writes. Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py, but this was a band-aid — regex on file content risks corrupting legitimate content, and doesn't prevent ANSI from wasting tokens in the model context. Source-level fix: - New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI (incl. private-mode, colon-separated, intermediate bytes), OSC (both terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1 - terminal_tool.py: strip output before returning to model - code_execution_tool.py: strip stdout/stderr before returning - process_registry.py: strip output in poll/read_log/wait - file_tools.py: remove _strip_ansi band-aid (no longer needed) Verified: `ls --color=always` output returned as clean text to model, file written from that output contains zero ESC bytes.	2026-03-23 07:43:12 -07:00
Teknium	36079c6646	fix(tools): fix resource leak and double socket close in code_execution_tool (#2381) Two fixes: 1. Use a single open(os.devnull) handle for both stdout and stderr suppression, preventing a file handle leak if the second open() fails. 2. Set server_sock = None after closing it in the try block to prevent the finally block from closing it again (causing an OSError). Closes #2136 Co-authored-by: dieutx <dangtc94@gmail.com>	2026-03-21 15:55:25 -07:00
Teknium	474301adc6	fix: improve execute_code error logging and harden cleanup (#1623 ) * fix(tools): improve error logging in code_execution_tool * fix: harden execute_code cleanup and reduce logging noise Follow-up to cherry-picked PR #1588 (aydnOktay): - Initialize server_sock = None before try block to prevent NameError if exception occurs before socket creation (line 413 is inside the try) - Guard server_sock.close() with None check - Narrow cleanup exception handlers to OSError (the actual error type) - Remove exc_info=True from cleanup debug logs — benign teardown failures don't need stack traces, the message is sufficient - Remove redundant try/except around shutil.rmtree(ignore_errors=True) - Silence sock_path unlink with pass — expected when already cleaned up --------- Co-authored-by: aydnOktay <xaydinoktay@gmail.com>	2026-03-16 23:13:26 -07:00
teknium1	210d5ade1e	feat(tools): centralize tool emoji metadata in registry + skin integration - Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry - Add emoji= to all 50+ registry.register() calls across tool files - Add get_tool_emoji() helper in agent/display.py with 3-tier resolution: skin override → registry default → hardcoded fallback - Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and gateway/run.py with centralized get_tool_emoji() calls - Add 'tool_emojis' field to SkinConfig so skins can override per-tool emojis (e.g. ares skin could use swords instead of wrenches) - Add 11 tests (5 registry emoji, 6 display/skin integration) - Update AGENTS.md skin docs table Based on the approach from PR #1061 by ForgingAlex (emoji centralization in registry). This salvage fixes several issues from the original: - Does NOT split the cronjob tool (which would crash on missing schemas) - Does NOT change image_generate toolset/requires_env/is_async - Does NOT delete existing tests - Completes the centralization (gateway/run.py was missed) - Hooks into the skin system for full customizability	2026-03-15 20:21:21 -07:00
anastazya	23bc642c82	fix: add project root to PYTHONPATH in execute_code sandbox The execute_code sandbox spawns a child process with cwd set to a temporary directory, but never adds the hermes-agent project root to PYTHONPATH. This makes project-root modules like minisweagent_path unreachable from sandboxed scripts, causing ImportError when the agent runs self-diagnostic or analysis code via execute_code. Fix by prepending the hermes-agent root directory to PYTHONPATH in the child process environment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-14 21:39:05 -07:00
teknium1	a9241f3e3e	fix: head+tail truncation for execute_code stdout Replaces head-only stdout capture with a two-buffer approach (40% head, 60% tail rolling window) so scripts that print() their final results at the end never lose them. Adds truncation notice between sections. Cherry-picked from PR #755, conflict resolved (test file additions). 3 new tests for short output, head+tail preservation, and notice format.	2026-03-11 00:26:13 -07:00
teknium1	a458b535c9	fix: improve read-loop detection — consecutive-only, correct thresholds, fix bugs Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues: 1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only warn/block on truly consecutive identical calls. Any other tool call in between (write, patch, terminal, etc.) resets the counter via notify_other_tool_call(), called from handle_function_call() in model_tools.py. This prevents false blocks in read→edit→verify flows. 2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on 4th+ consecutive (was 3rd+). Gives the model more room before intervening. 3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a separate read_history set that only tracks file reads. 4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from web_extract return docs in code_execution_tool.py — the field IS returned by web_tools.py. 5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover consecutive-only behavior, notify_other_tool_call, interleaved read/search, and summary-unaffected-by-searches.	2026-03-10 16:25:41 -07:00
teknium1	b53d5dad67	Merge PR #705: fix: detect, warn, and block file re-read/search loops after context compression Authored by 0xbyt4. Adds read/search loop detection, file history injection after compression, and todo filtering for active items only.	2026-03-10 16:17:03 -07:00
teknium1	0fdeffe6c4	fix: replace silent exception swallowing with debug logging across tools Add logger.debug() calls to 27 bare 'except: pass' blocks across 7 core files, giving visibility into errors that were previously silently swallowed. This makes it much easier to diagnose user-reported issues from debug logs. Files changed: - tools/terminal_tool.py: 5 catches (stat, termios, fd close, cleanup) - tools/delegate_tool.py: 7 catches + added logger (spinner, callbacks) - tools/browser_tool.py: 5 catches (screenshot/recording cleanup, daemon kill) - tools/code_execution_tool.py: 2 remaining catches (socket, server close) - gateway/session.py: 2 catches (platform enum parse, temp file cleanup) - agent/display.py: 2 catches + added logger (JSON parse in failure detect) - agent/prompt_builder.py: 1 catch (skill description read) Deliberately kept bare pass for: - ImportError checks for optional dependencies (terminal_tool.py) - SystemExit/KeyboardInterrupt handlers - Spinner _write catch (would spam on every frame when stdout closed) - process_registry PID-alive check (canonical os.kill(pid,0) pattern) Extends the pattern from PR #686 (@aydnOktay).	2026-03-10 06:59:20 -07:00
teknium1	87af622df4	Merge PR #686: improve error handling and logging in code execution tool Authored by @aydnOktay. Adds exc_info=True to exception logging, replaces silent pass statements with logger.debug calls, fixes variable shadowing in _kill_process_group nested except blocks.	2026-03-10 06:43:11 -07:00
teknium1	771969f747	fix: wire up enabled_tools in agent loop + simplify sandbox tool selection Completes the fix started in `8318a51` — handle_function_call() accepted enabled_tools but run_agent.py never passed it. Now both call sites in _execute_tool_calls() pass self.valid_tool_names, so each agent session uses its own tool list instead of the process-global _last_resolved_tool_names (which subagents can overwrite). Also simplifies the redundant ternary in code_execution_tool.py: sandbox_tools is already computed correctly (intersection with session tools, or full SANDBOX_ALLOWED_TOOLS as fallback), so the conditional was dead logic. Inspired by PR #663 (JasonOA888). Closes #662. Tests: 2857 passed.	2026-03-10 06:35:28 -07:00
0xbyt4	694a3ebdd5	fix(code_execution): handle empty enabled_sandbox_tools in schema description build_execute_code_schema(set()) produced "from hermes_tools import , ..." in the code property description — invalid Python syntax shown to the model. This triggers when a user enables only the code_execution toolset without any of the sandbox-allowed tools (e.g. `hermes tools code_execution`), because SANDBOX_ALLOWED_TOOLS & {"execute_code"} = empty set. Also adds 29 unit tests covering build_execute_code_schema, environment variable filtering, execute_code edge cases, and interrupt handling.	2026-03-10 06:18:27 -07:00
teknium1	6f3a673aba	fix: restore success-path server_sock.close() before rpc_thread.join() PR #568 moved the close entirely to the finally block, but the success-path close is needed to break the RPC thread out of accept() immediately. Without it, rpc_thread.join(3) may block for up to 3 seconds if the child process never connected. The finally-block close remains as a safety net for the exception/error path (the actual fd leak fix).	2026-03-09 23:40:20 -07:00
teknium1	ab6a6338c4	Merge PR #568: fix(code-execution): close server socket in finally block to prevent fd leak Authored by alireza78a. Moves server_sock.close() into the finally block so the socket fd is always cleaned up, even if an exception occurs between socket creation and the success-path close.	2026-03-09 23:39:13 -07:00
0xbyt4	4684aaffdc	merge: resolve file_tools.py conflict with origin/main Combine read/search loop detection with main's redact_sensitive_text and truncation hint features. Add tracker reset to TestSearchHints to prevent cross-test state leakage.	2026-03-09 13:21:46 +03:00
teknium1	2036c22f88	fix: macOS browser/code-exec socket path exceeds Unix limit (#374 ) macOS sets TMPDIR to /var/folders/xx/.../T/ (~51 chars). Combined with agent-browser session names, socket paths reach 121 chars — exceeding the 104-byte macOS AF_UNIX limit. This causes 'Screenshot file was not created' errors and silent browser_vision failures on macOS. Fix: use /tmp/ on macOS (symlink to /private/tmp, sticky-bit protected). On Linux, tempfile.gettempdir() already returns /tmp — no behavior change. Changes in browser_tool.py: - Add _socket_safe_tmpdir() helper — returns /tmp on macOS, gettempdir() elsewhere - Replace all 3 tempfile.gettempdir() calls for socket dirs - Set mode=0o700 on socket dirs for privacy (was using default umask) - Guard vision/text client init with try/except — a broken auxiliary config no longer prevents the entire browser_tool module from importing (which would disable all 10 browser tools, not just vision) - Improve screenshot error messages with mode info and diagnostic hints - Don't delete screenshots when LLM analysis fails — the capture was valid, only the vision API call failed. Screenshots are still cleaned up by the existing 24-hour _cleanup_old_screenshots mechanism. Changes in code_execution_tool.py: - Same /tmp fix for RPC socket path (was 103 chars on macOS — one char from the 104-byte limit)	2026-03-08 19:31:23 -07:00
0xbyt4	e2fe1373f3	fix: escalate read/search blocking, track search loops, filter completed todos - Block file reads after 3+ re-reads of same region (no content returned) - Track search_files calls and block repeated identical searches - Filter completed/cancelled todos from post-compression injection to prevent agent from re-doing finished work - Add 10 new tests covering all three fixes	2026-03-08 23:01:21 +03:00
aydnOktay	7b1f40dd00	Improve error handling and logging in code execution tool	2026-03-08 14:50:23 +03:00
teknium1	3830bbda41	fix: include url in web_extract trimmed results & fix docs The web_extract_tool was stripping the 'url' key during its output trimming step, but documentation in 3 places claimed it was present. This caused KeyError when accessing result['url'] in execute_code scripts, especially when extracting from multiple URLs. Changes: - web_tools.py: Add 'url' back to trimmed_results output - code_execution_tool.py: Add 'title' to _TOOL_STUBS docstring and _TOOL_DOC_LINES so docs match actual {url, title, content, error} response format	2026-03-07 18:07:36 -08:00
teknium1	69a36a3361	Merge PR #309: fix(timezone): timezone-aware now() for prompt, cron, and execute_code Authored by areu01or00. Adds timezone support via hermes_time.now() helper with IANA timezone resolution (HERMES_TIMEZONE env → config.yaml → server-local). Updates system prompt timestamp, cron scheduling, and execute_code sandbox TZ injection. Includes config migration (v4→v5) and comprehensive test coverage.	2026-03-07 00:04:41 -08:00
alireza78a	a857321463	fix(code-execution): close server socket in finally block to prevent fd leak	2026-03-07 05:49:48 +03:30
teknium1	f75b1d21b4	fix: execute_code and delegate_task now respect disabled toolsets When a user disables the web toolset via 'hermes tools', the execute_code schema description still hardcoded web_search/web_extract as available, causing the model to keep trying to use them. Similarly, delegate_task always defaulted to ['terminal', 'file', 'web'] for subagents regardless of the parent's config. Changes: - execute_code schema is now built dynamically via build_execute_code_schema() based on which sandbox tools are actually enabled - model_tools.py rebuilds the execute_code schema at definition time using the intersection of sandbox-allowed and session-enabled tools - delegate_task now inherits the parent agent's enabled_toolsets instead of hardcoding DEFAULT_TOOLSETS when no explicit toolsets are specified - delegate_task description updated to say 'inherits your enabled toolsets' Reported by kotyKD on Discord.	2026-03-06 17:36:14 -08:00
teknium1	3982fcf095	fix: sync execute_code sandbox stubs with real tool schemas The _TOOL_STUBS dict in code_execution_tool.py was out of sync with the actual tool schemas, causing TypeErrors when the LLM used parameters it sees in its system prompt but the sandbox stubs didn't accept: search_files: - Added missing params: context, offset, output_mode - Fixed target default: 'grep' → 'content' (old value was obsolete) patch: - Added missing params: mode, patch (V4A multi-file patch support) Also added 4 drift-detection tests (TestStubSchemaDrift) that will catch future divergence between stubs and real schemas: - test_stubs_cover_all_schema_params: every schema param in stub - test_stubs_pass_all_params_to_rpc: every stub param sent over RPC - test_search_files_target_uses_current_values: no obsolete values - test_generated_module_accepts_all_params: generated code compiles All 28 tests pass.	2026-03-06 03:40:06 -08:00
teknium1	efec4fcaab	feat(execute_code): add json_parse, shell_quote, retry helpers to sandbox The execute_code sandbox generates a hermes_tools.py stub module for LLM scripts. Three common failure modes keep tripping up scripts: 1. json.loads(strict=True) rejects control chars in terminal() output (e.g., GitHub issue bodies with literal tabs/newlines) 2. Shell backtick/quote interpretation when interpolating dynamic content into terminal() commands (markdown with backticks gets eaten by bash) 3. No retry logic for transient network failures (API timeouts, rate limits) Adds three convenience helpers to the generated hermes_tools module: - json_parse(text) — json.loads with strict=False for tolerant parsing - shell_quote(s) — shlex.quote() for safe shell interpolation - retry(fn, max_attempts=3, delay=2) — exponential backoff wrapper Also updates the EXECUTE_CODE_SCHEMA description to document these helpers so LLMs know they're available without importing anything extra. Includes 7 new tests (unit + integration) covering all three helpers.	2026-03-06 01:52:46 -08:00
areu01or00	a1c25046a9	fix(timezone): add timezone-aware clock across agent, cron, and execute_code	2026-03-03 18:23:40 +05:30
Farukest	3f58e47c63	fix: guard POSIX-only process functions for Windows compatibility os.setsid, os.killpg, and os.getpgid do not exist on Windows and raise AttributeError on import or first call. This breaks the terminal tool, code execution sandbox, process registry, and WhatsApp bridge on Windows. Added _IS_WINDOWS platform guard in all four affected files, following the pattern documented in CONTRIBUTING.md. On Windows, preexec_fn is set to None and process termination falls back to proc.terminate() / proc.kill() instead of process group signals. Files changed: - tools/environments/local.py (3 call sites) - tools/process_registry.py (2 call sites) - tools/code_execution_tool.py (3 call sites) - gateway/platforms/whatsapp.py (3 call sites)	2026-03-01 01:54:27 +03:00
teknium1	e5bd25c73f	Fix: #41	2026-02-25 21:16:15 -08:00
teknium1	91907789af	refactor: remove temporary debug logging in code execution tool - Eliminated the temporary debug logging in the `execute_code` function that tracked enabled and sandbox tools, streamlining the code and reducing clutter.	2026-02-24 14:25:53 -08:00
teknium1	6845852e82	refactor: update failure message handling in display module and add debug logging in code execution tool - Modified the `_wrap` function to append a failure suffix without applying red coloring, simplifying the failure message format. - Introduced temporary debug logging in the `execute_code` function to track enabled and sandbox tools, aiding in troubleshooting.	2026-02-24 14:25:53 -08:00
teknium1	08ff1c1aa8	More major refactor/tech debt removal!	2026-02-21 20:22:33 -08:00
teknium1	748fd3db88	refactor: enhance error handling with structured logging across multiple modules - Updated various modules including cli.py, run_agent.py, gateway, and tools to replace silent exception handling with structured logging. - Improved error messages to provide more context, aiding in debugging and monitoring. - Ensured consistent logging practices throughout the codebase, enhancing traceability and maintainability.	2026-02-21 03:32:11 -08:00
teknium1	b6247b71b5	refactor: update tool descriptions for clarity and conciseness - Revised descriptions for various tools in model_tools.py, browser_tool.py, code_execution_tool.py, delegate_tool.py, and terminal_tool.py to enhance clarity and reduce verbosity. - Improved consistency in terminology and formatting across tool descriptions, ensuring users have a clearer understanding of tool functionalities and usage.	2026-02-21 02:41:30 -08:00
teknium1	70dd3a16dc	Cleanup time!	2026-02-20 23:23:32 -08:00
teknium1	c0d412a736	refactor: update search tool parameters and documentation for clarity - Changed the target parameter from "content" and "files" to "grep" and "find" to better represent their functionality. - Revised descriptions in the tool definitions and execution code schema to enhance understanding of search modes and output formats. - Ensured consistency in the handling of search operations across the codebase.	2026-02-20 02:46:30 -08:00
teknium1	f9eb5edb96	refactor: rename search tool for clarity and consistency - Updated the tool name from "search" to "search_files" across multiple files to better reflect its functionality. - Adjusted related documentation and descriptions to ensure clarity in usage and expected behavior. - Enhanced the toolset definitions and mappings to incorporate the new naming convention, improving overall consistency in the codebase.	2026-02-20 02:43:57 -08:00
teknium1	3b90fa5c9b	fix: increase default timeout for code execution sandbox - Updated the default timeout for sandbox script execution from 120 seconds to 300 seconds (5 minutes) to allow longer-running scripts. - Enhanced comments in the code execution tool to clarify the timeout duration. - Suppressed stdout and stderr output from internal tool handlers during execution to prevent clutter in the CLI interface.	2026-02-20 01:29:53 -08:00
teknium1	273b367f05	fix: update documentation and return types for web tools - Revised docstrings for `web_search` and `web_extract` functions to clarify return types and structure. - Updated the execution code schema documentation to reflect changes in the output format for both tools, ensuring consistency and improved understanding for users.	2026-02-19 23:30:01 -08:00
teknium1	783acd712d	feat: implement code execution sandbox for programmatic tool calling - Introduced a new `execute_code` tool that allows the agent to run Python scripts that call Hermes tools via RPC, reducing the number of round trips required for tool interactions. - Added configuration options for timeout and maximum tool calls in the sandbox environment. - Updated the toolset definitions to include the new code execution capabilities, ensuring integration across platforms. - Implemented comprehensive tests for the code execution sandbox, covering various scenarios including tool call limits and error handling. - Enhanced the CLI and documentation to reflect the new functionality, providing users with clear guidance on using the code execution tool.	2026-02-19 23:23:43 -08:00

40 Commits
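
Commit a9241f3e3e in the log above describes head+tail truncation for execute_code stdout: a fixed head buffer (40% of the budget) plus a rolling tail window (60%), with a notice between the two sections. The following is only a minimal sketch of that idea, not the code from the repository. The function name truncate_head_tail, its parameters, the character-based budget, and the notice wording are all assumptions made for illustration.

```python
from collections import deque


def truncate_head_tail(chunks, max_chars=1000, head_ratio=0.4):
    """Keep the first head_ratio of the budget plus a rolling tail window.

    Hypothetical helper: name, signature, and notice text are illustrative
    and are not taken from the hermes-agent source.
    """
    head_budget = int(max_chars * head_ratio)
    tail_budget = max_chars - head_budget

    head, head_len = [], 0      # fixed buffer: filled once, never evicted
    tail, tail_len = deque(), 0  # rolling window: oldest text evicted first
    dropped = 0                  # characters that fell out of the tail window

    for chunk in chunks:
        # Fill the head buffer first.
        if head_len < head_budget:
            take = chunk[: head_budget - head_len]
            head.append(take)
            head_len += len(take)
            chunk = chunk[len(take):]
        if not chunk:
            continue

        # Everything past the head goes into the rolling tail window.
        tail.append(chunk)
        tail_len += len(chunk)

        # Evict from the front of the tail until it fits the budget again.
        while tail_len > tail_budget:
            excess = tail_len - tail_budget
            oldest = tail[0]
            if len(oldest) <= excess:
                tail.popleft()
                tail_len -= len(oldest)
                dropped += len(oldest)
            else:
                tail[0] = oldest[excess:]
                tail_len -= excess
                dropped += excess

    head_text, tail_text = "".join(head), "".join(tail)
    if dropped == 0:
        # Short output: nothing was evicted, so no notice is needed.
        return head_text + tail_text
    notice = f"\n... [{dropped} characters truncated] ...\n"
    return head_text + notice + tail_text


if __name__ == "__main__":
    print(truncate_head_tail(["a" * 10, "b" * 10, "c" * 10], max_chars=10))
```

Because the tail is a rolling window rather than a cut-off, a script that prints its final result on the last line keeps that line regardless of how much output preceded it, which is the behaviour the commit message calls out.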