
Teknium 745859babb feat: env var passthrough for skills and user config (#2807)

* feat: env var passthrough for skills and user config

Skills that declare required_environment_variables now have those vars
passed through to sandboxed execution environments (execute_code and
terminal).  Previously, execute_code stripped all vars containing KEY,
TOKEN, SECRET, etc. and the terminal blocklist removed Hermes
infrastructure vars — both blocked skill-declared env vars.

Two passthrough sources:

1. Skill-scoped (automatic): when a skill is loaded via skill_view and
   declares required_environment_variables, vars that are present in
   the environment are registered in a session-scoped passthrough set.

2. Config-based (manual): terminal.env_passthrough in config.yaml lets
   users explicitly allowlist vars for non-skill use cases.

Changes:
- New module: tools/env_passthrough.py — shared passthrough registry
- hermes_cli/config.py: add terminal.env_passthrough to DEFAULT_CONFIG
- tools/skills_tool.py: register available skill env vars on load
- tools/code_execution_tool.py: check passthrough before filtering
- tools/environments/local.py: check passthrough in _sanitize_subprocess_env
  and _make_run_env
- 19 new tests covering all layers

* docs: add environment variable passthrough documentation

Document the env var passthrough feature across four docs pages:

- security.md: new 'Environment Variable Passthrough' section with
  full explanation, comparison table, and security considerations
- code-execution.md: update security section, add passthrough subsection,
  fix comparison table
- creating-skills.md: add tip about automatic sandbox passthrough
- skills.md: add note about passthrough after secure setup docs

Live-tested: launched interactive CLI, loaded a skill with
required_environment_variables, verified TEST_SKILL_SECRET_KEY was
accessible inside execute_code sandbox (value: passthrough-test-value-42).

2026-03-24 08:19:34 -07:00



sidebar_position	title	description
8	Code Execution	Sandboxed Python execution with RPC tool access — collapse multi-step workflows into a single turn

Code Execution (Programmatic Tool Calling)

The execute_code tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a sandboxed child process on the agent host, communicating via Unix domain socket RPC.

How It Works

1. The agent writes a Python script using from hermes_tools import ...
2. Hermes generates a hermes_tools.py stub module with RPC functions
3. Hermes opens a Unix domain socket and starts an RPC listener thread
4. The script runs in a child process; tool calls travel over the socket back to Hermes
5. Only the script's print() output is returned to the LLM; intermediate tool results never enter the context window

# The agent can write scripts like:
from hermes_tools import web_search, web_extract

results = web_search("Python 3.13 features", limit=5)
for r in results["data"]["web"]:
    content = web_extract([r["url"]])
    # ... filter and process ...
print(summary)

Available tools in sandbox: web_search, web_extract, read_file, write_file, search_files, patch, terminal (foreground only).
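
To make the stub module from step 2 of How It Works more concrete, here is a minimal sketch of how such a module could be rendered and loaded. It is an illustration under assumptions, not the Hermes generator: `render_stub`, `load_stub`, `_rpc_call`, and the template are hypothetical names, and `fake_rpc` stands in for the real socket transport. Only the tool list comes from the line above.

```python
# Hypothetical sketch: render a hermes_tools-style stub module as source text.
SANDBOX_TOOLS = [
    "web_search", "web_extract", "read_file", "write_file",
    "search_files", "patch", "terminal",
]

STUB_TEMPLATE = '''
def {name}(*args, **kwargs):
    """Forward a {name} call to the parent process over RPC."""
    return _rpc_call("{name}", args, kwargs)
'''


def render_stub(tool_names):
    """Return Python source defining one forwarding function per tool."""
    return "".join(STUB_TEMPLATE.format(name=name) for name in tool_names)


def load_stub(source, rpc_call):
    """Execute the stub source in a fresh namespace with rpc_call injected."""
    namespace = {"_rpc_call": rpc_call}
    exec(source, namespace)
    return namespace


def fake_rpc(name, args, kwargs):
    """Stand-in transport (assumption): echo the request instead of sending it."""
    return {"tool": name, "args": list(args), "kwargs": kwargs}


stub = load_stub(render_stub(SANDBOX_TOOLS), fake_rpc)
print(stub["web_search"]("Python 3.13 features", limit=5))
```

Every generated function has the same body, so restricting what the sandbox can call comes down to which names are rendered into the stub.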

When the Agent Uses This

The agent uses execute_code when there are:

- 3+ tool calls with processing logic between them
- Bulk data filtering or conditional branching
- Loops over results

The key benefit: intermediate tool results never enter the context window — only the final print() output comes back, dramatically reducing token usage.

Practical Examples

Data Processing Pipeline

from hermes_tools import search_files, read_file
import json

# Find all config files and extract database settings
matches = search_files("database", path=".", file_glob="*.yaml", limit=20)
configs = []
for match in matches.get("matches", []):
    content = read_file(match["path"])
    configs.append({"file": match["path"], "preview": content["content"][:200]})

print(json.dumps(configs, indent=2))

Multi-Step Web Research

from hermes_tools import web_search, web_extract
import json

# Search, extract, and summarize in one turn
results = web_search("Rust async runtime comparison 2025", limit=5)
summaries = []
for r in results["data"]["web"]:
    page = web_extract([r["url"]])
    for p in page.get("results", []):
        if p.get("content"):
            summaries.append({
                "title": r["title"],
                "url": r["url"],
                "excerpt": p["content"][:500]
            })

print(json.dumps(summaries, indent=2))

Bulk File Refactoring

from hermes_tools import search_files, read_file, patch

# Find all Python files using deprecated API and fix them
matches = search_files("old_api_call", path="src/", file_glob="*.py")
fixed = 0
for match in matches.get("matches", []):
    result = patch(
        path=match["path"],
        old_string="old_api_call(",
        new_string="new_api_call(",
        replace_all=True
    )
    if "error" not in str(result):
        fixed += 1

print(f"Fixed {fixed} files out of {len(matches.get('matches', []))} matches")

Build and Test Pipeline

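from hermes_tools import terminal
import json
import re

# Run tests, parse results, and report
result = terminal("cd /project && python -m pytest --tb=short -q 2>&1", timeout=120)
output = result.get("output", "")

# Read the numbers from pytest's summary line (e.g. "3 passed, 1 failed").
# Counting occurrences of the word " passed" would not give the test count.
def summary_count(label):
    """Return the last "<n> <label>" count found in the output, or 0."""
    counts = re.findall(rf"(\d+) {label}\b", output)
    return int(counts[-1]) if counts else 0

report = {
    "passed": summary_count("passed"),
    "failed": summary_count("failed"),
    "errors": summary_count("errors?"),
    "exit_code": result.get("exit_code", -1),
    "summary": output[-500:]  # at most the last 500 characters
}

print(json.dumps(report, indent=2))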

Resource Limits

Resource	Limit	Notes
Timeout	5 minutes (300s)	Script is killed with SIGTERM, then SIGKILL after 5s grace
Stdout	50 KB	Output truncated with `[output truncated at 50KB]` notice
Stderr	10 KB	Included in output on non-zero exit for debugging
Tool calls	50 per execution	Error returned when limit reached

The timeout and the tool-call limit can be set in config.yaml:

# In ~/.hermes/config.yaml
code_execution:
  timeout: 300       # Max seconds per script (default: 300)
  max_tool_calls: 50 # Max tool calls per execution (default: 50)
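
The stdout cap from the table above can be pictured with a short sketch. The function name `truncate_stdout` and the choice to cut at a byte boundary are assumptions made for illustration; only the 50 KB limit and the notice text come from the table.

```python
MAX_STDOUT_BYTES = 50 * 1024  # 50 KB, as listed in the Resource Limits table
TRUNCATION_NOTICE = "[output truncated at 50KB]"


def truncate_stdout(data: bytes, limit: int = MAX_STDOUT_BYTES) -> str:
    """Decode script stdout, cutting it at `limit` bytes and appending a notice."""
    if len(data) <= limit:
        return data.decode("utf-8", errors="replace")
    # errors="replace" keeps decoding safe if the cut splits a multi-byte character.
    head = data[:limit].decode("utf-8", errors="replace")
    return head + "\n" + TRUNCATION_NOTICE


print(truncate_stdout(b"ok"))
print(truncate_stdout(b"x" * 60_000)[-len(TRUNCATION_NOTICE):])
```

A script that expects large results should therefore aggregate or filter before printing rather than rely on the full output coming back.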

How Tool Calls Work Inside Scripts

When your script calls a function like web_search("query"):

1. The call is serialized to JSON and sent over a Unix domain socket to the parent process
2. The parent dispatches through the standard handle_function_call handler
3. The result is sent back over the socket
4. The function returns the parsed result

This means tool calls inside scripts behave identically to normal tool calls — same rate limits, same error handling, same capabilities. The only restriction is that terminal() is foreground-only (no background, pty, or check_interval parameters).
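
The round trip described above can be sketched in a few lines. This is not the Hermes wire protocol: the newline-delimited JSON framing, the request field names, and the `dispatch`, `serve`, and `call_tool` helpers are all assumptions, and `dispatch` returns canned data in place of the real handle_function_call handler. A thread plays the parent so the example runs in one process.

```python
import json
import socket
import threading


def dispatch(name, args, kwargs):
    """Stand-in for the parent's tool dispatcher (hypothetical, canned data)."""
    if name == "web_search":
        return {"data": {"web": [{"url": "https://example.com", "title": args[0]}]}}
    return {"error": f"unknown tool: {name}"}


def serve(conn):
    """Parent side: read one JSON request per line, answer with one JSON line."""
    with conn, conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8") as writer:
        for line in reader:
            request = json.loads(line)
            result = dispatch(request["tool"], request["args"], request["kwargs"])
            writer.write(json.dumps(result) + "\n")
            writer.flush()


def call_tool(reader, writer, name, *args, **kwargs):
    """Child side: serialize the call, send it, then parse the reply."""
    writer.write(json.dumps({"tool": name, "args": args, "kwargs": kwargs}) + "\n")
    writer.flush()
    return json.loads(reader.readline())


# socketpair() returns two connected Unix domain sockets on Linux and macOS.
parent_sock, child_sock = socket.socketpair()
threading.Thread(target=serve, args=(parent_sock,), daemon=True).start()

with child_sock, child_sock.makefile("r", encoding="utf-8") as reader, \
        child_sock.makefile("w", encoding="utf-8") as writer:
    reply = call_tool(reader, writer, "web_search", "unix sockets")

print(reply["data"]["web"][0]["title"])
```

Because the child only ever sees the parsed reply, everything the parent applies while dispatching (rate limits, error handling) is applied to scripted calls as well.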

Error Handling

When a script fails, the agent receives structured error information:

- Non-zero exit code: stderr is included in the output so the agent sees the full traceback
- Timeout: Script is killed and the agent sees "Script timed out after 300s and was killed."
- Interruption: If the user sends a new message during execution, the script is terminated and the agent sees [execution interrupted — user sent a new message]
- Tool call limit: When the 50-call limit is hit, subsequent tool calls return an error message

The response always includes status (success/error/timeout/interrupted), output, tool_calls_made, and duration_seconds.
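
The tool call limit and the tool_calls_made field can be illustrated with a small counter. This is a sketch under assumptions: the class name `ToolCallBudget` and the wording of the error message are hypothetical, and the real check presumably lives in the parent's RPC handler rather than in a wrapper like this.

```python
class ToolCallBudget:
    """Count tool calls and refuse new ones once the limit is reached."""

    def __init__(self, max_calls=50):
        self.max_calls = max_calls
        self.calls_made = 0

    def run(self, tool, *args, **kwargs):
        """Call `tool` if budget remains; otherwise return an error result."""
        if self.calls_made >= self.max_calls:
            # Hypothetical message; the text Hermes returns may differ.
            return {"error": f"tool call limit of {self.max_calls} reached"}
        self.calls_made += 1
        return tool(*args, **kwargs)


def echo(value):
    """Trivial stand-in for a real tool."""
    return {"output": value}


budget = ToolCallBudget(max_calls=2)
results = [budget.run(echo, n) for n in range(3)]
print(results[-1], budget.calls_made)
```

Note that in this sketch the script keeps running after the limit: later calls return an error value instead of raising, which matches the description that subsequent tool calls return an error message.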

Security

:::danger Security Model
The child process runs with a minimal environment. API keys, tokens, and credentials are stripped by default. The script accesses tools exclusively via the RPC channel; it cannot read secrets from environment variables unless explicitly allowed.
:::

Environment variables containing KEY, TOKEN, SECRET, PASSWORD, CREDENTIAL, PASSWD, or AUTH in their names are excluded. Only safe system variables (PATH, HOME, LANG, SHELL, PYTHONPATH, VIRTUAL_ENV, etc.) are passed through.
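
The name-based exclusion above amounts to a substring check. The sketch below covers only that rule; the helper names are hypothetical, and matching case-insensitively is an assumption rather than documented behavior.

```python
SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL", "PASSWD", "AUTH")


def is_sensitive(name: str) -> bool:
    """True if the variable name contains any of the sensitive markers."""
    upper = name.upper()  # case-insensitive matching is an assumption
    return any(marker in upper for marker in SENSITIVE_MARKERS)


def sanitize_env(env: dict) -> dict:
    """Return a copy of env without variables whose names look sensitive."""
    return {name: value for name, value in env.items() if not is_sensitive(name)}


# Illustrative environment; the values are placeholders.
example_env = {
    "PATH": "/usr/bin",
    "HOME": "/home/user",
    "OPENAI_API_KEY": "placeholder",
    "GITHUB_TOKEN": "placeholder",
}
print(sanitize_env(example_env))
```

A substring rule is deliberately broad: a harmless name such as KEYBOARD_LAYOUT would be dropped too, which is part of why an explicit passthrough mechanism is useful.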

Skill Environment Variable Passthrough

When a skill declares required_environment_variables in its frontmatter, those variables are automatically passed through to both execute_code and terminal sandboxes after the skill is loaded. This lets skills use their declared API keys without weakening the security posture for arbitrary code.

For non-skill use cases, you can explicitly allowlist variables in config.yaml:

terminal:
  env_passthrough:
    - MY_CUSTOM_KEY
    - ANOTHER_TOKEN

See the Security guide for full details.

The script runs in a temporary directory that is cleaned up after execution. The child process runs in its own process group so it can be cleanly killed on timeout or interruption.
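
A minimal sketch of the kill sequence, assuming the child is started with `start_new_session=True` so that it leads its own process group. The function name `run_with_timeout` and the returned status strings are illustrative; the SIGTERM, grace period, SIGKILL order follows the Resource Limits table. It runs on Linux and macOS only.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(argv, timeout, grace=5.0):
    """Run argv in its own process group; SIGTERM on timeout, SIGKILL after grace."""
    # start_new_session=True makes the child a group leader, so pgid == pid.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        proc.wait(timeout=timeout)
        return "success" if proc.returncode == 0 else "error"
    except subprocess.TimeoutExpired:
        pass
    try:
        os.killpg(proc.pid, signal.SIGTERM)  # ask the whole group to stop
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # force-kill whatever is left
        proc.wait()
    except ProcessLookupError:
        proc.wait()  # the group already exited on its own
    return "timeout"


sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(sleeper, timeout=0.5, grace=1.0))
```

Signalling the group rather than the single process means that anything the script spawned is terminated along with it.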

execute_code vs terminal

Use Case	execute_code	terminal
Multi-step workflows with tool calls between	✅	❌
Simple shell command	❌	✅
Filtering/processing large tool outputs	✅	❌
Running a build or test suite	❌	✅
Looping over search results	✅	❌
Interactive/background processes	❌	✅
Needs API keys in environment	⚠️ Only via passthrough	✅ (most pass through)

Rule of thumb: Use execute_code when you need to call Hermes tools programmatically with logic between calls. Use terminal for running shell commands, builds, and processes.

Platform Support

Code execution requires Unix domain sockets and is available on Linux and macOS only. It is automatically disabled on Windows — the agent falls back to regular sequential tool calls.
