PR #568 moved the close entirely to the finally block, but the success-path
close is needed to break the RPC thread out of accept() immediately. Without
it, rpc_thread.join(3) may block for up to 3 seconds if the child process
never connected. The finally-block close remains as a safety net for the
exception/error path (the actual fd leak fix).
Authored by alireza78a. Moves server_sock.close() into the finally block so
the socket fd is always cleaned up, even if an exception occurs between socket
creation and the success-path close.
Authored by aydnOktay. Adds _atomic_write_text() helper using tempfile.mkstemp()
+ os.replace() to prevent skill file corruption on crash/interrupt. All 7
write_text() calls in skill_manager_tool.py converted, including rollback writes
during security scans.
Platform.SIGNAL was missing from default_toolset_map and platform_config_key
in gateway/run.py, causing Signal to silently fall back to hermes-telegram
toolset (same bug as HomeAssistant, fixed in PR #538).
Also updates:
- tests/test_toolsets.py: include hermes-signal and hermes-homeassistant in
the platform core-tools consistency check
- cli-config.yaml.example: document signal and homeassistant platform keys
Cherry-picked and improved from PR #470 (fixes#464).
Problem: On Ubuntu 24.04 with ghostty + tmux, the prompt input box
border lines flash due to cursor blink and raw spinner terminal writes
conflicting with prompt_toolkit's rendering.
Changes:
- cli.py: Add CursorShape.BLOCK to Application() to disable cursor blink
- cli.py: Add thinking_callback + spinner_widget in TUI layout so
thinking status displays as a proper prompt_toolkit widget instead of
raw terminal writes that conflict with the TUI renderer
- run_agent.py: Add thinking_callback parameter to AIAgent; when set,
uses the callback instead of KawaiiSpinner for thinking display
What was NOT changed (preserving existing behavior):
- agent/display.py: Untouched. KawaiiSpinner _write() stdout capture,
_animate() logic, and 0.12s frame interval all preserved. This
protects subagent stdout redirection and keeps smooth animations
for non-CLI contexts (gateway, batch runner).
- Original emoji spinner types (brain/sparkle/pulse/moon/star) preserved
for all non-CLI contexts.
Fixes from original PR #470:
- CursorShape.STEADY_BLOCK -> CursorShape.BLOCK (STEADY_BLOCK doesn't
exist in prompt_toolkit 3.0.52)
- Removed duplicate self._spinner_text = '' line
- Removed redundant nested if-checks
Tested: 2706 tests pass, interactive CLI verified via tmux.
Authored by Himess. Three independent fixes:
- cron/jobs.py: respect HERMES_HOME env var (consistent with scheduler.py)
- gateway/run.py: add Platform.HOMEASSISTANT to toolset mappings
- tools/environments/daytona.py: use time.monotonic() for timeout deadline
Authored by Himess. Replaces split(':', 2) with regex that optionally
captures Windows drive-letter prefix in rg/grep output parsing. Fixes
search_files returning zero results on Windows where paths like
C:\path\file.py:42:content were misparsed by naive colon splitting.
No behavior change on Unix/Mac.
Add comprehensive skill for building, testing, and debugging Hermes Agent
RL environments for Atropos training. Includes:
- SKILL.md: Full guide covering HermesAgentBaseEnv interface, required
methods, config class, CLI modes (serve/process/evaluate), reward
function patterns, common pitfalls, and minimum implementation checklist
- New 'Inference Setup' section: instructs the agent to always ask the
user for their inference provider (OpenRouter + model choice, self-hosted
VLLM endpoint, or other OpenAI-compatible API) before running tests
- references/agentresult-fields.md: AgentResult dataclass field reference
- references/atropos-base-env.md: Atropos BaseEnv API reference
- references/usage-patterns.md: Step-by-step patterns for process,
evaluate, serve, and smoke test modes
Will be auto-synced to ~/.hermes/skills/ via skills_sync.
When a user runs `hermes -w -c Pokemon Agent Dev` without quoting the
session name, argparse would fail with:
error: argument command: invalid choice: 'Agent'
This is because argparse parses `-c Pokemon` (consuming one token via
nargs='?'), then sees 'Agent' and tries to match it as a subcommand.
Fix: add _coalesce_session_name_args() that pre-processes sys.argv before
argparse, joining consecutive non-flag, non-subcommand tokens after -c or
-r into a single argument. This makes both quoted and unquoted multi-word
session names work transparently.
Includes 17 tests covering all edge cases: multi-word names, single-word,
bare flags, flag ordering, subcommand boundaries, and passthrough.
Authored by shitcoinsherpa. Adds encoding='utf-8' to all text-mode
open() calls in gateway/run.py, gateway/config.py, hermes_cli/config.py,
hermes_cli/main.py, and hermes_cli/status.py. Prevents encoding errors
on Windows where the default locale is not UTF-8.
Also fixed 4 additional open() calls in gateway/run.py that were added
after the PR branch was created.
The existing password prompt uses /dev/tty and termios to read input
with echo disabled. Neither exists on Windows.
On Windows, msvcrt.getwch() reads a single character from the console
without echoing it. This adds a Windows code path that uses getwch()
in a loop, collecting characters until Enter is pressed.
The Unix path using termios and /dev/tty is unchanged.
Authored by shitcoinsherpa. Imports winpty.PtyProcess on Windows instead
of ptyprocess.PtyProcess, and adds platform markers to the [pty] extra
so the correct package is installed automatically.
Complements PR #453 by 0xbyt4. Adds isinstance(dict) guard in
run_agent.py to catch cases where json.loads returns non-dict
(e.g. null, list, string) before they reach downstream code.
Also adds 15 tests for build_tool_preview covering None args,
empty dicts, known/unknown tools, fallback keys, truncation,
and all special-cased tools (process, todo, memory, session_search).
evaluate() was calling _llm_judge twice per item (once via
compute_reward, once directly) — double the API cost for no benefit.
Now extracts correctness from compute_reward's buffer instead.
Also: compute_reward appends to training metric buffers during eval,
which would pollute wandb training charts. Now rolls back buffer
entries added during eval so training metrics stay clean.
Comprehensive skill for playing Pokemon Red/Blue via the pokemon-agent
package (NousResearch/pokemon-agent). Includes:
- Full startup procedure (uv venv, server, localhost.run dashboard tunnel)
- Save/load lifecycle and naming conventions
- Gameplay loop with emphasis on frequent vision checks
- Hard-learned navigation tips:
- Use vision every 2-4 steps (RAM state is blind to obstacles)
- Wait 2-3 seconds after door/stair warps for map transitions
- Sidestep after exiting buildings to avoid re-entering
- Hold B to speed Gen 1's slow text scrolling
- Ledges are one-way — use vision to find gaps
- Battle strategy, type chart, Gen 1 quirks
- Memory conventions with PKM: prefix
- Progression milestones through all 8 gyms + Elite Four
The evaluate method was doing single-turn chat_completion (no tools),
which defeats the purpose of an agentic research benchmark. Fixed to
run the full HermesAgentLoop with web_search/web_extract tools.
Results comparison (Claude Sonnet 4.5, FRAMES benchmark):
Without tools (broken): 0.56 mean correctness
With agent loop + tools: 1.00 mean correctness, 0.994 reward
New eval metrics: mean_correctness, mean_reward, mean_tool_calls,
tool_usage_rate — all logged via evaluate_log() in lighteval format.
AgentResult has .messages (list of dicts), not .final_response or
.tool_calls. Fixed compute_reward to extract the final response
and tool names from the message history.
Verified with live process mode test:
- Agent used 7 tool calls (web_search, web_extract)
- Produced a 1106-char researched response about Winter Olympics
- Reward: 0.384 (partial correctness via LLM judge)
- JSONL output contains valid tokens, masks, scores, messages
cap-drop ALL removes DAC_OVERRIDE, which root needs to write to
bind-mounted directories owned by the host user (uid 1000). This
broke persistent Docker sandboxes — the container couldn't write
to /workspace or /root.
Add back the minimum capabilities needed:
- DAC_OVERRIDE: root can write to bind-mounted dirs owned by host user
- CHOWN: package managers (pip, npm, apt) need to set file ownership
- FOWNER: needed for operations on files owned by other users
Still drops all other capabilities (NET_RAW, SYS_ADMIN, etc.) and
keeps no-new-privileges. Security boundary is the container itself.
Verified end-to-end: create files → destroy container → new container
with same task_id → files persist on host and are accessible in the
new container.
The environment was merged missing several standard components.
Updated to match the patterns established by 82 Atropos environments
and our own HermesAgentBaseEnv contract.
Added:
- WebResearchEnvConfig — custom Pydantic config with reward weights,
efficiency thresholds, eval settings, dataset config (all tunable
via CLI/YAML without code changes)
- config_init() classmethod — default server config (OpenRouter +
Claude) so the env works out of the box
- wandb_log() override — logs reward breakdown metrics (correctness,
tool_usage, efficiency, diversity, correct_rate, tool_usage_rate)
with proper buffer management and super() call
- evaluate() — uses server.chat_completion instead of broken stub
_run_agent_on_item(). Logs via evaluate_log() for lighteval-
compatible output.
Fixed:
- Removed broken _run_agent_on_item() stub that returned empty results
- evaluate() now uses server.chat_completion (same pattern as
TerminalTestEnv) for actual model evaluation
- compute_reward reads tool calls from AgentResult properly
- LLM judge uses self.server.chat_completion instead of ctx
Reward config is now tunable without code changes:
--env.correctness_weight 0.6
--env.tool_usage_weight 0.2
--env.efficiency_weight 0.2
--env.diversity_bonus 0.1
--env.efficient_max_calls 5
Authored by PercyDikec. Fixes#445.
Changes 'hide' to 'hidden' in _fetch_models_from_api to match
_read_cache_models and the actual API response format.
Validate that the username portion of ~username paths contains only
valid characters (alphanumeric, dot, hyphen, underscore) before passing
to shell echo for expansion. Previously, paths like '~; rm -rf /'
would be passed unquoted to self._exec(f'echo {path}'), allowing
arbitrary command execution.
The approach validates the username rather than using shlex.quote(),
which would prevent tilde expansion from working at all since
echo '~user' outputs the literal string instead of expanding it.
Added tests for injection blocking and valid ~username/path expansion.
Credit to @alireza78a for reporting (PR #442, issue #442).
Authored by jackx707. Adds web_research_env.py (Atropos RL environment for
multi-step web research using FRAMES benchmark) and batch generation config.
The gateway's config.yaml → env var bridge was missing docker_volumes,
so Docker volume mounts configured in config.yaml were ignored for
gateway sessions (Telegram, Discord, etc.) while working in CLI.
Also fixes list serialization: str() produces Python repr with single
quotes which json.loads() in terminal_tool.py can't parse. Now uses
json.dumps() for list values.
Based on PR #431 by @manuelschipper (applied manually due to stale branch).
Vision auto-mode previously only tried OpenRouter, Nous, and Codex
for multimodal — deliberately skipping custom endpoints with the
assumption they 'may not handle vision input.' This caused silent
failures for users running local multimodal models (Qwen-VL, LLaVA,
Pixtral, etc.) without any cloud API keys.
Now custom endpoints are tried as a last resort in auto mode. If the
model doesn't support vision, the API call fails gracefully — but
users with local vision models no longer need to manually set
auxiliary.vision.provider: main in config.yaml.
Reported by @Spadav and @kotyKD.
The Docker backend already supports user-configured volume mounts via
docker_volumes, but it was undocumented — missing from DEFAULT_CONFIG,
cli.py defaults, and configuration docs.
Changes:
- hermes_cli/config.py: Add docker_volumes to DEFAULT_CONFIG with
inline documentation and examples
- cli.py: Add docker_volumes to load_cli_config defaults
- configuration.md: Full Docker Volume Mounts section with YAML
examples, use cases (providing files, receiving outputs, shared
workspaces), and env var alternative
Authored by aydnOktay. Improves URL validation with urlparse, adds exc_info
to error logs for full stack traces, and tightens type hints.
Resolved merge conflict in _handle_vision_analyze: kept PR's string formatting
with our AUXILIARY_VISION_MODEL env var logic.
MCP tests import from mcp.types but mcp wasn't in the dev optional
dependencies. Fresh 'pip install -e .[dev]' setups failed 3 tests.
Based on PR #427 by @teyrebaz33 (applied manually due to stale branch).
The 'hermes gateway setup' instructions for Slack were missing:
- The 'Subscribe to Events' step entirely (message.im, message.channels,
app_mention, message.groups)
- Several required scopes (app_mentions:read, groups:history, users:read,
files:write)
- Warning about bot only working in DMs without message.channels
- Step to invite the bot to channels
The 'hermes setup' flow (setup.py) and the website docs (slack.md)
already had the correct information — only gateway.py was outdated.
Reported by JordanB on Slack.
The #1 support issue with Slack is 'bot works in DMs but not channels'.
This is almost always caused by missing event subscriptions (message.channels,
message.groups) or missing OAuth scopes (channels:history, groups:history).
Changes:
- slack.md: Move channels:history and groups:history from optional to required
scopes. Move message.channels and message.groups to required events. Add new
'How the Bot Responds' section explaining DM vs channel behavior. Add Step 8
for inviting bot to channels. Expand troubleshooting table with specific
'works in DMs not channels' entry. Add quick checklist for channel debugging.
- setup.py: Expand Slack setup wizard with all required scopes, event
subscriptions, and a warning that without message.channels/message.groups
the bot only works in DMs. Add link to full docs. Improve Member ID
discovery instructions.
- config.py: Update SLACK_BOT_TOKEN and SLACK_APP_TOKEN descriptions to list
required scopes and event subscriptions inline.
- Register no-op app_mention event handler to suppress Bolt 404 errors.
The 'message' handler already processes @mentions in channels, so
app_mention is acknowledged without duplicate processing.
- Add send_document() for native file attachments (PDFs, CSVs, etc.)
via files_upload_v2, matching the pattern from Telegram PR #779.
- Add send_video() for native video uploads via files_upload_v2.
- Handle incoming document attachments from users: download, cache,
and inject text content for .txt/.md files (capped at 100KB),
following the same pattern as the Telegram adapter.
- Add _download_slack_file_bytes() helper for raw byte downloads.
- Add 24 new tests covering all new functionality.
Fixes the unhandled app_mention events reported in gateway logs.
The full HERMES-AGENT ASCII logo needs ~95 columns, and the
side-by-side caduceus + tools panel needs ~80. In narrow terminals
(Kitty default, resized windows) everything wraps into visual garbage.
Fixes:
- show_banner() auto-detects terminal width and falls back to compact
banner when < 80 columns
- build_welcome_banner() skips the ASCII logo when < 95 columns
- Compact banner now dynamically sized via _build_compact_banner()
instead of a hardcoded 64-char box that also wrapped in narrow terms
- Same width checks applied to /clear command's banner refresh
The up/down arrow key issue in Kitty terminal for multiline input is
a known Kitty keyboard protocol (CSI u) vs prompt_toolkit compatibility
gap — arrow keys work correctly in standard terminals and tmux. Users
can work around it by running in tmux or setting TERM=xterm-256color.
Adds a 'find-nearby' skill for discovering nearby places using
OpenStreetMap (Overpass + Nominatim). No API keys needed. Works with:
- Coordinates (from Telegram location pins)
- Addresses, cities, zip codes, landmarks (auto-geocoded)
- Multiple place types (restaurant, cafe, bar, pharmacy, etc.)
Returns names, distances, cuisine, hours, addresses, and Google Maps
links (pin + directions). 184-line stdlib-only script.
Also adds Telegram location message handling:
- New MessageType.LOCATION in gateway base
- Telegram adapter handles LOCATION and VENUE messages
- Injects lat/lon coordinates into conversation context
- Prompts agent to ask what the user wants nearby
Inspired by PR #422 (reimplemented with simpler script and broader
skill scope — addresses/cities/zips, not just Telegram coordinates).
Selecting a saved custom provider now switches instantly without
probing /models — the model name is stored in the config entry
as a complete profile (name + url + key + model).
Changes:
- custom_providers entries now include 'model' field
- Selecting a saved provider with a model just activates it
- Only probes /models if no model is saved (first-time setup)
- Menu shows saved model name: 'Local (localhost:8000) — llama-70b'
- Dedup on re-entry: still activates the model, just doesn't add
a duplicate config entry (updates model name if changed)
When a user adds a custom endpoint via 'hermes model' → 'Custom
endpoint', it now automatically saves to custom_providers in
config.yaml so it persists and appears in the provider menu on
subsequent runs. Deduplicates by base_url.
Auto-generated names based on URL:
http://localhost:8000/v1 → 'Local (localhost:8000)'
https://xyz.runpod.ai/v1 → 'RunPod (xyz.runpod.ai)'
https://api.example.com/v1 → 'Api.example.com'
Also adds 'Remove a saved custom provider' option to the menu
(only shown when custom providers exist) with a selection UI
to pick which one to remove.
Users can also manually edit custom_providers in config.yaml
for full control over names and settings.