Authored by manuelschipper. Adds GLM-4.7 and GLM-5 context lengths (202752)
to model_metadata.py. The key priority fix (prefer OPENAI_API_KEY for
non-OpenRouter endpoints) was already applied in PR #295; merged the Z.ai
mention into the comment.
When a user disables the web toolset via 'hermes tools', the execute_code
schema description still hardcoded web_search/web_extract as available,
causing the model to keep trying to use them. Similarly, delegate_task
always defaulted to ['terminal', 'file', 'web'] for subagents regardless
of the parent's config.
Changes:
- execute_code schema is now built dynamically via build_execute_code_schema()
based on which sandbox tools are actually enabled
- model_tools.py rebuilds the execute_code schema at definition time using
the intersection of sandbox-allowed and session-enabled tools
- delegate_task now inherits the parent agent's enabled_toolsets instead of
hardcoding DEFAULT_TOOLSETS when no explicit toolsets are specified
- delegate_task description updated to say 'inherits your enabled toolsets'
Reported by kotyKD on Discord.
API key selection is now base_url-aware: when the resolved base_url
targets OpenRouter, OPENROUTER_API_KEY takes priority (preserving the
#289 fix). When hitting any other endpoint (Z.ai, vLLM, custom, etc.),
OPENAI_API_KEY takes priority so the OpenRouter key doesn't leak.
Applied in both the runtime provider resolver (the real code path) and
the CLI initial default (for consistency).
Fixes#560.
_make_cli() now patches CLI_CONFIG with clean defaults so
test_cli_init tests don't depend on the developer's local config.yaml.
test_empty_dir_returns_empty now mocks Path.home() so it doesn't pick
up a global SOUL.md.
Credit to teyrebaz33 for identifying and fixing these in PR #557.
Fixes#555.
Two bugs in sync_skills():
1. Failed copytree poisons manifest: when shutil.copytree fails (disk
full, permission error), the skill is still recorded in the manifest.
On the next sync, the skill appears as "in manifest but not on disk"
which is interpreted as "user deliberately deleted it" — the skill
is never retried. Fix: only write to manifest on successful copy.
2. Failed update destroys user copy: rmtree deletes the existing skill
directory before copytree runs. If copytree then fails, the user's
skill is gone with no way to recover. Fix: move to .bak before
copying, restore from backup if copytree fails.
Both bugs are proven by new regression tests that fail on the old code
and pass on the fix.
- Added fallback mechanism to utilize previous content when the model generates an empty response after tool calls, reducing unnecessary API retries.
- Enhanced logging to indicate when prior content is used as a final response.
- Updated logic to ensure that genuine empty responses are retried appropriately, maintaining user experience.
- website/docs/reference/cli-commands.md: Added 'hermes insights' terminal
command section with --days and --source flags, plus /insights slash command
in the Conversation section
- website/docs/user-guide/cli.md: Added /insights to slash commands table
- website/docs/user-guide/messaging/index.md: Added /insights to gateway
chat commands table
- website/docs/user-guide/sessions.md: Added cross-reference to hermes
insights from the sessions stats section
Upgrade skills_sync manifest to v2 format (name:origin_hash). The origin
hash records the MD5 of the bundled skill at the time it was last synced.
On update, the user's copy is compared against the origin hash:
- User copy == origin hash → unmodified → safe to update from bundled
- User copy != origin hash → user customized → skip (preserve changes)
v1 manifests (plain names) are auto-migrated: the user's current hash
becomes the baseline, so future syncs can detect modifications.
Output now shows user-modified skills:
~ whisper (user-modified, skipping)
27 tests covering all scenarios including v1→v2 migration, user
modification detection, update after migration, and origin hash tracking.
2009 tests pass.
- Restored 21 skills removed in commits 757d012 and 740dd92:
accelerate, audiocraft, code-review, faiss, flash-attention, gguf,
grpo-rl-training, guidance, llava, nemo-curator, obliteratus, peft,
pytorch-fsdp, pytorch-lightning, simpo, slime, stable-diffusion,
tensorrt-llm, torchtitan, trl-fine-tuning, whisper
- Rewrote sync_skills() with proper update semantics:
* New skills (not in manifest): copied to user dir
* Existing skills (in manifest + on disk): updated via hash comparison
* User-deleted skills (in manifest, not on disk): respected, not re-added
* Stale manifest entries (removed from bundled): cleaned from manifest
- Added sync_skills() to CLI startup (cmd_chat) and gateway startup
(start_gateway) — previously only ran during 'hermes update'
- Updated cmd_update output to show new/updated/cleaned counts
- Rewrote tests: 20 tests covering manifest CRUD, dir hashing, fresh
install, user deletion respect, update detection, stale cleanup, and
name collision handling
75 bundled skills total. 2002 tests pass.
Issues found and fixed during deep code path review:
1. CRITICAL: Prefix matching returned wrong prices for dated model names
- 'gpt-4o-mini-2024-07-18' matched gpt-4o ($2.50) instead of gpt-4o-mini ($0.15)
- Same for o3-mini→o3 (9x), gpt-4.1-mini→gpt-4.1 (5x), gpt-4.1-nano→gpt-4.1 (20x)
- Fix: use longest-match-wins strategy instead of first-match
- Removed dangerous key.startswith(bare) reverse matching
2. CRITICAL: Top Tools section was empty for CLI sessions
- run_agent.py doesn't set tool_name on tool response messages (pre-existing)
- Insights now also extracts tool names from tool_calls JSON on assistant
messages, which IS populated for all sessions
- Uses max() merge strategy to avoid double-counting between sources
3. SELECT * replaced with explicit column list
- Skips system_prompt and model_config blobs (can be thousands of chars)
- Reduces memory and I/O for large session counts
4. Sets in overview dict converted to sorted lists
- models_with_pricing / models_without_pricing were Python sets
- Sets aren't JSON-serializable — would crash json.dumps()
5. Negative duration guard
- end > start check prevents negative durations from clock drift
6. Model breakdown sort fallback
- When all tokens are 0, now sorts by session count instead of arbitrary order
7. Removed unused timedelta import
Added 6 new tests: dated model pricing (4), tool_calls JSON extraction,
JSON serialization safety. Total: 69 tests.
Custom OAI endpoints, self-hosted models, and local inference should NOT
show fabricated cost estimates. Changed default pricing from $3/$12 per
million tokens to $0/$0 for unrecognized models.
- Added _has_known_pricing() to distinguish commercial vs custom models
- Models with known pricing show $ amounts; unknown models show 'N/A'
- Overview shows asterisk + note when some models lack pricing data
- Gateway format adds '(excludes custom/self-hosted models)' note
- Added 7 new tests for custom model cost handling
Comprehensive guide for using Hermes Agent with alternative LLM backends:
- Ollama (local models, zero config)
- vLLM (high-performance GPU inference)
- SGLang (RadixAttention, prefix caching)
- llama.cpp / llama-server (CPU & Metal inference)
- LiteLLM Proxy (multi-provider gateway)
- ClawRouter (cost-optimized routing with complexity scoring)
- 10+ other compatible providers table (Together, Groq, DeepSeek, etc.)
- Choosing the Right Setup decision table
- General custom endpoint setup instructions
All of these work via the existing OPENAI_BASE_URL + OPENAI_API_KEY
custom endpoint support — no code changes needed.
Comprehensive guide for using Hermes Agent with alternative LLM backends:
- Ollama (local models, zero config)
- vLLM (high-performance GPU inference)
- SGLang (RadixAttention, prefix caching)
- llama.cpp / llama-server (CPU & Metal inference)
- LiteLLM Proxy (multi-provider gateway)
- ClawRouter (cost-optimized routing with complexity scoring)
- 10+ other compatible providers table (Together, Groq, DeepSeek, etc.)
- Choosing the Right Setup decision table
- General custom endpoint setup instructions
All of these work via the existing OPENAI_BASE_URL + OPENAI_API_KEY
custom endpoint support — no code changes needed.
Inspired by Claude Code's /insights, adapted for Hermes Agent's multi-platform
architecture. Analyzes session history from state.db to produce comprehensive
usage insights.
Features:
- Overview stats: sessions, messages, tokens, estimated cost, active time
- Model breakdown: per-model sessions, tokens, and cost estimation
- Platform breakdown: CLI vs Telegram vs Discord etc. (unique to Hermes)
- Tool usage ranking: most-used tools with percentages
- Activity patterns: day-of-week chart, peak hours, streaks
- Notable sessions: longest, most messages, most tokens, most tool calls
- Cost estimation: real pricing data for 25+ models (OpenAI, Anthropic,
DeepSeek, Google, Meta) with fuzzy model name matching
- Configurable time window: --days flag (default 30)
- Source filtering: --source flag to filter by platform
Three entry points:
- /insights slash command in CLI (supports --days and --source flags)
- /insights slash command in gateway (compact markdown format)
- hermes insights CLI subcommand (standalone)
Includes 56 tests covering pricing helpers, format helpers, empty DB,
populated DB with multi-platform data, filtering, formatting, and edge cases.
Authored by aydnOktay. Companion to PR #297 (batch_runner). Applies the
same atomic write pattern (temp file + fsync + os.replace) to both
_write_checkpoint() and recover_from_checkpoint() in process_registry.py.
Prevents checkpoint corruption on gateway crashes. Also improves error
handling: bare 'pass' replaced with logger.debug(..., exc_info=True)
for better debugging.
Previously pressing Escape in any setup wizard menu called sys.exit(1),
killing the entire wizard with no way to recover. Now:
- prompt_choice: Escape keeps the current default and moves on (prints
'Skipped (keeping current)'). Shows '↑/↓ Navigate Enter Select
Esc Skip Ctrl+C Exit' hint.
- prompt_checklist: Escape returns pre-selected items instead of empty
list. Shows 'SPACE Toggle ENTER Confirm ESC Skip Ctrl+C Exit'.
- prompt_yes_no: now catches KeyboardInterrupt/EOFError properly.
- Fallback number prompts also show control hints.
Ctrl+C still exits the wizard cleanly.
Authored by aydnOktay. Three improvements to batch_runner fault tolerance:
1) Atomic checkpoint writes (temp file + fsync + os.replace) to prevent
corruption on crashes — same pattern as auth.py's _save_auth_store().
2) Incremental checkpoints after each batch result instead of only at end,
so interrupted runs can resume with minimal progress loss.
3) Resume loads existing checkpoint state instead of initializing empty,
preventing clobber of prior progress.
Conflict resolved: kept both the incremental checkpoint logic (PR) and
the batch worker error handling (HEAD) in the imap_unordered loop.
Authored by Farukest. Fixes#432. Extracts _kill_port_process() helper
that uses netstat+taskkill on Windows and fuser on Linux. Previously,
fuser calls were inline with bare except-pass, so on Windows orphaned
bridge processes were never cleaned up — causing 'address already in use'
errors on reconnect. Includes 5 tests covering both platforms, port
matching edge cases, and exception suppression.
Authored by Farukest. Fixes#435. The retry summary in
_handle_max_iterations() hardcoded max_tokens instead of using
_max_tokens_param(), which returns max_completion_tokens for direct
OpenAI API (required by gpt-4o, o-series). The first attempt already
used _max_tokens_param correctly — only the retry path was wrong.
Includes 4 tests for _max_tokens_param provider detection.
Authored by PercyDikec. Fixes#440. _handle_retry_command called
_handle_message(retry_event) but discarded the return value, returning
None instead. Since only _process_message_background sends the response
via adapter.send(), this meant the agent would run (tool progress was
visible) but the final answer was silently dropped on all platforms.
Authored by PercyDikec. Fixes#443. Without re.DOTALL, the regex .*
doesn't match newlines, so multi-line JSON arguments (the normal case)
silently fail to parse. Every other parser in the codebase that matches
across lines already uses re.DOTALL.
Authored by PercyDikec. Fixes#447. The status display used
codex_status.get('auth_file') but get_codex_auth_status() in auth.py
returns the path under 'auth_store' (line 1220). This one-char key
mismatch silently dropped the auth file path from 'hermes status'.
Verifies explicit allowlist keys, catch-all _API_KEY/_TOKEN patterns,
case insensitivity, TERMINAL_SSH prefix, and config.yaml routing for
non-secret keys. Covers the fix from PR #469.
The mock handler checked for function_name == 'search' but the RPC
sends 'search_files'. Any test exercising search_files through the
mock would get 'Unknown tool' instead of the canned response.
Add daytona_image to batch_runner per-prompt container image overrides
so batch processing works with the Daytona backend. Update inline
comments in RL environment files (agent_loop, tool_context) and
process_registry docstrings to include Daytona in backend lists.
The _TOOL_STUBS dict in code_execution_tool.py was out of sync with the
actual tool schemas, causing TypeErrors when the LLM used parameters it
sees in its system prompt but the sandbox stubs didn't accept:
search_files:
- Added missing params: context, offset, output_mode
- Fixed target default: 'grep' → 'content' (old value was obsolete)
patch:
- Added missing params: mode, patch (V4A multi-file patch support)
Also added 4 drift-detection tests (TestStubSchemaDrift) that will
catch future divergence between stubs and real schemas:
- test_stubs_cover_all_schema_params: every schema param in stub
- test_stubs_pass_all_params_to_rpc: every stub param sent over RPC
- test_search_files_target_uses_current_values: no obsolete values
- test_generated_module_accepts_all_params: generated code compiles
All 28 tests pass.
Authored by rovle. Adds Daytona as the sixth terminal execution backend
with cloud sandboxes, persistent workspaces, and full CLI/gateway integration.
Includes 24 unit tests and 8 integration tests.
The execute_code sandbox generates a hermes_tools.py stub module for LLM
scripts. Three common failure modes keep tripping up scripts:
1. json.loads(strict=True) rejects control chars in terminal() output
(e.g., GitHub issue bodies with literal tabs/newlines)
2. Shell backtick/quote interpretation when interpolating dynamic content
into terminal() commands (markdown with backticks gets eaten by bash)
3. No retry logic for transient network failures (API timeouts, rate limits)
Adds three convenience helpers to the generated hermes_tools module:
- json_parse(text) — json.loads with strict=False for tolerant parsing
- shell_quote(s) — shlex.quote() for safe shell interpolation
- retry(fn, max_attempts=3, delay=2) — exponential backoff wrapper
Also updates the EXECUTE_CODE_SCHEMA description to document these helpers
so LLMs know they're available without importing anything extra.
Includes 7 new tests (unit + integration) covering all three helpers.
Address code review findings:
Security (Medium):
- Path traversal guard in OptionalSkillSource.fetch() — resolve() and
validate that the path stays within optional-skills/ before reading
Bug fixes (Medium):
- Add 'builtin' to trust_style dicts in do_inspect() and
_resolve_short_name() — official skills now show bright_cyan 'official'
label consistently across all display functions (5/5 dicts fixed)
Edge cases (Low):
- Clamp page_size to [1, 100] in do_browse() to prevent ZeroDivisionError
- Update SkillMeta.source docstring to include 'official'
- Add browse command to optional-skills/DESCRIPTION.md
Add a browse command that shows all available skills across all registries,
paginated and sorted with official skills first.
Usage:
hermes skills browse # all sources, page 1
hermes skills browse --source official # only official optional skills
hermes skills browse --page 2 # page 2
hermes skills browse --size 30 # 30 per page
/skills browse # slash command in chat
Features:
- Official optional skills always appear first (★ marker, cyan styling)
- Per-source limits prevent overloading (100 official/github, 50 others)
- Deduplication by name preferring higher trust
- Sorted: official > trusted > community, then alphabetical
- Page navigation hints at bottom
- Source counts summary
- Works in both CLI and /skills chat interface
- Added 'official' as source filter option for search command too
Add 'optional-skills/' directory for official skills that ship with the repo
but are not copied to ~/.hermes/skills/ during setup. They are:
- NOT shown to the model in the system prompt
- NOT copied during hermes setup/update
- Discoverable via 'hermes skills search' labeled as 'official'
- Installable via 'hermes skills install' with builtin trust (no third-party warning)
- Auto-categorized on install based on directory structure
Implementation:
- OptionalSkillSource adapter in tools/skills_hub.py (search/fetch/inspect)
- Added to create_source_router() as first source (highest priority)
- Trust level 'builtin' for official skills in skills_guard.py
- Friendly install message for official skills (no third-party warning)
- 'official' label in cyan in search results and skill list
First optional skill: Blackbox CLI (autonomous-ai-agents/blackbox)
- Multi-model coding agent with built-in judge/Chairman pattern
- Delegates to Claude, Codex, Gemini, and Blackbox models
- Open-source CLI (GPL-3.0, TypeScript, forked from Gemini CLI)
- Requires paid Blackbox AI API key
Refs: #475
New docs page covering clipboard image paste across all platforms:
- Platform compatibility table (macOS, Linux X11/Wayland, WSL2, VSCode, SSH)
- Setup instructions per platform (xclip, wl-paste, powershell.exe)
- Explanation of terminal paste limitations and why /paste exists
- SSH workarounds (file upload, URLs, X11 forwarding, messaging)
- Keybinding reference (Alt+V, Ctrl+V, /paste) with when each works
Also updates CLI commands reference with /paste command and
Alt+V keybinding documentation.
Alt key combos pass through all terminal emulators (sent as ESC + key),
unlike Ctrl+V which terminals intercept for text paste. This is the
reliable way to attach clipboard images on WSL2, Windows Terminal,
VSCode, and SSH sessions where Ctrl+V never reaches the application
for image-only clipboard content.
Also adds 'Paste image: Alt+V (or /paste)' hint to /help output.
start_gateway() now checks for an existing running instance via PID file
before starting. If another gateway is already running under the same
HERMES_HOME, it refuses to start with a clear error message directing the
user to 'hermes gateway restart' or 'hermes gateway stop'.
Also fixes gateway/status.py to respect the HERMES_HOME env var instead of
hardcoding ~/.hermes. This scopes the PID file per HERMES_HOME directory,
which lays the groundwork for future multi-profile support where distinct
HERMES_HOME directories can run concurrent gateway instances independently.