The web_extract_tool was stripping the 'url' key during its output
trimming step, but documentation in 3 places claimed it was present.
This caused KeyError when accessing result['url'] in execute_code
scripts, especially when extracting from multiple URLs.
Changes:
- web_tools.py: Add 'url' back to trimmed_results output
- code_execution_tool.py: Add 'title' to _TOOL_STUBS docstring and
_TOOL_DOC_LINES so docs match actual {url, title, content, error}
response format
Adds market-data/polymarket skill — read-only access to Polymarket's public
prediction market APIs. Zero dependencies, zero auth required.
Addresses #589.
Adds a new market-data/polymarket skill for querying Polymarket's public
prediction market APIs. Pure read-only, zero authentication required,
zero external dependencies (stdlib only).
Includes:
- SKILL.md: Agent instructions with key concepts and workflow
- references/api-endpoints.md: Full API reference (Gamma, CLOB, Data APIs)
- scripts/polymarket.py: CLI helper for search, trending, prices, orderbooks,
price history, and recent trades
Addresses #589.
Root cause: fal_client.AsyncClient uses @cached_property for its
httpx.AsyncClient, creating it once and caching forever. In the gateway,
the agent runs in a thread pool where _run_async() calls asyncio.run()
which creates a temporary event loop. The first call works, but
asyncio.run() closes that loop. On the next call, a new loop is created
but the cached httpx.AsyncClient still references the old closed loop,
causing 'Event loop is closed'.
Fix: Switch from async fal_client API (submit_async/handler.get with
await) to sync API (submit/handler.get). The sync API uses httpx.Client
which has no event loop dependency. Since the tool already runs in a
thread pool via the gateway, async adds no benefit here.
Changes:
- image_generate_tool: async def -> def
- _upscale_image: async def -> def
- fal_client.submit_async -> fal_client.submit
- await handler.get() -> handler.get()
- is_async=True -> is_async=False in registry
- Remove unused asyncio import
Authored by voidborne-d. Fixes#576.
Adds --replace flag to 'hermes gateway run' that terminates any existing
gateway instance (SIGTERM with SIGKILL fallback) before starting.
Updated systemd unit template with --replace, ExecStop, KillMode, and
TimeoutStopSec for robust service management.
The /clear command was using Rich's console.clear() and console.print()
which write directly to stdout. Inside the TUI, prompt_toolkit's
patch_stdout intercepts stdout via StdoutProxy, which doesn't interpret
screen-clearing escape sequences and mangles Rich's ANSI output,
resulting in raw escape codes dumped to the terminal.
Fix:
- Use prompt_toolkit's output.erase_screen() + cursor_goto() to clear
the terminal directly (bypasses patch_stdout's StdoutProxy)
- Render the banner through ChatConsole (which routes Rich output
through prompt_toolkit's native print_formatted_text/ANSI renderer)
- Use _cprint for the status message (prompt_toolkit-compatible)
- Fall back to the old behavior when not inside the TUI (e.g. startup)
Updated the _generate_summary method to attempt summary generation using the auxiliary model first, with a fallback to the main model. If both attempts fail, the method now returns None instead of a placeholder, allowing the caller to handle missing summaries appropriately. This change enhances the robustness of context compression and improves logging for failure scenarios.
Added support for loading reasoning configuration, prefill messages, and provider routing from environment variables or config.yaml in the run_job function. This improves flexibility and customization for job execution, allowing for better control over agent behavior and message handling.
Enhanced the _run_single_child function by introducing max_tokens, reasoning_config, and prefill_messages parameters from the parent agent. This allows for more flexible configuration of child agents, improving their operational capabilities.
Previously, when a session expired (idle/daily reset), the memory flush
ran synchronously inside get_or_create_session — blocking the user's
message for 10-60s while an LLM call saved memories.
Now a background watcher task (_session_expiry_watcher) runs every 5 min,
detects expired sessions, and flushes memories proactively in a thread
pool. By the time the user sends their next message, memories are
already saved and the response is immediate.
Changes:
- Add _is_session_expired(entry) to SessionStore — works from entry
alone without needing a SessionSource
- Add _pre_flushed_sessions set to track already-flushed sessions
- Remove sync _on_auto_reset callback from get_or_create_session
- Refactor flush into _flush_memories_for_session (sync worker) +
_async_flush_memories (thread pool wrapper)
- Add _session_expiry_watcher background task, started in start()
- Simplify /reset command to use shared fire-and-forget flush
- Add 10 tests for expiry detection, callback removal, tracking
Reduces token usage and latency for most tasks by defaulting to
medium reasoning effort instead of xhigh. Users can still override
via config or CLI flag. Updates code, tests, example config, and docs.
When running under systemd, the gateway could enter restart loops in two
scenarios:
1. The previous gateway process hasn't fully exited when systemd starts
a new one, causing 'Gateway already running (PID ...)' → exit 1 →
restart → same error → infinite loop.
2. The interactive CLI exits immediately in non-TTY mode, and systemd
keeps restarting it.
Changes:
- Add --replace flag to 'hermes gateway run' that gracefully kills any
existing gateway instance (SIGTERM → wait 10s → SIGKILL) before
starting, preventing the PID-lock deadlock.
- Update the generated systemd unit template to use --replace by default,
add ExecStop for clean shutdown, set KillMode=mixed and
TimeoutStopSec=15 for proper process management.
- Existing behavior (without --replace) is unchanged: still prints the
error message and exits, now also mentioning the --replace option.
Fixes#576
Eliminated the model parameter from the AIAgent class initialization, streamlining the constructor and ensuring consistent behavior across agent instances. This change aligns with recent updates to the task delegation logic.
Added functionality to detect the appropriate Z.AI endpoint based on the provided API key, accommodating different billing plans and regions. The setup process now probes available endpoints and updates the configuration accordingly, enhancing user experience and reducing potential billing errors. Updated the setup model provider function to integrate this new detection logic.
Eliminated the model parameter from the delegate_task function and its associated schema, defaulting to None for subagent calls. This change simplifies the function signature and enforces consistent behavior across task delegation.
Added logic to manage multiple compression attempts for large payloads and context length errors. Introduced limits on compression attempts to prevent infinite retries, with appropriate logging and error handling. This ensures better resilience and user feedback when facing compression issues during API calls.
Updated the default model version from "anthropic/claude-sonnet-4-20250514" to "anthropic/claude-sonnet-4.6" across multiple files including AGENTS.md, batch_runner.py, mini_swe_runner.py, and run_agent.py for consistency and to reflect the latest model improvements.
_make_cli() did not clear HERMES_MAX_ITERATIONS env var, so tests
failed in CI where the var was set externally. Also, default max_turns
changed from 60 to 90 in 0a82396 but tests were not updated.
- Clear HERMES_MAX_ITERATIONS in _make_cli() for proper isolation
- Add env_overrides parameter for tests that need specific env values
- Update hardcoded 60 assertions to 90 to match new default
- Simplify test_env_var_max_turns using env_overrides
Subagent tool calls now count toward the same session-wide iteration
limit as the parent agent. Previously, each subagent had its own
independent counter, so a parent with max_iterations=60 could spawn
3 subagents each doing 50 calls = 150 total tool calls unmetered.
Changes:
- IterationBudget: thread-safe shared counter (run_agent.py)
- consume(): try to use one iteration, returns False if exhausted
- refund(): give back one iteration (for execute_code turns)
- Thread-safe via Lock (subagents run in ThreadPoolExecutor)
- Parent creates the budget, children inherit it via delegate_tool.py
- execute_code turns are refunded (don't count against budget)
- Default raised from 60 → 90 to account for shared consumption
- Per-child cap (50) still applies as a safety valve
The per-child max_iterations (default 50) remains as a per-child
ceiling, but the shared budget is the hard session-wide limit.
A child stops at whichever comes first.
Enhance message compression by adding a method to clean up orphaned tool-call and tool-result pairs. This ensures that the API receives well-formed messages, preventing errors related to mismatched IDs. The new functionality includes removing orphaned results and adding stub results for missing calls, improving overall message integrity during compression.
Check how many commits behind origin/main the local repo is and
display a warning in the welcome banner:
⚠ 12 commits behind — run hermes update to update
- git fetch cached for 6 hours (avoids repeated network calls)
- Falls back gracefully if offline or not a git repo
- Never breaks the banner — all errors silently caught
- tools_config.py: Add 'Local Browser' as first provider option
(no API keys needed, same npm install for agent-browser)
- setup.py: Show 'Browser Automation (local)' when agent-browser
CLI is found but no Browserbase key is set
- config.py: Mark BROWSERBASE_* descriptions as optional
- status.py: Note that local browser works without Browserbase
When Telegram's MarkdownV2 parser rejects a message, the send() fallback
was sending the already-escaped text as plain text. This caused users to
see raw backslashes before every special character (periods, dashes,
parentheses, etc.) — e.g. 'sentence\.' or '\-\-auto\-approve'.
Changes:
- Add _strip_mdv2() to reverse MarkdownV2 escaping for clean plaintext
- Use stripped text in the send() fallback path instead of raw escaped chunk
- Add logging when the MDV2 fallback is triggered for diagnostics
- Add logger to telegram.py (was missing)
The edit_message() fallback already correctly used the original content;
this brings send() in line with that behavior.
Add local browser mode as an automatic fallback when Browserbase
credentials are not configured. Uses the same agent-browser CLI with
--session (local Chromium) instead of --cdp (cloud Browserbase).
The agent-facing API is completely unchanged — all 10 browser_* tools
produce identical output in both modes. Auto-detection:
- BROWSERBASE_API_KEY set → cloud mode (existing behavior)
- No key → local mode (new, free, headless Chromium)
Changes:
- _is_local_mode(): auto-detect based on env vars
- _create_local_session(): lightweight session (no API call)
- _get_session_info(): branches on local vs cloud
- _run_browser_command(): --session in local, --cdp in cloud
- check_browser_requirements(): only needs agent-browser CLI in local mode
- _emergency_cleanup: CLI close in local, API release in cloud
- cleanup_browser/browser_close: skip BB API calls in local mode
- Registry: removed requires_env — check_fn handles both modes
Setup for local mode:
npm install -g agent-browser
agent-browser install # downloads Chromium
agent-browser install --with-deps # also installs system libs (Docker/Debian)
Closes#374 (Phase 1)
Add a 'platforms' field to SKILL.md frontmatter that restricts skills
to specific operating systems. Skills with platforms: [macos] only
appear in the system prompt, skills_list(), and slash commands on macOS.
Skills without the field load everywhere (backward compatible).
Implementation:
- skill_matches_platform() in tools/skills_tool.py — core filter
- Wired into all 3 discovery paths: prompt_builder.py, skills_tool.py,
skill_commands.py
- 28 new tests across 3 test files
New bundled Apple/macOS skills (all platforms: [macos]):
- imessage — Send/receive iMessages via imsg CLI
- apple-reminders — Manage Reminders via remindctl CLI
- apple-notes — Manage Notes via memo CLI
- findmy — Track devices/AirTags via AppleScript + screen capture
Docs updated: CONTRIBUTING.md, AGENTS.md, creating-skills.md,
skills.md (user guide)
The previous 'get_env_value' in dir() check always evaluated to False
(dir() returns local scope, not module scope), making the left branch
dead code. Simplified to just os.getenv() which was the fallback anyway.
Authored by areu01or00. Adds timezone support via hermes_time.now() helper
with IANA timezone resolution (HERMES_TIMEZONE env → config.yaml → server-local).
Updates system prompt timestamp, cron scheduling, and execute_code sandbox TZ
injection. Includes config migration (v4→v5) and comprehensive test coverage.
uv pip install requires a virtual environment by default. When hermes
is installed system-wide or via pipx, the setup wizard's SDK installs
(daytona, swe-rex[modal], tinker-atropos) fail with 'No virtual
environment found'. Fix by passing --python sys.executable to uv,
which targets the correct Python regardless of venv state.
Also show the actual error message on install failure so users can
debug.
config['model'] can be a dict (old format: {default, base_url, provider})
or a string (new format). The setup wizard was showing the raw dict in
'Keep current' and 'Model set to' messages. Now extracts the model name
from either format.
Both 'hermes tools' and 'hermes setup tools' now use the same unified
flow in tools_config.py:
1. Select platform (CLI, Telegram, Discord, etc.)
2. Toggle all 18 toolsets on/off in checklist
3. Newly enabled tools that need API keys → provider-aware config
(e.g., TTS shows Edge/OpenAI/ElevenLabs picker)
4. Already-configured tools that stay enabled → silent, no prompts
5. Menu option: 'Reconfigure an existing tool' for updating
providers or API keys on tools that are already set up
Key changes:
- Move TOOL_CATEGORIES, provider config, and post-setup hooks from
setup.py to tools_config.py
- Replace flat _check_and_prompt_requirements() with provider-aware
_configure_toolset() that uses TOOL_CATEGORIES
- Add _reconfigure_tool() flow for updating existing configs
- setup.py's setup_tools() now delegates to tools_command()
- tools_command() menu adds 'Reconfigure' option alongside platforms
- Only prompt for API keys on tools that are NEWLY toggled on AND
don't already have keys configured
No breaking changes. All 2013 tests pass.
simple_term_menu miscalculates string widths when labels contain
ANSI escape codes (from color()) or em dashes, causing duplicated
and garbled lines on arrow key navigation.
Replace color() status indicators with plain text [configured]/[active]
and em dashes with regular dashes in all prompt_choice/prompt_checklist
labels.
Restructure the monolithic hermes setup wizard into independently-runnable
sections with a category-first tool configuration experience.
Changes:
- Break setup into 5 sections: model, terminal, gateway, tools, agent
- Each section is a standalone function, runnable individually via
'hermes setup model', 'hermes setup terminal', etc.
- Returning users get a menu: Quick Setup / Full Setup / individual sections
- First-time users get a guided walkthrough of all sections
Tool Configuration UX overhaul:
- Replace flat API key checklist with category-first approach
- Show tool types (TTS, Web Search, Image Gen, etc.) as top-level items
- Within each category, let users pick a provider:
- TTS: Microsoft Edge (Free), OpenAI, ElevenLabs
- Web: Firecrawl Cloud, Firecrawl Self-Hosted
- Image Gen: FAL.ai
- Browser: Browserbase
- Smart Home: Home Assistant
- RL Training: Tinker/Atropos
- GitHub: Personal Access Token
- Shows configured status on each tool and provider
- Only prompts for API keys after provider selection
Also:
- Add section argument to setup argparse parser in main.py
- Update summary to show new section commands
- Add self-hosted Firecrawl and Home Assistant to tool setup
- All 2013 tests pass
When `fetch_nous_models()` fails silently during setup, the model
selection falls through to the OpenRouter static list. Users then pick
models in OpenRouter format (e.g. `anthropic/claude-opus-4.6`) which
the Nous inference API rejects with a 400 "missing model" error.
Add an explicit `elif selected_provider == "nous"` branch that prompts
for manual model entry instead of falling through to the generic
OpenRouter fallback.
Adds eval-only benchmark for YC-Bench (collinear-ai/yc-bench), a
deterministic long-horizon benchmark where the agent acts as CEO of an
AI startup over a simulated 1-3 year run.
Key design decisions verified against the official yc-bench repo:
- Uses 'sim init' (NOT 'yc-bench run') to avoid starting a competing
built-in agent loop
- Correct DB table names: 'companies' and 'sim_events'
- Correct 4 domains: research, inference, data_environment, training
- Penalty values are preset-dependent (not hardcoded in system prompt)
- Sequential evaluation (each run is 100-500 turns)
- Follows TerminalBench2 patterns: KeyboardInterrupt handling,
cleanup_all_environments(), tqdm logging handler, streaming JSONL
yc-bench added as optional dependency: pip install hermes-agent[yc-bench]
Closes#340
These direct providers don't return cost in API responses and their
per-token pricing isn't readily available externally. Treat as local
models with zero cost so they appear in /insights without fake estimates.
When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.
Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.
Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed
Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.
Adds DEFAULT_CONTEXT_LENGTHS entries for kimi-k2.5 (262144), kimi-k2-thinking
(262144), kimi-k2-turbo-preview (262144), kimi-k2-0905-preview (131072),
MiniMax-M2.5/M2.5-highspeed/M2.1 (204800), and glm-4.5/4.5-flash (131072).
Avoids unnecessary 2M-token probe on first use with direct providers.