Previously, all project context files (AGENTS.md, .cursorrules, .hermes.md)
were loaded and concatenated into the system prompt. This bloated the prompt
with potentially redundant or conflicting instructions.
Now only ONE project context type is loaded, using priority order:
1. .hermes.md / HERMES.md (walk to git root)
2. AGENTS.md / agents.md (recursive directory walk)
3. CLAUDE.md / claude.md (cwd only, NEW)
4. .cursorrules / .cursor/rules/*.mdc (cwd only)
SOUL.md from HERMES_HOME remains independent and always loads.
Also adds CLAUDE.md as a recognized context file format, matching the
convention popularized by Claude Code.
Refactored the monolithic function into four focused helpers:
_load_hermes_md, _load_agents_md, _load_claude_md, _load_cursorrules.
Tests: replaced 1 coexistence test with 10 new tests covering priority
ordering, CLAUDE.md loading, case sensitivity, injection blocking.
- Introduced a mechanism to mute output after the main response is delivered, ensuring that subsequent tool calls run without cluttering the CLI.
- Redirected stdout to devnull during the review agent's execution to prevent any print statements from interfering with the main CLI display.
- Added a new attribute `_mute_post_response` to manage output suppression effectively.
- Added a user bar separator for improved visual clarity when displaying pasted text and user input in the HermesCLI.
- Ensured consistent formatting for both multi-line and single-line user inputs, enhancing the overall user experience in the command-line interface.
These changes contribute to a more organized and visually appealing output during interactions.
Same bug as opencode-zen/go — alibaba fell through to the OpenRouter
model list instead of using _setup_provider_model_selection() which
probes the provider's own /models endpoint.
All user-selectable providers now have correct model selection routing.
After selecting OpenCode Zen or Go as provider in hermes setup, the
model selection page showed OpenRouter models because these providers
weren't in the list that routes to _setup_provider_model_selection().
They fell through to the else branch which shows the OpenRouter catalog.
Users ended up with an OpenCode API key but an OpenRouter model name,
causing 'Provider resolver returned an empty API key' on first use.
Fix: add opencode-zen and opencode-go to the provider list that uses
_setup_provider_model_selection() for live /models detection.
The top-level 'toolsets' key in config.yaml was never read at runtime.
Tool selection uses platform_toolsets (per-platform) or the --toolsets
CLI flag. The key existed in load_cli_config() defaults and the example
config as 'toolsets: [all]', misleading users into thinking it
controlled tool availability.
- Remove from load_cli_config() hardcoded defaults
- Remove from hermes config show output
- Replace in cli-config.yaml.example with deprecation note pointing
to platform_toolsets and hermes tools
Two bugs in the save/load roundtrip for platform_toolsets:
1. _save_platform_tools preserved composite toolset entries (hermes-cli,
hermes-telegram, etc.) because they weren't in configurable_keys.
These composites include ALL _HERMES_CORE_TOOLS, so having hermes-cli
in the saved list alongside individual keys negated any disables —
the subset check always found the disabled toolset's tools via the
composite entry.
Fix: also filter out known TOOLSETS keys from preserved entries. Only
truly unknown entries (MCP server names, custom entries) are kept.
2. _get_platform_tools used reverse subset inference to determine which
configurable toolsets were enabled. This is inherently broken when
tools appear in multiple toolsets (e.g. HA tools in both the
homeassistant toolset and _HERMES_CORE_TOOLS).
Fix: when the saved list contains explicit configurable keys (meaning
the user has configured this platform), use direct membership instead
of subset inference. The fallback path still handles legacy configs
that only have a composite entry like hermes-cli.
Mirrors the CLI fix for the gateway /model handler. When the user is on
a custom provider (provider=custom, localhost, or 127.0.0.1 endpoint),
/model <name> no longer tries to auto-detect a provider switch.
Previously, typing /model openrouter/nvidia/nemotron:free on Telegram
while on a localhost endpoint would silently accept the model name on
the local server — auto-detection failed to match the free model, so
the provider stayed as custom with the localhost base_url. The user saw
'Model changed' but requests still went to localhost, which doesn't
serve that model.
Now shows the endpoint URL and provider:model syntax tip, matching
the CLI behavior.
Cherry-picked from PR #2201 by @Gutslabs.
session_search resolved hits to parent/root sessions but only excluded
the exact current_session_id. If the active session was a child
continuation (compression/delegation), its parent could still appear
as a 'past' conversation result.
Fix: resolve current_session_id to its lineage root before filtering,
so the entire active lineage (parent and children) is excluded.
When an API call fails, the error output now shows the provider name,
model, and endpoint URL so users can immediately identify which service
rejected their request. Auth errors (401/403) get actionable guidance:
check key validity, model access, and OpenRouter credits link.
Before: 'API call failed (attempt 1/3): PermissionDeniedError'
After: 'API call failed (attempt 1/3): PermissionDeniedError
Provider: openrouter Model: anthropic/claude-sonnet-4
Endpoint: https://openrouter.ai/api/v1
Your API key was rejected by the provider. Check:
• Is the key valid? Run: hermes setup
• Does your account have access to anthropic/claude-sonnet-4?
• Check credits: https://openrouter.ai/settings/credits'
- Added support for true-color ANSI escape codes in the HermesCLI to enhance the visual appearance of streamed content.
- Introduced a fallback mechanism for text color in case of errors while retrieving the color from the active skin.
- Updated the output formatting to include the new text color in both line emissions and buffer flushing.
These changes improve the user experience by ensuring consistent and visually appealing text output in the command-line interface.
Remove the memory and skill nudges that were appended directly to user
messages, causing backward-looking system instructions to compete with
forward-looking user tasks. Found in 43% of user messages across 15
sessions, with confirmed cases of the agent spending tool calls on
nudge responses before starting the user's actual request.
Replace with a background review agent that runs AFTER the main agent
finishes responding:
- Spawns a background thread with a snapshot of the conversation
- Uses the main model (not auxiliary) for high-precision memory/skill work
- Only has memory + skill_manage tools (5 iteration budget)
- Shares the memory store for direct writes
- Never modifies the main conversation history
- Never competes with the user's task for model attention
- Zero latency impact (runs after response is delivered)
- Same token cost (processes the same context, just on a separate track)
The trigger conditions are unchanged (every 10 user turns for memory,
after 10+ tool iterations for skills). Only the execution path changes:
from inline injection to background fork.
Closes#2227.
Co-authored-by: Test <test@test.com>
- Changed the ANSI escape code for gold color in cli.py and banner.py to use true-color format (#FFD700) for better visual consistency.
- Enhanced the _on_tool_progress method in HermesCLI to update the TUI spinner with tool execution status, improving user feedback during operations.
These changes improve the visual representation and user experience in the command-line interface.
Co-authored-by: Test <test@test.com>
Remove the [Files already read — do NOT re-read these] user message
that was injected into the conversation after context compression.
This message used role='user' for system-generated content, creating
a fake user turn that confused models about conversation state and
could contribute to task-redo behavior.
The file_tools.py read tracker (warn on 3rd consecutive read, block
on 4th+) already handles re-read prevention inline without injecting
synthetic messages.
Closes#2224.
Co-authored-by: Test <test@test.com>
In Docker/systemd/piped environments, the KawaiiSpinner animation
generates ~500 log lines per tool call. Now checks isatty() and
falls back to clean [tool]/[done] log lines in non-TTY contexts.
Interactive CLI behavior unchanged.
Based on work by 42-evey in PR #2203.
The official international DashScope endpoint uses dashscope-intl.aliyuncs.com
(per Alibaba docs), which the substring match on dashscope.aliyuncs.com misses
because of the hyphenated prefix.
Replace asyncio.run() with thread-local persistent event loops for
worker threads (e.g., delegate_task's ThreadPoolExecutor). asyncio.run()
creates and closes a fresh loop on every call, leaving cached
httpx/AsyncOpenAI clients bound to a dead loop — causing 'Event loop is
closed' errors during GC when parallel subagents clean up connections.
The fix mirrors the main thread's _get_tool_loop() pattern but uses
threading.local() so each worker thread gets its own long-lived loop,
avoiding both cross-thread contention and the create-destroy lifecycle.
Added 4 regression tests covering worker loop persistence, reuse,
per-thread isolation, and separation from the main thread's loop.
If a tool_calls list contains a None entry (from malformed API response,
compression artifact, or corrupt session replay), convert_messages_to_anthropic
crashes with AttributeError: 'NoneType' object has no attribute 'get'.
Skip None and non-dict entries in the tool_calls iteration. Found via
chaos/fuzz testing with mixed valid/invalid tool_call entries.
Custom endpoint users (DashScope/Alibaba, Z.AI, Kimi, DeepSeek, etc.)
get wrong context lengths because their provider resolves as "openrouter"
or "custom", skipping the models.dev lookup entirely. For example,
qwen3.5-plus on DashScope falls to the generic "qwen" hardcoded default
(131K) instead of the correct 1M.
Add _infer_provider_from_url() that maps known API hostnames to their
models.dev provider IDs. When the explicit provider is generic
(openrouter/custom/empty), infer from the base URL before the models.dev
lookup. This resolves context lengths correctly for DashScope, Z.AI,
Kimi, MiniMax, DeepSeek, and Nous endpoints without requiring users to
manually set context_length in config.
Also refactors _is_known_provider_base_url() to use the same URL mapping,
removing the duplicated hostname list.
When the model returns multiple tool calls, run_agent.py executes them
concurrently in a ThreadPoolExecutor. Each thread called _run_async()
which used a shared persistent event loop (_get_tool_loop()). If two
async tools (like web_extract) ran in parallel, the second thread would
hit 'This event loop is already running' on the shared loop.
Fix: detect worker threads (not main thread) and use asyncio.run() with
a per-thread fresh loop instead of the shared persistent one. The shared
loop is still used for the main thread (CLI sequential path) to keep
cached async clients (httpx/AsyncOpenAI) alive.
Co-authored-by: Test <test@test.com>
- Convert ~~text~~ to ~text~ (MarkdownV2 strikethrough)
- Protect ||text|| from pipe escaping (MarkdownV2 spoiler)
- Preserve > at line start as blockquote instead of escaping it
- Update _strip_mdv2() to strip ~strikethrough~ and ||spoiler|| markers
- Add tests covering new formatting paths and edge cases
- Updated _stream_delta method in HermesCLI to handle None values, flushing the stream and resetting state for clean tool execution.
- Enhanced quiet mode handling in AIAgent to ensure proper display closure before tool execution, preventing display issues with intermediate streamed content.
These changes improve the robustness of the streaming functionality and ensure a smoother user experience during tool interactions.
Cherry-picked from PR #2146 by @crazywriter1. Fixes#2104.
asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.
Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
each async client, discard stale entries if loop is closed
6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.
Co-authored-by: Test <test@test.com>
Adds /queue <prompt> (alias /q) that queues a message for the next
turn while the agent is busy, without interrupting the current run.
- CLI: /queue <prompt> puts it in _pending_input for the next turn
- Gateway: /queue <prompt> creates a pending MessageEvent on the
adapter, picked up after the current agent run finishes
- Enter still interrupts as usual (no behavior change)
- /queue with no prompt shows usage
- /queue when agent is idle tells user to just type normally
Co-authored-by: Test <test@test.com>
Salvaged from PR #2162 by @Zindar. Reply prefix changes excluded (already
on main via #1756 configurable prefix).
Bridge improvements (bridge.js):
- Download incoming images to ~/.hermes/image_cache/ via downloadMediaMessage
so the agent can actually see user-sent photos
- Add getMessage callback required for Baileys 7.x E2EE session
re-establishment (without it, some messages arrive as null)
- Build LID→phone reverse map for allowlist resolution (WhatsApp LID format)
- Add placeholder body for media without caption: [image received]
- Bind express to 127.0.0.1 instead of 0.0.0.0 for security
- Use 127.0.0.1 consistently throughout (more reliable than localhost)
Adapter improvements (whatsapp.py):
- Detect and reuse already-running bridge (only if status=connected)
- Handle local file paths from bridge-cached images in _build_message_event
- Don't kill external bridges on disconnect
- Use 127.0.0.1 throughout for consistency with bridge binding
Fix vs original PR: bridge reuse now checks status=connected, not just
HTTP 200. A disconnected bridge gets restarted instead of reused.
Co-authored-by: Zindar <zindar@users.noreply.github.com>
Cherry-picked from PR #2169 by @0xbyt4.
1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b)
2. Fuzzy match: remove reverse direction that made claude-sonnet-4
resolve to 1M instead of 200K
3. _has_content_after_think_block: reuse _strip_think_blocks() to
handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD)
4. models.dev lookup: elif→if so nous provider also queries models.dev
5. Disk cache fallback: use 5-min TTL instead of full hour so network
is retried soon
6. Delegate build: wrap child construction in try/finally so
_last_resolved_tool_names is always restored on exception
Matrix, Mattermost, Home Assistant, and DingTalk were missing from the
platform_map in both cron/scheduler.py and tools/send_message_tool.py,
causing delivery to those platforms to silently fail.
Also updates the cronjob tool schema description to list all available
delivery targets so the model knows its options.
Two fixes for Telegram/gateway-specific bugs:
1. Anthropic adapter: strip orphaned tool_result blocks (mirror of
existing tool_use stripping). Context compression or session
truncation can remove an assistant message containing a tool_use
while leaving the subsequent tool_result intact. Anthropic rejects
these with a 400: 'unexpected tool_use_id found in tool_result
blocks'. The adapter now collects all tool_use IDs and filters out
any tool_result blocks referencing IDs not in that set.
2. Gateway: /reset and /new now bypass the running-agent guard (like
/status already does). Previously, sending /reset while an agent
was running caused the raw text to be queued and later fed back as
a user message with the same broken history — replaying the
corrupted session instead of resetting it. Now the running agent is
interrupted, pending messages are cleared, and the reset command
dispatches immediately.
Tests updated: existing tests now include proper tool_use→tool_result
pairs; two new tests cover orphaned tool_result stripping.
Co-authored-by: Test <test@test.com>
- quickstart.md: mention context length prompt for custom endpoints,
link to configuration docs, add Ollama to provider table
- faq.md: rewrite local models section with hermes model flow and
context length prompt example, add Ollama num_ctx tip, expand
context-length-exceeded troubleshooting with detection override
options and config.yaml examples
Co-authored-by: Test <test@test.com>
* feat: context pressure warnings for CLI and gateway
User-facing notifications as context approaches the compaction threshold.
Warnings fire at 60% and 85% of the way to compaction — relative to
the configured compression threshold, not the raw context window.
CLI: Formatted line with a progress bar showing distance to compaction.
Cyan at 60% (approaching), bold yellow at 85% (imminent).
◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction 100k threshold (50%) · approaching compaction
⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction 100k threshold (50%) · compaction imminent
Gateway: Plain-text notification sent to the user's chat via the new
status_callback mechanism (asyncio.run_coroutine_threadsafe bridge,
same pattern as step_callback).
Does NOT inject into the message stream. The LLM never sees these
warnings. Flags reset after each compaction cycle.
Files changed:
- agent/display.py — format_context_pressure(), format_context_pressure_gateway()
- run_agent.py — status_callback param, _context_50/70_warned flags,
_emit_context_pressure(), flag reset in _compress_context()
- gateway/run.py — _status_callback_sync bridge, wired to AIAgent
- tests/test_context_pressure.py — 23 tests
* Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf
---------
Co-authored-by: Test <test@test.com>
Matrix is a supported gateway platform but was missing from the
cron scheduler's delivery platform_map, causing cron job results
to silently fail delivery when targeting Matrix rooms.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>