Two bugs in the auxiliary provider auto-detection chain:
1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
returned any stored token without checking expiry, preventing fallback
to working providers. Now decodes JWT exp claim and returns None for
expired tokens.
2. Auxiliary Anthropic client missing OAuth identity transforms:
_AnthropicCompletionsAdapter always called build_anthropic_kwargs with
is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
tokens via _is_oauth_token() and propagates the flag through the
adapter chain.
Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).
redact_sensitive_text() now returns early for None and coerces other
non-string values to str before applying regex-based redaction,
preventing TypeErrors in logging/tool-output paths.
Cherry-picked from PR #2369 by aydnOktay.
On the native Anthropic Messages API path, convert_messages_to_anthropic()
moves top-level cache_control on role:tool messages inside the tool_result
block. On OpenRouter (chat_completions), no such conversion happens — the
unexpected top-level field causes a silent hang on the second tool call.
Add native_anthropic parameter to _apply_cache_marker() and
apply_anthropic_cache_control(). When False (OpenRouter), role:tool messages
are skipped entirely. When True (native Anthropic), existing behaviour is
preserved.
Fixes#2362
When 'hermes update' stashes local changes and the restore hits
conflicts, the previous behavior silently ran 'git reset --hard HEAD'
to clean up. This could surprise users who didn't realize their
working tree was being nuked.
Now the conflict handler:
- Lists the specific conflicted files
- Reassures the user their stash is preserved
- Asks before resetting (interactive mode)
- Auto-resets in non-interactive mode (prompt_user=False)
- If declined, leaves the working tree as-is with guidance
* fix: prevent Anthropic token fallback leaking to third-party anthropic_messages providers
When provider is minimax/alibaba/etc and MINIMAX_API_KEY is not set,
the code fell back to resolve_anthropic_token() sending Anthropic OAuth
credentials to third-party endpoints, causing 401 errors.
Now only provider=="anthropic" triggers the fallback. Generalizes the
Alibaba-specific guard from #1739 to all non-Anthropic providers.
* fix: set provider='anthropic' in credential refresh tests
Follow-up for cherry-picked PR #2383 — existing tests didn't set
agent.provider, which the new guard requires to allow Anthropic
token refresh.
---------
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
The gateway created a fresh AIAgent per message, rebuilding the system
prompt (including memory, skills, context files) every turn. This broke
prompt prefix caching — providers like Anthropic charge ~10x more for
uncached prefixes.
Now caches AIAgent instances per session_key with a config signature.
The cached agent is reused across messages in the same session,
preserving the frozen system prompt and tool schemas. Cache is
invalidated when:
- Config changes (model, provider, toolsets, reasoning, ephemeral
prompt) — detected via signature mismatch
- /new, /reset, /clear — explicit session reset
- /model — global model change clears all cached agents
- /reasoning — global reasoning change clears all cached agents
Per-message state (callbacks, stream consumers, progress queues) is
set on the agent instance before each run_conversation() call.
This matches CLI behavior where a single AIAgent lives across all turns
in a session, with _cached_system_prompt built once and reused.
When `hermes update` stashes local changes and the subsequent
`git stash apply` fails or leaves unmerged files, the conflict markers
(<<<<<<< etc.) were left in the working tree, making Hermes unrunnable
until manually cleaned up.
Now the update command runs `git reset --hard HEAD` to restore a clean
working tree before exiting, and also detects unmerged files even when
git stash apply reports success.
Closes#2348
Add @file:path, @folder:dir, @diff, @staged, @git:N, and @url:
references that expand inline before the message reaches the LLM.
Supports line ranges (@file:main.py:10-50), token budget enforcement
(soft warn at 25%, hard block at 50%), and path sandboxing for gateway.
Core module from PR #2090 by @kshitijk4poor. CLI and gateway wiring
rewritten against current main. Fixed asyncio.run() crash when called
from inside a running event loop (gateway).
Closes#682.
Fixes#1803. send_image_file, send_document, and send_video were missing
message_thread_id forwarding, causing them to fail in Telegram forum/supergroups
where thread_id is required. send_voice already handled this correctly. Adds
metadata parameter + message_thread_id to all three methods, and adds tests
covering the thread_id forwarding path.
Based on PR #1749 by @erosika (reimplemented on current main).
Extracts three protected methods from run() so wrapper CLIs can extend
the TUI without overriding the entire method:
- _get_extra_tui_widgets(): inject widgets between spacer and status bar
- _register_extra_tui_keybindings(kb, input_area): add keybindings
- _build_tui_layout_children(**widgets): full control over ordering
Default implementations reproduce existing layout exactly. The inline
HSplit in run() now delegates to _build_tui_layout_children().
5 tests covering defaults, widget insertion position, and keybinding
registration.
When using Alibaba (DashScope) with an anthropic-compatible endpoint,
model names like qwen3.5-plus were being normalized to qwen3-5-plus.
Alibaba's API expects the dot. Added preserve_dots parameter to
normalize_model_name() and build_anthropic_kwargs().
Also fixed 401 auth: when provider is alibaba or base_url contains
dashscope/aliyuncs, use only the resolved API key (DASHSCOPE_API_KEY).
Never fall back to resolve_anthropic_token(), and skip Anthropic
credential refresh for DashScope endpoints.
Cherry-picked from PR #1748 by crazywriter1. Fixes#1739.
- Add resolve_config_path(): checks $HERMES_HOME/honcho.json first,
falls back to ~/.honcho/config.json. Enables isolated Hermes instances
with independent Honcho credentials and settings.
- Update CLI and doctor to use resolved path instead of hardcoded global.
- Change default session_strategy from per-session to per-directory.
Part 1 of #1962 by @erosika.
Bare strings like "image", "audio", "document" were appended to
media_types, but downstream run.py checks mtype.startswith("image/")
and mtype.startswith("audio/"), which never matched. This caused all
Mattermost file attachments to be silently dropped from vision/STT
processing. Use the actual MIME type from file_info instead.
When streaming is enabled, the base adapter receives None from
_handle_message (already_sent=True) and cannot run auto-TTS for
voice input. The runner was unconditionally skipping voice input
TTS assuming the base adapter would handle it.
Now the runner takes over TTS responsibility when streaming has
already delivered the text response, so voice channel playback
works with both streaming on and off.
Streaming off behavior is unchanged (default already_sent=False
preserves the original code path exactly).
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
Cron deliveries were mirrored into the target gateway session as
assistant-role messages, causing consecutive assistant messages that
violate message alternation (issue #2221).
Instead of fixing the role, remove the mirror injection entirely.
Cron outputs already live in their own cron session and don't belong
in the interactive conversation history.
Delivered messages are now wrapped with a header (task name) and a
footer noting the agent cannot see or respond to the message, so
users have clear context about what they're reading.
Closes#2221
A single Telegram 409 Conflict from getUpdates permanently killed
Telegram polling with no recovery possible (retryable=False on
first occurrence). This is too aggressive for production use with
process supervisors.
Transient 409s are expected during:
- --replace handoffs where the old long-poll session lingers on
Telegram servers for a few seconds after SIGTERM
- systemd Restart=on-failure respawns that overlap with the dying
instance cleanup
Now _handle_polling_conflict() retries up to 3 times with a
10-second delay between attempts. The 30-second total retry window
lets stale server-side sessions expire. If all retries fail, the
error is still marked as permanently fatal — preserving the original
protection against genuine dual-instance conflicts.
Tests updated: split the single conflict test into two — one verifying
retry on transient conflict, one verifying fatal after exhausted
retries.
Closes#2296
Previously, all project context files (AGENTS.md, .cursorrules, .hermes.md)
were loaded and concatenated into the system prompt. This bloated the prompt
with potentially redundant or conflicting instructions.
Now only ONE project context type is loaded, using priority order:
1. .hermes.md / HERMES.md (walk to git root)
2. AGENTS.md / agents.md (recursive directory walk)
3. CLAUDE.md / claude.md (cwd only, NEW)
4. .cursorrules / .cursor/rules/*.mdc (cwd only)
SOUL.md from HERMES_HOME remains independent and always loads.
Also adds CLAUDE.md as a recognized context file format, matching the
convention popularized by Claude Code.
Refactored the monolithic function into four focused helpers:
_load_hermes_md, _load_agents_md, _load_claude_md, _load_cursorrules.
Tests: replaced 1 coexistence test with 10 new tests covering priority
ordering, CLAUDE.md loading, case sensitivity, injection blocking.
Cherry-picked from PR #2201 by @Gutslabs.
session_search resolved hits to parent/root sessions but only excluded
the exact current_session_id. If the active session was a child
continuation (compression/delegation), its parent could still appear
as a 'past' conversation result.
Fix: resolve current_session_id to its lineage root before filtering,
so the entire active lineage (parent and children) is excluded.
Remove the [Files already read — do NOT re-read these] user message
that was injected into the conversation after context compression.
This message used role='user' for system-generated content, creating
a fake user turn that confused models about conversation state and
could contribute to task-redo behavior.
The file_tools.py read tracker (warn on 3rd consecutive read, block
on 4th+) already handles re-read prevention inline without injecting
synthetic messages.
Closes#2224.
Co-authored-by: Test <test@test.com>
Replace asyncio.run() with thread-local persistent event loops for
worker threads (e.g., delegate_task's ThreadPoolExecutor). asyncio.run()
creates and closes a fresh loop on every call, leaving cached
httpx/AsyncOpenAI clients bound to a dead loop — causing 'Event loop is
closed' errors during GC when parallel subagents clean up connections.
The fix mirrors the main thread's _get_tool_loop() pattern but uses
threading.local() so each worker thread gets its own long-lived loop,
avoiding both cross-thread contention and the create-destroy lifecycle.
Added 4 regression tests covering worker loop persistence, reuse,
per-thread isolation, and separation from the main thread's loop.
- Convert ~~text~~ to ~text~ (MarkdownV2 strikethrough)
- Protect ||text|| from pipe escaping (MarkdownV2 spoiler)
- Preserve > at line start as blockquote instead of escaping it
- Update _strip_mdv2() to strip ~strikethrough~ and ||spoiler|| markers
- Add tests covering new formatting paths and edge cases
Cherry-picked from PR #2146 by @crazywriter1. Fixes#2104.
asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.
Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
each async client, discard stale entries if loop is closed
6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.
Co-authored-by: Test <test@test.com>
Two fixes for Telegram/gateway-specific bugs:
1. Anthropic adapter: strip orphaned tool_result blocks (mirror of
existing tool_use stripping). Context compression or session
truncation can remove an assistant message containing a tool_use
while leaving the subsequent tool_result intact. Anthropic rejects
these with a 400: 'unexpected tool_use_id found in tool_result
blocks'. The adapter now collects all tool_use IDs and filters out
any tool_result blocks referencing IDs not in that set.
2. Gateway: /reset and /new now bypass the running-agent guard (like
/status already does). Previously, sending /reset while an agent
was running caused the raw text to be queued and later fed back as
a user message with the same broken history — replaying the
corrupted session instead of resetting it. Now the running agent is
interrupted, pending messages are cleared, and the reset command
dispatches immediately.
Tests updated: existing tests now include proper tool_use→tool_result
pairs; two new tests cover orphaned tool_result stripping.
Co-authored-by: Test <test@test.com>
* feat: context pressure warnings for CLI and gateway
User-facing notifications as context approaches the compaction threshold.
Warnings fire at 60% and 85% of the way to compaction — relative to
the configured compression threshold, not the raw context window.
CLI: Formatted line with a progress bar showing distance to compaction.
Cyan at 60% (approaching), bold yellow at 85% (imminent).
◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction 100k threshold (50%) · approaching compaction
⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction 100k threshold (50%) · compaction imminent
Gateway: Plain-text notification sent to the user's chat via the new
status_callback mechanism (asyncio.run_coroutine_threadsafe bridge,
same pattern as step_callback).
Does NOT inject into the message stream. The LLM never sees these
warnings. Flags reset after each compaction cycle.
Files changed:
- agent/display.py — format_context_pressure(), format_context_pressure_gateway()
- run_agent.py — status_callback param, _context_50/70_warned flags,
_emit_context_pressure(), flag reset in _compress_context()
- gateway/run.py — _status_callback_sync bridge, wired to AIAgent
- tests/test_context_pressure.py — 23 tests
* Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf
---------
Co-authored-by: Test <test@test.com>
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.
Key changes:
- New agent/models_dev.py: Fetches and caches the models.dev registry
(3800+ models across 100+ providers with per-provider context windows).
In-memory cache (1hr TTL) + disk cache for cold starts.
- Rewritten get_model_context_length() resolution chain:
0. Config override (model.context_length)
1. Custom providers per-model context_length
2. Persistent disk cache
3. Endpoint /models (local servers)
4. Anthropic /v1/models API (max_input_tokens, API-key only)
5. OpenRouter live API (existing, unchanged)
6. Nous suffix-match via OpenRouter (dot/dash normalization)
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. 128K fallback (was 2M)
- Provider-aware context: same model now correctly resolves to different
context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
128K on GitHub Copilot). Provider name flows through ContextCompressor.
- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
models.dev replaces the per-model hardcoding.
- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.
- hermes model: prompts for context_length when configuring custom
endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
per-model config.
- custom_providers schema extended with optional models dict for
per-model context_length (backward compatible).
- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
normalization. Handles all 15 current Nous models.
- Anthropic direct: queries /v1/models for max_input_tokens. Only works
with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
to models.dev for OAuth users.
Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md
Co-authored-by: Test <test@test.com>
* fix: preserve Ollama model:tag colons in context length detection
The colon-split logic in get_model_context_length() and
_query_local_context_length() assumed any colon meant provider:model
format (e.g. "local:my-model"). But Ollama uses model:tag format
(e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just
"27b" — which matches nothing, causing a fallback to the 2M token
probe tier.
Now only recognised provider prefixes (local, openrouter, anthropic,
etc.) are stripped. Ollama model:tag names pass through intact.
* fix: update claude-opus-4-6 and claude-sonnet-4-6 context length from 200K to 1M
Both models support 1,000,000 token context windows. The hardcoded defaults
were set before Anthropic expanded the context for the 4.6 generation.
Verified via models.dev and OpenRouter API data.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Co-authored-by: Test <test@test.com>
Cherry-picked from PR #2120 by @unclebumpy.
- from_env() now reads HONCHO_BASE_URL and enables Honcho when base_url
is set, even without an API key
- from_global_config() reads baseUrl from config root with
HONCHO_BASE_URL env var as fallback
- get_honcho_client() guard relaxed to allow base_url without api_key
for no-auth local instances
- Added HONCHO_BASE_URL to OPTIONAL_ENV_VARS registry
Result: Setting HONCHO_BASE_URL=http://localhost:8000 in ~/.hermes/.env
now correctly routes the Honcho client to a local instance.
When the user is on a custom provider (provider=custom, localhost, or
127.0.0.1 endpoint), /model <name> no longer tries to auto-detect a
provider switch. The model name changes on the current endpoint as-is.
To switch away from a custom endpoint, users must use explicit
provider:model syntax (e.g. /model openai-codex:gpt-5.2-codex).
A helpful tip is printed when changing models on a custom endpoint.
This prevents the confusing case where someone on LM Studio types
/model gpt-5.2-codex, the auto-detection tries to switch providers,
fails or partially succeeds, and requests still go to the old endpoint.
Also fixes the missing prompt_toolkit.auto_suggest mock stub in
test_cli_init.py (same issue already fixed in test_cli_new_session.py).
Follow-up to PR #2101 (InB4DevOps). Adds three missing context compressor
resets in reset_session_state():
- compression_count (displayed in status bar)
- last_total_tokens
- _context_probed (stale context-error flag)
Also fixes the test_cli_new_session.py prompt_toolkit mock (missing
auto_suggest stub) and adds a regression test for #2099 that verifies
all token counters and compressor state are zeroed on /new.
The colon-split logic in get_model_context_length() and
_query_local_context_length() assumed any colon meant provider:model
format (e.g. "local:my-model"). But Ollama uses model:tag format
(e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just
"27b" — which matches nothing, causing a fallback to the 2M token
probe tier.
Now only recognised provider prefixes (local, openrouter, anthropic,
etc.) are stripped. Ollama model:tag names pass through intact.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>