* feat: context pressure warnings for CLI and gateway
User-facing notifications as context approaches the compaction threshold.
Warnings fire at 60% and 85% of the way to compaction — relative to
the configured compression threshold, not the raw context window.
CLI: Formatted line with a progress bar showing distance to compaction.
Cyan at 60% (approaching), bold yellow at 85% (imminent).
◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction 100k threshold (50%) · approaching compaction
⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction 100k threshold (50%) · compaction imminent
Gateway: Plain-text notification sent to the user's chat via the new
status_callback mechanism (asyncio.run_coroutine_threadsafe bridge,
same pattern as step_callback).
Does NOT inject into the message stream. The LLM never sees these
warnings. Flags reset after each compaction cycle.
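A minimal sketch of the bridge, assuming an async send helper and the
gateway's running event loop (the factory wrapper here is illustrative,
not the literal implementation):

    import asyncio

    def make_status_callback_sync(loop: asyncio.AbstractEventLoop, send_status):
        """Bridge a worker-thread callback onto the gateway event loop,
        mirroring the step_callback pattern. `send_status` is an async
        function that delivers plain text to the user's chat."""
        def _status_callback_sync(text: str) -> None:
            # Thread-safe handoff: schedule the coroutine on the loop
            # that owns the platform connection.
            asyncio.run_coroutine_threadsafe(send_status(text), loop)
        return _status_callback_sync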
Files changed:
- agent/display.py — format_context_pressure(), format_context_pressure_gateway()
- run_agent.py — status_callback param, _context_60/85_warned flags,
_emit_context_pressure(), flag reset in _compress_context()
- gateway/run.py — _status_callback_sync bridge, wired to AIAgent
- tests/test_context_pressure.py — 23 tests
* Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf
---------
Co-authored-by: Test <test@test.com>
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.
Key changes:
- New agent/models_dev.py: Fetches and caches the models.dev registry
(3800+ models across 100+ providers with per-provider context windows).
In-memory cache (1hr TTL) + disk cache for cold starts.
- Rewritten get_model_context_length() resolution chain (sketched in
  code below the change list):
0. Config override (model.context_length)
1. Custom providers per-model context_length
2. Persistent disk cache
3. Endpoint /models (local servers)
4. Anthropic /v1/models API (max_input_tokens, API-key only)
5. OpenRouter live API (existing, unchanged)
6. Nous suffix-match via OpenRouter (dot/dash normalization)
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. 128K fallback (was 2M)
- Provider-aware context: same model now correctly resolves to different
context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
128K on GitHub Copilot). Provider name flows through ContextCompressor.
- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
models.dev replaces the per-model hardcoding.
- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.
- hermes model: prompts for context_length when configuring custom
endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
per-model config.
- custom_providers schema extended with optional models dict for
per-model context_length (backward compatible).
- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
normalization. Handles all 15 current Nous models.
- Anthropic direct: queries /v1/models for max_input_tokens. Only works
with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
to models.dev for OAuth users.
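A minimal sketch of the resolution chain's shape, assuming each step is
a callable returning an int or None (helper layout illustrative; steps
2-8 elided):

    def get_model_context_length(model: str, provider: str, cfg: dict) -> int:
        resolvers = (
            # 0: explicit config override always wins
            lambda: cfg.get("model", {}).get("context_length"),
            # 1: custom_providers per-model entry (schema from this change)
            lambda: cfg.get("custom_providers", {}).get(provider, {})
                       .get("models", {}).get(model, {}).get("context_length"),
            # 2-8: disk cache, endpoint /models, Anthropic /v1/models,
            #      OpenRouter, Nous suffix match, models.dev, thin defaults
        )
        for resolve in resolvers:
            value = resolve()
            if value:
                return value
        return 128_000  # step 9: conservative fallback (was 2M)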
Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md
Co-authored-by: Test <test@test.com>
Cron jobs run unattended with no user present. Previously the agent had
send_message and clarify tools available, which makes no sense — the
final response is auto-delivered, and there's nobody to ask questions to.
Changes:
- Disable messaging and clarify toolsets for cron agent sessions
- Update cron platform hint to emphasize autonomous execution: no user
present, cannot ask questions, must execute fully and make decisions
- Update cronjob tool schema description to match (remove stale
send_message guidance)
* fix: preserve Ollama model:tag colons in context length detection
The colon-split logic in get_model_context_length() and
_query_local_context_length() assumed any colon meant provider:model
format (e.g. "local:my-model"). But Ollama uses model:tag format
(e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just
"27b" — which matches nothing, causing a fallback to the 2M token
probe tier.
Now only recognised provider prefixes (local, openrouter, anthropic,
etc.) are stripped. Ollama model:tag names pass through intact.
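Roughly, as a sketch (the prefix set here is abbreviated):

    _PROVIDER_PREFIXES = {"local", "openrouter", "anthropic"}  # etc.

    def strip_provider_prefix(name: str) -> str:
        # Split on ':' only when the left side is a known provider;
        # Ollama model:tag names like "qwen3.5:27b" pass through.
        head, sep, rest = name.partition(":")
        if sep and head in _PROVIDER_PREFIXES:
            return rest
        return name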
* fix: update claude-opus-4-6 and claude-sonnet-4-6 context length from 200K to 1M
Both models support 1,000,000 token context windows. The hardcoded defaults
were set before Anthropic expanded the context for the 4.6 generation.
Verified via models.dev and OpenRouter API data.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Co-authored-by: Test <test@test.com>
Custom endpoints (LM Studio, Ollama, vLLM, llama.cpp) silently fall
back to 2M tokens when /v1/models doesn't include context_length.
Adds _query_local_context_length() which queries server-specific APIs:
- LM Studio: /api/v1/models (max_context_length + loaded instances)
- Ollama: /api/show (model_info + num_ctx parameters)
- llama.cpp: /props (n_ctx from default_generation_settings)
- vLLM: /v1/models/{model} (max_model_len)
Prefers loaded instance context over max (e.g., 122K loaded vs 1M max).
Results are cached via save_context_length() to avoid repeated queries.
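A hedged sketch of the per-server queries listed above, assuming a
plain HTTP client (exact response layouts vary by server version; the
LM Studio branch is omitted for brevity):

    import requests

    def query_local_context_length(base_url: str, model: str, server_type: str):
        try:
            if server_type == "llama.cpp":
                props = requests.get(f"{base_url}/props", timeout=5).json()
                return props.get("default_generation_settings", {}).get("n_ctx")
            if server_type == "vllm":
                info = requests.get(f"{base_url}/v1/models/{model}", timeout=5).json()
                return info.get("max_model_len")
            if server_type == "ollama":
                show = requests.post(f"{base_url}/api/show",
                                     json={"model": model}, timeout=5).json()
                # "parameters" is a Modelfile-style string, e.g. "num_ctx 8192"
                for line in (show.get("parameters") or "").splitlines():
                    if line.startswith("num_ctx"):
                        return int(line.split()[-1])
        except (requests.RequestException, ValueError):
            pass
        return None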
Also fixes detect_local_server_type() misidentifying LM Studio as
Ollama (LM Studio returns 200 for /api/tags with an error body).
When LM Studio has a model loaded with a custom context size (e.g.,
122K), prefer that over the model's max_context_length (e.g., 1M).
This makes the TUI status bar show the actual runtime context window.
Instead of defaulting to 2M for unknown local models, query the server
API for the real context length. Supports Ollama (/api/show), vLLM
(max_model_len), and LM Studio (/v1/models). Results are cached to
avoid repeated queries.
Closes #1911
- insights.py: Pre-compute SELECT queries as class constants instead of
f-string interpolation at runtime. _SESSION_COLS is now evaluated once
at class definition time.
- hermes_state.py: Add identifier quoting and whitelist validation for
ALTER TABLE column names in schema migrations.
- Add 4 tests verifying no injection vectors in SQL query construction.
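A sketch of the hardened migration path, with an illustrative whitelist
(the table and column names here are assumptions):

    import re
    import sqlite3

    _ALLOWED_COLUMNS = {"title", "archived_at"}  # illustrative whitelist

    def add_column(conn: sqlite3.Connection, column: str, col_type: str = "TEXT"):
        # Whitelist + identifier quoting: never interpolate raw input.
        if column not in _ALLOWED_COLUMNS or not re.fullmatch(r"\w+", column):
            raise ValueError(f"unexpected migration column: {column!r}")
        conn.execute(f'ALTER TABLE sessions ADD COLUMN "{column}" {col_type}')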
* fix: detect context length for custom model endpoints via fuzzy matching + config override
Custom model endpoints (non-OpenRouter, non-known-provider) were silently
falling back to 2M tokens when the model name didn't exactly match what the
endpoint's /v1/models reported. This happened because:
1. Endpoint metadata lookup used exact match only — model name mismatches
(e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss
2. Single-model servers (common for local inference) required exact name
match even though only one model was loaded
3. No user escape hatch to manually set context length
Changes:
- Add fuzzy matching for endpoint model metadata: single-model servers
use the only available model regardless of name; multi-model servers
try substring matching in both directions
- Add model.context_length config override (highest priority) so users
can explicitly set their model's context length in config.yaml
- Log an informative message when falling back to 2M probe, telling
users about the config override option
- Thread config_context_length through ContextCompressor and AIAgent init
Tests: 6 new tests covering fuzzy match, single-model fallback, config
override (including zero/None edge cases).
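The fuzzy matching described above, as a minimal sketch:

    def match_endpoint_model(requested: str, available: list[str]) -> str | None:
        if len(available) == 1:
            return available[0]          # single-model server: trust it
        req = requested.lower()
        for name in available:
            if req in name.lower() or name.lower() in req:
                return name              # substring match, both directions
        return None                      # caller falls back to probe + log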
* fix: auto-detect local model name and context length for local servers
Cherry-picked from PR #2043 by sudoingX.
- Auto-detect model name from local server's /v1/models when only one
model is loaded (no manual model name config needed)
- Add n_ctx_train and n_ctx to context length detection keys for llama.cpp
- Query llama.cpp /props endpoint for actual allocated context (not just
training context from GGUF metadata)
- Strip .gguf suffix from display in banner and status bar
- _auto_detect_local_model() in runtime_provider.py for CLI init
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
* fix: revert accidental summary_target_tokens change + add docs for context_length config
- Revert summary_target_tokens from 2500 back to 500 (accidental change
during patching)
- Add 'Context Length Detection' section to Custom & Self-Hosted docs
explaining model.context_length config override
---------
Co-authored-by: Test <test@test.com>
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
After #1675 removed ANTHROPIC_BASE_URL env var support, the Anthropic
provider base URL was hardcoded to https://api.anthropic.com. Now reads
model.base_url from config.yaml as an override, falling back to the
default when not set. Also applies to the auxiliary client.
Cherry-picked from PR #1949 by @rivercrab26.
Co-authored-by: rivercrab26 <rivercrab26@users.noreply.github.com>
_align_boundary_backward only checked messages[idx-1] to decide if
the compress-end boundary splits a tool_call/result group. When an
assistant issues 3+ parallel tool calls, their results span multiple
consecutive messages. If the boundary fell in the middle of that group,
the parent assistant was summarized away and orphaned tool results were
silently deleted by _sanitize_tool_pairs.
Now walks backward through all consecutive tool results to find the
parent assistant, then pulls the boundary before the entire group.
6 regression tests added in tests/test_compression_boundary.py.
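A sketch of the fixed walk (function signature and message shapes
assumed from the description; `end` is the first retained message):

    def align_boundary_backward(messages: list[dict], end: int) -> int:
        # If the first kept message is a tool result, the boundary splits
        # a tool_call/result group: walk over the whole consecutive run
        # so the parent assistant is retained along with its results.
        while end > 0 and messages[end].get("role") == "tool":
            end -= 1
        return end  # now points at the parent assistant (or unchanged)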
Co-authored-by: Guts <Gutslabs@users.noreply.github.com>
SOUL.md now loads in slot #1 of the system prompt, replacing the
hardcoded DEFAULT_AGENT_IDENTITY. This lets users fully customize
the agent's identity and personality by editing ~/.hermes/SOUL.md
without it conflicting with the built-in identity text.
When SOUL.md is loaded as identity, it's excluded from the context
files section to avoid appearing twice. When SOUL.md is missing,
empty, unreadable, or skip_context_files is set, the hardcoded
DEFAULT_AGENT_IDENTITY is used as a fallback.
The default SOUL.md (seeded on first run) already contains the full
Hermes personality, so existing installs are unaffected.
Co-authored-by: Test <test@test.com>
* fix: banner skill count now respects disabled skills and platform filtering
The banner's get_available_skills() was doing a raw rglob scan of
~/.hermes/skills/ without checking:
- Whether skills are disabled (skills.disabled config)
- Whether skills match the current platform (platforms: frontmatter)
This caused the banner to show inflated skill counts (e.g. '100 skills'
when many are disabled) and list macOS-only skills on Linux.
Fix: delegate to _find_all_skills() from tools/skills_tool which already
handles both platform gating and disabled-skill filtering.
* fix: system prompt and slash commands now respect disabled skills
Two more places where disabled skills were still surfaced:
1. build_skills_system_prompt() in prompt_builder.py — disabled skills
appeared in the <available_skills> system prompt section, causing
the agent to suggest/load them despite being disabled.
2. scan_skill_commands() in skill_commands.py — disabled skills still
registered as /skill-name slash commands in CLI help and could be
invoked.
Both now load _get_disabled_skill_names() and filter accordingly.
* fix: skill_view blocks disabled skills
skill_view() checked platform compatibility but not disabled state,
so the agent could still load and read disabled skills directly.
Now returns a clear error when a disabled skill is requested, telling
the user to enable it via hermes skills or inspect the files manually.
---------
Co-authored-by: Test <test@test.com>
* perf: cache base_url.lower() via property, consolidate triple load_config(), hoist set constant
run_agent.py:
- Add base_url property that auto-caches _base_url_lower on every
assignment, eliminating 12+ redundant .lower() calls per API cycle
across __init__, _build_api_kwargs, _supports_reasoning_extra_body,
and the main conversation loop
- Consolidate three separate load_config() disk reads in __init__
(memory, skills, compression) into a single call, reusing the
result dict for all three config sections
model_tools.py:
- Hoist _READ_SEARCH_TOOLS set to module level (was rebuilt inside
handle_function_call on every tool invocation)
* Use endpoint metadata for custom model context and pricing
---------
Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
MiniMax: Add M2.7 and M2.7-highspeed as new defaults across provider
model lists, auxiliary client, metadata, setup wizard, RL training tool,
fallback tests, and docs. Retain M2.5/M2.1 as alternatives.
OpenRouter: Add grok-4.20-beta, nemotron-3-super-120b-a12b:free,
trinity-large-preview:free, glm-5-turbo, and hunter-alpha to the
model catalog.
MiniMax changes based on PR #1882 by @octo-patch (applied manually
due to stale conflicts in the refactored pricing module).
Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.
This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces all remaining print() calls in compress() with logger.info()
and logger.warning() for consistency with the rest of the module.
Inspired by PR #1822.
compress() checks both the head and tail neighbors when choosing the
summary message role. When only the tail collides, the role is flipped.
When BOTH roles would create consecutive same-role messages (e.g.
head=assistant, tail=user), the summary is merged into the first tail
message instead of inserting a standalone message that breaks role
alternation and causes API 400 errors.
The previous code handled head-side collision but left the tail-side
uncovered — long conversations would crash mid-reply with no useful
error, forcing the user to /reset and lose session history.
Based on PR #1186 by @alireza78a, with improved double-collision
handling (merge into tail instead of unconditional 'user' fallback).
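Schematically (message shapes simplified to string content):

    def place_summary(head_role, tail: list[dict], summary: str) -> list[dict]:
        # Try a standalone summary whose role collides with neither side.
        for role in ("user", "assistant"):
            if role != head_role and role != tail[0]["role"]:
                return [{"role": role, "content": summary}] + tail
        # Double collision (e.g. head=assistant, tail=user): merge the
        # summary into the first tail message to keep alternation intact.
        tail[0]["content"] = f"[Context summary]\n{summary}\n\n{tail[0]['content']}"
        return tail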
Co-authored-by: alireza78a <alireza78.crypto@gmail.com>
- Add summary_base_url config option to compression block for custom
OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama)
- Remove compression env var bridges from cli.py and gateway/run.py
(CONTEXT_COMPRESSION_* env vars no longer set from config)
- Switch run_agent.py to read compression config directly from
config.yaml instead of env vars
- Fix backwards-compat block in _resolve_task_provider_model to also
fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG
sets this, which was silently preventing the compression section's
summary_* keys from being read)
- Add test for summary_base_url config-to-client flow
- Update docs to show compression as config.yaml-only
Closes #1591
Based on PR #1702 by @uzaylisak
Four small fixes:
1. model_tools.py: Tool import failures logged at WARNING instead of
DEBUG. If a tool module fails to import (syntax error, missing dep),
the user now sees a warning instead of the tool silently vanishing.
2. hermes_cli/config.py: Remove duplicate 'import sys' (lines 19, 21).
3. agent/model_metadata.py: Remove 6 duplicate entries in
DEFAULT_CONTEXT_LENGTHS dict. Python keeps the last value, so no
functional change, but removes maintenance confusion.
4. hermes_state.py: Add missing self._lock to the LIKE query in
resolve_session_id(). The exact-match path used get_session()
(which locks internally), but the prefix fallback queried _conn
without the lock.
Adds .hermes.md / HERMES.md discovery for per-project agent configuration.
When the agent starts, it walks from cwd to the git root looking for
.hermes.md (preferred) or HERMES.md, strips any YAML frontmatter, and
injects the markdown body into the system prompt as project context.
- Nearest-first discovery (subdirectory configs shadow parent)
- Stops at git root boundary (no leaking into parent repos)
- YAML frontmatter stripped (structured config deferred to Phase 2)
- Same injection scanning and 20K truncation as other context files
- 22 comprehensive tests
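The discovery walk, sketched (frontmatter stripping and truncation
omitted):

    from pathlib import Path

    def find_project_config(start: Path) -> Path | None:
        d = start.resolve()
        while True:
            for name in (".hermes.md", "HERMES.md"):   # preferred first
                candidate = d / name
                if candidate.is_file():
                    return candidate                   # nearest-first
            if (d / ".git").exists() or d.parent == d:
                return None   # stop at git root / filesystem root
            d = d.parent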
Original implementation by ch3ronsa. Cherry-picked and adapted for current main.
Closes #681 (Phase 1)
After the first user→assistant exchange, Hermes now generates a short
descriptive session title via the auxiliary LLM (compression task config).
Title generation runs in a background thread so it never delays the
user-facing response.
Key behaviors:
- Fires only on the first 1-2 exchanges (checks user message count)
- Skips if a title already exists (user-set titles are never overwritten)
- Uses call_llm with compression task config (cheapest/fastest model)
- Truncates long messages to keep the title generation request small
- Cleans up LLM output: strips quotes, 'Title:' prefixes, enforces 80 char max
- Works in both CLI and gateway (Telegram/Discord/etc.)
Also updates /title (no args) to show the session ID alongside the title
in both CLI and gateway.
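A rough sketch of the flow, with the session/LLM API names assumed
purely for illustration:

    import threading

    def maybe_generate_title(session, call_llm) -> None:
        if session.title or session.user_message_count > 2:
            return  # user-set titles and older sessions are left alone
        def _work():
            raw = call_llm(task="compression",   # cheapest/fastest model
                           prompt="Short title for this conversation:\n"
                                  + session.first_exchange(max_chars=2000))
            title = raw.strip().strip('"').removeprefix("Title:").strip()
            session.set_title_if_unset(title[:80])
        threading.Thread(target=_work, daemon=True).start()  # never blocks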
Implements #1426
The fuzzy match for model context lengths iterated dict insertion
order. Shorter model names (e.g. 'gpt-5') could match before more
specific ones (e.g. 'gpt-5.4-pro'), returning the wrong context
length.
Sort by key length descending so more specific model names always
match first.
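The fix is essentially one ordering change:

    def fuzzy_context_length(model_name: str, defaults: dict) -> int | None:
        # Longest (most specific) pattern wins, regardless of dict order.
        for pattern in sorted(defaults, key=len, reverse=True):
            if pattern in model_name:
                return defaults[pattern]
        return None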
The summary message role was determined only by the last head message,
ignoring the first tail message. This could create consecutive user
messages (rejected by Anthropic) when the tail started with 'user'.
Now checks both neighbors. Priority: avoid colliding with the head
(already committed). If the chosen role also collides with the tail,
flip it — but only if flipping wouldn't re-collide with the head.
When tool_choice was 'none', the code did 'pass' — no tool_choice
was sent but tools were still included in the request. Anthropic
defaults to 'auto' when tools are present, so the model could still
call tools despite the caller requesting 'none'.
Fix: omit tools entirely from the request when tool_choice is 'none',
which is the only way to prevent tool use with the Anthropic API.
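Sketched (kwargs naming assumed):

    def apply_tool_choice(api_kwargs: dict, tool_choice) -> dict:
        if tool_choice == "none":
            # Omitting tools entirely is the only reliable "no tools"
            # mode: Anthropic defaults to auto when tools are present.
            api_kwargs.pop("tools", None)
            api_kwargs.pop("tool_choice", None)
        elif tool_choice is not None:
            api_kwargs["tool_choice"] = tool_choice
        return api_kwargs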
The module-level auxiliary_is_nous was set to True by _try_nous() and
never reset. In long-running gateway processes, once Nous was resolved
as auxiliary provider, the flag stayed True forever — even if
subsequent resolutions chose a different provider (e.g. OpenRouter).
This caused Nous product tags to be sent to non-Nous providers.
Reset the flag at the start of _resolve_auto() so only the winning
provider's flag persists.
When two consecutive assistant messages had mixed content types (one
string, one list), the merge logic just replaced the earlier message
entirely with the later one (fixed[-1] = m), silently dropping the
earlier message's content.
Apply the same normalization pattern used in the tool_use merge path
(lines 952-956): convert both to list format before concatenating.
This preserves all content from both messages.
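The normalization, sketched:

    def _as_blocks(content):
        # Normalize string content to the list-of-blocks form so two
        # messages with mixed types can be concatenated losslessly.
        if isinstance(content, str):
            return [{"type": "text", "text": content}]
        return list(content)

    # in place of `fixed[-1] = m`:
    #   fixed[-1]["content"] = _as_blocks(fixed[-1]["content"]) + _as_blocks(m["content"])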
* fix: thread safety for concurrent subagent delegation
Five thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:
1. Remove redirect_stdout/stderr from delegate_tool — mutating global
sys.stdout races with the spinner thread when multiple children start
concurrently, causing segfaults. Children already run with
quiet_mode=True so the redirect was redundant.
2. Split _run_single_child into _build_child_agent (main thread) +
_run_single_child (worker thread). AIAgent construction creates
httpx/SSL clients which are not thread-safe to initialize
concurrently.
3. Add threading.Lock to SessionDB — subagents share the parent's
SessionDB and call create_session/append_message from worker threads
with no synchronization.
4. Add _active_children_lock to AIAgent — interrupt() iterates
_active_children while worker threads append/remove children.
5. Add _client_cache_lock to auxiliary_client — multiple subagent
threads may resolve clients concurrently via call_llm().
Based on PR #1471 by peteromallet.
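Fix 3 in miniature (schema illustrative):

    import sqlite3
    import threading

    class SessionDB:
        def __init__(self, path: str):
            self._lock = threading.Lock()
            self._conn = sqlite3.connect(path, check_same_thread=False)

        def append_message(self, session_id: str, role: str, content: str):
            with self._lock:   # subagent worker threads serialize here
                self._conn.execute(
                    "INSERT INTO messages (session_id, role, content)"
                    " VALUES (?, ?, ?)", (session_id, role, content))
                self._conn.commit()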
* feat: Honcho base_url override via config.yaml + quick command alias type
Two features salvaged from PR #1576:
1. Honcho base_url override: allows pointing Hermes at a remote
self-hosted Honcho deployment via config.yaml:
honcho:
base_url: "http://192.168.x.x:8000"
When set, this overrides the Honcho SDK's environment mapping
(production/local), enabling LAN/VPN Honcho deployments without
requiring the server to live on localhost. Uses config.yaml instead
of env var (HONCHO_URL) per project convention.
2. Quick command alias type: adds a new 'alias' quick command type
that rewrites to another slash command before normal dispatch:
quick_commands:
sc:
type: alias
target: /context
Supports both CLI and gateway. Arguments are forwarded to the
target command.
Based on PR #1576 by redhelix.
---------
Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
Add Alibaba Cloud (DashScope) as a first-class inference provider
using the Anthropic-compatible endpoint. This gives access to Qwen
models (qwen3.5-plus, qwen3-max, qwen3-coder-plus, etc.) through
the same api_mode as native Anthropic.
Also add ANTHROPIC_BASE_URL env var support so users can point the
Anthropic provider at any compatible endpoint.
Changes:
- auth.py: Add alibaba ProviderConfig + ANTHROPIC_BASE_URL on anthropic
- models.py: Add alibaba to catalog, labels, aliases (dashscope/aliyun/qwen), provider order
- runtime_provider.py: Add alibaba resolution (anthropic_messages api_mode) + ANTHROPIC_BASE_URL
- model_metadata.py: Add Qwen model context lengths (128K)
- config.py: Add DASHSCOPE_API_KEY, DASHSCOPE_BASE_URL, ANTHROPIC_BASE_URL env vars
Usage:
hermes --provider alibaba --model qwen3.5-plus
# or via aliases:
hermes --provider qwen --model qwen3-max
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
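The layer-1 heuristic, roughly (the 40%/80-message thresholds are from
the text; the short-message cutoff and session fields are assumptions):

    def probably_context_overflow(status: int, message: str, session) -> bool:
        if status != 400 or len(message.strip()) > 20:
            return False   # not the short generic 400 we're hunting
        return (session.estimated_tokens > 0.4 * session.context_window
                or session.message_count > 80)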
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.
- Apply truncate_message() chunking in _send_to_platform() before
dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement
Cherry-picked from PR #1557 by llbn.
* fix(approval): show full command in dangerous command approval (#1553)
Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:
- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests
Cherry-picked from PR #1566 by crazywriter1.
* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)
The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.
Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.
* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)
When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.
Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.
Cherry-picked from PR #1593 by ygd58.
* fix(security): block sandbox backend creds from subprocess env (#1264)
Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.
Cherry-picked from PR #1571 by ygd58.
* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)
When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.
Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.
The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).
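Shape of the cap (gateway API names assumed):

    import logging

    logger = logging.getLogger(__name__)
    _MAX_INTERRUPT_DEPTH = 3

    async def run_agent(gateway, event, depth: int = 0):
        result = await gateway.execute(event)
        pending = gateway.pop_pending_message()
        if pending is None:
            return result
        if depth >= _MAX_INTERRUPT_DEPTH:
            logger.warning("interrupt recursion cap reached; dropping re-run")
            return result
        return await run_agent(gateway, pending, depth + 1)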
* fix(gateway): /model shows active fallback model instead of config default (#1615)
When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.
Cleared when the primary model succeeds again or the user explicitly
switches via /model.
Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.
* feat(gateway): inject reply-to message context for out-of-session replies (#1594)
When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.
- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history
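Injection point, sketched (attribute names from the list above):

    def inject_reply_context(event, transcript_text: str) -> None:
        # Only inject when the quoted text isn't already in the session
        # transcript (cron output, background tasks, old sessions).
        if event.reply_to_text and event.reply_to_text not in transcript_text:
            event.text = f'[Replying to: "{event.reply_to_text}"]\n{event.text}'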
Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).
* fix: recognize Claude Code OAuth credentials in startup gate (#1455)
The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.
Cherry-picked from PR #1455 by kshitijk4poor.
* perf: use ripgrep for file search (200x faster than find)
search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on a 164-repo tree).
Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.
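A sketch of the dispatch, assuming the pattern is a find-style glob:

    import shutil
    import subprocess

    def list_matching_files(root: str, pattern: str) -> list[str]:
        if shutil.which("rg"):
            cmd = ["rg", "--files", "-g", pattern, root]
        else:
            # fallback: keep hidden-dir exclusion, BSD-compatible flags
            cmd = ["find", root, "-name", pattern, "-not", "-path", "*/.*"]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return proc.stdout.splitlines()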
Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.
* refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow
Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.
Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state
Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)
* fix(docker): add explicit env allowlist for container credentials (#1436)
Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.
Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.
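Resolution order, sketched:

    import os

    def resolve_forwarded_env(allowlist: list[str], dotenv: dict) -> dict:
        # Shell env first, then ~/.hermes/.env; only explicitly listed
        # variables ever reach the container.
        forwarded = {}
        for key in allowlist:
            value = os.environ.get(key) or dotenv.get(key)
            if value:
                forwarded[key] = value
        return forwarded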
Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.
Fixes #1436
Supersedes #1439
* fix: email send_typing metadata param + ☤ Hermes staff symbol
- email.py: add missing metadata parameter to send_typing() to match
BasePlatformAdapter signature (PR #1431 by @ItsChoudhry)
- README.md: ⚕ → ☤ — the caduceus is Hermes's staff, not the
medical Staff of Asclepius (PR #1420 by @rianczerwinski)
* fix(whatsapp): support LID format in self-chat mode (#1556)
WhatsApp now uses LID (Linked Identity Device) format alongside classic
@s.whatsapp.net. Self-chat detection checked only the classic format,
breaking self-chat mode for users on newer WhatsApp versions.
- Check both sock.user.id and sock.user.lid for self-chat detection
- Accept 'append' message type in addition to 'notify' (self-chat
messages arrive as 'append')
- Track sent message IDs to prevent echo-back loops with media
- Add WHATSAPP_DEBUG env var for troubleshooting
Based on PR #1556 by jcorrego (manually applied due to cherry-pick
conflicts).
* fix: detect Claude Code version dynamically for OAuth user-agent
The _CLAUDE_CODE_VERSION was hardcoded to '2.1.2' but Anthropic
rejects OAuth requests when the spoofed user-agent version is too
far behind the current Claude Code release. The error is a generic
400 with just 'Error' as the message, making it very hard to diagnose.
Fix: detect the installed version via 'claude --version' at import
time, falling back to a bumped static constant (2.1.74) when Claude
Code isn't installed. This means users who keep Claude Code updated
never hit stale-version rejections.
Reported by Jack — changing the version string to match the installed
claude binary fixed persistent OAuth 400 errors immediately.
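Detection sketch (the output parsing is a guess at the
'claude --version' format):

    import subprocess

    _FALLBACK_VERSION = "2.1.74"

    def detect_claude_code_version() -> str:
        try:
            out = subprocess.run(["claude", "--version"], capture_output=True,
                                 text=True, timeout=5).stdout.strip()
            return out.split()[0] if out else _FALLBACK_VERSION
        except (OSError, subprocess.TimeoutExpired):
            return _FALLBACK_VERSION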
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
Co-authored-by: jcorrego <jcorrego@users.noreply.github.com>
Add Kilo Gateway (kilo.ai) as an API-key provider with OpenAI-compatible
endpoint at https://api.kilo.ai/api/gateway. Supports 500+ models from
Anthropic, OpenAI, Google, xAI, Mistral, MiniMax via a single API key.
- Register kilocode in PROVIDER_REGISTRY with aliases (kilo, kilo-code,
kilo-gateway) and KILOCODE_API_KEY / KILOCODE_BASE_URL env vars
- Add to model catalog, CLI provider menu, setup wizard, doctor checks
- Add google/gemini-3-flash-preview as default aux model
- 12 new tests covering registration, aliases, credential resolution,
runtime config
- Documentation updates (env vars, config, fallback providers)
- Fix setup test index shift from provider insertion
Inspired by PR #1473 by @amanning3390.
Co-authored-by: amanning3390 <amanning3390@users.noreply.github.com>
Add support for OpenCode Zen (pay-as-you-go, 35+ curated models) and
OpenCode Go ($10/month subscription, open models) as first-class providers.
Both are OpenAI-compatible endpoints resolved via the generic api_key
provider flow — no custom adapter needed.
Files changed:
- hermes_cli/auth.py — ProviderConfig entries + aliases
- hermes_cli/config.py — OPENCODE_ZEN/GO API key env vars
- hermes_cli/models.py — model catalogs, labels, aliases, provider order
- hermes_cli/main.py — provider labels, menu entries, model flow dispatch
- hermes_cli/setup.py — setup wizard branches (idx 10, 11)
- agent/model_metadata.py — context lengths for all OpenCode models
- agent/auxiliary_client.py — default aux models
- .env.example — documentation
Co-authored-by: DevAgarwal2 <DevAgarwal2@users.noreply.github.com>
* feat: add Vercel AI Gateway as a first-class provider
Adds AI Gateway (ai-gateway.vercel.sh) as a new inference provider
with AI_GATEWAY_API_KEY authentication, live model discovery, and
reasoning support via extra_body.reasoning.
Based on PR #1492 by jerilynzheng.
* feat: add AI Gateway to setup wizard, doctor, and fallback providers
* test: add AI Gateway to api_key_providers test suite
* feat: add AI Gateway to hermes model CLI and model metadata
Wire AI Gateway into the interactive model selection menu and add
context lengths for AI Gateway model IDs in model_metadata.py.
* feat: use claude-haiku-4.5 as AI Gateway auxiliary model
* revert: use gemini-3-flash as AI Gateway auxiliary model
* fix: move AI Gateway below established providers in selection order
---------
Co-authored-by: jerilynzheng <jerilynzheng@users.noreply.github.com>
Co-authored-by: jerilynzheng <zheng.jerilyn@gmail.com>
Put the authorization URL front and center instead of treating it as
a fallback: the URL is now displayed in a bordered box before the
browser auto-open attempt. Most Hermes users run on remote servers
via SSH, where webbrowser.open() silently fails.
Adds our own OAuth login and token refresh flow, independent of Claude
Code CLI. Mirrors the PKCE flow used by pi-ai (clawdbot) and OpenCode:
- run_hermes_oauth_login(): full PKCE authorization code flow
- Opens browser to claude.ai/oauth/authorize
- User pastes code#state back
- Exchanges for access + refresh tokens
- Stores in ~/.hermes/.anthropic_oauth.json (our own file)
- Also writes to ~/.claude/.credentials.json for backward compat
- refresh_hermes_oauth_token(): automatic token refresh
- POST to console.anthropic.com/v1/oauth/token with refresh_token
- Updates both credential files on success
- Credential resolution priority updated:
1. ANTHROPIC_TOKEN env var
2. CLAUDE_CODE_OAUTH_TOKEN env var
3. Hermes OAuth credentials (~/.hermes/.anthropic_oauth.json) ← NEW
4. Claude Code credentials (~/.claude/.credentials.json)
5. ANTHROPIC_API_KEY env var
Uses same CLIENT_ID, endpoints, scopes, and PKCE parameters as
Claude Code / OpenCode / pi-ai. Token refresh happens automatically
before each API call via _try_refresh_anthropic_client_credentials.
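The PKCE primitives involved are standard; for reference:

    import base64
    import hashlib
    import secrets

    # code_verifier: high-entropy random string kept locally
    verifier = base64.urlsafe_b64encode(
        secrets.token_bytes(32)).rstrip(b"=").decode()
    # code_challenge: S256 transform sent with the authorize request
    challenge = base64.urlsafe_b64encode(
        hashlib.sha256(verifier.encode()).digest()).rstrip(b"=").decode()
    # The token exchange later sends the raw verifier so the server can
    # check SHA-256(verifier) matches the challenge it saw up front.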