Fixes Anthropic OAuth/subscription authentication end-to-end:
Auth failures (401 errors):
- Add missing 'claude-code-20250219' beta header for OAuth tokens. Both
clawdbot and OpenCode include this alongside 'oauth-2025-04-20' — without
it, Anthropic's API rejects OAuth tokens with 401 authentication errors.
- Fix _fetch_anthropic_models() to use canonical beta headers from
_COMMON_BETAS + _OAUTH_ONLY_BETAS instead of hardcoding.
Token refresh:
- Add _refresh_oauth_token() — when Claude Code credentials from
~/.claude/.credentials.json are expired but have a refresh token,
automatically POST to console.anthropic.com/v1/oauth/token to get
a new access token. Uses the same client_id as Claude Code / OpenCode.
- Add _write_claude_code_credentials() — writes refreshed tokens back
to ~/.claude/.credentials.json, preserving other fields.
- resolve_anthropic_token() now auto-refreshes expired tokens before
returning None.
Config contamination:
- Anthropic's _model_flow_anthropic() no longer saves base_url to config.
Since resolve_runtime_provider() always hardcodes Anthropic's URL, the
stale base_url was contaminating other providers when users switched
without re-running 'hermes model' (e.g., Codex hitting api.anthropic.com).
- _update_config_for_provider() now pops base_url when passed empty string.
- Same fix in setup.py.
Flow/UX (hermes model command):
- CLAUDE_CODE_OAUTH_TOKEN env var now checked in credential detection
- Reauthentication option when existing credentials found
- run_oauth_setup_token() runs 'claude setup-token' as interactive
subprocess, then auto-detects saved credentials
- Clean has_creds/needs_auth flow in both main.py and setup.py
Tests (14 new):
- Beta header assertions for claude-code-20250219
- Token refresh: successful refresh with credential writeback, failed
refresh returns None, no refresh token returns None
- Credential writeback: new file creation, preserving existing fields
- Auto-refresh integration in resolve_anthropic_token()
- CLAUDE_CODE_OAUTH_TOKEN fallback, credential file auto-discovery
- run_oauth_setup_token() (5 scenarios)
Fixes from comprehensive code review and cross-referencing with
clawdbot/OpenCode implementations:
CRITICAL:
- Add one-shot guard (anthropic_auth_retry_attempted) to prevent
infinite 401 retry loops when credentials keep changing
- Fix _is_oauth_token(): managed keys from ~/.claude.json are NOT
regular API keys (don't start with sk-ant-api). Inverted the logic:
only sk-ant-api* is treated as API key auth, everything else uses
Bearer auth + oauth beta headers
HIGH:
- Wrap json.loads(args) in try/except in message conversion — malformed
tool_call arguments no longer crash the entire conversation
- Raise AuthError in runtime_provider when no Anthropic token found
(was silently passing empty string, causing confusing API errors)
- Remove broken _try_anthropic() from auxiliary vision chain — the
centralized router creates an OpenAI client for api_key providers
which doesn't work with Anthropic's Messages API
MEDIUM:
- Handle empty assistant message content — Anthropic rejects empty
content blocks, now inserts '(empty)' placeholder
- Fix setup.py existing_key logic — set to 'KEEP' sentinel instead
of None to prevent falling through to the auth choice prompt
- Add debug logging to _fetch_anthropic_models on failure
Tests: 43 adapter tests (2 new for token detection), 3197 total passed
- Add _fetch_anthropic_models() to hermes_cli/models.py — hits the
Anthropic /v1/models endpoint to get the live model catalog. Handles
both API key and OAuth token auth headers.
- Wire it into provider_model_ids() so both 'hermes model' and
'hermes setup model' show the live list instead of a stale static one.
- Update static _PROVIDER_MODELS fallback with full current catalog:
opus-4-6, sonnet-4-6, opus-4-5, sonnet-4-5, opus-4, sonnet-4, haiku-4-5
- Update model_metadata.py with context lengths for all current models.
- Fix thinking parameter for 4.5+ models: use type='adaptive' instead
of type='enabled' (Anthropic deprecated 'enabled' for newer models,
warns at runtime). Detects model version from the model name string.
Verified live:
hermes model → Anthropic → auto-detected creds → shows 7 live models
hermes chat --provider anthropic --model claude-opus-4-6 → works
* fix: use session_key instead of chat_id for adapter interrupt lookups
monitor_for_interrupt() in _run_agent was using source.chat_id to query
the adapter's has_pending_interrupt() and get_pending_message() methods.
But the adapter stores interrupt events under build_session_key(source),
which produces a different string (e.g. 'agent:main:telegram:dm' vs '123456').
This key mismatch meant the interrupt was never detected through the
adapter path, which is the only active interrupt path for all adapter-based
platforms (Telegram, Discord, Slack, etc.). The gateway-level interrupt
path (in dispatch_message) is unreachable because the adapter intercepts
the 2nd message in handle_message() before it reaches dispatch_message().
Result: sending a new message while subagents were running had no effect —
the interrupt was silently lost.
Fix: replace all source.chat_id references in the interrupt-related code
within _run_agent() with the session_key parameter, which matches the
adapter's storage keys.
Also adds regression tests verifying session_key vs chat_id consistency.
* debug: add file-based logging to CLI interrupt path
Temporary instrumentation to diagnose why message-based interrupts
don't seem to work during subagent execution. Logs to
~/.hermes/interrupt_debug.log (immune to redirect_stdout).
Two log points:
1. When Enter handler puts message into _interrupt_queue
2. When chat() reads it and calls agent.interrupt()
This will reveal whether the message reaches the queue and
whether the interrupt is actually fired.
* fix: accept unlisted models with warning instead of rejecting
validate_requested_model() previously hard-rejected any model not found
in the provider's API listing. This was too aggressive — users on higher
plan tiers (e.g. Z.AI Pro/Max) may have access to models not shown in
the public listing (like glm-5 on coding endpoints).
Changes:
- validate_requested_model: accept unlisted models with a warning note
instead of blocking. The model is saved to config and used immediately.
- Z.AI setup: always offer glm-5 in the model list regardless of whether
a coding endpoint was detected. Pro/Max plans support it.
- Z.AI setup detection message: softened from 'GLM-5 is not available'
to 'GLM-5 may still be available depending on your plan tier'
* fix: /reasoning command output ordering, display, and inline think extraction
Three issues with the /reasoning command:
1. Output interleaving: The command echo used print() while feedback
used _cprint(), causing them to render out-of-order under
prompt_toolkit's patch_stdout. Changed echo to use _cprint() so
all output renders through the same path in correct order.
2. Reasoning display not working: /reasoning show toggled a flag
but reasoning never appeared for models that embed thinking in
inline <think> blocks rather than structured API fields. Added
fallback extraction in _build_assistant_message to capture
<think> block content as reasoning when no structured reasoning
fields (reasoning, reasoning_content, reasoning_details) are
present. This feeds into both the reasoning callback (during
tool loops) and the post-response reasoning box display.
3. Feedback clarity: Added checkmarks to confirm actions, persisted
show/hide to config (was session-only before), and aligned the
status display for readability.
Tests: 7 new tests for inline think block extraction (41 total).
* feat: add /reasoning command to gateway (Telegram/Discord/etc)
The /reasoning command only existed in the CLI — messaging platforms
had no way to view or change reasoning settings. This adds:
1. /reasoning command handler in the gateway:
- No args: shows current effort level and display state
- /reasoning <level>: sets reasoning effort (none/low/medium/high/xhigh)
- /reasoning show|hide: toggles reasoning display in responses
- All changes saved to config.yaml immediately
2. Reasoning display in gateway responses:
- When show_reasoning is enabled, prepends a 'Reasoning' block
with the model's last_reasoning content before the response
- Collapses long reasoning (>15 lines) to keep messages readable
- Uses last_reasoning from run_conversation result dict
3. Plumbing:
- Added _show_reasoning attribute loaded from config at startup
- Propagated last_reasoning through _run_agent return dict
- Added /reasoning to help text and known_commands set
- Uses getattr for _show_reasoning to handle test stubs
* fix: improve Kimi model selection — auto-detect endpoint, add missing models
Kimi Coding Plan setup:
- New dedicated _model_flow_kimi() replaces the generic API-key flow
for kimi-coding. Removes the confusing 'Base URL' prompt entirely —
the endpoint is auto-detected from the API key prefix:
sk-kimi-* → api.kimi.com/coding/v1 (Kimi Coding Plan)
other → api.moonshot.ai/v1 (legacy Moonshot)
- Shows appropriate models for each endpoint:
Coding Plan: kimi-for-coding, kimi-k2.5, kimi-k2-thinking, kimi-k2-thinking-turbo
Moonshot: full model catalog
- Clears any stale KIMI_BASE_URL override so runtime auto-detection
via _resolve_kimi_base_url() works correctly.
Model catalog updates:
- Added kimi-for-coding (primary Coding Plan model) and kimi-k2-thinking-turbo
to models.py, main.py _PROVIDER_MODELS, and model_metadata.py context windows.
- Updated User-Agent from KimiCLI/1.0 to KimiCLI/1.3 (Kimi's coding
endpoint whitelists known coding agents via User-Agent sniffing).
curated_models_for_provider() now tries the live API first (via
provider_model_ids) before falling back to static _PROVIDER_MODELS.
This means /model and /provider slash commands show the actual
available models, not a stale hardcoded list.
Also added live Nous Portal model fetching via fetch_nous_models()
in provider_model_ids(), alongside the existing Codex live fetch.
- Custom endpoints can serve any model, so skip validation for
provider='custom' in validate_requested_model(). Previously it
would reject any model name since there's no static catalog or
live API to check against.
- Show clear setup instructions when switching to custom endpoint
without OPENAI_BASE_URL/OPENAI_API_KEY configured.
- Added curated model lists for Nous Portal and OpenAI Codex to
_PROVIDER_MODELS so /model shows their available models.
Both /model and /provider now show the same unified display:
Current: anthropic/claude-opus-4.6 via OpenRouter
Authenticated providers & models:
[openrouter] ← active
anthropic/claude-opus-4.6 ← current
anthropic/claude-sonnet-4.5
...
[nous]
claude-opus-4-6
gemini-3-flash
...
[openai-codex]
gpt-5.2-codex
gpt-5.1-codex-mini
...
Not configured: Z.AI / GLM, Kimi / Moonshot, ...
Switch model: /model <model-name>
Switch provider: /model <provider>:<model-name>
Example: /model nous:claude-opus-4-6
Users can see all authenticated providers and their models at a glance,
making it easy to switch mid-conversation.
Also added curated model lists for Nous Portal and OpenAI Codex to
hermes_cli/models.py.
Three issues caused the gateway to display 'openrouter' instead of
'Custom endpoint' when users configured a custom OAI-compatible endpoint:
1. hermes setup: custom endpoint path saved OPENAI_BASE_URL and
OPENAI_API_KEY to .env but never wrote model.provider to config.yaml.
All other providers (Codex, z.ai, Kimi, etc.) call
_update_config_for_provider() which sets this — custom was the only
path that skipped it. Now writes model.provider='custom' and
model.base_url to config.yaml.
2. hermes model: custom endpoint set model.provider='auto' in config.yaml.
The CLI display had a hack to detect OPENAI_BASE_URL and override to
'custom', but the gateway didn't. Now sets model.provider='custom'
directly.
3. gateway /model and /provider commands: defaulted to 'openrouter' and
read config.yaml — which had no provider set. Added OPENAI_BASE_URL
detection fallback (same pattern the CLI uses) as a defensive catch
for existing users who set up before this fix.
/provider command (CLI + gateway):
Shows all providers with auth status (✓/✗), aliases, and active marker.
Users can now discover what provider names work with provider:model syntax.
Gateway bugs fixed:
- Config was saved even when validation.persist=False (told user 'session
only' but actually persisted the unvalidated model)
- HERMES_INFERENCE_PROVIDER env var not set on provider switch, causing
the switch to be silently overridden if that env var was already set
parse_model_input hardened:
- Colon only treated as provider delimiter if left side is a recognized
provider name or alias. 'anthropic/claude-3.5-sonnet:beta' now passes
through as a model name instead of trying provider='anthropic/claude-3.5-sonnet'.
- HTTP URLs, random colons no longer misinterpreted.
56 tests passing across model validation, CLI commands, and integration.
'auto' doesn't always mean openrouter — it could be nous, zai,
kimi-coding, etc. depending on configured credentials. Reverted the
hardcoded mapping and now both CLI and gateway call
resolve_provider() to detect the actual active provider when 'auto'
is set. Falls back to openrouter only if resolution fails.
- normalize_provider('auto') now returns 'openrouter' (the default)
so /model shows the curated model list instead of nothing
- CLI /model display uses normalize_provider before looking up labels
- Gateway /model handler now uses the same validation logic as CLI:
live API probe, provider:model syntax, curated model list display
Add provider:model syntax to /model command for runtime provider switching:
/model zai:glm-5 → switch to Z.AI provider with glm-5
/model nous:hermes-3 → switch to Nous Portal with hermes-3
/model openrouter:anthropic/claude-sonnet-4.5 → explicit OpenRouter
When switching providers, credentials are resolved via resolve_runtime_provider
and validated before committing. Both model and provider are saved to config.
Provider aliases work (glm: → zai, kimi: → kimi-coding, etc.).
Enhanced /model (no args) display now shows:
- Current model and provider
- Curated model list for the current provider with ← marker
- Usage examples including provider:model syntax
39 tests covering parse_model_input, curated_models_for_provider,
provider switching (success + credential failure), and display output.
Not all providers require 'provider/model' format. Removing the rigid
format check lets the live API probe handle all validation uniformly.
If someone types 'gpt-5.4' on OpenRouter, the probe won't find it and
will suggest 'openai/gpt-5.4' — better UX than a format rejection.
Replace the static catalog-based model validation with a live API probe.
The /model command now hits the provider's /models endpoint to check if
the requested model actually exists:
- Model found in API → accepted + saved to config
- Model NOT found in API → rejected with 'Error: not a valid model'
and fuzzy-match suggestions from the live model list
- API unreachable → graceful fallback to hardcoded catalog (session-only
for unrecognized models)
- Format errors (empty, spaces, missing '/') still caught instantly
without a network call
The API probe takes ~0.2s for OpenRouter (346 models) and works with any
OpenAI-compatible endpoint (Ollama, vLLM, custom, etc.).
32 tests covering all paths: format checks, API found, API not found,
API unreachable fallback, CLI integration.