OpenAI's newer models (GPT-5, Codex) give stronger instruction-following
weight to the 'developer' role vs 'system'. Swap the role at the API
boundary in _build_api_kwargs() for the chat_completions path so internal
message representation stays consistent ('system' everywhere).
Applies regardless of provider — OpenRouter, Nous portal, direct, etc.
The codex_responses path (direct OpenAI) uses 'instructions' instead of
message roles, so it's unaffected.
DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching
model name substrings: ('gpt-5', 'codex').
When PyYAML is unavailable or YAML frontmatter is malformed, the fallback
parser may return metadata as a string instead of a dict. This causes
AttributeError when calling .get("hermes") on the string.
Added explicit type checks to handle cases where metadata or hermes fields
are not dicts, preventing the crash.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
The total_tokens field includes cache_read + cache_write tokens, but
the display only showed input + output — making the math look wrong
(e.g. 765K + 134K displayed but total said 9.2M). Now shows a cache
line when cache tokens are present so all visible numbers sum to the
displayed total.
Affects both terminal (hermes insights) and gateway (/insights)
formats.
Show inline diffs in the CLI transcript when write_file, patch, or
skill_manage modifies files. Captures a filesystem snapshot before the
tool runs, computes a unified diff after, and renders it with ANSI
coloring in the activity feed.
Adds tool_start_callback and tool_complete_callback hooks to AIAgent
for pre/post tool execution notifications.
Also fixes _extract_parallel_scope_path to normalize relative paths
to absolute, preventing the parallel overlap detection from missing
conflicts when the same file is referenced with different path styles.
Gated by display.inline_diffs config option (default: true).
Based on PR #3774 by @kshitijk4poor.
Three bugs prevented credential pool rotation from working when multiple
Codex OAuth tokens were configured:
1. credential_pool was dropped during smart model turn routing.
resolve_turn_route() constructed runtime dicts without it, so the
AIAgent was created without pool access. Fixed in smart_model_routing.py
(no-route and fallback paths), cli.py, and gateway/run.py.
2. Eager fallback fired before pool rotation on 429. The rate-limit
handler at line ~7180 switched to a fallback provider immediately,
before _recover_with_credential_pool got a chance to rotate to the
next credential. Now deferred when the pool still has credentials.
3. (Non-issue) Retry budget was reported as too small, but successful
pool rotations already skip retry_count increment — no change needed.
Reported by community member Schinsly who identified all three root
causes and verified the fix locally with multiple Codex accounts.
Follow-up to PR #4305 — .config/gh was added to the write-deny list
but missed from _SENSITIVE_HOME_DIRS, leaving GitHub CLI OAuth tokens
exposed via @file:~/.config/gh/hosts.yml context injection.
- Add gho_, ghu_, ghs_, ghr_ prefix patterns (OAuth, user-to-server,
server-to-server, and refresh tokens) — all four types used by
GitHub Apps and Copilot auth flows were absent from _PREFIX_PATTERNS
- Snapshot HERMES_REDACT_SECRETS at module import time instead of
re-reading os.getenv() on every call, preventing runtime env mutations
(e.g. LLM-generated export commands) from disabling redaction
* feat(auth): add same-provider credential pools and rotation UX
Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.
- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647
* fix(tests): prevent pool auto-seeding from host env in credential pool tests
Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.
- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test
* feat(auth): add thread safety, least_used strategy, and request counting
- Add threading.Lock to CredentialPool for gateway thread safety
(concurrent requests from multiple gateway sessions could race on
pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
thread safety (4 threads × 20 selects with no corruption)
* feat(auth): add interactive mode for bare 'hermes auth' command
When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:
1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
add flow explicitly asks 'API key or OAuth login?' — making it
clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection
The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.
* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)
Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.
* feat(auth): support custom endpoint credential pools keyed by provider name
Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).
- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing
* docs: add Excalidraw diagram of full credential pool flow
Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration
Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g
* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow
The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached
* docs: add comprehensive credential pool documentation
- New page: website/docs/user-guide/features/credential-pools.md
Full guide covering quick start, CLI commands, rotation strategies,
error recovery, custom endpoint pools, auto-discovery, thread safety,
architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide
* chore: remove excalidraw diagram from repo (external link only)
* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns
- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
(token_type, scope, client_id, portal_base_url, obtained_at,
expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
agent_key_obtained_at, tls) into a single extra dict with
__getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider
Net -17 lines. All 383 targeted tests pass.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source
confusion. Users (especially Docker) would see the URL in .env and assume
that's where all config lives, then wonder why LLM_MODEL in .env didn't work.
Changes:
- Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py,
setup.py, and tools_config.py
- Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py,
models.py, and gateway/run.py
- Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and
auxiliary_client.py — config.yaml model.default is authoritative
- Vision base URL now saved to config.yaml auxiliary.vision.base_url
(both setup wizard and tools_config paths)
- Tests updated to set config values instead of env vars
Convention enforced: .env is for SECRETS only (API keys). All other
configuration (model names, base URLs, provider selection) lives
exclusively in config.yaml.
- Add api.fireworks.ai to _URL_TO_PROVIDER for automatic provider detection
- Add fireworks to PROVIDER_TO_MODELS_DEV mapped to 'fireworks-ai' (the
correct models.dev provider key — original PR used 'fireworks' which
would silently fail the lookup)
Cherry-picked from PR #3989 with models.dev key fix.
Co-authored-by: sroecker <sroecker@users.noreply.github.com>
Claude Code >=2.1.81 checks for a 'scopes' array containing 'user:inference'
in ~/.claude/.credentials.json before accepting stored OAuth tokens as valid.
When Hermes refreshes the token, it writes only accessToken, refreshToken, and
expiresAt — omitting the scopes field. This causes Claude Code to report
'loggedIn: false' and refuse to start, even though the token is valid.
This commit:
- Parses the 'scope' field from the OAuth refresh response
- Passes it to _write_claude_code_credentials() as a keyword argument
- Persists the scopes array in the claudeAiOauth credential store
- Preserves existing scopes when the refresh response omits the field
Tested against Claude Code v2.1.87 on Linux — auth status correctly reports
loggedIn: true and claude --print works after this fix.
Co-authored-by: Nick <git@flybynight.io>
* fix: treat non-sk-ant- prefixed keys (Azure AI Foundry) as regular API keys, not OAuth tokens
* fix: treat non-sk-ant- keys as regular API keys, not OAuth tokens
_is_oauth_token() returned True for any key not starting with
sk-ant-api, misclassifying Azure AI Foundry keys as OAuth tokens
and sending Bearer auth instead of x-api-key → 401 rejection.
Real Anthropic OAuth tokens all start with sk-ant-oat (confirmed
from live .credentials.json). Non-sk-ant- keys are third-party
provider keys that should use x-api-key.
Test fixtures updated to use realistic sk-ant-oat01- prefixed
tokens instead of fake strings.
Salvaged from PR #4075 by @HangGlidersRule.
---------
Co-authored-by: Clawdbot <clawdbot@openclaw.ai>
MiniMax's /anthropic endpoints implement Anthropic's Messages API but
require Authorization: Bearer instead of x-api-key. Without this fix,
MiniMax users get 401 errors in gateway sessions.
Adds _requires_bearer_auth() to detect MiniMax endpoints and route
through auth_token in the Anthropic SDK. Check runs before OAuth
token detection so MiniMax keys aren't misclassified as setup tokens.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
ElevenLabs (sk_), Tavily (tvly-), and Exa (exa_) keys were not covered
by _PREFIX_PATTERNS, leaking in plain text via printenv or log output.
Salvaged from PR #3790 by @memosr. Tests rewritten with correct
assertions (original tests had vacuously true checks).
Co-authored-by: memosr <memosr@users.noreply.github.com>
* add .aac audio file format support to transcription tool
* fix(agent): support full context length resolution for direct Gemini API endpoints
Add generativelanguage.googleapis.com to _URL_TO_PROVIDER so direct
Gemini API users get correct 1M+ context length instead of the 128K
unknown-proxy fallback.
Co-authored-by: bb873 <bb873@users.noreply.github.com>
---------
Co-authored-by: Adrian Scott <adrian@adrianscott.com>
Co-authored-by: bb873 <bb873@users.noreply.github.com>
The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.
Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't
require API keys, but the auxiliary client's _resolve_custom_runtime()
rejected endpoints with empty keys — causing the auto-detection chain
to skip the user's local server entirely. This broke compression,
summarization, and memory flush for users running local models without
an OpenRouter/cloud API key.
The main CLI already had this fix (PR #2556, 'no-key-required'
placeholder), but the auxiliary client's resolution path was missed.
Two fixes:
- _resolve_custom_runtime(): use 'no-key-required' placeholder instead
of returning None when base_url is present but key is empty
- resolve_provider_client() custom branch: same placeholder fallback
for explicit_base_url without explicit_api_key
Updates 2 tests that expected the old (broken) behavior.
Tool call previews (paths, commands, queries) were hardcoded to truncate
at 35-40 chars across CLI spinners, completion lines, and gateway progress
messages. Users could not see full file paths in tool output.
New config option: display.tool_preview_length (default 0 = no limit).
Set a positive number to truncate at that length.
Changes:
- display.py: module-level _tool_preview_max_len with getter/setter;
build_tool_preview() and get_cute_tool_message() _trunc/_path respect it
- cli.py: reads config at startup, spinner widget respects config
- gateway/run.py: reads config per-message, progress callback respects config
- run_agent.py: removed redundant 30-char quiet-mode spinner truncation
- config.py: added display.tool_preview_length to DEFAULT_CONFIG
Reported by kriskaminski
Add skills.external_dirs config option — a list of additional directories
to scan for skills alongside ~/.hermes/skills/. External dirs are read-only:
skill creation/editing always writes to the local dir. Local skills take
precedence when names collide.
This lets users share skills across tools/agents without copying them into
Hermes's own directory (e.g. ~/.agents/skills, /shared/team-skills).
Changes:
- agent/skill_utils.py: add get_external_skills_dirs() and get_all_skills_dirs()
- agent/prompt_builder.py: scan external dirs in build_skills_system_prompt()
- tools/skills_tool.py: _find_all_skills() and skill_view() search external dirs;
security check recognizes configured external dirs as trusted
- agent/skill_commands.py: /skill slash commands discover external skills
- hermes_cli/config.py: add skills.external_dirs to DEFAULT_CONFIG
- cli-config.yaml.example: document the option
- tests/agent/test_external_skills.py: 11 tests covering discovery, precedence,
deduplication, and skill_view for external skills
Requested by community member primco.
Background agent's KawaiiSpinner wrote \r-based animation and stop()
messages through StdoutProxy, colliding with prompt_toolkit's status bar.
Two fixes:
- display.py: use isinstance(out, StdoutProxy) instead of fragile
hasattr+name check for detecting prompt_toolkit's stdout wrapper
- cli.py: silence bg agent's raw spinner (_print_fn=no-op) and route
thinking updates through the TUI widget only when no foreground
agent is active; clear spinner text in finally block with same guard
Closes#2718
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Salvage of PR #3533 (binhnt92). Follow-up to #3480 — applies min(100, ...) to 5 remaining unclamped percentage display sites in context_compressor, cli /stats, gateway /stats, and memory tool. Defensive clamps now that the root cause (estimation heuristic) was already removed in #3480.
Co-Authored-By: binhnt92 <binhnt92@users.noreply.github.com>
Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.
Defaults:
- auxiliary.compression.timeout: 120s (was hardcoded 45s)
- auxiliary.vision.timeout: 30s (unchanged)
- all other aux tasks: 30s (was hardcoded 30s)
- title_generator: 30s (was hardcoded 15s)
call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.
Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.
Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
Use atomic_json_write() from utils.py instead of plain open()/json.dump()
for the models.dev disk cache. Prevents corrupted cache if the process is
killed mid-write — _load_disk_cache() silently returns {} on corrupt JSON,
losing all model metadata until the next successful API fetch.
Co-authored-by: memosr <memosr@users.noreply.github.com>
Cherry-pick of feat/gpt-tool-steering with modifications:
1. Tool-use enforcement prompt (refactored from GPT-specific):
- Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE
- Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex')
- Injection logic now checks against the tuple instead of hardcoding
'gpt' — adding new model families is a one-line change
- Addresses models describing actions instead of making tool calls
2. Budget warning history stripping:
- _strip_budget_warnings_from_history() strips _budget_warning JSON
keys and [BUDGET WARNING: ...] text from tool results at the start
of run_conversation()
- Prevents old budget warnings from poisoning subsequent turns
Based on PR #3479 by teknium1.
* fix: cap context pressure percentage at 100% in display
The forward-looking token estimate can overshoot the compaction threshold
(e.g. a large tool result pushes it from 70% to 109% in one step). The
progress bar was already capped via min(), but pct_int was not — causing
the user to see '109% to compaction' which is confusing.
Cap pct_int at 100 in both CLI and gateway display functions.
Reported by @JoshExile82.
* refactor: use real API token counts for compression decisions
Replace the rough chars/3 estimation with actual prompt_tokens +
completion_tokens from the API response. The estimation was needed to
predict whether tool results would push context past the threshold, but
the default 50% threshold leaves ample headroom — if tool results push
past it, the next API call reports real usage and triggers compression
then.
This removes all estimation from the compression and context pressure
paths, making both 100% data-driven from provider-reported token counts.
Also removes the dead _msg_count_before_tools variable.
_expand_git_reference() and _rg_files() called subprocess.run()
without a timeout. On a large repository, @diff, @staged, or
@git:N references could hang the agent indefinitely while git
or ripgrep processes slow output.
- Add timeout=30 to git subprocess in _expand_git_reference()
with a user-friendly error message on TimeoutExpired
- Add timeout=10 to rg subprocess in _rg_files() returning
None on timeout (falls back to os.walk folder listing)
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top.
All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.
Closes#3389.
Show only agentic models that map to OpenRouter defaults:
Qwen/Qwen3.5-397B-A17B ↔ qwen/qwen3.5-plus
Qwen/Qwen3.5-35B-A3B ↔ qwen/qwen3.5-35b-a3b
deepseek-ai/DeepSeek-V3.2 ↔ deepseek/deepseek-chat
moonshotai/Kimi-K2.5 ↔ moonshotai/kimi-k2.5
MiniMaxAI/MiniMax-M2.5 ↔ minimax/minimax-m2.5
zai-org/GLM-5 ↔ z-ai/glm-5
XiaomiMiMo/MiMo-V2-Flash ↔ xiaomi/mimo-v2-pro
moonshotai/Kimi-K2-Thinking ↔ moonshotai/kimi-k2-thinking
Users can still pick any HF model via Enter custom model name.
The Anthropic adapter defaulted to max_tokens=16384 when no explicit value
was configured. This severely limits thinking-enabled models where thinking
tokens count toward max_tokens:
- Claude Opus 4.6 supports 128K output but was capped at 16K
- Claude Sonnet 4.6 supports 64K output but was capped at 16K
With extended thinking (adaptive or budget-based), the model could exhaust
the entire 16K on reasoning, leaving zero tokens for the actual response.
This caused two user-visible errors:
- 'Response truncated (finish_reason=length)' — thinking consumed most tokens
- 'Response only contains think block with no content' — thinking consumed all
Fix: add _ANTHROPIC_OUTPUT_LIMITS lookup table (sourced from Anthropic docs
and Cline's model catalog) and use the model's actual output limit as the
default. Unknown future models default to 128K (the current maximum).
Also adds context_length clamping: if the user configured a smaller context
window (e.g. custom endpoint), max_tokens is clamped to context_length - 1
to avoid exceeding the window.
Closes#2706
Salvage of PR #1747 (original PR #1171 by @davanstrien) onto current main.
Registers Hugging Face Inference Providers (router.huggingface.co/v1) as a named provider:
- hermes chat --provider huggingface (or --provider hf)
- 18 curated open models via hermes model picker
- HF_TOKEN in ~/.hermes/.env
- OpenAI-compatible endpoint with automatic failover (Groq, Together, SambaNova, etc.)
Files: auth.py, models.py, main.py, setup.py, config.py, model_metadata.py, .env.example, 5 docs pages, 17 new tests.
Co-authored-by: Daniel van Strien <davanstrien@gmail.com>
The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via
asyncio.get_running_loop().create_task(). When an AsyncOpenAI client is
garbage-collected while prompt_toolkit's event loop is running (the common
CLI idle state), the aclose() task runs on prompt_toolkit's loop but the
underlying TCP transport is bound to a different (dead) worker loop.
The transport's self._loop.call_soon() then raises RuntimeError('Event
loop is closed'), which prompt_toolkit surfaces as the disruptive
'Unhandled exception in event loop ... Press ENTER to continue...' error.
Three-layer fix:
1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI
startup before any AsyncOpenAI clients are created. Safe because
cached clients are explicitly cleaned via _force_close_async_httpx,
and uncached clients' TCP connections are cleaned by the OS on exit.
2. Custom asyncio exception handler: Installed on prompt_toolkit's event
loop to silently suppress 'Event loop is closed' RuntimeError.
Defense-in-depth for SDK upgrades that might change the class name.
3. cleanup_stale_async_clients(): Called after each agent turn (when the
agent thread joins) to proactively evict cache entries whose event
loop is closed, preventing stale clients from accumulating.
When user messages have empty content (e.g., Discord @mention-only
messages, unrecognized attachments), the Anthropic API rejects the
request with 'user messages must have non-empty content'.
Changes:
- anthropic_adapter.py: Add empty content validation for user messages
(string and list formats), matching the existing pattern for assistant
and tool messages. Empty content gets '(empty message)' placeholder.
- discord.py: Defense-in-depth check at gateway layer to catch empty
messages before they enter session history.
- Add 4 regression tests covering empty string, whitespace-only,
empty list, and empty text block scenarios.
Fixes#3143
Co-authored-by: Bartok9 <bartok9@users.noreply.github.com>
_try_anthropic() caught ImportError on the module import (line 667-669)
but not on the build_anthropic_client() call (line 696). When the
anthropic_adapter module imports fine but the anthropic SDK is missing,
build_anthropic_client() raises ImportError at call time. This escaped
_try_anthropic() entirely, killing get_available_vision_backends() and
cascading to 7 test failures:
- 4 setup wizard tests hit unexpected 'Configure vision:' prompt
- 3 codex-auth-as-vision tests failed check_vision_requirements()
The fix wraps the build_anthropic_client call in try/except ImportError,
returning (None, None) when the SDK is unavailable — consistent with the
existing guard at the top of the function.
* fix(gateway): silence flush agent terminal output
quiet_mode=True only suppresses AIAgent init messages.
Tool call output still leaks to the terminal through
_safe_print → _print_fn during session reset/expiry.
Since #2670 injected live memory state into the flush prompt,
the flush agent now reliably calls memory tools — making the
output leak noticeable for the first time.
Set _print_fn to a no-op so the background flush is fully silent.
* test(gateway): add test for flush agent terminal silence + fix dotenv mock
- Add TestFlushAgentSilenced: verifies _print_fn is set to a no-op on
the flush agent so tool output never leaks to the terminal
- Fix pre-existing test failures: replace patch('run_agent.AIAgent')
with sys.modules mock to avoid importing run_agent (requires openai)
- Add autouse _mock_dotenv fixture so all tests in this file run
without the dotenv package installed
* fix(display): route KawaiiSpinner output through print_fn to fully silence flush agent
The previous fix set tmp_agent._print_fn = no-op on the flush agent but
spinner output and quiet-mode cute messages bypassed _print_fn entirely:
- KawaiiSpinner captured sys.stdout at __init__ and wrote directly to it
- quiet-mode tool results used builtin print() instead of _safe_print()
Add optional print_fn parameter to KawaiiSpinner.__init__; _write routes
through it when set. Pass self._print_fn to all spinner construction sites
in run_agent.py and change the quiet-mode cute message print to _safe_print.
The existing gateway fix (tmp_agent._print_fn = lambda) now propagates
correctly through both paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): silence hygiene and compression background agents
Two more background AIAgent instances in the gateway were created with
quiet_mode=True but without _print_fn = no-op, causing tool output to
leak to the terminal:
- _hyg_agent (in-turn hygiene memory agent)
- tmp_agent (_compress_context path)
Apply the same _print_fn no-op pattern used for the flush agent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(display): remove unused _last_flush_time from KawaiiSpinner
Attribute was set but never read; upstream already removed it.
Leftover from conflict resolution during rebase onto upstream/main.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Nous Portal now passes through OpenRouter model names and routes from
there. Update the static fallback model list and auxiliary client default
to use OpenRouter-format slugs (provider/model) instead of bare names.
- _PROVIDER_MODELS['nous']: full OpenRouter catalog
- _NOUS_MODEL: google/gemini-3-flash-preview (was gemini-3-flash)
- Updated 4 test assertions for the new default model name
Anthropic migrated their OAuth infrastructure from console.anthropic.com
to platform.claude.com (Claude Code v2.1.81+). Update _refresh_oauth_token()
to try the new endpoint first, falling back to the old one for tokens
issued before the migration.
Also switches Content-Type from application/x-www-form-urlencoded to
application/json to match current Claude Code behavior.
Salvaged from PR #2741 by kshitijk4poor.
Two improvements salvaged from PR #2600 (paraddox):
1. Preflight compression now counts tool schema tokens alongside system
prompt and messages. With 50+ tools enabled, schemas can add 20-30K
tokens that were previously invisible to the estimator, delaying
compression until the API rejected the request.
2. Context probe persistence guard: when the agent steps down context
tiers after a context-length error, only provider-confirmed numeric
limits (parsed from the error message) are cached to disk. Guessed
fallback tiers from get_next_probe_tier() stay in-memory only,
preventing wrong values from polluting the persistent cache.
Co-authored-by: paraddox <paraddox@users.noreply.github.com>
Three categories of cleanup, all zero-behavioral-change:
1. F-strings without placeholders (154 fixes across 29 files)
- Converted f'...' to '...' where no {expression} was present
- Heaviest files: run_agent.py (24), cli.py (20), honcho_integration/cli.py (34)
2. Simplify defensive patterns in run_agent.py
- Added explicit self._is_anthropic_oauth = False in __init__ (before
the api_mode branch that conditionally sets it)
- Replaced 7x getattr(self, '_is_anthropic_oauth', False) with direct
self._is_anthropic_oauth (attribute always initialized now)
- Added _is_openrouter_url() and _is_anthropic_url() helper methods
- Replaced 3 inline 'openrouter' in self._base_url_lower checks
3. Remove dead code in small files
- hermes_cli/claw.py: removed unused 'total' computation
- tools/fuzzy_match.py: removed unused strip_indent() function and
pattern_stripped variable
Full test suite: 6184 passed, 0 failures
E2E PTY: banner clean, tool calls work, zero garbled ANSI
The recursive os.walk for AGENTS.md in subdirectories was undesired.
Only load AGENTS.md from the working directory root, matching the
behavior of CLAUDE.md and .cursorrules.
Remove run_hermes_oauth_login(), refresh_hermes_oauth_token(),
read_hermes_oauth_credentials(), _save_hermes_oauth_credentials(),
_generate_pkce(), and associated constants/credential file path.
This code was added in 63e88326 but never wired into any user-facing
flow (setup wizard, hermes model, or any CLI command). Neither
clawdbot/OpenClaw nor opencode implement PKCE for Anthropic — both
use setup-token or API keys. Dead code that was never tested in
production.
Also removes the credential resolution step that checked
~/.hermes/.anthropic_oauth.json (step 3 in resolve_anthropic_token),
renumbering remaining steps.
In gateway mode, async tools (vision_analyze, web_extract, session_search)
deadlock because _run_async() spawns a thread with asyncio.run(), creating
a new event loop, but _get_cached_client() returns an AsyncOpenAI client
bound to a different loop. httpx.AsyncClient cannot work across event loop
boundaries, causing await client.chat.completions.create() to hang forever.
Fix: include the event loop identity in the async client cache key so each
loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py
which had its own broken asyncio.run()-in-thread pattern — now uses the
centralized _run_async() bridge.
frontmatter.get("metadata", {}) returns None (not {}) when the
key exists with a null value, crashing build_skills_system_prompt
with AttributeError: 'NoneType' object has no attribute 'get'.
Made-with: Cursor
Centralizes two widely-duplicated patterns into hermes_constants.py:
1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var)
- Was copy-pasted inline across 30+ files as:
Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
- Now defined once in hermes_constants.py (zero-dependency module)
- hermes_cli/config.py re-exports it for backward compatibility
- Removed local wrapper functions in honcho_integration/client.py,
tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py
2. parse_reasoning_effort() — Reasoning effort string validation
- Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py
- Same validation logic: check against (xhigh, high, medium, low, minimal, none)
- Now defined once in hermes_constants.py, called from all 3 locations
- Warning log for unknown values kept at call sites (context-specific)
31 files changed, net +31 lines (125 insertions, 94 deletions)
Full test suite: 6179 passed, 0 failed
In gateway/Telegram mode, the stdout fd can be closed by executor
thread cleanup. KawaiiSpinner.stop() called isatty() on the closed fd,
raising ValueError and masking the original error.
Instead of a point fix, add a _is_tty property that centralizes the
closed-stream guard — both _animate() and stop() now use it. Follows
the same (ValueError, OSError) pattern already in _write().
Inspired by PR #2632 by bot-deo88.
format_token_count_compact() used unconditional rstrip("0") to clean up
decimal trailing zeros (e.g. "1.50" → "1.5"), but this also stripped
meaningful trailing zeros from whole numbers ("260" → "26", "100" → "1").
Guard the strip behind a decimal-point check.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>