hermes-agent

Author	SHA1	Message	Date
Teknium	8cf013ecd9	fix: replace stale 'hermes login' refs with 'hermes auth' + fix credential removal re-seeding (#5670) Two fixes: 1. Replace all stale 'hermes login' references with 'hermes auth' across auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py, and documentation. The 'hermes login' command was deprecated; 'hermes auth' now handles OAuth credential management. 2. Fix credential removal not persisting for singleton-sourced credentials (device_code for openai-codex/nous, hermes_pkce for anthropic). auth_remove_command already cleared env vars for env-sourced credentials, but singleton credentials stored in the auth store were re-seeded by _seed_from_singletons() on the next load_pool() call. Now clears the underlying auth store entry when removing singleton-sourced credentials.	2026-04-06 17:17:57 -07:00
Teknium	da02a4e283	fix: auxiliary client payment fallback — retry with next provider on 402 (#5599) When a user runs out of OpenRouter credits and switches to Codex (or any other provider), auxiliary tasks (compression, vision, web_extract) would still try OpenRouter first and fail with 402. Two fixes: 1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402 or a credit-related error, automatically retry with the next available provider in the auto-detection chain. Skips the depleted provider and tries Nous → Custom → Codex → API-key providers. 2. Remove hardcoded OpenRouter fallback: The old code fell back specifically to OpenRouter when auto/custom resolution returned no client. Now falls back to the full auto-detection chain, which handles any available provider — not just OpenRouter. Also extracts _get_provider_chain() as a shared function (replaces inline tuple in _resolve_auto and the new fallback), built at call time so test patches on _try_* functions remain visible. Adds 16 tests covering _is_payment_error(), _get_provider_chain(), _try_payment_fallback(), and call_llm() integration with 402 retry.	2026-04-06 12:41:40 -07:00
Teknium	cc7136b1ac	fix: update Gemini model catalog + wire models.dev as live model source Follow-up for salvaged PR #5494: - Update model catalog to Gemini 3.x + Gemma 4 (drop deprecated 2.0) - Add list_agentic_models() to models_dev.py with noise filter - Wire models.dev into _model_flow_api_key_provider as primary source (static curated list serves as offline fallback) - Add gemini -> google mapping in PROVIDER_TO_MODELS_DEV - Fix Gemma 4 context lengths to 256K (models.dev values) - Update auxiliary model to gemini-3-flash-preview - Expand tests: 3.x catalog, context lengths, models.dev integration	2026-04-06 10:28:03 -07:00
Teknium	6dfab35501	feat(providers): add Google AI Studio (Gemini) as a first-class provider Cherry-picked from PR #5494 by kshitijk4poor. Adds native Gemini support via Google's OpenAI-compatible endpoint. Zero new dependencies.	2026-04-06 10:28:03 -07:00
Teknium	93aa01c71c	fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091) Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without an OpenRouter or Nous key would get broken auxiliary tasks (compression, vision, etc.) because _resolve_auto() only tried aggregator providers first, then fell back to iterating PROVIDER_REGISTRY with wrong default model names. Now _resolve_auto() checks the user's main provider first. If it's not an aggregator (OpenRouter/Nous), it uses their main model directly for all auxiliary tasks. Aggregator users still get the cheap gemini-flash model as before. Adds _read_main_provider() to read model.provider from config.yaml, mirroring the existing _read_main_model(). Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from google/gemini-3-flash-preview being sent to DashScope.	2026-04-04 12:07:43 -07:00
kshitijk4poor	e94b4b2b40	fix: preserve allowed_users during setup reconfigure and quiet unconfigured provider warnings Setup wizard now shows existing allowed_users when reconfiguring a platform and preserves them if the user presses Enter. Previously the wizard would display a misleading "No allowlist set" warning even when the .env still held the original IDs. Also downgrades the "provider X has no API key configured" log from WARNING to DEBUG in resolve_provider_client — callers already handle the None return with their own contextual messages. This eliminates noisy startup warnings for providers in the fallback chain that the user never configured (e.g. minimax).	2026-04-02 01:00:29 -07:00
Teknium	8d59881a62	feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647 ) * feat(auth): add same-provider credential pools and rotation UX Add same-provider credential pooling so Hermes can rotate across multiple credentials for a single provider, recover from exhausted credentials without jumping providers immediately, and configure that behavior directly in hermes setup. - agent/credential_pool.py: persisted per-provider credential pools - hermes auth add/list/remove/reset CLI commands - 429/402/401 recovery with pool rotation in run_agent.py - Setup wizard integration for pool strategy configuration - Auto-seeding from env vars and existing OAuth state Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Salvaged from PR #2647 * fix(tests): prevent pool auto-seeding from host env in credential pool tests Tests for non-pool Anthropic paths and auth remove were failing when host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials were present. The pool auto-seeding picked these up, causing unexpected pool entries in tests. 
- Mock _select_pool_entry in auxiliary_client OAuth flag tests - Clear Anthropic env vars and mock _seed_from_singletons in auth remove test * feat(auth): add thread safety, least_used strategy, and request counting - Add threading.Lock to CredentialPool for gateway thread safety (concurrent requests from multiple gateway sessions could race on pool state mutations without this) - Add 'least_used' rotation strategy that selects the credential with the lowest request_count, distributing load more evenly - Add request_count field to PooledCredential for usage tracking - Add mark_used() method to increment per-credential request counts - Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current() with lock acquisition - Add tests: least_used selection, mark_used counting, concurrent thread safety (4 threads × 20 selects with no corruption) * feat(auth): add interactive mode for bare 'hermes auth' command When 'hermes auth' is called without a subcommand, it now launches an interactive wizard that: 1. Shows full credential pool status across all providers 2. Offers a menu: add, remove, reset cooldowns, set strategy 3. For OAuth-capable providers (anthropic, nous, openai-codex), the add flow explicitly asks 'API key or OAuth login?' — making it clear that both auth types are supported for the same provider 4. Strategy picker shows all 4 options (fill_first, round_robin, least_used, random) with the current selection marked 5. Remove flow shows entries with indices for easy selection The subcommand paths (hermes auth add/list/remove/reset) still work exactly as before for scripted/non-interactive use. * fix(tests): update runtime_provider tests for config.yaml source of truth (#4165) Tests were using OPENAI_BASE_URL env var which is no longer consulted after #4165. Updated to use model config (provider, base_url, api_key) which is the new single source of truth for custom endpoint URLs. 
* feat(auth): support custom endpoint credential pools keyed by provider name Custom OpenAI-compatible endpoints all share provider='custom', making the provider-keyed pool useless. Now pools for custom endpoints are keyed by 'custom:<normalized_name>' where the name comes from the custom_providers config list (auto-generated from URL hostname). - Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)' - load_pool('custom:name') seeds from custom_providers api_key AND model.api_key when base_url matches - hermes auth add/list now shows custom endpoints alongside registry providers - _resolve_openrouter_runtime and _resolve_named_custom_runtime check pool before falling back to single config key - 6 new tests covering custom pool keying, seeding, and listing * docs: add Excalidraw diagram of full credential pool flow Comprehensive architecture diagram showing: - Credential sources (env vars, auth.json OAuth, config.yaml, CLI) - Pool storage and auto-seeding - Runtime resolution paths (registry, custom, OpenRouter) - Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh) - CLI management commands and strategy configuration Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g * fix(tests): update setup wizard pool tests for unified select_provider_and_model flow The setup wizard now delegates to select_provider_and_model() instead of using its own prompt_choice-based provider picker. Tests needed: - Mock select_provider_and_model as no-op (provider pre-written to config) - Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it) - Pre-write model.provider to config so the pool step is reached * docs: add comprehensive credential pool documentation - New page: website/docs/user-guide/features/credential-pools.md Full guide covering quick start, CLI commands, rotation strategies, error recovery, custom endpoint pools, auto-discovery, thread safety, architecture, and storage format. 
- Updated fallback-providers.md to reference credential pools as the first layer of resilience (same-provider rotation before cross-provider) - Added hermes auth to CLI commands reference with usage examples - Added credential_pool_strategies to configuration guide * chore: remove excalidraw diagram from repo (external link only) * refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns - _load_config_safe(): replace 4 identical try/except/import blocks - _iter_custom_providers(): shared generator for custom provider iteration - PooledCredential.extra dict: collapse 11 round-trip-only fields (token_type, scope, client_id, portal_base_url, obtained_at, expires_in, agent_key_id, agent_key_expires_in, agent_key_reused, agent_key_obtained_at, tls) into a single extra dict with __getattr__ for backward-compatible access - _available_entries(): shared exhaustion-check between select and peek - Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical) - SimpleNamespace replaces class _Args boilerplate in auth_commands - _try_resolve_from_custom_pool(): shared pool-check in runtime_provider Net -17 lines. All 383 targeted tests pass. --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-31 03:10:01 -07:00
Teknium	f890a94c12	refactor: make config.yaml the single source of truth for endpoint URLs (#4165) OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source confusion. Users (especially Docker) would see the URL in .env and assume that's where all config lives, then wonder why LLM_MODEL in .env didn't work. Changes: - Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py, setup.py, and tools_config.py - Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py, models.py, and gateway/run.py - Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and auxiliary_client.py — config.yaml model.default is authoritative - Vision base URL now saved to config.yaml auxiliary.vision.base_url (both setup wizard and tools_config paths) - Tests updated to set config values instead of env vars Convention enforced: .env is for SECRETS only (API keys). All other configuration (model names, base URLs, provider selection) lives exclusively in config.yaml.	2026-03-30 22:02:53 -07:00
Teknium	e296efbf24	fix: add INFO-level logging for auxiliary provider resolution (#3866) The auxiliary client's auto-detection chain was a black box — when compression, summarization, or memory flush failed, the only clue was a generic 'Request timed out' with no indication of which provider was tried or why it was skipped. Now logs at INFO level: - 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped: openrouter, nous' when auto-detection picks a provider - 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1' before each auxiliary call - 'Auxiliary compression: provider custom unavailable, falling back to openrouter' on fallback - Clear warning with actionable guidance when NO provider is available: 'Set OPENROUTER_API_KEY or configure a local model in config.yaml'	2026-03-29 21:29:00 -07:00
Teknium	3cc50532d1	fix: auxiliary client uses placeholder key for local servers without auth (#3842) Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't require API keys, but the auxiliary client's _resolve_custom_runtime() rejected endpoints with empty keys — causing the auto-detection chain to skip the user's local server entirely. This broke compression, summarization, and memory flush for users running local models without an OpenRouter/cloud API key. The main CLI already had this fix (PR #2556, 'no-key-required' placeholder), but the auxiliary client's resolution path was missed. Two fixes: - _resolve_custom_runtime(): use 'no-key-required' placeholder instead of returning None when base_url is present but key is empty - resolve_provider_client() custom branch: same placeholder fallback for explicit_base_url without explicit_api_key Updates 2 tests that expected the old (broken) behavior.	2026-03-29 21:05:36 -07:00
Teknium	839d9d7471	feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597) Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml instead of hardcoded values. Users with slow local models (Ollama, llama.cpp) can now increase timeouts for compression, vision, session search, etc. Defaults: - auxiliary.compression.timeout: 120s (was hardcoded 45s) - auxiliary.vision.timeout: 30s (unchanged) - all other aux tasks: 30s (was hardcoded 30s) - title_generator: 30s (was hardcoded 15s) call_llm/async_call_llm now auto-resolve timeout from config when not explicitly passed. Callers can still override with an explicit timeout arg. Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml per project conventions. Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>	2026-03-28 14:35:28 -07:00
Teknium	658692799d	fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449) Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top. All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty. Closes #3389.	2026-03-27 15:28:19 -07:00
Teknium	e0dbbdb2c9	fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398 ) The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via asyncio.get_running_loop().create_task(). When an AsyncOpenAI client is garbage-collected while prompt_toolkit's event loop is running (the common CLI idle state), the aclose() task runs on prompt_toolkit's loop but the underlying TCP transport is bound to a different (dead) worker loop. The transport's self._loop.call_soon() then raises RuntimeError('Event loop is closed'), which prompt_toolkit surfaces as the disruptive 'Unhandled exception in event loop ... Press ENTER to continue...' error. Three-layer fix: 1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI startup before any AsyncOpenAI clients are created. Safe because cached clients are explicitly cleaned via _force_close_async_httpx, and uncached clients' TCP connections are cleaned by the OS on exit. 2. Custom asyncio exception handler: Installed on prompt_toolkit's event loop to silently suppress 'Event loop is closed' RuntimeError. Defense-in-depth for SDK upgrades that might change the class name. 3. cleanup_stale_async_clients(): Called after each agent turn (when the agent thread joins) to proactively evict cache entries whose event loop is closed, preventing stale clients from accumulating.	2026-03-27 09:45:25 -07:00
Teknium	5a1e2a307a	perf(ttft): salvage easy-win startup optimizations from #3346 (#3395) * perf(ttft): dedupe shared tool availability checks * perf(ttft): short-circuit vision auto-resolution * perf(ttft): make Claude Code version detection lazy * perf(ttft): reuse loaded toolsets for skills prompt --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 07:49:44 -07:00
Teknium	ad764d3513	fix(auxiliary): catch ImportError from build_anthropic_client in vision auto-detection (#3312) _try_anthropic() caught ImportError on the module import (line 667-669) but not on the build_anthropic_client() call (line 696). When the anthropic_adapter module imports fine but the anthropic SDK is missing, build_anthropic_client() raises ImportError at call time. This escaped _try_anthropic() entirely, killing get_available_vision_backends() and cascading to 7 test failures: - 4 setup wizard tests hit unexpected 'Configure vision:' prompt - 3 codex-auth-as-vision tests failed check_vision_requirements() The fix wraps the build_anthropic_client call in try/except ImportError, returning (None, None) when the SDK is unavailable — consistent with the existing guard at the top of the function.	2026-03-26 18:21:59 -07:00
Teknium	a8e02c7d49	fix: align Nous Portal model slugs with OpenRouter naming (#3253) Nous Portal now passes through OpenRouter model names and routes from there. Update the static fallback model list and auxiliary client default to use OpenRouter-format slugs (provider/model) instead of bare names. - _PROVIDER_MODELS['nous']: full OpenRouter catalog - _NOUS_MODEL: google/gemini-3-flash-preview (was gemini-3-flash) - Updated 4 test assertions for the new default model name	2026-03-26 13:49:43 -07:00
ctlst	281100e2df	fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701) In gateway mode, async tools (vision_analyze, web_extract, session_search) deadlock because _run_async() spawns a thread with asyncio.run(), creating a new event loop, but _get_cached_client() returns an AsyncOpenAI client bound to a different loop. httpx.AsyncClient cannot work across event loop boundaries, causing await client.chat.completions.create() to hang forever. Fix: include the event loop identity in the async client cache key so each loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py which had its own broken asyncio.run()-in-thread pattern — now uses the centralized _run_async() bridge.	2026-03-25 17:31:56 -07:00
Teknium	8bb1d15da4	chore: remove ~100 unused imports across 55 files (#3016 ) Automated cleanup via pyflakes + autoflake with manual review. Changes: - Removed unused stdlib imports (os, sys, json, pathlib.Path, etc.) - Removed unused typing imports (List, Dict, Any, Optional, Tuple, Set, etc.) - Removed unused internal imports (hermes_cli.auth, hermes_cli.config, etc.) - Fixed cli.py: removed 8 shadowed banner imports (imported from hermes_cli.banner then immediately redefined locally — only build_welcome_banner is actually used) - Added noqa comments to imports that appear unused but serve a purpose: - Re-exports (gateway/session.py SessionResetPolicy, tools/terminal_tool.py is_interrupted/_interrupt_event) - SDK presence checks in try/except (daytona, fal_client, discord) - Test mock targets (auxiliary_client.py Path, mcp_config.py get_hermes_home) Zero behavioral changes. Full test suite passes (6162/6162, 2 pre-existing streaming test failures unrelated to this change).	2026-03-25 15:02:03 -07:00
Teknium	1f21ef7488	fix(cli): prevent 'Press ENTER to continue...' on exit When AsyncOpenAI clients are garbage-collected after the event loop closes, their AsyncHttpxClientWrapper.__del__ tries to schedule aclose() on the dead loop, causing RuntimeError: Event loop is closed. prompt_toolkit catches this as an unhandled exception and shows 'Press ENTER to continue...' which blocks CLI exit. Fix: Add shutdown_cached_clients() to auxiliary_client.py that marks all cached async clients' underlying httpx transport as CLOSED before GC runs. This prevents __del__ from attempting the aclose() call. - _force_close_async_httpx(): sets httpx AsyncClient._state to CLOSED - shutdown_cached_clients(): iterates _client_cache, closes sync clients normally and marks async clients as closed - Also fix stale client eviction in _get_cached_client to mark evicted async clients as closed (was just del-ing them, triggering __del__) - Call shutdown_cached_clients() from _run_cleanup() in cli.py	2026-03-22 15:31:54 -07:00
Teknium	306e67f32d	fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445) When a non-OpenRouter provider (e.g. minimax, anthropic) is set in config.yaml but its API key is missing, Hermes silently fell back to OpenRouter, causing confusing 404 errors. Now checks if the user explicitly configured a provider before falling back. Explicit providers raise RuntimeError with a clear message naming the missing env var. Auto/openrouter/custom providers still fall through to OpenRouter as before. Three code paths fixed: - run_agent.py AIAgent.__init__ — main client initialization - auxiliary_client.py call_llm — sync auxiliary calls - auxiliary_client.py call_llm_streaming — async auxiliary calls Based on PR #2272 by @StefanIsMe. Applied manually to fix a pconfig NameError in the original and extend to call_llm_streaming. Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>	2026-03-22 03:59:29 -07:00
0xbyt4	dbc25a386e	fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag Two bugs in the auxiliary provider auto-detection chain: 1. Expired Codex JWT blocks the auto chain: _read_codex_access_token() returned any stored token without checking expiry, preventing fallback to working providers. Now decodes JWT exp claim and returns None for expired tokens. 2. Auxiliary Anthropic client missing OAuth identity transforms: _AnthropicCompletionsAdapter always called build_anthropic_kwargs with is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth tokens via _is_oauth_token() and propagates the flag through the adapter chain. Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag to mock resolve_anthropic_token directly (env var alone was insufficient).	2026-03-21 17:36:25 -07:00
Teknium	f8fb61d4ad	fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url Only honor config.model.base_url for Anthropic resolution when config.model.provider is actually "anthropic". This prevents a Codex (or other provider) base_url from leaking into Anthropic runtime and auxiliary client paths, which would send requests to the wrong endpoint. Closes #2384	2026-03-21 16:16:17 -07:00
Teknium	7a427d7b03	fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190) Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104. asyncio.run() creates and closes a fresh event loop each call. Cached httpx/AsyncOpenAI clients bound to the dead loop crash on GC with 'Event loop is closed'. This hit vision_analyze on first use in CLI. Two-layer fix: - model_tools._run_async(): replace asyncio.run() with persistent loop via _get_tool_loop() + run_until_complete() - auxiliary_client._get_cached_client(): track which loop created each async client, discard stale entries if loop is closed 6 regression tests covering loop lifecycle, reuse, and full vision dispatch chain. Co-authored-by: Test <test@test.com>	2026-03-20 09:44:50 -07:00
Teknium	67d707e851	fix: respect config.yaml model.base_url for Anthropic provider (#1948) (#1998) After #1675 removed ANTHROPIC_BASE_URL env var support, the Anthropic provider base URL was hardcoded to https://api.anthropic.com. Now reads model.base_url from config.yaml as an override, falling back to the default when not set. Also applies to the auxiliary client. Cherry-picked from PR #1949 by @rivercrab26. Co-authored-by: rivercrab26 <rivercrab26@users.noreply.github.com>	2026-03-18 16:51:24 -07:00
Test	e7844e9c8d	Merge origin/main, resolve conflicts (self._base_url_lower)	2026-03-18 04:09:00 -07:00
octo-patch	e4043633fc	feat: upgrade MiniMax default to M2.7 + add new OpenRouter models MiniMax: Add M2.7 and M2.7-highspeed as new defaults across provider model lists, auxiliary client, metadata, setup wizard, RL training tool, fallback tests, and docs. Retain M2.5/M2.1 as alternatives. OpenRouter: Add grok-4.20-beta, nemotron-3-super-120b-a12b:free, trinity-large-preview:free, glm-5-turbo, and hunter-alpha to the model catalog. MiniMax changes based on PR #1882 by @octo-patch (applied manually due to stale conflicts in refactored pricing module).	2026-03-18 02:42:58 -07:00
max	0c392e7a87	feat: integrate GitHub Copilot providers across Hermes Add first-class GitHub Copilot and Copilot ACP provider support across model selection, runtime provider resolution, CLI sessions, delegated subagents, cron jobs, and the Telegram gateway. This also normalizes Copilot model catalogs and API modes, introduces a Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by resolving Homebrew-installed gh binaries under launchd. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-17 23:40:22 -07:00
Teknium	d1d17f4f0a	feat(compression): add summary_base_url + move compression config to YAML-only - Add summary_base_url config option to compression block for custom OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama) - Remove compression env var bridges from cli.py and gateway/run.py (CONTEXT_COMPRESSION_* env vars no longer set from config) - Switch run_agent.py to read compression config directly from config.yaml instead of env vars - Fix backwards-compat block in _resolve_task_provider_model to also fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG sets this, which was silently preventing the compression section's summary_* keys from being read) - Add test for summary_base_url config-to-client flow - Update docs to show compression as config.yaml-only Closes #1591 Based on PR #1702 by @uzaylisak	2026-03-17 04:46:15 -07:00
teknium1	e5a244ad5d	fix(aux): reset auxiliary_is_nous flag on each resolution attempt The module-level auxiliary_is_nous was set to True by _try_nous() and never reset. In long-running gateway processes, once Nous was resolved as auxiliary provider, the flag stayed True forever — even if subsequent resolutions chose a different provider (e.g. OpenRouter). This caused Nous product tags to be sent to non-Nous providers. Reset the flag at the start of _resolve_auto() so only the winning provider's flag persists.	2026-03-17 04:02:15 -07:00
Teknium	1d5a39e002	fix: thread safety for concurrent subagent delegation (#1672) * fix: thread safety for concurrent subagent delegation Five thread-safety fixes that prevent crashes and data races when running multiple subagents concurrently via delegate_task: 1. Remove redirect_stdout/stderr from delegate_tool — mutating global sys.stdout races with the spinner thread when multiple children start concurrently, causing segfaults. Children already run with quiet_mode=True so the redirect was redundant. 2. Split _run_single_child into _build_child_agent (main thread) + _run_single_child (worker thread). AIAgent construction creates httpx/SSL clients which are not thread-safe to initialize concurrently. 3. Add threading.Lock to SessionDB — subagents share the parent's SessionDB and call create_session/append_message from worker threads with no synchronization. 4. Add _active_children_lock to AIAgent — interrupt() iterates _active_children while worker threads append/remove children. 5. Add _client_cache_lock to auxiliary_client — multiple subagent threads may resolve clients concurrently via call_llm(). Based on PR #1471 by peteromallet. * feat: Honcho base_url override via config.yaml + quick command alias type Two features salvaged from PR #1576: 1. Honcho base_url override: allows pointing Hermes at a remote self-hosted Honcho deployment via config.yaml: honcho: base_url: "http://192.168.x.x:8000" When set, this overrides the Honcho SDK's environment mapping (production/local), enabling LAN/VPN Honcho deployments without requiring the server to live on localhost. Uses config.yaml instead of env var (HONCHO_URL) per project convention. 2. Quick command alias type: adds a new 'alias' quick command type that rewrites to another slash command before normal dispatch: quick_commands: sc: type: alias target: /context Supports both CLI and gateway. Arguments are forwarded to the target command. Based on PR #1576 by redhelix.
--------- Co-authored-by: peteromallet <peteromallet@users.noreply.github.com> Co-authored-by: redhelix <redhelix@users.noreply.github.com>	2026-03-17 02:53:33 -07:00
Teknium	35d948b6e1	feat: add Kilo Code (kilocode) as first-class inference provider (#1666) Add Kilo Gateway (kilo.ai) as an API-key provider with OpenAI-compatible endpoint at https://api.kilo.ai/api/gateway. Supports 500+ models from Anthropic, OpenAI, Google, xAI, Mistral, MiniMax via a single API key. - Register kilocode in PROVIDER_REGISTRY with aliases (kilo, kilo-code, kilo-gateway) and KILOCODE_API_KEY / KILOCODE_BASE_URL env vars - Add to model catalog, CLI provider menu, setup wizard, doctor checks - Add google/gemini-3-flash-preview as default aux model - 12 new tests covering registration, aliases, credential resolution, runtime config - Documentation updates (env vars, config, fallback providers) - Fix setup test index shift from provider insertion Inspired by PR #1473 by @amanning3390. Co-authored-by: amanning3390 <amanning3390@users.noreply.github.com>	2026-03-17 02:40:34 -07:00
Teknium	40e2f8d9f0	feat(provider): add OpenCode Zen and OpenCode Go providers Add support for OpenCode Zen (pay-as-you-go, 35+ curated models) and OpenCode Go ($10/month subscription, open models) as first-class providers. Both are OpenAI-compatible endpoints resolved via the generic api_key provider flow — no custom adapter needed. Files changed: - hermes_cli/auth.py — ProviderConfig entries + aliases - hermes_cli/config.py — OPENCODE_ZEN/GO API key env vars - hermes_cli/models.py — model catalogs, labels, aliases, provider order - hermes_cli/main.py — provider labels, menu entries, model flow dispatch - hermes_cli/setup.py — setup wizard branches (idx 10, 11) - agent/model_metadata.py — context lengths for all OpenCode models - agent/auxiliary_client.py — default aux models - .env.example — documentation Co-authored-by: DevAgarwal2 <DevAgarwal2@users.noreply.github.com>	2026-03-17 02:02:43 -07:00
Teknium	3576f44a57	feat: add Vercel AI Gateway provider (#1628) * feat: add Vercel AI Gateway as a first-class provider Adds AI Gateway (ai-gateway.vercel.sh) as a new inference provider with AI_GATEWAY_API_KEY authentication, live model discovery, and reasoning support via extra_body.reasoning. Based on PR #1492 by jerilynzheng. * feat: add AI Gateway to setup wizard, doctor, and fallback providers * test: add AI Gateway to api_key_providers test suite * feat: add AI Gateway to hermes model CLI and model metadata Wire AI Gateway into the interactive model selection menu and add context lengths for AI Gateway model IDs in model_metadata.py. * feat: use claude-haiku-4.5 as AI Gateway auxiliary model * revert: use gemini-3-flash as AI Gateway auxiliary model * fix: move AI Gateway below established providers in selection order --------- Co-authored-by: jerilynzheng <jerilynzheng@users.noreply.github.com> Co-authored-by: jerilynzheng <zheng.jerilyn@gmail.com>	2026-03-17 00:12:16 -07:00
teknium1	62abb453d3	Merge origin/main into hermes/hermes-daa73839	2026-03-14 23:44:47 -07:00
teknium1	735a6e7651	fix: convert anthropic image content blocks	2026-03-14 23:41:20 -07:00
teknium1	1337c9efd8	test: resolve auxiliary client merge conflict	2026-03-14 22:15:16 -07:00
teknium1	85ef09e520	Merge origin/main into hermes/hermes-dd253d81	2026-03-14 21:16:29 -07:00
teknium1	db362dbd4c	feat: add native Anthropic auxiliary vision	2026-03-14 21:14:20 -07:00
teknium1	9f6bccd76a	feat: add direct endpoint overrides for auxiliary and delegation Add base_url/api_key overrides for auxiliary tasks and delegation so users can route those flows straight to a custom OpenAI-compatible endpoint without having to rely on provider=main or named custom providers. Also clear gateway session env vars in test isolation so the full suite stays deterministic when run from a messaging-backed agent session.	2026-03-14 21:11:37 -07:00
Teknium	a86b487349	Merge pull request #1373 from NousResearch/hermes/hermes-781f9235 fix: restore config-saved custom endpoint resolution	2026-03-14 21:06:41 -07:00
teknium1	53d1043a50	fix: restore config-saved custom endpoint resolution	2026-03-14 20:58:12 -07:00
teknium1	dc11b86e4b	refactor: unify vision backend gating	2026-03-14 20:22:13 -07:00
0xIbra	437ec17125	fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths Several files resolved paths via Path.home() / ".hermes" or os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME environment variable. This broke isolation when running multiple Hermes instances with distinct HERMES_HOME directories. Replace all hardcoded paths with calls to get_hermes_home() from hermes_cli.config, consistent with the rest of the codebase. Files fixed: - tools/process_registry.py (processes.json) - gateway/pairing.py (pairing/) - gateway/sticker_cache.py (sticker_cache.json) - gateway/channel_directory.py (channel_directory.json, sessions.json) - gateway/config.py (gateway.json, config.yaml, sessions_dir) - gateway/mirror.py (sessions/) - gateway/hooks.py (hooks/) - gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/) - gateway/platforms/whatsapp.py (whatsapp/session) - gateway/delivery.py (cron/output) - agent/auxiliary_client.py (auth.json) - agent/prompt_builder.py (SOUL.md) - cli.py (config.yaml, images/, pastes/, history) - run_agent.py (logs/) - tools/environments/base.py (sandboxes/) - tools/environments/modal.py (modal_snapshots.json) - tools/environments/singularity.py (singularity_snapshots.json) - tools/tts_tool.py (audio_cache) - hermes_cli/status.py (cron/jobs.json, sessions.json) - hermes_cli/gateway.py (logs/, whatsapp session) - hermes_cli/main.py (whatsapp/session) Tests updated to use HERMES_HOME env var instead of patching Path.home(). Closes #892 (cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)	2026-03-13 21:32:53 -07:00
Teknium	11b577671b	fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini (#1189) * fix: prevent model/provider mismatch when switching providers during active gateway When _update_config_for_provider() writes the new provider and base_url to config.yaml, the gateway (which re-reads config per-message) can pick up the change before model selection completes. This causes the old model name (e.g. 'anthropic/claude-opus-4.6') to be sent to the new provider's API (e.g. MiniMax), which fails. Changes: - _update_config_for_provider() now accepts an optional default_model parameter. When provided and the current model.default is empty or uses OpenRouter format (contains '/'), it sets a safe default model for the new provider. - All setup.py callers for direct-API providers (zai, kimi, minimax, minimax-cn, anthropic) now pass a provider-appropriate default model. - _setup_provider_model_selection() now validates the 'Keep current' choice: if the current model uses OpenRouter format and wouldn't work with the new provider, it warns and switches to the provider's first default model instead of silently keeping the incompatible name. Reported by a user on Home Assistant whose gateway started sending 'anthropic/claude-opus-4.6' to MiniMax's API after running hermes setup. * fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini When a user runs a local server (e.g. Qwen3.5-9B via OPENAI_BASE_URL), the auxiliary client (context compression, vision, session search) would send requests for 'gpt-4o-mini' or 'google/gemini-3-flash-preview' to the local server, which only serves one model — causing 404 errors mid-task. Changes: - _try_custom_endpoint() now reads the user's configured main model via _read_main_model() (checks OPENAI_MODEL → HERMES_MODEL → LLM_MODEL → config.yaml model.default) instead of hardcoding 'gpt-4o-mini'.
- resolve_provider_client() auto mode now detects when an OpenRouter-formatted model override (containing '/') would be sent to a non-OpenRouter provider (like a local server) and drops it in favor of the provider's default model. - Test isolation fixes: properly clear env vars in 'nothing available' tests to prevent host environment leakage.	2026-03-13 10:02:16 -07:00
teknium1	e976879cf2	merge: resolve conflicts with main (URL update to hermes-agent.nousresearch.com)	2026-03-12 17:49:26 -07:00
teknium1	4068f20ce9	fix(anthropic): deep scan fixes — auth, retries, edge cases Fixes from comprehensive code review and cross-referencing with clawdbot/OpenCode implementations: CRITICAL: - Add one-shot guard (anthropic_auth_retry_attempted) to prevent infinite 401 retry loops when credentials keep changing - Fix _is_oauth_token(): managed keys from ~/.claude.json are NOT regular API keys (don't start with sk-ant-api). Inverted the logic: only sk-ant-api* is treated as API key auth, everything else uses Bearer auth + oauth beta headers HIGH: - Wrap json.loads(args) in try/except in message conversion — malformed tool_call arguments no longer crash the entire conversation - Raise AuthError in runtime_provider when no Anthropic token found (was silently passing empty string, causing confusing API errors) - Remove broken _try_anthropic() from auxiliary vision chain — the centralized router creates an OpenAI client for api_key providers which doesn't work with Anthropic's Messages API MEDIUM: - Handle empty assistant message content — Anthropic rejects empty content blocks, now inserts '(empty)' placeholder - Fix setup.py existing_key logic — set to 'KEEP' sentinel instead of None to prevent falling through to the auth choice prompt - Add debug logging to _fetch_anthropic_models on failure Tests: 43 adapter tests (2 new for token detection), 3197 total passed	2026-03-12 17:14:22 -07:00
Teknium	39f3c0aeb0	fix: use hermes-agent.nousresearch.com as OpenRouter HTTP-Referer * fix: stop rejecting unlisted models + auto-detect from /models endpoint validate_requested_model() now accepts models not in the provider's API listing with a warning instead of blocking. Removes hardcoded catalog fallback for validation — if API is unreachable, accepts with a warning. Model selection flows (setup + /model command) now probe the provider's /models endpoint to get the real available models. Falls back to hardcoded defaults with a clear warning when auto-detection fails: 'Could not auto-detect models — use Custom model if yours isn't listed.' Z.AI setup no longer excludes GLM-5 on coding plans. * fix: use hermes-agent.nousresearch.com as HTTP-Referer for OpenRouter OpenRouter scrapes the favicon/logo from the HTTP-Referer URL for app rankings. We were sending the GitHub repo URL, which gives us a generic GitHub logo. Changed to the proper website URL so our actual branding shows up in rankings. Changed in run_agent.py (main agent client) and auxiliary_client.py (vision/summarization clients).	2026-03-12 16:20:22 -07:00
teknium1	7086fde37e	fix(anthropic): revert inline vision, add hermes model flow, wire vision aux Feedback fixes: 1. Revert _convert_vision_content — vision is handled by the vision_analyze tool, not by converting image blocks inline in conversation messages. Removed the function and its tests. 2. Add Anthropic to 'hermes model' (cmd_model in main.py): - Added to provider_labels dict - Added to providers selection list - Added _model_flow_anthropic() with Claude Code credential auto-detection, API key prompting, and model selection from catalog. 3. Wire up Anthropic as a vision-capable auxiliary provider: - Added _try_anthropic() to auxiliary_client.py using claude-sonnet-4 as the vision model (Claude natively supports multimodal) - Added to the get_vision_auxiliary_client() auto-detection chain (after OpenRouter/Nous, before Codex/custom) Cache tracking note: the Anthropic cache metrics branch in run_agent.py (cache_read_input_tokens / cache_creation_input_tokens) is in the correct place — it's response-level parsing, same location as the existing OpenRouter cache tracking. auxiliary_client.py has no cache tracking.	2026-03-12 16:09:04 -07:00
teknium1	5e12442b4b	feat: native Anthropic provider with Claude Code credential auto-discovery Add Anthropic as a first-class inference provider, bypassing OpenRouter for direct API access. Uses the native Anthropic SDK with a full format adapter (same pattern as the codex_responses api_mode). ## Auth (three methods, priority order) 1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-) 2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-) 3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription) - Reads Claude Code's OAuth credentials - Checks token expiry with 60s buffer - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header - Regular API keys use standard x-api-key header ## Changes by file ### New files - agent/anthropic_adapter.py — Client builder, message/tool/response format conversion, Claude Code credential reader, token resolver. Handles system prompt extraction, tool_use/tool_result blocks, thinking/reasoning, orphaned tool_use cleanup, cache_control. 
- tests/test_anthropic_adapter.py — 36 tests covering all adapter logic ### Modified files - pyproject.toml — Add anthropic>=0.39.0 dependency - hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with three env vars, plus 'claude'/'claude-code' aliases - hermes_cli/models.py — Add model catalog, labels, aliases, provider order - hermes_cli/main.py — Add 'anthropic' to --provider CLI choices - hermes_cli/runtime_provider.py — Add Anthropic branch returning api_mode='anthropic_messages' (before generic api_key fallthrough) - hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code credential auto-discovery, model selection, OpenRouter tools prompt - agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model - agent/model_metadata.py — Add bare Claude model context lengths - run_agent.py — Add anthropic_messages api_mode: * Client init (Anthropic SDK instead of OpenAI) * API call dispatch (_anthropic_client.messages.create) * Response validation (content blocks) * finish_reason mapping (stop_reason -> finish_reason) * Token usage (input_tokens/output_tokens) * Response normalization (normalize_anthropic_response) * Client interrupt/rebuild * Prompt caching auto-enabled for native Anthropic - tests/test_run_agent.py — Update test_anthropic_base_url_accepted to expect native routing, add test_prompt_caching_native_anthropic	2026-03-12 15:47:45 -07:00
teknium1	9302690e1b	refactor: remove LLM_MODEL env var dependency — config.yaml is sole source of truth Model selection now comes exclusively from config.yaml (set via 'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer read or written anywhere in production code. Why: env vars are per-process/per-user and would conflict in multi-agent or multi-tenant setups. Config.yaml is file-based and can be scoped per-user or eventually per-session. Changes: - cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL - hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL to .env - hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...) calls from all provider setup flows - gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works for gateway process runtime) - cron/scheduler.py: Same - agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint model detection	2026-03-11 22:04:42 -07:00


71 Commits