hermes-agent

Author	SHA1	Message	Date
Teknium	e0dbbdb2c9	fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398 ) The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via asyncio.get_running_loop().create_task(). When an AsyncOpenAI client is garbage-collected while prompt_toolkit's event loop is running (the common CLI idle state), the aclose() task runs on prompt_toolkit's loop but the underlying TCP transport is bound to a different (dead) worker loop. The transport's self._loop.call_soon() then raises RuntimeError('Event loop is closed'), which prompt_toolkit surfaces as the disruptive 'Unhandled exception in event loop ... Press ENTER to continue...' error. Three-layer fix: 1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI startup before any AsyncOpenAI clients are created. Safe because cached clients are explicitly cleaned via _force_close_async_httpx, and uncached clients' TCP connections are cleaned by the OS on exit. 2. Custom asyncio exception handler: Installed on prompt_toolkit's event loop to silently suppress 'Event loop is closed' RuntimeError. Defense-in-depth for SDK upgrades that might change the class name. 3. cleanup_stale_async_clients(): Called after each agent turn (when the agent thread joins) to proactively evict cache entries whose event loop is closed, preventing stale clients from accumulating.	2026-03-27 09:45:25 -07:00
Teknium	5a1e2a307a	perf(ttft): salvage easy-win startup optimizations from #3346 (#3395 ) * perf(ttft): dedupe shared tool availability checks * perf(ttft): short-circuit vision auto-resolution * perf(ttft): make Claude Code version detection lazy * perf(ttft): reuse loaded toolsets for skills prompt --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 07:49:44 -07:00
Teknium	ad764d3513	fix(auxiliary): catch ImportError from build_anthropic_client in vision auto-detection (#3312 ) _try_anthropic() caught ImportError on the module import (line 667-669) but not on the build_anthropic_client() call (line 696). When the anthropic_adapter module imports fine but the anthropic SDK is missing, build_anthropic_client() raises ImportError at call time. This escaped _try_anthropic() entirely, killing get_available_vision_backends() and cascading to 7 test failures: - 4 setup wizard tests hit unexpected 'Configure vision:' prompt - 3 codex-auth-as-vision tests failed check_vision_requirements() The fix wraps the build_anthropic_client call in try/except ImportError, returning (None, None) when the SDK is unavailable — consistent with the existing guard at the top of the function.	2026-03-26 18:21:59 -07:00
Teknium	a8e02c7d49	fix: align Nous Portal model slugs with OpenRouter naming (#3253 ) Nous Portal now passes through OpenRouter model names and routes from there. Update the static fallback model list and auxiliary client default to use OpenRouter-format slugs (provider/model) instead of bare names. - _PROVIDER_MODELS['nous']: full OpenRouter catalog - _NOUS_MODEL: google/gemini-3-flash-preview (was gemini-3-flash) - Updated 4 test assertions for the new default model name	2026-03-26 13:49:43 -07:00
ctlst	281100e2df	fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701 ) In gateway mode, async tools (vision_analyze, web_extract, session_search) deadlock because _run_async() spawns a thread with asyncio.run(), creating a new event loop, but _get_cached_client() returns an AsyncOpenAI client bound to a different loop. httpx.AsyncClient cannot work across event loop boundaries, causing await client.chat.completions.create() to hang forever. Fix: include the event loop identity in the async client cache key so each loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py which had its own broken asyncio.run()-in-thread pattern — now uses the centralized _run_async() bridge.	2026-03-25 17:31:56 -07:00
Teknium	8bb1d15da4	chore: remove ~100 unused imports across 55 files (#3016 ) Automated cleanup via pyflakes + autoflake with manual review. Changes: - Removed unused stdlib imports (os, sys, json, pathlib.Path, etc.) - Removed unused typing imports (List, Dict, Any, Optional, Tuple, Set, etc.) - Removed unused internal imports (hermes_cli.auth, hermes_cli.config, etc.) - Fixed cli.py: removed 8 shadowed banner imports (imported from hermes_cli.banner then immediately redefined locally — only build_welcome_banner is actually used) - Added noqa comments to imports that appear unused but serve a purpose: - Re-exports (gateway/session.py SessionResetPolicy, tools/terminal_tool.py is_interrupted/_interrupt_event) - SDK presence checks in try/except (daytona, fal_client, discord) - Test mock targets (auxiliary_client.py Path, mcp_config.py get_hermes_home) Zero behavioral changes. Full test suite passes (6162/6162, 2 pre-existing streaming test failures unrelated to this change).	2026-03-25 15:02:03 -07:00
Teknium	1f21ef7488	fix(cli): prevent 'Press ENTER to continue...' on exit When AsyncOpenAI clients are garbage-collected after the event loop closes, their AsyncHttpxClientWrapper.__del__ tries to schedule aclose() on the dead loop, causing RuntimeError: Event loop is closed. prompt_toolkit catches this as an unhandled exception and shows 'Press ENTER to continue...' which blocks CLI exit. Fix: Add shutdown_cached_clients() to auxiliary_client.py that marks all cached async clients' underlying httpx transport as CLOSED before GC runs. This prevents __del__ from attempting the aclose() call. - _force_close_async_httpx(): sets httpx AsyncClient._state to CLOSED - shutdown_cached_clients(): iterates _client_cache, closes sync clients normally and marks async clients as closed - Also fix stale client eviction in _get_cached_client to mark evicted async clients as closed (was just del-ing them, triggering __del__) - Call shutdown_cached_clients() from _run_cleanup() in cli.py	2026-03-22 15:31:54 -07:00
Teknium	306e67f32d	fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445 ) When a non-OpenRouter provider (e.g. minimax, anthropic) is set in config.yaml but its API key is missing, Hermes silently fell back to OpenRouter, causing confusing 404 errors. Now checks if the user explicitly configured a provider before falling back. Explicit providers raise RuntimeError with a clear message naming the missing env var. Auto/openrouter/custom providers still fall through to OpenRouter as before. Three code paths fixed: - run_agent.py AIAgent.__init__ — main client initialization - auxiliary_client.py call_llm — sync auxiliary calls - auxiliary_client.py call_llm_streaming — async auxiliary calls Based on PR #2272 by @StefanIsMe. Applied manually to fix a pconfig NameError in the original and extend to call_llm_streaming. Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>	2026-03-22 03:59:29 -07:00
0xbyt4	dbc25a386e	fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag Two bugs in the auxiliary provider auto-detection chain: 1. Expired Codex JWT blocks the auto chain: _read_codex_access_token() returned any stored token without checking expiry, preventing fallback to working providers. Now decodes JWT exp claim and returns None for expired tokens. 2. Auxiliary Anthropic client missing OAuth identity transforms: _AnthropicCompletionsAdapter always called build_anthropic_kwargs with is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth tokens via _is_oauth_token() and propagates the flag through the adapter chain. Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag to mock resolve_anthropic_token directly (env var alone was insufficient).	2026-03-21 17:36:25 -07:00
Teknium	f8fb61d4ad	fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url Only honor config.model.base_url for Anthropic resolution when config.model.provider is actually "anthropic". This prevents a Codex (or other provider) base_url from leaking into Anthropic runtime and auxiliary client paths, which would send requests to the wrong endpoint. Closes #2384	2026-03-21 16:16:17 -07:00
Teknium	7a427d7b03	fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190 ) Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104. asyncio.run() creates and closes a fresh event loop each call. Cached httpx/AsyncOpenAI clients bound to the dead loop crash on GC with 'Event loop is closed'. This hit vision_analyze on first use in CLI. Two-layer fix: - model_tools._run_async(): replace asyncio.run() with persistent loop via _get_tool_loop() + run_until_complete() - auxiliary_client._get_cached_client(): track which loop created each async client, discard stale entries if loop is closed 6 regression tests covering loop lifecycle, reuse, and full vision dispatch chain. Co-authored-by: Test <test@test.com>	2026-03-20 09:44:50 -07:00
Teknium	67d707e851	fix: respect config.yaml model.base_url for Anthropic provider (#1948 ) (#1998 ) After #1675 removed ANTHROPIC_BASE_URL env var support, the Anthropic provider base URL was hardcoded to https://api.anthropic.com. Now reads model.base_url from config.yaml as an override, falling back to the default when not set. Also applies to the auxiliary client. Cherry-picked from PR #1949 by @rivercrab26. Co-authored-by: rivercrab26 <rivercrab26@users.noreply.github.com>	2026-03-18 16:51:24 -07:00
Test	e7844e9c8d	Merge origin/main, resolve conflicts (self._base_url_lower)	2026-03-18 04:09:00 -07:00
octo-patch	e4043633fc	feat: upgrade MiniMax default to M2.7 + add new OpenRouter models MiniMax: Add M2.7 and M2.7-highspeed as new defaults across provider model lists, auxiliary client, metadata, setup wizard, RL training tool, fallback tests, and docs. Retain M2.5/M2.1 as alternatives. OpenRouter: Add grok-4.20-beta, nemotron-3-super-120b-a12b:free, trinity-large-preview:free, glm-5-turbo, and hunter-alpha to the model catalog. MiniMax changes based on PR #1882 by @octo-patch (applied manually due to stale conflicts in refactored pricing module).	2026-03-18 02:42:58 -07:00
max	0c392e7a87	feat: integrate GitHub Copilot providers across Hermes Add first-class GitHub Copilot and Copilot ACP provider support across model selection, runtime provider resolution, CLI sessions, delegated subagents, cron jobs, and the Telegram gateway. This also normalizes Copilot model catalogs and API modes, introduces a Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by resolving Homebrew-installed gh binaries under launchd. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-17 23:40:22 -07:00
Teknium	d1d17f4f0a	feat(compression): add summary_base_url + move compression config to YAML-only - Add summary_base_url config option to compression block for custom OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama) - Remove compression env var bridges from cli.py and gateway/run.py (CONTEXT_COMPRESSION_* env vars no longer set from config) - Switch run_agent.py to read compression config directly from config.yaml instead of env vars - Fix backwards-compat block in _resolve_task_provider_model to also fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG sets this, which was silently preventing the compression section's summary_* keys from being read) - Add test for summary_base_url config-to-client flow - Update docs to show compression as config.yaml-only Closes #1591 Based on PR #1702 by @uzaylisak	2026-03-17 04:46:15 -07:00
teknium1	e5a244ad5d	fix(aux): reset auxiliary_is_nous flag on each resolution attempt The module-level auxiliary_is_nous was set to True by _try_nous() and never reset. In long-running gateway processes, once Nous was resolved as auxiliary provider, the flag stayed True forever — even if subsequent resolutions chose a different provider (e.g. OpenRouter). This caused Nous product tags to be sent to non-Nous providers. Reset the flag at the start of _resolve_auto() so only the winning provider's flag persists.	2026-03-17 04:02:15 -07:00
Teknium	1d5a39e002	fix: thread safety for concurrent subagent delegation (#1672 ) * fix: thread safety for concurrent subagent delegation Four thread-safety fixes that prevent crashes and data races when running multiple subagents concurrently via delegate_task: 1. Remove redirect_stdout/stderr from delegate_tool — mutating global sys.stdout races with the spinner thread when multiple children start concurrently, causing segfaults. Children already run with quiet_mode=True so the redirect was redundant. 2. Split _run_single_child into _build_child_agent (main thread) + _run_single_child (worker thread). AIAgent construction creates httpx/SSL clients which are not thread-safe to initialize concurrently. 3. Add threading.Lock to SessionDB — subagents share the parent's SessionDB and call create_session/append_message from worker threads with no synchronization. 4. Add _active_children_lock to AIAgent — interrupt() iterates _active_children while worker threads append/remove children. 5. Add _client_cache_lock to auxiliary_client — multiple subagent threads may resolve clients concurrently via call_llm(). Based on PR #1471 by peteromallet. * feat: Honcho base_url override via config.yaml + quick command alias type Two features salvaged from PR #1576: 1. Honcho base_url override: allows pointing Hermes at a remote self-hosted Honcho deployment via config.yaml: honcho: base_url: "http://192.168.x.x:8000" When set, this overrides the Honcho SDK's environment mapping (production/local), enabling LAN/VPN Honcho deployments without requiring the server to live on localhost. Uses config.yaml instead of env var (HONCHO_URL) per project convention. 2. Quick command alias type: adds a new 'alias' quick command type that rewrites to another slash command before normal dispatch: quick_commands: sc: type: alias target: /context Supports both CLI and gateway. Arguments are forwarded to the target command. Based on PR #1576 by redhelix. --------- Co-authored-by: peteromallet <peteromallet@users.noreply.github.com> Co-authored-by: redhelix <redhelix@users.noreply.github.com>	2026-03-17 02:53:33 -07:00
Teknium	35d948b6e1	feat: add Kilo Code (kilocode) as first-class inference provider (#1666 ) Add Kilo Gateway (kilo.ai) as an API-key provider with OpenAI-compatible endpoint at https://api.kilo.ai/api/gateway. Supports 500+ models from Anthropic, OpenAI, Google, xAI, Mistral, MiniMax via a single API key. - Register kilocode in PROVIDER_REGISTRY with aliases (kilo, kilo-code, kilo-gateway) and KILOCODE_API_KEY / KILOCODE_BASE_URL env vars - Add to model catalog, CLI provider menu, setup wizard, doctor checks - Add google/gemini-3-flash-preview as default aux model - 12 new tests covering registration, aliases, credential resolution, runtime config - Documentation updates (env vars, config, fallback providers) - Fix setup test index shift from provider insertion Inspired by PR #1473 by @amanning3390. Co-authored-by: amanning3390 <amanning3390@users.noreply.github.com>	2026-03-17 02:40:34 -07:00
Teknium	40e2f8d9f0	feat(provider): add OpenCode Zen and OpenCode Go providers Add support for OpenCode Zen (pay-as-you-go, 35+ curated models) and OpenCode Go ($10/month subscription, open models) as first-class providers. Both are OpenAI-compatible endpoints resolved via the generic api_key provider flow — no custom adapter needed. Files changed: - hermes_cli/auth.py — ProviderConfig entries + aliases - hermes_cli/config.py — OPENCODE_ZEN/GO API key env vars - hermes_cli/models.py — model catalogs, labels, aliases, provider order - hermes_cli/main.py — provider labels, menu entries, model flow dispatch - hermes_cli/setup.py — setup wizard branches (idx 10, 11) - agent/model_metadata.py — context lengths for all OpenCode models - agent/auxiliary_client.py — default aux models - .env.example — documentation Co-authored-by: DevAgarwal2 <DevAgarwal2@users.noreply.github.com>	2026-03-17 02:02:43 -07:00
Teknium	3576f44a57	feat: add Vercel AI Gateway provider (#1628 ) * feat: add Vercel AI Gateway as a first-class provider Adds AI Gateway (ai-gateway.vercel.sh) as a new inference provider with AI_GATEWAY_API_KEY authentication, live model discovery, and reasoning support via extra_body.reasoning. Based on PR #1492 by jerilynzheng. * feat: add AI Gateway to setup wizard, doctor, and fallback providers * test: add AI Gateway to api_key_providers test suite * feat: add AI Gateway to hermes model CLI and model metadata Wire AI Gateway into the interactive model selection menu and add context lengths for AI Gateway model IDs in model_metadata.py. * feat: use claude-haiku-4.5 as AI Gateway auxiliary model * revert: use gemini-3-flash as AI Gateway auxiliary model * fix: move AI Gateway below established providers in selection order --------- Co-authored-by: jerilynzheng <jerilynzheng@users.noreply.github.com> Co-authored-by: jerilynzheng <zheng.jerilyn@gmail.com>	2026-03-17 00:12:16 -07:00
teknium1	62abb453d3	Merge origin/main into hermes/hermes-daa73839	2026-03-14 23:44:47 -07:00
teknium1	735a6e7651	fix: convert anthropic image content blocks	2026-03-14 23:41:20 -07:00
teknium1	1337c9efd8	test: resolve auxiliary client merge conflict	2026-03-14 22:15:16 -07:00
teknium1	85ef09e520	Merge origin/main into hermes/hermes-dd253d81	2026-03-14 21:16:29 -07:00
teknium1	db362dbd4c	feat: add native Anthropic auxiliary vision	2026-03-14 21:14:20 -07:00
teknium1	9f6bccd76a	feat: add direct endpoint overrides for auxiliary and delegation Add base_url/api_key overrides for auxiliary tasks and delegation so users can route those flows straight to a custom OpenAI-compatible endpoint without having to rely on provider=main or named custom providers. Also clear gateway session env vars in test isolation so the full suite stays deterministic when run from a messaging-backed agent session.	2026-03-14 21:11:37 -07:00
Teknium	a86b487349	Merge pull request #1373 from NousResearch/hermes/hermes-781f9235 fix: restore config-saved custom endpoint resolution	2026-03-14 21:06:41 -07:00
teknium1	53d1043a50	fix: restore config-saved custom endpoint resolution	2026-03-14 20:58:12 -07:00
teknium1	dc11b86e4b	refactor: unify vision backend gating	2026-03-14 20:22:13 -07:00
0xIbra	437ec17125	fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths Several files resolved paths via Path.home() / ".hermes" or os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME environment variable. This broke isolation when running multiple Hermes instances with distinct HERMES_HOME directories. Replace all hardcoded paths with calls to get_hermes_home() from hermes_cli.config, consistent with the rest of the codebase. Files fixed: - tools/process_registry.py (processes.json) - gateway/pairing.py (pairing/) - gateway/sticker_cache.py (sticker_cache.json) - gateway/channel_directory.py (channel_directory.json, sessions.json) - gateway/config.py (gateway.json, config.yaml, sessions_dir) - gateway/mirror.py (sessions/) - gateway/hooks.py (hooks/) - gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/) - gateway/platforms/whatsapp.py (whatsapp/session) - gateway/delivery.py (cron/output) - agent/auxiliary_client.py (auth.json) - agent/prompt_builder.py (SOUL.md) - cli.py (config.yaml, images/, pastes/, history) - run_agent.py (logs/) - tools/environments/base.py (sandboxes/) - tools/environments/modal.py (modal_snapshots.json) - tools/environments/singularity.py (singularity_snapshots.json) - tools/tts_tool.py (audio_cache) - hermes_cli/status.py (cron/jobs.json, sessions.json) - hermes_cli/gateway.py (logs/, whatsapp session) - hermes_cli/main.py (whatsapp/session) Tests updated to use HERMES_HOME env var instead of patching Path.home(). Closes #892 (cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)	2026-03-13 21:32:53 -07:00
Teknium	11b577671b	fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini (#1189 ) * fix: prevent model/provider mismatch when switching providers during active gateway When _update_config_for_provider() writes the new provider and base_url to config.yaml, the gateway (which re-reads config per-message) can pick up the change before model selection completes. This causes the old model name (e.g. 'anthropic/claude-opus-4.6') to be sent to the new provider's API (e.g. MiniMax), which fails. Changes: - _update_config_for_provider() now accepts an optional default_model parameter. When provided and the current model.default is empty or uses OpenRouter format (contains '/'), it sets a safe default model for the new provider. - All setup.py callers for direct-API providers (zai, kimi, minimax, minimax-cn, anthropic) now pass a provider-appropriate default model. - _setup_provider_model_selection() now validates the 'Keep current' choice: if the current model uses OpenRouter format and wouldn't work with the new provider, it warns and switches to the provider's first default model instead of silently keeping the incompatible name. Reported by a user on Home Assistant whose gateway started sending 'anthropic/claude-opus-4.6' to MiniMax's API after running hermes setup. * fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini When a user runs a local server (e.g. Qwen3.5-9B via OPENAI_BASE_URL), the auxiliary client (context compression, vision, session search) would send requests for 'gpt-4o-mini' or 'google/gemini-3-flash-preview' to the local server, which only serves one model — causing 404 errors mid-task. Changes: - _try_custom_endpoint() now reads the user's configured main model via _read_main_model() (checks OPENAI_MODEL → HERMES_MODEL → LLM_MODEL → config.yaml model.default) instead of hardcoding 'gpt-4o-mini'. - resolve_provider_client() auto mode now detects when an OpenRouter- formatted model override (containing '/') would be sent to a non- OpenRouter provider (like a local server) and drops it in favor of the provider's default model. - Test isolation fixes: properly clear env vars in 'nothing available' tests to prevent host environment leakage.	2026-03-13 10:02:16 -07:00
teknium1	e976879cf2	merge: resolve conflicts with main (URL update to hermes-agent.nousresearch.com)	2026-03-12 17:49:26 -07:00
teknium1	4068f20ce9	fix(anthropic): deep scan fixes — auth, retries, edge cases Fixes from comprehensive code review and cross-referencing with clawdbot/OpenCode implementations: CRITICAL: - Add one-shot guard (anthropic_auth_retry_attempted) to prevent infinite 401 retry loops when credentials keep changing - Fix _is_oauth_token(): managed keys from ~/.claude.json are NOT regular API keys (don't start with sk-ant-api). Inverted the logic: only sk-ant-api* is treated as API key auth, everything else uses Bearer auth + oauth beta headers HIGH: - Wrap json.loads(args) in try/except in message conversion — malformed tool_call arguments no longer crash the entire conversation - Raise AuthError in runtime_provider when no Anthropic token found (was silently passing empty string, causing confusing API errors) - Remove broken _try_anthropic() from auxiliary vision chain — the centralized router creates an OpenAI client for api_key providers which doesn't work with Anthropic's Messages API MEDIUM: - Handle empty assistant message content — Anthropic rejects empty content blocks, now inserts '(empty)' placeholder - Fix setup.py existing_key logic — set to 'KEEP' sentinel instead of None to prevent falling through to the auth choice prompt - Add debug logging to _fetch_anthropic_models on failure Tests: 43 adapter tests (2 new for token detection), 3197 total passed	2026-03-12 17:14:22 -07:00
Teknium	39f3c0aeb0	fix: use hermes-agent.nousresearch.com as OpenRouter HTTP-Referer * fix: stop rejecting unlisted models + auto-detect from /models endpoint validate_requested_model() now accepts models not in the provider's API listing with a warning instead of blocking. Removes hardcoded catalog fallback for validation — if API is unreachable, accepts with a warning. Model selection flows (setup + /model command) now probe the provider's /models endpoint to get the real available models. Falls back to hardcoded defaults with a clear warning when auto-detection fails: 'Could not auto-detect models — use Custom model if yours isn't listed.' Z.AI setup no longer excludes GLM-5 on coding plans. * fix: use hermes-agent.nousresearch.com as HTTP-Referer for OpenRouter OpenRouter scrapes the favicon/logo from the HTTP-Referer URL for app rankings. We were sending the GitHub repo URL, which gives us a generic GitHub logo. Changed to the proper website URL so our actual branding shows up in rankings. Changed in run_agent.py (main agent client) and auxiliary_client.py (vision/summarization clients).	2026-03-12 16:20:22 -07:00
teknium1	7086fde37e	fix(anthropic): revert inline vision, add hermes model flow, wire vision aux Feedback fixes: 1. Revert _convert_vision_content — vision is handled by the vision_analyze tool, not by converting image blocks inline in conversation messages. Removed the function and its tests. 2. Add Anthropic to 'hermes model' (cmd_model in main.py): - Added to provider_labels dict - Added to providers selection list - Added _model_flow_anthropic() with Claude Code credential auto-detection, API key prompting, and model selection from catalog. 3. Wire up Anthropic as a vision-capable auxiliary provider: - Added _try_anthropic() to auxiliary_client.py using claude-sonnet-4 as the vision model (Claude natively supports multimodal) - Added to the get_vision_auxiliary_client() auto-detection chain (after OpenRouter/Nous, before Codex/custom) Cache tracking note: the Anthropic cache metrics branch in run_agent.py (cache_read_input_tokens / cache_creation_input_tokens) is in the correct place — it's response-level parsing, same location as the existing OpenRouter cache tracking. auxiliary_client.py has no cache tracking.	2026-03-12 16:09:04 -07:00
teknium1	5e12442b4b	feat: native Anthropic provider with Claude Code credential auto-discovery Add Anthropic as a first-class inference provider, bypassing OpenRouter for direct API access. Uses the native Anthropic SDK with a full format adapter (same pattern as the codex_responses api_mode). ## Auth (three methods, priority order) 1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-) 2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-) 3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription) - Reads Claude Code's OAuth credentials - Checks token expiry with 60s buffer - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header - Regular API keys use standard x-api-key header ## Changes by file ### New files - agent/anthropic_adapter.py — Client builder, message/tool/response format conversion, Claude Code credential reader, token resolver. Handles system prompt extraction, tool_use/tool_result blocks, thinking/reasoning, orphaned tool_use cleanup, cache_control. - tests/test_anthropic_adapter.py — 36 tests covering all adapter logic ### Modified files - pyproject.toml — Add anthropic>=0.39.0 dependency - hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with three env vars, plus 'claude'/'claude-code' aliases - hermes_cli/models.py — Add model catalog, labels, aliases, provider order - hermes_cli/main.py — Add 'anthropic' to --provider CLI choices - hermes_cli/runtime_provider.py — Add Anthropic branch returning api_mode='anthropic_messages' (before generic api_key fallthrough) - hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code credential auto-discovery, model selection, OpenRouter tools prompt - agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model - agent/model_metadata.py — Add bare Claude model context lengths - run_agent.py — Add anthropic_messages api_mode: * Client init (Anthropic SDK instead of OpenAI) * API call dispatch (_anthropic_client.messages.create) * Response validation (content blocks) * finish_reason mapping (stop_reason -> finish_reason) * Token usage (input_tokens/output_tokens) * Response normalization (normalize_anthropic_response) * Client interrupt/rebuild * Prompt caching auto-enabled for native Anthropic - tests/test_run_agent.py — Update test_anthropic_base_url_accepted to expect native routing, add test_prompt_caching_native_anthropic	2026-03-12 15:47:45 -07:00
teknium1	9302690e1b	refactor: remove LLM_MODEL env var dependency — config.yaml is sole source of truth Model selection now comes exclusively from config.yaml (set via 'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer read or written anywhere in production code. Why: env vars are per-process/per-user and would conflict in multi-agent or multi-tenant setups. Config.yaml is file-based and can be scoped per-user or eventually per-session. Changes: - cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL - hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL to .env - hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...) calls from all provider setup flows - gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works for gateway process runtime) - cron/scheduler.py: Same - agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint model detection	2026-03-11 22:04:42 -07:00
teknium1	a29801286f	refactor: route main agent client + fallback through centralized router Phase 2 of the provider router migration — route the main agent's client construction and fallback activation through resolve_provider_client() instead of duplicated ad-hoc logic. run_agent.py: - __init__: When no explicit api_key/base_url, use resolve_provider_client(provider, raw_codex=True) for client construction. Explicit creds (from CLI/gateway runtime provider) still construct directly. - _try_activate_fallback: Replace _resolve_fallback_credentials and its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS dicts with a single resolve_provider_client() call. The router handles all provider types (API-key, OAuth, Codex) centrally. - Remove _resolve_fallback_credentials method and both fallback dicts. agent/auxiliary_client.py: - Add raw_codex parameter to resolve_provider_client(). When True, returns the raw OpenAI client for Codex providers instead of wrapping in CodexAuxiliaryClient. The main agent needs this for direct responses.stream() access. 3251 passed, 2 pre-existing unrelated failures.	2026-03-11 21:38:29 -07:00
teknium1	0aa31cd3cb	feat: call_llm/async_call_llm + config slots + migrate all consumers Add centralized call_llm() and async_call_llm() functions that own the full LLM request lifecycle: 1. Resolve provider + model from task config or explicit args 2. Get or create a cached client for that provider 3. Format request args (max_tokens handling, provider extra_body) 4. Make the API call with max_tokens/max_completion_tokens retry 5. Return the response Config: expanded auxiliary section with provider:model slots for all tasks (compression, vision, web_extract, session_search, skills_hub, mcp, flush_memories). Config version bumped to 7. Migrated all auxiliary consumers: - context_compressor.py: uses call_llm(task='compression') - vision_tools.py: uses async_call_llm(task='vision') - web_tools.py: uses async_call_llm(task='web_extract') - session_search_tool.py: uses async_call_llm(task='session_search') - browser_tool.py: uses call_llm(task='vision'/'web_extract') - mcp_tool.py: uses call_llm(task='mcp') - skills_guard.py: uses call_llm(provider='openrouter') - run_agent.py flush_memories: uses call_llm(task='flush_memories') Tests updated for context_compressor and MCP tool. Some test mocks still need updating (15 remaining failures from mock pattern changes, 2 pre-existing).	2026-03-11 20:52:19 -07:00
teknium1	013cc4d2fc	chore: remove nous-api provider (API key path) Nous Portal only supports OAuth authentication. Remove the 'nous-api' provider which allowed direct API key access via NOUS_API_KEY env var. Removed from: - hermes_cli/auth.py: PROVIDER_REGISTRY entry + aliases - hermes_cli/config.py: OPTIONAL_ENV_VARS entry - hermes_cli/setup.py: setup wizard option + model selection handler (reindexed remaining provider choices) - agent/auxiliary_client.py: docstring references - tests/test_runtime_provider_resolution.py: nous-api test - tests/integration/test_web_tools.py: renamed dict key	2026-03-11 20:14:44 -07:00
teknium1	07f09ecd83	refactor: route ad-hoc LLM consumers through centralized provider router Route all remaining ad-hoc auxiliary LLM call sites through resolve_provider_client() so auth, headers, and API format (Chat Completions vs Responses API) are handled consistently in one place. Files changed: - tools/openrouter_client.py: Replace manual AsyncOpenAI construction with resolve_provider_client('openrouter', async_mode=True). The shared client module now delegates entirely to the router. - tools/skills_guard.py: Replace inline OpenAI client construction (hardcoded OpenRouter base_url, manual api_key lookup, manual headers) with resolve_provider_client('openrouter'). Remove unused OPENROUTER_BASE_URL import. - trajectory_compressor.py: Add _detect_provider() to map config base_url to a provider name, then route through resolve_provider_client. Falls back to raw construction for unrecognized custom endpoints. - mini_swe_runner.py: Route default case (no explicit api_key/base_url) through resolve_provider_client('openrouter') with auto-detection fallback. Preserves direct construction when explicit creds are passed via CLI args. - agent/auxiliary_client.py: Fix stale module docstring — vision auto mode now correctly documents that Codex and custom endpoints are tried (not skipped).	2026-03-11 20:02:36 -07:00
teknium1	8805e705a7	feat: centralized provider router + fix Codex vision bypass + vision error handling Three interconnected fixes for auxiliary client infrastructure: 1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py) Add resolve_provider_client(provider, model, async_mode) — a single entry point for creating properly configured clients. Given a provider name and optional model, it handles auth lookup (env vars, OAuth tokens, auth.json), base URL resolution, provider-specific headers, and API format differences (Chat Completions vs Responses API for Codex). All auxiliary consumers should route through this instead of ad-hoc env var lookups. Refactored get_text_auxiliary_client, get_async_text_auxiliary_client, and get_vision_auxiliary_client to use the router internally. 2. FIX CODEX VISION BYPASS (vision_tools.py) vision_tools.py was constructing a raw AsyncOpenAI client from the sync vision client's api_key/base_url, completely bypassing the Codex Responses API adapter. When the vision provider resolved to Codex, the raw client would hit chatgpt.com/backend-api/codex with chat.completions.create() which only supports the Responses API. Fix: Added get_async_vision_auxiliary_client() which properly wraps Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this instead of manual client construction. 3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING - context_compressor.py: Removed _get_fallback_client() which blindly looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth, API-key providers, users without OPENAI_BASE_URL set). Replaced with fallback loop through resolve_provider_client() for each known provider, with same-provider dedup. - vision_tools.py: Added error detection for vision capability failures. Returns clear message to the model when the configured model doesn't support vision, instead of a generic error. Addresses #886	2026-03-11 19:46:47 -07:00
teknium1	ef5d811aba	fix: vision auto-detection now falls back to custom/local endpoints Vision auto-mode previously only tried OpenRouter, Nous, and Codex for multimodal — deliberately skipping custom endpoints with the assumption they 'may not handle vision input.' This caused silent failures for users running local multimodal models (Qwen-VL, LLaVA, Pixtral, etc.) without any cloud API keys. Now custom endpoints are tried as a last resort in auto mode. If the model doesn't support vision, the API call fails gracefully — but users with local vision models no longer need to manually set auxiliary.vision.provider: main in config.yaml. Reported by @Spadav and @kotyKD.	2026-03-09 15:36:19 -07:00
teknium1	2d1a1c1c47	refactor: remove redundant 'openai' auxiliary provider, clean up docs The 'openai' provider was redundant — using OPENAI_BASE_URL + OPENAI_API_KEY with provider: 'main' already covers direct OpenAI API. Provider options are now: auto, openrouter, nous, codex, main. - Removed _try_openai(), _OPENAI_AUX_MODEL, _OPENAI_BASE_URL - Replaced openai tests with codex provider tests - Updated all docs to remove 'openai' option and clarify 'main' - 'main' description now explicitly mentions it works with OpenAI API, local models, and any OpenAI-compatible endpoint Tests: 2467 passed.	2026-03-08 18:50:26 -07:00
teknium1	71e81728ac	feat: Codex OAuth vision support + multimodal content adapter The Codex Responses API (chatgpt.com/backend-api/codex) supports vision via gpt-5.3-codex. This was verified with real API calls using image analysis. Changes to _CodexCompletionsAdapter: - Added _convert_content_for_responses() to translate chat.completions multimodal format to Responses API format: - {type: 'text'} → {type: 'input_text'} - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'} - Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it) - Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them) Provider changes: - Added 'codex' as explicit auxiliary provider option - Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex) since gpt-5.3-codex supports multimodal input - Updated docs with Codex OAuth examples Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed working end-to-end through the full adapter pipeline. Tests: 2459 passed.	2026-03-08 18:44:33 -07:00
teknium1	ae4a674c84	feat: add 'openai' as auxiliary provider option Users can now set provider: "openai" for auxiliary tasks (vision, web extract, compression) to use OpenAI's API directly with their OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the default model — supports vision since GPT-4o handles image input. Provider options are now: auto, openrouter, nous, openai, main. Changes: - agent/auxiliary_client.py: added _try_openai(), "openai" case in _resolve_forced_provider(), updated auxiliary_max_tokens_param() to use max_completion_tokens for OpenAI - Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing configuration.md with Common Setups section showing OpenAI, OpenRouter, and local model examples - 3 new tests for OpenAI provider resolution Tests: 2459 passed (was 2429).	2026-03-08 18:25:30 -07:00
teknium1	5ae0b731d0	fix: harden auxiliary model config — gateway bridge, vision safety, tests Improvements on top of PR #606 (auxiliary model configuration): 1. Gateway bridge: Added auxiliary.* and compression.summary_provider config bridging to gateway/run.py so config.yaml settings work from messaging platforms (not just CLI). Matches the pattern in cli.py. 2. Vision auto-fallback safety: In auto mode, vision now only tries OpenRouter + Nous Portal (known multimodal-capable providers). Custom endpoints, Codex, and API-key providers are skipped to avoid confusing errors from providers that don't support vision input. Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still allows using any provider. 3. Comprehensive tests (46 new): - _get_auxiliary_provider env var resolution (8 tests) - _resolve_forced_provider with all provider types (8 tests) - Per-task provider routing integration (4 tests) - Vision auto-fallback safety (7 tests) - Config bridging logic (11 tests) - Gateway/CLI bridge parity (2 tests) - Vision model override via env var (2 tests) - DEFAULT_CONFIG shape validation (4 tests) 4. Docs: Added auxiliary_client.py to AGENTS.md project structure. Updated module docstring with separate text/vision resolution chains. Tests: 2429 passed (was 2383).	2026-03-08 18:06:47 -07:00
teknium1	d9f373654b	feat: enhance auxiliary model configuration and environment variable handling - Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks. - Updated the CLI configuration example to include new auxiliary model settings. - Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations. - Improved the resolution logic for auxiliary clients to support task-specific provider overrides. - Updated relevant documentation and comments for clarity on the new features and their usage.	2026-03-08 18:06:47 -07:00
Christo Mitov	4447e7d71a	fix: add Kimi Code API support (api.kimi.com/coding/v1) Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require: 1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1) 2. A User-Agent header identifying a recognized coding agent Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403 ('only available for Coding Agents') errors. Changes: - Auto-detect sk-kimi- key prefix and route to api.kimi.com/coding/v1 - Send User-Agent: KimiCLI/1.0 header for Kimi Code endpoints - Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged - KIMI_BASE_URL env var override still takes priority over auto-detection - Updated .env.example with correct docs and all endpoint options - Fixed doctor.py health check for Kimi Code keys Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)	2026-03-07 21:00:12 -05:00

1 2

59 Commits