hermes-agent

Author	SHA1	Message	Date
teknium1	5e12442b4b	feat: native Anthropic provider with Claude Code credential auto-discovery Add Anthropic as a first-class inference provider, bypassing OpenRouter for direct API access. Uses the native Anthropic SDK with a full format adapter (same pattern as the codex_responses api_mode). ## Auth (three methods, priority order) 1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-) 2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-) 3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription) - Reads Claude Code's OAuth credentials - Checks token expiry with 60s buffer - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header - Regular API keys use standard x-api-key header ## Changes by file ### New files - agent/anthropic_adapter.py — Client builder, message/tool/response format conversion, Claude Code credential reader, token resolver. Handles system prompt extraction, tool_use/tool_result blocks, thinking/reasoning, orphaned tool_use cleanup, cache_control. 
- tests/test_anthropic_adapter.py — 36 tests covering all adapter logic ### Modified files - pyproject.toml — Add anthropic>=0.39.0 dependency - hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with three env vars, plus 'claude'/'claude-code' aliases - hermes_cli/models.py — Add model catalog, labels, aliases, provider order - hermes_cli/main.py — Add 'anthropic' to --provider CLI choices - hermes_cli/runtime_provider.py — Add Anthropic branch returning api_mode='anthropic_messages' (before generic api_key fallthrough) - hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code credential auto-discovery, model selection, OpenRouter tools prompt - agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model - agent/model_metadata.py — Add bare Claude model context lengths - run_agent.py — Add anthropic_messages api_mode: * Client init (Anthropic SDK instead of OpenAI) * API call dispatch (_anthropic_client.messages.create) * Response validation (content blocks) * finish_reason mapping (stop_reason -> finish_reason) * Token usage (input_tokens/output_tokens) * Response normalization (normalize_anthropic_response) * Client interrupt/rebuild * Prompt caching auto-enabled for native Anthropic - tests/test_run_agent.py — Update test_anthropic_base_url_accepted to expect native routing, add test_prompt_caching_native_anthropic	2026-03-12 15:47:45 -07:00
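The auth priority described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in agent/anthropic_adapter.py: the function names are hypothetical, the relative order of ANTHROPIC_TOKEN and CLAUDE_CODE_OAUTH_TOKEN is an assumption, and the third method (reading ~/.claude/.credentials.json with the 60s expiry buffer) is left out.

```python
import os

# Env vars in the priority order given in the commit message.
# The order of the last two relative to each other is an assumption.
_ENV_PRIORITY = ("ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN")


def resolve_anthropic_token(env=None):
    """Return (token, source_env_var), or (None, None) if nothing is set."""
    env = os.environ if env is None else env
    for name in _ENV_PRIORITY:
        value = (env.get(name) or "").strip()
        if value:
            return value, name
    return None, None


def build_auth_headers(token):
    """Pick the header style from the key prefix (hypothetical helper)."""
    if token.startswith("sk-ant-oat-"):
        # Setup tokens: Bearer auth plus the OAuth beta header.
        return {
            "Authorization": f"Bearer {token}",
            "anthropic-beta": "oauth-2025-04-20",
        }
    # Regular API keys: standard x-api-key header.
    return {"x-api-key": token}
```

Passing a plain dict as `env` keeps the resolver easy to test without touching the real environment.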
Teknium	e9c3317158	fix: improve Kimi model selection — auto-detect endpoint, add missing models (#1039) * fix: /reasoning command output ordering, display, and inline think extraction Three issues with the /reasoning command: 1. Output interleaving: The command echo used print() while feedback used _cprint(), causing them to render out-of-order under prompt_toolkit's patch_stdout. Changed echo to use _cprint() so all output renders through the same path in correct order. 2. Reasoning display not working: /reasoning show toggled a flag but reasoning never appeared for models that embed thinking in inline <think> blocks rather than structured API fields. Added fallback extraction in _build_assistant_message to capture <think> block content as reasoning when no structured reasoning fields (reasoning, reasoning_content, reasoning_details) are present. This feeds into both the reasoning callback (during tool loops) and the post-response reasoning box display. 3. Feedback clarity: Added checkmarks to confirm actions, persisted show/hide to config (was session-only before), and aligned the status display for readability. Tests: 7 new tests for inline think block extraction (41 total). * feat: add /reasoning command to gateway (Telegram/Discord/etc) The /reasoning command only existed in the CLI — messaging platforms had no way to view or change reasoning settings. This adds: 1. /reasoning command handler in the gateway: - No args: shows current effort level and display state - /reasoning <level>: sets reasoning effort (none/low/medium/high/xhigh) - /reasoning show|hide: toggles reasoning display in responses - All changes saved to config.yaml immediately 2. Reasoning display in gateway responses: - When show_reasoning is enabled, prepends a 'Reasoning' block with the model's last_reasoning content before the response - Collapses long reasoning (>15 lines) to keep messages readable - Uses last_reasoning from run_conversation result dict 3. 
Plumbing: - Added _show_reasoning attribute loaded from config at startup - Propagated last_reasoning through _run_agent return dict - Added /reasoning to help text and known_commands set - Uses getattr for _show_reasoning to handle test stubs * fix: improve Kimi model selection — auto-detect endpoint, add missing models Kimi Coding Plan setup: - New dedicated _model_flow_kimi() replaces the generic API-key flow for kimi-coding. Removes the confusing 'Base URL' prompt entirely — the endpoint is auto-detected from the API key prefix: sk-kimi-* → api.kimi.com/coding/v1 (Kimi Coding Plan) other → api.moonshot.ai/v1 (legacy Moonshot) - Shows appropriate models for each endpoint: Coding Plan: kimi-for-coding, kimi-k2.5, kimi-k2-thinking, kimi-k2-thinking-turbo Moonshot: full model catalog - Clears any stale KIMI_BASE_URL override so runtime auto-detection via _resolve_kimi_base_url() works correctly. Model catalog updates: - Added kimi-for-coding (primary Coding Plan model) and kimi-k2-thinking-turbo to models.py, main.py _PROVIDER_MODELS, and model_metadata.py context windows. - Updated User-Agent from KimiCLI/1.0 to KimiCLI/1.3 (Kimi's coding endpoint whitelists known coding agents via User-Agent sniffing).	2026-03-12 05:58:48 -07:00
dmahan93	c7fc39bde0	feat: include session ID in system prompt via --pass-session-id flag Adds --pass-session-id CLI flag. When set, the agent's system prompt includes the session ID: Conversation started: Sunday, March 08, 2026 06:32 PM Session ID: 20260308_183200_abc123 Usage: hermes --pass-session-id hermes chat --pass-session-id Implementation threads the flag as a proper parameter through the full chain (main.py → cli.py → run_agent.py) rather than using an env var, avoiding collisions in multi-agent/multitenant setups. Based on PR #726 by dmahan93, reworked to use instance parameter instead of HERMES_PASS_SESSION_ID environment variable. Co-authored-by: dmahan93 <dmahan93@users.noreply.github.com>	2026-03-12 05:51:31 -07:00
teknium1	2192b17670	merge: resolve conflicts with origin/main - gateway/run.py: Take main's _resolve_gateway_model() helper - hermes_cli/setup.py: Re-apply nous-api removal after merge brought it back. Fix provider_idx offset (Custom is now index 3, not 4). - tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)	2026-03-12 00:29:04 -07:00
teknium1	65356003e3	revert: keep provider preferences for all providers (Nous will proxy) Nous Portal backend will become a transparent proxy for OpenRouter-specific parameters (provider preferences, etc.), so keep sending them to all providers. The reasoning disabled fix is kept (that's a real constraint of the Nous endpoint).	2026-03-11 22:53:06 -07:00
teknium1	a7e5f19528	fix: don't send OpenRouter-specific provider preferences to Nous Portal Two bugs in _build_api_kwargs that broke Nous Portal: 1. Provider preferences (only, ignore, order, sort) are OpenRouter-specific routing features. They were being sent in extra_body to ALL providers, including Nous Portal. When the config had providers_only=['google-vertex'], Nous Portal returned 404 'Inference host not found' because it doesn't have a google-vertex backend. Fix: Only include provider preferences when _is_openrouter is True. 2. Reasoning config with enabled=false was being sent to Nous Portal, which requires reasoning and returns 400 'Reasoning is mandatory for this endpoint and cannot be disabled.' Fix: Omit the reasoning parameter for Nous when enabled=false. Root cause found via HERMES_DUMP_REQUESTS=1 which showed the exact request payload being sent to Nous Portal's inference API.	2026-03-11 22:41:33 -07:00
teknium1	a29801286f	refactor: route main agent client + fallback through centralized router Phase 2 of the provider router migration — route the main agent's client construction and fallback activation through resolve_provider_client() instead of duplicated ad-hoc logic. run_agent.py: - __init__: When no explicit api_key/base_url, use resolve_provider_client(provider, raw_codex=True) for client construction. Explicit creds (from CLI/gateway runtime provider) still construct directly. - _try_activate_fallback: Replace _resolve_fallback_credentials and its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS dicts with a single resolve_provider_client() call. The router handles all provider types (API-key, OAuth, Codex) centrally. - Remove _resolve_fallback_credentials method and both fallback dicts. agent/auxiliary_client.py: - Add raw_codex parameter to resolve_provider_client(). When True, returns the raw OpenAI client for Codex providers instead of wrapping in CodexAuxiliaryClient. The main agent needs this for direct responses.stream() access. 3251 passed, 2 pre-existing unrelated failures.	2026-03-11 21:38:29 -07:00
teknium1	0aa31cd3cb	feat: call_llm/async_call_llm + config slots + migrate all consumers Add centralized call_llm() and async_call_llm() functions that own the full LLM request lifecycle: 1. Resolve provider + model from task config or explicit args 2. Get or create a cached client for that provider 3. Format request args (max_tokens handling, provider extra_body) 4. Make the API call with max_tokens/max_completion_tokens retry 5. Return the response Config: expanded auxiliary section with provider:model slots for all tasks (compression, vision, web_extract, session_search, skills_hub, mcp, flush_memories). Config version bumped to 7. Migrated all auxiliary consumers: - context_compressor.py: uses call_llm(task='compression') - vision_tools.py: uses async_call_llm(task='vision') - web_tools.py: uses async_call_llm(task='web_extract') - session_search_tool.py: uses async_call_llm(task='session_search') - browser_tool.py: uses call_llm(task='vision'/'web_extract') - mcp_tool.py: uses call_llm(task='mcp') - skills_guard.py: uses call_llm(provider='openrouter') - run_agent.py flush_memories: uses call_llm(task='flush_memories') Tests updated for context_compressor and MCP tool. Some test mocks still need updating (15 remaining failures from mock pattern changes, 2 pre-existing).	2026-03-11 20:52:19 -07:00
Teknium	8fa96debc9	Merge pull request #963 from NousResearch/hermes/hermes-cf9f7d54 fix: guard all print() against OSError with _SafeWriter	2026-03-11 09:19:52 -07:00
teknium1	a8409a161f	fix: guard all print() calls against OSError with _SafeWriter When hermes-agent runs as a systemd service, Docker container, or headless daemon, the stdout pipe can become unavailable (idle timeout, buffer exhaustion, socket reset). Any print() call then raises OSError: [Errno 5] Input/output error, crashing run_conversation() and causing cron jobs to fail. Rather than wrapping individual print() calls (68 in run_conversation alone), this adds a transparent _SafeWriter wrapper installed once at the start of run_conversation(). It delegates all writes to the real stdout and silently catches OSError. Zero overhead on the happy path, comprehensive coverage of all print calls including future ones. Fixes #845 Co-authored-by: J0hnLawMississippi <J0hnLawMississippi@users.noreply.github.com>	2026-03-11 09:19:10 -07:00
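A wrapper of the kind this commit describes might look like the sketch below. The class name and details are illustrative assumptions, not the actual _SafeWriter in run_agent.py: it delegates to the real stream and swallows OSError on write and flush.

```python
import sys


class SafeWriter:
    """Delegate writes to a real stream and silently ignore OSError."""

    def __init__(self, inner):
        self._inner = inner

    def write(self, data):
        try:
            return self._inner.write(data)
        except OSError:
            # Pretend the write succeeded so callers such as print() carry on.
            return len(data)

    def flush(self):
        try:
            self._inner.flush()
        except OSError:
            pass

    def __getattr__(self, name):
        # Everything else (encoding, isatty, ...) comes from the real stream.
        return getattr(self._inner, name)


def install_safe_stdout():
    """Wrap sys.stdout once; calling this again is a no-op."""
    if not isinstance(sys.stdout, SafeWriter):
        sys.stdout = SafeWriter(sys.stdout)
```

Installing the wrapper once covers every later print() call, which is the property the commit message emphasizes over wrapping individual calls.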
Teknium	01d3b31479	Merge PR #785: feat: conditional skill activation based on tool availability Authored by teyrebaz33. Closes #539. feat: conditional skill activation based on tool availability	2026-03-11 08:43:30 -07:00
teknium1	a54405e339	fix: proactive compression after large tool results + Anthropic error detection Two fixes for context overflow handling: 1. Proactive compression after tool execution: The compression check now estimates the next prompt size using real token counts from the last API response (prompt_tokens + completion_tokens) plus a conservative estimate of newly appended tool results (chars // 3 for JSON-heavy content). Previously, should_compress() only checked last_prompt_tokens which didn't account for tool results — so a 130k prompt + 100k chars of tool output would pass the 140k threshold check but fail the 200k API limit. 2. Safety net: Added 'prompt is too long' to context-length error detection phrases. Anthropic returns 'prompt is too long: N tokens > M maximum' on HTTP 400, which wasn't matched by existing phrases. This ensures compression fires even if the proactive check underestimates. Fixes #813	2026-03-11 08:04:52 -07:00
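The estimate in point 1 reduces to simple arithmetic. A minimal sketch, with hypothetical function names and the assumption that the threshold comparison is inclusive:

```python
def estimate_next_prompt_tokens(prompt_tokens, completion_tokens, new_tool_result_chars):
    """Real token counts from the last response plus ~1 token per 3 chars of new tool output."""
    return prompt_tokens + completion_tokens + new_tool_result_chars // 3


def should_compress(estimated_tokens, threshold_tokens):
    # Assumption: compress when the estimate reaches the threshold.
    return estimated_tokens >= threshold_tokens


# Scenario from the commit message: 130k prompt plus 100k chars of tool output
# (completion tokens taken as 0 here for simplicity).
estimate = estimate_next_prompt_tokens(130_000, 0, 100_000)  # 163_333
```

With the old check, 130,000 prompt tokens alone stays under the 140k threshold; the estimate of 163,333 crosses it, so compression fires before the next request is sent.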
teknium1	683c8b24d4	fix: reduce max_retries to 3 and make ValueError/TypeError non-retryable - max_retries reduced from 6 to 3 — 6 retries with exponential backoff could stall for ~275s total on persistent errors - ValueError and TypeError now detected as non-retryable client errors and abort immediately instead of being retried with backoff (these are local validation/programming errors that will never succeed on retry)	2026-03-11 07:04:46 -07:00
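The retry policy described here can be illustrated with a small loop. This is a sketch under assumptions: the helper name, the base delay, and the reading of max_retries as the total number of attempts are not taken from the source.

```python
import time

# Local validation/programming errors: retrying cannot help.
NON_RETRYABLE = (ValueError, TypeError)


def call_with_retries(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); back off exponentially on failure, abort at once on client errors."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except NON_RETRYABLE:
            raise  # never retried
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Injecting `sleep` makes the backoff schedule observable in tests without actually waiting.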
teknium1	d2dee43825	fix: allow tool_choice, parallel_tool_calls, prompt_cache_key in codex preflight _preflight_codex_api_kwargs rejected these three fields as unsupported, but _build_api_kwargs adds them to every codex request. This caused a ValueError before _interruptible_api_call was reached, which was caught by the retry loop and retried with exponential backoff — appearing as an infinite hang in tests (275s total backoff across 6 retries). The fix adds these keys to allowed_keys and passes them through to the normalized request dict. This fixes the hanging test_cron_run_job_codex_path_handles_internal_401_refresh test (now passes in 2.6s instead of timing out).	2026-03-11 07:00:14 -07:00
teknium1	4d873f77c1	feat(cli): add /reasoning command for effort level and display toggle Combined implementation of reasoning management: - /reasoning Show current effort level and display state - /reasoning <level> Set reasoning effort (none, low, medium, high, xhigh) - /reasoning show|on Show model thinking/reasoning in output - /reasoning hide|off Hide model thinking/reasoning from output Effort level changes persist to config and force agent re-init. Display toggle updates the agent callback dynamically without re-init. When display is enabled: - Intermediate reasoning shown as dim [thinking] lines during tool loops - Final reasoning shown in a bordered box above the response - Long reasoning collapsed (5 lines intermediate, 10 lines final) Also adds: - reasoning_callback parameter to AIAgent - last_reasoning in run_conversation result dict - show_reasoning config option (display section, default: false) - Display section in /config output - 34 tests covering both features Combines functionality from PR #789 and PR #790. Co-authored-by: Aum Desai <Aum08Desai@users.noreply.github.com> Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>	2026-03-11 06:02:18 -07:00
teknium1	a82ce60294	fix: add missing Responses API parameters for Codex provider Adds tool_choice, parallel_tool_calls, and prompt_cache_key to the Codex Responses API request kwargs — matching what the official Codex CLI sends. - tool_choice: 'auto' — enables the model to proactively call tools. Without this, the model may default to not using tools, which explains reports of the agent claiming it lacks shell access (#747). - parallel_tool_calls: True — allows the model to issue multiple tool calls in a single turn for efficiency. - prompt_cache_key: session_id — enables server-side prompt caching across turns in the same session, reducing latency and cost. Refs #747	2026-03-11 04:28:31 -07:00
teknium1	21ff0d39ad	feat: iteration budget pressure via tool result injection Two-tier warning system that nudges the LLM as it approaches max_iterations, injected into the last tool result JSON rather than as a separate system message: - Caution (70%): {"_budget_warning": "[BUDGET: 42/60...]"} - Warning (90%): {"_budget_warning": "[BUDGET WARNING: 54/60...]"} For JSON tool results, adds a _budget_warning field to the existing dict. For plain text results, appends the warning as text. Key properties: - No system messages injected mid-conversation - No changes to message structure - Prompt cache stays valid - Configurable thresholds (0.7 / 0.9) - Can be disabled: _budget_pressure_enabled = False Inspired by PR #421 (@Bartok9) and issue #414. 8 tests covering thresholds, edge cases, JSON and text injection.	2026-03-11 00:37:24 -07:00
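The two-tier injection might be sketched as below. The function names are hypothetical, and because the commit message elides the rest of the warning text, only the visible prefix and counts are reproduced here.

```python
import json


def budget_warning(iteration, max_iterations, caution=0.7, warning=0.9):
    """Return a warning string once a threshold is reached, else None."""
    ratio = iteration / max_iterations
    if ratio >= warning:
        return f"[BUDGET WARNING: {iteration}/{max_iterations}]"
    if ratio >= caution:
        return f"[BUDGET: {iteration}/{max_iterations}]"
    return None


def inject_budget_warning(tool_result, iteration, max_iterations):
    """Add the warning to the last tool result without changing message structure."""
    message = budget_warning(iteration, max_iterations)
    if message is None:
        return tool_result
    try:
        parsed = json.loads(tool_result)
    except (TypeError, ValueError):
        parsed = None
    if isinstance(parsed, dict):
        # JSON tool result: add a field to the existing dict.
        parsed["_budget_warning"] = message
        return json.dumps(parsed)
    # Plain-text result: append the warning as text.
    return f"{tool_result}\n\n{message}"
```

Because the warning rides inside an existing tool result, no extra message is inserted mid-conversation, which is why the prompt cache prefix stays valid.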
teknium1	b53d5dad67	Merge PR #705: fix: detect, warn, and block file re-read/search loops after context compression Authored by 0xbyt4. Adds read/search loop detection, file history injection after compression, and todo filtering for active items only.	2026-03-10 16:17:03 -07:00
teknium1	c1171fe666	fix: eliminate 3x SQLite message duplication in gateway sessions (#860) Three separate code paths all wrote to the same SQLite state.db with no deduplication, inflating session transcripts by 3-4x: 1. _log_msg_to_db() — wrote each message individually after append 2. _flush_messages_to_session_db() — re-wrote ALL new messages at every _persist_session() call (~18 exit points), with no tracking of what was already written 3. gateway append_to_transcript() — wrote everything a third time after the agent returned Since load_transcript() prefers SQLite over JSONL, the inflated data was loaded on every session resume, causing proportional token waste. Fix: - Remove _log_msg_to_db() and all 16 call sites (redundant with flush) - Add _last_flushed_db_idx tracking in _flush_messages_to_session_db() so repeated _persist_session() calls only write truly new messages - Reset flush cursor on compression (new session ID) - Add skip_db parameter to SessionStore.append_to_transcript() so the gateway skips SQLite writes when the agent already persisted them - Gateway now passes skip_db=True for agent-managed messages, still writes to JSONL as backup Verified: a 12-message CLI session with tool calls produces exactly 12 SQLite rows with zero duplicates (previously would be 36-48). Tests: 9 new tests covering flush deduplication, skip_db behavior, compression reset, and initialization. Full suite passes (2869 tests).	2026-03-10 15:22:44 -07:00
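The flush-cursor idea can be shown with a toy model. This sketch uses an in-memory list as a stand-in for the SQLite table, and the class and method names are hypothetical; only the cursor concept (the commit's _last_flushed_db_idx) comes from the source.

```python
class SessionFlusher:
    """Write each message to the store exactly once, however often flush() runs."""

    def __init__(self):
        self.rows = []              # stand-in for rows in state.db
        self._last_flushed_idx = 0  # index of the first message not yet written

    def flush(self, messages):
        """Persist only messages appended since the previous flush."""
        new_messages = messages[self._last_flushed_idx:]
        self.rows.extend(new_messages)
        self._last_flushed_idx = len(messages)
        return len(new_messages)

    def reset_cursor(self):
        """Call after compression, when a new session ID starts a fresh transcript."""
        self._last_flushed_idx = 0
```

Calling `flush()` from many exit points is then harmless: repeated calls with an unchanged message list write nothing.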
teyrebaz33	1caee06b22	fix: tool call repair — auto-lowercase, fuzzy match, helpful error on unknown tool (#520) - Add _repair_tool_call(): tries lowercase, normalize, then fuzzy match (difflib 0.7) - Replace 3-retry-then-abort with graceful error: model receives helpful message and self-corrects - Conversation stays alive instead of dying on hallucinated tool names Closes #520	2026-03-10 06:54:11 -07:00
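The repair cascade (lowercase, normalize, fuzzy match at 0.7) could look roughly like this. The function name is hypothetical, and "normalize" is assumed here to mean replacing hyphens and spaces with underscores; the source does not say what the real _repair_tool_call() normalizes.

```python
import difflib


def repair_tool_name(name, valid_names, cutoff=0.7):
    """Return the closest valid tool name, or None if nothing is close enough."""
    if name in valid_names:
        return name
    lowered = name.lower()
    if lowered in valid_names:
        return lowered
    # Assumed normalization: hyphens and spaces become underscores.
    normalized = lowered.replace("-", "_").replace(" ", "_")
    if normalized in valid_names:
        return normalized
    matches = difflib.get_close_matches(normalized, sorted(valid_names), n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

When the function returns None, the caller can hand the model an error message listing the valid tool names instead of aborting the conversation.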
teknium1	771969f747	fix: wire up enabled_tools in agent loop + simplify sandbox tool selection Completes the fix started in `8318a51` — handle_function_call() accepted enabled_tools but run_agent.py never passed it. Now both call sites in _execute_tool_calls() pass self.valid_tool_names, so each agent session uses its own tool list instead of the process-global _last_resolved_tool_names (which subagents can overwrite). Also simplifies the redundant ternary in code_execution_tool.py: sandbox_tools is already computed correctly (intersection with session tools, or full SANDBOX_ALLOWED_TOOLS as fallback), so the conditional was dead logic. Inspired by PR #663 (JasonOA888). Closes #662. Tests: 2857 passed.	2026-03-10 06:35:28 -07:00
vincent	b0a5fe8974	fix: continue after output-length truncation	2026-03-10 04:30:19 -07:00
teknium1	899dfdcfb9	Merge PR #616: fix: retry with rebuilt payload after compression Authored by tripledoublev. After context compression on 413/400 errors, the inner retry loop was reusing the stale pre-compression api_messages payload. Fix breaks out of the inner retry loop so the outer loop rebuilds api_messages from the now-compressed messages list. Adds regression test verifying the second request actually contains the compressed payload.	2026-03-10 04:22:42 -07:00
teknium1	f16f2912cf	Merge PR #607: fix: reset all retry counters at start of run_conversation() Authored by 0xbyt4. Adds missing resets for _incomplete_scratchpad_retries and _codex_incomplete_retries to prevent stale counters carrying over between CLI conversations.	2026-03-10 04:17:47 -07:00
teknium1	c1775de56f	feat: filesystem checkpoints and /rollback command Automatic filesystem snapshots before destructive file operations, with user-facing rollback. Inspired by PR #559 (by @alireza78a). Architecture: - Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR - CheckpointManager: take/list/restore, turn-scoped dedup, pruning - Transparent — the LLM never sees it, no tool schema, no tokens - Once per turn — only first write_file/patch triggers a snapshot Integration: - Config: checkpoints.enabled + checkpoints.max_snapshots - CLI flag: hermes --checkpoints - Trigger: run_agent.py _execute_tool_calls() before write_file/patch - /rollback slash command in CLI + gateway (list, restore by number) - Pre-rollback snapshot auto-created on restore (undo the undo) Safety: - Never blocks file operations — all errors silently logged - Skips root dir, home dir, dirs >50K files - Disables gracefully when git not installed - Shadow repo completely isolated from project git Tests: 35 new tests, all passing (2798 total suite) Docs: feature page, config reference, CLI commands reference	2026-03-10 00:49:15 -07:00
teknium1	ee4008431a	fix: stop terminal border flashing with steady cursor and TUI spinner widget Cherry-picked and improved from PR #470 (fixes #464). Problem: On Ubuntu 24.04 with ghostty + tmux, the prompt input box border lines flash due to cursor blink and raw spinner terminal writes conflicting with prompt_toolkit's rendering. Changes: - cli.py: Add CursorShape.BLOCK to Application() to disable cursor blink - cli.py: Add thinking_callback + spinner_widget in TUI layout so thinking status displays as a proper prompt_toolkit widget instead of raw terminal writes that conflict with the TUI renderer - run_agent.py: Add thinking_callback parameter to AIAgent; when set, uses the callback instead of KawaiiSpinner for thinking display What was NOT changed (preserving existing behavior): - agent/display.py: Untouched. KawaiiSpinner _write() stdout capture, _animate() logic, and 0.12s frame interval all preserved. This protects subagent stdout redirection and keeps smooth animations for non-CLI contexts (gateway, batch runner). - Original emoji spinner types (brain/sparkle/pulse/moon/star) preserved for all non-CLI contexts. Fixes from original PR #470: - CursorShape.STEADY_BLOCK -> CursorShape.BLOCK (STEADY_BLOCK doesn't exist in prompt_toolkit 3.0.52) - Removed duplicate self._spinner_text = '' line - Removed redundant nested if-checks Tested: 2706 tests pass, interactive CLI verified via tmux.	2026-03-09 23:26:43 -07:00
teknium1	3e352f8a0d	fix: add upstream guard for non-dict function_args + tests for build_tool_preview Complements PR #453 by 0xbyt4. Adds isinstance(dict) guard in run_agent.py to catch cases where json.loads returns non-dict (e.g. null, list, string) before they reach downstream code. Also adds 15 tests for build_tool_preview covering None args, empty dicts, known/unknown tools, fallback keys, truncation, and all special-cased tools (process, todo, memory, session_search).	2026-03-09 21:01:40 -07:00
teyrebaz33	94023e6a85	feat: conditional skill activation based on tool availability Skills can now declare fallback_for_toolsets, fallback_for_tools, requires_toolsets, and requires_tools in their SKILL.md frontmatter. The system prompt builder filters skills automatically based on which tools are available in the current session. - Add _read_skill_conditions() to parse conditional frontmatter fields - Add _skill_should_show() to evaluate conditions against available tools - Update build_skills_system_prompt() to accept and apply tool availability - Pass valid_tool_names and available toolsets from run_agent.py - Backward compatible: skills without conditions always show; calling build_skills_system_prompt() with no args preserves existing behavior Closes #539	2026-03-09 23:13:39 +03:00
teknium1	1f0944de21	fix: handle non-string content from OpenAI-compatible servers (#759) Some local LLM servers (llama-server, etc.) return message.content as a dict or list instead of a plain string. This caused AttributeError 'dict object has no attribute strip' on every API call. Normalizes content to string immediately after receiving the response: - dict: extracts 'text' or 'content' field, falls back to json.dumps - list: extracts text parts (OpenAI multimodal content format) - other: str() conversion Applied at the single point where response.choices[0].message is read in the main agent loop, so all downstream .strip()/.startswith()/[:100] operations work regardless of server implementation. Closes #759	2026-03-09 03:32:32 -07:00
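The normalization rules listed in this commit can be sketched as a single function. The name is hypothetical, and two details are assumptions: None is passed through unchanged, and list parts are joined without a separator.

```python
import json


def normalize_content(content):
    """Coerce message.content to a string so .strip()/.startswith() are safe."""
    if content is None or isinstance(content, str):
        return content  # assumption: None (e.g. tool-call-only replies) stays None
    if isinstance(content, dict):
        for key in ("text", "content"):
            if isinstance(content.get(key), str):
                return content[key]
        return json.dumps(content)
    if isinstance(content, list):
        parts = []
        for part in content:
            if isinstance(part, str):
                parts.append(part)
            elif isinstance(part, dict) and isinstance(part.get("text"), str):
                # OpenAI multimodal format: {"type": "text", "text": "..."}
                parts.append(part["text"])
        return "".join(parts)
    return str(content)
```

Applying this once, where the response message is first read, means no downstream string operation needs its own type check.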
0xbyt4	4684aaffdc	merge: resolve file_tools.py conflict with origin/main Combine read/search loop detection with main's redact_sensitive_text and truncation hint features. Add tracker reset to TestSearchHints to prevent cross-test state leakage.	2026-03-09 13:21:46 +03:00
teknium1	aedb773f0d	fix: stabilize system prompt across gateway turns for cache hits Two changes to prevent unnecessary Anthropic prompt cache misses in the gateway, where a fresh AIAgent is created per user message: 1. Reuse stored system prompt for continuing sessions: When conversation_history is non-empty, load the system prompt from the session DB instead of rebuilding from disk. The model already has updated memory in its conversation history (it wrote it!), so re-reading memory from disk produces a different system prompt that breaks the cache prefix. 2. Stabilize Honcho context per session: - Only prefetch Honcho context on the first turn (empty history) - Bake Honcho context into the cached system prompt and store to DB - Remove the per-turn Honcho injection from the API call loop This ensures the system message is identical across all turns in a session. Previously, re-fetching Honcho could return different context on each turn, changing the system message and invalidating the cache. Both changes preserve the existing behavior for compression (which invalidates the prompt and rebuilds from scratch) and for the CLI (where the same AIAgent persists and the cached prompt is already stable across turns). Tests: 2556 passed (6 new)	2026-03-09 01:50:58 -07:00
teknium1	35d57ed752	refactor: unified OAuth/API-key credential resolution for fallback Split fallback provider handling into two clean registries: _FALLBACK_API_KEY_PROVIDERS — env-var-based (openrouter, zai, kimi, minimax) _FALLBACK_OAUTH_PROVIDERS — OAuth-based (openai-codex, nous) New _resolve_fallback_credentials() method handles all three cases (OAuth, API key, custom endpoint) and returns a uniform (key, url, mode) tuple. _try_activate_fallback() is now just validation + client build. Adds Nous Portal as a fallback provider — uses the same OAuth flow as the primary provider (hermes login), returns chat_completions mode. OAuth providers get credential refresh for free: the existing 401 retry handlers (_try_refresh_codex/nous_client_credentials) check self.provider, which is set correctly after fallback activation. 4 new tests (nous activation, nous no-login, codex retained). 27 total fallback tests passing, 2548 full suite.	2026-03-08 21:44:48 -07:00
teknium1	5785bd3272	feat: add openai-codex as fallback provider Codex OAuth uses a different auth flow (OAuth tokens, not env vars) and a different API mode (codex_responses, not chat_completions). The fallback now handles this specially: - Resolves credentials via resolve_codex_runtime_credentials() - Sets api_mode to codex_responses - Fails gracefully if no Codex OAuth session exists Also added to the commented-out config.yaml example. 2 new tests (codex activation + graceful failure).	2026-03-08 21:34:15 -07:00
teknium1	b3765c28d0	fix: restrict fallback providers to actual hermes providers Remove hallucinated providers (openai, deepseek, together, groq, fireworks, mistral, gemini, nous) from the fallback provider map. These don't exist in hermes-agent's provider system. The real supported providers for fallback are: openrouter (OPENROUTER_API_KEY) zai (ZAI_API_KEY) kimi-coding (KIMI_API_KEY) minimax (MINIMAX_API_KEY) minimax-cn (MINIMAX_CN_API_KEY) For any other OpenAI-compatible endpoint, users can use the base_url + api_key_env overrides in the config. Also adds Kimi User-Agent header for kimi fallback (matching the main provider system).	2026-03-08 20:49:55 -07:00
teknium1	161436cfdd	feat: simple fallback model for provider resilience When the primary model/provider fails after retries (rate limit, overload, auth errors, connection failures), Hermes automatically switches to a configured fallback model for the remainder of the session. Config (in ~/.hermes/config.yaml): fallback_model: provider: openrouter model: anthropic/claude-sonnet-4 Supports all major providers: OpenRouter, OpenAI, Nous, DeepSeek, Together, Groq, Fireworks, Mistral, Gemini — plus custom endpoints via base_url and api_key_env overrides. Design principles: - Dead simple: one fallback model, not a chain - One-shot: switches once, doesn't ping-pong back - Zero new dependencies: uses existing OpenAI client - Minimal code: ~100 lines in run_agent.py, ~5 lines in cli.py/gateway - Three trigger points: max retries exhausted, non-retryable client errors, and invalid response exhaustion Does NOT trigger on context overflow or payload-too-large errors (those are handled by the existing compression system). Addresses #737. 25 new tests, 2492 total passing.	2026-03-08 20:22:33 -07:00
teknium1	2394e18729	fix: add context to interruption messages for model awareness When the agent is interrupted, the model now receives descriptive context instead of a generic 'Operation interrupted.' string: - Tool skip messages include the tool name: '[Tool execution cancelled — terminal was skipped due to user interrupt]' '[Tool execution skipped — web_search was not started. User sent a new message]' - API call interrupts include timing: 'Operation interrupted: waiting for model response (4.2s elapsed).' - Retry/error interrupts include retry context: 'Operation interrupted: retrying API call after rate limit (retry 2/5).' 'Operation interrupted: handling API error (Timeout: connection timed out).' This helps the model understand what was happening when it was interrupted, reducing wasted iterations spent re-discovering state.	2026-03-08 18:58:23 -07:00
teknium1	60b6abefd9	feat: session naming with unique titles, auto-lineage, rich listing, resume by name - Schema v4: unique title index, migration from v2/v3 - set/get/resolve session titles with uniqueness enforcement - Auto-lineage: context compression auto-numbers titles (Task -> Task #2 -> Task #3) - resolve_session_by_title: auto-latest finds most recent continuation - list_sessions_rich: preview (first 60 chars) + last_active timestamp - CLI: -c accepts optional name arg (hermes -c 'my project') - CLI: /title command with deferred mode (set before session exists) - CLI: sessions list shows Title, Preview, Last Active, ID - 27 new tests (1844 total passing)	2026-03-08 15:20:29 -07:00
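The auto-lineage numbering (Task -> Task #2 -> Task #3) amounts to bumping a trailing counter. A minimal sketch with a hypothetical function name; it does not show the uniqueness enforcement or database lookup the commit also describes.

```python
import re

# Matches a title ending in " #<number>", e.g. "Task #2".
_LINEAGE_SUFFIX = re.compile(r"(.*) #(\d+)")


def next_lineage_title(title):
    """Return the title for the next continuation of a compressed session."""
    match = _LINEAGE_SUFFIX.fullmatch(title)
    if match:
        base, number = match.group(1), int(match.group(2))
        return f"{base} #{number + 1}"
    # First continuation of an unnumbered title.
    return f"{title} #2"
```

Resolving a bare title to its latest continuation would then be a matter of picking the highest-numbered title sharing the same base.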
0xbyt4	9eee529a7f	fix: detect and warn on file re-read loops after context compression When context compression summarizes conversation history, the agent loses track of which files it already read and re-reads them in a loop. Users report the agent reading the same files endlessly without writing. Root cause: context compression is lossy — file contents and read history are lost in the summary. After compression, the model thinks it hasn't examined the files yet and reads them again. Fix (two-part): 1. Track file reads per task in file_tools.py. When the same file region is read again, include a _warning in the response telling the model to stop re-reading and use existing information. 2. After context compression, inject a structured message listing all files already read in the session with explicit "do NOT re-read" instruction, preserving read history across compression boundaries. Adds 16 tests covering warning detection, task isolation, summary accuracy, tracker cleanup, and compression history injection.	2026-03-08 20:44:42 +03:00
teknium1	19b6f81ee7	fix: allow Anthropic API URLs as custom OpenAI-compatible endpoints Removed the hard block on base_url containing 'api.anthropic.com'. Anthropic now offers an OpenAI-compatible /chat/completions endpoint, so blocking their URL prevents legitimate use. If the endpoint isn't compatible, the API call will fail with a proper error anyway. Removed from: run_agent.py, mini_swe_runner.py Updated test to verify Anthropic URLs are accepted.	2026-03-07 23:36:35 -08:00
Christo Mitov	4447e7d71a	fix: add Kimi Code API support (api.kimi.com/coding/v1) Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require: 1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1) 2. A User-Agent header identifying a recognized coding agent Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403 ('only available for Coding Agents') errors. Changes: - Auto-detect sk-kimi- key prefix and route to api.kimi.com/coding/v1 - Send User-Agent: KimiCLI/1.0 header for Kimi Code endpoints - Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged - KIMI_BASE_URL env var override still takes priority over auto-detection - Updated .env.example with correct docs and all endpoint options - Fixed doctor.py health check for Kimi Code keys Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)	2026-03-07 21:00:12 -05:00
vincent	86eed141af	fix: rebuild compressed payload before retry	2026-03-07 18:55:01 -05:00
teknium1	e64d646bad	Critical: fix bug in new subagent tool call budget so it is scoped to the tool call loop rather than the session	2026-03-07 10:32:51 -08:00
teknium1	b84f9e410c	feat: default reasoning effort from xhigh to medium Reduces token usage and latency for most tasks by defaulting to medium reasoning effort instead of xhigh. Users can still override via config or CLI flag. Updates code, tests, example config, and docs.	2026-03-07 10:14:19 -08:00
teknium1	23e84de830	refactor: remove model parameter from AIAgent initialization Eliminated the model parameter from the AIAgent class initialization, streamlining the constructor and ensuring consistent behavior across agent instances. This change aligns with recent updates to the task delegation logic.	2026-03-07 09:48:19 -08:00
teknium1	5a711f32b1	fix: enhance payload and context compression handling Added logic to manage multiple compression attempts for large payloads and context length errors. Introduced limits on compression attempts to prevent infinite retries, with appropriate logging and error handling. This ensures better resilience and user feedback when facing compression issues during API calls.	2026-03-07 09:19:07 -08:00
0xbyt4	8c26a057a3	fix: reset all retry counters at start of run_conversation() _incomplete_scratchpad_retries and _codex_incomplete_retries were not reset at the start of run_conversation(). In CLI mode, where the same AIAgent instance is reused across conversations, stale counters from a previous conversation could carry over, causing premature retry exhaustion and partial responses.	2026-03-07 20:12:08 +03:00
teknium1	4d34427cc7	fix: update model version in agent configurations Updated the default model version from "anthropic/claude-sonnet-4-20250514" to "anthropic/claude-sonnet-4.6" across multiple files including AGENTS.md, batch_runner.py, mini_swe_runner.py, and run_agent.py for consistency and to reflect the latest model improvements.	2026-03-07 09:06:37 -08:00
teknium1	0a82396718	feat: shared iteration budget across parent + subagents Subagent tool calls now count toward the same session-wide iteration limit as the parent agent. Previously, each subagent had its own independent counter, so a parent with max_iterations=60 could spawn 3 subagents each doing 50 calls = 150 total tool calls unmetered. Changes: - IterationBudget: thread-safe shared counter (run_agent.py) - consume(): try to use one iteration, returns False if exhausted - refund(): give back one iteration (for execute_code turns) - Thread-safe via Lock (subagents run in ThreadPoolExecutor) - Parent creates the budget, children inherit it via delegate_tool.py - execute_code turns are refunded (don't count against budget) - Default raised from 60 → 90 to account for shared consumption - Per-child cap (50) still applies as a safety valve The per-child max_iterations (default 50) remains as a per-child ceiling, but the shared budget is the hard session-wide limit. A child stops at whichever comes first.	2026-03-07 08:16:37 -08:00
teknium1	5da55ea1e3	fix: sanitize orphaned tool-call/result pairs in message compression Enhance message compression by adding a method to clean up orphaned tool-call and tool-result pairs. This ensures that the API receives well-formed messages, preventing errors related to mismatched IDs. The new functionality includes removing orphaned results and adding stub results for missing calls, improving overall message integrity during compression.	2026-03-07 08:08:00 -08:00
teknium1	69a36a3361	Merge PR #309: fix(timezone): timezone-aware now() for prompt, cron, and execute_code Authored by areu01or00. Adds timezone support via hermes_time.now() helper with IANA timezone resolution (HERMES_TIMEZONE env → config.yaml → server-local). Updates system prompt timestamp, cron scheduling, and execute_code sandbox TZ injection. Includes config migration (v4→v5) and comprehensive test coverage.	2026-03-07 00:04:41 -08:00
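
Note on commit 0a82396718 (shared iteration budget): the message describes a lock-protected counter with consume() and refund() that a parent agent shares with its subagents. The sketch below shows one way such a counter could look. It is not the code from run_agent.py. Only the names IterationBudget, consume, and refund come from the commit message. The constructor argument, the remaining property, and the run_child helper are hypothetical and exist only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class IterationBudget:
    """Thread-safe iteration counter shared by a parent and its subagents."""

    def __init__(self, max_total: int) -> None:
        # max_total is a hypothetical parameter name for the shared limit.
        self._max_total = max_total
        self._used = 0
        self._lock = threading.Lock()

    def consume(self) -> bool:
        """Try to use one iteration; return False once the budget is exhausted."""
        with self._lock:
            if self._used >= self._max_total:
                return False
            self._used += 1
            return True

    def refund(self) -> None:
        """Give back one iteration (the commit does this for execute_code turns)."""
        with self._lock:
            if self._used > 0:
                self._used -= 1

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._max_total - self._used


def run_child(budget: IterationBudget, child_cap: int) -> int:
    """Hypothetical child loop: stop at the per-child cap or when the
    shared budget runs out, whichever comes first."""
    done = 0
    while done < child_cap and budget.consume():
        done += 1
    return done


if __name__ == "__main__":
    shared = IterationBudget(90)  # default named in the commit message
    with ThreadPoolExecutor(max_workers=3) as pool:
        counts = list(pool.map(lambda _: run_child(shared, 50), range(3)))
    # Three children capped at 50 each cannot exceed the shared 90.
    print(sum(counts), shared.remaining)
```

Under these assumptions, three children that would previously have made 150 calls in total are held to the shared limit, while the per-child cap still bounds any single child.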

207 Commits