hermes-agent

Author	SHA1	Message	Date
kshitijk4poor	935137f0d9	feat: add inline diff previews for write actions Show inline diffs in the CLI transcript when write_file, patch, or skill_manage modifies files. Captures a filesystem snapshot before the tool runs, computes a unified diff after, and renders it with ANSI coloring in the activity feed. Adds tool_start_callback and tool_complete_callback hooks to AIAgent for pre/post tool execution notifications. Also fixes _extract_parallel_scope_path to normalize relative paths to absolute, preventing the parallel overlap detection from missing conflicts when the same file is referenced with different path styles. Gated by display.inline_diffs config option (default: true). Based on PR #3774 by @kshitijk4poor.	2026-04-01 02:13:57 -07:00
Bartok9	afa75a6185	fix(client): handle is_closed as method in OpenAI SDK The openai SDK's SyncAPIClient.is_closed is a method, not a property. getattr(client, 'is_closed', False) returned the bound method object, which is always truthy — causing _is_openai_client_closed() to report all clients as closed and triggering unnecessary client recreation (~100-200ms TCP+TLS overhead per API call). Fix: check if is_closed is callable and call it, otherwise treat as bool. Fixes #4377 Co-authored-by: Bartok9 <Bartok9@users.noreply.github.com>	2026-04-01 01:40:43 -07:00
Teknium	161acb0086	fix: credential pool 401 recovery rotates to next credential after failed refresh (#4300 ) When an OAuth token refresh fails on a 401 error, the pool recovery would return 'not recovered' without trying the next credential in the pool. This meant users who added a second valid credential via 'hermes auth add' would never see it used when the primary credential was dead. Now: try refresh first (handles expired tokens quickly), and if that fails, rotate to the next available credential — same as 429/402 already did. Adds three tests covering 401 refresh success, refresh-fail-then-rotate, and refresh-fail-with-no-remaining-credentials.	2026-03-31 12:02:29 -07:00
arasovic	0240baa357	fix: strip orphaned think/reasoning tags from user-facing responses Some models (e.g. Kimi K2.5 on Alibaba OpenAI-compatible endpoint) emit reasoning text followed by a closing </think> without a matching opening <think> tag. The existing paired-tag regexes in _strip_think_blocks() cannot match these orphaned tags, so </think> leaks into user-facing responses on all platforms. Add a catch-all regex that strips any remaining opening or closing think/thinking/reasoning/REASONING_SCRATCHPAD tags after the existing paired-block removal pass. Closes #4285	2026-03-31 11:42:44 -07:00
Teknium	8d59881a62	feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647 ) * feat(auth): add same-provider credential pools and rotation UX Add same-provider credential pooling so Hermes can rotate across multiple credentials for a single provider, recover from exhausted credentials without jumping providers immediately, and configure that behavior directly in hermes setup. - agent/credential_pool.py: persisted per-provider credential pools - hermes auth add/list/remove/reset CLI commands - 429/402/401 recovery with pool rotation in run_agent.py - Setup wizard integration for pool strategy configuration - Auto-seeding from env vars and existing OAuth state Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Salvaged from PR #2647 * fix(tests): prevent pool auto-seeding from host env in credential pool tests Tests for non-pool Anthropic paths and auth remove were failing when host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials were present. The pool auto-seeding picked these up, causing unexpected pool entries in tests. - Mock _select_pool_entry in auxiliary_client OAuth flag tests - Clear Anthropic env vars and mock _seed_from_singletons in auth remove test * feat(auth): add thread safety, least_used strategy, and request counting - Add threading.Lock to CredentialPool for gateway thread safety (concurrent requests from multiple gateway sessions could race on pool state mutations without this) - Add 'least_used' rotation strategy that selects the credential with the lowest request_count, distributing load more evenly - Add request_count field to PooledCredential for usage tracking - Add mark_used() method to increment per-credential request counts - Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current() with lock acquisition - Add tests: least_used selection, mark_used counting, concurrent thread safety (4 threads × 20 selects with no corruption) * feat(auth): add interactive mode for bare 'hermes auth' command When 'hermes auth' is called without a subcommand, it now launches an interactive wizard that: 1. Shows full credential pool status across all providers 2. Offers a menu: add, remove, reset cooldowns, set strategy 3. For OAuth-capable providers (anthropic, nous, openai-codex), the add flow explicitly asks 'API key or OAuth login?' — making it clear that both auth types are supported for the same provider 4. Strategy picker shows all 4 options (fill_first, round_robin, least_used, random) with the current selection marked 5. Remove flow shows entries with indices for easy selection The subcommand paths (hermes auth add/list/remove/reset) still work exactly as before for scripted/non-interactive use. * fix(tests): update runtime_provider tests for config.yaml source of truth (#4165) Tests were using OPENAI_BASE_URL env var which is no longer consulted after #4165. Updated to use model config (provider, base_url, api_key) which is the new single source of truth for custom endpoint URLs. * feat(auth): support custom endpoint credential pools keyed by provider name Custom OpenAI-compatible endpoints all share provider='custom', making the provider-keyed pool useless. Now pools for custom endpoints are keyed by 'custom:<normalized_name>' where the name comes from the custom_providers config list (auto-generated from URL hostname). - Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)' - load_pool('custom:name') seeds from custom_providers api_key AND model.api_key when base_url matches - hermes auth add/list now shows custom endpoints alongside registry providers - _resolve_openrouter_runtime and _resolve_named_custom_runtime check pool before falling back to single config key - 6 new tests covering custom pool keying, seeding, and listing * docs: add Excalidraw diagram of full credential pool flow Comprehensive architecture diagram showing: - Credential sources (env vars, auth.json OAuth, config.yaml, CLI) - Pool storage and auto-seeding - Runtime resolution paths (registry, custom, OpenRouter) - Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh) - CLI management commands and strategy configuration Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g * fix(tests): update setup wizard pool tests for unified select_provider_and_model flow The setup wizard now delegates to select_provider_and_model() instead of using its own prompt_choice-based provider picker. Tests needed: - Mock select_provider_and_model as no-op (provider pre-written to config) - Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it) - Pre-write model.provider to config so the pool step is reached * docs: add comprehensive credential pool documentation - New page: website/docs/user-guide/features/credential-pools.md Full guide covering quick start, CLI commands, rotation strategies, error recovery, custom endpoint pools, auto-discovery, thread safety, architecture, and storage format. - Updated fallback-providers.md to reference credential pools as the first layer of resilience (same-provider rotation before cross-provider) - Added hermes auth to CLI commands reference with usage examples - Added credential_pool_strategies to configuration guide * chore: remove excalidraw diagram from repo (external link only) * refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns - _load_config_safe(): replace 4 identical try/except/import blocks - _iter_custom_providers(): shared generator for custom provider iteration - PooledCredential.extra dict: collapse 11 round-trip-only fields (token_type, scope, client_id, portal_base_url, obtained_at, expires_in, agent_key_id, agent_key_expires_in, agent_key_reused, agent_key_obtained_at, tls) into a single extra dict with __getattr__ for backward-compatible access - _available_entries(): shared exhaustion-check between select and peek - Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical) - SimpleNamespace replaces class _Args boilerplate in auth_commands - _try_resolve_from_custom_pool(): shared pool-check in runtime_provider Net -17 lines. All 383 targeted tests pass. --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-31 03:10:01 -07:00
Teknium	252fbea005	feat(providers): add ordered fallback provider chain (salvage #1761 ) (#3813 ) Extends the single fallback_model mechanism into an ordered chain. When the primary model fails, Hermes tries each fallback provider in sequence until one succeeds or the chain is exhausted. Config format (new): fallback_providers: - provider: openrouter model: anthropic/claude-sonnet-4 - provider: openai model: gpt-4o Legacy single-dict fallback_model format still works unchanged. Key fix vs original PR: the call sites in the retry loop now use _fallback_index < len(_fallback_chain) instead of the old one-shot _fallback_activated guard, so the chain actually advances through all configured providers. Changes: - run_agent.py: _fallback_chain list + _fallback_index replaces one-shot _fallback_model; _try_activate_fallback() advances through chain; failed provider resolution skips to next entry; call sites updated to allow chain advancement - cli.py: reads fallback_providers with legacy fallback_model compat - gateway/run.py: same - hermes_cli/config.py: fallback_providers: [] in DEFAULT_CONFIG - tests: 12 new chain tests + 6 existing test fixtures updated Co-authored-by: uzaylisak <uzaylisak@users.noreply.github.com>	2026-03-29 16:04:53 -07:00
Teknium	924857c3e3	fix: prevent tool name/arg concatenation for Ollama-compatible endpoints (#3582 ) Ollama reuses index 0 for every tool call in a parallel batch, distinguishing them only by id. The streaming accumulator now detects a new non-empty id at an already-active index and redirects it to a fresh slot, preventing names and arguments from being concatenated into a single tool call. No-op for normal providers that use incrementing indices. Co-authored-by: dmater01 <dmater01@users.noreply.github.com>	2026-03-28 14:08:26 -07:00
Teknium	901494d728	feat: make tool-use enforcement configurable via agent.tool_use_enforcement (#3551 ) The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in #3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.	2026-03-28 12:31:22 -07:00
Teknium	8fdfc4b00c	fix(agent): detect thinking-budget exhaustion on truncation, skip useless retries (#3444 ) When finish_reason='length' and the response contains only reasoning (think blocks or empty content), the model exhausted its output token budget on thinking with nothing left for the actual response. Previously, this fell into either: - chat_completions: 3 useless continuation retries (model hits same limit) - anthropic/codex: generic 'Response truncated' error with rollback Now: detect the think-only + length condition early and return immediately with a targeted error message: 'Model used all output tokens on reasoning with none left for the response. Try lowering reasoning effort or increasing max_tokens.' This saves 2 wasted API calls on the chat_completions path and gives users actionable guidance instead of a cryptic error. The existing think-only retry logic (finish_reason='stop') is unchanged — that's a genuine model glitch where retrying can help.	2026-03-27 15:29:30 -07:00
Teknium	fb46a90098	fix: increase API timeout default from 900s to 1800s for slow-thinking models (#3431 ) Models like GLM-5/5.1 can think for 15+ minutes. The previous 900s (15 min) default for HERMES_API_TIMEOUT killed legitimate requests. Raised to 1800s (30 min) in both places that read the env var: - _build_api_kwargs() timeout (non-streaming total timeout) - _call_chat_completions() write timeout (streaming connection) The streaming per-chunk read timeout (60s) and stale stream detector (180-300s) are unchanged — those are appropriate for inter-chunk timing.	2026-03-27 13:02:23 -07:00
Teknium	5a1e2a307a	perf(ttft): salvage easy-win startup optimizations from #3346 (#3395 ) * perf(ttft): dedupe shared tool availability checks * perf(ttft): short-circuit vision auto-resolution * perf(ttft): make Claude Code version detection lazy * perf(ttft): reuse loaded toolsets for skills prompt --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 07:49:44 -07:00
Teknium	08d3be0412	fix: graceful return on max retries instead of crashing thread run_conversation raised the raw exception after exhausting retries, which crashed the background thread in cli.py (unhandled exception in Thread). Now returns a proper error result dict with failed=True and persists the session, matching the pattern used by other error paths (invalid responses, empty content, etc.). Also wraps cli.py's run_agent thread function in try/except as a safety net against any future unhandled exceptions from run_conversation. Made-with: Cursor	2026-03-25 19:00:39 -07:00
Teknium	099dfca6db	fix: GLM reasoning-only and max-length handling (#3010 ) - Add 'prompt exceeds max length' to context overflow detection for Z.AI/GLM 400 errors - Extract inline reasoning blocks from assistant content as fallback when no structured reasoning fields are present - Guard inline extraction so structured API reasoning takes priority - Update test for reasoning-only response salvage behavior Cherry-picked from PR #2993 by kshitijk4poor. Added priority guard to fix test_structured_reasoning_takes_priority failure. Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-25 12:05:37 -07:00
Teknium	525caadd8c	fix: prevent Anthropic token leaking to third-party anthropic_messages providers (salvage #2383 ) (#2389 ) * fix: prevent Anthropic token fallback leaking to third-party anthropic_messages providers When provider is minimax/alibaba/etc and MINIMAX_API_KEY is not set, the code fell back to resolve_anthropic_token() sending Anthropic OAuth credentials to third-party endpoints, causing 401 errors. Now only provider=="anthropic" triggers the fallback. Generalizes the Alibaba-specific guard from #1739 to all non-Anthropic providers. * fix: set provider='anthropic' in credential refresh tests Follow-up for cherry-picked PR #2383 — existing tests didn't set agent.provider, which the new guard requires to allow Anthropic token refresh. --------- Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>	2026-03-21 16:42:46 -07:00
Test	e7844e9c8d	Merge origin/main, resolve conflicts (self._base_url_lower)	2026-03-18 04:09:00 -07:00
Teknium	c0c14e60b4	fix: make concurrent tool batching path-aware for file mutations (#1914 ) * Improve tool batching independence checks * fix: address review feedback on path-aware batching - Log malformed/non-dict tool arguments at debug level before falling back to sequential, instead of silently swallowing the error into an empty dict - Guard empty paths in _paths_overlap (unreachable in practice due to upstream filtering, but makes the invariant explicit) - Add tests: malformed JSON args, non-dict args, _paths_overlap unit tests including empty path edge cases - web_crawl is not a registered tool (only web_search/web_extract are); no addition needed to _PARALLEL_SAFE_TOOLS --------- Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-18 03:25:38 -07:00
Test	5b74df2bfc	fix: OAuth flag stale after refresh/fallback, memory nudge never fires, dead code - Update _is_anthropic_oauth in _try_refresh_anthropic_client_credentials() when token type changes during credential refresh - Set _is_anthropic_oauth in _try_activate_fallback() Anthropic path - Move _turns_since_memory and _iters_since_skill init to __init__ so nudge counters accumulate across run_conversation() calls in CLI mode - Remove unreachable retry_count >= max_retries block after raise Adds 7 regression tests. Salvaged from PR #1797 by @0xbyt4.	2026-03-18 02:19:57 -07:00
max	0c392e7a87	feat: integrate GitHub Copilot providers across Hermes Add first-class GitHub Copilot and Copilot ACP provider support across model selection, runtime provider resolution, CLI sessions, delegated subagents, cron jobs, and the Telegram gateway. This also normalizes Copilot model catalogs and API modes, introduces a Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by resolving Homebrew-installed gh binaries under launchd. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-17 23:40:22 -07:00
teknium1	8e07f9ca56	fix: audit fixes — 5 bugs found and resolved Thorough code review found 5 issues across run_agent.py, cli.py, and gateway/: 1. CRITICAL — Gateway stream consumer task never started: stream_consumer_holder was checked BEFORE run_sync populated it. Fixed with async polling pattern (same as track_agent). 2. MEDIUM-HIGH — Streaming fallback after partial delivery caused double-response: if streaming failed after some tokens were delivered, the fallback would re-deliver the full response. Now tracks deltas_were_sent and only falls back when no tokens reached consumers yet. 3. MEDIUM — Codex mode lost on_first_delta spinner callback: _run_codex_stream now accepts on_first_delta parameter, fires it on first text delta. Passed through from _interruptible_streaming_api_call via _codex_on_first_delta instance attribute. 4. MEDIUM — CLI close-tag after-text bypassed tag filtering: text after a reasoning close tag was sent directly to _emit_stream_text, skipping open-tag detection. Now routes through _stream_delta for full filtering. 5. LOW — Removed 140 lines of dead code: old _streaming_api_call method (superseded by _interruptible_streaming_api_call). Updated 13 tests in test_run_agent.py and test_openai_client_lifecycle.py to use the new method name and signature. 4573 tests passing.	2026-03-16 06:35:46 -07:00
Teknium	dd7921d514	fix(honcho): isolate session routing for multi-user gateway (#1500 ) Salvaged from PR #1470 by adavyas. Core fix: Honcho tool calls in a multi-session gateway could route to the wrong session because honcho_tools.py relied on process-global state. Now threads session context through the call chain: AIAgent._invoke_tool() → handle_function_call() → registry.dispatch() → handler **kw → _resolve_session_context() Changes: - Add _resolve_session_context() to prefer per-call context over globals - Plumb honcho_manager + honcho_session_key through handle_function_call - Add sync_honcho=False to run_conversation() for synthetic flush turns - Pass honcho_session_key through gateway memory flush lifecycle - Harden gateway PID detection when /proc cmdline is unreadable - Make interrupt test scripts import-safe for pytest-xdist - Wrap BibTeX examples in Jekyll raw blocks for docs build - Fix thread-order-dependent assertion in client lifecycle test - Expand Honcho docs: session isolation, lifecycle, routing internals Dropped from original PR: - Indentation change in _create_request_openai_client that would move client creation inside the lock (causes unnecessary contention) Co-authored-by: adavyas <adavyas@users.noreply.github.com>	2026-03-16 00:23:47 -07:00
Teknium	3f0f4a04a9	fix(agent): skip reasoning extra_body for unsupported OpenRouter models (#1485 ) * fix(agent): skip reasoning extra_body for models that don't support it Sending reasoning config to models like MiniMax or Nvidia via OpenRouter causes a 400 BadRequestError. Previously, reasoning extra_body was sent to all OpenRouter and Nous models unconditionally. Fix: only send reasoning extra_body when the model slug starts with a known reasoning-capable prefix (deepseek/, anthropic/, openai/, x-ai/, google/gemini-2, qwen/qwen3) or when using Nous Portal directly. Applies to both the main API call path (_build_api_kwargs) and the conversation summary path. Fixes #1083 * test(agent): cover reasoning extra_body gating --------- Co-authored-by: ygd58 <buraysandro9@gmail.com>	2026-03-15 20:42:07 -07:00
teknium1	62abb453d3	Merge origin/main into hermes/hermes-daa73839	2026-03-14 23:44:47 -07:00
teknium1	735a6e7651	fix: convert anthropic image content blocks	2026-03-14 23:41:20 -07:00
0xbyt4	6f85283553	fix: use json.dumps instead of str() for Codex Responses API arguments When the Responses API returns tool call arguments as a dict, str(dict) produces Python repr with single quotes (e.g. {'key': 'val'}) which is invalid JSON. Downstream json.loads() fails silently and the tool gets called with empty arguments, losing all parameters. Affects both function_call and custom_tool_call item types in _normalize_codex_response().	2026-03-14 22:03:53 -07:00
teknium1	70ea13eb40	fix: preflight Anthropic auth and prefer Claude store	2026-03-14 19:38:55 -07:00
teknium1	e052c74727	fix: refresh Anthropic OAuth before stale env tokens	2026-03-14 19:22:31 -07:00
teknium1	7b10881b9e	fix: persist clean voice transcripts and /voice off state - keep CLI voice prefixes API-local while storing the original user text - persist explicit gateway off state and restore adapter auto-TTS suppression on restart - add regression coverage for both behaviors	2026-03-14 06:14:22 -07:00
0xbyt4	eb34c0b09a	fix: voice pipeline hardening — 7 bug fixes with tests 1. Anthropic + ElevenLabs TTS silence: forward full response to TTS callback for non-streaming providers (choices first, then native content blocks fallback). 2. Subprocess timeout kill: play_audio_file now kills the process on TimeoutExpired instead of leaving zombie processes. 3. Discord disconnect cleanup: leave all voice channels before closing the client to prevent leaked state. 4. Audio stream leak: close InputStream if stream.start() fails. 5. Race condition: read/write _on_silence_stop under lock in audio callback thread. 6. _vprint force=True: show API error, retry, and truncation messages even during streaming TTS. 7. _refresh_level lock: read _voice_recording under _voice_lock.	2026-03-14 14:27:21 +03:00
0xbyt4	c797314fcf	test: add security and hardening tests for voice mode fixes - Path traversal sanitization (Path.name strips ../) - Media endpoint authentication (401 without token, 404 on traversal) - hmac.compare_digest usage verification (no == for tokens) - DOMPurify XSS prevention in HTML template - Default bind 127.0.0.1 (adapter and config) - /remote-control token hiding in group chats - Opus find_library instead of hardcoded paths - Opus decode error logging (no silent swallow) - Interrupt _vprint force=True on all 6 calls - Anthropic interrupt handler in both API call paths - Update test_web_defaults for new 127.0.0.1 default	2026-03-14 14:27:21 +03:00
0xbyt4	ecc3dd7c63	test: add comprehensive voice mode test coverage (86 tests) - Add TestStreamingApiCall (11 tests) for _streaming_api_call in test_run_agent.py - Add regression tests for all 7 bug fixes (edge_tts lazy import, output_stream cleanup, ctrl+c continuous reset, disable stops TTS, config key, chat cleanup, browser_tool signal handler removal) - Add real behavior tests for CLI voice methods via _make_voice_cli() fixture: TestHandleVoiceCommandReal (7), TestEnableVoiceModeReal (7), TestDisableVoiceModeReal (6), TestVoiceSpeakResponseReal (7), TestVoiceStopAndTranscribeReal (12)	2026-03-14 14:27:20 +03:00
teknium1	cbbba87099	fix: reuse shared atomic session log helper	2026-03-14 02:56:13 -07:00
Teknium	1117a21065	Merge pull request #1271 from NousResearch/hermes/hermes-de3d4e49 fix: guard init-time stdio writes	2026-03-14 02:21:39 -07:00
teknium1	936040d8f7	fix: guard init-time stdio writes	2026-03-14 02:19:46 -07:00
teknium1	806b79b589	test: cover errors.log handler reuse	2026-03-13 23:56:51 -07:00
Teknium	938e887b4c	fix: keep honcho recall out of cached system prefix (#1201 ) Attach later-turn Honcho recall to the current-turn user message at API call time instead of appending it to the system prompt. This preserves the stable system-prefix cache while keeping Honcho continuity context available for the turn. Also adds regression coverage for the injection helper and for continuing sessions so Honcho recall stays out of the system prompt.	2026-03-13 21:07:00 -07:00
Teknium	0157253145	Merge pull request #1152 from NousResearch/hermes/hermes-f47f71c0 feat: concurrent tool execution with ThreadPoolExecutor	2026-03-13 03:20:38 -07:00
kshitijk4poor	ccfbf42844	feat: secure skill env setup on load (core #688 ) When a skill declares required_environment_variables in its YAML frontmatter, missing env vars trigger a secure TUI prompt (identical to the sudo password widget) when the skill is loaded. Secrets flow directly to ~/.hermes/.env, never entering LLM context. Key changes: - New required_environment_variables frontmatter field for skills - Secure TUI widget (masked input, 120s timeout) - Gateway safety: messaging platforms show local setup guidance - Legacy prerequisites.env_vars normalized into new format - Remote backend handling: conservative setup_needed=True - Env var name validation, file permissions hardened to 0o600 - Redact patterns extended for secret-related JSON fields - 12 existing skills updated with prerequisites declarations - ~48 new tests covering skip, timeout, gateway, remote backends - Dynamic panel widget sizing (fixes hardcoded width from original PR) Cherry-picked from PR #723 by kshitijk4poor, rebased onto current main with conflict resolution. Fixes #688 Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-13 03:14:04 -07:00
teknium1	5d0d5b191c	feat: concurrent tool execution with ThreadPoolExecutor When the model returns multiple tool calls in a single response, they are now executed concurrently using a thread pool instead of sequentially. This significantly reduces wall-clock time when multiple independent tools are batched (e.g. parallel web_search, read_file, terminal calls). Architecture: - _execute_tool_calls() dispatches to sequential or concurrent path - Single tool calls and batches containing 'clarify' use sequential path - Multiple non-interactive tools use ThreadPoolExecutor (max 8 workers) - Results are collected and appended to messages in original order - _invoke_tool() extracted as shared tool invocation helper Safety: - Pre-flight interrupt check skips all tools if interrupted - Per-tool exception handling: one failure doesn't crash the batch - Result truncation (100k char limit) applied per tool - Budget pressure injection after all tools complete - Checkpoints taken before file-mutating tools - CLI spinner shows batch progress, then per-tool completion messages Tests: 10 new tests covering dispatch logic, ordering, error handling, interrupt behavior, truncation, and _invoke_tool routing.	2026-03-13 02:51:51 -07:00
Teknium	a282322845	Merge pull request #1121 from 0xbyt4/fix/anthropic-adapter-issues fix: anthropic adapter — max_tokens, fallback crash, proxy base_url	2026-03-12 19:07:06 -07:00
Teknium	475dd58a8e	Merge PR #736 : feat(honcho): async writes, memory modes, session title integration, setup CLI Authored by erosika. Builds on #38 and #243. Adds async write support, configurable memory modes, context prefetch pipeline, 4 new Honcho tools (honcho_context, honcho_profile, honcho_search, honcho_conclude), full 'hermes honcho' CLI, session strategies, AI peer identity, recallMode A/B, gateway lifecycle management, and comprehensive docs. Cherry-picks fixes from PRs #831/#832 (adavyas). Co-authored-by: erosika <erosika@users.noreply.github.com> Co-authored-by: adavyas <adavyas@users.noreply.github.com>	2026-03-12 19:05:11 -07:00
0xbyt4	22479b053c	fix: anthropic adapter — max_tokens ignored, fallback crash, proxy base_url filtered - Pass self.max_tokens to build_anthropic_kwargs instead of hardcoded None - Add anthropic case to _try_activate_fallback (was only handling openai-codex) - Remove 'anthropic in base_url' filter that blocked custom proxy URLs	2026-03-13 04:22:16 +03:00
teknium1	5e12442b4b	feat: native Anthropic provider with Claude Code credential auto-discovery Add Anthropic as a first-class inference provider, bypassing OpenRouter for direct API access. Uses the native Anthropic SDK with a full format adapter (same pattern as the codex_responses api_mode). ## Auth (three methods, priority order) 1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-) 2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-) 3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription) - Reads Claude Code's OAuth credentials - Checks token expiry with 60s buffer - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header - Regular API keys use standard x-api-key header ## Changes by file ### New files - agent/anthropic_adapter.py — Client builder, message/tool/response format conversion, Claude Code credential reader, token resolver. Handles system prompt extraction, tool_use/tool_result blocks, thinking/reasoning, orphaned tool_use cleanup, cache_control. - tests/test_anthropic_adapter.py — 36 tests covering all adapter logic ### Modified files - pyproject.toml — Add anthropic>=0.39.0 dependency - hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with three env vars, plus 'claude'/'claude-code' aliases - hermes_cli/models.py — Add model catalog, labels, aliases, provider order - hermes_cli/main.py — Add 'anthropic' to --provider CLI choices - hermes_cli/runtime_provider.py — Add Anthropic branch returning api_mode='anthropic_messages' (before generic api_key fallthrough) - hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code credential auto-discovery, model selection, OpenRouter tools prompt - agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model - agent/model_metadata.py — Add bare Claude model context lengths - run_agent.py — Add anthropic_messages api_mode: * Client init (Anthropic SDK instead of OpenAI) * API call dispatch (_anthropic_client.messages.create) * Response validation (content blocks) * finish_reason mapping (stop_reason -> finish_reason) * Token usage (input_tokens/output_tokens) * Response normalization (normalize_anthropic_response) * Client interrupt/rebuild * Prompt caching auto-enabled for native Anthropic - tests/test_run_agent.py — Update test_anthropic_base_url_accepted to expect native routing, add test_prompt_caching_native_anthropic	2026-03-12 15:47:45 -07:00
Erosika	fefc709b2c	merge: resolve conflict with main in subagent interrupt test	2026-03-12 16:28:57 -04:00
Erosika	ae2a5e5743	refactor(honcho): remove local memory mode The "local" memoryMode was redundant with enabled: false. Simplifies the mode system to hybrid and honcho only.	2026-03-12 16:23:34 -04:00
teknium1	29ef69c703	fix: update all test mocks for call_llm migration Update 14 test files to use the new call_llm/async_call_llm mock patterns instead of the old get_text_auxiliary_client/ get_vision_auxiliary_client tuple returns. - vision_tools tests: mock async_call_llm instead of _aux_async_client - browser tests: mock call_llm instead of _aux_vision_client - flush_memories tests: mock call_llm instead of get_text_auxiliary_client - session_search tests: mock async_call_llm with RuntimeError - mcp_tool tests: fix whitelist model config, use side_effect for multi-response tests - auxiliary_config_bridge: update for model=None (resolved in router) 3251 passed, 2 pre-existing unrelated failures.	2026-03-11 21:06:54 -07:00
Erosika	2d35016b94	fix(honcho): harden tool gating and migration peer routing Prevent stale Honcho tool exposure in context/local modes, restore reliable async write retry behavior, and ensure SOUL.md migration uploads target the AI peer instead of the user peer. Also align Honcho CLI key checks with host-scoped apiKey resolution and lock the fixes with regression tests. Made-with: Cursor	2026-03-11 18:21:27 -04:00
Erosika	a0b0dbe6b2	Merge remote-tracking branch 'origin/main' into feat/honcho-async-memory Made-with: Cursor # Conflicts: # cli.py # tests/test_run_agent.py	2026-03-11 12:22:56 -04:00
teknium1	a8409a161f	fix: guard all print() calls against OSError with _SafeWriter When hermes-agent runs as a systemd service, Docker container, or headless daemon, the stdout pipe can become unavailable (idle timeout, buffer exhaustion, socket reset). Any print() call then raises OSError: [Errno 5] Input/output error, crashing run_conversation() and causing cron jobs to fail. Rather than wrapping individual print() calls (68 in run_conversation alone), this adds a transparent _SafeWriter wrapper installed once at the start of run_conversation(). It delegates all writes to the real stdout and silently catches OSError. Zero overhead on the happy path, comprehensive coverage of all print calls including future ones. Fixes #845 Co-authored-by: J0hnLawMississippi <J0hnLawMississippi@users.noreply.github.com>	2026-03-11 09:19:10 -07:00
Erosika	047b118299	fix(honcho): resolve review blockers for merge Address merge-blocking review feedback by removing unsafe signal handler overrides, wiring next-turn Honcho prefetch, restoring per-directory session defaults, and exposing all Honcho tools to the model surface. Also harden prefetch cache access with public thread-safe accessors and remove duplicate browser cleanup code. Made-with: Cursor	2026-03-11 11:46:37 -04:00
teknium1	21ff0d39ad	feat: iteration budget pressure via tool result injection Two-tier warning system that nudges the LLM as it approaches max_iterations, injected into the last tool result JSON rather than as a separate system message: - Caution (70%): {"_budget_warning": "[BUDGET: 42/60...]"} - Warning (90%): {"_budget_warning": "[BUDGET WARNING: 54/60...]"} For JSON tool results, adds a _budget_warning field to the existing dict. For plain text results, appends the warning as text. Key properties: - No system messages injected mid-conversation - No changes to message structure - Prompt cache stays valid - Configurable thresholds (0.7 / 0.9) - Can be disabled: _budget_pressure_enabled = False Inspired by PR #421 (@Bartok9) and issue #414. 8 tests covering thresholds, edge cases, JSON and text injection.	2026-03-11 00:37:24 -07:00

1 2

75 Commits