hermes-agent

Author	SHA1	Message	Date
Teknium	68ab37e891	fix(delegate): give subagents independent iteration budgets (#3004) Each subagent now gets its own IterationBudget instead of sharing the parent's. The per-subagent cap is controlled by delegation.max_iterations in config.yaml (default 50). Total iterations across parent + subagents can exceed the parent's max_iterations, but the user retains control via the config setting. Previously, subagents shared the parent's budget, so three parallel subagents configured for max_iterations=50 racing against a parent that already used 60 of 90 would each only get ~10 iterations. Inspired by PR #2928 (Bartok9) which identified the issue (#2873).	2026-03-25 11:29:49 -07:00
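The budget arithmetic described in this commit can be illustrated with a short sketch. The `IterationBudget` class and both helper functions below are hypothetical stand-ins written for illustration, not the project's code; only the numbers (parent at 60 of 90, three subagents, a per-subagent cap of 50) come from the message, and the even split is an approximation of subagents racing on a shared budget.

```python
from dataclasses import dataclass


@dataclass
class IterationBudget:
    """Hypothetical stand-in: counts iterations against a fixed cap."""
    max_iterations: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return max(self.max_iterations - self.used, 0)


def shared_share(parent: IterationBudget, n_subagents: int) -> int:
    """Old behaviour (approximated): subagents split what the parent has left."""
    return parent.remaining // n_subagents


def independent_budgets(cap: int, n_subagents: int) -> list:
    """New behaviour: each subagent gets its own budget of `cap` iterations."""
    return [IterationBudget(max_iterations=cap) for _ in range(n_subagents)]


parent = IterationBudget(max_iterations=90, used=60)
old_share = shared_share(parent, 3)
new_shares = [b.remaining for b in independent_budgets(50, 3)]
print(old_share)   # 10
print(new_shares)  # [50, 50, 50]
```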
Teknium	61949f0af7	Fix (#2997) Co-authored-by: Jack <jvand@DESKTOP-JACK.localdomain>	2026-03-25 11:12:11 -07:00
Teknium	52c5e491f5	fix(session): surface silent SessionDB failures that cause session data loss (#2999 ) * fix(session): surface silent SessionDB failures that cause session data loss SessionDB initialization and operation failures were logged at debug level or silently swallowed, causing sessions to never be indexed in the FTS5 database. This made session_search unable to find affected conversations. In practice, ~48% of sessions can be lost without any visible indication. The JSON session files are still written (separate code path), but the SQLite/FTS5 index gets nothing — making session_search return empty results for affected sessions. Changes: - cli.py: Log warnings (not debug) when SessionDB init fails at both __init__ and _start_session entry points - run_agent.py: Log warnings on create_session, append_message, and compression split failures - run_agent.py: Set _session_db = None after create_session failure to fail fast instead of silently dropping every message for the session Root cause: When gateway restarts or DB lock contention occurs during SessionDB() init, the exception is caught and swallowed. The agent continues running normally — JSON session logs are written to disk — but no messages reach the FTS5 index. * fix: use module logger instead of root logging for SessionDB warnings Follow-up to cherry-picked PR #2939 — the original used logging.warning() (root logger) instead of logger.warning() (module logger) in the 5 new warning calls. Module logger preserves the logger hierarchy and shows the correct module name in log output. --------- Co-authored-by: LucidPaths <lc77@outlook.de>	2026-03-25 11:10:19 -07:00
Teknium	42fec19151	feat: persist reasoning across gateway session turns (schema v6) (#2974) Tested against OpenAI Codex (direct), Anthropic (direct + OAI-compat), and OpenRouter → 6 backends. All reasoning field types (reasoning, reasoning_details, codex_reasoning_items) round-trip through the DB correctly.	2026-03-25 09:47:28 -07:00
Teknium	5dbe2d9d73	fix: skills-sh install fails for deeply nested repo structures (#2980 ) * fix(run_agent): ensure _fire_first_delta() is called for tool generation events Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage. * fix(run_agent): improve timeout handling for chat completions Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations. * fix(run_agent): reduce default stream read timeout for chat completions Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations. * fix(run_agent): enhance streaming error handling and retry logic Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions. * fix: skills-sh install fails for deeply nested repo structures Skills in repos with deep directory nesting (e.g. cli-tool/components/skills/development/senior-backend/) could not be installed because the candidate path generation and shallow root-dir scan never reached them. Added GitHubSource._find_skill_in_repo_tree() which uses the GitHub Trees API to recursively search the entire repo tree in a single API call. This is used as a final fallback in SkillsShSource._discover_identifier() when the standard candidate paths and shallow scan both fail. 
Fixes installation of skills from repos like davila7/claude-code-templates where skills are nested 4+ levels deep. Reported by user Samuraixheart.	2026-03-25 09:31:05 -07:00
Teknium	fd292e676b	fix: skip KawaiiSpinner when TUI handles tool progress (#2973 ) * docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event The hooks page only documented gateway event hooks (HOOK.yaml system). The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't referenced from the hooks page, which was confusing. Changes: - hooks.md: Add overview table showing both hook systems - hooks.md: Add Plugin Hooks section with available hooks, callback signatures, and example - hooks.md: Add missing session:end gateway event (emitted but undocumented) - hooks.md: Mark pre_llm_call, post_llm_call, on_session_start, on_session_end as planned (defined in VALID_HOOKS but not yet invoked) - hooks.md: Update info box to cross-reference plugin hooks - hooks.md: Fix heading hierarchy (gateway content as subsections) - plugins.md: Add cross-reference to hooks page for full details - plugins.md: Mark planned hooks as (planned) * feat(session_search): add recent sessions mode when query is omitted When session_search is called without a query (or with an empty query), it now returns metadata for the most recent sessions instead of erroring. This lets the agent quickly see what was worked on recently without needing specific keywords. Returns for each session: session_id, title, source, started_at, last_active, message_count, preview (first user message). Zero LLM cost — pure DB query. Current session lineage and child delegation sessions are excluded. The agent can then keyword-search specific sessions if it needs deeper context from any of them. 
* docs: clarify two-mode behavior in session_search schema description * fix(compression): restore sane defaults and cap summary at 12K tokens - threshold: 0.80 → 0.50 (compress at 50%, not 80%) - target_ratio: 0.40 → 0.20, now relative to threshold not total context (20% of 50% = 10% of context as tail budget) - summary ceiling: 32K → 12K (Gemini can't output more than ~12K) - Updated DEFAULT_CONFIG, config display, example config, and tests * fix: browser_vision ignores auxiliary.vision.timeout config (#2901) browser_vision called call_llm() without passing a timeout parameter, so it always used the 30-second default in auxiliary_client.py. This made vision analysis with local models (llama.cpp, ollama) impossible since they typically need more than 30s for screenshot analysis. Now browser_vision reads auxiliary.vision.timeout from config.yaml (same config key that vision_analyze already uses) and passes it through to call_llm(). 
Also bumped the default vision timeout from 30s to 120s in both browser_vision and vision_analyze — 30s is too aggressive for local models and the previous default silently failed for anyone running vision locally. Fixes user report from GamerGB1988. * fix(skills): agent-created skills were incorrectly treated as untrusted community content _resolve_trust_level() didn't handle 'agent-created' source, so it fell through to 'community' trust level. Community policy blocks on any caution or dangerous findings, which meant common patterns like curl with env vars, systemctl, crontab, cloudflared references etc. would block skill creation/patching. The agent-created policy row already existed in INSTALL_POLICY with permissive settings (allow caution, ask on dangerous) but was never reached. Now it is. Fixes reports of skill_manage being blocked by security scanner. * fix(cli): enhance real-time reasoning output by forcing flush of long partial lines Updated the reasoning output mechanism to emit complete lines and force-flush long partial lines, ensuring reasoning is visible in real-time even without newlines. This improves user experience during reasoning sessions. * fix: skip KawaiiSpinner when TUI handles tool progress In the interactive CLI, the agent runs with quiet_mode=True and tool_progress_callback set. The quiet_mode condition triggered KawaiiSpinner for every tool call, but the TUI was already handling progress display via the spinner widget. The KawaiiSpinner writes carriage-return animation through StdoutProxy, triggering run_in_terminal() erase/redraw cycles on every flush. These redundant cycles cause the status bar to ghost into terminal scrollback. The thinking spinner already had this guard (checks thinking_callback). This extends the same pattern to the three tool spinner creation sites: concurrent tools, delegate_task, and single tool execution.	2026-03-25 08:33:44 -07:00
Teknium	7ca22ea11b	fix(compression): restore sane defaults and cap summary at 12K tokens - threshold: 0.80 → 0.50 (compress at 50%, not 80%) - target_ratio: 0.40 → 0.20, now relative to threshold not total context (20% of 50% = 10% of context as tail budget) - summary ceiling: 32K → 12K (Gemini can't output more than ~12K) - Updated DEFAULT_CONFIG, config display, example config, and tests	2026-03-24 18:48:47 -07:00
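The "20% of 50% = 10% of context" arithmetic in this commit can be worked through in a few lines. This is a sketch only: the function names are hypothetical, and treating the summary ceiling as a simple `min()` clamp is an assumption, since the message does not say how the uncapped summary size is derived.

```python
SUMMARY_CEILING = 12_000  # from the commit message: summary ceiling 32K -> 12K


def compression_budgets(context_length: int,
                        threshold: float = 0.50,
                        target_ratio: float = 0.20) -> dict:
    """Hypothetical helper: derive token budgets from the new defaults."""
    # Compression fires once the prompt reaches this many tokens.
    threshold_tokens = round(context_length * threshold)
    # target_ratio is relative to the threshold, not to the full context.
    tail_budget = round(threshold_tokens * target_ratio)
    return {"threshold_tokens": threshold_tokens, "tail_budget": tail_budget}


def cap_summary(requested_tokens: int) -> int:
    """Assumed clamp: never ask for a summary above the ceiling."""
    return min(requested_tokens, SUMMARY_CEILING)


budgets = compression_budgets(200_000)
print(budgets)              # {'threshold_tokens': 100000, 'tail_budget': 20000}
print(cap_summary(32_000))  # 12000
```

For a 200K-token context the tail budget works out to 20K tokens, which is 10% of the full window.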
Teknium	9231a335d4	fix(compression): replace dead summary_target_tokens with ratio-based scaling (#2554 ) The summary_target_tokens parameter was accepted in the constructor, stored on the instance, and never used — the summary budget was always computed from hardcoded module constants (_SUMMARY_RATIO=0.20, _MAX_SUMMARY_TOKENS=8000). This caused two compounding problems: 1. The config value was silently ignored, giving users no control over post-compression size. 2. Fixed budgets (20K tail, 8K summary cap) didn't scale with context window size. Switching from a 1M-context model to a 200K model would trigger compression that nuked 350K tokens of conversation history down to ~30K. Changes: - Replace summary_target_tokens with summary_target_ratio (default 0.40) which sets the post-compression target as a fraction of context_length. Tail token budget and summary cap now scale proportionally: MiniMax 200K → ~80K post-compression GPT-5 1M → ~400K post-compression - Change threshold_percent default: 0.50 → 0.80 (don't fire until 80% of context is consumed) - Change protect_last_n default: 4 → 20 (preserve ~10 full turns) - Summary token cap scales to 5% of context (was fixed 8K), capped at 32K ceiling - Read target_ratio and protect_last_n from config.yaml compression section (both are now configurable) - Remove hardcoded summary_target_tokens=500 from run_agent.py - Add 5 new tests for ratio scaling, clamping, and new defaults	2026-03-24 17:45:49 -07:00
Teknium	8ee4f32819	fix(gateway): use TERMINAL_CWD for context file discovery, not process cwd The gateway process runs from the hermes-agent install directory, so os.getcwd() picks up the repo's AGENTS.md (16k chars) and other dev context files — inflating input tokens by ~10k on every gateway message. Fix: use TERMINAL_CWD (which the gateway sets to MESSAGING_CWD or $HOME) as the cwd for build_context_files_prompt(). In CLI mode, TERMINAL_CWD is the user's actual project directory, so behavior is unchanged. Before: gateway 15-20k input tokens, CLI 6-8k After: gateway ~6-8k input tokens (same as CLI) Reported by keri on Discord.	2026-03-24 17:30:33 -07:00
Teknium	618f15dda9	fix: reorder setup wizard providers — OpenRouter first Move OpenRouter to position 1 in the setup wizard's provider list to match hermes model ordering. Update default selection index and fix test expectations for the new ordering. Setup order: OpenRouter → Nous Portal → Codex → Custom → ...	2026-03-24 12:50:24 -07:00
Teknium	481915587e	fix: update context pressure warnings and token estimates after compaction Reset context pressure warnings and update last_prompt_tokens and last_completion_tokens in the context compressor to prevent stale values from causing excessive warnings and re-triggering compression. This change ensures accurate pressure calculations following the compaction process.	2026-03-24 09:25:10 -07:00
Teknium	ad1bf16f28	chore: remove all remaining mini-swe-agent references Complete cleanup after dropping the mini-swe-agent submodule (PR #2804): - Remove MSWEA_SILENT_STARTUP and MSWEA_GLOBAL_CONFIG_DIR env var settings from cli.py, run_agent.py, hermes_cli/main.py, doctor.py - Remove mini-swe-agent health check from hermes doctor - Remove 'minisweagent' from logger suppression lists - Remove litellm/typer/platformdirs from requirements.txt - Remove mini-swe-agent install steps from install.ps1 (Windows) - Remove mini-swe-agent install steps from website docs - Update all stale comments/docstrings referencing mini-swe-agent in terminal_tool.py, tools/__init__.py, code_execution_tool.py, environments/README.md, environments/agent_loop.py - Remove mini_swe_runner from pyproject.toml py-modules (still exists as standalone script for RL training use) - Shrink test_minisweagent_path.py to empty stub The orphaned mini-swe-agent/ directory on disk needs manual removal: rm -rf mini-swe-agent/	2026-03-24 08:19:23 -07:00
Teknium	02b38b93cb	refactor: remove mini-swe-agent dependency — inline Docker/Modal backends (#2804 ) Drop the mini-swe-agent git submodule. All terminal backends now use hermes-agent's own environment implementations directly. Docker backend: - Inline the `docker run -d` container startup (was 15 lines in minisweagent's DockerEnvironment). Our wrapper already handled execute(), cleanup(), security hardening, volumes, and resource limits. Modal backend: - Import swe-rex's ModalDeployment directly instead of going through minisweagent's 90-line passthrough wrapper. - Bake the _AsyncWorker pattern (from environments/patches.py) directly into ModalEnvironment for Atropos compatibility without monkey-patching. Cleanup: - Remove minisweagent_path.py (submodule path resolution helper) - Remove submodule init/install from install.sh and setup-hermes.sh - Remove mini-swe-agent from .gitmodules - environments/patches.py is now a no-op (kept for backward compat) - terminal_tool.py no longer does sys.path hacking for minisweagent - mini_swe_runner.py guards imports (optional, for RL training only) - Update all affected tests to mock the new direct subprocess calls - Update README.md, CONTRIBUTING.md No functionality change — all Docker, Modal, local, SSH, Singularity, and Daytona backends behave identically. 6093 tests pass.	2026-03-24 07:30:25 -07:00
Teknium	a312ee7b4c	fix(agent): ensure first delta is fired during reasoning updates - Added calls to `_fire_first_delta()` in the `AIAgent` class to ensure that the first delta is triggered for both reasoning and thinking updates. This change improves the handling of delta events during streaming, enhancing the responsiveness of the agent's reasoning capabilities.	2026-03-24 07:16:20 -07:00
Teknium	87e2626cf6	feat(cli, agent): add tool generation callback for streaming updates - Introduced `_on_tool_gen_start` in `HermesCLI` to indicate when tool-call arguments are being generated, enhancing user feedback during streaming. - Updated `AIAgent` to support a new `tool_gen_callback`, notifying the display layer when tool generation starts, allowing for better user experience during large payloads. - Ensured that the callback is triggered appropriately during streaming events to prevent user interface freezing.	2026-03-23 23:10:58 -07:00
Teknium	942f6eac94	fix(run_agent): ensure proper cleanup of OpenAI client in background review Added explicit closing of the OpenAI/httpx client in the background review process to prevent "Event loop is closed" errors. This change ensures that the client is properly cleaned up when the review agent is no longer needed, enhancing stability and resource management.	2026-03-22 16:03:16 -07:00
Teknium	bfe4baa6ed	chore: remove unused imports, dead code, and stale comments Mechanical cleanup — no behavior changes. Unused imports removed: - model_tools.py: import os - run_agent.py: OPENROUTER_MODELS_URL, get_model_context_length - cli.py: Table, VERSION, RELEASE_DATE, resolve_toolset, get_skill_commands - terminal_tool.py: signal, uuid, tempfile, set_interrupt_event, DANGEROUS_PATTERNS, _load_permanent_allowlist, _detect_dangerous_command Dead code removed: - toolsets.py: print_toolset_tree() (zero callers) - browser_tool.py: _get_session_name() (never called) Stale comments removed: - toolsets.py: duplicated/garbled comment line - web_tools.py: 3 aspirational TODO comments from early development	2026-03-22 08:33:34 -07:00
MacroAnarchy	f9c2ad48c2	fix: defer streaming iteration linebreak to prevent blank line stacking Follow-up to `669c60a6` (cherry-pick of PR #2187, fixes #2177). The original fix emits a "\n\n" delta immediately after every _execute_tool_calls() invocation. When the model runs multiple consecutive tool iterations before producing text (common with search → read → analyze flows), each iteration appends its own paragraph break, resulting in 4-6+ blank lines before the actual response. Replace the immediate delta with a deferred flag (_stream_needs_break). _fire_stream_delta() checks the flag and prepends a single "\n\n" only when the first real text delta arrives, so multiple back-to-back tool iterations still produce exactly one paragraph break.	2026-03-22 04:59:12 -07:00
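The deferred-break technique described in this commit can be sketched as follows. `StreamEmitter` and its method names are hypothetical stand-ins for the agent's streaming path; only the flag name `_stream_needs_break` and the "prepend one paragraph break to the first real text delta" behaviour come from the message.

```python
class StreamEmitter:
    """Hypothetical stand-in for the streaming side of the agent loop."""

    def __init__(self, sink):
        self._sink = sink                  # callable that receives text deltas
        self._stream_needs_break = False   # a paragraph break is owed

    def after_tool_iteration(self):
        # Emit nothing now; only remember that a break is owed.
        self._stream_needs_break = True

    def fire_stream_delta(self, text):
        if not text:
            return
        if self._stream_needs_break:
            # Exactly one break, however many tool iterations ran before.
            text = "\n\n" + text
            self._stream_needs_break = False
        self._sink(text)


chunks = []
emitter = StreamEmitter(chunks.append)
emitter.fire_stream_delta("Searching.")
for _ in range(3):                 # search, read, analyze
    emitter.after_tool_iteration()
emitter.fire_stream_delta("Done.")
output = "".join(chunks)
print(repr(output))                # 'Searching.\n\nDone.'
```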
Teknium	34be3f8be6	revert: remove trailing empty assistant message stripping Reverts the sanitizer addition from PR #2466 (originally #2129). We already have _empty_content_retries handling for reasoning-only responses. The trailing strip risks silently eating valid messages and is redundant with existing empty-content handling.	2026-03-22 04:55:34 -07:00
ygd58	5407d12bc6	fix(agent): strip trailing empty assistant messages before API calls to prevent prefill rejection	2026-03-22 04:38:17 -07:00
Bartok Moltbot	e6a708aa04	fix(io): catch ValueError in _SafeWriter for closed file handles (#2428) When subagents run in ThreadPoolExecutor threads, the shared stdout handle can close between thread teardown and KawaiiSpinner cleanup. Python raises ValueError (not OSError) for I/O operations on closed files: ValueError: I/O operation on closed file The _SafeWriter class was only catching OSError, missing this case. Changes: - Add ValueError to exception handling in write(), flush(), and isatty() - Update docstring to document the ThreadPoolExecutor teardown scenario Fixes #2428	2026-03-22 04:38:17 -07:00
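A minimal sketch of the pattern this commit describes, assuming a wrapper of roughly this shape. `SafeWriter` here is an illustrative class, not the project's `_SafeWriter`; the fallback return values are assumptions. The part taken from the message is that `write()`, `flush()`, and `isatty()` catch `ValueError` alongside `OSError`.

```python
import io


class SafeWriter:
    """Illustrative wrapper that swallows I/O errors from a dead stream."""

    def __init__(self, inner):
        self._inner = inner

    def write(self, data):
        try:
            return self._inner.write(data)
        except (OSError, ValueError):
            # ValueError is what Python raises for I/O on a closed file.
            return len(data)

    def flush(self):
        try:
            self._inner.flush()
        except (OSError, ValueError):
            pass

    def isatty(self):
        try:
            return self._inner.isatty()
        except (OSError, ValueError):
            return False


stream = io.StringIO()
stream.close()                     # simulate the handle closing during teardown
writer = SafeWriter(stream)
written = writer.write("x")        # no exception despite the closed stream
writer.flush()
tty = writer.isatty()
print(written, tty)
```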
Teknium	8cb7864110	fix: resolve garbled ANSI escape codes in status printouts (#2262 ) (#2448 ) Two related root causes for the '?[33mTool progress: NEW?[0m' garbling reported on kitty, alacritty, ghostty and gnome-console: 1. /verbose label printing used self.console.print() with Rich markup ([yellow]...[/]). self.console is a plain Rich Console() whose output goes directly to sys.stdout, which patch_stdout's StdoutProxy intercepts and mangles raw ANSI sequences. 2. Context pressure status lines (e.g. 'approaching compaction') from AIAgent._safe_print() had the same problem -- _safe_print() was a @staticmethod that always called builtin print(), bypassing the prompt_toolkit renderer entirely. Fix: - Convert AIAgent._safe_print() from @staticmethod to an instance method that delegates to self._print_fn (defaults to builtin print, preserving all non-CLI behaviour). - After the CLI creates its AIAgent instance, wire self.agent._print_fn to the existing _cprint() helper which routes through prompt_toolkit.print_formatted_text(ANSI(text)). - Rewrite the /verbose feedback labels to use hermes_cli.colors.Colors ANSI constants in f-strings and emit them via _cprint() directly, removing the Rich-markup-inside-patch_stdout anti-pattern. Fixes #2262 Co-authored-by: Animesh Mishra <animesh.m.7523@gmail.com>	2026-03-22 04:07:06 -07:00
Teknium	306e67f32d	fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445) When a non-OpenRouter provider (e.g. minimax, anthropic) is set in config.yaml but its API key is missing, Hermes silently fell back to OpenRouter, causing confusing 404 errors. Now checks if the user explicitly configured a provider before falling back. Explicit providers raise RuntimeError with a clear message naming the missing env var. Auto/openrouter/custom providers still fall through to OpenRouter as before. Three code paths fixed: - run_agent.py AIAgent.__init__ — main client initialization - auxiliary_client.py call_llm — sync auxiliary calls - auxiliary_client.py call_llm_streaming — async auxiliary calls Based on PR #2272 by @StefanIsMe. Applied manually to fix a pconfig NameError in the original and extend to call_llm_streaming. Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>	2026-03-22 03:59:29 -07:00
Teknium	669c60a6bb	fix: add iteration boundary linebreak to prevent stream concatenation Cherry-picked from PR #2187 by @devorun. Fixes #2177. When streaming is enabled, text before and after tool calls gets concatenated without separation. Adds a paragraph break delta after _execute_tool_calls() so stream consumers insert proper whitespace between iteration boundaries.	2026-03-21 19:19:26 -07:00
teyrebaz33	bd49bce278	fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter On the native Anthropic Messages API path, convert_messages_to_anthropic() moves top-level cache_control on role:tool messages inside the tool_result block. On OpenRouter (chat_completions), no such conversion happens — the unexpected top-level field causes a silent hang on the second tool call. Add native_anthropic parameter to _apply_cache_marker() and apply_anthropic_cache_control(). When False (OpenRouter), role:tool messages are skipped entirely. When True (native Anthropic), existing behaviour is preserved. Fixes #2362	2026-03-21 16:54:43 -07:00
Teknium	525caadd8c	fix: prevent Anthropic token leaking to third-party anthropic_messages providers (salvage #2383 ) (#2389 ) * fix: prevent Anthropic token fallback leaking to third-party anthropic_messages providers When provider is minimax/alibaba/etc and MINIMAX_API_KEY is not set, the code fell back to resolve_anthropic_token() sending Anthropic OAuth credentials to third-party endpoints, causing 401 errors. Now only provider=="anthropic" triggers the fallback. Generalizes the Alibaba-specific guard from #1739 to all non-Anthropic providers. * fix: set provider='anthropic' in credential refresh tests Follow-up for cherry-picked PR #2383 — existing tests didn't set agent.provider, which the new guard requires to allow Anthropic token refresh. --------- Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>	2026-03-21 16:42:46 -07:00
Teknium	2a5f86ed6d	Merge pull request #2343 from NousResearch/hermes/hermes-31d7db3b feat: @ context references + Honcho config fixes	2026-03-21 16:10:19 -07:00
Teknium	2c06ec5f51	fix: correct provider check for Alibaba model identity injection PR #2314 checked for provider names 'alibaba-coding-plan' and 'alibaba-coding-plan-anthropic' which don't exist in the provider registry. The provider is always 'alibaba' — the condition was dead code. Fixed to check self.provider == 'alibaba'.	2026-03-21 09:46:26 -07:00
crazywriter1	523d8c38f9	fix: Alibaba/DashScope: preserve model dots (qwen3.5-plus) and fix 401 auth When using Alibaba (DashScope) with an anthropic-compatible endpoint, model names like qwen3.5-plus were being normalized to qwen3-5-plus. Alibaba's API expects the dot. Added preserve_dots parameter to normalize_model_name() and build_anthropic_kwargs(). Also fixed 401 auth: when provider is alibaba or base_url contains dashscope/aliyuncs, use only the resolved API key (DASHSCOPE_API_KEY). Never fall back to resolve_anthropic_token(), and skip Anthropic credential refresh for DashScope endpoints. Cherry-picked from PR #1748 by crazywriter1. Fixes #1739.	2026-03-21 09:38:04 -07:00
Teknium	e183744cb5	feat(honcho): instance-local config via HERMES_HOME, default session strategy to per-directory - Add resolve_config_path(): checks $HERMES_HOME/honcho.json first, falls back to ~/.honcho/config.json. Enables isolated Hermes instances with independent Honcho credentials and settings. - Update CLI and doctor to use resolved path instead of hardcoded global. - Change default session_strategy from per-session to per-directory. Part 1 of #1962 by @erosika.	2026-03-21 09:34:00 -07:00
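The lookup order described in this commit could look roughly like the sketch below. The function name `resolve_config_path` and the two paths come from the message; the `env` parameter, the existence check on the instance-local file, and the demo at the bottom are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def resolve_config_path(env=None) -> Path:
    """Prefer $HERMES_HOME/honcho.json, else fall back to the global config."""
    env = os.environ if env is None else env
    hermes_home = env.get("HERMES_HOME")
    if hermes_home:
        local = Path(hermes_home) / "honcho.json"
        if local.exists():  # assumption: only an existing local file wins
            return local
    return Path.home() / ".honcho" / "config.json"


# Demo with an isolated instance directory.
with tempfile.TemporaryDirectory() as home:
    (Path(home) / "honcho.json").write_text("{}")
    local_result = resolve_config_path({"HERMES_HOME": home})
    local_expected = Path(home) / "honcho.json"

global_result = resolve_config_path({})  # no HERMES_HOME set
print(local_result.name, global_result.name)
```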
Teknium	9305164bf3	fix: add None-entry guard to tool_calls loops in run_agent, batch_runner, and mini_swe_runner (#2316) Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>	2026-03-21 07:20:41 -07:00
ygd58	2ea8054304	fix(agent): inject model identity for Alibaba Coding Plan to work around API returning wrong model name	2026-03-21 07:11:08 -07:00
Teknium	58b52dfb2f	Merge pull request #2303 from NousResearch/hermes/hermes-31d7db3b fix: remove synthetic error message injection, fix session resume after repeated failures	2026-03-21 07:03:54 -07:00
Teknium	779619f742	fix: remove synthetic error message injection, fix session resume after repeated failures Two changes to the error handler in the agent loop: 1. Remove the 'if not pending_handled' block that injected fake [System error during processing: ...] messages into conversation history. These polluted history, burned tokens on retries, and could violate role alternation by injecting as role=user. The tool_calls error-result path (role=tool) is preserved. 2. Append the error final_response as an assistant message when hitting the iteration limit, so session resume doesn't produce consecutive user messages.	2026-03-21 06:33:05 -07:00
Teknium	96a5e9fc11	feat(agent): add summary of successful tool actions in review agent Enhanced the review agent to scan and summarize successful tool actions, providing users with a compact overview of updates made during the review process. This includes actions related to memory and user profiles, improving user feedback and interaction clarity.	2026-03-21 06:31:59 -07:00
Teknium	885f88fb60	feat(agent): suppress non-forced output during post-response housekeeping - Introduced a mechanism to mute output after the main response is delivered, ensuring that subsequent tool calls run without cluttering the CLI. - Redirected stdout to devnull during the review agent's execution to prevent any print statements from interfering with the main CLI display. - Added a new attribute `_mute_post_response` to manage output suppression effectively.	2026-03-20 23:54:42 -07:00
Teknium	761a8ad39a	fix(display): show provider and endpoint in API error messages (#2266)	2026-03-20 21:57:53 -07:00
Test	d560f2d1f2	fix(display): show provider and endpoint in API error messages When an API call fails, the error output now shows the provider name, model, and endpoint URL so users can immediately identify which service rejected their request. Auth errors (401/403) get actionable guidance: check key validity, model access, and OpenRouter credits link. Before: 'API call failed (attempt 1/3): PermissionDeniedError' After: 'API call failed (attempt 1/3): PermissionDeniedError Provider: openrouter Model: anthropic/claude-sonnet-4 Endpoint: https://openrouter.ai/api/v1 Your API key was rejected by the provider. Check: • Is the key valid? Run: hermes setup • Does your account have access to anthropic/claude-sonnet-4? • Check credits: https://openrouter.ai/settings/credits'	2026-03-20 21:06:55 -07:00
Teknium	45058b4105	feat: replace inline nudges with background memory/skill review (#2235) Remove the memory and skill nudges that were appended directly to user messages, causing backward-looking system instructions to compete with forward-looking user tasks. Found in 43% of user messages across 15 sessions, with confirmed cases of the agent spending tool calls on nudge responses before starting the user's actual request. Replace with a background review agent that runs AFTER the main agent finishes responding: - Spawns a background thread with a snapshot of the conversation - Uses the main model (not auxiliary) for high-precision memory/skill work - Only has memory + skill_manage tools (5 iteration budget) - Shares the memory store for direct writes - Never modifies the main conversation history - Never competes with the user's task for model attention - Zero latency impact (runs after response is delivered) - Same token cost (processes the same context, just on a separate track) The trigger conditions are unchanged (every 10 user turns for memory, after 10+ tool iterations for skills). Only the execution path changes: from inline injection to background fork. Closes #2227. Co-authored-by: Test <test@test.com>	2026-03-20 18:51:31 -07:00
Teknium	4263350c5b	fix: remove post-compression file-read history injection (#2226) Remove the [Files already read — do NOT re-read these] user message that was injected into the conversation after context compression. This message used role='user' for system-generated content, creating a fake user turn that confused models about conversation state and could contribute to task-redo behavior. The file_tools.py read tracker (warn on 3rd consecutive read, block on 4th+) already handles re-read prevention inline without injecting synthetic messages. Closes #2224. Co-authored-by: Test <test@test.com>	2026-03-20 14:54:25 -07:00
Test	76bc27199f	fix(cli, agent): improve streaming handling and state management - Updated _stream_delta method in HermesCLI to handle None values, flushing the stream and resetting state for clean tool execution. - Enhanced quiet mode handling in AIAgent to ensure proper display closure before tool execution, preventing display issues with intermediate streamed content. These changes improve the robustness of the streaming functionality and ensure a smoother user experience during tool interactions.	2026-03-20 10:02:42 -07:00
Test	55ce601502	fix: 6 bugs in model metadata, reasoning detection, and delegate tool Cherry-picked from PR #2169 by @0xbyt4. 1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b) 2. Fuzzy match: remove reverse direction that made claude-sonnet-4 resolve to 1M instead of 200K 3. _has_content_after_think_block: reuse _strip_think_blocks() to handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD) 4. models.dev lookup: elif→if so nous provider also queries models.dev 5. Disk cache fallback: use 5-min TTL instead of full hour so network is retried soon 6. Delegate build: wrap child construction in try/finally so _last_resolved_tool_names is always restored on exception	2026-03-20 08:52:37 -07:00
Teknium	c52353cf8a	feat: context pressure warnings for CLI and gateway (#2159 ) * feat: context pressure warnings for CLI and gateway User-facing notifications as context approaches the compaction threshold. Warnings fire at 60% and 85% of the way to compaction — relative to the configured compression threshold, not the raw context window. CLI: Formatted line with a progress bar showing distance to compaction. Cyan at 60% (approaching), bold yellow at 85% (imminent). ◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction 100k threshold (50%) · approaching compaction ⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction 100k threshold (50%) · compaction imminent Gateway: Plain-text notification sent to the user's chat via the new status_callback mechanism (asyncio.run_coroutine_threadsafe bridge, same pattern as step_callback). Does NOT inject into the message stream. The LLM never sees these warnings. Flags reset after each compaction cycle. Files changed: - agent/display.py — format_context_pressure(), format_context_pressure_gateway() - run_agent.py — status_callback param, _context_50/70_warned flags, _emit_context_pressure(), flag reset in _compress_context() - gateway/run.py — _status_callback_sync bridge, wired to AIAgent - tests/test_context_pressure.py — 23 tests * Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf --------- Co-authored-by: Test <test@test.com>	2026-03-20 08:37:36 -07:00
Teknium	88643a1ba9	feat: overhaul context length detection with models.dev and provider-aware resolution (#2158 ) Replace the fragile hardcoded context length system with a multi-source resolution chain that correctly identifies context windows per provider. Key changes: - New agent/models_dev.py: Fetches and caches the models.dev registry (3800+ models across 100+ providers with per-provider context windows). In-memory cache (1hr TTL) + disk cache for cold starts. - Rewritten get_model_context_length() resolution chain: 0. Config override (model.context_length) 1. Custom providers per-model context_length 2. Persistent disk cache 3. Endpoint /models (local servers) 4. Anthropic /v1/models API (max_input_tokens, API-key only) 5. OpenRouter live API (existing, unchanged) 6. Nous suffix-match via OpenRouter (dot/dash normalization) 7. models.dev registry lookup (provider-aware) 8. Thin hardcoded defaults (broad family patterns) 9. 128K fallback (was 2M) - Provider-aware context: same model now correctly resolves to different context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic, 128K on GitHub Copilot). Provider name flows through ContextCompressor. - DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns. models.dev replaces the per-model hardcoding. - CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K] to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M. - hermes model: prompts for context_length when configuring custom endpoints. Supports shorthand (32k, 128K). Saved to custom_providers per-model config. - custom_providers schema extended with optional models dict for per-model context_length (backward compatible). - Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash normalization. Handles all 15 current Nous models. - Anthropic direct: queries /v1/models for max_input_tokens. Only works with regular API keys (sk-ant-api*), not OAuth tokens. Falls through to models.dev for OAuth users. Tests: 5574 passed (18 new tests for models_dev + updated probe tiers) Docs: Updated configuration.md context length section, AGENTS.md Co-authored-by: Test <test@test.com>	2026-03-20 06:04:33 -07:00
Teknium	b7b585656b	Merge pull request #2110 from NousResearch/hermes/hermes-5d6932ba fix: session reset + custom provider model switch + honcho base_url	2026-03-20 06:01:44 -07:00
Teknium	aa6416399e	Merge pull request #2161 from NousResearch/hermes/hermes-6757a563 fix(display): show spinners and tool progress during streaming mode	2026-03-20 05:17:55 -07:00
Test	b313751acf	fix(display): show spinners and tool progress during streaming mode When streaming was enabled, two visual feedback mechanisms were completely suppressed: 1. The thinking spinner (TUI toolbar) was skipped because the entire spinner block was gated on 'not self._has_stream_consumers()'. Now the thinking_callback fires in streaming mode too — the raw KawaiiSpinner is still skipped (would conflict with streamed tokens) but the TUI toolbar widget works fine alongside streaming. 2. Tool progress lines (the ┊ feed) were invisible because _vprint was blanket-suppressed when stream consumers existed. But during tool execution, no tokens are actively streaming, so printing is safe. Added an _executing_tools flag that _vprint respects to allow output during tool execution even with stream consumers registered.	2026-03-20 05:14:42 -07:00
Test	b1d05dfe8b	fix(openai): route api.openai.com to Responses API for GPT-5.x Based on PR #1859 by @magi-morph (too stale to cherry-pick, reimplemented). GPT-5.x models reject tool calls + reasoning_effort on /v1/chat/completions with a 400 error directing to /v1/responses. This auto-detects api.openai.com in the base URL and switches to codex_responses mode in three places: - AIAgent.__init__: upgrades chat_completions → codex_responses - _try_activate_fallback(): same routing for fallback model - runtime_provider.py: _detect_api_mode_for_url() for both custom provider and openrouter runtime resolution paths Also extracts _is_direct_openai_url() helper to replace the inline check in _max_tokens_param().	2026-03-20 05:09:41 -07:00
Test	5822711ae6	fix: complete session reset — missing compressor counters + test Follow-up to PR #2101 (InB4DevOps). Adds three missing context compressor resets in reset_session_state(): - compression_count (displayed in status bar) - last_total_tokens - _context_probed (stale context-error flag) Also fixes the test_cli_new_session.py prompt_toolkit mock (missing auto_suggest stub) and adds a regression test for #2099 that verifies all token counters and compressor state are zeroed on /new.	2026-03-20 04:35:17 -07:00
Teknium	b19f5133c3	Merge pull request #2118 from NousResearch/hermes/hermes-e83093f0 feat: show reasoning/thinking blocks when show_reasoning is enabled	2026-03-20 04:35:12 -07:00


368 Commits