hermes-agent

Author	SHA1	Message	Date
Teknium	e314833c9d	feat(display): configurable tool preview length -- show full paths by default (#3841 ) Tool call previews (paths, commands, queries) were hardcoded to truncate at 35-40 chars across CLI spinners, completion lines, and gateway progress messages. Users could not see full file paths in tool output. New config option: display.tool_preview_length (default 0 = no limit). Set a positive number to truncate at that length. Changes: - display.py: module-level _tool_preview_max_len with getter/setter; build_tool_preview() and get_cute_tool_message() _trunc/_path respect it - cli.py: reads config at startup, spinner widget respects config - gateway/run.py: reads config per-message, progress callback respects config - run_agent.py: removed redundant 30-char quiet-mode spinner truncation - config.py: added display.tool_preview_length to DEFAULT_CONFIG Reported by kriskaminski	2026-03-29 18:02:42 -07:00
Teknium	fcd1645223	feat(skills): support external skill directories via config (#3678 ) Add skills.external_dirs config option — a list of additional directories to scan for skills alongside ~/.hermes/skills/. External dirs are read-only: skill creation/editing always writes to the local dir. Local skills take precedence when names collide. This lets users share skills across tools/agents without copying them into Hermes's own directory (e.g. ~/.agents/skills, /shared/team-skills). Changes: - agent/skill_utils.py: add get_external_skills_dirs() and get_all_skills_dirs() - agent/prompt_builder.py: scan external dirs in build_skills_system_prompt() - tools/skills_tool.py: _find_all_skills() and skill_view() search external dirs; security check recognizes configured external dirs as trusted - agent/skill_commands.py: /skill slash commands discover external skills - hermes_cli/config.py: add skills.external_dirs to DEFAULT_CONFIG - cli-config.yaml.example: document the option - tests/agent/test_external_skills.py: 11 tests covering discovery, precedence, deduplication, and skill_view for external skills Requested by community member primco.	2026-03-29 00:33:30 -07:00
Teknium	bea49e02a3	fix: route /bg spinner through TUI widget to prevent status bar collision (#3643 ) Background agent's KawaiiSpinner wrote \r-based animation and stop() messages through StdoutProxy, colliding with prompt_toolkit's status bar. Two fixes: - display.py: use isinstance(out, StdoutProxy) instead of fragile hasattr+name check for detecting prompt_toolkit's stdout wrapper - cli.py: silence bg agent's raw spinner (_print_fn=no-op) and route thinking updates through the TUI widget only when no foreground agent is active; clear spinner text in finally block with same guard Closes #2718 Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-28 17:29:37 -07:00
Teknium	9a364f2805	fix: cap percentage displays at 100% in stats, gateway, and memory tool (#3599 ) Salvage of PR #3533 (binhnt92). Follow-up to #3480 — applies min(100, ...) to 5 remaining unclamped percentage display sites in context_compressor, cli /stats, gateway /stats, and memory tool. Defensive clamps now that the root cause (estimation heuristic) was already removed in #3480. Co-Authored-By: binhnt92 <binhnt92@users.noreply.github.com>	2026-03-28 14:55:18 -07:00
Teknium	839d9d7471	feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597 ) Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml instead of hardcoded values. Users with slow local models (Ollama, llama.cpp) can now increase timeouts for compression, vision, session search, etc. Defaults: - auxiliary.compression.timeout: 120s (was hardcoded 45s) - auxiliary.vision.timeout: 30s (unchanged) - all other aux tasks: 30s (was hardcoded 30s) - title_generator: 30s (was hardcoded 15s) call_llm/async_call_llm now auto-resolve timeout from config when not explicitly passed. Callers can still override with an explicit timeout arg. Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml per project conventions. Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>	2026-03-28 14:35:28 -07:00
Teknium	2dd286c162	fix: write models.dev disk cache atomically (#3588) Use atomic_json_write() from utils.py instead of plain open()/json.dump() for the models.dev disk cache. Prevents a corrupted cache if the process is killed mid-write; _load_disk_cache() silently returns {} on corrupt JSON, losing all model metadata until the next successful API fetch. Co-authored-by: memosr <memosr@users.noreply.github.com>	2026-03-28 14:20:30 -07:00
Teknium	831e8ba0e5	feat: tool-use enforcement + strip budget warnings from history (#3528 ) Cherry-pick of feat/gpt-tool-steering with modifications: 1. Tool-use enforcement prompt (refactored from GPT-specific): - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex') - Injection logic now checks against the tuple instead of hardcoding 'gpt' — adding new model families is a one-line change - Addresses models describing actions instead of making tool calls 2. Budget warning history stripping: - _strip_budget_warnings_from_history() strips _budget_warning JSON keys and [BUDGET WARNING: ...] text from tool results at the start of run_conversation() - Prevents old budget warnings from poisoning subsequent turns Based on PR #3479 by teknium1.	2026-03-28 07:38:36 -07:00
Teknium	15cfd20820	fix: cap context pressure percentage at 100% in display (#3480 ) * fix: cap context pressure percentage at 100% in display The forward-looking token estimate can overshoot the compaction threshold (e.g. a large tool result pushes it from 70% to 109% in one step). The progress bar was already capped via min(), but pct_int was not — causing the user to see '109% to compaction' which is confusing. Cap pct_int at 100 in both CLI and gateway display functions. Reported by @JoshExile82. * refactor: use real API token counts for compression decisions Replace the rough chars/3 estimation with actual prompt_tokens + completion_tokens from the API response. The estimation was needed to predict whether tool results would push context past the threshold, but the default 50% threshold leaves ample headroom — if tool results push past it, the next API call reports real usage and triggers compression then. This removes all estimation from the compression and context pressure paths, making both 100% data-driven from provider-reported token counts. Also removes the dead _msg_count_before_tools variable.	2026-03-27 21:42:09 -07:00
Teknium	83043e9aa8	fix: add timeout to subprocess calls in context_references (#3469) _expand_git_reference() and _rg_files() called subprocess.run() without a timeout. On a large repository, @diff, @staged, or @git:N references could hang the agent indefinitely while git or ripgrep is slow to produce output. - Add timeout=30 to git subprocess in _expand_git_reference() with a user-friendly error message on TimeoutExpired - Add timeout=10 to rg subprocess in _rg_files() returning None on timeout (falls back to os.walk folder listing) Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>	2026-03-27 17:51:14 -07:00
Teknium	658692799d	fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449) Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top. All 7 auxiliary LLM call sites now use extract_content_or_reasoning(), which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty. Closes #3389.	2026-03-27 15:28:19 -07:00
Teknium	ab09f6b568	feat: curate HF model picker with OpenRouter analogues (#3440 ) Show only agentic models that map to OpenRouter defaults: Qwen/Qwen3.5-397B-A17B ↔ qwen/qwen3.5-plus Qwen/Qwen3.5-35B-A3B ↔ qwen/qwen3.5-35b-a3b deepseek-ai/DeepSeek-V3.2 ↔ deepseek/deepseek-chat moonshotai/Kimi-K2.5 ↔ moonshotai/kimi-k2.5 MiniMaxAI/MiniMax-M2.5 ↔ minimax/minimax-m2.5 zai-org/GLM-5 ↔ z-ai/glm-5 XiaomiMiMo/MiMo-V2-Flash ↔ xiaomi/mimo-v2-pro moonshotai/Kimi-K2-Thinking ↔ moonshotai/kimi-k2-thinking Users can still pick any HF model via Enter custom model name.	2026-03-27 13:54:46 -07:00
Teknium	6f11ff53ad	fix(anthropic): use model-native output limits instead of hardcoded 16K (#3426 ) The Anthropic adapter defaulted to max_tokens=16384 when no explicit value was configured. This severely limits thinking-enabled models where thinking tokens count toward max_tokens: - Claude Opus 4.6 supports 128K output but was capped at 16K - Claude Sonnet 4.6 supports 64K output but was capped at 16K With extended thinking (adaptive or budget-based), the model could exhaust the entire 16K on reasoning, leaving zero tokens for the actual response. This caused two user-visible errors: - 'Response truncated (finish_reason=length)' — thinking consumed most tokens - 'Response only contains think block with no content' — thinking consumed all Fix: add _ANTHROPIC_OUTPUT_LIMITS lookup table (sourced from Anthropic docs and Cline's model catalog) and use the model's actual output limit as the default. Unknown future models default to 128K (the current maximum). Also adds context_length clamping: if the user configured a smaller context window (e.g. custom endpoint), max_tokens is clamped to context_length - 1 to avoid exceeding the window. Closes #2706	2026-03-27 13:02:52 -07:00
Teknium	fd8c465e42	feat: add Hugging Face as a first-class inference provider (#3419 ) Salvage of PR #1747 (original PR #1171 by @davanstrien) onto current main. Registers Hugging Face Inference Providers (router.huggingface.co/v1) as a named provider: - hermes chat --provider huggingface (or --provider hf) - 18 curated open models via hermes model picker - HF_TOKEN in ~/.hermes/.env - OpenAI-compatible endpoint with automatic failover (Groq, Together, SambaNova, etc.) Files: auth.py, models.py, main.py, setup.py, config.py, model_metadata.py, .env.example, 5 docs pages, 17 new tests. Co-authored-by: Daniel van Strien <davanstrien@gmail.com>	2026-03-27 12:41:59 -07:00
Teknium	5127567d5d	perf(ttft): cache skills prompt with shared skill_utils module (salvage #3366 ) (#3421 ) Two-layer caching for build_skills_system_prompt(): 1. In-process LRU (OrderedDict, max 8) — same-process: 546ms → <1ms 2. Disk snapshot (.skills_prompt_snapshot.json) — cold start: 297ms → 103ms Key improvements over original PR #3366: - Extract shared logic into agent/skill_utils.py (parse_frontmatter, skill_matches_platform, get_disabled_skill_names, extract_skill_conditions, extract_skill_description, iter_skill_index_files) - tools/skills_tool.py delegates to shared module — zero code duplication - Proper LRU eviction via OrderedDict.move_to_end + popitem(last=False) - Cache invalidation on all skill mutation paths: - skill_manage tool (in-conversation writes) - hermes skills install (CLI hub) - hermes skills uninstall (CLI hub) - Automatic via mtime/size manifest on cold start prompt_builder.py no longer imports tools.skills_tool (avoids pulling in the entire tool registry chain at prompt build time). 6301 tests pass, 0 failures. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 10:54:02 -07:00
Teknium	e0dbbdb2c9	fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398 ) The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via asyncio.get_running_loop().create_task(). When an AsyncOpenAI client is garbage-collected while prompt_toolkit's event loop is running (the common CLI idle state), the aclose() task runs on prompt_toolkit's loop but the underlying TCP transport is bound to a different (dead) worker loop. The transport's self._loop.call_soon() then raises RuntimeError('Event loop is closed'), which prompt_toolkit surfaces as the disruptive 'Unhandled exception in event loop ... Press ENTER to continue...' error. Three-layer fix: 1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI startup before any AsyncOpenAI clients are created. Safe because cached clients are explicitly cleaned via _force_close_async_httpx, and uncached clients' TCP connections are cleaned by the OS on exit. 2. Custom asyncio exception handler: Installed on prompt_toolkit's event loop to silently suppress 'Event loop is closed' RuntimeError. Defense-in-depth for SDK upgrades that might change the class name. 3. cleanup_stale_async_clients(): Called after each agent turn (when the agent thread joins) to proactively evict cache entries whose event loop is closed, preventing stale clients from accumulating.	2026-03-27 09:45:25 -07:00
Teknium	5a1e2a307a	perf(ttft): salvage easy-win startup optimizations from #3346 (#3395) * perf(ttft): dedupe shared tool availability checks * perf(ttft): short-circuit vision auto-resolution * perf(ttft): make Claude Code version detection lazy * perf(ttft): reuse loaded toolsets for skills prompt --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 07:49:44 -07:00
Teknium	3f95e741a7	fix: validate empty user messages to prevent Anthropic API 400 errors (#3322 ) When user messages have empty content (e.g., Discord @mention-only messages, unrecognized attachments), the Anthropic API rejects the request with 'user messages must have non-empty content'. Changes: - anthropic_adapter.py: Add empty content validation for user messages (string and list formats), matching the existing pattern for assistant and tool messages. Empty content gets '(empty message)' placeholder. - discord.py: Defense-in-depth check at gateway layer to catch empty messages before they enter session history. - Add 4 regression tests covering empty string, whitespace-only, empty list, and empty text block scenarios. Fixes #3143 Co-authored-by: Bartok9 <bartok9@users.noreply.github.com>	2026-03-26 19:24:03 -07:00
Teknium	ad764d3513	fix(auxiliary): catch ImportError from build_anthropic_client in vision auto-detection (#3312 ) _try_anthropic() caught ImportError on the module import (line 667-669) but not on the build_anthropic_client() call (line 696). When the anthropic_adapter module imports fine but the anthropic SDK is missing, build_anthropic_client() raises ImportError at call time. This escaped _try_anthropic() entirely, killing get_available_vision_backends() and cascading to 7 test failures: - 4 setup wizard tests hit unexpected 'Configure vision:' prompt - 3 codex-auth-as-vision tests failed check_vision_requirements() The fix wraps the build_anthropic_client call in try/except ImportError, returning (None, None) when the SDK is unavailable — consistent with the existing guard at the top of the function.	2026-03-26 18:21:59 -07:00
Teknium	0375b2a0d7	fix(gateway): silence background agent terminal output (#3297) * fix(gateway): silence flush agent terminal output quiet_mode=True only suppresses AIAgent init messages. Tool call output still leaks to the terminal through _safe_print → _print_fn during session reset/expiry. Since #2670 injected live memory state into the flush prompt, the flush agent now reliably calls memory tools, making the output leak noticeable for the first time. Set _print_fn to a no-op so the background flush is fully silent. * test(gateway): add test for flush agent terminal silence + fix dotenv mock - Add TestFlushAgentSilenced: verifies _print_fn is set to a no-op on the flush agent so tool output never leaks to the terminal - Fix pre-existing test failures: replace patch('run_agent.AIAgent') with sys.modules mock to avoid importing run_agent (requires openai) - Add autouse _mock_dotenv fixture so all tests in this file run without the dotenv package installed * fix(display): route KawaiiSpinner output through print_fn to fully silence flush agent The previous fix set tmp_agent._print_fn = no-op on the flush agent but spinner output and quiet-mode cute messages bypassed _print_fn entirely: - KawaiiSpinner captured sys.stdout at __init__ and wrote directly to it - quiet-mode tool results used builtin print() instead of _safe_print() Add optional print_fn parameter to KawaiiSpinner.__init__; _write routes through it when set. Pass self._print_fn to all spinner construction sites in run_agent.py and change the quiet-mode cute message print to _safe_print. The existing gateway fix (tmp_agent._print_fn = lambda) now propagates correctly through both paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(gateway): silence hygiene and compression background agents Two more background AIAgent instances in the gateway were created with quiet_mode=True but without _print_fn = no-op, causing tool output to leak to the terminal: - _hyg_agent (in-turn hygiene memory agent) - tmp_agent (_compress_context path) Apply the same _print_fn no-op pattern used for the flush agent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore(display): remove unused _last_flush_time from KawaiiSpinner Attribute was set but never read; upstream already removed it. Leftover from conflict resolution during rebase onto upstream/main. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 17:40:31 -07:00
Teknium	a8e02c7d49	fix: align Nous Portal model slugs with OpenRouter naming (#3253 ) Nous Portal now passes through OpenRouter model names and routes from there. Update the static fallback model list and auxiliary client default to use OpenRouter-format slugs (provider/model) instead of bare names. - _PROVIDER_MODELS['nous']: full OpenRouter catalog - _NOUS_MODEL: google/gemini-3-flash-preview (was gemini-3-flash) - Updated 4 test assertions for the new default model name	2026-03-26 13:49:43 -07:00
Teknium	2c719f0701	fix(auth): migrate OAuth token refresh to platform.claude.com with fallback (#3246 ) Anthropic migrated their OAuth infrastructure from console.anthropic.com to platform.claude.com (Claude Code v2.1.81+). Update _refresh_oauth_token() to try the new endpoint first, falling back to the old one for tokens issued before the migration. Also switches Content-Type from application/x-www-form-urlencoded to application/json to match current Claude Code behavior. Salvaged from PR #2741 by kshitijk4poor.	2026-03-26 13:26:56 -07:00
Teknium	43af094ae3	fix(agent): include tool tokens in preflight estimate, guard context probe persistence (#3164 ) Two improvements salvaged from PR #2600 (paraddox): 1. Preflight compression now counts tool schema tokens alongside system prompt and messages. With 50+ tools enabled, schemas can add 20-30K tokens that were previously invisible to the estimator, delaying compression until the API rejected the request. 2. Context probe persistence guard: when the agent steps down context tiers after a context-length error, only provider-confirmed numeric limits (parsed from the error message) are cached to disk. Guessed fallback tiers from get_next_probe_tier() stay in-memory only, preventing wrong values from polluting the persistent cache. Co-authored-by: paraddox <paraddox@users.noreply.github.com>	2026-03-26 02:00:50 -07:00
Teknium	cbf195e806	chore: fix 154 f-strings, simplify getattr/URL patterns, remove dead code (#3119 ) Three categories of cleanup, all zero-behavioral-change: 1. F-strings without placeholders (154 fixes across 29 files) - Converted f'...' to '...' where no {expression} was present - Heaviest files: run_agent.py (24), cli.py (20), honcho_integration/cli.py (34) 2. Simplify defensive patterns in run_agent.py - Added explicit self._is_anthropic_oauth = False in __init__ (before the api_mode branch that conditionally sets it) - Replaced 7x getattr(self, '_is_anthropic_oauth', False) with direct self._is_anthropic_oauth (attribute always initialized now) - Added _is_openrouter_url() and _is_anthropic_url() helper methods - Replaced 3 inline 'openrouter' in self._base_url_lower checks 3. Remove dead code in small files - hermes_cli/claw.py: removed unused 'total' computation - tools/fuzzy_match.py: removed unused strip_indent() function and pattern_stripped variable Full test suite: 6184 passed, 0 failures E2E PTY: banner clean, tool calls work, zero garbled ANSI	2026-03-25 19:47:58 -07:00
Teknium	7258311710	fix: stop recursive AGENTS.md walk, load top-level only (#3110) The recursive os.walk for AGENTS.md in subdirectories was undesired. Only load AGENTS.md from the working directory root, matching the behavior of CLAUDE.md and .cursorrules.	2026-03-25 18:30:45 -07:00
Teknium	910ec7eb38	chore: remove unused Hermes-native PKCE OAuth flow (#3107 ) Remove run_hermes_oauth_login(), refresh_hermes_oauth_token(), read_hermes_oauth_credentials(), _save_hermes_oauth_credentials(), _generate_pkce(), and associated constants/credential file path. This code was added in `63e88326` but never wired into any user-facing flow (setup wizard, hermes model, or any CLI command). Neither clawdbot/OpenClaw nor opencode implement PKCE for Anthropic — both use setup-token or API keys. Dead code that was never tested in production. Also removes the credential resolution step that checked ~/.hermes/.anthropic_oauth.json (step 3 in resolve_anthropic_token), renumbering remaining steps.	2026-03-25 18:29:47 -07:00
ctlst	281100e2df	fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701 ) In gateway mode, async tools (vision_analyze, web_extract, session_search) deadlock because _run_async() spawns a thread with asyncio.run(), creating a new event loop, but _get_cached_client() returns an AsyncOpenAI client bound to a different loop. httpx.AsyncClient cannot work across event loop boundaries, causing await client.chat.completions.create() to hang forever. Fix: include the event loop identity in the async client cache key so each loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py which had its own broken asyncio.run()-in-thread pattern — now uses the centralized _run_async() bridge.	2026-03-25 17:31:56 -07:00
Teknium	d218cf9118	fix(skills): handle null metadata in skill frontmatter frontmatter.get("metadata", {}) returns None (not {}) when the key exists with a null value, crashing build_skills_system_prompt with AttributeError: 'NoneType' object has no attribute 'get'. Made-with: Cursor	2026-03-25 16:06:15 -07:00
Teknium	77bcaba2d7	refactor: consolidate get_hermes_home() and parse_reasoning_effort() (#3062 ) Centralizes two widely-duplicated patterns into hermes_constants.py: 1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var) - Was copy-pasted inline across 30+ files as: Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) - Now defined once in hermes_constants.py (zero-dependency module) - hermes_cli/config.py re-exports it for backward compatibility - Removed local wrapper functions in honcho_integration/client.py, tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py 2. parse_reasoning_effort() — Reasoning effort string validation - Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py - Same validation logic: check against (xhigh, high, medium, low, minimal, none) - Now defined once in hermes_constants.py, called from all 3 locations - Warning log for unknown values kept at call sites (context-specific) 31 files changed, net +31 lines (125 insertions, 94 deletions) Full test suite: 6179 passed, 0 failed	2026-03-25 15:54:28 -07:00
Teknium	14cf2d85ca	fix(display): guard isatty() against closed streams via _is_tty property (#3056 ) In gateway/Telegram mode, the stdout fd can be closed by executor thread cleanup. KawaiiSpinner.stop() called isatty() on the closed fd, raising ValueError and masking the original error. Instead of a point fix, add a _is_tty property that centralizes the closed-stream guard — both _animate() and stop() now use it. Follows the same (ValueError, OSError) pattern already in _write(). Inspired by PR #2632 by bot-deo88.	2026-03-25 15:15:15 -07:00
Teknium	8bb1d15da4	chore: remove ~100 unused imports across 55 files (#3016 ) Automated cleanup via pyflakes + autoflake with manual review. Changes: - Removed unused stdlib imports (os, sys, json, pathlib.Path, etc.) - Removed unused typing imports (List, Dict, Any, Optional, Tuple, Set, etc.) - Removed unused internal imports (hermes_cli.auth, hermes_cli.config, etc.) - Fixed cli.py: removed 8 shadowed banner imports (imported from hermes_cli.banner then immediately redefined locally — only build_welcome_banner is actually used) - Added noqa comments to imports that appear unused but serve a purpose: - Re-exports (gateway/session.py SessionResetPolicy, tools/terminal_tool.py is_interrupted/_interrupt_event) - SDK presence checks in try/except (daytona, fal_client, discord) - Test mock targets (auxiliary_client.py Path, mcp_config.py get_hermes_home) Zero behavioral changes. Full test suite passes (6162/6162, 2 pre-existing streaming test failures unrelated to this change).	2026-03-25 15:02:03 -07:00
Teknium	0dcd6ab2f2	fix: status bar shows 26K instead of 260K for token counts with trailing zeros (#3024) format_token_count_compact() used unconditional rstrip("0") to clean up decimal trailing zeros (e.g. "1.50" → "1.5"), but this also stripped meaningful trailing zeros from whole numbers ("260" → "26", "100" → "1"). Guard the strip behind a decimal-point check. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-25 12:45:58 -07:00
Teknium	114e636b7d	fix(display): suppress KawaiiSpinner animation under patch_stdout (#2994 ) When the CLI is active, sys.stdout is prompt_toolkit's StdoutProxy which queues writes and injects newlines around each flush(). This causes every \r spinner frame to land on its own line instead of overwriting the previous one, producing visible flickering where the spinner and status bar repeatedly swap positions. The CLI already renders spinner state via a dedicated TUI widget (_spinner_text / get_spinner_text), so KawaiiSpinner's \r-based loop is redundant under StdoutProxy. Detect the proxy and suppress the animation entirely — the thread still runs to preserve start()/stop() semantics. Also removes the 0.4s flush rate-limit workaround that was papering over the same issue, and cleans up the unused _last_flush_time attribute. Salvaged from PR #2908 by Mibayy (fixed _raw -> raw detection, dropped unrelated bundled changes).	2026-03-25 10:46:54 -07:00
Teknium	20cc1731f4	perf(prompt_builder): avoid redundant file re-read for skill conditions (#2992 ) build_skills_system_prompt() was calling _read_skill_conditions() which re-read each SKILL.md file to extract conditional activation fields. The frontmatter was already parsed by _parse_skill_file() earlier in the same loop. Extract conditions inline from the existing frontmatter dict instead, saving one file read per skill (~80+ on a typical setup). Salvaged from PR #2827 by InB4DevOps.	2026-03-25 10:39:27 -07:00
Teknium	7ca22ea11b	fix(compression): restore sane defaults and cap summary at 12K tokens - threshold: 0.80 → 0.50 (compress at 50%, not 80%) - target_ratio: 0.40 → 0.20, now relative to threshold not total context (20% of 50% = 10% of context as tail budget) - summary ceiling: 32K → 12K (Gemini can't output more than ~12K) - Updated DEFAULT_CONFIG, config display, example config, and tests	2026-03-24 18:48:47 -07:00
Teknium	9231a335d4	fix(compression): replace dead summary_target_tokens with ratio-based scaling (#2554 ) The summary_target_tokens parameter was accepted in the constructor, stored on the instance, and never used — the summary budget was always computed from hardcoded module constants (_SUMMARY_RATIO=0.20, _MAX_SUMMARY_TOKENS=8000). This caused two compounding problems: 1. The config value was silently ignored, giving users no control over post-compression size. 2. Fixed budgets (20K tail, 8K summary cap) didn't scale with context window size. Switching from a 1M-context model to a 200K model would trigger compression that nuked 350K tokens of conversation history down to ~30K. Changes: - Replace summary_target_tokens with summary_target_ratio (default 0.40) which sets the post-compression target as a fraction of context_length. Tail token budget and summary cap now scale proportionally: MiniMax 200K → ~80K post-compression GPT-5 1M → ~400K post-compression - Change threshold_percent default: 0.50 → 0.80 (don't fire until 80% of context is consumed) - Change protect_last_n default: 4 → 20 (preserve ~10 full turns) - Summary token cap scales to 5% of context (was fixed 8K), capped at 32K ceiling - Read target_ratio and protect_last_n from config.yaml compression section (both are now configurable) - Remove hardcoded summary_target_tokens=500 from run_agent.py - Add 5 new tests for ratio scaling, clamping, and new defaults	2026-03-24 17:45:49 -07:00
Teknium	618f15dda9	fix: reorder setup wizard providers — OpenRouter first Move OpenRouter to position 1 in the setup wizard's provider list to match hermes model ordering. Update default selection index and fix test expectations for the new ordering. Setup order: OpenRouter → Nous Portal → Codex → Custom → ...	2026-03-24 12:50:24 -07:00
Guts	2d8fad8230	fix(context): restrict @ references to safe workspace paths (#2601) fix(context): block @ references from reading secrets outside the workspace. Defaults allowed_root to cwd, adds sensitive file blocklist.	2026-03-23 06:40:05 -07:00
Teknium	1f21ef7488	fix(cli): prevent 'Press ENTER to continue...' on exit When AsyncOpenAI clients are garbage-collected after the event loop closes, their AsyncHttpxClientWrapper.__del__ tries to schedule aclose() on the dead loop, causing RuntimeError: Event loop is closed. prompt_toolkit catches this as an unhandled exception and shows 'Press ENTER to continue...' which blocks CLI exit. Fix: Add shutdown_cached_clients() to auxiliary_client.py that marks all cached async clients' underlying httpx transport as CLOSED before GC runs. This prevents __del__ from attempting the aclose() call. - _force_close_async_httpx(): sets httpx AsyncClient._state to CLOSED - shutdown_cached_clients(): iterates _client_cache, closes sync clients normally and marks async clients as closed - Also fix stale client eviction in _get_cached_client to mark evicted async clients as closed (was just del-ing them, triggering __del__) - Call shutdown_cached_clients() from _run_cleanup() in cli.py	2026-03-22 15:31:54 -07:00
Teknium	72a6d7dffe	fix(model_metadata): skip endpoint probe for known providers (Copilot context bug) (#2507 ) The context length resolver was querying the /models endpoint for known providers like GitHub Copilot, which returns a provider-imposed limit (128k) instead of the model's actual context window (400k for gpt-5.4). Since this check happened before the models.dev lookup, the wrong value won every time. Fix: - Add api.githubcopilot.com and models.github.ai to _URL_TO_PROVIDER - Skip the endpoint metadata probe for known providers — their /models data is unreliable for context length. models.dev has the correct per-provider values. Reported by danny [DUMB] — gpt-5.4 via Copilot was resolving to 128k instead of the correct 400k from models.dev.	2026-03-22 08:15:06 -07:00
Teknium	306e67f32d	fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445 ) When a non-OpenRouter provider (e.g. minimax, anthropic) is set in config.yaml but its API key is missing, Hermes silently fell back to OpenRouter, causing confusing 404 errors. Now checks if the user explicitly configured a provider before falling back. Explicit providers raise RuntimeError with a clear message naming the missing env var. Auto/openrouter/custom providers still fall through to OpenRouter as before. Three code paths fixed: - run_agent.py AIAgent.__init__ — main client initialization - auxiliary_client.py call_llm — sync auxiliary calls - auxiliary_client.py call_llm_streaming — async auxiliary calls Based on PR #2272 by @StefanIsMe. Applied manually to fix a pconfig NameError in the original and extend to call_llm_streaming. Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>	2026-03-22 03:59:29 -07:00
Teknium	ec22635b47	Merge pull request #2403 from NousResearch/hermes/hermes-31d7db3b fix(model_metadata): use /v1/props endpoint for llama.cpp context detection	2026-03-21 18:07:41 -07:00
Teknium	29d0541ac9	fix(model_metadata): use /v1/props endpoint for llama.cpp context detection Recent versions of llama.cpp moved the server properties endpoint from /props to /v1/props (consistent with the /v1 API prefix convention). The server-type detection path and the n_ctx reading path both used the old /props URL, which returns 404 on current builds. This caused the allocated context window size to fall back to a hardcoded default, resulting in an incorrect (too small) value being displayed in the TUI context bar. Fix: try /v1/props first, fall back to /props for backward compatibility with older llama.cpp builds. Both paths are now handled gracefully.	2026-03-21 18:07:18 -07:00
0xbyt4	dbc25a386e	fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag Two bugs in the auxiliary provider auto-detection chain: 1. Expired Codex JWT blocks the auto chain: _read_codex_access_token() returned any stored token without checking expiry, preventing fallback to working providers. Now decodes JWT exp claim and returns None for expired tokens. 2. Auxiliary Anthropic client missing OAuth identity transforms: _AnthropicCompletionsAdapter always called build_anthropic_kwargs with is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth tokens via _is_oauth_token() and propagates the flag through the adapter chain. Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag to mock resolve_anthropic_token directly (env var alone was insufficient).	2026-03-21 17:36:25 -07:00
Teknium	1d28b4699b	fix(redact): safely handle non-string inputs (salvage #2369)	2026-03-21 17:10:14 -07:00
aydnOktay	40c9a13476	fix(redact): safely handle non-string inputs redact_sensitive_text() now returns early for None and coerces other non-string values to str before applying regex-based redaction, preventing TypeErrors in logging/tool-output paths. Cherry-picked from PR #2369 by aydnOktay.	2026-03-21 16:55:02 -07:00
teyrebaz33	bd49bce278	fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter On the native Anthropic Messages API path, convert_messages_to_anthropic() moves top-level cache_control on role:tool messages inside the tool_result block. On OpenRouter (chat_completions), no such conversion happens — the unexpected top-level field causes a silent hang on the second tool call. Add native_anthropic parameter to _apply_cache_marker() and apply_anthropic_cache_control(). When False (OpenRouter), role:tool messages are skipped entirely. When True (native Anthropic), existing behaviour is preserved. Fixes #2362	2026-03-21 16:54:43 -07:00
Teknium	55510cbad2	Merge pull request #2388 from NousResearch/hermes/hermes-31d7db3b fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url + fix(update): reset on stash conflict	2026-03-21 16:20:08 -07:00
Teknium	f8fb61d4ad	fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url Only honor config.model.base_url for Anthropic resolution when config.model.provider is actually "anthropic". This prevents a Codex (or other provider) base_url from leaking into Anthropic runtime and auxiliary client paths, which would send requests to the wrong endpoint. Closes #2384	2026-03-21 16:16:17 -07:00
Teknium	2a5f86ed6d	Merge pull request #2343 from NousResearch/hermes/hermes-31d7db3b feat: @ context references + Honcho config fixes	2026-03-21 16:10:19 -07:00
Teknium	da44c196b6	feat: @ context references — inline file, folder, diff, git, and URL injection Add @file:path, @folder:dir, @diff, @staged, @git:N, and @url: references that expand inline before the message reaches the LLM. Supports line ranges (@file:main.py:10-50), token budget enforcement (soft warn at 25%, hard block at 50%), and path sandboxing for gateway. Core module from PR #2090 by @kshitijk4poor. CLI and gateway wiring rewritten against current main. Fixed asyncio.run() crash when called from inside a running event loop (gateway). Closes #682.	2026-03-21 15:57:13 -07:00
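
The reference syntax and budget thresholds described in da44c196b6 can be illustrated with a minimal sketch. It is not the code from that commit: the names `parse_file_refs`, `FileRef`, and `budget_status` are hypothetical, the regex covers only the `@file:` form, and it assumes the 25% and 50% thresholds are inclusive and measured against the model's context window, which the commit message does not state.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern: @file:<path> with an optional :<start>-<end> line range.
# The path is assumed to contain no whitespace or colons.
_FILE_REF = re.compile(
    r"@file:(?P<path>[^\s:]+)(?::(?P<start>\d+)-(?P<end>\d+))?"
)


@dataclass
class FileRef:
    path: str
    start: Optional[int] = None  # first line of the range, if one was given
    end: Optional[int] = None    # last line of the range, if one was given


def parse_file_refs(message: str) -> List[FileRef]:
    """Collect every @file: reference found in a user message."""
    refs = []
    for m in _FILE_REF.finditer(message):
        start = int(m["start"]) if m["start"] else None
        end = int(m["end"]) if m["end"] else None
        refs.append(FileRef(m["path"], start, end))
    return refs


def budget_status(injected_tokens: int, context_window: int) -> str:
    """Classify injected content against the assumed token budget.

    Returns "block" at 50% of the window or more, "warn" at 25% or more,
    and "ok" otherwise.
    """
    ratio = injected_tokens / context_window
    if ratio >= 0.50:
        return "block"
    if ratio >= 0.25:
        return "warn"
    return "ok"
```

For example, `parse_file_refs("review @file:main.py:10-50")` yields one `FileRef` with path `main.py` and the range 10 to 50, while a reference without a range leaves `start` and `end` as `None`.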

263 Commits