hermes-agent

Author	SHA1	Message	Date
Teknium	0dcd6ab2f2	fix: status bar shows 26K instead of 260K for token counts with trailing zeros (#3024 ) format_token_count_compact() used unconditional rstrip("0") to clean up decimal trailing zeros (e.g. "1.50" → "1.5"), but this also stripped meaningful trailing zeros from whole numbers ("260" → "26", "100" → "1"). Guard the strip behind a decimal-point check. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-25 12:45:58 -07:00
Teknium	114e636b7d	fix(display): suppress KawaiiSpinner animation under patch_stdout (#2994 ) When the CLI is active, sys.stdout is prompt_toolkit's StdoutProxy which queues writes and injects newlines around each flush(). This causes every \r spinner frame to land on its own line instead of overwriting the previous one, producing visible flickering where the spinner and status bar repeatedly swap positions. The CLI already renders spinner state via a dedicated TUI widget (_spinner_text / get_spinner_text), so KawaiiSpinner's \r-based loop is redundant under StdoutProxy. Detect the proxy and suppress the animation entirely — the thread still runs to preserve start()/stop() semantics. Also removes the 0.4s flush rate-limit workaround that was papering over the same issue, and cleans up the unused _last_flush_time attribute. Salvaged from PR #2908 by Mibayy (fixed _raw -> raw detection, dropped unrelated bundled changes).	2026-03-25 10:46:54 -07:00
Teknium	20cc1731f4	perf(prompt_builder): avoid redundant file re-read for skill conditions (#2992 ) build_skills_system_prompt() was calling _read_skill_conditions() which re-read each SKILL.md file to extract conditional activation fields. The frontmatter was already parsed by _parse_skill_file() earlier in the same loop. Extract conditions inline from the existing frontmatter dict instead, saving one file read per skill (~80+ on a typical setup). Salvaged from PR #2827 by InB4DevOps.	2026-03-25 10:39:27 -07:00
Teknium	7ca22ea11b	fix(compression): restore sane defaults and cap summary at 12K tokens - threshold: 0.80 → 0.50 (compress at 50%, not 80%) - target_ratio: 0.40 → 0.20, now relative to threshold not total context (20% of 50% = 10% of context as tail budget) - summary ceiling: 32K → 12K (Gemini can't output more than ~12K) - Updated DEFAULT_CONFIG, config display, example config, and tests	2026-03-24 18:48:47 -07:00
Teknium	9231a335d4	fix(compression): replace dead summary_target_tokens with ratio-based scaling (#2554 ) The summary_target_tokens parameter was accepted in the constructor, stored on the instance, and never used — the summary budget was always computed from hardcoded module constants (_SUMMARY_RATIO=0.20, _MAX_SUMMARY_TOKENS=8000). This caused two compounding problems: 1. The config value was silently ignored, giving users no control over post-compression size. 2. Fixed budgets (20K tail, 8K summary cap) didn't scale with context window size. Switching from a 1M-context model to a 200K model would trigger compression that nuked 350K tokens of conversation history down to ~30K. Changes: - Replace summary_target_tokens with summary_target_ratio (default 0.40) which sets the post-compression target as a fraction of context_length. Tail token budget and summary cap now scale proportionally: MiniMax 200K → ~80K post-compression GPT-5 1M → ~400K post-compression - Change threshold_percent default: 0.50 → 0.80 (don't fire until 80% of context is consumed) - Change protect_last_n default: 4 → 20 (preserve ~10 full turns) - Summary token cap scales to 5% of context (was fixed 8K), capped at 32K ceiling - Read target_ratio and protect_last_n from config.yaml compression section (both are now configurable) - Remove hardcoded summary_target_tokens=500 from run_agent.py - Add 5 new tests for ratio scaling, clamping, and new defaults	2026-03-24 17:45:49 -07:00
Teknium	618f15dda9	fix: reorder setup wizard providers — OpenRouter first Move OpenRouter to position 1 in the setup wizard's provider list to match hermes model ordering. Update default selection index and fix test expectations for the new ordering. Setup order: OpenRouter → Nous Portal → Codex → Custom → ...	2026-03-24 12:50:24 -07:00
Guts	2d8fad8230	fix(context): restrict @ references to safe workspace paths (#2601 ) fix(context): block @ references from reading secrets outside the workspace. Defaults allowed_root to cwd, adds sensitive file blocklist.	2026-03-23 06:40:05 -07:00
Teknium	1f21ef7488	fix(cli): prevent 'Press ENTER to continue...' on exit When AsyncOpenAI clients are garbage-collected after the event loop closes, their AsyncHttpxClientWrapper.__del__ tries to schedule aclose() on the dead loop, causing RuntimeError: Event loop is closed. prompt_toolkit catches this as an unhandled exception and shows 'Press ENTER to continue...' which blocks CLI exit. Fix: Add shutdown_cached_clients() to auxiliary_client.py that marks all cached async clients' underlying httpx transport as CLOSED before GC runs. This prevents __del__ from attempting the aclose() call. - _force_close_async_httpx(): sets httpx AsyncClient._state to CLOSED - shutdown_cached_clients(): iterates _client_cache, closes sync clients normally and marks async clients as closed - Also fix stale client eviction in _get_cached_client to mark evicted async clients as closed (was just del-ing them, triggering __del__) - Call shutdown_cached_clients() from _run_cleanup() in cli.py	2026-03-22 15:31:54 -07:00
Teknium	72a6d7dffe	fix(model_metadata): skip endpoint probe for known providers (Copilot context bug) (#2507 ) The context length resolver was querying the /models endpoint for known providers like GitHub Copilot, which returns a provider-imposed limit (128k) instead of the model's actual context window (400k for gpt-5.4). Since this check happened before the models.dev lookup, the wrong value won every time. Fix: - Add api.githubcopilot.com and models.github.ai to _URL_TO_PROVIDER - Skip the endpoint metadata probe for known providers — their /models data is unreliable for context length. models.dev has the correct per-provider values. Reported by danny [DUMB] — gpt-5.4 via Copilot was resolving to 128k instead of the correct 400k from models.dev.	2026-03-22 08:15:06 -07:00
Teknium	306e67f32d	fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445 ) When a non-OpenRouter provider (e.g. minimax, anthropic) is set in config.yaml but its API key is missing, Hermes silently fell back to OpenRouter, causing confusing 404 errors. Now checks if the user explicitly configured a provider before falling back. Explicit providers raise RuntimeError with a clear message naming the missing env var. Auto/openrouter/custom providers still fall through to OpenRouter as before. Three code paths fixed: - run_agent.py AIAgent.__init__ — main client initialization - auxiliary_client.py call_llm — sync auxiliary calls - auxiliary_client.py call_llm_streaming — async auxiliary calls Based on PR #2272 by @StefanIsMe. Applied manually to fix a pconfig NameError in the original and extend to call_llm_streaming. Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>	2026-03-22 03:59:29 -07:00
Teknium	ec22635b47	Merge pull request #2403 from NousResearch/hermes/hermes-31d7db3b fix(model_metadata): use /v1/props endpoint for llama.cpp context detection	2026-03-21 18:07:41 -07:00
Teknium	29d0541ac9	fix(model_metadata): use /v1/props endpoint for llama.cpp context detection Recent versions of llama.cpp moved the server properties endpoint from /props to /v1/props (consistent with the /v1 API prefix convention). The server-type detection path and the n_ctx reading path both used the old /props URL, which returns 404 on current builds. This caused the allocated context window size to fall back to a hardcoded default, resulting in an incorrect (too small) value being displayed in the TUI context bar. Fix: try /v1/props first, fall back to /props for backward compatibility with older llama.cpp builds. Both paths are now handled gracefully.	2026-03-21 18:07:18 -07:00
0xbyt4	dbc25a386e	fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag Two bugs in the auxiliary provider auto-detection chain: 1. Expired Codex JWT blocks the auto chain: _read_codex_access_token() returned any stored token without checking expiry, preventing fallback to working providers. Now decodes JWT exp claim and returns None for expired tokens. 2. Auxiliary Anthropic client missing OAuth identity transforms: _AnthropicCompletionsAdapter always called build_anthropic_kwargs with is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth tokens via _is_oauth_token() and propagates the flag through the adapter chain. Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag to mock resolve_anthropic_token directly (env var alone was insufficient).	2026-03-21 17:36:25 -07:00
Teknium	1d28b4699b	fix(redact): safely handle non-string inputs (salvage #2369 ) fix(redact): safely handle non-string inputs (salvage #2369)	2026-03-21 17:10:14 -07:00
aydnOktay	40c9a13476	fix(redact): safely handle non-string inputs redact_sensitive_text() now returns early for None and coerces other non-string values to str before applying regex-based redaction, preventing TypeErrors in logging/tool-output paths. Cherry-picked from PR #2369 by aydnOktay.	2026-03-21 16:55:02 -07:00
teyrebaz33	bd49bce278	fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter On the native Anthropic Messages API path, convert_messages_to_anthropic() moves top-level cache_control on role:tool messages inside the tool_result block. On OpenRouter (chat_completions), no such conversion happens — the unexpected top-level field causes a silent hang on the second tool call. Add native_anthropic parameter to _apply_cache_marker() and apply_anthropic_cache_control(). When False (OpenRouter), role:tool messages are skipped entirely. When True (native Anthropic), existing behaviour is preserved. Fixes #2362	2026-03-21 16:54:43 -07:00
Teknium	55510cbad2	Merge pull request #2388 from NousResearch/hermes/hermes-31d7db3b fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url + fix(update): reset on stash conflict	2026-03-21 16:20:08 -07:00
Teknium	f8fb61d4ad	fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url Only honor config.model.base_url for Anthropic resolution when config.model.provider is actually "anthropic". This prevents a Codex (or other provider) base_url from leaking into Anthropic runtime and auxiliary client paths, which would send requests to the wrong endpoint. Closes #2384	2026-03-21 16:16:17 -07:00
Teknium	2a5f86ed6d	Merge pull request #2343 from NousResearch/hermes/hermes-31d7db3b feat: @ context references + Honcho config fixes	2026-03-21 16:10:19 -07:00
Teknium	da44c196b6	feat: @ context references — inline file, folder, diff, git, and URL injection Add @file:path, @folder:dir, @diff, @staged, @git:N, and @url: references that expand inline before the message reaches the LLM. Supports line ranges (@file:main.py:10-50), token budget enforcement (soft warn at 25%, hard block at 50%), and path sandboxing for gateway. Core module from PR #2090 by @kshitijk4poor. CLI and gateway wiring rewritten against current main. Fixed asyncio.run() crash when called from inside a running event loop (gateway). Closes #682.	2026-03-21 15:57:13 -07:00
Teknium	292d12bed4	fix: case-insensitive model family matching + compressor init logging Two fixes for local model context detection: 1. Hardcoded DEFAULT_CONTEXT_LENGTHS matching was case-sensitive. 'qwen' didn't match 'Qwen3.5-9B-Q4_K_M.gguf' because of the capital Q. Now uses model.lower() for comparison. 2. Added compressor initialization logging showing the detected context_length, threshold, model, provider, and base_url. This makes turn-1 compression bugs diagnosable from logs — previously there was no log of what context length was detected.	2026-03-21 10:47:44 -07:00
crazywriter1	523d8c38f9	fix: Alibaba/DashScope: preserve model dots (qwen3.5-plus) and fix 401 auth When using Alibaba (DashScope) with an anthropic-compatible endpoint, model names like qwen3.5-plus were being normalized to qwen3-5-plus. Alibaba's API expects the dot. Added preserve_dots parameter to normalize_model_name() and build_anthropic_kwargs(). Also fixed 401 auth: when provider is alibaba or base_url contains dashscope/aliyuncs, use only the resolved API key (DASHSCOPE_API_KEY). Never fall back to resolve_anthropic_token(), and skip Anthropic credential refresh for DashScope endpoints. Cherry-picked from PR #1748 by crazywriter1. Fixes #1739.	2026-03-21 09:38:04 -07:00
Teknium	e75f58420c	feat(compressor): major context compaction improvements Six improvements to reduce information loss during context compression, informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot: 1. Structured summary template — sections for Goal, Progress (Done/ In Progress/Blocked), Key Decisions, Relevant Files, Next Steps, and Critical Context. Forces the summarizer to preserve each category instead of writing a vague paragraph. 2. Iterative summary updates — on re-compression, the prompt says 'PRESERVE existing info, ADD new progress, UPDATE done/in-progress status.' Previous summary is stored and fed back to the summarizer so accumulated context survives across multiple compactions. 3. Token-budget tail protection — instead of fixed protect_last_n=4, walks backward keeping ~20K tokens of recent context. Adapts to message density: sessions with big tool results protect fewer messages, short exchanges protect more. Falls back to protect_last_n for small conversations. 4. Tool output pruning (pre-pass) — before the expensive LLM summary, replaces old tool result contents with a placeholder. This is free (no LLM call) and can save 30%+ of context by itself. 5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20% of compressed content tokens (clamped to 2000-8000). A 50-turn conversation gets more summary space than a 10-turn one. 6. Richer summarizer input — tool calls now include arguments (up to 500 chars) and tool results keep up to 3000 chars (was 1500). The summarizer sees 'terminal(git status) → M src/config.py' instead of just '[Tool calls: terminal]'.	2026-03-21 08:14:14 -07:00
Teknium	2da79b13df	feat: priority-based context file selection + CLAUDE.md support (#2301 ) Previously, all project context files (AGENTS.md, .cursorrules, .hermes.md) were loaded and concatenated into the system prompt. This bloated the prompt with potentially redundant or conflicting instructions. Now only ONE project context type is loaded, using priority order: 1. .hermes.md / HERMES.md (walk to git root) 2. AGENTS.md / agents.md (recursive directory walk) 3. CLAUDE.md / claude.md (cwd only, NEW) 4. .cursorrules / .cursor/rules/*.mdc (cwd only) SOUL.md from HERMES_HOME remains independent and always loads. Also adds CLAUDE.md as a recognized context file format, matching the convention popularized by Claude Code. Refactored the monolithic function into four focused helpers: _load_hermes_md, _load_agents_md, _load_claude_md, _load_cursorrules. Tests: replaced 1 coexistence test with 10 new tests covering priority ordering, CLAUDE.md loading, case sensitivity, injection blocking.	2026-03-21 06:26:20 -07:00
Teknium	214047dee1	fix(display): suppress spinner animation in non-TTY environments (#2216 ) fix(display): suppress spinner animation in non-TTY environments	2026-03-20 12:55:54 -07:00
Evey	6e2be3356d	fix(display): suppress spinner animation in non-TTY environments In Docker/systemd/piped environments, the KawaiiSpinner animation generates ~500 log lines per tool call. Now checks isatty() and falls back to clean [tool]/[done] log lines in non-TTY contexts. Interactive CLI behavior unchanged. Based on work by 42-evey in PR #2203.	2026-03-20 12:52:21 -07:00
Teknium	8e884fb3f1	Merge pull request #2215 from NousResearch/hermes/hermes-31d7db3b fix: infer provider from base URL for models.dev context length lookup	2026-03-20 12:52:07 -07:00
Test	59074df021	fix: add dashscope-intl.aliyuncs.com to URL-to-provider mapping The official international DashScope endpoint uses dashscope-intl.aliyuncs.com (per Alibaba docs), which the substring match on dashscope.aliyuncs.com misses because of the hyphenated prefix.	2026-03-20 12:51:39 -07:00
0xbyt4	0ce35a117c	fix: crash on None entry in tool_calls list during Anthropic conversion (#2209 ) If a tool_calls list contains a None entry (from malformed API response, compression artifact, or corrupt session replay), convert_messages_to_anthropic crashes with AttributeError: 'NoneType' object has no attribute 'get'. Skip None and non-dict entries in the tool_calls iteration. Found via chaos/fuzz testing with mixed valid/invalid tool_call entries.	2026-03-20 12:01:42 -07:00
Test	900e848522	fix: infer provider from base URL for models.dev context length lookup Custom endpoint users (DashScope/Alibaba, Z.AI, Kimi, DeepSeek, etc.) get wrong context lengths because their provider resolves as "openrouter" or "custom", skipping the models.dev lookup entirely. For example, qwen3.5-plus on DashScope falls to the generic "qwen" hardcoded default (131K) instead of the correct 1M. Add _infer_provider_from_url() that maps known API hostnames to their models.dev provider IDs. When the explicit provider is generic (openrouter/custom/empty), infer from the base URL before the models.dev lookup. This resolves context lengths correctly for DashScope, Z.AI, Kimi, MiniMax, DeepSeek, and Nous endpoints without requiring users to manually set context_length in config. Also refactors _is_known_provider_base_url() to use the same URL mapping, removing the duplicated hostname list.	2026-03-20 11:57:24 -07:00
Teknium	1aa7027be1	Merge pull request #2192 from NousResearch/hermes/hermes-3d7c23c9 fix(acp): preserve leading whitespace in streaming chunks	2026-03-20 09:52:32 -07:00
Teknium	7a427d7b03	fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190 ) Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104. asyncio.run() creates and closes a fresh event loop each call. Cached httpx/AsyncOpenAI clients bound to the dead loop crash on GC with 'Event loop is closed'. This hit vision_analyze on first use in CLI. Two-layer fix: - model_tools._run_async(): replace asyncio.run() with persistent loop via _get_tool_loop() + run_until_complete() - auxiliary_client._get_cached_client(): track which loop created each async client, discard stale entries if loop is closed 6 regression tests covering loop lifecycle, reuse, and full vision dispatch chain. Co-authored-by: Test <test@test.com>	2026-03-20 09:44:50 -07:00
Dilee	1173adbe86	fix(acp): preserve leading whitespace in streaming chunks	2026-03-20 09:38:13 -07:00
Test	55ce601502	fix: 6 bugs in model metadata, reasoning detection, and delegate tool Cherry-picked from PR #2169 by @0xbyt4. 1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b) 2. Fuzzy match: remove reverse direction that made claude-sonnet-4 resolve to 1M instead of 200K 3. _has_content_after_think_block: reuse _strip_think_blocks() to handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD) 4. models.dev lookup: elif→if so nous provider also queries models.dev 5. Disk cache fallback: use 5-min TTL instead of full hour so network is retried soon 6. Delegate build: wrap child construction in try/finally so _last_resolved_tool_names is always restored on exception	2026-03-20 08:52:37 -07:00
Teknium	2ea4dd30c6	fix(gateway): strip orphaned tool_results + let /reset bypass running agent (#2180 ) Two fixes for Telegram/gateway-specific bugs: 1. Anthropic adapter: strip orphaned tool_result blocks (mirror of existing tool_use stripping). Context compression or session truncation can remove an assistant message containing a tool_use while leaving the subsequent tool_result intact. Anthropic rejects these with a 400: 'unexpected tool_use_id found in tool_result blocks'. The adapter now collects all tool_use IDs and filters out any tool_result blocks referencing IDs not in that set. 2. Gateway: /reset and /new now bypass the running-agent guard (like /status already does). Previously, sending /reset while an agent was running caused the raw text to be queued and later fed back as a user message with the same broken history — replaying the corrupted session instead of resetting it. Now the running agent is interrupted, pending messages are cleared, and the reset command dispatches immediately. Tests updated: existing tests now include proper tool_use→tool_result pairs; two new tests cover orphaned tool_result stripping. Co-authored-by: Test <test@test.com>	2026-03-20 08:39:49 -07:00
Teknium	c52353cf8a	feat: context pressure warnings for CLI and gateway (#2159 ) * feat: context pressure warnings for CLI and gateway User-facing notifications as context approaches the compaction threshold. Warnings fire at 60% and 85% of the way to compaction — relative to the configured compression threshold, not the raw context window. CLI: Formatted line with a progress bar showing distance to compaction. Cyan at 60% (approaching), bold yellow at 85% (imminent). ◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction 100k threshold (50%) · approaching compaction ⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction 100k threshold (50%) · compaction imminent Gateway: Plain-text notification sent to the user's chat via the new status_callback mechanism (asyncio.run_coroutine_threadsafe bridge, same pattern as step_callback). Does NOT inject into the message stream. The LLM never sees these warnings. Flags reset after each compaction cycle. Files changed: - agent/display.py — format_context_pressure(), format_context_pressure_gateway() - run_agent.py — status_callback param, _context_50/70_warned flags, _emit_context_pressure(), flag reset in _compress_context() - gateway/run.py — _status_callback_sync bridge, wired to AIAgent - tests/test_context_pressure.py — 23 tests * Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf --------- Co-authored-by: Test <test@test.com>	2026-03-20 08:37:36 -07:00
Teknium	88643a1ba9	feat: overhaul context length detection with models.dev and provider-aware resolution (#2158 ) Replace the fragile hardcoded context length system with a multi-source resolution chain that correctly identifies context windows per provider. Key changes: - New agent/models_dev.py: Fetches and caches the models.dev registry (3800+ models across 100+ providers with per-provider context windows). In-memory cache (1hr TTL) + disk cache for cold starts. - Rewritten get_model_context_length() resolution chain: 0. Config override (model.context_length) 1. Custom providers per-model context_length 2. Persistent disk cache 3. Endpoint /models (local servers) 4. Anthropic /v1/models API (max_input_tokens, API-key only) 5. OpenRouter live API (existing, unchanged) 6. Nous suffix-match via OpenRouter (dot/dash normalization) 7. models.dev registry lookup (provider-aware) 8. Thin hardcoded defaults (broad family patterns) 9. 128K fallback (was 2M) - Provider-aware context: same model now correctly resolves to different context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic, 128K on GitHub Copilot). Provider name flows through ContextCompressor. - DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns. models.dev replaces the per-model hardcoding. - CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K] to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M. - hermes model: prompts for context_length when configuring custom endpoints. Supports shorthand (32k, 128K). Saved to custom_providers per-model config. - custom_providers schema extended with optional models dict for per-model context_length (backward compatible). - Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash normalization. Handles all 15 current Nous models. - Anthropic direct: queries /v1/models for max_input_tokens. Only works with regular API keys (sk-ant-api*), not OAuth tokens. Falls through to models.dev for OAuth users. Tests: 5574 passed (18 new tests for models_dev + updated probe tiers) Docs: Updated configuration.md context length section, AGENTS.md Co-authored-by: Test <test@test.com>	2026-03-20 06:04:33 -07:00
Teknium	b7b585656b	Merge pull request #2110 from NousResearch/hermes/hermes-5d6932ba fix: session reset + custom provider model switch + honcho base_url	2026-03-20 06:01:44 -07:00
Test	4494c0b033	fix(cron): remove send_message/clarify from cron agents + autonomous prompt Cron jobs run unattended with no user present. Previously the agent had send_message and clarify tools available, which makes no sense — the final response is auto-delivered, and there's nobody to ask questions to. Changes: - Disable messaging and clarify toolsets for cron agent sessions - Update cron platform hint to emphasize autonomous execution: no user present, cannot ask questions, must execute fully and make decisions - Update cronjob tool schema description to match (remove stale send_message guidance)	2026-03-20 05:18:05 -07:00
Teknium	3ec6c71e43	fix: update claude 4.6 context length from 200K to 1M (#2155 ) * fix: preserve Ollama model:tag colons in context length detection The colon-split logic in get_model_context_length() and _query_local_context_length() assumed any colon meant provider:model format (e.g. "local:my-model"). But Ollama uses model:tag format (e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just "27b" — which matches nothing, causing a fallback to the 2M token probe tier. Now only recognised provider prefixes (local, openrouter, anthropic, etc.) are stripped. Ollama model:tag names pass through intact. * fix: update claude-opus-4-6 and claude-sonnet-4-6 context length from 200K to 1M Both models support 1,000,000 token context windows. The hardcoded defaults were set before Anthropic expanded the context for the 4.6 generation. Verified via models.dev and OpenRouter API data. --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Co-authored-by: Test <test@test.com>	2026-03-20 04:38:59 -07:00
Teknium	471ea81a7d	fix: preserve Ollama model:tag colons in context length detection (#2149 ) The colon-split logic in get_model_context_length() and _query_local_context_length() assumed any colon meant provider:model format (e.g. "local:my-model"). But Ollama uses model:tag format (e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just "27b" — which matches nothing, causing a fallback to the 2M token probe tier. Now only recognised provider prefixes (local, openrouter, anthropic, etc.) are stripped. Ollama model:tag names pass through intact. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-20 03:19:31 -07:00
Peppi Littera	ec5fdb8b92	feat: query local servers for actual context window size Custom endpoints (LM Studio, Ollama, vLLM, llama.cpp) silently fall back to 2M tokens when /v1/models doesn't include context_length. Adds _query_local_context_length() which queries server-specific APIs: - LM Studio: /api/v1/models (max_context_length + loaded instances) - Ollama: /api/show (model_info + num_ctx parameters) - llama.cpp: /props (n_ctx from default_generation_settings) - vLLM: /v1/models/{model} (max_model_len) Prefers loaded instance context over max (e.g., 122K loaded vs 1M max). Results are cached via save_context_length() to avoid repeated queries. Also fixes detect_local_server_type() misidentifying LM Studio as Ollama (LM Studio returns 200 for /api/tags with an error body).	2026-03-19 21:32:04 +01:00
Peppi Littera	c030ac1d85	fix: prefer loaded instance context size over max for LM Studio When LM Studio has a model loaded with a custom context size (e.g., 122K), prefer that over the model's max_context_length (e.g., 1M). This makes the TUI status bar show the actual runtime context window.	2026-03-19 21:24:53 +01:00
Peppi Littera	d223f7388d	feat: query local server for actual context window size Instead of defaulting to 2M for unknown local models, query the server API for the real context length. Supports Ollama (/api/show), vLLM (max_model_len), and LM Studio (/v1/models). Results are cached to avoid repeated queries.	2026-03-19 21:24:05 +01:00
Peppi Littera	219af75704	fix(security): eliminate SQL string formatting in execute() calls Closes #1911 - insights.py: Pre-compute SELECT queries as class constants instead of f-string interpolation at runtime. _SESSION_COLS is now evaluated once at class definition time. - hermes_state.py: Add identifier quoting and whitelist validation for ALTER TABLE column names in schema migrations. - Add 4 tests verifying no injection vectors in SQL query construction.	2026-03-19 15:16:35 +01:00
Teknium	d76fa7fc37	fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051 ) * fix: detect context length for custom model endpoints via fuzzy matching + config override Custom model endpoints (non-OpenRouter, non-known-provider) were silently falling back to 2M tokens when the model name didn't exactly match what the endpoint's /v1/models reported. This happened because: 1. Endpoint metadata lookup used exact match only — model name mismatches (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss 2. Single-model servers (common for local inference) required exact name match even though only one model was loaded 3. No user escape hatch to manually set context length Changes: - Add fuzzy matching for endpoint model metadata: single-model servers use the only available model regardless of name; multi-model servers try substring matching in both directions - Add model.context_length config override (highest priority) so users can explicitly set their model's context length in config.yaml - Log an informative message when falling back to 2M probe, telling users about the config override option - Thread config_context_length through ContextCompressor and AIAgent init Tests: 6 new tests covering fuzzy match, single-model fallback, config override (including zero/None edge cases). * fix: auto-detect local model name and context length for local servers Cherry-picked from PR #2043 by sudoingX. - Auto-detect model name from local server's /v1/models when only one model is loaded (no manual model name config needed) - Add n_ctx_train and n_ctx to context length detection keys for llama.cpp - Query llama.cpp /props endpoint for actual allocated context (not just training context from GGUF metadata) - Strip .gguf suffix from display in banner and status bar - _auto_detect_local_model() in runtime_provider.py for CLI init Co-authored-by: sudo <sudoingx@users.noreply.github.com> * fix: revert accidental summary_target_tokens change + add docs for context_length config - Revert summary_target_tokens from 2500 back to 500 (accidental change during patching) - Add 'Context Length Detection' section to Custom & Self-Hosted docs explaining model.context_length config override --------- Co-authored-by: Test <test@test.com> Co-authored-by: sudo <sudoingx@users.noreply.github.com>	2026-03-19 06:01:16 -07:00
Teknium	67d707e851	fix: respect config.yaml model.base_url for Anthropic provider (#1948 ) (#1998 ) After #1675 removed ANTHROPIC_BASE_URL env var support, the Anthropic provider base URL was hardcoded to https://api.anthropic.com. Now reads model.base_url from config.yaml as an override, falling back to the default when not set. Also applies to the auxiliary client. Cherry-picked from PR #1949 by @rivercrab26. Co-authored-by: rivercrab26 <rivercrab26@users.noreply.github.com>	2026-03-18 16:51:24 -07:00
Teknium	d132e344d7	fix(agent): prevent silent tool result loss during context compression (#1993 ) _align_boundary_backward only checked messages[idx-1] to decide if the compress-end boundary splits a tool_call/result group. When an assistant issues 3+ parallel tool calls, their results span multiple consecutive messages. If the boundary fell in the middle of that group, the parent assistant was summarized away and orphaned tool results were silently deleted by _sanitize_tool_pairs. Now walks backward through all consecutive tool results to find the parent assistant, then pulls the boundary before the entire group. 6 regression tests added in tests/test_compression_boundary.py. Co-authored-by: Guts <Gutslabs@users.noreply.github.com>	2026-03-18 15:22:51 -07:00
Teknium	e4a3ffa9c1	feat: use SOUL.md as primary agent identity instead of hardcoded default (#1922 ) SOUL.md now loads in slot #1 of the system prompt, replacing the hardcoded DEFAULT_AGENT_IDENTITY. This lets users fully customize the agent's identity and personality by editing ~/.hermes/SOUL.md without it conflicting with the built-in identity text. When SOUL.md is loaded as identity, it's excluded from the context files section to avoid appearing twice. When SOUL.md is missing, empty, unreadable, or skip_context_files is set, the hardcoded DEFAULT_AGENT_IDENTITY is used as a fallback. The default SOUL.md (seeded on first run) already contains the full Hermes personality, so existing installs are unaffected. Co-authored-by: Test <test@test.com>	2026-03-18 04:11:20 -07:00
Test	e7844e9c8d	Merge origin/main, resolve conflicts (self._base_url_lower)	2026-03-18 04:09:00 -07:00

1 2 3 4 5

233 Commits