hermes-agent

Author	SHA1	Message	Date
Teknium	5a98ce5973	fix: use clean user message for all memory provider operations (#4940) When a skill is active, user_message contains the full SKILL.md content injected by the skill system. This bloated string was being passed to memory provider sync_all(), queue_prefetch_all(), and prefetch_all(), causing providers with query size limits (e.g. Honcho's 10K char limit) to fail. Both call sites now use original_user_message (the clean user input, already defined at line 6516) instead of the skill-inflated user_message: - Pre-turn prefetch (line ~6695): prefetch_all() query - Post-turn sync (line ~8672): sync_all() + queue_prefetch_all() Fixes #4889	2026-04-03 20:43:01 -07:00
Teknium	cee761ee4a	fix: prevent duplicate messages — gateway dedup + partial stream guard (#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR #4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 18:53:52 -07:00
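A minimal Python sketch of the dedup cache described in the gateway commit above, assuming the adapter can pass the platform message ID as a string. `SeenMessageCache` and `is_duplicate()` are hypothetical names, not the adapters' actual `_seen_messages` code; only the 5-minute TTL and the 2000-entry cap come from the commit message.

```python
import time
from collections import OrderedDict
from typing import Optional


class SeenMessageCache:
    """Hypothetical dedup cache: message ID -> first-seen timestamp."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 2000) -> None:
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        # Insertion order == first-seen order, so the oldest entry is always first.
        self._seen: "OrderedDict[str, float]" = OrderedDict()

    def _prune(self, now: float) -> None:
        # Drop expired entries from the front until a fresh one is found.
        while self._seen:
            _, ts = next(iter(self._seen.items()))
            if now - ts < self.ttl:
                break
            self._seen.popitem(last=False)

    def is_duplicate(self, message_id: str, now: Optional[float] = None) -> bool:
        """Return True if message_id was seen within the TTL; otherwise record it."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        # Enforce the size cap by evicting the oldest entries.
        while len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)
        return False
```

In a handler, the check would be the first statement (for example `if cache.is_duplicate(str(message.id)): return`), so a replayed event is dropped before any other logic runs.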
kagura-agent	6bf5946bbe	fix: filter transcript-only roles from chat-completions payload (#4715) Add a provider-agnostic role allowlist guard to _sanitize_api_messages() that drops messages with roles not accepted by the chat-completions API (e.g. session_meta). This prevents CLI resume/session restore from leaking transcript-only metadata into the outgoing messages payload. Two layers of defense: 1. API-boundary guard: _sanitize_api_messages() now filters messages by role allowlist (system/user/assistant/tool/function/developer) before the existing orphaned tool-call repair logic. This protects all current and future call paths. 2. CLI restore defense-in-depth: Both session restore paths in cli.py now strip session_meta entries before loading history into conversation_history, matching the existing gateway behavior. Closes #4715	2026-04-03 14:57:33 -07:00
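The API-boundary guard in the commit above amounts to one allowlist filter. The role set is taken from the commit message; the function name `filter_api_roles` and the plain-dict message shape are assumptions for illustration, not the body of `_sanitize_api_messages()`.

```python
from typing import Any, Dict, List

# Roles accepted by chat-completions, as listed in the commit message.
ALLOWED_API_ROLES = frozenset(
    {"system", "user", "assistant", "tool", "function", "developer"}
)


def filter_api_roles(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop transcript-only entries (e.g. role='session_meta') before sending."""
    return [m for m in messages if m.get("role") in ALLOWED_API_ROLES]
```

Running the filter before the orphaned tool-call repair means the repair logic never sees metadata rows.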
Teknium	e492420df4	fix: route memory provider tools in sequential execution path (#4803) Memory provider tools (hindsight_retain, honcho_search, etc.) were advertised to the model via tool schemas but failed with 'Unknown tool' at execution time. The concurrent path (_invoke_tool) correctly checks self._memory_manager.has_tool() before falling through to the registry, but the sequential path (_execute_tool_calls_sequential) was never updated with this check. Since sequential is the default for single tool calls, memory provider tools always hit the registry dispatcher which returns 'Unknown tool' because they're not registered there. Add the memory_manager dispatch check between the delegate_task handler and the quiet_mode fallthrough in the sequential path, with proper spinner/display handling to match the existing pattern. Reported by KiBenderOP — all memory providers affected (Honcho, Hindsight, Holographic, etc.).	2026-04-03 10:31:53 -07:00
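A simplified sketch of the dispatch order this fix restores in the sequential path: memory provider tools are checked before the registry. `has_tool()` and `handle_tool_call()` are named in the commit history, but their signatures here, `StubMemoryManager`, and the dict-based registry are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Optional

ToolFn = Callable[[Dict[str, Any]], str]


class StubMemoryManager:
    """Hypothetical stand-in exposing has_tool()/handle_tool_call()."""

    def __init__(self, tools: Dict[str, ToolFn]) -> None:
        self._tools = tools

    def has_tool(self, name: str) -> bool:
        return name in self._tools

    def handle_tool_call(self, name: str, args: Dict[str, Any]) -> str:
        return self._tools[name](args)


def dispatch_tool(
    name: str,
    args: Dict[str, Any],
    memory_manager: Optional[StubMemoryManager],
    registry: Dict[str, ToolFn],
) -> str:
    # Provider tools are not registered in the registry, so check them first.
    if memory_manager is not None and memory_manager.has_tool(name):
        return memory_manager.handle_tool_call(name, args)
    handler = registry.get(name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    return handler(args)
```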
Teknium	f374ae4c61	fix: prevent compression death spiral from API disconnects (#2153) (#4750) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR #2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.	2026-04-03 02:16:46 -07:00
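The three layers above reduce to small predicates. This is a sketch under stated assumptions: the 60% / 200-message / 400-message thresholds come from the commit message, while the function names and the 4-characters-per-token heuristic standing in for `estimate_messages_tokens_rough()` are hypothetical.

```python
from typing import Any, Dict, List


def rough_token_estimate(messages: List[Dict[str, Any]]) -> int:
    # Hypothetical ~4 chars/token heuristic; the real estimator may differ.
    return sum(len(str(m.get("content") or "")) for m in messages) // 4


def effective_prompt_tokens(last_prompt_tokens: int, messages: List[Dict[str, Any]]) -> int:
    """Layer 1: a zero (stale) counter falls back to a rough estimate."""
    if last_prompt_tokens > 0:
        return last_prompt_tokens
    return rough_token_estimate(messages)


def disconnect_looks_like_overflow(
    prompt_tokens: int, message_count: int, context_length: int
) -> bool:
    """Layer 2: treat a server disconnect on a large session as a context-length error."""
    return prompt_tokens > 0.6 * context_length or message_count > 200


def exceeds_hard_message_limit(message_count: int, limit: int = 400) -> bool:
    """Layer 3: force compression past a fixed message count."""
    return message_count > limit
```

Layer 3 deliberately ignores tokens entirely, so it still fires when both token-based checks are starved of usage data.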
Teknium	8fd9fafc84	fix: handle Anthropic Sonnet long-context tier 429 by reducing to 200k (#4747) Anthropic returns HTTP 429 'Extra usage is required for long context requests' when a Claude Max subscription doesn't include the 1M context tier. This is NOT a transient rate limit — retrying won't help. Only applies to Sonnet models (Opus 1M is general access). Detects this specific error before the generic rate-limit handler and: 1. Reduces context_length from 1M to 200k (the standard tier) 2. Triggers context compression to fit 3. Retries with the reduced context The reduction is session-scoped (not persisted) so it auto-recovers if the user later enables extra usage on their subscription. Fixes: Sonnet 4.6 instant rate limits on Claude Max without extra usage	2026-04-03 02:05:02 -07:00
Teknium	470c3ea51a	fix: handle Anthropic long-context tier 429 by reducing to 200k Anthropic returns HTTP 429 'Extra usage is required for long context requests' when a Claude Max subscription doesn't include the 1M context tier. This is NOT a transient rate limit — retrying won't help. Detect this specific error before the generic rate-limit handler and: 1. Reduce context_length from 1M to 200k (the standard tier) 2. Trigger context compression to fit 3. Retry with the reduced context The reduction is session-scoped (not persisted) so it auto-recovers if the user later enables extra usage on their subscription. Fixes: Sonnet 4.6 instant rate limits on Claude Max without extra usage	2026-04-03 01:56:43 -07:00
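A sketch of the detection and the session-scoped downgrade described in the two commits above. The error text and the 1M to 200k reduction come from the messages, and the Sonnet-only check follows #4747. `SessionRuntime` and the helper names are hypothetical, and the model check assumes the model name contains "sonnet".

```python
from dataclasses import dataclass

LONG_CONTEXT_MARKER = "extra usage is required for long context"
STANDARD_CONTEXT_LENGTH = 200_000


def is_long_context_tier_error(status_code: int, message: str, model: str) -> bool:
    """True for the non-transient 429 that retrying will not fix."""
    return (
        status_code == 429
        and LONG_CONTEXT_MARKER in message.lower()
        and "sonnet" in model.lower()
    )


@dataclass
class SessionRuntime:
    # Hypothetical per-session state; nothing here is written back to config.
    context_length: int
    needs_compression: bool = False


def downgrade_to_standard_tier(runtime: SessionRuntime) -> None:
    runtime.context_length = min(runtime.context_length, STANDARD_CONTEXT_LENGTH)
    runtime.needs_compression = True  # compress to fit, then retry
```

Because the state lives only on the session object, a new session starts from the configured 1M length again, which is what lets the fix auto-recover.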
Jack	f66b3fe76b	fix(acp): include tool results in step_callback for ACP tool_call_update events The step_callback previously only forwarded tool names as strings, so build_tool_complete received result=None and ACP tool_call_update events had empty content/rawOutput. Now prev_tools carries dicts with both name and result by pairing each tool_call with its matching tool-role message via tool_call_id.	2026-04-02 20:54:27 -07:00
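The pairing step can be sketched as follows, assuming OpenAI-style chat messages (assistant `tool_calls` entries with `id` and `function.name`, tool-role replies carrying `tool_call_id`). `pair_tool_results` is a hypothetical helper, not the ACP adapter's code.

```python
from typing import Any, Dict, List


def pair_tool_results(
    assistant_msg: Dict[str, Any], later_msgs: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return [{'name': ..., 'result': ...}] per tool call, matched by tool_call_id."""
    results = {
        m.get("tool_call_id"): m.get("content")
        for m in later_msgs
        if m.get("role") == "tool"
    }
    return [
        {"name": call["function"]["name"], "result": results.get(call["id"])}
        for call in assistant_msg.get("tool_calls") or []
    ]
```

Matching by ID rather than by position keeps results correct even when tool replies arrive in a different order than the calls; a call with no reply yields `result=None`.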
Teknium	924bc67eee	feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623) * feat(memory): add pluggable memory provider interface with profile isolation Introduces a pluggable MemoryProvider ABC so external memory backends can integrate with Hermes without modifying core files. Each backend becomes a plugin implementing a standard interface, orchestrated by MemoryManager. Key architecture: - agent/memory_provider.py — ABC with core + optional lifecycle hooks - agent/memory_manager.py — single integration point in the agent loop - agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md Profile isolation fixes applied to all 6 shipped plugins: - Cognitive Memory: use get_hermes_home() instead of raw env var - Hindsight Memory: check $HERMES_HOME/hindsight/config.json first, fall back to legacy ~/.hindsight/ for backward compat - Hermes Memory Store: replace hardcoded ~/.hermes paths with get_hermes_home() for config loading and DB path defaults - Mem0 Memory: use get_hermes_home() instead of raw env var - RetainDB Memory: auto-derive profile-scoped project name from hermes_home path (hermes-<profile>), explicit env var overrides - OpenViking Memory: read-only, no local state, isolation via .env MemoryManager.initialize_all() now injects hermes_home into kwargs so every provider can resolve profile-scoped storage without importing get_hermes_home() themselves. Plugin system: adds register_memory_provider() to PluginContext and get_plugin_memory_providers() accessor. Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration). * refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider Remove cognitive-memory plugin (#727) — core mechanics are broken: decay runs 24x too fast (hourly not daily), prefetch uses row ID as timestamp, search limited by importance not similarity.
Rewrite openviking-memory plugin from a read-only search wrapper into a full bidirectional memory provider using the complete OpenViking session lifecycle API: - sync_turn: records user/assistant messages to OpenViking session (threaded, non-blocking) - on_session_end: commits session to trigger automatic memory extraction into 6 categories (profile, preferences, entities, events, cases, patterns) - prefetch: background semantic search via find() endpoint - on_memory_write: mirrors built-in memory writes to the session - is_available: checks env var only, no network calls (ABC compliance) Tools expanded from 3 to 5: - viking_search: semantic search with mode/scope/limit - viking_read: tiered content (abstract ~100tok / overview ~2k / full) - viking_browse: filesystem-style navigation (list/tree/stat) - viking_remember: explicit memory storage via session - viking_add_resource: ingest URLs/docs into knowledge base Uses direct HTTP via httpx (no openviking SDK dependency needed). Response truncation on viking_read to prevent context flooding. * fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker - Remove redundant mem0_context tool (identical to mem0_search with rerank=true, top_k=5 — wastes a tool slot and confuses the model) - Thread sync_turn so it's non-blocking — Mem0's server-side LLM extraction can take 5-10s, was stalling the agent after every turn - Add threading.Lock around _get_client() for thread-safe lazy init (prefetch and sync threads could race on first client creation) - Add circuit breaker: after 5 consecutive API failures, pause calls for 120s instead of hammering a down server every turn. Auto-resets after cooldown. Logs a warning when tripped. 
- Track success/failure in prefetch, sync_turn, and all tool calls - Wait for previous sync to finish before starting a new one (prevents unbounded thread accumulation on rapid turns) - Clean up shutdown to join both prefetch and sync threads * fix(memory): enforce single external memory provider limit MemoryManager now rejects a second non-builtin provider with a warning. Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE external plugin provider is allowed at a time. This prevents tool schema bloat (some providers add 3-5 tools each) and conflicting memory backends. The warning message directs users to configure memory.provider in config.yaml to select which provider to activate. Updated all 47 tests to use builtin + one external pattern instead of multiple externals. Added test_second_external_rejected to verify the enforcement. * feat(memory): add ByteRover memory provider plugin Implements the ByteRover integration (from PR #3499 by hieuntg81) as a MemoryProvider plugin instead of direct run_agent.py modifications. ByteRover provides persistent memory via the brv CLI — a hierarchical knowledge tree with tiered retrieval (fuzzy text then LLM-driven search). Local-first with optional cloud sync. Plugin capabilities: - prefetch: background brv query for relevant context - sync_turn: curate conversation turns (threaded, non-blocking) - on_memory_write: mirror built-in memory writes to brv - on_pre_compress: extract insights before context compression Tools (3): - brv_query: search the knowledge tree - brv_curate: store facts/decisions/patterns - brv_status: check CLI version and context tree state Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped per profile). Binary resolution cached with thread-safe double-checked locking. All write operations threaded to avoid blocking the agent (curate can take 120s with LLM processing). 
* fix(memory): thread remaining sync_turns, fix holographic, add config key Plugin fixes: - Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread) - RetainDB: thread sync_turn (was blocking on HTTP POST) - Both: shutdown now joins sync threads alongside prefetch threads Holographic retrieval fixes: - reason(): removed dead intersection_key computation (bundled but never used in scoring). Now reuses pre-computed entity_residuals directly, moved role_content encoding outside the inner loop. - contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above 500 facts, only checks the most recently updated ones to avoid O(n^2) explosion (~125K comparisons at 500 is acceptable). Config: - Added memory.provider key to DEFAULT_CONFIG ("" = builtin only). No version bump needed (deep_merge handles new keys automatically). * feat(memory): extract Honcho as a MemoryProvider plugin Creates plugins/honcho-memory/ as a thin adapter over the existing honcho_integration/ package. All 4 Honcho tools (profile, search, context, conclude) move from the normal tool registry to the MemoryProvider interface. The plugin delegates all work to HonchoSessionManager — no Honcho logic is reimplemented. It uses the existing config chain: $HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars. Lifecycle hooks: - initialize: creates HonchoSessionManager via existing client factory - prefetch: background dialectic query - sync_turn: records messages + flushes to API (threaded) - on_memory_write: mirrors user profile writes as conclusions - on_session_end: flushes all pending messages This is a prerequisite for the MemoryManager wiring in run_agent.py. Once wired, Honcho goes through the same provider interface as all other memory plugins, and the scattered Honcho code in run_agent.py can be consolidated into the single MemoryManager integration point. 
* feat(memory): wire MemoryManager into run_agent.py Adds 8 integration points for the external memory provider plugin, all purely additive (zero existing code modified): 1. Init (~L1130): Create MemoryManager, find matching plugin provider from memory.provider config, initialize with session context 2. Tool injection (~L1160): Append provider tool schemas to self.tools and self.valid_tool_names after memory_manager init 3. System prompt (~L2705): Add external provider's system_prompt_block alongside existing MEMORY.md/USER.md blocks 4. Tool routing (~L5362): Route provider tool calls through memory_manager.handle_tool_call() before the catchall handler 5. Memory write bridge (~L5353): Notify external provider via on_memory_write() when the built-in memory tool writes 6. Pre-compress (~L5233): Call on_pre_compress() before context compression discards messages 7. Prefetch (~L6421): Inject provider prefetch results into the current-turn user message (same pattern as Honcho turn context) 8. Turn sync + session end (~L8161, ~L8172): sync_all() after each completed turn, queue_prefetch_all() for next turn, on_session_end() + shutdown_all() at conversation end All hooks are wrapped in try/except — a failing provider never breaks the agent. The existing memory system, Honcho integration, and all other code paths are completely untouched. Full suite: 7222 passed, 4 pre-existing failures. * refactor(memory): remove legacy Honcho integration from core Extracts all Honcho-specific code from run_agent.py, model_tools.py, toolsets.py, and gateway/run.py. Honcho is now exclusively available as a memory provider plugin (plugins/honcho-memory/). 
Removed from run_agent.py (-457 lines): - Honcho init block (session manager creation, activation, config) - 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools, _activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch, _honcho_prefetch, _honcho_save_user_observation, _honcho_sync - _inject_honcho_turn_context module-level function - Honcho system prompt block (tool descriptions, CLI commands) - Honcho context injection in api_messages building - Honcho params from __init__ (honcho_session_key, honcho_manager, honcho_config) - HONCHO_TOOL_NAMES constant - All honcho-specific tool dispatch forwarding Removed from other files: - model_tools.py: honcho_tools import, honcho params from handle_function_call - toolsets.py: honcho toolset definition, honcho tools from core tools list - gateway/run.py: honcho params from AIAgent constructor calls Removed tests (-339 lines): - 9 Honcho-specific test methods from test_run_agent.py - TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that were accidentally removed during the honcho function extraction. The honcho_integration/ package is kept intact — the plugin delegates to it. tools/honcho_tools.py registry entries are now dead code (import commented out in model_tools.py) but the file is preserved for reference. Full suite: 7207 passed, 4 pre-existing failures. Zero regressions. 
* refactor(memory): restructure plugins, add CLI, clean gateway, migration notice Plugin restructure: - Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/ (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb) - New plugins/memory/__init__.py discovery module that scans the directory directly, loading providers by name without the general plugin system - run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers() CLI wiring: - hermes memory setup — interactive curses picker + config wizard - hermes memory status — show active provider, config, availability - hermes memory off — disable external provider (built-in only) - hermes honcho — now shows migration notice pointing to hermes memory setup Gateway cleanup: - Remove _get_or_create_gateway_honcho (already removed in prev commit) - Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods - Remove all calls to shutdown methods (4 call sites) - Remove _honcho_managers/_honcho_configs dict references Dead code removal: - Delete tools/honcho_tools.py (279 lines, import was already commented out) - Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods) - Remove if False placeholder from run_agent.py Migration: - Honcho migration notice on startup: detects existing honcho.json or ~/.honcho/config.json, prints guidance to run hermes memory setup. Only fires when memory.provider is not set and not in quiet mode. Full suite: 7203 passed, 4 pre-existing failures. Zero regressions. 
* feat(memory): standardize plugin config + add per-plugin documentation Config architecture: - Add save_config(values, hermes_home) to MemoryProvider ABC - Honcho: writes to $HERMES_HOME/honcho.json (SDK native) - Mem0: writes to $HERMES_HOME/mem0.json - Hindsight: writes to $HERMES_HOME/hindsight/config.json - Holographic: writes to config.yaml under plugins.hermes-memory-store - OpenViking/RetainDB/ByteRover: env-var only (default no-op) Setup wizard (hermes memory setup): - Now calls provider.save_config() for non-secret config - Secrets still go to .env via env vars - Only memory.provider activation key goes to config.yaml Documentation: - README.md for each of the 7 providers in plugins/memory/<name>/ - Requirements, setup (wizard + manual), config reference, tools table - Consistent format across all providers The contract for new memory plugins: - get_config_schema() declares all fields (REQUIRED) - save_config() writes native config (REQUIRED if not env-var-only) - Secrets use env_var field in schema, written to .env by wizard - README.md in the plugin directory * docs: add memory providers user guide + developer guide New pages: - user-guide/features/memory-providers.md — comprehensive guide covering all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover). Each with setup, config, tools, cost, and unique features. Includes comparison table and profile isolation notes. - developer-guide/memory-provider-plugin.md — how to build a new memory provider plugin. Covers ABC, required methods, config schema, save_config, threading contract, profile isolation, testing. 
Updated pages: - user-guide/features/memory.md — replaced Honcho section with link to new Memory Providers page - user-guide/features/honcho.md — replaced with migration redirect to the new Memory Providers page - sidebars.ts — added both new pages to navigation * fix(memory): auto-migrate Honcho users to memory provider plugin When honcho.json or ~/.honcho/config.json exists but memory.provider is not set, automatically set memory.provider: honcho in config.yaml and activate the plugin. The plugin reads the same config files, so all data and credentials are preserved. Zero user action needed. Persists the migration to config.yaml so it only fires once. Prints a one-line confirmation in non-quiet mode. * fix(memory): only auto-migrate Honcho when enabled + credentialed Check HonchoClientConfig.enabled AND (api_key OR base_url) before auto-migrating — not just file existence. Prevents false activation for users who disabled Honcho, stopped using it (config lingers), or have ~/.honcho/ from a different tool. * feat(memory): auto-install pip dependencies during hermes memory setup Reads pip_dependencies from plugin.yaml, checks which are missing, installs them via pip before config walkthrough. Also shows install guidance for external_dependencies (e.g. brv CLI for ByteRover). Updated all 7 plugin.yaml files with pip_dependencies: - honcho: honcho-ai - mem0: mem0ai - openviking: httpx - hindsight: hindsight-client - holographic: (none) - retaindb: requests - byterover: (external_dependencies for brv CLI) * fix: remove remaining Honcho crash risks from cli.py and gateway cli.py: removed Honcho session re-mapping block (would crash importing deleted tools/honcho_tools.py), Honcho flush on compress, Honcho session display on startup, Honcho shutdown on exit, honcho_session_key AIAgent param. gateway/run.py: removed honcho_session_key params from helper methods, sync_honcho param, _honcho.shutdown() block. 
tests: fixed test_cron_session_with_honcho_key_skipped (was passing removed honcho_key param to _flush_memories_for_session). * fix: include plugins/ in pyproject.toml package list Without this, plugins/memory/ wouldn't be included in non-editable installs. Hermes always runs from the repo checkout so this is belt- and-suspenders, but prevents breakage if the install method changes. * fix(memory): correct pip-to-import name mapping for dep checks The heuristic dep.replace('-', '_') fails for packages where the pip name differs from the import name: honcho-ai→honcho, mem0ai→mem0, hindsight-client→hindsight_client. Added explicit mapping table so hermes memory setup doesn't try to reinstall already-installed packages. * chore: remove dead code from old plugin memory registration path - hermes_cli/plugins.py: removed register_memory_provider(), _memory_providers list, get_plugin_memory_providers() — memory providers now use plugins/memory/ discovery, not the general plugin system - hermes_cli/main.py: stripped 74 lines of dead honcho argparse subparsers (setup, status, sessions, map, peer, mode, tokens, identity, migrate) — kept only the migration redirect - agent/memory_provider.py: updated docstring to reflect new registration path - tests: replaced TestPluginMemoryProviderRegistration with TestPluginMemoryDiscovery that tests the actual plugins/memory/ discovery system. Added 3 new tests (discover, load, nonexistent). * chore: delete dead honcho_integration/cli.py and its tests cli.py (794 lines) was the old 'hermes honcho' command handler — nobody calls it since cmd_honcho was replaced with a migration redirect. 
Deleted tests that imported from removed code: - tests/honcho_integration/test_cli.py (tested _resolve_api_key) - tests/honcho_integration/test_config_isolation.py (tested CLI config paths) - tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py) Remaining honcho_integration/ files (actively used by the plugin): - client.py (445 lines) — config loading, SDK client creation - session.py (991 lines) — session management, queries, flush * refactor: move honcho_integration/ into the honcho plugin Moves client.py (445 lines) and session.py (991 lines) from the top-level honcho_integration/ package into plugins/memory/honcho/. No Honcho code remains in the main codebase. - plugins/memory/honcho/client.py — config loading, SDK client creation - plugins/memory/honcho/session.py — session management, queries, flush - Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py, plugin __init__.py, session.py cross-import, all tests - Removed honcho_integration/ package and pyproject.toml entry - Renamed tests/honcho_integration/ → tests/honcho_plugin/ * docs: update architecture + gateway-internals for memory provider system - architecture.md: replaced honcho_integration/ with plugins/memory/ - gateway-internals.md: replaced Honcho-specific session routing and flush lifecycle docs with generic memory provider interface docs * fix: update stale mock path for resolve_active_host after honcho plugin migration * fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore Review feedback from Honcho devs (erosika): P0 — Provider lifecycle: - Remove on_session_end() + shutdown_all() from run_conversation() tail (was killing providers after every turn in multi-turn sessions) - Add shutdown_memory_provider() method on AIAgent for callers - Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry Bug fixes: - Remove sync_honcho=False kwarg from /btw callsites (TypeError crash) - Fix doctor.py references to dead 
'hermes honcho setup' command - Cache prefetch_all() before tool loop (was re-calling every iteration) ABC contract hardening (all backwards-compatible): - Add session_id kwarg to prefetch/sync_turn/queue_prefetch - Make on_pre_compress() return str (provider insights in compression) - Add **kwargs to on_turn_start() for runtime context - Add on_delegation() hook for parent-side subagent observation - Document agent_context/agent_identity/agent_workspace kwargs on initialize() (prevents cron corruption, enables profile scoping) - Fix docstring: single external provider, not multiple Honcho CLI restoration: - Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py with imports adapted to plugin path) - Restore full hermes honcho command with all subcommands (status, peer, mode, tokens, identity, enable/disable, sync, peers, --target-profile) - Restore auto-clone on profile creation + sync on hermes update - hermes honcho setup now redirects to hermes memory setup fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type - Wire on_delegation() in delegate_tool.py — parent's memory provider is notified with task+result after each subagent completes - Add skip_memory=True to cron scheduler (prevents cron system prompts from corrupting user representations — closes #4052) - Add skip_memory=True to gateway flush agent (throwaway agent shouldn't activate memory provider) - Fix ByteRover on_pre_compress() return type: None -> str * fix(honcho): port profile isolation fixes from PR #4632 Ports 5 bug fixes found during profile testing (erosika's PR #4632): 1. 3-tier config resolution — resolve_config_path() now checks $HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json (non-default profiles couldn't find shared host blocks) 2. Thread host=_host_key() through from_global_config() in cmd_setup, cmd_status, cmd_identity (--target-profile was being ignored) 3.
Use bare profile name as aiPeer (not host key with dots) — Honcho's peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid 4. Wrap add_peers() in try/except — was fatal on new AI peers, killed all message uploads for the session 5. Gate Honcho clone behind --clone/--clone-all on profile create (bare create should be blank-slate) Also: sanitize assistant_peer_id via _sanitize_id() * fix(tests): add module cleanup fixture to test_cli_provider_resolution test_cli_provider_resolution._import_cli() wipes tools.*, cli, and run_agent from sys.modules to force fresh imports, but had no cleanup. This poisoned all subsequent tests on the same xdist worker — mocks targeting tools.file_tools, tools.send_message_tool, etc. patched the NEW module object while already-imported functions still referenced the OLD one. Caused ~25 cascade failures: send_message KeyError, process_registry FileNotFoundError, file_read_guards timeouts, read_loop_detection file-not-found, mcp_oauth None port, and provider_parity/codex_execution stale tool lists. Fix: autouse fixture saves all affected modules before each test and restores them after, matching the pattern in test_managed_browserbase_and_modal.py.	2026-04-02 15:33:51 -07:00
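The Mem0 circuit breaker described in this commit (5 consecutive failures, 120 s pause, automatic reset after the cooldown) can be sketched as a small thread-safe class. This is an illustrative sketch, not the plugin's code; the class name, method names, and injectable clock are assumptions.

```python
import logging
import threading
import time
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class CircuitBreaker:
    """Pause calls after N consecutive failures; auto-reset after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_seconds: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._lock = threading.Lock()  # prefetch and sync threads may share it
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        with self._lock:
            if self._opened_at is None:
                return True
            if self._clock() - self._opened_at >= self.cooldown_seconds:
                # Cooldown elapsed: close the breaker and start counting again.
                self._opened_at = None
                self._failures = 0
                return True
            return False

    def record_success(self) -> None:
        with self._lock:
            self._failures = 0

    def record_failure(self) -> None:
        with self._lock:
            self._failures += 1
            if self._opened_at is None and self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
                logger.warning(
                    "Memory API failed %d times in a row; pausing calls for %.0fs",
                    self._failures,
                    self.cooldown_seconds,
                )
```

A caller would skip the API call when `allow()` returns false and report each outcome with `record_success()` or `record_failure()`, so a down server is probed at most once per cooldown instead of on every turn.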
kshitijk4poor	d2b08406a4	fix(agent): classify think-only empty responses before retrying	2026-04-02 12:29:18 -07:00
Teknium	d89cc7fec1	feat(prompt): add Google model operational guidance for Gemini and Gemma (#4641) Adapted from OpenCode's gemini.txt. Gemini and Gemma models now get structured operational directives alongside tool-use enforcement: absolute paths, verify-before-edit, dependency checks, conciseness, parallel tool calls, non-interactive flags, autonomous execution. Based on PR #4026, extended to cover Gemma models.	2026-04-02 11:52:34 -07:00
Teknium	3186668799	feat: per-turn primary runtime restoration and transport recovery (#4624) Makes provider fallback turn-scoped in long-lived CLI sessions. Previously, a single transient failure pinned the session to the fallback provider for every subsequent turn. - _primary_runtime dict snapshot at __init__ (model, provider, base_url, api_mode, client_kwargs, compressor state) - _restore_primary_runtime() at top of run_conversation() — restores all state, resets fallback chain index - _try_recover_primary_transport() — one extra recovery cycle (client rebuild + cooldown) for transient transport errors on direct endpoints before fallback - Skipped for aggregator providers (OpenRouter, Nous) - 25 tests Inspired by #4612 (@betamod). Closes #4612.	2026-04-02 10:52:01 -07:00
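A reduced sketch of the snapshot-and-restore pattern in the commit above. `_primary_runtime` and `_restore_primary_runtime()` are named in the commit; `MiniAgent`, `switch_to_fallback()`, and the three-field snapshot are hypothetical simplifications (the commit's snapshot also covers `api_mode`, client kwargs, and compressor state).

```python
from typing import Any, Dict


class MiniAgent:
    """Hypothetical agent illustrating a turn-scoped provider fallback."""

    _RUNTIME_KEYS = ("model", "provider", "base_url")

    def __init__(self, model: str, provider: str, base_url: str) -> None:
        self.model = model
        self.provider = provider
        self.base_url = base_url
        self._fallback_index = 0
        # Snapshot of the primary runtime, taken once at construction time.
        self._primary_runtime: Dict[str, Any] = {
            key: getattr(self, key) for key in self._RUNTIME_KEYS
        }

    def switch_to_fallback(self, model: str, provider: str, base_url: str) -> None:
        self.model, self.provider, self.base_url = model, provider, base_url
        self._fallback_index += 1

    def _restore_primary_runtime(self) -> None:
        # Called at the top of every turn so one failure cannot pin the session.
        for key, value in self._primary_runtime.items():
            setattr(self, key, value)
        self._fallback_index = 0
```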
kshitijk4poor	20441cf2c8	fix(insights): persist token usage for non-CLI sessions	2026-04-02 10:47:13 -07:00
Teknium	585855d2ca	fix: preserve Anthropic thinking block signatures across tool-use turns Anthropic extended thinking blocks include an opaque 'signature' field required for thinking chain continuity across multi-turn tool-use conversations. Previously, normalize_anthropic_response() extracted only the thinking text and set reasoning_details=None, discarding the signature. On subsequent turns the API could not verify the chain. Changes: - _to_plain_data(): new recursive SDK-to-dict converter with depth cap (20 levels) and path-based cycle detection for safety - _extract_preserved_thinking_blocks(): rehydrates preserved thinking blocks (including signature) from reasoning_details on assistant messages, placing them before tool_use blocks as Anthropic requires - normalize_anthropic_response(): stores full thinking blocks in reasoning_details via _to_plain_data() - _extract_reasoning(): adds 'thinking' key to the detail lookup chain so Anthropic-format details are found alongside OpenRouter format Salvaged from PR #4503 by @priveperfumes — focused on the thinking block continuity fix only (cache strategy and other changes excluded).	2026-04-02 10:30:32 -07:00
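A sketch of a recursive SDK-to-dict converter with a depth cap and path-based cycle detection, in the spirit of `_to_plain_data()` above. The 20-level cap comes from the commit message; returning `None` for over-deep or cyclic values, and unwrapping objects via `model_dump()` or `vars()`, are assumptions. Path-based detection tracks only the objects on the current recursion path, so an object shared by two sibling fields is converted twice instead of being mistaken for a cycle.

```python
from typing import Any, FrozenSet

MAX_DEPTH = 20


def to_plain_data(obj: Any, _depth: int = 0, _path: FrozenSet[int] = frozenset()) -> Any:
    """Convert SDK objects to plain dicts/lists, keeping fields such as 'signature'."""
    if _depth > MAX_DEPTH:
        return None  # assumption: truncate instead of raising
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if id(obj) in _path:
        return None  # refers back to an ancestor on the current path
    path = _path | {id(obj)}
    if isinstance(obj, dict):
        return {str(k): to_plain_data(v, _depth + 1, path) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain_data(v, _depth + 1, path) for v in obj]
    if hasattr(obj, "model_dump"):  # e.g. pydantic v2 models
        return to_plain_data(obj.model_dump(), _depth + 1, path)
    if hasattr(obj, "__dict__"):
        return to_plain_data(vars(obj), _depth + 1, path)
    return repr(obj)
```

The point for thinking continuity is that the conversion is lossless for ordinary fields: the opaque `signature` survives next to the thinking text, so it can be replayed on the next assistant turn.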
Ben Barclay	a2e56d044b	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-04-02 11:00:35 +11:00
Teknium	de9bba8d7c	fix: remove hardcoded OpenRouter/opus defaults No model, base_url, or provider is assumed when the user hasn't configured one. Previously the defaults dict in cli.py, AIAgent constructor args, and several fallback paths all hardcoded anthropic/claude-opus-4.6 + openrouter.ai/api/v1 — silently routing unconfigured users to OpenRouter, which 404s for anyone using a different provider. Now empty defaults force the setup wizard to run, and existing users who already completed setup are unaffected (their config.yaml has the model they chose). Files changed: - cli.py: defaults dict, _DEFAULT_CONFIG_MODEL - run_agent.py: AIAgent.__init__ defaults, main() defaults - hermes_cli/config.py: DEFAULT_CONFIG - hermes_cli/runtime_provider.py: is_fallback sentinel - acp_adapter/session.py: default_model - tests: updated to reflect empty defaults	2026-04-01 15:22:26 -07:00
Teknium	3628ccc8c4	feat: use 'developer' role for GPT-5 and Codex models (#4498 ) OpenAI's newer models (GPT-5, Codex) give stronger instruction-following weight to the 'developer' role vs 'system'. Swap the role at the API boundary in _build_api_kwargs() for the chat_completions path so internal message representation stays consistent ('system' everywhere). Applies regardless of provider — OpenRouter, Nous portal, direct, etc. The codex_responses path (direct OpenAI) uses 'instructions' instead of message roles, so it's unaffected. DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching model name substrings: ('gpt-5', 'codex').	2026-04-01 14:49:32 -07:00
Teknium	16d9f58445	fix(gateway): persist memory flush state to prevent redundant re-flushes on restart (#4481 ) * fix: force-close TCP sockets on client cleanup, detect and recover dead connections When a provider drops connections mid-stream (e.g. OpenRouter outage), httpx's graceful close leaves sockets in CLOSE-WAIT indefinitely. These zombie connections accumulate and can prevent recovery without restarting. Changes: - _force_close_tcp_sockets: walks the httpx connection pool and issues socket.shutdown(SHUT_RDWR) + close() to force TCP RST on every socket when a client is closed, preventing CLOSE-WAIT accumulation - _cleanup_dead_connections: probes the primary client's pool for dead sockets (recv MSG_PEEK), rebuilds the client if any are found - Pre-turn health check at the start of each run_conversation call that auto-recovers with a user-facing status message - Primary client rebuild after stale stream detection to purge pool - User-facing messages on streaming connection failures: "Connection to provider dropped — Reconnecting (attempt 2/3)" "Connection failed after 3 attempts — try again in a moment" Made-with: Cursor * fix: pool entry missing base_url for openrouter, clean error messages - _resolve_runtime_from_pool_entry: add OPENROUTER_BASE_URL fallback when pool entry has no runtime_base_url (pool entries from auth.json credential_pool often omit base_url) - Replace Rich console.print for auth errors with plain print() to prevent ANSI escape code mangling through prompt_toolkit's stdout patch - Force-close TCP sockets on client cleanup to prevent CLOSE-WAIT accumulation after provider outages - Pre-turn dead connection detection with auto-recovery and user message - Primary client rebuild after stale stream detection - User-facing status messages on streaming connection failures/retries Made-with: Cursor * fix(gateway): persist memory flush state to prevent redundant re-flushes on restart The _session_expiry_watcher tracked flushed sessions in an 
in-memory set (_pre_flushed_sessions) that was lost on gateway restart. Expired sessions remained in sessions.json and were re-discovered every restart, causing redundant AIAgent runs that burned API credits and blocked the event loop. Fix: Add a memory_flushed boolean field to SessionEntry, persisted in sessions.json. The watcher sets it after a successful flush. On restart, the flag survives and the watcher skips already-flushed sessions. - Add memory_flushed field to SessionEntry with to_dict/from_dict support - Old sessions.json entries without the field default to False (backward compat) - Remove the ephemeral _pre_flushed_sessions set from SessionStore - Update tests: save/load roundtrip, legacy entry compat, auto-reset behavior	2026-04-01 12:05:02 -07:00
kshitijk4poor	935137f0d9	feat: add inline diff previews for write actions Show inline diffs in the CLI transcript when write_file, patch, or skill_manage modifies files. Captures a filesystem snapshot before the tool runs, computes a unified diff after, and renders it with ANSI coloring in the activity feed. Adds tool_start_callback and tool_complete_callback hooks to AIAgent for pre/post tool execution notifications. Also fixes _extract_parallel_scope_path to normalize relative paths to absolute, preventing the parallel overlap detection from missing conflicts when the same file is referenced with different path styles. Gated by display.inline_diffs config option (default: true). Based on PR #3774 by @kshitijk4poor.	2026-04-01 02:13:57 -07:00
Bartok9	afa75a6185	fix(client): handle is_closed as method in OpenAI SDK The openai SDK's SyncAPIClient.is_closed is a method, not a property. getattr(client, 'is_closed', False) returned the bound method object, which is always truthy — causing _is_openai_client_closed() to report all clients as closed and triggering unnecessary client recreation (~100-200ms TCP+TLS overhead per API call). Fix: check if is_closed is callable and call it, otherwise treat as bool. Fixes #4377 Co-authored-by: Bartok9 <Bartok9@users.noreply.github.com>	2026-04-01 01:40:43 -07:00
Teknium	a7f7e87070	fix: preserve credential_pool through smart routing and defer eager fallback on 429 (#4361 ) Three bugs prevented credential pool rotation from working when multiple Codex OAuth tokens were configured: 1. credential_pool was dropped during smart model turn routing. resolve_turn_route() constructed runtime dicts without it, so the AIAgent was created without pool access. Fixed in smart_model_routing.py (no-route and fallback paths), cli.py, and gateway/run.py. 2. Eager fallback fired before pool rotation on 429. The rate-limit handler at line ~7180 switched to a fallback provider immediately, before _recover_with_credential_pool got a chance to rotate to the next credential. Now deferred when the pool still has credentials. 3. (Non-issue) Retry budget was reported as too small, but successful pool rotations already skip retry_count increment — no change needed. Reported by community member Schinsly who identified all three root causes and verified the fix locally with multiple Codex accounts.	2026-04-01 01:02:34 -07:00
Teknium	e3f8347be3	feat(file_tools): harden read_file with size guard, dedup, and device blocking (#4315 ) * feat(file_tools): harden read_file with size guard, dedup, and device blocking Three improvements to read_file_tool to reduce wasted context tokens and prevent process hangs: 1. Character-count guard: reads that produce more than 100K characters (≈25-35K tokens across tokenisers) are rejected with an error that tells the model to use offset+limit for a smaller range. The effective cap is min(file_size, 100K) so small files that happen to have long lines aren't over-penalised. Large truncated files also get a hint nudging toward targeted reads. 2. File-read deduplication: when the same (path, offset, limit) is read a second time and the file hasn't been modified (mtime unchanged), return a lightweight stub instead of re-sending the full content. Writes and patches naturally change mtime, so post-edit reads always return fresh content. The dedup cache is cleared on context compression — after compression the original read content is summarised away, so the model needs the full content again. 3. Device path blocking: paths like /dev/zero, /dev/random, /dev/stdin etc. are rejected before any I/O to prevent process hangs from infinite-output or blocking-input devices. Tests: 17 new tests covering all three features plus the dedup-reset- on-compression integration. All 52 file-read tests pass (35 existing + 17 new). Full tool suite (2124 tests) passes with 0 failures. * feat: make file_read_max_chars configurable, add docs Add file_read_max_chars to DEFAULT_CONFIG (default 100K). read_file_tool reads this on first call and caches for the process lifetime. Users on large-context models can raise it; users on small local models can lower it. Also adds a 'File Read Safety' section to the configuration docs explaining the char limit, dedup behavior, and example values.	2026-03-31 12:53:19 -07:00
curtitoo	e75964d46d	fix: harden codex responses transport handling	2026-03-31 12:05:06 -07:00
Teknium	161acb0086	fix: credential pool 401 recovery rotates to next credential after failed refresh (#4300 ) When an OAuth token refresh fails on a 401 error, the pool recovery would return 'not recovered' without trying the next credential in the pool. This meant users who added a second valid credential via 'hermes auth add' would never see it used when the primary credential was dead. Now: try refresh first (handles expired tokens quickly), and if that fails, rotate to the next available credential — same as 429/402 already did. Adds three tests covering 401 refresh success, refresh-fail-then-rotate, and refresh-fail-with-no-remaining-credentials.	2026-03-31 12:02:29 -07:00
arasovic	0240baa357	fix: strip orphaned think/reasoning tags from user-facing responses Some models (e.g. Kimi K2.5 on Alibaba OpenAI-compatible endpoint) emit reasoning text followed by a closing </think> without a matching opening <think> tag. The existing paired-tag regexes in _strip_think_blocks() cannot match these orphaned tags, so </think> leaks into user-facing responses on all platforms. Add a catch-all regex that strips any remaining opening or closing think/thinking/reasoning/REASONING_SCRATCHPAD tags after the existing paired-block removal pass. Closes #4285	2026-03-31 11:42:44 -07:00
Teknium	8d59881a62	feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647 ) * feat(auth): add same-provider credential pools and rotation UX Add same-provider credential pooling so Hermes can rotate across multiple credentials for a single provider, recover from exhausted credentials without jumping providers immediately, and configure that behavior directly in hermes setup. - agent/credential_pool.py: persisted per-provider credential pools - hermes auth add/list/remove/reset CLI commands - 429/402/401 recovery with pool rotation in run_agent.py - Setup wizard integration for pool strategy configuration - Auto-seeding from env vars and existing OAuth state Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Salvaged from PR #2647 * fix(tests): prevent pool auto-seeding from host env in credential pool tests Tests for non-pool Anthropic paths and auth remove were failing when host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials were present. The pool auto-seeding picked these up, causing unexpected pool entries in tests. 
- Mock _select_pool_entry in auxiliary_client OAuth flag tests - Clear Anthropic env vars and mock _seed_from_singletons in auth remove test * feat(auth): add thread safety, least_used strategy, and request counting - Add threading.Lock to CredentialPool for gateway thread safety (concurrent requests from multiple gateway sessions could race on pool state mutations without this) - Add 'least_used' rotation strategy that selects the credential with the lowest request_count, distributing load more evenly - Add request_count field to PooledCredential for usage tracking - Add mark_used() method to increment per-credential request counts - Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current() with lock acquisition - Add tests: least_used selection, mark_used counting, concurrent thread safety (4 threads × 20 selects with no corruption) * feat(auth): add interactive mode for bare 'hermes auth' command When 'hermes auth' is called without a subcommand, it now launches an interactive wizard that: 1. Shows full credential pool status across all providers 2. Offers a menu: add, remove, reset cooldowns, set strategy 3. For OAuth-capable providers (anthropic, nous, openai-codex), the add flow explicitly asks 'API key or OAuth login?' — making it clear that both auth types are supported for the same provider 4. Strategy picker shows all 4 options (fill_first, round_robin, least_used, random) with the current selection marked 5. Remove flow shows entries with indices for easy selection The subcommand paths (hermes auth add/list/remove/reset) still work exactly as before for scripted/non-interactive use. * fix(tests): update runtime_provider tests for config.yaml source of truth (#4165) Tests were using OPENAI_BASE_URL env var which is no longer consulted after #4165. Updated to use model config (provider, base_url, api_key) which is the new single source of truth for custom endpoint URLs. 
* feat(auth): support custom endpoint credential pools keyed by provider name Custom OpenAI-compatible endpoints all share provider='custom', making the provider-keyed pool useless. Now pools for custom endpoints are keyed by 'custom:<normalized_name>' where the name comes from the custom_providers config list (auto-generated from URL hostname). - Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)' - load_pool('custom:name') seeds from custom_providers api_key AND model.api_key when base_url matches - hermes auth add/list now shows custom endpoints alongside registry providers - _resolve_openrouter_runtime and _resolve_named_custom_runtime check pool before falling back to single config key - 6 new tests covering custom pool keying, seeding, and listing * docs: add Excalidraw diagram of full credential pool flow Comprehensive architecture diagram showing: - Credential sources (env vars, auth.json OAuth, config.yaml, CLI) - Pool storage and auto-seeding - Runtime resolution paths (registry, custom, OpenRouter) - Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh) - CLI management commands and strategy configuration Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g * fix(tests): update setup wizard pool tests for unified select_provider_and_model flow The setup wizard now delegates to select_provider_and_model() instead of using its own prompt_choice-based provider picker. Tests needed: - Mock select_provider_and_model as no-op (provider pre-written to config) - Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it) - Pre-write model.provider to config so the pool step is reached * docs: add comprehensive credential pool documentation - New page: website/docs/user-guide/features/credential-pools.md Full guide covering quick start, CLI commands, rotation strategies, error recovery, custom endpoint pools, auto-discovery, thread safety, architecture, and storage format. 
- Updated fallback-providers.md to reference credential pools as the first layer of resilience (same-provider rotation before cross-provider) - Added hermes auth to CLI commands reference with usage examples - Added credential_pool_strategies to configuration guide * chore: remove excalidraw diagram from repo (external link only) * refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns - _load_config_safe(): replace 4 identical try/except/import blocks - _iter_custom_providers(): shared generator for custom provider iteration - PooledCredential.extra dict: collapse 11 round-trip-only fields (token_type, scope, client_id, portal_base_url, obtained_at, expires_in, agent_key_id, agent_key_expires_in, agent_key_reused, agent_key_obtained_at, tls) into a single extra dict with __getattr__ for backward-compatible access - _available_entries(): shared exhaustion-check between select and peek - Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical) - SimpleNamespace replaces class _Args boilerplate in auth_commands - _try_resolve_from_custom_pool(): shared pool-check in runtime_provider Net -17 lines. All 383 targeted tests pass. --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-31 03:10:01 -07:00
Teknium	1bd206ea5d	feat: add /btw command for ephemeral side questions (#4161 ) Adds /btw <question> — ask a quick follow-up using the current session context without interrupting the main conversation. - Snapshots conversation history, answers with a no-tools agent - Response is not persisted to session history or DB - Runs in a background thread (CLI) / async task (gateway) - Per-session guard prevents concurrent /btw in gateway Implementation: - model_tools.py: enabled_toolsets=[] now correctly means "no tools" (was falsy, fell through to default "all tools") - run_agent.py: persist_session=False gates _persist_session() - cli.py: _handle_btw_command (background thread, Rich panel output) - gateway/run.py: _handle_btw_command + _run_btw_task (async task) - hermes_cli/commands.py: CommandDef for "btw" Inspired by PR #3504 by areu01or00, reimplemented cleanly on current main with the enabled_toolsets=[] fix and without the __btw_no_tools__ hack.	2026-03-30 21:10:05 -07:00
Teknium	54b876a5c9	fix: add actionable guidance to context-exceeded error messages (#4155 ) When context compression fails, users now see hints suggesting /new or /compress instead of a dead-end error. Covers all 4 error paths: payload-too-large, max compression attempts (2 paths), and context length exceeded. Closes #4061 Salvaged from PR #4076 by SHL0MS. Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>	2026-03-30 20:23:28 -07:00
Teknium	f776191650	fix: persist compressed context to gateway session after mid-run compression When context compression fires during run_conversation() in the gateway, the compressed messages were silently lost on the next turn. Two bugs: 1. Agent-side: _flush_messages_to_session_db() calculated flush_from = max(len(conversation_history), _last_flushed_db_idx). After compression, _last_flushed_db_idx was correctly reset to 0, but conversation_history still had its original pre-compression length (e.g. 200). Since compressed messages are shorter (~30), messages[200:] was empty — nothing written to the new session's SQLite. Fix: Set conversation_history = None after each _compress_context() call so start_idx = 0 and all compressed messages are flushed. 2. Gateway-side: history_offset was always len(agent_history) — the original pre-compression length. After compression shortened the message list, agent_messages[200:] was empty, causing the gateway to fall back to writing only a user/assistant pair, losing the compressed summary and tail context. Fix: Detect session splits (agent.session_id != original) and set history_offset = 0 so all compressed messages are written to JSONL.	2026-03-30 18:49:14 -07:00
Robin Fernandes	1126284c97	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-03-31 09:29:43 +09:00
Robin Fernandes	6e4598ce1e	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-03-31 08:48:54 +09:00
Teknium	7dac75f2ae	fix: prevent context pressure warning spam after compression (#4012 ) * feat: add /yolo slash command to toggle dangerous command approvals Adds a /yolo command that toggles HERMES_YOLO_MODE at runtime, skipping all dangerous command approval prompts for the current session. Works in both CLI and gateway (Telegram, Discord, etc.). - /yolo -> ON: all commands auto-approved, no confirmation prompts - /yolo -> OFF: normal approval flow restored The --yolo CLI flag already existed for launch-time opt-in. This adds the ability to toggle mid-session without restarting. Session-scoped — resets when the process ends. Uses the existing HERMES_YOLO_MODE env var that check_all_command_guards() already respects. * fix: prevent context pressure warning spam (agent loop + gateway rate-limit) Two complementary fixes for repeated context pressure warnings spamming gateway users (Telegram, Discord, etc.): 1. Agent-level loop fix (run_agent.py): After compression, only reset _context_pressure_warned if the post-compression estimate is actually below the 85% warning level. Previously the flag was unconditionally reset, causing the warning to re-fire every loop iteration when compression couldn't reduce below 85% of the threshold (e.g. very low threshold like 15%, or system prompt alone exceeds the warning level). 2. Gateway-level rate-limit (gateway/run.py, salvaged from PR #3786): Per-chat_id cooldown of 1 hour on compression warning messages. Both warning paths ('still large after compression' and 'compression failed') are gated. Defense-in-depth — even if the agent-level fix has edge cases, users won't see more than one warning per hour. Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com> --------- Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>	2026-03-30 13:18:21 -07:00
Teknium	97d6813f51	fix(cache): use deterministic call_id fallbacks instead of random UUIDs (#3991 ) When the API doesn't provide a call_id for tool calls, the fallback generated a random uuid4 hex. This made every API call's input unique when replayed, preventing OpenAI's prompt cache from matching the prefix across turns. Replaced all four uuid4 fallback sites with a deterministic hash of (function_name, arguments, position_index). The same tool call now always produces the same fallback call_id, preserving cache-friendly input stability. Affected code paths: - _chat_messages_to_responses_input() — Codex input reconstruction - _normalize_codex_response() — function_call and custom_tool_call - _build_assistant_message() — assistant message construction	2026-03-30 09:43:56 -07:00
Robin Fernandes	1cbb1b99cc	Gate tool-gateway behind an env var, so it's not in users' faces until we're ready. Even if users enable it, it'll be blocked server-side for now, until we unlock for non-admin users on tool-gateway.	2026-03-30 13:28:10 +09:00
Teknium	2a0e8b001f	fix(cli): handle closed stdout ValueError in safe print paths (#3843 ) When stdout is closed (piped to a dead process, broken terminal), Python raises ValueError('I/O operation on closed file'), not OSError. _safe_print and the API error printer only caught OSError, letting the ValueError propagate and crash the agent. Salvaged from PR #3760 by @apexscaleai. Fixes #3534. Co-authored-by: apexscaleai <apexscaleai@users.noreply.github.com>	2026-03-29 18:21:27 -07:00
Teknium	e314833c9d	feat(display): configurable tool preview length -- show full paths by default (#3841 ) Tool call previews (paths, commands, queries) were hardcoded to truncate at 35-40 chars across CLI spinners, completion lines, and gateway progress messages. Users could not see full file paths in tool output. New config option: display.tool_preview_length (default 0 = no limit). Set a positive number to truncate at that length. Changes: - display.py: module-level _tool_preview_max_len with getter/setter; build_tool_preview() and get_cute_tool_message() _trunc/_path respect it - cli.py: reads config at startup, spinner widget respects config - gateway/run.py: reads config per-message, progress callback respects config - run_agent.py: removed redundant 30-char quiet-mode spinner truncation - config.py: added display.tool_preview_length to DEFAULT_CONFIG Reported by kriskaminski	2026-03-29 18:02:42 -07:00
Teknium	d6b7836210	fix: update session_log_file during context compression (#3835 ) When compression creates a child session with a new session_id, session_log_file was still pointing to the old session's JSON file. This caused _save_session_log() to write new data to the wrong file. Closes #3731. Co-authored-by: kelsia14 <kelsia14@users.noreply.github.com>	2026-03-29 17:49:58 -07:00
Teknium	68d5472810	fix: omit tools param entirely when empty instead of sending None (#3820 ) Some providers (Fireworks AI) reject tools=null, and others (Anthropic) reject tools=[]. The safest approach is to not include the key at all when there are no tools — the OpenAI SDK treats a missing parameter as NOT_GIVEN and omits it from the request entirely. Inspired by PR #3736 (@kelsia14).	2026-03-29 16:12:47 -07:00
Teknium	252fbea005	feat(providers): add ordered fallback provider chain (salvage #1761 ) (#3813 ) Extends the single fallback_model mechanism into an ordered chain. When the primary model fails, Hermes tries each fallback provider in sequence until one succeeds or the chain is exhausted. Config format (new): fallback_providers: - provider: openrouter model: anthropic/claude-sonnet-4 - provider: openai model: gpt-4o Legacy single-dict fallback_model format still works unchanged. Key fix vs original PR: the call sites in the retry loop now use _fallback_index < len(_fallback_chain) instead of the old one-shot _fallback_activated guard, so the chain actually advances through all configured providers. Changes: - run_agent.py: _fallback_chain list + _fallback_index replaces one-shot _fallback_model; _try_activate_fallback() advances through chain; failed provider resolution skips to next entry; call sites updated to allow chain advancement - cli.py: reads fallback_providers with legacy fallback_model compat - gateway/run.py: same - hermes_cli/config.py: fallback_providers: [] in DEFAULT_CONFIG - tests: 12 new chain tests + 6 existing test fixtures updated Co-authored-by: uzaylisak <uzaylisak@users.noreply.github.com>	2026-03-29 16:04:53 -07:00
Teknium	ed6427e0a7	fix(agent): user-friendly 429 rate limit messages with Retry-After support (#3809 ) When hitting rate limits (429), the agent now: - Extracts the Retry-After header from the provider response and uses it as the wait time instead of blind exponential backoff (capped at 120s) - Shows rate-limit-specific messaging: 'Rate limit reached. Waiting Xs before retry (attempt N/M)...' - Shows a distinct exhaustion message: 'Rate limit persisted after N retries. Please try again later.' Non-429 errors keep the existing exponential backoff and generic messaging. Co-authored-by: ygd58 <ygd58@users.noreply.github.com>	2026-03-29 15:48:06 -07:00
Teknium	c62cadb73a	fix: make display_hermes_home imports lazy to prevent ImportError during hermes update (#3776 ) When a user runs 'hermes update', the Python process caches old modules in sys.modules. After git pull updates files on disk, lazy imports of newly-updated modules fail because they try to import display_hermes_home from the cached (old) hermes_constants which doesn't have the function. This specifically broke the gateway auto-restart in cmd_update — importing hermes_cli/gateway.py triggered the top-level 'from hermes_constants import display_hermes_home' against the cached old module. The ImportError was silently caught, so the gateway was never restarted after update. Users with a running gateway then hit the ImportError on their next Telegram/Discord message when the stale gateway process lazily loaded run_agent.py (new version) which also had the top-level import. Fixes: - hermes_cli/gateway.py: lazy import at call site (line 940) - run_agent.py: lazy import at call site (line 6927) - tools/terminal_tool.py: lazy imports at 3 call sites - tools/tts_tool.py: static schema string (no module-level call) - hermes_cli/auth.py: lazy import at call site (line 2024) - hermes_cli/main.py: reload hermes_constants after git pull in cmd_update Also fixes 4 pre-existing test failures in test_parse_env_var caused by NameError on display_hermes_home in terminal_tool.py.	2026-03-29 15:15:17 -07:00
Teknium	9f01244137	fix: replace user-facing hardcoded ~/.hermes paths with display_hermes_home() Prep for profiles: user-facing messages now use display_hermes_home() so diagnostic output shows the correct path for each profile. New helper: display_hermes_home() in hermes_constants.py 12 files swept, ~30 user-facing string replacements. Includes dynamic TTS schema description.	2026-03-28 23:47:21 -07:00
Teknium	0bd7e95dfc	fix(honcho): allow self-hosted local instances without API key (#3644 ) Self-hosted Honcho on localhost doesn't require authentication, but both the activation gates and the SDK client required an API key. Combined fix from three contributor PRs: - Relax all 8 activation gates to accept (api_key OR base_url) as valid credentials (#3482 by @cameronbergh) - Use 'local' placeholder for the SDK client when base_url points to localhost/127.0.0.1/::1 (#3570 by @ygd58) Files changed: run_agent.py (2 gates), cli.py (1 gate), gateway/run.py (1 gate), honcho_integration/cli.py (2 gates), hermes_cli/doctor.py (2 gates), honcho_integration/client.py (SDK). Co-authored-by: cameronbergh <cameronbergh@users.noreply.github.com> Co-authored-by: ygd58 <ygd58@users.noreply.github.com> Co-authored-by: devorun <devorun@users.noreply.github.com>	2026-03-28 17:49:56 -07:00
Teknium	857a5d7b47	fix: sanitize surrogate characters from clipboard paste to prevent UnicodeEncodeError (#3624 ) Pasting text from rich-text editors (Google Docs, Word, etc.) can inject lone surrogate characters (U+D800..U+DFFF) that are invalid UTF-8. The OpenAI SDK serializes messages with ensure_ascii=False, then encodes to UTF-8 for the HTTP body — surrogates crash this with: UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' Three-layer fix: 1. Primary: sanitize user_message at the top of run_conversation() 2. CLI: sanitize in chat() before appending to conversation_history 3. Safety net: catch UnicodeEncodeError in the API error handler, sanitize the entire messages list in-place, and retry once. Also exclude UnicodeEncodeError from is_local_validation_error so it doesn't get classified as non-retryable. Includes 14 new tests covering the sanitization helpers and the integration with run_conversation().	2026-03-28 16:53:14 -07:00
Teknium	924857c3e3	fix: prevent tool name/arg concatenation for Ollama-compatible endpoints (#3582 ) Ollama reuses index 0 for every tool call in a parallel batch, distinguishing them only by id. The streaming accumulator now detects a new non-empty id at an already-active index and redirects it to a fresh slot, preventing names and arguments from being concatenated into a single tool call. No-op for normal providers that use incrementing indices. Co-authored-by: dmater01 <dmater01@users.noreply.github.com>	2026-03-28 14:08:26 -07:00
Teknium	1d0a119368	fix(display): show reasoning before response when tool calls suppress content (#3566 ) * fix(provider): remove MiniMax /v1→/anthropic auto-correction to allow user override The minimax-specific auto-correction in runtime_provider.py was preventing users from overriding to the OpenAI-compatible endpoint via MINIMAX_BASE_URL. Users in certain regions get nginx 404 on api.minimax.io/anthropic and need to switch to api.minimax.chat/v1. The generic URL-suffix detection already handles /anthropic → anthropic_messages, so the minimax-specific code was redundant for the default path and harmful for the override path. Now: default /anthropic URL works via generic detection, user override to /v1 gets chat_completions mode naturally. Closes #3546 (different approach — respects user overrides instead of changing the default endpoint). * fix(display): show reasoning during streaming even when tool calls suppress content When a model generates content (containing <REASONING_SCRATCHPAD> tags) alongside tool calls in the same API response, content deltas were suppressed from streaming once any tool call chunk arrived. This prevented the CLI's tag extraction from running, so reasoning was never shown during streaming. The post-response fallback then displayed reasoning AFTER the already-visible streamed response, creating a confusing reversed order. Fix: route suppressed content to stream_delta_callback even when tool calls are present. The CLI's _stream_delta handles tag extraction — reasoning tags are routed to the reasoning display box, while non-reasoning text is handled by the existing stream display logic. This ensures reasoning appears before tool execution and the final response, matching the expected visual order.	2026-03-28 12:34:32 -07:00
Teknium	901494d728	feat: make tool-use enforcement configurable via agent.tool_use_enforcement (#3551 ) The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in #3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.	2026-03-28 12:31:22 -07:00
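The four accepted forms of `agent.tool_use_enforcement` map to a simple predicate. The sketch below is a guess at that logic, not the shipped code. The function name is hypothetical, and two behaviors are assumptions: matching is case-insensitive, and an unrecognized value falls back to "auto".

```python
from typing import Sequence, Union

EnforcementSetting = Union[bool, str, Sequence[str]]

# The "auto" default from the commit message: match gpt/codex model names.
AUTO_MODEL_SUBSTRINGS = ("gpt", "codex")


def should_inject_tool_use_guidance(setting: EnforcementSetting, model_name: str) -> bool:
    """Hypothetical check for whether TOOL_USE_ENFORCEMENT_GUIDANCE applies."""
    if setting is True:
        return True   # inject for all models
    if setting is False:
        return False  # never inject
    if isinstance(setting, (list, tuple)):
        substrings = setting  # custom model-name substrings
    else:
        # "auto" and (assumption) any unrecognized value use the default list.
        substrings = AUTO_MODEL_SUBSTRINGS
    model = model_name.lower()
    return any(s.lower() in model for s in substrings)
```

A list such as `["gpt", "codex", "deepseek", "qwen"]` replaces the default substrings rather than extending them, so it must repeat `gpt` and `codex` to keep their behavior.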
Teknium	455bf2e853	feat: activate plugin lifecycle hooks (pre/post_llm_call, session start/end) (#3542 ) The plugin system defined six lifecycle hooks but only pre_tool_call and post_tool_call were invoked. This activates the remaining four so that external plugins (e.g. memory systems) can hook into the conversation loop without touching core code. Hook semantics: - on_session_start: fires once when a new session is created - pre_llm_call: fires once per turn before the tool-calling loop; plugins can return {"context": "..."} to inject into the ephemeral system prompt (not cached, not persisted) - post_llm_call: fires once per turn after the loop completes, with user_message and assistant_response for sync/storage - on_session_end: fires at the end of every run_conversation call invoke_hook() now returns a list of non-None callback return values, enabling pre_llm_call context injection while remaining backward compatible (existing hooks that return None are unaffected). Salvaged from PR #2823. Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>	2026-03-28 11:14:54 -07:00
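The key change is that `invoke_hook()` collects non-None return values, which lets `pre_llm_call` plugins contribute context. The sketch below illustrates those semantics. `HookRegistry`, `build_ephemeral_context`, and the keyword arguments passed to callbacks are hypothetical, and a real registry would likely also isolate exceptions from individual plugins.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class HookRegistry:
    """Hypothetical minimal registry illustrating invoke_hook's return semantics."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[..., Any]]] = defaultdict(list)

    def register(self, name: str, callback: Callable[..., Any]) -> None:
        self._hooks[name].append(callback)

    def invoke_hook(self, name: str, **kwargs: Any) -> List[Any]:
        results = []
        for callback in self._hooks.get(name, []):
            result = callback(**kwargs)
            if result is not None:  # hooks returning None are simply skipped
                results.append(result)
        return results


def build_ephemeral_context(registry: HookRegistry, user_message: str) -> str:
    """Join {"context": ...} values returned by pre_llm_call hooks."""
    parts = []
    for result in registry.invoke_hook("pre_llm_call", user_message=user_message):
        if isinstance(result, dict) and result.get("context"):
            parts.append(result["context"])
    return "\n\n".join(parts)
```

Existing hooks that return None contribute nothing to the list, which is what keeps the change backward compatible.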
Teknium	80a899a8e2	fix: enable fine-grained tool streaming for Claude/OpenRouter + retry SSE errors (#3497 ) Root cause: Anthropic buffers entire tool call arguments and goes silent for minutes while thinking (verified: 167s gap with zero SSE events on direct API). OpenRouter's upstream proxy times out after ~125s of inactivity and drops the connection with 'Network connection lost'. Fix: Send the x-anthropic-beta: fine-grained-tool-streaming-2025-05-14 header for Claude models on OpenRouter. This makes Anthropic stream tool call arguments token-by-token instead of buffering them, keeping the connection alive through OpenRouter's proxy. Live-tested: the exact prompt that consistently failed at ~128s now completes successfully — 2,972 lines written, 49K tokens, 8 minutes. Additional improvements: 1. Send explicit max_tokens for Claude through OpenRouter. Without it, OpenRouter defaults to 65,536 (confirmed via echo_upstream_body) — only half of Opus 4.6's 128K limit. 2. Classify SSE 'Network connection lost' as retryable in the streaming inner retry loop. The OpenAI SDK raises APIError from SSE error events, which was bypassing our transient error retry logic. 3. Actionable diagnostic guidance when stream-drop retries exhaust.	2026-03-28 08:01:37 -07:00
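Two parts of this fix can be sketched in a few lines: choosing the beta header for Claude on OpenRouter, and treating "Network connection lost" as retryable. The helper names, the "claude" substring test on the model id, and the plain retry loop without backoff are all assumptions. The explicit `max_tokens` change is left out.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

FINE_GRAINED_TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
# Assumed matching rule: case-insensitive substring check on the error text.
RETRYABLE_STREAM_ERRORS = ("network connection lost",)


def extra_headers_for(provider: str, model: str) -> Dict[str, str]:
    """Hypothetical helper: request token-by-token tool-argument streaming."""
    if provider == "openrouter" and "claude" in model.lower():
        return {"x-anthropic-beta": FINE_GRAINED_TOOL_STREAMING_BETA}
    return {}


def is_retryable_stream_error(exc: Exception) -> bool:
    text = str(exc).lower()
    return any(marker in text for marker in RETRYABLE_STREAM_ERRORS)


def call_with_stream_retries(start_stream: Callable[[], T], max_attempts: int = 3) -> T:
    """Hypothetical inner retry loop for transient SSE drops."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return start_stream()
        except Exception as exc:
            if not is_retryable_stream_error(exc):
                raise  # non-transient errors propagate immediately
            last_error = exc
    raise RuntimeError(
        f"stream dropped {max_attempts} times in a row: {last_error}"
    ) from last_error
```

Classifying by message text is fragile. It is used here only because the commit says the SDK surfaces the SSE error as a generic `APIError`.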
Teknium	831e8ba0e5	feat: tool-use enforcement + strip budget warnings from history (#3528 ) Cherry-pick of feat/gpt-tool-steering with modifications: 1. Tool-use enforcement prompt (refactored from GPT-specific): - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex') - Injection logic now checks against the tuple instead of hardcoding 'gpt' — adding new model families is a one-line change - Addresses models describing actions instead of making tool calls 2. Budget warning history stripping: - _strip_budget_warnings_from_history() strips _budget_warning JSON keys and [BUDGET WARNING: ...] text from tool results at the start of run_conversation() - Prevents old budget warnings from poisoning subsequent turns Based on PR #3479 by teknium1.	2026-03-28 07:38:36 -07:00
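The history-stripping step can be sketched as a pure function over OpenAI-style message dicts. This is not the actual `_strip_budget_warnings_from_history()`. The name `strip_budget_warnings`, the `role == "tool"` filter, and the regex are assumptions. In particular, the regex assumes the warning text has no nested `]`.

```python
import json
import re
from typing import Any, Dict, List

# Assumed textual form: "[BUDGET WARNING: ...]" with no nested brackets.
BUDGET_WARNING_TEXT = re.compile(r"\s*\[BUDGET WARNING:[^\]]*\]")


def strip_budget_warnings(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of the history with budget warnings removed from tool results."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") != "tool" or not isinstance(content, str):
            cleaned.append(msg)
            continue
        try:
            data = json.loads(content)
        except json.JSONDecodeError:
            data = None  # plain-text tool output
        if isinstance(data, dict) and "_budget_warning" in data:
            data.pop("_budget_warning")
            content = json.dumps(data)
        content = BUDGET_WARNING_TEXT.sub("", content)
        cleaned.append({**msg, "content": content})  # never mutate the input
    return cleaned
```

Building new dicts rather than editing in place leaves the caller's stored history untouched, which matters if the same list is persisted elsewhere.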


452 Commits