hermes-agent

Author	SHA1	Message	Date
Teknium	2021442c8a	fix: cover remaining codex empty-output gaps in fallback + normalizer (#5724 ) Two gaps in the codex empty-output handling: 1. _run_codex_create_stream_fallback() skipped all non-terminal events, so when the fallback path was used and the terminal response had empty output, there was no recovery. Now collects output_item.done and text deltas during the fallback stream, backfills on empty output. 2. _normalize_codex_response() hard-crashed with RuntimeError when output was empty, even when the response had output_text set. The function already had fallback logic at line 3562 to use output_text, but the guard at line 3446 killed it first. Now checks output_text before raising and synthesizes a minimal output item.	2026-04-06 20:58:47 -07:00
Teknium	0e336b0e71	fix: backfill codex stream output from output_item.done events (#5689 ) Salvages the core fix from PR #5673 (egerev) onto current main. The chatgpt.com/backend-api/codex endpoint streams valid output items via response.output_item.done events, but the OpenAI SDK's get_final_response() returns an empty output list. This caused every Codex response to be rejected as invalid. Fix: collect output_item.done events during streaming and backfill response.output when get_final_response() returns empty. Falls back to synthesizing from text deltas when no done events were received. Also moves the synthesis logic from the validation loop (too late, from #5681) into _run_codex_stream() (before the response leaves the streaming function), and simplifies the validation to just log diagnostics since recovery now happens upstream. Co-authored-by: Egor <egerev@users.noreply.github.com>	2026-04-06 18:19:30 -07:00
Teknium	dc4c07ed9d	fix: codex OAuth credential pool disconnect + expired token import (#5681 ) Three bugs causing OpenAI Codex sessions to fail silently: 1. Credential pool vs legacy store disconnect: hermes auth and hermes model store device_code tokens in the credential pool, but get_codex_auth_status(), resolve_codex_runtime_credentials(), and _model_flow_openai_codex() only read from the legacy provider state. Fresh pool tokens were invisible to the auth status checks and model selection flow. 2. _import_codex_cli_tokens() imported expired tokens from ~/.codex/ without checking JWT expiry. Combined with _login_openai_codex() saying 'Login successful!' for expired credentials, users got stuck in a loop of dead tokens being recycled. 3. _login_openai_codex() accepted expired tokens from resolve_codex_runtime_credentials() without validating expiry before telling the user login succeeded. Fixes: - get_codex_auth_status() now checks credential pool first, falls back to legacy provider state - _model_flow_openai_codex() uses pool-aware auth status for token retrieval when fetching model lists - _import_codex_cli_tokens() validates JWT exp claim, rejects expired - _login_openai_codex() verifies resolved token isn't expiring before accepting existing credentials - _run_codex_stream() logs response.incomplete/failed terminal events with status and incomplete_details for diagnostics - Codex empty output recovery: captures streamed text during streaming and synthesizes a response when get_final_response() returns empty output (handles chatgpt.com backend-api edge cases)	2026-04-06 18:10:33 -07:00
Teknium	8cf013ecd9	fix: replace stale 'hermes login' refs with 'hermes auth' + fix credential removal re-seeding (#5670 ) Two fixes: 1. Replace all stale 'hermes login' references with 'hermes auth' across auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py, and documentation. The 'hermes login' command was deprecated; 'hermes auth' now handles OAuth credential management. 2. Fix credential removal not persisting for singleton-sourced credentials (device_code for openai-codex/nous, hermes_pkce for anthropic). auth_remove_command already cleared env vars for env-sourced credentials, but singleton credentials stored in the auth store were re-seeded by _seed_from_singletons() on the next load_pool() call. Now clears the underlying auth store entry when removing singleton-sourced credentials.	2026-04-06 17:17:57 -07:00
tymrtn	40527ff5e3	fix(auth): actionable error message when Codex refresh token is reused When the Codex CLI (or VS Code extension) consumes a refresh token before Hermes can use it, Hermes previously surfaced a generic 401 error with no actionable guidance. - In `refresh_codex_oauth_pure`: detect `refresh_token_reused` from the OAuth endpoint and raise an AuthError explaining the cause and the exact steps to recover (run `codex` to refresh, then `hermes login`). - In `run_agent.py`: when provider is `openai-codex` and HTTP 401 is received, show Codex-specific recovery steps instead of the generic "check your API key" message. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 16:50:10 -07:00
Julien Talbot	92c19924a9	feat: add xAI prompt caching via x-grok-conv-id header When using xAI's API directly (base_url contains x.ai), send the x-grok-conv-id header set to the Hermes session_id. This routes consecutive requests to the same server, maximizing automatic prompt cache hits. Ref: https://docs.x.ai/developers/advanced-api-usage/prompt-caching	2026-04-06 12:06:33 -07:00
ClintonEmok	f77be22c65	Fix #5211 : Preserve dots in OpenCode Go model names OpenCode Go model names with dots (minimax-m2.7, glm-4.5, kimi-k2.5) were being mangled to hyphens (minimax-m2-7), causing HTTP 401 errors. Two code paths were affected: 1. model_normalize.py: opencode-go was incorrectly in DOT_TO_HYPHEN_PROVIDERS 2. run_agent.py: _anthropic_preserve_dots() did not check for opencode-go Fix: - Remove opencode-go from _DOT_TO_HYPHEN_PROVIDERS (dots are correct for Go) - Add opencode-go to _anthropic_preserve_dots() provider check - Add opencode.ai/zen/go to base_url fallback check - Add regression tests in tests/test_model_normalize.py Co-authored-by: jacob3712 <jacob3712@users.noreply.github.com>	2026-04-06 11:25:06 -07:00
Teknium	9c96f669a1	feat: centralized logging, instrumentation, hermes logs CLI, gateway noise fix (#5430 ) Adds comprehensive logging infrastructure to Hermes Agent across 4 phases: Phase 1 — Centralized logging - New hermes_logging.py with idempotent setup_logging() used by CLI, gateway, and cron - agent.log (INFO+) and errors.log (WARNING+) with RotatingFileHandler + RedactingFormatter - config.yaml logging: section (level, max_size_mb, backup_count) - All entry points wired (cli.py, main.py, gateway/run.py, run_agent.py) - Fixed debug_helpers.py writing to ./logs/ instead of ~/.hermes/logs/ Phase 2 — Event instrumentation - API calls: model, provider, tokens, latency, cache hit % - Tool execution: name, duration, result size (both sequential + concurrent) - Session lifecycle: turn start (session/model/provider/platform), compression (before/after) - Credential pool: rotation events, exhaustion tracking Phase 3 — hermes logs CLI command - hermes logs / hermes logs -f / hermes logs errors / hermes logs gateway - --level, --session, --since filters - hermes logs list (file sizes + ages) Phase 4 — Gateway bug fix + noise reduction - fix: _async_flush_memories() called with wrong arg count — sessions never flushed - Batched session expiry logs: 6 lines/cycle → 2 summary lines - Added inbound message + response time logging 75 new tests, zero regressions on the full suite.	2026-04-06 00:08:20 -07:00
Teknium	dc9c3cac87	chore: remove redundant local import of normalize_usage Already imported at module level (line 94). The local import inside _usage_summary_for_api_request_hook was unnecessary.	2026-04-05 23:31:29 -07:00
kshitijk4poor	f530ef1835	feat(plugins): pre_api_request/post_api_request with narrow payloads - Rename per-LLM-call hooks from pre_llm_request/post_llm_request for clarity vs pre_llm_call - Emit summary kwargs only (counts, usage dict from normalize_usage); keep env_var_enabled for HERMES_DUMP_REQUESTS - Add is_truthy_value/env_var_enabled to utils; wire hermes_cli.plugins._env_enabled through it - Update Langfuse local setup doc; add scripts/langfuse_smoketest.py and optional ~/.hermes plugin tests Made-with: Cursor	2026-04-05 23:31:29 -07:00
kshitijk4poor	9e820dda37	Add request-scoped plugin lifecycle hooks	2026-04-05 23:31:29 -07:00
Teknium	9ca954a274	fix: mem0 API v2 compat, prefetch context fencing, secret redaction (#5423 ) Consolidated salvage from PRs #5301 (qaqcvc), #5339 (lance0), #5058 and #5098 (maymuneth). Mem0 API v2 compatibility (#5301): - All reads use filters={user_id: ...} instead of bare user_id= kwarg - All writes use filters with user_id + agent_id for attribution - Response unwrapping for v2 dict format {results: [...]} - Split _read_filters() vs _write_filters() — reads are user-scoped only for cross-session recall, writes include agent_id - Preserved 'hermes-user' default (no breaking change for existing users) - Omitted run_id scoping from #5301 — cross-session memory is Mem0's core value, session-scoping reads would defeat that purpose Memory prefetch context fencing (#5339): - Wraps prefetched memory in <memory-context> fenced blocks with system note marking content as recalled context, NOT user input - Sanitizes provider output to strip fence-escape sequences, preventing injection where memory content breaks out of the fence - API-call-time only — never persisted to session history Secret redaction (#5058, #5098): - Added prefix patterns for Groq (gsk_), Matrix (syt_), RetainDB (retaindb_), Hindsight (hsk-), Mem0 (mem0_), ByteRover (brv_)	2026-04-05 22:43:33 -07:00
Teknium	0efe7dace7	feat: add GPT/Codex execution discipline guidance for tool persistence (#5414 ) Adds OPENAI_MODEL_EXECUTION_GUIDANCE — XML-tagged behavioral guidance injected for GPT and Codex models alongside the existing tool-use enforcement. Targets four specific failure modes: - <tool_persistence>: retry on empty/partial results instead of giving up - <prerequisite_checks>: do discovery/lookup before jumping to final action - <verification>: check correctness/grounding/formatting before finalizing - <missing_context>: use lookup tools instead of hallucinating Follows the same injection pattern as GOOGLE_MODEL_OPERATIONAL_GUIDANCE for Gemini/Gemma models. Inspired by OpenClaw PR #38953 and OpenAI's GPT-5.4 prompting guide patterns.	2026-04-05 21:51:07 -07:00
Teknium	2563493466	fix: improve timeout debug logging and user-facing diagnostics (#5370 ) Agent activity tracking: - Add _last_activity_ts, _last_activity_desc, _current_tool to AIAgent - Touch activity on: API call start/complete, tool start/complete, first stream chunk, streaming request start - Public get_activity_summary() method for external consumers Gateway timeout diagnostics: - Timeout message now includes what the agent was doing when killed: actively working vs stuck on a tool vs waiting on API response - Includes iteration count, last activity description, seconds since last activity — users can distinguish legitimate long tasks from genuine hangs - 'Still working' notifications now show iteration count and current tool instead of just elapsed time - Stale lock eviction logs include agent activity state for debugging Stream stale timeout: - _emit_status when stale stream is detected (was log-only) — gateway users now see 'No response from provider for Ns' with model and context size - Improved logger.warning with model name and estimated context size Error path notifications (gateway-visible via _emit_status): - Context compression attempts now use _emit_status (was _vprint only) - Non-retryable client errors emit summary before aborting - Max retry exhaustion emits error summary (was _vprint only) - Rate limit exhaustion emits specific rate-limit message These were all CLI-visible but silent to gateway users, which is why people on Telegram/Discord saw generic 'request failed' messages without explanation.	2026-04-05 18:33:33 -07:00
donrhmexe	7409715947	fix: link subagent sessions to parent and hide from session list Subagent sessions spawned by delegate_task were created with parent_session_id=NULL and source=cli, making them indistinguishable from user sessions in hermes sessions list and /resume. Changes: - delegate_tool.py: pass parent_agent.session_id to child agent - run_agent.py: accept parent_session_id param, pass to create_session - hermes_state.py list_sessions_rich: filter parent_session_id IS NULL by default (opt-in include_children=True for callers that need them) - hermes_state.py delete_session: delete child sessions first (FK) - hermes_state.py prune_sessions: delete children before parents (FK) session_search already handles parent_session_id correctly — child sessions are filtered from recent list and resolved to parent root in full-text search results. Fixes #5122	2026-04-05 12:48:50 -07:00
Teknium	12724e6295	feat: progressive subdirectory hint discovery (#5291 ) As the agent navigates into subdirectories via tool calls (read_file, terminal, search_files, etc.), automatically discover and load project context files (AGENTS.md, CLAUDE.md, .cursorrules) from those directories. Previously, context files were only loaded from the CWD at session start. If the agent moved into backend/, frontend/, or any subdirectory with its own AGENTS.md, those instructions were never seen. Now, SubdirectoryHintTracker watches tool call arguments for file paths and shell commands, resolves directories, and loads hint files on first access. Discovered hints are appended to the tool result so the model gets relevant context at the moment it starts working in a new area — without modifying the system prompt (preserving prompt caching). Features: - Extracts paths from tool args (path, workdir) and shell commands - Loads AGENTS.md, CLAUDE.md, .cursorrules (first match per directory) - Deduplicates — each directory loaded at most once per session - Ignores paths outside the working directory - Truncates large hint files at 8K chars - Works on both sequential and concurrent tool execution paths Inspired by Block/goose SubdirectoryHintTracker.	2026-04-05 12:33:47 -07:00
Mibayy	cc2b56b26a	feat(api): structured run events via /v1/runs SSE endpoint Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events for SSE streaming of typed lifecycle events (tool.started, tool.completed, message.delta, reasoning.available, run.completed, run.failed). Changes the internal tool_progress_callback signature from positional (tool_name, preview, args) to event-type-first (event_type, tool_name, preview, args, **kwargs). Existing consumers filter on event_type and remain backward-compatible. Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep. Fixes logic inversion in cli.py _on_tool_progress where the original PR would have displayed internal tools instead of non-internal ones. Co-authored-by: Mibayy <mibayy@users.noreply.github.com>	2026-04-05 12:05:13 -07:00
Git-on-my-level	fcdd5447e2	fix: keep ACP stdout protocol-clean Route AIAgent print output to stderr via _print_fn for ACP stdio sessions. Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC on stdout isn't corrupted. Child agents inherit the redirect. Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>	2026-04-05 12:05:13 -07:00
Teknium	a0a1b86c2e	fix: accept reasoning-only responses without retries — set content to "(empty)" (#5278 ) * feat: coerce tool call arguments to match JSON Schema types LLMs frequently return numbers as strings ("42" instead of 42) and booleans as strings ("true" instead of true). This causes silent failures with MCP tools and any tool with strictly-typed parameters. Added coerce_tool_args() in model_tools.py that runs before every tool dispatch. For each argument, it checks the tool registry schema and attempts safe coercion: - "42" → 42 when schema says "type": "integer" - "3.14" → 3.14 when schema says "type": "number" - "true"/"false" → True/False when schema says "type": "boolean" - Union types tried in order - Original values preserved when coercion fails or is not applicable Inspired by Block/goose tool argument coercion system. * fix: accept reasoning-only responses without retries — set content to "(empty)" Previously, when a model returned reasoning/thinking but no visible content, we entered a 120-line retry/classify/compress/salvage cascade that wasted 3+ API calls trying to "fix" the response. The model was done thinking — retrying with the same input just burned money. Now reasoning-only responses are accepted immediately: - Reasoning stays in the `reasoning` field (semantically correct) - Content set to "(empty)" — valid non-empty string every provider accepts - No retries, no compression triggers, no salvage logic - Session history contains "(empty)" not "" — prevents #2128 session poisoning where empty assistant content caused prefill rejections Removes ~120 lines, adds ~15. Saves 2-3 API calls per reasoning-only response. Fixes #2128.	2026-04-05 11:30:52 -07:00
LucidPaths	70f798043b	fix: Ollama Cloud auth, /model switch persistence, and alias tab completion - Add OLLAMA_API_KEY to credential resolution chain for ollama.com endpoints - Update requested_provider/_explicit_api_key/_explicit_base_url after /model switch so _ensure_runtime_credentials() doesn't revert the switch - Pass base_url/api_key from fallback config to resolve_provider_client() - Add DirectAlias system: user-configurable model_aliases in config.yaml checked before catalog resolution, with reverse lookup by model ID - Add /model tab completion showing aliases with provider metadata Co-authored-by: LucidPaths <LucidPaths@users.noreply.github.com>	2026-04-05 11:06:06 -07:00
Teknium	51ed7dc2f3	feat: save oversized tool results to file instead of destructive truncation (#5210 ) Previously, tool results exceeding 100K characters were silently chopped with only a '[Truncated]' notice — the rest of the content was lost permanently. The model had no way to access the truncated portion. Now, oversized results are written to HERMES_HOME/cache/tool_responses/ and the model receives: - A 1,500-char head preview for immediate context - The file path so it can use read_file/search_files on the full output This preserves the context window protection (inline content stays small) while making the full data recoverable. Falls back to the old destructive truncation if the file write fails. Inspired by Block/goose's large response handler pattern.	2026-04-05 10:29:57 -07:00
Teknium	4976a8b066	feat: /model command — models.dev primary database + --provider flag (#5181 ) Full overhaul of the model/provider system. ## What changed - models.dev (109 providers, 4000+ models) as primary database for provider identity AND model metadata - --provider flag replaces colon syntax for explicit provider switching - Full ModelInfo/ProviderInfo dataclasses with context, cost, capabilities, modalities - HermesOverlay system merges models.dev + Hermes-specific transport/auth/aggregator flags - User-defined endpoints via config.yaml providers: section - /model (no args) lists authenticated providers with curated model catalog - Rich metadata display: context window, max output, cost/M tokens, capabilities - Config migration: custom_providers list → providers dict (v11→v12) - AIAgent.switch_model() for in-place model swap preserving conversation ## Files agent/models_dev.py, hermes_cli/providers.py, hermes_cli/model_switch.py, hermes_cli/model_normalize.py, cli.py, gateway/run.py, run_agent.py, hermes_cli/config.py, hermes_cli/commands.py	2026-04-05 01:04:44 -07:00
Teknium	441ec48802	style: use module-level re import instead of local import re as _re	2026-04-05 00:20:53 -07:00
kshitijk4poor	65952ac00c	Honor provider reset windows in pooled credential failover Persist structured exhaustion metadata from provider errors, use explicit reset timestamps when available, and expose label-based credential targeting in the auth CLI. This keeps long-lived Codex cooldowns from being misreported as one-hour waits and avoids forcing operators to manage entries by list position alone. Constraint: Existing credential pool JSON needs to remain backward compatible with stored entries that only record status code and timestamp Constraint: Runtime recovery must keep the existing retry-then-rotate semantics for 429s while enriching pool state with provider metadata Rejected: Add a separate credential scheduler subsystem \| too large for the Hermes pool architecture and unnecessary for this fix Rejected: Only change CLI formatting \| would leave runtime rotation blind to resets_at and preserve the serial-failure behavior Confidence: high Scope-risk: moderate Reversibility: clean Directive: Preserve structured rate-limit metadata when new providers expose reset hints; do not collapse back to status-code-only exhaustion tracking Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (272 passed); py_compile on changed Python files; hermes -w auth list/remove smoke test with temporary HERMES_HOME Not-tested: Full repository pytest suite, broader gateway/integration flows outside the touched auth and pool paths	2026-04-05 00:20:53 -07:00
Lume	ed4a605696	docs: update docstring to mention Fireworks strict validation Updates _sanitize_tool_calls_for_strict_api docstring to explicitly mention Fireworks alongside Mistral as strict APIs requiring sanitization. Also documents the specific fields that are stripped (call_id, response_item_id).	2026-04-05 00:13:25 -07:00
Lume	d90035835b	refactor: use _should_sanitize_tool_calls in run_conversation() Replaces hardcoded Mistral check with the new _should_sanitize_tool_calls() method. Updates comment to mention Fireworks alongside Mistral as strict APIs requiring tool_call field sanitization.	2026-04-05 00:13:25 -07:00
Lume	234c01f690	refactor: use _should_sanitize_tool_calls in _handle_max_iterations() Replaces hardcoded Mistral check with the new _should_sanitize_tool_calls() method. Ensures summary generation works correctly with Fireworks and other strict APIs that reject unknown tool_call fields.	2026-04-05 00:13:25 -07:00
Lume	7f6e509199	refactor: use _should_sanitize_tool_calls in flush_memories() Replaces hardcoded Mistral check with the new _should_sanitize_tool_calls() method. This ensures tool_calls are sanitized for all strict APIs, not just Mistral. Prevents 400 errors from Fireworks and other providers.	2026-04-05 00:13:25 -07:00
Lume	560c6ae143	feat: add _should_sanitize_tool_calls() method Adds a centralized method to determine when tool_calls need sanitization for strict APIs. Returns True for all APIs except codex_responses mode. This prevents 400 errors from providers like Fireworks that reject unknown fields (call_id, response_item_id) in tool_calls.	2026-04-05 00:13:25 -07:00
Teknium	5879b3ef82	fix: move pre_llm_call plugin context to user message, preserve prompt cache (#5146 ) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes #5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR #5138)	2026-04-04 16:55:44 -07:00
Teknium	5a98ce5973	fix: use clean user message for all memory provider operations (#4940 ) When a skill is active, user_message contains the full SKILL.md content injected by the skill system. This bloated string was being passed to memory provider sync_all(), queue_prefetch_all(), and prefetch_all(), causing providers with query size limits (e.g. Honcho's 10K char limit) to fail. Both call sites now use original_user_message (the clean user input, already defined at line 6516) instead of the skill-inflated user_message: - Pre-turn prefetch (line ~6695): prefetch_all() query - Post-turn sync (line ~8672): sync_all() + queue_prefetch_all() Fixes #4889	2026-04-03 20:43:01 -07:00
Teknium	cee761ee4a	fix: prevent duplicate messages — gateway dedup + partial stream guard (#4878 ) * fix(gateway): add message deduplication to Discord and Slack adapters (#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR #4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 18:53:52 -07:00
kagura-agent	6bf5946bbe	fix: filter transcript-only roles from chat-completions payload (#4715 ) Add a provider-agnostic role allowlist guard to _sanitize_api_messages() that drops messages with roles not accepted by the chat-completions API (e.g. session_meta). This prevents CLI resume/session restore from leaking transcript-only metadata into the outgoing messages payload. Two layers of defense: 1. API-boundary guard: _sanitize_api_messages() now filters messages by role allowlist (system/user/assistant/tool/function/developer) before the existing orphaned tool-call repair logic. This protects all current and future call paths. 2. CLI restore defense-in-depth: Both session restore paths in cli.py now strip session_meta entries before loading history into conversation_history, matching the existing gateway behavior. Closes #4715	2026-04-03 14:57:33 -07:00
Teknium	e492420df4	fix: route memory provider tools in sequential execution path (#4803 ) Memory provider tools (hindsight_retain, honcho_search, etc.) were advertised to the model via tool schemas but failed with 'Unknown tool' at execution time. The concurrent path (_invoke_tool) correctly checks self._memory_manager.has_tool() before falling through to the registry, but the sequential path (_execute_tool_calls_sequential) was never updated with this check. Since sequential is the default for single tool calls, memory provider tools always hit the registry dispatcher which returns 'Unknown tool' because they're not registered there. Add the memory_manager dispatch check between the delegate_task handler and the quiet_mode fallthrough in the sequential path, with proper spinner/display handling to match the existing pattern. Reported by KiBenderOP — all memory providers affected (Honcho, Hindsight, Holographic, etc.).	2026-04-03 10:31:53 -07:00
Teknium	f374ae4c61	fix: prevent compression death spiral from API disconnects (#2153 ) (#4750 ) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR #2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.	2026-04-03 02:16:46 -07:00
Teknium	8fd9fafc84	fix: handle Anthropic Sonnet long-context tier 429 by reducing to 200k (#4747 ) Anthropic returns HTTP 429 'Extra usage is required for long context requests' when a Claude Max subscription doesn't include the 1M context tier. This is NOT a transient rate limit — retrying won't help. Only applies to Sonnet models (Opus 1M is general access). Detects this specific error before the generic rate-limit handler and: 1. Reduces context_length from 1M to 200k (the standard tier) 2. Triggers context compression to fit 3. Retries with the reduced context The reduction is session-scoped (not persisted) so it auto-recovers if the user later enables extra usage on their subscription. Fixes: Sonnet 4.6 instant rate limits on Claude Max without extra usage	2026-04-03 02:05:02 -07:00
Teknium	470c3ea51a	fix: handle Anthropic long-context tier 429 by reducing to 200k Anthropic returns HTTP 429 'Extra usage is required for long context requests' when a Claude Max subscription doesn't include the 1M context tier. This is NOT a transient rate limit — retrying won't help. Detect this specific error before the generic rate-limit handler and: 1. Reduce context_length from 1M to 200k (the standard tier) 2. Trigger context compression to fit 3. Retry with the reduced context The reduction is session-scoped (not persisted) so it auto-recovers if the user later enables extra usage on their subscription. Fixes: Sonnet 4.6 instant rate limits on Claude Max without extra usage	2026-04-03 01:56:43 -07:00
Jack	f66b3fe76b	fix(acp): include tool results in step_callback for ACP tool_call_update events The step_callback previously only forwarded tool names as strings, so build_tool_complete received result=None and ACP tool_call_update events had empty content/rawOutput. Now prev_tools carries dicts with both name and result by pairing each tool_call with its matching tool-role message via tool_call_id.	2026-04-02 20:54:27 -07:00
Teknium	924bc67eee	feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623 ) * feat(memory): add pluggable memory provider interface with profile isolation Introduces a pluggable MemoryProvider ABC so external memory backends can integrate with Hermes without modifying core files. Each backend becomes a plugin implementing a standard interface, orchestrated by MemoryManager. Key architecture: - agent/memory_provider.py — ABC with core + optional lifecycle hooks - agent/memory_manager.py — single integration point in the agent loop - agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md Profile isolation fixes applied to all 6 shipped plugins: - Cognitive Memory: use get_hermes_home() instead of raw env var - Hindsight Memory: check $HERMES_HOME/hindsight/config.json first, fall back to legacy ~/.hindsight/ for backward compat - Hermes Memory Store: replace hardcoded ~/.hermes paths with get_hermes_home() for config loading and DB path defaults - Mem0 Memory: use get_hermes_home() instead of raw env var - RetainDB Memory: auto-derive profile-scoped project name from hermes_home path (hermes-<profile>), explicit env var overrides - OpenViking Memory: read-only, no local state, isolation via .env MemoryManager.initialize_all() now injects hermes_home into kwargs so every provider can resolve profile-scoped storage without importing get_hermes_home() themselves. Plugin system: adds register_memory_provider() to PluginContext and get_plugin_memory_providers() accessor. Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration). * refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider Remove cognitive-memory plugin (#727) — core mechanics are broken: decay runs 24x too fast (hourly not daily), prefetch uses row ID as timestamp, search limited by importance not similarity. Rewrite openviking-memory plugin from a read-only search wrapper into a full bidirectional memory provider using the complete OpenViking session lifecycle API: - sync_turn: records user/assistant messages to OpenViking session (threaded, non-blocking) - on_session_end: commits session to trigger automatic memory extraction into 6 categories (profile, preferences, entities, events, cases, patterns) - prefetch: background semantic search via find() endpoint - on_memory_write: mirrors built-in memory writes to the session - is_available: checks env var only, no network calls (ABC compliance) Tools expanded from 3 to 5: - viking_search: semantic search with mode/scope/limit - viking_read: tiered content (abstract ~100tok / overview ~2k / full) - viking_browse: filesystem-style navigation (list/tree/stat) - viking_remember: explicit memory storage via session - viking_add_resource: ingest URLs/docs into knowledge base Uses direct HTTP via httpx (no openviking SDK dependency needed). Response truncation on viking_read to prevent context flooding. * fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker - Remove redundant mem0_context tool (identical to mem0_search with rerank=true, top_k=5 — wastes a tool slot and confuses the model) - Thread sync_turn so it's non-blocking — Mem0's server-side LLM extraction can take 5-10s, was stalling the agent after every turn - Add threading.Lock around _get_client() for thread-safe lazy init (prefetch and sync threads could race on first client creation) - Add circuit breaker: after 5 consecutive API failures, pause calls for 120s instead of hammering a down server every turn. Auto-resets after cooldown. Logs a warning when tripped. - Track success/failure in prefetch, sync_turn, and all tool calls - Wait for previous sync to finish before starting a new one (prevents unbounded thread accumulation on rapid turns) - Clean up shutdown to join both prefetch and sync threads * fix(memory): enforce single external memory provider limit MemoryManager now rejects a second non-builtin provider with a warning. Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE external plugin provider is allowed at a time. This prevents tool schema bloat (some providers add 3-5 tools each) and conflicting memory backends. The warning message directs users to configure memory.provider in config.yaml to select which provider to activate. Updated all 47 tests to use builtin + one external pattern instead of multiple externals. Added test_second_external_rejected to verify the enforcement. * feat(memory): add ByteRover memory provider plugin Implements the ByteRover integration (from PR #3499 by hieuntg81) as a MemoryProvider plugin instead of direct run_agent.py modifications. ByteRover provides persistent memory via the brv CLI — a hierarchical knowledge tree with tiered retrieval (fuzzy text then LLM-driven search). Local-first with optional cloud sync. Plugin capabilities: - prefetch: background brv query for relevant context - sync_turn: curate conversation turns (threaded, non-blocking) - on_memory_write: mirror built-in memory writes to brv - on_pre_compress: extract insights before context compression Tools (3): - brv_query: search the knowledge tree - brv_curate: store facts/decisions/patterns - brv_status: check CLI version and context tree state Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped per profile). Binary resolution cached with thread-safe double-checked locking. All write operations threaded to avoid blocking the agent (curate can take 120s with LLM processing). * fix(memory): thread remaining sync_turns, fix holographic, add config key Plugin fixes: - Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread) - RetainDB: thread sync_turn (was blocking on HTTP POST) - Both: shutdown now joins sync threads alongside prefetch threads Holographic retrieval fixes: - reason(): removed dead intersection_key computation (bundled but never used in scoring). Now reuses pre-computed entity_residuals directly, moved role_content encoding outside the inner loop. - contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above 500 facts, only checks the most recently updated ones to avoid O(n^2) explosion (~125K comparisons at 500 is acceptable). Config: - Added memory.provider key to DEFAULT_CONFIG ("" = builtin only). No version bump needed (deep_merge handles new keys automatically). * feat(memory): extract Honcho as a MemoryProvider plugin Creates plugins/honcho-memory/ as a thin adapter over the existing honcho_integration/ package. All 4 Honcho tools (profile, search, context, conclude) move from the normal tool registry to the MemoryProvider interface. The plugin delegates all work to HonchoSessionManager — no Honcho logic is reimplemented. It uses the existing config chain: $HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars. Lifecycle hooks: - initialize: creates HonchoSessionManager via existing client factory - prefetch: background dialectic query - sync_turn: records messages + flushes to API (threaded) - on_memory_write: mirrors user profile writes as conclusions - on_session_end: flushes all pending messages This is a prerequisite for the MemoryManager wiring in run_agent.py. Once wired, Honcho goes through the same provider interface as all other memory plugins, and the scattered Honcho code in run_agent.py can be consolidated into the single MemoryManager integration point. * feat(memory): wire MemoryManager into run_agent.py Adds 8 integration points for the external memory provider plugin, all purely additive (zero existing code modified): 1. Init (~L1130): Create MemoryManager, find matching plugin provider from memory.provider config, initialize with session context 2. Tool injection (~L1160): Append provider tool schemas to self.tools and self.valid_tool_names after memory_manager init 3. System prompt (~L2705): Add external provider's system_prompt_block alongside existing MEMORY.md/USER.md blocks 4. Tool routing (~L5362): Route provider tool calls through memory_manager.handle_tool_call() before the catchall handler 5. Memory write bridge (~L5353): Notify external provider via on_memory_write() when the built-in memory tool writes 6. Pre-compress (~L5233): Call on_pre_compress() before context compression discards messages 7. Prefetch (~L6421): Inject provider prefetch results into the current-turn user message (same pattern as Honcho turn context) 8. Turn sync + session end (~L8161, ~L8172): sync_all() after each completed turn, queue_prefetch_all() for next turn, on_session_end() + shutdown_all() at conversation end All hooks are wrapped in try/except — a failing provider never breaks the agent. The existing memory system, Honcho integration, and all other code paths are completely untouched. Full suite: 7222 passed, 4 pre-existing failures. * refactor(memory): remove legacy Honcho integration from core Extracts all Honcho-specific code from run_agent.py, model_tools.py, toolsets.py, and gateway/run.py. Honcho is now exclusively available as a memory provider plugin (plugins/honcho-memory/). Removed from run_agent.py (-457 lines): - Honcho init block (session manager creation, activation, config) - 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools, _activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch, _honcho_prefetch, _honcho_save_user_observation, _honcho_sync - _inject_honcho_turn_context module-level function - Honcho system prompt block (tool descriptions, CLI commands) - Honcho context injection in api_messages building - Honcho params from __init__ (honcho_session_key, honcho_manager, honcho_config) - HONCHO_TOOL_NAMES constant - All honcho-specific tool dispatch forwarding Removed from other files: - model_tools.py: honcho_tools import, honcho params from handle_function_call - toolsets.py: honcho toolset definition, honcho tools from core tools list - gateway/run.py: honcho params from AIAgent constructor calls Removed tests (-339 lines): - 9 Honcho-specific test methods from test_run_agent.py - TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that were accidentally removed during the honcho function extraction. The honcho_integration/ package is kept intact — the plugin delegates to it. tools/honcho_tools.py registry entries are now dead code (import commented out in model_tools.py) but the file is preserved for reference. Full suite: 7207 passed, 4 pre-existing failures. Zero regressions. * refactor(memory): restructure plugins, add CLI, clean gateway, migration notice Plugin restructure: - Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/ (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb) - New plugins/memory/__init__.py discovery module that scans the directory directly, loading providers by name without the general plugin system - run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers() CLI wiring: - hermes memory setup — interactive curses picker + config wizard - hermes memory status — show active provider, config, availability - hermes memory off — disable external provider (built-in only) - hermes honcho — now shows migration notice pointing to hermes memory setup Gateway cleanup: - Remove _get_or_create_gateway_honcho (already removed in prev commit) - Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods - Remove all calls to shutdown methods (4 call sites) - Remove _honcho_managers/_honcho_configs dict references Dead code removal: - Delete tools/honcho_tools.py (279 lines, import was already commented out) - Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods) - Remove if False placeholder from run_agent.py Migration: - Honcho migration notice on startup: detects existing honcho.json or ~/.honcho/config.json, prints guidance to run hermes memory setup. Only fires when memory.provider is not set and not in quiet mode. Full suite: 7203 passed, 4 pre-existing failures. Zero regressions. * feat(memory): standardize plugin config + add per-plugin documentation Config architecture: - Add save_config(values, hermes_home) to MemoryProvider ABC - Honcho: writes to $HERMES_HOME/honcho.json (SDK native) - Mem0: writes to $HERMES_HOME/mem0.json - Hindsight: writes to $HERMES_HOME/hindsight/config.json - Holographic: writes to config.yaml under plugins.hermes-memory-store - OpenViking/RetainDB/ByteRover: env-var only (default no-op) Setup wizard (hermes memory setup): - Now calls provider.save_config() for non-secret config - Secrets still go to .env via env vars - Only memory.provider activation key goes to config.yaml Documentation: - README.md for each of the 7 providers in plugins/memory/<name>/ - Requirements, setup (wizard + manual), config reference, tools table - Consistent format across all providers The contract for new memory plugins: - get_config_schema() declares all fields (REQUIRED) - save_config() writes native config (REQUIRED if not env-var-only) - Secrets use env_var field in schema, written to .env by wizard - README.md in the plugin directory * docs: add memory providers user guide + developer guide New pages: - user-guide/features/memory-providers.md — comprehensive guide covering all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover). Each with setup, config, tools, cost, and unique features. Includes comparison table and profile isolation notes. - developer-guide/memory-provider-plugin.md — how to build a new memory provider plugin. Covers ABC, required methods, config schema, save_config, threading contract, profile isolation, testing. Updated pages: - user-guide/features/memory.md — replaced Honcho section with link to new Memory Providers page - user-guide/features/honcho.md — replaced with migration redirect to the new Memory Providers page - sidebars.ts — added both new pages to navigation * fix(memory): auto-migrate Honcho users to memory provider plugin When honcho.json or ~/.honcho/config.json exists but memory.provider is not set, automatically set memory.provider: honcho in config.yaml and activate the plugin. The plugin reads the same config files, so all data and credentials are preserved. Zero user action needed. Persists the migration to config.yaml so it only fires once. Prints a one-line confirmation in non-quiet mode. * fix(memory): only auto-migrate Honcho when enabled + credentialed Check HonchoClientConfig.enabled AND (api_key OR base_url) before auto-migrating — not just file existence. Prevents false activation for users who disabled Honcho, stopped using it (config lingers), or have ~/.honcho/ from a different tool. * feat(memory): auto-install pip dependencies during hermes memory setup Reads pip_dependencies from plugin.yaml, checks which are missing, installs them via pip before config walkthrough. Also shows install guidance for external_dependencies (e.g. brv CLI for ByteRover). Updated all 7 plugin.yaml files with pip_dependencies: - honcho: honcho-ai - mem0: mem0ai - openviking: httpx - hindsight: hindsight-client - holographic: (none) - retaindb: requests - byterover: (external_dependencies for brv CLI) * fix: remove remaining Honcho crash risks from cli.py and gateway cli.py: removed Honcho session re-mapping block (would crash importing deleted tools/honcho_tools.py), Honcho flush on compress, Honcho session display on startup, Honcho shutdown on exit, honcho_session_key AIAgent param. gateway/run.py: removed honcho_session_key params from helper methods, sync_honcho param, _honcho.shutdown() block. tests: fixed test_cron_session_with_honcho_key_skipped (was passing removed honcho_key param to _flush_memories_for_session). * fix: include plugins/ in pyproject.toml package list Without this, plugins/memory/ wouldn't be included in non-editable installs. Hermes always runs from the repo checkout so this is belt- and-suspenders, but prevents breakage if the install method changes. * fix(memory): correct pip-to-import name mapping for dep checks The heuristic dep.replace('-', '_') fails for packages where the pip name differs from the import name: honcho-ai→honcho, mem0ai→mem0, hindsight-client→hindsight_client. Added explicit mapping table so hermes memory setup doesn't try to reinstall already-installed packages. * chore: remove dead code from old plugin memory registration path - hermes_cli/plugins.py: removed register_memory_provider(), _memory_providers list, get_plugin_memory_providers() — memory providers now use plugins/memory/ discovery, not the general plugin system - hermes_cli/main.py: stripped 74 lines of dead honcho argparse subparsers (setup, status, sessions, map, peer, mode, tokens, identity, migrate) — kept only the migration redirect - agent/memory_provider.py: updated docstring to reflect new registration path - tests: replaced TestPluginMemoryProviderRegistration with TestPluginMemoryDiscovery that tests the actual plugins/memory/ discovery system. Added 3 new tests (discover, load, nonexistent). * chore: delete dead honcho_integration/cli.py and its tests cli.py (794 lines) was the old 'hermes honcho' command handler — nobody calls it since cmd_honcho was replaced with a migration redirect. Deleted tests that imported from removed code: - tests/honcho_integration/test_cli.py (tested _resolve_api_key) - tests/honcho_integration/test_config_isolation.py (tested CLI config paths) - tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py) Remaining honcho_integration/ files (actively used by the plugin): - client.py (445 lines) — config loading, SDK client creation - session.py (991 lines) — session management, queries, flush * refactor: move honcho_integration/ into the honcho plugin Moves client.py (445 lines) and session.py (991 lines) from the top-level honcho_integration/ package into plugins/memory/honcho/. No Honcho code remains in the main codebase. - plugins/memory/honcho/client.py — config loading, SDK client creation - plugins/memory/honcho/session.py — session management, queries, flush - Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py, plugin __init__.py, session.py cross-import, all tests - Removed honcho_integration/ package and pyproject.toml entry - Renamed tests/honcho_integration/ → tests/honcho_plugin/ * docs: update architecture + gateway-internals for memory provider system - architecture.md: replaced honcho_integration/ with plugins/memory/ - gateway-internals.md: replaced Honcho-specific session routing and flush lifecycle docs with generic memory provider interface docs * fix: update stale mock path for resolve_active_host after honcho plugin migration * fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore Review feedback from Honcho devs (erosika): P0 — Provider lifecycle: - Remove on_session_end() + shutdown_all() from run_conversation() tail (was killing providers after every turn in multi-turn sessions) - Add shutdown_memory_provider() method on AIAgent for callers - Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry Bug fixes: - Remove sync_honcho=False kwarg from /btw callsites (TypeError crash) - Fix doctor.py references to dead 'hermes honcho setup' command - Cache prefetch_all() before tool loop (was re-calling every iteration) ABC contract hardening (all backwards-compatible): - Add session_id kwarg to prefetch/sync_turn/queue_prefetch - Make on_pre_compress() return str (provider insights in compression) - Add *kwargs to on_turn_start() for runtime context - Add on_delegation() hook for parent-side subagent observation - Document agent_context/agent_identity/agent_workspace kwargs on initialize() (prevents cron corruption, enables profile scoping) - Fix docstring: single external provider, not multiple Honcho CLI restoration: - Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py with imports adapted to plugin path) - Restore full hermes honcho command with all subcommands (status, peer, mode, tokens, identity, enable/disable, sync, peers, --target-profile) - Restore auto-clone on profile creation + sync on hermes update - hermes honcho setup now redirects to hermes memory setup fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type - Wire on_delegation() in delegate_tool.py — parent's memory provider is notified with task+result after each subagent completes - Add skip_memory=True to cron scheduler (prevents cron system prompts from corrupting user representations — closes #4052) - Add skip_memory=True to gateway flush agent (throwaway agent shouldn't activate memory provider) - Fix ByteRover on_pre_compress() return type: None -> str * fix(honcho): port profile isolation fixes from PR #4632 Ports 5 bug fixes found during profile testing (erosika's PR #4632): 1. 3-tier config resolution — resolve_config_path() now checks $HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json (non-default profiles couldn't find shared host blocks) 2. Thread host=_host_key() through from_global_config() in cmd_setup, cmd_status, cmd_identity (--target-profile was being ignored) 3. Use bare profile name as aiPeer (not host key with dots) — Honcho's peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid 4. Wrap add_peers() in try/except — was fatal on new AI peers, killed all message uploads for the session 5. Gate Honcho clone behind --clone/--clone-all on profile create (bare create should be blank-slate) Also: sanitize assistant_peer_id via _sanitize_id() * fix(tests): add module cleanup fixture to test_cli_provider_resolution test_cli_provider_resolution._import_cli() wipes tools.*, cli, and run_agent from sys.modules to force fresh imports, but had no cleanup. This poisoned all subsequent tests on the same xdist worker — mocks targeting tools.file_tools, tools.send_message_tool, etc. patched the NEW module object while already-imported functions still referenced the OLD one. Caused ~25 cascade failures: send_message KeyError, process_registry FileNotFoundError, file_read_guards timeouts, read_loop_detection file-not-found, mcp_oauth None port, and provider_parity/codex_execution stale tool lists. Fix: autouse fixture saves all affected modules before each test and restores them after, matching the pattern in test_managed_browserbase_and_modal.py.	2026-04-02 15:33:51 -07:00
kshitijk4poor	d2b08406a4	fix(agent): classify think-only empty responses before retrying	2026-04-02 12:29:18 -07:00
Teknium	d89cc7fec1	feat(prompt): add Google model operational guidance for Gemini and Gemma (#4641 ) Adapted from OpenCode's gemini.txt. Gemini and Gemma models now get structured operational directives alongside tool-use enforcement: absolute paths, verify-before-edit, dependency checks, conciseness, parallel tool calls, non-interactive flags, autonomous execution. Based on PR #4026, extended to cover Gemma models.	2026-04-02 11:52:34 -07:00
Teknium	3186668799	feat: per-turn primary runtime restoration and transport recovery (#4624 ) Makes provider fallback turn-scoped in long-lived CLI sessions. Previously, a single transient failure pinned the session to the fallback provider for every subsequent turn. - _primary_runtime dict snapshot at __init__ (model, provider, base_url, api_mode, client_kwargs, compressor state) - _restore_primary_runtime() at top of run_conversation() — restores all state, resets fallback chain index - _try_recover_primary_transport() — one extra recovery cycle (client rebuild + cooldown) for transient transport errors on direct endpoints before fallback - Skipped for aggregator providers (OpenRouter, Nous) - 25 tests Inspired by #4612 (@betamod). Closes #4612.	2026-04-02 10:52:01 -07:00
kshitijk4poor	20441cf2c8	fix(insights): persist token usage for non-CLI sessions	2026-04-02 10:47:13 -07:00
Teknium	585855d2ca	fix: preserve Anthropic thinking block signatures across tool-use turns Anthropic extended thinking blocks include an opaque 'signature' field required for thinking chain continuity across multi-turn tool-use conversations. Previously, normalize_anthropic_response() extracted only the thinking text and set reasoning_details=None, discarding the signature. On subsequent turns the API could not verify the chain. Changes: - _to_plain_data(): new recursive SDK-to-dict converter with depth cap (20 levels) and path-based cycle detection for safety - _extract_preserved_thinking_blocks(): rehydrates preserved thinking blocks (including signature) from reasoning_details on assistant messages, placing them before tool_use blocks as Anthropic requires - normalize_anthropic_response(): stores full thinking blocks in reasoning_details via _to_plain_data() - _extract_reasoning(): adds 'thinking' key to the detail lookup chain so Anthropic-format details are found alongside OpenRouter format Salvaged from PR #4503 by @priveperfumes — focused on the thinking block continuity fix only (cache strategy and other changes excluded).	2026-04-02 10:30:32 -07:00
Ben Barclay	a2e56d044b	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-04-02 11:00:35 +11:00
Teknium	de9bba8d7c	fix: remove hardcoded OpenRouter/opus defaults No model, base_url, or provider is assumed when the user hasn't configured one. Previously the defaults dict in cli.py, AIAgent constructor args, and several fallback paths all hardcoded anthropic/claude-opus-4.6 + openrouter.ai/api/v1 — silently routing unconfigured users to OpenRouter, which 404s for anyone using a different provider. Now empty defaults force the setup wizard to run, and existing users who already completed setup are unaffected (their config.yaml has the model they chose). Files changed: - cli.py: defaults dict, _DEFAULT_CONFIG_MODEL - run_agent.py: AIAgent.__init__ defaults, main() defaults - hermes_cli/config.py: DEFAULT_CONFIG - hermes_cli/runtime_provider.py: is_fallback sentinel - acp_adapter/session.py: default_model - tests: updated to reflect empty defaults	2026-04-01 15:22:26 -07:00
Teknium	3628ccc8c4	feat: use 'developer' role for GPT-5 and Codex models (#4498 ) OpenAI's newer models (GPT-5, Codex) give stronger instruction-following weight to the 'developer' role vs 'system'. Swap the role at the API boundary in _build_api_kwargs() for the chat_completions path so internal message representation stays consistent ('system' everywhere). Applies regardless of provider — OpenRouter, Nous portal, direct, etc. The codex_responses path (direct OpenAI) uses 'instructions' instead of message roles, so it's unaffected. DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching model name substrings: ('gpt-5', 'codex').	2026-04-01 14:49:32 -07:00
Teknium	16d9f58445	fix(gateway): persist memory flush state to prevent redundant re-flushes on restart (#4481 ) * fix: force-close TCP sockets on client cleanup, detect and recover dead connections When a provider drops connections mid-stream (e.g. OpenRouter outage), httpx's graceful close leaves sockets in CLOSE-WAIT indefinitely. These zombie connections accumulate and can prevent recovery without restarting. Changes: - _force_close_tcp_sockets: walks the httpx connection pool and issues socket.shutdown(SHUT_RDWR) + close() to force TCP RST on every socket when a client is closed, preventing CLOSE-WAIT accumulation - _cleanup_dead_connections: probes the primary client's pool for dead sockets (recv MSG_PEEK), rebuilds the client if any are found - Pre-turn health check at the start of each run_conversation call that auto-recovers with a user-facing status message - Primary client rebuild after stale stream detection to purge pool - User-facing messages on streaming connection failures: "Connection to provider dropped — Reconnecting (attempt 2/3)" "Connection failed after 3 attempts — try again in a moment" Made-with: Cursor * fix: pool entry missing base_url for openrouter, clean error messages - _resolve_runtime_from_pool_entry: add OPENROUTER_BASE_URL fallback when pool entry has no runtime_base_url (pool entries from auth.json credential_pool often omit base_url) - Replace Rich console.print for auth errors with plain print() to prevent ANSI escape code mangling through prompt_toolkit's stdout patch - Force-close TCP sockets on client cleanup to prevent CLOSE-WAIT accumulation after provider outages - Pre-turn dead connection detection with auto-recovery and user message - Primary client rebuild after stale stream detection - User-facing status messages on streaming connection failures/retries Made-with: Cursor * fix(gateway): persist memory flush state to prevent redundant re-flushes on restart The _session_expiry_watcher tracked flushed sessions in an in-memory set (_pre_flushed_sessions) that was lost on gateway restart. Expired sessions remained in sessions.json and were re-discovered every restart, causing redundant AIAgent runs that burned API credits and blocked the event loop. Fix: Add a memory_flushed boolean field to SessionEntry, persisted in sessions.json. The watcher sets it after a successful flush. On restart, the flag survives and the watcher skips already-flushed sessions. - Add memory_flushed field to SessionEntry with to_dict/from_dict support - Old sessions.json entries without the field default to False (backward compat) - Remove the ephemeral _pre_flushed_sessions set from SessionStore - Update tests: save/load roundtrip, legacy entry compat, auto-reset behavior	2026-04-01 12:05:02 -07:00
kshitijk4poor	935137f0d9	feat: add inline diff previews for write actions Show inline diffs in the CLI transcript when write_file, patch, or skill_manage modifies files. Captures a filesystem snapshot before the tool runs, computes a unified diff after, and renders it with ANSI coloring in the activity feed. Adds tool_start_callback and tool_complete_callback hooks to AIAgent for pre/post tool execution notifications. Also fixes _extract_parallel_scope_path to normalize relative paths to absolute, preventing the parallel overlap detection from missing conflicts when the same file is referenced with different path styles. Gated by display.inline_diffs config option (default: true). Based on PR #3774 by @kshitijk4poor.	2026-04-01 02:13:57 -07:00
Bartok9	afa75a6185	fix(client): handle is_closed as method in OpenAI SDK The openai SDK's SyncAPIClient.is_closed is a method, not a property. getattr(client, 'is_closed', False) returned the bound method object, which is always truthy — causing _is_openai_client_closed() to report all clients as closed and triggering unnecessary client recreation (~100-200ms TCP+TLS overhead per API call). Fix: check if is_closed is callable and call it, otherwise treat as bool. Fixes #4377 Co-authored-by: Bartok9 <Bartok9@users.noreply.github.com>	2026-04-01 01:40:43 -07:00

1 2 3 4 5 ...

482 Commits