hermes-agent

Author	SHA1	Message	Date
Teknium	f803f66339	fix(terminal): avoid merging heredoc EOF with fence wrapper (#3598 ) One-shot local execution built `printf FENCE; <cmd>; __hermes_rc=...`, so a command ending in a heredoc produced a closing line like `EOF; __hermes_rc=...`, which is not a valid delimiter. Bash then treated the rest of the wrapper as heredoc body, leaking it into tool output (e.g. gh issue/PR flows). Use newline-separated wrapper lines so the delimiter stays alone and the trailer runs after the heredoc completes. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-28 14:43:41 -07:00
Teknium	404a0b823e	fix: add self-termination guard for pkill/killall targeting hermes/gateway (#3593 ) Prevent the agent from accidentally killing its own process with pkill -f gateway, killall hermes, etc. Adds a dangerous command pattern that triggers the approval flow. Co-authored-by: arasovic <arasovic@users.noreply.github.com>	2026-03-28 14:33:48 -07:00
Teknium	dabe3c34cc	feat(webhook): hermes webhook CLI + skill for event-driven subscriptions (#3578 ) Adds 'hermes webhook' CLI subcommand and a skill — zero new model tools. CLI commands (require webhook platform to be enabled): hermes webhook subscribe <name> [--events, --prompt, --deliver, ...] hermes webhook list hermes webhook remove <name> hermes webhook test <name> All commands gate on webhook platform being enabled in config. If not configured, prints setup instructions (gateway setup wizard, manual config.yaml, or env vars). The agent uses these via terminal tool, guided by the webhook-subscriptions skill which documents setup, common patterns (GitHub, Stripe, CI/CD, monitoring), prompt template syntax, security, and troubleshooting. Adapter enhancement: webhook.py hot-reloads dynamic subscriptions from ~/.hermes/webhook_subscriptions.json on each incoming request (mtime-gated). Static config.yaml routes always take precedence. Docs: updated webhooks.md with Dynamic Subscriptions section, added hermes webhook to cli-commands.md reference. No new model tools. No toolset changes. 24 new tests for CLI CRUD, persistence, enabled-gate, and adapter dynamic route loading.	2026-03-28 14:33:35 -07:00
Teknium	82d6c28bd5	fix(skills): cache-aware /skills install and uninstall in TUI (#3586 ) Two fixes for /skills install and /skills uninstall slash commands: 1. input() hangs indefinitely inside prompt_toolkit's TUI event loop, soft-locking the CLI. The user typing the slash command is already implicit consent, so confirmation is now always skipped. 2. Cache invalidation was unconditional — installing or uninstalling a skill mid-session silently broke the prompt cache, increasing costs. The slash handler now defers cache invalidation by default (skill takes effect next session). Pass --now to invalidate immediately, with a message explaining the cost tradeoff. The CLI argparse path (hermes skills install) is unaffected and still invalidates. Fixes #3474 Salvaged from PR #3496 by dlkakbs.	2026-03-28 14:32:23 -07:00
Teknium	9e411f7d70	fix(update): skip config migration prompts in non-interactive sessions (#3584 ) hermes update hangs on input() when run from cron, scripts, or piped contexts. Check both stdin and stdout isatty(), catch EOFError as a fallback, and print guidance to run 'hermes config migrate' later. Co-authored-by: phippsbot-byte <phippsbot-byte@users.noreply.github.com>	2026-03-28 14:26:32 -07:00
Teknium	708f187549	fix(gateway): exit with failure when all platforms fail with retryable errors (#3592 ) When all messaging platforms exhaust retries and get queued for background reconnection, exit with code 1 so systemd Restart=on-failure can restart the process. Previously the gateway stayed alive as a zombie with no connected platforms and exit code 0. Salvaged from PR #3567 by kelsia14. Test updates added. Co-authored-by: kelsia14 <kelsia14@users.noreply.github.com>	2026-03-28 14:25:12 -07:00
Teknium	d7c41f3cef	fix(telegram): honor proxy env vars in fallback transport (salvage #3411 ) (#3591 ) * fix: keep gateway running through telegram proxy failures - continue gateway startup in degraded mode when Telegram cannot connect yet - ensure Telegram fallback transport also honors proxy env vars - support reconnect retries without taking down the whole gateway * test(telegram): cover proxy env handling in fallback transport --------- Co-authored-by: kufufu9 <pi@local>	2026-03-28 14:23:27 -07:00
Teknium	6893c3befc	fix(gateway): inject PATH + VIRTUAL_ENV into launchd plist for macOS service (#3585 ) Salvage of PR #2173 (hanai) and PR #3432 (timknip). Injects PATH, VIRTUAL_ENV, and HERMES_HOME into the macOS launchd plist so gateway subprocesses find user-installed tools (node, ffmpeg, etc.). Matches systemd unit parity with venv/bin, node_modules/.bin, and resolved node dir in PATH. Includes 7 new tests and docs updates across 4 pages. Co-Authored-By: Han <ihanai1991@gmail.com> Co-Authored-By: timknip <timknip@users.noreply.github.com>	2026-03-28 14:23:26 -07:00
Teknium	924857c3e3	fix: prevent tool name/arg concatenation for Ollama-compatible endpoints (#3582 ) Ollama reuses index 0 for every tool call in a parallel batch, distinguishing them only by id. The streaming accumulator now detects a new non-empty id at an already-active index and redirects it to a fresh slot, preventing names and arguments from being concatenated into a single tool call. No-op for normal providers that use incrementing indices. Co-authored-by: dmater01 <dmater01@users.noreply.github.com>	2026-03-28 14:08:26 -07:00
Teknium	ba3bbf5b53	fix: add missing mattermost/matrix/dingtalk toolsets + platform consistency tests (salvage #3512 ) (#3583 ) * Fixing mattermost configuration parsing bugs * fix: add homeassistant to skills_config + platform consistency tests Follow-up for cherry-picked #3512: - Add homeassistant to skills_config.py PLATFORMS (was in tools_config but missing from skills_config) - Add 3 consistency tests that verify all platforms in tools_config have matching toolset definitions, gateway includes, and skills_config entries — prevents this class of bug from recurring --------- Co-authored-by: DaneelV3 <dannel@v3rtical.tech>	2026-03-28 14:05:02 -07:00
Teknium	d6b4fa2e9f	fix: strip @botname from commands so /new@TigerNanoBot resolves correctly (#3581 ) Commands sent directly to the bot in groups include @botname suffix (e.g. /compress@TigerNanoBot). get_command() now strips the @anything part before lookup, matching how Telegram bot menu generates commands. Fixes all slash commands silently doing nothing when sent with @mention. Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>	2026-03-28 14:01:01 -07:00
Teknium	df1bf0a209	feat(api-server): add basic security headers (#3576 ) Add X-Content-Type-Options: nosniff and Referrer-Policy: no-referrer to all API server responses via a new security_headers_middleware. Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>	2026-03-28 14:00:52 -07:00
Teknium	49a49983e4	feat(api-server): add Access-Control-Max-Age to CORS preflight responses (#3580 ) Adds Access-Control-Max-Age: 600 to CORS preflight responses, telling browsers to cache the preflight for 10 minutes. Reduces redundant OPTIONS requests and improves perceived latency for browser-based API clients. Salvaged from PR #3514 by aydnOktay. Co-authored-by: aydnOktay <xaydinoktay@gmail.com>	2026-03-28 14:00:03 -07:00
Teknium	e97c0cb578	fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support * feat: GPT tool-use steering + strip budget warnings from history Two changes to improve tool reliability, especially for OpenAI GPT models: 1. GPT tool-use enforcement prompt: Adds GPT_TOOL_USE_GUIDANCE to the system prompt when the model name contains 'gpt' and tools are loaded. This addresses a known behavioral pattern where GPT models describe intended actions ('I will run the tests') instead of actually making tool calls. Inspired by similar steering in OpenCode (beast.txt) and Cline (GPT-5.1 variant). 2. Budget warning history stripping: Budget pressure warnings injected by _get_budget_warning() into tool results are now stripped when conversation history is replayed via run_conversation(). Previously, these turn-scoped signals persisted across turns, causing models to avoid tool calls in all subsequent messages after any turn that hit the 70-90% iteration threshold. * fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support Prep for the upcoming profiles feature — each profile is a separate HERMES_HOME directory, so all paths must respect the env var. Fixes: - gateway/platforms/matrix.py: Matrix E2EE store was hardcoded to ~/.hermes/matrix/store, ignoring HERMES_HOME. Now uses get_hermes_home() so each profile gets its own Matrix state. - gateway/platforms/telegram.py: Two locations reading config.yaml via Path.home()/.hermes instead of get_hermes_home(). DM topic thread_id persistence and hot-reload would read the wrong config in a profile. - tools/file_tools.py: Security path for hub index blocking was hardcoded to ~/.hermes, would miss the actual profile's hub cache. - hermes_cli/gateway.py: Service naming now uses the profile name (hermes-gateway-coder) instead of a cryptic hash suffix. Extracted _profile_suffix() helper shared by systemd and launchd. - hermes_cli/gateway.py: Launchd plist path and Label now scoped per profile (ai.hermes.gateway-coder.plist). Previously all profiles would collide on the same plist file on macOS. - hermes_cli/gateway.py: Launchd plist now includes HERMES_HOME in EnvironmentVariables — was missing entirely, making custom HERMES_HOME broken on macOS launchd (pre-existing bug). - All launchctl commands in gateway.py, main.py, status.py updated to use get_launchd_label() instead of hardcoded string. Test fixes: DM topic tests now set HERMES_HOME env var alongside Path.home() mock. Launchd test uses get_launchd_label() for expected commands.	2026-03-28 13:51:08 -07:00
Teknium	c0aa06f300	fix(test): update streaming test to match PR #3566 behavior change (#3574 ) PR #3566 intentionally routes suppressed content to stream_delta_callback when tool calls are present, so reasoning tag extraction can fire during streaming. The test was still asserting the old behavior where content after tool calls was fully suppressed from the callback. Updated the assertion to match: content IS delivered to the callback (for tag extraction), with display-level suppression handled by the CLI's _stream_delta.	2026-03-28 13:41:23 -07:00
Teknium	09ebf8b252	feat(api-server): add /v1/health alias for OpenAI compatibility (#3572 ) Add GET /v1/health as an alias to the existing /health endpoint so OpenAI-compatible health checks work out of the box. Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>	2026-03-28 13:32:39 -07:00
Teknium	901494d728	feat: make tool-use enforcement configurable via agent.tool_use_enforcement (#3551 ) The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in #3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.	2026-03-28 12:31:22 -07:00
Teknium	393929831e	fix(gateway): preserve transcript on /compress and hygiene compression (salvage #3516 ) (#3556 ) * fix(gateway): preserve full transcript on /compress instead of overwriting The /compress command calls _compress_context() which correctly ends the old session (preserving its full transcript in SQLite) and creates a new session_id for the continuation. However, it then immediately called rewrite_transcript() on the OLD session_id, overwriting the preserved transcript with the compressed version — destroying searchable history. Auto-compression (triggered by context pressure) does not have this bug because the gateway already handles the session_id swap via the agent.session_id != session_id check after _run_agent_sync. Fix: after _compress_context creates the new session, write the compressed messages into the NEW session_id and update the session store pointer. The old session's full transcript stays intact and searchable via session_search. Before: /compress destroys original messages, session_search can't find details from compressed portions. After: /compress behaves like /new for history — full transcript preserved, compressed context for the live session. * fix(gateway): preserve transcript on /compress and hygiene compression Apply session_id swap after _compress_context in both /compress handler and hygiene pre-compression. _compress_context creates a new session (ending the old one), but both paths were calling rewrite_transcript on the OLD session_id — overwriting the preserved transcript and destroying searchable history. Now follows the same pattern as the auto-compression handler (lines 5415-5423): detect the new session_id, update the session store entry, and write compressed messages to the new session. Also fix FakeCompressAgent test mock to include session_id attribute and simulate the session_id change that real _compress_context performs. Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com> --------- Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>	2026-03-28 12:23:43 -07:00
Teknium	be322efdf2	fix(matrix): harden e2ee access-token handling (#3562 ) * fix(matrix): harden e2ee access-token handling * fix: patch nio mock in e2ee maintenance sync loop test The sync_loop now imports nio for SyncError checking (from PR #3280), so the test needs to inject a fake nio module via sys.modules. --------- Co-authored-by: Cortana <andrew+cortana@chalkley.org>	2026-03-28 12:13:35 -07:00
Teknium	df6ce848e9	fix(provider): remove MiniMax /v1→/anthropic auto-correction to allow user override (#3553 ) The minimax-specific auto-correction in runtime_provider.py was preventing users from overriding to the OpenAI-compatible endpoint via MINIMAX_BASE_URL. Users in certain regions get nginx 404 on api.minimax.io/anthropic and need to switch to api.minimax.chat/v1. The generic URL-suffix detection already handles /anthropic → anthropic_messages, so the minimax-specific code was redundant for the default path and harmful for the override path. Now: default /anthropic URL works via generic detection, user override to /v1 gets chat_completions mode naturally. Closes #3546 (different approach — respects user overrides instead of changing the default endpoint).	2026-03-28 11:36:59 -07:00
Teknium	735ca9dfb2	refactor: replace swe-rex with native Modal SDK for Modal backend (#3538 ) Drop the swe-rex dependency for Modal terminal backend and use the Modal SDK directly (Sandbox.create + Sandbox.exec). This fixes: - AsyncUsageWarning from synchronous App.lookup() in async context - DeprecationError from unencrypted_ports / .url on unencrypted tunnels (deprecated 2026-03-05) The new implementation: - Uses modal.App.lookup.aio() for async-safe app creation - Uses Sandbox.create.aio() with 'sleep infinity' entrypoint - Uses Sandbox.exec.aio() for direct command execution (no HTTP server or tunnel needed) - Keeps all existing features: persistent filesystem snapshots, configurable resources (CPU/memory/disk), sudo support, interrupt handling, _AsyncWorker for event loop safety Consistent with the Docker backend precedent (PR #2804) where we removed mini-swe-agent in favor of direct docker run. Files changed: - tools/environments/modal.py - core rewrite - tools/terminal_tool.py - health check: modal instead of swerex - hermes_cli/setup.py - install modal instead of swe-rex[modal] - pyproject.toml - modal extra: modal>=1.0.0 instead of swe-rex[modal] - scripts/kill_modal.sh - grep for hermes-agent instead of swe-rex - tests/ - updated for new implementation - environments/README.md - updated patches section - website/docs - updated install command	2026-03-28 11:21:44 -07:00
Teknium	455bf2e853	feat: activate plugin lifecycle hooks (pre/post_llm_call, session start/end) (#3542 ) The plugin system defined six lifecycle hooks but only pre_tool_call and post_tool_call were invoked. This activates the remaining four so that external plugins (e.g. memory systems) can hook into the conversation loop without touching core code. Hook semantics: - on_session_start: fires once when a new session is created - pre_llm_call: fires once per turn before the tool-calling loop; plugins can return {"context": "..."} to inject into the ephemeral system prompt (not cached, not persisted) - post_llm_call: fires once per turn after the loop completes, with user_message and assistant_response for sync/storage - on_session_end: fires at the end of every run_conversation call invoke_hook() now returns a list of non-None callback return values, enabling pre_llm_call context injection while remaining backward compatible (existing hooks that return None are unaffected). Salvaged from PR #2823. Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>	2026-03-28 11:14:54 -07:00
Teknium	411e3c1539	fix(api-server): allow Idempotency-Key in CORS headers (#3530 ) Browser clients using the Idempotency-Key header for request deduplication were blocked by CORS preflight because the header was not listed in Access-Control-Allow-Headers. Add Idempotency-Key to _CORS_HEADERS and add tests for both the new header allowance and the existing Vary: Origin behavior. Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com> Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-03-28 08:16:41 -07:00
Teknium	80a899a8e2	fix: enable fine-grained tool streaming for Claude/OpenRouter + retry SSE errors (#3497 ) Root cause: Anthropic buffers entire tool call arguments and goes silent for minutes while thinking (verified: 167s gap with zero SSE events on direct API). OpenRouter's upstream proxy times out after ~125s of inactivity and drops the connection with 'Network connection lost'. Fix: Send the x-anthropic-beta: fine-grained-tool-streaming-2025-05-14 header for Claude models on OpenRouter. This makes Anthropic stream tool call arguments token-by-token instead of buffering them, keeping the connection alive through OpenRouter's proxy. Live-tested: the exact prompt that consistently failed at ~128s now completes successfully — 2,972 lines written, 49K tokens, 8 minutes. Additional improvements: 1. Send explicit max_tokens for Claude through OpenRouter. Without it, OpenRouter defaults to 65,536 (confirmed via echo_upstream_body) — only half of Opus 4.6's 128K limit. 2. Classify SSE 'Network connection lost' as retryable in the streaming inner retry loop. The OpenAI SDK raises APIError from SSE error events, which was bypassing our transient error retry logic. 3. Actionable diagnostic guidance when stream-drop retries exhaust.	2026-03-28 08:01:37 -07:00
Teknium	e295a2215a	fix(gateway): include user-local bin paths in systemd unit PATH (#3527 ) Add ~/.local/bin, ~/.cargo/bin, ~/go/bin, ~/.npm-global/bin to the systemd unit PATH so tools installed via uv/pipx/cargo/go are discoverable by MCP servers and terminal commands. Uses a _build_user_local_paths() helper that checks exists() before adding, and correctly resolves home dir for both user and system service types. Co-authored-by: Kal Sze <ksze@users.noreply.github.com>	2026-03-28 07:47:40 -07:00
Teknium	831e8ba0e5	feat: tool-use enforcement + strip budget warnings from history (#3528 ) Cherry-pick of feat/gpt-tool-steering with modifications: 1. Tool-use enforcement prompt (refactored from GPT-specific): - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex') - Injection logic now checks against the tuple instead of hardcoding 'gpt' — adding new model families is a one-line change - Addresses models describing actions instead of making tool calls 2. Budget warning history stripping: - _strip_budget_warnings_from_history() strips _budget_warning JSON keys and [BUDGET WARNING: ...] text from tool results at the start of run_conversation() - Prevents old budget warnings from poisoning subsequent turns Based on PR #3479 by teknium1.	2026-03-28 07:38:36 -07:00
Teknium	9d4b3e5470	fix: harden hermes update against diverged history, non-main branches, and gateway edge cases (salvage #3489 ) (#3492 ) * fix: harden `hermes update` against diverged history, non-main branches, and gateway edge cases The self-update command (`hermes update` / gateway `/update`) could fail or silently corrupt state in several scenarios: 1. Diverged history — `git pull --ff-only` aborts with a cryptic subprocess error when upstream has force-pushed or rebased. Now falls back to `git reset --hard origin/main` since local changes are already stashed. 2. User on a feature branch / detached HEAD — the old code would either clobber the feature branch HEAD to point at origin/main, or silently pull against a non-existent remote branch. Now auto-checkouts main before pulling, with a clear warning. 3. Fetch failures — network or auth errors produced raw subprocess tracebacks. Now shows user-friendly messages ("Network error", "Authentication failed") with actionable hints. 4. reset --hard failure — if the fallback reset itself fails (disk full, permissions), the old code would still attempt stash restore on a broken working tree. Now skips restore and tells the user their changes are safe in stash. 5. Gateway /update stash conflicts — non-interactive mode (Telegram `/update`) called sys.exit(1) when stash restore had conflicts, making the entire update report as failed even though the code update itself succeeded. Now treats stash conflicts as non-fatal in non-interactive mode (returns False instead of exiting). * fix: restore stash and branch on 'already up to date' early return The PR moved stash creation before the commit-count check (needed for the branch-switching feature), but the 'already up to date' early return didn't restore the stash or switch back to the original branch — leaving the user stranded on main with changes trapped in a stash. Now the early-return path restores the stash and checks out the original branch when applicable. --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 23:12:43 -07:00
Teknium	290c71a707	fix(gateway): scope progress thread fallback to Slack only (salvage #3414 ) (#3488 ) * test(gateway): map fixture adapter by platform in progress threading tests * fix(gateway): scope progress thread fallback to Slack only --------- Co-authored-by: EmpireOperating <258363005+EmpireOperating@users.noreply.github.com>	2026-03-27 22:37:53 -07:00
Teknium	09796b183b	fix: alibaba provider default endpoint and model list (#3484 ) - Change default inference_base_url from dashscope-intl Anthropic-compat endpoint to coding-intl OpenAI-compat /v1 endpoint. The old Anthropic endpoint 404'd when used with the OpenAI SDK (which appends /chat/completions to a /apps/anthropic base URL). - Update curated model list: remove models unavailable on coding-intl (qwen3-max, qwen-plus-latest, qwen3.5-flash, qwen-vl-max), add third-party models available on the platform (glm-5, glm-4.7, kimi-k2.5, MiniMax-M2.5). - URL-based api_mode auto-detection still works: overriding DASHSCOPE_BASE_URL to an /apps/anthropic endpoint automatically switches to anthropic_messages mode. - Update provider description and env var descriptions to reflect the coding-intl multi-provider platform. - Update tests to match new default URL and test the anthropic override path instead.	2026-03-27 22:10:10 -07:00
Teknium	15cfd20820	fix: cap context pressure percentage at 100% in display (#3480 ) * fix: cap context pressure percentage at 100% in display The forward-looking token estimate can overshoot the compaction threshold (e.g. a large tool result pushes it from 70% to 109% in one step). The progress bar was already capped via min(), but pct_int was not — causing the user to see '109% to compaction' which is confusing. Cap pct_int at 100 in both CLI and gateway display functions. Reported by @JoshExile82. * refactor: use real API token counts for compression decisions Replace the rough chars/3 estimation with actual prompt_tokens + completion_tokens from the API response. The estimation was needed to predict whether tool results would push context past the threshold, but the default 50% threshold leaves ample headroom — if tool results push past it, the next API call reports real usage and triggers compression then. This removes all estimation from the compression and context pressure paths, making both 100% data-driven from provider-reported token counts. Also removes the dead _msg_count_before_tools variable.	2026-03-27 21:42:09 -07:00
Teknium	8fdfc4b00c	fix(agent): detect thinking-budget exhaustion on truncation, skip useless retries (#3444 ) When finish_reason='length' and the response contains only reasoning (think blocks or empty content), the model exhausted its output token budget on thinking with nothing left for the actual response. Previously, this fell into either: - chat_completions: 3 useless continuation retries (model hits same limit) - anthropic/codex: generic 'Response truncated' error with rollback Now: detect the think-only + length condition early and return immediately with a targeted error message: 'Model used all output tokens on reasoning with none left for the response. Try lowering reasoning effort or increasing max_tokens.' This saves 2 wasted API calls on the chat_completions path and gives users actionable guidance instead of a cryptic error. The existing think-only retry logic (finish_reason='stop') is unchanged — that's a genuine model glitch where retrying can help.	2026-03-27 15:29:30 -07:00
Teknium	658692799d	fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389 ) (#3449 ) Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top. All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty. Closes #3389.	2026-03-27 15:28:19 -07:00
Teknium	ab09f6b568	feat: curate HF model picker with OpenRouter analogues (#3440 ) Show only agentic models that map to OpenRouter defaults: Qwen/Qwen3.5-397B-A17B ↔ qwen/qwen3.5-plus Qwen/Qwen3.5-35B-A3B ↔ qwen/qwen3.5-35b-a3b deepseek-ai/DeepSeek-V3.2 ↔ deepseek/deepseek-chat moonshotai/Kimi-K2.5 ↔ moonshotai/kimi-k2.5 MiniMaxAI/MiniMax-M2.5 ↔ minimax/minimax-m2.5 zai-org/GLM-5 ↔ z-ai/glm-5 XiaomiMiMo/MiMo-V2-Flash ↔ xiaomi/mimo-v2-pro moonshotai/Kimi-K2-Thinking ↔ moonshotai/kimi-k2-thinking Users can still pick any HF model via Enter custom model name.	2026-03-27 13:54:46 -07:00
Teknium	e4e04c2005	fix: make tirith block verdicts approvable instead of hard-blocking (#3428 ) Previously, tirith exit code 1 (block) immediately rejected the command with no approval prompt — users saw 'BLOCKED: Command blocked by security scan' and the agent moved on. This prevented gateway/CLI users from approving pipe-to-shell installs like 'curl ... | sh' even when they understood the risk. Changes: - Tirith 'block' and 'warn' now both go through the approval flow. Users see the full tirith findings (severity, title, description, safer alternatives) and can choose to approve or deny. - New _format_tirith_description() builds rich descriptions from tirith findings JSON so the approval prompt is informative. - CLI startup now warns when tirith is enabled but not available, so users know command scanning is degraded to pattern matching only. The default approval choice is still deny, so the security posture is unchanged for unattended/timeout scenarios. Reported via Discord by pistrie — 'curl -fsSL https://mandex.dev/install.sh | sh' was hard-blocked with no way to approve.	2026-03-27 13:22:01 -07:00
Teknium	6f11ff53ad	fix(anthropic): use model-native output limits instead of hardcoded 16K (#3426 ) The Anthropic adapter defaulted to max_tokens=16384 when no explicit value was configured. This severely limits thinking-enabled models where thinking tokens count toward max_tokens: - Claude Opus 4.6 supports 128K output but was capped at 16K - Claude Sonnet 4.6 supports 64K output but was capped at 16K With extended thinking (adaptive or budget-based), the model could exhaust the entire 16K on reasoning, leaving zero tokens for the actual response. This caused two user-visible errors: - 'Response truncated (finish_reason=length)' — thinking consumed most tokens - 'Response only contains think block with no content' — thinking consumed all Fix: add _ANTHROPIC_OUTPUT_LIMITS lookup table (sourced from Anthropic docs and Cline's model catalog) and use the model's actual output limit as the default. Unknown future models default to 128K (the current maximum). Also adds context_length clamping: if the user configured a smaller context window (e.g. custom endpoint), max_tokens is clamped to context_length - 1 to avoid exceeding the window. Closes #2706	2026-03-27 13:02:52 -07:00
Teknium	fb46a90098	fix: increase API timeout default from 900s to 1800s for slow-thinking models (#3431 ) Models like GLM-5/5.1 can think for 15+ minutes. The previous 900s (15 min) default for HERMES_API_TIMEOUT killed legitimate requests. Raised to 1800s (30 min) in both places that read the env var: - _build_api_kwargs() timeout (non-streaming total timeout) - _call_chat_completions() write timeout (streaming connection) The streaming per-chunk read timeout (60s) and stale stream detector (180-300s) are unchanged — those are appropriate for inter-chunk timing.	2026-03-27 13:02:23 -07:00
Teknium	fd8c465e42	feat: add Hugging Face as a first-class inference provider (#3419) Salvage of PR #1747 (original PR #1171 by @davanstrien) onto current main. Registers Hugging Face Inference Providers (router.huggingface.co/v1) as a named provider: - hermes chat --provider huggingface (or --provider hf) - 18 curated open models via hermes model picker - HF_TOKEN in ~/.hermes/.env - OpenAI-compatible endpoint with automatic failover (Groq, Together, SambaNova, etc.) Files: auth.py, models.py, main.py, setup.py, config.py, model_metadata.py, .env.example, 5 docs pages, 17 new tests. Co-authored-by: Daniel van Strien <davanstrien@gmail.com>	2026-03-27 12:41:59 -07:00
Teknium	f57ebf52e9	fix(api-server): cancel orphaned agent + true interrupt on SSE disconnect (salvage #3399) (#3427) Salvage of #3399 by @binhnt92 with true agent interruption added on top. When a streaming /v1/chat/completions client disconnects mid-stream, the agent is now interrupted via agent.interrupt() so it stops making LLM API calls, and the asyncio task wrapper is cancelled. Closes #3399.	2026-03-27 11:33:19 -07:00
Teknium	5127567d5d	perf(ttft): cache skills prompt with shared skill_utils module (salvage #3366) (#3421) Two-layer caching for build_skills_system_prompt(): 1. In-process LRU (OrderedDict, max 8) — same-process: 546ms → <1ms 2. Disk snapshot (.skills_prompt_snapshot.json) — cold start: 297ms → 103ms Key improvements over original PR #3366: - Extract shared logic into agent/skill_utils.py (parse_frontmatter, skill_matches_platform, get_disabled_skill_names, extract_skill_conditions, extract_skill_description, iter_skill_index_files) - tools/skills_tool.py delegates to shared module — zero code duplication - Proper LRU eviction via OrderedDict.move_to_end + popitem(last=False) - Cache invalidation on all skill mutation paths: - skill_manage tool (in-conversation writes) - hermes skills install (CLI hub) - hermes skills uninstall (CLI hub) - Automatic via mtime/size manifest on cold start prompt_builder.py no longer imports tools.skills_tool (avoids pulling in the entire tool registry chain at prompt build time). 6301 tests pass, 0 failures. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 10:54:02 -07:00
Teknium	8ecd7aed2c	fix: prevent reasoning box from rendering 3x during tool-calling loops (#3405) Two independent bugs caused the reasoning box to appear three times when the model produced reasoning + tool_calls: Bug A: _build_assistant_message() re-fired reasoning_callback with the full reasoning text even when streaming had already displayed it. The original guard only checked structured reasoning_content deltas, but reasoning also arrives via content tag extraction (<REASONING_SCRATCHPAD>/<think> tags in delta.content), which went through _fire_stream_delta not _fire_reasoning_delta. Fix: skip the callback entirely when streaming is active — both paths display reasoning during the stream. Any reasoning not shown during streaming is caught by the CLI post-response fallback. Bug B: The post-response reasoning display checked _reasoning_stream_started, but that flag was reset by _reset_stream_state() during intermediate turn boundaries (when stream_delta_callback(None) fires between tool calls). Introduced _reasoning_shown_this_turn flag that persists across the tool loop and is only reset at the start of each user turn. Live-tested in PTY: reasoning now shows exactly once per API call, no duplicates across tool-calling loops.	2026-03-27 09:57:50 -07:00
Teknium	e0dbbdb2c9	fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398) The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via asyncio.get_running_loop().create_task(). When an AsyncOpenAI client is garbage-collected while prompt_toolkit's event loop is running (the common CLI idle state), the aclose() task runs on prompt_toolkit's loop but the underlying TCP transport is bound to a different (dead) worker loop. The transport's self._loop.call_soon() then raises RuntimeError('Event loop is closed'), which prompt_toolkit surfaces as the disruptive 'Unhandled exception in event loop ... Press ENTER to continue...' error. Three-layer fix: 1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI startup before any AsyncOpenAI clients are created. Safe because cached clients are explicitly cleaned via _force_close_async_httpx, and uncached clients' TCP connections are cleaned by the OS on exit. 2. Custom asyncio exception handler: Installed on prompt_toolkit's event loop to silently suppress 'Event loop is closed' RuntimeError. Defense-in-depth for SDK upgrades that might change the class name. 3. cleanup_stale_async_clients(): Called after each agent turn (when the agent thread joins) to proactively evict cache entries whose event loop is closed, preventing stale clients from accumulating.	2026-03-27 09:45:25 -07:00
Teknium	eb2127c1dc	fix(cron): prevent recurring job re-fire on gateway crash/restart loop (#3396) When a gateway crashes mid-job execution (before mark_job_run can persist the updated next_run_at), the job would fire again on every restart attempt within the grace window. For a daily 6:15 AM job with a 2-hour grace, rapidly restarting the gateway could trigger dozens of duplicate runs. Fix: call advance_next_run() BEFORE run_job() in tick(). For recurring jobs (cron/interval), this preemptively advances next_run_at to the next future occurrence and persists it to disk. If the process then crashes during execution, the job won't be considered due on restart. One-shot jobs are left unchanged — they still retry on restart since there's no future occurrence to advance to. This changes the scheduler from at-least-once to at-most-once semantics for recurring jobs, which is the correct tradeoff: missing one daily message is far better than sending it dozens of times.	2026-03-27 08:02:58 -07:00
Teknium	5a1e2a307a	perf(ttft): salvage easy-win startup optimizations from #3346 (#3395) * perf(ttft): dedupe shared tool availability checks * perf(ttft): short-circuit vision auto-resolution * perf(ttft): make Claude Code version detection lazy * perf(ttft): reuse loaded toolsets for skills prompt --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 07:49:44 -07:00
Teknium	41d9d08078	fix(telegram): fall back to no thread_id on 'Message thread not found' (#3390) python-telegram-bot's BadRequest inherits from NetworkError, so the send() retry loop was catching 'Message thread not found' as a transient network error and retrying 3 times before silently failing. This killed all tool progress messages, streaming responses, and typing indicators when the incoming message carried an invalid message_thread_id. Now detect BadRequest inside the NetworkError handler: - 'thread not found' + thread_id set → clear thread_id and retry once (message still reaches the chat, just without topic threading) - Other BadRequest errors → raise immediately (permanent, don't retry) - True NetworkError → retry as before (transient) 252 silent failures in gateway.log traced to this on 2026-03-26. 5 new tests for thread fallback, non-thread BadRequest, no-thread sends, network retry, and multi-chunk fallback.	2026-03-27 06:07:28 -07:00
Teknium	75fcbc44ce	feat(telegram): auto-discover fallback IPs via DoH when api.telegram.org is unreachable (#3376) * feat(telegram): auto-discover fallback IPs via DoH when api.telegram.org is unreachable On some networks (university, corporate), api.telegram.org resolves to a valid Telegram IP that is unreachable due to routing/firewall rules. A different IP in the same Telegram-owned 149.154.160.0/20 block works fine. This adds automatic fallback IP discovery at connect time: 1. Query Google and Cloudflare DNS-over-HTTPS for api.telegram.org A records 2. Exclude the system-DNS IP (the unreachable one), use the rest as fallbacks 3. If DoH is also blocked, fall back to a seed list (149.154.167.220) 4. TelegramFallbackTransport tries primary first, sticks to whichever works No configuration needed — works automatically. TELEGRAM_FALLBACK_IPS env var still available as manual override. Zero impact on healthy networks (primary path succeeds on first attempt, fallback never exercised). No new dependencies (uses httpx already in deps + stdlib socket). * fix: share transport instance and downgrade seed fallback log to info - Use single TelegramFallbackTransport shared between request and get_updates_request so sticky IP is shared across polling and API calls - Keep separate HTTPXRequest instances (different timeout settings) - Downgrade "using seed fallback IPs" from warning to info to avoid noisy logs on healthy networks * fix: add telegram.request mock and discovery fixture to remaining test files The original PR missed test_dm_topics.py and test_telegram_network_reconnect.py — both need the telegram.request mock module. The reconnect test also needs _no_auto_discovery since _handle_polling_network_error calls connect() which now invokes discover_fallback_ips(). --------- Co-authored-by: Mohan Qiao <Gavin-Qiao@users.noreply.github.com>	2026-03-27 04:03:13 -07:00
Teknium	be416cdfa9	fix: guard config.get() against YAML null values to prevent AttributeError (#3377) dict.get(key, default) returns None — not the default — when the key IS present but explicitly set to null/~ in YAML. Calling .lower() on that raises AttributeError. Use (config.get(key) or fallback) so both missing keys and explicit nulls coalesce to the intended default. Files fixed: - tools/tts_tool.py — _get_provider() - tools/web_tools.py — _get_backend() - tools/mcp_tool.py — MCPServerTask auth config - trajectory_compressor.py — _detect_provider() and config loading Co-authored-by: dieutx <dangtc94@gmail.com>	2026-03-27 04:03:00 -07:00
Teknium	b8b1f24fd7	fix: handle addition-only hunks in V4A patch parser (#3325) V4A patches with only + lines (no context or - lines) were silently dropped because search_lines was empty and the 'if search_lines:' block was the only code path. Addition-only hunks are common when the model generates patches for new functions or blocks. Adds an else branch that inserts at the context_hint position when available, or appends at end of file. Includes 2 regression tests for addition-only hunks with and without context hints. Salvaged from PR #3092 by thakoreh. Co-authored-by: Hiren <hiren.thakore58@gmail.com>	2026-03-26 19:38:04 -07:00
Teknium	a2847ea7f0	fix(gateway): add media download retry to Mattermost, Slack, and base cache (#3323) * fix(gateway): add media download retry to Mattermost, Slack, and base cache Media downloads on Mattermost and Slack fail permanently on transient errors (timeouts, 429 rate limits, 5xx server errors). Telegram and WhatsApp already have retry logic, but these platforms had single-attempt downloads with hardcoded 30s timeouts. Changes: - base.py cache_image_from_url: add retry with exponential backoff (covers Signal and any platform using the shared cache helper) - mattermost.py _send_media_url: retry on 429/5xx/timeout (3 attempts) - slack.py _download_slack_file: retry on timeout/5xx (3 attempts) - slack.py _download_slack_file_bytes: same retry pattern * test: add tests for media download retry --------- Co-authored-by: dieutx <dangtc94@gmail.com>	2026-03-26 19:33:18 -07:00
Teknium	58ca875e19	feat(gateway): surface session config on /new, /reset, and auto-reset (#3321) When a new session starts in the gateway (via /new, /reset, or auto-reset), send the user a summary of the detected configuration: ✨ Session reset! Starting fresh. ◆ Model: qwen3.5:27b-q4_K_M ◆ Provider: custom ◆ Context: 8K tokens (config) ◆ Endpoint: http://localhost:11434/v1 This makes misconfigured context length immediately visible — a user running a local 8K model that falls to the 128K default will see: ◆ Context: 128K tokens (default — set model.context_length in config to override) Instead of silently getting no compression and degrading responses. - _format_session_info() resolves model, provider, context length, and endpoint from config + runtime, matching the hygiene code's resolution chain - Local/custom endpoints shown; cloud endpoints hidden (not useful) - Context source annotated: config, detected, or default with hint - Appended to /new and /reset responses, and auto-reset notifications - 9 tests covering all formatting paths and failure resilience Addresses the user-facing side of #2708 — instead of trying to fix every edge case in context detection, surface the values so users can immediately see when something is wrong.	2026-03-26 19:27:58 -07:00
Teknium	3f95e741a7	fix: validate empty user messages to prevent Anthropic API 400 errors (#3322) When user messages have empty content (e.g., Discord @mention-only messages, unrecognized attachments), the Anthropic API rejects the request with 'user messages must have non-empty content'. Changes: - anthropic_adapter.py: Add empty content validation for user messages (string and list formats), matching the existing pattern for assistant and tool messages. Empty content gets '(empty message)' placeholder. - discord.py: Defense-in-depth check at gateway layer to catch empty messages before they enter session history. - Add 4 regression tests covering empty string, whitespace-only, empty list, and empty text block scenarios. Fixes #3143 Co-authored-by: Bartok9 <bartok9@users.noreply.github.com>	2026-03-26 19:24:03 -07:00
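
Commit 5127567d5d (#3421) describes its in-process layer as an LRU on `OrderedDict` with a maximum of 8 entries. Eviction uses `move_to_end` plus `popitem(last=False)`. The sketch below is a minimal illustration of that eviction pattern under stated assumptions: string keys and a caller-supplied build function. The names `PromptLRU`, `get_or_build`, `invalidate`, and the example keys are hypothetical. They are not the code in agent/skill_utils.py or prompt_builder.py, and the disk-snapshot layer is not modeled.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class PromptLRU:
    """Bounded LRU cache (hypothetical); max_size=8 mirrors the limit named in the commit."""

    def __init__(self, max_size: int = 8) -> None:
        self.max_size = max_size
        self._data: "OrderedDict[Hashable, str]" = OrderedDict()

    def get_or_build(self, key: Hashable, build: Callable[[], str]) -> str:
        if key in self._data:
            # Hit: mark the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        value = build()
        self._data[key] = value  # new keys land at the most-recent end
        if len(self._data) > self.max_size:
            # Over capacity: drop the least recently used entry (the front).
            self._data.popitem(last=False)
        return value

    def invalidate(self) -> None:
        # Hook for mutation paths such as installing or removing a skill.
        self._data.clear()

    def keys(self) -> List[Hashable]:
        return list(self._data)


# Usage: a capacity of 2 makes the eviction easy to see.
cache = PromptLRU(max_size=2)
builds = []


def builder(key: str) -> Callable[[], str]:
    def build() -> str:
        builds.append(key)  # record every cache miss
        return f"skills prompt for {key}"
    return build


cache.get_or_build("cli", builder("cli"))
cache.get_or_build("telegram", builder("telegram"))
cache.get_or_build("cli", builder("cli"))          # hit, "cli" becomes most recent
cache.get_or_build("discord", builder("discord"))  # miss, evicts "telegram"
print(builds)        # ['cli', 'telegram', 'discord']
print(cache.keys())  # ['cli', 'discord']
```

Moving every hit to the end keeps the least recently used entry at the front. Eviction therefore reduces to a single `popitem(last=False)` call.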

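Commit eb2127c1dc (#3396) reorders tick(). The schedule is now advanced and persisted before the job body runs, which gives recurring jobs at-most-once semantics. The sketch below is a minimal illustration of that ordering. It assumes a fixed positive interval and a caller-supplied `persist` callback. The `Job` dataclass and these function signatures are hypothetical. Real cron expressions, the grace window, and mark_job_run are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    name: str
    interval_s: Optional[float]  # None marks a one-shot job; else assumed > 0
    next_run_at: float


def advance_next_run(job: Job, now: float, persist: Callable[[Job], None]) -> None:
    """Move a recurring job's next_run_at past `now` and save it."""
    if job.interval_s is None:
        return  # one-shot: nothing to advance, so it can retry after a crash
    while job.next_run_at <= now:
        job.next_run_at += job.interval_s
    persist(job)


def tick(jobs: List[Job], now: float,
         run_job: Callable[[Job], None],
         persist: Callable[[Job], None]) -> None:
    for job in jobs:
        if job.next_run_at > now:
            continue
        # Persist the new schedule BEFORE running: a crash inside run_job
        # can no longer make the same occurrence look due on restart.
        advance_next_run(job, now, persist)
        run_job(job)


# Usage: simulate a crash mid-job, then a restart that reloads saved state.
saved = {}


def crash(job: Job) -> None:
    raise RuntimeError("gateway died mid-job")


try:
    tick([Job("daily-report", 86400.0, 100.0)], now=150.0, run_job=crash,
         persist=lambda j: saved.update({j.name: j.next_run_at}))
except RuntimeError:
    pass

reloaded = Job("daily-report", 86400.0, saved["daily-report"])
ran = []
tick([reloaded], now=160.0, run_job=lambda j: ran.append(j.name),
     persist=lambda j: None)
print(saved)  # {'daily-report': 86500.0}
print(ran)    # []
```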

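The null-coalescing fix in be416cdfa9 (#3377) rests on a Python detail: `dict.get(key, default)` falls back only when the key is absent. The sketch below is a minimal illustration of the difference. It assumes a config dict parsed from YAML in which a `tts_provider` key was written as `~`. The key name, the `"default"` fallback, and both helper functions are hypothetical and are not the code in tools/tts_tool.py.

```python
# Hypothetical config, as if parsed from YAML containing "tts_provider: ~".
config = {"tts_provider": None}


def get_provider_buggy(cfg: dict) -> str:
    # The key exists, so .get() returns the stored None, not "default".
    return cfg.get("tts_provider", "default").lower()


def get_provider(cfg: dict) -> str:
    # `or` coalesces both a missing key and an explicit null.
    return (cfg.get("tts_provider") or "default").lower()


try:
    get_provider_buggy(config)
except AttributeError as exc:
    print("buggy:", exc)  # 'NoneType' object has no attribute 'lower'

print(get_provider(config))                      # default
print(get_provider({}))                          # default
print(get_provider({"tts_provider": "OpenAI"}))  # openai
```

One side effect of the `or` pattern: it also replaces other falsy values, such as an empty string, with the fallback. For string settings like these, that is usually the desired behavior.
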
1069 Commits