hermes-agent

Author	SHA1	Message	Date
Teknium	eb7c408445	fix(gateway): /stop and /new bypass Level 1 active-session guard (#5765 ) * fix(gateway): /stop and /new bypass Level 1 active-session guard The base adapter's Level 1 guard intercepted ALL messages while an agent was running, including /stop and /new. These commands were queued as pending messages instead of being dispatched to the gateway runner's Level 2 handler. When the agent eventually stopped (via the interrupt mechanism), the command text leaked into the conversation as a user message — the model would receive '/stop' as input and respond to it. Fix: Add /stop, /new, and /reset to the bypass set in base.py alongside /approve, /deny, and /status. Consolidate the three separate bypass blocks into one. Commands in the bypass set are dispatched inline to the gateway runner, where Level 2 handles them correctly (hard-kill for /stop, session reset for /new). Also add a safety net in _run_agent's pending-message processing: if the pending text resolves to a known slash command, discard it instead of passing it to the agent. This catches edge cases where command text leaks through the interrupt_message fallback. Refs: #5244 * test: regression tests for command bypass of active-session guard 17 tests covering: - /stop, /new, /reset bypass the Level 1 guard when agent is running - /approve, /deny, /status bypass (existing behavior, now tested) - Regular text and unknown commands still queued (not bypassed) - File paths like '/path/to/file' not treated as commands - Telegram @botname suffix handled correctly - Safety net command resolution (resolve_command detects known commands)	2026-04-07 00:53:45 -07:00
Teknium	8dee82ea1e	fix: stream consumer creates new message after tool boundaries (#5739 ) When streaming was enabled on the gateway, the stream consumer created a single message at the start and kept editing it as tokens arrived. Tool progress messages were sent as separate messages below it. Since edits don't change message position on Telegram/Matrix/Discord, the final response ended up stuck above all tool progress messages — users had to scroll up past potentially dozens of tool call lines to read the answer. The agent already sends stream_delta_callback(None) at tool boundaries (before _execute_tool_calls). The stream consumer was ignoring this signal. Now it treats None as a segment break: finalizes the current message (removes cursor), resets _message_id, and the next text chunk creates a fresh message below the tool progress messages. Timeline before: [msg 1: 'Let me search...' → edits → 'Here is the answer'] ← top [msg 2: tool progress lines] ← bottom Timeline after: [msg 1: 'Let me search...'] ← top [msg 2: tool progress lines] [msg 3: 'Here is the answer'] ← bottom (visible) Reported by SkyLinx on Discord.	2026-04-06 23:00:14 -07:00
eizus	4ec615b0c2	feat(gateway): Enable Slack thread replies without explicit @mentions When a user replies in a Slack thread where the bot has an active conversation session, the bot now processes the message even without an explicit @mention. This improves UX for ongoing threaded discussions. Changes: - Added set_session_store() to BasePlatformAdapter for adapters to check active sessions - Modified SlackAdapter to detect thread replies and check if a session exists for that thread before requiring @mentions - Updated GatewayRunner to inject the session store into adapters - Added comprehensive tests for the new behavior Fixes: Thread replies without @jarvis are now processed if there is an active session, matching user expectations for conversation flow	2026-04-06 21:27:16 -07:00
jtuki	57abc99315	feat(gateway): add per-group access control for Feishu Add fine-grained authorization policies per Feishu group chat via platforms.feishu.extra configuration. - Add global bot-level admins that bypass all group restrictions - Add per-group policies: open, allowlist, blacklist, admin_only, disabled - Add default_group_policy fallback for chats without explicit rules - Thread chat_id through group message gate for per-chat rule selection - Match both open_id and user_id for backward compatibility - Preserve existing FEISHU_ALLOWED_USERS / FEISHU_GROUP_POLICY behavior - Add focused regression tests for all policy modes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	18727ca9aa	refactor(gateway): simplify Feishu websocket config helpers Consolidate coercion functions, extract loop readiness check, and deduplicate test mock setup to improve maintainability without changing behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	157d6184e3	fix(gateway): make Feishu websocket overrides effective at runtime Reapply local reconnect and ping settings after the Feishu SDK refreshes its client config so user-provided websocket tuning actually takes effect. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	ea31d9077c	feat(gateway): add Feishu websocket ping timing overrides Allow Feishu websocket keepalive timing to be configured via platform extra config so disconnects can be detected faster in unstable networks. New optional extra settings: - ws_ping_interval - ws_ping_timeout These values are applied only when explicitly configured. Invalid values fall back to the websocket library defaults by leaving the options unset. This complements the reconnect timing settings added previously and helps reduce total recovery time after network interruptions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	7d0bf15121	feat(gateway): add configurable Feishu websocket reconnect timing Allow users to configure websocket reconnect behavior via platform extra config to reduce reconnect latency in production environments. The official Feishu SDK defaults to: - First reconnect: random jitter 0-30 seconds - Subsequent retries: 120 second intervals This can cause 20-30 second delays before reconnection after network interruptions. This commit makes these values configurable while keeping the SDK defaults for backward compatibility. Configuration via ~/.hermes/config.yaml: ```yaml platforms: feishu: extra: ws_reconnect_nonce: 0 # Disable first-reconnect jitter (default: 30) ws_reconnect_interval: 3 # Retry every 3 seconds (default: 120) ``` Invalid values (negative numbers, non-integers) fall back to SDK defaults. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	7cf4bd06bf	fix(gateway): fix Feishu reconnect message drops and shutdown hang This commit fixes two critical bugs in the Feishu adapter that affect message reliability and process lifecycle. Bug Fix 1: Intermittent Message Drops Root cause: Event handler was created once in __init__ and reused across reconnects, causing callbacks to capture stale loop references. When the adapter disconnected and reconnected, old callbacks continued firing with invalid loop references, resulting in dropped messages with warnings: "[Feishu] Dropping inbound message before adapter loop is ready" Fix: - Rebuild event handler on each connect (websocket/webhook) - Clear handler on disconnect - Ensure callbacks always capture current valid loop - Add defensive loop.is_closed() checks with getattr for test compatibility - Unify webhook dispatch path to use same loop checks as websocket mode Bug Fix 2: Process Hangs on Ctrl+C / SIGTERM Root cause: Feishu SDK's websocket client runs in a background thread with an infinite _select() loop that never exits naturally. The thread was never properly joined on disconnect, causing processes to hang indefinitely after Ctrl+C or gateway stop commands. Fix: - Store reference to thread-local event loop (_ws_thread_loop) - On disconnect, cancel all tasks in thread loop and stop it gracefully via call_soon_threadsafe() - Await thread future with 10s timeout - Clean up pending tasks in thread's finally block before closing loop - Add detailed debug logging for disconnect flow Additional Improvements: - Add regression tests for disconnect cleanup and webhook dispatch - Ensure all event callbacks check loop readiness before dispatching Tested on Linux with websocket mode. All Feishu tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
kshitijk4poor	05f9267938	fix(matrix): hard-fail E2EE when python-olm missing + stable MATRIX_DEVICE_ID Two issues caused Matrix E2EE to silently not work in encrypted rooms: 1. When matrix-nio is installed without the [e2e] extra (no python-olm / libolm), nio.crypto.ENCRYPTION_ENABLED is False and client.olm is never initialized. The adapter logged warnings but returned True from connect(), so the bot appeared online but could never decrypt messages. Now: check_matrix_requirements() and connect() both hard-fail with a clear error message when MATRIX_ENCRYPTION=true but E2EE deps are missing. 2. Without a stable device_id, the bot gets a new device identity on each restart. Other clients see it as "unknown device" and refuse to share Megolm session keys. Now: MATRIX_DEVICE_ID env var lets users pin a stable device identity that persists across restarts and is passed to nio.AsyncClient constructor + restore_login(). Changes: - gateway/platforms/matrix.py: add _check_e2ee_deps(), hard-fail in connect() and check_matrix_requirements(), MATRIX_DEVICE_ID support in constructor + restore_login - gateway/config.py: plumb MATRIX_DEVICE_ID into platform extras - hermes_cli/config.py: add MATRIX_DEVICE_ID to OPTIONAL_ENV_VARS Closes #3521	2026-04-06 16:54:16 -07:00
WAXLYY	1c0183ec71	fix(gateway): sanitize media URLs in base platform logs	2026-04-06 16:50:05 -07:00
ryanautomated	0f9aa57069	fix: silent memory flush failure on /new and /resume commands The _async_flush_memories() helper accepts (session_id) but both the /new and /resume handlers passed two arguments (session_id, session_key). The TypeError was silently swallowed at DEBUG level, so memory extraction never ran when users typed /new or /resume. One call site (the session expiry watcher) was already fixed in `9c96f669`, but /new and /resume were missed. - gateway/run.py:3247 — remove stray session_key from /new handler - gateway/run.py:4989 — remove stray session_key from /resume handler - tests/gateway/test_resume_command.py:222 — update test assertion	2026-04-06 16:49:42 -07:00
Nick	4f03b9a419	feat(webhook): add {__raw__} template token and thread_id passthrough for forum topics - {__raw__} in webhook prompt templates dumps the full JSON payload (truncated at 4000 chars) - _deliver_cross_platform now passes thread_id/message_thread_id from deliver_extra as metadata, enabling Telegram forum topic delivery - Tests for both features	2026-04-06 16:42:52 -07:00
Mikita Lisavets	9afb9a6cb2	fix: clear session-scoped model overrides during session reset	2026-04-06 13:20:01 -07:00
Dusk1e	1a2f109d8e	Ensure atomic writes for gateway channel directory cache to prevent truncation	2026-04-06 13:20:01 -07:00
kshitijk4poor	5e88eb2ba0	fix(signal): implement send_image_file, send_voice, and send_video for MEDIA: tag delivery The Signal adapter inherited base class defaults for send_image_file(), send_voice(), and send_video() which only sent the file path as text (e.g. '🖼️ Image: /tmp/chart.png') instead of actually delivering the file as a Signal attachment. When agent responses contain MEDIA:/path/to/file tags, the gateway media pipeline extracts them and routes through these methods by file type. Without proper overrides, image/audio/video files were never actually delivered to Signal users. Extract a shared _send_attachment() helper that handles all file validation, size checking, group/DM routing, and RPC dispatch. The four public methods (send_document, send_image_file, send_voice, send_video) now delegate to this helper, following the same pattern used by WhatsApp (_send_media_to_bridge) and Discord (_send_file_attachment). The helper also uses a single stat() call with try/except FileNotFoundError instead of the previous exists() + stat() two-syscall pattern, eliminating a TOCTOU race. As a bonus, send_document() now gains the 100MB size check that was previously missing (inconsistency with send_image). Add 25 tests covering all methods plus MEDIA: tag extraction integration, method-override guards, and send_document's new size check. Fixes #5105	2026-04-06 11:41:34 -07:00
Teknium	89c812d1d2	feat: shared thread sessions by default — multi-user thread support (#5391 ) Threads (Telegram forum topics, Discord threads, Slack threads) now default to shared sessions where all participants see the same conversation. This is the expected UX for threaded conversations where multiple users @mention the bot and interact collaboratively. Changes: - build_session_key(): when thread_id is present, user_id is no longer appended to the session key (threads are shared by default) - New config: thread_sessions_per_user (default: false) — opt-in to restore per-user isolation in threads if needed - Sender attribution: messages in shared threads are prefixed with [sender name] so the agent can tell participants apart - System prompt: shared threads show 'Multi-user thread' note instead of a per-turn User line (avoids busting prompt cache) - Wired through all callers: gateway/run.py, base.py, telegram.py, feishu.py - Regular group messages (no thread) remain per-user isolated (unchanged) - DM threads are unaffected (they have their own keying logic) Closes community request from demontut_ re: thread-based shared sessions.	2026-04-05 19:46:58 -07:00
Teknium	8d5226753f	fix: add missing ButtonStyle.grey to discord mock for test compatibility	2026-04-05 12:42:47 -07:00
Abhey	66d0fa1778	fix: avoid unnecessary Discord members intent on startup Only request the privileged members intent when DISCORD_ALLOWED_USERS includes non-numeric entries that need username resolution. Also release the Discord token lock when startup fails so retries and restarts are not blocked by a stale lock.\n\nAdds regression tests for conditional intents and startup lock cleanup.	2026-04-05 12:42:47 -07:00
MichaelWDanko	c6793d6fc3	fix(gateway): wrap cron helpers with staticmethod to prevent self-binding Plain functions imported as class attributes in APIServerAdapter get auto-bound as methods via Python's descriptor protocol. Every self._cron_() call injected self as the first positional argument, causing TypeError on all 8 cron API endpoints at runtime. Wrap each import with staticmethod() so self._cron_() calls dispatch correctly without modifying any call sites. Co-authored-by: teknium <teknium@nousresearch.com>	2026-04-05 12:31:10 -07:00
Mibayy	cc2b56b26a	feat(api): structured run events via /v1/runs SSE endpoint Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events for SSE streaming of typed lifecycle events (tool.started, tool.completed, message.delta, reasoning.available, run.completed, run.failed). Changes the internal tool_progress_callback signature from positional (tool_name, preview, args) to event-type-first (event_type, tool_name, preview, args, **kwargs). Existing consumers filter on event_type and remain backward-compatible. Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep. Fixes logic inversion in cli.py _on_tool_progress where the original PR would have displayed internal tools instead of non-internal ones. Co-authored-by: Mibayy <mibayy@users.noreply.github.com>	2026-04-05 12:05:13 -07:00
analista	e8053e8b93	fix(gateway): surface unknown /commands instead of leaking them to the LLM Previously, typing a /command that isn't a built-in, plugin, or skill would silently fall through to the LLM as plain text. The model often interprets it as a loose instruction and invents unrelated tool calls — e.g. a stray /claude_code slipped through and the model fabricated a delegate_task invocation that got stuck in an OAuth loop. Now we check GATEWAY_KNOWN_COMMANDS after the skill / plugin / unavailable-skill lookups and return an actionable message pointing the user at /commands. The user gets feedback, and the agent doesn't waste a round-trip guessing what /foo-bar was supposed to mean.	2026-04-05 11:59:28 -07:00
Damian P	afccbf253c	fix: resolve listed messaging targets consistently	2026-04-05 11:59:28 -07:00
kshitijk4poor	1d2e34c7eb	Prevent Telegram polling handoffs and flood-control send failures Telegram polling can inherit a stale webhook registration when a deployment switches transport modes, which leaves getUpdates idle even though the gateway starts cleanly. Outbound send also treats Telegram retry_after responses as terminal errors, so brief flood control can drop tool progress and replies. Constraint: Keep the PR narrowly scoped to upstream/main Telegram adapter behavior Rejected: Port OpenClaw's broader polling supervisor and offset persistence \| too broad for an isolated fix PR Confidence: high Scope-risk: narrow Reversibility: clean Directive: Polling mode should clear webhook state before starting getUpdates, and send-path retry logic must distinguish flood control from timeouts Tested: uv run --extra dev pytest tests/gateway/test_telegram_* -q Not-tested: Live Telegram webhook-to-polling migration and real Bot API 429 behavior	2026-04-05 11:59:28 -07:00
teyrebaz33	6df0f07ff3	fix: /status command bypasses active-session guard during agent run (#5046 ) When an agent was actively processing a message, /status sent via Telegram (or any gateway) was queued as a pending interrupt instead of being dispatched immediately. The base platform adapter's handle_message() only had special-case bypass logic for /approve and /deny, so /status fell through to the default interrupt path and was never processed as a system command. Apply the same bypass pattern used by /approve//deny: detect cmd == 'status' inside the active-session guard, dispatch directly to the message handler, and send the response without touching session lifecycle or interrupt state. Adds a regression test that verifies /status is dispatched and responded to immediately even when _active_sessions contains an entry for the session.	2026-04-05 11:59:28 -07:00
nepenth	534511bebb	feat(matrix): Tier 1 enhancement — reactions, read receipts, rich formatting, room management Cherry-picked from PR #4338 by nepenth, resolved against current main. Adds: - Processing lifecycle reactions (eyes/checkmark/cross) via MATRIX_REACTIONS env - Reaction send/receive with ReactionEvent + UnknownEvent fallback for older nio - Fire-and-forget read receipts on text and media messages - Message redaction, room history fetch, room creation, user invite - Presence status control (online/offline/unavailable) - Emote (/me) and notice message types with HTML rendering - XSS-hardened markdown-to-HTML converter (strips raw HTML preprocessor, sanitizes link URLs against javascript:/data:/vbscript: schemes) - Comprehensive regex fallback with full block/inline markdown support - Markdown>=3.6 added to [matrix] extras in pyproject.toml - 46 new tests covering all features and security hardening	2026-04-05 11:19:54 -07:00
chalkers	bec02f3731	fix(matrix): handle encrypted media events and cache decrypted attachments Cherry-picked from PR #3140 by chalkers, resolved against current main. Registers RoomEncryptedImage/Audio/Video/File callbacks, decrypts attachments via nio.crypto, caches all media types (images, audio, documents), prevents ciphertext URL fallback for encrypted media. Unifies the separate voice-message download into the main cache block. Preserves main's MATRIX_REQUIRE_MENTION, auto-thread, and mention stripping features. Includes 355 lines of encrypted media tests.	2026-04-05 11:07:47 -07:00
binhnt92	b65e67545a	fix(gateway): stop Matrix/Mattermost reconnect on permanent auth failures Cherry-picked from PR #3695 by binhnt92. Matrix _sync_loop() and Mattermost _ws_loop() were retrying all errors forever, including permanent auth failures (expired tokens, revoked access). Now detects M_UNKNOWN_TOKEN, M_FORBIDDEN, 401/403 and stops instead of spinning. Includes 216 lines of tests.	2026-04-05 11:07:47 -07:00
Teknium	0c54da8aaf	feat(gateway): live-stream /update output + interactive prompt buttons (#5180 ) * feat(gateway): live-stream /update output + forward interactive prompts Adds real-time output streaming and interactive prompt forwarding for the gateway /update command, so users on Telegram/Discord/etc see the full update progress and can respond to prompts (stash restore, config migration) without needing terminal access. Changes: hermes_cli/main.py: - Add --gateway flag to 'hermes update' argparse - Add _gateway_prompt() file-based IPC function that writes .update_prompt.json and polls for .update_response - Modify _restore_stashed_changes() to accept optional input_fn parameter for gateway mode prompt forwarding - cmd_update() uses _gateway_prompt when --gateway is set, enabling interactive stash restore and config migration prompts gateway/run.py: - _handle_update_command: spawn with --gateway flag and PYTHONUNBUFFERED=1 for real-time output flushing - Store session_key in .update_pending.json for cross-restart session matching - Add _update_prompt_pending dict to track sessions awaiting update prompt responses - Replace _watch_for_update_completion with _watch_update_progress: streams output chunks every ~4s, detects .update_prompt.json and forwards prompts to the user, handles completion/failure/timeout - Add update prompt interception in _handle_message: when a prompt is pending, the user's next message is written to .update_response instead of being processed normally - Preserve _send_update_notification as legacy fallback for post-restart cases where adapter isn't available yet File-based IPC protocol: - .update_prompt.json: written by update process with prompt text, default value, and unique ID - .update_response: written by gateway with user's answer - .update_output.txt: existing, now streamed in real-time - .update_exit_code: existing completion marker Tests: 16 new tests covering _gateway_prompt IPC, output streaming, prompt detection/forwarding, message interception, and cleanup. * feat: interactive buttons for update prompts (Telegram + Discord) Telegram: Inline keyboard with ✓ Yes / ✗ No buttons. Clicking a button answers the callback query, edits the message to show the choice, and writes .update_response directly. CallbackQueryHandler registered on the update_prompt: prefix. Discord: UpdatePromptView (discord.ui.View) with green Yes / red No buttons. Follows the ExecApprovalView pattern — auth check, embed color update, disabled-after-click. Writes .update_response on click. All platforms: /approve and /deny (and /yes, /no) now work as shorthand for yes/no when an update prompt is pending. The text fallback message instructs users to use these commands. Raw message interception still works as a fallback for non-command responses. Gateway watcher checks adapter for send_update_prompt method (class-level check to avoid MagicMock false positives) and falls back to text prompt with /approve instructions when unavailable. * fix: block /update on non-messaging platforms (API, webhooks, ACP) Add _UPDATE_ALLOWED_PLATFORMS frozenset that explicitly lists messaging platforms where /update is permitted. API server, webhook, and ACP platforms get a clear error directing them to run hermes update from the terminal instead. ACP and API server already don't reach _handle_message (separate codepaths), and webhooks have distinct session keys that can't collide with messaging sessions. This guard is belt-and-suspenders.	2026-04-05 00:28:58 -07:00
Teknium	85cefc7a5a	fix(telegram): prevent duplicate message delivery on send timeout (#5153 ) TimedOut is a subclass of NetworkError in python-telegram-bot. The inner retry loop in send() and the outer _send_with_retry() in base.py both treated it as a transient connection error and retried — but send_message is not idempotent. When the request reaches Telegram but the HTTP response times out, the message is already delivered. Retrying sends duplicates. Worst case: up to 9 copies (inner 3x × outer 3x). Inner loop (telegram.py): - Import TimedOut separately, isinstance-check before generic NetworkError retry (same pattern as BadRequest carve-out from #3390) - Re-raise immediately — no retry - Mark as retryable=False in outer exception handler Outer loop (base.py): - Remove 'timeout', 'timed out', 'readtimeout', 'writetimeout' from _RETRYABLE_ERROR_PATTERNS (read/write timeouts are delivery-ambiguous) - Add 'connecttimeout' (safe — connection never established) - Keep 'network' (other platforms still need it) - Add _is_timeout_error() + early return to prevent plain-text fallback on timeout errors (would also cause duplicate delivery) Connection errors (ConnectionReset, ConnectError, etc.) are still retried — these fail before the request reaches the server. Credit: tmdgusya (PR #3899), barun1997 (PR #3904) for identifying the bug and proposing fixes. Closes #3899, closes #3904.	2026-04-04 19:05:34 -07:00
Teknium	c8220e69a1	fix: strip MEDIA: directives from streamed gateway messages (#5152 ) When streaming is enabled, the GatewayStreamConsumer sends raw text chunks directly to the platform without post-processing. This causes MEDIA:/path/to/file tags and [[audio_as_voice]] directives to appear as visible text in the user's chat instead of being stripped. The non-streaming path already handles this correctly via extract_media() in base.py, but the streaming path was missing equivalent cleanup. Add _clean_for_display() to GatewayStreamConsumer that strips MEDIA: tags and internal markers before any text reaches the platform. The actual media file delivery is unaffected — _deliver_media_from_response() in gateway/run.py still extracts files from the agent's final_response (separate from the stream consumer's display text). Reported by Ao [FotM] on Discord.	2026-04-04 19:05:27 -07:00
Fran Fitzpatrick	2556cfdab1	fix(gateway): match Discord mention-stripping behavior in Matrix adapter Move mention stripping outside the `if not is_dm` guard so mentions are stripped in DMs too. Remove the bare-mention early return so a message containing only a mention passes through as empty string, matching Discord's behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 13:09:27 -07:00
Fran Fitzpatrick	d86be33161	feat(gateway): add MATRIX_REQUIRE_MENTION and MATRIX_AUTO_THREAD support Bring Matrix feature parity with Discord by adding mention gating and auto-threading. Both default to true, matching Discord behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 13:09:27 -07:00
LucidPaths	6367e1c4c0	fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness Bug fixes: - agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual env var name chars. Without this, the pattern backtracks exponentially on large strings (e.g. 100K tool output), causing test_file_read_guards to time out. - tools/file_operations.py: over-escaped newline in find -printf format string produced literal backslash-n instead of a real newline, breaking file search result parsing (total_count always 1, paths concatenated). Test fixes: - Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as 'Hangs in non-interactive environments' but actually run fine: - test_413_compression.py (12 tests, 25s) - test_file_tools_live.py (71 tests, 24s) - test_code_execution.py (61 tests, 99s) - test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already) - test_413_compression.py: fix threshold values in 2 preflight compression tests where context_length was too small for the compressed output to fit in one pass. - test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK. - test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when SDK is not installed so patch() targets exist. - test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling of _gateway_queues — fixes race condition where resolve fires before threads register their approval entries, causing the test to hang indefinitely. Net effect: +256 tests recovered from skip, 8 real failures fixed.	2026-04-04 10:18:57 -07:00
Teknium	d4bf517b19	test+docs: add group_topics tests and documentation - 7 new tests covering skill binding, fallthrough, coercion - Docs section in telegram.md with config format, field reference, comparison table, and thread_id discovery tip	2026-04-03 18:20:50 -07:00
Tranquil-Flow	3bfb39a25f	fix(gateway): isolate approval session key per turn	2026-04-03 17:50:01 -07:00
Teknium	b1756084a3	feat: add .zip document support and auto-mount cache dirs into remote backends (#4846 ) - Add .zip to SUPPORTED_DOCUMENT_TYPES so gateway platforms (Telegram, Slack, Discord) cache uploaded zip files instead of rejecting them. - Add get_cache_directory_mounts() and iter_cache_files() to credential_files.py for host-side cache directory passthrough (documents, images, audio, screenshots). - Docker: bind-mount cache dirs read-only alongside credentials/skills. Changes are live (bind mount semantics). - Modal: mount cache files at sandbox creation + resync before each command via _sync_files() with mtime+size change detection. - Handles backward-compat with legacy dir names (document_cache, image_cache, audio_cache, browser_screenshots) via get_hermes_dir(). - Container paths always use the new cache/<subdir> layout regardless of host layout. This replaces the need for a dedicated extract_archive tool (PR #4819) — the agent can now use standard terminal commands (unzip, tar) on uploaded files inside remote containers. Closes: related to PR #4819 by kshitijk4poor	2026-04-03 13:16:26 -07:00
Teknium	8a384628a5	fix(memory): profile-scoped memory isolation and clone support (#4845 ) Three fixes for memory+profile isolation bugs: 1. memory_tool.py: Replace module-level MEMORY_DIR constant with get_memory_dir() function that calls get_hermes_home() dynamically. The old constant was cached at import time and could go stale if HERMES_HOME changed after import. Internal MemoryStore methods now call get_memory_dir() directly. MEMORY_DIR kept as backward-compat alias. 2. profiles.py: profile create --clone now copies MEMORY.md and USER.md from the source profile. These curated memory files are part of the agent's identity (same as SOUL.md) and should carry over on clone. 3. holographic plugin: initialize() now expands $HERMES_HOME and ${HERMES_HOME} in the db_path config value, so users can write 'db_path: $HERMES_HOME/memory_store.db' and it resolves to the active profile directory, not the default home. Tests updated to mock get_memory_dir() alongside the legacy MEMORY_DIR.	2026-04-03 13:10:11 -07:00
Teknium	aecbf7fa4a	fix(discord): register /approve and /deny slash commands, wire up button-based approval UI (#4800 ) Two fixes for Discord exec approval: 1. Register /approve and /deny as native Discord slash commands so they appear in Discord's command picker (autocomplete). Previously they were only handled as text commands, so users saw 'no commands found' when typing /approve. 2. Wire up the existing ExecApprovalView button UI (was dead code): - ExecApprovalView now calls resolve_gateway_approval() to actually unblock the waiting agent thread when a button is clicked - Gateway's _approval_notify_sync() detects adapters with send_exec_approval() and routes through the button UI - Added 'Allow Session' button for parity with /approve session - send_exec_approval() now accepts session_key and metadata for thread support - Graceful fallback to text-based /approve prompt if button send fails Also updates test mocks to include grey/secondary ButtonStyle and purple Color (used by new button styles).	2026-04-03 10:24:07 -07:00
kshitijk4poor	7165eff901	fix(whatsapp): add free_response_chats, mention stripping, and interactive message unwrapping Address feature gaps vs Telegram/Discord/Mattermost adapters: - free_response_chats whitelist to bypass mention gating per-group - strip bot @phone mentions from body before forwarding to agent - unwrap templateMessage/buttonsMessage/listMessage in bridge - info-level log on successful mention pattern compilation - use module-level json import instead of inline import in config - eliminate double _normalize_whatsapp_id call via walrus operator - hoist botIds computation outside per-message loop in bridge	2026-04-03 01:16:39 -07:00
kshitijk4poor	714e4941b8	fix(whatsapp): enforce require_mention in group chats	2026-04-03 01:16:39 -07:00
Teknium	21c2d32471	fix(gateway): normalize step_callback prev_tools for backward compat The PR changed prev_tools from list[str] to list[dict] with name/result keys. The gateway's _step_callback_sync passed this directly to hooks as 'tool_names', breaking user-authored hooks that call ', '.join(tool_names). Now: - 'tool_names' always contains strings (backward-compatible) - 'tools' carries the enriched dicts for hooks that want results Also adds summary logging to register_mcp_servers() and comprehensive tests for all three PR changes: - sanitize_mcp_name_component edge cases - register_mcp_servers public API - _register_session_mcp_servers ACP integration - step_callback result forwarding - gateway normalization backward compat	2026-04-02 20:54:27 -07:00
Teknium	924bc67eee	feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623 ) * feat(memory): add pluggable memory provider interface with profile isolation Introduces a pluggable MemoryProvider ABC so external memory backends can integrate with Hermes without modifying core files. Each backend becomes a plugin implementing a standard interface, orchestrated by MemoryManager. Key architecture: - agent/memory_provider.py — ABC with core + optional lifecycle hooks - agent/memory_manager.py — single integration point in the agent loop - agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md Profile isolation fixes applied to all 6 shipped plugins: - Cognitive Memory: use get_hermes_home() instead of raw env var - Hindsight Memory: check $HERMES_HOME/hindsight/config.json first, fall back to legacy ~/.hindsight/ for backward compat - Hermes Memory Store: replace hardcoded ~/.hermes paths with get_hermes_home() for config loading and DB path defaults - Mem0 Memory: use get_hermes_home() instead of raw env var - RetainDB Memory: auto-derive profile-scoped project name from hermes_home path (hermes-<profile>), explicit env var overrides - OpenViking Memory: read-only, no local state, isolation via .env MemoryManager.initialize_all() now injects hermes_home into kwargs so every provider can resolve profile-scoped storage without importing get_hermes_home() themselves. Plugin system: adds register_memory_provider() to PluginContext and get_plugin_memory_providers() accessor. Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration). * refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider Remove cognitive-memory plugin (#727) — core mechanics are broken: decay runs 24x too fast (hourly not daily), prefetch uses row ID as timestamp, search limited by importance not similarity. Rewrite openviking-memory plugin from a read-only search wrapper into a full bidirectional memory provider using the complete OpenViking session lifecycle API: - sync_turn: records user/assistant messages to OpenViking session (threaded, non-blocking) - on_session_end: commits session to trigger automatic memory extraction into 6 categories (profile, preferences, entities, events, cases, patterns) - prefetch: background semantic search via find() endpoint - on_memory_write: mirrors built-in memory writes to the session - is_available: checks env var only, no network calls (ABC compliance) Tools expanded from 3 to 5: - viking_search: semantic search with mode/scope/limit - viking_read: tiered content (abstract ~100tok / overview ~2k / full) - viking_browse: filesystem-style navigation (list/tree/stat) - viking_remember: explicit memory storage via session - viking_add_resource: ingest URLs/docs into knowledge base Uses direct HTTP via httpx (no openviking SDK dependency needed). Response truncation on viking_read to prevent context flooding. * fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker - Remove redundant mem0_context tool (identical to mem0_search with rerank=true, top_k=5 — wastes a tool slot and confuses the model) - Thread sync_turn so it's non-blocking — Mem0's server-side LLM extraction can take 5-10s, was stalling the agent after every turn - Add threading.Lock around _get_client() for thread-safe lazy init (prefetch and sync threads could race on first client creation) - Add circuit breaker: after 5 consecutive API failures, pause calls for 120s instead of hammering a down server every turn. Auto-resets after cooldown. Logs a warning when tripped. - Track success/failure in prefetch, sync_turn, and all tool calls - Wait for previous sync to finish before starting a new one (prevents unbounded thread accumulation on rapid turns) - Clean up shutdown to join both prefetch and sync threads * fix(memory): enforce single external memory provider limit MemoryManager now rejects a second non-builtin provider with a warning. Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE external plugin provider is allowed at a time. This prevents tool schema bloat (some providers add 3-5 tools each) and conflicting memory backends. The warning message directs users to configure memory.provider in config.yaml to select which provider to activate. Updated all 47 tests to use builtin + one external pattern instead of multiple externals. Added test_second_external_rejected to verify the enforcement. * feat(memory): add ByteRover memory provider plugin Implements the ByteRover integration (from PR #3499 by hieuntg81) as a MemoryProvider plugin instead of direct run_agent.py modifications. ByteRover provides persistent memory via the brv CLI — a hierarchical knowledge tree with tiered retrieval (fuzzy text then LLM-driven search). Local-first with optional cloud sync. Plugin capabilities: - prefetch: background brv query for relevant context - sync_turn: curate conversation turns (threaded, non-blocking) - on_memory_write: mirror built-in memory writes to brv - on_pre_compress: extract insights before context compression Tools (3): - brv_query: search the knowledge tree - brv_curate: store facts/decisions/patterns - brv_status: check CLI version and context tree state Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped per profile). Binary resolution cached with thread-safe double-checked locking. All write operations threaded to avoid blocking the agent (curate can take 120s with LLM processing). * fix(memory): thread remaining sync_turns, fix holographic, add config key Plugin fixes: - Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread) - RetainDB: thread sync_turn (was blocking on HTTP POST) - Both: shutdown now joins sync threads alongside prefetch threads Holographic retrieval fixes: - reason(): removed dead intersection_key computation (bundled but never used in scoring). Now reuses pre-computed entity_residuals directly, moved role_content encoding outside the inner loop. - contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above 500 facts, only checks the most recently updated ones to avoid O(n^2) explosion (~125K comparisons at 500 is acceptable). Config: - Added memory.provider key to DEFAULT_CONFIG ("" = builtin only). No version bump needed (deep_merge handles new keys automatically). * feat(memory): extract Honcho as a MemoryProvider plugin Creates plugins/honcho-memory/ as a thin adapter over the existing honcho_integration/ package. All 4 Honcho tools (profile, search, context, conclude) move from the normal tool registry to the MemoryProvider interface. The plugin delegates all work to HonchoSessionManager — no Honcho logic is reimplemented. It uses the existing config chain: $HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars. Lifecycle hooks: - initialize: creates HonchoSessionManager via existing client factory - prefetch: background dialectic query - sync_turn: records messages + flushes to API (threaded) - on_memory_write: mirrors user profile writes as conclusions - on_session_end: flushes all pending messages This is a prerequisite for the MemoryManager wiring in run_agent.py. Once wired, Honcho goes through the same provider interface as all other memory plugins, and the scattered Honcho code in run_agent.py can be consolidated into the single MemoryManager integration point. * feat(memory): wire MemoryManager into run_agent.py Adds 8 integration points for the external memory provider plugin, all purely additive (zero existing code modified): 1. Init (~L1130): Create MemoryManager, find matching plugin provider from memory.provider config, initialize with session context 2. Tool injection (~L1160): Append provider tool schemas to self.tools and self.valid_tool_names after memory_manager init 3. System prompt (~L2705): Add external provider's system_prompt_block alongside existing MEMORY.md/USER.md blocks 4. Tool routing (~L5362): Route provider tool calls through memory_manager.handle_tool_call() before the catchall handler 5. Memory write bridge (~L5353): Notify external provider via on_memory_write() when the built-in memory tool writes 6. Pre-compress (~L5233): Call on_pre_compress() before context compression discards messages 7. Prefetch (~L6421): Inject provider prefetch results into the current-turn user message (same pattern as Honcho turn context) 8. Turn sync + session end (~L8161, ~L8172): sync_all() after each completed turn, queue_prefetch_all() for next turn, on_session_end() + shutdown_all() at conversation end All hooks are wrapped in try/except — a failing provider never breaks the agent. The existing memory system, Honcho integration, and all other code paths are completely untouched. Full suite: 7222 passed, 4 pre-existing failures. * refactor(memory): remove legacy Honcho integration from core Extracts all Honcho-specific code from run_agent.py, model_tools.py, toolsets.py, and gateway/run.py. Honcho is now exclusively available as a memory provider plugin (plugins/honcho-memory/). Removed from run_agent.py (-457 lines): - Honcho init block (session manager creation, activation, config) - 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools, _activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch, _honcho_prefetch, _honcho_save_user_observation, _honcho_sync - _inject_honcho_turn_context module-level function - Honcho system prompt block (tool descriptions, CLI commands) - Honcho context injection in api_messages building - Honcho params from __init__ (honcho_session_key, honcho_manager, honcho_config) - HONCHO_TOOL_NAMES constant - All honcho-specific tool dispatch forwarding Removed from other files: - model_tools.py: honcho_tools import, honcho params from handle_function_call - toolsets.py: honcho toolset definition, honcho tools from core tools list - gateway/run.py: honcho params from AIAgent constructor calls Removed tests (-339 lines): - 9 Honcho-specific test methods from test_run_agent.py - TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that were accidentally removed during the honcho function extraction. The honcho_integration/ package is kept intact — the plugin delegates to it. tools/honcho_tools.py registry entries are now dead code (import commented out in model_tools.py) but the file is preserved for reference. Full suite: 7207 passed, 4 pre-existing failures. Zero regressions. * refactor(memory): restructure plugins, add CLI, clean gateway, migration notice Plugin restructure: - Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/ (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb) - New plugins/memory/__init__.py discovery module that scans the directory directly, loading providers by name without the general plugin system - run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers() CLI wiring: - hermes memory setup — interactive curses picker + config wizard - hermes memory status — show active provider, config, availability - hermes memory off — disable external provider (built-in only) - hermes honcho — now shows migration notice pointing to hermes memory setup Gateway cleanup: - Remove _get_or_create_gateway_honcho (already removed in prev commit) - Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods - Remove all calls to shutdown methods (4 call sites) - Remove _honcho_managers/_honcho_configs dict references Dead code removal: - Delete tools/honcho_tools.py (279 lines, import was already commented out) - Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods) - Remove if False placeholder from run_agent.py Migration: - Honcho migration notice on startup: detects existing honcho.json or ~/.honcho/config.json, prints guidance to run hermes memory setup. Only fires when memory.provider is not set and not in quiet mode. Full suite: 7203 passed, 4 pre-existing failures. Zero regressions. * feat(memory): standardize plugin config + add per-plugin documentation Config architecture: - Add save_config(values, hermes_home) to MemoryProvider ABC - Honcho: writes to $HERMES_HOME/honcho.json (SDK native) - Mem0: writes to $HERMES_HOME/mem0.json - Hindsight: writes to $HERMES_HOME/hindsight/config.json - Holographic: writes to config.yaml under plugins.hermes-memory-store - OpenViking/RetainDB/ByteRover: env-var only (default no-op) Setup wizard (hermes memory setup): - Now calls provider.save_config() for non-secret config - Secrets still go to .env via env vars - Only memory.provider activation key goes to config.yaml Documentation: - README.md for each of the 7 providers in plugins/memory/<name>/ - Requirements, setup (wizard + manual), config reference, tools table - Consistent format across all providers The contract for new memory plugins: - get_config_schema() declares all fields (REQUIRED) - save_config() writes native config (REQUIRED if not env-var-only) - Secrets use env_var field in schema, written to .env by wizard - README.md in the plugin directory * docs: add memory providers user guide + developer guide New pages: - user-guide/features/memory-providers.md — comprehensive guide covering all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover). Each with setup, config, tools, cost, and unique features. Includes comparison table and profile isolation notes. - developer-guide/memory-provider-plugin.md — how to build a new memory provider plugin. Covers ABC, required methods, config schema, save_config, threading contract, profile isolation, testing. Updated pages: - user-guide/features/memory.md — replaced Honcho section with link to new Memory Providers page - user-guide/features/honcho.md — replaced with migration redirect to the new Memory Providers page - sidebars.ts — added both new pages to navigation * fix(memory): auto-migrate Honcho users to memory provider plugin When honcho.json or ~/.honcho/config.json exists but memory.provider is not set, automatically set memory.provider: honcho in config.yaml and activate the plugin. The plugin reads the same config files, so all data and credentials are preserved. Zero user action needed. Persists the migration to config.yaml so it only fires once. Prints a one-line confirmation in non-quiet mode. * fix(memory): only auto-migrate Honcho when enabled + credentialed Check HonchoClientConfig.enabled AND (api_key OR base_url) before auto-migrating — not just file existence. Prevents false activation for users who disabled Honcho, stopped using it (config lingers), or have ~/.honcho/ from a different tool. * feat(memory): auto-install pip dependencies during hermes memory setup Reads pip_dependencies from plugin.yaml, checks which are missing, installs them via pip before config walkthrough. Also shows install guidance for external_dependencies (e.g. brv CLI for ByteRover). Updated all 7 plugin.yaml files with pip_dependencies: - honcho: honcho-ai - mem0: mem0ai - openviking: httpx - hindsight: hindsight-client - holographic: (none) - retaindb: requests - byterover: (external_dependencies for brv CLI) * fix: remove remaining Honcho crash risks from cli.py and gateway cli.py: removed Honcho session re-mapping block (would crash importing deleted tools/honcho_tools.py), Honcho flush on compress, Honcho session display on startup, Honcho shutdown on exit, honcho_session_key AIAgent param. gateway/run.py: removed honcho_session_key params from helper methods, sync_honcho param, _honcho.shutdown() block. tests: fixed test_cron_session_with_honcho_key_skipped (was passing removed honcho_key param to _flush_memories_for_session). * fix: include plugins/ in pyproject.toml package list Without this, plugins/memory/ wouldn't be included in non-editable installs. Hermes always runs from the repo checkout so this is belt- and-suspenders, but prevents breakage if the install method changes. * fix(memory): correct pip-to-import name mapping for dep checks The heuristic dep.replace('-', '_') fails for packages where the pip name differs from the import name: honcho-ai→honcho, mem0ai→mem0, hindsight-client→hindsight_client. Added explicit mapping table so hermes memory setup doesn't try to reinstall already-installed packages. * chore: remove dead code from old plugin memory registration path - hermes_cli/plugins.py: removed register_memory_provider(), _memory_providers list, get_plugin_memory_providers() — memory providers now use plugins/memory/ discovery, not the general plugin system - hermes_cli/main.py: stripped 74 lines of dead honcho argparse subparsers (setup, status, sessions, map, peer, mode, tokens, identity, migrate) — kept only the migration redirect - agent/memory_provider.py: updated docstring to reflect new registration path - tests: replaced TestPluginMemoryProviderRegistration with TestPluginMemoryDiscovery that tests the actual plugins/memory/ discovery system. Added 3 new tests (discover, load, nonexistent). * chore: delete dead honcho_integration/cli.py and its tests cli.py (794 lines) was the old 'hermes honcho' command handler — nobody calls it since cmd_honcho was replaced with a migration redirect. Deleted tests that imported from removed code: - tests/honcho_integration/test_cli.py (tested _resolve_api_key) - tests/honcho_integration/test_config_isolation.py (tested CLI config paths) - tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py) Remaining honcho_integration/ files (actively used by the plugin): - client.py (445 lines) — config loading, SDK client creation - session.py (991 lines) — session management, queries, flush * refactor: move honcho_integration/ into the honcho plugin Moves client.py (445 lines) and session.py (991 lines) from the top-level honcho_integration/ package into plugins/memory/honcho/. No Honcho code remains in the main codebase. - plugins/memory/honcho/client.py — config loading, SDK client creation - plugins/memory/honcho/session.py — session management, queries, flush - Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py, plugin __init__.py, session.py cross-import, all tests - Removed honcho_integration/ package and pyproject.toml entry - Renamed tests/honcho_integration/ → tests/honcho_plugin/ * docs: update architecture + gateway-internals for memory provider system - architecture.md: replaced honcho_integration/ with plugins/memory/ - gateway-internals.md: replaced Honcho-specific session routing and flush lifecycle docs with generic memory provider interface docs * fix: update stale mock path for resolve_active_host after honcho plugin migration * fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore Review feedback from Honcho devs (erosika): P0 — Provider lifecycle: - Remove on_session_end() + shutdown_all() from run_conversation() tail (was killing providers after every turn in multi-turn sessions) - Add shutdown_memory_provider() method on AIAgent for callers - Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry Bug fixes: - Remove sync_honcho=False kwarg from /btw callsites (TypeError crash) - Fix doctor.py references to dead 'hermes honcho setup' command - Cache prefetch_all() before tool loop (was re-calling every iteration) ABC contract hardening (all backwards-compatible): - Add session_id kwarg to prefetch/sync_turn/queue_prefetch - Make on_pre_compress() return str (provider insights in compression) - Add *kwargs to on_turn_start() for runtime context - Add on_delegation() hook for parent-side subagent observation - Document agent_context/agent_identity/agent_workspace kwargs on initialize() (prevents cron corruption, enables profile scoping) - Fix docstring: single external provider, not multiple Honcho CLI restoration: - Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py with imports adapted to plugin path) - Restore full hermes honcho command with all subcommands (status, peer, mode, tokens, identity, enable/disable, sync, peers, --target-profile) - Restore auto-clone on profile creation + sync on hermes update - hermes honcho setup now redirects to hermes memory setup fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type - Wire on_delegation() in delegate_tool.py — parent's memory provider is notified with task+result after each subagent completes - Add skip_memory=True to cron scheduler (prevents cron system prompts from corrupting user representations — closes #4052) - Add skip_memory=True to gateway flush agent (throwaway agent shouldn't activate memory provider) - Fix ByteRover on_pre_compress() return type: None -> str * fix(honcho): port profile isolation fixes from PR #4632 Ports 5 bug fixes found during profile testing (erosika's PR #4632): 1. 3-tier config resolution — resolve_config_path() now checks $HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json (non-default profiles couldn't find shared host blocks) 2. Thread host=_host_key() through from_global_config() in cmd_setup, cmd_status, cmd_identity (--target-profile was being ignored) 3. Use bare profile name as aiPeer (not host key with dots) — Honcho's peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid 4. Wrap add_peers() in try/except — was fatal on new AI peers, killed all message uploads for the session 5. Gate Honcho clone behind --clone/--clone-all on profile create (bare create should be blank-slate) Also: sanitize assistant_peer_id via _sanitize_id() * fix(tests): add module cleanup fixture to test_cli_provider_resolution test_cli_provider_resolution._import_cli() wipes tools.*, cli, and run_agent from sys.modules to force fresh imports, but had no cleanup. This poisoned all subsequent tests on the same xdist worker — mocks targeting tools.file_tools, tools.send_message_tool, etc. patched the NEW module object while already-imported functions still referenced the OLD one. Caused ~25 cascade failures: send_message KeyError, process_registry FileNotFoundError, file_read_guards timeouts, read_loop_detection file-not-found, mcp_oauth None port, and provider_parity/codex_execution stale tool lists. Fix: autouse fixture saves all affected modules before each test and restores them after, matching the pattern in test_managed_browserbase_and_modal.py.	2026-04-02 15:33:51 -07:00
kshitijk4poor	20441cf2c8	fix(insights): persist token usage for non-CLI sessions	2026-04-02 10:47:13 -07:00
Teknium	624ad582a5	fix: make gateway approval block agent thread like CLI does (#4557 ) The gateway's dangerous command approval system was fundamentally broken: the agent loop continued running after a command was flagged, and the approval request only reached the user after the agent finished its entire conversation loop. By then the context was lost. This change makes the gateway approval mirror the CLI's synchronous behavior. When a dangerous command is detected: 1. The agent thread blocks on a threading.Event 2. The approval request is sent to the user immediately 3. The user responds with /approve or /deny 4. The event is signaled and the agent resumes with the real result The agent never sees 'approval_required' as a tool result. It either gets the command output (approved) or a definitive BLOCKED message (denied/timed out) — same as CLI mode. Queue-based design supports multiple concurrent approvals (parallel subagents via delegate_task, execute_code RPC handlers). Each approval gets its own _ApprovalEntry with its own threading.Event. /approve resolves the oldest (FIFO); /approve all resolves all at once. Changes: - tools/approval.py: Queue-based per-session blocking gateway approval (register/unregister callbacks, resolve with FIFO or all-at-once) - gateway/run.py: Register approval callback in run_sync(), remove post-loop pop_pending hack, /approve and /deny support 'all' flag - tests: 21 tests including parallel subagent E2E scenarios	2026-04-02 01:47:19 -07:00
Gary Chiu	8cb3596939	fix(gateway): seed DM thread sessions with parent transcript to preserve context	2026-04-02 01:33:53 -07:00
Teknium	16d9f58445	fix(gateway): persist memory flush state to prevent redundant re-flushes on restart (#4481 ) * fix: force-close TCP sockets on client cleanup, detect and recover dead connections When a provider drops connections mid-stream (e.g. OpenRouter outage), httpx's graceful close leaves sockets in CLOSE-WAIT indefinitely. These zombie connections accumulate and can prevent recovery without restarting. Changes: - _force_close_tcp_sockets: walks the httpx connection pool and issues socket.shutdown(SHUT_RDWR) + close() to force TCP RST on every socket when a client is closed, preventing CLOSE-WAIT accumulation - _cleanup_dead_connections: probes the primary client's pool for dead sockets (recv MSG_PEEK), rebuilds the client if any are found - Pre-turn health check at the start of each run_conversation call that auto-recovers with a user-facing status message - Primary client rebuild after stale stream detection to purge pool - User-facing messages on streaming connection failures: "Connection to provider dropped — Reconnecting (attempt 2/3)" "Connection failed after 3 attempts — try again in a moment" Made-with: Cursor * fix: pool entry missing base_url for openrouter, clean error messages - _resolve_runtime_from_pool_entry: add OPENROUTER_BASE_URL fallback when pool entry has no runtime_base_url (pool entries from auth.json credential_pool often omit base_url) - Replace Rich console.print for auth errors with plain print() to prevent ANSI escape code mangling through prompt_toolkit's stdout patch - Force-close TCP sockets on client cleanup to prevent CLOSE-WAIT accumulation after provider outages - Pre-turn dead connection detection with auto-recovery and user message - Primary client rebuild after stale stream detection - User-facing status messages on streaming connection failures/retries Made-with: Cursor * fix(gateway): persist memory flush state to prevent redundant re-flushes on restart The _session_expiry_watcher tracked flushed sessions in an in-memory set (_pre_flushed_sessions) that was lost on gateway restart. Expired sessions remained in sessions.json and were re-discovered every restart, causing redundant AIAgent runs that burned API credits and blocked the event loop. Fix: Add a memory_flushed boolean field to SessionEntry, persisted in sessions.json. The watcher sets it after a successful flush. On restart, the flag survives and the watcher skips already-flushed sessions. - Add memory_flushed field to SessionEntry with to_dict/from_dict support - Old sessions.json entries without the field default to False (backward compat) - Remove the ephemeral _pre_flushed_sessions set from SessionStore - Update tests: save/load roundtrip, legacy entry compat, auto-reset behavior	2026-04-01 12:05:02 -07:00
Teknium	7e91009018	fix: lazy-init SessionDB on adapter instance instead of per-request Reuse a single SessionDB across requests by caching on self._session_db with lazy initialization. Avoids creating a new SQLite connection per request when X-Hermes-Session-Id is used. Updated tests to set adapter._session_db directly instead of patching the constructor.	2026-04-01 11:41:32 -07:00
txchen	bf19623a53	feat(api-server): support X-Hermes-Session-Id header for session continuity Allow callers to pass X-Hermes-Session-Id in request headers to continue an existing conversation. When provided, history is loaded from SessionDB instead of the request body, and the session_id is echoed in the response header. Without the header, existing behavior is preserved (new uuid per request). This enables web UI clients to maintain thread continuity without modifying any session state themselves — the same mechanism the gateway uses for IM platforms (Telegram, Discord, etc.).	2026-04-01 11:41:32 -07:00
Nick	9a581bba50	fix(gateway): resume agent after /approve executes blocked command When a dangerous command was blocked and the user approved it via /approve, the command was executed but the agent loop had already exited — the agent never received the command output and the task died silently. Now _handle_approve_command sends immediate feedback to the user, then creates a synthetic continuation message with the command output and feeds it through _handle_message so the agent picks up where it left off. - Send command result to chat immediately via adapter.send() - Create synthetic MessageEvent with command + output as context - Spawn asyncio task to re-invoke agent via _handle_message - Return None (feedback already sent directly) - Add test for agent re-invocation after approval - Update existing approval tests for new return behavior	2026-04-01 01:38:55 -07:00

1 2 3 4 5 ...

309 Commits