hermes-agent

Author	SHA1	Message	Date
Teknium	eb7c408445	fix(gateway): /stop and /new bypass Level 1 active-session guard (#5765) * fix(gateway): /stop and /new bypass Level 1 active-session guard The base adapter's Level 1 guard intercepted ALL messages while an agent was running, including /stop and /new. These commands were queued as pending messages instead of being dispatched to the gateway runner's Level 2 handler. When the agent eventually stopped (via the interrupt mechanism), the command text leaked into the conversation as a user message — the model would receive '/stop' as input and respond to it. Fix: Add /stop, /new, and /reset to the bypass set in base.py alongside /approve, /deny, and /status. Consolidate the three separate bypass blocks into one. Commands in the bypass set are dispatched inline to the gateway runner, where Level 2 handles them correctly (hard-kill for /stop, session reset for /new). Also add a safety net in _run_agent's pending-message processing: if the pending text resolves to a known slash command, discard it instead of passing it to the agent. This catches edge cases where command text leaks through the interrupt_message fallback. Refs: #5244 * test: regression tests for command bypass of active-session guard 17 tests covering: - /stop, /new, /reset bypass the Level 1 guard when agent is running - /approve, /deny, /status bypass (existing behavior, now tested) - Regular text and unknown commands still queued (not bypassed) - File paths like '/path/to/file' not treated as commands - Telegram @botname suffix handled correctly - Safety net command resolution (resolve_command detects known commands)	2026-04-07 00:53:45 -07:00
Yang Zhi	9e844160f9	fix(credential_pool): auto-detect Z.AI endpoint via probe and cache The credential pool seeder and runtime credential resolver hardcoded api.z.ai/api/paas/v4 for all Z.AI keys. Keys on the Coding Plan (or CN endpoint) would hit the wrong endpoint, causing 401/429 errors on the first request even though a working endpoint exists. Add _resolve_zai_base_url() that: - Respects GLM_BASE_URL env var (no probe when explicitly set) - Probes all candidate endpoints (global, cn, coding-global, coding-cn) via detect_zai_endpoint() to find one that returns HTTP 200 - Caches the detected endpoint in provider state (auth.json) keyed on a SHA-256 hash of the API key so subsequent starts skip the probe - Falls back to the default URL if all probes fail Wire into both _seed_from_env() in the credential pool and resolve_api_key_provider_credentials() in the runtime resolver, matching the pattern from the kimi-coding fix (PR #5566). Fixes the same class of bug as #5561 but for the zai provider.	2026-04-07 00:00:08 -07:00
Teknium	8e64f795a1	fix: stale OAuth credentials block OpenRouter users on auto-detect (#5746) When resolve_runtime_provider is called with requested='auto' and auth.json has a stale active_provider (nous or openai-codex) whose OAuth refresh token has been revoked, the AuthError now falls through to the next provider in the chain (e.g. OpenRouter via env vars) instead of propagating to the user as a blocking error. When the user explicitly requested the OAuth provider, the error still propagates so they know to re-authenticate. Root cause: resolve_provider('auto') checks auth.json for an active OAuth provider before checking env vars. get_nous_auth_status() reports logged_in=True if any access_token exists (even expired), so the Nous path is taken. resolve_nous_runtime_credentials() then tries to refresh the token, fails with 'Refresh session has been revoked', and the AuthError bubbles up to the CLI bold-red display. Adds 3 tests: Nous fallthrough, Codex fallthrough, explicit-request still raises.	2026-04-06 23:01:43 -07:00
Mateus Scheuer Macedo	f2c11ff30c	fix(delegate): share credential pools with subagents + per-task leasing Cherry-picked from PR #5580 by MestreY0d4-Uninter. - Share parent's credential pool with child agents for key rotation - Leasing layer spreads parallel children across keys (least-loaded) - Thread-safe acquire_lease/release_lease in CredentialPool - Reverted sneaked-in tool-name restoration change (kept original getattr + isinstance guard pattern)	2026-04-06 23:01:11 -07:00
Teknium	8dee82ea1e	fix: stream consumer creates new message after tool boundaries (#5739) When streaming was enabled on the gateway, the stream consumer created a single message at the start and kept editing it as tokens arrived. Tool progress messages were sent as separate messages below it. Since edits don't change message position on Telegram/Matrix/Discord, the final response ended up stuck above all tool progress messages — users had to scroll up past potentially dozens of tool call lines to read the answer. The agent already sends stream_delta_callback(None) at tool boundaries (before _execute_tool_calls). The stream consumer was ignoring this signal. Now it treats None as a segment break: finalizes the current message (removes cursor), resets _message_id, and the next text chunk creates a fresh message below the tool progress messages. Timeline before: [msg 1: 'Let me search...' → edits → 'Here is the answer'] ← top [msg 2: tool progress lines] ← bottom Timeline after: [msg 1: 'Let me search...'] ← top [msg 2: tool progress lines] [msg 3: 'Here is the answer'] ← bottom (visible) Reported by SkyLinx on Discord.	2026-04-06 23:00:14 -07:00
Hermes Agent	4fc7f3eaa5	fix(memory): clean up supermemory provider threads	2026-04-06 22:15:58 -07:00
Hermes Agent	dc333388ec	docs(memory): add Supermemory PR draft and cleanup	2026-04-06 22:15:58 -07:00
Hermes Agent	76f19775c3	feat(memory): add Supermemory memory provider	2026-04-06 22:15:58 -07:00
eizus	4ec615b0c2	feat(gateway): Enable Slack thread replies without explicit @mentions When a user replies in a Slack thread where the bot has an active conversation session, the bot now processes the message even without an explicit @mention. This improves UX for ongoing threaded discussions. Changes: - Added set_session_store() to BasePlatformAdapter for adapters to check active sessions - Modified SlackAdapter to detect thread replies and check if a session exists for that thread before requiring @mentions - Updated GatewayRunner to inject the session store into adapters - Added comprehensive tests for the new behavior Fixes: Thread replies without @jarvis are now processed if there is an active session, matching user expectations for conversation flow	2026-04-06 21:27:16 -07:00
Teknium	adb418fb53	fix: cross-platform browser test path separators Use os.path.join for Windows install path so test passes on Linux (os.path.join uses / on Linux, \ on Windows).	2026-04-06 16:54:16 -07:00
jtuki	57abc99315	feat(gateway): add per-group access control for Feishu Add fine-grained authorization policies per Feishu group chat via platforms.feishu.extra configuration. - Add global bot-level admins that bypass all group restrictions - Add per-group policies: open, allowlist, blacklist, admin_only, disabled - Add default_group_policy fallback for chats without explicit rules - Thread chat_id through group message gate for per-chat rule selection - Match both open_id and user_id for backward compatibility - Preserve existing FEISHU_ALLOWED_USERS / FEISHU_GROUP_POLICY behavior - Add focused regression tests for all policy modes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	18727ca9aa	refactor(gateway): simplify Feishu websocket config helpers Consolidate coercion functions, extract loop readiness check, and deduplicate test mock setup to improve maintainability without changing behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	157d6184e3	fix(gateway): make Feishu websocket overrides effective at runtime Reapply local reconnect and ping settings after the Feishu SDK refreshes its client config so user-provided websocket tuning actually takes effect. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	ea31d9077c	feat(gateway): add Feishu websocket ping timing overrides Allow Feishu websocket keepalive timing to be configured via platform extra config so disconnects can be detected faster in unstable networks. New optional extra settings: - ws_ping_interval - ws_ping_timeout These values are applied only when explicitly configured. Invalid values fall back to the websocket library defaults by leaving the options unset. This complements the reconnect timing settings added previously and helps reduce total recovery time after network interruptions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	7d0bf15121	feat(gateway): add configurable Feishu websocket reconnect timing Allow users to configure websocket reconnect behavior via platform extra config to reduce reconnect latency in production environments. The official Feishu SDK defaults to: - First reconnect: random jitter 0-30 seconds - Subsequent retries: 120 second intervals This can cause 20-30 second delays before reconnection after network interruptions. This commit makes these values configurable while keeping the SDK defaults for backward compatibility. Configuration via ~/.hermes/config.yaml: ```yaml platforms: feishu: extra: ws_reconnect_nonce: 0 # Disable first-reconnect jitter (default: 30) ws_reconnect_interval: 3 # Retry every 3 seconds (default: 120) ``` Invalid values (negative numbers, non-integers) fall back to SDK defaults. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	7cf4bd06bf	fix(gateway): fix Feishu reconnect message drops and shutdown hang This commit fixes two critical bugs in the Feishu adapter that affect message reliability and process lifecycle. Bug Fix 1: Intermittent Message Drops Root cause: Event handler was created once in __init__ and reused across reconnects, causing callbacks to capture stale loop references. When the adapter disconnected and reconnected, old callbacks continued firing with invalid loop references, resulting in dropped messages with warnings: "[Feishu] Dropping inbound message before adapter loop is ready" Fix: - Rebuild event handler on each connect (websocket/webhook) - Clear handler on disconnect - Ensure callbacks always capture current valid loop - Add defensive loop.is_closed() checks with getattr for test compatibility - Unify webhook dispatch path to use same loop checks as websocket mode Bug Fix 2: Process Hangs on Ctrl+C / SIGTERM Root cause: Feishu SDK's websocket client runs in a background thread with an infinite _select() loop that never exits naturally. The thread was never properly joined on disconnect, causing processes to hang indefinitely after Ctrl+C or gateway stop commands. Fix: - Store reference to thread-local event loop (_ws_thread_loop) - On disconnect, cancel all tasks in thread loop and stop it gracefully via call_soon_threadsafe() - Await thread future with 10s timeout - Clean up pending tasks in thread's finally block before closing loop - Add detailed debug logging for disconnect flow Additional Improvements: - Add regression tests for disconnect cleanup and webhook dispatch - Ensure all event callbacks check loop readiness before dispatching Tested on Linux with websocket mode. All Feishu tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
Ruzzgar	abd24d381b	Implement comprehensive browser path discovery for Windows	2026-04-06 16:54:16 -07:00
Tianxiao	8a29b49036	fix(cli): handle CJK wide chars in TUI input height	2026-04-06 16:54:16 -07:00
kshitijk4poor	05f9267938	fix(matrix): hard-fail E2EE when python-olm missing + stable MATRIX_DEVICE_ID Two issues caused Matrix E2EE to silently not work in encrypted rooms: 1. When matrix-nio is installed without the [e2e] extra (no python-olm / libolm), nio.crypto.ENCRYPTION_ENABLED is False and client.olm is never initialized. The adapter logged warnings but returned True from connect(), so the bot appeared online but could never decrypt messages. Now: check_matrix_requirements() and connect() both hard-fail with a clear error message when MATRIX_ENCRYPTION=true but E2EE deps are missing. 2. Without a stable device_id, the bot gets a new device identity on each restart. Other clients see it as "unknown device" and refuse to share Megolm session keys. Now: MATRIX_DEVICE_ID env var lets users pin a stable device identity that persists across restarts and is passed to nio.AsyncClient constructor + restore_login(). Changes: - gateway/platforms/matrix.py: add _check_e2ee_deps(), hard-fail in connect() and check_matrix_requirements(), MATRIX_DEVICE_ID support in constructor + restore_login - gateway/config.py: plumb MATRIX_DEVICE_ID into platform extras - hermes_cli/config.py: add MATRIX_DEVICE_ID to OPTIONAL_ENV_VARS Closes #3521	2026-04-06 16:54:16 -07:00
WAXLYY	1c0183ec71	fix(gateway): sanitize media URLs in base platform logs	2026-04-06 16:50:05 -07:00
KangYu	b26e85bf9d	Fix compaction summary retries for temperature-restricted models	2026-04-06 16:49:57 -07:00
WAXLYY	c1818b7e9e	fix(tools): redact query secrets in send_message errors	2026-04-06 16:49:52 -07:00
ryanautomated	0f9aa57069	fix: silent memory flush failure on /new and /resume commands The _async_flush_memories() helper accepts (session_id) but both the /new and /resume handlers passed two arguments (session_id, session_key). The TypeError was silently swallowed at DEBUG level, so memory extraction never ran when users typed /new or /resume. One call site (the session expiry watcher) was already fixed in `9c96f669`, but /new and /resume were missed. - gateway/run.py:3247 — remove stray session_key from /new handler - gateway/run.py:4989 — remove stray session_key from /resume handler - tests/gateway/test_resume_command.py:222 — update test assertion	2026-04-06 16:49:42 -07:00
Myeongwon Choi	ea16949422	fix(cron): suppress delivery when [SILENT] appears anywhere in response Previously the scheduler checked startswith('[SILENT]'), so agents that appended [SILENT] after an explanation (e.g. 'N items filtered.\n\n[SILENT]') would still trigger delivery. Change the check to 'in' so the marker is caught regardless of position. Add test_silent_trailing_suppresses_delivery to cover this case.	2026-04-06 16:49:40 -07:00
Nick	4f03b9a419	feat(webhook): add {__raw__} template token and thread_id passthrough for forum topics - {__raw__} in webhook prompt templates dumps the full JSON payload (truncated at 4000 chars) - _deliver_cross_platform now passes thread_id/message_thread_id from deliver_extra as metadata, enabling Telegram forum topic delivery - Tests for both features	2026-04-06 16:42:52 -07:00
kshitijk4poor	9201370c7e	feat(plugins): prompt for required env vars during hermes plugins install Read requires_env from plugin.yaml after install and interactively prompt for any missing environment variables, saving them to ~/.hermes/.env. Supports two manifest formats: Simple (backwards-compatible): requires_env: - MY_API_KEY Rich (with metadata): requires_env: - name: MY_API_KEY description: "API key for Acme" url: "https://acme.com/keys" secret: true Already-set variables are skipped. Empty input skips gracefully. Secret values use getpass (hidden input). Ctrl+C aborts remaining prompts without error.	2026-04-06 16:37:53 -07:00
Siddharth Balyan	7b129636f0	feat(tools): add Firecrawl cloud browser provider (#5628) * feat(tools): add Firecrawl cloud browser provider Adds Firecrawl (https://firecrawl.dev) as a cloud browser provider alongside Browserbase and Browser Use. All browser tools route through Firecrawl's cloud browser via CDP when selected. - tools/browser_providers/firecrawl.py — FirecrawlProvider - tools/browser_tool.py — register in _PROVIDER_REGISTRY - hermes_cli/tools_config.py — add to onboarding provider picker - hermes_cli/setup.py — add to setup summary - hermes_cli/config.py — add FIRECRAWL_BROWSER_TTL config - website/docs/ — browser docs and env var reference Based on #4490 by @developersdigest. Co-Authored-By: Developers Digest <124798203+developersdigest@users.noreply.github.com> * refactor: simplify FirecrawlProvider.emergency_cleanup Use self._headers() and self._api_url() instead of duplicating env-var reads and header construction. * fix: recognize Firecrawl in subscription browser detection _resolve_browser_feature_state() now handles "firecrawl" as a direct browser provider (same pattern as "browser-use"), so hermes setup summary correctly shows "Browser Automation (Firecrawl)" instead of misreporting as "Local browser". Also fixes test_config_version_unchanged assertion (11 → 12). --------- Co-authored-by: Developers Digest <124798203+developersdigest@users.noreply.github.com>	2026-04-07 02:35:26 +05:30
Mikita Lisavets	9afb9a6cb2	fix: clear session-scoped model overrides during session reset	2026-04-06 13:20:01 -07:00
Ayman Kamal	96f85b03cd	fix: handle launchctl kickstart exit code 113 in launchd_start() launchctl kickstart returns exit code 113 ("Could not find service") when the plist exists but the job hasn't been bootstrapped into the runtime domain. The existing recovery path only caught exit code 3 ("unloaded"), causing an unhandled CalledProcessError. Exit code 113 means the same thing practically -- the service definition needs bootstrapping before it can be kicked. Add it to the same recovery path that already handles exit 3, matching the existing pattern in launchd_stop(). Follow-up: add a unit test covering the 113 recovery path.	2026-04-06 13:20:01 -07:00
Dusk1e	1a2f109d8e	Ensure atomic writes for gateway channel directory cache to prevent truncation	2026-04-06 13:20:01 -07:00
Awsh1	878b1d3d33	fix(cron): harden scheduler against path traversal and env leaks Cherry-picked from PR #5503 by Awsh1. - Validate ALL script paths (absolute, relative, tilde) against scripts_dir boundary - Add API-boundary validation in cronjob_tools.py - Move os.environ injections inside try block so finally cleanup always runs - Comprehensive regression tests for path containment bypass	2026-04-06 12:42:16 -07:00
Teknium	da02a4e283	fix: auxiliary client payment fallback — retry with next provider on 402 (#5599) When a user runs out of OpenRouter credits and switches to Codex (or any other provider), auxiliary tasks (compression, vision, web_extract) would still try OpenRouter first and fail with 402. Two fixes: 1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402 or a credit-related error, automatically retry with the next available provider in the auto-detection chain. Skips the depleted provider and tries Nous → Custom → Codex → API-key providers. 2. Remove hardcoded OpenRouter fallback: The old code fell back specifically to OpenRouter when auto/custom resolution returned no client. Now falls back to the full auto-detection chain, which handles any available provider — not just OpenRouter. Also extracts _get_provider_chain() as a shared function (replaces inline tuple in _resolve_auto and the new fallback), built at call time so test patches on _try_* functions remain visible. Adds 16 tests covering _is_payment_error(), _get_provider_chain(), _try_payment_fallback(), and call_llm() integration with 402 retry.	2026-04-06 12:41:40 -07:00
Teknium	8ffd44a6f9	feat(discord): register skills as native slash commands via shared gateway logic (#5603) Centralize the skill → slash command registration that Telegram already had in commands.py so Discord uses the exact same priority system, filtering, and cap enforcement: 1. Core/built-in commands (never trimmed) 2. Plugin commands (never trimmed) 3. Skill commands (fill remaining slots, alphabetical, only tier trimmed) Changes: hermes_cli/commands.py: - Rename _TG_NAME_LIMIT → _CMD_NAME_LIMIT (32 chars shared by both platforms) - Rename _clamp_telegram_names → _clamp_command_names (generic) - Extract _collect_gateway_skill_entries() — shared plugin + skill collection with platform filtering, name sanitization, description truncation, and cap enforcement - Refactor telegram_menu_commands() to use the shared helper - Add discord_skill_commands() that returns (name, desc, cmd_key) triples - Preserve _sanitize_telegram_name() for Telegram-specific name cleaning gateway/platforms/discord.py: - Call discord_skill_commands() from _register_slash_commands() - Create app_commands.Command per skill entry with cmd_key callback - Respect 100-command global Discord limit - Log warning when skills are skipped due to cap Backward-compat aliases preserved for _TG_NAME_LIMIT and _clamp_telegram_names. Tests: 9 new tests (7 Discord + 2 backward-compat), 98 total pass. Inspired by PR #5498 (sprmn24). Closes #5480.	2026-04-06 12:09:36 -07:00
Teknium	3d08a2fa1b	fix: extract MEDIA: tags from cron delivery before sending (#5598) The cron scheduler delivery path passed raw text including MEDIA: tags to _send_to_platform(), so media attachments were delivered as literal text instead of actual files. The send function already supports media_files= but the cron path never used it. Now calls BasePlatformAdapter.extract_media() to split media paths from text before sending, matching the gateway's normal message flow. Salvaged from PR #4877 by robert-hoffmann.	2026-04-06 11:42:44 -07:00
kshitijk4poor	5e88eb2ba0	fix(signal): implement send_image_file, send_voice, and send_video for MEDIA: tag delivery The Signal adapter inherited base class defaults for send_image_file(), send_voice(), and send_video() which only sent the file path as text (e.g. '🖼️ Image: /tmp/chart.png') instead of actually delivering the file as a Signal attachment. When agent responses contain MEDIA:/path/to/file tags, the gateway media pipeline extracts them and routes through these methods by file type. Without proper overrides, image/audio/video files were never actually delivered to Signal users. Extract a shared _send_attachment() helper that handles all file validation, size checking, group/DM routing, and RPC dispatch. The four public methods (send_document, send_image_file, send_voice, send_video) now delegate to this helper, following the same pattern used by WhatsApp (_send_media_to_bridge) and Discord (_send_file_attachment). The helper also uses a single stat() call with try/except FileNotFoundError instead of the previous exists() + stat() two-syscall pattern, eliminating a TOCTOU race. As a bonus, send_document() now gains the 100MB size check that was previously missing (inconsistency with send_image). Add 25 tests covering all methods plus MEDIA: tag extraction integration, method-override guards, and send_document's new size check. Fixes #5105	2026-04-06 11:41:34 -07:00
kshitijk4poor	214e60c951	fix: sanitize Telegram command names to strip invalid characters Telegram Bot API requires command names to contain only lowercase a-z, digits 0-9, and underscores. Skill/plugin names containing characters like +, /, @, or . caused set_my_commands to fail with Bot_command_invalid. Two-layer fix: - scan_skill_commands(): strip non-alphanumeric/non-hyphen chars from cmd_key at source, collapse consecutive hyphens, trim edges, skip names that sanitize to empty string - _sanitize_telegram_name(): centralized helper used by all 3 Telegram name generation sites (core commands, plugin commands, skill commands) with empty-name guard at each call site Closes #5534	2026-04-06 11:27:28 -07:00
ClintonEmok	f77be22c65	Fix #5211: Preserve dots in OpenCode Go model names OpenCode Go model names with dots (minimax-m2.7, glm-4.5, kimi-k2.5) were being mangled to hyphens (minimax-m2-7), causing HTTP 401 errors. Two code paths were affected: 1. model_normalize.py: opencode-go was incorrectly in DOT_TO_HYPHEN_PROVIDERS 2. run_agent.py: _anthropic_preserve_dots() did not check for opencode-go Fix: - Remove opencode-go from _DOT_TO_HYPHEN_PROVIDERS (dots are correct for Go) - Add opencode-go to _anthropic_preserve_dots() provider check - Add opencode.ai/zen/go to base_url fallback check - Add regression tests in tests/test_model_normalize.py Co-authored-by: jacob3712 <jacob3712@users.noreply.github.com>	2026-04-06 11:25:06 -07:00
Teknium	582dbbbbf7	feat: add grok to TOOL_USE_ENFORCEMENT_MODELS for direct xAI usage (#5595) Grok models (x-ai/grok-4.20-beta, grok-code-fast-1) now receive tool-use enforcement guidance, steering them to actually call tools instead of describing intended actions. Matches both OpenRouter (x-ai/grok-*) and direct xAI API usage.	2026-04-06 11:22:07 -07:00
Teknium	cc7136b1ac	fix: update Gemini model catalog + wire models.dev as live model source Follow-up for salvaged PR #5494: - Update model catalog to Gemini 3.x + Gemma 4 (drop deprecated 2.0) - Add list_agentic_models() to models_dev.py with noise filter - Wire models.dev into _model_flow_api_key_provider as primary source (static curated list serves as offline fallback) - Add gemini -> google mapping in PROVIDER_TO_MODELS_DEV - Fix Gemma 4 context lengths to 256K (models.dev values) - Update auxiliary model to gemini-3-flash-preview - Expand tests: 3.x catalog, context lengths, models.dev integration	2026-04-06 10:28:03 -07:00
Teknium	6dfab35501	feat(providers): add Google AI Studio (Gemini) as a first-class provider Cherry-picked from PR #5494 by kshitijk4poor. Adds native Gemini support via Google's OpenAI-compatible endpoint. Zero new dependencies.	2026-04-06 10:28:03 -07:00
Teknium	5747590770	fix: follow-up improvements for salvaged PR #5456 - SQLite write queue: thread-local connection pooling instead of creating+closing a new connection per operation - Prefetch threads: join previous batch before spawning new ones to prevent thread accumulation on rapid queue_prefetch() calls - Shutdown: join prefetch threads before stopping write queue - Add 73 tests covering _Client HTTP payloads, _WriteQueue crash recovery & connection reuse, _build_overlay deduplication, RetainDBMemoryProvider lifecycle/tools/prefetch/hooks, thread accumulation guard, and reasoning_level heuristic	2026-04-06 02:00:55 -07:00
kshitijk4poor	d3d5b895f6	refactor: simplify _get_service_pids — dedupe systemd scopes, fix self-import, harden launchd parsing - Loop over user/system scope args instead of duplicating the systemd block - Call get_launchd_label() directly instead of self-importing from hermes_cli.gateway - Validate launchd output by checking parts[2] matches expected label (skip header) - Add race-condition assumption docstring	2026-04-06 00:09:06 -07:00
kshitijk4poor	a2a9ad7431	fix: hermes update kills freshly-restarted gateway service After restarting a service-managed gateway (systemd/launchd), the stale-process sweep calls find_gateway_pids() which returns ALL gateway PIDs via ps aux — including the one just spawned by the service manager. The sweep kills it, leaving the user with a stopped gateway and a confusing 'Restart manually' message. Fix: add _get_service_pids() to query systemd MainPID and launchd PID for active gateway services, then exclude those PIDs from the sweep. Also add exclude_pids parameter to find_gateway_pids() and kill_gateway_processes() so callers can skip known service-managed PIDs. Adds 9 targeted tests covering: - _get_service_pids() for systemd, launchd, empty, and zero-PID cases - find_gateway_pids() exclude_pids filtering - cmd_update integration: service PID not killed after restart - cmd_update integration: manual PID killed while service PID preserved	2026-04-06 00:09:06 -07:00
Teknium	9c96f669a1	feat: centralized logging, instrumentation, hermes logs CLI, gateway noise fix (#5430) Adds comprehensive logging infrastructure to Hermes Agent across 4 phases: Phase 1 — Centralized logging - New hermes_logging.py with idempotent setup_logging() used by CLI, gateway, and cron - agent.log (INFO+) and errors.log (WARNING+) with RotatingFileHandler + RedactingFormatter - config.yaml logging: section (level, max_size_mb, backup_count) - All entry points wired (cli.py, main.py, gateway/run.py, run_agent.py) - Fixed debug_helpers.py writing to ./logs/ instead of ~/.hermes/logs/ Phase 2 — Event instrumentation - API calls: model, provider, tokens, latency, cache hit % - Tool execution: name, duration, result size (both sequential + concurrent) - Session lifecycle: turn start (session/model/provider/platform), compression (before/after) - Credential pool: rotation events, exhaustion tracking Phase 3 — hermes logs CLI command - hermes logs / hermes logs -f / hermes logs errors / hermes logs gateway - --level, --session, --since filters - hermes logs list (file sizes + ages) Phase 4 — Gateway bug fix + noise reduction - fix: _async_flush_memories() called with wrong arg count — sessions never flushed - Batched session expiry logs: 6 lines/cycle → 2 summary lines - Added inbound message + response time logging 75 new tests, zero regressions on the full suite.	2026-04-06 00:08:20 -07:00
Teknium	89db3aeb2c	fix(cron): add delivery guidance to cron prompt — stop send_message thrashing (#5444) Cron agents were burning iterations trying to use send_message (which is disabled via messaging toolset) because their prompts said things like 'send the report to Telegram'. The scheduler handles delivery automatically via the deliver setting, but nothing told the agent that. Add a delivery guidance hint to _build_job_prompt alongside the existing [SILENT] hint: tells agents their final response is auto-delivered and they should NOT use send_message. Before: only [SILENT] suppression hint After: delivery guidance ('do NOT use send_message') + [SILENT] hint	2026-04-05 23:58:45 -07:00
Teknium	d6ef7fdf92	fix(cron): replace wall-clock timeout with inactivity-based timeout (#5440) Port the gateway's inactivity-based timeout pattern (PR #5389) to the cron scheduler. The agent can now run for hours if it's actively calling tools or receiving stream tokens — only genuine inactivity (no activity for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout. This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which all hit the hard 600s wall-clock limit while actively working. Changes: - Replace flat future.result(timeout=N) with a polling loop that checks agent.get_activity_summary() every 5s (same pattern as gateway) - Timeout error now includes diagnostic info: last activity description, idle duration, current tool, iteration count - HERMES_CRON_TIMEOUT=0 means unlimited (no timeout) - Move sys.path.insert before repo-level imports to fix ModuleNotFoundError for hermes_time on stale gateway processes - Add time import needed by the polling loop - Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios	2026-04-05 23:49:42 -07:00
kshitijk4poor	38bcaa1e86	chore: remove langfuse doc, smoketest script, and installed-plugin test Made-with: Cursor	2026-04-05 23:31:29 -07:00
kshitijk4poor	f530ef1835	feat(plugins): pre_api_request/post_api_request with narrow payloads - Rename per-LLM-call hooks from pre_llm_request/post_llm_request for clarity vs pre_llm_call - Emit summary kwargs only (counts, usage dict from normalize_usage); keep env_var_enabled for HERMES_DUMP_REQUESTS - Add is_truthy_value/env_var_enabled to utils; wire hermes_cli.plugins._env_enabled through it - Update Langfuse local setup doc; add scripts/langfuse_smoketest.py and optional ~/.hermes plugin tests Made-with: Cursor	2026-04-05 23:31:29 -07:00
kshitijk4poor	9e820dda37	Add request-scoped plugin lifecycle hooks	2026-04-05 23:31:29 -07:00
Teknium	dce5f51c7c	feat: config structure validation — detect malformed YAML at startup (#5426) Add validate_config_structure() that catches common config.yaml mistakes: - custom_providers as dict instead of list (missing '-' in YAML) - fallback_model accidentally nested inside another section - custom_providers entries missing required fields (name, base_url) - Missing model section when custom_providers is configured - Root-level keys that look like misplaced custom_providers fields Surface these diagnostics at three levels: 1. Startup: print_config_warnings() runs at CLI and gateway module load, so users see issues before hitting cryptic errors 2. Error time: 'Unknown provider' errors in auth.py and model_switch.py now include config diagnostics with fix suggestions 3. Doctor: 'hermes doctor' shows a Config Structure section with all issues and fix hints Also adds a warning log in runtime_provider.py when custom_providers is a dict (previously returned None silently). Motivated by a Discord user who had malformed custom_providers YAML and got only 'Unknown Provider' with no guidance on what was wrong. 17 new tests covering all validation paths.	2026-04-05 23:31:20 -07:00
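
The inactivity-based timeout described in commit d6ef7fdf92 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the scheduler's actual code: `wait_with_inactivity_timeout` and the `seconds_since_activity` callback are hypothetical names standing in for the polling loop and for whatever idle time the scheduler derives from `agent.get_activity_summary()`. Only the 5s poll interval, the 600s default, and the rule that 0 means unlimited come from the commit message.

```python
from concurrent.futures import Future, wait
from typing import Any, Callable


def wait_with_inactivity_timeout(
    future: Future,
    seconds_since_activity: Callable[[], float],  # hypothetical idle-time probe
    inactivity_timeout: float = 600.0,            # 0 means unlimited
    poll_interval: float = 5.0,
) -> Any:
    """Block until `future` finishes; fail only after sustained inactivity."""
    while True:
        # Wait at most one poll interval; returns early once the job is done.
        done, _ = wait([future], timeout=poll_interval)
        if done:
            # Propagates the job's own result or exception unchanged.
            return future.result()
        idle = seconds_since_activity()
        if inactivity_timeout > 0 and idle >= inactivity_timeout:
            raise TimeoutError(f"no activity for {idle:.0f}s")
```

Unlike a flat `future.result(timeout=N)`, a job that keeps reporting activity is never interrupted here, however long it runs; the deadline is measured only against the idle time.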


1345 Commits
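
The `[SILENT]` change in commit ea16949422 amounts to swapping a prefix check for a substring check. The sketch below shows the difference; the function names `should_deliver_old` and `should_deliver_new` are hypothetical and chosen for illustration, and the real scheduler code may be structured differently.

```python
SILENT_MARKER = "[SILENT]"


def should_deliver_old(response: str) -> bool:
    # Previous behaviour: suppress only when the response starts with the marker.
    return not response.startswith(SILENT_MARKER)


def should_deliver_new(response: str) -> bool:
    # New behaviour: suppress when the marker appears anywhere in the response.
    return SILENT_MARKER not in response


trailing = "N items filtered.\n\n[SILENT]"
print(should_deliver_old(trailing))  # True: the old check still delivered
print(should_deliver_new(trailing))  # False: the new check suppresses delivery
```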