hermes-agent

Author	SHA1	Message	Date
Hermes Merge Bot	52ea3a8935	Merge PR #850	2026-04-16 02:09:00 -04:00
Hermes Merge Bot	43246d6cb4	Merge PR #852	2026-04-16 02:08:06 -04:00
Hermes Merge Bot	dff451081d	Merge PR #856	2026-04-16 02:05:42 -04:00
Hermes Merge Bot	5509b157c5	Merge PR #864	2026-04-16 02:05:05 -04:00
Hermes Merge Bot	fcc322fb81	Merge PR #867	2026-04-16 02:03:23 -04:00
Hermes Merge Bot	9bba9ecc40	Merge PR #866	2026-04-16 02:02:43 -04:00
Alexander Whitestone	3238cf4eb1	feat: Tool investigation report + Mem0 local provider (#842) ## Investigation Report - docs/tool-investigation-2026-04-15.md: Full report analyzing 414 tools from awesome-ai-tools. Top 5 recommendations with integration paths. - docs/plans/awesome-ai-tools-integration.md: Implementation tracking plan. ## Mem0 Local Provider (P1) - plugins/memory/mem0_local/: New ChromaDB-backed memory provider. No API key required - fully sovereign. Compatible tool schemas with cloud Mem0 (mem0_profile, mem0_search, mem0_conclude). - Pattern-based fact extraction from conversations. - Deterministic dedup via content hashing. - Circuit breaker for resilience. - tests/plugins/memory/test_mem0_local.py: Full test coverage. ## Issues Filed - #857: LightRAG integration (P2) - #858: n8n workflow orchestration (P3) - #859: RAGFlow document understanding (P4) - #860: tensorzero LLMOps evaluation (P3) Closes #842	2026-04-15 23:04:41 -04:00
Timmy	eed87e454e	test: Benchmark Gemma 4 vision accuracy vs current approach (#817) Vision benchmark suite comparing Gemma 4 (google/gemma-4-27b-it) vs current Gemini 3 Flash Preview (google/gemini-3-flash-preview). Metrics: - OCR accuracy (character + word overlap) - Description completeness (keyword coverage) - Structural quality (length, sentences, numbers) - Latency (ms per image) - Token usage - Consistency across runs Features: - 24 diverse test images (screenshots, diagrams, photos, charts) - Category-specific evaluation prompts - Automated verdict with composite scoring - JSON + markdown report output - 28 unit tests passing Usage: python benchmarks/vision_benchmark.py --images benchmarks/test_images.json python benchmarks/vision_benchmark.py --url https://example.com/img.png python benchmarks/vision_benchmark.py --generate-dataset Closes #817.	2026-04-15 23:02:02 -04:00
Alexander Whitestone	f03709aa29	test: crisis hook integration tests with agent loop (#707) 10 integration tests verifying crisis detection works correctly when called from the agent conversation flow: - scan_user_message detects CRITICAL/HIGH/MEDIUM/LOW levels - Safe messages pass through without triggering - Tool handler returns valid JSON - Compassion injection includes 988 lifeline for CRITICAL/HIGH - Case insensitive detection - Empty/None text handled gracefully - False positive resistance on common non-crisis phrases - Config check returns bool - Callable from agent context (not just isolation tests)	2026-04-15 23:00:12 -04:00
PRIMA	85a654348a	feat: poka-yoke - prevent hardcoded ~/.hermes paths (closes #835) scripts/lint_hardcoded_paths.py (new): - Scans Python files for hardcoded home-directory paths - Detects: Path.home()/.hermes without env fallback, /Users/<name>/, /home/<name>/ - Excludes: comments, docstrings, test files, skills, plugins, docs - Excludes correct patterns: profiles_parent, current_default, native_home - Supports --staged (git pre-commit), --fix (suggestions), --json output scripts/pre-commit-hardcoded-paths.sh (new): - Pre-commit hook that runs lint_hardcoded_paths.py --staged - Blocks commits containing hardcoded path violations tools/confirmation_daemon.py (fixed): - Replaced Path.home() / '.hermes' / 'approval_whitelist.json' with get_hermes_home() / 'approval_whitelist.json' - Added import of get_hermes_home from hermes_constants tests/test_hardcoded_paths.py (new): - 11 tests: detection, exclusion, fallback patterns, clean files	2026-04-15 22:56:32 -04:00
Alexander Whitestone	13ef670c05	feat: session compaction with fact extraction (#748) Before compressing conversation context, extract durable facts (user preferences, corrections, project details) and save to fact store so they survive compression. New agent/session_compactor.py: - extract_facts_from_messages(): scans user messages for preferences, corrections, project/infra facts using regex - 3 pattern categories: user_pref (5 patterns), correction (3 patterns), project (4 patterns) - ExtractedFact: category, entity, content, confidence, source_turn - save_facts_to_store(): saves to fact store (callback or auto-detect) - extract_and_save_facts(): one-call extraction + persistence - Deduplication by category+content - Skips tool results, short messages, system messages - format_facts_summary(): human-readable summary Tests: tests/test_session_compactor.py (9 tests) Closes #748	2026-04-15 22:41:54 -04:00
Alexander Whitestone	9f0c410481	feat: batch tool execution with parallel safety checks (#749) Centralized safety classification for tool call batches: tools/batch_executor.py (new): - classify_tool_calls(): classifies batch into parallel_safe, path_scoped, sequential, never_parallel tiers - BatchExecutionPlan: structured plan with parallel and sequential batches - Path conflict detection: write_file + patch on same file go sequential - Destructive command detection: rm, mv, sed -i, redirects - execute_parallel_batch(): ThreadPoolExecutor for concurrent execution tools/registry.py (enhanced): - ToolEntry.parallel_safe field: tools can declare parallel safety - registry.register() accepts parallel_safe=True parameter - registry.get_parallel_safe_tools(): query registry-declared safe tools Safety tiers: - parallel_safe: read_file, web_search, search_files, etc. - path_scoped: write_file, patch (concurrent when paths don't overlap) - sequential: terminal, delegate_task, unknown tools - never_parallel: clarify (requires user interaction) 19 tests passing.	2026-04-15 22:17:16 -04:00
Alexander Whitestone	b34b5b293d	test: add tests for tool hallucination prevention (#836)	2026-04-16 02:15:59 +00:00
Timmy Time	fb7464995c	fix: Ultraplan Mode for daily autonomous planning (closes #840)	2026-04-15 22:14:16 -04:00
Alexander Whitestone	d8d7846897	feat: add tests/tools/test_confirmation_daemon.py from PR #397	2026-04-16 01:35:24 +00:00
Alexander Whitestone	6840d05554	feat: add tests/agent/test_privacy_filter.py from PR #397	2026-04-16 01:35:21 +00:00
Alexander Whitestone	30afd529ac	feat: add crisis detection tool - the-door integration (#141) New tool: tools/crisis_tool.py - Wraps the-door's canonical crisis detection (detect.py) - Scans user messages for despair/suicidal ideation - Classifies into NONE/LOW/MEDIUM/HIGH/CRITICAL tiers - Provides recommended actions per tier - Gateway hook: scan_user_message() for pre-API-call detection - System prompt injection: compassion_injection based on crisis level - Optional escalation logging to crisis_escalations.jsonl - Optional bridge API POST for HIGH+ (configurable via CRISIS_BRIDGE_URL) - Configurable via crisis_detection: true/false in config.yaml - Follows the-door design principles: never computes life value, never suggests death, errs on side of higher risk Also: tests/test_crisis_tool.py (9 tests, all passing)	2026-04-15 21:00:06 -04:00
asheriif	33ae403890	fix(gateway): fix matrix lingering typing indicator	2026-04-15 04:16:16 -07:00
Teknium	47e6ea84bb	fix: file handle bug, warning text, and tests for Discord media send - Fix file handle closed before POST: nest session.post() inside the 'with open()' block so aiohttp can read the file during upload - Update warning text to include weixin (also supports media delivery) - Add 8 unit tests covering: text+media, media-only, missing files, upload failures, multiple files, and _send_to_platform routing	2026-04-15 04:16:06 -07:00
Teknium	1c4d3216d3	fix(cron): include job_id in delivery and guide models on removal workflow (#10242 ) * fix(gateway): suppress duplicate replies on interrupt and streaming flood control Three fixes for the duplicate reply bug affecting all gateway platforms: 1. base.py: Suppress stale response when the session was interrupted by a new message that hasn't been consumed yet. Checks both interrupt_event and _pending_messages to avoid false positives. (#8221, #2483) 2. run.py (return path): Remove response_previewed guard from already_sent check. Stream consumer's already_sent alone is authoritative — if content was delivered via streaming, the duplicate send must be suppressed regardless of the agent's response_previewed flag. (#8375) 3. run.py (queued-message path): Same fix — already_sent without response_previewed now correctly marks the first response as already streamed, preventing re-send before processing the queued message. The response_previewed field is still produced by the agent (run_agent.py) but is no longer required as a gate for duplicate suppression. The stream consumer's already_sent flag is the delivery-level truth about what the user actually saw. Concepts from PR #8380 (konsisumer). Closes #8375, #8221, #2483. * fix(cron): include job_id in delivery and guide models on removal workflow Users reported cron reminders keep firing after asking the agent to stop. Root cause: the conversational agent didn't know the job_id (not in delivery) and models don't reliably do the list→remove two-step without guidance. 1. Include job_id in the cron delivery wrapper so users and agents can reference it when requesting removal. 2. Replace confusing footer ('The agent cannot see this message') with actionable guidance ('To stop or manage this job, send me a new message'). 3. Add explicit list→remove guidance in the cronjob tool schema so models know to list first and never guess job IDs.	2026-04-15 03:46:58 -07:00
Teknium	2546b7acea	fix(gateway): suppress duplicate replies on interrupt and streaming flood control Three fixes for the duplicate reply bug affecting all gateway platforms: 1. base.py: Suppress stale response when the session was interrupted by a new message that hasn't been consumed yet. Checks both interrupt_event and _pending_messages to avoid false positives. (#8221, #2483) 2. run.py (return path): Remove response_previewed guard from already_sent check. Stream consumer's already_sent alone is authoritative — if content was delivered via streaming, the duplicate send must be suppressed regardless of the agent's response_previewed flag. (#8375) 3. run.py (queued-message path): Same fix — already_sent without response_previewed now correctly marks the first response as already streamed, preventing re-send before processing the queued message. The response_previewed field is still produced by the agent (run_agent.py) but is no longer required as a gate for duplicate suppression. The stream consumer's already_sent flag is the delivery-level truth about what the user actually saw. Concepts from PR #8380 (konsisumer). Closes #8375, #8221, #2483.	2026-04-15 03:42:24 -07:00
Teknium	a4e1842f12	fix: strip reasoning item IDs from Responses API input when store=False (#10217 ) With store=False (our default for the Responses API), the API does not persist response items. When reasoning items with 'id' fields were replayed on subsequent turns, the API attempted a server-side lookup for those IDs and returned 404: Item with id 'rs_...' not found. Items are not persisted when store is set to false. The encrypted_content blob is self-contained for reasoning chain continuity — the id field is unnecessary and triggers the failed lookup. Fix: strip 'id' from reasoning items in both _chat_messages_to_responses_input (message conversion) and _preflight_codex_input_items (normalization layer). The id is still used for local deduplication but never sent to the API. Reported by @zuogl448 on GPT-5.4.	2026-04-15 03:19:43 -07:00
Teknium	e69526be79	fix(send_message): URL-encode Matrix room IDs and add Matrix to schema examples (#10151) Matrix room IDs contain ! and : which must be percent-encoded in URI path segments per the Matrix C-S spec. Without encoding, some homeservers reject the PUT request. Also adds 'matrix:!roomid:server.org' and 'matrix:@user:server.org' to the tool schema examples so models know the correct target format.	2026-04-15 00:10:59 -07:00
Teknium	180b14442f	test: add _parse_target_ref Matrix coverage for salvaged PR #6144	2026-04-15 00:08:14 -07:00
Ubuntu	da8bab77fb	fix(cli): restore messaging toolset for gateway platforms	2026-04-14 23:13:35 -07:00
Teknium	9932366f3c	feat(doctor): add Command Installation check for hermes bin symlink hermes doctor now checks whether the ~/.local/bin/hermes symlink exists and points to the correct venv entry point. With --fix, it creates or repairs the symlink automatically. Covers: - Missing symlink at ~/.local/bin/hermes (or $PREFIX/bin on Termux) - Symlink pointing to wrong target - Missing venv entry point (venv/bin/hermes or .venv/bin/hermes) - PATH warning when ~/.local/bin is not on PATH - Skipped on Windows (different mechanism) Addresses user report: 'python -m hermes_cli.main doesn't have an option to fix the local bin/install' 10 new tests covering all scenarios.	2026-04-14 23:13:11 -07:00
Teknium	029938fbed	fix(cli): defensive subparser routing for argparse bpo-9338 (#10113 ) On some Python versions, argparse fails to route subcommand tokens when the parent parser has nargs='?' optional arguments (--continue). The symptom: 'hermes model' produces 'unrecognized arguments: model' even though 'model' is a registered subcommand. Fix: when argv contains a token matching a known subcommand, set subparsers.required=True to force deterministic routing. If that fails (e.g. 'hermes -c model' where 'model' is consumed as the session name for --continue), fall back to the default optional-subparsers behaviour. Adds 13 tests covering all key argument combinations. Reported via user screenshot showing the exact error on an installed version with the model subcommand listed in usage but rejected at parse time.	2026-04-14 23:13:02 -07:00
Teknium	5d5d21556e	fix: sync client.api_key during UnicodeEncodeError ASCII recovery (#10090 ) The existing recovery block sanitized self.api_key and self._client_kwargs['api_key'] but did not update self.client.api_key. The OpenAI SDK stores its own copy of api_key and reads it dynamically via the auth_headers property on every request. Without this fix, the retry after sanitization would still send the corrupted key in the Authorization header, causing the same UnicodeEncodeError. The bug manifests when an API key contains Unicode lookalike characters (e.g. ʋ U+028B instead of v) from copy-pasting out of PDFs, rich-text editors, or web pages with decorative fonts. httpx hard-encodes all HTTP headers as ASCII, so the non-ASCII char in the Authorization header triggers the error. Adds TestApiKeyClientSync with two tests verifying: - All three key locations are synced after sanitization - Recovery handles client=None (pre-init) without crashing	2026-04-14 22:37:45 -07:00
Teknium	93fe4ead83	fix: warn on invalid context_length format in config.yaml (#10067) Previously, non-integer context_length values (e.g. '256K') in config.yaml were silently ignored, causing the agent to fall back to 128K auto-detection with no user feedback. This was confusing for users with custom LiteLLM endpoints expecting larger context. Now prints a clear stderr warning and logs at WARNING level when model.context_length or custom_providers[].models.<model>.context_length cannot be parsed as an integer, telling users to use plain integers (e.g. 256000 instead of '256K'). Reported by community user ChFarhan via Discord.	2026-04-14 22:14:27 -07:00
Teknium	a8b7db35b2	fix: interrupt agent immediately when user messages during active run (#10068 ) When a user sends a message while the agent is executing a task on the gateway, the agent is now interrupted immediately — not silently queued. Previously, messages were stored in _pending_messages with zero feedback to the user, potentially leaving them waiting 1+ hours. Root cause: Level 1 guard (base.py) intercepted all messages for active sessions and returned with no response. Level 2 (gateway/run.py) which calls agent.interrupt() was never reached. Fix: Expand _handle_active_session_busy_message to handle the normal (non-draining) case: 1. Call running_agent.interrupt(text) to abort in-flight tool calls and exit the agent loop at the next check point 2. Store the message as pending so it becomes the next turn once the interrupted run returns 3. Send a brief ack: 'Interrupting current task (10 min elapsed, iteration 21/60, running: terminal). I'll respond shortly.' 4. Debounce acks to once per 30s to avoid spam on rapid messages Reported by @Lonely__MH.	2026-04-14 22:07:28 -07:00
Teknium	8548893d14	feat: entry-level Podman support — find_docker() + rootless entrypoint (#10066 ) - find_docker() now checks HERMES_DOCKER_BINARY env var first, then docker on PATH, then podman on PATH, then macOS known locations - Entrypoint respects HERMES_HOME env var (was hardcoded to /opt/data) - Entrypoint uses groupmod -o to tolerate non-unique GIDs (fixes macOS GID 20 conflict with Debian's dialout group) - Entrypoint makes chown best-effort so rootless Podman continues instead of failing with 'Operation not permitted' - 5 new tests covering env var override, podman fallback, precedence Based on work by alanjds (PR #3996) and malaiwah (PR #8115). Closes #4084.	2026-04-14 21:20:37 -07:00
Teknium	c5688e7c8b	fix(gateway): break compression-exhaustion infinite loop and auto-reset session (#9893 ) When compression fails after max attempts, the agent returns {completed: False, partial: True} but was missing the 'failed' flag. The gateway's agent_failed_early guard checked for 'failed' AND 'not final_response', but _run_agent_blocking always converts errors to final_response — making the guard dead code. This caused the oversized session to persist, creating an infinite fail loop where every subsequent message hits the same compression failure. Changes: - run_agent.py: add 'failed: True' and 'compression_exhausted: True' to all 5 compression-exhaustion return paths - gateway/run.py (_run_agent_blocking): forward 'failed' and 'compression_exhausted' flags through to the caller - gateway/run.py (_handle_message_with_agent): fix agent_failed_early to check bool(failed) without the broken 'not final_response' clause; auto-reset the session when compression is exhausted so the next message starts fresh - Update tests to match new guard logic and add TestCompressionExhaustedFlag test class Closes #9893	2026-04-14 21:18:17 -07:00
Greer Guthrie	4b2a1a4337	fix(tools): auto-discover built-in tool modules	2026-04-14 21:12:29 -07:00
Teknium	5cbb45d93e	fix: preserve session_id across previous_response_id chains in /v1/responses (#10059) The /v1/responses endpoint generated a new UUID session_id for every request, even when previous_response_id was provided. This caused each turn of a multi-turn conversation to appear as a separate session on the web dashboard, despite the conversation history being correctly chained. Fix: store session_id alongside the response in the ResponseStore, and reuse it when a subsequent request chains via previous_response_id. Applies to both the non-streaming /v1/responses path and the streaming SSE path. The /v1/runs endpoint also gains session continuity from stored responses (explicit body.session_id still takes priority). Adds test verifying session_id is preserved across chained requests.	2026-04-14 21:06:32 -07:00
Teknium	cf1d718823	fix: keep batch-path function_call_output.output as string per OpenAI spec The streaming path emits output as content-part arrays for Open WebUI compatibility, but the batch (non-streaming) Responses API path must return output as a plain string per the OpenAI Responses API spec. Reverts the _extract_output_items change from the cherry-picked commits while preserving the streaming path's array format.	2026-04-14 20:51:52 -07:00
simon-marcus	302554b158	fix(api-server): format responses tool outputs for open webui	2026-04-14 20:51:52 -07:00
simon-marcus	d6c09ab94a	feat(api-server): stream /v1/responses SSE tool events	2026-04-14 20:51:52 -07:00
Teknium	da528a8207	fix: detect and strip non-ASCII characters from API keys (#6843 ) API keys containing Unicode lookalike characters (e.g. ʋ U+028B instead of v) cause UnicodeEncodeError when httpx encodes the Authorization header as ASCII. This commonly happens when users copy-paste keys from PDFs, rich-text editors, or web pages with decorative fonts. Three layers of defense: 1. Save-time validation (hermes_cli/config.py): _check_non_ascii_credential() strips non-ASCII from credential values when saving to .env, with a clear warning explaining the issue. 2. Load-time sanitization (hermes_cli/env_loader.py): _sanitize_loaded_credentials() strips non-ASCII from credential env vars (those ending in _API_KEY, _TOKEN, _SECRET, _KEY) after dotenv loads them, so the rest of the codebase never sees non-ASCII keys. 3. Runtime recovery (run_agent.py): The UnicodeEncodeError recovery block now also sanitizes self.api_key and self._client_kwargs['api_key'], fixing the gap where message/tool sanitization succeeded but the API key still caused httpx to fail on the Authorization header. Also: hermes_logging.py RotatingFileHandler now explicitly sets encoding='utf-8' instead of relying on locale default (defensive hardening for ASCII-locale systems).	2026-04-14 20:20:31 -07:00
Greer Guthrie	c10fea8d26	fix(mcp): make server aliases explicit	2026-04-14 17:19:20 -07:00
Greer Guthrie	cda64a5961	fix(mcp): resolve toolsets from live registry	2026-04-14 17:19:20 -07:00
Teknium	2a98098035	fix: hermes gateway restart waits for service to come back up (#8260 ) Previously, systemd_restart() sent SIGUSR1 to the gateway, printed 'restart requested', and returned immediately. The gateway still needed to drain active agents, exit with code 75, wait for systemd's RestartSec=30, and start the new process. The user saw 'success' but the gateway was actually down for 30-60 seconds. Now the SIGUSR1 path blocks with progress feedback: Phase 1 — wait for old process to die: ⏳ User service draining active work... Polls os.kill(pid, 0) until ProcessLookupError (up to 90s) Phase 2 — wait for new process to become active: ⏳ Waiting for hermes-gateway to restart... Polls systemctl is-active + verifies new PID (up to 60s) Success: ✓ User service restarted (PID 12345) Timeout: ⚠ User service did not become active within 60s. Check status: hermes gateway status Check logs: journalctl --user -u hermes-gateway --since '2 min ago' The reload-or-restart fallback path (line 1189) already blocks because systemctl reload-or-restart is synchronous. Test plan: - Updated test to verify wait-for-restart behavior - All 118 gateway CLI tests pass	2026-04-14 17:12:58 -07:00
Teknium	6c89306437	fix: break stuck session resume loops after repeated restarts (#7536 ) When a session gets stuck (hung terminal, runaway tool loop) and the user restarts the gateway, the same session history loads and puts the agent right back in the stuck state. The user is trapped in a loop: restart → stuck → restart → stuck. Fix: track restart-failure counts per session using a simple JSON file (.restart_failure_counts). On each shutdown with active agents, the counter increments for those sessions. On startup, if any session has been active across 3+ consecutive restarts, it's auto-suspended — giving the user a clean slate on their next message. The counter resets to 0 when a session completes a turn successfully (response delivered), so normal sessions that happen to be active during planned restarts (/restart, hermes update) won't accumulate false counts. Implementation: - _increment_restart_failure_counts(): called during stop() when agents are active. Writes {session_key: count} to JSON file. Sessions NOT active are dropped (loop broken). - _suspend_stuck_loop_sessions(): called on startup. Reads the file, suspends sessions at threshold (3), clears the file. - _clear_restart_failure_count(): called after successful response delivery. Removes the session from the counter file. No SessionEntry schema changes. No database migration. Pure file-based tracking that naturally cleans up. Test plan: - 9 new stuck-loop tests (increment, accumulate, threshold, clear, suspend, file cleanup, edge cases) - All 28 gateway lifecycle tests pass (restart drain + auto-continue + stuck loop)	2026-04-14 17:08:35 -07:00
Teknium	e7475b1582	feat: auto-continue interrupted agent work after gateway restart (#4493 ) When the gateway restarts mid-agent-work, the session transcript ends on a tool result the agent never processed. Previously, the user had to type 'continue' or use /retry (which replays from scratch, losing all prior work). Now, when the next user message arrives and the loaded history ends with role='tool', a system note is prepended: [System note: Your previous turn was interrupted before you could process the last tool result(s). Please finish processing those results and summarize what was accomplished, then address the user's new message below.] This is injected in _run_agent()'s run_sync closure, right before calling agent.run_conversation(). The agent sees the full history (including the pending tool results) and the system note, so it can summarize what was accomplished and then handle the user's new input. Design decisions: - No new session flags or schema changes — purely detects trailing tool messages in the loaded history - Works for any restart scenario (clean, crash, SIGTERM, drain timeout) as long as the session wasn't suspended (suspended = fresh start) - The user's actual message is preserved after the note - If the session WAS suspended (unclean shutdown), the old history is abandoned and the user starts fresh — no false auto-continue Also updates the shutdown notification message from 'Use /retry after restart to continue' to 'Send any message after restart to resume where it left off' — which is now accurate. Test plan: - 6 new auto-continue tests (trailing tool detection, no false positives for assistant/user/empty history, multi-tool, message preservation) - All 13 restart drain tests pass (updated /retry assertion)	2026-04-14 16:56:49 -07:00
adybag14-cyber	56c34ac4f7	fix(browser): add termux PATH fallbacks Refactor browser tool PATH construction to include Termux directories (/data/data/com.termux/files/usr/bin, /data/data/com.termux/files/usr/sbin) so agent-browser and npx are discoverable on Android/Termux. Extracts _browser_candidate_path_dirs() and _merge_browser_path() helpers to centralize PATH construction shared between _find_agent_browser() and _run_browser_command(), replacing duplicated inline logic. Also fixes os.pathsep usage (was hardcoded ':') for cross-platform correctness. Cherry-picked from PR #9846.	2026-04-14 16:55:55 -07:00
areu01or00	cfa24532d3	fix(discord): register native /restart slash command	2026-04-14 16:55:48 -07:00
Teknium	10494b42a1	feat(discord): register skills under /skill command group with category subcommands (#9909 ) Instead of consuming one top-level slash command slot per skill (hitting the 100-command limit with ~26 built-ins + 74 skills), skills are now organized under a single /skill group command with category-based subcommand groups: /skill creative ascii-art [args] /skill media gif-search [args] /skill mlops axolotl [args] Discord supports 25 subcommand groups × 25 subcommands = 625 max skills, well beyond the previous 74-slot ceiling. Categories are derived from the skill directory structure: - skills/creative/ascii-art/ → category 'creative' - skills/mlops/training/axolotl/ → category 'mlops' (top-level parent) - skills/dogfood/ → uncategorized (direct subcommand) Changes: - hermes_cli/commands.py: add discord_skill_commands_by_category() with category grouping, hub/disabled filtering, Discord limit enforcement - gateway/platforms/discord.py: replace top-level skill registration with _register_skill_group() using app_commands.Group hierarchy - tests: 7 new tests covering group creation, category grouping, uncategorized skills, hub exclusion, deep nesting, empty skills, and handler dispatch Inspired by Discord community suggestion from bottium.	2026-04-14 16:27:02 -07:00
Teknium	1525624904	fix: block agent from self-destructing gateway via terminal (#6666 ) Add dangerous command patterns that require approval when the agent tries to run gateway lifecycle commands via the terminal tool: - hermes gateway stop/restart — kills all running agents mid-work - hermes update — pulls code and restarts the gateway - systemctl restart/stop (with optional flags like --user) These patterns fire the approval prompt so the user must explicitly approve before the agent can kill its own gateway process. In YOLO mode, the commands run without approval (by design — YOLO means the user accepts all risks). Also fixes the existing systemctl pattern to handle flags between the command and action (e.g. 'systemctl --user restart' was previously undetected because the regex expected the action immediately after 'systemctl'). Root cause: issue #6666 reported agents running 'hermes gateway restart' via terminal, killing the gateway process mid-agent-loop. The user sees the agent suddenly stop responding with no explanation. Combined with the SIGTERM auto-recovery from PR #9875, the gateway now both prevents accidental self-destruction AND recovers if it happens anyway. Test plan: - Updated test_systemctl_restart_not_flagged → test_systemctl_restart_flagged - All 119 approval tests pass - E2E verified: hermes gateway restart, hermes update, systemctl --user restart all detected; hermes gateway status, systemctl status remain safe	2026-04-14 15:43:31 -07:00
Teknium	353b5bacbd	test: add tests for /health/detailed endpoint and gateway health probe - TestHealthDetailedEndpoint: 3 tests for the new API server endpoint (returns runtime data, handles missing status, no auth required) - TestProbeGatewayHealth: 5 tests for _probe_gateway_health() (URL normalization, successful/failed probes, fallback chain) - TestStatusRemoteGateway: 4 tests for /api/status remote fallback (remote probe triggers, skipped when local PID found, null PID handling)	2026-04-14 15:41:30 -07:00
Teknium	eed891f1bb	security: supply chain hardening — CI pinning, dep pinning, and code fixes (#9801 ) CI/CD Hardening: - Pin all 12 GitHub Actions to full commit SHAs (was mutable @vN tags) - Add explicit permissions: {contents: read} to 4 workflows - Pin CI pip installs to exact versions (pyyaml==6.0.2, httpx==0.28.1) - Extend supply-chain-audit.yml to scan workflow, Dockerfile, dependency manifest, and Actions version changes Dependency Pinning: - Pin git-based Python deps to commit SHAs (atroposlib, tinker, yc-bench) - Pin WhatsApp Baileys from mutable branch to commit SHA Tool Registry: - Reject tool name shadowing from different tool families (plugins/MCP cannot overwrite built-in tools). MCP-to-MCP overwrites still allowed. MCP Security: - Add tool description content scanning for prompt injection patterns - Log detailed change diff on dynamic tool refresh at WARNING level Skill Manager: - Fix dangerous verdict bug: agent-created skills with dangerous findings were silently allowed (ask->None->allow). Now blocked.	2026-04-14 14:23:37 -07:00
Roy-oss1	1aa76620d4	fix(feishu): keep approval clicks synchronized with callback card state Feishu approval clicks need the resolved card to come back from the synchronous callback path itself. Leaving approval resolution to the generic asynchronous card-action flow made button feedback depend on later loop work instead of the callback response the client is waiting for. Change-Id: I574997cbbcaa097fdba759b47367e28d1b56b040 Constraint: Feishu card-action callbacks must acknowledge quickly and reflect final approval state from the callback response path Rejected: Keep approval handling on the generic async card-action route | leaves card state synchronization vulnerable to callback timing and follow-up update ordering Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep approval callback response construction separate from async queue unblocking unless Feishu callback semantics change Tested: pytest tests/gateway/test_feishu.py tests/gateway/test_feishu_approval_buttons.py tests/gateway/test_approve_deny_commands.py tests/gateway/test_slack_approval_buttons.py tests/gateway/test_telegram_approval_buttons.py -q Not-tested: Live Feishu workspace end-to-end callback rendering	2026-04-14 14:22:11 -07:00
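
Commit 10494b42a1 above says skill categories are derived from the skill directory structure. The sketch below illustrates that mapping only. The function name skill_category and its root parameter are hypothetical; this is not the code of discord_skill_commands_by_category(), whose implementation is not shown in this log.

```python
from pathlib import PurePosixPath
from typing import Optional


def skill_category(skill_dir: str, root: str = "skills") -> Optional[str]:
    """Return the top-level category of a skill directory.

    Hypothetical helper: returns None when the skill sits directly
    under the root, which the commit message calls "uncategorized".
    """
    # Path components below the skills root, e.g. ("mlops", "training", "axolotl").
    parts = PurePosixPath(skill_dir).relative_to(root).parts
    if len(parts) < 2:
        # Only the skill's own directory: no category level above it.
        return None
    # Deeper nesting collapses to the top-level parent.
    return parts[0]


if __name__ == "__main__":
    for path in (
        "skills/creative/ascii-art/",
        "skills/mlops/training/axolotl/",
        "skills/dogfood/",
    ):
        print(path, "->", skill_category(path))
```

Collapsing nested paths to the top-level parent matches the 'mlops' example in the commit message. How the real code handles the hub/disabled filtering and Discord limits is not covered here.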


1810 Commits
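
Commit 1525624904 above describes fixing the systemctl pattern so that flags between the command and the action (for example 'systemctl --user restart') are detected. A minimal sketch of that idea follows. The regex and the needs_approval name are assumptions made for illustration, not the pattern used in the repository.

```python
import re

# Hypothetical pattern: "systemctl", then zero or more flag tokens
# such as --user or --no-block, then a stop/restart action.
SYSTEMCTL_LIFECYCLE = re.compile(
    r"\bsystemctl\s+(?:-{1,2}[\w=-]+\s+)*(?:restart|stop)\b"
)


def needs_approval(command: str) -> bool:
    """Return True if the command looks like a systemctl stop/restart."""
    return SYSTEMCTL_LIFECYCLE.search(command) is not None


if __name__ == "__main__":
    for cmd in (
        "systemctl restart hermes",
        "systemctl --user restart hermes",
        "systemctl status hermes",
    ):
        print(cmd, "->", needs_approval(cmd))
```

This sketch only skips flag tokens that start with a dash. A flag that takes a separate value (such as a host argument) would need a broader pattern.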