hermes-agent

Author	SHA1	Message	Date
teknium1	41d8a80226	fix(display): fix subagent progress tree-view visual nits Two fixes to the subagent progress display from PR #186: 1. Task index prefix: show 1-indexed prefix ([1], [2], ...) for ALL tasks in batch mode (task_count > 1). Single tasks get no prefix. Previously task 0 had no prefix while others did, making batch output confusing. 2. Completion indicator: use spinner.print_above() instead of raw print() for per-task completion lines (✓ [1/2] ...). Raw print collided with the active spinner, mushing the completion text onto the spinner line. Now prints cleanly above. Added task_count parameter to _build_child_progress_callback and _run_single_child. Updated tests accordingly.	2026-02-28 23:29:49 -08:00
teknium1	4ec386cc72	fix(display): use spaces instead of ANSI \033[K in print_above() for prompt_toolkit compat print_above() used \033[K (erase-to-end-of-line) to clear the spinner line before printing text above it. This causes garbled escape codes when prompt_toolkit's patch_stdout is active in CLI mode. Switched to the same spaces-based clearing approach used by stop() — overwrite with blanks, then carriage return back to start of line. Updated test assertion to match the new clearing method.	2026-02-28 23:19:23 -08:00
lila	dd69f16c3e	feat(gateway): expose subagent tool calls and thinking to user (fixes #169) (#186) When subagents run via delegate_task, the user now sees real-time progress instead of silence: CLI: tree-view activity lines print above the delegation spinner 🔀 Delegating: research quantum computing ├─ 💭 "I'll search for papers first..." ├─ 🔍 web_search "quantum computing" ├─ 📖 read_file "paper.pdf" └─ ⠹ working... (18.2s) Gateway (Telegram/Discord): batched progress summaries sent every 5 tool calls to avoid message spam. Remaining tools flushed on subagent completion. Changes: - agent/display.py: add KawaiiSpinner.print_above() to print status lines above an active spinner without disrupting animation. Uses captured stdout (self._out) so it works inside the child's redirect_stdout(devnull). - tools/delegate_tool.py: add _build_child_progress_callback() that creates a per-child callback relaying tool calls and thinking events to the parent's spinner (CLI) or progress queue (gateway). Each child gets its own callback instance, so parallel subagents don't share state. Includes _flush() for gateway batch completion. - run_agent.py: fire tool_progress_callback with '_thinking' event when the model produces text content. Guarded by _delegate_depth > 0 so only subagents fire this (prevents gateway spam from main agent). REASONING_SCRATCHPAD/think/ reasoning XML tags are stripped before display. Tests: 21 new tests covering print_above, callback builder, thinking relay, SCRATCHPAD filtering, batching, flush, thread isolation, delegate_depth guard, and prefix handling.	2026-02-28 23:18:00 -08:00
teknium1	1db5598294	feat(tests): add live integration tests for file operations and shell noise filtering - Introduce a new test suite in `test_file_tools_live.py` to validate file operations and ensure accurate command execution in a real environment. - Implement assertions to check for shell noise contamination in outputs, enhancing the reliability of command results. - Create fixtures for setting up a local environment and populating directories with known file contents for comprehensive testing. - Refactor shell noise handling in `process_registry.py` and `local.py` to support multiple noise patterns, improving output cleanliness.	2026-02-28 22:57:58 -08:00
teknium1	70dfec9638	test(redact): add sensitive text redaction - Introduce a new test suite for the `redact_sensitive_text` function, covering various sensitive data formats including API keys, tokens, and environment variables. - Ensure that sensitive information is properly masked in logs and outputs while non-sensitive data remains unchanged. - Add tests for different scenarios including JSON fields, authorization headers, and environment variable assignments. - Implement a redacting formatter for logging to enhance security during log output.	2026-02-28 21:56:27 -08:00
teknium1	500f0eab4a	refactor(cli): Finalize OpenAI Codex Integration with OAuth - Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults. - Updated the context compressor's summary target tokens to 2500 for improved performance. - Added external credential detection for Codex CLI to streamline authentication. - Refactored various components to ensure consistent handling of authentication and model selection across the application.	2026-02-28 21:47:51 -08:00
Teknium	5a79e423fe	Merge branch 'main' into codex/align-codex-provider-conventions-mainrepo	2026-02-28 18:13:38 -08:00
Bartok9	35655298e6	fix(gateway): prevent TTS voice messages from accumulating across turns Fixes #160 The issue was that MEDIA tags were being extracted from ALL messages in the conversation history, not just messages from the current turn. This caused TTS voice messages generated in earlier turns to be re-attached to every subsequent reply. The fix: - Track history_len before calling run_conversation - Only scan messages AFTER history_len for MEDIA tags - Add comprehensive tests to prevent regression This ensures each voice message is sent exactly once, when it's generated, not on every subsequent message in the session.	2026-02-28 03:38:27 -05:00
teknium1	50cb4d5fc7	fix(agent): update error message for unsupported Anthropic API endpoints to clarify usage of OpenRouter	2026-02-27 23:23:31 -08:00
Teknium	2bc9508b7c	Merge pull request #173 from adavyas/fix/anthropic-base-url-guard fix(agent): fail fast on Anthropic native base URLs	2026-02-27 23:22:01 -08:00
teknium1	19f28a633a	fix(agent): enhance 413 error handling and improve conversation history management in tests	2026-02-27 23:04:32 -08:00
Teknium	2c817ce4a5	Merge pull request #153 from tekelala/main fix(agent): handle 413 payload-too-large via compression instead of aborting	2026-02-27 22:57:55 -08:00
adavyas	0c0a2eb0a2	fix(agent): fail fast on Anthropic native base URLs	2026-02-27 21:19:29 -08:00
Teknium	0d2ac1c07f	Merge pull request #121 from Bartok9/test-clarify-tool test(tools): add unit tests for clarify_tool.py	2026-02-27 16:27:37 -08:00
tekelala	79bd65034c	fix(agent): handle 413 payload-too-large via compression instead of aborting The 413 "Request Entity Too Large" error from the LLM API was caught by the generic 4xx handler which aborts immediately. This is wrong for 413 — it's a payload-size issue that can be resolved by compressing conversation history. - Intercept 413 before the generic 4xx block and route to _compress_context - Exclude 413 from generic is_client_error detection - Add 'request entity too large' to context-length phrases as safety net - Add tests for 413 compression behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 12:21:27 -05:00
tekelala	fbb1923fad	fix(security): patch path traversal, size bypass, and prompt injection in document processing - Sanitize filenames in cache_document_from_bytes to prevent path traversal (strip directory components, null bytes, resolve check) - Reject documents with None file_size instead of silently allowing download - Cap text file injection at 100 KB to prevent oversized prompt payloads - Sanitize display_name in run.py context notes to block prompt injection via filenames - Add 35 unit tests covering document cache utilities and Telegram document handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 11:53:46 -05:00
Teknium	3526fa27fd	Merge pull request #62 from 0xbyt4/test/expand-coverage-2 test: add unit tests for 8 modules (batch 2)	2026-02-27 01:47:30 -08:00
Teknium	64eca85876	Merge pull request #67 from 0xbyt4/test/add-run-agent-unit-tests test: add unit tests for run_agent.py (AIAgent)	2026-02-27 01:36:49 -08:00
Teknium	152271851f	Merge pull request #63 from 0xbyt4/fix/cron-prompt-injection-bypass fix: cron prompt injection scanner bypass for multi-word variants	2026-02-27 01:34:14 -08:00
Teknium	0909be3aa8	Merge pull request #61 from 0xbyt4/fix/write-deny-macos-symlink fix: resolve symlink bypass in write deny list on macOS	2026-02-27 01:32:19 -08:00
Teknium	274e623b50	Merge pull request #60 from 0xbyt4/test/expand-coverage test: add unit tests for 8 untested core modules	2026-02-27 01:30:36 -08:00
Bartok Moltbot	df8a62d018	test(tools): add unit tests for clarify_tool.py Add comprehensive test coverage for the clarify_tool module: - TestClarifyToolBasics: 5 tests for core functionality - Simple questions, questions with choices, error handling - TestClarifyToolChoicesValidation: 5 tests for choices parameter - MAX_CHOICES enforcement, empty/whitespace handling, type conversion - TestClarifyToolCallbackHandling: 3 tests for callback behavior - Exception handling, question/response trimming - TestCheckClarifyRequirements: 1 test verifying always-true behavior - TestClarifySchema: 6 tests verifying OpenAI function schema - Required/optional parameters, maxItems constraint Total: 20 tests covering all public functions and edge cases.	2026-02-27 03:29:26 -05:00
George Pickett	32070e6bc0	Merge remote-tracking branch 'origin/main' into codex/align-codex-provider-conventions-mainrepo # Conflicts: # cron/scheduler.py # gateway/run.py # tools/delegate_tool.py	2026-02-26 10:56:29 -08:00
darya	f5c09a3aba	test: add regression tests for recursive delete false positive fix Add 15 new tests in two classes: - TestRmFalsePositiveFix (8 tests): verify filenames starting with 'r' (readme.txt, requirements.txt, report.csv, etc.) are NOT falsely flagged as 'recursive delete' - TestRmRecursiveFlagVariants (7 tests): verify all recursive delete flag styles (-r, -rf, -rfv, -fr, -irf, --recursive, sudo rm -rf) are still correctly caught All 29 tests pass (14 existing + 15 new).	2026-02-26 16:40:44 +03:00
0xbyt4	90ca2ae16b	test: add unit tests for run_agent.py (AIAgent) 71 tests covering pure functions, state/structure methods, and conversation loop pieces. OpenAI client and tool loading are mocked.	2026-02-26 16:15:04 +03:00
0xbyt4	feea8332d6	fix: cron prompt injection scanner bypass for multi-word variants The regex `ignore\s+(previous|all|above|prior)\s+instructions` only allowed ONE word between "ignore" and "instructions". Multi-word variants like "Ignore ALL prior instructions" bypassed the scanner because "ALL" matched the alternation but then `\s+instructions` failed to match "prior". Fix: use `(?:\w+\s+)*` groups to allow optional extra words before and after the keyword alternation.	2026-02-26 13:55:54 +03:00
0xbyt4	ffbdd7fcce	test: add unit tests for 8 modules (batch 2) Cover model_tools, toolset_distributions, context_compressor, prompt_caching, cronjob_tools, session_search, process_registry, and cron/scheduler with 127 new test cases.	2026-02-26 13:54:20 +03:00
0xbyt4	b699cf8c48	test: remove /etc platform-conditional tests from file_operations These tests documented the macOS symlink bypass bug with platform-conditional assertions. The fix and proper regression tests are in PR #61 (tests/tools/test_write_deny.py), so remove them here to avoid ordering conflicts between the two PRs.	2026-02-26 13:43:30 +03:00
0xbyt4	2efd9bbac4	fix: resolve symlink bypass in write deny list on macOS On macOS, /etc is a symlink to /private/etc. The _is_write_denied() function resolves the input path with os.path.realpath() but the deny list entries were stored as literal strings ("/etc/shadow"). This meant the resolved path "/private/etc/shadow" never matched, allowing writes to sensitive system files on macOS. Fix: Apply os.path.realpath() to deny list entries at module load time so both sides of the comparison use resolved paths. Adds 19 regression tests in tests/tools/test_write_deny.py.	2026-02-26 13:30:55 +03:00
0xbyt4	0ac3af8776	test: add unit tests for 8 untested modules Add comprehensive test coverage for: - cron/jobs.py: schedule parsing, job CRUD, due-job detection (34 tests) - tools/memory_tool.py: security scanning, MemoryStore ops, dispatcher (32 tests) - toolsets.py: resolution, validation, composition, cycle detection (19 tests) - tools/file_operations.py: write deny list, result dataclasses, helpers (37 tests) - agent/prompt_builder.py: context scanning, truncation, skills index (24 tests) - agent/model_metadata.py: token estimation, context lengths (16 tests) - hermes_state.py: SessionDB SQLite CRUD, FTS5 search, export, prune (28 tests) Total: 210 new tests, all passing (380 total suite).	2026-02-26 13:27:58 +03:00
teknium1	178658bf9f	test: enhance session source tests and add validation for chat types - Renamed test method for clarity and added comprehensive tests for `SessionSource` including handling of numeric `chat_id`, missing optional fields, and invalid platforms. - Introduced tests for session source descriptions based on chat types and names, ensuring accurate representation in prompts. - Improved file tools tests by validating schema structures, ensuring no duplicate model IDs, and enhancing error handling in file operations.	2026-02-26 00:53:57 -08:00
George Pickett	74c662b63a	Harden Codex auth refresh and responses compatibility	2026-02-25 19:27:54 -08:00
George Pickett	91bdb9eb2d	Fix Codex stream fallback for Responses completion gaps	2026-02-25 19:08:11 -08:00
George Pickett	47f16505d2	Omit optional function_call id in Responses replay input	2026-02-25 19:00:11 -08:00
George Pickett	e63986b534	Harden Codex stream handling and ack continuation	2026-02-25 18:56:06 -08:00
George Pickett	ce175d7372	Fix Codex Responses continuation and schema parity	2026-02-25 18:20:41 -08:00
George Pickett	609b19b630	Add OpenAI Codex provider runtime and responses integration (without .agent/PLANS.md)	2026-02-25 18:20:38 -08:00
0xbyt4	8fc28c34ce	test: reorganize test structure and add missing unit tests Reorganize flat tests/ directory to mirror source code structure (tools/, gateway/, hermes_cli/, integration/). Add 11 new test files covering previously untested modules: registry, patch_parser, fuzzy_match, todo_tool, approval, file_tools, gateway session/config/delivery, and hermes_cli config/models. Total: 147 unit tests passing, 9 integration tests gated behind pytest marker.	2026-02-26 03:20:08 +03:00
teknium1	8fedbf87d9	feat: add cleanup utility for test artifacts in checkpoint resumption tests - Introduced a new `_cleanup_test_artifacts` function to remove test-generated files and directories after test execution. - Integrated the cleanup function into the `test_current_implementation` and `test_interruption_and_resume` tests to ensure proper resource management and prevent clutter from leftover files.	2026-02-23 02:16:10 -08:00
teknium1	d8a369e194	refactor: update API key checks in WebToolsTester - Replaced the Nous API key check with the Auxiliary Model check in the WebToolsTester class. - Updated the environment configuration to reflect the change in API key validation, ensuring accurate reporting of available keys.	2026-02-23 02:13:33 -08:00
teknium1	90af34bc83	feat: enhance interrupt handling and container resource configuration - Introduced a shared interrupt signaling mechanism to allow tools to check for user interrupts during long-running operations. - Updated the AIAgent to handle interrupts more effectively, ensuring in-progress tool calls are canceled and multiple interrupt messages are combined into one prompt. - Enhanced the CLI configuration to include container resource limits (CPU, memory, disk) and persistence options for Docker, Singularity, and Modal environments. - Improved documentation to clarify interrupt behaviors and container resource settings, providing users with better guidance on configuration and usage.	2026-02-23 02:11:33 -08:00
teknium1	cbff1b818c	refactor: remove obsolete Nous API test scripts - Deleted test scripts for Nous API limits, patterns, and temperature checks to streamline the testing suite. - These scripts were no longer necessary and their removal helps maintain a cleaner codebase.	2026-02-21 03:21:13 -08:00
teknium1	70dd3a16dc	Cleanup time!	2026-02-20 23:23:32 -08:00
teknium1	90e5211128	feat: implement subagent delegation for task management - Introduced the `delegate_task` tool, allowing the main agent to spawn child AIAgent instances with isolated context for complex tasks. - Supported both single-task and batch processing (up to 3 concurrent tasks) to enhance task management capabilities. - Updated configuration options for delegation, including maximum iterations and default toolsets for subagents. - Enhanced documentation to provide clear guidance on using the delegation feature and its configuration. - Added comprehensive tests to ensure the functionality and reliability of the delegation logic.	2026-02-20 03:15:53 -08:00
teknium1	783acd712d	feat: implement code execution sandbox for programmatic tool calling - Introduced a new `execute_code` tool that allows the agent to run Python scripts that call Hermes tools via RPC, reducing the number of round trips required for tool interactions. - Added configuration options for timeout and maximum tool calls in the sandbox environment. - Updated the toolset definitions to include the new code execution capabilities, ensuring integration across platforms. - Implemented comprehensive tests for the code execution sandbox, covering various scenarios including tool call limits and error handling. - Enhanced the CLI and documentation to reflect the new functionality, providing users with clear guidance on using the code execution tool.	2026-02-19 23:23:43 -08:00
teknium	248acf715e	Add browser automation tools and enhance environment configuration - Introduced new browser automation tools in `browser_tool.py` for navigating, interacting with, and extracting content from web pages using the agent-browser CLI and Browserbase cloud execution. - Updated `.env.example` to include new configuration options for Browserbase API keys and session settings. - Enhanced `model_tools.py` and `toolsets.py` to integrate browser tools into the existing tool framework, ensuring consistent access across toolsets. - Updated `README.md` with setup instructions for browser tools and their usage examples. - Added new test script `test_modal_terminal.py` to validate Modal terminal backend functionality. - Improved `run_agent.py` to support browser tool integration and logging enhancements for better tracking of API responses.	2026-01-29 06:10:24 +00:00
teknium	c82741c3d8	some cleanups	2025-11-05 03:47:17 +00:00
teknium	f6f75cbe2b	update webtools	2025-11-02 06:03:21 +00:00
teknium	0e2e69a71d	Add batch processing capabilities with checkpointing and statistics tracking, along with toolset distribution management. Update README and add test scripts for validation.	2025-10-06 03:17:58 +00:00
teknium	a7ff4d49e9	A bit of restructuring for simplicity and organization	2025-10-01 23:29:25 +00:00
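
The scanner bypass described in commit feea8332d6 can be reproduced with a small sketch. The log does not show the final pattern in the repository, so `FIXED` below is an assumption reconstructed from the commit message, as is the case-insensitive flag. The names `ORIGINAL`, `FIXED`, and `is_flagged` are hypothetical.

```python
import re

# Pattern quoted in the commit message: exactly one keyword is allowed
# between "ignore" and "instructions". (IGNORECASE is an assumption.)
ORIGINAL = re.compile(
    r"ignore\s+(previous|all|above|prior)\s+instructions",
    re.IGNORECASE,
)

# Assumed shape of the fix: optional extra words before and after the
# keyword alternation, using the (?:\w+\s+)* group named in the commit.
FIXED = re.compile(
    r"ignore\s+(?:\w+\s+)*(previous|all|above|prior)\s+(?:\w+\s+)*instructions",
    re.IGNORECASE,
)


def is_flagged(pattern, text):
    """Return True if the pattern finds an injection phrase in text."""
    return pattern.search(text) is not None


samples = [
    "ignore previous instructions",
    "Ignore ALL prior instructions",
    "please ignore the weather",
]

for sample in samples:
    print(sample, is_flagged(ORIGINAL, sample), is_flagged(FIXED, sample))
```

With the one-keyword pattern, the second sample slips through because "ALL" consumes the only keyword slot and "prior" then stands where "instructions" is required. The extra-word groups let the matcher backtrack to a keyword further along the phrase.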

50 Commits
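
The macOS symlink bypass fixed in commit 2efd9bbac4 comes down to comparing a resolved path against unresolved deny-list entries. The following is a minimal sketch of that idea, not the repository's code: `build_deny_set` and `is_write_denied` are hypothetical stand-ins, and a temporary symlink mimics `/etc -> /private/etc` so the example runs on any system that permits symlinks.

```python
import os
import tempfile


def build_deny_set(entries):
    """Resolve each deny-list entry once so both sides use real paths."""
    return {os.path.realpath(entry) for entry in entries}


def is_write_denied(path, deny_set):
    """Resolve the candidate path and check it against the resolved set."""
    return os.path.realpath(path) in deny_set


with tempfile.TemporaryDirectory() as tmp:
    real_dir = os.path.join(tmp, "private_etc")
    os.mkdir(real_dir)
    link_dir = os.path.join(tmp, "etc")
    os.symlink(real_dir, link_dir)  # stands in for /etc -> /private/etc

    # Deny-list entry stored as a literal string that goes through the symlink.
    literal_entries = [os.path.join(link_dir, "shadow")]
    target = os.path.join(link_dir, "shadow")

    # Bug: the input is resolved, the entries are not, so they never match.
    denied_before = os.path.realpath(target) in set(literal_entries)

    # Fix: entries are resolved up front, so both sides agree.
    denied_after = is_write_denied(target, build_deny_set(literal_entries))

    print("literal entries:", denied_before)
    print("resolved entries:", denied_after)
```

Resolving the entries once, when the set is built, keeps the per-write check to a single `os.path.realpath()` call and a set lookup.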