1132 Commits

Author SHA1 Message Date
Teknium
152271851f Merge pull request #63 from 0xbyt4/fix/cron-prompt-injection-bypass
fix: cron prompt injection scanner bypass for multi-word variants
2026-02-27 01:34:14 -08:00
Teknium
0909be3aa8 Merge pull request #61 from 0xbyt4/fix/write-deny-macos-symlink
fix: resolve symlink bypass in write deny list on macOS
2026-02-27 01:32:19 -08:00
Teknium
274e623b50 Merge pull request #60 from 0xbyt4/test/expand-coverage
test: add unit tests for 8 untested core modules
2026-02-27 01:30:36 -08:00
Bartok Moltbot
df8a62d018 test(tools): add unit tests for clarify_tool.py
Add comprehensive test coverage for the clarify_tool module:

- TestClarifyToolBasics: 5 tests for core functionality
  - Simple questions, questions with choices, error handling

- TestClarifyToolChoicesValidation: 5 tests for choices parameter
  - MAX_CHOICES enforcement, empty/whitespace handling, type conversion

- TestClarifyToolCallbackHandling: 3 tests for callback behavior
  - Exception handling, question/response trimming

- TestCheckClarifyRequirements: 1 test verifying always-true behavior

- TestClarifySchema: 6 tests verifying OpenAI function schema
  - Required/optional parameters, maxItems constraint

Total: 20 tests covering all public functions and edge cases.
2026-02-27 03:29:26 -05:00
George Pickett
32070e6bc0 Merge remote-tracking branch 'origin/main' into codex/align-codex-provider-conventions-mainrepo
# Conflicts:
#	cron/scheduler.py
#	gateway/run.py
#	tools/delegate_tool.py
2026-02-26 10:56:29 -08:00
darya
f5c09a3aba test: add regression tests for recursive delete false positive fix
Add 15 new tests in two classes:

- TestRmFalsePositiveFix (8 tests): verify filenames starting with 'r'
  (readme.txt, requirements.txt, report.csv, etc.) are NOT falsely
  flagged as 'recursive delete'

- TestRmRecursiveFlagVariants (7 tests): verify all recursive delete
  flag styles (-r, -rf, -rfv, -fr, -irf, --recursive, sudo rm -rf)
  are still correctly caught

All 29 tests pass (14 existing + 15 new).
2026-02-26 16:40:44 +03:00
0xbyt4
90ca2ae16b test: add unit tests for run_agent.py (AIAgent)
71 tests covering pure functions, state/structure methods, and
conversation loop pieces. OpenAI client and tool loading are mocked.
2026-02-26 16:15:04 +03:00
0xbyt4
feea8332d6 fix: cron prompt injection scanner bypass for multi-word variants
The regex `ignore\s+(previous|all|above|prior)\s+instructions` only
allowed ONE word between "ignore" and "instructions". Multi-word
variants like "Ignore ALL prior instructions" bypassed the scanner
because "ALL" matched the alternation but then `\s+instructions`
failed to match "prior".

Fix: use `(?:\w+\s+)*` groups to allow optional extra words before
and after the keyword alternation.
2026-02-26 13:55:54 +03:00
0xbyt4
ffbdd7fcce test: add unit tests for 8 modules (batch 2)
Cover model_tools, toolset_distributions, context_compressor,
prompt_caching, cronjob_tools, session_search, process_registry,
and cron/scheduler with 127 new test cases.
2026-02-26 13:54:20 +03:00
0xbyt4
b699cf8c48 test: remove /etc platform-conditional tests from file_operations
These tests documented the macOS symlink bypass bug with
platform-conditional assertions. The fix and proper regression
tests are in PR #61 (tests/tools/test_write_deny.py), so remove
them here to avoid ordering conflicts between the two PRs.
2026-02-26 13:43:30 +03:00
0xbyt4
2efd9bbac4 fix: resolve symlink bypass in write deny list on macOS
On macOS, /etc is a symlink to /private/etc. The _is_write_denied()
function resolves the input path with os.path.realpath() but the deny
list entries were stored as literal strings ("/etc/shadow"). This meant
the resolved path "/private/etc/shadow" never matched, allowing writes
to sensitive system files on macOS.

Fix: Apply os.path.realpath() to deny list entries at module load time
so both sides of the comparison use resolved paths.

Adds 19 regression tests in tests/tools/test_write_deny.py.
2026-02-26 13:30:55 +03:00
0xbyt4
0ac3af8776 test: add unit tests for 8 untested modules
Add comprehensive test coverage for:
- cron/jobs.py: schedule parsing, job CRUD, due-job detection (34 tests)
- tools/memory_tool.py: security scanning, MemoryStore ops, dispatcher (32 tests)
- toolsets.py: resolution, validation, composition, cycle detection (19 tests)
- tools/file_operations.py: write deny list, result dataclasses, helpers (37 tests)
- agent/prompt_builder.py: context scanning, truncation, skills index (24 tests)
- agent/model_metadata.py: token estimation, context lengths (16 tests)
- hermes_state.py: SessionDB SQLite CRUD, FTS5 search, export, prune (28 tests)

Total: 210 new tests, all passing (380 total suite).
2026-02-26 13:27:58 +03:00
teknium1
178658bf9f test: enhance session source tests and add validation for chat types
- Renamed test method for clarity and added comprehensive tests for `SessionSource` including handling of numeric `chat_id`, missing optional fields, and invalid platforms.
- Introduced tests for session source descriptions based on chat types and names, ensuring accurate representation in prompts.
- Improved file tools tests by validating schema structures, ensuring no duplicate model IDs, and enhancing error handling in file operations.
2026-02-26 00:53:57 -08:00
George Pickett
74c662b63a Harden Codex auth refresh and responses compatibility 2026-02-25 19:27:54 -08:00
George Pickett
91bdb9eb2d Fix Codex stream fallback for Responses completion gaps 2026-02-25 19:08:11 -08:00
George Pickett
47f16505d2 Omit optional function_call id in Responses replay input 2026-02-25 19:00:11 -08:00
George Pickett
e63986b534 Harden Codex stream handling and ack continuation 2026-02-25 18:56:06 -08:00
George Pickett
ce175d7372 Fix Codex Responses continuation and schema parity 2026-02-25 18:20:41 -08:00
George Pickett
609b19b630 Add OpenAI Codex provider runtime and responses integration (without .agent/PLANS.md) 2026-02-25 18:20:38 -08:00
0xbyt4
8fc28c34ce test: reorganize test structure and add missing unit tests
Reorganize flat tests/ directory to mirror source code structure
(tools/, gateway/, hermes_cli/, integration/). Add 11 new test files
covering previously untested modules: registry, patch_parser,
fuzzy_match, todo_tool, approval, file_tools, gateway session/config/
delivery, and hermes_cli config/models. Total: 147 unit tests passing,
9 integration tests gated behind pytest marker.
2026-02-26 03:20:08 +03:00
teknium1
8fedbf87d9 feat: add cleanup utility for test artifacts in checkpoint resumption tests
- Introduced a new `_cleanup_test_artifacts` function to remove test-generated files and directories after test execution.
- Integrated the cleanup function into the `test_current_implementation` and `test_interruption_and_resume` tests to ensure proper resource management and prevent clutter from leftover files.
2026-02-23 02:16:10 -08:00
teknium1
d8a369e194 refactor: update API key checks in WebToolsTester
- Replaced the Nous API key check with the Auxiliary Model check in the WebToolsTester class.
- Updated the environment configuration to reflect the change in API key validation, ensuring accurate reporting of available keys.
2026-02-23 02:13:33 -08:00
teknium1
90af34bc83 feat: enhance interrupt handling and container resource configuration
- Introduced a shared interrupt signaling mechanism to allow tools to check for user interrupts during long-running operations.
- Updated the AIAgent to handle interrupts more effectively, ensuring in-progress tool calls are canceled and multiple interrupt messages are combined into one prompt.
- Enhanced the CLI configuration to include container resource limits (CPU, memory, disk) and persistence options for Docker, Singularity, and Modal environments.
- Improved documentation to clarify interrupt behaviors and container resource settings, providing users with better guidance on configuration and usage.
2026-02-23 02:11:33 -08:00
teknium1
cbff1b818c refactor: remove obsolete Nous API test scripts
- Deleted test scripts for Nous API limits, patterns, and temperature checks to streamline the testing suite.
- These scripts were no longer necessary and their removal helps maintain a cleaner codebase.
2026-02-21 03:21:13 -08:00
teknium1
70dd3a16dc Cleanup time! 2026-02-20 23:23:32 -08:00
teknium1
90e5211128 feat: implement subagent delegation for task management
- Introduced the `delegate_task` tool, allowing the main agent to spawn child AIAgent instances with isolated context for complex tasks.
- Supported both single-task and batch processing (up to 3 concurrent tasks) to enhance task management capabilities.
- Updated configuration options for delegation, including maximum iterations and default toolsets for subagents.
- Enhanced documentation to provide clear guidance on using the delegation feature and its configuration.
- Added comprehensive tests to ensure the functionality and reliability of the delegation logic.
2026-02-20 03:15:53 -08:00
teknium1
783acd712d feat: implement code execution sandbox for programmatic tool calling
- Introduced a new `execute_code` tool that allows the agent to run Python scripts that call Hermes tools via RPC, reducing the number of round trips required for tool interactions.
- Added configuration options for timeout and maximum tool calls in the sandbox environment.
- Updated the toolset definitions to include the new code execution capabilities, ensuring integration across platforms.
- Implemented comprehensive tests for the code execution sandbox, covering various scenarios including tool call limits and error handling.
- Enhanced the CLI and documentation to reflect the new functionality, providing users with clear guidance on using the code execution tool.
2026-02-19 23:23:43 -08:00
teknium
248acf715e Add browser automation tools and enhance environment configuration
- Introduced new browser automation tools in `browser_tool.py` for navigating, interacting with, and extracting content from web pages using the agent-browser CLI and Browserbase cloud execution.
- Updated `.env.example` to include new configuration options for Browserbase API keys and session settings.
- Enhanced `model_tools.py` and `toolsets.py` to integrate browser tools into the existing tool framework, ensuring consistent access across toolsets.
- Updated `README.md` with setup instructions for browser tools and their usage examples.
- Added new test script `test_modal_terminal.py` to validate Modal terminal backend functionality.
- Improved `run_agent.py` to support browser tool integration and logging enhancements for better tracking of API responses.
2026-01-29 06:10:24 +00:00
teknium
c82741c3d8 some cleanups 2025-11-05 03:47:17 +00:00
teknium
f6f75cbe2b update webtools 2025-11-02 06:03:21 +00:00
teknium
0e2e69a71d Add batch processing capabilities with checkpointing and statistics tracking, along with toolset distribution management. Update README and add test scripts for validation. 2025-10-06 03:17:58 +00:00
teknium
a7ff4d49e9 A bit of restructuring for simplicity and organization 2025-10-01 23:29:25 +00:00