Commit Graph

308 Commits

Author SHA1 Message Date
Farukest
e87859e82c fix(agent): copy conversation_history to avoid mutating caller's list 2026-03-01 03:06:13 +03:00
Farukest
de101a8202 fix(agent): strip _flush_sentinel from API messages 2026-03-01 02:51:31 +03:00
Farukest
7f1f4c2248 fix(tools): preserve empty content in ReadResult.to_dict() 2026-03-01 02:42:15 +03:00
Farukest
c33f8d381b fix: correct off-by-one in retry exhaustion checks
The retry exhaustion checks used > instead of >= to compare
retry_count against max_retries. Since the while loop condition is
retry_count < max_retries, the check retry_count > max_retries can
never be true inside the loop. When retries are exhausted, the loop
exits and falls through to response.choices[0] on an invalid response,
crashing with IndexError instead of returning a proper error.
2026-03-01 02:27:26 +03:00
Farukest
3f58e47c63 fix: guard POSIX-only process functions for Windows compatibility
os.setsid, os.killpg, and os.getpgid do not exist on Windows and raise
AttributeError on import or first call. This breaks the terminal tool,
code execution sandbox, process registry, and WhatsApp bridge on Windows.

Added _IS_WINDOWS platform guard in all four affected files, following
the pattern documented in CONTRIBUTING.md. On Windows, preexec_fn is
set to None and process termination falls back to proc.terminate() /
proc.kill() instead of process group signals.

Files changed:
- tools/environments/local.py (3 call sites)
- tools/process_registry.py (2 call sites)
- tools/code_execution_tool.py (3 call sites)
- gateway/platforms/whatsapp.py (3 call sites)
2026-03-01 01:54:27 +03:00
Farukest
b7f8a17c24 fix(gateway): persist transcript changes in /retry, /undo and fix /reset
/retry and /undo set session_entry.conversation_history which does not
exist on SessionEntry. The truncated history was never written to disk,
so the next message reload picked up the full unmodified transcript.

Added SessionStore.rewrite_transcript() that persists changes to both
the JSONL file and SQLite database, and updated both commands to use it.

/reset accessed self.session_store._sessions which does not exist on
SessionStore (the correct attribute is _entries). Also replaced the
hand-coded session key with _generate_session_key() to fix WhatsApp DM
sessions using the wrong key format.

Closes #210
2026-03-01 01:40:30 +03:00
0xbyt4
b759602483 fix: prevent italic regex from spanning newlines in Telegram formatter
The italic regex \*([^*]+)\* used [^*] which matches newlines, causing
bullet lists with * markers to be incorrectly converted to italic text.
Changed to [^*\n]+ to prevent cross-line matching.

Adds 43 tests for _escape_mdv2 and format_message covering code blocks,
bold/italic, headers, links, mixed formatting, and the regression case.
2026-02-28 22:01:48 +03:00
0xbyt4
9769e07cd5 test: add 25 unit tests for trajectory_compressor
Tests cover CompressionConfig (defaults, from_yaml with full/partial/empty),
TrajectoryMetrics and AggregateMetrics (to_dict, aggregation, division-by-zero
guards), _find_protected_indices (basic, all-protected, no tail, missing roles,
disabled protection), _extract_turn_content_for_summary (basic, truncation,
empty range), and token counting (empty, basic, trajectory, fallback on error).
2026-02-28 21:28:28 +03:00
0xbyt4
08250a53a1 fix: skills hub dedup prefers higher trust levels + 43 tests
- unified_search and GitHubSource.search dedup: replace naive
  `trust_level == "trusted"` check with ranked comparison so
  "builtin" results are never overwritten by "trusted" or "community"
- Add 43 unit tests covering _parse_frontmatter_quick, trust_level_for,
  HubLockFile CRUD, TapsManager ops, LobeHub _convert_to_skill_md,
  unified_search dedup (with regression test), and append_audit_log
2026-02-28 21:25:55 +03:00
0xbyt4
ff6d62802d fix: platform base extract_images and truncate_message bugs + tests
- extract_images: only remove extracted image tags from content, preserve
  non-image markdown links (e.g. PDFs) that were previously silently lost
- truncate_message: walk only chunk_body (not prepended prefix) so the
  reopened code fence does not toggle in_code off, leaving continuation
  chunks with unclosed code blocks
- Add 49 unit tests covering MessageEvent command parsing, extract_images,
  extract_media, truncate_message code block handling, and _get_human_delay
2026-02-28 21:21:03 +03:00
0xbyt4
46506769f1 test: add unit tests for 5 security/logic-critical modules (batch 4)
- gateway/pairing.py: rate limiting, lockout, code expiry, approval flow (28 tests)
- tools/skill_manager_tool.py: validation, path traversal prevention, CRUD (46 tests)
- tools/skills_tool.py: frontmatter/tag parsing, skill discovery, view chain (34 tests)
- agent/auxiliary_client.py: auth reading, API key resolution, param branching (16 tests)
- honcho_integration/session.py: session dataclass, ID sanitization, transcript format (20 tests)
2026-02-28 20:33:48 +03:00
0xbyt4
dfd50ceccd fix: preserve Gemini thought_signature in tool call messages
Gemini 3 thinking models attach extra_content with thought_signature
to function call responses. This must be echoed back on subsequent
API calls or the server rejects with a 400 error. The assistant
message builder was dropping this field, causing all Gemini 3 Flash/Pro
tool-calling flows to fail after the first function call.
2026-02-28 18:10:05 +03:00
0xbyt4
2390728cc3 fix: resolve 4 bugs found in HA integration code review
- Auto-authorize HA events in gateway (system-generated, not user messages)
- Guard _read_events against None/closed WebSocket after failed reconnect
- Use UUID for send() message_id instead of polluting WS sequence counter
- entity_id parameter now takes precedence over data["entity_id"]
2026-02-28 15:12:18 +03:00
0xbyt4
b32c642af3 test: add HA integration tests with fake in-process server
Fake HA server (aiohttp.web) simulates full API surface over real TCP:
- WebSocket auth handshake + event push
- REST endpoints (states, services, notifications)

14 integration tests verify end-to-end flows without mocks:
- WS connect/auth/subscribe/event-forwarding/disconnect
- REST list/get/call-service against fake server
- send() notification delivery and auth failure
- 401/500 error handling
2026-02-28 14:28:04 +03:00
0xbyt4
c36b256de5 feat: add Home Assistant integration (REST tools + WebSocket gateway)
- Add ha_list_entities, ha_get_state, ha_call_service tools via REST API
- Add WebSocket gateway adapter for real-time state_changed event monitoring
- Support domain/entity filtering, cooldown, and auto-reconnect with backoff
- Use REST API for outbound notifications to avoid WS race condition
- Gate tool availability on HASS_TOKEN env var
- Add 82 unit tests covering real logic (filtering, payload building, event pipeline)
2026-02-28 13:32:48 +03:00
Bartok9
35655298e6 fix(gateway): prevent TTS voice messages from accumulating across turns
Fixes #160

The issue was that MEDIA tags were being extracted from ALL messages
in the conversation history, not just messages from the current turn.
This caused TTS voice messages generated in earlier turns to be
re-attached to every subsequent reply.

The fix:
- Track history_len before calling run_conversation
- Only scan messages AFTER history_len for MEDIA tags
- Add comprehensive tests to prevent regression

This ensures each voice message is sent exactly once, when it's
generated, not on every subsequent message in the session.
2026-02-28 03:38:27 -05:00
teknium1
50cb4d5fc7 fix(agent): update error message for unsupported Anthropic API endpoints to clarify usage of OpenRouter 2026-02-27 23:23:31 -08:00
Teknium
2bc9508b7c Merge pull request #173 from adavyas/fix/anthropic-base-url-guard
fix(agent): fail fast on Anthropic native base URLs
2026-02-27 23:22:01 -08:00
teknium1
19f28a633a fix(agent): enhance 413 error handling and improve conversation history management in tests 2026-02-27 23:04:32 -08:00
Teknium
2c817ce4a5 Merge pull request #153 from tekelala/main
fix(agent): handle 413 payload-too-large via compression instead of aborting
2026-02-27 22:57:55 -08:00
adavyas
0c0a2eb0a2 fix(agent): fail fast on Anthropic native base URLs 2026-02-27 21:19:29 -08:00
Teknium
0d2ac1c07f Merge pull request #121 from Bartok9/test-clarify-tool
test(tools): add unit tests for clarify_tool.py
2026-02-27 16:27:37 -08:00
tekelala
79bd65034c fix(agent): handle 413 payload-too-large via compression instead of aborting
The 413 "Request Entity Too Large" error from the LLM API was caught by the
generic 4xx handler which aborts immediately. This is wrong for 413 — it's a
payload-size issue that can be resolved by compressing conversation history.

- Intercept 413 before the generic 4xx block and route to _compress_context
- Exclude 413 from generic is_client_error detection
- Add 'request entity too large' to context-length phrases as safety net
- Add tests for 413 compression behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 12:21:27 -05:00
tekelala
fbb1923fad fix(security): patch path traversal, size bypass, and prompt injection in document processing
- Sanitize filenames in cache_document_from_bytes to prevent path traversal (strip directory components, null bytes, resolve check)
- Reject documents with None file_size instead of silently allowing download
- Cap text file injection at 100 KB to prevent oversized prompt payloads
- Sanitize display_name in run.py context notes to block prompt injection via filenames
- Add 35 unit tests covering document cache utilities and Telegram document handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 11:53:46 -05:00
Teknium
3526fa27fd Merge pull request #62 from 0xbyt4/test/expand-coverage-2
test: add unit tests for 8 modules (batch 2)
2026-02-27 01:47:30 -08:00
Teknium
64eca85876 Merge pull request #67 from 0xbyt4/test/add-run-agent-unit-tests
test: add unit tests for run_agent.py (AIAgent)
2026-02-27 01:36:49 -08:00
Teknium
152271851f Merge pull request #63 from 0xbyt4/fix/cron-prompt-injection-bypass
fix: cron prompt injection scanner bypass for multi-word variants
2026-02-27 01:34:14 -08:00
Teknium
0909be3aa8 Merge pull request #61 from 0xbyt4/fix/write-deny-macos-symlink
fix: resolve symlink bypass in write deny list on macOS
2026-02-27 01:32:19 -08:00
Teknium
274e623b50 Merge pull request #60 from 0xbyt4/test/expand-coverage
test: add unit tests for 8 untested core modules
2026-02-27 01:30:36 -08:00
Bartok Moltbot
df8a62d018 test(tools): add unit tests for clarify_tool.py
Add comprehensive test coverage for the clarify_tool module:

- TestClarifyToolBasics: 5 tests for core functionality
  - Simple questions, questions with choices, error handling

- TestClarifyToolChoicesValidation: 5 tests for choices parameter
  - MAX_CHOICES enforcement, empty/whitespace handling, type conversion

- TestClarifyToolCallbackHandling: 3 tests for callback behavior
  - Exception handling, question/response trimming

- TestCheckClarifyRequirements: 1 test verifying always-true behavior

- TestClarifySchema: 6 tests verifying OpenAI function schema
  - Required/optional parameters, maxItems constraint

Total: 20 tests covering all public functions and edge cases.
2026-02-27 03:29:26 -05:00
George Pickett
32070e6bc0 Merge remote-tracking branch 'origin/main' into codex/align-codex-provider-conventions-mainrepo
# Conflicts:
#	cron/scheduler.py
#	gateway/run.py
#	tools/delegate_tool.py
2026-02-26 10:56:29 -08:00
darya
f5c09a3aba test: add regression tests for recursive delete false positive fix
Add 15 new tests in two classes:

- TestRmFalsePositiveFix (8 tests): verify filenames starting with 'r'
  (readme.txt, requirements.txt, report.csv, etc.) are NOT falsely
  flagged as 'recursive delete'

- TestRmRecursiveFlagVariants (7 tests): verify all recursive delete
  flag styles (-r, -rf, -rfv, -fr, -irf, --recursive, sudo rm -rf)
  are still correctly caught

All 29 tests pass (14 existing + 15 new).
2026-02-26 16:40:44 +03:00
0xbyt4
90ca2ae16b test: add unit tests for run_agent.py (AIAgent)
71 tests covering pure functions, state/structure methods, and
conversation loop pieces. OpenAI client and tool loading are mocked.
2026-02-26 16:15:04 +03:00
0xbyt4
feea8332d6 fix: cron prompt injection scanner bypass for multi-word variants
The regex `ignore\s+(previous|all|above|prior)\s+instructions` only
allowed ONE word between "ignore" and "instructions". Multi-word
variants like "Ignore ALL prior instructions" bypassed the scanner
because "ALL" matched the alternation but then `\s+instructions`
failed to match "prior".

Fix: use `(?:\w+\s+)*` groups to allow optional extra words before
and after the keyword alternation.
2026-02-26 13:55:54 +03:00
0xbyt4
ffbdd7fcce test: add unit tests for 8 modules (batch 2)
Cover model_tools, toolset_distributions, context_compressor,
prompt_caching, cronjob_tools, session_search, process_registry,
and cron/scheduler with 127 new test cases.
2026-02-26 13:54:20 +03:00
0xbyt4
b699cf8c48 test: remove /etc platform-conditional tests from file_operations
These tests documented the macOS symlink bypass bug with
platform-conditional assertions. The fix and proper regression
tests are in PR #61 (tests/tools/test_write_deny.py), so remove
them here to avoid ordering conflicts between the two PRs.
2026-02-26 13:43:30 +03:00
0xbyt4
2efd9bbac4 fix: resolve symlink bypass in write deny list on macOS
On macOS, /etc is a symlink to /private/etc. The _is_write_denied()
function resolves the input path with os.path.realpath() but the deny
list entries were stored as literal strings ("/etc/shadow"). This meant
the resolved path "/private/etc/shadow" never matched, allowing writes
to sensitive system files on macOS.

Fix: Apply os.path.realpath() to deny list entries at module load time
so both sides of the comparison use resolved paths.

Adds 19 regression tests in tests/tools/test_write_deny.py.
2026-02-26 13:30:55 +03:00
0xbyt4
0ac3af8776 test: add unit tests for 8 untested modules
Add comprehensive test coverage for:
- cron/jobs.py: schedule parsing, job CRUD, due-job detection (34 tests)
- tools/memory_tool.py: security scanning, MemoryStore ops, dispatcher (32 tests)
- toolsets.py: resolution, validation, composition, cycle detection (19 tests)
- tools/file_operations.py: write deny list, result dataclasses, helpers (37 tests)
- agent/prompt_builder.py: context scanning, truncation, skills index (24 tests)
- agent/model_metadata.py: token estimation, context lengths (16 tests)
- hermes_state.py: SessionDB SQLite CRUD, FTS5 search, export, prune (28 tests)

Total: 210 new tests, all passing (380 total suite).
2026-02-26 13:27:58 +03:00
teknium1
178658bf9f test: enhance session source tests and add validation for chat types
- Renamed test method for clarity and added comprehensive tests for `SessionSource` including handling of numeric `chat_id`, missing optional fields, and invalid platforms.
- Introduced tests for session source descriptions based on chat types and names, ensuring accurate representation in prompts.
- Improved file tools tests by validating schema structures, ensuring no duplicate model IDs, and enhancing error handling in file operations.
2026-02-26 00:53:57 -08:00
George Pickett
74c662b63a Harden Codex auth refresh and responses compatibility 2026-02-25 19:27:54 -08:00
George Pickett
91bdb9eb2d Fix Codex stream fallback for Responses completion gaps 2026-02-25 19:08:11 -08:00
George Pickett
47f16505d2 Omit optional function_call id in Responses replay input 2026-02-25 19:00:11 -08:00
George Pickett
e63986b534 Harden Codex stream handling and ack continuation 2026-02-25 18:56:06 -08:00
George Pickett
ce175d7372 Fix Codex Responses continuation and schema parity 2026-02-25 18:20:41 -08:00
George Pickett
609b19b630 Add OpenAI Codex provider runtime and responses integration (without .agent/PLANS.md) 2026-02-25 18:20:38 -08:00
0xbyt4
8fc28c34ce test: reorganize test structure and add missing unit tests
Reorganize flat tests/ directory to mirror source code structure
(tools/, gateway/, hermes_cli/, integration/). Add 11 new test files
covering previously untested modules: registry, patch_parser,
fuzzy_match, todo_tool, approval, file_tools, gateway session/config/
delivery, and hermes_cli config/models. Total: 147 unit tests passing,
9 integration tests gated behind pytest marker.
2026-02-26 03:20:08 +03:00
teknium1
8fedbf87d9 feat: add cleanup utility for test artifacts in checkpoint resumption tests
- Introduced a new `_cleanup_test_artifacts` function to remove test-generated files and directories after test execution.
- Integrated the cleanup function into the `test_current_implementation` and `test_interruption_and_resume` tests to ensure proper resource management and prevent clutter from leftover files.
2026-02-23 02:16:10 -08:00
teknium1
d8a369e194 refactor: update API key checks in WebToolsTester
- Replaced the Nous API key check with the Auxiliary Model check in the WebToolsTester class.
- Updated the environment configuration to reflect the change in API key validation, ensuring accurate reporting of available keys.
2026-02-23 02:13:33 -08:00
teknium1
90af34bc83 feat: enhance interrupt handling and container resource configuration
- Introduced a shared interrupt signaling mechanism to allow tools to check for user interrupts during long-running operations.
- Updated the AIAgent to handle interrupts more effectively, ensuring in-progress tool calls are canceled and multiple interrupt messages are combined into one prompt.
- Enhanced the CLI configuration to include container resource limits (CPU, memory, disk) and persistence options for Docker, Singularity, and Modal environments.
- Improved documentation to clarify interrupt behaviors and container resource settings, providing users with better guidance on configuration and usage.
2026-02-23 02:11:33 -08:00
teknium1
cbff1b818c refactor: remove obsolete Nous API test scripts
- Deleted test scripts for Nous API limits, patterns, and temperature checks to streamline the testing suite.
- These scripts were no longer necessary and their removal helps maintain a cleaner codebase.
2026-02-21 03:21:13 -08:00