Commit Graph

66 Commits

Author SHA1 Message Date
teknium1
39bfd226b8 Merge PR #225: fix: preserve empty content in ReadResult.to_dict()
Authored by Farukest. Fixes #224.
2026-03-02 03:13:31 -08:00
teknium1
234b67f5fd fix: mock time in retry exhaustion tests to prevent backoff sleep
The TestRetryExhaustion tests from PR #223 didn't mock time.sleep/time.time,
causing the retry backoff loops (275s+ total) to run in real time. Tests would
time out instead of running quickly.

Added _make_fast_time_mock() helper that creates a mock time module where
time.time() advances 500s per call (so sleep_end is always in the past) and
time.sleep() is a no-op. Both tests now complete in <1s.
2026-03-02 02:59:41 -08:00
teknium1
e27e3a4f8a Merge PR #223: fix: correct off-by-one in retry exhaustion checks
Authored by Farukest. Fixes #222.
2026-03-02 02:54:10 -08:00
teknium1
1cb2311bad fix(security): block path traversal in skill_view file_path (fixes #220)
skill_view accepted arbitrary file_path values like '../../.env' and
would read files outside the skill directory, exposing API keys and
other sensitive data.

Added two layers of defense:
1. Reject paths with '..' components (fast, catches obvious traversal)
2. resolve() containment check with trailing '/' to prevent prefix
   collisions (catches symlinks and edge cases)

Fix approach from PR #242 (@Bartok9). Vulnerability reported by
@Farukest (#220, PR #221). Tests rewritten to properly mock SKILLS_DIR.

Closes #220
2026-03-02 02:00:09 -08:00
teknium1
25c65bc99e fix(agent): handle None content in context compressor (fixes #211)
The OpenAI API returns content: null on assistant messages that only
contain tool calls. msg.get('content', '') returns None (not '') when
the key exists with value None, causing TypeError on len() and string
concatenation in _generate_summary and compress.

Fix: msg.get('content') or '' — handles both missing keys and None.

Tests from PR #216 (@Farukest). Fix also in PR #215 (@cutepawss).
Both PRs had stale branches and couldn't be merged directly.

Closes #211
2026-03-02 01:35:52 -08:00
teknium1
afb680b50d fix(cli): fix max_turns comment and test for correct priority order
Priority is: CLI arg > config file > env var > default
(not env var > config file as the old comment stated)

The test failed because config.yaml had max_turns at both root level
and inside agent section. The test cleared agent.max_turns but the
root-level value still took precedence over the env var. Fixed the
test to clear both, and corrected the comment to match the intended
priority order.
2026-03-02 01:18:52 -08:00
teknium1
e265006fd6 test: add coverage for chat_topic in SessionSource and session context prompt
Tests added:
- Roundtrip serialization of chat_topic via to_dict/from_dict
- chat_topic defaults to None when missing from dict
- Channel Topic line appears in session context prompt when set
- Channel Topic line is omitted when chat_topic is None

Follow-up to PR #248 (feat: Discord channel topic in session context).
2026-03-02 00:53:21 -08:00
teknium1
719f2eef32 Merge branch 'pr-217'
# Conflicts:
#	gateway/session.py
2026-03-02 00:18:41 -08:00
teknium1
e5893075f9 feat(agent): add summary handling for reasoning items
Enhanced the AIAgent class to capture and normalize summary information for reasoning items. Implemented logic to handle summaries as lists, ensuring proper formatting for API interactions. Updated tests to validate the inclusion of summaries in reasoning items, both for existing and default cases.
2026-03-01 20:03:03 -08:00
teknium1
5e598a588f refactor(auth): transition Codex OAuth tokens to Hermes auth store
Updated the authentication mechanism to store Codex OAuth tokens in the Hermes auth store located at ~/.hermes/auth.json instead of the previous ~/.codex/auth.json. This change includes refactoring related functions for reading and saving tokens, ensuring better management of authentication states and preventing conflicts between different applications. Adjusted tests to reflect the new storage structure and improved error handling for missing or malformed tokens.
2026-03-01 19:59:24 -08:00
teknium1
8bc2de4ab6 feat(provider-routing): add OpenRouter provider routing configuration
Introduced a new `provider_routing` section in the CLI configuration to control how requests are routed across providers when using OpenRouter. This includes options for sorting providers by throughput, latency, or price, as well as allowing or ignoring specific providers, setting the order of provider attempts, and managing data collection policies. Updated relevant classes and documentation to support these features, enhancing flexibility in provider selection.
2026-03-01 18:24:27 -08:00
teknium1
11f5c1ecf0 fix(tests): use bare @pytest.mark.asyncio for hook emit tests
Remove loop_scope="function" parameter from async test decorators in
test_hooks.py. This matches the existing convention in the repo
(test_telegram_documents.py) and avoids requiring pytest-asyncio 0.23+.

All 144 new tests from PR #191 now pass.
2026-03-01 05:28:55 -08:00
0xbyt4
3b745633e4 test: add unit tests for 8 untested modules (batch 3) (#191)
* test: add unit tests for 8 untested modules (batch 3)

New test files (143 tests total):
- tools/debug_helpers.py: DebugSession enable/disable, log, save, session info
- tools/skills_guard.py: scan_file, scan_skill, trust levels, install policy, structural checks
- tools/skills_sync.py: manifest read/write, skill discovery, sync logic
- gateway/sticker_cache.py: cache CRUD, sticker injection text builders
- gateway/channel_directory.py: channel resolution, display formatting, session building
- gateway/hooks.py: hook discovery, sync/async emit, wildcard matching
- gateway/mirror.py: session lookup, JSONL append, mirror_to_session
- honcho_integration/client.py: config from env/file, session name resolution, linked workspaces

Also documents a gap in skills_guard: multi-word prompt injection
variants like "ignore all prior instructions" bypass the regex scanner.

* test: strengthen sticker injection tests with exact format assertions

Replace loose "contains" checks with exact output matching for
build_sticker_injection and build_animated_sticker_injection.
Add edge cases: set_name without emoji, empty description, empty emoji.

* test: remove skills_guard gap-documenting test to avoid conflict with fix PR
2026-03-01 05:28:12 -08:00
teknium1
41d8a80226 fix(display): fix subagent progress tree-view visual nits
Two fixes to the subagent progress display from PR #186:

1. Task index prefix: show 1-indexed prefix ([1], [2], ...) for ALL
   tasks in batch mode (task_count > 1). Single tasks get no prefix.
   Previously task 0 had no prefix while others did, making batch
   output confusing.

2. Completion indicator: use spinner.print_above() instead of raw
   print() for per-task completion lines (✓ [1/2] ...). Raw print
   collided with the active spinner, mushing the completion text
   onto the spinner line. Now prints cleanly above.

Added task_count parameter to _build_child_progress_callback and
_run_single_child. Updated tests accordingly.
2026-02-28 23:29:49 -08:00
teknium1
4ec386cc72 fix(display): use spaces instead of ANSI \033[K in print_above() for prompt_toolkit compat
print_above() used \033[K (erase-to-end-of-line) to clear the spinner
line before printing text above it. This causes garbled escape codes when
prompt_toolkit's patch_stdout is active in CLI mode.

Switched to the same spaces-based clearing approach used by stop() —
overwrite with blanks, then carriage return back to start of line.

Updated test assertion to match the new clearing method.
2026-02-28 23:19:23 -08:00
lila
dd69f16c3e feat(gateway): expose subagent tool calls and thinking to user (fixes #169) (#186)
When subagents run via delegate_task, the user now sees real-time
progress instead of silence:

CLI: tree-view activity lines print above the delegation spinner
  🔀 Delegating: research quantum computing
     ├─ 💭 "I'll search for papers first..."
     ├─ 🔍 web_search  "quantum computing"
     ├─ 📖 read_file  "paper.pdf"
     └─ ⠹ working... (18.2s)

Gateway (Telegram/Discord): batched progress summaries sent every
5 tool calls to avoid message spam. Remaining tools flushed on
subagent completion.

Changes:
- agent/display.py: add KawaiiSpinner.print_above() to print
  status lines above an active spinner without disrupting animation.
  Uses captured stdout (self._out) so it works inside the child's
  redirect_stdout(devnull).

- tools/delegate_tool.py: add _build_child_progress_callback()
  that creates a per-child callback relaying tool calls and
  thinking events to the parent's spinner (CLI) or progress
  queue (gateway). Each child gets its own callback instance,
  so parallel subagents don't share state. Includes _flush()
  for gateway batch completion.

- run_agent.py: fire tool_progress_callback with '_thinking'
  event when the model produces text content. Guarded by
  _delegate_depth > 0 so only subagents fire this (prevents
  gateway spam from main agent). REASONING_SCRATCHPAD/think/
  reasoning XML tags are stripped before display.

Tests: 21 new tests covering print_above, callback builder,
thinking relay, SCRATCHPAD filtering, batching, flush, thread
isolation, delegate_depth guard, and prefix handling.
2026-02-28 23:18:00 -08:00
teknium1
1db5598294 feat(tests): add live integration tests for file operations and shell noise filtering
- Introduce a new test suite in `test_file_tools_live.py` to validate file operations and ensure accurate command execution in a real environment.
- Implement assertions to check for shell noise contamination in outputs, enhancing the reliability of command results.
- Create fixtures for setting up a local environment and populating directories with known file contents for comprehensive testing.
- Refactor shell noise handling in `process_registry.py` and `local.py` to support multiple noise patterns, improving output cleanliness.
2026-02-28 22:57:58 -08:00
teknium1
70dfec9638 test(redact): add sensitive text redaction
- Introduce a new test suite for the `redact_sensitive_text` function, covering various sensitive data formats including API keys, tokens, and environment variables.
- Ensure that sensitive information is properly masked in logs and outputs while non-sensitive data remains unchanged.
- Add tests for different scenarios including JSON fields, authorization headers, and environment variable assignments.
- Implement a redacting formatter for logging to enhance security during log output.
2026-02-28 21:56:27 -08:00
teknium1
500f0eab4a refactor(cli): Finalize OpenAI Codex Integration with OAuth
- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.
2026-02-28 21:47:51 -08:00
Teknium
5a79e423fe Merge branch 'main' into codex/align-codex-provider-conventions-mainrepo 2026-02-28 18:13:38 -08:00
Farukest
7f1f4c2248 fix(tools): preserve empty content in ReadResult.to_dict() 2026-03-01 02:42:15 +03:00
Farukest
c33f8d381b fix: correct off-by-one in retry exhaustion checks
The retry exhaustion checks used > instead of >= to compare
retry_count against max_retries. Since the while loop condition is
retry_count < max_retries, the check retry_count > max_retries can
never be true inside the loop. When retries are exhausted, the loop
exits and falls through to response.choices[0] on an invalid response,
crashing with IndexError instead of returning a proper error.
2026-03-01 02:27:26 +03:00
Farukest
b7f8a17c24 fix(gateway): persist transcript changes in /retry, /undo and fix /reset
/retry and /undo set session_entry.conversation_history which does not
exist on SessionEntry. The truncated history was never written to disk,
so the next message reload picked up the full unmodified transcript.

Added SessionStore.rewrite_transcript() that persists changes to both
the JSONL file and SQLite database, and updated both commands to use it.

/reset accessed self.session_store._sessions which does not exist on
SessionStore (the correct attribute is _entries). Also replaced the
hand-coded session key with _generate_session_key() to fix WhatsApp DM
sessions using the wrong key format.

Closes #210
2026-03-01 01:40:30 +03:00
Bartok9
35655298e6 fix(gateway): prevent TTS voice messages from accumulating across turns
Fixes #160

The issue was that MEDIA tags were being extracted from ALL messages
in the conversation history, not just messages from the current turn.
This caused TTS voice messages generated in earlier turns to be
re-attached to every subsequent reply.

The fix:
- Track history_len before calling run_conversation
- Only scan messages AFTER history_len for MEDIA tags
- Add comprehensive tests to prevent regression

This ensures each voice message is sent exactly once, when it's
generated, not on every subsequent message in the session.
2026-02-28 03:38:27 -05:00
teknium1
50cb4d5fc7 fix(agent): update error message for unsupported Anthropic API endpoints to clarify usage of OpenRouter 2026-02-27 23:23:31 -08:00
Teknium
2bc9508b7c Merge pull request #173 from adavyas/fix/anthropic-base-url-guard
fix(agent): fail fast on Anthropic native base URLs
2026-02-27 23:22:01 -08:00
teknium1
19f28a633a fix(agent): enhance 413 error handling and improve conversation history management in tests 2026-02-27 23:04:32 -08:00
Teknium
2c817ce4a5 Merge pull request #153 from tekelala/main
fix(agent): handle 413 payload-too-large via compression instead of aborting
2026-02-27 22:57:55 -08:00
adavyas
0c0a2eb0a2 fix(agent): fail fast on Anthropic native base URLs 2026-02-27 21:19:29 -08:00
Teknium
0d2ac1c07f Merge pull request #121 from Bartok9/test-clarify-tool
test(tools): add unit tests for clarify_tool.py
2026-02-27 16:27:37 -08:00
tekelala
79bd65034c fix(agent): handle 413 payload-too-large via compression instead of aborting
The 413 "Request Entity Too Large" error from the LLM API was caught by the
generic 4xx handler which aborts immediately. This is wrong for 413 — it's a
payload-size issue that can be resolved by compressing conversation history.

- Intercept 413 before the generic 4xx block and route to _compress_context
- Exclude 413 from generic is_client_error detection
- Add 'request entity too large' to context-length phrases as safety net
- Add tests for 413 compression behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 12:21:27 -05:00
tekelala
fbb1923fad fix(security): patch path traversal, size bypass, and prompt injection in document processing
- Sanitize filenames in cache_document_from_bytes to prevent path traversal (strip directory components, null bytes, resolve check)
- Reject documents with None file_size instead of silently allowing download
- Cap text file injection at 100 KB to prevent oversized prompt payloads
- Sanitize display_name in run.py context notes to block prompt injection via filenames
- Add 35 unit tests covering document cache utilities and Telegram document handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 11:53:46 -05:00
Teknium
3526fa27fd Merge pull request #62 from 0xbyt4/test/expand-coverage-2
test: add unit tests for 8 modules (batch 2)
2026-02-27 01:47:30 -08:00
Teknium
64eca85876 Merge pull request #67 from 0xbyt4/test/add-run-agent-unit-tests
test: add unit tests for run_agent.py (AIAgent)
2026-02-27 01:36:49 -08:00
Teknium
152271851f Merge pull request #63 from 0xbyt4/fix/cron-prompt-injection-bypass
fix: cron prompt injection scanner bypass for multi-word variants
2026-02-27 01:34:14 -08:00
Teknium
0909be3aa8 Merge pull request #61 from 0xbyt4/fix/write-deny-macos-symlink
fix: resolve symlink bypass in write deny list on macOS
2026-02-27 01:32:19 -08:00
Teknium
274e623b50 Merge pull request #60 from 0xbyt4/test/expand-coverage
test: add unit tests for 8 untested core modules
2026-02-27 01:30:36 -08:00
Bartok Moltbot
df8a62d018 test(tools): add unit tests for clarify_tool.py
Add comprehensive test coverage for the clarify_tool module:

- TestClarifyToolBasics: 5 tests for core functionality
  - Simple questions, questions with choices, error handling

- TestClarifyToolChoicesValidation: 5 tests for choices parameter
  - MAX_CHOICES enforcement, empty/whitespace handling, type conversion

- TestClarifyToolCallbackHandling: 3 tests for callback behavior
  - Exception handling, question/response trimming

- TestCheckClarifyRequirements: 1 test verifying always-true behavior

- TestClarifySchema: 6 tests verifying OpenAI function schema
  - Required/optional parameters, maxItems constraint

Total: 20 tests covering all public functions and edge cases.
2026-02-27 03:29:26 -05:00
George Pickett
32070e6bc0 Merge remote-tracking branch 'origin/main' into codex/align-codex-provider-conventions-mainrepo
# Conflicts:
#	cron/scheduler.py
#	gateway/run.py
#	tools/delegate_tool.py
2026-02-26 10:56:29 -08:00
darya
f5c09a3aba test: add regression tests for recursive delete false positive fix
Add 15 new tests in two classes:

- TestRmFalsePositiveFix (8 tests): verify filenames starting with 'r'
  (readme.txt, requirements.txt, report.csv, etc.) are NOT falsely
  flagged as 'recursive delete'

- TestRmRecursiveFlagVariants (7 tests): verify all recursive delete
  flag styles (-r, -rf, -rfv, -fr, -irf, --recursive, sudo rm -rf)
  are still correctly caught

All 29 tests pass (14 existing + 15 new).
2026-02-26 16:40:44 +03:00
0xbyt4
90ca2ae16b test: add unit tests for run_agent.py (AIAgent)
71 tests covering pure functions, state/structure methods, and
conversation loop pieces. OpenAI client and tool loading are mocked.
2026-02-26 16:15:04 +03:00
0xbyt4
feea8332d6 fix: cron prompt injection scanner bypass for multi-word variants
The regex `ignore\s+(previous|all|above|prior)\s+instructions` only
allowed ONE word between "ignore" and "instructions". Multi-word
variants like "Ignore ALL prior instructions" bypassed the scanner
because "ALL" matched the alternation but then `\s+instructions`
failed to match "prior".

Fix: use `(?:\w+\s+)*` groups to allow optional extra words before
and after the keyword alternation.
2026-02-26 13:55:54 +03:00
0xbyt4
ffbdd7fcce test: add unit tests for 8 modules (batch 2)
Cover model_tools, toolset_distributions, context_compressor,
prompt_caching, cronjob_tools, session_search, process_registry,
and cron/scheduler with 127 new test cases.
2026-02-26 13:54:20 +03:00
0xbyt4
b699cf8c48 test: remove /etc platform-conditional tests from file_operations
These tests documented the macOS symlink bypass bug with
platform-conditional assertions. The fix and proper regression
tests are in PR #61 (tests/tools/test_write_deny.py), so remove
them here to avoid ordering conflicts between the two PRs.
2026-02-26 13:43:30 +03:00
0xbyt4
2efd9bbac4 fix: resolve symlink bypass in write deny list on macOS
On macOS, /etc is a symlink to /private/etc. The _is_write_denied()
function resolves the input path with os.path.realpath() but the deny
list entries were stored as literal strings ("/etc/shadow"). This meant
the resolved path "/private/etc/shadow" never matched, allowing writes
to sensitive system files on macOS.

Fix: Apply os.path.realpath() to deny list entries at module load time
so both sides of the comparison use resolved paths.

Adds 19 regression tests in tests/tools/test_write_deny.py.
2026-02-26 13:30:55 +03:00
0xbyt4
0ac3af8776 test: add unit tests for 8 untested modules
Add comprehensive test coverage for:
- cron/jobs.py: schedule parsing, job CRUD, due-job detection (34 tests)
- tools/memory_tool.py: security scanning, MemoryStore ops, dispatcher (32 tests)
- toolsets.py: resolution, validation, composition, cycle detection (19 tests)
- tools/file_operations.py: write deny list, result dataclasses, helpers (37 tests)
- agent/prompt_builder.py: context scanning, truncation, skills index (24 tests)
- agent/model_metadata.py: token estimation, context lengths (16 tests)
- hermes_state.py: SessionDB SQLite CRUD, FTS5 search, export, prune (28 tests)

Total: 210 new tests, all passing (380 total suite).
2026-02-26 13:27:58 +03:00
teknium1
178658bf9f test: enhance session source tests and add validation for chat types
- Renamed test method for clarity and added comprehensive tests for `SessionSource` including handling of numeric `chat_id`, missing optional fields, and invalid platforms.
- Introduced tests for session source descriptions based on chat types and names, ensuring accurate representation in prompts.
- Improved file tools tests by validating schema structures, ensuring no duplicate model IDs, and enhancing error handling in file operations.
2026-02-26 00:53:57 -08:00
George Pickett
74c662b63a Harden Codex auth refresh and responses compatibility 2026-02-25 19:27:54 -08:00
George Pickett
91bdb9eb2d Fix Codex stream fallback for Responses completion gaps 2026-02-25 19:08:11 -08:00
George Pickett
47f16505d2 Omit optional function_call id in Responses replay input 2026-02-25 19:00:11 -08:00