hermes-agent/tests at 04b6ecadc4a5efacb573cdeded06fa9642a1a71b - hermes-agent - Hermes Gitea

Timmy_Foundation/hermes-agent

Files

History

Teknium e84d952dc0 fix(codex): handle reasoning-only responses and replay path (#2070 )

* fix(codex): treat reasoning-only responses as incomplete, not stop

When a Codex Responses API response contains only reasoning items
(encrypted thinking state) with no message text or tool calls, the
_normalize_codex_response method was setting finish_reason='stop'.
This sent the response into the empty-content retry loop, which
burned 3 retries and then failed — exactly the pattern Nester
reported in Discord.

Two fixes:
1. _normalize_codex_response: reasoning-only responses (reasoning_items_raw
   non-empty but no final_text) now get finish_reason='incomplete', routing
   them to the Codex continuation path instead of the retry loop.
2. Incomplete handling: also checks for codex_reasoning_items when deciding
   whether to preserve an interim message, so encrypted reasoning state is
   not silently dropped when there is no visible reasoning text.

Adds 4 regression tests covering:
- Unit: reasoning-only → incomplete, reasoning+content → stop
- E2E: reasoning-only → continuation → final answer succeeds
- E2E: encrypted reasoning items preserved in interim messages

* fix(codex): ensure reasoning items have required following item in API input

Follow-up to the reasoning-only response fix. Three additional issues
found by tracing the full replay path:

1. _chat_messages_to_responses_input: when a reasoning-only interim
   message was converted to Responses API input, the reasoning items
   were emitted as the last items with no following item. The Responses
   API requires a following item after each reasoning item (otherwise:
   'missing_following_item' error, as seen in OpenHands #11406). Now
   emits an empty assistant message as the required following item when
   content is empty but reasoning items were added.

2. Duplicate detection: two consecutive reasoning-only incomplete
   messages with identical empty content/reasoning but different
   encrypted codex_reasoning_items were incorrectly treated as
   duplicates, silently dropping the second response's reasoning state.
   Now includes codex_reasoning_items in the duplicate comparison.

3. Added tests for both the API input conversion path and the duplicate
   detection edge case.

Research context: verified against OpenCode (uses Vercel AI SDK, no
retry loop so avoids the issue), Clawdbot (drops orphaned reasoning
blocks entirely), and OpenHands (hit the missing_following_item error).
Our approach preserves reasoning continuity while satisfying the API
constraint.

---------

Co-authored-by: Test <test@test.com>

2026-03-19 10:34:44 -07:00

..

fix: persist ACP sessions to SessionDB so they survive process restarts

2026-03-19 10:30:50 -07:00

fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051 )

2026-03-19 06:01:16 -07:00

fix(cron): warn and skip missing skills instead of crashing job

2026-03-19 09:56:16 -07:00

…

fix(gateway): replace bare text approval with /approve and /deny commands (#2002 )

2026-03-18 16:58:20 -07:00

Merge origin/main, resolve conflicts (self._base_url_lower)

2026-03-18 04:09:00 -07:00

honcho_integration

test: align Hermes setup and full-suite expectations (#1710 )

2026-03-17 04:01:37 -07:00

feat(web): add Parallel as alternative web search/extract backend (#1696 )

2026-03-17 04:02:02 -07:00

fix: persist google oauth pkce for headless auth

2026-03-14 22:11:34 -07:00

fix: normalize live Chrome CDP endpoints for browser tools

2026-03-19 10:17:03 -07:00

__init__.py

…

conftest.py

fix(approval): show full command in dangerous command approval (#1553 )

2026-03-17 02:02:33 -07:00

run_interrupt_test.py

fix: thread safety for concurrent subagent delegation (#1672 )

2026-03-17 02:53:33 -07:00

test_413_compression.py

…

test_860_dedup.py

…

test_1630_context_overflow_loop.py

fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files (#1630 , #1558 )

2026-03-17 01:50:59 -07:00

test_agent_guardrails.py

feat: pre-call sanitization and post-call tool guardrails (#1732 )

2026-03-17 04:24:27 -07:00

test_agent_loop_tool_calling.py

…

test_agent_loop_vllm.py

test: restore vllm integration coverage and add dict-args regression

2026-03-15 08:02:29 -07:00

test_agent_loop.py

…

test_anthropic_adapter.py

fix: isolate test_anthropic_adapter from local credentials

2026-03-16 22:53:32 -07:00

test_anthropic_error_handling.py

fix(anthropic): retry 429/529 errors and surface error details to users

2026-03-17 01:07:11 +03:00

test_anthropic_oauth_flow.py

…

test_anthropic_provider_persistence.py

…

test_api_key_providers.py

feat: proper Copilot auth with OAuth device code flow and token validation

2026-03-18 03:25:58 -07:00

test_atomic_json_write.py

test: cover atomic temp cleanup on interrupts

2026-03-14 22:31:51 -07:00

test_atomic_yaml_write.py

test: cover atomic temp cleanup on interrupts

2026-03-14 22:31:51 -07:00

test_auth_codex_provider.py

…

test_auth_nous_provider.py

…

test_auxiliary_config_bridge.py

feat(compression): add summary_base_url + move compression config to YAML-only

2026-03-17 04:46:15 -07:00

test_batch_runner_checkpoint.py

…

test_cli_approval_ui.py

…

test_cli_init.py

…

test_cli_interrupt_subagent.py

fix: thread safety for concurrent subagent delegation (#1672 )

2026-03-17 02:53:33 -07:00

test_cli_loading_indicator.py

…

test_cli_mcp_config_watch.py

fix: auto-reload MCP tools when mcp_servers config changes without restart (#1474 )

2026-03-15 19:03:34 -07:00

test_cli_model_command.py

feat: auto-detect provider when switching models via /model (#1506 )

2026-03-16 04:34:45 -07:00

test_cli_new_session.py

…

test_cli_plan_command.py

…

test_cli_prefix_matching.py

feat: add /tools disable/enable/list slash commands with session reset (#1652 )

2026-03-17 02:05:26 -07:00

test_cli_preloaded_skills.py

…

test_cli_provider_resolution.py

fix: respect model.default from config.yaml for openai-codex provider (#1896 )

2026-03-18 02:50:31 -07:00

test_cli_retry.py

…

test_cli_secret_capture.py

…

test_cli_skin_integration.py

…

test_cli_status_bar.py

feat: add route-aware pricing estimates (#1695 )

2026-03-17 03:44:44 -07:00

test_cli_tools_command.py

feat: add /tools disable/enable/list slash commands with session reset (#1652 )

2026-03-17 02:05:26 -07:00

test_codex_execution_paths.py

…

test_codex_models.py

…

test_compression_boundary.py

fix(agent): prevent silent tool result loss during context compression (#1993 )

2026-03-18 15:22:51 -07:00

test_context_token_tracking.py

fix: context counter shows cached token count in status bar

2026-03-17 05:06:11 +03:00

test_dict_tool_call_args.py

test: restore vllm integration coverage and add dict-args regression

2026-03-15 08:02:29 -07:00

test_display.py

…

test_evidence_store.py

feat: add OSS Security Forensics skill (Skills Hub) (#1482 )

2026-03-15 21:59:53 -07:00

test_external_credential_detection.py

…

test_fallback_model.py

feat: upgrade MiniMax default to M2.7 + add new OpenRouter models

2026-03-18 02:42:58 -07:00

test_file_permissions.py

…

test_flush_memories_codex.py

…

test_hermes_state.py

fix: search all sources by default in session_search (#1892 )

2026-03-18 02:21:29 -07:00

test_honcho_client_config.py

…

test_insights.py

feat: add route-aware pricing estimates (#1695 )

2026-03-17 03:44:44 -07:00

test_interactive_interrupt.py

fix: thread safety for concurrent subagent delegation (#1672 )

2026-03-17 02:53:33 -07:00

test_interrupt_propagation.py

fix: thread safety for concurrent subagent delegation (#1672 )

2026-03-17 02:53:33 -07:00

test_managed_server_tool_support.py

…

test_minisweagent_path.py

…

test_model_provider_persistence.py

feat: integrate GitHub Copilot providers across Hermes

2026-03-17 23:40:22 -07:00

test_model_tools.py

…

test_openai_client_lifecycle.py

fix: audit fixes — 5 bugs found and resolved

2026-03-16 06:35:46 -07:00

test_personality_none.py

…

test_plugins.py

feat: first-class plugin architecture (#1555 )

2026-03-16 07:17:36 -07:00

test_provider_parity.py

feat: add Vercel AI Gateway provider (#1628 )

2026-03-17 00:12:16 -07:00

test_quick_commands.py

fix: thread safety for concurrent subagent delegation (#1672 )

2026-03-17 02:53:33 -07:00

test_real_interrupt_subagent.py

fix: thread safety for concurrent subagent delegation (#1672 )

2026-03-17 02:53:33 -07:00

test_reasoning_command.py

…

test_redirect_stdout_issue.py

…

test_resume_display.py

…

test_run_agent_codex_responses.py

fix(codex): handle reasoning-only responses and replay path (#2070 )

2026-03-19 10:34:44 -07:00

test_run_agent.py

Merge origin/main, resolve conflicts (self._base_url_lower)

2026-03-18 04:09:00 -07:00

test_runtime_provider_resolution.py

fix: support Anthropic-compatible endpoints for third-party providers (#1997 )

2026-03-18 16:26:06 -07:00

test_setup_model_selection.py

…

test_sql_injection.py

fix(security): eliminate SQL string formatting in execute() calls

2026-03-19 15:16:35 +01:00

test_streaming.py

fix: always fall back to non-streaming on ANY streaming error

2026-03-16 06:15:09 -07:00

test_timezone.py

fix: skip stale cron jobs on gateway restart instead of firing immediately

2026-03-16 23:48:14 -07:00

test_tool_call_parsers.py

…

test_toolset_distributions.py

…

test_toolsets.py

…

test_trajectory_compressor.py

…

test_worktree_security.py

…

test_worktree.py

…