Commit Graph

70 Commits

Author SHA1 Message Date
Teknium
252fbea005 feat(providers): add ordered fallback provider chain (salvage #1761) (#3813)
Extends the single fallback_model mechanism into an ordered chain.
When the primary model fails, Hermes tries each fallback provider in
sequence until one succeeds or the chain is exhausted.

Config format (new):
  fallback_providers:
    - provider: openrouter
      model: anthropic/claude-sonnet-4
    - provider: openai
      model: gpt-4o

Legacy single-dict fallback_model format still works unchanged.

Key fix vs original PR: the call sites in the retry loop now use
_fallback_index < len(_fallback_chain) instead of the old one-shot
_fallback_activated guard, so the chain actually advances through
all configured providers.

Changes:
- run_agent.py: _fallback_chain list + _fallback_index replaces
  one-shot _fallback_model; _try_activate_fallback() advances
  through chain; failed provider resolution skips to next entry;
  call sites updated to allow chain advancement
- cli.py: reads fallback_providers with legacy fallback_model compat
- gateway/run.py: same
- hermes_cli/config.py: fallback_providers: [] in DEFAULT_CONFIG
- tests: 12 new chain tests + 6 existing test fixtures updated

Co-authored-by: uzaylisak <uzaylisak@users.noreply.github.com>
2026-03-29 16:04:53 -07:00
Teknium
924857c3e3 fix: prevent tool name/arg concatenation for Ollama-compatible endpoints (#3582)
Ollama reuses index 0 for every tool call in a parallel batch,
distinguishing them only by id.  The streaming accumulator now
detects a new non-empty id at an already-active index and redirects
it to a fresh slot, preventing names and arguments from being
concatenated into a single tool call.

No-op for normal providers that use incrementing indices.

Co-authored-by: dmater01 <dmater01@users.noreply.github.com>
2026-03-28 14:08:26 -07:00
Teknium
901494d728 feat: make tool-use enforcement configurable via agent.tool_use_enforcement (#3551)
The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in #3528) was
hardcoded to only match gpt/codex model names. This makes it a
config option so users can turn it on for any model family.

New config key: agent.tool_use_enforcement
  - "auto" (default): matches gpt/codex (existing behavior)
  - true: inject for all models
  - false: never inject
  - list of strings: custom model-name substrings to match
    e.g. ["gpt", "codex", "deepseek", "qwen"]

No version bump needed — deep merge provides the default
automatically for existing installs.

12 new tests covering all config modes.
2026-03-28 12:31:22 -07:00
Teknium
8fdfc4b00c fix(agent): detect thinking-budget exhaustion on truncation, skip useless retries (#3444)
When finish_reason='length' and the response contains only reasoning
(think blocks or empty content), the model exhausted its output token
budget on thinking with nothing left for the actual response.

Previously, this fell into either:
- chat_completions: 3 useless continuation retries (model hits same limit)
- anthropic/codex: generic 'Response truncated' error with rollback

Now: detect the think-only + length condition early and return immediately
with a targeted error message: 'Model used all output tokens on reasoning
with none left for the response. Try lowering reasoning effort or
increasing max_tokens.'

This saves 2 wasted API calls on the chat_completions path and gives
users actionable guidance instead of a cryptic error.

The existing think-only retry logic (finish_reason='stop') is unchanged —
that's a genuine model glitch where retrying can help.
2026-03-27 15:29:30 -07:00
Teknium
fb46a90098 fix: increase API timeout default from 900s to 1800s for slow-thinking models (#3431)
Models like GLM-5/5.1 can think for 15+ minutes. The previous 900s
(15 min) default for HERMES_API_TIMEOUT killed legitimate requests.

Raised to 1800s (30 min) in both places that read the env var:
- _build_api_kwargs() timeout (non-streaming total timeout)
- _call_chat_completions() write timeout (streaming connection)

The streaming per-chunk read timeout (60s) and stale stream detector
(180-300s) are unchanged — those are appropriate for inter-chunk timing.
2026-03-27 13:02:23 -07:00
Teknium
5a1e2a307a perf(ttft): salvage easy-win startup optimizations from #3346 (#3395)
* perf(ttft): dedupe shared tool availability checks

* perf(ttft): short-circuit vision auto-resolution

* perf(ttft): make Claude Code version detection lazy

* perf(ttft): reuse loaded toolsets for skills prompt

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-27 07:49:44 -07:00
Teknium
08d3be0412 fix: graceful return on max retries instead of crashing thread
run_conversation raised the raw exception after exhausting retries,
which crashed the background thread in cli.py (unhandled exception
in Thread). Now returns a proper error result dict with failed=True
and persists the session, matching the pattern used by other error
paths (invalid responses, empty content, etc.).

Also wraps cli.py's run_agent thread function in try/except as a
safety net against any future unhandled exceptions from
run_conversation.

Made-with: Cursor
2026-03-25 19:00:39 -07:00
Teknium
099dfca6db fix: GLM reasoning-only and max-length handling (#3010)
- Add 'prompt exceeds max length' to context overflow detection for
  Z.AI/GLM 400 errors
- Extract inline reasoning blocks from assistant content as fallback
  when no structured reasoning fields are present
- Guard inline extraction so structured API reasoning takes priority
- Update test for reasoning-only response salvage behavior

Cherry-picked from PR #2993 by kshitijk4poor. Added priority guard
to fix test_structured_reasoning_takes_priority failure.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-25 12:05:37 -07:00
Teknium
525caadd8c fix: prevent Anthropic token leaking to third-party anthropic_messages providers (salvage #2383) (#2389)
* fix: prevent Anthropic token fallback leaking to third-party anthropic_messages providers

When provider is minimax/alibaba/etc and MINIMAX_API_KEY is not set,
the code fell back to resolve_anthropic_token() sending Anthropic OAuth
credentials to third-party endpoints, causing 401 errors.

Now only provider=="anthropic" triggers the fallback. Generalizes the
Alibaba-specific guard from #1739 to all non-Anthropic providers.

* fix: set provider='anthropic' in credential refresh tests

Follow-up for cherry-picked PR #2383 — existing tests didn't set
agent.provider, which the new guard requires to allow Anthropic
token refresh.

---------

Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
2026-03-21 16:42:46 -07:00
Test
e7844e9c8d Merge origin/main, resolve conflicts (self._base_url_lower) 2026-03-18 04:09:00 -07:00
Teknium
c0c14e60b4 fix: make concurrent tool batching path-aware for file mutations (#1914)
* Improve tool batching independence checks

* fix: address review feedback on path-aware batching

- Log malformed/non-dict tool arguments at debug level before
  falling back to sequential, instead of silently swallowing
  the error into an empty dict
- Guard empty paths in _paths_overlap (unreachable in practice
  due to upstream filtering, but makes the invariant explicit)
- Add tests: malformed JSON args, non-dict args, _paths_overlap
  unit tests including empty path edge cases
- web_crawl is not a registered tool (only web_search/web_extract
  are); no addition needed to _PARALLEL_SAFE_TOOLS

---------

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-18 03:25:38 -07:00
Test
5b74df2bfc fix: OAuth flag stale after refresh/fallback, memory nudge never fires, dead code
- Update _is_anthropic_oauth in _try_refresh_anthropic_client_credentials()
  when token type changes during credential refresh
- Set _is_anthropic_oauth in _try_activate_fallback() Anthropic path
- Move _turns_since_memory and _iters_since_skill init to __init__ so
  nudge counters accumulate across run_conversation() calls in CLI mode
- Remove unreachable retry_count >= max_retries block after raise

Adds 7 regression tests. Salvaged from PR #1797 by @0xbyt4.
2026-03-18 02:19:57 -07:00
max
0c392e7a87 feat: integrate GitHub Copilot providers across Hermes
Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-17 23:40:22 -07:00
teknium1
8e07f9ca56 fix: audit fixes — 5 bugs found and resolved
Thorough code review found 5 issues across run_agent.py, cli.py, and gateway/:

1. CRITICAL — Gateway stream consumer task never started: stream_consumer_holder
   was checked BEFORE run_sync populated it. Fixed with async polling pattern
   (same as track_agent).

2. MEDIUM-HIGH — Streaming fallback after partial delivery caused double-response:
   if streaming failed after some tokens were delivered, the fallback would
   re-deliver the full response. Now tracks deltas_were_sent and only falls
   back when no tokens reached consumers yet.

3. MEDIUM — Codex mode lost on_first_delta spinner callback: _run_codex_stream
   now accepts on_first_delta parameter, fires it on first text delta. Passed
   through from _interruptible_streaming_api_call via _codex_on_first_delta
   instance attribute.

4. MEDIUM — CLI close-tag after-text bypassed tag filtering: text after a
   reasoning close tag was sent directly to _emit_stream_text, skipping
   open-tag detection. Now routes through _stream_delta for full filtering.

5. LOW — Removed 140 lines of dead code: old _streaming_api_call method
   (superseded by _interruptible_streaming_api_call). Updated 13 tests in
   test_run_agent.py and test_openai_client_lifecycle.py to use the new
   method name and signature.

4573 tests passing.
2026-03-16 06:35:46 -07:00
Teknium
dd7921d514 fix(honcho): isolate session routing for multi-user gateway (#1500)
Salvaged from PR #1470 by adavyas.

Core fix: Honcho tool calls in a multi-session gateway could route to
the wrong session because honcho_tools.py relied on process-global
state. Now threads session context through the call chain:
  AIAgent._invoke_tool() → handle_function_call() → registry.dispatch()
  → handler **kw → _resolve_session_context()

Changes:
- Add _resolve_session_context() to prefer per-call context over globals
- Plumb honcho_manager + honcho_session_key through handle_function_call
- Add sync_honcho=False to run_conversation() for synthetic flush turns
- Pass honcho_session_key through gateway memory flush lifecycle
- Harden gateway PID detection when /proc cmdline is unreadable
- Make interrupt test scripts import-safe for pytest-xdist
- Wrap BibTeX examples in Jekyll raw blocks for docs build
- Fix thread-order-dependent assertion in client lifecycle test
- Expand Honcho docs: session isolation, lifecycle, routing internals

Dropped from original PR:
- Indentation change in _create_request_openai_client that would move
  client creation inside the lock (causes unnecessary contention)

Co-authored-by: adavyas <adavyas@users.noreply.github.com>
2026-03-16 00:23:47 -07:00
Teknium
3f0f4a04a9 fix(agent): skip reasoning extra_body for unsupported OpenRouter models (#1485)
* fix(agent): skip reasoning extra_body for models that don't support it

Sending reasoning config to models like MiniMax or Nvidia via OpenRouter
causes a 400 BadRequestError. Previously, reasoning extra_body was sent
to all OpenRouter and Nous models unconditionally.

Fix: only send reasoning extra_body when the model slug starts with a
known reasoning-capable prefix (deepseek/, anthropic/, openai/, x-ai/,
google/gemini-2, qwen/qwen3) or when using Nous Portal directly.

Applies to both the main API call path (_build_api_kwargs) and the
conversation summary path.

Fixes #1083

* test(agent): cover reasoning extra_body gating

---------

Co-authored-by: ygd58 <buraysandro9@gmail.com>
2026-03-15 20:42:07 -07:00
teknium1
62abb453d3 Merge origin/main into hermes/hermes-daa73839 2026-03-14 23:44:47 -07:00
teknium1
735a6e7651 fix: convert anthropic image content blocks 2026-03-14 23:41:20 -07:00
0xbyt4
6f85283553 fix: use json.dumps instead of str() for Codex Responses API arguments
When the Responses API returns tool call arguments as a dict,
str(dict) produces Python repr with single quotes (e.g. {'key': 'val'})
which is invalid JSON. Downstream json.loads() fails silently and the
tool gets called with empty arguments, losing all parameters.

Affects both function_call and custom_tool_call item types in
_normalize_codex_response().
2026-03-14 22:03:53 -07:00
teknium1
70ea13eb40 fix: preflight Anthropic auth and prefer Claude store 2026-03-14 19:38:55 -07:00
teknium1
e052c74727 fix: refresh Anthropic OAuth before stale env tokens 2026-03-14 19:22:31 -07:00
teknium1
7b10881b9e fix: persist clean voice transcripts and /voice off state
- keep CLI voice prefixes API-local while storing the original user text
- persist explicit gateway off state and restore adapter auto-TTS suppression on restart
- add regression coverage for both behaviors
2026-03-14 06:14:22 -07:00
0xbyt4
eb34c0b09a fix: voice pipeline hardening — 7 bug fixes with tests
1. Anthropic + ElevenLabs TTS silence: forward full response to TTS
   callback for non-streaming providers (choices first, then native
   content blocks fallback).

2. Subprocess timeout kill: play_audio_file now kills the process on
   TimeoutExpired instead of leaving zombie processes.

3. Discord disconnect cleanup: leave all voice channels before closing
   the client to prevent leaked state.

4. Audio stream leak: close InputStream if stream.start() fails.

5. Race condition: read/write _on_silence_stop under lock in audio
   callback thread.

6. _vprint force=True: show API error, retry, and truncation messages
   even during streaming TTS.

7. _refresh_level lock: read _voice_recording under _voice_lock.
2026-03-14 14:27:21 +03:00
0xbyt4
c797314fcf test: add security and hardening tests for voice mode fixes
- Path traversal sanitization (Path.name strips ../)
- Media endpoint authentication (401 without token, 404 on traversal)
- hmac.compare_digest usage verification (no == for tokens)
- DOMPurify XSS prevention in HTML template
- Default bind 127.0.0.1 (adapter and config)
- /remote-control token hiding in group chats
- Opus find_library instead of hardcoded paths
- Opus decode error logging (no silent swallow)
- Interrupt _vprint force=True on all 6 calls
- Anthropic interrupt handler in both API call paths
- Update test_web_defaults for new 127.0.0.1 default
2026-03-14 14:27:21 +03:00
0xbyt4
ecc3dd7c63 test: add comprehensive voice mode test coverage (86 tests)
- Add TestStreamingApiCall (11 tests) for _streaming_api_call in test_run_agent.py
- Add regression tests for all 7 bug fixes (edge_tts lazy import, output_stream
  cleanup, ctrl+c continuous reset, disable stops TTS, config key, chat cleanup,
  browser_tool signal handler removal)
- Add real behavior tests for CLI voice methods via _make_voice_cli() fixture:
  TestHandleVoiceCommandReal (7), TestEnableVoiceModeReal (7),
  TestDisableVoiceModeReal (6), TestVoiceSpeakResponseReal (7),
  TestVoiceStopAndTranscribeReal (12)
2026-03-14 14:27:20 +03:00
teknium1
cbbba87099 fix: reuse shared atomic session log helper 2026-03-14 02:56:13 -07:00
Teknium
1117a21065 Merge pull request #1271 from NousResearch/hermes/hermes-de3d4e49
fix: guard init-time stdio writes
2026-03-14 02:21:39 -07:00
teknium1
936040d8f7 fix: guard init-time stdio writes 2026-03-14 02:19:46 -07:00
teknium1
806b79b589 test: cover errors.log handler reuse 2026-03-13 23:56:51 -07:00
Teknium
938e887b4c fix: keep honcho recall out of cached system prefix (#1201)
Attach later-turn Honcho recall to the current-turn user message at API
call time instead of appending it to the system prompt. This preserves the
stable system-prefix cache while keeping Honcho continuity context
available for the turn.

Also adds regression coverage for the injection helper and for continuing
sessions so Honcho recall stays out of the system prompt.
2026-03-13 21:07:00 -07:00
Teknium
0157253145 Merge pull request #1152 from NousResearch/hermes/hermes-f47f71c0
feat: concurrent tool execution with ThreadPoolExecutor
2026-03-13 03:20:38 -07:00
kshitijk4poor
ccfbf42844 feat: secure skill env setup on load (core #688)
When a skill declares required_environment_variables in its YAML
frontmatter, missing env vars trigger a secure TUI prompt (identical
to the sudo password widget) when the skill is loaded. Secrets flow
directly to ~/.hermes/.env, never entering LLM context.

Key changes:
- New required_environment_variables frontmatter field for skills
- Secure TUI widget (masked input, 120s timeout)
- Gateway safety: messaging platforms show local setup guidance
- Legacy prerequisites.env_vars normalized into new format
- Remote backend handling: conservative setup_needed=True
- Env var name validation, file permissions hardened to 0o600
- Redact patterns extended for secret-related JSON fields
- 12 existing skills updated with prerequisites declarations
- ~48 new tests covering skip, timeout, gateway, remote backends
- Dynamic panel widget sizing (fixes hardcoded width from original PR)

Cherry-picked from PR #723 by kshitijk4poor, rebased onto current main
with conflict resolution.

Fixes #688

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-13 03:14:04 -07:00
teknium1
5d0d5b191c feat: concurrent tool execution with ThreadPoolExecutor
When the model returns multiple tool calls in a single response, they are
now executed concurrently using a thread pool instead of sequentially.
This significantly reduces wall-clock time when multiple independent tools
are batched (e.g. parallel web_search, read_file, terminal calls).

Architecture:
- _execute_tool_calls() dispatches to sequential or concurrent path
- Single tool calls and batches containing 'clarify' use sequential path
- Multiple non-interactive tools use ThreadPoolExecutor (max 8 workers)
- Results are collected and appended to messages in original order
- _invoke_tool() extracted as shared tool invocation helper

Safety:
- Pre-flight interrupt check skips all tools if interrupted
- Per-tool exception handling: one failure doesn't crash the batch
- Result truncation (100k char limit) applied per tool
- Budget pressure injection after all tools complete
- Checkpoints taken before file-mutating tools
- CLI spinner shows batch progress, then per-tool completion messages

Tests: 10 new tests covering dispatch logic, ordering, error handling,
interrupt behavior, truncation, and _invoke_tool routing.
2026-03-13 02:51:51 -07:00
Teknium
a282322845 Merge pull request #1121 from 0xbyt4/fix/anthropic-adapter-issues
fix: anthropic adapter — max_tokens, fallback crash, proxy base_url
2026-03-12 19:07:06 -07:00
Teknium
475dd58a8e Merge PR #736: feat(honcho): async writes, memory modes, session title integration, setup CLI
Authored by erosika. Builds on #38 and #243.

Adds async write support, configurable memory modes, context prefetch pipeline,
4 new Honcho tools (honcho_context, honcho_profile, honcho_search, honcho_conclude),
full 'hermes honcho' CLI, session strategies, AI peer identity, recallMode A/B,
gateway lifecycle management, and comprehensive docs.

Cherry-picks fixes from PRs #831/#832 (adavyas).

Co-authored-by: erosika <erosika@users.noreply.github.com>
Co-authored-by: adavyas <adavyas@users.noreply.github.com>
2026-03-12 19:05:11 -07:00
0xbyt4
22479b053c fix: anthropic adapter — max_tokens ignored, fallback crash, proxy base_url filtered
- Pass self.max_tokens to build_anthropic_kwargs instead of hardcoded None
- Add anthropic case to _try_activate_fallback (was only handling openai-codex)
- Remove 'anthropic in base_url' filter that blocked custom proxy URLs
2026-03-13 04:22:16 +03:00
teknium1
5e12442b4b feat: native Anthropic provider with Claude Code credential auto-discovery
Add Anthropic as a first-class inference provider, bypassing OpenRouter
for direct API access. Uses the native Anthropic SDK with a full format
adapter (same pattern as the codex_responses api_mode).

## Auth (three methods, priority order)
1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-*)
2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-*)
3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription)
   - Reads Claude Code's OAuth credentials
   - Checks token expiry with 60s buffer
   - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header
   - Regular API keys use standard x-api-key header

## Changes by file

### New files
- agent/anthropic_adapter.py — Client builder, message/tool/response
  format conversion, Claude Code credential reader, token resolver.
  Handles system prompt extraction, tool_use/tool_result blocks,
  thinking/reasoning, orphaned tool_use cleanup, cache_control.
- tests/test_anthropic_adapter.py — 36 tests covering all adapter logic

### Modified files
- pyproject.toml — Add anthropic>=0.39.0 dependency
- hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with
  three env vars, plus 'claude'/'claude-code' aliases
- hermes_cli/models.py — Add model catalog, labels, aliases, provider order
- hermes_cli/main.py — Add 'anthropic' to --provider CLI choices
- hermes_cli/runtime_provider.py — Add Anthropic branch returning
  api_mode='anthropic_messages' (before generic api_key fallthrough)
- hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code
  credential auto-discovery, model selection, OpenRouter tools prompt
- agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model
- agent/model_metadata.py — Add bare Claude model context lengths
- run_agent.py — Add anthropic_messages api_mode:
  * Client init (Anthropic SDK instead of OpenAI)
  * API call dispatch (_anthropic_client.messages.create)
  * Response validation (content blocks)
  * finish_reason mapping (stop_reason -> finish_reason)
  * Token usage (input_tokens/output_tokens)
  * Response normalization (normalize_anthropic_response)
  * Client interrupt/rebuild
  * Prompt caching auto-enabled for native Anthropic
- tests/test_run_agent.py — Update test_anthropic_base_url_accepted to
  expect native routing, add test_prompt_caching_native_anthropic
2026-03-12 15:47:45 -07:00
Erosika
fefc709b2c merge: resolve conflict with main in subagent interrupt test 2026-03-12 16:28:57 -04:00
Erosika
ae2a5e5743 refactor(honcho): remove local memory mode
The "local" memoryMode was redundant with enabled: false. Simplifies
the mode system to hybrid and honcho only.
2026-03-12 16:23:34 -04:00
teknium1
29ef69c703 fix: update all test mocks for call_llm migration
Update 14 test files to use the new call_llm/async_call_llm mock
patterns instead of the old get_text_auxiliary_client/
get_vision_auxiliary_client tuple returns.

- vision_tools tests: mock async_call_llm instead of _aux_async_client
- browser tests: mock call_llm instead of _aux_vision_client
- flush_memories tests: mock call_llm instead of get_text_auxiliary_client
- session_search tests: mock async_call_llm with RuntimeError
- mcp_tool tests: fix whitelist model config, use side_effect for
  multi-response tests
- auxiliary_config_bridge: update for model=None (resolved in router)

3251 passed, 2 pre-existing unrelated failures.
2026-03-11 21:06:54 -07:00
Erosika
2d35016b94 fix(honcho): harden tool gating and migration peer routing
Prevent stale Honcho tool exposure in context/local modes, restore reliable async write retry behavior, and ensure SOUL.md migration uploads target the AI peer instead of the user peer. Also align Honcho CLI key checks with host-scoped apiKey resolution and lock the fixes with regression tests.

Made-with: Cursor
2026-03-11 18:21:27 -04:00
Erosika
a0b0dbe6b2 Merge remote-tracking branch 'origin/main' into feat/honcho-async-memory
Made-with: Cursor

# Conflicts:
#	cli.py
#	tests/test_run_agent.py
2026-03-11 12:22:56 -04:00
teknium1
a8409a161f fix: guard all print() calls against OSError with _SafeWriter
When hermes-agent runs as a systemd service, Docker container, or
headless daemon, the stdout pipe can become unavailable (idle timeout,
buffer exhaustion, socket reset). Any print() call then raises
OSError: [Errno 5] Input/output error, crashing run_conversation()
and causing cron jobs to fail.

Rather than wrapping individual print() calls (68 in run_conversation
alone), this adds a transparent _SafeWriter wrapper installed once at
the start of run_conversation(). It delegates all writes to the real
stdout and silently catches OSError. Zero overhead on the happy path,
comprehensive coverage of all print calls including future ones.

Fixes #845

Co-authored-by: J0hnLawMississippi <J0hnLawMississippi@users.noreply.github.com>
2026-03-11 09:19:10 -07:00
Erosika
047b118299 fix(honcho): resolve review blockers for merge
Address merge-blocking review feedback by removing unsafe signal handler overrides, wiring next-turn Honcho prefetch, restoring per-directory session defaults, and exposing all Honcho tools to the model surface. Also harden prefetch cache access with public thread-safe accessors and remove duplicate browser cleanup code.

Made-with: Cursor
2026-03-11 11:46:37 -04:00
teknium1
21ff0d39ad feat: iteration budget pressure via tool result injection
Two-tier warning system that nudges the LLM as it approaches
max_iterations, injected into the last tool result JSON rather
than as a separate system message:

- Caution (70%): {"_budget_warning": "[BUDGET: 42/60...]"}
- Warning (90%): {"_budget_warning": "[BUDGET WARNING: 54/60...]"}

For JSON tool results, adds a _budget_warning field to the existing
dict. For plain text results, appends the warning as text.

Key properties:
- No system messages injected mid-conversation
- No changes to message structure
- Prompt cache stays valid
- Configurable thresholds (0.7 / 0.9)
- Can be disabled: _budget_pressure_enabled = False

Inspired by PR #421 (@Bartok9) and issue #414.
8 tests covering thresholds, edge cases, JSON and text injection.
2026-03-11 00:37:24 -07:00
adavyas
87cc5287a8 fix(honcho): enforce local mode and cache-safe warmup 2026-03-10 16:21:42 -04:00
teknium1
771969f747 fix: wire up enabled_tools in agent loop + simplify sandbox tool selection
Completes the fix started in 8318a51 — handle_function_call() accepted
enabled_tools but run_agent.py never passed it. Now both call sites in
_execute_tool_calls() pass self.valid_tool_names, so each agent session
uses its own tool list instead of the process-global
_last_resolved_tool_names (which subagents can overwrite).

Also simplifies the redundant ternary in code_execution_tool.py:
sandbox_tools is already computed correctly (intersection with session
tools, or full SANDBOX_ALLOWED_TOOLS as fallback), so the conditional
was dead logic.

Inspired by PR #663 (JasonOA888). Closes #662.
Tests: 2857 passed.
2026-03-10 06:35:28 -07:00
vincent
b0a5fe8974 fix: continue after output-length truncation 2026-03-10 04:30:19 -07:00
teknium1
aedb773f0d fix: stabilize system prompt across gateway turns for cache hits
Two changes to prevent unnecessary Anthropic prompt cache misses in the
gateway, where a fresh AIAgent is created per user message:

1. Reuse stored system prompt for continuing sessions:
   When conversation_history is non-empty, load the system prompt from
   the session DB instead of rebuilding from disk. The model already has
   updated memory in its conversation history (it wrote it!), so
   re-reading memory from disk produces a different system prompt that
   breaks the cache prefix.

2. Stabilize Honcho context per session:
   - Only prefetch Honcho context on the first turn (empty history)
   - Bake Honcho context into the cached system prompt and store to DB
   - Remove the per-turn Honcho injection from the API call loop

   This ensures the system message is identical across all turns in a
   session. Previously, re-fetching Honcho could return different context
   on each turn, changing the system message and invalidating the cache.

Both changes preserve the existing behavior for compression (which
invalidates the prompt and rebuilds from scratch) and for the CLI
(where the same AIAgent persists and the cached prompt is already
stable across turns).

Tests: 2556 passed (6 new)
2026-03-09 01:50:58 -07:00
teknium1
19b6f81ee7 fix: allow Anthropic API URLs as custom OpenAI-compatible endpoints
Removed the hard block on base_url containing 'api.anthropic.com'.
Anthropic now offers an OpenAI-compatible /chat/completions endpoint,
so blocking their URL prevents legitimate use. If the endpoint isn't
compatible, the API call will fail with a proper error anyway.

Removed from: run_agent.py, mini_swe_runner.py
Updated test to verify Anthropic URLs are accepted.
2026-03-07 23:36:35 -08:00