Commit Graph

228 Commits

Author SHA1 Message Date
Teknium
475dd58a8e Merge PR #736: feat(honcho): async writes, memory modes, session title integration, setup CLI
Authored by erosika. Builds on #38 and #243.

Adds async write support, configurable memory modes, context prefetch pipeline,
4 new Honcho tools (honcho_context, honcho_profile, honcho_search, honcho_conclude),
full 'hermes honcho' CLI, session strategies, AI peer identity, recallMode A/B,
gateway lifecycle management, and comprehensive docs.

Cherry-picks fixes from PRs #831/#832 (adavyas).

Co-authored-by: erosika <erosika@users.noreply.github.com>
Co-authored-by: adavyas <adavyas@users.noreply.github.com>
2026-03-12 19:05:11 -07:00
teknium1
e976879cf2 merge: resolve conflicts with main (URL update to hermes-agent.nousresearch.com) 2026-03-12 17:49:26 -07:00
teknium1
7f7282c78d fix(anthropic): guard memory flush tool_calls extraction for Anthropic response format
The memory flush path extracted tool_calls from the response assuming
OpenAI format (response.choices[0].message.tool_calls). When using
the Anthropic client directly (aux unavailable), the response is an
Anthropic Message object which has no .choices attribute. Now uses
normalize_anthropic_response() to extract tool_calls correctly.
2026-03-12 17:35:01 -07:00
teknium1
aaaba78126 fix(anthropic): final polish — tool ID sanitization, crash guards, temp=1
Remaining issues from deep scan:

Adapter (agent/anthropic_adapter.py):
- Add _sanitize_tool_id() — Anthropic requires IDs matching [a-zA-Z0-9_-],
  now strips invalid chars and ensures non-empty (both tool_use and tool_result)
- Empty tool result content → '(no output)' placeholder (Anthropic rejects empty)
- Set temperature=1 when thinking type='enabled' on older models (required)
- normalize_model_name now case-insensitive for 'Anthropic/' prefix
- Fix stale docstrings referencing only ~/.claude/.credentials.json

Agent loop (run_agent.py):
- Guard memory flush path (line ~2684) — was calling self.client.chat.completions
  which is None in anthropic_messages mode. Now routes through Anthropic client.
- Guard summary generation path (line ~3171) — same crash when reaching
  iteration limit. Now builds proper Anthropic kwargs and normalizes response.
- Guard retry summary path (line ~3200) — same fix for the summary retry loop.

All three self.client.chat.completions.create() calls outside the main
loop now have anthropic_messages branches to prevent NoneType crashes.
2026-03-12 17:23:09 -07:00
teknium1
4068f20ce9 fix(anthropic): deep scan fixes — auth, retries, edge cases
Fixes from comprehensive code review and cross-referencing with
clawdbot/OpenCode implementations:

CRITICAL:
- Add one-shot guard (anthropic_auth_retry_attempted) to prevent
  infinite 401 retry loops when credentials keep changing
- Fix _is_oauth_token(): managed keys from ~/.claude.json are NOT
  regular API keys (don't start with sk-ant-api). Inverted the logic:
  only sk-ant-api* is treated as API key auth, everything else uses
  Bearer auth + oauth beta headers

HIGH:
- Wrap json.loads(args) in try/except in message conversion — malformed
  tool_call arguments no longer crash the entire conversation
- Raise AuthError in runtime_provider when no Anthropic token found
  (was silently passing empty string, causing confusing API errors)
- Remove broken _try_anthropic() from auxiliary vision chain — the
  centralized router creates an OpenAI client for api_key providers
  which doesn't work with Anthropic's Messages API

MEDIUM:
- Handle empty assistant message content — Anthropic rejects empty
  content blocks, now inserts '(empty)' placeholder
- Fix setup.py existing_key logic — set to 'KEEP' sentinel instead
  of None to prevent falling through to the auth choice prompt
- Add debug logging to _fetch_anthropic_models on failure

Tests: 43 adapter tests (2 new for token detection), 3197 total passed
2026-03-12 17:14:22 -07:00
Teknium
39f3c0aeb0 fix: use hermes-agent.nousresearch.com as OpenRouter HTTP-Referer
* fix: stop rejecting unlisted models + auto-detect from /models endpoint

validate_requested_model() now accepts models not in the provider's API
listing with a warning instead of blocking. Removes hardcoded catalog
fallback for validation — if API is unreachable, accepts with a warning.

Model selection flows (setup + /model command) now probe the provider's
/models endpoint to get the real available models. Falls back to
hardcoded defaults with a clear warning when auto-detection fails:
'Could not auto-detect models — use Custom model if yours isn't listed.'

Z.AI setup no longer excludes GLM-5 on coding plans.

* fix: use hermes-agent.nousresearch.com as HTTP-Referer for OpenRouter

OpenRouter scrapes the favicon/logo from the HTTP-Referer URL for app
rankings. We were sending the GitHub repo URL, which gives us a generic
GitHub logo. Changed to the proper website URL so our actual branding
shows up in rankings.

Changed in run_agent.py (main agent client) and auxiliary_client.py
(vision/summarization clients).
2026-03-12 16:20:22 -07:00
teknium1
d7adfe8f61 fix(anthropic): address gaps found in deep-dive audit
After studying clawdbot (OpenClaw) and OpenCode implementations:

## Beta headers
- Add interleaved-thinking-2025-05-14 and fine-grained-tool-streaming-2025-05-14
  as common betas (sent with ALL auth types, not just OAuth)
- OAuth tokens additionally get oauth-2025-04-20
- API keys now also get the common betas (previously got none)

## Vision/image support
- Add _convert_vision_content() to convert OpenAI multimodal format
  (image_url blocks) to Anthropic format (image blocks with base64/url source)
- Handles both data: URIs (base64) and regular URLs

## Role alternation enforcement
- Anthropic strictly rejects consecutive same-role messages (400 error)
- Add post-processing step that merges consecutive user/assistant messages
- Handles string, list, and mixed content types during merge

## Tool choice support
- Add tool_choice parameter to build_anthropic_kwargs()
- Maps OpenAI values: auto→auto, required→any, none→omit, name→tool

## Cache metrics tracking
- Anthropic uses cache_read_input_tokens / cache_creation_input_tokens
  (different from OpenRouter's prompt_tokens_details.cached_tokens)
- Add api_mode-aware branch in run_agent.py cache stats logging

## Credential refresh on 401
- On 401 error during anthropic_messages mode, re-read credentials
  via resolve_anthropic_token() (picks up refreshed Claude Code tokens)
- Rebuild client if new token differs from current one
- Follows same pattern as Codex/Nous 401 refresh handlers

## Tests
- 44 adapter tests (8 new: vision conversion, role alternation, tool choice)
- Updated beta header tests to verify new structure
- Full suite: 3198 passed, 0 regressions
2026-03-12 16:00:46 -07:00
Teknium
1bb8ed4495 chore: lower default compression threshold from 85% to 50% (#1096)
* fix: ClawHub skill install — use /download ZIP endpoint

The ClawHub API v1 version endpoint only returns file metadata
(path, size, sha256, contentType) without inline content or download
URLs. Our code was looking for inline content in the metadata, which
never existed, causing all ClawHub installs to fail with:
'no inline/raw file content was available'

Fix: Use the /api/v1/download endpoint (same as the official clawhub
CLI) to download skills as ZIP bundles and extract files in-memory.

Changes:
- Add _download_zip() method that downloads and extracts ZIP bundles
- Retry on 429 rate limiting with Retry-After header support
- Path sanitization and binary file filtering for security
- Keep _extract_files() as a fallback for inline/raw content
- Also fix nested file lookup (version_data.version.files)

* chore: lower default compression threshold from 85% to 50%

Triggers context compression earlier — at 50% of the model's context
window instead of 85%. Updated in all four places where the default
is defined: context_compressor.py, cli.py, run_agent.py, config.py,
and gateway/run.py.
2026-03-12 15:51:50 -07:00
teknium1
5e12442b4b feat: native Anthropic provider with Claude Code credential auto-discovery
Add Anthropic as a first-class inference provider, bypassing OpenRouter
for direct API access. Uses the native Anthropic SDK with a full format
adapter (same pattern as the codex_responses api_mode).

## Auth (three methods, priority order)
1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-*)
2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-*)
3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription)
   - Reads Claude Code's OAuth credentials
   - Checks token expiry with 60s buffer
   - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header
   - Regular API keys use standard x-api-key header

## Changes by file

### New files
- agent/anthropic_adapter.py — Client builder, message/tool/response
  format conversion, Claude Code credential reader, token resolver.
  Handles system prompt extraction, tool_use/tool_result blocks,
  thinking/reasoning, orphaned tool_use cleanup, cache_control.
- tests/test_anthropic_adapter.py — 36 tests covering all adapter logic

### Modified files
- pyproject.toml — Add anthropic>=0.39.0 dependency
- hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with
  three env vars, plus 'claude'/'claude-code' aliases
- hermes_cli/models.py — Add model catalog, labels, aliases, provider order
- hermes_cli/main.py — Add 'anthropic' to --provider CLI choices
- hermes_cli/runtime_provider.py — Add Anthropic branch returning
  api_mode='anthropic_messages' (before generic api_key fallthrough)
- hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code
  credential auto-discovery, model selection, OpenRouter tools prompt
- agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model
- agent/model_metadata.py — Add bare Claude model context lengths
- run_agent.py — Add anthropic_messages api_mode:
  * Client init (Anthropic SDK instead of OpenAI)
  * API call dispatch (_anthropic_client.messages.create)
  * Response validation (content blocks)
  * finish_reason mapping (stop_reason -> finish_reason)
  * Token usage (input_tokens/output_tokens)
  * Response normalization (normalize_anthropic_response)
  * Client interrupt/rebuild
  * Prompt caching auto-enabled for native Anthropic
- tests/test_run_agent.py — Update test_anthropic_base_url_accepted to
  expect native routing, add test_prompt_caching_native_anthropic
2026-03-12 15:47:45 -07:00
Erosika
fefc709b2c merge: resolve conflict with main in subagent interrupt test 2026-03-12 16:28:57 -04:00
Erosika
0aed9bfde1 refactor(honcho): rename memory tools to Honcho tools, clarify recall mode language
Replace "memory tools" with "Honcho tools" and "pre-warmed/prefetch"
with "auto-injected context" in all user-facing strings and docs.
2026-03-12 16:26:10 -04:00
Erosika
ae2a5e5743 refactor(honcho): remove local memory mode
The "local" memoryMode was redundant with enabled: false. Simplifies
the mode system to hybrid and honcho only.
2026-03-12 16:23:34 -04:00
Teknium
73ea5102dc Merge pull request #1058 from NousResearch/hermes/hermes-465f3702
fix: strip call_id/response_item_id from tool_calls for Mistral compatibility
2026-03-12 08:21:36 -07:00
teknium1
400b8d92b7 fix: strip call_id/response_item_id from tool_calls for Mistral compatibility
Mistral's API strictly validates the Chat Completions schema and rejects
unknown fields (call_id, response_item_id) with 422. These fields are
added by _build_assistant_message() for Codex Responses API support.

This fix:
- Only strips when targeting Mistral (api.mistral.ai in base_url)
- Creates new tool_call dicts instead of mutating originals (shallow
  copy safety — msg.copy() shares the tool_calls list)
- Preserves call_id/response_item_id in the internal message history
  so _chat_messages_to_responses_input() can still read them if the
  session falls back to a Codex provider mid-conversation

Applied in all 3 API message building locations:
- Main conversation loop (run_conversation)
- _handle_max_iterations()
- flush_memories()

Inspired by PR #864 (unmodeled-tyler) which identified the issue but
applied the fix unconditionally and mutated originals via shallow copy.

Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>
2026-03-12 08:18:27 -07:00
Teknium
e9c3317158 fix: improve Kimi model selection — auto-detect endpoint, add missing models (#1039)
* fix: /reasoning command output ordering, display, and inline think extraction

Three issues with the /reasoning command:

1. Output interleaving: The command echo used print() while feedback
   used _cprint(), causing them to render out-of-order under
   prompt_toolkit's patch_stdout. Changed echo to use _cprint() so
   all output renders through the same path in correct order.

2. Reasoning display not working: /reasoning show toggled a flag
   but reasoning never appeared for models that embed thinking in
   inline <think> blocks rather than structured API fields. Added
   fallback extraction in _build_assistant_message to capture
   <think> block content as reasoning when no structured reasoning
   fields (reasoning, reasoning_content, reasoning_details) are
   present. This feeds into both the reasoning callback (during
   tool loops) and the post-response reasoning box display.

3. Feedback clarity: Added checkmarks to confirm actions, persisted
   show/hide to config (was session-only before), and aligned the
   status display for readability.

Tests: 7 new tests for inline think block extraction (41 total).

* feat: add /reasoning command to gateway (Telegram/Discord/etc)

The /reasoning command only existed in the CLI — messaging platforms
had no way to view or change reasoning settings. This adds:

1. /reasoning command handler in the gateway:
   - No args: shows current effort level and display state
   - /reasoning <level>: sets reasoning effort (none/low/medium/high/xhigh)
   - /reasoning show|hide: toggles reasoning display in responses
   - All changes saved to config.yaml immediately

2. Reasoning display in gateway responses:
   - When show_reasoning is enabled, prepends a 'Reasoning' block
     with the model's last_reasoning content before the response
   - Collapses long reasoning (>15 lines) to keep messages readable
   - Uses last_reasoning from run_conversation result dict

3. Plumbing:
   - Added _show_reasoning attribute loaded from config at startup
   - Propagated last_reasoning through _run_agent return dict
   - Added /reasoning to help text and known_commands set
   - Uses getattr for _show_reasoning to handle test stubs

* fix: improve Kimi model selection — auto-detect endpoint, add missing models

Kimi Coding Plan setup:
- New dedicated _model_flow_kimi() replaces the generic API-key flow
  for kimi-coding. Removes the confusing 'Base URL' prompt entirely —
  the endpoint is auto-detected from the API key prefix:
    sk-kimi-* → api.kimi.com/coding/v1 (Kimi Coding Plan)
    other     → api.moonshot.ai/v1 (legacy Moonshot)

- Shows appropriate models for each endpoint:
    Coding Plan: kimi-for-coding, kimi-k2.5, kimi-k2-thinking, kimi-k2-thinking-turbo
    Moonshot:    full model catalog

- Clears any stale KIMI_BASE_URL override so runtime auto-detection
  via _resolve_kimi_base_url() works correctly.

Model catalog updates:
- Added kimi-for-coding (primary Coding Plan model) and kimi-k2-thinking-turbo
  to models.py, main.py _PROVIDER_MODELS, and model_metadata.py context windows.

- Updated User-Agent from KimiCLI/1.0 to KimiCLI/1.3 (Kimi's coding
  endpoint whitelists known coding agents via User-Agent sniffing).
2026-03-12 05:58:48 -07:00
dmahan93
c7fc39bde0 feat: include session ID in system prompt via --pass-session-id flag
Adds --pass-session-id CLI flag. When set, the agent's system prompt
includes the session ID:

  Conversation started: Sunday, March 08, 2026 06:32 PM
  Session ID: 20260308_183200_abc123

Usage:
  hermes --pass-session-id
  hermes chat --pass-session-id

Implementation threads the flag as a proper parameter through the full
chain (main.py → cli.py → run_agent.py) rather than using an env var,
avoiding collisions in multi-agent/multitenant setups.

Based on PR #726 by dmahan93, reworked to use instance parameter
instead of HERMES_PASS_SESSION_ID environment variable.

Co-authored-by: dmahan93 <dmahan93@users.noreply.github.com>
2026-03-12 05:51:31 -07:00
teknium1
2192b17670 merge: resolve conflicts with origin/main
- gateway/run.py: Take main's _resolve_gateway_model() helper
- hermes_cli/setup.py: Re-apply nous-api removal after merge brought
  it back. Fix provider_idx offset (Custom is now index 3, not 4).
- tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)
2026-03-12 00:29:04 -07:00
teknium1
65356003e3 revert: keep provider preferences for all providers (Nous will proxy)
Nous Portal backend will become a transparent proxy for OpenRouter-
specific parameters (provider preferences, etc.), so keep sending them
to all providers. The reasoning disabled fix is kept (that's a real
constraint of the Nous endpoint).
2026-03-11 22:53:06 -07:00
teknium1
a7e5f19528 fix: don't send OpenRouter-specific provider preferences to Nous Portal
Two bugs in _build_api_kwargs that broke Nous Portal:

1. Provider preferences (only, ignore, order, sort) are OpenRouter-
   specific routing features. They were being sent in extra_body to ALL
   providers, including Nous Portal. When the config had
   providers_only=['google-vertex'], Nous Portal returned 404 'Inference
   host not found' because it doesn't have a google-vertex backend.

   Fix: Only include provider preferences when _is_openrouter is True.

2. Reasoning config with enabled=false was being sent to Nous Portal,
   which requires reasoning and returns 400 'Reasoning is mandatory for
   this endpoint and cannot be disabled.'

   Fix: Omit the reasoning parameter for Nous when enabled=false.

Root cause found via HERMES_DUMP_REQUESTS=1 which showed the exact
request payload being sent to Nous Portal's inference API.
2026-03-11 22:41:33 -07:00
teknium1
a29801286f refactor: route main agent client + fallback through centralized router
Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.

run_agent.py:
- __init__: When no explicit api_key/base_url, use
  resolve_provider_client(provider, raw_codex=True) for client
  construction. Explicit creds (from CLI/gateway runtime provider)
  still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
  its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
  dicts with a single resolve_provider_client() call. The router
  handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.

agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
  returns the raw OpenAI client for Codex providers instead of wrapping
  in CodexAuxiliaryClient. The main agent needs this for direct
  responses.stream() access.

3251 passed, 2 pre-existing unrelated failures.
2026-03-11 21:38:29 -07:00
teknium1
0aa31cd3cb feat: call_llm/async_call_llm + config slots + migrate all consumers
Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).
2026-03-11 20:52:19 -07:00
Erosika
2d35016b94 fix(honcho): harden tool gating and migration peer routing
Prevent stale Honcho tool exposure in context/local modes, restore reliable async write retry behavior, and ensure SOUL.md migration uploads target the AI peer instead of the user peer. Also align Honcho CLI key checks with host-scoped apiKey resolution and lock the fixes with regression tests.

Made-with: Cursor
2026-03-11 18:21:27 -04:00
Erosika
a0b0dbe6b2 Merge remote-tracking branch 'origin/main' into feat/honcho-async-memory
Made-with: Cursor

# Conflicts:
#	cli.py
#	tests/test_run_agent.py
2026-03-11 12:22:56 -04:00
Teknium
8fa96debc9 Merge pull request #963 from NousResearch/hermes/hermes-cf9f7d54
fix: guard all print() against OSError with _SafeWriter
2026-03-11 09:19:52 -07:00
teknium1
a8409a161f fix: guard all print() calls against OSError with _SafeWriter
When hermes-agent runs as a systemd service, Docker container, or
headless daemon, the stdout pipe can become unavailable (idle timeout,
buffer exhaustion, socket reset). Any print() call then raises
OSError: [Errno 5] Input/output error, crashing run_conversation()
and causing cron jobs to fail.

Rather than wrapping individual print() calls (68 in run_conversation
alone), this adds a transparent _SafeWriter wrapper installed once at
the start of run_conversation(). It delegates all writes to the real
stdout and silently catches OSError. Zero overhead on the happy path,
comprehensive coverage of all print calls including future ones.

Fixes #845

Co-authored-by: J0hnLawMississippi <J0hnLawMississippi@users.noreply.github.com>
2026-03-11 09:19:10 -07:00
Erosika
047b118299 fix(honcho): resolve review blockers for merge
Address merge-blocking review feedback by removing unsafe signal handler overrides, wiring next-turn Honcho prefetch, restoring per-directory session defaults, and exposing all Honcho tools to the model surface. Also harden prefetch cache access with public thread-safe accessors and remove duplicate browser cleanup code.

Made-with: Cursor
2026-03-11 11:46:37 -04:00
Teknium
01d3b31479 Merge PR #785: feat: conditional skill activation based on tool availability
Authored by teyrebaz33. Closes #539.

feat: conditional skill activation based on tool availability
2026-03-11 08:43:30 -07:00
teknium1
a54405e339 fix: proactive compression after large tool results + Anthropic error detection
Two fixes for context overflow handling:

1. Proactive compression after tool execution: The compression check now
   estimates the next prompt size using real token counts from the last API
   response (prompt_tokens + completion_tokens) plus a conservative estimate
   of newly appended tool results (chars // 3 for JSON-heavy content).
   Previously, should_compress() only checked last_prompt_tokens which
   didn't account for tool results — so a 130k prompt + 100k chars of tool
   output would pass the 140k threshold check but fail the 200k API limit.

2. Safety net: Added 'prompt is too long' to context-length error detection
   phrases. Anthropic returns 'prompt is too long: N tokens > M maximum'
   on HTTP 400, which wasn't matched by existing phrases. This ensures
   compression fires even if the proactive check underestimates.

Fixes #813
2026-03-11 08:04:52 -07:00
teknium1
683c8b24d4 fix: reduce max_retries to 3 and make ValueError/TypeError non-retryable
- max_retries reduced from 6 to 3 — 6 retries with exponential backoff
  could stall for ~275s total on persistent errors
- ValueError and TypeError now detected as non-retryable client errors
  and abort immediately instead of being retried with backoff (these are
  local validation/programming errors that will never succeed on retry)
2026-03-11 07:04:46 -07:00
teknium1
d2dee43825 fix: allow tool_choice, parallel_tool_calls, prompt_cache_key in codex preflight
_preflight_codex_api_kwargs rejected these three fields as unsupported,
but _build_api_kwargs adds them to every codex request. This caused a
ValueError before _interruptible_api_call was reached, which was caught
by the retry loop and retried with exponential backoff — appearing as
an infinite hang in tests (275s total backoff across 6 retries).

The fix adds these keys to allowed_keys and passes them through to the
normalized request dict.

This fixes the hanging test_cron_run_job_codex_path_handles_internal_401_refresh
test (now passes in 2.6s instead of timing out).
2026-03-11 07:00:14 -07:00
teknium1
4d873f77c1 feat(cli): add /reasoning command for effort level and display toggle
Combined implementation of reasoning management:
- /reasoning              Show current effort level and display state
- /reasoning <level>      Set reasoning effort (none, low, medium, high, xhigh)
- /reasoning show|on      Show model thinking/reasoning in output
- /reasoning hide|off     Hide model thinking/reasoning from output

Effort level changes persist to config and force agent re-init.
Display toggle updates the agent callback dynamically without re-init.

When display is enabled:
- Intermediate reasoning shown as dim [thinking] lines during tool loops
- Final reasoning shown in a bordered box above the response
- Long reasoning collapsed (5 lines intermediate, 10 lines final)

Also adds:
- reasoning_callback parameter to AIAgent
- last_reasoning in run_conversation result dict
- show_reasoning config option (display section, default: false)
- Display section in /config output
- 34 tests covering both features

Combines functionality from PR #789 and PR #790.

Co-authored-by: Aum Desai <Aum08Desai@users.noreply.github.com>
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
2026-03-11 06:02:18 -07:00
teknium1
a82ce60294 fix: add missing Responses API parameters for Codex provider
Adds tool_choice, parallel_tool_calls, and prompt_cache_key to the
Codex Responses API request kwargs — matching what the official Codex
CLI sends.

- tool_choice: 'auto' — enables the model to proactively call tools.
  Without this, the model may default to not using tools, which explains
  reports of the agent claiming it lacks shell access (#747).
- parallel_tool_calls: True — allows the model to issue multiple tool
  calls in a single turn for efficiency.
- prompt_cache_key: session_id — enables server-side prompt caching
  across turns in the same session, reducing latency and cost.

Refs #747
2026-03-11 04:28:31 -07:00
teknium1
21ff0d39ad feat: iteration budget pressure via tool result injection
Two-tier warning system that nudges the LLM as it approaches
max_iterations, injected into the last tool result JSON rather
than as a separate system message:

- Caution (70%): {"_budget_warning": "[BUDGET: 42/60...]"}
- Warning (90%): {"_budget_warning": "[BUDGET WARNING: 54/60...]"}

For JSON tool results, adds a _budget_warning field to the existing
dict. For plain text results, appends the warning as text.

Key properties:
- No system messages injected mid-conversation
- No changes to message structure
- Prompt cache stays valid
- Configurable thresholds (0.7 / 0.9)
- Can be disabled: _budget_pressure_enabled = False

Inspired by PR #421 (@Bartok9) and issue #414.
8 tests covering thresholds, edge cases, JSON and text injection.
2026-03-11 00:37:24 -07:00
teknium1
b53d5dad67 Merge PR #705: fix: detect, warn, and block file re-read/search loops after context compression
Authored by 0xbyt4. Adds read/search loop detection, file history injection after compression, and todo filtering for active items only.
2026-03-10 16:17:03 -07:00
teknium1
c1171fe666 fix: eliminate 3x SQLite message duplication in gateway sessions (#860)
Three separate code paths all wrote to the same SQLite state.db with
no deduplication, inflating session transcripts by 3-4x:

1. _log_msg_to_db() — wrote each message individually after append
2. _flush_messages_to_session_db() — re-wrote ALL new messages at
   every _persist_session() call (~18 exit points), with no tracking
   of what was already written
3. gateway append_to_transcript() — wrote everything a third time
   after the agent returned

Since load_transcript() prefers SQLite over JSONL, the inflated data
was loaded on every session resume, causing proportional token waste.

Fix:
- Remove _log_msg_to_db() and all 16 call sites (redundant with flush)
- Add _last_flushed_db_idx tracking in _flush_messages_to_session_db()
  so repeated _persist_session() calls only write truly new messages
- Reset flush cursor on compression (new session ID)
- Add skip_db parameter to SessionStore.append_to_transcript() so the
  gateway skips SQLite writes when the agent already persisted them
- Gateway now passes skip_db=True for agent-managed messages, still
  writes to JSONL as backup

Verified: a 12-message CLI session with tool calls produces exactly
12 SQLite rows with zero duplicates (previously would be 36-48).

Tests: 9 new tests covering flush deduplication, skip_db behavior,
compression reset, and initialization. Full suite passes (2869 tests).
2026-03-10 15:22:44 -07:00
adavyas
87cc5287a8 fix(honcho): enforce local mode and cache-safe warmup 2026-03-10 16:21:42 -04:00
Erosika
0cb639d472 refactor(honcho): rename query_user_context to honcho_context
Consistent naming: all honcho tools now prefixed with honcho_
(honcho_context, honcho_search, honcho_profile, honcho_conclude).
2026-03-10 16:21:07 -04:00
Erosika
792be0e8e3 feat(honcho): add honcho_conclude tool for writing facts back to memory
New tool lets Hermes persist conclusions about the user (preferences,
corrections, project context) directly to Honcho via the conclusions
API. Feeds into the user's peer card and representation.
2026-03-10 16:21:07 -04:00
Erosika
c1228e9a4a refactor(honcho): rename recallMode "auto" to "hybrid"
Matches the mental model: hybrid = context + tools,
context = context only, tools = tools only.
2026-03-10 16:21:07 -04:00
Erosika
74c214e957 feat(honcho): async memory integration with prefetch pipeline and recallMode
Adds full Honcho memory integration to Hermes:

- Session manager with async background writes, memory modes (honcho/hybrid/local),
  and dialectic prefetch for first-turn context warming
- Agent integration: prefetch pipeline, tool surface gated by recallMode,
  system prompt context injection, SIGTERM/SIGINT flush handlers
- CLI commands: setup, status, mode, tokens, peer, identity, migrate
- recallMode setting (auto | context | tools) for A/B testing retrieval strategies
- Session strategies: per-session, per-repo (git tree root), per-directory, global
- Polymorphic memoryMode config: string shorthand or per-peer object overrides
- 97 tests covering async writes, client config, session resolution, and memory modes
2026-03-10 16:21:07 -04:00
teyrebaz33
1caee06b22 fix: tool call repair — auto-lowercase, fuzzy match, helpful error on unknown tool (#520)
- Add _repair_tool_call(): tries lowercase, normalize, then fuzzy match (difflib 0.7)
- Replace 3-retry-then-abort with graceful error: model receives helpful message and self-corrects
- Conversation stays alive instead of dying on hallucinated tool names

Closes #520
2026-03-10 06:54:11 -07:00
teknium1
771969f747 fix: wire up enabled_tools in agent loop + simplify sandbox tool selection
Completes the fix started in 8318a51 — handle_function_call() accepted
enabled_tools but run_agent.py never passed it. Now both call sites in
_execute_tool_calls() pass self.valid_tool_names, so each agent session
uses its own tool list instead of the process-global
_last_resolved_tool_names (which subagents can overwrite).

Also simplifies the redundant ternary in code_execution_tool.py:
sandbox_tools is already computed correctly (intersection with session
tools, or full SANDBOX_ALLOWED_TOOLS as fallback), so the conditional
was dead logic.

Inspired by PR #663 (JasonOA888). Closes #662.
Tests: 2857 passed.
2026-03-10 06:35:28 -07:00
vincent
b0a5fe8974 fix: continue after output-length truncation 2026-03-10 04:30:19 -07:00
teknium1
899dfdcfb9 Merge PR #616: fix: retry with rebuilt payload after compression
Authored by tripledoublev.

After context compression on 413/400 errors, the inner retry loop was
reusing the stale pre-compression api_messages payload. Fix breaks out
of the inner retry loop so the outer loop rebuilds api_messages from
the now-compressed messages list. Adds regression test verifying the
second request actually contains the compressed payload.
2026-03-10 04:22:42 -07:00
teknium1
f16f2912cf Merge PR #607: fix: reset all retry counters at start of run_conversation()
Authored by 0xbyt4. Adds missing resets for _incomplete_scratchpad_retries and _codex_incomplete_retries to prevent stale counters carrying over between CLI conversations.
2026-03-10 04:17:47 -07:00
teknium1
c1775de56f feat: filesystem checkpoints and /rollback command
Automatic filesystem snapshots before destructive file operations,
with user-facing rollback.  Inspired by PR #559 (by @alireza78a).

Architecture:
- Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR
- CheckpointManager: take/list/restore, turn-scoped dedup, pruning
- Transparent — the LLM never sees it, no tool schema, no tokens
- Once per turn — only first write_file/patch triggers a snapshot

Integration:
- Config: checkpoints.enabled + checkpoints.max_snapshots
- CLI flag: hermes --checkpoints
- Trigger: run_agent.py _execute_tool_calls() before write_file/patch
- /rollback slash command in CLI + gateway (list, restore by number)
- Pre-rollback snapshot auto-created on restore (undo the undo)

Safety:
- Never blocks file operations — all errors silently logged
- Skips root dir, home dir, dirs >50K files
- Disables gracefully when git not installed
- Shadow repo completely isolated from project git

Tests: 35 new tests, all passing (2798 total suite)
Docs: feature page, config reference, CLI commands reference
2026-03-10 00:49:15 -07:00
teknium1
ee4008431a fix: stop terminal border flashing with steady cursor and TUI spinner widget
Cherry-picked and improved from PR #470 (fixes #464).

Problem: On Ubuntu 24.04 with ghostty + tmux, the prompt input box
border lines flash due to cursor blink and raw spinner terminal writes
conflicting with prompt_toolkit's rendering.

Changes:
- cli.py: Add CursorShape.BLOCK to Application() to disable cursor blink
- cli.py: Add thinking_callback + spinner_widget in TUI layout so
  thinking status displays as a proper prompt_toolkit widget instead of
  raw terminal writes that conflict with the TUI renderer
- run_agent.py: Add thinking_callback parameter to AIAgent; when set,
  uses the callback instead of KawaiiSpinner for thinking display

What was NOT changed (preserving existing behavior):
- agent/display.py: Untouched. KawaiiSpinner _write() stdout capture,
  _animate() logic, and 0.12s frame interval all preserved. This
  protects subagent stdout redirection and keeps smooth animations
  for non-CLI contexts (gateway, batch runner).
- Original emoji spinner types (brain/sparkle/pulse/moon/star) preserved
  for all non-CLI contexts.

Fixes from original PR #470:
- CursorShape.STEADY_BLOCK -> CursorShape.BLOCK (STEADY_BLOCK doesn't
  exist in prompt_toolkit 3.0.52)
- Removed duplicate self._spinner_text = '' line
- Removed redundant nested if-checks

Tested: 2706 tests pass, interactive CLI verified via tmux.
2026-03-09 23:26:43 -07:00
teknium1
3e352f8a0d fix: add upstream guard for non-dict function_args + tests for build_tool_preview
Complements PR #453 by 0xbyt4. Adds isinstance(dict) guard in
run_agent.py to catch cases where json.loads returns non-dict
(e.g. null, list, string) before they reach downstream code.

Also adds 15 tests for build_tool_preview covering None args,
empty dicts, known/unknown tools, fallback keys, truncation,
and all special-cased tools (process, todo, memory, session_search).
2026-03-09 21:01:40 -07:00
teyrebaz33
94023e6a85 feat: conditional skill activation based on tool availability
Skills can now declare fallback_for_toolsets, fallback_for_tools,
requires_toolsets, and requires_tools in their SKILL.md frontmatter.
The system prompt builder filters skills automatically based on which
tools are available in the current session.

- Add _read_skill_conditions() to parse conditional frontmatter fields
- Add _skill_should_show() to evaluate conditions against available tools
- Update build_skills_system_prompt() to accept and apply tool availability
- Pass valid_tool_names and available toolsets from run_agent.py
- Backward compatible: skills without conditions always show; calling
  build_skills_system_prompt() with no args preserves existing behavior

Closes #539
2026-03-09 23:13:39 +03:00
teknium1
1f0944de21 fix: handle non-string content from OpenAI-compatible servers (#759)
Some local LLM servers (llama-server, etc.) return message.content as
a dict or list instead of a plain string. This caused AttributeError
'dict object has no attribute strip' on every API call.

Normalizes content to string immediately after receiving the response:
- dict: extracts 'text' or 'content' field, falls back to json.dumps
- list: extracts text parts (OpenAI multimodal content format)
- other: str() conversion

Applied at the single point where response.choices[0].message is read
in the main agent loop, so all downstream .strip()/.startswith()/[:100]
operations work regardless of server implementation.

Closes #759
2026-03-09 03:32:32 -07:00