Commit Graph

411 Commits

Author SHA1 Message Date
Ahmad Ragab
3dc148ab6f fix: use adaptive thinking without budget_tokens for Claude 4.6 models
For Claude 4.6 models (Opus and Sonnet), the Anthropic API rejects
budget_tokens when thinking.type is 'adaptive'. This was causing a
400 error: 'thinking.adaptive.budget_tokens: Extra inputs are not
permitted'.

Changes:
- Send thinking: {type: 'adaptive'} without budget_tokens for 4.6
- Move effort control to output_config: {effort: ...} per Anthropic docs
- Map Hermes effort levels to Anthropic effort levels (xhigh->max, etc.)
- Narrow adaptive detection to 4.6 models only (4.5 still uses manual)
- Add tests for adaptive thinking on 4.6 and manual thinking on pre-4.6

Fixes #1126
2026-03-13 03:21:13 +01:00
Teknium
a282322845 Merge pull request #1121 from 0xbyt4/fix/anthropic-adapter-issues
fix: anthropic adapter — max_tokens, fallback crash, proxy base_url
2026-03-12 19:07:06 -07:00
Teknium
475dd58a8e Merge PR #736: feat(honcho): async writes, memory modes, session title integration, setup CLI
Authored by erosika. Builds on #38 and #243.

Adds async write support, configurable memory modes, context prefetch pipeline,
4 new Honcho tools (honcho_context, honcho_profile, honcho_search, honcho_conclude),
full 'hermes honcho' CLI, session strategies, AI peer identity, recallMode A/B,
gateway lifecycle management, and comprehensive docs.

Cherry-picks fixes from PRs #831/#832 (adavyas).

Co-authored-by: erosika <erosika@users.noreply.github.com>
Co-authored-by: adavyas <adavyas@users.noreply.github.com>
2026-03-12 19:05:11 -07:00
Teknium
28ffa8e693 fix: slack file upload fallback loses thread context (#1122)
fix: slack file upload fallback loses thread context
2026-03-12 18:56:27 -07:00
0xbyt4
93c3a1a9c9 fix(setup): remove dead code causing is_coding_plan NameError crash
Remove 50 lines of unreachable duplicate model selection logic in
setup_model_provider() for zai/kimi-coding/minimax/minimax-cn providers.
The code referenced undefined `is_coding_plan` variable, crashing setup.
_setup_provider_model_selection() already handles these providers correctly
via _DEFAULT_PROVIDER_MODELS dict.
2026-03-13 04:42:26 +03:00
0xbyt4
064c66df8c fix: slack file upload fallback loses thread context
Fallback paths in send_image_file, send_video, and send_document called
super() without metadata, causing replies to appear outside the thread
when file upload fails. Use self.send() with metadata instead to preserve
thread_ts context.
2026-03-13 04:26:27 +03:00
0xbyt4
22479b053c fix: anthropic adapter — max_tokens ignored, fallback crash, proxy base_url filtered
- Pass self.max_tokens to build_anthropic_kwargs instead of hardcoded None
- Add anthropic case to _try_activate_fallback (was only handling openai-codex)
- Remove 'anthropic in base_url' filter that blocked custom proxy URLs
2026-03-13 04:22:16 +03:00
Teknium
3bc933586a fix: Slack MAX_MESSAGE_LENGTH + typing indicator via assistant.threads.setStatus (#1117)
fix: Slack MAX_MESSAGE_LENGTH 3900 → 39000
2026-03-12 17:53:49 -07:00
teknium1
e976879cf2 merge: resolve conflicts with main (URL update to hermes-agent.nousresearch.com) 2026-03-12 17:49:26 -07:00
teknium1
319e6615c3 fix: Slack MAX_MESSAGE_LENGTH + typing indicator via assistant.threads.setStatus
- Increase MAX_MESSAGE_LENGTH from 3,900 to 39,000 (Slack API allows 40k)
- Implement real typing indicator using assistant.threads.setStatus API
  - Shows 'BotName is thinking...' next to the bot name in threads
  - Auto-clears when the bot sends a reply
  - Requires assistant:write or chat:write scope
  - Falls back silently if scope unavailable (reactions still work)
- 4 new tests for typing indicator
2026-03-12 17:46:53 -07:00
teknium1
4068f20ce9 fix(anthropic): deep scan fixes — auth, retries, edge cases
Fixes from comprehensive code review and cross-referencing with
clawdbot/OpenCode implementations:

CRITICAL:
- Add one-shot guard (anthropic_auth_retry_attempted) to prevent
  infinite 401 retry loops when credentials keep changing
- Fix _is_oauth_token(): managed keys from ~/.claude.json are NOT
  regular API keys (don't start with sk-ant-api). Inverted the logic:
  only sk-ant-api* is treated as API key auth, everything else uses
  Bearer auth + oauth beta headers

HIGH:
- Wrap json.loads(args) in try/except in message conversion — malformed
  tool_call arguments no longer crash the entire conversation
- Raise AuthError in runtime_provider when no Anthropic token found
  (was silently passing empty string, causing confusing API errors)
- Remove broken _try_anthropic() from auxiliary vision chain — the
  centralized router creates an OpenAI client for api_key providers
  which doesn't work with Anthropic's Messages API

MEDIUM:
- Handle empty assistant message content — Anthropic rejects empty
  content blocks, now inserts '(empty)' placeholder
- Fix setup.py existing_key logic — set to 'KEEP' sentinel instead
  of None to prevent falling through to the auth choice prompt
- Add debug logging to _fetch_anthropic_models on failure

Tests: 43 adapter tests (2 new for token detection), 3197 total passed
2026-03-12 17:14:22 -07:00
teknium1
978e1356c0 feat: Slack adapter improvements — formatting, reactions, user resolution, commands
1. Markdown → mrkdwn conversion (format_message override):
   - **bold** → *bold*, *italic* → _italic_
   - ## Headers → *Headers* (bold)
   - [link](url) → <url|link>
   - ~~strike~~ → ~strike~
   - Code blocks and inline code preserved unchanged
   - Placeholder-based approach (same pattern as Telegram)

2. Message length splitting:
   - send() now calls format_message() + truncate_message()
   - Long responses split at natural boundaries (newlines, spaces)
   - Code blocks properly closed/reopened across chunks
   - Chunk indicators (1/N) appended for multi-part messages

3. Reaction-based acknowledgment:
   - 👀 (eyes) reaction added on message receipt
   - Replaced with  (white_check_mark) when response is complete
   - Graceful error handling (missing scopes, already-reacted)
   - Serves as visual feedback since Slack has no bot typing API

4. User identity resolution:
   - Resolves Slack user IDs to display names via users.info API
   - LRU-style in-memory cache (one API call per user)
   - Fallback chain: display_name → real_name → user_id
   - user_name now included in MessageEvent source

5. Expanded slash commands (/hermes <subcommand>):
   - Added: compact, compress, resume, background, usage,
     insights, title, reasoning, provider, rollback
   - Arguments preserved (e.g. /hermes resume my session)

6. reply_broadcast config option:
   - When gateway.slack.reply_broadcast is true, first response
     in a thread also appears in the main channel
   - Disabled by default — thread = session stays clean

30 new tests covering all features.
2026-03-12 16:22:39 -07:00
teknium1
7086fde37e fix(anthropic): revert inline vision, add hermes model flow, wire vision aux
Feedback fixes:

1. Revert _convert_vision_content — vision is handled by the vision_analyze
   tool, not by converting image blocks inline in conversation messages.
   Removed the function and its tests.

2. Add Anthropic to 'hermes model' (cmd_model in main.py):
   - Added to provider_labels dict
   - Added to providers selection list
   - Added _model_flow_anthropic() with Claude Code credential auto-detection,
     API key prompting, and model selection from catalog.

3. Wire up Anthropic as a vision-capable auxiliary provider:
   - Added _try_anthropic() to auxiliary_client.py using claude-sonnet-4
     as the vision model (Claude natively supports multimodal)
   - Added to the get_vision_auxiliary_client() auto-detection chain
     (after OpenRouter/Nous, before Codex/custom)

Cache tracking note: the Anthropic cache metrics branch in run_agent.py
(cache_read_input_tokens / cache_creation_input_tokens) is in the correct
place — it's response-level parsing, same location as the existing
OpenRouter cache tracking. auxiliary_client.py has no cache tracking.
2026-03-12 16:09:04 -07:00
Teknium
4a8cd6f856 fix: stop rejecting unlisted models, accept with warning instead
* fix: use session_key instead of chat_id for adapter interrupt lookups

monitor_for_interrupt() in _run_agent was using source.chat_id to query
the adapter's has_pending_interrupt() and get_pending_message() methods.
But the adapter stores interrupt events under build_session_key(source),
which produces a different string (e.g. 'agent:main:telegram:dm' vs '123456').

This key mismatch meant the interrupt was never detected through the
adapter path, which is the only active interrupt path for all adapter-based
platforms (Telegram, Discord, Slack, etc.). The gateway-level interrupt
path (in dispatch_message) is unreachable because the adapter intercepts
the 2nd message in handle_message() before it reaches dispatch_message().

Result: sending a new message while subagents were running had no effect —
the interrupt was silently lost.

Fix: replace all source.chat_id references in the interrupt-related code
within _run_agent() with the session_key parameter, which matches the
adapter's storage keys.

Also adds regression tests verifying session_key vs chat_id consistency.

* debug: add file-based logging to CLI interrupt path

Temporary instrumentation to diagnose why message-based interrupts
don't seem to work during subagent execution. Logs to
~/.hermes/interrupt_debug.log (immune to redirect_stdout).

Two log points:
1. When Enter handler puts message into _interrupt_queue
2. When chat() reads it and calls agent.interrupt()

This will reveal whether the message reaches the queue and
whether the interrupt is actually fired.

* fix: accept unlisted models with warning instead of rejecting

validate_requested_model() previously hard-rejected any model not found
in the provider's API listing. This was too aggressive — users on higher
plan tiers (e.g. Z.AI Pro/Max) may have access to models not shown in
the public listing (like glm-5 on coding endpoints).

Changes:
- validate_requested_model: accept unlisted models with a warning note
  instead of blocking. The model is saved to config and used immediately.
- Z.AI setup: always offer glm-5 in the model list regardless of whether
  a coding endpoint was detected. Pro/Max plans support it.
- Z.AI setup detection message: softened from 'GLM-5 is not available'
  to 'GLM-5 may still be available depending on your plan tier'
2026-03-12 16:02:35 -07:00
teknium1
d7adfe8f61 fix(anthropic): address gaps found in deep-dive audit
After studying clawdbot (OpenClaw) and OpenCode implementations:

## Beta headers
- Add interleaved-thinking-2025-05-14 and fine-grained-tool-streaming-2025-05-14
  as common betas (sent with ALL auth types, not just OAuth)
- OAuth tokens additionally get oauth-2025-04-20
- API keys now also get the common betas (previously got none)

## Vision/image support
- Add _convert_vision_content() to convert OpenAI multimodal format
  (image_url blocks) to Anthropic format (image blocks with base64/url source)
- Handles both data: URIs (base64) and regular URLs

## Role alternation enforcement
- Anthropic strictly rejects consecutive same-role messages (400 error)
- Add post-processing step that merges consecutive user/assistant messages
- Handles string, list, and mixed content types during merge

## Tool choice support
- Add tool_choice parameter to build_anthropic_kwargs()
- Maps OpenAI values: auto→auto, required→any, none→omit, name→tool

## Cache metrics tracking
- Anthropic uses cache_read_input_tokens / cache_creation_input_tokens
  (different from OpenRouter's prompt_tokens_details.cached_tokens)
- Add api_mode-aware branch in run_agent.py cache stats logging

## Credential refresh on 401
- On 401 error during anthropic_messages mode, re-read credentials
  via resolve_anthropic_token() (picks up refreshed Claude Code tokens)
- Rebuild client if new token differs from current one
- Follows same pattern as Codex/Nous 401 refresh handlers

## Tests
- 44 adapter tests (8 new: vision conversion, role alternation, tool choice)
- Updated beta header tests to verify new structure
- Full suite: 3198 passed, 0 regressions
2026-03-12 16:00:46 -07:00
teknium1
5e12442b4b feat: native Anthropic provider with Claude Code credential auto-discovery
Add Anthropic as a first-class inference provider, bypassing OpenRouter
for direct API access. Uses the native Anthropic SDK with a full format
adapter (same pattern as the codex_responses api_mode).

## Auth (three methods, priority order)
1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-*)
2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-*)
3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription)
   - Reads Claude Code's OAuth credentials
   - Checks token expiry with 60s buffer
   - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header
   - Regular API keys use standard x-api-key header

## Changes by file

### New files
- agent/anthropic_adapter.py — Client builder, message/tool/response
  format conversion, Claude Code credential reader, token resolver.
  Handles system prompt extraction, tool_use/tool_result blocks,
  thinking/reasoning, orphaned tool_use cleanup, cache_control.
- tests/test_anthropic_adapter.py — 36 tests covering all adapter logic

### Modified files
- pyproject.toml — Add anthropic>=0.39.0 dependency
- hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with
  three env vars, plus 'claude'/'claude-code' aliases
- hermes_cli/models.py — Add model catalog, labels, aliases, provider order
- hermes_cli/main.py — Add 'anthropic' to --provider CLI choices
- hermes_cli/runtime_provider.py — Add Anthropic branch returning
  api_mode='anthropic_messages' (before generic api_key fallthrough)
- hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code
  credential auto-discovery, model selection, OpenRouter tools prompt
- agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model
- agent/model_metadata.py — Add bare Claude model context lengths
- run_agent.py — Add anthropic_messages api_mode:
  * Client init (Anthropic SDK instead of OpenAI)
  * API call dispatch (_anthropic_client.messages.create)
  * Response validation (content blocks)
  * finish_reason mapping (stop_reason -> finish_reason)
  * Token usage (input_tokens/output_tokens)
  * Response normalization (normalize_anthropic_response)
  * Client interrupt/rebuild
  * Prompt caching auto-enabled for native Anthropic
- tests/test_run_agent.py — Update test_anthropic_base_url_accepted to
  expect native routing, add test_prompt_caching_native_anthropic
2026-03-12 15:47:45 -07:00
Erosika
fefc709b2c merge: resolve conflict with main in subagent interrupt test 2026-03-12 16:28:57 -04:00
Erosika
ae2a5e5743 refactor(honcho): remove local memory mode
The "local" memoryMode was redundant with enabled: false. Simplifies
the mode system to hybrid and honcho only.
2026-03-12 16:23:34 -04:00
Erosika
f896bb5d8c fix(test): patch correct method in subagent interrupt test
build_system_prompt was refactored to AIAgent._build_system_prompt
but the test still patched the non-existent module-level function.
2026-03-12 15:05:42 -04:00
Teknium
e004c094ea fix: use session_key instead of chat_id for adapter interrupt lookups
* fix: use session_key instead of chat_id for adapter interrupt lookups

monitor_for_interrupt() in _run_agent was using source.chat_id to query
the adapter's has_pending_interrupt() and get_pending_message() methods.
But the adapter stores interrupt events under build_session_key(source),
which produces a different string (e.g. 'agent:main:telegram:dm' vs '123456').

This key mismatch meant the interrupt was never detected through the
adapter path, which is the only active interrupt path for all adapter-based
platforms (Telegram, Discord, Slack, etc.). The gateway-level interrupt
path (in dispatch_message) is unreachable because the adapter intercepts
the 2nd message in handle_message() before it reaches dispatch_message().

Result: sending a new message while subagents were running had no effect —
the interrupt was silently lost.

Fix: replace all source.chat_id references in the interrupt-related code
within _run_agent() with the session_key parameter, which matches the
adapter's storage keys.

Also adds regression tests verifying session_key vs chat_id consistency.

* debug: add file-based logging to CLI interrupt path

Temporary instrumentation to diagnose why message-based interrupts
don't seem to work during subagent execution. Logs to
~/.hermes/interrupt_debug.log (immune to redirect_stdout).

Two log points:
1. When Enter handler puts message into _interrupt_queue
2. When chat() reads it and calls agent.interrupt()

This will reveal whether the message reaches the queue and
whether the interrupt is actually fired.
2026-03-12 08:35:45 -07:00
Teknium
42cf66ae39 feat: add 'hermes claw migrate' command + migration docs (#1059)
feat: add 'hermes claw migrate' command + migration docs
2026-03-12 08:23:05 -07:00
teknium1
d53035ad82 feat: add 'hermes claw migrate' command + migration docs
- Add hermes_cli/claw.py with full CLI migration handler:
  - hermes claw migrate (interactive migration with confirmation)
  - --dry-run, --preset, --overwrite, --skill-conflict flags
  - --source for custom OpenClaw path
  - --yes to skip confirmation
  - Clean formatted output matching setup wizard style

- Fix Python 3.11+ @dataclass compatibility bug in dynamic module loading:
  - Register module in sys.modules before exec_module()
  - Fixes both setup.py (PR #981) and new claw.py

- Add 16 tests in tests/hermes_cli/test_claw.py covering:
  - Script discovery (project root, installed, missing)
  - Command routing
  - Dry-run, execute, cancellation, error handling
  - Preset/secrets behavior, report formatting

- Documentation updates:
  - README.md: Add 'hermes claw migrate' to Getting Started, new Migration section
  - docs/migration/openclaw.md: Full migration guide with all options
  - SKILL.md: Add CLI Command section at top of openclaw-migration skill
2026-03-12 08:20:12 -07:00
Teknium
5a4348d046 Merge pull request #1053 from NousResearch/hermes/hermes-c877bdeb
chore(skills): clean up PR #862 + feat(docs): add search to Docusaurus
2026-03-12 08:20:10 -07:00
Teknium
68fdc62d8f feat: offer OpenClaw migration during first-time setup wizard (#981)
feat: offer OpenClaw migration during first-time setup wizard
2026-03-12 08:12:30 -07:00
teknium1
bb7cdc6d44 chore(skills): clean up PR #862 — simplify manifest guard, DRY up tests
Follow-up to PR #862 (local skills classification by arceus77-7):

- Remove unnecessary isinstance guard on _read_manifest() return value —
  it always returns Dict[str, str], so set() on it suffices.
- Extract repeated hub-dir monkeypatching into a shared pytest fixture (hub_env).
- Add three_source_env fixture for source-classification tests.
- Add _read_manifest monkeypatch to test_do_list_initializes_hub_dir
  (was fragile — relied on empty skills list masking the real manifest).
- Add test coverage for --source hub and --source builtin filters.
- Extract _capture() helper to reduce console/StringIO boilerplate.

5 tests, all green.
2026-03-12 08:08:22 -07:00
Teknium
7e637d3b6a Merge pull request #862 from arceus77-7/fix/skills-list-source-provenance
Merging — clean fix for local skills mislabeling. Follow-up cleanup coming.
2026-03-12 08:05:34 -07:00
Teknium
2a62514d17 feat: add 'View full command' option to dangerous command approval (#887)
When a dangerous command is detected and the user is prompted for
approval, long commands are truncated (80 chars in fallback, 70 chars
in the TUI). Users had no way to see the full command before deciding.

This adds a 'View full command' option across all approval interfaces:

- CLI fallback (tools/approval.py): [v]iew option in the prompt menu.
  Shows the full command and re-prompts for approval decision.
- CLI TUI (cli.py): 'Show full command' choice in the arrow-key
  selection panel. Expands the command display in-place and removes
  the view option after use.
- CLI callbacks (callbacks.py): 'view' choice added to the list when
  the command exceeds 70 characters.
- Gateway (gateway/run.py): 'full', 'show', 'view' responses reveal
  the complete command while keeping the approval pending.

Includes 7 new tests covering view-then-approve, view-then-deny,
short command fallthrough, and double-view behavior.

Closes community feedback about the 80-char cap on dangerous commands.
2026-03-12 06:27:21 -07:00
Teknium
e782b92bca fix: /reasoning command — add gateway support, fix display, persist settings (#1031)
* fix: /reasoning command output ordering, display, and inline think extraction

Three issues with the /reasoning command:

1. Output interleaving: The command echo used print() while feedback
   used _cprint(), causing them to render out-of-order under
   prompt_toolkit's patch_stdout. Changed echo to use _cprint() so
   all output renders through the same path in correct order.

2. Reasoning display not working: /reasoning show toggled a flag
   but reasoning never appeared for models that embed thinking in
   inline <think> blocks rather than structured API fields. Added
   fallback extraction in _build_assistant_message to capture
   <think> block content as reasoning when no structured reasoning
   fields (reasoning, reasoning_content, reasoning_details) are
   present. This feeds into both the reasoning callback (during
   tool loops) and the post-response reasoning box display.

3. Feedback clarity: Added checkmarks to confirm actions, persisted
   show/hide to config (was session-only before), and aligned the
   status display for readability.

Tests: 7 new tests for inline think block extraction (41 total).

* feat: add /reasoning command to gateway (Telegram/Discord/etc)

The /reasoning command only existed in the CLI — messaging platforms
had no way to view or change reasoning settings. This adds:

1. /reasoning command handler in the gateway:
   - No args: shows current effort level and display state
   - /reasoning <level>: sets reasoning effort (none/low/medium/high/xhigh)
   - /reasoning show|hide: toggles reasoning display in responses
   - All changes saved to config.yaml immediately

2. Reasoning display in gateway responses:
   - When show_reasoning is enabled, prepends a 'Reasoning' block
     with the model's last_reasoning content before the response
   - Collapses long reasoning (>15 lines) to keep messages readable
   - Uses last_reasoning from run_conversation result dict

3. Plumbing:
   - Added _show_reasoning attribute loaded from config at startup
   - Propagated last_reasoning through _run_agent return dict
   - Added /reasoning to help text and known_commands set
   - Uses getattr for _show_reasoning to handle test stubs
2026-03-12 05:38:19 -07:00
teknium1
a37fc05171 fix: skip hanging tests + add global test timeout
4 test files spawn real processes or make live API calls that hang
indefinitely in batch/CI runs. Skip them with pytestmark:

- tests/tools/test_code_execution.py (subprocess spawns)
- tests/tools/test_file_tools_live.py (live LocalEnvironment)
- tests/test_413_compression.py (blocks on process)
- tests/test_agent_loop_tool_calling.py (live OpenRouter API calls)

Also added global 30s signal.alarm timeout in conftest.py as a safety
net, and removed stale nous-api test that hung on OAuth browser login.

Suite now runs in ~55s with no hangs.
2026-03-12 01:23:28 -07:00
teknium1
1956b9d97a fix: remove nous-api test + fix OAuth test index after nous-api removal
- Remove test_nous_api_setup_preserves_model_provider_metadata (nous-api
  provider no longer exists, test selected Nous OAuth which hangs waiting
  for browser login)
- Fix test_nous_oauth_setup prompt_choice index: 1→0 (Nous Portal is
  now first option after nous-api removal)
2026-03-12 00:51:30 -07:00
teknium1
2192b17670 merge: resolve conflicts with origin/main
- gateway/run.py: Take main's _resolve_gateway_model() helper
- hermes_cli/setup.py: Re-apply nous-api removal after merge brought
  it back. Fix provider_idx offset (Custom is now index 3, not 4).
- tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)
2026-03-12 00:29:04 -07:00
teknium1
ec2c6dff70 feat: unified /model and /provider into single view
Both /model and /provider now show the same unified display:

  Current: anthropic/claude-opus-4.6 via OpenRouter

  Authenticated providers & models:
    [openrouter] ← active
      anthropic/claude-opus-4.6 ← current
      anthropic/claude-sonnet-4.5
      ...
    [nous]
      claude-opus-4-6
      gemini-3-flash
      ...
    [openai-codex]
      gpt-5.2-codex
      gpt-5.1-codex-mini
      ...

  Not configured: Z.AI / GLM, Kimi / Moonshot, ...

  Switch model:    /model <model-name>
  Switch provider: /model <provider>:<model-name>
  Example: /model nous:claude-opus-4-6

Users can see all authenticated providers and their models at a glance,
making it easy to switch mid-conversation.

Also added curated model lists for Nous Portal and OpenAI Codex to
hermes_cli/models.py.
2026-03-11 23:06:06 -07:00
teknium1
9302690e1b refactor: remove LLM_MODEL env var dependency — config.yaml is sole source of truth
Model selection now comes exclusively from config.yaml (set via
'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer
read or written anywhere in production code.

Why: env vars are per-process/per-user and would conflict in
multi-agent or multi-tenant setups. Config.yaml is file-based and
can be scoped per-user or eventually per-session.

Changes:
- cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL
- hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL
  to .env
- hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...)
  calls from all provider setup flows
- gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works
  for gateway process runtime)
- cron/scheduler.py: Same
- agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint
  model detection
2026-03-11 22:04:42 -07:00
teknium1
a29801286f refactor: route main agent client + fallback through centralized router
Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.

run_agent.py:
- __init__: When no explicit api_key/base_url, use
  resolve_provider_client(provider, raw_codex=True) for client
  construction. Explicit creds (from CLI/gateway runtime provider)
  still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
  its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
  dicts with a single resolve_provider_client() call. The router
  handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.

agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
  returns the raw OpenAI client for Codex providers instead of wrapping
  in CodexAuxiliaryClient. The main agent needs this for direct
  responses.stream() access.

3251 passed, 2 pre-existing unrelated failures.
2026-03-11 21:38:29 -07:00
teknium1
29ef69c703 fix: update all test mocks for call_llm migration
Update 14 test files to use the new call_llm/async_call_llm mock
patterns instead of the old get_text_auxiliary_client/
get_vision_auxiliary_client tuple returns.

- vision_tools tests: mock async_call_llm instead of _aux_async_client
- browser tests: mock call_llm instead of _aux_vision_client
- flush_memories tests: mock call_llm instead of get_text_auxiliary_client
- session_search tests: mock async_call_llm with RuntimeError
- mcp_tool tests: fix whitelist model config, use side_effect for
  multi-response tests
- auxiliary_config_bridge: update for model=None (resolved in router)

3251 passed, 2 pre-existing unrelated failures.
2026-03-11 21:06:54 -07:00
teknium1
0aa31cd3cb feat: call_llm/async_call_llm + config slots + migrate all consumers
Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).
2026-03-11 20:52:19 -07:00
teknium1
013cc4d2fc chore: remove nous-api provider (API key path)
Nous Portal only supports OAuth authentication. Remove the 'nous-api'
provider which allowed direct API key access via NOUS_API_KEY env var.

Removed from:
- hermes_cli/auth.py: PROVIDER_REGISTRY entry + aliases
- hermes_cli/config.py: OPTIONAL_ENV_VARS entry
- hermes_cli/setup.py: setup wizard option + model selection handler
  (reindexed remaining provider choices)
- agent/auxiliary_client.py: docstring references
- tests/test_runtime_provider_resolution.py: nous-api test
- tests/integration/test_web_tools.py: renamed dict key
2026-03-11 20:14:44 -07:00
Erosika
2d35016b94 fix(honcho): harden tool gating and migration peer routing
Prevent stale Honcho tool exposure in context/local modes, restore reliable async write retry behavior, and ensure SOUL.md migration uploads target the AI peer instead of the user peer. Also align Honcho CLI key checks with host-scoped apiKey resolution and lock the fixes with regression tests.

Made-with: Cursor
2026-03-11 18:21:27 -04:00
kshitij
0712639441 test: verify reloaded config drives setup after migration 2026-03-12 02:56:36 +05:30
kshitij
4f427167ac chore: clean OpenClaw migration follow-up 2026-03-12 02:49:29 +05:30
teknium1
44bf859c3b feat: offer OpenClaw migration during first-time setup wizard
When a new user runs 'hermes setup' for the first time and ~/.openclaw/
exists, the wizard now asks if they want to import their OpenClaw data
before API/tool configuration begins.

If accepted, the existing migration script from optional-skills/ is
loaded dynamically and run with the 'full' preset — importing settings,
memories, skills, API keys, and platform configs. Config is reloaded
afterward so imported values (like API keys) are available for the
remaining setup steps.

The migration is only offered on first-time setup (not returning users)
and handles errors gracefully without blocking setup completion.

Closes #829
2026-03-12 02:40:00 +05:30
Erosika
d987ff54a1 fix: change session_strategy default from per-directory to per-session
Matches Hermes' native session naming (title if set, otherwise
session-scoped). Not a breaking change -- no memory data is lost,
old sessions remain in Honcho.
2026-03-11 15:42:35 -04:00
Erosika
a0b0dbe6b2 Merge remote-tracking branch 'origin/main' into feat/honcho-async-memory
Made-with: Cursor

# Conflicts:
#	cli.py
#	tests/test_run_agent.py
2026-03-11 12:22:56 -04:00
Teknium
8fa96debc9 Merge pull request #963 from NousResearch/hermes/hermes-cf9f7d54
fix: guard all print() against OSError with _SafeWriter
2026-03-11 09:19:52 -07:00
teknium1
a8409a161f fix: guard all print() calls against OSError with _SafeWriter
When hermes-agent runs as a systemd service, Docker container, or
headless daemon, the stdout pipe can become unavailable (idle timeout,
buffer exhaustion, socket reset). Any print() call then raises
OSError: [Errno 5] Input/output error, crashing run_conversation()
and causing cron jobs to fail.

Rather than wrapping individual print() calls (68 in run_conversation
alone), this adds a transparent _SafeWriter wrapper installed once at
the start of run_conversation(). It delegates all writes to the real
stdout and silently catches OSError. Zero overhead on the happy path,
comprehensive coverage of all print calls including future ones.

Fixes #845

Co-authored-by: J0hnLawMississippi <J0hnLawMississippi@users.noreply.github.com>
2026-03-11 09:19:10 -07:00
kshitij-eliza
452593319b fix(setup): preserve provider metadata during model selection 2026-03-11 09:17:09 -07:00
insecurejezza
11825ccefa feat(gateway): thread-aware free-response routing for Discord
- Forum parent channel IDs now match free-response list (add a forum
  channel ID and all its threads respond without mention)
- Better thread chat names: 'Guild / forum / thread' for forum threads
- Add discord.require_mention and discord.free_response_channels to
  config.yaml (bridged to env vars, env vars still override)
- Keep require_mention defaulting to true (safe for shared servers)

Cherry-picked from PR #867 by insecurejezza with default fix and
config.yaml integration.

Co-authored-by: insecurejezza <insecurejezza@users.noreply.github.com>
2026-03-11 09:15:31 -07:00
Teknium
fa7a18f42a Merge pull request #949 from NousResearch/hermes/hermes-b86fddbe
fix(cron): handle naive legacy timestamps in due-job checks
2026-03-11 08:47:10 -07:00
Erosika
047b118299 fix(honcho): resolve review blockers for merge
Address merge-blocking review feedback by removing unsafe signal handler overrides, wiring next-turn Honcho prefetch, restoring per-directory session defaults, and exposing all Honcho tools to the model surface. Also harden prefetch cache access with public thread-safe accessors and remove duplicate browser cleanup code.

Made-with: Cursor
2026-03-11 11:46:37 -04:00
Teknium
01d3b31479 Merge PR #785: feat: conditional skill activation based on tool availability
Authored by teyrebaz33. Closes #539.

feat: conditional skill activation based on tool availability
2026-03-11 08:43:30 -07:00