Commit Graph

408 Commits

Author SHA1 Message Date
ryanautomated
0f9aa57069 fix: silent memory flush failure on /new and /resume commands
The _async_flush_memories() helper accepts (session_id) but both the
/new and /resume handlers passed two arguments (session_id, session_key).
The TypeError was silently swallowed at DEBUG level, so memory extraction
never ran when users typed /new or /resume.

One call site (the session expiry watcher) was already fixed in 9c96f669,
but /new and /resume were missed.

- gateway/run.py:3247 — remove stray session_key from /new handler
- gateway/run.py:4989 — remove stray session_key from /resume handler
- tests/gateway/test_resume_command.py:222 — update test assertion
2026-04-06 16:49:42 -07:00
Mikita Lisavets
29b5ec2555 fix: clear session-scoped model after session reset 2026-04-06 13:20:01 -07:00
Mikita Lisavets
9afb9a6cb2 fix: clear session-scoped model overrides during session reset 2026-04-06 13:20:01 -07:00
donrhmexe
2c814d7b5d fix: /model --global writes model.name instead of model.default
The canonical config key for model name is model.default (used by setup,
auth, runtime_provider, profile list, and CLI startup). But /model --global
wrote to model.name in both gateway and CLI paths.

This caused:
- hermes profile list showing the old model (reads model.default)
- Gateway restart reverting to the old model (_resolve_gateway_model reads model.default)
- CLI startup using the old model (main.py reads model.default)

The only reason it appeared to work in Telegram was the cached agent
staying alive with the in-place switch.

Fix: change all 3 write/read sites to use model.default.
2026-04-06 13:20:01 -07:00
Dusk1e
7d0953d6ff security(gateway): isolate env/credential registries using ContextVars 2026-04-06 12:42:16 -07:00
Teknium
9c96f669a1 feat: centralized logging, instrumentation, hermes logs CLI, gateway noise fix (#5430)
Adds comprehensive logging infrastructure to Hermes Agent across 4 phases:

**Phase 1 — Centralized logging**
- New hermes_logging.py with idempotent setup_logging() used by CLI, gateway, and cron
- agent.log (INFO+) and errors.log (WARNING+) with RotatingFileHandler + RedactingFormatter
- config.yaml logging: section (level, max_size_mb, backup_count)
- All entry points wired (cli.py, main.py, gateway/run.py, run_agent.py)
- Fixed debug_helpers.py writing to ./logs/ instead of ~/.hermes/logs/

**Phase 2 — Event instrumentation**
- API calls: model, provider, tokens, latency, cache hit %
- Tool execution: name, duration, result size (both sequential + concurrent)
- Session lifecycle: turn start (session/model/provider/platform), compression (before/after)
- Credential pool: rotation events, exhaustion tracking

**Phase 3 — hermes logs CLI command**
- hermes logs / hermes logs -f / hermes logs errors / hermes logs gateway
- --level, --session, --since filters
- hermes logs list (file sizes + ages)

**Phase 4 — Gateway bug fix + noise reduction**
- fix: _async_flush_memories() called with wrong arg count — sessions never flushed
- Batched session expiry logs: 6 lines/cycle → 2 summary lines
- Added inbound message + response time logging

75 new tests, zero regressions on the full suite.
2026-04-06 00:08:20 -07:00
Teknium
dce5f51c7c feat: config structure validation — detect malformed YAML at startup (#5426)
Add validate_config_structure() that catches common config.yaml mistakes:
- custom_providers as dict instead of list (missing '-' in YAML)
- fallback_model accidentally nested inside another section
- custom_providers entries missing required fields (name, base_url)
- Missing model section when custom_providers is configured
- Root-level keys that look like misplaced custom_providers fields

Surface these diagnostics at three levels:
1. Startup: print_config_warnings() runs at CLI and gateway module load,
   so users see issues before hitting cryptic errors
2. Error time: 'Unknown provider' errors in auth.py and model_switch.py
   now include config diagnostics with fix suggestions
3. Doctor: 'hermes doctor' shows a Config Structure section with all
   issues and fix hints

Also adds a warning log in runtime_provider.py when custom_providers
is a dict (previously returned None silently).

Motivated by a Discord user who had malformed custom_providers YAML
and got only 'Unknown Provider' with no guidance on what was wrong.

17 new tests covering all validation paths.
2026-04-05 23:31:20 -07:00
Teknium
89c812d1d2 feat: shared thread sessions by default — multi-user thread support (#5391)
Threads (Telegram forum topics, Discord threads, Slack threads) now default
to shared sessions where all participants see the same conversation. This is
the expected UX for threaded conversations where multiple users @mention the
bot and interact collaboratively.

Changes:
- build_session_key(): when thread_id is present, user_id is no longer
  appended to the session key (threads are shared by default)
- New config: thread_sessions_per_user (default: false) — opt-in to restore
  per-user isolation in threads if needed
- Sender attribution: messages in shared threads are prefixed with
  [sender name] so the agent can tell participants apart
- System prompt: shared threads show 'Multi-user thread' note instead of
  a per-turn User line (avoids busting prompt cache)
- Wired through all callers: gateway/run.py, base.py, telegram.py, feishu.py
- Regular group messages (no thread) remain per-user isolated (unchanged)
- DM threads are unaffected (they have their own keying logic)

Closes community request from demontut_ re: thread-based shared sessions.
2026-04-05 19:46:58 -07:00
Teknium
fec58ad99e fix(gateway): replace wall-clock agent timeout with inactivity-based timeout (#5389)
The gateway previously used a hard wall-clock asyncio.wait_for timeout
that killed agents after a fixed duration regardless of activity. This
punished legitimate long-running tasks (subagent delegation, reasoning
models, multi-step research).

Now uses an inactivity-based polling loop that checks the agent's
built-in activity tracker (get_activity_summary) every 5 seconds. The
agent can run indefinitely as long as it's actively calling tools or
receiving API responses. Only fires when the agent has been completely
idle for the configured duration.

Changes:
- Replace asyncio.wait_for with asyncio.wait poll loop checking
  agent idle time via get_activity_summary()
- Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited)
- Update stale session eviction to use agent idle time instead of
  pure wall-clock (prevents evicting active long-running tasks)
- Preserve all existing diagnostic logging and user-facing context

Inspired by PR #4864 (Mibayy) and issue #4815 (BongSuCHOI).
Reimplemented on current main using existing _touch_activity()
infrastructure rather than a parallel tracker.
2026-04-05 19:38:21 -07:00
Teknium
2563493466 fix: improve timeout debug logging and user-facing diagnostics (#5370)
Agent activity tracking:
- Add _last_activity_ts, _last_activity_desc, _current_tool to AIAgent
- Touch activity on: API call start/complete, tool start/complete,
  first stream chunk, streaming request start
- Public get_activity_summary() method for external consumers

Gateway timeout diagnostics:
- Timeout message now includes what the agent was doing when killed:
  actively working vs stuck on a tool vs waiting on API response
- Includes iteration count, last activity description, seconds since
  last activity — users can distinguish legitimate long tasks from
  genuine hangs
- 'Still working' notifications now show iteration count and current
  tool instead of just elapsed time
- Stale lock eviction logs include agent activity state for debugging

Stream stale timeout:
- _emit_status when stale stream is detected (was log-only) — gateway
  users now see 'No response from provider for Ns' with model and
  context size
- Improved logger.warning with model name and estimated context size

Error path notifications (gateway-visible via _emit_status):
- Context compression attempts now use _emit_status (was _vprint only)
- Non-retryable client errors emit summary before aborting
- Max retry exhaustion emits error summary (was _vprint only)
- Rate limit exhaustion emits specific rate-limit message

These were all CLI-visible but silent to gateway users, which is why
people on Telegram/Discord saw generic 'request failed' messages
without explanation.
2026-04-05 18:33:33 -07:00
Mibayy
cc2b56b26a feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).

Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.

Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.

Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 12:05:13 -07:00
Teknium
0c95e91059 fix: follow-up fixes for salvaged PRs
- Fix GatewayApp → GatewayRunner import in api_server.py (PR #4976)
- Update launchd test assertions for new bootstrap/bootout/kickstart commands (PR #4892)
- Add nonlocal message declaration in run_sync() to fix UnboundLocalError (pre-existing scoping bug)
2026-04-05 11:59:28 -07:00
analista
6a6ae9a5c3 fix(gateway): correct misleading log text for unknown /commands
The warning said 'forwarding as plain text' but the code returns a
user-facing error reply instead of forwarding. Describe what actually
happens.
2026-04-05 11:59:28 -07:00
analista
e8053e8b93 fix(gateway): surface unknown /commands instead of leaking them to the LLM
Previously, typing a /command that isn't a built-in, plugin, or skill
would silently fall through to the LLM as plain text. The model often
interprets it as a loose instruction and invents unrelated tool calls —
e.g. a stray /claude_code slipped through and the model fabricated a
delegate_task invocation that got stuck in an OAuth loop.

Now we check GATEWAY_KNOWN_COMMANDS after the skill / plugin /
unavailable-skill lookups and return an actionable message pointing the
user at /commands. The user gets feedback, and the agent doesn't waste
a round-trip guessing what /foo-bar was supposed to mean.
2026-04-05 11:59:28 -07:00
analista
4a75aec433 fix(gateway): resolve Telegram's underscored /commands to skill/plugin keys
Telegram's Bot API disallows hyphens in command names, so
_build_telegram_menu registers /claude-code as /claude_code. When the
user taps it from autocomplete, the gateway dispatch did a direct
lookup against skill_cmds (keyed on the hyphenated form) and missed,
silently falling through to the LLM as plain text. The model would
then typically call delegate_task, spawning a Hermes subagent instead
of invoking the intended skill.

Normalize underscores to hyphens in skill and plugin command lookup,
matching the existing pattern in _check_unavailable_skill.
2026-04-05 11:59:28 -07:00
nibzard
4df2fca2f0 fix(gateway): cap memory flush retries at 3 to prevent infinite loop
The _session_expiry_watcher retried failed memory flushes forever
because exceptions were caught at debug level without setting
memory_flushed=True. Expired sessions with transient failures
(rate limits, network errors) would retry every 5 minutes
indefinitely, burning API quota and blocking gateway message
processing via 429 rate limit cascades.

Observed case: a March 19 session retried 28+ times over ~17 days,
causing repeated 429 errors that made Telegram unresponsive.

Add a per-session failure counter (_flush_failures) that gives up
after 3 consecutive attempts and marks the session as flushed to
break the loop.
2026-04-05 11:59:28 -07:00
Teknium
c100ad874c fix(matrix): E2EE cron delivery via live adapter + HTML formatting + origin fallback
Salvaged from PRs #3767 (chalkers), #5236 (ygd58), #2641 (buntingszn).

Three improvements to Matrix cron delivery:

1. Live adapter path: when the gateway is running, cron delivery now uses
   the connected MatrixAdapter via run_coroutine_threadsafe instead of
   the standalone HTTP PUT. This enables delivery to E2EE rooms where
   the raw HTTP path cannot encrypt. Falls back to standalone on failure.
   Threads adapters + event loop from gateway -> cron ticker -> tick() ->
   _deliver_result(). (from #3767)

2. HTML formatted_body: _send_matrix() now converts markdown to HTML
   using the optional markdown library, with h1-h6 to bold conversion
   for Element X compatibility. Falls back to plain text if markdown
   is not installed. Also adds random bytes to txn_id to prevent
   collisions. (from #5236)

3. Origin fallback: when deliver="origin" but origin is null (jobs
   created via API/scripts), falls back to HOME_CHANNEL env vars
   in order: matrix -> telegram -> discord -> slack. (from #2641)
2026-04-05 11:07:47 -07:00
dlkakbs
36e046e843 fix(gateway): MIME type fallback for Matrix document uploads
Cherry-picked run.py portion from PR #3495 by dlkakbs.
When Matrix sends non-image files (text, YAML, JSON, etc.), the MIME
type may be empty or application/octet-stream. Falls back to
extension-based detection so text files are properly injected into
agent context.
2026-04-05 11:07:47 -07:00
LucidPaths
70f798043b fix: Ollama Cloud auth, /model switch persistence, and alias tab completion
- Add OLLAMA_API_KEY to credential resolution chain for ollama.com endpoints
- Update requested_provider/_explicit_api_key/_explicit_base_url after /model
  switch so _ensure_runtime_credentials() doesn't revert the switch
- Pass base_url/api_key from fallback config to resolve_provider_client()
- Add DirectAlias system: user-configurable model_aliases in config.yaml
  checked before catalog resolution, with reverse lookup by model ID
- Add /model tab completion showing aliases with provider metadata

Co-authored-by: LucidPaths <LucidPaths@users.noreply.github.com>
2026-04-05 11:06:06 -07:00
Teknium
e899d6a05d fix: increase default HERMES_AGENT_TIMEOUT from 10min to 30min
Users hitting the 10-minute default during complex tool chains.
Bumps both the execution cap and stale-lock eviction timeout.
Still overridable via HERMES_AGENT_TIMEOUT env var (0 = unlimited).
2026-04-05 10:32:59 -07:00
Teknium
4976a8b066 feat: /model command — models.dev primary database + --provider flag (#5181)
Full overhaul of the model/provider system.

## What changed
- models.dev (109 providers, 4000+ models) as primary database for provider identity AND model metadata
- --provider flag replaces colon syntax for explicit provider switching
- Full ModelInfo/ProviderInfo dataclasses with context, cost, capabilities, modalities
- HermesOverlay system merges models.dev + Hermes-specific transport/auth/aggregator flags
- User-defined endpoints via config.yaml providers: section
- /model (no args) lists authenticated providers with curated model catalog
- Rich metadata display: context window, max output, cost/M tokens, capabilities
- Config migration: custom_providers list → providers dict (v11→v12)
- AIAgent.switch_model() for in-place model swap preserving conversation

## Files
agent/models_dev.py, hermes_cli/providers.py, hermes_cli/model_switch.py,
hermes_cli/model_normalize.py, cli.py, gateway/run.py, run_agent.py,
hermes_cli/config.py, hermes_cli/commands.py
2026-04-05 01:04:44 -07:00
Teknium
0c54da8aaf feat(gateway): live-stream /update output + interactive prompt buttons (#5180)
* feat(gateway): live-stream /update output + forward interactive prompts

Adds real-time output streaming and interactive prompt forwarding for
the gateway /update command, so users on Telegram/Discord/etc see the
full update progress and can respond to prompts (stash restore, config
migration) without needing terminal access.

Changes:

hermes_cli/main.py:
- Add --gateway flag to 'hermes update' argparse
- Add _gateway_prompt() file-based IPC function that writes
  .update_prompt.json and polls for .update_response
- Modify _restore_stashed_changes() to accept optional input_fn
  parameter for gateway mode prompt forwarding
- cmd_update() uses _gateway_prompt when --gateway is set, enabling
  interactive stash restore and config migration prompts

gateway/run.py:
- _handle_update_command: spawn with --gateway flag and
  PYTHONUNBUFFERED=1 for real-time output flushing
- Store session_key in .update_pending.json for cross-restart
  session matching
- Add _update_prompt_pending dict to track sessions awaiting
  update prompt responses
- Replace _watch_for_update_completion with _watch_update_progress:
  streams output chunks every ~4s, detects .update_prompt.json and
  forwards prompts to the user, handles completion/failure/timeout
- Add update prompt interception in _handle_message: when a prompt
  is pending, the user's next message is written to .update_response
  instead of being processed normally
- Preserve _send_update_notification as legacy fallback for
  post-restart cases where adapter isn't available yet

File-based IPC protocol:
- .update_prompt.json: written by update process with prompt text,
  default value, and unique ID
- .update_response: written by gateway with user's answer
- .update_output.txt: existing, now streamed in real-time
- .update_exit_code: existing completion marker

Tests: 16 new tests covering _gateway_prompt IPC, output streaming,
prompt detection/forwarding, message interception, and cleanup.

* feat: interactive buttons for update prompts (Telegram + Discord)

Telegram: Inline keyboard with ✓ Yes / ✗ No buttons. Clicking a button
answers the callback query, edits the message to show the choice, and
writes .update_response directly. CallbackQueryHandler registered on
the update_prompt: prefix.

Discord: UpdatePromptView (discord.ui.View) with green Yes / red No
buttons. Follows the ExecApprovalView pattern — auth check, embed color
update, disabled-after-click. Writes .update_response on click.

All platforms: /approve and /deny (and /yes, /no) now work as shorthand
for yes/no when an update prompt is pending. The text fallback message
instructs users to use these commands. Raw message interception still
works as a fallback for non-command responses.

Gateway watcher checks adapter for send_update_prompt method (class-level
check to avoid MagicMock false positives) and falls back to text prompt
with /approve instructions when unavailable.

* fix: block /update on non-messaging platforms (API, webhooks, ACP)

Add _UPDATE_ALLOWED_PLATFORMS frozenset that explicitly lists messaging
platforms where /update is permitted. API server, webhook, and ACP
platforms get a clear error directing them to run hermes update from
the terminal instead.

ACP and API server already don't reach _handle_message (separate
codepaths), and webhooks have distinct session keys that can't collide
with messaging sessions. This guard is belt-and-suspenders.
2026-04-05 00:28:58 -07:00
Teknium
b93fa234df fix: clear ghost status-bar lines on terminal resize (#4960)
* feat: add /branch (/fork) command for session branching

Inspired by Claude Code's /branch command. Creates a copy of the current
session's conversation history in a new session, allowing the user to
explore a different approach without losing the original.

Works like 'git checkout -b' for conversations:
- /branch            — auto-generates a title from the parent session
- /branch my-idea    — uses a custom title
- /fork              — alias for /branch

Implementation:
- CLI: _handle_branch_command() in cli.py
- Gateway: _handle_branch_command() in gateway/run.py
- CommandDef with 'fork' alias in commands.py
- Uses existing parent_session_id field in session DB
- Uses get_next_title_in_lineage() for auto-numbered branches
- 14 tests covering session creation, history copy, parent links,
  title generation, edge cases, and agent sync

* fix: clear ghost status-bar lines on terminal resize

When the terminal shrinks (e.g. un-maximize), the emulator reflows
previously full-width rows (status bar, input rules) into multiple
narrower rows. prompt_toolkit's _on_resize only cursor_up()s by the
stored layout height, missing the extra rows from reflow — leaving
ghost duplicates of the status bar visible.

Fix: monkey-patch Application._on_resize to detect width shrinks,
calculate the extra rows created by reflow, and inflate the renderer's
cursor_pos.y so the erase moves up far enough to clear ghosts.
2026-04-03 22:43:45 -07:00
Teknium
1c0c5d957f fix(gateway): support infinite timeout + periodic notifications + actionable error (#4959)
- HERMES_AGENT_TIMEOUT=0 now means no limit (infinite execution)
- Periodic 'still working' notifications every 10 minutes for long tasks
- Timeout error message now tells users how to increase the limit
- Stale-lock eviction handles infinite timeout correctly (float inf TTL)
2026-04-03 22:37:38 -07:00
Teknium
ad4feeaf0d feat: wire skills.external_dirs into all remaining discovery paths
The config key skills.external_dirs and core resolution (get_all_skills_dirs,
get_external_skills_dirs in agent/skill_utils.py) already existed but several
code paths still only scanned SKILLS_DIR. Now external dirs are respected
everywhere:

- skills_categories(): scan all dirs for category discovery
- _get_category_from_path(): resolve categories against any skills root
- skill_manager_tool._find_skill(): search all dirs for edit/patch/delete
- credential_files.get_skills_directory_mount(): mount all dirs into
  Docker/Singularity containers (external dirs at external_skills/<idx>)
- credential_files.iter_skills_files(): list files from all dirs for
  Modal/Daytona upload
- tools/environments/ssh.py: rsync all skill dirs to remote hosts
- gateway _check_unavailable_skill(): check disabled skills across all dirs

Usage in config.yaml:
  skills:
    external_dirs:
      - ~/repos/agent-skills/hermes
      - /shared/team-skills
2026-04-03 21:14:42 -07:00
Teknium
f1c0847145 fix(gateway): restore short preview truncation for all/new tool progress modes (#4935)
The tool_preview_length: 0 (unlimited) config change from e314833c
removed truncation from gateway progress messages in all/new modes.
This caused full terminal commands, code blocks, and file paths to
appear as permanent messages in Telegram -- the old 40-char truncation
was the correct behavior for messaging platforms.

Now:
- all/new modes: always truncate previews to 40 chars (old behavior)
- verbose mode: respects tool_preview_length config for JSON args cap

Reported by Paulclgro and socialsurfer on Discord.
2026-04-03 20:32:01 -07:00
memosr
287ac15efd fix(gateway): write update-pending state atomically to prevent corruption 2026-04-03 18:57:38 -07:00
Tranquil-Flow
3bfb39a25f fix(gateway): isolate approval session key per turn 2026-04-03 17:50:01 -07:00
Teknium
8a384628a5 fix(memory): profile-scoped memory isolation and clone support (#4845)
Three fixes for memory+profile isolation bugs:

1. memory_tool.py: Replace module-level MEMORY_DIR constant with
   get_memory_dir() function that calls get_hermes_home() dynamically.
   The old constant was cached at import time and could go stale if
   HERMES_HOME changed after import. Internal MemoryStore methods now
   call get_memory_dir() directly. MEMORY_DIR kept as backward-compat
   alias.

2. profiles.py: profile create --clone now copies MEMORY.md and USER.md
   from the source profile. These curated memory files are part of the
   agent's identity (same as SOUL.md) and should carry over on clone.

3. holographic plugin: initialize() now expands $HERMES_HOME and
   ${HERMES_HOME} in the db_path config value, so users can write
   'db_path: $HERMES_HOME/memory_store.db' and it resolves to the
   active profile directory, not the default home.

Tests updated to mock get_memory_dir() alongside the legacy MEMORY_DIR.
2026-04-03 13:10:11 -07:00
Teknium
aecbf7fa4a fix(discord): register /approve and /deny slash commands, wire up button-based approval UI (#4800)
Two fixes for Discord exec approval:

1. Register /approve and /deny as native Discord slash commands so they
   appear in Discord's command picker (autocomplete). Previously they
   were only handled as text commands, so users saw 'no commands found'
   when typing /approve.

2. Wire up the existing ExecApprovalView button UI (was dead code):
   - ExecApprovalView now calls resolve_gateway_approval() to actually
     unblock the waiting agent thread when a button is clicked
   - Gateway's _approval_notify_sync() detects adapters with
     send_exec_approval() and routes through the button UI
   - Added 'Allow Session' button for parity with /approve session
   - send_exec_approval() now accepts session_key and metadata for
     thread support
   - Graceful fallback to text-based /approve prompt if button send fails

Also updates test mocks to include grey/secondary ButtonStyle and
purple Color (used by new button styles).
2026-04-03 10:24:07 -07:00
Teknium
5db630aae4 fix: respect per-platform disabled skills in Telegram menu and gateway dispatch (#4799)
Three interconnected bugs caused `hermes skills config` per-platform
settings to be silently ignored:

1. telegram_menu_commands() never filtered disabled skills — all skills
   consumed menu slots regardless of platform config, hitting Telegram's
   100 command cap. Now loads disabled skills for 'telegram' and excludes
   them from the menu.

2. Gateway skill dispatch executed disabled skills because
   get_skill_commands() (process-global cache) only filters by the global
   disabled list at scan time. Added per-platform check before execution,
   returning an actionable 'skill is disabled' message.

3. get_disabled_skill_names() only checked HERMES_PLATFORM env var, but
   the gateway sets HERMES_SESSION_PLATFORM instead. Added
   HERMES_SESSION_PLATFORM as fallback, plus an explicit platform=
   parameter for callers that know their platform (menu builder, gateway
   dispatch). Also added platform to prompt_builder's skills cache key
   so multi-platform gateways get correct per-platform skill prompts.

Reported by SteveSkedasticity (CLAW community).
2026-04-03 10:10:53 -07:00
Teknium
b6f9b70afd fix(gateway): route /approve and /deny through running-agent guard (#4798)
When the agent is blocked on a dangerous command approval (threading.Event
wait inside tools/approval.py), incoming /approve and /deny commands were
falling through to the generic interrupt path instead of being dispatched
to their command handlers. The interrupt sets _interrupt_requested on the
agent, but the agent thread is blocked on event.wait() — not checking the
flag. Result: approval times out after 300s (5 minutes) before executing.

Fix: intercept /approve and /deny in the running-agent early-intercept
block (alongside /stop, /new, /queue) and route directly to
_handle_approve_command / _handle_deny_command.
2026-04-03 09:59:52 -07:00
Teknium
f374ae4c61 fix: prevent compression death spiral from API disconnects (#2153) (#4750)
Three fixes for long-running gateway sessions that enter a death spiral
when API disconnects prevent token data collection, which prevents
compression, which causes more disconnects:

Layer 1 — Stale token counter fallback (run_agent.py in-loop):
When last_prompt_tokens is 0 (stale after API disconnect or provider
returned no usage data), fall back to estimate_messages_tokens_rough()
instead of passing 0 to should_compress(), which would never fire.

Layer 2 — Server disconnect heuristic (run_agent.py error handler):
When ReadError/RemoteProtocolError hits a large session (>60% context
or >200 messages), treat it as a context-length error and trigger
compression rather than burning through retries that all fail the
same way.

Layer 3 — Hard message count limit (gateway/run.py hygiene):
Force compression when a session exceeds 400 messages, regardless of
token estimates. This catches runaway growth even when all token-based
checks fail due to missing API data.

Based on the analysis from PR #2157 by ygd58 — the gateway threshold
direction fix (1.4x multiplier) was already resolved on main.
2026-04-03 02:16:46 -07:00
Teknium
a933079564 fix: move class-level attribute after docstring, clarify throttle comment
Follow-up nits for salvaged PR #4577:
- Move _running_agents_ts class attribute below the docstring so
  GatewayRunner.__doc__ is preserved.
- Add clarifying comment explaining the throttle continue behavior
  (batches queued messages during the throttle interval).
2026-04-03 00:50:17 -07:00
kshitijk4poor
0ed28ab80c refactor: simplify and harden PR fixes after review
- Fix cron ThreadPoolExecutor blocking on timeout: use shutdown(wait=False,
  cancel_futures=True) instead of context manager that waits indefinitely
- Extract _dequeue_pending_text() to deduplicate media-placeholder logic
  in interrupt and normal-completion dequeue paths
- Remove hasattr guards for _running_agents_ts: add class-level default
  so partial test construction works without scattered defensive checks
- Move `import concurrent.futures` to top of cron/scheduler.py
- Progress throttle: sleep remaining interval instead of busy-looping
  0.1s (~15 wakeups per 1.5s window → 1 wakeup)
- Deduplicate _load_stt_config() in transcription_tools.py:
  _has_openai_audio_backend() now delegates to _resolve_openai_audio_client_config()
2026-04-03 00:50:17 -07:00
kshitijk4poor
970042deab fix(gateway): prevent stuck sessions with agent timeout and staleness eviction
Three changes to prevent sessions from getting permanently locked:

1. Agent execution timeout (HERMES_AGENT_TIMEOUT, default 10min):
   Wraps run_in_executor with asyncio.wait_for so a hung API call or
   runaway tool can't lock a session indefinitely. On timeout, the
   agent is interrupted and the user gets an actionable error message.

2. Staleness eviction for _running_agents:
   Tracks start timestamps for each session entry. When a new message
   arrives and the entry is older than timeout + 1min grace, it's
   evicted as a leaked lock. Safety net for any cleanup path that
   fails to remove the entry.

3. Cron job timeout (HERMES_CRON_TIMEOUT, default 10min):
   Wraps run_conversation in a ThreadPoolExecutor with timeout so a
   hung cron job doesn't block the ticker thread (and all subsequent
   cron jobs) indefinitely.

Follows grammY runner's per-update timeout pattern and aiogram's
asyncio.wait_for approach for handler deadlines.
2026-04-03 00:50:17 -07:00
kshitijk4poor
69f85a4dce fix(gateway): race condition, photo media loss, and flood control in Telegram
Three bugs causing intermittent silent drops, partial responses, and
flood control delays on the Telegram platform:

1. Race condition in handle_message() — _active_sessions was set inside
   the background task, not before create_task(). Two rapid messages
   could both pass the guard and spawn duplicate processing tasks.
   Fix: set _active_sessions synchronously before spawning the task
   (grammY sequentialize / aiogram EventIsolation pattern).

2. Photo media loss on dequeue — when a photo (no caption) was queued
   during active processing and later dequeued, only .text was
   extracted. Empty text → message silently dropped.
   Fix: _build_media_placeholder() creates text context for media-only
   events so they survive the dequeue path.

3. Progress message edits triggered Telegram flood control — rapid tool
   calls edited the progress message every 0.3s, hitting Telegram's
   rate limit (23s+ waits). This blocked progress updates and could
   cause stream consumer timeouts.
   Fix: throttle edits to 1.5s minimum interval, detect flood control
   errors and gracefully degrade to new messages. edit_message() now
   returns failure for flood waits >5s instead of blocking.
2026-04-03 00:50:17 -07:00
Teknium
21c2d32471 fix(gateway): normalize step_callback prev_tools for backward compat
The PR changed prev_tools from list[str] to list[dict] with name/result
keys.  The gateway's _step_callback_sync passed this directly to hooks
as 'tool_names', breaking user-authored hooks that call
', '.join(tool_names).

Now:
- 'tool_names' always contains strings (backward-compatible)
- 'tools' carries the enriched dicts for hooks that want results

Also adds summary logging to register_mcp_servers() and comprehensive
tests for all three PR changes:
- sanitize_mcp_name_component edge cases
- register_mcp_servers public API
- _register_session_mcp_servers ACP integration
- step_callback result forwarding
- gateway normalization backward compat
2026-04-02 20:54:27 -07:00
Teknium
924bc67eee feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623)
* feat(memory): add pluggable memory provider interface with profile isolation

Introduces a pluggable MemoryProvider ABC so external memory backends can
integrate with Hermes without modifying core files. Each backend becomes a
plugin implementing a standard interface, orchestrated by MemoryManager.

Key architecture:
- agent/memory_provider.py — ABC with core + optional lifecycle hooks
- agent/memory_manager.py — single integration point in the agent loop
- agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md

Profile isolation fixes applied to all 6 shipped plugins:
- Cognitive Memory: use get_hermes_home() instead of raw env var
- Hindsight Memory: check $HERMES_HOME/hindsight/config.json first,
  fall back to legacy ~/.hindsight/ for backward compat
- Hermes Memory Store: replace hardcoded ~/.hermes paths with
  get_hermes_home() for config loading and DB path defaults
- Mem0 Memory: use get_hermes_home() instead of raw env var
- RetainDB Memory: auto-derive profile-scoped project name from
  hermes_home path (hermes-<profile>), explicit env var overrides
- OpenViking Memory: read-only, no local state, isolation via .env

MemoryManager.initialize_all() now injects hermes_home into kwargs so
every provider can resolve profile-scoped storage without importing
get_hermes_home() themselves.

Plugin system: adds register_memory_provider() to PluginContext and
get_plugin_memory_providers() accessor.

Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration).

* refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider

Remove cognitive-memory plugin (#727) — core mechanics are broken:
decay runs 24x too fast (hourly not daily), prefetch uses row ID as
timestamp, search limited by importance not similarity.

Rewrite openviking-memory plugin from a read-only search wrapper into
a full bidirectional memory provider using the complete OpenViking
session lifecycle API:

- sync_turn: records user/assistant messages to OpenViking session
  (threaded, non-blocking)
- on_session_end: commits session to trigger automatic memory extraction
  into 6 categories (profile, preferences, entities, events, cases,
  patterns)
- prefetch: background semantic search via find() endpoint
- on_memory_write: mirrors built-in memory writes to the session
- is_available: checks env var only, no network calls (ABC compliance)

Tools expanded from 3 to 5:
- viking_search: semantic search with mode/scope/limit
- viking_read: tiered content (abstract ~100tok / overview ~2k / full)
- viking_browse: filesystem-style navigation (list/tree/stat)
- viking_remember: explicit memory storage via session
- viking_add_resource: ingest URLs/docs into knowledge base

Uses direct HTTP via httpx (no openviking SDK dependency needed).
Response truncation on viking_read to prevent context flooding.

* fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker

- Remove redundant mem0_context tool (identical to mem0_search with
  rerank=true, top_k=5 — wastes a tool slot and confuses the model)
- Thread sync_turn so it's non-blocking — Mem0's server-side LLM
  extraction can take 5-10s, was stalling the agent after every turn
- Add threading.Lock around _get_client() for thread-safe lazy init
  (prefetch and sync threads could race on first client creation)
- Add circuit breaker: after 5 consecutive API failures, pause calls
  for 120s instead of hammering a down server every turn. Auto-resets
  after cooldown. Logs a warning when tripped.
- Track success/failure in prefetch, sync_turn, and all tool calls
- Wait for previous sync to finish before starting a new one (prevents
  unbounded thread accumulation on rapid turns)
- Clean up shutdown to join both prefetch and sync threads

* fix(memory): enforce single external memory provider limit

MemoryManager now rejects a second non-builtin provider with a warning.
Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE
external plugin provider is allowed at a time. This prevents tool
schema bloat (some providers add 3-5 tools each) and conflicting
memory backends.

The warning message directs users to configure memory.provider in
config.yaml to select which provider to activate.

Updated all 47 tests to use builtin + one external pattern instead
of multiple externals. Added test_second_external_rejected to verify
the enforcement.

* feat(memory): add ByteRover memory provider plugin

Implements the ByteRover integration (from PR #3499 by hieuntg81) as a
MemoryProvider plugin instead of direct run_agent.py modifications.

ByteRover provides persistent memory via the brv CLI — a hierarchical
knowledge tree with tiered retrieval (fuzzy text then LLM-driven search).
Local-first with optional cloud sync.

Plugin capabilities:
- prefetch: background brv query for relevant context
- sync_turn: curate conversation turns (threaded, non-blocking)
- on_memory_write: mirror built-in memory writes to brv
- on_pre_compress: extract insights before context compression

Tools (3):
- brv_query: search the knowledge tree
- brv_curate: store facts/decisions/patterns
- brv_status: check CLI version and context tree state

Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped
per profile). Binary resolution cached with thread-safe double-checked
locking. All write operations threaded to avoid blocking the agent
(curate can take 120s with LLM processing).

* fix(memory): thread remaining sync_turns, fix holographic, add config key

Plugin fixes:
- Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread)
- RetainDB: thread sync_turn (was blocking on HTTP POST)
- Both: shutdown now joins sync threads alongside prefetch threads

Holographic retrieval fixes:
- reason(): removed dead intersection_key computation (bundled but never
  used in scoring). Now reuses pre-computed entity_residuals directly,
  moved role_content encoding outside the inner loop.
- contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above
  500 facts, only checks the most recently updated ones to avoid O(n^2)
  explosion (~125K comparisons at 500 is acceptable).

Config:
- Added memory.provider key to DEFAULT_CONFIG ("" = builtin only).
  No version bump needed (deep_merge handles new keys automatically).

* feat(memory): extract Honcho as a MemoryProvider plugin

Creates plugins/honcho-memory/ as a thin adapter over the existing
honcho_integration/ package. All 4 Honcho tools (profile, search,
context, conclude) move from the normal tool registry to the
MemoryProvider interface.

The plugin delegates all work to HonchoSessionManager — no Honcho
logic is reimplemented. It uses the existing config chain:
$HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars.

Lifecycle hooks:
- initialize: creates HonchoSessionManager via existing client factory
- prefetch: background dialectic query
- sync_turn: records messages + flushes to API (threaded)
- on_memory_write: mirrors user profile writes as conclusions
- on_session_end: flushes all pending messages

This is a prerequisite for the MemoryManager wiring in run_agent.py.
Once wired, Honcho goes through the same provider interface as all
other memory plugins, and the scattered Honcho code in run_agent.py
can be consolidated into the single MemoryManager integration point.

* feat(memory): wire MemoryManager into run_agent.py

Adds 8 integration points for the external memory provider plugin,
all purely additive (zero existing code modified):

1. Init (~L1130): Create MemoryManager, find matching plugin provider
   from memory.provider config, initialize with session context
2. Tool injection (~L1160): Append provider tool schemas to self.tools
   and self.valid_tool_names after memory_manager init
3. System prompt (~L2705): Add external provider's system_prompt_block
   alongside existing MEMORY.md/USER.md blocks
4. Tool routing (~L5362): Route provider tool calls through
   memory_manager.handle_tool_call() before the catchall handler
5. Memory write bridge (~L5353): Notify external provider via
   on_memory_write() when the built-in memory tool writes
6. Pre-compress (~L5233): Call on_pre_compress() before context
   compression discards messages
7. Prefetch (~L6421): Inject provider prefetch results into the
   current-turn user message (same pattern as Honcho turn context)
8. Turn sync + session end (~L8161, ~L8172): sync_all() after each
   completed turn, queue_prefetch_all() for next turn, on_session_end()
   + shutdown_all() at conversation end

All hooks are wrapped in try/except — a failing provider never breaks
the agent. The existing memory system, Honcho integration, and all
other code paths are completely untouched.

Full suite: 7222 passed, 4 pre-existing failures.

* refactor(memory): remove legacy Honcho integration from core

Extracts all Honcho-specific code from run_agent.py, model_tools.py,
toolsets.py, and gateway/run.py. Honcho is now exclusively available
as a memory provider plugin (plugins/honcho-memory/).

Removed from run_agent.py (-457 lines):
- Honcho init block (session manager creation, activation, config)
- 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools,
  _activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch,
  _honcho_prefetch, _honcho_save_user_observation, _honcho_sync
- _inject_honcho_turn_context module-level function
- Honcho system prompt block (tool descriptions, CLI commands)
- Honcho context injection in api_messages building
- Honcho params from __init__ (honcho_session_key, honcho_manager,
  honcho_config)
- HONCHO_TOOL_NAMES constant
- All honcho-specific tool dispatch forwarding

Removed from other files:
- model_tools.py: honcho_tools import, honcho params from handle_function_call
- toolsets.py: honcho toolset definition, honcho tools from core tools list
- gateway/run.py: honcho params from AIAgent constructor calls

Removed tests (-339 lines):
- 9 Honcho-specific test methods from test_run_agent.py
- TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py

Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that
were accidentally removed during the honcho function extraction.

The honcho_integration/ package is kept intact — the plugin delegates
to it. tools/honcho_tools.py registry entries are now dead code (import
commented out in model_tools.py) but the file is preserved for reference.

Full suite: 7207 passed, 4 pre-existing failures. Zero regressions.

* refactor(memory): restructure plugins, add CLI, clean gateway, migration notice

Plugin restructure:
- Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/
  (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb)
- New plugins/memory/__init__.py discovery module that scans the directory
  directly, loading providers by name without the general plugin system
- run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers()

CLI wiring:
- hermes memory setup — interactive curses picker + config wizard
- hermes memory status — show active provider, config, availability
- hermes memory off — disable external provider (built-in only)
- hermes honcho — now shows migration notice pointing to hermes memory setup

Gateway cleanup:
- Remove _get_or_create_gateway_honcho (already removed in prev commit)
- Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods
- Remove all calls to shutdown methods (4 call sites)
- Remove _honcho_managers/_honcho_configs dict references

Dead code removal:
- Delete tools/honcho_tools.py (279 lines, import was already commented out)
- Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods)
- Remove if False placeholder from run_agent.py

Migration:
- Honcho migration notice on startup: detects existing honcho.json or
  ~/.honcho/config.json, prints guidance to run hermes memory setup.
  Only fires when memory.provider is not set and not in quiet mode.

Full suite: 7203 passed, 4 pre-existing failures. Zero regressions.

* feat(memory): standardize plugin config + add per-plugin documentation

Config architecture:
- Add save_config(values, hermes_home) to MemoryProvider ABC
- Honcho: writes to $HERMES_HOME/honcho.json (SDK native)
- Mem0: writes to $HERMES_HOME/mem0.json
- Hindsight: writes to $HERMES_HOME/hindsight/config.json
- Holographic: writes to config.yaml under plugins.hermes-memory-store
- OpenViking/RetainDB/ByteRover: env-var only (default no-op)

Setup wizard (hermes memory setup):
- Now calls provider.save_config() for non-secret config
- Secrets still go to .env via env vars
- Only memory.provider activation key goes to config.yaml

Documentation:
- README.md for each of the 7 providers in plugins/memory/<name>/
- Requirements, setup (wizard + manual), config reference, tools table
- Consistent format across all providers

The contract for new memory plugins:
- get_config_schema() declares all fields (REQUIRED)
- save_config() writes native config (REQUIRED if not env-var-only)
- Secrets use env_var field in schema, written to .env by wizard
- README.md in the plugin directory

* docs: add memory providers user guide + developer guide

New pages:
- user-guide/features/memory-providers.md — comprehensive guide covering
  all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight,
  Holographic, RetainDB, ByteRover). Each with setup, config, tools,
  cost, and unique features. Includes comparison table and profile
  isolation notes.
- developer-guide/memory-provider-plugin.md — how to build a new memory
  provider plugin. Covers ABC, required methods, config schema,
  save_config, threading contract, profile isolation, testing.

Updated pages:
- user-guide/features/memory.md — replaced Honcho section with link to
  new Memory Providers page
- user-guide/features/honcho.md — replaced with migration redirect to
  the new Memory Providers page
- sidebars.ts — added both new pages to navigation

* fix(memory): auto-migrate Honcho users to memory provider plugin

When honcho.json or ~/.honcho/config.json exists but memory.provider
is not set, automatically set memory.provider: honcho in config.yaml
and activate the plugin. The plugin reads the same config files, so
all data and credentials are preserved. Zero user action needed.

Persists the migration to config.yaml so it only fires once. Prints
a one-line confirmation in non-quiet mode.

* fix(memory): only auto-migrate Honcho when enabled + credentialed

Check HonchoClientConfig.enabled AND (api_key OR base_url) before
auto-migrating — not just file existence. Prevents false activation
for users who disabled Honcho, stopped using it (config lingers),
or have ~/.honcho/ from a different tool.

* feat(memory): auto-install pip dependencies during hermes memory setup

Reads pip_dependencies from plugin.yaml, checks which are missing,
installs them via pip before config walkthrough. Also shows install
guidance for external_dependencies (e.g. brv CLI for ByteRover).

Updated all 7 plugin.yaml files with pip_dependencies:
- honcho: honcho-ai
- mem0: mem0ai
- openviking: httpx
- hindsight: hindsight-client
- holographic: (none)
- retaindb: requests
- byterover: (external_dependencies for brv CLI)

* fix: remove remaining Honcho crash risks from cli.py and gateway

cli.py: removed Honcho session re-mapping block (would crash importing
deleted tools/honcho_tools.py), Honcho flush on compress, Honcho
session display on startup, Honcho shutdown on exit, honcho_session_key
AIAgent param.

gateway/run.py: removed honcho_session_key params from helper methods,
sync_honcho param, _honcho.shutdown() block.

tests: fixed test_cron_session_with_honcho_key_skipped (was passing
removed honcho_key param to _flush_memories_for_session).

* fix: include plugins/ in pyproject.toml package list

Without this, plugins/memory/ wouldn't be included in non-editable
installs. Hermes always runs from the repo checkout so this is belt-
and-suspenders, but prevents breakage if the install method changes.

* fix(memory): correct pip-to-import name mapping for dep checks

The heuristic dep.replace('-', '_') fails for packages where the pip
name differs from the import name: honcho-ai→honcho, mem0ai→mem0,
hindsight-client→hindsight_client. Added explicit mapping table so
hermes memory setup doesn't try to reinstall already-installed packages.

* chore: remove dead code from old plugin memory registration path

- hermes_cli/plugins.py: removed register_memory_provider(),
  _memory_providers list, get_plugin_memory_providers() — memory
  providers now use plugins/memory/ discovery, not the general plugin system
- hermes_cli/main.py: stripped 74 lines of dead honcho argparse
  subparsers (setup, status, sessions, map, peer, mode, tokens,
  identity, migrate) — kept only the migration redirect
- agent/memory_provider.py: updated docstring to reflect new
  registration path
- tests: replaced TestPluginMemoryProviderRegistration with
  TestPluginMemoryDiscovery that tests the actual plugins/memory/
  discovery system. Added 3 new tests (discover, load, nonexistent).

* chore: delete dead honcho_integration/cli.py and its tests

cli.py (794 lines) was the old 'hermes honcho' command handler — nobody
calls it since cmd_honcho was replaced with a migration redirect.

Deleted tests that imported from removed code:
- tests/honcho_integration/test_cli.py (tested _resolve_api_key)
- tests/honcho_integration/test_config_isolation.py (tested CLI config paths)
- tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py)

Remaining honcho_integration/ files (actively used by the plugin):
- client.py (445 lines) — config loading, SDK client creation
- session.py (991 lines) — session management, queries, flush

* refactor: move honcho_integration/ into the honcho plugin

Moves client.py (445 lines) and session.py (991 lines) from the
top-level honcho_integration/ package into plugins/memory/honcho/.
No Honcho code remains in the main codebase.

- plugins/memory/honcho/client.py — config loading, SDK client creation
- plugins/memory/honcho/session.py — session management, queries, flush
- Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py,
  plugin __init__.py, session.py cross-import, all tests
- Removed honcho_integration/ package and pyproject.toml entry
- Renamed tests/honcho_integration/ → tests/honcho_plugin/

* docs: update architecture + gateway-internals for memory provider system

- architecture.md: replaced honcho_integration/ with plugins/memory/
- gateway-internals.md: replaced Honcho-specific session routing and
  flush lifecycle docs with generic memory provider interface docs

* fix: update stale mock path for resolve_active_host after honcho plugin migration

* fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore

Review feedback from Honcho devs (erosika):

P0 — Provider lifecycle:
- Remove on_session_end() + shutdown_all() from run_conversation() tail
  (was killing providers after every turn in multi-turn sessions)
- Add shutdown_memory_provider() method on AIAgent for callers
- Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry

Bug fixes:
- Remove sync_honcho=False kwarg from /btw callsites (TypeError crash)
- Fix doctor.py references to dead 'hermes honcho setup' command
- Cache prefetch_all() before tool loop (was re-calling every iteration)

ABC contract hardening (all backwards-compatible):
- Add session_id kwarg to prefetch/sync_turn/queue_prefetch
- Make on_pre_compress() return str (provider insights in compression)
- Add **kwargs to on_turn_start() for runtime context
- Add on_delegation() hook for parent-side subagent observation
- Document agent_context/agent_identity/agent_workspace kwargs on
  initialize() (prevents cron corruption, enables profile scoping)
- Fix docstring: single external provider, not multiple

Honcho CLI restoration:
- Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py
  with imports adapted to plugin path)
- Restore full hermes honcho command with all subcommands (status, peer,
  mode, tokens, identity, enable/disable, sync, peers, --target-profile)
- Restore auto-clone on profile creation + sync on hermes update
- hermes honcho setup now redirects to hermes memory setup

* fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type

- Wire on_delegation() in delegate_tool.py — parent's memory provider
  is notified with task+result after each subagent completes
- Add skip_memory=True to cron scheduler (prevents cron system prompts
  from corrupting user representations — closes #4052)
- Add skip_memory=True to gateway flush agent (throwaway agent shouldn't
  activate memory provider)
- Fix ByteRover on_pre_compress() return type: None -> str

* fix(honcho): port profile isolation fixes from PR #4632

Ports 5 bug fixes found during profile testing (erosika's PR #4632):

1. 3-tier config resolution — resolve_config_path() now checks
   $HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json
   (non-default profiles couldn't find shared host blocks)

2. Thread host=_host_key() through from_global_config() in cmd_setup,
   cmd_status, cmd_identity (--target-profile was being ignored)

3. Use bare profile name as aiPeer (not host key with dots) — Honcho's
   peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid

4. Wrap add_peers() in try/except — was fatal on new AI peers, killed
   all message uploads for the session

5. Gate Honcho clone behind --clone/--clone-all on profile create
   (bare create should be blank-slate)

Also: sanitize assistant_peer_id via _sanitize_id()

* fix(tests): add module cleanup fixture to test_cli_provider_resolution

test_cli_provider_resolution._import_cli() wipes tools.*, cli, and
run_agent from sys.modules to force fresh imports, but had no cleanup.
This poisoned all subsequent tests on the same xdist worker — mocks
targeting tools.file_tools, tools.send_message_tool, etc. patched the
NEW module object while already-imported functions still referenced
the OLD one. Caused ~25 cascade failures: send_message KeyError,
process_registry FileNotFoundError, file_read_guards timeouts,
read_loop_detection file-not-found, mcp_oauth None port, and
provider_parity/codex_execution stale tool lists.

Fix: autouse fixture saves all affected modules before each test and
restores them after, matching the pattern in
test_managed_browserbase_and_modal.py.
2026-04-02 15:33:51 -07:00
Teknium
e0b2bdb089 fix: webhook platform support — skip home channel prompt, disable tool progress (salvage #4363) (#4660)
Cherry-picked from PR #4363 by @bennyhodl with follow-up fixes:

- Skip 'No home channel' prompt for webhook platform (webhooks deliver
  to configured targets, not a home channel)
- Disable tool progress for webhooks (no message editing support)
- Add webhook to PLATFORMS in tools_config.py and skills_config.py
- Add hermes-webhook toolset to toolsets.py + hermes-gateway includes
- Removed overly aggressive <50 char content filter that blocked
  legitimate short responses (tool progress already handled at source)

Co-authored-by: bennyhodl <bennyhodl@users.noreply.github.com>
2026-04-02 14:00:22 -07:00
kshitijk4poor
20441cf2c8 fix(insights): persist token usage for non-CLI sessions 2026-04-02 10:47:13 -07:00
Teknium
624ad582a5 fix: make gateway approval block agent thread like CLI does (#4557)
The gateway's dangerous command approval system was fundamentally broken:
the agent loop continued running after a command was flagged, and the
approval request only reached the user after the agent finished its
entire conversation loop. By then the context was lost.

This change makes the gateway approval mirror the CLI's synchronous
behavior. When a dangerous command is detected:

1. The agent thread blocks on a threading.Event
2. The approval request is sent to the user immediately
3. The user responds with /approve or /deny
4. The event is signaled and the agent resumes with the real result

The agent never sees 'approval_required' as a tool result. It either
gets the command output (approved) or a definitive BLOCKED message
(denied/timed out) — same as CLI mode.

Queue-based design supports multiple concurrent approvals (parallel
subagents via delegate_task, execute_code RPC handlers). Each approval
gets its own _ApprovalEntry with its own threading.Event. /approve
resolves the oldest (FIFO); /approve all resolves all at once.

Changes:
- tools/approval.py: Queue-based per-session blocking gateway approval
  (register/unregister callbacks, resolve with FIFO or all-at-once)
- gateway/run.py: Register approval callback in run_sync(), remove
  post-loop pop_pending hack, /approve and /deny support 'all' flag
- tests: 21 tests including parallel subagent E2E scenarios
2026-04-02 01:47:19 -07:00
Teknium
16d9f58445 fix(gateway): persist memory flush state to prevent redundant re-flushes on restart (#4481)
* fix: force-close TCP sockets on client cleanup, detect and recover dead connections

When a provider drops connections mid-stream (e.g. OpenRouter outage),
httpx's graceful close leaves sockets in CLOSE-WAIT indefinitely. These
zombie connections accumulate and can prevent recovery without restarting.

Changes:
- _force_close_tcp_sockets: walks the httpx connection pool and issues
  socket.shutdown(SHUT_RDWR) + close() to force TCP RST on every socket
  when a client is closed, preventing CLOSE-WAIT accumulation
- _cleanup_dead_connections: probes the primary client's pool for dead
  sockets (recv MSG_PEEK), rebuilds the client if any are found
- Pre-turn health check at the start of each run_conversation call that
  auto-recovers with a user-facing status message
- Primary client rebuild after stale stream detection to purge pool
- User-facing messages on streaming connection failures:
  "Connection to provider dropped — Reconnecting (attempt 2/3)"
  "Connection failed after 3 attempts — try again in a moment"

Made-with: Cursor

* fix: pool entry missing base_url for openrouter, clean error messages

- _resolve_runtime_from_pool_entry: add OPENROUTER_BASE_URL fallback
  when pool entry has no runtime_base_url (pool entries from auth.json
  credential_pool often omit base_url)
- Replace Rich console.print for auth errors with plain print() to
  prevent ANSI escape code mangling through prompt_toolkit's stdout patch
- Force-close TCP sockets on client cleanup to prevent CLOSE-WAIT
  accumulation after provider outages
- Pre-turn dead connection detection with auto-recovery and user message
- Primary client rebuild after stale stream detection
- User-facing status messages on streaming connection failures/retries

Made-with: Cursor

* fix(gateway): persist memory flush state to prevent redundant re-flushes on restart

The _session_expiry_watcher tracked flushed sessions in an in-memory set
(_pre_flushed_sessions) that was lost on gateway restart. Expired sessions
remained in sessions.json and were re-discovered every restart, causing
redundant AIAgent runs that burned API credits and blocked the event loop.

Fix: Add a memory_flushed boolean field to SessionEntry, persisted in
sessions.json. The watcher sets it after a successful flush. On restart,
the flag survives and the watcher skips already-flushed sessions.

- Add memory_flushed field to SessionEntry with to_dict/from_dict support
- Old sessions.json entries without the field default to False (backward compat)
- Remove the ephemeral _pre_flushed_sessions set from SessionStore
- Update tests: save/load roundtrip, legacy entry compat, auto-reset behavior
2026-04-01 12:05:02 -07:00
Teknium
bacc86d031 fix: use RedactingFormatter on stderr handler, update types and test mock
- stderr handler now uses RedactingFormatter to match file handlers
- restart path uses verbose=0 (int) instead of verbose=False (bool)
- test mock updated with new run_gateway(verbose, quiet, replace) signature
2026-04-01 11:05:07 -07:00
Alan Justino
5bd01b838c fix(gateway): wire -v/-q flags to stderr logging
By default 'hermes gateway run' now prints WARNING+ to stderr so
connection errors and startup failures are visible in the terminal
without having to tail ~/.hermes/logs/gateway.log.

- gateway/run.py: start_gateway() accepts verbosity: Optional[int]=0.
  When not None, attaches a StreamHandler to stderr with level mapped
  from the count (0=WARNING, 1=INFO, 2+=DEBUG). Root logger level is
  also lowered when DEBUG is requested so records are not swallowed.

- hermes_cli/gateway.py: run_gateway() gains verbose: int and
  quiet: bool params. -q translates to verbosity=None (no stderr
  handler). Wired through gateway_command().

- hermes_cli/main.py: -v changed from store_true to action=count so
  -v/-vv/-vvv each increment the level. -q/--quiet added as a new flag.

Behaviour summary:
  hermes gateway run        -> WARNING+ on stderr (default)
  hermes gateway run -q     -> silent
  hermes gateway run -v     -> INFO+
  hermes gateway run -vv    -> DEBUG
2026-04-01 11:05:07 -07:00
Nick
9a581bba50 fix(gateway): resume agent after /approve executes blocked command
When a dangerous command was blocked and the user approved it via /approve,
the command was executed but the agent loop had already exited — the agent
never received the command output and the task died silently.

Now _handle_approve_command sends immediate feedback to the user, then
creates a synthetic continuation message with the command output and feeds
it through _handle_message so the agent picks up where it left off.

- Send command result to chat immediately via adapter.send()
- Create synthetic MessageEvent with command + output as context
- Spawn asyncio task to re-invoke agent via _handle_message
- Return None (feedback already sent directly)
- Add test for agent re-invocation after approval
- Update existing approval tests for new return behavior
2026-04-01 01:38:55 -07:00
Teknium
a7f7e87070 fix: preserve credential_pool through smart routing and defer eager fallback on 429 (#4361)
Three bugs prevented credential pool rotation from working when multiple
Codex OAuth tokens were configured:

1. credential_pool was dropped during smart model turn routing.
   resolve_turn_route() constructed runtime dicts without it, so the
   AIAgent was created without pool access. Fixed in smart_model_routing.py
   (no-route and fallback paths), cli.py, and gateway/run.py.

2. Eager fallback fired before pool rotation on 429. The rate-limit
   handler at line ~7180 switched to a fallback provider immediately,
   before _recover_with_credential_pool got a chance to rotate to the
   next credential. Now deferred when the pool still has credentials.

3. (Non-issue) Retry budget was reported as too small, but successful
   pool rotations already skip retry_count increment — no change needed.

Reported by community member Schinsly who identified all three root
causes and verified the fix locally with multiple Codex accounts.
2026-04-01 01:02:34 -07:00
Teknium
84a541b619 feat: support * wildcard in platform allowlists and improve WhatsApp docs
* docs: clarify WhatsApp allowlist behavior and document WHATSAPP_ALLOW_ALL_USERS

- Add WHATSAPP_ALLOW_ALL_USERS and WHATSAPP_DEBUG to env vars reference
- Warn that * is not a wildcard and silently blocks all messages
- Show WHATSAPP_ALLOWED_USERS as optional, not required
- Update troubleshooting with the * trap and debug mode tip
- Fix Security section to mention the allow-all alternative

Prompted by a user report in Discord where WHATSAPP_ALLOWED_USERS=*
caused all incoming messages to be silently dropped at the bridge level.

* feat: support * wildcard in platform allowlists

Follow the precedent set by SIGNAL_GROUP_ALLOWED_USERS which already
supports * as an allow-all wildcard.

Bridge (allowlist.js): matchesAllowedUser() now checks for * in the
allowedUsers set before iterating sender aliases.

Gateway (run.py): _is_authorized() checks for * in allowed_ids after
parsing the allowlist. This is generic — works for all platforms, not
just WhatsApp.

Updated docs to document * as a supported value instead of warning
against it. Added WHATSAPP_ALLOW_ALL_USERS and WHATSAPP_DEBUG to
the env vars reference.

Tests: JS allowlist test + 2 Python gateway tests (WhatsApp + Telegram
to verify cross-platform behavior).
2026-03-31 10:42:03 -07:00
Teknium
8d59881a62 feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)
* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-31 03:10:01 -07:00
Teknium
f890a94c12 refactor: make config.yaml the single source of truth for endpoint URLs (#4165)
OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source
confusion. Users (especially Docker) would see the URL in .env and assume
that's where all config lives, then wonder why LLM_MODEL in .env didn't work.

Changes:
- Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py,
  setup.py, and tools_config.py
- Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py,
  models.py, and gateway/run.py
- Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and
  auxiliary_client.py — config.yaml model.default is authoritative
- Vision base URL now saved to config.yaml auxiliary.vision.base_url
  (both setup wizard and tools_config paths)
- Tests updated to set config values instead of env vars

Convention enforced: .env is for SECRETS only (API keys). All other
configuration (model names, base URLs, provider selection) lives
exclusively in config.yaml.
2026-03-30 22:02:53 -07:00