Commit Graph

538 Commits

Author SHA1 Message Date
Teknium
393929831e fix(gateway): preserve transcript on /compress and hygiene compression (salvage #3516) (#3556)
* fix(gateway): preserve full transcript on /compress instead of overwriting

The /compress command calls _compress_context() which correctly ends the
old session (preserving its full transcript in SQLite) and creates a new
session_id for the continuation. However, it then immediately called
rewrite_transcript() on the OLD session_id, overwriting the preserved
transcript with the compressed version — destroying searchable history.

Auto-compression (triggered by context pressure) does not have this bug
because the gateway already handles the session_id swap via the
agent.session_id != session_id check after _run_agent_sync.

Fix: after _compress_context creates the new session, write the compressed
messages into the NEW session_id and update the session store pointer.
The old session's full transcript stays intact and searchable via
session_search.

Before: /compress destroys original messages, session_search can't find
details from compressed portions.

After: /compress behaves like /new for history — full transcript preserved,
compressed context for the live session.

* fix(gateway): preserve transcript on /compress and hygiene compression

Apply session_id swap after _compress_context in both /compress handler
and hygiene pre-compression. _compress_context creates a new session
(ending the old one), but both paths were calling rewrite_transcript on
the OLD session_id — overwriting the preserved transcript and destroying
searchable history.

Now follows the same pattern as the auto-compression handler (lines
5415-5423): detect the new session_id, update the session store entry,
and write compressed messages to the new session.

Also fix FakeCompressAgent test mock to include session_id attribute
and simulate the session_id change that real _compress_context performs.

Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>

---------

Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>
2026-03-28 12:23:43 -07:00
Teknium
be322efdf2 fix(matrix): harden e2ee access-token handling (#3562)
* fix(matrix): harden e2ee access-token handling

* fix: patch nio mock in e2ee maintenance sync loop test

The sync_loop now imports nio for SyncError checking (from PR #3280),
so the test needs to inject a fake nio module via sys.modules.

---------

Co-authored-by: Cortana <andrew+cortana@chalkley.org>
2026-03-28 12:13:35 -07:00
Teknium
411e3c1539 fix(api-server): allow Idempotency-Key in CORS headers (#3530)
Browser clients using the Idempotency-Key header for request
deduplication were blocked by CORS preflight because the header
was not listed in Access-Control-Allow-Headers.

Add Idempotency-Key to _CORS_HEADERS and add tests for both the
new header allowance and the existing Vary: Origin behavior.

Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com>
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-03-28 08:16:41 -07:00
Teknium
6ed9740444 fix: prevent unbounded growth of _seen_uids in EmailAdapter (#3490)
EmailAdapter._seen_uids accumulates every IMAP UID ever seen but
never removes any. A long-running gateway processing a high-volume
inbox would leak memory indefinitely — thousands of integers per day.

IMAP UIDs are monotonically increasing integers, so old UIDs are safe
to drop: new messages always have higher UIDs, and the IMAP UNSEEN
flag already prevents re-delivery regardless of our local tracking.

Fix adds _trim_seen_uids() which keeps only the most recent 1000 UIDs
(half of the 2000-entry cap) when the set grows too large. Called
automatically during connect() and after each fetch cycle.

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-27 23:08:42 -07:00
Teknium
290c71a707 fix(gateway): scope progress thread fallback to Slack only (salvage #3414) (#3488)
* test(gateway): map fixture adapter by platform in progress threading tests

* fix(gateway): scope progress thread fallback to Slack only

---------

Co-authored-by: EmpireOperating <258363005+EmpireOperating@users.noreply.github.com>
2026-03-27 22:37:53 -07:00
Teknium
f57ebf52e9 fix(api-server): cancel orphaned agent + true interrupt on SSE disconnect (salvage #3399) (#3427)
Salvage of #3399 by @binhnt92 with true agent interruption added on top.

When a streaming /v1/chat/completions client disconnects mid-stream, the agent is now interrupted via agent.interrupt() so it stops making LLM API calls, and the asyncio task wrapper is cancelled.

Closes #3399.
2026-03-27 11:33:19 -07:00
Teknium
41d9d08078 fix(telegram): fall back to no thread_id on 'Message thread not found' (#3390)
python-telegram-bot's BadRequest inherits from NetworkError, so the
send() retry loop was catching 'Message thread not found' as a transient
network error and retrying 3 times before silently failing. This killed
all tool progress messages, streaming responses, and typing indicators
when the incoming message carried an invalid message_thread_id.

Now detect BadRequest inside the NetworkError handler:
- 'thread not found' + thread_id set → clear thread_id and retry once
  (message still reaches the chat, just without topic threading)
- Other BadRequest errors → raise immediately (permanent, don't retry)
- True NetworkError → retry as before (transient)

252 silent failures in gateway.log traced to this on 2026-03-26.

5 new tests for thread fallback, non-thread BadRequest, no-thread sends,
network retry, and multi-chunk fallback.
2026-03-27 06:07:28 -07:00
Teknium
75fcbc44ce feat(telegram): auto-discover fallback IPs via DoH when api.telegram.org is unreachable (#3376)
* feat(telegram): auto-discover fallback IPs via DoH when api.telegram.org is unreachable

On some networks (university, corporate), api.telegram.org resolves to a
valid Telegram IP that is unreachable due to routing/firewall rules. A
different IP in the same Telegram-owned 149.154.160.0/20 block works fine.

This adds automatic fallback IP discovery at connect time:
1. Query Google and Cloudflare DNS-over-HTTPS for api.telegram.org A records
2. Exclude the system-DNS IP (the unreachable one), use the rest as fallbacks
3. If DoH is also blocked, fall back to a seed list (149.154.167.220)
4. TelegramFallbackTransport tries primary first, sticks to whichever works

No configuration needed — works automatically. TELEGRAM_FALLBACK_IPS env var
still available as manual override. Zero impact on healthy networks (primary
path succeeds on first attempt, fallback never exercised).

No new dependencies (uses httpx already in deps + stdlib socket).

* fix: share transport instance and downgrade seed fallback log to info

- Use single TelegramFallbackTransport shared between request and
  get_updates_request so sticky IP is shared across polling and API calls
- Keep separate HTTPXRequest instances (different timeout settings)
- Downgrade "using seed fallback IPs" from warning to info to avoid
  noisy logs on healthy networks

* fix: add telegram.request mock and discovery fixture to remaining test files

The original PR missed test_dm_topics.py and
test_telegram_network_reconnect.py — both need the telegram.request
mock module. The reconnect test also needs _no_auto_discovery since
_handle_polling_network_error calls connect() which now invokes
discover_fallback_ips().

---------

Co-authored-by: Mohan Qiao <Gavin-Qiao@users.noreply.github.com>
2026-03-27 04:03:13 -07:00
Teknium
a2847ea7f0 fix(gateway): add media download retry to Mattermost, Slack, and base cache (#3323)
* fix(gateway): add media download retry to Mattermost, Slack, and base cache

Media downloads on Mattermost and Slack fail permanently on transient
errors (timeouts, 429 rate limits, 5xx server errors). Telegram and
WhatsApp already have retry logic, but these platforms had single-attempt
downloads with hardcoded 30s timeouts.

Changes:
- base.py cache_image_from_url: add retry with exponential backoff
  (covers Signal and any platform using the shared cache helper)
- mattermost.py _send_media_url: retry on 429/5xx/timeout (3 attempts)
- slack.py _download_slack_file: retry on timeout/5xx (3 attempts)
- slack.py _download_slack_file_bytes: same retry pattern

* test: add tests for media download retry

---------

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-26 19:33:18 -07:00
Teknium
58ca875e19 feat(gateway): surface session config on /new, /reset, and auto-reset (#3321)
When a new session starts in the gateway (via /new, /reset, or
auto-reset), send the user a summary of the detected configuration:

   Session reset! Starting fresh.

  ◆ Model: qwen3.5:27b-q4_K_M
  ◆ Provider: custom
  ◆ Context: 8K tokens (config)
  ◆ Endpoint: http://localhost:11434/v1

This makes misconfigured context length immediately visible — a user
running a local 8K model that falls to the 128K default will see:

  ◆ Context: 128K tokens (default — set model.context_length in config to override)

Instead of silently getting no compression and degrading responses.

- _format_session_info() resolves model, provider, context length,
  and endpoint from config + runtime, matching the hygiene code's
  resolution chain
- Local/custom endpoints shown; cloud endpoints hidden (not useful)
- Context source annotated: config, detected, or default with hint
- Appended to /new and /reset responses, and auto-reset notifications
- 9 tests covering all formatting paths and failure resilience

Addresses the user-facing side of #2708 — instead of trying to fix
every edge case in context detection, surface the values so users
can immediately see when something is wrong.
2026-03-26 19:27:58 -07:00
Teknium
3f95e741a7 fix: validate empty user messages to prevent Anthropic API 400 errors (#3322)
When user messages have empty content (e.g., Discord @mention-only
messages, unrecognized attachments), the Anthropic API rejects the
request with 'user messages must have non-empty content'.

Changes:
- anthropic_adapter.py: Add empty content validation for user messages
  (string and list formats), matching the existing pattern for assistant
  and tool messages. Empty content gets '(empty message)' placeholder.

- discord.py: Defense-in-depth check at gateway layer to catch empty
  messages before they enter session history.

- Add 4 regression tests covering empty string, whitespace-only,
  empty list, and empty text block scenarios.

Fixes #3143

Co-authored-by: Bartok9 <bartok9@users.noreply.github.com>
2026-03-26 19:24:03 -07:00
Teknium
22cfad157b fix: gateway token double-counting — use absolute set instead of increment (#3317)
The gateway's update_session() used += for token counts, but the cached
agent's session_prompt_tokens / session_completion_tokens are cumulative
totals that grow across messages. Each update_session call re-added the
running total, inflating usage stats with every message (1.7x after 3
messages, worse over longer conversations).

Fix: change += to = for in-memory entry fields, add set_token_counts()
to SessionDB that uses direct assignment instead of SQL increment, and
switch the gateway to call it.

CLI mode continues using update_token_counts() (increment) since it
tracks per-API-call deltas — that path is unchanged.

Based on analysis from PR #3222 by @zaycruz (closed).

Co-authored-by: zaycruz <zay@users.noreply.github.com>
2026-03-26 19:13:07 -07:00
Teknium
867eefdd9f fix(signal): track SSE keepalive comments as connection activity (#3316)
signal-cli sends SSE comment lines (':') as keepalives every ~15s. The
SSE listener only counted 'data:' lines as activity, so the health
monitor reported false idle warnings every 2 minutes during quiet
periods. Recognize ':' lines as valid activity per the SSE spec.

Salvaged from PR #2938 by ticketclosed-wontfix.
2026-03-26 19:10:25 -07:00
Teknium
a8df7f9964 fix: gateway token double-counting with cached agents (#3306)
The cached agent accumulates session_input_tokens across messages, so
run_conversation() returns cumulative totals. But update_session() used
+= (increment), double-counting on every message after the first.

- session.py: change in-memory entry updates from += to = (direct
  assignment for cumulative values)
- hermes_state.py: add absolute=True flag to update_token_counts()
  that uses SET column = ? instead of SET column = column + ?
- session.py: pass absolute=True to the DB call

CLI path is unchanged — it passes per-API-call deltas directly to
update_token_counts() with the default absolute=False (increment).

Reported by @zaycruz in #3222. Closes #3222.
2026-03-26 19:04:53 -07:00
Teknium
005786c55d fix(gateway): include per-platform ALLOW_ALL and SIGNAL_GROUP in startup allowlist check (#3313)
The startup warning 'No user allowlists configured' only checked
GATEWAY_ALLOW_ALL_USERS and per-platform _ALLOWED_USERS vars. It
missed SIGNAL_GROUP_ALLOWED_USERS and per-platform _ALLOW_ALL_USERS
vars (e.g. TELEGRAM_ALLOW_ALL_USERS), causing a false warning even
when users had these configured. The actual auth check in
_is_user_authorized already recognized these vars.

Cherry-picked from PR #3202 by binhnt92.

Co-authored-by: binhnt92 <binhnt.ht.92@gmail.com>
2026-03-26 18:23:49 -07:00
Teknium
f008ee1019 fix(session): preserve reasoning fields in rewrite_transcript (#3311)
rewrite_transcript (used by /retry, /undo, /compress) was calling
append_message without reasoning, reasoning_details, or
codex_reasoning_items — permanently dropping them from SQLite.

Co-authored-by: alireza78a <alireza78.crypto@gmail.com>
2026-03-26 18:18:00 -07:00
Teknium
18d28c63a7 fix: add explicit hermes-api-server toolset for API server platform (#3304)
The API server adapter was creating agents without specifying
enabled_toolsets, causing ALL tools to load — including clarify,
send_message, and text_to_speech which don't work without interactive
callbacks or gateway dispatch.

Changes:
- toolsets.py: Add hermes-api-server toolset (core tools minus clarify,
  send_message, text_to_speech)
- api_server.py: Resolve toolsets from config.yaml platform_toolsets
  via _get_platform_tools() — same path as all other gateway platforms.
  Falls back to hermes-api-server default when no override configured.
- tools_config.py: Add api_server to PLATFORMS dict so users can
  customize via 'hermes tools' or platform_toolsets.api_server in
  config.yaml
- 12 tests covering toolset definition, config resolution, and
  user override

Reported by thatwolfieguy on Discord.
2026-03-26 18:02:26 -07:00
Teknium
3c57eaf744 fix: YAML boolean handling for tool_progress config (#3300)
YAML 1.1 parses bare `off` as boolean False, which is falsy in
Python's `or` chain and silently falls through to the 'all' default.
Users setting `display.tool_progress: off` in config.yaml saw no
effect — tool progress stayed on.

Normalise False → 'off' before the or chain in both affected paths:
- gateway/run.py _run_agent() tool progress reader
- cli.py HermesCLI.__init__() tool_progress_mode

Reported by @gibbsoft in #2859. Closes #2859.
2026-03-26 17:58:50 -07:00
Teknium
0375b2a0d7 fix(gateway): silence background agent terminal output (#3297)
* fix(gateway): silence flush agent terminal output

quiet_mode=True only suppresses AIAgent init messages.
Tool call output still leaks to the terminal through
_safe_print → _print_fn during session reset/expiry.

Since #2670 injected live memory state into the flush prompt,
the flush agent now reliably calls memory tools — making the
output leak noticeable for the first time.

Set _print_fn to a no-op so the background flush is fully silent.

* test(gateway): add test for flush agent terminal silence + fix dotenv mock

- Add TestFlushAgentSilenced: verifies _print_fn is set to a no-op on
  the flush agent so tool output never leaks to the terminal
- Fix pre-existing test failures: replace patch('run_agent.AIAgent')
  with sys.modules mock to avoid importing run_agent (requires openai)
- Add autouse _mock_dotenv fixture so all tests in this file run
  without the dotenv package installed

* fix(display): route KawaiiSpinner output through print_fn to fully silence flush agent

The previous fix set tmp_agent._print_fn = no-op on the flush agent but
spinner output and quiet-mode cute messages bypassed _print_fn entirely:
- KawaiiSpinner captured sys.stdout at __init__ and wrote directly to it
- quiet-mode tool results used builtin print() instead of _safe_print()

Add optional print_fn parameter to KawaiiSpinner.__init__; _write routes
through it when set. Pass self._print_fn to all spinner construction sites
in run_agent.py and change the quiet-mode cute message print to _safe_print.
The existing gateway fix (tmp_agent._print_fn = lambda) now propagates
correctly through both paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gateway): silence hygiene and compression background agents

Two more background AIAgent instances in the gateway were created with
quiet_mode=True but without _print_fn = no-op, causing tool output to
leak to the terminal:
- _hyg_agent (in-turn hygiene memory agent)
- tmp_agent (_compress_context path)

Apply the same _print_fn no-op pattern used for the flush agent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(display): remove unused _last_flush_time from KawaiiSpinner

Attribute was set but never read; upstream already removed it.
Leftover from conflict resolution during rebase onto upstream/main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 17:40:31 -07:00
Teknium
08fa326bb0 feat(gateway): deliver background review notifications to user chat (#3293)
The background memory/skill review (_spawn_background_review) runs
after the agent response when turn/iteration counters exceed their
thresholds. It saves memories and skills, then prints a summary like
'💾 Memory updated · User profile updated'. In CLI mode this goes to
the terminal via _safe_print. In gateway mode, _safe_print routes to
print() which goes to stdout — invisible to the user.

Add a background_review_callback attribute to AIAgent. When set, the
background review thread calls it with the summary string after saves
complete. The gateway wires this to adapter.send() via the same
run_coroutine_threadsafe bridge used by status_callback, delivering
the notification to the user's chat.
2026-03-26 17:38:24 -07:00
Teknium
bde45f5a2a fix(gateway): retry transient send failures and notify user on exhaustion (#3288)
When send() fails due to a network error (ConnectError, ReadTimeout, etc.),
the failure was silently logged and the user received no feedback — appearing
as a hang. In one reported case, a user waited 1+ hour for a response that
had already been generated but failed to deliver (#2910).

Adds _send_with_retry() to BasePlatformAdapter:
- Transient errors: retry up to 2x with exponential backoff + jitter
- On exhaustion: send delivery-failure notice so user knows to retry
- Permanent errors: fall back to plain-text version (preserves existing behavior)
- SendResult.retryable flag for platform-specific transient errors

All adapters benefit automatically via BasePlatformAdapter inheritance.

Cherry-picked from PR #3108 by Mibayy.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-03-26 17:37:10 -07:00
Teknium
148f46620f fix(matrix): add backoff for SyncError in sync loop (#3280)
When the homeserver returns an error response, matrix-nio parses it
as a SyncError return value rather than raising an exception. The sync
loop only had backoff in the except handler, so SyncError caused a
tight retry loop (~489 req/s) flooding logs and hammering the
homeserver. Check the return value and sleep 5s before retry.

Cherry-picked from PR #2937 by ticketclosed-wontfix.

Co-authored-by: ticketclosed-wontfix <ticketclosed-wontfix@users.noreply.github.com>
2026-03-26 16:19:58 -07:00
Teknium
6610c377ba fix(telegram): self-reschedule reconnect when start_polling fails (#3268)
After a Telegram 502, _handle_polling_network_error calls updater.stop()
then start_polling(). If start_polling() also raises, the old code logged
a warning and returned — but the comment 'The next network error will
trigger another attempt' was wrong. The updater loop is dead after stop(),
so no further error callbacks ever fire. The gateway stays alive but
permanently deaf to messages.

Fix: when start_polling() fails in the except branch, schedule a new
_handle_polling_network_error task to continue the exponential backoff
retry chain. The task is tracked in _background_tasks (preventing GC).
Guarded by has_fatal_error to avoid spurious retries during shutdown.

Closes #3173.
Salvaged from PR #3177 by Mibayy.
2026-03-26 15:34:33 -07:00
Teknium
72250b5f62 feat: config-gated /verbose command for messaging gateway (#3262)
* feat: config-gated /verbose command for messaging gateway

Add gateway_config_gate field to CommandDef, allowing cli_only commands
to be conditionally available in the gateway based on a config value.

- CommandDef gains gateway_config_gate: str | None — a config dotpath
  that, when truthy, overrides cli_only for gateway surfaces
- /verbose uses gateway_config_gate='display.tool_progress_command'
- Default is off (cli_only behavior preserved)
- When enabled, /verbose cycles tool_progress mode (off/new/all/verbose)
  in the gateway, saving to config.yaml — same cycle as the CLI
- Gateway helpers (help, telegram menus, slack mapping) dynamically
  check config to include/exclude config-gated commands
- GATEWAY_KNOWN_COMMANDS always includes config-gated commands so
  the gateway recognizes them and can respond appropriately
- Handles YAML 1.1 bool coercion (bare 'off' parses as False)
- 8 new tests for the config gate mechanism + gateway handler

* docs: document gateway_config_gate and /verbose messaging support

- AGENTS.md: add gateway_config_gate to CommandDef fields
- slash-commands.md: note /verbose can be enabled for messaging, update Notes
- configuration.md: add tool_progress_command to display section + usage note
- cli.md: cross-link to config docs for messaging enablement
- messaging/index.md: show tool_progress_command in config snippet
- plugins.md: add gateway_config_gate to register_command parameter table
2026-03-26 14:41:04 -07:00
Teknium
243ee67529 fix: store asyncio task references to prevent GC mid-execution (#3267)
Python's asyncio event loop holds only weak references to tasks.
Without a strong reference, the garbage collector can destroy a task
while it's awaiting I/O — silently dropping messages. Python 3.12+
made this more aggressive.

Audit of all gateway platform adapters found 6 untracked create_task
calls across 6 files:

Per-message tasks (tracked via _background_tasks set from base class):
- gateway/platforms/webhook.py: handle_message task
- gateway/platforms/sms.py: handle_message task
- gateway/platforms/signal.py: SSE response aclose task

Long-running infrastructure tasks (stored in named instance vars):
- gateway/platforms/slack.py: Socket Mode handler (_socket_mode_task)
- gateway/platforms/discord.py: bot client (_bot_task)
- gateway/platforms/whatsapp.py: message poll loop (_poll_task, 2 sites)

All other adapters (telegram, mattermost, matrix, email, homeassistant,
dingtalk) already tracked their tasks correctly.

Salvaged from PR #3160 by memosr — expanded from 1 file to 6.
2026-03-26 14:36:24 -07:00
Teknium
3a86328847 fix(gateway): add request timeouts to HA, Email, Mattermost, SMS adapters (#3258)
Add timeout=30 to all bare ClientSession, IMAP4_SSL, smtplib.SMTP, and
ws_connect calls that previously had no timeout, preventing indefinite
hangs when an external server is slow or unresponsive.

Adapters hardened:
- HomeAssistant: REST + WS session creation, ws_connect handshake
- Email: all IMAP4_SSL (x2) and smtplib.SMTP (x3) calls
- Mattermost: session creation, _api_get, _api_post, _upload_file (60s)
- SMS: session creation in connect() + fallback session in send()

Salvaged from PRs #3161, #3168, #3170 (memosr) and #3201 (binhnt92).
SMS fallback ClientSession on send() also patched (missed in #3201).

Co-authored-by: memosr <memosr@users.noreply.github.com>
Co-authored-by: nguyen binh <binhnt92@users.noreply.github.com>
2026-03-26 14:36:07 -07:00
Teknium
e9e7fb0683 fix(gateway): track background task references in GatewayRunner (#3254)
Asyncio tasks created with create_task() but never stored can be
garbage collected mid-execution. Add self._background_tasks set to
hold references, with add_done_callback cleanup. Tracks:
- /background command task
- session-reset memory flush task
- session-resume memory flush task
Cancel all pending tasks in stop().

Update test fixtures that construct GatewayRunner via object.__new__()
to include the new _background_tasks attribute.

Cherry-picked from PR #3167 by memosr. The original PR also deleted
the DM topic auto-skill loading code — that deletion was excluded
from this salvage as it removes a shipped feature (#2598).

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-26 14:33:48 -07:00
Teknium
b81d49dc45 fix(state): SQLite concurrency hardening + session transcript integrity (#3249)
* fix(session-db): survive CLI/gateway concurrent write contention

Closes #3139

Three layered fixes for the scenario where CLI and gateway write to
state.db concurrently, causing create_session() to fail with
'database is locked' and permanently disabling session_search on the
gateway side.

1. Increase SQLite connection timeout: 10s -> 30s
   hermes_state.py: longer window for the WAL writer to finish a batch
   flush before the other process gives up entirely.

2. INSERT OR IGNORE in create_session
   hermes_state.py: prevents IntegrityError on duplicate session IDs
   (e.g. gateway restarts while CLI session is still alive).

3. Don't null out _session_db on create_session failure  (main fix)
   run_agent.py: a transient lock at agent startup must not permanently
   disable session_search for the lifetime of that agent instance.
   _session_db now stays alive so subsequent flushes and searches work
   once the lock clears.

4. New ensure_session() helper + call it during flush
   hermes_state.py: INSERT OR IGNORE for a minimal session row.
   run_agent.py _flush_messages_to_session_db: calls ensure_session()
   before appending messages, so the FK constraint is satisfied even
   when create_session() failed at startup. No-op when the row exists.

* fix(state): release lock between context queries in search_messages

The context-window queries (one per FTS5 match) were running inside
the same lock acquisition as the primary FTS5 query, holding the lock
for O(N) sequential SQLite round-trips. Move per-match context fetches
outside the outer lock block so each acquires the lock independently,
keeping critical sections short and allowing other threads to interleave.

* fix(session): prefer longer source in load_transcript to prevent legacy truncation

When a long-lived session pre-dates SQLite storage (e.g. sessions
created before the DB layer was introduced, or after a clean
deployment that reset the DB), _flush_messages_to_session_db only
writes the *new* messages from the current turn to SQLite — it skips
messages already present in conversation_history, assuming they are
already persisted.

That assumption fails for legacy JSONL-only sessions:

  Turn N (first after DB migration):
    load_transcript(id)       → SQLite: 0  → falls back to JSONL: 994 ✓
    _flush_messages_to_session_db: skip first 994, write 2 new → SQLite: 2

  Turn N+1:
    load_transcript(id)       → SQLite: 2  → returns immediately ✗
    Agent sees 2 messages of history instead of 996

The same pattern causes the reported symptom: session JSON truncated
to 4 messages (_save_session_log writes agent.messages which only has
2 history + 2 new = 4).

Fix: always load both sources and return whichever is longer.  For a
fully-migrated session SQLite will always be ≥ JSONL, so there is no
regression.  For a legacy session that hasn't been bootstrapped yet,
JSONL wins and the full history is restored.

Closes #3212

* test: add load_transcript source preference tests for #3212

Covers: JSONL longer returns JSONL, SQLite longer returns SQLite,
SQLite empty falls back to JSONL, both empty returns empty, equal
length prefers SQLite (richer reasoning fields).

---------

Co-authored-by: Mibayy <mibayy@hermes.ai>
Co-authored-by: kewe63 <kewe.3217@gmail.com>
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-03-26 13:47:14 -07:00
Teknium
62f8aa9b03 fix: MCP toolset resolution for runtime and config (#3252)
Gateway sessions had their own inline toolset resolution that only read
platform_toolsets from config, which never includes MCP server names.
MCP tools were discovered and registered but invisible to the model.

- Replace duplicated gateway toolset resolution in _run_agent() and
  _run_background_task() with calls to the shared _get_platform_tools()
- Extend _get_platform_tools() to include globally enabled MCP servers
  at runtime (include_default_mcp_servers=True), while config-editing
  flows use include_default_mcp_servers=False to avoid persisting
  implicit MCP defaults into platform_toolsets
- Add homeassistant to PLATFORMS dict (was missing, caused KeyError)
- Fix CLI entry point to use _get_platform_tools() as well, so MCP
  tools are visible in CLI mode too
- Remove redundant platform_key reassignment in _run_background_task

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-26 13:39:41 -07:00
Teknium
c6fe75e99b fix(gateway): fingerprint full auth token in agent cache signature (#3247)
Previously _agent_config_signature() used only the first 8 characters of
the API key, which causes false cache hits for JWT/OAuth tokens that share
a common prefix (e.g. 'eyJhbGci'). This led to cross-account cache
collisions when switching OAuth accounts in multi-user gateway deployments.

Replace the 8-char prefix with a SHA-256 hash of the full key so the
signature is unique per credential while keeping secrets out of the
cache key.

Salvaged from PR #3117 by EmpireOperating.

Co-authored-by: EmpireOperating <EmpireOperating@users.noreply.github.com>
2026-03-26 13:19:43 -07:00
Teknium
36af1f3baf feat(telegram): Private Chat Topics with functional skill binding (#2598)
Salvages PR #3005 by web3blind. Cherry-picked onto current main with functional skill binding and docs added.

- DM topic creation via createForumTopic (Bot API 9.4, Feb 2026)
- Config-driven topics with thread_id persistence across restarts
- Session isolation via existing build_session_key thread_id support
- auto_skill field on MessageEvent for topic-skill bindings
- Gateway auto-loads bound skill on new sessions (same as /skill commands)
- Docs: full Private Chat Topics section in Telegram messaging guide
- 20 tests (17 original + 3 for auto_skill)

Closes #2598
Co-authored-by: web3blind <web3blind@users.noreply.github.com>
2026-03-26 02:04:11 -07:00
Teknium
cbf195e806 chore: fix 154 f-strings, simplify getattr/URL patterns, remove dead code (#3119)
Three categories of cleanup, all zero-behavioral-change:

1. F-strings without placeholders (154 fixes across 29 files)
   - Converted f'...' to '...' where no {expression} was present
   - Heaviest files: run_agent.py (24), cli.py (20), honcho_integration/cli.py (34)

2. Simplify defensive patterns in run_agent.py
   - Added explicit self._is_anthropic_oauth = False in __init__ (before
     the api_mode branch that conditionally sets it)
   - Replaced 7x getattr(self, '_is_anthropic_oauth', False) with direct
     self._is_anthropic_oauth (attribute always initialized now)
   - Added _is_openrouter_url() and _is_anthropic_url() helper methods
   - Replaced 3 inline 'openrouter' in self._base_url_lower checks

3. Remove dead code in small files
   - hermes_cli/claw.py: removed unused 'total' computation
   - tools/fuzzy_match.py: removed unused strip_indent() function and
     pattern_stripped variable

Full test suite: 6184 passed, 0 failures
E2E PTY: banner clean, tool calls work, zero garbled ANSI
2026-03-25 19:47:58 -07:00
Teknium
59575d6a91 fix(gateway): recover from hung agents — /stop force-unlocks session (#3104)
When an agent thread hangs (truly blocked, never checks _interrupt_requested),
/stop now force-cleans _running_agents to unlock the session immediately.

Two changes:
- Early /stop intercept in the running-agent guard: bypasses normal command
  dispatch to force-interrupt and unlock the session. Follows the same pattern
  as the existing /new intercept.
- Sentinel /stop: force-cleans the sentinel instead of returning 'nothing to
  stop yet', so /stop during slow startup actually unlocks the session.

Follow-up improvements over original PR:
- Consolidated duplicate resolve_command imports into single early resolution
- Updated _handle_stop_command to also force-clean for consistency
- Removed 10-minute hard timeout on the executor (would kill legitimate
  long-running agent tasks; the /stop force-clean handles recovery)

Cherry-picked from Mibayy's PR #2498.

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
2026-03-25 18:46:50 -07:00
Teknium
9783c9d5c1 refactor: remove /model slash command from CLI and gateway (#3080)
The /model command is removed from both the interactive CLI and
messenger gateway (Telegram/Discord/Slack/WhatsApp). Users can
still change models via 'hermes model' CLI subcommand or by
editing config.yaml directly.

Removed:
- CommandDef entry from COMMAND_REGISTRY
- CLI process_command() handler and model autocomplete logic
- Gateway _handle_model_command() and dispatch
- SlashCommandCompleter model_completer_provider parameter
- Two-stage Tab completion and ghost text for /model
- All /model-specific tests

Unaffected:
- /provider command (read-only, shows current model + providers)
- ACP adapter _cmd_model (separate system for VS Code/Zed/JetBrains)
- model_switch.py module (used by ACP)
- 'hermes model' CLI subcommand

Author: Teknium
2026-03-25 17:03:05 -07:00
Teknium
77bcaba2d7 refactor: consolidate get_hermes_home() and parse_reasoning_effort() (#3062)
Centralizes two widely-duplicated patterns into hermes_constants.py:

1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var)
   - Was copy-pasted inline across 30+ files as:
     Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
   - Now defined once in hermes_constants.py (zero-dependency module)
   - hermes_cli/config.py re-exports it for backward compatibility
   - Removed local wrapper functions in honcho_integration/client.py,
     tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py

2. parse_reasoning_effort() — Reasoning effort string validation
   - Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py
   - Same validation logic: check against (xhigh, high, medium, low, minimal, none)
   - Now defined once in hermes_constants.py, called from all 3 locations
   - Warning log for unknown values kept at call sites (context-specific)

31 files changed, net +31 lines (125 insertions, 94 deletions)
Full test suite: 6179 passed, 0 failed
2026-03-25 15:54:28 -07:00
Teknium
e0cfc089da fix(gateway/slack): send progress messages to correct thread (#3063)
Co-authored-by: Jneeee <jneeee@outlook.com>
2026-03-25 15:51:15 -07:00
Teknium
73e66eb3c0 fix(gateway): thread-safe SessionStore — protect _entries with threading.Lock (#3052)
SessionStore._entries was read and mutated without synchronisation,
causing race conditions when multiple platforms (Telegram + Discord)
received messages concurrently on the same gateway process. Two threads
could simultaneously pass the session_key check and create duplicate
sessions for the same user, splitting conversation history.

- Added threading.Lock to protect all _entries / _loaded mutations
- Split _ensure_loaded() into public wrapper + internal _ensure_loaded_locked()
- SQLite I/O is performed outside the lock to avoid blocking during
  slow disk operations
- _save() stays inside the lock since it reads _entries for serialization

Cherry-picked from PR #3012 by Kewe63. Removed unrelated changes
(delivery.py case-sensitivity, hermes_state.py schema tracking) and
stripped the UTC timezone switch to keep the change focused on threading.

Co-authored-by: Kewe63 <Kewe63@users.noreply.github.com>
2026-03-25 15:15:37 -07:00
Teknium
8bb1d15da4 chore: remove ~100 unused imports across 55 files (#3016)
Automated cleanup via pyflakes + autoflake with manual review.

Changes:
- Removed unused stdlib imports (os, sys, json, pathlib.Path, etc.)
- Removed unused typing imports (List, Dict, Any, Optional, Tuple, Set, etc.)
- Removed unused internal imports (hermes_cli.auth, hermes_cli.config, etc.)
- Fixed cli.py: removed 8 shadowed banner imports (imported from hermes_cli.banner
  then immediately redefined locally — only build_welcome_banner is actually used)
- Added noqa comments to imports that appear unused but serve a purpose:
  - Re-exports (gateway/session.py SessionResetPolicy, tools/terminal_tool.py
    is_interrupted/_interrupt_event)
  - SDK presence checks in try/except (daytona, fal_client, discord)
  - Test mock targets (auxiliary_client.py Path, mcp_config.py get_hermes_home)

Zero behavioral changes. Full test suite passes (6162/6162, 2 pre-existing
streaming test failures unrelated to this change).
2026-03-25 15:02:03 -07:00
Teknium
65dace1b1a fix(discord): stop phantom typing indicator after agent turn completes (#3003)
Two fixes for a race where Discord's typing indicator lingers after the
agent finishes:

1. _keep_typing (root cause): after outer stop_typing() clears the task
   dict, _keep_typing wakes from its 2s sleep and calls send_typing()
   again, recreating an orphaned loop. Add a finally block so _keep_typing
   always calls stop_typing() on exit, cleaning up any loop it recreated.

2. _process_message_background (safety net): add stop_typing() after
   cancelling the typing task, catching any platform-level persistent
   typing tasks that slipped through.

Combines fixes from PR #2945 by catbusconductor (root cause in
_keep_typing) and PR #2832 by subrih (safety net in
_process_message_background).
2026-03-25 11:28:28 -07:00
Teknium
b2a6b012fe fix(api_server): streaming breaks when agent makes tool calls (#2985)
* fix(run_agent): ensure _fire_first_delta() is called for tool generation events

Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.

* fix(run_agent): improve timeout handling for chat completions

Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.

* fix(run_agent): reduce default stream read timeout for chat completions

Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.

* fix(run_agent): enhance streaming error handling and retry logic

Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.

* fix(api_server): streaming breaks when agent makes tool calls

The agent fires stream_delta_callback(None) to signal the CLI display
to close its response box before tool execution begins. The API server's
_on_delta callback was forwarding this None directly into the SSE queue,
where the SSE writer treats it as end-of-stream and terminates the HTTP
response prematurely.

After tool calls complete, the agent streams the final answer through
the same callback, but the SSE response was already closed — so Open
WebUI (and similar frontends) never received the actual answer.

Fix: filter out None in _on_delta so the SSE stream stays open. The SSE
loop already detects completion via agent_task.done(), which handles
stream termination correctly without needing the None sentinel.

Reported by Rohit Paul on X.
2026-03-25 09:56:20 -07:00
Teknium
42fec19151 feat: persist reasoning across gateway session turns (schema v6) (#2974)
feat: persist reasoning across gateway session turns (schema v6)

Tested against OpenAI Codex (direct), Anthropic (direct + OAI-compat), and OpenRouter → 6 backends. All reasoning field types (reasoning, reasoning_details, codex_reasoning_items) round-trip through the DB correctly.
2026-03-25 09:47:28 -07:00
Teknium
c6f4515f73 fix(whatsapp): download documents, audio, and video media from messages (#2978)
Add downloadMediaMessage() calls for documents, audio/voice notes, and
video in bridge.js — previously only images were downloaded, leaving all
other file types inaccessible to the agent.

Handle local file paths from the bridge for DOCUMENT, VOICE, and VIDEO
types in whatsapp.py with proper MIME detection. Inject text content
inline for readable files (.txt, .md, .csv, .json, etc.).

Follow-up fixes applied during salvage:
- Remove unused cache_document_from_bytes import
- Add 100KB size cap on text injection (matches Telegram/Discord/Slack)
- Align injection format with other platforms

Cherry-picked from PR #2818. Also fixes #2856 (bugs 1 & 2).
PR #2865 by ayberkesn fixed the same voice note issue.

Co-authored-by: noestelar <hola@noeali.com>
2026-03-25 08:37:28 -07:00
Teknium
e5691eed38 feat(gateway): configurable Telegram reply threading mode (#2907)
Add reply_to_mode setting (off/first/all) to control whether Telegram
replies quote/thread to the user's original message.

- 'off': Never thread replies (no quote bubble)
- 'first': Only first chunk threads to user's message (default, preserves existing behavior)
- 'all': All chunks in multi-part replies thread to user's message

Configurable via:
- reply_to_mode in platform config (gateway config YAML)
- TELEGRAM_REPLY_TO_MODE env var

Based on PR #855 by raulvidis.
2026-03-24 19:56:00 -07:00
Teknium
80cc27eb9d feat(api-server): Idempotency-Key support, body size limit, OpenAI error envelope (#2903)
* feat(api-server): add Idempotency-Key support and request size limit; unify OpenAI error envelope

* fix(api-server): include provider error message in 500 OpenAI error body

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-24 19:31:08 -07:00
Teknium
2e524272b1 refactor(model): extract shared switch_model() from CLI and gateway handlers
Phase 4 of the /model command overhaul.

Both the CLI (cli.py) and gateway (gateway/run.py) /model handlers
had ~50 lines of duplicated core logic: parsing, provider detection,
credential resolution, and model validation. This extracts that
pipeline into hermes_cli/model_switch.py.

New module exports:
- ModelSwitchResult: dataclass with all fields both handlers need
- CustomAutoResult: dataclass for bare '/model custom' results
- switch_model(): core pipeline — parse → detect → resolve → validate
- switch_to_custom_provider(): resolve endpoint + auto-detect model

The shared functions are pure (no I/O side effects). Each caller
handles its own platform-specific concerns:
- CLI: sets self.model/provider/etc, calls save_config_value(), prints
- Gateway: writes config.yaml directly, sets env vars, returns markdown

Net result: -244 lines from handlers, +234 lines in shared module.
The handlers are now ~80 lines each (down from ~150+) and can't drift
apart on core logic.
2026-03-24 07:08:07 -07:00
Teknium
b641ee88f4 feat(model): /model command overhaul — Phases 2, 3, 5
* feat(model): persist base_url on /model switch, auto-detect for bare /model custom

Phase 2+3 of the /model command overhaul:

Phase 2 — Persist base_url on model switch:
- CLI: save model.base_url when switching to a non-OpenRouter endpoint;
  clear it when switching away from custom to prevent stale URLs
  leaking into the new provider's resolution
- Gateway: same logic using direct YAML write

Phase 3 — Better feedback and edge cases:
- Bare '/model custom' now auto-detects the model from the endpoint
  using _auto_detect_local_model() and saves all three config values
  (model, provider, base_url) atomically
- Shows endpoint URL in success messages when switching to/from
  custom providers (both CLI and gateway)
- Clear error messages when no custom endpoint is configured
- Updated test assertions for the additional save_config_value call

Fixes #2562 (Phase 2+3)

* feat(model): support custom:name:model triple syntax for named custom providers

Phase 5 of the /model command overhaul.

Extends parse_model_input() to handle the triple syntax:
  /model custom:local-server:qwen → provider='custom:local-server', model='qwen'
  /model custom:my-model          → provider='custom', model='my-model' (unchanged)

The 'custom:local-server' provider string is already supported by
_get_named_custom_provider() in runtime_provider.py, which matches
it against the custom_providers list in config.yaml. This just wires
the parsing so users can do it from the /model slash command.

Added 4 tests covering single, triple, whitespace, and empty model cases.
2026-03-24 06:58:04 -07:00
Teknium
48b5bc6038 fix(gateway): prevent stale memory overwrites by flush agent (#2670)
The gateway memory flush agent reviews old conversation history on session
reset/expiry and writes to memory. It had no awareness of memory changes
made after that conversation ended (by the live agent, cron jobs, or other
sessions), causing silent overwrites of newer entries.

Two fixes:

1. Skip memory flush entirely for cron sessions (session IDs starting with
   'cron_'). Cron sessions are headless with no meaningful user conversation
   to extract memories from.

2. Inject the current live memory state (MEMORY.md + USER.md) directly into
   the flush prompt. The flush agent can now see what's already saved and
   make informed decisions — only adding genuinely new information rather
   than blindly overwriting entries that may have been updated since the
   conversation ended.

Addresses the root cause identified in #2670: the flush agent was making
memory decisions blind to the current state of memory, causing stale
context to overwrite newer entries on gateway restarts and session resets.

Co-authored-by: devorun <devorun@users.noreply.github.com>
Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
2026-03-23 16:08:38 -07:00
Teknium
4ff73fb32c feat(config): support ${ENV_VAR} substitution in config.yaml (#2684)
* feat(config): support ${ENV_VAR} substitution in config.yaml

* fix: extend env var expansion to CLI and gateway config loaders

The original PR (#2680) only wired _expand_env_vars into load_config(),
which is used by 'hermes tools' and 'hermes setup'. The two primary
config paths were missed:

- load_cli_config() in cli.py (interactive CLI)
- Module-level _cfg in gateway/run.py (gateway — bridges api_keys to env vars)

Also:
- Remove redundant 'import re' (already imported at module level)
- Add missing blank lines between top-level functions (PEP 8)
- Add tests for load_cli_config() expansion

---------

Co-authored-by: teyrebaz33 <hakanerten02@hotmail.com>
2026-03-23 16:02:06 -07:00
Teknium
f9c2565ab4 fix(config): log warning instead of silently swallowing config.yaml errors (#2683)
A bare `except Exception: pass` meant any YAML syntax error, bad value,
or unexpected structure in config.yaml was silently ignored and the
gateway fell back to .env / gateway.json without any indication.
Users had no way to know why their config changes had no effect.

Co-authored-by: sprmn24 <oncuevtv@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 15:54:11 -07:00
Teknium
6302e56e7c fix(gateway): add all missing platform allowlist env vars to startup warning check (#2628)
* fix(gateway): added MATRIX_ALLOWED_USERS to list of env vars checked by gateway

* fix(gateway): add all missing platform allowlist env vars to startup check

The startup warning for 'No user allowlists configured' was only checking
TELEGRAM, DISCORD, WHATSAPP, SLACK, and SMS — missing SIGNAL, EMAIL,
MATTERMOST, and DINGTALK. Users of those platforms would see a spurious
warning even with their platform-specific allowlist configured.

Now matches the canonical platform_env_map in _is_user_authorized().

---------

Co-authored-by: SteelPh0enix <wojciech_olech@hotmail.com>
2026-03-23 07:19:14 -07:00