Commit Graph

210 Commits

Author SHA1 Message Date
Teknium
e9e7fb0683 fix(gateway): track background task references in GatewayRunner (#3254)
Asyncio tasks created with create_task() but never stored can be
garbage collected mid-execution. Add self._background_tasks set to
hold references, with add_done_callback cleanup. Tracks:
- /background command task
- session-reset memory flush task
- session-resume memory flush task
Cancel all pending tasks in stop().

Update test fixtures that construct GatewayRunner via object.__new__()
to include the new _background_tasks attribute.

Cherry-picked from PR #3167 by memosr. The original PR also deleted
the DM topic auto-skill loading code — that deletion was excluded
from this salvage as it removes a shipped feature (#2598).

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-26 14:33:48 -07:00
Teknium
b81d49dc45 fix(state): SQLite concurrency hardening + session transcript integrity (#3249)
* fix(session-db): survive CLI/gateway concurrent write contention

Closes #3139

Three layered fixes for the scenario where CLI and gateway write to
state.db concurrently, causing create_session() to fail with
'database is locked' and permanently disabling session_search on the
gateway side.

1. Increase SQLite connection timeout: 10s -> 30s
   hermes_state.py: longer window for the WAL writer to finish a batch
   flush before the other process gives up entirely.

2. INSERT OR IGNORE in create_session
   hermes_state.py: prevents IntegrityError on duplicate session IDs
   (e.g. gateway restarts while CLI session is still alive).

3. Don't null out _session_db on create_session failure  (main fix)
   run_agent.py: a transient lock at agent startup must not permanently
   disable session_search for the lifetime of that agent instance.
   _session_db now stays alive so subsequent flushes and searches work
   once the lock clears.

4. New ensure_session() helper + call it during flush
   hermes_state.py: INSERT OR IGNORE for a minimal session row.
   run_agent.py _flush_messages_to_session_db: calls ensure_session()
   before appending messages, so the FK constraint is satisfied even
   when create_session() failed at startup. No-op when the row exists.

* fix(state): release lock between context queries in search_messages

The context-window queries (one per FTS5 match) were running inside
the same lock acquisition as the primary FTS5 query, holding the lock
for O(N) sequential SQLite round-trips. Move per-match context fetches
outside the outer lock block so each acquires the lock independently,
keeping critical sections short and allowing other threads to interleave.

* fix(session): prefer longer source in load_transcript to prevent legacy truncation

When a long-lived session pre-dates SQLite storage (e.g. sessions
created before the DB layer was introduced, or after a clean
deployment that reset the DB), _flush_messages_to_session_db only
writes the *new* messages from the current turn to SQLite — it skips
messages already present in conversation_history, assuming they are
already persisted.

That assumption fails for legacy JSONL-only sessions:

  Turn N (first after DB migration):
    load_transcript(id)       → SQLite: 0  → falls back to JSONL: 994 ✓
    _flush_messages_to_session_db: skip first 994, write 2 new → SQLite: 2

  Turn N+1:
    load_transcript(id)       → SQLite: 2  → returns immediately ✗
    Agent sees 2 messages of history instead of 996

The same pattern causes the reported symptom: session JSON truncated
to 4 messages (_save_session_log writes agent.messages which only has
2 history + 2 new = 4).

Fix: always load both sources and return whichever is longer.  For a
fully-migrated session SQLite will always be ≥ JSONL, so there is no
regression.  For a legacy session that hasn't been bootstrapped yet,
JSONL wins and the full history is restored.

Closes #3212

* test: add load_transcript source preference tests for #3212

Covers: JSONL longer returns JSONL, SQLite longer returns SQLite,
SQLite empty falls back to JSONL, both empty returns empty, equal
length prefers SQLite (richer reasoning fields).

---------

Co-authored-by: Mibayy <mibayy@hermes.ai>
Co-authored-by: kewe63 <kewe.3217@gmail.com>
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-03-26 13:47:14 -07:00
Teknium
62f8aa9b03 fix: MCP toolset resolution for runtime and config (#3252)
Gateway sessions had their own inline toolset resolution that only read
platform_toolsets from config, which never includes MCP server names.
MCP tools were discovered and registered but invisible to the model.

- Replace duplicated gateway toolset resolution in _run_agent() and
  _run_background_task() with calls to the shared _get_platform_tools()
- Extend _get_platform_tools() to include globally enabled MCP servers
  at runtime (include_default_mcp_servers=True), while config-editing
  flows use include_default_mcp_servers=False to avoid persisting
  implicit MCP defaults into platform_toolsets
- Add homeassistant to PLATFORMS dict (was missing, caused KeyError)
- Fix CLI entry point to use _get_platform_tools() as well, so MCP
  tools are visible in CLI mode too
- Remove redundant platform_key reassignment in _run_background_task

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-26 13:39:41 -07:00
Teknium
c6fe75e99b fix(gateway): fingerprint full auth token in agent cache signature (#3247)
Previously _agent_config_signature() used only the first 8 characters of
the API key, which causes false cache hits for JWT/OAuth tokens that share
a common prefix (e.g. 'eyJhbGci'). This led to cross-account cache
collisions when switching OAuth accounts in multi-user gateway deployments.

Replace the 8-char prefix with a SHA-256 hash of the full key so the
signature is unique per credential while keeping secrets out of the
cache key.

Salvaged from PR #3117 by EmpireOperating.

Co-authored-by: EmpireOperating <EmpireOperating@users.noreply.github.com>
2026-03-26 13:19:43 -07:00
Teknium
36af1f3baf feat(telegram): Private Chat Topics with functional skill binding (#2598)
Salvages PR #3005 by web3blind. Cherry-picked onto current main with functional skill binding and docs added.

- DM topic creation via createForumTopic (Bot API 9.4, Feb 2026)
- Config-driven topics with thread_id persistence across restarts
- Session isolation via existing build_session_key thread_id support
- auto_skill field on MessageEvent for topic-skill bindings
- Gateway auto-loads bound skill on new sessions (same as /skill commands)
- Docs: full Private Chat Topics section in Telegram messaging guide
- 20 tests (17 original + 3 for auto_skill)

Closes #2598
Co-authored-by: web3blind <web3blind@users.noreply.github.com>
2026-03-26 02:04:11 -07:00
Teknium
59575d6a91 fix(gateway): recover from hung agents — /stop force-unlocks session (#3104)
When an agent thread hangs (truly blocked, never checks _interrupt_requested),
/stop now force-cleans _running_agents to unlock the session immediately.

Two changes:
- Early /stop intercept in the running-agent guard: bypasses normal command
  dispatch to force-interrupt and unlock the session. Follows the same pattern
  as the existing /new intercept.
- Sentinel /stop: force-cleans the sentinel instead of returning 'nothing to
  stop yet', so /stop during slow startup actually unlocks the session.

Follow-up improvements over original PR:
- Consolidated duplicate resolve_command imports into single early resolution
- Updated _handle_stop_command to also force-clean for consistency
- Removed 10-minute hard timeout on the executor (would kill legitimate
  long-running agent tasks; the /stop force-clean handles recovery)

Cherry-picked from Mibayy's PR #2498.

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
2026-03-25 18:46:50 -07:00
Teknium
e0cfc089da fix(gateway/slack): send progress messages to correct thread (#3063)
Co-authored-by: Jneeee <jneeee@outlook.com>
2026-03-25 15:51:15 -07:00
Teknium
b2a6b012fe fix(api_server): streaming breaks when agent makes tool calls (#2985)
* fix(run_agent): ensure _fire_first_delta() is called for tool generation events

Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.

* fix(run_agent): improve timeout handling for chat completions

Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.

* fix(run_agent): reduce default stream read timeout for chat completions

Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.

* fix(run_agent): enhance streaming error handling and retry logic

Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.

* fix(api_server): streaming breaks when agent makes tool calls

The agent fires stream_delta_callback(None) to signal the CLI display
to close its response box before tool execution begins. The API server's
_on_delta callback was forwarding this None directly into the SSE queue,
where the SSE writer treats it as end-of-stream and terminates the HTTP
response prematurely.

After tool calls complete, the agent streams the final answer through
the same callback, but the SSE response was already closed — so Open
WebUI (and similar frontends) never received the actual answer.

Fix: filter out None in _on_delta so the SSE stream stays open. The SSE
loop already detects completion via agent_task.done(), which handles
stream termination correctly without needing the None sentinel.

Reported by Rohit Paul on X.
2026-03-25 09:56:20 -07:00
Teknium
e5691eed38 feat(gateway): configurable Telegram reply threading mode (#2907)
Add reply_to_mode setting (off/first/all) to control whether Telegram
replies quote/thread to the user's original message.

- 'off': Never thread replies (no quote bubble)
- 'first': Only first chunk threads to user's message (default, preserves existing behavior)
- 'all': All chunks in multi-part replies thread to user's message

Configurable via:
- reply_to_mode in platform config (gateway config YAML)
- TELEGRAM_REPLY_TO_MODE env var

Based on PR #855 by raulvidis.
2026-03-24 19:56:00 -07:00
Teknium
48b5bc6038 fix(gateway): prevent stale memory overwrites by flush agent (#2670)
The gateway memory flush agent reviews old conversation history on session
reset/expiry and writes to memory. It had no awareness of memory changes
made after that conversation ended (by the live agent, cron jobs, or other
sessions), causing silent overwrites of newer entries.

Two fixes:

1. Skip memory flush entirely for cron sessions (session IDs starting with
   'cron_'). Cron sessions are headless with no meaningful user conversation
   to extract memories from.

2. Inject the current live memory state (MEMORY.md + USER.md) directly into
   the flush prompt. The flush agent can now see what's already saved and
   make informed decisions — only adding genuinely new information rather
   than blindly overwriting entries that may have been updated since the
   conversation ended.

Addresses the root cause identified in #2670: the flush agent was making
memory decisions blind to the current state of memory, causing stale
context to overwrite newer entries on gateway restarts and session resets.

Co-authored-by: devorun <devorun@users.noreply.github.com>
Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
2026-03-23 16:08:38 -07:00
Teknium
d35df0db71 fix(discord): ignore system messages in on_message handler (#2618)
Cherry-picked from PR #2575 by ticketclosed-wontfix.

Filters out Discord system messages (thread renames, pins, member joins,
boosts) that were being treated as regular user messages.

Follow-up fix: also allow MessageType.reply (value 19) — the original
filter only allowed MessageType.default, which would silently drop all
reply-based interactions.

Added pytest.importorskip for discord dependency in tests.
2026-03-23 06:50:09 -07:00
Teknium
3b509da571 feat: auto-reconnect failed gateway platforms with exponential backoff (#2584)
When a messaging platform fails to connect at startup (e.g. transient DNS
failure) or disconnects at runtime with a retryable error, the gateway now
queues it for background reconnection instead of giving up permanently.

- New _platform_reconnect_watcher background task runs alongside the
  existing session expiry watcher
- Exponential backoff: 30s, 60s, 120s, 240s, 300s cap
- Max 20 retry attempts before giving up on a platform
- Non-retryable errors (bad auth token, etc.) are not retried
- Runtime disconnections via _handle_adapter_fatal_error now queue
  retryable failures instead of triggering gateway shutdown
- On successful reconnect, adapter is wired up and channel directory
  is rebuilt automatically

Fixes the case where a DNS blip during gateway startup caused Telegram
and Discord to be permanently unavailable until manual restart.
2026-03-22 23:48:24 -07:00
Teknium
b799bca7a3 refactor(gateway): remove broken 1.4x hygiene multiplier entirely
The previous commit capped the 1.4x at 95% of context, but the multiplier
itself is unnecessary and confusing:

  85% threshold × 1.4 = 119% of context → never fires
  95% warn      × 1.4 = 133% of context → never warns

The 85% hygiene threshold already provides ample headroom over the agent's
own 50% compressor. Even if rough estimates overestimate by 50%, hygiene
would fire at ~57% actual usage — safe and harmless.

Remove the multiplier entirely. Both actual and estimated token paths
now use the same 85% / 95% thresholds. Update tests and comments.
2026-03-22 15:21:18 -07:00
Teknium
b2b4a9ee7d fix(gateway): hygiene compression ignores config context_length and 1.4x exceeds model limit
Three bugs in gateway session hygiene pre-compression caused 'Session too
large' errors for ~200K context models like GLM-5-turbo on z.ai:

1. Gateway hygiene called get_model_context_length(model) without passing
   config_context_length, provider, or base_url — so user overrides like
   model.context_length: 180000 were ignored, and provider-aware detection
   (models.dev, z.ai endpoint) couldn't fire. The agent's own compressor
   correctly passed all three (run_agent.py line 1038).

2. The 1.4x safety factor on rough token estimates pushed the compression
   threshold above the model's actual context limit:
     200K * 0.85 * 1.4 = 238K > 200K (model limit)
   So hygiene never compressed, sessions grew past the limit, and the API
   rejected the request.

3. Same issue for the warn threshold: 200K * 0.95 * 1.4 = 266K.

Fix:
- Read model.context_length, provider, and base_url from config.yaml
  (same as run_agent.py does) and pass them to get_model_context_length()
- Resolve provider/base_url from runtime when not in config
- Cap the 1.4x-adjusted compress threshold at 95% of context_length
- Cap the 1.4x-adjusted warn threshold at context_length

Affects: z.ai GLM-5/GLM-5-turbo, any ~200K or smaller context model
where the 1.4x factor would push 85% above 100%.

Ref: Discord report from Ddox — glm-5-turbo on z.ai coding plan
2026-03-22 15:15:37 -07:00
Teknium
cd2280d1a3 feat(gateway): notify users when session auto-resets (#2519)
When a session expires (daily schedule or idle timeout) and is
automatically reset, send a notification to the user explaining
what happened:

  ◐ Session automatically reset (inactive for 24h).
    Conversation history cleared.
  Use /resume to browse and restore a previous session.
  Adjust reset timing in config.yaml under session_reset.

Notifications are suppressed when:
- The expired session had no activity (no tokens used)
- The platform is excluded (api_server, webhook by default)
- notify: false in config

Changes:
- session.py: _should_reset() returns reason string ('idle'/'daily')
  instead of bool; SessionEntry gains auto_reset_reason and
  reset_had_activity fields; old entry's total_tokens checked
- config.py: SessionResetPolicy gains notify (bool, default: true)
  and notify_exclude_platforms (default: api_server, webhook)
- run.py: sends notification via adapter.send() before processing
  the user's message, with activity + platform checks
- 13 new tests

Config (config.yaml):

  session_reset:
    notify: true
    notify_exclude_platforms: [api_server, webhook]
2026-03-22 09:33:39 -07:00
Teknium
afe2f0abe1 feat(discord): add document caching and text-file injection (#2503)
- Download and cache .pdf, .docx, .xlsx, .pptx attachments locally
  instead of passing expiring CDN URLs to the agent
- Inject .txt and .md content (≤100 KB) into event.text so the agent
  sees file content without needing to fetch the URL
- Add 20 MB size guard and SUPPORTED_DOCUMENT_TYPES allowlist
- Fix: unsupported types (.zip etc.) no longer get MessageType.DOCUMENT
- Add 9 unit tests in test_discord_document_handling.py

Mirrors the Slack implementation from PR #784. Discord CDN URLs are
publicly accessible so no auth header is needed (unlike Slack).

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
2026-03-22 07:38:14 -07:00
Teknium
be3eb62047 fix(tests): resolve all consistently failing tests
- test_plugins.py: remove tests for unimplemented plugin command API
  (get_plugin_command_handler, register_command never existed)
- test_redact.py: add autouse fixture to clear HERMES_REDACT_SECRETS
  env var leaked by cli.py import in other tests
- test_signal.py: same HERMES_REDACT_SECRETS fix for phone redaction
- test_mattermost.py: add @bot_user_id to test messages after the
  mention-only filter was added in #2443
- test_context_token_tracking.py: mock resolve_provider_client for
  openai-codex provider that requires real OAuth credentials

Full suite: 5893 passed, 0 failed.
2026-03-22 05:58:26 -07:00
Teknium
ff071fc74c fix(gateway): process /queue'd messages after agent completion (#2469)
* fix: respect DashScope v1 runtime mode for alibaba

Remove the hardcoded Alibaba branch from resolve_runtime_provider()
that forced api_mode='anthropic_messages' regardless of the base URL.

Alibaba now goes through the generic API-key provider path, which
auto-detects the protocol from the URL:
- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints
(e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because
runtime always forced Anthropic mode even when setup saved a /v1 URL.

Based on PR #2024 by @kshitijk4poor.

* docs(skill): add split, merge, search examples to ocr-and-documents skill

Adds pymupdf examples for PDF splitting, merging, and text search
to the existing ocr-and-documents skill. No new dependencies — pymupdf
already covers all three operations natively.

* fix: replace all production print() calls with logger in rl_training_tool

Replace all bare print() calls in production code paths with proper logger calls.

- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()

Using the logging framework allows log level filtering, proper formatting,
and log routing instead of always printing to stdout.

* fix(gateway): process /queue'd messages after agent completion

/queue stored messages in adapter._pending_messages but never consumed
them after normal (non-interrupted) completion. The consumption path
at line 5219 only checked pending messages when result.get('interrupted')
was True — since /queue deliberately doesn't interrupt, queued messages
were silently dropped.

Now checks adapter._pending_messages after both interrupted AND normal
completion. For queued messages (non-interrupt), the first response is
delivered before recursing to process the queued follow-up. Skips the
direct send when streaming already delivered the response.

Reported by GhostMode on Discord.

---------

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-22 04:56:13 -07:00
Teknium
8d528e0045 fix(api_server): persist ResponseStore to SQLite across restarts (#2472)
The /v1/responses endpoint used an in-memory OrderedDict that lost
all conversation state on gateway restart. Replace with SQLite-backed
storage at ~/.hermes/response_store.db.

- Responses and conversation name mappings survive restarts
- Same LRU eviction behavior (configurable max_size)
- WAL mode for concurrent read performance
- Falls back to in-memory SQLite if disk path unavailable
- Conversation name→response_id mapping moved into the store
2026-03-22 04:56:06 -07:00
Teknium
0e64a48743 Merge pull request #2460 from NousResearch/hermes/hermes-5d6932ba
fix(discord): properly route slash event handling in threads
2026-03-22 04:28:53 -07:00
Teknium
ffa8b562e9 fix(discord): properly route slash event handling in threads
Cherry-picked from PR #2017 by @simpolism. Fixes #2011.

Discord slash commands in threads were missing thread_id in the
SessionSource, causing them to route to the parent channel session.
Commands like /usage and /reset returned wrong data or affected the
wrong session.

Detects discord.Thread channels in _build_slash_event and sets
chat_type='thread' with thread_id. Two tests added.
2026-03-22 04:25:19 -07:00
Teknium
0f1c970179 fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:

1. Startup availability check — cron module imported once at class load,
   endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
   positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
   repeat/enabled pass through to cron.jobs.update_job, preventing
   arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
   helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
   cron-unavailable cases
2026-03-22 04:18:18 -07:00
Teknium
e109a8b502 fix(security): block untrusted browser access to api server (#2451)
Co-authored-by: ifrederico <fr@tecompanytea.com>
2026-03-22 04:08:48 -07:00
Teknium
342096b4bd feat(gateway): cache AIAgent per session for prompt caching
The gateway created a fresh AIAgent per message, rebuilding the system
prompt (including memory, skills, context files) every turn. This broke
prompt prefix caching — providers like Anthropic charge ~10x more for
uncached prefixes.

Now caches AIAgent instances per session_key with a config signature.
The cached agent is reused across messages in the same session,
preserving the frozen system prompt and tool schemas. Cache is
invalidated when:
- Config changes (model, provider, toolsets, reasoning, ephemeral
  prompt) — detected via signature mismatch
- /new, /reset, /clear — explicit session reset
- /model — global model change clears all cached agents
- /reasoning — global reasoning change clears all cached agents

Per-message state (callbacks, stream consumers, progress queues) is
set on the agent instance before each run_conversation() call.

This matches CLI behavior where a single AIAgent lives across all turns
in a session, with _cached_system_prompt built once and reused.
2026-03-21 16:21:06 -07:00
unmodeled-tyler
fb48b8f0c5 fix(gateway): pass message_thread_id in send_image_file, send_document, send_video
Fixes #1803. send_image_file, send_document, and send_video were missing
message_thread_id forwarding, causing them to fail in Telegram forum/supergroups
where thread_id is required. send_voice already handled this correctly. Adds
metadata parameter + message_thread_id to all three methods, and adds tests
covering the thread_id forwarding path.
2026-03-21 09:49:33 -07:00
Teknium
8304a7716d fix(gateway): restart on whatsapp bridge child exit (#2334)
Co-authored-by: Frederico Ribeiro <fr@tecompanytea.com>
2026-03-21 09:38:52 -07:00
Himess
bc15f6cca3 fix(mattermost): use MIME types for media attachments
Bare strings like "image", "audio", "document" were appended to
media_types, but downstream run.py checks mtype.startswith("image/")
and mtype.startswith("audio/"), which never matched. This caused all
Mattermost file attachments to be silently dropped from vision/STT
processing. Use the actual MIME type from file_info instead.
2026-03-21 09:31:15 -07:00
Teknium
28bb0e770f fix(voice): enable TTS voice reply when streaming is active (#2322)
When streaming is enabled, the base adapter receives None from
_handle_message (already_sent=True) and cannot run auto-TTS for
voice input. The runner was unconditionally skipping voice input
TTS assuming the base adapter would handle it.

Now the runner takes over TTS responsibility when streaming has
already delivered the text response, so voice channel playback
works with both streaming on and off.

Streaming off behavior is unchanged (default already_sent=False
preserves the original code path exactly).

Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
2026-03-21 08:08:37 -07:00
Teknium
488a30e879 fix(gateway): retry Telegram 409 polling conflicts before giving up
A single Telegram 409 Conflict from getUpdates permanently killed
Telegram polling with no recovery possible (retryable=False on
first occurrence).  This is too aggressive for production use with
process supervisors.

Transient 409s are expected during:
- --replace handoffs where the old long-poll session lingers on
  Telegram servers for a few seconds after SIGTERM
- systemd Restart=on-failure respawns that overlap with the dying
  instance cleanup

Now _handle_polling_conflict() retries up to 3 times with a
10-second delay between attempts.  The 30-second total retry window
lets stale server-side sessions expire.  If all retries fail, the
error is still marked as permanently fatal — preserving the original
protection against genuine dual-instance conflicts.

Tests updated: split the single conflict test into two — one verifying
retry on transient conflict, one verifying fatal after exhausted
retries.

Closes #2296
2026-03-21 07:11:06 -07:00
Teknium
f853e50589 Merge pull request #2199 from llbn/fix/telegram-markdownv2-features
Clean PR, well-tested. Adds MarkdownV2 strikethrough, spoiler, and blockquote support to Telegram adapter.
2026-03-20 12:45:47 -07:00
llbn
43b3a0ac66 fix(telegram): escape backslashes and backticks inside code entities for MarkdownV2
- Escape \ → \\ inside inline code and fenced code blocks
- Escape ` → \` inside fenced code block bodies (not delimiters)
- Add regression tests for code entity backslash handling
2026-03-20 18:32:45 +01:00
llbn
02f639e561 fix(telegram): add MarkdownV2 support for strikethrough, spoiler, and blockquotes
- Convert ~~text~~ to ~text~ (MarkdownV2 strikethrough)
- Protect ||text|| from pipe escaping (MarkdownV2 spoiler)
- Preserve > at line start as blockquote instead of escaping it
- Update _strip_mdv2() to strip ~strikethrough~ and ||spoiler|| markers
- Add tests covering new formatting paths and edge cases
2026-03-20 18:21:24 +01:00
Test
e140c02d51 feat(gateway): add webhook platform adapter for external event triggers
Add a generic webhook platform adapter that receives HTTP POSTs from
external services (GitHub, GitLab, JIRA, Stripe, etc.), validates HMAC
signatures, transforms payloads into agent prompts, and routes responses
back to the source or to another platform.

Features:
- Configurable routes with per-route HMAC secrets, event filters,
  prompt templates with dot-notation payload access, skill loading,
  and pluggable delivery (github_comment, telegram, discord, log)
- HMAC signature validation (GitHub SHA-256, GitLab token, generic)
- Rate limiting (30 req/min per route, configurable)
- Idempotency cache (1hr TTL, prevents duplicate runs on retries)
- Body size limits (1MB default, checked before reading payload)
- Setup wizard integration with security warnings and docs links
- 33 tests (29 unit + 4 integration), all passing

Security:
- HMAC secret required per route (startup validation)
- Setup wizard warns about internet exposure for webhook/SMS platforms
- Sandboxing (Docker/VM) recommended in docs for public-facing deployments

Files changed:
- gateway/config.py — Platform.WEBHOOK enum + env var overrides
- gateway/platforms/webhook.py — WebhookAdapter (~420 lines)
- gateway/run.py — factory wiring + auth bypass for webhook events
- hermes_cli/config.py — WEBHOOK_* env var definitions
- hermes_cli/setup.py — webhook section in setup_gateway()
- tests/gateway/test_webhook_adapter.py — 29 unit tests
- tests/gateway/test_webhook_integration.py — 4 integration tests
- website/docs/user-guide/messaging/webhooks.md — full user docs
- website/docs/reference/environment-variables.md — WEBHOOK_* vars
- website/sidebars.ts — nav entry
2026-03-20 06:33:36 -07:00
Test
fc061c2fee fix: harden sentinel guard for /stop during setup and shutdown
- /stop during sentinel returns helpful message instead of queuing
- Shutdown loop skips sentinel entries instead of catching AttributeError
- _handle_stop_command guards against sentinel (defensive)
- Added tests for both edge cases (7 total race guard tests)
2026-03-19 18:26:09 -07:00
Gutslabs
aaa96713d4 fix(gateway): prevent concurrent agent runs for the same session
Place a sentinel in _running_agents immediately after the "already
running" guard check passes — before any await.  Without this, the
numerous await points between the guard (line 1324) and agent
registration (track_agent at line 4790) create a window where a
second message for the same session can bypass the guard and start
a duplicate agent, corrupting the transcript.

The await gap includes: hook emissions, vision enrichment (external
API call), audio transcription (external API call), session hygiene
compression, and the run_in_executor call itself.  For messages with
media attachments the window can be several seconds wide.

The sentinel is wrapped in try/finally so it is always cleaned up —
even if the handler raises or takes an early-return path.  When the
real AIAgent is created, track_agent() overwrites the sentinel with
the actual instance (preserving interrupt support).

Also handles the edge case where a message arrives while the sentinel
is set but no real agent exists yet: the message is queued via the
adapter's pending-message mechanism instead of attempting to call
interrupt() on the sentinel object.
2026-03-19 18:23:24 -07:00
Teknium
7b6d14e62a fix(gateway): replace bare text approval with /approve and /deny commands (#2002)
The gateway approval system previously intercepted bare 'yes'/'no' text
from the user's next message to approve/deny dangerous commands. This was
fragile and dangerous — if the agent asked a clarify question and the user
said 'yes' to answer it, the gateway would execute the pending dangerous
command instead. (Fixes #1888)

Changes:
- Remove bare text matching ('yes', 'y', 'approve', 'ok', etc.) from
  _handle_message approval check
- Add /approve and /deny as gateway-only slash commands in the command
  registry
- /approve supports scoping: /approve (one-time), /approve session,
  /approve always (permanent)
- Add 5-minute timeout for stale approvals
- Gateway appends structured instructions to the agent response when a
  dangerous command is pending, telling the user exactly how to respond
- 9 tests covering approve, deny, timeout, scoping, and verification
  that bare 'yes' no longer triggers execution

Credit to @solo386 and @FlyByNight69420 for identifying and reporting
this security issue in PR #1971 and issue #1888.

Co-authored-by: Test <test@test.com>
2026-03-18 16:58:20 -07:00
Teknium
0a247a50f2 feat: support ignoring unauthorized gateway DMs (#1919)
Add unauthorized_dm_behavior config (pair|ignore) with global default
and per-platform override. WhatsApp can silently drop unknown DMs
instead of sending pairing codes.

Adapted config bridging to work with gw_data dict (pre-construction)
rather than config object. Dropped implementation plan document.

Co-authored-by: Frederico Ribeiro <fr@tecompanytea.com>
2026-03-18 04:06:08 -07:00
TheSameCat2
5c4c4b8b7d fix(gateway): detect script-style gateway processes for --replace
Recognize hermes_cli/main.py gateway command lines in gateway
process detection and PID validation so --replace reliably finds
existing gateway instances.

Adds a regression test covering script-style cmdline detection.

Closes #1830
2026-03-18 03:12:59 -07:00
Teknium
dd60bcbfb7 feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter

Salvaged from PR #956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
Teknium
702191049f fix(session): skip corrupt lines in load_transcript instead of crashing (#1744)
Wrap json.loads() in load_transcript() with try/except JSONDecodeError
so that partial JSONL lines (from mid-write crashes like OOM/SIGKILL)
are skipped with a warning instead of crashing the entire transcript
load. The rest of the history loads fine.

Adds a logger.warning with the session ID and truncated corrupt line
content for debugging visibility.

Salvaged from PR #1193 by alireza78a.
Closes #1193
2026-03-17 05:18:12 -07:00
Teknium
d87655afff fix(gateway): persist watcher metadata in checkpoint for crash recovery (#1706)
Salvaged from PR #1573 by @eren-karakus0. Cherry-picked with authorship preserved.

Fixes #1143 — background process notifications resume after gateway restart.

Co-authored-by: Muhammet Eren Karakuş <erenkar950@gmail.com>
2026-03-17 03:52:15 -07:00
Teknium
d417ba2a48 feat: add route-aware pricing estimates (#1695)
Salvaged from PR #1563 by @kshitijk4poor. Cherry-picked with authorship preserved.

- Route-aware pricing architecture replacing static MODEL_PRICING + heuristics
- Canonical usage normalization (Anthropic/OpenAI/Codex API shapes)
- Cache-aware billing (separate cache_read/cache_write rates)
- Cost status tracking (estimated/included/unknown/actual)
- OpenRouter live pricing via models API
- Schema migration v4→v5 with billing metadata columns
- Removed speculative forward-looking entries
- Removed cost display from CLI status bar
- Threaded OpenRouter metadata pre-warm

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-17 03:44:44 -07:00
teknium1
c3ce6108e3 test: add comprehensive tests for Mattermost and Matrix adapters
77 tests covering:

Mattermost (37 tests):
- Platform enum and config loading
- Message formatting (image markdown stripping)
- Message chunking at 4000 chars
- Send with mocked aiohttp (payload, threading, errors)
- WebSocket event parsing (double-encoded JSON!)
- File upload flow
- Post dedup cache (TTL, pruning)
- Requirements check

Matrix (40 tests):
- Platform enum and config loading (token + password auth, E2EE)
- mxc:// to HTTP URL conversion (authenticated v1.11+ endpoint)
- DM detection via m.direct cache
- Reply fallback stripping
- Thread detection from m.relates_to
- Message formatting and markdown to HTML
- Display name resolution
- Requirements check
2026-03-17 03:18:16 -07:00
Teknium
07549c967a feat: add SMS (Twilio) platform adapter
Add SMS as a first-class messaging platform via the Twilio API.
Shares credentials with the existing telephony skill — same
TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER env vars.

Adapter (gateway/platforms/sms.py):
- aiohttp webhook server for inbound (Twilio form-encoded POSTs)
- Twilio REST API with Basic auth for outbound
- Markdown stripping, smart chunking at 1600 chars
- Echo loop prevention, phone number redaction in logs

Integration (13 files):
- gateway config, run, channel_directory
- agent prompt_builder (SMS platform hint)
- cron scheduler, cronjob tools
- send_message_tool (_send_sms via Twilio API)
- toolsets (hermes-sms + hermes-gateway)
- gateway setup wizard, status display
- pyproject.toml (sms optional extra)
- 21 tests

Docs:
- website/docs/user-guide/messaging/sms.md (full setup guide)
- Updated messaging index (architecture, toolsets, security, links)
- Updated environment-variables.md reference

Inspired by PR #1575 (@sunsakis), rewritten for Twilio.
2026-03-17 03:14:53 -07:00
Teknium
a6dcc231f8 feat(gateway): add DingTalk platform adapter (#1685)
Add DingTalk as a messaging platform using the dingtalk-stream SDK
for real-time message reception via Stream Mode (no webhook needed).
Replies are sent via session webhook using markdown format.

Features:
- Stream Mode connection (long-lived WebSocket, no public URL needed)
- Text and rich text message support
- DM and group chat support
- Message deduplication with 5-minute window
- Auto-reconnection with exponential backoff
- Session webhook caching for reply routing

Configuration:
  export DINGTALK_CLIENT_ID=your-app-key
  export DINGTALK_CLIENT_SECRET=your-app-secret

  # or in config.yaml:
  platforms:
    dingtalk:
      enabled: true
      extra:
        client_id: your-app-key
        client_secret: your-app-secret

Files:
- gateway/platforms/dingtalk.py (340 lines) — adapter implementation
- gateway/config.py — add DINGTALK to Platform enum
- gateway/run.py — add DingTalk to _create_adapter
- hermes_cli/config.py — add env vars to _EXTRA_ENV_KEYS
- hermes_cli/tools_config.py — add dingtalk to PLATFORMS
- tests/gateway/test_dingtalk.py — 21 tests
2026-03-17 03:04:58 -07:00
Teknium
fd61ae13e5 revert: revert SMS (Telnyx) platform adapter for review
This reverts commit ef67037f8e.
2026-03-17 02:53:30 -07:00
Teknium
ef67037f8e feat: add SMS (Telnyx) platform adapter
Implement SMS as a first-class messaging platform following
ADDING_A_PLATFORM.md checklist. All 16 integration points covered:

- gateway/platforms/sms.py: Core adapter with aiohttp webhook server,
  Telnyx REST API send, markdown stripping, 1600-char chunking,
  echo loop prevention, multi-number reply-from tracking
- gateway/config.py: Platform.SMS enum + env override block
- gateway/run.py: Adapter factory + auth maps (SMS_ALLOWED_USERS,
  SMS_ALLOW_ALL_USERS)
- toolsets.py: hermes-sms toolset + included in hermes-gateway
- cron/scheduler.py: SMS in platform_map for cron delivery
- tools/send_message_tool.py: SMS routing + _send_sms() standalone sender
- tools/cronjob_tools.py: 'sms' in deliver description
- gateway/channel_directory.py: SMS in session-based discovery
- agent/prompt_builder.py: SMS platform hint (plain text, concise)
- hermes_cli/status.py: SMS in platforms status display
- hermes_cli/gateway.py: SMS in setup wizard with Telnyx instructions
- pyproject.toml: sms optional dependency group (aiohttp>=3.9.0)
- tests/gateway/test_sms.py: Unit tests for config, format, truncate,
  echo prevention, requirements, toolset integration

Co-authored-by: sunsakis <teo@sunsakis.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 02:52:34 -07:00
Teknium
d156942419 fix(telegram): aggregate split text messages before dispatching (#1674)
When a user sends a long message, Telegram clients split it into
multiple updates that arrive within milliseconds of each other.
Previously each chunk was dispatched independently — the first would
start the agent, and subsequent chunks would interrupt or queue as
separate turns, causing the agent to only see part of the message.

Add text message batching to TelegramAdapter following the same pattern
as the existing photo burst batching:

- _enqueue_text_event() buffers text by session key, concatenating
  chunks that arrive in rapid succession
- _flush_text_batch() dispatches the combined message after a 0.6s
  quiet period (configurable via HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS)
- Timer resets on each new chunk, so all parts of a split arrive
  before the batch is dispatched

Reported by NulledVector on Discord.
2026-03-17 02:49:57 -07:00
teknium1
c8582fc4a2 fix(discord): persist thread participation across gateway restarts
_bot_participated_threads was an in-memory set — lost on every restart.
After restart, the bot forgot which threads it was active in, requiring
fresh @mentions and potentially creating duplicate threads instead of
continuing existing conversations.

Changes:
- Persist thread IDs to ~/.hermes/discord_threads.json
- Load on adapter init, save on every new thread participation
- _track_thread() replaces direct .add() calls for atomic persist
- Cap at 500 tracked threads to prevent unbounded growth
- /thread slash command also tracks participation
- 7 new tests covering persistence, restart survival, corruption
  recovery, cap enforcement
2026-03-17 02:26:34 -07:00
Teknium
4920c5940f feat: auto-detect local file paths in gateway responses for native media delivery (#1640)
Small models (7B-14B) can't reliably use MEDIA: or IMAGE: syntax. This
adds extract_local_files() to BasePlatformAdapter that regex-detects
bare local file paths ending in image/video extensions, validates them
with os.path.isfile(), and delivers them as native platform attachments.

Hardened over the original PR:
- Code-block exclusion: paths inside fenced blocks and inline code are
  skipped so code samples are never mutilated
- URL rejection: negative lookbehind prevents matching path segments
  inside HTTP URLs
- Relative path rejection: ./foo.png no longer matches
- Tilde path cleanup: raw ~/... form is removed from response text
- Deduplication by expanded path
- Added .webm to _VIDEO_EXTS
- Fallback to send_document for unrecognized media extensions

Based on PR #1636 by sudoingX.

Co-authored-by: sudoingX <sudoingX@users.noreply.github.com>
2026-03-17 01:47:34 -07:00