feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
"""
|
|
|
|
|
OpenAI-compatible API server platform adapter.
|
|
|
|
|
|
|
|
|
|
Exposes an HTTP server with endpoints:
|
2026-04-01 11:29:20 -07:00
|
|
|
- POST /v1/chat/completions — OpenAI Chat Completions format (stateless; opt-in session continuity via X-Hermes-Session-Id header)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
- POST /v1/responses — OpenAI Responses API format (stateful via previous_response_id)
|
|
|
|
|
- GET /v1/responses/{response_id} — Retrieve a stored response
|
|
|
|
|
- DELETE /v1/responses/{response_id} — Delete a stored response
|
|
|
|
|
- GET /v1/models — lists hermes-agent as an available model
|
feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 11:52:46 -07:00
|
|
|
- POST /v1/runs — start a run, returns run_id immediately (202)
|
|
|
|
|
- GET /v1/runs/{run_id}/events — SSE stream of structured lifecycle events
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
- GET /health — health check
|
2026-04-14 05:17:17 +00:00
|
|
|
- GET /health/detailed — rich status for cross-container dashboard probing
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
|
|
|
|
|
AnythingLLM, NextChat, ChatBox, etc.) can connect to hermes-agent
|
|
|
|
|
through this adapter by pointing at http://localhost:8642/v1.
|
|
|
|
|
|
|
|
|
|
Requires:
|
|
|
|
|
- aiohttp (already available in the gateway)
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
import asyncio
|
2026-04-10 04:56:35 -07:00
|
|
|
import hashlib
|
fix(security): consolidated security hardening — SSRF, timing attack, tar traversal, credential leakage (#5944)
Salvaged from PRs #5800 (memosr), #5806 (memosr), #5915 (Ruzzgar), #5928 (Awsh1).
Changes:
- Use hmac.compare_digest for API key comparison (timing attack prevention)
- Apply provider env var blocklist to Docker containers (credential leakage)
- Replace tar.extractall() with safe extraction in TerminalBench2 (CVE-2007-4559)
- Add SSRF protection via is_safe_url to ALL platform adapters:
base.py (cache_image_from_url, cache_audio_from_url),
discord, slack, telegram, matrix, mattermost, feishu, wecom
(Signal and WhatsApp protected via base.py helpers)
- Update tests: mock is_safe_url in Mattermost download tests
- Add security tests for tar extraction (traversal, symlinks, safe files)
2026-04-07 17:28:37 -07:00
|
|
|
import hmac
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
import json
|
|
|
|
|
import logging
|
|
|
|
|
import os
|
2026-04-10 16:40:54 -07:00
|
|
|
import socket as _socket
|
2026-04-10 11:52:01 +08:00
|
|
|
import re
|
2026-03-22 04:56:06 -07:00
|
|
|
import sqlite3
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
import time
|
|
|
|
|
import uuid
|
|
|
|
|
from typing import Any, Dict, List, Optional
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
from aiohttp import web
|
|
|
|
|
AIOHTTP_AVAILABLE = True
|
|
|
|
|
except ImportError:
|
|
|
|
|
AIOHTTP_AVAILABLE = False
|
|
|
|
|
web = None # type: ignore[assignment]
|
|
|
|
|
|
|
|
|
|
from gateway.config import Platform, PlatformConfig
|
|
|
|
|
from gateway.platforms.base import (
|
|
|
|
|
BasePlatformAdapter,
|
|
|
|
|
SendResult,
|
2026-04-10 16:40:54 -07:00
|
|
|
is_network_accessible,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
)
|
|
|
|
|
|
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
|
|
|
|
|
|
# Default settings
|
|
|
|
|
DEFAULT_HOST = "127.0.0.1"
|
|
|
|
|
DEFAULT_PORT = 8642
|
|
|
|
|
MAX_STORED_RESPONSES = 100
|
2026-03-24 19:31:08 -07:00
|
|
|
MAX_REQUEST_BYTES = 1_000_000 # 1 MB default limit for POST bodies
|
2026-04-11 12:42:01 -06:00
|
|
|
CHAT_COMPLETIONS_SSE_KEEPALIVE_SECONDS = 30.0
|
2026-04-12 17:16:16 -07:00
|
|
|
MAX_NORMALIZED_TEXT_LENGTH = 65_536 # 64 KB cap for normalized content parts
|
|
|
|
|
MAX_CONTENT_LIST_SIZE = 1_000 # Max items when content is an array
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _normalize_chat_content(
|
|
|
|
|
content: Any, *, _max_depth: int = 10, _depth: int = 0,
|
|
|
|
|
) -> str:
|
|
|
|
|
"""Normalize OpenAI chat message content into a plain text string.
|
|
|
|
|
|
|
|
|
|
Some clients (Open WebUI, LobeChat, etc.) send content as an array of
|
|
|
|
|
typed parts instead of a plain string::
|
|
|
|
|
|
|
|
|
|
[{"type": "text", "text": "hello"}, {"type": "input_text", "text": "..."}]
|
|
|
|
|
|
|
|
|
|
This function flattens those into a single string so the agent pipeline
|
|
|
|
|
(which expects strings) doesn't choke.
|
|
|
|
|
|
|
|
|
|
Defensive limits prevent abuse: recursion depth, list size, and output
|
|
|
|
|
length are all bounded.
|
|
|
|
|
"""
|
|
|
|
|
if _depth > _max_depth:
|
|
|
|
|
return ""
|
|
|
|
|
if content is None:
|
|
|
|
|
return ""
|
|
|
|
|
if isinstance(content, str):
|
|
|
|
|
return content[:MAX_NORMALIZED_TEXT_LENGTH] if len(content) > MAX_NORMALIZED_TEXT_LENGTH else content
|
|
|
|
|
|
|
|
|
|
if isinstance(content, list):
|
|
|
|
|
parts: List[str] = []
|
|
|
|
|
items = content[:MAX_CONTENT_LIST_SIZE] if len(content) > MAX_CONTENT_LIST_SIZE else content
|
|
|
|
|
for item in items:
|
|
|
|
|
if isinstance(item, str):
|
|
|
|
|
if item:
|
|
|
|
|
parts.append(item[:MAX_NORMALIZED_TEXT_LENGTH])
|
|
|
|
|
elif isinstance(item, dict):
|
|
|
|
|
item_type = str(item.get("type") or "").strip().lower()
|
|
|
|
|
if item_type in {"text", "input_text", "output_text"}:
|
|
|
|
|
text = item.get("text", "")
|
|
|
|
|
if text:
|
|
|
|
|
try:
|
|
|
|
|
parts.append(str(text)[:MAX_NORMALIZED_TEXT_LENGTH])
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
# Silently skip image_url / other non-text parts
|
|
|
|
|
elif isinstance(item, list):
|
|
|
|
|
nested = _normalize_chat_content(item, _max_depth=_max_depth, _depth=_depth + 1)
|
|
|
|
|
if nested:
|
|
|
|
|
parts.append(nested)
|
|
|
|
|
# Check accumulated size
|
|
|
|
|
if sum(len(p) for p in parts) >= MAX_NORMALIZED_TEXT_LENGTH:
|
|
|
|
|
break
|
|
|
|
|
result = "\n".join(parts)
|
|
|
|
|
return result[:MAX_NORMALIZED_TEXT_LENGTH] if len(result) > MAX_NORMALIZED_TEXT_LENGTH else result
|
|
|
|
|
|
|
|
|
|
# Fallback for unexpected types (int, float, bool, etc.)
|
|
|
|
|
try:
|
|
|
|
|
result = str(content)
|
|
|
|
|
return result[:MAX_NORMALIZED_TEXT_LENGTH] if len(result) > MAX_NORMALIZED_TEXT_LENGTH else result
|
|
|
|
|
except Exception:
|
|
|
|
|
return ""
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
|
|
|
|
|
def check_api_server_requirements() -> bool:
|
|
|
|
|
"""Check if API server dependencies are available."""
|
|
|
|
|
return AIOHTTP_AVAILABLE
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class ResponseStore:
|
|
|
|
|
"""
|
2026-03-22 04:56:06 -07:00
|
|
|
SQLite-backed LRU store for Responses API state.
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
Each stored response includes the full internal conversation history
|
|
|
|
|
(with tool calls and results) so it can be reconstructed on subsequent
|
|
|
|
|
requests via previous_response_id.
|
2026-03-22 04:56:06 -07:00
|
|
|
|
|
|
|
|
Persists across gateway restarts. Falls back to in-memory SQLite
|
|
|
|
|
if the on-disk path is unavailable.
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
"""
|
|
|
|
|
|
2026-03-22 04:56:06 -07:00
|
|
|
def __init__(self, max_size: int = MAX_STORED_RESPONSES, db_path: str = None):
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
self._max_size = max_size
|
2026-03-22 04:56:06 -07:00
|
|
|
if db_path is None:
|
|
|
|
|
try:
|
|
|
|
|
from hermes_cli.config import get_hermes_home
|
|
|
|
|
db_path = str(get_hermes_home() / "response_store.db")
|
|
|
|
|
except Exception:
|
|
|
|
|
db_path = ":memory:"
|
|
|
|
|
try:
|
|
|
|
|
self._conn = sqlite3.connect(db_path, check_same_thread=False)
|
|
|
|
|
except Exception:
|
|
|
|
|
self._conn = sqlite3.connect(":memory:", check_same_thread=False)
|
|
|
|
|
self._conn.execute("PRAGMA journal_mode=WAL")
|
|
|
|
|
self._conn.execute(
|
|
|
|
|
"""CREATE TABLE IF NOT EXISTS responses (
|
|
|
|
|
response_id TEXT PRIMARY KEY,
|
|
|
|
|
data TEXT NOT NULL,
|
|
|
|
|
accessed_at REAL NOT NULL
|
|
|
|
|
)"""
|
|
|
|
|
)
|
|
|
|
|
self._conn.execute(
|
|
|
|
|
"""CREATE TABLE IF NOT EXISTS conversations (
|
|
|
|
|
name TEXT PRIMARY KEY,
|
|
|
|
|
response_id TEXT NOT NULL
|
|
|
|
|
)"""
|
|
|
|
|
)
|
|
|
|
|
self._conn.commit()
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
def get(self, response_id: str) -> Optional[Dict[str, Any]]:
|
2026-03-22 04:56:06 -07:00
|
|
|
"""Retrieve a stored response by ID (updates access time for LRU)."""
|
|
|
|
|
row = self._conn.execute(
|
|
|
|
|
"SELECT data FROM responses WHERE response_id = ?", (response_id,)
|
|
|
|
|
).fetchone()
|
|
|
|
|
if row is None:
|
|
|
|
|
return None
|
|
|
|
|
import time
|
|
|
|
|
self._conn.execute(
|
|
|
|
|
"UPDATE responses SET accessed_at = ? WHERE response_id = ?",
|
|
|
|
|
(time.time(), response_id),
|
|
|
|
|
)
|
|
|
|
|
self._conn.commit()
|
|
|
|
|
return json.loads(row[0])
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
def put(self, response_id: str, data: Dict[str, Any]) -> None:
|
|
|
|
|
"""Store a response, evicting the oldest if at capacity."""
|
2026-03-22 04:56:06 -07:00
|
|
|
import time
|
|
|
|
|
self._conn.execute(
|
|
|
|
|
"INSERT OR REPLACE INTO responses (response_id, data, accessed_at) VALUES (?, ?, ?)",
|
|
|
|
|
(response_id, json.dumps(data, default=str), time.time()),
|
|
|
|
|
)
|
|
|
|
|
# Evict oldest entries beyond max_size
|
|
|
|
|
count = self._conn.execute("SELECT COUNT(*) FROM responses").fetchone()[0]
|
|
|
|
|
if count > self._max_size:
|
|
|
|
|
self._conn.execute(
|
|
|
|
|
"DELETE FROM responses WHERE response_id IN "
|
|
|
|
|
"(SELECT response_id FROM responses ORDER BY accessed_at ASC LIMIT ?)",
|
|
|
|
|
(count - self._max_size,),
|
|
|
|
|
)
|
|
|
|
|
self._conn.commit()
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
def delete(self, response_id: str) -> bool:
|
|
|
|
|
"""Remove a response from the store. Returns True if found and deleted."""
|
2026-03-22 04:56:06 -07:00
|
|
|
cursor = self._conn.execute(
|
|
|
|
|
"DELETE FROM responses WHERE response_id = ?", (response_id,)
|
|
|
|
|
)
|
|
|
|
|
self._conn.commit()
|
|
|
|
|
return cursor.rowcount > 0
|
|
|
|
|
|
|
|
|
|
def get_conversation(self, name: str) -> Optional[str]:
|
|
|
|
|
"""Get the latest response_id for a conversation name."""
|
|
|
|
|
row = self._conn.execute(
|
|
|
|
|
"SELECT response_id FROM conversations WHERE name = ?", (name,)
|
|
|
|
|
).fetchone()
|
|
|
|
|
return row[0] if row else None
|
|
|
|
|
|
|
|
|
|
def set_conversation(self, name: str, response_id: str) -> None:
|
|
|
|
|
"""Map a conversation name to its latest response_id."""
|
|
|
|
|
self._conn.execute(
|
|
|
|
|
"INSERT OR REPLACE INTO conversations (name, response_id) VALUES (?, ?)",
|
|
|
|
|
(name, response_id),
|
|
|
|
|
)
|
|
|
|
|
self._conn.commit()
|
|
|
|
|
|
|
|
|
|
def close(self) -> None:
|
|
|
|
|
"""Close the database connection."""
|
|
|
|
|
try:
|
|
|
|
|
self._conn.close()
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
def __len__(self) -> int:
|
2026-03-22 04:56:06 -07:00
|
|
|
row = self._conn.execute("SELECT COUNT(*) FROM responses").fetchone()
|
|
|
|
|
return row[0] if row else 0
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# CORS middleware
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
_CORS_HEADERS = {
|
|
|
|
|
"Access-Control-Allow-Methods": "GET, POST, DELETE, OPTIONS",
|
2026-03-28 08:16:41 -07:00
|
|
|
"Access-Control-Allow-Headers": "Authorization, Content-Type, Idempotency-Key",
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
if AIOHTTP_AVAILABLE:
|
|
|
|
|
@web.middleware
|
|
|
|
|
async def cors_middleware(request, handler):
|
2026-03-22 04:08:48 -07:00
|
|
|
"""Add CORS headers for explicitly allowed origins; handle OPTIONS preflight."""
|
|
|
|
|
adapter = request.app.get("api_server_adapter")
|
|
|
|
|
origin = request.headers.get("Origin", "")
|
|
|
|
|
cors_headers = None
|
|
|
|
|
if adapter is not None:
|
|
|
|
|
if not adapter._origin_allowed(origin):
|
|
|
|
|
return web.Response(status=403)
|
|
|
|
|
cors_headers = adapter._cors_headers_for_origin(origin)
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
if request.method == "OPTIONS":
|
2026-03-22 04:08:48 -07:00
|
|
|
if cors_headers is None:
|
|
|
|
|
return web.Response(status=403)
|
|
|
|
|
return web.Response(status=200, headers=cors_headers)
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
response = await handler(request)
|
2026-03-22 04:08:48 -07:00
|
|
|
if cors_headers is not None:
|
|
|
|
|
response.headers.update(cors_headers)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
return response
|
|
|
|
|
else:
|
|
|
|
|
cors_middleware = None # type: ignore[assignment]
|
|
|
|
|
|
|
|
|
|
|
2026-03-24 19:31:08 -07:00
|
|
|
def _openai_error(message: str, err_type: str = "invalid_request_error", param: str = None, code: str = None) -> Dict[str, Any]:
|
|
|
|
|
"""OpenAI-style error envelope."""
|
|
|
|
|
return {
|
|
|
|
|
"error": {
|
|
|
|
|
"message": message,
|
|
|
|
|
"type": err_type,
|
|
|
|
|
"param": param,
|
|
|
|
|
"code": code,
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
if AIOHTTP_AVAILABLE:
|
|
|
|
|
@web.middleware
|
|
|
|
|
async def body_limit_middleware(request, handler):
|
|
|
|
|
"""Reject overly large request bodies early based on Content-Length."""
|
|
|
|
|
if request.method in ("POST", "PUT", "PATCH"):
|
|
|
|
|
cl = request.headers.get("Content-Length")
|
|
|
|
|
if cl is not None:
|
|
|
|
|
try:
|
|
|
|
|
if int(cl) > MAX_REQUEST_BYTES:
|
|
|
|
|
return web.json_response(_openai_error("Request body too large.", code="body_too_large"), status=413)
|
|
|
|
|
except ValueError:
|
|
|
|
|
return web.json_response(_openai_error("Invalid Content-Length header.", code="invalid_content_length"), status=400)
|
|
|
|
|
return await handler(request)
|
|
|
|
|
else:
|
|
|
|
|
body_limit_middleware = None # type: ignore[assignment]
|
|
|
|
|
|
2026-03-28 14:00:52 -07:00
|
|
|
_SECURITY_HEADERS = {
|
|
|
|
|
"X-Content-Type-Options": "nosniff",
|
|
|
|
|
"Referrer-Policy": "no-referrer",
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
if AIOHTTP_AVAILABLE:
|
|
|
|
|
@web.middleware
|
|
|
|
|
async def security_headers_middleware(request, handler):
|
|
|
|
|
"""Add security headers to all responses (including errors)."""
|
|
|
|
|
response = await handler(request)
|
|
|
|
|
for k, v in _SECURITY_HEADERS.items():
|
|
|
|
|
response.headers.setdefault(k, v)
|
|
|
|
|
return response
|
|
|
|
|
else:
|
|
|
|
|
security_headers_middleware = None # type: ignore[assignment]
|
|
|
|
|
|
2026-03-24 19:31:08 -07:00
|
|
|
|
|
|
|
|
class _IdempotencyCache:
|
|
|
|
|
"""In-memory idempotency cache with TTL and basic LRU semantics."""
|
|
|
|
|
def __init__(self, max_items: int = 1000, ttl_seconds: int = 300):
|
|
|
|
|
from collections import OrderedDict
|
|
|
|
|
self._store = OrderedDict()
|
|
|
|
|
self._ttl = ttl_seconds
|
|
|
|
|
self._max = max_items
|
|
|
|
|
|
|
|
|
|
def _purge(self):
|
|
|
|
|
import time as _t
|
|
|
|
|
now = _t.time()
|
|
|
|
|
expired = [k for k, v in self._store.items() if now - v["ts"] > self._ttl]
|
|
|
|
|
for k in expired:
|
|
|
|
|
self._store.pop(k, None)
|
|
|
|
|
while len(self._store) > self._max:
|
|
|
|
|
self._store.popitem(last=False)
|
|
|
|
|
|
|
|
|
|
async def get_or_set(self, key: str, fingerprint: str, compute_coro):
|
|
|
|
|
self._purge()
|
|
|
|
|
item = self._store.get(key)
|
|
|
|
|
if item and item["fp"] == fingerprint:
|
|
|
|
|
return item["resp"]
|
|
|
|
|
resp = await compute_coro()
|
|
|
|
|
import time as _t
|
|
|
|
|
self._store[key] = {"resp": resp, "fp": fingerprint, "ts": _t.time()}
|
|
|
|
|
self._purge()
|
|
|
|
|
return resp
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
_idem_cache = _IdempotencyCache()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _make_request_fingerprint(body: Dict[str, Any], keys: List[str]) -> str:
|
|
|
|
|
from hashlib import sha256
|
|
|
|
|
subset = {k: body.get(k) for k in keys}
|
|
|
|
|
return sha256(repr(subset).encode("utf-8")).hexdigest()
|
|
|
|
|
|
|
|
|
|
|
2026-04-10 04:56:35 -07:00
|
|
|
def _derive_chat_session_id(
|
|
|
|
|
system_prompt: Optional[str],
|
|
|
|
|
first_user_message: str,
|
|
|
|
|
) -> str:
|
|
|
|
|
"""Derive a stable session ID from the conversation's first user message.
|
|
|
|
|
|
|
|
|
|
OpenAI-compatible frontends (Open WebUI, LibreChat, etc.) send the full
|
|
|
|
|
conversation history with every request. The system prompt and first user
|
|
|
|
|
message are constant across all turns of the same conversation, so hashing
|
|
|
|
|
them produces a deterministic session ID that lets the API server reuse
|
|
|
|
|
the same Hermes session (and therefore the same Docker container sandbox
|
|
|
|
|
directory) across turns.
|
|
|
|
|
"""
|
|
|
|
|
seed = f"{system_prompt or ''}\n{first_user_message}"
|
|
|
|
|
digest = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:16]
|
|
|
|
|
return f"api-{digest}"
|
|
|
|
|
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
class APIServerAdapter(BasePlatformAdapter):
|
|
|
|
|
"""
|
|
|
|
|
OpenAI-compatible HTTP API server adapter.
|
|
|
|
|
|
|
|
|
|
Runs an aiohttp web server that accepts OpenAI-format requests
|
|
|
|
|
and routes them through hermes-agent's AIAgent.
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
def __init__(self, config: PlatformConfig):
|
|
|
|
|
super().__init__(config, Platform.API_SERVER)
|
|
|
|
|
extra = config.extra or {}
|
|
|
|
|
self._host: str = extra.get("host", os.getenv("API_SERVER_HOST", DEFAULT_HOST))
|
|
|
|
|
self._port: int = int(extra.get("port", os.getenv("API_SERVER_PORT", str(DEFAULT_PORT))))
|
|
|
|
|
self._api_key: str = extra.get("key", os.getenv("API_SERVER_KEY", ""))
|
2026-03-22 04:08:48 -07:00
|
|
|
self._cors_origins: tuple[str, ...] = self._parse_cors_origins(
|
|
|
|
|
extra.get("cors_origins", os.getenv("API_SERVER_CORS_ORIGINS", "")),
|
|
|
|
|
)
|
2026-04-09 17:07:29 -07:00
|
|
|
self._model_name: str = self._resolve_model_name(
|
|
|
|
|
extra.get("model_name", os.getenv("API_SERVER_MODEL_NAME", "")),
|
|
|
|
|
)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
self._app: Optional["web.Application"] = None
|
|
|
|
|
self._runner: Optional["web.AppRunner"] = None
|
|
|
|
|
self._site: Optional["web.TCPSite"] = None
|
|
|
|
|
self._response_store = ResponseStore()
|
feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 11:52:46 -07:00
|
|
|
# Active run streams: run_id -> asyncio.Queue of SSE event dicts
|
|
|
|
|
self._run_streams: Dict[str, "asyncio.Queue[Optional[Dict]]"] = {}
|
|
|
|
|
# Creation timestamps for orphaned-run TTL sweep
|
|
|
|
|
self._run_streams_created: Dict[str, float] = {}
|
2026-04-01 11:29:20 -07:00
|
|
|
self._session_db: Optional[Any] = None # Lazy-init SessionDB for session continuity
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
2026-03-22 04:08:48 -07:00
|
|
|
@staticmethod
|
|
|
|
|
def _parse_cors_origins(value: Any) -> tuple[str, ...]:
|
|
|
|
|
"""Normalize configured CORS origins into a stable tuple."""
|
|
|
|
|
if not value:
|
|
|
|
|
return ()
|
|
|
|
|
|
|
|
|
|
if isinstance(value, str):
|
|
|
|
|
items = value.split(",")
|
|
|
|
|
elif isinstance(value, (list, tuple, set)):
|
|
|
|
|
items = value
|
|
|
|
|
else:
|
|
|
|
|
items = [str(value)]
|
|
|
|
|
|
|
|
|
|
return tuple(str(item).strip() for item in items if str(item).strip())
|
|
|
|
|
|
2026-04-09 17:07:29 -07:00
|
|
|
@staticmethod
|
|
|
|
|
def _resolve_model_name(explicit: str) -> str:
|
|
|
|
|
"""Derive the advertised model name for /v1/models.
|
|
|
|
|
|
|
|
|
|
Priority:
|
|
|
|
|
1. Explicit override (config extra or API_SERVER_MODEL_NAME env var)
|
|
|
|
|
2. Active profile name (so each profile advertises a distinct model)
|
|
|
|
|
3. Fallback: "hermes-agent"
|
|
|
|
|
"""
|
|
|
|
|
if explicit and explicit.strip():
|
|
|
|
|
return explicit.strip()
|
|
|
|
|
try:
|
|
|
|
|
from hermes_cli.profiles import get_active_profile_name
|
|
|
|
|
profile = get_active_profile_name()
|
|
|
|
|
if profile and profile not in ("default", "custom"):
|
|
|
|
|
return profile
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
return "hermes-agent"
|
|
|
|
|
|
2026-03-22 04:08:48 -07:00
|
|
|
def _cors_headers_for_origin(self, origin: str) -> Optional[Dict[str, str]]:
|
|
|
|
|
"""Return CORS headers for an allowed browser origin."""
|
|
|
|
|
if not origin or not self._cors_origins:
|
|
|
|
|
return None
|
|
|
|
|
|
|
|
|
|
if "*" in self._cors_origins:
|
|
|
|
|
headers = dict(_CORS_HEADERS)
|
|
|
|
|
headers["Access-Control-Allow-Origin"] = "*"
|
2026-03-28 14:00:03 -07:00
|
|
|
headers["Access-Control-Max-Age"] = "600"
|
2026-03-22 04:08:48 -07:00
|
|
|
return headers
|
|
|
|
|
|
|
|
|
|
if origin not in self._cors_origins:
|
|
|
|
|
return None
|
|
|
|
|
|
|
|
|
|
headers = dict(_CORS_HEADERS)
|
|
|
|
|
headers["Access-Control-Allow-Origin"] = origin
|
|
|
|
|
headers["Vary"] = "Origin"
|
2026-03-28 14:00:03 -07:00
|
|
|
headers["Access-Control-Max-Age"] = "600"
|
2026-03-22 04:08:48 -07:00
|
|
|
return headers
|
|
|
|
|
|
|
|
|
|
def _origin_allowed(self, origin: str) -> bool:
|
|
|
|
|
"""Allow non-browser clients and explicitly configured browser origins."""
|
|
|
|
|
if not origin:
|
|
|
|
|
return True
|
|
|
|
|
|
|
|
|
|
if not self._cors_origins:
|
|
|
|
|
return False
|
|
|
|
|
|
|
|
|
|
return "*" in self._cors_origins or origin in self._cors_origins
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Auth helper
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _check_auth(self, request: "web.Request") -> Optional["web.Response"]:
|
|
|
|
|
"""
|
|
|
|
|
Validate Bearer token from Authorization header.
|
|
|
|
|
|
|
|
|
|
Returns None if auth is OK, or a 401 web.Response on failure.
|
2026-04-10 16:40:54 -07:00
|
|
|
If no API key is configured, all requests are allowed (only when API
|
|
|
|
|
server is local).
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
"""
|
|
|
|
|
if not self._api_key:
|
|
|
|
|
return None # No key configured — allow all (local-only use)
|
|
|
|
|
|
|
|
|
|
auth_header = request.headers.get("Authorization", "")
|
|
|
|
|
if auth_header.startswith("Bearer "):
|
|
|
|
|
token = auth_header[7:].strip()
|
fix(security): consolidated security hardening — SSRF, timing attack, tar traversal, credential leakage (#5944)
Salvaged from PRs #5800 (memosr), #5806 (memosr), #5915 (Ruzzgar), #5928 (Awsh1).
Changes:
- Use hmac.compare_digest for API key comparison (timing attack prevention)
- Apply provider env var blocklist to Docker containers (credential leakage)
- Replace tar.extractall() with safe extraction in TerminalBench2 (CVE-2007-4559)
- Add SSRF protection via is_safe_url to ALL platform adapters:
base.py (cache_image_from_url, cache_audio_from_url),
discord, slack, telegram, matrix, mattermost, feishu, wecom
(Signal and WhatsApp protected via base.py helpers)
- Update tests: mock is_safe_url in Mattermost download tests
- Add security tests for tar extraction (traversal, symlinks, safe files)
2026-04-07 17:28:37 -07:00
|
|
|
if hmac.compare_digest(token, self._api_key):
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
return None # Auth OK
|
|
|
|
|
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": {"message": "Invalid API key", "type": "invalid_request_error", "code": "invalid_api_key"}},
|
|
|
|
|
status=401,
|
|
|
|
|
)
|
|
|
|
|
|
2026-04-03 10:31:11 -07:00
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Session DB helper
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _ensure_session_db(self):
|
|
|
|
|
"""Lazily initialise and return the shared SessionDB instance.
|
|
|
|
|
|
|
|
|
|
Sessions are persisted to ``state.db`` so that ``hermes sessions list``
|
|
|
|
|
shows API-server conversations alongside CLI and gateway ones.
|
|
|
|
|
"""
|
|
|
|
|
if self._session_db is None:
|
|
|
|
|
try:
|
|
|
|
|
from hermes_state import SessionDB
|
|
|
|
|
self._session_db = SessionDB()
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("SessionDB unavailable for API server: %s", e)
|
|
|
|
|
return self._session_db
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Agent creation helper
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _create_agent(
|
|
|
|
|
self,
|
|
|
|
|
ephemeral_system_prompt: Optional[str] = None,
|
|
|
|
|
session_id: Optional[str] = None,
|
|
|
|
|
stream_delta_callback=None,
|
2026-03-30 18:50:27 -07:00
|
|
|
tool_progress_callback=None,
|
2026-04-14 09:48:49 -04:00
|
|
|
tool_start_callback=None,
|
|
|
|
|
tool_complete_callback=None,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
) -> Any:
|
|
|
|
|
"""
|
|
|
|
|
Create an AIAgent instance using the gateway's runtime config.
|
|
|
|
|
|
|
|
|
|
Uses _resolve_runtime_agent_kwargs() to pick up model, api_key,
|
2026-03-26 18:02:26 -07:00
|
|
|
base_url, etc. from config.yaml / env vars. Toolsets are resolved
|
|
|
|
|
from config.yaml platform_toolsets.api_server (same as all other
|
|
|
|
|
gateway platforms), falling back to the hermes-api-server default.
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
"""
|
|
|
|
|
from run_agent import AIAgent
|
2026-03-26 18:02:26 -07:00
|
|
|
from gateway.run import _resolve_runtime_agent_kwargs, _resolve_gateway_model, _load_gateway_config
|
|
|
|
|
from hermes_cli.tools_config import _get_platform_tools
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
runtime_kwargs = _resolve_runtime_agent_kwargs()
|
|
|
|
|
model = _resolve_gateway_model()
|
|
|
|
|
|
2026-03-26 18:02:26 -07:00
|
|
|
user_config = _load_gateway_config()
|
|
|
|
|
enabled_toolsets = sorted(_get_platform_tools(user_config, "api_server"))
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
|
|
|
|
|
|
2026-04-04 11:32:41 +05:30
|
|
|
# Load fallback provider chain so the API server platform has the
|
|
|
|
|
# same fallback behaviour as Telegram/Discord/Slack (fixes #4954).
|
2026-04-05 11:46:06 -07:00
|
|
|
from gateway.run import GatewayRunner
|
|
|
|
|
fallback_model = GatewayRunner._load_fallback_model()
|
2026-04-04 11:32:41 +05:30
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
agent = AIAgent(
|
|
|
|
|
model=model,
|
|
|
|
|
**runtime_kwargs,
|
|
|
|
|
max_iterations=max_iterations,
|
|
|
|
|
quiet_mode=True,
|
|
|
|
|
verbose_logging=False,
|
|
|
|
|
ephemeral_system_prompt=ephemeral_system_prompt or None,
|
2026-03-26 18:02:26 -07:00
|
|
|
enabled_toolsets=enabled_toolsets,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
session_id=session_id,
|
|
|
|
|
platform="api_server",
|
|
|
|
|
stream_delta_callback=stream_delta_callback,
|
2026-03-30 18:50:27 -07:00
|
|
|
tool_progress_callback=tool_progress_callback,
|
2026-04-14 09:48:49 -04:00
|
|
|
tool_start_callback=tool_start_callback,
|
|
|
|
|
tool_complete_callback=tool_complete_callback,
|
2026-04-03 10:31:11 -07:00
|
|
|
session_db=self._ensure_session_db(),
|
2026-04-04 11:32:41 +05:30
|
|
|
fallback_model=fallback_model,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
)
|
|
|
|
|
return agent
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# HTTP Handlers
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def _handle_health(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""GET /health — simple health check."""
|
|
|
|
|
return web.json_response({"status": "ok", "platform": "hermes-agent"})
|
|
|
|
|
|
2026-04-14 05:17:17 +00:00
|
|
|
async def _handle_health_detailed(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""GET /health/detailed — rich status for cross-container dashboard probing.
|
|
|
|
|
|
|
|
|
|
Returns gateway state, connected platforms, PID, and uptime so the
|
|
|
|
|
dashboard can display full status without needing a shared PID file or
|
|
|
|
|
/proc access. No authentication required.
|
|
|
|
|
"""
|
|
|
|
|
from gateway.status import read_runtime_status
|
|
|
|
|
|
|
|
|
|
runtime = read_runtime_status() or {}
|
|
|
|
|
return web.json_response({
|
|
|
|
|
"status": "ok",
|
|
|
|
|
"platform": "hermes-agent",
|
|
|
|
|
"gateway_state": runtime.get("gateway_state"),
|
|
|
|
|
"platforms": runtime.get("platforms", {}),
|
|
|
|
|
"active_agents": runtime.get("active_agents", 0),
|
|
|
|
|
"exit_reason": runtime.get("exit_reason"),
|
|
|
|
|
"updated_at": runtime.get("updated_at"),
|
|
|
|
|
"pid": os.getpid(),
|
|
|
|
|
})
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
async def _handle_models(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""GET /v1/models — return hermes-agent as an available model."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
|
|
|
|
|
|
|
|
|
return web.json_response({
|
|
|
|
|
"object": "list",
|
|
|
|
|
"data": [
|
|
|
|
|
{
|
2026-04-09 17:07:29 -07:00
|
|
|
"id": self._model_name,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
"object": "model",
|
|
|
|
|
"created": int(time.time()),
|
|
|
|
|
"owned_by": "hermes",
|
|
|
|
|
"permission": [],
|
2026-04-09 17:07:29 -07:00
|
|
|
"root": self._model_name,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
"parent": None,
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
async def _handle_chat_completions(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""POST /v1/chat/completions — OpenAI Chat Completions format."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
|
|
|
|
|
|
|
|
|
# Parse request body
|
|
|
|
|
try:
|
|
|
|
|
body = await request.json()
|
|
|
|
|
except (json.JSONDecodeError, Exception):
|
2026-03-24 19:31:08 -07:00
|
|
|
return web.json_response(_openai_error("Invalid JSON in request body"), status=400)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
messages = body.get("messages")
|
|
|
|
|
if not messages or not isinstance(messages, list):
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": {"message": "Missing or invalid 'messages' field", "type": "invalid_request_error"}},
|
|
|
|
|
status=400,
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
stream = body.get("stream", False)
|
|
|
|
|
|
|
|
|
|
# Extract system message (becomes ephemeral system prompt layered ON TOP of core)
|
|
|
|
|
system_prompt = None
|
|
|
|
|
conversation_messages: List[Dict[str, str]] = []
|
|
|
|
|
|
|
|
|
|
for msg in messages:
|
|
|
|
|
role = msg.get("role", "")
|
2026-04-12 17:16:16 -07:00
|
|
|
content = _normalize_chat_content(msg.get("content", ""))
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
if role == "system":
|
|
|
|
|
# Accumulate system messages
|
|
|
|
|
if system_prompt is None:
|
|
|
|
|
system_prompt = content
|
|
|
|
|
else:
|
|
|
|
|
system_prompt = system_prompt + "\n" + content
|
|
|
|
|
elif role in ("user", "assistant"):
|
|
|
|
|
conversation_messages.append({"role": role, "content": content})
|
|
|
|
|
|
|
|
|
|
# Extract the last user message as the primary input
|
|
|
|
|
user_message = ""
|
|
|
|
|
history = []
|
|
|
|
|
if conversation_messages:
|
|
|
|
|
user_message = conversation_messages[-1].get("content", "")
|
|
|
|
|
history = conversation_messages[:-1]
|
|
|
|
|
|
|
|
|
|
if not user_message:
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": {"message": "No user message found in messages", "type": "invalid_request_error"}},
|
|
|
|
|
status=400,
|
|
|
|
|
)
|
|
|
|
|
|
2026-03-31 12:56:10 -07:00
|
|
|
# Allow caller to continue an existing session by passing X-Hermes-Session-Id.
|
|
|
|
|
# When provided, history is loaded from state.db instead of from the request body.
|
fix(security): require auth for session continuation and warn on missing API key
Two security hardening changes for the API server:
1. **Startup warning when no API key is configured.**
When `API_SERVER_KEY` is not set, all endpoints accept unauthenticated
requests. This is the default configuration, but operators may not
realize the security implications. A prominent warning at startup
makes the risk visible.
2. **Require authentication for session continuation.**
The `X-Hermes-Session-Id` header allows callers to load and continue
any session stored in state.db. Without authentication, an attacker
who can reach the API server (e.g. via CORS from a malicious page,
or on a shared host) could enumerate session IDs and read conversation
history — which may contain API keys, passwords, code, or other
sensitive data shared with the agent.
Session continuation now returns 403 when no API key is configured,
with a clear error message explaining how to enable the feature.
When a key IS configured, the existing Bearer token check already
gates access.
This is defense-in-depth: the API server is intended for local use,
but defense against cross-origin and shared-host attacks is important
since the default binding is 127.0.0.1 which is reachable from
browsers via DNS rebinding or localhost CORS.
2026-04-10 11:56:23 +08:00
|
|
|
#
|
|
|
|
|
# Security: session continuation exposes conversation history, so it is
|
|
|
|
|
# only allowed when the API key is configured and the request is
|
|
|
|
|
# authenticated. Without this gate, any unauthenticated client could
|
|
|
|
|
# read arbitrary session history by guessing/enumerating session IDs.
|
2026-03-31 12:56:10 -07:00
|
|
|
provided_session_id = request.headers.get("X-Hermes-Session-Id", "").strip()
|
|
|
|
|
if provided_session_id:
|
fix(security): require auth for session continuation and warn on missing API key
Two security hardening changes for the API server:
1. **Startup warning when no API key is configured.**
When `API_SERVER_KEY` is not set, all endpoints accept unauthenticated
requests. This is the default configuration, but operators may not
realize the security implications. A prominent warning at startup
makes the risk visible.
2. **Require authentication for session continuation.**
The `X-Hermes-Session-Id` header allows callers to load and continue
any session stored in state.db. Without authentication, an attacker
who can reach the API server (e.g. via CORS from a malicious page,
or on a shared host) could enumerate session IDs and read conversation
history — which may contain API keys, passwords, code, or other
sensitive data shared with the agent.
Session continuation now returns 403 when no API key is configured,
with a clear error message explaining how to enable the feature.
When a key IS configured, the existing Bearer token check already
gates access.
This is defense-in-depth: the API server is intended for local use,
but defense against cross-origin and shared-host attacks is important
since the default binding is 127.0.0.1 which is reachable from
browsers via DNS rebinding or localhost CORS.
2026-04-10 11:56:23 +08:00
|
|
|
if not self._api_key:
|
|
|
|
|
logger.warning(
|
|
|
|
|
"Session continuation via X-Hermes-Session-Id rejected: "
|
|
|
|
|
"no API key configured. Set API_SERVER_KEY to enable "
|
|
|
|
|
"session continuity."
|
|
|
|
|
)
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error(
|
|
|
|
|
"Session continuation requires API key authentication. "
|
|
|
|
|
"Configure API_SERVER_KEY to enable this feature."
|
|
|
|
|
),
|
|
|
|
|
status=403,
|
|
|
|
|
)
|
2026-04-10 11:52:01 +08:00
|
|
|
# Sanitize: reject control characters that could enable header injection.
|
|
|
|
|
if re.search(r'[\r\n\x00]', provided_session_id):
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": {"message": "Invalid session ID", "type": "invalid_request_error"}},
|
|
|
|
|
status=400,
|
|
|
|
|
)
|
2026-03-31 12:56:10 -07:00
|
|
|
session_id = provided_session_id
|
|
|
|
|
try:
|
2026-04-03 10:31:11 -07:00
|
|
|
db = self._ensure_session_db()
|
|
|
|
|
if db is not None:
|
|
|
|
|
history = db.get_messages_as_conversation(session_id)
|
2026-03-31 12:56:10 -07:00
|
|
|
except Exception as e:
|
|
|
|
|
logger.warning("Failed to load session history for %s: %s", session_id, e)
|
|
|
|
|
history = []
|
|
|
|
|
else:
|
2026-04-10 04:56:35 -07:00
|
|
|
# Derive a stable session ID from the conversation fingerprint so
|
|
|
|
|
# that consecutive messages from the same Open WebUI (or similar)
|
|
|
|
|
# conversation map to the same Hermes session. The first user
|
|
|
|
|
# message + system prompt are constant across all turns.
|
|
|
|
|
first_user = ""
|
|
|
|
|
for cm in conversation_messages:
|
|
|
|
|
if cm.get("role") == "user":
|
|
|
|
|
first_user = cm.get("content", "")
|
|
|
|
|
break
|
|
|
|
|
session_id = _derive_chat_session_id(system_prompt, first_user)
|
2026-03-31 12:56:10 -07:00
|
|
|
# history already set from request body above
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
completion_id = f"chatcmpl-{uuid.uuid4().hex[:29]}"
|
2026-04-09 17:07:29 -07:00
|
|
|
model_name = body.get("model", self._model_name)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
created = int(time.time())
|
|
|
|
|
|
|
|
|
|
if stream:
|
|
|
|
|
import queue as _q
|
|
|
|
|
_stream_q: _q.Queue = _q.Queue()
|
|
|
|
|
|
|
|
|
|
def _on_delta(delta):
|
fix(api_server): streaming breaks when agent makes tool calls (#2985)
* fix(run_agent): ensure _fire_first_delta() is called for tool generation events
Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.
* fix(run_agent): improve timeout handling for chat completions
Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.
* fix(run_agent): reduce default stream read timeout for chat completions
Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.
* fix(run_agent): enhance streaming error handling and retry logic
Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.
* fix(api_server): streaming breaks when agent makes tool calls
The agent fires stream_delta_callback(None) to signal the CLI display
to close its response box before tool execution begins. The API server's
_on_delta callback was forwarding this None directly into the SSE queue,
where the SSE writer treats it as end-of-stream and terminates the HTTP
response prematurely.
After tool calls complete, the agent streams the final answer through
the same callback, but the SSE response was already closed — so Open
WebUI (and similar frontends) never received the actual answer.
Fix: filter out None in _on_delta so the SSE stream stays open. The SSE
loop already detects completion via agent_task.done(), which handles
stream termination correctly without needing the None sentinel.
Reported by Rohit Paul on X.
2026-03-25 09:56:20 -07:00
|
|
|
# Filter out None — the agent fires stream_delta_callback(None)
|
|
|
|
|
# to signal the CLI display to close its response box before
|
|
|
|
|
# tool execution, but the SSE writer uses None as end-of-stream
|
|
|
|
|
# sentinel. Forwarding it would prematurely close the HTTP
|
|
|
|
|
# response, causing Open WebUI (and similar frontends) to miss
|
|
|
|
|
# the final answer after tool calls. The SSE loop detects
|
|
|
|
|
# completion via agent_task.done() instead.
|
|
|
|
|
if delta is not None:
|
|
|
|
|
_stream_q.put(delta)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
2026-04-07 21:23:49 +02:00
|
|
|
def _on_tool_progress(event_type, name, preview, args, **kwargs):
|
2026-04-10 03:37:34 -04:00
|
|
|
"""Send tool progress as a separate SSE event.
|
|
|
|
|
|
|
|
|
|
Previously, progress markers like ``⏰ list`` were injected
|
|
|
|
|
directly into ``delta.content``. OpenAI-compatible frontends
|
|
|
|
|
(Open WebUI, LobeChat, …) store ``delta.content`` verbatim as
|
|
|
|
|
the assistant message and send it back on subsequent requests.
|
|
|
|
|
After enough turns the model learns to *emit* the markers as
|
|
|
|
|
plain text instead of issuing real tool calls — silently
|
|
|
|
|
hallucinating tool results. See #6972.
|
|
|
|
|
|
|
|
|
|
The fix: push a tagged tuple ``("__tool_progress__", payload)``
|
|
|
|
|
onto the stream queue. The SSE writer emits it as a custom
|
|
|
|
|
``event: hermes.tool.progress`` line that compliant frontends
|
|
|
|
|
can render for UX but will *not* persist into conversation
|
|
|
|
|
history. Clients that don't understand the custom event type
|
|
|
|
|
silently ignore it per the SSE specification.
|
|
|
|
|
"""
|
2026-04-07 21:23:49 +02:00
|
|
|
if event_type != "tool.started":
|
2026-04-10 03:37:34 -04:00
|
|
|
return
|
2026-03-30 18:50:27 -07:00
|
|
|
if name.startswith("_"):
|
2026-04-10 03:37:34 -04:00
|
|
|
return
|
2026-03-30 18:50:27 -07:00
|
|
|
from agent.display import get_tool_emoji
|
|
|
|
|
emoji = get_tool_emoji(name)
|
|
|
|
|
label = preview or name
|
2026-04-10 03:37:34 -04:00
|
|
|
_stream_q.put(("__tool_progress__", {
|
|
|
|
|
"tool": name,
|
|
|
|
|
"emoji": emoji,
|
|
|
|
|
"label": label,
|
|
|
|
|
}))
|
2026-03-30 18:50:27 -07:00
|
|
|
|
2026-03-27 11:33:19 -07:00
|
|
|
# Start agent in background. agent_ref is a mutable container
|
|
|
|
|
# so the SSE writer can interrupt the agent on client disconnect.
|
|
|
|
|
agent_ref = [None]
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
agent_task = asyncio.ensure_future(self._run_agent(
|
|
|
|
|
user_message=user_message,
|
|
|
|
|
conversation_history=history,
|
|
|
|
|
ephemeral_system_prompt=system_prompt,
|
|
|
|
|
session_id=session_id,
|
|
|
|
|
stream_delta_callback=_on_delta,
|
2026-03-30 18:50:27 -07:00
|
|
|
tool_progress_callback=_on_tool_progress,
|
2026-03-27 11:33:19 -07:00
|
|
|
agent_ref=agent_ref,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
))
|
|
|
|
|
|
|
|
|
|
return await self._write_sse_chat_completion(
|
2026-03-27 11:33:19 -07:00
|
|
|
request, completion_id, model_name, created, _stream_q,
|
2026-03-31 12:56:10 -07:00
|
|
|
agent_task, agent_ref, session_id=session_id,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
)
|
|
|
|
|
|
2026-03-24 19:31:08 -07:00
|
|
|
# Non-streaming: run the agent (with optional Idempotency-Key)
|
|
|
|
|
async def _compute_completion():
|
|
|
|
|
return await self._run_agent(
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
user_message=user_message,
|
|
|
|
|
conversation_history=history,
|
|
|
|
|
ephemeral_system_prompt=system_prompt,
|
|
|
|
|
session_id=session_id,
|
|
|
|
|
)
|
2026-03-24 19:31:08 -07:00
|
|
|
|
|
|
|
|
idempotency_key = request.headers.get("Idempotency-Key")
|
|
|
|
|
if idempotency_key:
|
|
|
|
|
fp = _make_request_fingerprint(body, keys=["model", "messages", "tools", "tool_choice", "stream"])
|
|
|
|
|
try:
|
|
|
|
|
result, usage = await _idem_cache.get_or_set(idempotency_key, fp, _compute_completion)
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.error("Error running agent for chat completions: %s", e, exc_info=True)
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error(f"Internal server error: {e}", err_type="server_error"),
|
|
|
|
|
status=500,
|
|
|
|
|
)
|
|
|
|
|
else:
|
|
|
|
|
try:
|
|
|
|
|
result, usage = await _compute_completion()
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.error("Error running agent for chat completions: %s", e, exc_info=True)
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error(f"Internal server error: {e}", err_type="server_error"),
|
|
|
|
|
status=500,
|
|
|
|
|
)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
final_response = result.get("final_response", "")
|
|
|
|
|
if not final_response:
|
|
|
|
|
final_response = result.get("error", "(No response generated)")
|
|
|
|
|
|
|
|
|
|
response_data = {
|
|
|
|
|
"id": completion_id,
|
|
|
|
|
"object": "chat.completion",
|
|
|
|
|
"created": created,
|
|
|
|
|
"model": model_name,
|
|
|
|
|
"choices": [
|
|
|
|
|
{
|
|
|
|
|
"index": 0,
|
|
|
|
|
"message": {
|
|
|
|
|
"role": "assistant",
|
|
|
|
|
"content": final_response,
|
|
|
|
|
},
|
|
|
|
|
"finish_reason": "stop",
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"usage": {
|
|
|
|
|
"prompt_tokens": usage.get("input_tokens", 0),
|
|
|
|
|
"completion_tokens": usage.get("output_tokens", 0),
|
|
|
|
|
"total_tokens": usage.get("total_tokens", 0),
|
|
|
|
|
},
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-31 12:56:10 -07:00
|
|
|
return web.json_response(response_data, headers={"X-Hermes-Session-Id": session_id})
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
async def _write_sse_chat_completion(
|
|
|
|
|
self, request: "web.Request", completion_id: str, model: str,
|
2026-03-31 12:56:10 -07:00
|
|
|
created: int, stream_q, agent_task, agent_ref=None, session_id: str = None,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
) -> "web.StreamResponse":
|
2026-03-27 11:33:19 -07:00
|
|
|
"""Write real streaming SSE from agent's stream_delta_callback queue.
|
|
|
|
|
|
|
|
|
|
If the client disconnects mid-stream (network drop, browser tab close),
|
|
|
|
|
the agent is interrupted via ``agent.interrupt()`` so it stops making
|
|
|
|
|
LLM API calls, and the asyncio task wrapper is cancelled.
|
|
|
|
|
"""
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
import queue as _q
|
|
|
|
|
|
2026-04-11 12:42:01 -06:00
|
|
|
sse_headers = {
|
|
|
|
|
"Content-Type": "text/event-stream",
|
|
|
|
|
"Cache-Control": "no-cache",
|
|
|
|
|
"X-Accel-Buffering": "no",
|
|
|
|
|
}
|
2026-03-28 13:38:30 -07:00
|
|
|
# CORS middleware can't inject headers into StreamResponse after
|
|
|
|
|
# prepare() flushes them, so resolve CORS headers up front.
|
|
|
|
|
origin = request.headers.get("Origin", "")
|
|
|
|
|
cors = self._cors_headers_for_origin(origin) if origin else None
|
|
|
|
|
if cors:
|
|
|
|
|
sse_headers.update(cors)
|
2026-03-31 12:56:10 -07:00
|
|
|
if session_id:
|
|
|
|
|
sse_headers["X-Hermes-Session-Id"] = session_id
|
2026-03-28 13:38:30 -07:00
|
|
|
response = web.StreamResponse(status=200, headers=sse_headers)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
await response.prepare(request)
|
|
|
|
|
|
2026-03-27 11:33:19 -07:00
|
|
|
try:
|
2026-04-11 12:42:01 -06:00
|
|
|
last_activity = time.monotonic()
|
|
|
|
|
|
2026-03-27 11:33:19 -07:00
|
|
|
# Role chunk
|
|
|
|
|
role_chunk = {
|
|
|
|
|
"id": completion_id, "object": "chat.completion.chunk",
|
|
|
|
|
"created": created, "model": model,
|
|
|
|
|
"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}],
|
|
|
|
|
}
|
|
|
|
|
await response.write(f"data: {json.dumps(role_chunk)}\n\n".encode())
|
2026-04-11 12:42:01 -06:00
|
|
|
last_activity = time.monotonic()
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
2026-04-10 03:37:34 -04:00
|
|
|
# Helper — route a queue item to the correct SSE event.
|
|
|
|
|
async def _emit(item):
|
|
|
|
|
"""Write a single queue item to the SSE stream.
|
|
|
|
|
|
|
|
|
|
Plain strings are sent as normal ``delta.content`` chunks.
|
|
|
|
|
Tagged tuples ``("__tool_progress__", payload)`` are sent
|
|
|
|
|
as a custom ``event: hermes.tool.progress`` SSE event so
|
|
|
|
|
frontends can display them without storing the markers in
|
|
|
|
|
conversation history. See #6972.
|
|
|
|
|
"""
|
|
|
|
|
if isinstance(item, tuple) and len(item) == 2 and item[0] == "__tool_progress__":
|
|
|
|
|
event_data = json.dumps(item[1])
|
|
|
|
|
await response.write(
|
|
|
|
|
f"event: hermes.tool.progress\ndata: {event_data}\n\n".encode()
|
|
|
|
|
)
|
|
|
|
|
else:
|
|
|
|
|
content_chunk = {
|
|
|
|
|
"id": completion_id, "object": "chat.completion.chunk",
|
|
|
|
|
"created": created, "model": model,
|
|
|
|
|
"choices": [{"index": 0, "delta": {"content": item}, "finish_reason": None}],
|
|
|
|
|
}
|
|
|
|
|
await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
|
2026-04-11 12:42:01 -06:00
|
|
|
return time.monotonic()
|
2026-04-10 03:37:34 -04:00
|
|
|
|
2026-03-27 11:33:19 -07:00
|
|
|
# Stream content chunks as they arrive from the agent
|
|
|
|
|
loop = asyncio.get_event_loop()
|
|
|
|
|
while True:
|
|
|
|
|
try:
|
|
|
|
|
delta = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
|
|
|
|
|
except _q.Empty:
|
|
|
|
|
if agent_task.done():
|
|
|
|
|
# Drain any remaining items
|
|
|
|
|
while True:
|
|
|
|
|
try:
|
|
|
|
|
delta = stream_q.get_nowait()
|
|
|
|
|
if delta is None:
|
|
|
|
|
break
|
2026-04-11 12:42:01 -06:00
|
|
|
last_activity = await _emit(delta)
|
2026-03-27 11:33:19 -07:00
|
|
|
except _q.Empty:
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
break
|
2026-03-27 11:33:19 -07:00
|
|
|
break
|
2026-04-11 12:42:01 -06:00
|
|
|
if time.monotonic() - last_activity >= CHAT_COMPLETIONS_SSE_KEEPALIVE_SECONDS:
|
|
|
|
|
await response.write(b": keepalive\n\n")
|
|
|
|
|
last_activity = time.monotonic()
|
2026-03-27 11:33:19 -07:00
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
if delta is None: # End of stream sentinel
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
break
|
|
|
|
|
|
2026-04-11 12:42:01 -06:00
|
|
|
last_activity = await _emit(delta)
|
2026-03-27 11:33:19 -07:00
|
|
|
|
|
|
|
|
# Get usage from completed agent
|
|
|
|
|
usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
|
|
|
|
|
try:
|
|
|
|
|
result, agent_usage = await agent_task
|
|
|
|
|
usage = agent_usage or usage
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
2026-03-27 11:33:19 -07:00
|
|
|
# Finish chunk
|
|
|
|
|
finish_chunk = {
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
"id": completion_id, "object": "chat.completion.chunk",
|
|
|
|
|
"created": created, "model": model,
|
2026-03-27 11:33:19 -07:00
|
|
|
"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
|
|
|
|
|
"usage": {
|
|
|
|
|
"prompt_tokens": usage.get("input_tokens", 0),
|
|
|
|
|
"completion_tokens": usage.get("output_tokens", 0),
|
|
|
|
|
"total_tokens": usage.get("total_tokens", 0),
|
|
|
|
|
},
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
}
|
2026-03-27 11:33:19 -07:00
|
|
|
await response.write(f"data: {json.dumps(finish_chunk)}\n\n".encode())
|
|
|
|
|
await response.write(b"data: [DONE]\n\n")
|
|
|
|
|
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
|
|
|
|
|
# Client disconnected mid-stream. Interrupt the agent so it
|
|
|
|
|
# stops making LLM API calls at the next loop iteration, then
|
|
|
|
|
# cancel the asyncio task wrapper.
|
|
|
|
|
agent = agent_ref[0] if agent_ref else None
|
|
|
|
|
if agent is not None:
|
|
|
|
|
try:
|
|
|
|
|
agent.interrupt("SSE client disconnected")
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
if not agent_task.done():
|
|
|
|
|
agent_task.cancel()
|
|
|
|
|
try:
|
|
|
|
|
await agent_task
|
|
|
|
|
except (asyncio.CancelledError, Exception):
|
|
|
|
|
pass
|
|
|
|
|
logger.info("SSE client disconnected; interrupted agent task %s", completion_id)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
return response
|
|
|
|
|
|
2026-04-14 09:48:49 -04:00
|
|
|
async def _write_sse_responses(
|
|
|
|
|
self,
|
|
|
|
|
request: "web.Request",
|
|
|
|
|
response_id: str,
|
|
|
|
|
model: str,
|
|
|
|
|
created_at: int,
|
|
|
|
|
stream_q,
|
|
|
|
|
agent_task,
|
|
|
|
|
agent_ref,
|
|
|
|
|
conversation_history: List[Dict[str, str]],
|
|
|
|
|
user_message: str,
|
|
|
|
|
instructions: Optional[str],
|
|
|
|
|
conversation: Optional[str],
|
|
|
|
|
store: bool,
|
|
|
|
|
session_id: str,
|
|
|
|
|
) -> "web.StreamResponse":
|
|
|
|
|
"""Write an SSE stream for POST /v1/responses (OpenAI Responses API).
|
|
|
|
|
|
|
|
|
|
Emits spec-compliant event types as the agent runs:
|
|
|
|
|
|
|
|
|
|
- ``response.created`` — initial envelope (status=in_progress)
|
|
|
|
|
- ``response.output_text.delta`` / ``response.output_text.done`` —
|
|
|
|
|
streamed assistant text
|
|
|
|
|
- ``response.output_item.added`` / ``response.output_item.done``
|
|
|
|
|
with ``item.type == "function_call"`` — when the agent invokes a
|
|
|
|
|
tool (both events fire; the ``done`` event carries the finalized
|
|
|
|
|
``arguments`` string)
|
|
|
|
|
- ``response.output_item.added`` with
|
|
|
|
|
``item.type == "function_call_output"`` — tool result with
|
|
|
|
|
``{call_id, output, status}``
|
|
|
|
|
- ``response.completed`` — terminal event carrying the full
|
|
|
|
|
response object with all output items + usage (same payload
|
|
|
|
|
shape as the non-streaming path for parity)
|
|
|
|
|
- ``response.failed`` — terminal event on agent error
|
|
|
|
|
|
|
|
|
|
If the client disconnects mid-stream, ``agent.interrupt()`` is
|
|
|
|
|
called so the agent stops issuing upstream LLM calls, then the
|
|
|
|
|
asyncio task is cancelled. When ``store=True`` the full response
|
|
|
|
|
is persisted to the ResponseStore in a ``finally`` block so GET
|
|
|
|
|
/v1/responses/{id} and ``previous_response_id`` chaining work the
|
|
|
|
|
same as the batch path.
|
|
|
|
|
"""
|
|
|
|
|
import queue as _q
|
|
|
|
|
|
|
|
|
|
sse_headers = {
|
|
|
|
|
"Content-Type": "text/event-stream",
|
|
|
|
|
"Cache-Control": "no-cache",
|
|
|
|
|
"X-Accel-Buffering": "no",
|
|
|
|
|
}
|
|
|
|
|
origin = request.headers.get("Origin", "")
|
|
|
|
|
cors = self._cors_headers_for_origin(origin) if origin else None
|
|
|
|
|
if cors:
|
|
|
|
|
sse_headers.update(cors)
|
|
|
|
|
if session_id:
|
|
|
|
|
sse_headers["X-Hermes-Session-Id"] = session_id
|
|
|
|
|
response = web.StreamResponse(status=200, headers=sse_headers)
|
|
|
|
|
await response.prepare(request)
|
|
|
|
|
|
|
|
|
|
# State accumulated during the stream
|
|
|
|
|
final_text_parts: List[str] = []
|
|
|
|
|
# Track open function_call items by name so we can emit a matching
|
|
|
|
|
# ``done`` event when the tool completes. Order preserved.
|
|
|
|
|
pending_tool_calls: List[Dict[str, Any]] = []
|
|
|
|
|
# Output items we've emitted so far (used to build the terminal
|
|
|
|
|
# response.completed payload). Kept in the order they appeared.
|
|
|
|
|
emitted_items: List[Dict[str, Any]] = []
|
|
|
|
|
# Monotonic counter for output_index (spec requires it).
|
|
|
|
|
output_index = 0
|
|
|
|
|
# Monotonic counter for call_id generation if the agent doesn't
|
|
|
|
|
# provide one (it doesn't, from tool_progress_callback).
|
|
|
|
|
call_counter = 0
|
2026-04-14 13:14:26 -04:00
|
|
|
# Canonical Responses SSE events include a monotonically increasing
|
|
|
|
|
# sequence_number. Add it server-side for every emitted event so
|
|
|
|
|
# clients that validate the OpenAI event schema can parse our stream.
|
|
|
|
|
sequence_number = 0
|
2026-04-14 09:48:49 -04:00
|
|
|
# Track the assistant message item id + content index for text
|
|
|
|
|
# delta events — the spec ties deltas to a specific item.
|
|
|
|
|
message_item_id = f"msg_{uuid.uuid4().hex[:24]}"
|
|
|
|
|
message_output_index: Optional[int] = None
|
|
|
|
|
message_opened = False
|
|
|
|
|
|
|
|
|
|
async def _write_event(event_type: str, data: Dict[str, Any]) -> None:
|
2026-04-14 13:14:26 -04:00
|
|
|
nonlocal sequence_number
|
|
|
|
|
if "sequence_number" not in data:
|
|
|
|
|
data["sequence_number"] = sequence_number
|
|
|
|
|
sequence_number += 1
|
2026-04-14 09:48:49 -04:00
|
|
|
payload = f"event: {event_type}\ndata: {json.dumps(data)}\n\n"
|
|
|
|
|
await response.write(payload.encode())
|
|
|
|
|
|
|
|
|
|
def _envelope(status: str) -> Dict[str, Any]:
|
|
|
|
|
env: Dict[str, Any] = {
|
|
|
|
|
"id": response_id,
|
|
|
|
|
"object": "response",
|
|
|
|
|
"status": status,
|
|
|
|
|
"created_at": created_at,
|
|
|
|
|
"model": model,
|
|
|
|
|
}
|
|
|
|
|
return env
|
|
|
|
|
|
|
|
|
|
final_response_text = ""
|
|
|
|
|
agent_error: Optional[str] = None
|
|
|
|
|
usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
# response.created — initial envelope, status=in_progress
|
|
|
|
|
created_env = _envelope("in_progress")
|
|
|
|
|
created_env["output"] = []
|
|
|
|
|
await _write_event("response.created", {
|
|
|
|
|
"type": "response.created",
|
|
|
|
|
"response": created_env,
|
|
|
|
|
})
|
|
|
|
|
last_activity = time.monotonic()
|
|
|
|
|
|
|
|
|
|
async def _open_message_item() -> None:
|
|
|
|
|
"""Emit response.output_item.added for the assistant message
|
|
|
|
|
the first time any text delta arrives."""
|
|
|
|
|
nonlocal message_opened, message_output_index, output_index
|
|
|
|
|
if message_opened:
|
|
|
|
|
return
|
|
|
|
|
message_opened = True
|
|
|
|
|
message_output_index = output_index
|
|
|
|
|
output_index += 1
|
|
|
|
|
item = {
|
|
|
|
|
"id": message_item_id,
|
|
|
|
|
"type": "message",
|
|
|
|
|
"status": "in_progress",
|
|
|
|
|
"role": "assistant",
|
|
|
|
|
"content": [],
|
|
|
|
|
}
|
|
|
|
|
await _write_event("response.output_item.added", {
|
|
|
|
|
"type": "response.output_item.added",
|
|
|
|
|
"output_index": message_output_index,
|
|
|
|
|
"item": item,
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
async def _emit_text_delta(delta_text: str) -> None:
|
|
|
|
|
await _open_message_item()
|
|
|
|
|
final_text_parts.append(delta_text)
|
|
|
|
|
await _write_event("response.output_text.delta", {
|
|
|
|
|
"type": "response.output_text.delta",
|
|
|
|
|
"item_id": message_item_id,
|
|
|
|
|
"output_index": message_output_index,
|
|
|
|
|
"content_index": 0,
|
|
|
|
|
"delta": delta_text,
|
2026-04-14 13:14:26 -04:00
|
|
|
"logprobs": [],
|
2026-04-14 09:48:49 -04:00
|
|
|
})
|
|
|
|
|
|
|
|
|
|
async def _emit_tool_started(payload: Dict[str, Any]) -> str:
|
|
|
|
|
"""Emit response.output_item.added for a function_call.
|
|
|
|
|
|
|
|
|
|
Returns the call_id so the matching completion event can
|
|
|
|
|
reference it. Prefer the real ``tool_call_id`` from the
|
|
|
|
|
agent when available; fall back to a generated call id for
|
|
|
|
|
safety in tests or older code paths.
|
|
|
|
|
"""
|
|
|
|
|
nonlocal output_index, call_counter
|
|
|
|
|
call_counter += 1
|
|
|
|
|
call_id = payload.get("tool_call_id") or f"call_{response_id[5:]}_{call_counter}"
|
|
|
|
|
args = payload.get("arguments", {})
|
|
|
|
|
if isinstance(args, dict):
|
|
|
|
|
arguments_str = json.dumps(args)
|
|
|
|
|
else:
|
|
|
|
|
arguments_str = str(args)
|
|
|
|
|
item = {
|
|
|
|
|
"id": f"fc_{uuid.uuid4().hex[:24]}",
|
|
|
|
|
"type": "function_call",
|
|
|
|
|
"status": "in_progress",
|
|
|
|
|
"name": payload.get("name", ""),
|
|
|
|
|
"call_id": call_id,
|
|
|
|
|
"arguments": arguments_str,
|
|
|
|
|
}
|
|
|
|
|
idx = output_index
|
|
|
|
|
output_index += 1
|
|
|
|
|
pending_tool_calls.append({
|
|
|
|
|
"call_id": call_id,
|
|
|
|
|
"name": payload.get("name", ""),
|
|
|
|
|
"arguments": arguments_str,
|
|
|
|
|
"item_id": item["id"],
|
|
|
|
|
"output_index": idx,
|
|
|
|
|
})
|
|
|
|
|
emitted_items.append({
|
|
|
|
|
"type": "function_call",
|
|
|
|
|
"name": payload.get("name", ""),
|
|
|
|
|
"arguments": arguments_str,
|
|
|
|
|
"call_id": call_id,
|
|
|
|
|
})
|
|
|
|
|
await _write_event("response.output_item.added", {
|
|
|
|
|
"type": "response.output_item.added",
|
|
|
|
|
"output_index": idx,
|
|
|
|
|
"item": item,
|
|
|
|
|
})
|
|
|
|
|
return call_id
|
|
|
|
|
|
|
|
|
|
async def _emit_tool_completed(payload: Dict[str, Any]) -> None:
|
|
|
|
|
"""Emit response.output_item.done (function_call) followed
|
|
|
|
|
by response.output_item.added (function_call_output)."""
|
|
|
|
|
nonlocal output_index
|
|
|
|
|
call_id = payload.get("tool_call_id")
|
|
|
|
|
result = payload.get("result", "")
|
|
|
|
|
pending = None
|
|
|
|
|
if call_id:
|
|
|
|
|
for i, p in enumerate(pending_tool_calls):
|
|
|
|
|
if p["call_id"] == call_id:
|
|
|
|
|
pending = pending_tool_calls.pop(i)
|
|
|
|
|
break
|
|
|
|
|
if pending is None:
|
|
|
|
|
# Completion without a matching start — skip to avoid
|
|
|
|
|
# emitting orphaned done events.
|
|
|
|
|
return
|
|
|
|
|
|
|
|
|
|
# function_call done
|
|
|
|
|
done_item = {
|
|
|
|
|
"id": pending["item_id"],
|
|
|
|
|
"type": "function_call",
|
|
|
|
|
"status": "completed",
|
|
|
|
|
"name": pending["name"],
|
|
|
|
|
"call_id": pending["call_id"],
|
|
|
|
|
"arguments": pending["arguments"],
|
|
|
|
|
}
|
|
|
|
|
await _write_event("response.output_item.done", {
|
|
|
|
|
"type": "response.output_item.done",
|
|
|
|
|
"output_index": pending["output_index"],
|
|
|
|
|
"item": done_item,
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
# function_call_output added (result)
|
|
|
|
|
result_str = result if isinstance(result, str) else json.dumps(result)
|
2026-04-14 13:14:26 -04:00
|
|
|
output_parts = [{"type": "input_text", "text": result_str}]
|
2026-04-14 09:48:49 -04:00
|
|
|
output_item = {
|
|
|
|
|
"id": f"fco_{uuid.uuid4().hex[:24]}",
|
|
|
|
|
"type": "function_call_output",
|
|
|
|
|
"call_id": pending["call_id"],
|
2026-04-14 13:14:26 -04:00
|
|
|
"output": output_parts,
|
2026-04-14 09:48:49 -04:00
|
|
|
"status": "completed",
|
|
|
|
|
}
|
|
|
|
|
idx = output_index
|
|
|
|
|
output_index += 1
|
|
|
|
|
emitted_items.append({
|
|
|
|
|
"type": "function_call_output",
|
|
|
|
|
"call_id": pending["call_id"],
|
2026-04-14 13:14:26 -04:00
|
|
|
"output": output_parts,
|
2026-04-14 09:48:49 -04:00
|
|
|
})
|
|
|
|
|
await _write_event("response.output_item.added", {
|
|
|
|
|
"type": "response.output_item.added",
|
|
|
|
|
"output_index": idx,
|
|
|
|
|
"item": output_item,
|
|
|
|
|
})
|
2026-04-14 13:14:26 -04:00
|
|
|
await _write_event("response.output_item.done", {
|
|
|
|
|
"type": "response.output_item.done",
|
|
|
|
|
"output_index": idx,
|
|
|
|
|
"item": output_item,
|
|
|
|
|
})
|
2026-04-14 09:48:49 -04:00
|
|
|
|
|
|
|
|
# Main drain loop — thread-safe queue fed by agent callbacks.
|
|
|
|
|
async def _dispatch(it) -> None:
|
|
|
|
|
"""Route a queue item to the correct SSE emitter.
|
|
|
|
|
|
|
|
|
|
Plain strings are text deltas. Tagged tuples with
|
|
|
|
|
``__tool_started__`` / ``__tool_completed__`` prefixes
|
|
|
|
|
are tool lifecycle events.
|
|
|
|
|
"""
|
|
|
|
|
if isinstance(it, tuple) and len(it) == 2 and isinstance(it[0], str):
|
|
|
|
|
tag, payload = it
|
|
|
|
|
if tag == "__tool_started__":
|
|
|
|
|
await _emit_tool_started(payload)
|
|
|
|
|
elif tag == "__tool_completed__":
|
|
|
|
|
await _emit_tool_completed(payload)
|
|
|
|
|
# Unknown tags are silently ignored (forward-compat).
|
|
|
|
|
elif isinstance(it, str):
|
|
|
|
|
await _emit_text_delta(it)
|
|
|
|
|
# Other types (non-string, non-tuple) are silently dropped.
|
|
|
|
|
|
|
|
|
|
loop = asyncio.get_event_loop()
|
|
|
|
|
while True:
|
|
|
|
|
try:
|
|
|
|
|
item = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
|
|
|
|
|
except _q.Empty:
|
|
|
|
|
if agent_task.done():
|
|
|
|
|
# Drain remaining
|
|
|
|
|
while True:
|
|
|
|
|
try:
|
|
|
|
|
item = stream_q.get_nowait()
|
|
|
|
|
if item is None:
|
|
|
|
|
break
|
|
|
|
|
await _dispatch(item)
|
|
|
|
|
last_activity = time.monotonic()
|
|
|
|
|
except _q.Empty:
|
|
|
|
|
break
|
|
|
|
|
break
|
|
|
|
|
if time.monotonic() - last_activity >= CHAT_COMPLETIONS_SSE_KEEPALIVE_SECONDS:
|
|
|
|
|
await response.write(b": keepalive\n\n")
|
|
|
|
|
last_activity = time.monotonic()
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
if item is None: # EOS sentinel
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
await _dispatch(item)
|
|
|
|
|
last_activity = time.monotonic()
|
|
|
|
|
|
|
|
|
|
# Pick up agent result + usage from the completed task
|
|
|
|
|
try:
|
|
|
|
|
result, agent_usage = await agent_task
|
|
|
|
|
usage = agent_usage or usage
|
|
|
|
|
# If the agent produced a final_response but no text
|
|
|
|
|
# deltas were streamed (e.g. some providers only emit
|
|
|
|
|
# the full response at the end), emit a single fallback
|
|
|
|
|
# delta so Responses clients still receive a live text part.
|
|
|
|
|
agent_final = result.get("final_response", "") if isinstance(result, dict) else ""
|
|
|
|
|
if agent_final and not final_text_parts:
|
|
|
|
|
await _emit_text_delta(agent_final)
|
|
|
|
|
if agent_final and not final_response_text:
|
|
|
|
|
final_response_text = agent_final
|
|
|
|
|
if isinstance(result, dict) and result.get("error") and not final_response_text:
|
|
|
|
|
agent_error = result["error"]
|
|
|
|
|
except Exception as e: # noqa: BLE001
|
|
|
|
|
logger.error("Error running agent for streaming responses: %s", e, exc_info=True)
|
|
|
|
|
agent_error = str(e)
|
|
|
|
|
|
|
|
|
|
# Close the message item if it was opened
|
|
|
|
|
final_response_text = "".join(final_text_parts) or final_response_text
|
|
|
|
|
if message_opened:
|
|
|
|
|
await _write_event("response.output_text.done", {
|
|
|
|
|
"type": "response.output_text.done",
|
|
|
|
|
"item_id": message_item_id,
|
|
|
|
|
"output_index": message_output_index,
|
|
|
|
|
"content_index": 0,
|
|
|
|
|
"text": final_response_text,
|
2026-04-14 13:14:26 -04:00
|
|
|
"logprobs": [],
|
2026-04-14 09:48:49 -04:00
|
|
|
})
|
|
|
|
|
msg_done_item = {
|
|
|
|
|
"id": message_item_id,
|
|
|
|
|
"type": "message",
|
|
|
|
|
"status": "completed",
|
|
|
|
|
"role": "assistant",
|
|
|
|
|
"content": [
|
|
|
|
|
{"type": "output_text", "text": final_response_text}
|
|
|
|
|
],
|
|
|
|
|
}
|
|
|
|
|
await _write_event("response.output_item.done", {
|
|
|
|
|
"type": "response.output_item.done",
|
|
|
|
|
"output_index": message_output_index,
|
|
|
|
|
"item": msg_done_item,
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
# Always append a final message item in the completed
|
|
|
|
|
# response envelope so clients that only parse the terminal
|
|
|
|
|
# payload still see the assistant text. This mirrors the
|
|
|
|
|
# shape produced by _extract_output_items in the batch path.
|
|
|
|
|
final_items: List[Dict[str, Any]] = list(emitted_items)
|
|
|
|
|
final_items.append({
|
|
|
|
|
"type": "message",
|
|
|
|
|
"role": "assistant",
|
|
|
|
|
"content": [
|
|
|
|
|
{"type": "output_text", "text": final_response_text or (agent_error or "")}
|
|
|
|
|
],
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
if agent_error:
|
|
|
|
|
failed_env = _envelope("failed")
|
|
|
|
|
failed_env["output"] = final_items
|
|
|
|
|
failed_env["error"] = {"message": agent_error, "type": "server_error"}
|
|
|
|
|
failed_env["usage"] = {
|
|
|
|
|
"input_tokens": usage.get("input_tokens", 0),
|
|
|
|
|
"output_tokens": usage.get("output_tokens", 0),
|
|
|
|
|
"total_tokens": usage.get("total_tokens", 0),
|
|
|
|
|
}
|
|
|
|
|
await _write_event("response.failed", {
|
|
|
|
|
"type": "response.failed",
|
|
|
|
|
"response": failed_env,
|
|
|
|
|
})
|
|
|
|
|
else:
|
|
|
|
|
completed_env = _envelope("completed")
|
|
|
|
|
completed_env["output"] = final_items
|
|
|
|
|
completed_env["usage"] = {
|
|
|
|
|
"input_tokens": usage.get("input_tokens", 0),
|
|
|
|
|
"output_tokens": usage.get("output_tokens", 0),
|
|
|
|
|
"total_tokens": usage.get("total_tokens", 0),
|
|
|
|
|
}
|
|
|
|
|
await _write_event("response.completed", {
|
|
|
|
|
"type": "response.completed",
|
|
|
|
|
"response": completed_env,
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
# Persist for future chaining / GET retrieval, mirroring
|
|
|
|
|
# the batch path behavior.
|
|
|
|
|
if store:
|
|
|
|
|
full_history = list(conversation_history)
|
|
|
|
|
full_history.append({"role": "user", "content": user_message})
|
|
|
|
|
if isinstance(result, dict) and result.get("messages"):
|
|
|
|
|
full_history.extend(result["messages"])
|
|
|
|
|
else:
|
|
|
|
|
full_history.append({"role": "assistant", "content": final_response_text})
|
|
|
|
|
self._response_store.put(response_id, {
|
|
|
|
|
"response": completed_env,
|
|
|
|
|
"conversation_history": full_history,
|
|
|
|
|
"instructions": instructions,
|
2026-04-14 21:06:32 -07:00
|
|
|
"session_id": session_id,
|
2026-04-14 09:48:49 -04:00
|
|
|
})
|
|
|
|
|
if conversation:
|
|
|
|
|
self._response_store.set_conversation(conversation, response_id)
|
|
|
|
|
|
|
|
|
|
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
|
|
|
|
|
# Client disconnected — interrupt the agent so it stops
|
|
|
|
|
# making upstream LLM calls, then cancel the task.
|
|
|
|
|
agent = agent_ref[0] if agent_ref else None
|
|
|
|
|
if agent is not None:
|
|
|
|
|
try:
|
|
|
|
|
agent.interrupt("SSE client disconnected")
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
if not agent_task.done():
|
|
|
|
|
agent_task.cancel()
|
|
|
|
|
try:
|
|
|
|
|
await agent_task
|
|
|
|
|
except (asyncio.CancelledError, Exception):
|
|
|
|
|
pass
|
|
|
|
|
logger.info("SSE client disconnected; interrupted agent task %s", response_id)
|
|
|
|
|
|
|
|
|
|
return response
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
async def _handle_responses(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""POST /v1/responses — OpenAI Responses API format."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
|
|
|
|
|
|
|
|
|
# Parse request body
|
|
|
|
|
try:
|
|
|
|
|
body = await request.json()
|
|
|
|
|
except (json.JSONDecodeError, Exception):
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": {"message": "Invalid JSON in request body", "type": "invalid_request_error"}},
|
|
|
|
|
status=400,
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
raw_input = body.get("input")
|
|
|
|
|
if raw_input is None:
|
2026-03-24 19:31:08 -07:00
|
|
|
return web.json_response(_openai_error("Missing 'input' field"), status=400)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
instructions = body.get("instructions")
|
|
|
|
|
previous_response_id = body.get("previous_response_id")
|
|
|
|
|
conversation = body.get("conversation")
|
|
|
|
|
store = body.get("store", True)
|
|
|
|
|
|
|
|
|
|
# conversation and previous_response_id are mutually exclusive
|
|
|
|
|
if conversation and previous_response_id:
|
2026-03-24 19:31:08 -07:00
|
|
|
return web.json_response(_openai_error("Cannot use both 'conversation' and 'previous_response_id'"), status=400)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
# Resolve conversation name to latest response_id
|
|
|
|
|
if conversation:
|
2026-03-22 04:56:06 -07:00
|
|
|
previous_response_id = self._response_store.get_conversation(conversation)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
# No error if conversation doesn't exist yet — it's a new conversation
|
|
|
|
|
|
|
|
|
|
# Normalize input to message list
|
|
|
|
|
input_messages: List[Dict[str, str]] = []
|
|
|
|
|
if isinstance(raw_input, str):
|
|
|
|
|
input_messages = [{"role": "user", "content": raw_input}]
|
|
|
|
|
elif isinstance(raw_input, list):
|
|
|
|
|
for item in raw_input:
|
|
|
|
|
if isinstance(item, str):
|
|
|
|
|
input_messages.append({"role": "user", "content": item})
|
|
|
|
|
elif isinstance(item, dict):
|
|
|
|
|
role = item.get("role", "user")
|
2026-04-12 17:16:16 -07:00
|
|
|
content = _normalize_chat_content(item.get("content", ""))
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
input_messages.append({"role": role, "content": content})
|
|
|
|
|
else:
|
2026-03-24 19:31:08 -07:00
|
|
|
return web.json_response(_openai_error("'input' must be a string or array"), status=400)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
2026-04-07 17:41:05 -07:00
|
|
|
# Accept explicit conversation_history from the request body.
|
|
|
|
|
# This lets stateless clients supply their own history instead of
|
|
|
|
|
# relying on server-side response chaining via previous_response_id.
|
|
|
|
|
# Precedence: explicit conversation_history > previous_response_id.
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
conversation_history: List[Dict[str, str]] = []
|
2026-04-07 17:41:05 -07:00
|
|
|
raw_history = body.get("conversation_history")
|
|
|
|
|
if raw_history:
|
|
|
|
|
if not isinstance(raw_history, list):
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error("'conversation_history' must be an array of message objects"),
|
|
|
|
|
status=400,
|
|
|
|
|
)
|
|
|
|
|
for i, entry in enumerate(raw_history):
|
|
|
|
|
if not isinstance(entry, dict) or "role" not in entry or "content" not in entry:
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error(f"conversation_history[{i}] must have 'role' and 'content' fields"),
|
|
|
|
|
status=400,
|
|
|
|
|
)
|
|
|
|
|
conversation_history.append({"role": str(entry["role"]), "content": str(entry["content"])})
|
|
|
|
|
if previous_response_id:
|
|
|
|
|
logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")
|
|
|
|
|
|
2026-04-14 21:06:32 -07:00
|
|
|
stored_session_id = None
|
2026-04-07 17:41:05 -07:00
|
|
|
if not conversation_history and previous_response_id:
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
stored = self._response_store.get(previous_response_id)
|
|
|
|
|
if stored is None:
|
2026-03-24 19:31:08 -07:00
|
|
|
return web.json_response(_openai_error(f"Previous response not found: {previous_response_id}"), status=404)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
conversation_history = list(stored.get("conversation_history", []))
|
2026-04-14 21:06:32 -07:00
|
|
|
stored_session_id = stored.get("session_id")
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
# If no instructions provided, carry forward from previous
|
|
|
|
|
if instructions is None:
|
|
|
|
|
instructions = stored.get("instructions")
|
|
|
|
|
|
|
|
|
|
# Append new input messages to history (all but the last become history)
|
|
|
|
|
for msg in input_messages[:-1]:
|
|
|
|
|
conversation_history.append(msg)
|
|
|
|
|
|
|
|
|
|
# Last input message is the user_message
|
|
|
|
|
user_message = input_messages[-1].get("content", "") if input_messages else ""
|
|
|
|
|
if not user_message:
|
2026-03-24 19:31:08 -07:00
|
|
|
return web.json_response(_openai_error("No user message found in input"), status=400)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
# Truncation support
|
|
|
|
|
if body.get("truncation") == "auto" and len(conversation_history) > 100:
|
|
|
|
|
conversation_history = conversation_history[-100:]
|
|
|
|
|
|
2026-04-14 21:06:32 -07:00
|
|
|
# Reuse session from previous_response_id chain so the dashboard
|
|
|
|
|
# groups the entire conversation under one session entry.
|
|
|
|
|
session_id = stored_session_id or str(uuid.uuid4())
|
2026-03-24 19:31:08 -07:00
|
|
|
|
2026-04-14 09:48:49 -04:00
|
|
|
stream = bool(body.get("stream", False))
|
|
|
|
|
if stream:
|
|
|
|
|
# Streaming branch — emit OpenAI Responses SSE events as the
|
|
|
|
|
# agent runs so frontends can render text deltas and tool
|
|
|
|
|
# calls in real time. See _write_sse_responses for details.
|
|
|
|
|
import queue as _q
|
|
|
|
|
_stream_q: _q.Queue = _q.Queue()
|
|
|
|
|
|
|
|
|
|
def _on_delta(delta):
|
|
|
|
|
# None from the agent is a CLI box-close signal, not EOS.
|
|
|
|
|
# Forwarding would kill the SSE stream prematurely; the
|
|
|
|
|
# SSE writer detects completion via agent_task.done().
|
|
|
|
|
if delta is not None:
|
|
|
|
|
_stream_q.put(delta)
|
|
|
|
|
|
|
|
|
|
def _on_tool_progress(event_type, name, preview, args, **kwargs):
|
|
|
|
|
"""Queue non-start tool progress events if needed in future.
|
|
|
|
|
|
|
|
|
|
The structured Responses stream uses ``tool_start_callback``
|
|
|
|
|
and ``tool_complete_callback`` for exact call-id correlation,
|
|
|
|
|
so progress events are currently ignored here.
|
|
|
|
|
"""
|
|
|
|
|
return
|
|
|
|
|
|
|
|
|
|
def _on_tool_start(tool_call_id, function_name, function_args):
|
|
|
|
|
"""Queue a started tool for live function_call streaming."""
|
|
|
|
|
_stream_q.put(("__tool_started__", {
|
|
|
|
|
"tool_call_id": tool_call_id,
|
|
|
|
|
"name": function_name,
|
|
|
|
|
"arguments": function_args or {},
|
|
|
|
|
}))
|
|
|
|
|
|
|
|
|
|
def _on_tool_complete(tool_call_id, function_name, function_args, function_result):
|
|
|
|
|
"""Queue a completed tool result for live function_call_output streaming."""
|
|
|
|
|
_stream_q.put(("__tool_completed__", {
|
|
|
|
|
"tool_call_id": tool_call_id,
|
|
|
|
|
"name": function_name,
|
|
|
|
|
"arguments": function_args or {},
|
|
|
|
|
"result": function_result,
|
|
|
|
|
}))
|
|
|
|
|
|
|
|
|
|
agent_ref = [None]
|
|
|
|
|
agent_task = asyncio.ensure_future(self._run_agent(
|
|
|
|
|
user_message=user_message,
|
|
|
|
|
conversation_history=conversation_history,
|
|
|
|
|
ephemeral_system_prompt=instructions,
|
|
|
|
|
session_id=session_id,
|
|
|
|
|
stream_delta_callback=_on_delta,
|
|
|
|
|
tool_progress_callback=_on_tool_progress,
|
|
|
|
|
tool_start_callback=_on_tool_start,
|
|
|
|
|
tool_complete_callback=_on_tool_complete,
|
|
|
|
|
agent_ref=agent_ref,
|
|
|
|
|
))
|
|
|
|
|
|
|
|
|
|
response_id = f"resp_{uuid.uuid4().hex[:28]}"
|
|
|
|
|
model_name = body.get("model", self._model_name)
|
|
|
|
|
created_at = int(time.time())
|
|
|
|
|
|
|
|
|
|
return await self._write_sse_responses(
|
|
|
|
|
request=request,
|
|
|
|
|
response_id=response_id,
|
|
|
|
|
model=model_name,
|
|
|
|
|
created_at=created_at,
|
|
|
|
|
stream_q=_stream_q,
|
|
|
|
|
agent_task=agent_task,
|
|
|
|
|
agent_ref=agent_ref,
|
|
|
|
|
conversation_history=conversation_history,
|
|
|
|
|
user_message=user_message,
|
|
|
|
|
instructions=instructions,
|
|
|
|
|
conversation=conversation,
|
|
|
|
|
store=store,
|
|
|
|
|
session_id=session_id,
|
|
|
|
|
)
|
|
|
|
|
|
2026-03-24 19:31:08 -07:00
|
|
|
async def _compute_response():
|
|
|
|
|
return await self._run_agent(
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
user_message=user_message,
|
|
|
|
|
conversation_history=conversation_history,
|
|
|
|
|
ephemeral_system_prompt=instructions,
|
|
|
|
|
session_id=session_id,
|
|
|
|
|
)
|
2026-03-24 19:31:08 -07:00
|
|
|
|
|
|
|
|
idempotency_key = request.headers.get("Idempotency-Key")
|
|
|
|
|
if idempotency_key:
|
|
|
|
|
fp = _make_request_fingerprint(
|
|
|
|
|
body,
|
|
|
|
|
keys=["input", "instructions", "previous_response_id", "conversation", "model", "tools"],
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
)
|
2026-03-24 19:31:08 -07:00
|
|
|
try:
|
|
|
|
|
result, usage = await _idem_cache.get_or_set(idempotency_key, fp, _compute_response)
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.error("Error running agent for responses: %s", e, exc_info=True)
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error(f"Internal server error: {e}", err_type="server_error"),
|
|
|
|
|
status=500,
|
|
|
|
|
)
|
|
|
|
|
else:
|
|
|
|
|
try:
|
|
|
|
|
result, usage = await _compute_response()
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.error("Error running agent for responses: %s", e, exc_info=True)
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error(f"Internal server error: {e}", err_type="server_error"),
|
|
|
|
|
status=500,
|
|
|
|
|
)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
final_response = result.get("final_response", "")
|
|
|
|
|
if not final_response:
|
|
|
|
|
final_response = result.get("error", "(No response generated)")
|
|
|
|
|
|
|
|
|
|
response_id = f"resp_{uuid.uuid4().hex[:28]}"
|
|
|
|
|
created_at = int(time.time())
|
|
|
|
|
|
|
|
|
|
# Build the full conversation history for storage
|
|
|
|
|
# (includes tool calls from the agent run)
|
|
|
|
|
full_history = list(conversation_history)
|
|
|
|
|
full_history.append({"role": "user", "content": user_message})
|
|
|
|
|
# Add agent's internal messages if available
|
|
|
|
|
agent_messages = result.get("messages", [])
|
|
|
|
|
if agent_messages:
|
|
|
|
|
full_history.extend(agent_messages)
|
|
|
|
|
else:
|
|
|
|
|
full_history.append({"role": "assistant", "content": final_response})
|
|
|
|
|
|
|
|
|
|
# Build output items (includes tool calls + final message)
|
|
|
|
|
output_items = self._extract_output_items(result)
|
|
|
|
|
|
|
|
|
|
response_data = {
|
|
|
|
|
"id": response_id,
|
|
|
|
|
"object": "response",
|
|
|
|
|
"status": "completed",
|
|
|
|
|
"created_at": created_at,
|
2026-04-09 17:07:29 -07:00
|
|
|
"model": body.get("model", self._model_name),
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
"output": output_items,
|
|
|
|
|
"usage": {
|
|
|
|
|
"input_tokens": usage.get("input_tokens", 0),
|
|
|
|
|
"output_tokens": usage.get("output_tokens", 0),
|
|
|
|
|
"total_tokens": usage.get("total_tokens", 0),
|
|
|
|
|
},
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
# Store the complete response object for future chaining / GET retrieval
|
|
|
|
|
if store:
|
|
|
|
|
self._response_store.put(response_id, {
|
|
|
|
|
"response": response_data,
|
|
|
|
|
"conversation_history": full_history,
|
|
|
|
|
"instructions": instructions,
|
2026-04-14 21:06:32 -07:00
|
|
|
"session_id": session_id,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
})
|
|
|
|
|
# Update conversation mapping so the next request with the same
|
|
|
|
|
# conversation name automatically chains to this response
|
|
|
|
|
if conversation:
|
2026-03-22 04:56:06 -07:00
|
|
|
self._response_store.set_conversation(conversation, response_id)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
return web.json_response(response_data)
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# GET / DELETE response endpoints
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def _handle_get_response(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""GET /v1/responses/{response_id} — retrieve a stored response."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
|
|
|
|
|
|
|
|
|
response_id = request.match_info["response_id"]
|
|
|
|
|
stored = self._response_store.get(response_id)
|
|
|
|
|
if stored is None:
|
2026-03-24 19:31:08 -07:00
|
|
|
return web.json_response(_openai_error(f"Response not found: {response_id}"), status=404)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
return web.json_response(stored["response"])
|
|
|
|
|
|
|
|
|
|
async def _handle_delete_response(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""DELETE /v1/responses/{response_id} — delete a stored response."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
|
|
|
|
|
|
|
|
|
response_id = request.match_info["response_id"]
|
|
|
|
|
deleted = self._response_store.delete(response_id)
|
|
|
|
|
if not deleted:
|
2026-03-24 19:31:08 -07:00
|
|
|
return web.json_response(_openai_error(f"Response not found: {response_id}"), status=404)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
|
|
|
|
return web.json_response({
|
|
|
|
|
"id": response_id,
|
|
|
|
|
"object": "response",
|
|
|
|
|
"deleted": True,
|
|
|
|
|
})
|
|
|
|
|
|
2026-03-22 04:06:57 -07:00
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Cron jobs API
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
# Check cron module availability once (not per-request)
|
|
|
|
|
_CRON_AVAILABLE = False
|
|
|
|
|
try:
|
|
|
|
|
from cron.jobs import (
|
|
|
|
|
list_jobs as _cron_list,
|
|
|
|
|
get_job as _cron_get,
|
|
|
|
|
create_job as _cron_create,
|
|
|
|
|
update_job as _cron_update,
|
|
|
|
|
remove_job as _cron_remove,
|
|
|
|
|
pause_job as _cron_pause,
|
|
|
|
|
resume_job as _cron_resume,
|
|
|
|
|
trigger_job as _cron_trigger,
|
|
|
|
|
)
|
2026-04-05 12:28:09 -07:00
|
|
|
# Wrap as staticmethod to prevent descriptor binding — these are plain
|
|
|
|
|
# module functions, not instance methods. Without this, self._cron_*()
|
|
|
|
|
# injects ``self`` as the first positional argument and every call
|
|
|
|
|
# raises TypeError.
|
|
|
|
|
_cron_list = staticmethod(_cron_list)
|
|
|
|
|
_cron_get = staticmethod(_cron_get)
|
|
|
|
|
_cron_create = staticmethod(_cron_create)
|
|
|
|
|
_cron_update = staticmethod(_cron_update)
|
|
|
|
|
_cron_remove = staticmethod(_cron_remove)
|
|
|
|
|
_cron_pause = staticmethod(_cron_pause)
|
|
|
|
|
_cron_resume = staticmethod(_cron_resume)
|
|
|
|
|
_cron_trigger = staticmethod(_cron_trigger)
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
_CRON_AVAILABLE = True
|
|
|
|
|
except ImportError:
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
_JOB_ID_RE = __import__("re").compile(r"[a-f0-9]{12}")
|
|
|
|
|
# Allowed fields for update — prevents clients injecting arbitrary keys
|
|
|
|
|
_UPDATE_ALLOWED_FIELDS = {"name", "schedule", "prompt", "deliver", "skills", "skill", "repeat", "enabled"}
|
|
|
|
|
_MAX_NAME_LENGTH = 200
|
|
|
|
|
_MAX_PROMPT_LENGTH = 5000
|
|
|
|
|
|
|
|
|
|
def _check_jobs_available(self) -> Optional["web.Response"]:
|
|
|
|
|
"""Return error response if cron module isn't available."""
|
|
|
|
|
if not self._CRON_AVAILABLE:
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": "Cron module not available"}, status=501,
|
|
|
|
|
)
|
2026-03-22 04:06:57 -07:00
|
|
|
return None
|
|
|
|
|
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
def _check_job_id(self, request: "web.Request") -> tuple:
|
|
|
|
|
"""Validate and extract job_id. Returns (job_id, error_response)."""
|
|
|
|
|
job_id = request.match_info["job_id"]
|
|
|
|
|
if not self._JOB_ID_RE.fullmatch(job_id):
|
|
|
|
|
return job_id, web.json_response(
|
|
|
|
|
{"error": "Invalid job ID format"}, status=400,
|
|
|
|
|
)
|
|
|
|
|
return job_id, None
|
|
|
|
|
|
2026-03-22 04:06:57 -07:00
|
|
|
async def _handle_list_jobs(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""GET /api/jobs — list all cron jobs."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
cron_err = self._check_jobs_available()
|
|
|
|
|
if cron_err:
|
|
|
|
|
return cron_err
|
2026-03-22 04:06:57 -07:00
|
|
|
try:
|
|
|
|
|
include_disabled = request.query.get("include_disabled", "").lower() in ("true", "1")
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
jobs = self._cron_list(include_disabled=include_disabled)
|
2026-03-22 04:06:57 -07:00
|
|
|
return web.json_response({"jobs": jobs})
|
|
|
|
|
except Exception as e:
|
|
|
|
|
return web.json_response({"error": str(e)}, status=500)
|
|
|
|
|
|
|
|
|
|
async def _handle_create_job(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""POST /api/jobs — create a new cron job."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
cron_err = self._check_jobs_available()
|
|
|
|
|
if cron_err:
|
|
|
|
|
return cron_err
|
2026-03-22 04:06:57 -07:00
|
|
|
try:
|
|
|
|
|
body = await request.json()
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
name = (body.get("name") or "").strip()
|
|
|
|
|
schedule = (body.get("schedule") or "").strip()
|
2026-03-22 04:06:57 -07:00
|
|
|
prompt = body.get("prompt", "")
|
|
|
|
|
deliver = body.get("deliver", "local")
|
|
|
|
|
skills = body.get("skills")
|
|
|
|
|
repeat = body.get("repeat")
|
|
|
|
|
|
|
|
|
|
if not name:
|
|
|
|
|
return web.json_response({"error": "Name is required"}, status=400)
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
if len(name) > self._MAX_NAME_LENGTH:
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": f"Name must be ≤ {self._MAX_NAME_LENGTH} characters"}, status=400,
|
|
|
|
|
)
|
2026-03-22 04:06:57 -07:00
|
|
|
if not schedule:
|
|
|
|
|
return web.json_response({"error": "Schedule is required"}, status=400)
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
if len(prompt) > self._MAX_PROMPT_LENGTH:
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": f"Prompt must be ≤ {self._MAX_PROMPT_LENGTH} characters"}, status=400,
|
|
|
|
|
)
|
|
|
|
|
if repeat is not None and (not isinstance(repeat, int) or repeat < 1):
|
|
|
|
|
return web.json_response({"error": "Repeat must be a positive integer"}, status=400)
|
2026-03-22 04:06:57 -07:00
|
|
|
|
|
|
|
|
kwargs = {
|
|
|
|
|
"prompt": prompt,
|
|
|
|
|
"schedule": schedule,
|
|
|
|
|
"name": name,
|
|
|
|
|
"deliver": deliver,
|
|
|
|
|
}
|
|
|
|
|
if skills:
|
|
|
|
|
kwargs["skills"] = skills
|
|
|
|
|
if repeat is not None:
|
|
|
|
|
kwargs["repeat"] = repeat
|
|
|
|
|
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
job = self._cron_create(**kwargs)
|
2026-03-22 04:06:57 -07:00
|
|
|
return web.json_response({"job": job})
|
|
|
|
|
except Exception as e:
|
|
|
|
|
return web.json_response({"error": str(e)}, status=500)
|
|
|
|
|
|
|
|
|
|
async def _handle_get_job(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""GET /api/jobs/{job_id} — get a single cron job."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
cron_err = self._check_jobs_available()
|
|
|
|
|
if cron_err:
|
|
|
|
|
return cron_err
|
|
|
|
|
job_id, id_err = self._check_job_id(request)
|
|
|
|
|
if id_err:
|
|
|
|
|
return id_err
|
2026-03-22 04:06:57 -07:00
|
|
|
try:
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
job = self._cron_get(job_id)
|
2026-03-22 04:06:57 -07:00
|
|
|
if not job:
|
|
|
|
|
return web.json_response({"error": "Job not found"}, status=404)
|
|
|
|
|
return web.json_response({"job": job})
|
|
|
|
|
except Exception as e:
|
|
|
|
|
return web.json_response({"error": str(e)}, status=500)
|
|
|
|
|
|
|
|
|
|
async def _handle_update_job(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""PATCH /api/jobs/{job_id} — update a cron job."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
cron_err = self._check_jobs_available()
|
|
|
|
|
if cron_err:
|
|
|
|
|
return cron_err
|
|
|
|
|
job_id, id_err = self._check_job_id(request)
|
|
|
|
|
if id_err:
|
|
|
|
|
return id_err
|
2026-03-22 04:06:57 -07:00
|
|
|
try:
|
|
|
|
|
body = await request.json()
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
# Whitelist allowed fields to prevent arbitrary key injection
|
|
|
|
|
sanitized = {k: v for k, v in body.items() if k in self._UPDATE_ALLOWED_FIELDS}
|
|
|
|
|
if not sanitized:
|
|
|
|
|
return web.json_response({"error": "No valid fields to update"}, status=400)
|
|
|
|
|
# Validate lengths if present
|
|
|
|
|
if "name" in sanitized and len(sanitized["name"]) > self._MAX_NAME_LENGTH:
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": f"Name must be ≤ {self._MAX_NAME_LENGTH} characters"}, status=400,
|
|
|
|
|
)
|
|
|
|
|
if "prompt" in sanitized and len(sanitized["prompt"]) > self._MAX_PROMPT_LENGTH:
|
|
|
|
|
return web.json_response(
|
|
|
|
|
{"error": f"Prompt must be ≤ {self._MAX_PROMPT_LENGTH} characters"}, status=400,
|
|
|
|
|
)
|
|
|
|
|
job = self._cron_update(job_id, sanitized)
|
2026-03-22 04:06:57 -07:00
|
|
|
if not job:
|
|
|
|
|
return web.json_response({"error": "Job not found"}, status=404)
|
|
|
|
|
return web.json_response({"job": job})
|
|
|
|
|
except Exception as e:
|
|
|
|
|
return web.json_response({"error": str(e)}, status=500)
|
|
|
|
|
|
|
|
|
|
async def _handle_delete_job(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""DELETE /api/jobs/{job_id} — delete a cron job."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
cron_err = self._check_jobs_available()
|
|
|
|
|
if cron_err:
|
|
|
|
|
return cron_err
|
|
|
|
|
job_id, id_err = self._check_job_id(request)
|
|
|
|
|
if id_err:
|
|
|
|
|
return id_err
|
2026-03-22 04:06:57 -07:00
|
|
|
try:
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
success = self._cron_remove(job_id)
|
2026-03-22 04:06:57 -07:00
|
|
|
if not success:
|
|
|
|
|
return web.json_response({"error": "Job not found"}, status=404)
|
|
|
|
|
return web.json_response({"ok": True})
|
|
|
|
|
except Exception as e:
|
|
|
|
|
return web.json_response({"error": str(e)}, status=500)
|
|
|
|
|
|
|
|
|
|
async def _handle_pause_job(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""POST /api/jobs/{job_id}/pause — pause a cron job."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
cron_err = self._check_jobs_available()
|
|
|
|
|
if cron_err:
|
|
|
|
|
return cron_err
|
|
|
|
|
job_id, id_err = self._check_job_id(request)
|
|
|
|
|
if id_err:
|
|
|
|
|
return id_err
|
2026-03-22 04:06:57 -07:00
|
|
|
try:
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
job = self._cron_pause(job_id)
|
2026-03-22 04:06:57 -07:00
|
|
|
if not job:
|
|
|
|
|
return web.json_response({"error": "Job not found"}, status=404)
|
|
|
|
|
return web.json_response({"job": job})
|
|
|
|
|
except Exception as e:
|
|
|
|
|
return web.json_response({"error": str(e)}, status=500)
|
|
|
|
|
|
|
|
|
|
async def _handle_resume_job(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""POST /api/jobs/{job_id}/resume — resume a paused cron job."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
cron_err = self._check_jobs_available()
|
|
|
|
|
if cron_err:
|
|
|
|
|
return cron_err
|
|
|
|
|
job_id, id_err = self._check_job_id(request)
|
|
|
|
|
if id_err:
|
|
|
|
|
return id_err
|
2026-03-22 04:06:57 -07:00
|
|
|
try:
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
job = self._cron_resume(job_id)
|
2026-03-22 04:06:57 -07:00
|
|
|
if not job:
|
|
|
|
|
return web.json_response({"error": "Job not found"}, status=404)
|
|
|
|
|
return web.json_response({"job": job})
|
|
|
|
|
except Exception as e:
|
|
|
|
|
return web.json_response({"error": str(e)}, status=500)
|
|
|
|
|
|
|
|
|
|
async def _handle_run_job(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""POST /api/jobs/{job_id}/run — trigger immediate execution."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
cron_err = self._check_jobs_available()
|
|
|
|
|
if cron_err:
|
|
|
|
|
return cron_err
|
|
|
|
|
job_id, id_err = self._check_job_id(request)
|
|
|
|
|
if id_err:
|
|
|
|
|
return id_err
|
2026-03-22 04:06:57 -07:00
|
|
|
try:
|
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:
1. Startup availability check — cron module imported once at class load,
endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
repeat/enabled pass through to cron.jobs.update_job, preventing
arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
cron-unavailable cases
2026-03-22 04:18:18 -07:00
|
|
|
job = self._cron_trigger(job_id)
|
2026-03-22 04:06:57 -07:00
|
|
|
if not job:
|
|
|
|
|
return web.json_response({"error": "Job not found"}, status=404)
|
|
|
|
|
return web.json_response({"job": job})
|
|
|
|
|
except Exception as e:
|
|
|
|
|
return web.json_response({"error": str(e)}, status=500)
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Output extraction helper
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
@staticmethod
|
|
|
|
|
def _extract_output_items(result: Dict[str, Any]) -> List[Dict[str, Any]]:
|
|
|
|
|
"""
|
|
|
|
|
Build the full output item array from the agent's messages.
|
|
|
|
|
|
|
|
|
|
Walks *result["messages"]* and emits:
|
|
|
|
|
- ``function_call`` items for each tool_call on assistant messages
|
|
|
|
|
- ``function_call_output`` items for each tool-role message
|
|
|
|
|
- a final ``message`` item with the assistant's text reply
|
|
|
|
|
"""
|
|
|
|
|
items: List[Dict[str, Any]] = []
|
|
|
|
|
messages = result.get("messages", [])
|
|
|
|
|
|
|
|
|
|
for msg in messages:
|
|
|
|
|
role = msg.get("role")
|
|
|
|
|
if role == "assistant" and msg.get("tool_calls"):
|
|
|
|
|
for tc in msg["tool_calls"]:
|
|
|
|
|
func = tc.get("function", {})
|
|
|
|
|
items.append({
|
|
|
|
|
"type": "function_call",
|
|
|
|
|
"name": func.get("name", ""),
|
|
|
|
|
"arguments": func.get("arguments", ""),
|
|
|
|
|
"call_id": tc.get("id", ""),
|
|
|
|
|
})
|
|
|
|
|
elif role == "tool":
|
|
|
|
|
items.append({
|
|
|
|
|
"type": "function_call_output",
|
|
|
|
|
"call_id": msg.get("tool_call_id", ""),
|
2026-04-14 20:47:06 -07:00
|
|
|
"output": msg.get("content", ""),
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
})
|
|
|
|
|
|
|
|
|
|
# Final assistant message
|
|
|
|
|
final = result.get("final_response", "")
|
|
|
|
|
if not final:
|
|
|
|
|
final = result.get("error", "(No response generated)")
|
|
|
|
|
|
|
|
|
|
items.append({
|
|
|
|
|
"type": "message",
|
|
|
|
|
"role": "assistant",
|
|
|
|
|
"content": [
|
|
|
|
|
{
|
|
|
|
|
"type": "output_text",
|
|
|
|
|
"text": final,
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
})
|
|
|
|
|
return items
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Agent execution
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def _run_agent(
|
|
|
|
|
self,
|
|
|
|
|
user_message: str,
|
|
|
|
|
conversation_history: List[Dict[str, str]],
|
|
|
|
|
ephemeral_system_prompt: Optional[str] = None,
|
|
|
|
|
session_id: Optional[str] = None,
|
|
|
|
|
stream_delta_callback=None,
|
2026-03-30 18:50:27 -07:00
|
|
|
tool_progress_callback=None,
|
2026-04-14 09:48:49 -04:00
|
|
|
tool_start_callback=None,
|
|
|
|
|
tool_complete_callback=None,
|
2026-03-27 11:33:19 -07:00
|
|
|
agent_ref: Optional[list] = None,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
) -> tuple:
|
|
|
|
|
"""
|
|
|
|
|
Create an agent and run a conversation in a thread executor.
|
|
|
|
|
|
|
|
|
|
Returns ``(result_dict, usage_dict)`` where *usage_dict* contains
|
|
|
|
|
``input_tokens``, ``output_tokens`` and ``total_tokens``.
|
2026-03-27 11:33:19 -07:00
|
|
|
|
|
|
|
|
If *agent_ref* is a one-element list, the AIAgent instance is stored
|
|
|
|
|
at ``agent_ref[0]`` before ``run_conversation`` begins. This allows
|
|
|
|
|
callers (e.g. the SSE writer) to call ``agent.interrupt()`` from
|
|
|
|
|
another thread to stop in-progress LLM calls.
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
"""
|
|
|
|
|
loop = asyncio.get_event_loop()
|
|
|
|
|
|
|
|
|
|
def _run():
|
|
|
|
|
agent = self._create_agent(
|
|
|
|
|
ephemeral_system_prompt=ephemeral_system_prompt,
|
|
|
|
|
session_id=session_id,
|
|
|
|
|
stream_delta_callback=stream_delta_callback,
|
2026-03-30 18:50:27 -07:00
|
|
|
tool_progress_callback=tool_progress_callback,
|
2026-04-14 09:48:49 -04:00
|
|
|
tool_start_callback=tool_start_callback,
|
|
|
|
|
tool_complete_callback=tool_complete_callback,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
)
|
2026-03-27 11:33:19 -07:00
|
|
|
if agent_ref is not None:
|
|
|
|
|
agent_ref[0] = agent
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
result = agent.run_conversation(
|
|
|
|
|
user_message=user_message,
|
|
|
|
|
conversation_history=conversation_history,
|
2026-04-10 04:56:35 -07:00
|
|
|
task_id="default",
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
)
|
|
|
|
|
usage = {
|
|
|
|
|
"input_tokens": getattr(agent, "session_prompt_tokens", 0) or 0,
|
|
|
|
|
"output_tokens": getattr(agent, "session_completion_tokens", 0) or 0,
|
|
|
|
|
"total_tokens": getattr(agent, "session_total_tokens", 0) or 0,
|
|
|
|
|
}
|
|
|
|
|
return result, usage
|
|
|
|
|
|
|
|
|
|
return await loop.run_in_executor(None, _run)
|
|
|
|
|
|
feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 11:52:46 -07:00
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# /v1/runs — structured event streaming
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
_MAX_CONCURRENT_RUNS = 10 # Prevent unbounded resource allocation
|
|
|
|
|
_RUN_STREAM_TTL = 300 # seconds before orphaned runs are swept
|
|
|
|
|
|
|
|
|
|
def _make_run_event_callback(self, run_id: str, loop: "asyncio.AbstractEventLoop"):
|
|
|
|
|
"""Return a tool_progress_callback that pushes structured events to the run's SSE queue."""
|
|
|
|
|
def _push(event: Dict[str, Any]) -> None:
|
|
|
|
|
q = self._run_streams.get(run_id)
|
|
|
|
|
if q is None:
|
|
|
|
|
return
|
|
|
|
|
try:
|
|
|
|
|
loop.call_soon_threadsafe(q.put_nowait, event)
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
def _callback(event_type: str, tool_name: str = None, preview: str = None, args=None, **kwargs):
|
|
|
|
|
ts = time.time()
|
|
|
|
|
if event_type == "tool.started":
|
|
|
|
|
_push({
|
|
|
|
|
"event": "tool.started",
|
|
|
|
|
"run_id": run_id,
|
|
|
|
|
"timestamp": ts,
|
|
|
|
|
"tool": tool_name,
|
|
|
|
|
"preview": preview,
|
|
|
|
|
})
|
|
|
|
|
elif event_type == "tool.completed":
|
|
|
|
|
_push({
|
|
|
|
|
"event": "tool.completed",
|
|
|
|
|
"run_id": run_id,
|
|
|
|
|
"timestamp": ts,
|
|
|
|
|
"tool": tool_name,
|
|
|
|
|
"duration": round(kwargs.get("duration", 0), 3),
|
|
|
|
|
"error": kwargs.get("is_error", False),
|
|
|
|
|
})
|
|
|
|
|
elif event_type == "reasoning.available":
|
|
|
|
|
_push({
|
|
|
|
|
"event": "reasoning.available",
|
|
|
|
|
"run_id": run_id,
|
|
|
|
|
"timestamp": ts,
|
|
|
|
|
"text": preview or "",
|
|
|
|
|
})
|
|
|
|
|
# _thinking and subagent_progress are intentionally not forwarded
|
|
|
|
|
|
|
|
|
|
return _callback
|
|
|
|
|
|
|
|
|
|
async def _handle_runs(self, request: "web.Request") -> "web.Response":
|
|
|
|
|
"""POST /v1/runs — start an agent run, return run_id immediately."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
|
|
|
|
|
|
|
|
|
# Enforce concurrency limit
|
|
|
|
|
if len(self._run_streams) >= self._MAX_CONCURRENT_RUNS:
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error(f"Too many concurrent runs (max {self._MAX_CONCURRENT_RUNS})", code="rate_limit_exceeded"),
|
|
|
|
|
status=429,
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
body = await request.json()
|
|
|
|
|
except Exception:
|
|
|
|
|
return web.json_response(_openai_error("Invalid JSON"), status=400)
|
|
|
|
|
|
|
|
|
|
raw_input = body.get("input")
|
|
|
|
|
if not raw_input:
|
|
|
|
|
return web.json_response(_openai_error("Missing 'input' field"), status=400)
|
|
|
|
|
|
|
|
|
|
user_message = raw_input if isinstance(raw_input, str) else (raw_input[-1].get("content", "") if isinstance(raw_input, list) else "")
|
|
|
|
|
if not user_message:
|
|
|
|
|
return web.json_response(_openai_error("No user message found in input"), status=400)
|
|
|
|
|
|
|
|
|
|
run_id = f"run_{uuid.uuid4().hex}"
|
|
|
|
|
loop = asyncio.get_running_loop()
|
|
|
|
|
q: "asyncio.Queue[Optional[Dict]]" = asyncio.Queue()
|
|
|
|
|
self._run_streams[run_id] = q
|
|
|
|
|
self._run_streams_created[run_id] = time.time()
|
|
|
|
|
|
|
|
|
|
event_cb = self._make_run_event_callback(run_id, loop)
|
|
|
|
|
|
|
|
|
|
# Also wire stream_delta_callback so message.delta events flow through
|
|
|
|
|
def _text_cb(delta: Optional[str]) -> None:
|
|
|
|
|
if delta is None:
|
|
|
|
|
return
|
|
|
|
|
try:
|
|
|
|
|
loop.call_soon_threadsafe(q.put_nowait, {
|
|
|
|
|
"event": "message.delta",
|
|
|
|
|
"run_id": run_id,
|
|
|
|
|
"timestamp": time.time(),
|
|
|
|
|
"delta": delta,
|
|
|
|
|
})
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
instructions = body.get("instructions")
|
|
|
|
|
previous_response_id = body.get("previous_response_id")
|
2026-04-07 17:41:05 -07:00
|
|
|
|
|
|
|
|
# Accept explicit conversation_history from the request body.
|
|
|
|
|
# Precedence: explicit conversation_history > previous_response_id.
|
feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 11:52:46 -07:00
|
|
|
conversation_history: List[Dict[str, str]] = []
|
2026-04-07 17:41:05 -07:00
|
|
|
raw_history = body.get("conversation_history")
|
|
|
|
|
if raw_history:
|
|
|
|
|
if not isinstance(raw_history, list):
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error("'conversation_history' must be an array of message objects"),
|
|
|
|
|
status=400,
|
|
|
|
|
)
|
|
|
|
|
for i, entry in enumerate(raw_history):
|
|
|
|
|
if not isinstance(entry, dict) or "role" not in entry or "content" not in entry:
|
|
|
|
|
return web.json_response(
|
|
|
|
|
_openai_error(f"conversation_history[{i}] must have 'role' and 'content' fields"),
|
|
|
|
|
status=400,
|
|
|
|
|
)
|
|
|
|
|
conversation_history.append({"role": str(entry["role"]), "content": str(entry["content"])})
|
|
|
|
|
if previous_response_id:
|
|
|
|
|
logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")
|
|
|
|
|
|
2026-04-14 21:06:32 -07:00
|
|
|
stored_session_id = None
|
2026-04-07 17:41:05 -07:00
|
|
|
if not conversation_history and previous_response_id:
|
feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 11:52:46 -07:00
|
|
|
stored = self._response_store.get(previous_response_id)
|
|
|
|
|
if stored:
|
|
|
|
|
conversation_history = list(stored.get("conversation_history", []))
|
2026-04-14 21:06:32 -07:00
|
|
|
stored_session_id = stored.get("session_id")
|
feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 11:52:46 -07:00
|
|
|
if instructions is None:
|
|
|
|
|
instructions = stored.get("instructions")
|
|
|
|
|
|
2026-04-07 17:41:34 -07:00
|
|
|
# When input is a multi-message array, extract all but the last
|
|
|
|
|
# message as conversation history (the last becomes user_message).
|
|
|
|
|
# Only fires when no explicit history was provided.
|
|
|
|
|
if not conversation_history and isinstance(raw_input, list) and len(raw_input) > 1:
|
|
|
|
|
for msg in raw_input[:-1]:
|
|
|
|
|
if isinstance(msg, dict) and msg.get("role") and msg.get("content"):
|
|
|
|
|
content = msg["content"]
|
|
|
|
|
if isinstance(content, list):
|
|
|
|
|
# Flatten multi-part content blocks to text
|
|
|
|
|
content = " ".join(
|
|
|
|
|
part.get("text", "") for part in content
|
|
|
|
|
if isinstance(part, dict) and part.get("type") == "text"
|
|
|
|
|
)
|
|
|
|
|
conversation_history.append({"role": msg["role"], "content": str(content)})
|
|
|
|
|
|
2026-04-14 21:06:32 -07:00
|
|
|
session_id = body.get("session_id") or stored_session_id or run_id
|
feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 11:52:46 -07:00
|
|
|
ephemeral_system_prompt = instructions
|
|
|
|
|
|
|
|
|
|
async def _run_and_close():
|
|
|
|
|
try:
|
|
|
|
|
agent = self._create_agent(
|
|
|
|
|
ephemeral_system_prompt=ephemeral_system_prompt,
|
|
|
|
|
session_id=session_id,
|
|
|
|
|
stream_delta_callback=_text_cb,
|
|
|
|
|
tool_progress_callback=event_cb,
|
|
|
|
|
)
|
|
|
|
|
def _run_sync():
|
|
|
|
|
r = agent.run_conversation(
|
|
|
|
|
user_message=user_message,
|
|
|
|
|
conversation_history=conversation_history,
|
2026-04-10 04:56:35 -07:00
|
|
|
task_id="default",
|
feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 11:52:46 -07:00
|
|
|
)
|
|
|
|
|
u = {
|
|
|
|
|
"input_tokens": getattr(agent, "session_prompt_tokens", 0) or 0,
|
|
|
|
|
"output_tokens": getattr(agent, "session_completion_tokens", 0) or 0,
|
|
|
|
|
"total_tokens": getattr(agent, "session_total_tokens", 0) or 0,
|
|
|
|
|
}
|
|
|
|
|
return r, u
|
|
|
|
|
|
|
|
|
|
result, usage = await asyncio.get_running_loop().run_in_executor(None, _run_sync)
|
|
|
|
|
final_response = result.get("final_response", "") if isinstance(result, dict) else ""
|
|
|
|
|
q.put_nowait({
|
|
|
|
|
"event": "run.completed",
|
|
|
|
|
"run_id": run_id,
|
|
|
|
|
"timestamp": time.time(),
|
|
|
|
|
"output": final_response,
|
|
|
|
|
"usage": usage,
|
|
|
|
|
})
|
|
|
|
|
except Exception as exc:
|
|
|
|
|
logger.exception("[api_server] run %s failed", run_id)
|
|
|
|
|
try:
|
|
|
|
|
q.put_nowait({
|
|
|
|
|
"event": "run.failed",
|
|
|
|
|
"run_id": run_id,
|
|
|
|
|
"timestamp": time.time(),
|
|
|
|
|
"error": str(exc),
|
|
|
|
|
})
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
finally:
|
|
|
|
|
# Sentinel: signal SSE stream to close
|
|
|
|
|
try:
|
|
|
|
|
q.put_nowait(None)
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
task = asyncio.create_task(_run_and_close())
|
|
|
|
|
try:
|
|
|
|
|
self._background_tasks.add(task)
|
|
|
|
|
except TypeError:
|
|
|
|
|
pass
|
|
|
|
|
if hasattr(task, "add_done_callback"):
|
|
|
|
|
task.add_done_callback(self._background_tasks.discard)
|
|
|
|
|
|
|
|
|
|
return web.json_response({"run_id": run_id, "status": "started"}, status=202)
|
|
|
|
|
|
|
|
|
|
async def _handle_run_events(self, request: "web.Request") -> "web.StreamResponse":
|
|
|
|
|
"""GET /v1/runs/{run_id}/events — SSE stream of structured agent lifecycle events."""
|
|
|
|
|
auth_err = self._check_auth(request)
|
|
|
|
|
if auth_err:
|
|
|
|
|
return auth_err
|
|
|
|
|
|
|
|
|
|
run_id = request.match_info["run_id"]
|
|
|
|
|
|
|
|
|
|
# Allow subscribing slightly before the run is registered (race condition window)
|
|
|
|
|
for _ in range(20):
|
|
|
|
|
if run_id in self._run_streams:
|
|
|
|
|
break
|
|
|
|
|
await asyncio.sleep(0.05)
|
|
|
|
|
else:
|
|
|
|
|
return web.json_response(_openai_error(f"Run not found: {run_id}", code="run_not_found"), status=404)
|
|
|
|
|
|
|
|
|
|
q = self._run_streams[run_id]
|
|
|
|
|
|
|
|
|
|
response = web.StreamResponse(
|
|
|
|
|
status=200,
|
|
|
|
|
headers={
|
|
|
|
|
"Content-Type": "text/event-stream",
|
|
|
|
|
"Cache-Control": "no-cache",
|
|
|
|
|
"X-Accel-Buffering": "no",
|
|
|
|
|
},
|
|
|
|
|
)
|
|
|
|
|
await response.prepare(request)
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
while True:
|
|
|
|
|
try:
|
|
|
|
|
event = await asyncio.wait_for(q.get(), timeout=30.0)
|
|
|
|
|
except asyncio.TimeoutError:
|
|
|
|
|
await response.write(b": keepalive\n\n")
|
|
|
|
|
continue
|
|
|
|
|
if event is None:
|
|
|
|
|
# Run finished — send final SSE comment and close
|
|
|
|
|
await response.write(b": stream closed\n\n")
|
|
|
|
|
break
|
|
|
|
|
payload = f"data: {json.dumps(event)}\n\n"
|
|
|
|
|
await response.write(payload.encode())
|
|
|
|
|
except Exception as exc:
|
|
|
|
|
logger.debug("[api_server] SSE stream error for run %s: %s", run_id, exc)
|
|
|
|
|
finally:
|
|
|
|
|
self._run_streams.pop(run_id, None)
|
|
|
|
|
self._run_streams_created.pop(run_id, None)
|
|
|
|
|
|
|
|
|
|
return response
|
|
|
|
|
|
|
|
|
|
async def _sweep_orphaned_runs(self) -> None:
|
|
|
|
|
"""Periodically clean up run streams that were never consumed."""
|
|
|
|
|
while True:
|
|
|
|
|
await asyncio.sleep(60)
|
|
|
|
|
now = time.time()
|
|
|
|
|
stale = [
|
|
|
|
|
run_id
|
|
|
|
|
for run_id, created_at in list(self._run_streams_created.items())
|
|
|
|
|
if now - created_at > self._RUN_STREAM_TTL
|
|
|
|
|
]
|
|
|
|
|
for run_id in stale:
|
|
|
|
|
logger.debug("[api_server] sweeping orphaned run %s", run_id)
|
|
|
|
|
self._run_streams.pop(run_id, None)
|
|
|
|
|
self._run_streams_created.pop(run_id, None)
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# BasePlatformAdapter interface
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def connect(self) -> bool:
|
|
|
|
|
"""Start the aiohttp web server."""
|
|
|
|
|
if not AIOHTTP_AVAILABLE:
|
|
|
|
|
logger.warning("[%s] aiohttp not installed", self.name)
|
|
|
|
|
return False
|
|
|
|
|
|
|
|
|
|
try:
|
2026-03-28 14:00:52 -07:00
|
|
|
mws = [mw for mw in (cors_middleware, body_limit_middleware, security_headers_middleware) if mw is not None]
|
2026-03-24 19:31:08 -07:00
|
|
|
self._app = web.Application(middlewares=mws)
|
2026-03-22 04:08:48 -07:00
|
|
|
self._app["api_server_adapter"] = self
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
self._app.router.add_get("/health", self._handle_health)
|
2026-04-14 05:17:17 +00:00
|
|
|
self._app.router.add_get("/health/detailed", self._handle_health_detailed)
|
2026-03-28 13:32:39 -07:00
|
|
|
self._app.router.add_get("/v1/health", self._handle_health)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
self._app.router.add_get("/v1/models", self._handle_models)
|
|
|
|
|
self._app.router.add_post("/v1/chat/completions", self._handle_chat_completions)
|
|
|
|
|
self._app.router.add_post("/v1/responses", self._handle_responses)
|
|
|
|
|
self._app.router.add_get("/v1/responses/{response_id}", self._handle_get_response)
|
|
|
|
|
self._app.router.add_delete("/v1/responses/{response_id}", self._handle_delete_response)
|
2026-03-22 04:06:57 -07:00
|
|
|
# Cron jobs management API
|
|
|
|
|
self._app.router.add_get("/api/jobs", self._handle_list_jobs)
|
|
|
|
|
self._app.router.add_post("/api/jobs", self._handle_create_job)
|
|
|
|
|
self._app.router.add_get("/api/jobs/{job_id}", self._handle_get_job)
|
|
|
|
|
self._app.router.add_patch("/api/jobs/{job_id}", self._handle_update_job)
|
|
|
|
|
self._app.router.add_delete("/api/jobs/{job_id}", self._handle_delete_job)
|
|
|
|
|
self._app.router.add_post("/api/jobs/{job_id}/pause", self._handle_pause_job)
|
|
|
|
|
self._app.router.add_post("/api/jobs/{job_id}/resume", self._handle_resume_job)
|
|
|
|
|
self._app.router.add_post("/api/jobs/{job_id}/run", self._handle_run_job)
|
feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 11:52:46 -07:00
|
|
|
# Structured event streaming
|
|
|
|
|
self._app.router.add_post("/v1/runs", self._handle_runs)
|
|
|
|
|
self._app.router.add_get("/v1/runs/{run_id}/events", self._handle_run_events)
|
|
|
|
|
# Start background sweep to clean up orphaned (unconsumed) run streams
|
|
|
|
|
sweep_task = asyncio.create_task(self._sweep_orphaned_runs())
|
|
|
|
|
try:
|
|
|
|
|
self._background_tasks.add(sweep_task)
|
|
|
|
|
except TypeError:
|
|
|
|
|
pass
|
|
|
|
|
if hasattr(sweep_task, "add_done_callback"):
|
|
|
|
|
sweep_task.add_done_callback(self._background_tasks.discard)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
|
2026-04-10 16:40:54 -07:00
|
|
|
# Refuse to start network-accessible without authentication
|
|
|
|
|
if is_network_accessible(self._host) and not self._api_key:
|
|
|
|
|
logger.error(
|
|
|
|
|
"[%s] Refusing to start: binding to %s requires API_SERVER_KEY. "
|
|
|
|
|
"Set API_SERVER_KEY or use the default 127.0.0.1.",
|
|
|
|
|
self.name, self._host,
|
|
|
|
|
)
|
|
|
|
|
return False
|
|
|
|
|
|
2026-04-12 18:05:02 -07:00
|
|
|
# Refuse to start network-accessible with a placeholder key.
|
|
|
|
|
# Ported from openclaw/openclaw#64586.
|
|
|
|
|
if is_network_accessible(self._host) and self._api_key:
|
|
|
|
|
try:
|
|
|
|
|
from hermes_cli.auth import has_usable_secret
|
|
|
|
|
if not has_usable_secret(self._api_key, min_length=8):
|
|
|
|
|
logger.error(
|
|
|
|
|
"[%s] Refusing to start: API_SERVER_KEY is set to a "
|
|
|
|
|
"placeholder value. Generate a real secret "
|
|
|
|
|
"(e.g. `openssl rand -hex 32`) and set API_SERVER_KEY "
|
|
|
|
|
"before exposing the API server on %s.",
|
|
|
|
|
self.name, self._host,
|
|
|
|
|
)
|
|
|
|
|
return False
|
|
|
|
|
except ImportError:
|
|
|
|
|
pass
|
|
|
|
|
|
feat: add profiles — run multiple isolated Hermes instances (#3681)
Each profile is a fully independent HERMES_HOME with its own config,
API keys, memory, sessions, skills, gateway, cron, and state.db.
Core module: hermes_cli/profiles.py (~900 lines)
- Profile CRUD: create, delete, list, show, rename
- Three clone levels: blank, --clone (config), --clone-all (everything)
- Export/import: tar.gz archive for backup and migration
- Wrapper alias scripts (~/.local/bin/<name>)
- Collision detection for alias names
- Sticky default via ~/.hermes/active_profile
- Skill seeding via subprocess (handles module-level caching)
- Auto-stop gateway on delete with disable-before-stop for services
- Tab completion generation for bash and zsh
CLI integration (hermes_cli/main.py):
- _apply_profile_override(): pre-import -p/--profile flag + sticky default
- Full 'hermes profile' subcommand: list, use, create, delete, show,
alias, rename, export, import
- 'hermes completion bash/zsh' command
- Multi-profile skill sync in hermes update
Display (cli.py, banner.py, gateway/run.py):
- CLI prompt: 'coder ❯' when using a non-default profile
- Banner shows profile name
- Gateway startup log includes profile name
Gateway safety:
- Token locks: Discord, Slack, WhatsApp, Signal (extends Telegram pattern)
- Port conflict detection: API server, webhook adapter
Diagnostics (hermes_cli/doctor.py):
- Profile health section: lists profiles, checks config, .env, aliases
- Orphan alias detection: warns when wrapper points to deleted profile
Tests (tests/hermes_cli/test_profiles.py):
- 71 automated tests covering: validation, CRUD, clone levels, rename,
export/import, active profile, isolation, alias collision, completion
- Full suite: 6760 passed, 0 new failures
Documentation:
- website/docs/user-guide/profiles.md: full user guide (12 sections)
- website/docs/reference/profile-commands.md: command reference (12 commands)
- website/docs/reference/faq.md: 6 profile FAQ entries
- website/sidebars.ts: navigation updated
2026-03-29 10:41:20 -07:00
|
|
|
# Port conflict detection — fail fast if port is already in use
|
|
|
|
|
try:
|
|
|
|
|
with _socket.socket(_socket.AF_INET, _socket.SOCK_STREAM) as _s:
|
|
|
|
|
_s.settimeout(1)
|
|
|
|
|
_s.connect(('127.0.0.1', self._port))
|
|
|
|
|
logger.error('[%s] Port %d already in use. Set a different port in config.yaml: platforms.api_server.port', self.name, self._port)
|
|
|
|
|
return False
|
|
|
|
|
except (ConnectionRefusedError, OSError):
|
|
|
|
|
pass # port is free
|
|
|
|
|
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
self._runner = web.AppRunner(self._app)
|
|
|
|
|
await self._runner.setup()
|
|
|
|
|
self._site = web.TCPSite(self._runner, self._host, self._port)
|
|
|
|
|
await self._site.start()
|
|
|
|
|
|
|
|
|
|
self._mark_connected()
|
fix(security): require auth for session continuation and warn on missing API key
Two security hardening changes for the API server:
1. **Startup warning when no API key is configured.**
When `API_SERVER_KEY` is not set, all endpoints accept unauthenticated
requests. This is the default configuration, but operators may not
realize the security implications. A prominent warning at startup
makes the risk visible.
2. **Require authentication for session continuation.**
The `X-Hermes-Session-Id` header allows callers to load and continue
any session stored in state.db. Without authentication, an attacker
who can reach the API server (e.g. via CORS from a malicious page,
or on a shared host) could enumerate session IDs and read conversation
history — which may contain API keys, passwords, code, or other
sensitive data shared with the agent.
Session continuation now returns 403 when no API key is configured,
with a clear error message explaining how to enable the feature.
When a key IS configured, the existing Bearer token check already
gates access.
This is defense-in-depth: the API server is intended for local use,
but defense against cross-origin and shared-host attacks is important
since the default binding is 127.0.0.1 which is reachable from
browsers via DNS rebinding or localhost CORS.
2026-04-10 11:56:23 +08:00
|
|
|
if not self._api_key:
|
|
|
|
|
logger.warning(
|
|
|
|
|
"[%s] ⚠️ No API key configured (API_SERVER_KEY / platforms.api_server.key). "
|
|
|
|
|
"All requests will be accepted without authentication. "
|
|
|
|
|
"Set an API key for production deployments to prevent "
|
|
|
|
|
"unauthorized access to sessions, responses, and cron jobs.",
|
|
|
|
|
self.name,
|
|
|
|
|
)
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
logger.info(
|
2026-04-09 17:07:29 -07:00
|
|
|
"[%s] API server listening on http://%s:%d (model: %s)",
|
|
|
|
|
self.name, self._host, self._port, self._model_name,
|
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
|
|
|
)
|
|
|
|
|
return True
|
|
|
|
|
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.error("[%s] Failed to start API server: %s", self.name, e)
|
|
|
|
|
return False
|
|
|
|
|
|
|
|
|
|
async def disconnect(self) -> None:
|
|
|
|
|
"""Stop the aiohttp web server."""
|
|
|
|
|
self._mark_disconnected()
|
|
|
|
|
if self._site:
|
|
|
|
|
await self._site.stop()
|
|
|
|
|
self._site = None
|
|
|
|
|
if self._runner:
|
|
|
|
|
await self._runner.cleanup()
|
|
|
|
|
self._runner = None
|
|
|
|
|
self._app = None
|
|
|
|
|
logger.info("[%s] API server stopped", self.name)
|
|
|
|
|
|
|
|
|
|
async def send(
|
|
|
|
|
self,
|
|
|
|
|
chat_id: str,
|
|
|
|
|
content: str,
|
|
|
|
|
reply_to: Optional[str] = None,
|
|
|
|
|
metadata: Optional[Dict[str, Any]] = None,
|
|
|
|
|
) -> SendResult:
|
|
|
|
|
"""
|
|
|
|
|
Not used — HTTP request/response cycle handles delivery directly.
|
|
|
|
|
"""
|
|
|
|
|
return SendResult(success=False, error="API server uses HTTP request/response, not send()")
|
|
|
|
|
|
|
|
|
|
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
|
|
|
|
|
"""Return basic info about the API server."""
|
|
|
|
|
return {
|
|
|
|
|
"name": "API Server",
|
|
|
|
|
"type": "api",
|
|
|
|
|
"host": self._host,
|
|
|
|
|
"port": self._port,
|
|
|
|
|
}
|