feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)

* feat: OpenAI-compatible API server platform adapter

Salvaged from PR #956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
This commit is contained in:
Teknium
2026-03-17 10:44:37 -07:00
committed by GitHub
parent b5cf0f0aef
commit dd60bcbfb7
15 changed files with 2765 additions and 10 deletions

View File

@@ -44,6 +44,14 @@ const SESSION_DIR = getArg('session', path.join(process.env.HOME || '~', '.herme
const PAIR_ONLY = args.includes('--pair-only');
const WHATSAPP_MODE = getArg('mode', process.env.WHATSAPP_MODE || 'self-chat'); // "bot" or "self-chat"
const ALLOWED_USERS = (process.env.WHATSAPP_ALLOWED_USERS || '').split(',').map(s => s.trim()).filter(Boolean);
const DEFAULT_REPLY_PREFIX = '⚕ *Hermes Agent*\n────────────\n';
const REPLY_PREFIX = process.env.WHATSAPP_REPLY_PREFIX === undefined
? DEFAULT_REPLY_PREFIX
: process.env.WHATSAPP_REPLY_PREFIX.replace(/\\n/g, '\n');
function formatOutgoingMessage(message) {
return REPLY_PREFIX ? `${REPLY_PREFIX}${message}` : message;
}
mkdirSync(SESSION_DIR, { recursive: true });
@@ -188,7 +196,7 @@ async function startSocket() {
}
// Ignore Hermes' own reply messages in self-chat mode to avoid loops.
if (msg.key.fromMe && (body.startsWith('⚕ *Hermes Agent*') || recentlySentIds.has(msg.key.id))) {
if (msg.key.fromMe && ((REPLY_PREFIX && body.startsWith(REPLY_PREFIX)) || recentlySentIds.has(msg.key.id))) {
if (WHATSAPP_DEBUG) {
try { console.log(JSON.stringify({ event: 'ignored', reason: 'agent_echo', chatId, messageId: msg.key.id })); } catch {}
}
@@ -251,10 +259,7 @@ app.post('/send', async (req, res) => {
}
try {
// Prefix responses so the user can distinguish agent replies from their
// own messages (especially in self-chat / "Message Yourself").
const prefixed = `⚕ *Hermes Agent*\n────────────\n${message}`;
const sent = await sock.sendMessage(chatId, { text: prefixed });
const sent = await sock.sendMessage(chatId, { text: formatOutgoingMessage(message) });
// Track sent message ID to prevent echo-back loops
if (sent?.key?.id) {
@@ -282,9 +287,8 @@ app.post('/edit', async (req, res) => {
}
try {
const prefixed = `⚕ *Hermes Agent*\n────────────\n${message}`;
const key = { id: messageId, fromMe: true, remoteJid: chatId };
await sock.sendMessage(chatId, { text: prefixed, edit: key });
await sock.sendMessage(chatId, { text: formatOutgoingMessage(message), edit: key });
res.json({ success: true });
} catch (err) {
res.status(500).json({ error: err.message });