
Teknium dd60bcbfb7 feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)

* feat: OpenAI-compatible API server platform adapter

Salvaged from PR #956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>

2026-03-17 10:44:37 -07:00


| sidebar_position | title | description |
| --- | --- | --- |
| 5 | WhatsApp | Set up Hermes Agent as a WhatsApp bot via the built-in Baileys bridge |

WhatsApp Setup

Hermes connects to WhatsApp through a built-in bridge based on Baileys. This works by emulating a WhatsApp Web session — not through the official WhatsApp Business API. No Meta developer account or Business verification is required.

:::warning Unofficial API: Ban Risk
WhatsApp does not officially support third-party bots outside the Business API. Using a third-party bridge carries a small risk of account restrictions. To minimize risk:

- Use a dedicated phone number for the bot (not your personal number)
- Don't send bulk/spam messages; keep usage conversational
- Don't automate outbound messaging to people who haven't messaged first

:::

:::warning WhatsApp Web Protocol Updates
WhatsApp periodically updates its Web protocol, which can temporarily break compatibility with third-party bridges. When this happens, Hermes will update the bridge dependency. If the bot stops working after a WhatsApp update, pull the latest Hermes version and re-pair.
:::

Two Modes

| Mode | How it works | Best for |
| --- | --- | --- |
| Separate bot number (recommended) | Dedicate a phone number to the bot. People message that number directly. | Clean UX, multiple users, lower ban risk |
| Personal self-chat | Use your own WhatsApp. You message yourself to talk to the agent. | Quick setup, single user, testing |

Prerequisites

- Node.js v18+ and npm (the WhatsApp bridge runs as a Node.js process)
- A phone with WhatsApp installed (for scanning the QR code)

Unlike older browser-driven bridges, the current Baileys-based bridge does not require a local Chromium or Puppeteer dependency stack.

Step 1: Run the Setup Wizard

hermes whatsapp

The wizard will:

1. Ask which mode you want (bot or self-chat)
2. Install bridge dependencies if needed
3. Display a QR code in your terminal
4. Wait for you to scan it

To scan the QR code:

1. Open WhatsApp on your phone
2. Go to Settings → Linked Devices
3. Tap Link a Device
4. Point your camera at the terminal QR code

Once paired, the wizard confirms the connection and exits. Your session is saved automatically.

:::tip
If the QR code looks garbled, make sure your terminal is at least 60 columns wide and supports Unicode. You can also try a different terminal emulator.
:::

Step 2: Getting a Second Phone Number (Bot Mode)

For bot mode, you need a phone number that isn't already registered with WhatsApp. Three options:

| Option | Cost | Notes |
| --- | --- | --- |
| Google Voice | Free | US only. Get a number at voice.google.com. Verify WhatsApp via SMS through the Google Voice app. |
| Prepaid SIM | $5–15 one-time | Any carrier. Activate, verify WhatsApp, then the SIM can sit in a drawer. Number must stay active (make a call every 90 days). |
| VoIP services | Free–$5/month | TextNow, TextFree, or similar. Some VoIP numbers are blocked by WhatsApp. Try a few if the first doesn't work. |

After getting the number:

1. Install WhatsApp on a phone (or use the WhatsApp Business app on a dual-SIM phone)
2. Register the new number with WhatsApp
3. Run hermes whatsapp and scan the QR code from that WhatsApp account

Step 3: Configure Hermes

Add the following to your ~/.hermes/.env file:

# Required
WHATSAPP_ENABLED=true
WHATSAPP_MODE=bot                          # "bot" or "self-chat"
WHATSAPP_ALLOWED_USERS=15551234567         # Comma-separated phone numbers (with country code, no +)

Then start the gateway:

hermes gateway              # Foreground
hermes gateway install      # Install as a user service
sudo hermes gateway install --system   # Linux only: boot-time system service

The gateway starts the WhatsApp bridge automatically using the saved session.

Session Persistence

The Baileys bridge saves its session under ~/.hermes/whatsapp/session. This means:

- Sessions survive restarts, so you don't need to re-scan the QR code every time
- The session data includes encryption keys and device credentials
- Do not share or commit this session directory, because it grants full access to the WhatsApp account

Re-pairing

If the session breaks (phone reset, WhatsApp update, manually unlinked), you'll see connection errors in the gateway logs. To fix it:

hermes whatsapp

This generates a fresh QR code. Scan it again and the session is re-established. The gateway handles temporary disconnections (network blips, phone going offline briefly) automatically with reconnection logic.

Voice Messages

Hermes supports voice on WhatsApp:

- Incoming: Voice messages (.ogg opus) are automatically transcribed using the configured STT provider: local faster-whisper, Groq Whisper (GROQ_API_KEY), or OpenAI Whisper (VOICE_TOOLS_OPENAI_KEY)
- Outgoing: TTS responses are sent as MP3 audio file attachments

Agent responses are prefixed with "⚕ Hermes Agent" by default. You can customize or disable this in config.yaml:

# ~/.hermes/config.yaml
whatsapp:
  reply_prefix: ""                          # Empty string disables the header
  # reply_prefix: "🤖 *My Bot*\n──────\n"  # Custom prefix (supports \n for newlines)
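The three cases described for the prefix (unset, empty, custom with `\n` escapes) can be sketched as follows. This is an illustrative Python sketch, not the bridge's actual code: the function name `resolve_reply_prefix` is hypothetical, and `DEFAULT_PREFIX` holds only the default header text quoted in the commit notes, without whatever separator the bridge may append.

```python
from typing import Optional

# Default header text as quoted in the commit notes. Any trailing
# separator the real bridge adds is not shown there, so none is assumed.
DEFAULT_PREFIX = "⚕ *Hermes Agent*"


def resolve_reply_prefix(raw: Optional[str]) -> str:
    """Hypothetical helper mirroring the documented prefix behaviour.

    - None (setting absent)  -> default header
    - ""   (empty string)    -> no header at all
    - any other value        -> custom header, with literal "\\n"
                                sequences expanded to real newlines
    """
    if raw is None:
        return DEFAULT_PREFIX
    # An empty string passes through unchanged, which disables the header.
    return raw.replace("\\n", "\n")
```

Treating "absent" and "empty" as distinct cases is what lets an empty string switch the header off rather than fall back to the default.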

Troubleshooting

| Problem | Solution |
| --- | --- |
| QR code not scanning | Ensure terminal is wide enough (60+ columns). Try a different terminal. Make sure you're scanning from the correct WhatsApp account (bot number, not personal). |
| QR code expires | QR codes refresh every ~20 seconds. If it times out, restart `hermes whatsapp`. |
| Session not persisting | Check that `~/.hermes/whatsapp/session` exists and is writable. If containerized, mount it as a persistent volume. |
| Logged out unexpectedly | WhatsApp unlinks devices after long inactivity. Keep the phone on and connected to the network, then re-pair with `hermes whatsapp` if needed. |
| Bridge crashes or reconnect loops | Restart the gateway, update Hermes, and re-pair if the session was invalidated by a WhatsApp protocol change. |
| Bot stops working after WhatsApp update | Update Hermes to get the latest bridge version, then re-pair. |
| Messages not being received | Verify `WHATSAPP_ALLOWED_USERS` includes the sender's number (with country code, no `+` or spaces). |
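For the last row, a quick way to spot formatting mistakes in `WHATSAPP_ALLOWED_USERS` is to flag every entry that is not digits only. The sketch below is a standalone diagnostic, not part of Hermes: the helper name `malformed_entries` is hypothetical, and it only checks the documented format (country code, no `+`, no spaces). How strictly the gateway itself parses the value is not covered here.

```python
import re


def malformed_entries(value: str) -> list[str]:
    """Return the comma-separated entries that are not plain ASCII digits.

    Entries with a leading '+', embedded spaces, dashes, or an empty
    entry from a stray comma are all reported.
    """
    return [
        entry
        for entry in value.split(",")
        if re.fullmatch(r"[0-9]+", entry) is None
    ]


if __name__ == "__main__":
    sample = "15551234567,+15557654321, 15550000000"
    for bad in malformed_entries(sample):
        print(f"check this entry: {bad!r}")
```

An empty result means every entry matches the documented format. It does not confirm that the numbers themselves are the right ones.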

Security

:::warning
Always set WHATSAPP_ALLOWED_USERS with phone numbers (including country code, without the +) of authorized users. Without this setting, the gateway will deny all incoming messages as a safety measure.
:::

- The ~/.hermes/whatsapp/session directory contains full session credentials, so protect it like a password
- Set file permissions: chmod 700 ~/.hermes/whatsapp/session
- Use a dedicated phone number for the bot to isolate risk from your personal account
- If you suspect compromise, unlink the device from WhatsApp → Settings → Linked Devices
- Phone numbers in logs are partially redacted, but review your log retention policy
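To confirm that the permissions recommended above are in place, a small check like the following can help. It is a minimal sketch for POSIX systems, assuming the session path given in this guide. The helper name `session_dir_is_private` is hypothetical and not part of Hermes.

```python
import stat
from pathlib import Path


def session_dir_is_private(path: Path) -> bool:
    """True if path is a directory with no group/other permission bits.

    A directory set with `chmod 700` passes. One left at 755 does not.
    A missing path also returns False.
    """
    if not path.is_dir():
        return False
    mode = stat.S_IMODE(path.stat().st_mode)
    # 0o077 masks the group and other read/write/execute bits.
    return mode & 0o077 == 0


if __name__ == "__main__":
    session = Path.home() / ".hermes" / "whatsapp" / "session"
    if session_dir_is_private(session):
        print("session directory is owner-only")
    else:
        print("session directory is missing or readable by others")
```

The check looks only at the directory itself. Files inside it keep whatever modes they were created with, although a 700 directory already blocks other users from reaching them.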
