
teknium1 eb49936a60 Update documentation and installation scripts for TTS audio formats

- Clarified the requirements for Telegram voice bubbles, specifying the need for ffmpeg when using Edge TTS.
- Enhanced README and messaging documentation to detail audio delivery formats across platforms.
- Improved installation script messages to inform users about the necessity of ffmpeg for proper audio playback on Telegram.

2026-02-14 16:16:54 -08:00

16 KiB


Messaging Platform Integrations (Gateway)

Hermes Agent can connect to messaging platforms like Telegram, Discord, and WhatsApp to serve as a conversational AI assistant.

Quick Start

# 1. Set your bot token(s) in .env file
echo 'TELEGRAM_BOT_TOKEN="your_telegram_bot_token"' >> .env
echo 'DISCORD_BOT_TOKEN="your_discord_bot_token"' >> .env

# 2. Test the gateway (foreground)
./scripts/hermes-gateway run

# 3. Install as a system service (runs in background)
./scripts/hermes-gateway install

# 4. Manage the service
./scripts/hermes-gateway start
./scripts/hermes-gateway stop
./scripts/hermes-gateway restart
./scripts/hermes-gateway status

Quick test (without service install):

python cli.py --gateway  # Runs in foreground, useful for debugging

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                      Hermes Gateway                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│  │   Telegram   │  │   Discord    │  │   WhatsApp   │          │
│  │   Adapter    │  │   Adapter    │  │   Adapter    │          │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘          │
│         │                 │                 │                   │
│         └─────────────────┼─────────────────┘                   │
│                           │                                     │
│                  ┌────────▼────────┐                            │
│                  │  Session Store  │                            │
│                  │  (per-chat)     │                            │
│                  └────────┬────────┘                            │
│                           │                                     │
│                  ┌────────▼────────┐                            │
│                  │   AIAgent       │                            │
│                  │   (run_agent)   │                            │
│                  └─────────────────┘                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
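The per-chat Session Store in the diagram can be thought of as a map from a (platform, chat ID) pair to that chat's history. The sketch below is a hypothetical in-memory illustration, not the gateway's implementation. `SessionStore`, `get_or_create`, `append`, and `reset` are invented names, and the real store persists to disk under `~/.hermes/sessions/`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical key: (platform name, chat ID). Each chat gets its own history.
SessionKey = Tuple[str, str]


@dataclass
class SessionStore:
    """In-memory illustration of per-chat sessions (not the real implementation)."""

    sessions: Dict[SessionKey, List[dict]] = field(default_factory=dict)

    def get_or_create(self, platform: str, chat_id: str) -> List[dict]:
        # setdefault returns the existing history or stores a new empty one.
        return self.sessions.setdefault((platform, chat_id), [])

    def append(self, platform: str, chat_id: str, role: str, content: str) -> None:
        self.get_or_create(platform, chat_id).append({"role": role, "content": content})

    def reset(self, platform: str, chat_id: str) -> None:
        # Conceptually what a session reset does: drop this chat's context only.
        self.sessions.pop((platform, chat_id), None)


store = SessionStore()
store.append("telegram", "-1001234567890", "user", "hello")
store.append("discord", "123456789012345678", "user", "hi")
```

Keying on the (platform, chat) pair keeps a Telegram group and a Discord channel from ever sharing context.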

Session Management

Session Persistence

Sessions persist across messages until they reset. The agent remembers your conversation context.

Reset Policies

Sessions reset based on configurable policies:

Policy	Default	Description
Daily	4:00 AM	Reset at a specific hour each day
Idle	120 min	Reset after N minutes of inactivity
Both	(combined)	Whichever triggers first
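The three policies in the table can be combined into a single check. This is a minimal sketch that assumes a session resets when the idle window has elapsed, or when the most recent daily reset hour has passed since the last message. The function name `should_reset` is hypothetical, and so is the mode string `"daily"`: the config examples in this document only show `"idle"` and `"both"`.

```python
from datetime import datetime, timedelta


def should_reset(
    last_activity: datetime,
    now: datetime,
    mode: str = "both",
    at_hour: int = 4,
    idle_minutes: int = 120,
) -> bool:
    """Return True if a session last active at `last_activity` should reset at `now`."""
    # Idle policy: too much time has passed since the last message.
    idle_expired = now - last_activity >= timedelta(minutes=idle_minutes)

    # Daily policy: the most recent `at_hour`:00 boundary falls after the last activity.
    boundary = now.replace(hour=at_hour, minute=0, second=0, microsecond=0)
    if boundary > now:
        boundary -= timedelta(days=1)
    daily_expired = last_activity < boundary

    if mode == "idle":
        return idle_expired
    if mode == "daily":
        return daily_expired
    if mode == "both":
        return idle_expired or daily_expired  # whichever triggers first
    raise ValueError(f"unknown reset mode: {mode!r}")
```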

Manual Reset

Send /new or /reset as a message to start fresh.

Per-Platform Overrides

Configure different reset policies per platform in ~/.hermes/gateway.json:

{
  "reset_by_platform": {
    "telegram": { "mode": "idle", "idle_minutes": 240 },
    "discord": { "mode": "idle", "idle_minutes": 60 }
  }
}
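One plausible reading of these overrides is that each platform entry is layered over `default_reset_policy` (shown in the full gateway.json example later in this document), so keys the override omits, such as `at_hour`, are inherited. That merge behaviour is an assumption, and `effective_policy` is a hypothetical helper.

```python
from typing import Any, Dict

# Values copied from the gateway.json examples in this document.
DEFAULT_POLICY: Dict[str, Any] = {"mode": "both", "at_hour": 4, "idle_minutes": 120}
OVERRIDES: Dict[str, Dict[str, Any]] = {
    "telegram": {"mode": "idle", "idle_minutes": 240},
    "discord": {"mode": "idle", "idle_minutes": 60},
}


def effective_policy(platform: str) -> Dict[str, Any]:
    # Start from the defaults, then let the platform's override replace matching keys.
    return {**DEFAULT_POLICY, **OVERRIDES.get(platform, {})}
```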

Platform Setup

Telegram

Create a bot via @BotFather
Get your token (looks like 123456789:ABCdefGHIjklMNOpqrsTUVwxyz)

Set the environment variable:

export TELEGRAM_BOT_TOKEN="your_token_here"

Optional: set a home channel for cron job delivery:

export TELEGRAM_HOME_CHANNEL="-1001234567890"
export TELEGRAM_HOME_CHANNEL_NAME="My Notes"

Requirements:

pip install "python-telegram-bot>=20.0"

Discord

Create an application in the Discord Developer Portal
Create a bot under your application
Get the bot token
Enable the required intents:
- Message Content Intent
- Server Members Intent (optional)
Invite the bot to your server using the OAuth2 URL Generator (scopes: bot, applications.commands)

Set the environment variable:

export DISCORD_BOT_TOKEN="your_token_here"

Optional: set a home channel:

export DISCORD_HOME_CHANNEL="123456789012345678"
export DISCORD_HOME_CHANNEL_NAME="#bot-updates"

Requirements:

pip install "discord.py>=2.0"

WhatsApp

WhatsApp integration is more complex because WhatsApp has no simple bot API.

Options:

WhatsApp Business API (requires Meta verification)
whatsapp-web.js via Node.js bridge (for personal accounts)

Bridge Setup:

Install Node.js
Set up the bridge script (see scripts/whatsapp-bridge/ for reference)

Configure the bridge in ~/.hermes/gateway.json:

{
  "platforms": {
    "whatsapp": {
      "enabled": true,
      "extra": {
        "bridge_script": "/path/to/bridge.js",
        "bridge_port": 3000
      }
    }
  }
}

Configuration

There are two ways to configure the gateway (listed in order of precedence):

1. Environment Variables (`.env` file) - Recommended for Quick Setup

Add to your ~/.hermes/.env file:

# =============================================================================
# MESSAGING PLATFORM TOKENS
# =============================================================================

# Telegram - get from @BotFather on Telegram
TELEGRAM_BOT_TOKEN=your_telegram_bot_token
TELEGRAM_ALLOWED_USERS=123456789,987654321    # Security: restrict to these user IDs

# Optional: Default channel for cron job delivery
TELEGRAM_HOME_CHANNEL=-1001234567890
TELEGRAM_HOME_CHANNEL_NAME="My Notes"

# Discord - get from Discord Developer Portal
DISCORD_BOT_TOKEN=your_discord_bot_token
DISCORD_ALLOWED_USERS=123456789012345678      # Security: restrict to these user IDs

# Optional: Default channel for cron job delivery
DISCORD_HOME_CHANNEL=123456789012345678
DISCORD_HOME_CHANNEL_NAME="#bot-updates"

# WhatsApp - requires Node.js bridge setup
WHATSAPP_ENABLED=true

# =============================================================================
# AGENT SETTINGS
# =============================================================================

# Max tool-calling iterations per conversation (default: 60)
HERMES_MAX_ITERATIONS=60

# Working directory for terminal commands (default: home directory, ~)
MESSAGING_CWD=/home/myuser

# =============================================================================
# TOOL PROGRESS NOTIFICATIONS
# =============================================================================

# Show progress messages as agent uses tools
HERMES_TOOL_PROGRESS=true

# Mode: "new" (only when tool changes) or "all" (every tool call)
HERMES_TOOL_PROGRESS_MODE=new

# =============================================================================
# SESSION SETTINGS
# =============================================================================

# Reset sessions after N minutes of inactivity (default: 120)
SESSION_IDLE_MINUTES=120

# Daily reset hour in 24h format (default: 4 = 4am)
SESSION_RESET_HOUR=4
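The `*_ALLOWED_USERS` variables hold comma-separated user IDs. Below is a hypothetical sketch of parsing and checking them; `allowed_users` and `is_allowed` are invented names. This document does not say what happens when the variable is unset. The sketch treats that case as "no allowlist" and allows everyone, so verify the real behaviour before relying on it.

```python
import os
from typing import Mapping, Optional, Set


def allowed_users(platform: str, env: Mapping[str, str] = os.environ) -> Optional[Set[str]]:
    """Parse e.g. TELEGRAM_ALLOWED_USERS="123,456" into {"123", "456"}.

    Returns None when the variable is unset or empty (no allowlist configured).
    """
    raw = env.get(f"{platform.upper()}_ALLOWED_USERS", "")
    ids = {part.strip() for part in raw.split(",") if part.strip()}
    return ids or None


def is_allowed(platform: str, user_id: str, env: Mapping[str, str] = os.environ) -> bool:
    ids = allowed_users(platform, env)
    # Assumption for this sketch: no allowlist means every user is allowed.
    return ids is None or user_id in ids
```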

2. Gateway Config File (`~/.hermes/gateway.json`) - Full Control

For advanced configuration, create ~/.hermes/gateway.json:

{
  "platforms": {
    "telegram": {
      "enabled": true,
      "token": "your_telegram_token",
      "home_channel": {
        "platform": "telegram",
        "chat_id": "-1001234567890",
        "name": "My Notes"
      }
    },
    "discord": {
      "enabled": true,
      "token": "your_discord_token",
      "home_channel": {
        "platform": "discord",
        "chat_id": "123456789012345678",
        "name": "#bot-updates"
      }
    }
  },
  "default_reset_policy": {
    "mode": "both",
    "at_hour": 4,
    "idle_minutes": 120
  },
  "reset_by_platform": {
    "discord": {
      "mode": "idle",
      "idle_minutes": 60
    }
  },
  "always_log_local": true
}
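To illustrate the file's structure, the sketch below parses a gateway.json document and collects the home channel of each enabled platform. `home_channels` is a hypothetical helper, not part of the gateway. It relies only on the keys shown in the example above.

```python
import json
from typing import Dict


def home_channels(config_text: str) -> Dict[str, str]:
    """Map each enabled platform that has a home channel to that channel's chat_id."""
    config = json.loads(config_text)
    result: Dict[str, str] = {}
    for name, platform in config.get("platforms", {}).items():
        home = platform.get("home_channel")
        if platform.get("enabled") and home:
            result[name] = home["chat_id"]
    return result
```

To read the real file, pass it `Path("~/.hermes/gateway.json").expanduser().read_text()` (with `from pathlib import Path`).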

Platform-Specific Toolsets

Each platform is mapped to its own toolset, so tool access can be restricted per platform:

Platform	Toolset	Capabilities
CLI	`hermes-cli`	Full access (terminal, browser, etc.)
Telegram	`hermes-telegram`	Full tools including terminal
Discord	`hermes-discord`	Full tools including terminal
WhatsApp	`hermes-whatsapp`	Full tools including terminal

User Experience Features

Typing Indicator

The gateway keeps the "typing..." indicator active throughout processing, refreshing every 4 seconds. This lets users know the bot is working even during long tool-calling sequences.
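A common way to keep such an indicator alive is a background task that re-sends it on a fixed interval and is cancelled when the work finishes. The refresh is needed because indicators expire on their own; Telegram's chat action, for example, lasts about 5 seconds. This asyncio sketch assumes a platform-specific `send_typing` coroutine (hypothetical), and `with_typing_indicator` is an invented name.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_typing_indicator(
    send_typing: Callable[[], Awaitable[None]],
    work: Awaitable[T],
    interval: float = 4.0,
) -> T:
    """Await `work`, re-sending the typing indicator every `interval` seconds meanwhile."""

    async def keep_typing() -> None:
        while True:
            await send_typing()
            await asyncio.sleep(interval)

    typing_task = asyncio.create_task(keep_typing())
    try:
        return await work
    finally:
        # Stop refreshing as soon as the work finishes or fails.
        typing_task.cancel()
        try:
            await typing_task
        except asyncio.CancelledError:
            pass
```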

Tool Progress Notifications

When HERMES_TOOL_PROGRESS=true, the bot sends status messages as it works:

💻 `ls -la`...
🔍 web_search...
📄 web_extract...
🎨 image_generate...

Terminal commands show the actual command (truncated to 50 chars). Other tools just show the tool name.

Modes:

new: sends a message only when the agent switches to a different tool (less noise)
all: sends a message for every tool call
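The formatting rules and the two modes can be sketched as follows. The emoji mapping comes from the sample messages above. The tool name `terminal` for terminal commands, the fallback emoji, and the helper names are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

# Emoji per tool, taken from the sample messages above; "⚙️" is an assumed fallback.
TOOL_EMOJI = {"terminal": "💻", "web_search": "🔍", "web_extract": "📄", "image_generate": "🎨"}


def format_progress(tool: str, command: Optional[str] = None) -> str:
    emoji = TOOL_EMOJI.get(tool, "⚙️")
    if tool == "terminal" and command is not None:
        # Terminal calls show the command itself, cut to 50 characters.
        return f"{emoji} `{command[:50]}`..."
    return f"{emoji} {tool}..."


def progress_messages(calls: Sequence[Tuple[str, Optional[str]]], mode: str = "new") -> List[str]:
    """Turn a sequence of (tool, command) calls into the messages a user would see."""
    messages: List[str] = []
    last_tool: Optional[str] = None
    for tool, command in calls:
        if mode == "new" and tool == last_tool:
            continue  # same tool as the previous call: stay quiet in "new" mode
        messages.append(format_progress(tool, command))
        last_tool = tool
    return messages
```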

Working Directory

CLI (hermes command): uses the directory where you run the command
Messaging: uses MESSAGING_CWD (default: home directory ~)

This is intentional: CLI users are in a terminal and expect the agent to work in their current directory, while messaging users need a consistent starting location.
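The rule amounts to a small decision, sketched below. `working_directory` is a hypothetical helper. Only `MESSAGING_CWD` and its home-directory default come from this document.

```python
import os
from typing import Mapping


def working_directory(from_messaging: bool, env: Mapping[str, str] = os.environ) -> str:
    if not from_messaging:
        # CLI: the directory the user launched `hermes` from.
        return os.getcwd()
    # Messaging: MESSAGING_CWD if set, otherwise the user's home directory.
    return os.path.expanduser(env.get("MESSAGING_CWD") or "~")
```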

Max Iterations

If the agent reaches the maximum iteration limit (HERMES_MAX_ITERATIONS) before finishing, it asks the model to summarize what it has found so far instead of returning a generic error. You still get a useful response even when the task could not be fully completed.
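In outline, this fallback is a bounded loop followed by a summary request. In the sketch, `step` and `summarize` are hypothetical callables standing in for one agent turn and for the summary prompt. The default of 60 mirrors the documented `HERMES_MAX_ITERATIONS` default.

```python
from typing import Callable, Optional


def run_with_limit(
    step: Callable[[int], Optional[str]],
    summarize: Callable[[], str],
    max_iterations: int = 60,
) -> str:
    """Call `step` until it returns a final answer or the limit is reached.

    `step(i)` returns a final answer string, or None if more tool calls are needed.
    """
    for i in range(max_iterations):
        answer = step(i)
        if answer is not None:
            return answer
    # Limit reached: ask for a summary of progress instead of raising an error.
    return summarize()
```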

Voice Messages (TTS)

The text_to_speech tool generates audio that the gateway delivers as native voice messages on each platform:

Platform	Delivery	Format
Telegram	Voice bubble (plays inline)	Opus `.ogg` — native from OpenAI/ElevenLabs, converted via ffmpeg for Edge TTS
Discord	Audio file attachment	MP3
WhatsApp	Audio file attachment	MP3
CLI	Saved to `~/voice-memos/`	MP3

Providers:

Edge TTS (default) — Free, no API key, 322 voices in 74 languages
ElevenLabs — Premium quality, requires ELEVENLABS_API_KEY
OpenAI TTS — Good quality, requires OPENAI_API_KEY

Voice and provider are configured by the user in ~/.hermes/config.yaml under the tts: key. The model only sends text; it does not choose the voice.

The tool returns a MEDIA:<path> tag, which the gateway's send pipeline intercepts and delivers as a native audio message. If the output also contains the [[audio_as_voice]] marker (meaning Opus audio is available), Telegram sends it as a voice bubble instead of an audio file.
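The following sketch shows how a send pipeline might pick these markers out of a tool result. The exact tag grammar is not specified here, so the sketch assumes the path after `MEDIA:` contains no whitespace. `extract_media` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Assumed tag shapes: "MEDIA:<path>" with a whitespace-free path, and an optional
# "[[audio_as_voice]]" marker anywhere in the text.
MEDIA_RE = re.compile(r"MEDIA:(\S+)")
VOICE_MARKER = "[[audio_as_voice]]"


def extract_media(tool_output: str) -> Tuple[Optional[str], bool]:
    """Return (media_path, send_as_voice) parsed from a tool result string."""
    match = MEDIA_RE.search(tool_output)
    path = match.group(1) if match else None
    return path, VOICE_MARKER in tool_output
```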

Telegram voice bubbles & ffmpeg:

Telegram requires Opus/OGG format for native voice bubbles (the round, inline-playable kind). When the target is Telegram, OpenAI and ElevenLabs output Opus directly, so no extra setup is needed. Edge TTS (the default free provider) outputs MP3 and needs ffmpeg to convert it:

sudo apt install ffmpeg    # Ubuntu/Debian
brew install ffmpeg         # macOS
sudo dnf install ffmpeg     # Fedora

Without ffmpeg, Edge TTS audio is sent as a regular audio file (still playable, but shows as a rectangular music player instead of a voice bubble).
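The conversion step could look like this sketch. `shutil.which` and `subprocess.run` are standard library calls, and `-c:a libopus` selects FFmpeg's Opus encoder. The bitrate, the `.ogg` naming, and the helper names are assumptions, not necessarily what the gateway runs. If ffmpeg is missing or the conversion fails, the sketch falls back to the MP3, matching the behaviour described above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def opus_convert_command(mp3_path: str) -> Optional[List[str]]:
    """Build an ffmpeg command converting MP3 to Opus-in-OGG, or None if ffmpeg is missing."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return None  # caller falls back to sending the MP3 as a plain audio file
    ogg_path = str(Path(mp3_path).with_suffix(".ogg"))
    # -y overwrites the output, -c:a libopus selects the Opus encoder, -b:a sets bitrate.
    return [ffmpeg, "-y", "-i", mp3_path, "-c:a", "libopus", "-b:a", "64k", ogg_path]


def to_voice_note(mp3_path: str) -> str:
    """Return the path to send: the .ogg if conversion succeeded, else the original MP3."""
    cmd = opus_convert_command(mp3_path)
    if cmd is None:
        return mp3_path
    result = subprocess.run(cmd, capture_output=True)
    return cmd[-1] if result.returncode == 0 else mp3_path
```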

Cron Job Delivery

When scheduling cron jobs, you can specify where the output should be delivered:

User: "Remind me to check the server in 30 minutes"

Agent uses: schedule_cronjob(
  prompt="Check server status...",
  schedule="30m",
  deliver="origin"  # Back to this chat
)

Delivery Options

Option	Description
`"origin"`	Back to where the job was created
`"local"`	Save to local files only
`"telegram"`	Telegram home channel
`"discord"`	Discord home channel
`"telegram:123456"`	Specific Telegram chat
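The delivery options can be read as a small resolution step. The sketch follows the table: `"platform:chat_id"` targets a specific chat, and a bare platform name maps to that platform's home channel. `resolve_delivery` and the error raised for a missing home channel are hypothetical.

```python
from typing import Dict, Optional, Tuple


def resolve_delivery(
    deliver: str,
    origin: Tuple[str, str],
    home_channels: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Map a deliver option to (platform, chat_id); None means save locally only."""
    if deliver == "local":
        return None
    if deliver == "origin":
        return origin
    platform, sep, chat_id = deliver.partition(":")
    if sep:
        return platform, chat_id  # e.g. "telegram:123456" -> that specific chat
    if platform in home_channels:
        return platform, home_channels[platform]  # bare platform -> its home channel
    raise ValueError(f"no home channel configured for {platform!r}")
```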

Dynamic Context Injection

The agent knows which platform and chat it is in through injected session context, for example:

## Current Session Context

**Source:** Telegram (group: Dev Team, ID: -1001234567890)
**Connected Platforms:** local, telegram, discord

**Home Channels:**
  - telegram: My Notes (ID: -1001234567890)
  - discord: #bot-updates (ID: 123456789012345678)

**Delivery options for scheduled tasks:**
- "origin" → Back to this chat (Dev Team)
- "local" → Save to local files only
- "telegram" → Home channel (My Notes)
- "discord" → Home channel (#bot-updates)

CLI Commands

Command	Description
`/platforms`	Show gateway configuration and status
`--gateway`	Start the gateway (CLI flag)

Troubleshooting

"python-telegram-bot not installed"

pip install "python-telegram-bot>=20.0"

"discord.py not installed"

pip install "discord.py>=2.0"

"No platforms connected"

Check that your environment variables are set
Check that your tokens are valid
Run /platforms to see the configuration status

Session not persisting

Check that ~/.hermes/sessions/ exists
Check that your reset policies aren't too aggressive (for example, a very short idle_minutes)
Check the gateway logs for errors

Adding a New Platform

To add a new messaging platform:

1. Create the adapter

Create gateway/platforms/your_platform.py:

from typing import Any, Dict

from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
from gateway.config import Platform, PlatformConfig

class YourPlatformAdapter(BasePlatformAdapter):
    def __init__(self, config: PlatformConfig):
        super().__init__(config, Platform.YOUR_PLATFORM)
    
    async def connect(self) -> bool:
        # Connect to the platform
        ...
    
    async def disconnect(self) -> None:
        # Disconnect
        ...
    
    async def send(self, chat_id: str, content: str, **kwargs) -> SendResult:
        # Send a message (further optional parameters elided; match the base class signature)
        ...
    
    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
        # Get chat information
        ...
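To try the adapter shape without a real platform, the hypothetical in-memory adapter below implements the same four methods and just records what it sends. To stay self-contained it defines local stand-ins: `SendResult` and `AdapterShape` here are not the real `gateway.platforms.base` types, and their fields are assumptions.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SendResult:  # stand-in; the real SendResult's fields may differ
    success: bool
    message_id: Optional[str] = None
    error: Optional[str] = None


class AdapterShape(ABC):  # stand-in for the abstract methods of BasePlatformAdapter
    @abstractmethod
    async def connect(self) -> bool: ...

    @abstractmethod
    async def disconnect(self) -> None: ...

    @abstractmethod
    async def send(self, chat_id: str, content: str, **kwargs: Any) -> SendResult: ...

    @abstractmethod
    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]: ...


class InMemoryAdapter(AdapterShape):
    """Records outgoing messages instead of talking to a network service."""

    def __init__(self) -> None:
        self.connected = False
        self.sent: List[Tuple[str, str]] = []

    async def connect(self) -> bool:
        self.connected = True
        return True

    async def disconnect(self) -> None:
        self.connected = False

    async def send(self, chat_id: str, content: str, **kwargs: Any) -> SendResult:
        if not self.connected:
            return SendResult(success=False, error="not connected")
        self.sent.append((chat_id, content))
        return SendResult(success=True, message_id=str(len(self.sent)))

    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
        return {"id": chat_id, "name": f"test-chat-{chat_id}"}


async def demo() -> List[Tuple[str, str]]:
    adapter = InMemoryAdapter()
    await adapter.connect()
    await adapter.send("42", "hello")
    await adapter.disconnect()
    return adapter.sent
```

A fake like this is mainly useful for exercising gateway logic in tests before wiring up a real SDK.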

2. Register the platform

Add to gateway/config.py:

class Platform(Enum):
    # ... existing ...
    YOUR_PLATFORM = "your_platform"

3. Add to gateway runner

Add a branch to _create_adapter() in gateway/run.py:

elif platform == Platform.YOUR_PLATFORM:
    from gateway.platforms.your_platform import YourPlatformAdapter
    return YourPlatformAdapter(config)

4. Create a toolset (optional)

Add to toolsets.py:

"hermes-your-platform": {
    "description": "Your platform toolset",
    "tools": [...],
    "includes": []
}
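This sketch assumes that `includes` lists the names of other toolsets; the field is not documented here. Under that assumption, resolving a toolset means collecting its tools recursively. The `web` toolset, the tool lists, and `resolve_tools` are made up for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical toolset table in the same shape as the entry above.
TOOLSETS: Dict[str, dict] = {
    "web": {"description": "Web tools", "tools": ["web_search", "web_extract"], "includes": []},
    "hermes-your-platform": {
        "description": "Your platform toolset",
        "tools": ["terminal"],
        "includes": ["web"],
    },
}


def resolve_tools(name: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect a toolset's own tools plus everything from its `includes`, recursively."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against include cycles
        return set()
    seen.add(name)
    toolset = TOOLSETS[name]
    tools = set(toolset["tools"])
    for included in toolset.get("includes", []):
        tools |= resolve_tools(included, seen)
    return tools
```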

5. Configure

Add environment variables to .env:

YOUR_PLATFORM_TOKEN=...
YOUR_PLATFORM_HOME_CHANNEL=...

Service Management

Linux (systemd)

# Install as user service
./scripts/hermes-gateway install

# Manage
systemctl --user start hermes-gateway
systemctl --user stop hermes-gateway
systemctl --user restart hermes-gateway
systemctl --user status hermes-gateway

# View logs
journalctl --user -u hermes-gateway -f

# Enable lingering (keeps running after logout)
sudo loginctl enable-linger $USER

macOS (launchd)

# Install
./scripts/hermes-gateway install

# Manage
launchctl start ai.hermes.gateway
launchctl stop ai.hermes.gateway

# View logs
tail -f ~/.hermes/logs/gateway.log

Manual (any platform)

# Run in foreground (for testing/debugging)
./scripts/hermes-gateway run

# Or via CLI (also foreground)
python cli.py --gateway

Storage Locations

Path	Purpose
`~/.hermes/gateway.json`	Gateway configuration
`~/.hermes/sessions/sessions.json`	Session index
`~/.hermes/sessions/{id}.jsonl`	Conversation transcripts
`~/.hermes/cron/output/`	Cron job outputs
`~/.hermes/logs/gateway.log`	Gateway logs (macOS launchd)
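The `.jsonl` extension indicates JSON Lines, with one JSON object per line. A minimal reader is sketched below. `load_transcript` is a hypothetical name, and because the fields inside each entry are not documented here, the test uses made-up `role`/`content` keys.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_transcript(path: Path) -> List[Dict[str, Any]]:
    """Read a JSONL transcript: one JSON object per non-empty line."""
    entries: List[Dict[str, Any]] = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries
```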
