| sidebar_position | title | description |
|---|---|---|
| 9 | Voice & TTS | Text-to-speech and voice message transcription across all platforms |
Voice & TTS
Hermes Agent supports both text-to-speech output and voice message transcription across all messaging platforms.
Text-to-Speech
Convert text to speech with four providers:
| Provider | Quality | Cost | API Key |
|---|---|---|---|
| Edge TTS (default) | Good | Free | None needed |
| ElevenLabs | Excellent | Paid | ELEVENLABS_API_KEY |
| OpenAI TTS | Good | Paid | VOICE_TOOLS_OPENAI_KEY |
| NeuTTS | Good | Free | None needed |
Platform Delivery
| Platform | Delivery | Format |
|---|---|---|
| Telegram | Voice bubble (plays inline) | Opus .ogg |
| Discord | Voice bubble (Opus/OGG), falls back to file attachment | Opus/MP3 |
| | Audio file attachment | MP3 |
| CLI | Saved to ~/.hermes/audio_cache/ | MP3 |
Configuration
# In ~/.hermes/config.yaml
tts:
  provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
  edge:
    voice: "en-US-AriaNeural" # 322 voices, 74 languages
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
    base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu
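As a quick sanity check on the settings above, the following sketch pairs each provider with the API key listed in the provider table and reports anything missing. It is only an illustration: `check_tts_config` and `REQUIRED_KEY` are hypothetical names, not part of Hermes, and the function takes an already-parsed `tts` mapping rather than reading `config.yaml` itself.

```python
from typing import Dict, List, Mapping, Optional

# Provider -> environment variable it needs, taken from the provider table.
# None means the provider works without an API key.
REQUIRED_KEY: Dict[str, Optional[str]] = {
    "edge": None,
    "elevenlabs": "ELEVENLABS_API_KEY",
    "openai": "VOICE_TOOLS_OPENAI_KEY",
    "neutts": None,
}


def check_tts_config(tts: Mapping[str, object], env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems (empty if the config looks usable)."""
    provider = tts.get("provider", "edge")  # Edge TTS is the documented default
    if provider not in REQUIRED_KEY:
        return [f"unknown tts.provider: {provider!r}"]
    key = REQUIRED_KEY[provider]
    if key and not env.get(key):
        return [f"{provider} needs {key} to be set"]
    return []
```

Passing `os.environ` as `env` checks the real environment; passing a plain dict keeps the check easy to test.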
Telegram Voice Bubbles & ffmpeg
Telegram voice bubbles require Opus/OGG audio format:
- OpenAI and ElevenLabs produce Opus natively — no extra setup
- Edge TTS (default) outputs MP3 and needs ffmpeg to convert it to Opus/OGG
- NeuTTS outputs WAV and also needs ffmpeg to convert for Telegram voice bubbles
Install ffmpeg with your system's package manager:
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Fedora
sudo dnf install ffmpeg
Without ffmpeg, Edge TTS and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
:::tip
If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
:::
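The conversion step described in this section can be sketched roughly as follows. This is not Hermes's actual code: the function names and the 64k bitrate are assumptions, and the only external piece relied on is the standard `ffmpeg` CLI with its `libopus` encoder. When ffmpeg is missing, the helper returns `None`, which stands for "send as a regular audio file".

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def opus_command(src: Path, dst: Path, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg command that re-encodes src as Opus audio in an OGG container."""
    return [ffmpeg, "-y", "-i", str(src), "-c:a", "libopus", "-b:a", "64k", str(dst)]


def needs_conversion(src: Path) -> bool:
    """Assume anything that is not already an .ogg file must be converted."""
    return src.suffix.lower() != ".ogg"


def to_voice_bubble(src: Path) -> Optional[Path]:
    """Return an .ogg path suitable for a voice bubble, or None to fall back to a plain file."""
    if not needs_conversion(src):
        return src
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return None  # no ffmpeg installed: caller sends the original MP3/WAV
    dst = src.with_suffix(".ogg")
    result = subprocess.run(opus_command(src, dst, ffmpeg), capture_output=True)
    return dst if result.returncode == 0 else None
```

Keeping the command builder separate from the subprocess call makes the ffmpeg arguments easy to inspect without running anything.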
Voice Message Transcription (STT)
Voice messages sent on Telegram, Discord, WhatsApp, Slack, or Signal are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.
| Provider | Quality | Cost | API Key |
|---|---|---|---|
| Local Whisper (default) | Good | Free | None needed |
| Groq Whisper API | Good–Best | Free tier | GROQ_API_KEY |
| OpenAI Whisper API | Good–Best | Paid | VOICE_TOOLS_OPENAI_KEY or OPENAI_API_KEY |
:::info Zero Config
Local transcription works out of the box when faster-whisper is installed. If that's unavailable, Hermes can also use a local whisper CLI from common install locations (like /opt/homebrew/bin) or a custom command via HERMES_LOCAL_STT_COMMAND.
:::
Configuration
# In ~/.hermes/config.yaml
stt:
  provider: "local" # "local" | "groq" | "openai"
  local:
    model: "base" # tiny, base, small, medium, large-v3
  openai:
    model: "whisper-1" # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
Provider Details
Local (faster-whisper) — Runs Whisper locally via faster-whisper. Uses CPU by default, GPU if available. Model sizes:
| Model | Size | Speed | Quality |
|---|---|---|---|
| tiny | ~75 MB | Fastest | Basic |
| base | ~150 MB | Fast | Good (default) |
| small | ~500 MB | Medium | Better |
| medium | ~1.5 GB | Slower | Great |
| large-v3 | ~3 GB | Slowest | Best |
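For orientation, a minimal sketch of local transcription with the faster-whisper library might look like this. The helper names are hypothetical and this is not necessarily how Hermes invokes it; it assumes `faster-whisper` is installed and uses `device="auto"` so a GPU is picked up when one is available.

```python
from typing import Iterable


def join_segments(texts: Iterable[str]) -> str:
    """Join transcribed segment texts into one line, dropping empty pieces."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe_locally(audio_path: str, model_size: str = "base") -> str:
    """Transcribe one audio file with a local Whisper model of the given size."""
    # Imported lazily so this module still loads when faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto")
    segments, _info = model.transcribe(audio_path)
    return join_segments(segment.text for segment in segments)
```

The first call with a given `model_size` downloads the model weights, which is where the sizes in the table above come into play.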
Groq API — Requires GROQ_API_KEY. Good cloud fallback when you want a free hosted STT option.
OpenAI API — Accepts VOICE_TOOLS_OPENAI_KEY first and falls back to OPENAI_API_KEY. Supports whisper-1, gpt-4o-mini-transcribe, and gpt-4o-transcribe.
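The key lookup order described for the OpenAI provider can be illustrated with a few lines. `resolve_openai_key` is a hypothetical helper, not a Hermes function; it only shows the "first non-empty variable wins" idea.

```python
from typing import Mapping, Optional


def resolve_openai_key(env: Mapping[str, str]) -> Optional[str]:
    """Prefer VOICE_TOOLS_OPENAI_KEY, then fall back to OPENAI_API_KEY."""
    for name in ("VOICE_TOOLS_OPENAI_KEY", "OPENAI_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None  # neither variable is set
```

In practice `env` would be `os.environ`; a plain dict works the same way for testing.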
Custom local CLI fallback — Set HERMES_LOCAL_STT_COMMAND if you want Hermes to call a local transcription command directly. The command template supports {input_path}, {output_dir}, {language}, and {model} placeholders.
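To make the placeholder mechanism concrete, here is one way such a template could be expanded into an argument list. How Hermes actually substitutes and quotes the values is not stated here, so treat `render_stt_command` and the quoting strategy as assumptions; the example template uses flags from the openai-whisper CLI.

```python
import shlex
from typing import List


def render_stt_command(template: str, input_path: str, output_dir: str,
                       language: str, model: str) -> List[str]:
    """Fill the four documented placeholders and split the result into argv."""
    values = {
        "input_path": shlex.quote(input_path),  # quoting keeps paths with spaces intact
        "output_dir": shlex.quote(output_dir),
        "language": shlex.quote(language),
        "model": shlex.quote(model),
    }
    return shlex.split(template.format(**values))


# Example template, as it might be set in HERMES_LOCAL_STT_COMMAND:
TEMPLATE = "whisper {input_path} --model {model} --language {language} --output_dir {output_dir}"
argv = render_stt_command(TEMPLATE, "/tmp/my voice.ogg", "/tmp/out", "en", "base")
```

The resulting `argv` list can be passed directly to `subprocess.run` without going through a shell.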
Fallback Behavior
If your configured provider isn't available, Hermes automatically falls back:
- Local faster-whisper unavailable → Tries a local whisper CLI or HERMES_LOCAL_STT_COMMAND before cloud providers
- Groq key not set → Falls back to local transcription, then OpenAI
- OpenAI key not set → Falls back to local transcription, then Groq
- Nothing available → Voice messages pass through with an accurate note to the user
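The fallback rules above amount to an ordered preference list per configured provider. The sketch below encodes them under a few assumptions that the text does not spell out: the backend labels are made up for illustration, "local transcription" is taken to mean faster-whisper followed by the local CLI command, the cloud order after a failed local provider is guessed as Groq then OpenAI, and an unrecognized provider is treated like `local`.

```python
from typing import Dict, List, Optional, Set

# Preference order per configured provider (labels are hypothetical).
FALLBACK_ORDER: Dict[str, List[str]] = {
    "local": ["local", "local_command", "groq", "openai"],
    "groq": ["groq", "local", "local_command", "openai"],
    "openai": ["openai", "local", "local_command", "groq"],
}


def pick_stt_backend(configured: str, available: Set[str]) -> Optional[str]:
    """Return the first usable backend for the configured provider, or None."""
    for backend in FALLBACK_ORDER.get(configured, FALLBACK_ORDER["local"]):
        if backend in available:
            return backend
    return None  # nothing available: the voice message passes through with a note
```

For example, with `provider: "groq"` configured but only faster-whisper and an OpenAI key present, `pick_stt_backend("groq", {"local", "openai"})` selects `"local"`.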