---
sidebar_position: 9
title: "Voice & TTS"
description: "Text-to-speech and voice message transcription across all platforms"
---

# Voice & TTS

Hermes Agent supports both text-to-speech output and voice message transcription across all messaging platforms.

## Text-to-Speech

Convert text to speech with three providers:

| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |

### Platform Delivery

| Platform | Delivery | Format |
|----------|----------|--------|
| Telegram | Voice bubble (plays inline) | Opus `.ogg` |
| Discord | Audio file attachment | MP3 |
| WhatsApp | Audio file attachment | MP3 |
| CLI | Saved to `~/.hermes/audio_cache/` | MP3 |

### Configuration

```yaml
# In ~/.hermes/config.yaml
tts:
  provider: "edge"              # "edge" | "elevenlabs" | "openai"
  edge:
    voice: "en-US-AriaNeural"   # 322 voices, 74 languages
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"  # Adam
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy"              # alloy, echo, fable, onyx, nova, shimmer
```

### Telegram Voice Bubbles & ffmpeg

Telegram voice bubbles require Opus audio in an OGG container:

- **OpenAI and ElevenLabs** produce Opus natively, so no extra setup is needed
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:

```bash
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Fedora
sudo dnf install ffmpeg
```

Without ffmpeg, Edge TTS audio is sent as a regular audio file (playable, but shows as a rectangular player instead of a voice bubble).

:::tip
If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
:::
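
To see which delivery mode applies on a given machine, the check described above can be sketched in a few lines of shell. This is an illustration only: the helper names (`have_ffmpeg`, `delivery_mode`, `mp3_to_opus`) are hypothetical, and the ffmpeg flags and the 64k bitrate are assumptions, not the invocation Hermes actually uses.

```bash
# Sketch only: Hermes's real conversion flags are not documented here.

# True if an ffmpeg binary is on PATH.
have_ffmpeg() {
  command -v ffmpeg >/dev/null 2>&1
}

# Report how Edge TTS audio would reach Telegram on this machine.
delivery_mode() {
  if have_ffmpeg; then
    echo "voice-bubble"
  else
    echo "audio-file"
  fi
}

# Convert an MP3 ($1) to Opus in an OGG container ($2).
# The 64k bitrate is an assumed value, not a Hermes setting.
mp3_to_opus() {
  ffmpeg -loglevel error -y -i "$1" -c:a libopus -b:a 64k "$2"
}

delivery_mode
```

With ffmpeg installed, `mp3_to_opus reply.mp3 reply.ogg` (file names are placeholders) produces the kind of file Telegram accepts as a voice message.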

## Voice Message Transcription

Voice messages sent on Telegram, Discord, WhatsApp, or Slack are automatically transcribed, and the transcript is injected into the conversation so the agent sees it as normal text.

| Provider | Model | Quality | Cost |
|----------|-------|---------|------|
| **OpenAI Whisper** | `whisper-1` (default) | Good | Low |
| **OpenAI GPT-4o** | `gpt-4o-mini-transcribe` | Better | Medium |
| **OpenAI GPT-4o** | `gpt-4o-transcribe` | Best | Higher |

All three models require `VOICE_TOOLS_OPENAI_KEY` to be set in `~/.hermes/.env`.
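
One way to add the key without leaving duplicate entries is a small helper like the one below. It is a sketch under assumptions: `set_env_key` is a hypothetical function (not part of Hermes), and it assumes the `.env` file uses plain `KEY=VALUE` lines.

```bash
# set_env_key FILE KEY VALUE
# Writes KEY=VALUE to FILE, replacing any earlier line for the same KEY.
# Runs in a subshell (parentheses) so its variables do not leak.
set_env_key() (
  file="$1"; key="$2"; value="$3"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line except an existing definition of KEY.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
  chmod 600 "$file"   # the file holds a secret; restrict it to the owner
)

# Example (placeholder value, not a real key):
# set_env_key "$HOME/.hermes/.env" VOICE_TOOLS_OPENAI_KEY "sk-your-key-here"
```

Editing `~/.hermes/.env` by hand works just as well; the helper only makes the update repeatable.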

### Configuration

```yaml
# In ~/.hermes/config.yaml
stt:
  enabled: true
  model: "whisper-1"
```