docs(voice): add comprehensive voice mode guide

Add a hands-on guide for using voice mode with Hermes, fix and expand the main voice-mode docs, surface /voice in messaging docs, and improve discoverability from the homepage and learning path.
2026-03-14 09:50:45 -07:00
parent 6c0bf2824e
commit f43c078f9e
7 changed files with 439 additions and 2 deletions
--- a/website/docs/getting-started/learning-path.md
+++ b/website/docs/getting-started/learning-path.md
@@ -54,7 +54,9 @@ Deploy Hermes Agent as a bot on your favorite messaging platform.
 3. [Messaging Overview](/docs/user-guide/messaging)
 4. [Telegram Setup](/docs/user-guide/messaging/telegram)
 5. [Discord Setup](/docs/user-guide/messaging/discord)
-6. [Security](/docs/user-guide/security)
+6. [Voice Mode](/docs/user-guide/features/voice-mode)
+7. [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
+8. [Security](/docs/user-guide/security)

 For full project examples, see:
 - [Daily Briefing Bot](/docs/guides/daily-briefing-bot)
--- a/website/docs/guides/use-voice-mode-with-hermes.md
+++ b/website/docs/guides/use-voice-mode-with-hermes.md
@@ -0,0 +1,422 @@
+---
+sidebar_position: 7
+title: "Use Voice Mode with Hermes"
+description: "A practical guide to setting up and using Hermes voice mode across CLI, Telegram, Discord, and Discord voice channels"
+---
+
+# Use Voice Mode with Hermes
+
+This guide is the practical companion to the [Voice Mode feature reference](/docs/user-guide/features/voice-mode).
+
+The feature page explains what voice mode can do; this guide shows how to use it well.
+
+## What voice mode is good for
+
+Voice mode is especially useful when:
+- you want a hands-free CLI workflow
+- you want spoken responses in Telegram or Discord
+- you want Hermes sitting in a Discord voice channel for live conversation
+- you want quick idea capture, debugging, or back-and-forth while walking around instead of typing
+
+## Choose your voice mode setup
+
+Hermes offers three distinct voice experiences.
+
+| Mode | Best for | Platform |
+|---|---|---|
+| Interactive microphone loop | Personal hands-free use while coding or researching | CLI |
+| Voice replies in chat | Spoken responses alongside normal messaging | Telegram, Discord |
+| Live voice channel bot | Group or personal live conversation in a VC | Discord voice channels |
+
+A good path is:
+1. get text working first
+2. enable voice replies second
+3. move to Discord voice channels last if you want the full experience
+
+## Step 1: make sure normal Hermes works first
+
+Before touching voice mode, verify that:
+- Hermes starts
+- your provider is configured
+- the agent can answer text prompts normally
+
+```bash
+hermes
+```
+
+Ask something simple:
+
+```text
+What tools do you have available?
+```
+
+If that is not solid yet, fix text mode first.
+
+## Step 2: install the right extras
+
+### CLI microphone + playback
+
+```bash
+pip install "hermes-agent[voice]"
+```
+
+### Messaging platforms
+
+```bash
+pip install "hermes-agent[messaging]"
+```
+
+### Premium ElevenLabs TTS
+
+```bash
+pip install "hermes-agent[tts-premium]"
+```
+
+### Everything
+
+```bash
+pip install "hermes-agent[all]"
+```
+
+## Step 3: install system dependencies
+
+### macOS
+
+```bash
+brew install portaudio ffmpeg opus
+```
+
+### Ubuntu / Debian
+
+```bash
+sudo apt install portaudio19-dev ffmpeg libopus0
+```
+
+Why these matter:
+- `portaudio` → microphone input / playback for CLI voice mode
+- `ffmpeg` → audio conversion for TTS and messaging delivery
+- `opus` → Discord voice codec support
+
+## Step 4: choose STT and TTS providers
+
+Hermes supports both local and cloud speech stacks.
+
+### Easiest / cheapest setup
+
+Use local STT and free Edge TTS:
+- STT provider: `local`
+- TTS provider: `edge`
+
+This is usually the best place to start.
+
+### Environment file example
+
+Add to `~/.hermes/.env`:
+
+```bash
+# Cloud STT options (local needs no key)
+GROQ_API_KEY=***
+VOICE_TOOLS_OPENAI_KEY=***
+
+# Premium TTS (optional)
+ELEVENLABS_API_KEY=***
+```
+
+### Provider recommendations
+
+#### Speech-to-text
+
+- `local` → best default for privacy and zero-cost use
+- `groq` → very fast cloud transcription
+- `openai` → good paid fallback
+
+#### Text-to-speech
+
+- `edge` → free and good enough for most users
+- `elevenlabs` → best quality
+- `openai` → good middle ground
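+
+As a rough illustration, swapping only the STT provider for a faster cloud option while keeping free Edge TTS could look like the fragment below. It assumes the provider names listed above slot directly into the same `stt.provider` and `tts.provider` keys used in the recommended config in Step 5; any provider-specific sub-keys are left out because this guide does not cover them.
+
+```yaml
+# Sketch only: assumes provider names map directly onto the `provider` keys.
+stt:
+  provider: "groq"   # cloud STT; expects GROQ_API_KEY in ~/.hermes/.env
+
+tts:
+  provider: "edge"   # free TTS, no API key listed for it in this guide
+```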
+
+## Step 5: recommended config
+
+```yaml
+voice:
+  record_key: "ctrl+b"
+  max_recording_seconds: 120
+  auto_tts: false
+  silence_threshold: 200
+  silence_duration: 3.0
+
+stt:
+  provider: "local"
+  local:
+    model: "base"
+
+tts:
+  provider: "edge"
+  edge:
+    voice: "en-US-AriaNeural"
+```
+
+This is a good conservative default for most people.
+
+## Use case 1: CLI voice mode
+
+### Turn it on
+
+Start Hermes:
+
+```bash
+hermes
+```
+
+Inside the CLI:
+
+```text
+/voice on
+```
+
+### Recording flow
+
+Default key:
+- `Ctrl+B`
+
+Workflow:
+1. press `Ctrl+B`
+2. speak
+3. wait for silence detection to stop recording automatically
+4. Hermes transcribes and responds
+5. if TTS is on, it speaks the answer
+6. the loop can automatically restart for continuous use
+
+### Useful commands
+
+```text
+/voice
+/voice on
+/voice off
+/voice tts
+/voice status
+```
+
+### Good CLI workflows
+
+#### Walk-up debugging
+
+Say:
+
+```text
+I keep getting a docker permission error. Help me debug it.
+```
+
+Then continue hands-free:
+- "Read the last error again"
+- "Explain the root cause in simpler terms"
+- "Now give me the exact fix"
+
+#### Research / brainstorming
+
+Great for:
+- walking around while thinking
+- dictating half-formed ideas
+- asking Hermes to structure your thoughts in real time
+
+#### Accessibility / low-typing sessions
+
+If typing is inconvenient, voice mode is one of the fastest ways to stay in the full Hermes loop.
+
+## Tuning CLI behavior
+
+### Silence threshold
+
+If Hermes starts/stops too aggressively, tune:
+
+```yaml
+voice:
+  silence_threshold: 250
+```
+
+Higher threshold = less sensitive.
+
+### Silence duration
+
+If you pause a lot between sentences, increase:
+
+```yaml
+voice:
+  silence_duration: 4.0
+```
+
+### Record key
+
+If `Ctrl+B` conflicts with your terminal or tmux habits:
+
+```yaml
+voice:
+  record_key: "ctrl+space"
+```
+
+## Use case 2: voice replies in Telegram or Discord
+
+This mode is simpler than full voice channels.
+
+Hermes stays a normal chatbot but can also speak its replies.
+
+### Start the gateway
+
+```bash
+hermes gateway
+```
+
+### Turn on voice replies
+
+Inside Telegram or Discord:
+
+```text
+/voice on
+```
+
+or
+
+```text
+/voice tts
+```
+
+### Modes
+
+| Mode | Meaning |
+|---|---|
+| `off` | text only |
+| `voice_only` | speak only when the user sent voice |
+| `all` | speak every reply |
+
+### When to use which mode
+
+- `/voice on` (mode `voice_only`) if you want spoken replies only for messages that arrived as voice
+- `/voice tts` (mode `all`) if you want every reply spoken
+
+### Good messaging workflows
+
+#### Telegram assistant on your phone
+
+Use when:
+- you are away from your machine
+- you want to send voice notes and get quick spoken replies
+- you want Hermes to function like a portable research or ops assistant
+
+#### Discord DMs with spoken output
+
+Useful when you want private interaction without server-channel mention behavior.
+
+## Use case 3: Discord voice channels
+
+This is the most advanced mode.
+
+Hermes joins a Discord VC, listens to user speech, transcribes it, runs the normal agent pipeline, and speaks replies back into the channel.
+
+### Required Discord permissions
+
+In addition to the normal text-bot setup, make sure the bot has:
+- Connect
+- Speak
+- preferably Use Voice Activity
+
+Also enable privileged intents in the Developer Portal:
+- Presence Intent
+- Server Members Intent
+- Message Content Intent
+
+### Join and leave
+
+In a Discord text channel where the bot is present:
+
+```text
+/voice join
+/voice leave
+/voice status
+```
+
+### What happens when joined
+
+- users speak in the VC
+- Hermes detects speech boundaries
+- transcripts are posted in the associated text channel
+- Hermes responds in text and audio
+- the text channel is the one where `/voice join` was issued
+
+### Best practices for Discord VC use
+
+- keep `DISCORD_ALLOWED_USERS` tight
+- use a dedicated bot/testing channel at first
+- verify STT and TTS work in ordinary text-chat voice mode before trying VC mode
+
+## Voice quality recommendations
+
+### Best quality setup
+
+- STT: local `large-v3` or Groq `whisper-large-v3`
+- TTS: ElevenLabs
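+
+A hedged sketch of that combination, reusing the key layout from Step 5. That `large-v3` is accepted as a value for `stt.local.model`, and that ElevenLabs needs no further sub-keys, are both assumptions; check the feature reference for the exact options.
+
+```yaml
+# Sketch only: the model value and the minimal TTS block are assumptions.
+stt:
+  provider: "local"
+  local:
+    model: "large-v3"
+
+tts:
+  provider: "elevenlabs"   # requires ELEVENLABS_API_KEY in ~/.hermes/.env
+```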
+
+### Best speed / convenience setup
+
+- STT: local `base` or Groq
+- TTS: Edge
+
+### Best zero-cost setup
+
+- STT: local
+- TTS: Edge
+
+## Common failure modes
+
+### "No audio device found"
+
+Install `portaudio` as described in Step 3.
+
+### "Bot joins but hears nothing"
+
+Check:
+- your Discord user ID is in `DISCORD_ALLOWED_USERS`
+- you are not muted
+- privileged intents are enabled
+- the bot has Connect/Speak permissions
+
+### "It transcribes but does not speak"
+
+Check:
+- TTS provider config
+- API key / quota for ElevenLabs or OpenAI
+- `ffmpeg` install for Edge conversion paths
+
+### "Whisper outputs garbage"
+
+Try:
+- quieter environment
+- higher `silence_threshold`
+- different STT provider/model
+- shorter, clearer utterances
+
+### "It works in DMs but not in server channels"
+
+That is often caused by the mention policy.
+
+By default, the bot needs an `@mention` in Discord server text channels unless configured otherwise.
+
+## Suggested first-week setup
+
+If you want the shortest path to success:
+
+1. get text Hermes working
+2. install `hermes-agent[voice]`
+3. use CLI voice mode with local STT + Edge TTS
+4. then enable `/voice on` in Telegram or Discord
+5. only after that, try Discord VC mode
+
+That progression keeps the debugging surface small.
+
+## Where to read next
+
+- [Voice Mode feature reference](/docs/user-guide/features/voice-mode)
+- [Messaging Gateway](/docs/user-guide/messaging)
+- [Discord setup](/docs/user-guide/messaging/discord)
+- [Telegram setup](/docs/user-guide/messaging/telegram)
+- [Configuration](/docs/user-guide/configuration)
--- a/website/docs/index.md
+++ b/website/docs/index.md
@@ -33,6 +33,8 @@ It's not a coding copilot tethered to an IDE or a chatbot wrapper around a singl
 | 📚 **[Skills System](/docs/user-guide/features/skills)** | Procedural memory the agent creates and reuses |
 | 🔌 **[MCP Integration](/docs/user-guide/features/mcp)** | Connect to MCP servers, filter their tools, and extend Hermes safely |
 | 🧭 **[Use MCP with Hermes](/docs/guides/use-mcp-with-hermes)** | Practical MCP setup patterns, examples, and tutorials |
+| 🎙️ **[Voice Mode](/docs/user-guide/features/voice-mode)** | Real-time voice interaction in CLI, Telegram, Discord, and Discord VC |
+| 🗣️ **[Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)** | Hands-on setup and usage patterns for Hermes voice workflows |
 | 🎭 **[Personality & SOUL.md](/docs/user-guide/features/personality)** | Define Hermes' default voice with a global SOUL.md |
 | 📄 **[Context Files](/docs/user-guide/features/context-files)** | Project context files that shape every conversation |
 | 🔒 **[Security](/docs/user-guide/security)** | Command approval, authorization, container isolation |
--- a/website/docs/user-guide/features/voice-mode.md
+++ b/website/docs/user-guide/features/voice-mode.md
@@ -8,11 +8,13 @@ description: "Real-time voice conversations with Hermes Agent — CLI, Telegram,

 Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.

+If you want a practical setup walkthrough with recommended configurations and real usage patterns, see [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
+
 ## Prerequisites

 Before using voice features, make sure you have:

-1. **Hermes Agent installed** — `pip install hermes-agent` (see [Getting Started](../../getting-started.md))
+1. **Hermes Agent installed** — `pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
 2. **An LLM provider configured** — set `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and `LLM_MODEL` in `~/.hermes/.env`
 3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice

--- a/website/docs/user-guide/messaging/discord.md
+++ b/website/docs/user-guide/messaging/discord.md
@@ -212,6 +212,11 @@ Hermes Agent supports Discord voice messages:

 - **Incoming voice messages** are automatically transcribed using Whisper (requires `GROQ_API_KEY` or `VOICE_TOOLS_OPENAI_KEY` to be set in your environment).
 - **Text-to-speech**: Use `/voice tts` to have the bot send spoken audio responses alongside text replies.
+- **Discord voice channels**: Hermes can also join a voice channel, listen to users speaking, and talk back in the channel.
+
+For the full setup and operational guide, see:
+- [Voice Mode](/docs/user-guide/features/voice-mode)
+- [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)

 ## Troubleshooting

--- a/website/docs/user-guide/messaging/index.md
+++ b/website/docs/user-guide/messaging/index.md
@@ -8,6 +8,8 @@ description: "Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal,

 Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant, or your browser. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.

+For the full voice feature set — including CLI microphone mode, spoken replies in messaging, and Discord voice-channel conversations — see [Voice Mode](/docs/user-guide/features/voice-mode) and [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
+
 ## Architecture

 ```text
@@ -77,6 +79,7 @@ hermes gateway status       # Check service status
 | `/usage` | Show token usage for this session |
 | `/insights [days]` | Show usage insights and analytics |
 | `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display |
+| `/voice [on\|off\|tts\|join\|leave\|status]` | Control messaging voice replies and Discord voice-channel behavior |
 | `/rollback [number]` | List or restore filesystem checkpoints |
 | `/background <prompt>` | Run a prompt in a separate background session |
 | `/reload-mcp` | Reload MCP servers from config |
--- a/website/sidebars.ts
+++ b/website/sidebars.ts
@@ -24,6 +24,7 @@ const sidebars: SidebarsConfig = {
        'guides/python-library',
        'guides/use-mcp-with-hermes',
        'guides/use-soul-with-hermes',
+        'guides/use-voice-mode-with-hermes',
      ],
    },
    {