diff --git a/website/docs/getting-started/learning-path.md b/website/docs/getting-started/learning-path.md
index 2c08f077e..bcdbb44d4 100644
--- a/website/docs/getting-started/learning-path.md
+++ b/website/docs/getting-started/learning-path.md
@@ -54,7 +54,9 @@ Deploy Hermes Agent as a bot on your favorite messaging platform.
 3. [Messaging Overview](/docs/user-guide/messaging)
 4. [Telegram Setup](/docs/user-guide/messaging/telegram)
 5. [Discord Setup](/docs/user-guide/messaging/discord)
-6. [Security](/docs/user-guide/security)
+6. [Voice Mode](/docs/user-guide/features/voice-mode)
+7. [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
+8. [Security](/docs/user-guide/security)
 
 For full project examples, see:
 - [Daily Briefing Bot](/docs/guides/daily-briefing-bot)
diff --git a/website/docs/guides/use-voice-mode-with-hermes.md b/website/docs/guides/use-voice-mode-with-hermes.md
new file mode 100644
index 000000000..dc35dcc65
--- /dev/null
+++ b/website/docs/guides/use-voice-mode-with-hermes.md
@@ -0,0 +1,422 @@
+---
+sidebar_position: 7
+title: "Use Voice Mode with Hermes"
+description: "A practical guide to setting up and using Hermes voice mode across CLI, Telegram, Discord, and Discord voice channels"
+---
+
+# Use Voice Mode with Hermes
+
+This guide is the practical companion to the [Voice Mode feature reference](/docs/user-guide/features/voice-mode).
+
+If the feature page explains what voice mode can do, this guide shows how to actually use it well.
+
+## What voice mode is good for
+
+Voice mode is especially useful when:
+- you want a hands-free CLI workflow
+- you want spoken responses in Telegram or Discord
+- you want Hermes sitting in a Discord voice channel for live conversation
+- you want quick idea capture, debugging, or back-and-forth while walking around instead of typing
+
+## Choose your voice mode setup
+
+There are really three different voice experiences in Hermes.
+
+| Mode | Best for | Platform |
+|---|---|---|
+| Interactive microphone loop | Personal hands-free use while coding or researching | CLI |
+| Voice replies in chat | Spoken responses alongside normal messaging | Telegram, Discord |
+| Live voice channel bot | Group or personal live conversation in a VC | Discord voice channels |
+
+A good path is:
+1. get text working first
+2. enable voice replies second
+3. move to Discord voice channels last if you want the full experience
+
+## Step 1: make sure normal Hermes works first
+
+Before touching voice mode, verify that:
+- Hermes starts
+- your provider is configured
+- the agent can answer text prompts normally
+
+```bash
+hermes
+```
+
+Ask something simple:
+
+```text
+What tools do you have available?
+```
+
+If that is not solid yet, fix text mode first.
+
+## Step 2: install the right extras
+
+### CLI microphone + playback
+
+```bash
+pip install "hermes-agent[voice]"
+```
+
+### Messaging platforms
+
+```bash
+pip install "hermes-agent[messaging]"
+```
+
+### Premium ElevenLabs TTS
+
+```bash
+pip install "hermes-agent[tts-premium]"
+```
+
+### Everything
+
+```bash
+pip install "hermes-agent[all]"
+```
+
+## Step 3: install system dependencies
+
+### macOS
+
+```bash
+brew install portaudio ffmpeg opus
+```
+
+### Ubuntu / Debian
+
+```bash
+sudo apt install portaudio19-dev ffmpeg libopus0
+```
+
+Why these matter:
+- `portaudio` → microphone input / playback for CLI voice mode
+- `ffmpeg` → audio conversion for TTS and messaging delivery
+- `opus` → Discord voice codec support
+
+## Step 4: choose STT and TTS providers
+
+Hermes supports both local and cloud speech stacks.
+
+### Easiest / cheapest setup
+
+Use local STT and free Edge TTS:
+- STT provider: `local`
+- TTS provider: `edge`
+
+This is usually the best place to start.
+
+### Environment file example
+
+Add to `~/.hermes/.env`:
+
+```bash
+# Cloud STT options (local needs no key)
+GROQ_API_KEY=***
+VOICE_TOOLS_OPENAI_KEY=***
+
+# Premium TTS (optional)
+ELEVENLABS_API_KEY=***
+```
+
+### Provider recommendations
+
+#### Speech-to-text
+
+- `local` → best default for privacy and zero-cost use
+- `groq` → very fast cloud transcription
+- `openai` → good paid fallback
+
+#### Text-to-speech
+
+- `edge` → free and good enough for most users
+- `elevenlabs` → best quality
+- `openai` → good middle ground
+
+## Step 5: recommended config
+
+```yaml
+voice:
+  record_key: "ctrl+b"
+  max_recording_seconds: 120
+  auto_tts: false
+  silence_threshold: 200
+  silence_duration: 3.0
+
+stt:
+  provider: "local"
+  local:
+    model: "base"
+
+tts:
+  provider: "edge"
+  edge:
+    voice: "en-US-AriaNeural"
+```
+
+This is a good conservative default for most people.
+
+## Use case 1: CLI voice mode
+
+### Turn it on
+
+Start Hermes:
+
+```bash
+hermes
+```
+
+Inside the CLI:
+
+```text
+/voice on
+```
+
+### Recording flow
+
+Default key:
+- `Ctrl+B`
+
+Workflow:
+1. press `Ctrl+B`
+2. speak
+3. wait for silence detection to stop recording automatically
+4. Hermes transcribes and responds
+5. if TTS is on, it speaks the answer
+6. the loop can automatically restart for continuous use
+
+### Useful commands
+
+```text
+/voice
+/voice on
+/voice off
+/voice tts
+/voice status
+```
+
+### Good CLI workflows
+
+#### Walk-up debugging
+
+Say:
+
+```text
+I keep getting a docker permission error. Help me debug it.
+```
+
+Then continue hands-free:
+- "Read the last error again"
+- "Explain the root cause in simpler terms"
+- "Now give me the exact fix"
+
+#### Research / brainstorming
+
+Great for:
+- walking around while thinking
+- dictating half-formed ideas
+- asking Hermes to structure your thoughts in real time
+
+#### Accessibility / low-typing sessions
+
+If typing is inconvenient, voice mode is one of the fastest ways to stay in the full Hermes loop.
+
+## Tuning CLI behavior
+
+### Silence threshold
+
+If Hermes starts/stops too aggressively, tune:
+
+```yaml
+voice:
+  silence_threshold: 250
+```
+
+Higher threshold = less sensitive.
+
+### Silence duration
+
+If you pause a lot between sentences, increase:
+
+```yaml
+voice:
+  silence_duration: 4.0
+```
+
+### Record key
+
+If `Ctrl+B` conflicts with your terminal or tmux habits:
+
+```yaml
+voice:
+  record_key: "ctrl+space"
+```
+
+## Use case 2: voice replies in Telegram or Discord
+
+This mode is simpler than full voice channels.
+
+Hermes stays a normal chat bot, but can speak replies.
+
+### Start the gateway
+
+```bash
+hermes gateway
+```
+
+### Turn on voice replies
+
+Inside Telegram or Discord:
+
+```text
+/voice on
+```
+
+or
+
+```text
+/voice tts
+```
+
+### Modes
+
+| Mode | Meaning |
+|---|---|
+| `off` | text only |
+| `voice_only` | speak only when the user sent voice |
+| `all` | speak every reply |
+
+### When to use which mode
+
+- `/voice on` if you want spoken replies only for voice-originating messages
+- `/voice tts` if you want a full spoken assistant all the time
+
+### Good messaging workflows
+
+#### Telegram assistant on your phone
+
+Use when:
+- you are away from your machine
+- you want to send voice notes and get quick spoken replies
+- you want Hermes to function like a portable research or ops assistant
+
+#### Discord DMs with spoken output
+
+Useful when you want private interaction without server-channel mention behavior.
+
+## Use case 3: Discord voice channels
+
+This is the most advanced mode.
+
+Hermes joins a Discord VC, listens to user speech, transcribes it, runs the normal agent pipeline, and speaks replies back into the channel.
+
+### Required Discord permissions
+
+In addition to the normal text-bot setup, make sure the bot has:
+- Connect
+- Speak
+- preferably Use Voice Activity
+
+Also enable privileged intents in the Developer Portal:
+- Presence Intent
+- Server Members Intent
+- Message Content Intent
+
+### Join and leave
+
+In a Discord text channel where the bot is present:
+
+```text
+/voice join
+/voice leave
+/voice status
+```
+
+### What happens when joined
+
+- users speak in the VC
+- Hermes detects speech boundaries
+- transcripts are posted in the associated text channel
+- Hermes responds in text and audio
+- the text channel is the one where `/voice join` was issued
+
+### Best practices for Discord VC use
+
+- keep `DISCORD_ALLOWED_USERS` tight
+- use a dedicated bot/testing channel at first
+- verify STT and TTS work in ordinary text-chat voice mode before trying VC mode
+
+## Voice quality recommendations
+
+### Best quality setup
+
+- STT: local `large-v3` or Groq `whisper-large-v3`
+- TTS: ElevenLabs
+
+### Best speed / convenience setup
+
+- STT: local `base` or Groq
+- TTS: Edge
+
+### Best zero-cost setup
+
+- STT: local
+- TTS: Edge
+
+## Common failure modes
+
+### "No audio device found"
+
+Install `portaudio`.
+
+### "Bot joins but hears nothing"
+
+Check:
+- your Discord user ID is in `DISCORD_ALLOWED_USERS`
+- you are not muted
+- privileged intents are enabled
+- the bot has Connect/Speak permissions
+
+### "It transcribes but does not speak"
+
+Check:
+- TTS provider config
+- API key / quota for ElevenLabs or OpenAI
+- `ffmpeg` install for Edge conversion paths
+
+### "Whisper outputs garbage"
+
+Try:
+- quieter environment
+- higher `silence_threshold`
+- different STT provider/model
+- shorter, clearer utterances
+
+### "It works in DMs but not in server channels"
+
+That is often mention policy.
+
+By default, the bot needs an `@mention` in Discord server text channels unless configured otherwise.
+
+## Suggested first-week setup
+
+If you want the shortest path to success:
+
+1. get text Hermes working
+2. install `hermes-agent[voice]`
+3. use CLI voice mode with local STT + Edge TTS
+4. then enable `/voice on` in Telegram or Discord
+5. only after that, try Discord VC mode
+
+That progression keeps the debugging surface small.
+
+## Where to read next
+
+- [Voice Mode feature reference](/docs/user-guide/features/voice-mode)
+- [Messaging Gateway](/docs/user-guide/messaging)
+- [Discord setup](/docs/user-guide/messaging/discord)
+- [Telegram setup](/docs/user-guide/messaging/telegram)
+- [Configuration](/docs/user-guide/configuration)
diff --git a/website/docs/index.md b/website/docs/index.md
index 3dbfcaf71..470c8d2ed 100644
--- a/website/docs/index.md
+++ b/website/docs/index.md
@@ -33,6 +33,8 @@ It's not a coding copilot tethered to an IDE or a chatbot wrapper around a singl
 | 📚 **[Skills System](/docs/user-guide/features/skills)** | Procedural memory the agent creates and reuses |
 | 🔌 **[MCP Integration](/docs/user-guide/features/mcp)** | Connect to MCP servers, filter their tools, and extend Hermes safely |
 | 🧭 **[Use MCP with Hermes](/docs/guides/use-mcp-with-hermes)** | Practical MCP setup patterns, examples, and tutorials |
+| 🎙️ **[Voice Mode](/docs/user-guide/features/voice-mode)** | Real-time voice interaction in CLI, Telegram, Discord, and Discord VC |
+| 🗣️ **[Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)** | Hands-on setup and usage patterns for Hermes voice workflows |
 | 🎭 **[Personality & SOUL.md](/docs/user-guide/features/personality)** | Define Hermes' default voice with a global SOUL.md |
 | 📄 **[Context Files](/docs/user-guide/features/context-files)** | Project context files that shape every conversation |
 | 🔒 **[Security](/docs/user-guide/security)** | Command approval, authorization, container isolation |
diff --git a/website/docs/user-guide/features/voice-mode.md b/website/docs/user-guide/features/voice-mode.md
index ce151643a..3c94062f7 100644
--- a/website/docs/user-guide/features/voice-mode.md
+++ b/website/docs/user-guide/features/voice-mode.md
@@ -8,11 +8,13 @@ description: "Real-time voice conversations with Hermes Agent — CLI, Telegram,
 
 Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
 
+If you want a practical setup walkthrough with recommended configurations and real usage patterns, see [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
+
 ## Prerequisites
 
 Before using voice features, make sure you have:
 
-1. **Hermes Agent installed** — `pip install hermes-agent` (see [Getting Started](../../getting-started.md))
+1. **Hermes Agent installed** — `pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
 2. **An LLM provider configured** — set `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and `LLM_MODEL` in `~/.hermes/.env`
 3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice
 
diff --git a/website/docs/user-guide/messaging/discord.md b/website/docs/user-guide/messaging/discord.md
index 0fc7f8cbc..b5f060596 100644
--- a/website/docs/user-guide/messaging/discord.md
+++ b/website/docs/user-guide/messaging/discord.md
@@ -212,6 +212,11 @@ Hermes Agent supports Discord voice messages:
 
 - **Incoming voice messages** are automatically transcribed using Whisper (requires `GROQ_API_KEY` or `VOICE_TOOLS_OPENAI_KEY` to be set in your environment).
 - **Text-to-speech**: Use `/voice tts` to have the bot send spoken audio responses alongside text replies.
+- **Discord voice channels**: Hermes can also join a voice channel, listen to users speaking, and talk back in the channel.
+
+For the full setup and operational guide, see:
+- [Voice Mode](/docs/user-guide/features/voice-mode)
+- [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
 
 ## Troubleshooting
 
diff --git a/website/docs/user-guide/messaging/index.md b/website/docs/user-guide/messaging/index.md
index debc841b8..2530248ee 100644
--- a/website/docs/user-guide/messaging/index.md
+++ b/website/docs/user-guide/messaging/index.md
@@ -8,6 +8,8 @@ description: "Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal,
 
 Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant, or your browser. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.
 
+For the full voice feature set — including CLI microphone mode, spoken replies in messaging, and Discord voice-channel conversations — see [Voice Mode](/docs/user-guide/features/voice-mode) and [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
+
 ## Architecture
 
 ```text
@@ -77,6 +79,7 @@ hermes gateway status # Check service status
 | `/usage` | Show token usage for this session |
 | `/insights [days]` | Show usage insights and analytics |
 | `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display |
+| `/voice [on\|off\|tts\|join\|leave\|status]` | Control messaging voice replies and Discord voice-channel behavior |
 | `/rollback [number]` | List or restore filesystem checkpoints |
 | `/background ` | Run a prompt in a separate background session |
 | `/reload-mcp` | Reload MCP servers from config |
diff --git a/website/sidebars.ts b/website/sidebars.ts
index ff91c4de5..828b4472f 100644
--- a/website/sidebars.ts
+++ b/website/sidebars.ts
@@ -24,6 +24,7 @@ const sidebars: SidebarsConfig = {
         'guides/python-library',
         'guides/use-mcp-with-hermes',
         'guides/use-soul-with-hermes',
+        'guides/use-voice-mode-with-hermes',
       ],
     },
     {