docs: improve voice mode docs with prerequisites, startup commands, and platform links

0xbyt4
2026-03-11 15:29:23 +03:00
parent 2bb2312ea2
commit 75bd5a582b


@@ -8,6 +8,18 @@ description: "Real-time voice conversations with Hermes Agent — CLI, Telegram,
Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
## Prerequisites
Before using voice features, make sure you have:
1. **Hermes Agent installed** — `pip install hermes-agent` (see [Getting Started](../../getting-started.md))
2. **An LLM provider configured** — set `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and `LLM_MODEL` in `~/.hermes/.env`
3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice
:::tip
The `~/.hermes/` directory and default `config.yaml` are created automatically the first time you run `hermes`. You only need to create `~/.hermes/.env` manually for API keys.
:::
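As a rough illustration of item 2, the sketch below writes a starter `.env` containing the three variables named above. The helper name `write_env_template` is hypothetical (it is not part of Hermes), and every value is a placeholder: `https://api.openai.com/v1` is OpenAI's public endpoint and should be swapped for your provider's URL, and `your-model-name` stands in for whatever model you actually use.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Hermes: write a starter .env into
# the given config directory without overwriting an existing file.
write_env_template() {
  local dir="$1"
  local env_file="$dir/.env"

  mkdir -p "$dir"
  if [ -e "$env_file" ]; then
    echo "$env_file already exists, leaving it untouched" >&2
    return 1
  fi

  # The file holds API keys, so create it readable by the owner only.
  # The umask change is confined to the subshell.
  (
    umask 077
    printf '%s\n' \
      'OPENAI_API_KEY=your-key' \
      'OPENAI_BASE_URL=https://api.openai.com/v1' \
      'LLM_MODEL=your-model-name' > "$env_file"
  )
}

# Usage (placeholder values must be edited afterwards):
#   write_env_template "$HOME/.hermes"
```

After editing the placeholders, running `hermes` and sending a text message (item 3) is a quick way to confirm the keys are picked up.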
## Overview
| Feature | Platform | Description |
@@ -79,6 +91,14 @@ ELEVENLABS_API_KEY=your-key # ElevenLabs — premium quality
### Quick Start
Start the CLI and enable voice mode:
```bash
hermes # Start the interactive CLI
```
Then use these commands inside the CLI:
```
/voice Toggle voice mode on/off
/voice on Enable voice mode
@@ -89,7 +109,7 @@ ELEVENLABS_API_KEY=your-key # ElevenLabs — premium quality
### How It Works
-1. Enable voice mode with `/voice on`
+1. Start the CLI with `hermes` and enable voice mode with `/voice on`
2. **Press Ctrl+B** — a beep plays (880Hz), recording starts
3. **Speak** — a live audio level bar shows your input: `● [▁▂▃▅▇▇▅▂] `
4. **Stop speaking** — after 3 seconds of silence, recording auto-stops
@@ -125,12 +145,23 @@ When TTS is enabled, the agent speaks its reply **sentence-by-sentence** as it g
### Hallucination Filter
-Whisper sometimes generates phantom text from silence or background noise ("Thank you for watching", "Subscribe", etc.). The agent filters these out using a database of 498+ known hallucination phrases across multiple languages.
+Whisper sometimes generates phantom text from silence or background noise ("Thank you for watching", "Subscribe", etc.). The agent filters these out using a set of 26 known hallucination phrases across multiple languages, plus a regex pattern that catches repetitive variations.
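To make the two-stage idea concrete, here is a minimal sketch of a phrase set combined with a repetition regex. It is not Hermes's implementation: the function name `is_hallucination`, the two-entry phrase list (taken from the examples quoted above), and the exact regex are all assumptions for illustration, and the normalization only handles ASCII case folding.

```bash
#!/usr/bin/env bash
# Illustrative phrase list only; the real set is larger and multilingual.
# Entries must be lowercase words without regex metacharacters.
KNOWN_PHRASES=(
  "thank you for watching"
  "subscribe"
)

# Returns 0 when the transcript looks like a hallucination, 1 otherwise.
is_hallucination() {
  local text phrase

  # Normalize: lowercase, drop punctuation, squeeze whitespace, trim ends.
  text="$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -d '[:punct:]' \
    | tr -s '[:space:]' ' ' \
    | sed -e 's/^ //' -e 's/ $//')"

  # Nothing left after normalization means there was no speech to keep.
  [ -z "$text" ] && return 0

  for phrase in "${KNOWN_PHRASES[@]}"; do
    # Stage 1: exact match against the known set.
    [ "$text" = "$phrase" ] && return 0
    # Stage 2: the same phrase repeated two or more times in a row.
    if printf '%s\n' "$text" | grep -Eq "^($phrase ?){2,}$"; then
      return 0
    fi
  done
  return 1
}

# Usage:
#   if is_hallucination "$transcript"; then echo "discarded"; fi
```

Matching the whole normalized transcript, rather than searching for a substring, keeps genuine requests that merely contain a listed word (for example, a sentence mentioning "subscribe") from being dropped.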
---
## Gateway Voice Reply (Telegram & Discord)
If you haven't set up your messaging bots yet, see the platform-specific guides:
- [Telegram Setup Guide](../messaging/telegram.md)
- [Discord Setup Guide](../messaging/discord.md)
Start the gateway to connect to your messaging platforms:
```bash
hermes gateway # Start the gateway (connects to configured platforms)
hermes gateway setup # Interactive setup wizard for first-time configuration
```
### Commands
These work in both Telegram and Discord text channels:
@@ -245,8 +276,18 @@ GROQ_API_KEY=your-key # Recommended (fast, free tier)
# ELEVENLABS_API_KEY=your-key # Premium quality
```
### Start the Gateway
```bash
hermes gateway # Start with existing configuration
```
The bot should come online in Discord within a few seconds.
### Commands
Use these in the Discord text channel where the bot is present:
```
/voice join Bot joins your current voice channel
/voice channel Alias for /voice join