docs: complete voice mode docs
This commit is contained in:
@@ -210,7 +210,7 @@ Replace the ID with the actual channel ID (right-click → Copy Channel ID with
|
||||
|
||||
Hermes Agent supports Discord voice messages:
|
||||
|
||||
- **Incoming voice messages** are automatically transcribed using Whisper (requires `GROQ_API_KEY` or `VOICE_TOOLS_OPENAI_KEY` to be set in your environment).
|
||||
- **Incoming voice messages** are automatically transcribed using the configured STT provider: local `faster-whisper` (no key), Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`).
|
||||
- **Text-to-speech**: Use `/voice tts` to have the bot send spoken audio responses alongside text replies.
|
||||
- **Discord voice channels**: Hermes can also join a voice channel, listen to users speaking, and talk back in the channel.
|
||||
|
||||
|
||||
@@ -224,7 +224,7 @@ Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).
|
||||
|
||||
Hermes supports voice on Slack:
|
||||
|
||||
- **Incoming:** Voice/audio messages are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
|
||||
- **Incoming:** Voice/audio messages are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
|
||||
- **Outgoing:** TTS responses are sent as audio file attachments
|
||||
|
||||
---
|
||||
|
||||
@@ -131,7 +131,11 @@ Group chat IDs are negative numbers (e.g., `-1001234567890`). Your personal DM c
|
||||
|
||||
### Incoming Voice (Speech-to-Text)
|
||||
|
||||
Voice messages you send on Telegram are automatically transcribed using OpenAI's Whisper API and injected as text into the conversation. This requires `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`.
|
||||
Voice messages you send on Telegram are automatically transcribed by Hermes's configured STT provider and injected as text into the conversation.
|
||||
|
||||
- `local` uses `faster-whisper` on the machine running Hermes — no API key required
|
||||
- `groq` uses Groq Whisper and requires `GROQ_API_KEY`
|
||||
- `openai` uses OpenAI Whisper and requires `VOICE_TOOLS_OPENAI_KEY`
|
||||
|
||||
### Outgoing Voice (Text-to-Speech)
|
||||
|
||||
@@ -173,7 +177,7 @@ Hermes Agent works in Telegram group chats with a few considerations:
|
||||
| Bot not responding at all | Verify `TELEGRAM_BOT_TOKEN` is correct. Check `hermes gateway` logs for errors. |
|
||||
| Bot responds with "unauthorized" | Your user ID is not in `TELEGRAM_ALLOWED_USERS`. Double-check with @userinfobot. |
|
||||
| Bot ignores group messages | Privacy mode is likely on. Disable it (Step 3) or make the bot a group admin. **Remember to remove and re-add the bot after changing privacy.** |
|
||||
| Voice messages not transcribed | Check that `VOICE_TOOLS_OPENAI_KEY` is set and valid in `~/.hermes/.env`. |
|
||||
| Voice messages not transcribed | Verify STT is available: install `faster-whisper` for local transcription, or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`. |
|
||||
| Voice replies are files, not bubbles | Install `ffmpeg` (needed for Edge TTS Opus conversion). |
|
||||
| Bot token revoked/invalid | Generate a new token via `/revoke` then `/newbot` or `/token` in BotFather. Update your `.env` file. |
|
||||
|
||||
|
||||
@@ -137,7 +137,7 @@ with reconnection logic.
|
||||
|
||||
Hermes supports voice on WhatsApp:
|
||||
|
||||
- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using Whisper (requires `VOICE_TOOLS_OPENAI_KEY`)
|
||||
- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
|
||||
- **Outgoing:** TTS responses are sent as MP3 audio file attachments
|
||||
- Agent responses are prefixed with "⚕ **Hermes Agent**" for easy identification
|
||||
|
||||
|
||||
Reference in New Issue
Block a user