feat: add MiniMax TTS provider support (speech-2.8)

Add MiniMax as a fifth TTS provider alongside Edge TTS, ElevenLabs,
OpenAI, and NeuTTS. Supports speech-2.8-hd (recommended default) and
speech-2.8-turbo models via the MiniMax T2A HTTP API.

Changes:
- Add _generate_minimax_tts() with hex-encoded audio decoding
- Add MiniMax to provider dispatch, requirements check, and Telegram
  Opus compatibility handling
- Add MiniMax to interactive setup wizard with API key prompt
- Update TTS documentation and config example

Configuration:
  tts:
    provider: "minimax"
    minimax:
      model: "speech-2.8-hd"
      voice_id: "English_Graceful_Lady"

Requires MINIMAX_API_KEY environment variable.

API reference: https://platform.minimax.io/docs/api-reference/speech-t2a-http
This commit is contained in:
Octopus
2026-04-03 17:42:24 +08:00
committed by Teknium
parent 831067c5d3
commit f5c212f69b
4 changed files with 130 additions and 8 deletions

View File

@@ -10,13 +10,14 @@ Hermes Agent supports both text-to-speech output and voice message transcription
## Text-to-Speech
Convert text to speech with four providers:
Convert text to speech with five providers:
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
| **MiniMax TTS** | Excellent | Paid | `MINIMAX_API_KEY` |
| **NeuTTS** | Good | Free | None needed |
### Platform Delivery
@@ -33,7 +34,7 @@ Convert text to speech with four providers:
```yaml
# In ~/.hermes/config.yaml
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "neutts"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
@@ -43,6 +44,12 @@ tts:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
minimax:
model: "speech-2.8-hd" # speech-2.8-hd (default), speech-2.8-turbo
voice_id: "English_Graceful_Lady" # See https://platform.minimax.io/faq/system-voice-id
speed: 1 # 0.5 - 2.0
vol: 1 # 0 - 10
pitch: 0 # -12 - 12
neutts:
ref_audio: ''
ref_text: ''
@@ -56,6 +63,7 @@ Telegram voice bubbles require Opus/OGG audio format:
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
- **MiniMax TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
- **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
```bash
@@ -69,7 +77,7 @@ brew install ffmpeg
sudo dnf install ffmpeg
```
Without ffmpeg, Edge TTS and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
Without ffmpeg, Edge TTS, MiniMax TTS, and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
:::tip
If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.