the-nexus/docs/voice-output.md
Alexander Whitestone ef74536e33
feat: add edge-tts as zero-cost voice output provider
- Add EdgeTTSAdapter to bin/deepdive_tts.py (provider key: "edge-tts")
  default voice: en-US-GuyNeural, no API key required
- Add EdgeTTS class to intelligence/deepdive/tts_engine.py
- Update HybridTTS to try edge-tts as fallback between piper and elevenlabs
- Add --voice-memo flag to bin/night_watch.py for spoken nightly reports
- Add edge-tts>=6.1.9 to requirements.txt
- Create docs/voice-output.md documenting all providers and fallback chain
- Add tests/test_edge_tts.py with 17 unit tests (all mocked, no network)

Fixes #1126

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 06:29:26 -04:00


# Voice Output System
## Overview
The Nexus voice output system converts text reports and briefings into spoken audio.
It supports multiple TTS providers with automatic fallback so that audio generation
degrades gracefully when a provider is unavailable.
Primary use cases:
- **Deep Dive** daily briefings (`bin/deepdive_tts.py`)
- **Night Watch** nightly reports (`bin/night_watch.py --voice-memo`)
---
## Available Providers
### edge-tts (recommended default)
- **Cost:** Zero — no API key, no account required
- **Package:** `pip install "edge-tts>=6.1.9"` (quote the requirement so the shell does not treat `>` as a redirect)
- **Default voice:** `en-US-GuyNeural`
- **Output format:** MP3
- **How it works:** Streams audio from Microsoft Edge's online neural TTS service over a secure WebSocket connection.
No local model download is required, but network access is.
- **Available locales:** 100+ languages and locales. Run `edge-tts --list-voices` to print the voices the service currently offers, or see the full list:
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
Notable English voices:
| Voice ID | Style |
|---|---|
| `en-US-GuyNeural` | Neutral male (default) |
| `en-US-JennyNeural` | Warm female |
| `en-US-AriaNeural` | Expressive female |
| `en-GB-RyanNeural` | British male |
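For readers who want to call the library directly rather than go through `bin/deepdive_tts.py`, the following is a minimal sketch built on the `edge_tts` package's `Communicate` class. The helper names `resolve_voice`, `synthesize`, and `synthesize_sync` are illustrative assumptions and are not taken from `EdgeTTSAdapter`.

```python
import asyncio

DEFAULT_VOICE = "en-US-GuyNeural"


def resolve_voice(voice=None):
    """Return the requested voice ID, or the default when none is given."""
    if voice and voice.strip():
        return voice.strip()
    return DEFAULT_VOICE


async def synthesize(text, output_path, voice=None):
    """Stream speech from the Edge TTS service and write it to an MP3 file."""
    # Imported lazily so this module still loads when edge-tts is not installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, resolve_voice(voice))
    await communicate.save(output_path)
    return output_path


def synthesize_sync(text, output_path, voice=None):
    """Blocking wrapper for callers that are not already running an event loop."""
    return asyncio.run(synthesize(text, output_path, voice))
```

Calling `synthesize_sync("Good morning.", "/tmp/briefing.mp3")` needs network access and the `edge-tts` package; `resolve_voice` on its own has no dependencies.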
### piper
- **Cost:** Free, fully offline
- **Package:** `pip install piper-tts` + model download (~65 MB)
- **Model location:** `~/.local/share/piper/en_US-lessac-medium.onnx`
- **Output format:** WAV → MP3 (requires `lame`)
- **Sovereignty:** Fully local; no network calls after model download
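The WAV → MP3 step can be sketched as two subprocess calls. This assumes the `piper` and `lame` command-line tools are on `PATH` and that Piper is driven through its CLI; whether the project does it this way is not stated here, and the function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Model location from the section above.
DEFAULT_MODEL = Path.home() / ".local/share/piper/en_US-lessac-medium.onnx"


def piper_command(model, wav_path):
    """Build the Piper invocation; the text to speak is supplied on stdin."""
    return ["piper", "--model", str(model), "--output_file", str(wav_path)]


def lame_command(wav_path, mp3_path):
    """Build the LAME invocation that encodes a WAV file to MP3."""
    return ["lame", "--quiet", str(wav_path), str(mp3_path)]


def piper_to_mp3(text, mp3_path, model=DEFAULT_MODEL):
    """Synthesize text with Piper, then encode the intermediate WAV to MP3."""
    for tool in ("piper", "lame"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    wav_path = Path(mp3_path).with_suffix(".wav")
    subprocess.run(piper_command(model, wav_path),
                   input=text.encode("utf-8"), check=True)
    subprocess.run(lame_command(wav_path, mp3_path), check=True)
    wav_path.unlink()  # remove the intermediate WAV
    return Path(mp3_path)
```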
### elevenlabs
- **Cost:** Usage-based (paid)
- **Requirement:** `ELEVENLABS_API_KEY` environment variable
- **Output format:** MP3
- **Quality:** Highest quality of the providers listed here
### openai
- **Cost:** Usage-based (paid)
- **Requirement:** `OPENAI_API_KEY` environment variable
- **Output format:** MP3
- **Default voice:** `alloy`
---
## Usage: deepdive_tts.py
```bash
# Use edge-tts (zero cost)
DEEPDIVE_TTS_PROVIDER=edge-tts python bin/deepdive_tts.py --text "Good morning."
# Specify a different Edge voice
python bin/deepdive_tts.py --provider edge-tts --voice en-US-JennyNeural --text "Hello world."
# Read from a file
python bin/deepdive_tts.py --provider edge-tts --input-file /tmp/briefing.txt --output /tmp/briefing
# Use OpenAI
OPENAI_API_KEY=sk-... python bin/deepdive_tts.py --provider openai --voice nova --text "Hello."
# Use ElevenLabs
ELEVENLABS_API_KEY=... python bin/deepdive_tts.py --provider elevenlabs --voice rachel --text "Hello."
# Use local Piper (offline)
python bin/deepdive_tts.py --provider piper --text "Hello."
```
Provider and voice can also be set via environment variables:
```bash
export DEEPDIVE_TTS_PROVIDER=edge-tts
export DEEPDIVE_TTS_VOICE=en-GB-RyanNeural
python bin/deepdive_tts.py --text "Good evening."
```
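One plausible way the flag and environment settings could be combined is sketched below. The precedence (flag, then environment variable, then a built-in fallback) and the `edge-tts` fallback value are assumptions, not confirmed behaviour of `bin/deepdive_tts.py`, and `resolve_tts_settings` is a hypothetical name.

```python
import argparse
import os


def resolve_tts_settings(argv=None, env=None):
    """Resolve (provider, voice): CLI flag first, then environment, then fallback."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="TTS settings sketch")
    parser.add_argument("--provider", default=None)
    parser.add_argument("--voice", default=None)
    parser.add_argument("--text", default=None)
    args = parser.parse_args(argv)

    # Assumed precedence and fallback default; the real script may differ.
    provider = args.provider or env.get("DEEPDIVE_TTS_PROVIDER") or "edge-tts"
    # None means "let the chosen provider use its own default voice".
    voice = args.voice or env.get("DEEPDIVE_TTS_VOICE")
    return provider, voice
```

Passing `env` explicitly keeps the function easy to exercise without touching the real process environment.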
---
## Usage: Night Watch --voice-memo
The `--voice-memo` flag causes Night Watch to generate an MP3 audio summary of the
nightly report immediately after writing the markdown file.
```bash
python bin/night_watch.py --voice-memo
```
Output location: `/tmp/bezalel/night-watch-<YYYY-MM-DD>.mp3`
The voice memo:
- Strips markdown formatting (`#`, `|`, `*`, `---`) for cleaner speech
- Uses `edge-tts` with the `en-US-GuyNeural` voice
- Is non-fatal: if TTS fails, the markdown report is still written normally
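The three behaviours above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `bin/night_watch.py`: the function names are hypothetical, and `synthesize` stands for any callable that takes the text and an output path.

```python
import re
from datetime import date
from pathlib import Path


def strip_markdown(text):
    """Drop markdown markers (#, |, *, ---) so they are not read aloud."""
    lines = []
    for line in text.splitlines():
        # Skip blank lines, horizontal rules, and table separator rows.
        if re.fullmatch(r"[\s|:-]+", line):
            continue
        line = re.sub(r"^\s*#+\s*", "", line)   # heading markers
        line = line.replace("*", "")            # emphasis markers
        line = re.sub(r"\s*\|\s*", " ", line)   # table cell borders
        line = line.strip()
        if line:
            lines.append(line)
    return "\n".join(lines)


def memo_path(day, base_dir="/tmp/bezalel"):
    """Build the dated MP3 path for a given report day."""
    return Path(base_dir) / f"night-watch-{day.isoformat()}.mp3"


def write_voice_memo(report_text, day, synthesize):
    """Try to produce the memo; return its path, or None if TTS fails."""
    target = memo_path(day)
    try:
        synthesize(strip_markdown(report_text), str(target))
    except Exception as exc:  # non-fatal: the markdown report is already written
        print(f"voice memo skipped: {exc}")
        return None
    return target
```

Catching the exception at this level is what keeps a TTS outage from affecting the markdown report.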
Example crontab with voice memo:
```cron
0 3 * * * cd /path/to/the-nexus && python bin/night_watch.py --voice-memo \
>> /var/log/bezalel/night-watch.log 2>&1
```
---
## Fallback Chain
`HybridTTS` (used by `tts_engine.py`) attempts providers in this order:
1. **edge-tts** — zero cost, no API key
2. **piper** — offline local model (if model file present)
3. **elevenlabs** — cloud fallback (if `ELEVENLABS_API_KEY` set)
If `prefer_cloud=True` is passed, the order becomes: elevenlabs → piper.
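The ordering described above can be illustrated with a small, self-contained sketch. It is not the `HybridTTS` implementation: `provider_order` and `synthesize_with_fallback` are hypothetical names, and each provider is modelled as a plain callable that either returns audio or raises.

```python
DEFAULT_ORDER = ["edge-tts", "piper", "elevenlabs"]
CLOUD_FIRST_ORDER = ["elevenlabs", "piper"]


def provider_order(prefer_cloud=False):
    """Return the provider names in the order they should be attempted."""
    return list(CLOUD_FIRST_ORDER if prefer_cloud else DEFAULT_ORDER)


def synthesize_with_fallback(text, providers, prefer_cloud=False):
    """Try each provider in order; return (name, result) from the first success.

    `providers` maps a provider name to a callable taking the text. A missing
    entry stands for an unavailable provider (no model file, no API key).
    """
    errors = {}
    for name in provider_order(prefer_cloud):
        synth = providers.get(name)
        if synth is None:
            continue
        try:
            return name, synth(text)
        except Exception as exc:  # remember the failure and move on
            errors[name] = exc
    raise RuntimeError(f"all TTS providers failed or were unavailable: {errors}")
```

Returning the provider name alongside the result lets the caller log which backend actually produced the audio.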
---
## Phase 3 TODO
Evaluate **fish-speech** and **F5-TTS** as fully offline, sovereign alternatives
with higher voice quality than Piper. These models run locally with no network
dependency whatsoever, providing complete independence from Microsoft's Edge service.
Tracking: to be filed as a follow-up to issue #830.