- Add EdgeTTSAdapter to bin/deepdive_tts.py (provider key: "edge-tts")
  default voice: en-US-GuyNeural, no API key required
- Add EdgeTTS class to intelligence/deepdive/tts_engine.py
- Update HybridTTS to try edge-tts as fallback between piper and elevenlabs
- Add --voice-memo flag to bin/night_watch.py for spoken nightly reports
- Add edge-tts>=6.1.9 to requirements.txt
- Create docs/voice-output.md documenting all providers and fallback chain
- Add tests/test_edge_tts.py with 17 unit tests (all mocked, no network)

Fixes #1126

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Voice Output System
Overview
The Nexus voice output system converts text reports and briefings into spoken audio. It supports multiple TTS providers with automatic fallback so that audio generation degrades gracefully when a provider is unavailable.
Primary use cases:
- Deep Dive daily briefings (bin/deepdive_tts.py)
- Night Watch nightly reports (bin/night_watch.py --voice-memo)
Available Providers
edge-tts (recommended default)
- Cost: Zero — no API key, no account required
- Package: pip install "edge-tts>=6.1.9" (quote the requirement so the shell does not treat >= as a redirect)
- Default voice: en-US-GuyNeural
- Output format: MP3
- How it works: Streams audio from Microsoft Edge's neural TTS service over HTTPS. No local model download required.
- Available locales: 100+ languages and locales. Full list: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
Notable English voices:
| Voice ID | Style |
|---|---|
| en-US-GuyNeural | Neutral male (default) |
| en-US-JennyNeural | Warm female |
| en-US-AriaNeural | Expressive female |
| en-GB-RyanNeural | British male |
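As a rough illustration of the streaming step described above, the following sketch calls the `edge_tts` Python package directly. The `Communicate` class and its `save` coroutine come from that package; the function names `synthesize_to_mp3` and `speak` are hypothetical and are not taken from `EdgeTTSAdapter` or `tts_engine.py`.

```python
import asyncio

DEFAULT_VOICE = "en-US-GuyNeural"


async def synthesize_to_mp3(text: str, out_path: str, voice: str = DEFAULT_VOICE) -> str:
    """Stream speech for `text` from the Edge TTS service into an MP3 file."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Imported lazily so this module still loads when edge-tts is not installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)  # needs network access to the Edge service
    return out_path


def speak(text: str, out_path: str, voice: str = DEFAULT_VOICE) -> str:
    """Synchronous wrapper (hypothetical helper) around the coroutine above."""
    return asyncio.run(synthesize_to_mp3(text, out_path, voice))
```

Calling `speak("Good morning.", "/tmp/hello.mp3")` should write an MP3 to the given path, assuming the package is installed and the service is reachable.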
piper
- Cost: Free, fully offline
- Package: pip install piper-tts + model download (~65 MB)
- Model location: ~/.local/share/piper/en_US-lessac-medium.onnx
- Output format: WAV → MP3 (requires lame)
- Sovereignty: Fully local; no network calls after model download
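The WAV → MP3 step could look roughly like the sketch below, which shells out to the `lame` encoder. This is an assumption about how the conversion might be wired up, not the code in `tts_engine.py`; `lame_command` and `wav_to_mp3` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def lame_command(wav_path: Path, mp3_path: Path) -> list[str]:
    """Build the argv for encoding a WAV file to MP3 with lame."""
    return ["lame", "--quiet", str(wav_path), str(mp3_path)]


def wav_to_mp3(wav_path: Path) -> Path:
    """Encode `wav_path` to an MP3 next to it and return the MP3 path."""
    if shutil.which("lame") is None:
        raise RuntimeError("lame not found on PATH; install it to enable MP3 output")
    mp3_path = wav_path.with_suffix(".mp3")
    # check=True raises CalledProcessError if the encoder exits non-zero.
    subprocess.run(lame_command(wav_path, mp3_path), check=True)
    return mp3_path
```

Checking for the binary up front gives a clearer error than a failed subprocess call when `lame` is missing.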
elevenlabs
- Cost: Usage-based (paid)
- Requirement: ELEVENLABS_API_KEY environment variable
- Output format: MP3
- Quality: Highest quality of the three providers
openai
- Cost: Usage-based (paid)
- Requirement: OPENAI_API_KEY environment variable
- Output format: MP3
- Default voice: alloy
Usage: deepdive_tts.py
# Use edge-tts (zero cost)
DEEPDIVE_TTS_PROVIDER=edge-tts python bin/deepdive_tts.py --text "Good morning."
# Specify a different Edge voice
python bin/deepdive_tts.py --provider edge-tts --voice en-US-JennyNeural --text "Hello world."
# Read from a file
python bin/deepdive_tts.py --provider edge-tts --input-file /tmp/briefing.txt --output /tmp/briefing
# Use OpenAI
OPENAI_API_KEY=sk-... python bin/deepdive_tts.py --provider openai --voice nova --text "Hello."
# Use ElevenLabs
ELEVENLABS_API_KEY=... python bin/deepdive_tts.py --provider elevenlabs --voice rachel --text "Hello."
# Use local Piper (offline)
python bin/deepdive_tts.py --provider piper --text "Hello."
Provider and voice can also be set via environment variables:
export DEEPDIVE_TTS_PROVIDER=edge-tts
export DEEPDIVE_TTS_VOICE=en-GB-RyanNeural
python bin/deepdive_tts.py --text "Good evening."
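One plausible way to combine the command-line flags with the environment variables is sketched below. The precedence (flag first, then environment variable, then a built-in default) and the `edge-tts` fallback value are assumptions for illustration; `resolve_tts_settings` is a hypothetical name and the real script may resolve these differently.

```python
import argparse
import os


def resolve_tts_settings(argv, env=None):
    """Return (provider, voice), preferring CLI flags over environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--provider", default=None)
    parser.add_argument("--voice", default=None)
    # Ignore flags this sketch does not model, such as --text or --output.
    args, _unknown = parser.parse_known_args(argv)

    # Assumed default: edge-tts, the provider this document recommends.
    provider = args.provider or env.get("DEEPDIVE_TTS_PROVIDER") or "edge-tts"
    # None means "let the provider pick its own default voice".
    voice = args.voice or env.get("DEEPDIVE_TTS_VOICE")
    return provider, voice
```

With `DEEPDIVE_TTS_PROVIDER=piper` set and `--provider openai` passed, this sketch returns `openai`, since the explicit flag wins.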
Usage: Night Watch --voice-memo
The --voice-memo flag causes Night Watch to generate an MP3 audio summary of the
nightly report immediately after writing the markdown file.
python bin/night_watch.py --voice-memo
Output location: /tmp/bezalel/night-watch-<YYYY-MM-DD>.mp3
The voice memo:
- Strips markdown formatting (#, |, *, ---) for cleaner speech
- Uses edge-tts with the en-US-GuyNeural voice
- Is non-fatal: if TTS fails, the markdown report is still written normally
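The markdown-stripping step can be approximated with a few regular expressions. The sketch below is one possible approach under the assumption that only the four listed markers need handling; `strip_markdown` is a hypothetical name and the rules in `night_watch.py` may differ.

```python
import re


def strip_markdown(report: str) -> str:
    """Remove #, |, * and --- markers so the text reads cleanly aloud."""
    spoken = []
    for line in report.splitlines():
        # Skip horizontal rules and table separator rows such as |---|---|.
        if "-" in line and re.fullmatch(r"[|\-:\s]+", line):
            continue
        line = re.sub(r"^\s*#+\s*", "", line)  # leading heading markers
        line = line.replace("*", "")           # emphasis and bullet asterisks
        line = re.sub(r"\s*\|\s*", " ", line)  # table cell borders
        line = line.strip()
        if line:
            spoken.append(line)
    return "\n".join(spoken)
```

For example, the table row `| disk | **ok** |` becomes `disk ok`, and a `---` rule is dropped entirely.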
Example crontab with voice memo:
0 3 * * * cd /path/to/the-nexus && python bin/night_watch.py --voice-memo \
>> /var/log/bezalel/night-watch.log 2>&1
Fallback Chain
HybridTTS (used by tts_engine.py) attempts providers in this order:
- edge-tts: zero cost, no API key
- piper: offline local model (if model file present)
- elevenlabs: cloud fallback (if ELEVENLABS_API_KEY is set)
If prefer_cloud=True is passed, the order becomes: elevenlabs → piper.
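The ordering described above follows a common try-each-in-turn pattern, sketched below. This is not the `HybridTTS` source: `provider_order` and `synthesize_with_fallback` are hypothetical names, and each provider is modelled as a plain callable that takes the text and an output path and either returns the audio path or raises.

```python
from typing import Callable, List, Sequence, Tuple

Synth = Callable[[str, str], str]


def provider_order(prefer_cloud: bool = False) -> List[str]:
    """Provider names in the order this document describes."""
    if prefer_cloud:
        return ["elevenlabs", "piper"]
    return ["edge-tts", "piper", "elevenlabs"]


def synthesize_with_fallback(
    text: str, out_path: str, providers: Sequence[Tuple[str, Synth]]
) -> Tuple[str, str]:
    """Try each (name, synth) pair in order; return the first success."""
    errors = []
    for name, synth in providers:
        try:
            return name, synth(text, out_path)
        except Exception as exc:  # any provider failure moves on to the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))
```

Collecting the per-provider errors keeps the final exception informative when every provider in the chain is unavailable.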
Phase 3 TODO
Evaluate fish-speech and F5-TTS as fully offline, sovereign alternatives with higher voice quality than Piper. These models run locally with no network dependency whatsoever, providing complete independence from Microsoft's Edge service.
Tracking: to be filed as a follow-up to issue #830.