# Voice Output System

## Overview

The Nexus voice output system converts text reports and briefings into spoken audio. It supports multiple TTS providers with automatic fallback so that audio generation degrades gracefully when a provider is unavailable.

Primary use cases:

- **Deep Dive** daily briefings (`bin/deepdive_tts.py`)
- **Night Watch** nightly reports (`bin/night_watch.py --voice-memo`)

---

## Available Providers

### edge-tts (recommended default)

- **Cost:** Zero; no API key, no account required
- **Package:** `pip install "edge-tts>=6.1.9"` (quote the specifier so the shell does not treat `>` as a redirect)
- **Default voice:** `en-US-GuyNeural`
- **Output format:** MP3
- **How it works:** Streams audio from Microsoft Edge's neural TTS service over HTTPS. No local model download required.
- **Available locales:** 100+ languages and locales. Full list: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support

Notable English voices:

| Voice ID | Style |
|---|---|
| `en-US-GuyNeural` | Neutral male (default) |
| `en-US-JennyNeural` | Warm female |
| `en-US-AriaNeural` | Expressive female |
| `en-GB-RyanNeural` | British male |

### piper

- **Cost:** Free, fully offline
- **Package:** `pip install piper-tts` + model download (~65 MB)
- **Model location:** `~/.local/share/piper/en_US-lessac-medium.onnx`
- **Output format:** WAV → MP3 (requires `lame`)
- **Sovereignty:** Fully local; no network calls after model download

### elevenlabs

- **Cost:** Usage-based (paid)
- **Requirement:** `ELEVENLABS_API_KEY` environment variable
- **Output format:** MP3
- **Quality:** Highest quality of the three providers in the fallback chain

### openai

- **Cost:** Usage-based (paid)
- **Requirement:** `OPENAI_API_KEY` environment variable
- **Output format:** MP3
- **Default voice:** `alloy`

---

## Usage: deepdive_tts.py

```bash
# Use edge-tts (zero cost)
DEEPDIVE_TTS_PROVIDER=edge-tts python bin/deepdive_tts.py --text "Good morning."

# Specify a different Edge voice
python bin/deepdive_tts.py --provider edge-tts --voice en-US-JennyNeural --text "Hello world."

# Read from a file
python bin/deepdive_tts.py --provider edge-tts --input-file /tmp/briefing.txt --output /tmp/briefing

# Use OpenAI
OPENAI_API_KEY=sk-... python bin/deepdive_tts.py --provider openai --voice nova --text "Hello."

# Use ElevenLabs
ELEVENLABS_API_KEY=... python bin/deepdive_tts.py --provider elevenlabs --voice rachel --text "Hello."

# Use local Piper (offline)
python bin/deepdive_tts.py --provider piper --text "Hello."
```

Provider and voice can also be set via environment variables:

```bash
export DEEPDIVE_TTS_PROVIDER=edge-tts
export DEEPDIVE_TTS_VOICE=en-GB-RyanNeural
python bin/deepdive_tts.py --text "Good evening."
```

---

## Usage: Night Watch --voice-memo

The `--voice-memo` flag causes Night Watch to generate an MP3 audio summary of the nightly report immediately after writing the markdown file.

```bash
python bin/night_watch.py --voice-memo
```

Output location: `/tmp/bezalel/night-watch-.mp3`

The voice memo:

- Strips markdown formatting (`#`, `|`, `*`, `---`) for cleaner speech
- Uses `edge-tts` with the `en-US-GuyNeural` voice
- Is non-fatal: if TTS fails, the markdown report is still written normally

Example crontab with voice memo (a crontab entry must stay on a single line; cron does not support backslash continuation):

```cron
0 3 * * * cd /path/to/the-nexus && python bin/night_watch.py --voice-memo >> /var/log/bezalel/night-watch.log 2>&1
```

---

## Fallback Chain

`HybridTTS` (used by `tts_engine.py`) attempts providers in this order:

1. **edge-tts**: zero cost, no API key
2. **piper**: offline local model (if model file present)
3. **elevenlabs**: cloud fallback (if `ELEVENLABS_API_KEY` set)

If `prefer_cloud=True` is passed, the order becomes: elevenlabs → piper.

---

## Phase 3 TODO

Evaluate **fish-speech** and **F5-TTS** as fully offline, sovereign alternatives with higher voice quality than Piper. These models run locally with no network dependency whatsoever, providing complete independence from Microsoft's Edge service.

Tracking: to be filed as a follow-up to issue #830.
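
For reference, the provider ordering described under Fallback Chain could be sketched roughly as follows. This is an illustrative sketch only, not the actual `HybridTTS` code: `provider_order`, `is_available`, `synthesize_with_fallback`, and the `backends` mapping are hypothetical names, and each backend is assumed to be a plain callable that takes text and returns audio bytes. The availability checks mirror the conditions stated above (model file present for piper, `ELEVENLABS_API_KEY` set for elevenlabs).

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Callable, Mapping

# Model path as documented in the piper section.
PIPER_MODEL = Path.home() / ".local/share/piper/en_US-lessac-medium.onnx"


def provider_order(prefer_cloud: bool = False) -> list[str]:
    """Return provider names in the order this document describes."""
    if prefer_cloud:
        return ["elevenlabs", "piper"]
    return ["edge-tts", "piper", "elevenlabs"]


def is_available(
    name: str,
    env: Mapping[str, str],
    piper_model: Path,
) -> bool:
    """Check the documented precondition for each provider."""
    if name == "piper":
        return piper_model.is_file()
    if name == "elevenlabs":
        return bool(env.get("ELEVENLABS_API_KEY"))
    return True  # edge-tts needs neither an API key nor a local model


def synthesize_with_fallback(
    text: str,
    backends: Mapping[str, Callable[[str], bytes]],  # hypothetical callables
    prefer_cloud: bool = False,
    env: Mapping[str, str] = os.environ,
    piper_model: Path = PIPER_MODEL,
) -> tuple[str, bytes]:
    """Try each provider in order; return (provider name, audio bytes)."""
    errors: list[str] = []
    for name in provider_order(prefer_cloud):
        if name not in backends or not is_available(name, env, piper_model):
            errors.append(f"{name}: unavailable")
            continue
        try:
            return name, backends[name](text)
        except Exception as exc:  # any provider failure moves to the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))
```

Passing `env` and `piper_model` as parameters keeps the ordering logic testable without network access or a real model file; a caller would normally rely on the defaults.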