the-nexus/docs/voice-output.md
Alexander Whitestone ef74536e33
feat: add edge-tts as zero-cost voice output provider
- Add EdgeTTSAdapter to bin/deepdive_tts.py (provider key: "edge-tts")
  default voice: en-US-GuyNeural, no API key required
- Add EdgeTTS class to intelligence/deepdive/tts_engine.py
- Update HybridTTS to try edge-tts as fallback between piper and elevenlabs
- Add --voice-memo flag to bin/night_watch.py for spoken nightly reports
- Add edge-tts>=6.1.9 to requirements.txt
- Create docs/voice-output.md documenting all providers and fallback chain
- Add tests/test_edge_tts.py with 17 unit tests (all mocked, no network)

Fixes #1126

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 06:29:26 -04:00


Voice Output System

Overview

The Nexus voice output system converts text reports and briefings into spoken audio. It supports multiple TTS providers with automatic fallback so that audio generation degrades gracefully when a provider is unavailable.

Primary use cases:

  • Deep Dive daily briefings (bin/deepdive_tts.py)
  • Night Watch nightly reports (bin/night_watch.py --voice-memo)

Available Providers

edge-tts

  • Cost: Free; no API key required
  • Package: pip install "edge-tts>=6.1.9"
  • Default voice: en-US-GuyNeural
  • Output format: MP3
  • Sovereignty: Requires network access to Microsoft's Edge TTS service

Notable English voices:

Voice ID            Style
en-US-GuyNeural     Neutral male (default)
en-US-JennyNeural   Warm female
en-US-AriaNeural    Expressive female
en-GB-RyanNeural    British male
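For programmatic use, the edge-tts package exposes a small async API (edge_tts.Communicate plus save()). The sketch below is illustrative and is not the repo's EdgeTTSAdapter; output_path is a hypothetical helper:

```python
import asyncio

DEFAULT_VOICE = "en-US-GuyNeural"

def output_path(stem: str) -> str:
    # edge-tts emits MP3; normalize the extension (illustrative helper)
    return stem if stem.endswith(".mp3") else stem + ".mp3"

async def synthesize(text: str, path: str, voice: str = DEFAULT_VOICE) -> None:
    # The network call happens inside save(); nothing is contacted until then.
    import edge_tts  # pip install "edge-tts>=6.1.9"
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_path(path))
```

Run it with asyncio.run(synthesize("Good morning.", "/tmp/briefing")), which writes /tmp/briefing.mp3.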

piper

  • Cost: Free, fully offline
  • Package: pip install piper-tts + model download (~65 MB)
  • Model location: ~/.local/share/piper/en_US-lessac-medium.onnx
  • Output format: WAV → MP3 (requires lame)
  • Sovereignty: Fully local; no network calls after model download
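The WAV → MP3 pipeline above can be sketched as command construction (flag names follow the piper and lame CLIs, but verify against your installed versions; paths are illustrative):

```python
from pathlib import Path

MODEL = Path.home() / ".local/share/piper/en_US-lessac-medium.onnx"

def piper_cmd(wav_path: str) -> list[str]:
    # piper reads the text on stdin and writes a WAV file
    return ["piper", "--model", str(MODEL), "--output_file", wav_path]

def lame_cmd(wav_path: str, mp3_path: str) -> list[str]:
    # lame transcodes the intermediate WAV to MP3
    return ["lame", wav_path, mp3_path]
```

For example: subprocess.run(piper_cmd("/tmp/out.wav"), input=text.encode(), check=True) followed by subprocess.run(lame_cmd("/tmp/out.wav", "/tmp/out.mp3"), check=True).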

elevenlabs

  • Cost: Usage-based (paid)
  • Requirement: ELEVENLABS_API_KEY environment variable
  • Output format: MP3
  • Quality: Highest quality of the supported providers

openai

  • Cost: Usage-based (paid)
  • Requirement: OPENAI_API_KEY environment variable
  • Output format: MP3
  • Default voice: alloy

Usage: deepdive_tts.py

# Use edge-tts (zero cost)
DEEPDIVE_TTS_PROVIDER=edge-tts python bin/deepdive_tts.py --text "Good morning."

# Specify a different Edge voice
python bin/deepdive_tts.py --provider edge-tts --voice en-US-JennyNeural --text "Hello world."

# Read from a file
python bin/deepdive_tts.py --provider edge-tts --input-file /tmp/briefing.txt --output /tmp/briefing

# Use OpenAI
OPENAI_API_KEY=sk-... python bin/deepdive_tts.py --provider openai --voice nova --text "Hello."

# Use ElevenLabs
ELEVENLABS_API_KEY=... python bin/deepdive_tts.py --provider elevenlabs --voice rachel --text "Hello."

# Use local Piper (offline)
python bin/deepdive_tts.py --provider piper --text "Hello."

Provider and voice can also be set via environment variables:

export DEEPDIVE_TTS_PROVIDER=edge-tts
export DEEPDIVE_TTS_VOICE=en-GB-RyanNeural
python bin/deepdive_tts.py --text "Good evening."

Usage: Night Watch --voice-memo

The --voice-memo flag causes Night Watch to generate an MP3 audio summary of the nightly report immediately after writing the markdown file.

python bin/night_watch.py --voice-memo

Output location: /tmp/bezalel/night-watch-<YYYY-MM-DD>.mp3

The voice memo:

  • Strips markdown formatting (#, |, *, ---) for cleaner speech
  • Uses edge-tts with the en-US-GuyNeural voice
  • Is non-fatal: if TTS fails, the markdown report is still written normally
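The cleanup and output naming above can be sketched like this (a minimal approximation; the exact rules in night_watch.py may differ):

```python
import re
from datetime import date

def strip_markdown(text: str) -> str:
    # Remove heading markers, horizontal rules, table pipes, and emphasis,
    # then collapse leftover runs of spaces for cleaner speech.
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.M)   # headings
    text = re.sub(r"^-{3,}\s*$", "", text, flags=re.M)   # --- rules
    text = text.replace("|", " ").replace("*", "")        # tables, emphasis
    return re.sub(r"[ \t]{2,}", " ", text).strip()

# Date-stamped output path matching the location documented above:
memo_path = f"/tmp/bezalel/night-watch-{date.today():%Y-%m-%d}.mp3"
```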

Example crontab with voice memo:

0 3 * * * cd /path/to/the-nexus && python bin/night_watch.py --voice-memo \
    >> /var/log/bezalel/night-watch.log 2>&1

Fallback Chain

HybridTTS (in intelligence/deepdive/tts_engine.py) attempts providers in this order:

  1. edge-tts — zero cost, no API key
  2. piper — offline local model (if model file present)
  3. elevenlabs — cloud fallback (if ELEVENLABS_API_KEY set)

If prefer_cloud=True is passed, the order becomes: elevenlabs → piper.
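The fallback behavior can be sketched as a simple try-in-order loop; the provider callables below are dummies, and the error-handling shape is an assumption, not the repo's actual HybridTTS implementation:

```python
from typing import Callable

class TTSError(Exception):
    """Raised when a provider (or the whole chain) fails."""

def speak_with_fallback(
    text: str, providers: list[tuple[str, Callable[[str], str]]]
) -> str:
    """Try each provider in order; return the path from the first success."""
    errors = []
    for name, synth in providers:
        try:
            return synth(text)
        except TTSError as exc:
            errors.append(f"{name}: {exc}")
    raise TTSError("all providers failed: " + "; ".join(errors))

# Dummy providers standing in for edge-tts / piper / elevenlabs:
def flaky(text: str) -> str:
    raise TTSError("service unavailable")

def works(text: str) -> str:
    return "/tmp/out.mp3"

path = speak_with_fallback("Hello", [("edge-tts", flaky), ("piper", works)])
```

Here the first provider fails, so the loop silently moves on and path ends up as "/tmp/out.mp3"; only if every provider fails does the caller see an error.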


Phase 3 TODO

Evaluate fish-speech and F5-TTS as fully offline, sovereign alternatives with higher voice quality than Piper. These models run locally with no network dependency whatsoever, providing complete independence from Microsoft's Edge service.

Tracking: to be filed as a follow-up to issue #830.