Timmy_Foundation/the-nexus


Alexander Whitestone ef74536e33

feat: add edge-tts as zero-cost voice output provider

- Add EdgeTTSAdapter to bin/deepdive_tts.py (provider key: "edge-tts")
  default voice: en-US-GuyNeural, no API key required
- Add EdgeTTS class to intelligence/deepdive/tts_engine.py
- Update HybridTTS to try edge-tts as fallback between piper and elevenlabs
- Add --voice-memo flag to bin/night_watch.py for spoken nightly reports
- Add edge-tts>=6.1.9 to requirements.txt
- Create docs/voice-output.md documenting all providers and fallback chain
- Add tests/test_edge_tts.py with 17 unit tests (all mocked, no network)

Fixes #1126

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-08 06:29:26 -04:00


Voice Output System

Overview

The Nexus voice output system converts text reports and briefings into spoken audio. It supports multiple TTS providers with automatic fallback so that audio generation degrades gracefully when a provider is unavailable.

Primary use cases:

Deep Dive daily briefings (bin/deepdive_tts.py)
Night Watch nightly reports (bin/night_watch.py --voice-memo)

Available Providers

edge-tts (recommended default)

Cost: Zero — no API key, no account required
Package: pip install "edge-tts>=6.1.9" (quote the requirement so the shell does not treat >= as a redirect)
Default voice: en-US-GuyNeural
Output format: MP3
How it works: Streams audio from Microsoft Edge's neural TTS service over HTTPS. No local model download required.
Available locales: 100+ languages and locales. Full list: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
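
As a rough illustration of the streaming approach described above, the sketch below uses the `edge_tts` package's `Communicate(...).save(...)` coroutine to write an MP3 file. The function name `synthesize_to_mp3` and the `communicate_factory` parameter are hypothetical and are not taken from `bin/deepdive_tts.py`; the factory exists only so the call can be exercised without network access.

```python
import asyncio
from pathlib import Path


def synthesize_to_mp3(text, out_path, voice="en-US-GuyNeural", communicate_factory=None):
    """Write `text` as spoken MP3 audio to `out_path` and return the path.

    `communicate_factory` is a hypothetical injection point for tests; when it
    is None, the real edge_tts.Communicate class is imported and used.
    """
    if communicate_factory is None:
        import edge_tts  # imported lazily so this module loads without the package
        communicate_factory = edge_tts.Communicate

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    async def _run():
        communicate = communicate_factory(text, voice)
        await communicate.save(str(out_path))  # streams audio over HTTPS, writes MP3

    asyncio.run(_run())
    return out_path
```

Calling `synthesize_to_mp3("Good morning.", "/tmp/hello.mp3")` with the default factory needs network access, because the audio is generated remotely.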

Notable English voices:

| Voice ID | Style |
| --- | --- |
| `en-US-GuyNeural` | Neutral male (default) |
| `en-US-JennyNeural` | Warm female |
| `en-US-AriaNeural` | Expressive female |
| `en-GB-RyanNeural` | British male |

piper

Cost: Free, fully offline
Package: pip install piper-tts + model download (~65 MB)
Model location: ~/.local/share/piper/en_US-lessac-medium.onnx
Output format: WAV → MP3 (requires lame)
Sovereignty: Fully local; no network calls after model download
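
The WAV → MP3 step can be pictured as two subprocess calls: Piper renders a WAV file from text on stdin, then `lame` encodes it. This is a minimal sketch that assumes the `piper` and `lame` executables are on `PATH`; the helper names are hypothetical and the repository's own invocation may differ.

```python
import subprocess
from pathlib import Path

# Model path as documented above.
DEFAULT_MODEL = Path.home() / ".local/share/piper/en_US-lessac-medium.onnx"


def build_piper_commands(out_base, model=DEFAULT_MODEL):
    """Return (piper_cmd, lame_cmd) for rendering text to <out_base>.mp3."""
    wav = f"{out_base}.wav"
    mp3 = f"{out_base}.mp3"
    piper_cmd = ["piper", "--model", str(model), "--output_file", wav]
    lame_cmd = ["lame", "--quiet", wav, mp3]
    return piper_cmd, lame_cmd


def piper_to_mp3(text, out_base, model=DEFAULT_MODEL):
    """Run Piper (text on stdin), then lame; raises CalledProcessError on failure."""
    piper_cmd, lame_cmd = build_piper_commands(out_base, model)
    subprocess.run(piper_cmd, input=text.encode("utf-8"), check=True)
    subprocess.run(lame_cmd, check=True)
    return Path(f"{out_base}.mp3")
```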

elevenlabs

Cost: Usage-based (paid)
Requirement: ELEVENLABS_API_KEY environment variable
Output format: MP3
Quality: Highest of the three providers in the fallback chain (edge-tts, piper, elevenlabs)

openai

Cost: Usage-based (paid)
Requirement: OPENAI_API_KEY environment variable
Output format: MP3
Default voice: alloy

Usage: deepdive_tts.py

# Use edge-tts (zero cost)
DEEPDIVE_TTS_PROVIDER=edge-tts python bin/deepdive_tts.py --text "Good morning."

# Specify a different Edge voice
python bin/deepdive_tts.py --provider edge-tts --voice en-US-JennyNeural --text "Hello world."

# Read from a file
python bin/deepdive_tts.py --provider edge-tts --input-file /tmp/briefing.txt --output /tmp/briefing

# Use OpenAI
OPENAI_API_KEY=sk-... python bin/deepdive_tts.py --provider openai --voice nova --text "Hello."

# Use ElevenLabs
ELEVENLABS_API_KEY=... python bin/deepdive_tts.py --provider elevenlabs --voice rachel --text "Hello."

# Use local Piper (offline)
python bin/deepdive_tts.py --provider piper --text "Hello."

Provider and voice can also be set via environment variables:

export DEEPDIVE_TTS_PROVIDER=edge-tts
export DEEPDIVE_TTS_VOICE=en-GB-RyanNeural
python bin/deepdive_tts.py --text "Good evening."
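
One way a script could combine the flags and environment variables shown above is to use the environment values as `argparse` defaults. The sketch below is an assumption about the design, not a copy of `bin/deepdive_tts.py`: the function name `parse_tts_args`, the `env` parameter, the precedence (flag over environment variable), and the final `edge-tts` fallback are all hypothetical.

```python
import argparse
import os


def parse_tts_args(argv=None, env=None):
    """Resolve provider and voice: command-line flag first, then environment."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Resolve TTS provider and voice")
    parser.add_argument(
        "--provider",
        # The "edge-tts" fallback is an assumption, not confirmed by the script.
        default=env.get("DEEPDIVE_TTS_PROVIDER", "edge-tts"),
    )
    parser.add_argument("--voice", default=env.get("DEEPDIVE_TTS_VOICE"))
    parser.add_argument("--text")
    return parser.parse_args(argv)
```

Because the environment is read only when building defaults, an explicit `--provider` or `--voice` on the command line always wins in this sketch.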

Usage: Night Watch --voice-memo

The --voice-memo flag causes Night Watch to generate an MP3 audio summary of the nightly report immediately after writing the markdown file.

python bin/night_watch.py --voice-memo

Output location: /tmp/bezalel/night-watch-<YYYY-MM-DD>.mp3

The voice memo:

Strips markdown formatting (#, |, *, ---) for cleaner speech
Uses edge-tts with the en-US-GuyNeural voice
Is non-fatal: if TTS fails, the markdown report is still written normally
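
The stripping and non-fatal behaviour listed above could look roughly like the following. This is a sketch under stated assumptions: `strip_markdown` and `write_voice_memo` are hypothetical names, the regular expressions are one possible reading of "strips #, |, *, ---", and `synthesize` stands in for whatever callable produces the audio.

```python
import re


def strip_markdown(text):
    """Remove the markdown characters listed above so they are not read aloud."""
    lines = []
    for line in text.splitlines():
        if re.fullmatch(r"[\s|:-]*-{3}[\s|:-]*", line):
            continue  # drop horizontal rules and table separator rows
        line = re.sub(r"^\s*#+\s*", "", line)     # heading markers
        line = line.replace("*", "")               # emphasis and bullet asterisks
        line = line.replace("|", " ")              # table cell borders
        line = re.sub(r"\s+", " ", line).strip()   # collapse leftover whitespace
        if line:
            lines.append(line)
    return "\n".join(lines)


def write_voice_memo(report_markdown, synthesize):
    """Return True if audio was produced; never raise, so the report run continues."""
    try:
        synthesize(strip_markdown(report_markdown))
        return True
    except Exception as exc:  # deliberate catch-all: TTS must stay non-fatal
        print(f"voice memo skipped: {exc}")
        return False
```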

Example crontab with voice memo:

0 3 * * * cd /path/to/the-nexus && python bin/night_watch.py --voice-memo \
    >> /var/log/bezalel/night-watch.log 2>&1

Fallback Chain

HybridTTS (used by tts_engine.py) attempts providers in this order:

1. edge-tts: zero cost, no API key
2. piper: offline local model (if model file present)
3. elevenlabs: cloud fallback (if ELEVENLABS_API_KEY set)

If prefer_cloud=True is passed, the order becomes: elevenlabs → piper.
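
The general pattern behind an ordered fallback chain can be sketched as a loop that returns the first successful result. This is not the `HybridTTS` implementation, which is not shown here; `synthesize_with_fallback`, `AllProvidersFailed`, and the `(name, callable)` pair format are hypothetical names chosen for illustration.

```python
class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has been tried without success."""


def synthesize_with_fallback(text, providers):
    """Try each (name, callable) pair in order; return (name, result) on first success."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Reordering the `providers` list is all it takes to express a different preference, such as a cloud-first order.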

Phase 3 TODO

Evaluate fish-speech and F5-TTS as fully offline, sovereign alternatives with higher voice quality than Piper. These models run locally with no network dependency whatsoever, providing complete independence from Microsoft's Edge service.

Tracking: to be filed as a follow-up to issue #830.
