the-nexus/docs/voice-output.md

# Voice Output System

## Overview

The Nexus voice output system converts text reports and briefings into spoken audio.
It supports multiple TTS providers with automatic fallback so that audio generation
degrades gracefully when a provider is unavailable.

Primary use cases:
- **Deep Dive** daily briefings (`bin/deepdive_tts.py`)
- **Night Watch** nightly reports (`bin/night_watch.py --voice-memo`)

---

## Available Providers

### edge-tts (recommended default)

- **Cost:** Zero — no API key, no account required
- **Package:** `pip install edge-tts>=6.1.9`
- **Default voice:** `en-US-GuyNeural`
- **Output format:** MP3
- **How it works:** Streams audio from Microsoft Edge's neural TTS service over HTTPS.
  No local model download required.
- **Available locales:** 100+ languages and locales. Full list:
  https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support

Notable English voices:
| Voice ID | Style |
|---|---|
| `en-US-GuyNeural` | Neutral male (default) |
| `en-US-JennyNeural` | Warm female |
| `en-US-AriaNeural` | Expressive female |
| `en-GB-RyanNeural` | British male |

### piper

- **Cost:** Free, fully offline
- **Package:** `pip install piper-tts` + model download (~65 MB)
- **Model location:** `~/.local/share/piper/en_US-lessac-medium.onnx`
- **Output format:** WAV → MP3 (requires `lame`)
- **Sovereignty:** Fully local; no network calls after model download

### elevenlabs

- **Cost:** Usage-based (paid)
- **Requirement:** `ELEVENLABS_API_KEY` environment variable
- **Output format:** MP3
- **Quality:** Highest quality of the three providers

### openai

- **Cost:** Usage-based (paid)
- **Requirement:** `OPENAI_API_KEY` environment variable
- **Output format:** MP3
- **Default voice:** `alloy`

---

## Usage: deepdive_tts.py

```bash
# Use edge-tts (zero cost)
DEEPDIVE_TTS_PROVIDER=edge-tts python bin/deepdive_tts.py --text "Good morning."

# Specify a different Edge voice
python bin/deepdive_tts.py --provider edge-tts --voice en-US-JennyNeural --text "Hello world."

# Read from a file
python bin/deepdive_tts.py --provider edge-tts --input-file /tmp/briefing.txt --output /tmp/briefing

# Use OpenAI
OPENAI_API_KEY=sk-... python bin/deepdive_tts.py --provider openai --voice nova --text "Hello."

# Use ElevenLabs
ELEVENLABS_API_KEY=... python bin/deepdive_tts.py --provider elevenlabs --voice rachel --text "Hello."

# Use local Piper (offline)
python bin/deepdive_tts.py --provider piper --text "Hello."
```

Provider and voice can also be set via environment variables:

```bash
export DEEPDIVE_TTS_PROVIDER=edge-tts
export DEEPDIVE_TTS_VOICE=en-GB-RyanNeural
python bin/deepdive_tts.py --text "Good evening."
```

---

## Usage: Night Watch --voice-memo

The `--voice-memo` flag causes Night Watch to generate an MP3 audio summary of the
nightly report immediately after writing the markdown file.

```bash
python bin/night_watch.py --voice-memo
```

Output location: `/tmp/bezalel/night-watch-<YYYY-MM-DD>.mp3`

The voice memo:
- Strips markdown formatting (`#`, `|`, `*`, `---`) for cleaner speech
- Uses `edge-tts` with the `en-US-GuyNeural` voice
- Is non-fatal: if TTS fails, the markdown report is still written normally

Example crontab with voice memo:

```cron
0 3 * * * cd /path/to/the-nexus && python bin/night_watch.py --voice-memo \
    >> /var/log/bezalel/night-watch.log 2>&1
```

---

## Fallback Chain

`HybridTTS` (used by `tts_engine.py`) attempts providers in this order:

1. **edge-tts** — zero cost, no API key
2. **piper** — offline local model (if model file present)
3. **elevenlabs** — cloud fallback (if `ELEVENLABS_API_KEY` set)

If `prefer_cloud=True` is passed, the order becomes: elevenlabs → piper.

---

## Phase 3 TODO

Evaluate **fish-speech** and **F5-TTS** as fully offline, sovereign alternatives
with higher voice quality than Piper. These models run locally with no network
dependency whatsoever, providing complete independence from Microsoft's Edge service.

Tracking: to be filed as a follow-up to issue #830.