
# Voice Output System

## Overview

The Nexus voice output system converts text reports and briefings into spoken audio.
It supports multiple TTS providers with automatic fallback so that audio generation
degrades gracefully when a provider is unavailable.

Primary use cases:
- **Deep Dive** daily briefings (`bin/deepdive_tts.py`)
- **Night Watch** nightly reports (`bin/night_watch.py --voice-memo`)

---

## Available Providers

### edge-tts (recommended default)

- **Cost:** Zero — no API key, no account required
- **Package:** `pip install "edge-tts>=6.1.9"` (quote the requirement so the shell does not treat `>` as a redirect)
- **Default voice:** `en-US-GuyNeural`
- **Output format:** MP3
- **How it works:** Streams audio from Microsoft Edge's neural TTS service over HTTPS.
  No local model download required.
- **Available locales:** 100+ languages and locales. Full list:
  https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support

Notable English voices:

| Voice ID | Style |
|---|---|
| `en-US-GuyNeural` | Neutral male (default) |
| `en-US-JennyNeural` | Warm female |
| `en-US-AriaNeural` | Expressive female |
| `en-GB-RyanNeural` | British male |
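
To illustrate the streaming behaviour described above, here is a minimal sketch that calls the `edge-tts` package directly. The function name `synthesize_to_mp3` is an assumption made for this example and is not taken from `bin/deepdive_tts.py`; the actual adapter may be structured differently.

```python
async def synthesize_to_mp3(text: str, out_path: str,
                            voice: str = "en-US-GuyNeural") -> str:
    """Stream speech for `text` from the Edge TTS service and save it as MP3."""
    if not text.strip():
        raise ValueError("text must not be empty")

    # Imported lazily so the argument check above works without the package installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)  # writes MP3 audio to out_path
    return out_path
```

A call such as `asyncio.run(synthesize_to_mp3("Good morning.", "/tmp/example.mp3"))` would need network access, and the output path here is only a placeholder.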

### piper

- **Cost:** Free, fully offline
- **Package:** `pip install piper-tts` + model download (~65 MB)
- **Model location:** `~/.local/share/piper/en_US-lessac-medium.onnx`
- **Output format:** WAV → MP3 (requires `lame`)
- **Sovereignty:** Fully local; no network calls after model download
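
The WAV → MP3 step can be pictured as two subprocess calls: `piper` writes a WAV file from text on stdin, then `lame` encodes it. The sketch below is an assumption about how such a pipeline could look, not the code in this repository; the helper names `build_piper_commands` and `synthesize_offline` are hypothetical.

```python
import os
import subprocess

# Model path as documented above; "~" is expanded when the command is built.
DEFAULT_MODEL = "~/.local/share/piper/en_US-lessac-medium.onnx"


def build_piper_commands(out_stem: str, model: str = DEFAULT_MODEL):
    """Return the (piper, lame) argument lists for a WAV -> MP3 conversion."""
    wav, mp3 = f"{out_stem}.wav", f"{out_stem}.mp3"
    piper_cmd = ["piper", "--model", os.path.expanduser(model), "--output_file", wav]
    lame_cmd = ["lame", "--quiet", wav, mp3]
    return piper_cmd, lame_cmd


def synthesize_offline(text: str, out_stem: str, model: str = DEFAULT_MODEL) -> str:
    """Run piper and lame in sequence; raises CalledProcessError if either fails."""
    piper_cmd, lame_cmd = build_piper_commands(out_stem, model)
    subprocess.run(piper_cmd, input=text.encode("utf-8"), check=True)  # text via stdin
    subprocess.run(lame_cmd, check=True)
    return f"{out_stem}.mp3"
```

Keeping command construction separate from execution makes it possible to inspect or log the exact commands without having `piper` or `lame` installed.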

### elevenlabs

- **Cost:** Usage-based (paid)
- **Requirement:** `ELEVENLABS_API_KEY` environment variable
- **Output format:** MP3
- **Quality:** Highest voice quality among the supported providers

### openai

- **Cost:** Usage-based (paid)
- **Requirement:** `OPENAI_API_KEY` environment variable
- **Output format:** MP3
- **Default voice:** `alloy`

---

## Usage: deepdive_tts.py

```bash
# Use edge-tts (zero cost)
DEEPDIVE_TTS_PROVIDER=edge-tts python bin/deepdive_tts.py --text "Good morning."

# Specify a different Edge voice
python bin/deepdive_tts.py --provider edge-tts --voice en-US-JennyNeural --text "Hello world."

# Read from a file
python bin/deepdive_tts.py --provider edge-tts --input-file /tmp/briefing.txt --output /tmp/briefing

# Use OpenAI
OPENAI_API_KEY=sk-... python bin/deepdive_tts.py --provider openai --voice nova --text "Hello."

# Use ElevenLabs
ELEVENLABS_API_KEY=... python bin/deepdive_tts.py --provider elevenlabs --voice rachel --text "Hello."

# Use local Piper (offline)
python bin/deepdive_tts.py --provider piper --text "Hello."
```

Provider and voice can also be set via environment variables:

```bash
export DEEPDIVE_TTS_PROVIDER=edge-tts
export DEEPDIVE_TTS_VOICE=en-GB-RyanNeural
python bin/deepdive_tts.py --text "Good evening."
```
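
How the flags and environment variables interact is not spelled out here. The sketch below assumes the common precedence of flag over environment variable over built-in default, and assumes `edge-tts` is the default provider; `resolve_tts_settings` is a hypothetical helper, not a function from `bin/deepdive_tts.py`.

```python
import os

# Defaults for edge-tts and openai come from the provider list above;
# other providers are left unset rather than guessed.
DEFAULT_VOICES = {"edge-tts": "en-US-GuyNeural", "openai": "alloy"}


def resolve_tts_settings(cli_provider=None, cli_voice=None, env=None):
    """Pick provider and voice, assuming flag > environment variable > default."""
    env = os.environ if env is None else env
    provider = cli_provider or env.get("DEEPDIVE_TTS_PROVIDER") or "edge-tts"
    voice = cli_voice or env.get("DEEPDIVE_TTS_VOICE") or DEFAULT_VOICES.get(provider)
    return provider, voice
```

Under these assumptions, `resolve_tts_settings(env={"DEEPDIVE_TTS_PROVIDER": "openai"})` yields `("openai", "alloy")`, and a `--voice` value always wins over `DEEPDIVE_TTS_VOICE`.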

---

## Usage: Night Watch --voice-memo

The `--voice-memo` flag causes Night Watch to generate an MP3 audio summary of the
nightly report immediately after writing the markdown file.

```bash
python bin/night_watch.py --voice-memo
```

Output location: `/tmp/bezalel/night-watch-<YYYY-MM-DD>.mp3`

The voice memo:
- Strips markdown formatting (`#`, `|`, `*`, `---`) for cleaner speech
- Uses `edge-tts` with the `en-US-GuyNeural` voice
- Is non-fatal: if TTS fails, the markdown report is still written normally
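
The stripping and non-fatal behaviour listed above could be sketched as follows. Both `strip_markdown` and `make_voice_memo` are illustrative names, and the regular expressions are one plausible way to remove the listed markers; the actual logic in `bin/night_watch.py` may differ.

```python
import re


def strip_markdown(text: str) -> str:
    """Remove the markdown markers listed above so the text reads well aloud."""
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # Drop horizontal rules and table delimiter rows such as |---|---|.
        if re.fullmatch(r"[-|:\s]{3,}", stripped) and "-" in stripped:
            continue
        line = re.sub(r"^\s*#+\s*", "", line)  # heading markers
        line = line.replace("*", "")            # emphasis and bullet asterisks
        line = re.sub(r"\s*\|\s*", " ", line)   # table cell separators
        lines.append(line.strip())
    return "\n".join(lines)


def make_voice_memo(report_md: str, synthesize) -> bool:
    """Run `synthesize` on the cleaned text; report failure instead of raising."""
    try:
        synthesize(strip_markdown(report_md))
        return True
    except Exception as exc:  # non-fatal: the markdown report is already on disk
        print(f"voice memo skipped: {exc}")
        return False
```

Here `synthesize` stands in for whatever callable performs the TTS request, so a network failure only produces a log line and a `False` return value.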

Example crontab with voice memo:

```cron
# cron does not support backslash line continuation; keep the entry on one line
0 3 * * * cd /path/to/the-nexus && python bin/night_watch.py --voice-memo >> /var/log/bezalel/night-watch.log 2>&1
```

---

## Fallback Chain

`HybridTTS` (used by `tts_engine.py`) attempts providers in this order:

1. **edge-tts** — zero cost, no API key
2. **piper** — offline local model (if model file present)
3. **elevenlabs** — cloud fallback (if `ELEVENLABS_API_KEY` set)

If `prefer_cloud=True` is passed, the order becomes: elevenlabs → piper.
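
The ordering above can be sketched as a simple loop over candidate providers. This is an illustration of the fallback idea only: the interface of `HybridTTS` is not shown in this document, so `provider_order`, `synthesize_with_fallback`, and the `providers` mapping are assumptions.

```python
def provider_order(prefer_cloud: bool = False):
    """Return provider names in the order described above."""
    if prefer_cloud:
        return ["elevenlabs", "piper"]
    return ["edge-tts", "piper", "elevenlabs"]


def synthesize_with_fallback(text, providers, prefer_cloud=False):
    """Try each provider in order; return (name, result) of the first success.

    `providers` maps a provider name to a callable that takes the text and
    either returns a result or raises. Names missing from the mapping are
    skipped, which models "model file absent" or "API key not set".
    """
    errors = {}
    for name in provider_order(prefer_cloud):
        synth = providers.get(name)
        if synth is None:
            continue
        try:
            return name, synth(text)
        except Exception as exc:  # remember the failure and move on
            errors[name] = exc
    raise RuntimeError(f"all TTS providers failed or were unavailable: {errors}")
```

With a failing `edge-tts` callable and a working `piper` callable, the function returns the Piper result; if nothing is available it raises, leaving the caller to decide whether that is fatal.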

---

## Phase 3 TODO

Evaluate **fish-speech** and **F5-TTS** as fully offline, sovereign alternatives
with higher voice quality than Piper. These models run locally with no network
dependency whatsoever, providing complete independence from Microsoft's Edge service.

Tracking: to be filed as a follow-up to issue #830.

---

feat: add edge-tts as zero-cost voice output provider

- Add EdgeTTSAdapter to bin/deepdive_tts.py (provider key: "edge-tts"); default voice: en-US-GuyNeural, no API key required
- Add EdgeTTS class to intelligence/deepdive/tts_engine.py
- Update HybridTTS to try edge-tts as fallback between piper and elevenlabs
- Add --voice-memo flag to bin/night_watch.py for spoken nightly reports
- Add edge-tts>=6.1.9 to requirements.txt
- Create docs/voice-output.md documenting all providers and fallback chain
- Add tests/test_edge_tts.py with 17 unit tests (all mocked, no network)

Fixes #1126

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-08 06:29:26 -04:00
