[VOICE] Add edge-tts as Zero-Cost Voice Output Provider #1126

Open
opened 2026-04-07 21:17:10 +00:00 by Timmy · 1 comment
Owner

Objective

Integrate edge-tts as a lightweight, API-key-free voice output provider for Hermes alerts and voice memos.

Background

Hermes already has a text_to_speech tool. Adding edge-tts gives us a zero-cost fallback that works on Linux without Microsoft Edge or Windows. It is useful for Night Watch alerts, spoken summaries, and accessibility.

Acceptance Criteria

Phase 1 — Integration (2 days)

  • edge-tts is installed in the Hermes Python environment
  • New TTS provider edge-tts added to the existing text_to_speech tool
  • Voice can be selected from available locales (default: en-US-GuyNeural)
  • Output format supports .mp3 and .ogg for Telegram compatibility
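
A minimal sketch of the voice-selection and output-format criteria above, not the actual Hermes code. It assumes the voice list has the shape returned by edge-tts's `list_voices()` coroutine (dicts with `ShortName` and `Locale` keys). Because edge-tts writes MP3, it also assumes `.ogg` output would come from a separate re-encode step, shown here as an `ffmpeg` command with the Opus codec. The helper names `pick_voice` and `ogg_convert_command` are hypothetical.

```python
from pathlib import Path

DEFAULT_VOICE = "en-US-GuyNeural"  # default named in the acceptance criteria


def pick_voice(voices: list[dict], locale: str, default: str = DEFAULT_VOICE) -> str:
    """Return the ShortName of the first voice matching `locale`, else `default`.

    `voices` is assumed to look like the result of `await edge_tts.list_voices()`.
    """
    for voice in voices:
        if voice.get("Locale") == locale:
            return voice["ShortName"]
    return default


def ogg_convert_command(mp3_path: Path) -> list[str]:
    """Build (but do not run) an ffmpeg command that re-encodes MP3 to OGG/Opus.

    Assumption: ffmpeg with libopus is available on the Hermes host.
    """
    ogg_path = mp3_path.with_suffix(".ogg")
    return ["ffmpeg", "-y", "-i", str(mp3_path), "-c:a", "libopus", str(ogg_path)]
```

The command list can be passed to `subprocess.run(..., check=True)` once the MP3 exists; keeping the builder separate from execution makes it easy to unit-test without ffmpeg installed.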

Phase 2 — Trigger Points (3 days)

  • Night Watch can optionally deliver its summary as a voice memo via edge-tts
  • Critical alerts (e.g., CI failure, runner down) can be sent as voice messages to Timmy Time
  • A /voice-summary or similar command generates a spoken daily brief
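
One way the optional voice delivery could be wired into a command-line entry point, sketched with `argparse`. The `--voice-memo` flag name matches the one reported in the follow-up comment on this issue; the `--voice` option and the `delivery_plan` helper are hypothetical and only illustrate the opt-in behaviour.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="night_watch")
    # Voice delivery is opt-in, so the text report stays the default.
    parser.add_argument("--voice-memo", action="store_true",
                        help="also deliver the summary as a spoken memo")
    # Hypothetical option: lets the caller override the default voice.
    parser.add_argument("--voice", default="en-US-GuyNeural",
                        help="edge-tts voice short name")
    return parser


def delivery_plan(args: argparse.Namespace) -> list[str]:
    """Describe which outputs a run would produce, without producing them."""
    plan = ["text"]
    if args.voice_memo:
        plan.append(f"voice:{args.voice}")
    return plan
```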

Phase 3 — Fallback & Sovereignty (1 week)

  • If edge-tts fails (network unreachable), Hermes falls back to another configured TTS provider
  • Evaluate local TTS alternatives (fish-speech, F5-TTS) for full offline sovereignty; create follow-up issue if viable
  • Document in the-nexus/docs/voice-output.md
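
The fallback requirement can be sketched as an ordered provider chain: try each provider in turn and move on when one raises. This is an illustration under assumptions, not the Hermes implementation. The `Provider` callable signature (text in, audio bytes out), the `AllProvidersFailed` exception, and `speak_with_fallback` are all hypothetical names.

```python
from typing import Callable, Sequence

# Hypothetical provider shape: takes text, returns encoded audio bytes.
Provider = Callable[[str], bytes]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def speak_with_fallback(
    text: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, bytes]:
    """Return (provider_name, audio) from the first provider that succeeds."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # e.g. network unreachable for edge-tts
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Collecting the per-provider errors keeps the final failure message useful for debugging, since a silent fallback would otherwise hide that edge-tts was unreachable.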

Suggested Implementation Path

  1. pip install edge-tts
  2. Add provider logic in tools/text_to_speech.py
  3. Add voice delivery path in Telegram/notification pipeline
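
Step 2 might look roughly like the sketch below. `edge_tts.Communicate(text, voice)` and its `save()` coroutine are the library's documented entry points; everything else (the `EdgeTTSProvider` class, the `PROVIDERS` registry, `get_provider`) is hypothetical, since the real interface of `tools/text_to_speech.py` is not shown in this issue.

```python
import asyncio
from pathlib import Path


class EdgeTTSProvider:
    """Hypothetical provider wrapper; the real tool's interface may differ."""

    name = "edge-tts"

    def __init__(self, voice: str = "en-US-GuyNeural") -> None:
        self.voice = voice

    async def synthesize_async(self, text: str, out_path: Path) -> Path:
        # Imported lazily so the tool still loads when edge-tts is not installed.
        import edge_tts

        await edge_tts.Communicate(text, self.voice).save(str(out_path))
        return out_path

    def synthesize(self, text: str, out_path: Path) -> Path:
        # Synchronous convenience wrapper; needs network access at call time.
        return asyncio.run(self.synthesize_async(text, out_path))


# Hypothetical registry keyed by provider name.
PROVIDERS = {EdgeTTSProvider.name: EdgeTTSProvider}


def get_provider(key: str, **options):
    try:
        return PROVIDERS[key](**options)
    except KeyError:
        raise ValueError(f"unknown TTS provider: {key!r}") from None
```

A call such as `get_provider("edge-tts").synthesize("Runner down", Path("alert.mp3"))` would then write an MP3, assuming the host can reach the online service that edge-tts depends on.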

Owner

Bezalel

Linked Epic

#1120

claude self-assigned this 2026-04-08 10:24:58 +00:00
Member

PR created: #1130 (https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/pulls/1130)

Implemented edge-tts as a zero-cost voice output provider:

  • EdgeTTSAdapter added to bin/deepdive_tts.py (provider key: edge-tts, default voice: en-US-GuyNeural, no API key)
  • EdgeTTS class added to intelligence/deepdive/tts_engine.py; HybridTTS now uses edge-tts as fallback between Piper and ElevenLabs
  • --voice-memo flag on bin/night_watch.py generates a spoken MP3 of the nightly report
  • requirements.txt now requires edge-tts>=6.1.9 (a minimum version)
  • docs/voice-output.md documents all providers, fallback chain, and Phase 3 fish-speech/F5-TTS evaluation TODO
  • 17 unit tests added (tests/test_edge_tts.py), all mocked, zero network calls — all passing
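
For readers wondering how tests can exercise edge-tts code with zero network calls, here is one possible pattern. It is a sketch and not taken from `tests/test_edge_tts.py`. It swaps a fake `edge_tts` module into `sys.modules` so the lazy import inside the function under test resolves to a mock. The `synthesize` and `run_with_fake_edge_tts` functions are hypothetical.

```python
import asyncio
import sys
import types
from unittest import mock


async def synthesize(text: str, out_path: str, voice: str = "en-US-GuyNeural") -> str:
    # Resolved at call time, so a test double in sys.modules can stand in.
    import edge_tts

    await edge_tts.Communicate(text, voice).save(out_path)
    return out_path


def run_with_fake_edge_tts(text: str, out_path: str):
    """Run `synthesize` against a fake edge_tts module; no network is touched."""
    fake = types.ModuleType("edge_tts")
    communicate = mock.MagicMock()
    communicate.save = mock.AsyncMock()  # awaited in place of the real save()
    fake.Communicate = mock.MagicMock(return_value=communicate)
    with mock.patch.dict(sys.modules, {"edge_tts": fake}):
        result = asyncio.run(synthesize(text, out_path))
    return result, fake.Communicate, communicate.save
```

The returned mocks can then be checked with `assert_called_once_with` and `assert_awaited_once_with` to confirm the text, voice, and output path were passed through unchanged.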
bezalel was assigned by Timmy 2026-04-08 19:16:35 +00:00
Reference: Timmy_Foundation/the-nexus#1126