Wire voice analysis into Hermes STT pipeline #507

Closed
opened 2026-03-25 14:07:31 +00:00 by Timmy · 1 comment
Owner

Depends on: voice analysis pipeline issue (emotion/prosody/environment)

Objective

After the voice analysis script is built, wire it into the Hermes agent's STT flow so every voice message automatically gets rich audio context.

Current flow

  1. User sends voice message (Telegram/WhatsApp)
  2. Hermes downloads audio
  3. Whisper transcribes to text
  4. Text sent to LLM as user message

Target flow

  1. User sends voice message
  2. Hermes downloads audio
  3. Whisper transcribes to text (with word-level timestamps)
  4. ~/.timmy/voice/analyze_audio.py runs on the audio + transcript
  5. Voice context block prepended to transcript
  6. Enriched message sent to LLM
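
Steps 5 and 6 could be sketched roughly as follows. The `[VOICE CONTEXT]` marker comes from the verification notes in this issue; everything else is an assumption for illustration, since the output format of `analyze_audio.py` is not defined here: the function names, the closing `[/VOICE CONTEXT]` marker, and the flat key/value shape of the analysis result are all hypothetical.

```python
from typing import Mapping, Optional

VOICE_CONTEXT_OPEN = "[VOICE CONTEXT]"
VOICE_CONTEXT_CLOSE = "[/VOICE CONTEXT]"  # hypothetical closing marker


def format_voice_context(analysis: Mapping[str, str]) -> str:
    """Render an analysis result as a plain-text context block.

    `analysis` is assumed to be a flat mapping such as
    {"emotion": "calm", "environment": "quiet room"}.
    """
    lines = [VOICE_CONTEXT_OPEN]
    for key, value in analysis.items():
        lines.append(f"{key}: {value}")
    lines.append(VOICE_CONTEXT_CLOSE)
    return "\n".join(lines)


def enrich_transcript(transcript: str,
                      analysis: Optional[Mapping[str, str]]) -> str:
    """Prepend the voice context block to the Whisper transcript.

    Falls back to the bare transcript when no analysis is available,
    so the current flow keeps working if the analysis step fails.
    """
    if not analysis:
        return transcript
    return f"{format_voice_context(analysis)}\n\n{transcript}"
```

Returning the bare transcript when the analysis is missing keeps the target flow a strict superset of the current flow.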

Integration points

  • Check ~/.hermes/hermes-agent/ for the STT processing code
  • Look for where Whisper output is converted to a user message
  • Insert the analysis call there
  • If analysis takes longer than 3 s, send the transcript first and update the context once the analysis finishes
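
One possible shape for the 3-second fallback, as a minimal sketch assuming the agent's STT path is async. The callables `analyze`, `send`, and `update` are hypothetical stand-ins for whatever the Hermes code actually exposes; only the 3-second budget comes from this issue.

```python
import asyncio
from typing import Awaitable, Callable

ANALYSIS_BUDGET_S = 3.0  # budget taken from the issue text


async def handle_voice(
    transcript: str,
    analyze: Callable[[], Awaitable[str]],        # hypothetical: returns the context block
    send: Callable[[str], Awaitable[None]],       # hypothetical: sends a user message to the LLM
    update: Callable[[str], Awaitable[None]],     # hypothetical: attaches context afterwards
    budget: float = ANALYSIS_BUDGET_S,
) -> None:
    # Start the analysis as its own task so a timeout does not cancel it.
    task = asyncio.ensure_future(analyze())
    try:
        # shield() keeps the underlying task running if wait_for times out.
        context = await asyncio.wait_for(asyncio.shield(task), timeout=budget)
    except asyncio.TimeoutError:
        # Too slow: send the bare transcript now, attach the context later.
        await send(transcript)
        context = await task
        await update(context)
        return
    # Fast path: send the enriched message in one go.
    await send(f"{context}\n\n{transcript}")
```

The `asyncio.shield` wrapper is the key design choice here: without it, `asyncio.wait_for` would cancel the analysis on timeout and there would be nothing to update the context with afterwards.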

Verification

  • Send a voice message via Telegram to @TimmysNexus_bot
  • Check the session log — user message should contain [VOICE CONTEXT] block
  • Timmy's response should acknowledge the emotional context naturally
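
The session-log check could be automated along these lines. This is only a sketch: the session log format is not described in this issue, so the code assumes the log has already been parsed into a list of dicts with `role` and `content` keys, which is a hypothetical shape.

```python
from typing import Iterable, List, Mapping

MARKER = "[VOICE CONTEXT]"  # marker named in the verification notes


def user_messages_with_voice_context(
    messages: Iterable[Mapping[str, str]],
) -> List[Mapping[str, str]]:
    """Return the user messages whose content contains the marker.

    Assumes each message is a mapping with "role" and "content" keys;
    adjust to the real session log schema.
    """
    return [
        m for m in messages
        if m.get("role") == "user" and MARKER in m.get("content", "")
    ]
```

An empty result after sending a voice message would indicate that the analysis call was not reached or that its output was dropped before the message was built.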
claude was assigned by Timmy 2026-03-25 14:07:31 +00:00
Member

Closed per direction shift (#542). Reason: Wire custom voice analysis — depends on custom pipeline above.

The Nexus has three jobs: Heartbeat, Harness, Portal Interface. This issue doesn't serve any of them.

perplexity added the deprioritized label 2026-03-25 23:30:11 +00:00

Reference: Timmy_Foundation/the-nexus#507