fix(gateway): discard empty placeholder when voice transcription succeeds

When a Discord voice message arrives, the adapter sets event.text to "(The user sent a message with no text content)" because voice messages carry no text. The transcription enrichment in _enrich_message_with_transcription() then prepends the transcript but leaves the placeholder intact, so the agent receives both:

    [The user sent a voice message~ Here's what they said: "..."]
    (The user sent a message with no text content)

The agent sees this as two separate user turns, one transcribed and one empty, which produces confusing duplicate messages.

Fix: when the transcription succeeds and user_text is only the empty placeholder, return just the transcript without the redundant placeholder.
2026-04-07 13:46:59 +02:00
parent c3158d38b2
commit 25080986a0
1 changed file with 5 additions and 0 deletions
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -6044,6 +6044,11 @@ class GatewayRunner:

        if enriched_parts:
            prefix = "\n\n".join(enriched_parts)
+            # Strip the empty-content placeholder from the Discord adapter
+            # when we successfully transcribed the audio — it's redundant.
+            _placeholder = "(The user sent a message with no text content)"
+            if user_text and user_text.strip() == _placeholder:
+                return prefix
            if user_text:
                return f"{prefix}\n\n{user_text}"
            return prefix
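
The branch order in this hunk can be exercised in isolation. The following is a minimal sketch, not the actual method: `combine_transcript_and_text` and `EMPTY_PLACEHOLDER` are hypothetical names, and the pass-through when nothing was transcribed is an assumption, since that part of `_enrich_message_with_transcription()` is outside the diff.

```python
from typing import List, Optional

# Placeholder string quoted in the commit message and the diff.
EMPTY_PLACEHOLDER = "(The user sent a message with no text content)"


def combine_transcript_and_text(
    enriched_parts: List[str], user_text: Optional[str]
) -> Optional[str]:
    """Hypothetical stand-in for the tail of the enrichment method."""
    if enriched_parts:
        prefix = "\n\n".join(enriched_parts)
        # New branch: a successful transcription makes the placeholder redundant.
        if user_text and user_text.strip() == EMPTY_PLACEHOLDER:
            return prefix
        # Real user text is still appended after the transcript.
        if user_text:
            return f"{prefix}\n\n{user_text}"
        return prefix
    # Assumption: with no transcript, the text passes through unchanged.
    return user_text


voice = '[The user sent a voice message~ Here\'s what they said: "hello"]'
print(combine_transcript_and_text([voice], EMPTY_PLACEHOLDER))
```

The placeholder check has to come before the generic `if user_text:` branch, because the placeholder is itself a non-empty string and would otherwise be appended. The comparison is exact after `strip()`, so a message that merely contains the placeholder alongside other text is still passed through in full.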