Commit Graph

1677 Commits

Author SHA1 Message Date
0xbyt4
35748a2fb0 fix: address PR review round 4 — remove web UI, fix audio/import/interface issues
Remove web UI gateway (web.py, tests, docs, toolset, env vars, Platform.WEB
enum) per maintainer request — Nous is building their own official chat UI.

Fix 1: Replace sd.wait() with polling pattern in play_audio_file() to prevent
indefinite hang when audio device stalls (consistent with play_beep()).

Fix 2: Use importlib.util.find_spec() for faster_whisper/openai availability
checks instead of module-level imports that trigger heavy native library
loading (CUDA/cuDNN) at import time.

Fix 3: Remove inspect.signature() hack in _send_voice_reply() — add **kwargs
to Telegram send_voice() so all adapters accept metadata uniformly.

Fix 4: Make session loading resilient to removed platform enum values — skip
entries with unknown platforms instead of crashing the entire gateway.
2026-03-14 14:27:21 +03:00
0xbyt4
1ad5e0ed15 feat: add voice channel awareness — inject participant and speaking state into agent context 2026-03-14 14:27:21 +03:00
0xbyt4
49f3f0fc62 fix: add choices/Choice to discord mock for /voice slash command test 2026-03-14 14:27:21 +03:00
0xbyt4
e3126aeb40 fix: STT consistency — web.py model param, error matching, local provider key
- web.py: pass stt_model from config like discord.py and run.py do
- run.py: match new error messages (No STT provider / not set)
- _transcribe_local: add missing "provider": "local" to return dict
2026-03-14 14:27:21 +03:00
0xbyt4
41162e0aca fix: prevent shutdown deadlock and unblockable Ctrl+C on exit
Move stream close outside the lock in shutdown() to prevent deadlock
when audio callback tries to acquire the same lock. Replace single
t.join(timeout) with a polling loop (0.1s intervals) so KeyboardInterrupt
is not blocked during stream cleanup.
2026-03-14 14:27:21 +03:00
0xbyt4
69cb373864 fix: update /voice status to show correct STT provider
Voice status was hardcoded to check API keys only. Now uses the actual
provider resolution (local/groq/openai) so it correctly shows
"local faster-whisper" when installed instead of "Groq" or "MISSING".
2026-03-14 14:27:21 +03:00
0xbyt4
eb052b1b42 fix: add explicit metadata param to Discord send_voice signature 2026-03-14 14:27:21 +03:00
0xbyt4
b8f8d3ef9e feat: integrate faster-whisper local STT with three-provider fallback
Merge main's faster-whisper (local, free) with our Groq support into a
unified three-provider STT pipeline: local > groq > openai.

Provider priority ensures free options are tried first. Each provider
has its own transcriber function with model auto-correction, env-
overridable endpoints, and proper error handling.

74 tests cover the full provider matrix, fallback chains, model
correction, config loading, validation edge cases, and dispatch.
2026-03-14 14:27:21 +03:00
0xbyt4
c433c89d7d fix: demote RTP debug logs to DEBUG and isolate web sessions
- Change RTP packet logging from INFO to DEBUG level to reduce noise
  (SPEAKING events remain at INFO as they are important lifecycle events)
- Use per-session chat_id (web_{session_id}) instead of shared "web"
  to isolate conversation context between simultaneous web users
2026-03-14 14:27:21 +03:00
0xbyt4
fa2c825e2f fix: isolate WEB_UI_HOST env var in test and handle empty string
- Patch WEB_UI_HOST in test_web_defaults to avoid env leak
- Handle empty WEB_UI_HOST string in config (fall back to 127.0.0.1)
2026-03-14 14:27:21 +03:00
0xbyt4
5b47b87c42 fix: show only reachable URLs in Web UI startup message
When bound to 127.0.0.1, only show localhost URL instead of listing
unreachable network interfaces. Add hint about WEB_UI_HOST=0.0.0.0
for phone/tablet access. Add VPN/multi-interface and token exposure
tests (11 new tests).
2026-03-14 14:27:21 +03:00
0xbyt4
a21f518c0b fix: hide configured token value in Web UI startup log
Only print the access token when auto-generated (user needs it to
log in). When set via WEB_UI_TOKEN env var, just confirm it is set
without exposing the value in console output.
2026-03-14 14:27:21 +03:00
0xbyt4
44abe852fb fix: add macOS Homebrew Opus fallback and fix shutdown dict iteration
- Add Homebrew library path fallback when ctypes.util.find_library fails
  on macOS (Apple Silicon + Intel paths, guarded by platform check)
- Fix RuntimeError in gateway stop() by iterating over dict copy
- Update Opus tests to verify find_library-first + conditional fallback
2026-03-14 14:27:21 +03:00
0xbyt4
c797314fcf test: add security and hardening tests for voice mode fixes
- Path traversal sanitization (Path.name strips ../)
- Media endpoint authentication (401 without token, 404 on traversal)
- hmac.compare_digest usage verification (no == for tokens)
- DOMPurify XSS prevention in HTML template
- Default bind 127.0.0.1 (adapter and config)
- /remote-control token hiding in group chats
- Opus find_library instead of hardcoded paths
- Opus decode error logging (no silent swallow)
- Interrupt _vprint force=True on all 6 calls
- Anthropic interrupt handler in both API call paths
- Update test_web_defaults for new 127.0.0.1 default
2026-03-14 14:27:21 +03:00
0xbyt4
0ff1b4ade2 fix: harden web gateway security and fix error swallowing
- Use hmac.compare_digest for timing-safe token comparison (3 endpoints)
- Default bind to 127.0.0.1 instead of 0.0.0.0
- Sanitize upload filenames with Path.name to prevent path traversal
- Add DOMPurify to sanitize marked.parse() output against XSS
- Replace add_static with authenticated media handler
- Hide token in group chats for /remote-control command
- Use ctypes.util.find_library for Opus instead of hardcoded paths
- Add force=True to 5 interrupt _vprint calls for visibility
- Log Opus decode errors and voice restart failures instead of swallowing
2026-03-14 14:27:21 +03:00
0xbyt4
d646442692 fix: restore Anthropic interrupt handler in _interruptible_api_call
Rebase auto-merge silently overwrote main's Anthropic-aware interrupt
handler with the older OpenAI-only version. Without this fix, interrupting
an Anthropic API call closes the wrong client and leaves token generation
running on the Anthropic side.
2026-03-14 14:27:21 +03:00
0xbyt4
0a8985acf9 fix: add missing load_config import in _show_voice_status 2026-03-14 14:27:21 +03:00
0xbyt4
2c84979d77 refactor: extract get_stt_model_from_config helper to eliminate DRY violation
Duplicated YAML config parsing for stt.model existed in gateway/run.py
and gateway/platforms/discord.py. Moved to a single helper in
transcription_tools.py and added 5 tests covering all edge cases.
2026-03-14 14:27:21 +03:00
0xbyt4
3260413cc7 docs: add STT override env vars to .env.example 2026-03-14 14:27:20 +03:00
0xbyt4
238a431545 fix: make STT config env-overridable and fix doc issues
Code fixes:
- STT model, Groq base URL, and OpenAI STT base URL are now
  configurable via env vars (STT_GROQ_MODEL, STT_OPENAI_MODEL,
  GROQ_BASE_URL, STT_OPENAI_BASE_URL) instead of hardcoded
- Gateway and Discord VC now read stt.model from config.yaml
  (previously only CLI did this — gateway always used defaults)

Doc fixes:
- voice-mode.md: move Web UI troubleshooting to web.md (was duplicated)
- voice-mode.md: simplify "How It Works" for end users (remove NaCl,
  DAVE, RTP internals)
- voice-mode.md: clarify STT priority (OpenAI used first if both keys
  set, Groq recommended for free tier)
- voice-mode.md: document new STT env overrides in config reference
- web.md: remove duplicate Quick Start / Step 1-3 sections
- web.md: add mobile HTTPS mic workarounds (moved from voice-mode.md)
- web.md: clarify STT fallback order
2026-03-14 14:27:20 +03:00
0xbyt4
79ed0effdd docs: fix 3 inaccuracies found during code-vs-docs audit
- voice-mode.md: Discord sends native voice bubbles (OGG/Opus flags=8192),
  not MP3 file attachments. Falls back to file only if voice API fails.
- discord.md: Bot requires @mention by default in server channels
  (DISCORD_REQUIRE_MENTION=true). Previous text incorrectly said no
  mention needed.
- index.md: Fix broken ASCII architecture diagram alignment after
  adding Web adapter box.
2026-03-14 14:27:20 +03:00
0xbyt4
9722bd8be0 fix: 8 voice pipeline bugs with tests proving each fix
1. VoiceReceiver.stop() now acquires _lock before clearing shared state
   to prevent race with _on_packet on the socket reader thread
2. _packet_debug_count moved from class-level to instance-level to avoid
   cross-instance race condition in multi-guild setups
3. play_in_voice_channel uses asyncio.get_running_loop() instead of
   deprecated asyncio.get_event_loop()
4. _send_voice_reply uses uuid for filenames instead of time-based names
   that can collide when two replies happen in the same second
5. Voice timeout now notifies runner via _on_voice_disconnect callback
   so runner cleans up _voice_mode state (prevents orphaned TTS replies)
6. play_in_voice_channel adds PLAYBACK_TIMEOUT (120s) to prevent
   infinite blocking when FFmpeg callback is never called
7. _send_voice_reply moves temp file cleanup to finally block so files
   are always cleaned up even when send_voice/play raises
8. Base adapter auto-TTS wraps play_tts in try/finally with os.remove
   to clean up generated audio files after playback

18 new tests (120 total voice tests)
2026-03-14 14:27:20 +03:00
0xbyt4
c925d2ee76 fix: voice pipeline thread safety and error handling bugs
- Add lock protection around VoiceReceiver buffer writes in _on_packet
  to prevent race condition with check_silence on different threads
- Wire _voice_input_callback BEFORE join_voice_channel to avoid
  losing voice input during the join window
- Add try/except around leave_voice_channel to ensure state cleanup
  (voice_mode, callback) even if leave raises an exception
- Guard against empty text after markdown stripping in base.py auto-TTS
- Add 11 tests proving each bug and verifying the fix
2026-03-14 14:27:20 +03:00
0xbyt4
34c324ff59 fix(test): use real _strip_markdown_for_tts instead of duplicated copy
- Import from tools.tts_tool instead of reimplementing the logic
- Fix test_truncates_long_text: truncation is the caller's job, not the function's
- Remove unused re import
2026-03-14 14:27:20 +03:00
0xbyt4
86ddaaee9c fix: extract voice reply logic and add comprehensive tests
- Fix tempfile.mktemp() TOCTOU race in Discord voice input (use NamedTemporaryFile)
- Extract voice reply decision from _handle_message into _should_send_voice_reply()
- Rewrite TestAutoVoiceReply to call real method instead of testing a copy
- Add 59 new tests: VoiceReceiver, VC commands, adapter methods, streaming TTS
2026-03-14 14:27:20 +03:00
0xbyt4
0d56b79685 docs: add firewall and mobile HTTPS troubleshooting for Web UI
- macOS firewall may block LAN access to Web UI
- Mobile browsers require HTTPS for microphone API
- Document workarounds: Android Chrome flag, mkcert self-signed cert,
  Caddy reverse proxy, SSH tunnel for iOS
2026-03-14 14:27:20 +03:00
0xbyt4
3431f73c96 fix: show mic button on mobile Web UI with HTTPS warning
Mobile browsers require HTTPS for navigator.mediaDevices API.
Instead of hiding the mic button (confusing UX), show it as dimmed
and display an informative message when tapped explaining the HTTPS
requirement.
2026-03-14 14:27:20 +03:00
0xbyt4
fbf47e9ff6 fix: allow voice reply in Discord VC despite skip_double guard
When bot is in a Discord voice channel, both base auto-TTS and Discord
play_tts override skip audio. The skip_double guard was also blocking
the runner's _send_voice_reply, resulting in zero audio output in VC.

Now skip_double is overridden when the bot is actively connected to a
voice channel, allowing play_in_voice_channel to handle TTS.

Add comprehensive test matrix covering all platform x input x mode
combinations with full decision table documentation.
2026-03-14 14:27:20 +03:00
0xbyt4
dcb84a8d30 test: add double TTS prevention tests for voice reply logic
- Update TestAutoVoiceReply to include skip_double logic: voice input
  is handled by base adapter auto-TTS, gateway runner skips to prevent
  duplicate audio
- Add TestDiscordPlayTtsSkip: verifies Discord adapter skips play_tts
  when bot is in a voice channel (VC playback handled by runner)
- Add TestWebPlayTts: verifies Web adapter sends invisible play_audio
  instead of voice bubble
2026-03-14 14:27:20 +03:00
0xbyt4
095815d520 fix: skip gateway voice reply for all platforms on voice input
Base adapter auto-TTS already generates and sends audio for voice
messages in _process_message_background. The gateway runner's
_send_voice_reply was causing double audio on all platforms (not
just Web). Now skip_double applies to any voice input regardless
of platform.
2026-03-14 14:27:20 +03:00
0xbyt4
62e75cd158 fix: skip duplicate TTS file attachment when bot is in Discord voice channel
Override play_tts in DiscordAdapter to no-op when connected to a voice
channel for the same guild. The gateway runner already plays TTS audio
in the VC via play_in_voice_channel, so the base adapter's fallback
to send_voice (file attachment) was causing double audio output.
2026-03-14 14:27:20 +03:00
0xbyt4
815e83952e fix: prevent double TTS on Web UI voice messages
When voice mode is enabled and user sends a voice message on Web UI,
both the base adapter auto-TTS (play_audio) and the gateway voice reply
(send_voice) would fire, causing duplicate audio playback. Skip the
gateway voice reply for Web platform voice input since base adapter
already handles it.
2026-03-14 14:27:20 +03:00
0xbyt4
e21a13488b docs: add Discord DM usage and mention requirement to voice mode guide
- Document DM vs server channel interaction modes
- Explain @mention requirement and how to select bot user vs role
- Add DISCORD_REQUIRE_MENTION and DISCORD_FREE_RESPONSE_CHANNELS config
- Add troubleshooting entry for bot not responding in server channels
2026-03-14 14:27:20 +03:00
0xbyt4
1b10c3711d fix: accept **kwargs in send_voice for Discord and Slack adapters
play_tts base class forwards metadata via **kwargs to send_voice,
but Discord and Slack adapters did not accept extra keyword arguments,
causing TypeError and silent message handling failure.

Also fix test_web_defaults to patch correct env var (WEB_UI_TOKEN).
2026-03-14 14:27:20 +03:00
0xbyt4
f078cb4038 fix(test): isolate WEB_TOKEN env var in test_web_defaults 2026-03-14 14:27:20 +03:00
0xbyt4
6205f061fe test: add comprehensive tests for web gateway adapter
32 tests covering:
- Platform enum and config env overrides
- WebAdapter init, port/host/token parsing, auto-token generation
- aiohttp server lifecycle (connect/disconnect)
- HTML serving on GET /
- WebSocket auth handshake (success/failure)
- WebSocket text message routing to handler
- send/send_voice/play_tts broadcast payloads
- hermes-web toolset registration
- Groq STT fallback in transcription_tools
- LAN IP detection
- Media directory management
2026-03-14 14:27:20 +03:00
0xbyt4
c477f660da feat: add continuous voice mode with VAD silence detection
- Voice mode: press mic once to enter, press again to exit
- VAD (Voice Activity Detection) auto-stops recording after 1.5s silence
- Continuous loop: speak → transcribe → agent responds → TTS plays → auto-listen
- Voice mode UI: input bar hides, large mic button centered
- Auto-restart listening when TTS playback finishes
- Fallback: restart listening on text response if no TTS arrives
2026-03-14 14:27:20 +03:00
0xbyt4
d3e09df01a feat: add voice conversation support and futuristic UI redesign
- Auto-TTS: voice messages get spoken response (audio first, then text)
- STT: Groq Whisper fallback when VOICE_TOOLS_OPENAI_KEY not set
- Futuristic UI: glassmorphism, centered container, purple theme, glow effects
- Voice bubble: custom waveform player with seek and progress
- Invisible TTS playback via play_tts() method (no audio file in chat)
- Add hermes-web toolset with full tool access
- Register Platform.WEB in toolset/config maps
- Update docs for voice conversation feature
2026-03-14 14:27:20 +03:00
0xbyt4
db51cfa60e docs: add Web UI setup guide and update gateway docs
- New web.md with full setup, features, security, and troubleshooting
- Update index.md: architecture diagram, platform table, commands, links
2026-03-14 14:27:20 +03:00
0xbyt4
536be3e0f6 fix: show correct LAN IP when VPN is active
Detect all network interfaces instead of relying on UDP trick which
returns VPN IP. Prefers 192.168.x.x/10.x.x.x over VPN ranges.
Shows all available IPs in console output.
2026-03-14 14:27:20 +03:00
0xbyt4
ddfbc22b7c feat: add /remote-control command to start web UI on demand
Type /remote-control from any platform (Telegram, Discord, etc.) to
instantly start the web UI without restarting the gateway.

- Auto-generates access token if not provided
- Shows URL + token in response
- Optional: /remote-control [port] [token]
- Reports status if already running
- Added to /help command list
2026-03-14 14:27:20 +03:00
0xbyt4
4e3b14dc69 docs: add Web UI config to .env.example 2026-03-14 14:27:20 +03:00
0xbyt4
a3905ef289 feat: add web gateway — browser-based chat UI over WebSocket
New platform adapter that serves a full-featured chat interface via HTTP.
Enables access from any device on the network (phone, tablet, desktop).

Features:
- aiohttp server with WebSocket real-time messaging
- Token-based authentication
- Markdown rendering (marked.js) + code highlighting (highlight.js)
- Voice recording via MediaRecorder API + STT transcription
- Image, voice, and document display
- Typing indicator + message editing (streaming support)
- Mobile responsive dark theme
- Auto-reconnect on disconnect
- Media file cleanup (24h TTL)

Config: WEB_UI_ENABLED=true, WEB_UI_PORT=8765, WEB_UI_TOKEN=<token>
No new dependencies — uses aiohttp already in [messaging] extra.
2026-03-14 14:27:20 +03:00
0xbyt4
e50323f730 fix(test): add missing _voice_mode attr to GatewayRunner test stubs 2026-03-14 14:27:20 +03:00
0xbyt4
75bd5a582b docs: improve voice mode docs with prerequisites, startup commands, and platform links 2026-03-14 14:27:20 +03:00
0xbyt4
2bb2312ea2 docs: add comprehensive voice mode documentation
Cover CLI voice mode, Telegram/Discord auto voice reply, and Discord
voice channel support. Include setup guide with bot permissions, OAuth2
invite URL, privileged intents, system dependencies, and Python packages.
Update discord.md voice messages section with correct STT key reference.
2026-03-14 14:27:20 +03:00
0xbyt4
c0c358d051 feat: add Discord voice channel listening — STT transcription and agent response pipeline
Phase 2 of voice channel support: bot listens to users speaking in VC,
transcribes speech via Groq Whisper, and processes through the agent pipeline.

- Add VoiceReceiver class for RTP packet capture, NaCl/DAVE decryption, Opus decode
- Add silence detection and per-user PCM buffering
- Wire voice input callback from adapter to GatewayRunner
- Fix adapter dict key: use Platform.DISCORD enum instead of string
- Fix guild_id extraction for synthetic voice events via SimpleNamespace raw_message
- Pause/resume receiver during TTS playback to prevent echo
2026-03-14 14:27:20 +03:00
0xbyt4
cc974904f8 feat: Discord voice channel support — bot joins VC and speaks replies
- /voice channel: bot joins user's voice channel, speaks TTS replies
- /voice leave: disconnect from voice channel
- Auto-disconnect after 5 min inactivity
- _get_guild_id() helper extracts guild from raw_message
- Load opus codec for voice playback
- discord.py[voice] in pyproject.toml (pulls PyNaCl + davey)
2026-03-14 14:27:20 +03:00
0xbyt4
cbe4c23efa fix: Discord voice bubble + edge-tts mp3/ogg format mismatch
- Send Discord voice messages with flags=8192 and waveform metadata
  so they render as native voice bubbles instead of file attachments
- Use .mp3 output path for TTS so edge-tts opus conversion works
  correctly (edge always outputs mp3, convert was skipped for .ogg)
- Use actual file_path from TTS result after potential opus conversion
2026-03-14 14:27:20 +03:00
0xbyt4
f6cf4ca826 feat: add /voice slash command to Discord + fix cross-platform send_voice
- Register /voice as Discord slash command with mode choices
- Fix _send_voice_reply to handle adapters that don't accept metadata
  parameter (Discord) by inspecting the method signature at runtime
2026-03-14 14:27:20 +03:00