hermes-agent

Author	SHA1	Message	Date
0xbyt4	35748a2fb0	fix: address PR review round 4 — remove web UI, fix audio/import/interface issues Remove web UI gateway (web.py, tests, docs, toolset, env vars, Platform.WEB enum) per maintainer request — Nous is building their own official chat UI. Fix 1: Replace sd.wait() with polling pattern in play_audio_file() to prevent indefinite hang when audio device stalls (consistent with play_beep()). Fix 2: Use importlib.util.find_spec() for faster_whisper/openai availability checks instead of module-level imports that trigger heavy native library loading (CUDA/cuDNN) at import time. Fix 3: Remove inspect.signature() hack in _send_voice_reply() — add **kwargs to Telegram send_voice() so all adapters accept metadata uniformly. Fix 4: Make session loading resilient to removed platform enum values — skip entries with unknown platforms instead of crashing the entire gateway.	2026-03-14 14:27:21 +03:00
0xbyt4	1ad5e0ed15	feat: add voice channel awareness — inject participant and speaking state into agent context	2026-03-14 14:27:21 +03:00
0xbyt4	49f3f0fc62	fix: add choices/Choice to discord mock for /voice slash command test	2026-03-14 14:27:21 +03:00
0xbyt4	e3126aeb40	fix: STT consistency — web.py model param, error matching, local provider key - web.py: pass stt_model from config like discord.py and run.py do - run.py: match new error messages (No STT provider / not set) - _transcribe_local: add missing "provider": "local" to return dict	2026-03-14 14:27:21 +03:00
0xbyt4	41162e0aca	fix: prevent shutdown deadlock and unblockable Ctrl+C on exit Move stream close outside the lock in shutdown() to prevent deadlock when audio callback tries to acquire the same lock. Replace single t.join(timeout) with a polling loop (0.1s intervals) so KeyboardInterrupt is not blocked during stream cleanup.	2026-03-14 14:27:21 +03:00
0xbyt4	69cb373864	fix: update /voice status to show correct STT provider Voice status was hardcoded to check API keys only. Now uses the actual provider resolution (local/groq/openai) so it correctly shows "local faster-whisper" when installed instead of "Groq" or "MISSING".	2026-03-14 14:27:21 +03:00
0xbyt4	eb052b1b42	fix: add explicit metadata param to Discord send_voice signature	2026-03-14 14:27:21 +03:00
0xbyt4	b8f8d3ef9e	feat: integrate faster-whisper local STT with three-provider fallback Merge main's faster-whisper (local, free) with our Groq support into a unified three-provider STT pipeline: local > groq > openai. Provider priority ensures free options are tried first. Each provider has its own transcriber function with model auto-correction, env-overridable endpoints, and proper error handling. 74 tests cover the full provider matrix, fallback chains, model correction, config loading, validation edge cases, and dispatch.	2026-03-14 14:27:21 +03:00
0xbyt4	c433c89d7d	fix: demote RTP debug logs to DEBUG and isolate web sessions - Change RTP packet logging from INFO to DEBUG level to reduce noise (SPEAKING events remain at INFO as they are important lifecycle events) - Use per-session chat_id (web_{session_id}) instead of shared "web" to isolate conversation context between simultaneous web users	2026-03-14 14:27:21 +03:00
0xbyt4	fa2c825e2f	fix: isolate WEB_UI_HOST env var in test and handle empty string - Patch WEB_UI_HOST in test_web_defaults to avoid env leak - Handle empty WEB_UI_HOST string in config (fall back to 127.0.0.1)	2026-03-14 14:27:21 +03:00
0xbyt4	5b47b87c42	fix: show only reachable URLs in Web UI startup message When bound to 127.0.0.1, only show localhost URL instead of listing unreachable network interfaces. Add hint about WEB_UI_HOST=0.0.0.0 for phone/tablet access. Add VPN/multi-interface and token exposure tests (11 new tests).	2026-03-14 14:27:21 +03:00
0xbyt4	a21f518c0b	fix: hide configured token value in Web UI startup log Only print the access token when auto-generated (user needs it to log in). When set via WEB_UI_TOKEN env var, just confirm it is set without exposing the value in console output.	2026-03-14 14:27:21 +03:00
0xbyt4	44abe852fb	fix: add macOS Homebrew Opus fallback and fix shutdown dict iteration - Add Homebrew library path fallback when ctypes.util.find_library fails on macOS (Apple Silicon + Intel paths, guarded by platform check) - Fix RuntimeError in gateway stop() by iterating over dict copy - Update Opus tests to verify find_library-first + conditional fallback	2026-03-14 14:27:21 +03:00
0xbyt4	c797314fcf	test: add security and hardening tests for voice mode fixes - Path traversal sanitization (Path.name strips ../) - Media endpoint authentication (401 without token, 404 on traversal) - hmac.compare_digest usage verification (no == for tokens) - DOMPurify XSS prevention in HTML template - Default bind 127.0.0.1 (adapter and config) - /remote-control token hiding in group chats - Opus find_library instead of hardcoded paths - Opus decode error logging (no silent swallow) - Interrupt _vprint force=True on all 6 calls - Anthropic interrupt handler in both API call paths - Update test_web_defaults for new 127.0.0.1 default	2026-03-14 14:27:21 +03:00
0xbyt4	0ff1b4ade2	fix: harden web gateway security and fix error swallowing - Use hmac.compare_digest for timing-safe token comparison (3 endpoints) - Default bind to 127.0.0.1 instead of 0.0.0.0 - Sanitize upload filenames with Path.name to prevent path traversal - Add DOMPurify to sanitize marked.parse() output against XSS - Replace add_static with authenticated media handler - Hide token in group chats for /remote-control command - Use ctypes.util.find_library for Opus instead of hardcoded paths - Add force=True to 5 interrupt _vprint calls for visibility - Log Opus decode errors and voice restart failures instead of swallowing	2026-03-14 14:27:21 +03:00
0xbyt4	d646442692	fix: restore Anthropic interrupt handler in _interruptible_api_call Rebase auto-merge silently overwrote main's Anthropic-aware interrupt handler with the older OpenAI-only version. Without this fix, interrupting an Anthropic API call closes the wrong client and leaves token generation running on the Anthropic side.	2026-03-14 14:27:21 +03:00
0xbyt4	0a8985acf9	fix: add missing load_config import in _show_voice_status	2026-03-14 14:27:21 +03:00
0xbyt4	2c84979d77	refactor: extract get_stt_model_from_config helper to eliminate DRY violation Duplicated YAML config parsing for stt.model existed in gateway/run.py and gateway/platforms/discord.py. Moved to a single helper in transcription_tools.py and added 5 tests covering all edge cases.	2026-03-14 14:27:21 +03:00
0xbyt4	3260413cc7	docs: add STT override env vars to .env.example	2026-03-14 14:27:20 +03:00
0xbyt4	238a431545	fix: make STT config env-overridable and fix doc issues Code fixes: - STT model, Groq base URL, and OpenAI STT base URL are now configurable via env vars (STT_GROQ_MODEL, STT_OPENAI_MODEL, GROQ_BASE_URL, STT_OPENAI_BASE_URL) instead of hardcoded - Gateway and Discord VC now read stt.model from config.yaml (previously only CLI did this — gateway always used defaults) Doc fixes: - voice-mode.md: move Web UI troubleshooting to web.md (was duplicated) - voice-mode.md: simplify "How It Works" for end users (remove NaCl, DAVE, RTP internals) - voice-mode.md: clarify STT priority (OpenAI used first if both keys set, Groq recommended for free tier) - voice-mode.md: document new STT env overrides in config reference - web.md: remove duplicate Quick Start / Step 1-3 sections - web.md: add mobile HTTPS mic workarounds (moved from voice-mode.md) - web.md: clarify STT fallback order	2026-03-14 14:27:20 +03:00
0xbyt4	79ed0effdd	docs: fix 3 inaccuracies found during code-vs-docs audit - voice-mode.md: Discord sends native voice bubbles (OGG/Opus flags=8192), not MP3 file attachments. Falls back to file only if voice API fails. - discord.md: Bot requires @mention by default in server channels (DISCORD_REQUIRE_MENTION=true). Previous text incorrectly said no mention needed. - index.md: Fix broken ASCII architecture diagram alignment after adding Web adapter box.	2026-03-14 14:27:20 +03:00
0xbyt4	9722bd8be0	fix: 8 voice pipeline bugs with tests proving each fix 1. VoiceReceiver.stop() now acquires _lock before clearing shared state to prevent race with _on_packet on the socket reader thread 2. _packet_debug_count moved from class-level to instance-level to avoid cross-instance race condition in multi-guild setups 3. play_in_voice_channel uses asyncio.get_running_loop() instead of deprecated asyncio.get_event_loop() 4. _send_voice_reply uses uuid for filenames instead of time-based names that can collide when two replies happen in the same second 5. Voice timeout now notifies runner via _on_voice_disconnect callback so runner cleans up _voice_mode state (prevents orphaned TTS replies) 6. play_in_voice_channel adds PLAYBACK_TIMEOUT (120s) to prevent infinite blocking when FFmpeg callback is never called 7. _send_voice_reply moves temp file cleanup to finally block so files are always cleaned up even when send_voice/play raises 8. Base adapter auto-TTS wraps play_tts in try/finally with os.remove to clean up generated audio files after playback 18 new tests (120 total voice tests)	2026-03-14 14:27:20 +03:00
0xbyt4	c925d2ee76	fix: voice pipeline thread safety and error handling bugs - Add lock protection around VoiceReceiver buffer writes in _on_packet to prevent race condition with check_silence on different threads - Wire _voice_input_callback BEFORE join_voice_channel to avoid losing voice input during the join window - Add try/except around leave_voice_channel to ensure state cleanup (voice_mode, callback) even if leave raises an exception - Guard against empty text after markdown stripping in base.py auto-TTS - Add 11 tests proving each bug and verifying the fix	2026-03-14 14:27:20 +03:00
0xbyt4	34c324ff59	fix(test): use real _strip_markdown_for_tts instead of duplicated copy - Import from tools.tts_tool instead of reimplementing the logic - Fix test_truncates_long_text: truncation is the caller's job, not the function's - Remove unused re import	2026-03-14 14:27:20 +03:00
0xbyt4	86ddaaee9c	fix: extract voice reply logic and add comprehensive tests - Fix tempfile.mktemp() TOCTOU race in Discord voice input (use NamedTemporaryFile) - Extract voice reply decision from _handle_message into _should_send_voice_reply() - Rewrite TestAutoVoiceReply to call real method instead of testing a copy - Add 59 new tests: VoiceReceiver, VC commands, adapter methods, streaming TTS	2026-03-14 14:27:20 +03:00
0xbyt4	0d56b79685	docs: add firewall and mobile HTTPS troubleshooting for Web UI - macOS firewall may block LAN access to Web UI - Mobile browsers require HTTPS for microphone API - Document workarounds: Android Chrome flag, mkcert self-signed cert, Caddy reverse proxy, SSH tunnel for iOS	2026-03-14 14:27:20 +03:00
0xbyt4	3431f73c96	fix: show mic button on mobile Web UI with HTTPS warning Mobile browsers require HTTPS for navigator.mediaDevices API. Instead of hiding the mic button (confusing UX), show it as dimmed and display an informative message when tapped explaining the HTTPS requirement.	2026-03-14 14:27:20 +03:00
0xbyt4	fbf47e9ff6	fix: allow voice reply in Discord VC despite skip_double guard When bot is in a Discord voice channel, both base auto-TTS and Discord play_tts override skip audio. The skip_double guard was also blocking the runner's _send_voice_reply, resulting in zero audio output in VC. Now skip_double is overridden when the bot is actively connected to a voice channel, allowing play_in_voice_channel to handle TTS. Add comprehensive test matrix covering all platform x input x mode combinations with full decision table documentation.	2026-03-14 14:27:20 +03:00
0xbyt4	dcb84a8d30	test: add double TTS prevention tests for voice reply logic - Update TestAutoVoiceReply to include skip_double logic: voice input is handled by base adapter auto-TTS, gateway runner skips to prevent duplicate audio - Add TestDiscordPlayTtsSkip: verifies Discord adapter skips play_tts when bot is in a voice channel (VC playback handled by runner) - Add TestWebPlayTts: verifies Web adapter sends invisible play_audio instead of voice bubble	2026-03-14 14:27:20 +03:00
0xbyt4	095815d520	fix: skip gateway voice reply for all platforms on voice input Base adapter auto-TTS already generates and sends audio for voice messages in _process_message_background. The gateway runner's _send_voice_reply was causing double audio on all platforms (not just Web). Now skip_double applies to any voice input regardless of platform.	2026-03-14 14:27:20 +03:00
0xbyt4	62e75cd158	fix: skip duplicate TTS file attachment when bot is in Discord voice channel Override play_tts in DiscordAdapter to no-op when connected to a voice channel for the same guild. The gateway runner already plays TTS audio in the VC via play_in_voice_channel, so the base adapter's fallback to send_voice (file attachment) was causing double audio output.	2026-03-14 14:27:20 +03:00
0xbyt4	815e83952e	fix: prevent double TTS on Web UI voice messages When voice mode is enabled and user sends a voice message on Web UI, both the base adapter auto-TTS (play_audio) and the gateway voice reply (send_voice) would fire, causing duplicate audio playback. Skip the gateway voice reply for Web platform voice input since base adapter already handles it.	2026-03-14 14:27:20 +03:00
0xbyt4	e21a13488b	docs: add Discord DM usage and mention requirement to voice mode guide - Document DM vs server channel interaction modes - Explain @mention requirement and how to select bot user vs role - Add DISCORD_REQUIRE_MENTION and DISCORD_FREE_RESPONSE_CHANNELS config - Add troubleshooting entry for bot not responding in server channels	2026-03-14 14:27:20 +03:00
0xbyt4	1b10c3711d	fix: accept kwargs in send_voice for Discord and Slack adapters play_tts base class forwards metadata via kwargs to send_voice, but Discord and Slack adapters did not accept extra keyword arguments, causing TypeError and silent message handling failure. Also fix test_web_defaults to patch correct env var (WEB_UI_TOKEN).	2026-03-14 14:27:20 +03:00
0xbyt4	f078cb4038	fix(test): isolate WEB_TOKEN env var in test_web_defaults	2026-03-14 14:27:20 +03:00
0xbyt4	6205f061fe	test: add comprehensive tests for web gateway adapter 32 tests covering: - Platform enum and config env overrides - WebAdapter init, port/host/token parsing, auto-token generation - aiohttp server lifecycle (connect/disconnect) - HTML serving on GET / - WebSocket auth handshake (success/failure) - WebSocket text message routing to handler - send/send_voice/play_tts broadcast payloads - hermes-web toolset registration - Groq STT fallback in transcription_tools - LAN IP detection - Media directory management	2026-03-14 14:27:20 +03:00
0xbyt4	c477f660da	feat: add continuous voice mode with VAD silence detection - Voice mode: press mic once to enter, press again to exit - VAD (Voice Activity Detection) auto-stops recording after 1.5s silence - Continuous loop: speak → transcribe → agent responds → TTS plays → auto-listen - Voice mode UI: input bar hides, large mic button centered - Auto-restart listening when TTS playback finishes - Fallback: restart listening on text response if no TTS arrives	2026-03-14 14:27:20 +03:00
0xbyt4	d3e09df01a	feat: add voice conversation support and futuristic UI redesign - Auto-TTS: voice messages get spoken response (audio first, then text) - STT: Groq Whisper fallback when VOICE_TOOLS_OPENAI_KEY not set - Futuristic UI: glassmorphism, centered container, purple theme, glow effects - Voice bubble: custom waveform player with seek and progress - Invisible TTS playback via play_tts() method (no audio file in chat) - Add hermes-web toolset with full tool access - Register Platform.WEB in toolset/config maps - Update docs for voice conversation feature	2026-03-14 14:27:20 +03:00
0xbyt4	db51cfa60e	docs: add Web UI setup guide and update gateway docs - New web.md with full setup, features, security, and troubleshooting - Update index.md: architecture diagram, platform table, commands, links	2026-03-14 14:27:20 +03:00
0xbyt4	536be3e0f6	fix: show correct LAN IP when VPN is active Detect all network interfaces instead of relying on UDP trick which returns VPN IP. Prefers 192.168.x.x/10.x.x.x over VPN ranges. Shows all available IPs in console output.	2026-03-14 14:27:20 +03:00
0xbyt4	ddfbc22b7c	feat: add /remote-control command to start web UI on demand Type /remote-control from any platform (Telegram, Discord, etc.) to instantly start the web UI without restarting the gateway. - Auto-generates access token if not provided - Shows URL + token in response - Optional: /remote-control [port] [token] - Reports status if already running - Added to /help command list	2026-03-14 14:27:20 +03:00
0xbyt4	4e3b14dc69	docs: add Web UI config to .env.example	2026-03-14 14:27:20 +03:00
0xbyt4	a3905ef289	feat: add web gateway — browser-based chat UI over WebSocket New platform adapter that serves a full-featured chat interface via HTTP. Enables access from any device on the network (phone, tablet, desktop). Features: - aiohttp server with WebSocket real-time messaging - Token-based authentication - Markdown rendering (marked.js) + code highlighting (highlight.js) - Voice recording via MediaRecorder API + STT transcription - Image, voice, and document display - Typing indicator + message editing (streaming support) - Mobile responsive dark theme - Auto-reconnect on disconnect - Media file cleanup (24h TTL) Config: WEB_UI_ENABLED=true, WEB_UI_PORT=8765, WEB_UI_TOKEN=<token> No new dependencies — uses aiohttp already in [messaging] extra.	2026-03-14 14:27:20 +03:00
0xbyt4	e50323f730	fix(test): add missing _voice_mode attr to GatewayRunner test stubs	2026-03-14 14:27:20 +03:00
0xbyt4	75bd5a582b	docs: improve voice mode docs with prerequisites, startup commands, and platform links	2026-03-14 14:27:20 +03:00
0xbyt4	2bb2312ea2	docs: add comprehensive voice mode documentation Cover CLI voice mode, Telegram/Discord auto voice reply, and Discord voice channel support. Include setup guide with bot permissions, OAuth2 invite URL, privileged intents, system dependencies, and Python packages. Update discord.md voice messages section with correct STT key reference.	2026-03-14 14:27:20 +03:00
0xbyt4	c0c358d051	feat: add Discord voice channel listening — STT transcription and agent response pipeline Phase 2 of voice channel support: bot listens to users speaking in VC, transcribes speech via Groq Whisper, and processes through the agent pipeline. - Add VoiceReceiver class for RTP packet capture, NaCl/DAVE decryption, Opus decode - Add silence detection and per-user PCM buffering - Wire voice input callback from adapter to GatewayRunner - Fix adapter dict key: use Platform.DISCORD enum instead of string - Fix guild_id extraction for synthetic voice events via SimpleNamespace raw_message - Pause/resume receiver during TTS playback to prevent echo	2026-03-14 14:27:20 +03:00
0xbyt4	cc974904f8	feat: Discord voice channel support — bot joins VC and speaks replies - /voice channel: bot joins user's voice channel, speaks TTS replies - /voice leave: disconnect from voice channel - Auto-disconnect after 5 min inactivity - _get_guild_id() helper extracts guild from raw_message - Load opus codec for voice playback - discord.py[voice] in pyproject.toml (pulls PyNaCl + davey)	2026-03-14 14:27:20 +03:00
0xbyt4	cbe4c23efa	fix: Discord voice bubble + edge-tts mp3/ogg format mismatch - Send Discord voice messages with flags=8192 and waveform metadata so they render as native voice bubbles instead of file attachments - Use .mp3 output path for TTS so edge-tts opus conversion works correctly (edge always outputs mp3, convert was skipped for .ogg) - Use actual file_path from TTS result after potential opus conversion	2026-03-14 14:27:20 +03:00
0xbyt4	f6cf4ca826	feat: add /voice slash command to Discord + fix cross-platform send_voice - Register /voice as Discord slash command with mode choices - Fix _send_voice_reply to handle adapters that don't accept metadata parameter (Discord) by inspecting the method signature at runtime	2026-03-14 14:27:20 +03:00
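
Two of the hardening steps named in commit 0ff1b4ade2 (timing-safe token comparison with hmac.compare_digest, and upload filename sanitization with Path.name) can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's code: the function names token_matches and safe_upload_name are hypothetical, and the rejection of empty or dot-only names is an extra guard added for the sketch rather than something the commit message describes.

```python
import hmac
from pathlib import Path


def token_matches(supplied: str, expected: str) -> bool:
    """Compare two tokens in constant time instead of with ==.

    Hypothetical helper. Encoding to bytes lets compare_digest accept
    tokens that contain non-ASCII characters.
    """
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


def safe_upload_name(raw_name: str) -> str:
    """Keep only the final path component of a client-supplied filename.

    Hypothetical helper. Path.name drops any leading directory parts such
    as "../", so the result cannot point outside the upload directory.
    The explicit check below is an assumption of this sketch: it rejects
    names that are empty or still refer to a directory.
    """
    name = Path(raw_name).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable upload filename: {raw_name!r}")
    return name


if __name__ == "__main__":
    print(token_matches("s3cret", "s3cret"))
    print(safe_upload_name("../../etc/passwd"))
```

A plain == on strings can return as soon as the first differing character is found, which can leak how much of a token was guessed correctly; compare_digest is designed to avoid that. Path.name only strips directory components, so a caller would still join the result onto a fixed upload directory rather than trusting any client-supplied path.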

1677 Commits