Restore local STT command fallback for voice transcription, detect whisper and ffmpeg in common local install paths, and avoid bogus no-provider messaging when only a backend-specific key is missing.
* docs: comprehensive fallback providers documentation
- New dedicated page: user-guide/features/fallback-providers.md covering
both primary model fallback and auxiliary task fallback systems
- Updated configuration.md with fallback_model config section
- Updated environment-variables.md noting fallback is config-only
- Fleshed out developer-guide/provider-runtime.md fallback section with
internal architecture details (trigger points, activation flow, config flow)
- Added cross-reference from provider-routing.md distinguishing OpenRouter
sub-provider routing from Hermes-level model fallback
- Added new page to sidebar under Integrations
* docs: comprehensive /background command documentation
- Added Background Sessions section to cli.md covering how it works
(daemon threads, isolated sessions, config inheritance, Rich panel
output, bell notification, concurrent tasks)
- Added Background Sessions section to messaging/index.md covering
messaging-specific behavior (async execution, result delivery back
to same chat, fire-and-forget pattern)
- Documented background_process_notifications config
(all/result/error/off) in messaging docs and configuration.md
- Added HERMES_BACKGROUND_NOTIFICATIONS env var to reference page
- Fixed inconsistency in slash-commands.md: /background was listed as
messaging-only but works in both CLI and messaging. Moved it to the
'both surfaces' note.
- Expanded one-liner table descriptions with detail and cross-references
Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.
Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.
Allow cron runs to keep using send_message for additional destinations, but
skip same-target sends when the scheduler will already auto-deliver the final
response there. Add prompt/tool guidance, docs, and regression coverage for
origin/home-channel resolution and thread-aware comparisons.
Add a hands-on guide for using voice mode with Hermes, fix and expand the main voice-mode docs, surface /voice in messaging docs, and improve discoverability from the homepage and learning path.
Document the expanded skills hub functionality, including:
- skills.sh source usage
- well-known endpoint discovery
- check/update commands
- real install/inspect examples
- accurate --force semantics and trust policy behavior
Also verified the docs site with a successful Docusaurus production build.
Document the new global-only SOUL behavior, add a dedicated use guide, update personality/context/config docs, and fix docs language that still described cwd-local SOUL loading.
Expand the MCP feature docs with filtering and capability-aware registration details, add a practical 'Use MCP with Hermes' tutorial, add a config reference page, and wire the new docs into the sidebar and landing page.
Remove web UI gateway (web.py, tests, docs, toolset, env vars, Platform.WEB
enum) per maintainer request — Nous is building their own official chat UI.
Fix 1: Replace sd.wait() with polling pattern in play_audio_file() to prevent
indefinite hang when audio device stalls (consistent with play_beep()).
Fix 2: Use importlib.util.find_spec() for faster_whisper/openai availability
checks instead of module-level imports that trigger heavy native library
loading (CUDA/cuDNN) at import time.
Fix 3: Remove inspect.signature() hack in _send_voice_reply() — add **kwargs
to Telegram send_voice() so all adapters accept metadata uniformly.
Fix 4: Make session loading resilient to removed platform enum values — skip
entries with unknown platforms instead of crashing the entire gateway.
Merge main's faster-whisper (local, free) with our Groq support into a
unified three-provider STT pipeline: local > groq > openai.
Provider priority ensures free options are tried first. Each provider
has its own transcriber function with model auto-correction, env-
overridable endpoints, and proper error handling.
74 tests cover the full provider matrix, fallback chains, model
correction, config loading, validation edge cases, and dispatch.
- Use hmac.compare_digest for timing-safe token comparison (3 endpoints)
- Default bind to 127.0.0.1 instead of 0.0.0.0
- Sanitize upload filenames with Path.name to prevent path traversal
- Add DOMPurify to sanitize marked.parse() output against XSS
- Replace add_static with authenticated media handler
- Hide token in group chats for /remote-control command
- Use ctypes.util.find_library for Opus instead of hardcoded paths
- Add force=True to 5 interrupt _vprint calls for visibility
- Log Opus decode errors and voice restart failures instead of swallowing
Code fixes:
- STT model, Groq base URL, and OpenAI STT base URL are now
configurable via env vars (STT_GROQ_MODEL, STT_OPENAI_MODEL,
GROQ_BASE_URL, STT_OPENAI_BASE_URL) instead of hardcoded
- Gateway and Discord VC now read stt.model from config.yaml
(previously only CLI did this — gateway always used defaults)
Doc fixes:
- voice-mode.md: move Web UI troubleshooting to web.md (was duplicated)
- voice-mode.md: simplify "How It Works" for end users (remove NaCl,
DAVE, RTP internals)
- voice-mode.md: clarify STT priority (OpenAI used first if both keys
set, Groq recommended for free tier)
- voice-mode.md: document new STT env overrides in config reference
- web.md: remove duplicate Quick Start / Step 1-3 sections
- web.md: add mobile HTTPS mic workarounds (moved from voice-mode.md)
- web.md: clarify STT fallback order
- voice-mode.md: Discord sends native voice bubbles (OGG/Opus flags=8192),
not MP3 file attachments. Falls back to file only if voice API fails.
- discord.md: Bot requires @mention by default in server channels
(DISCORD_REQUIRE_MENTION=true). Previous text incorrectly said no
mention needed.
- index.md: Fix broken ASCII architecture diagram alignment after
adding Web adapter box.
- macOS firewall may block LAN access to Web UI
- Mobile browsers require HTTPS for microphone API
- Document workarounds: Android Chrome flag, mkcert self-signed cert,
Caddy reverse proxy, SSH tunnel for iOS
- Document DM vs server channel interaction modes
- Explain @mention requirement and how to select bot user vs role
- Add DISCORD_REQUIRE_MENTION and DISCORD_FREE_RESPONSE_CHANNELS config
- Add troubleshooting entry for bot not responding in server channels
- Auto-TTS: voice messages get spoken response (audio first, then text)
- STT: Groq Whisper fallback when VOICE_TOOLS_OPENAI_KEY not set
- Futuristic UI: glassmorphism, centered container, purple theme, glow effects
- Voice bubble: custom waveform player with seek and progress
- Invisible TTS playback via play_tts() method (no audio file in chat)
- Add hermes-web toolset with full tool access
- Register Platform.WEB in toolset/config maps
- Update docs for voice conversation feature