Add messaging platform enhancements: STT, stickers, Discord UX, Slack, pairing, hooks
Major feature additions inspired by OpenClaw/ClawdBot integration analysis:
Voice Message Transcription (STT):
- Auto-transcribe voice/audio messages via OpenAI Whisper API
- Download voice to ~/.hermes/audio_cache/ on Telegram/Discord/WhatsApp
- Inject transcript as text so all models can understand voice input
- Configurable model (whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe)
Telegram Sticker Understanding:
- Describe static stickers via vision tool with JSON-backed cache
- Cache keyed by file_unique_id avoids redundant API calls
- Animated/video stickers get emoji-based fallback description
Discord Rich UX:
- Native slash commands (/ask, /reset, /status, /stop) via app_commands
- Button-based exec approvals (Allow Once / Always Allow / Deny)
- ExecApprovalView with user authorization and timeout handling
Slack Integration:
- Full SlackAdapter using slack-bolt with Socket Mode
- DMs, channel messages (mention-gated), /hermes slash command
- File attachment handling with bot-token-authenticated downloads
DM Pairing System:
- Code-based user authorization as alternative to static allowlists
- 8-char codes from unambiguous alphabet, 1-hour expiry
- Rate limiting, lockout after failed attempts, chmod 0600 on data
- CLI: hermes pairing list/approve/revoke/clear-pending
Event Hook System:
- File-based hook discovery from ~/.hermes/hooks/
- HOOK.yaml + handler.py per hook, sync/async handler support
- Events: gateway:startup, session:start/reset, agent:start/step/end
- Wildcard matching (command:* catches all command events)
Cross-Channel Messaging:
- send_message agent tool for delivering to any connected platform
- Enables cron job delivery and cross-platform notifications
Human-Like Response Pacing:
- Configurable delays between message chunks (off/natural/custom)
- HERMES_HUMAN_DELAY_MODE env var with min/max ms settings
Warm Injection Message Style:
- Retrofitted image vision messages with friendly kawaii-consistent tone
- All new injection messages (STT, stickers, errors) use warm style
Also: updated config migration to prompt for optional keys interactively,
bumped config version, updated README, AGENTS.md, .env.example,
cli-config.yaml.example, install scripts, pyproject.toml, and toolsets.
2026-02-15 21:38:59 -08:00
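The hook contract described above (HOOK.yaml plus handler.py per hook, wildcard event matching) can be sketched roughly as follows. The directory name, the `handle()` signature, and the `matches()` helper are assumptions for illustration, not the actual interface:

```python
# Hypothetical ~/.hermes/hooks/log_commands/handler.py (illustrative only).
# A HOOK.yaml alongside it might declare: events: ["command:*"]

import fnmatch

def matches(pattern: str, event: str) -> bool:
    """Wildcard matching: a 'command:*' pattern catches all command events."""
    return fnmatch.fnmatch(event, pattern)

def handle(event: str, payload: dict) -> None:
    """Sync handler; an async def variant would be awaited by the dispatcher."""
    print(f"[hook] {event}: {payload}")
```

A dispatcher would walk `~/.hermes/hooks/`, read each HOOK.yaml, and call `handle()` for every event whose name matches a declared pattern.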
#!/usr/bin/env python3
"""
Transcription Tools Module

Provides speech-to-text transcription with three providers:
- **local** (default, free) — faster-whisper running locally, no API key needed.
  Auto-downloads the model (~150 MB for ``base``) on first use.
- **groq** (free tier) — Groq Whisper API, requires ``GROQ_API_KEY``.
- **openai** (paid) — OpenAI Whisper API, requires ``VOICE_TOOLS_OPENAI_KEY``.
Used by the messaging gateway to automatically transcribe voice messages
sent by users on Telegram, Discord, WhatsApp, Slack, and Signal.
Supported input formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, aac
feat(stt): add free local whisper transcription via faster-whisper (#1185)
* fix: Home Assistant event filtering now closed by default
Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.
Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)
A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.
All 49 gateway HA tests + 52 HA tool tests pass.
* docs: update Home Assistant integration documentation
- homeassistant.md: Fix event filtering docs to reflect closed-by-default
behavior. Add watch_all option. Replace Python dict config example with
YAML. Fix defaults table (was incorrectly showing 'all'). Add required
configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
diagram, platform toolsets table, and Next Steps links.
* fix(terminal): strip provider env vars from background and PTY subprocesses
Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:
- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)
Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.
Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.
Gap identified by PR #1004 (@PeterFile).
* feat(delegate): add observability metadata to subagent results
Enrich delegate_task results with metadata from the child AIAgent:
- model: which model the child used
- exit_reason: completed | interrupted | max_iterations
- tokens.input / tokens.output: token counts
- tool_trace: per-tool-call trace with byte sizes and ok/error status
Tool trace uses tool_call_id matching to correctly pair parallel tool
calls with their results, with a fallback for messages without IDs.
Cherry-picked from PR #872 by @omerkaz, with fixes:
- Fixed parallel tool call trace pairing (was always updating last entry)
- Removed redundant 'iterations' field (identical to existing 'api_calls')
- Added test for parallel tool call trace correctness
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
* feat(stt): add free local whisper transcription via faster-whisper
Replace OpenAI-only STT with a dual-provider system mirroring the TTS
architecture (Edge TTS free / ElevenLabs paid):
STT: faster-whisper local (free, default) / OpenAI Whisper API (paid)
Changes:
- tools/transcription_tools.py: Full rewrite with provider dispatch,
config loading, local faster-whisper backend, and OpenAI API backend.
Auto-downloads model (~150MB for 'base') on first voice message.
Singleton model instance reused across calls.
- pyproject.toml: Add faster-whisper>=1.0.0 as core dependency
- hermes_cli/config.py: Expand stt config to match TTS pattern with
provider selection and per-provider model settings
- agent/context_compressor.py: Fix .strip() crash when LLM returns
non-string content (dict from llama.cpp, None). Fixes #1100 partially.
- tests/: 23 new tests for STT providers + 2 for compressor fix
- docs/: Updated Voice & TTS page with STT provider table, model sizes,
config examples, and fallback behavior
Fallback behavior:
- Local not installed → OpenAI API (if key set)
- OpenAI key not set → local whisper (if installed)
- Neither → graceful error message to user
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
---------
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
2026-03-13 11:11:05 -07:00
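The fallback behavior listed in this commit message can be sketched as a small dispatcher. The function name and the boolean flags are illustrative, not the module's real API:

```python
def pick_stt_provider(has_local: bool, has_openai_key: bool,
                      preferred: str = "local") -> dict:
    """Resolve an STT backend per the commit's fallback rules:
    prefer the configured provider, fall back to the other if it is
    unavailable, and report a graceful error when neither works."""
    if preferred == "local":
        if has_local:
            return {"success": True, "provider": "local"}
        if has_openai_key:
            return {"success": True, "provider": "openai"}
    else:
        if has_openai_key:
            return {"success": True, "provider": "openai"}
        if has_local:
            return {"success": True, "provider": "local"}
    return {"success": False,
            "error": "No STT backend: install faster-whisper or set an OpenAI key."}
```

For example, with faster-whisper missing but an OpenAI key configured, the dispatcher falls through to the API backend instead of failing.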
Usage::

    from tools.transcription_tools import transcribe_audio

    result = transcribe_audio("/path/to/audio.ogg")
    if result["success"]:
        print(result["transcript"])
"""
import logging
import os
import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional, Dict, Any
refactor: consolidate get_hermes_home() and parse_reasoning_effort() (#3062)
Centralizes two widely-duplicated patterns into hermes_constants.py:
1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var)
- Was copy-pasted inline across 30+ files as:
Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
- Now defined once in hermes_constants.py (zero-dependency module)
- hermes_cli/config.py re-exports it for backward compatibility
- Removed local wrapper functions in honcho_integration/client.py,
tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py
2. parse_reasoning_effort() — Reasoning effort string validation
- Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py
- Same validation logic: check against (xhigh, high, medium, low, minimal, none)
- Now defined once in hermes_constants.py, called from all 3 locations
- Warning log for unknown values kept at call sites (context-specific)
31 files changed, net +31 lines (125 insertions, 94 deletions)
Full test suite: 6179 passed, 0 failed
2026-03-25 15:54:28 -07:00
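Going by the patterns quoted in this commit message, the two consolidated helpers plausibly look like the sketch below (not the verbatim hermes_constants.py source; per the commit, the warning log for unknown effort values stays at call sites):

```python
import os
from pathlib import Path

# Valid reasoning-effort values, per the commit message.
_VALID_EFFORTS = ("xhigh", "high", "medium", "low", "minimal", "none")

def get_hermes_home() -> Path:
    """Resolve ~/.hermes, honoring the HERMES_HOME env var override."""
    return Path(os.getenv("HERMES_HOME", str(Path.home() / ".hermes")))

def parse_reasoning_effort(value):
    """Return a validated, normalized effort string, or None if unknown
    (callers log their own context-specific warning)."""
    if value and value.lower() in _VALID_EFFORTS:
        return value.lower()
    return None
```

Defining these once in a zero-dependency module lets 30+ call sites drop their copy-pasted inline versions.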
from hermes_constants import get_hermes_home

logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Optional imports — graceful degradation
# ---------------------------------------------------------------------------

import importlib.util as _ilu

_HAS_FASTER_WHISPER = _ilu.find_spec("faster_whisper") is not None
_HAS_OPENAI = _ilu.find_spec("openai") is not None
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

DEFAULT_PROVIDER = "local"
DEFAULT_LOCAL_MODEL = "base"
DEFAULT_LOCAL_STT_LANGUAGE = "en"
DEFAULT_STT_MODEL = os.getenv("STT_OPENAI_MODEL", "whisper-1")
DEFAULT_GROQ_STT_MODEL = os.getenv("STT_GROQ_MODEL", "whisper-large-v3-turbo")
LOCAL_STT_COMMAND_ENV = "HERMES_LOCAL_STT_COMMAND"
LOCAL_STT_LANGUAGE_ENV = "HERMES_LOCAL_STT_LANGUAGE"
COMMON_LOCAL_BIN_DIRS = ("/opt/homebrew/bin", "/usr/local/bin")
GROQ_BASE_URL = os.getenv("GROQ_BASE_URL", "https://api.groq.com/openai/v1")
OPENAI_BASE_URL = os.getenv("STT_OPENAI_BASE_URL", "https://api.openai.com/v1")
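The "singleton model instance reused across calls" behavior mentioned in the STT commit could be structured like this sketch; the `loader` parameter stands in for `faster_whisper.WhisperModel`, and the cache name is invented for illustration:

```python
# Hypothetical lazy singleton cache for local whisper models (illustrative).
_MODEL_CACHE = {}

def get_local_model(size, loader):
    """Load a whisper model once per size and reuse it across calls.
    'loader' would be faster_whisper.WhisperModel in the real module;
    the first call for a size triggers the one-time model download."""
    if size not in _MODEL_CACHE:
        _MODEL_CACHE[size] = loader(size)
    return _MODEL_CACHE[size]
```

Caching the instance matters because constructing the model reloads weights from disk, which would add seconds of latency to every voice message.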
STT: faster-whisper local (free, default) / OpenAI Whisper API (paid)
Changes:
- tools/transcription_tools.py: Full rewrite with provider dispatch,
config loading, local faster-whisper backend, and OpenAI API backend.
Auto-downloads model (~150MB for 'base') on first voice message.
Singleton model instance reused across calls.
- pyproject.toml: Add faster-whisper>=1.0.0 as core dependency
- hermes_cli/config.py: Expand stt config to match TTS pattern with
provider selection and per-provider model settings
- agent/context_compressor.py: Fix .strip() crash when LLM returns
non-string content (dict from llama.cpp, None). Fixes #1100 partially.
- tests/: 23 new tests for STT providers + 2 for compressor fix
- docs/: Updated Voice & TTS page with STT provider table, model sizes,
config examples, and fallback behavior
Fallback behavior:
- Local not installed → OpenAI API (if key set)
- OpenAI key not set → local whisper (if installed)
- Neither → graceful error message to user
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
---------
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
2026-03-13 11:11:05 -07:00
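The fallback chain above can be sketched as a tiny standalone function (illustrative only; the real dispatcher, `_get_provider`, also knows about Groq and a local whisper command):

```python
def pick_stt_backend(local_installed: bool, openai_key_set: bool) -> str:
    """Minimal sketch of the documented fallback: local first, then OpenAI, then graceful error."""
    if local_installed:
        return "local"
    if openai_key_set:
        return "openai"
    return "none"  # caller surfaces a friendly error message to the user

print(pick_stt_backend(False, True))
```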

SUPPORTED_FORMATS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm", ".ogg", ".aac"}
LOCAL_NATIVE_AUDIO_FORMATS = {".wav", ".aiff", ".aif"}
MAX_FILE_SIZE = 25 * 1024 * 1024  # 25 MB

# Known model sets for auto-correction
OPENAI_MODELS = {"whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-transcribe"}
GROQ_MODELS = {"whisper-large-v3", "whisper-large-v3-turbo", "distil-whisper-large-v3-en"}

# Singleton for the local model — loaded once, reused across calls
_local_model: Optional[object] = None
_local_model_name: Optional[str] = None


# ---------------------------------------------------------------------------
# Config helpers
# ---------------------------------------------------------------------------
refactor: consolidate get_hermes_home() and parse_reasoning_effort() (#3062)
Centralizes two widely-duplicated patterns into hermes_constants.py:
1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var)
   - Was copy-pasted inline across 30+ files as:
     Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
   - Now defined once in hermes_constants.py (zero-dependency module)
   - hermes_cli/config.py re-exports it for backward compatibility
   - Removed local wrapper functions in honcho_integration/client.py,
     tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py
2. parse_reasoning_effort() — Reasoning effort string validation
   - Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py
   - Same validation logic: check against (xhigh, high, medium, low, minimal, none)
   - Now defined once in hermes_constants.py, called from all 3 locations
   - Warning log for unknown values kept at call sites (context-specific)
31 files changed, net +31 lines (125 insertions, 94 deletions)
Full test suite: 6179 passed, 0 failed
2026-03-25 15:54:28 -07:00

def get_stt_model_from_config() -> Optional[str]:
    """Read the STT model name from ~/.hermes/config.yaml.

    Returns the value of ``stt.model`` if present, otherwise ``None``.
    Silently returns ``None`` on any error (missing file, bad YAML, etc.).
    """
    try:
        import yaml

        cfg_path = get_hermes_home() / "config.yaml"
        if cfg_path.exists():
            with open(cfg_path) as f:
                data = yaml.safe_load(f) or {}
            return data.get("stt", {}).get("model")
    except Exception:
        pass
    return None


def _load_stt_config() -> dict:
    """Load the ``stt`` section from user config, falling back to defaults."""
    try:
        from hermes_cli.config import load_config

        return load_config().get("stt", {})
    except Exception:
        return {}


def is_stt_enabled(stt_config: Optional[dict] = None) -> bool:
    """Return whether STT is enabled in config."""
    if stt_config is None:
        stt_config = _load_stt_config()
    enabled = stt_config.get("enabled", True)
    if isinstance(enabled, str):
        return enabled.strip().lower() in ("true", "1", "yes", "on")
    if enabled is None:
        return True
    return bool(enabled)


def _resolve_openai_api_key() -> str:
    """Prefer the voice-tools key, but fall back to the normal OpenAI key."""
    return os.getenv("VOICE_TOOLS_OPENAI_KEY", "") or os.getenv("OPENAI_API_KEY", "")
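A quick standalone check of that precedence (demo values, not real keys):

```python
import os

# The voice-specific key wins; an empty string is falsy, so it falls through.
os.environ["VOICE_TOOLS_OPENAI_KEY"] = ""
os.environ["OPENAI_API_KEY"] = "sk-demo"

key = os.getenv("VOICE_TOOLS_OPENAI_KEY", "") or os.getenv("OPENAI_API_KEY", "")
print(key)  # sk-demo
```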


def _find_binary(binary_name: str) -> Optional[str]:
    """Find a local binary, checking common Homebrew/local prefixes as well as PATH."""
    for directory in COMMON_LOCAL_BIN_DIRS:
        candidate = Path(directory) / binary_name
        if candidate.exists() and os.access(candidate, os.X_OK):
            return str(candidate)
    return shutil.which(binary_name)


def _find_ffmpeg_binary() -> Optional[str]:
    return _find_binary("ffmpeg")


def _find_whisper_binary() -> Optional[str]:
    return _find_binary("whisper")


def _get_local_command_template() -> Optional[str]:
    configured = os.getenv(LOCAL_STT_COMMAND_ENV, "").strip()
    if configured:
        return configured

    whisper_binary = _find_whisper_binary()
    if whisper_binary:
        quoted_binary = shlex.quote(whisper_binary)
        return (
            f"{quoted_binary} {{input_path}} --model {{model}} --output_format txt "
            "--output_dir {output_dir} --language {language}"
        )
    return None


def _has_local_command() -> bool:
    return _get_local_command_template() is not None
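The returned template uses `str.format` placeholders, so a caller renders it like this (the input path is hypothetical):

```python
# Render the whisper CLI command template; values here are illustrative only.
template = (
    "whisper {input_path} --model {model} --output_format txt "
    "--output_dir {output_dir} --language {language}"
)
cmd = template.format(
    input_path="/tmp/voice.ogg",  # hypothetical downloaded voice file
    model="base",
    output_dir="/tmp",
    language="en",
)
print(cmd)
```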


def _normalize_local_command_model(model_name: Optional[str]) -> str:
    if not model_name or model_name in OPENAI_MODELS or model_name in GROQ_MODELS:
        return DEFAULT_LOCAL_MODEL
    return model_name


def _get_provider(stt_config: dict) -> str:
    """Determine which STT provider to use.

    When ``stt.provider`` is explicitly set in config, that choice is
    honoured — no silent cloud fallback. When no provider is configured,
    auto-detect tries: local > groq (free) > openai (paid).
    """
    if not is_stt_enabled(stt_config):
        return "none"

    explicit = "provider" in stt_config
    provider = stt_config.get("provider", DEFAULT_PROVIDER)

    # --- Explicit provider: respect the user's choice ----------------------
    if explicit:
        if provider == "local":
            if _HAS_FASTER_WHISPER:
                return "local"
            if _has_local_command():
                return "local_command"
            logger.warning(
                "STT provider 'local' configured but unavailable "
                "(install faster-whisper or set HERMES_LOCAL_STT_COMMAND)"
            )
            return "none"

        if provider == "local_command":
            if _has_local_command():
                return "local_command"
            if _HAS_FASTER_WHISPER:
                logger.info("Local STT command unavailable, using local faster-whisper")
                return "local"
            logger.warning("STT provider 'local_command' configured but unavailable")
            return "none"

        if provider == "groq":
            if _HAS_OPENAI and os.getenv("GROQ_API_KEY"):
                return "groq"
            logger.warning("STT provider 'groq' configured but GROQ_API_KEY not set")
            return "none"

        if provider == "openai":
            if _HAS_OPENAI and _resolve_openai_api_key():
                return "openai"
            logger.warning("STT provider 'openai' configured but no API key available")
            return "none"

        return provider  # Unknown — let it fail downstream

    # --- Auto-detect (no explicit provider): local > groq > openai ---------
    if _HAS_FASTER_WHISPER:
        return "local"
    if _has_local_command():
        return "local_command"
    if _HAS_OPENAI and os.getenv("GROQ_API_KEY"):
        logger.info("No local STT available, using Groq Whisper API")
        return "groq"
    if _HAS_OPENAI and _resolve_openai_api_key():
        logger.info("No local STT available, using OpenAI Whisper API")
        return "openai"
    return "none"


# ---------------------------------------------------------------------------
# Shared validation
# ---------------------------------------------------------------------------


def _validate_audio_file(file_path: str) -> Optional[Dict[str, Any]]:
    """Validate the audio file. Returns an error dict or None if OK."""
    audio_path = Path(file_path)

    if not audio_path.exists():
        return {"success": False, "transcript": "", "error": f"Audio file not found: {file_path}"}
    if not audio_path.is_file():
        return {"success": False, "transcript": "", "error": f"Path is not a file: {file_path}"}
    if audio_path.suffix.lower() not in SUPPORTED_FORMATS:
        return {
            "success": False,
            "transcript": "",
            "error": f"Unsupported format: {audio_path.suffix}. Supported: {', '.join(sorted(SUPPORTED_FORMATS))}",
        }
    try:
        file_size = audio_path.stat().st_size
        if file_size > MAX_FILE_SIZE:
            return {
                "success": False,
                "transcript": "",
                "error": f"File too large: {file_size / (1024*1024):.1f}MB (max {MAX_FILE_SIZE / (1024*1024):.0f}MB)",
            }
    except OSError as e:
        return {"success": False, "transcript": "", "error": f"Failed to access file: {e}"}

    return None
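The same gating can be exercised in isolation with a self-contained sketch (re-declared constants, not an import of this module):

```python
from pathlib import Path
import tempfile

SUPPORTED = {".mp3", ".wav", ".ogg"}  # subset of the real SUPPORTED_FORMATS
MAX_BYTES = 25 * 1024 * 1024  # mirrors MAX_FILE_SIZE

def check(path: str) -> str:
    """Order of checks mirrors _validate_audio_file: existence, format, size."""
    p = Path(path)
    if not p.is_file():
        return "not a file"
    if p.suffix.lower() not in SUPPORTED:
        return "unsupported format"
    if p.stat().st_size > MAX_BYTES:
        return "too large"
    return "ok"

with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
    f.write(b"\x00" * 16)
    tmp = f.name

print(check(tmp))           # ok
print(check(tmp + ".xyz"))  # not a file
```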


# ---------------------------------------------------------------------------
# Provider: local (faster-whisper)
# ---------------------------------------------------------------------------
Add messaging platform enhancements: STT, stickers, Discord UX, Slack, pairing, hooks
Major feature additions inspired by OpenClaw/ClawdBot integration analysis:
Voice Message Transcription (STT):
- Auto-transcribe voice/audio messages via OpenAI Whisper API
- Download voice to ~/.hermes/audio_cache/ on Telegram/Discord/WhatsApp
- Inject transcript as text so all models can understand voice input
- Configurable model (whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe)
Telegram Sticker Understanding:
- Describe static stickers via vision tool with JSON-backed cache
- Cache keyed by file_unique_id avoids redundant API calls
- Animated/video stickers get emoji-based fallback description
Discord Rich UX:
- Native slash commands (/ask, /reset, /status, /stop) via app_commands
- Button-based exec approvals (Allow Once / Always Allow / Deny)
- ExecApprovalView with user authorization and timeout handling
Slack Integration:
- Full SlackAdapter using slack-bolt with Socket Mode
- DMs, channel messages (mention-gated), /hermes slash command
- File attachment handling with bot-token-authenticated downloads
DM Pairing System:
- Code-based user authorization as alternative to static allowlists
- 8-char codes from unambiguous alphabet, 1-hour expiry
- Rate limiting, lockout after failed attempts, chmod 0600 on data
- CLI: hermes pairing list/approve/revoke/clear-pending
Event Hook System:
- File-based hook discovery from ~/.hermes/hooks/
- HOOK.yaml + handler.py per hook, sync/async handler support
- Events: gateway:startup, session:start/reset, agent:start/step/end
- Wildcard matching (command:* catches all command events)
Cross-Channel Messaging:
- send_message agent tool for delivering to any connected platform
- Enables cron job delivery and cross-platform notifications
Human-Like Response Pacing:
- Configurable delays between message chunks (off/natural/custom)
- HERMES_HUMAN_DELAY_MODE env var with min/max ms settings
Warm Injection Message Style:
- Retrofitted image vision messages with friendly kawaii-consistent tone
- All new injection messages (STT, stickers, errors) use warm style
Also: updated config migration to prompt for optional keys interactively,
bumped config version, updated README, AGENTS.md, .env.example,
cli-config.yaml.example, install scripts, pyproject.toml, and toolsets.
2026-02-15 21:38:59 -08:00

def _transcribe_local(file_path: str, model_name: str) -> Dict[str, Any]:
    """Transcribe using faster-whisper (local, free)."""
    global _local_model, _local_model_name

    if not _HAS_FASTER_WHISPER:
        return {"success": False, "transcript": "", "error": "faster-whisper not installed"}

    try:
        from faster_whisper import WhisperModel

        # Lazy-load the model (downloads on first use, ~150 MB for 'base')
        if _local_model is None or _local_model_name != model_name:
            logger.info("Loading faster-whisper model '%s' (first load downloads the model)...", model_name)
            _local_model = WhisperModel(model_name, device="auto", compute_type="auto")
            _local_model_name = model_name

        segments, info = _local_model.transcribe(file_path, beam_size=5)
        transcript = " ".join(segment.text.strip() for segment in segments)

        logger.info(
            "Transcribed %s via local whisper (%s, lang=%s, %.1fs audio)",
            Path(file_path).name, model_name, info.language, info.duration,
        )

        return {"success": True, "transcript": transcript, "provider": "local"}

    except Exception as e:
        logger.error("Local transcription failed: %s", e, exc_info=True)
        return {"success": False, "transcript": "", "error": f"Local transcription failed: {e}"}
def _prepare_local_audio(file_path: str, work_dir: str) -> tuple[Optional[str], Optional[str]]:
    """Normalize audio for local CLI STT when needed."""
    audio_path = Path(file_path)
    if audio_path.suffix.lower() in LOCAL_NATIVE_AUDIO_FORMATS:
        return file_path, None

    ffmpeg = _find_ffmpeg_binary()
    if not ffmpeg:
        return None, "Local STT fallback requires ffmpeg for non-WAV inputs, but ffmpeg was not found"

    converted_path = os.path.join(work_dir, f"{audio_path.stem}.wav")
    command = [ffmpeg, "-y", "-i", file_path, converted_path]

    try:
        subprocess.run(command, check=True, capture_output=True, text=True)
        return converted_path, None
    except subprocess.CalledProcessError as e:
        details = e.stderr.strip() or e.stdout.strip() or str(e)
        logger.error("ffmpeg conversion failed for %s: %s", file_path, details)
        return None, f"Failed to convert audio for local STT: {details}"
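The conversion decision is driven by file extension alone; the `.lower()` call matters so uppercase suffixes pass through. A reduced sketch (the format set here is an assumption; the module defines the real one as `LOCAL_NATIVE_AUDIO_FORMATS`):

```python
from pathlib import Path

NATIVE_FORMATS = {".wav"}  # assumed contents, standing in for LOCAL_NATIVE_AUDIO_FORMATS

def needs_conversion(file_path: str) -> bool:
    """True when the local CLI can't read the file directly and ffmpeg must convert it."""
    return Path(file_path).suffix.lower() not in NATIVE_FORMATS
```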
def _transcribe_local_command(file_path: str, model_name: str) -> Dict[str, Any]:
    """Run the configured local STT command template and read back a .txt transcript."""
    command_template = _get_local_command_template()
    if not command_template:
        return {
            "success": False,
            "transcript": "",
            "error": (
                f"{LOCAL_STT_COMMAND_ENV} not configured and no local whisper binary was found"
            ),
        }

    language = os.getenv(LOCAL_STT_LANGUAGE_ENV, DEFAULT_LOCAL_STT_LANGUAGE)
    normalized_model = _normalize_local_command_model(model_name)

    try:
        with tempfile.TemporaryDirectory(prefix="hermes-local-stt-") as output_dir:
            prepared_input, prep_error = _prepare_local_audio(file_path, output_dir)
            if prep_error:
                return {"success": False, "transcript": "", "error": prep_error}

            # SECURITY FIX: Use list-based command execution instead of shell=True
            # to prevent command injection via malicious file paths or parameters
            command = command_template.format(
                input_path=prepared_input,  # shlex.quote not needed with list execution
                output_dir=output_dir,
                language=language,
                model=normalized_model,
            )
            # Parse the command string into a list safely
            command_parts = shlex.split(command)
            subprocess.run(command_parts, shell=False, check=True, capture_output=True, text=True)

            txt_files = sorted(Path(output_dir).glob("*.txt"))
            if not txt_files:
                return {
                    "success": False,
                    "transcript": "",
                    "error": "Local STT command completed but did not produce a .txt transcript",
                }

            transcript_text = txt_files[0].read_text(encoding="utf-8").strip()
            logger.info(
                "Transcribed %s via local STT command (%s, %d chars)",
                Path(file_path).name,
                normalized_model,
                len(transcript_text),
            )
            return {"success": True, "transcript": transcript_text, "provider": "local_command"}

    except KeyError as e:
        return {
            "success": False,
            "transcript": "",
            "error": f"Invalid {LOCAL_STT_COMMAND_ENV} template, missing placeholder: {e}",
        }
    except subprocess.CalledProcessError as e:
        details = e.stderr.strip() or e.stdout.strip() or str(e)
        logger.error("Local STT command failed for %s: %s", file_path, details)
        return {"success": False, "transcript": "", "error": f"Local STT failed: {details}"}
    except Exception as e:
        logger.error("Unexpected error during local command transcription: %s", e, exc_info=True)
        return {"success": False, "transcript": "", "error": f"Local transcription failed: {e}"}
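The template-then-`shlex.split` flow can be seen end to end with a toy template (the command name and placeholder values below are illustrative; the real template comes from `_get_local_command_template()`):

```python
import shlex

# Hypothetical template in the shape the code expects: four named placeholders.
template = "whisper-cli -m {model} -l {language} -f {input_path} -od {output_dir}"
command = template.format(
    input_path="/tmp/in.wav",
    output_dir="/tmp/out",
    language="auto",
    model="base",
)
# shlex.split turns the string into an argv list; running that list with
# shell=False means no shell ever interprets metacharacters in the arguments.
argv = shlex.split(command)
```

One caveat worth knowing: because `.format()` runs before `shlex.split()`, a placeholder value containing spaces would be split into separate arguments, so template inputs should come from paths without spaces.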
# ---------------------------------------------------------------------------
# Provider: groq (Whisper API — free tier)
# ---------------------------------------------------------------------------


def _transcribe_groq(file_path: str, model_name: str) -> Dict[str, Any]:
    """Transcribe using Groq Whisper API (free tier available)."""
    api_key = os.getenv("GROQ_API_KEY")
    if not api_key:
        return {"success": False, "transcript": "", "error": "GROQ_API_KEY not set"}

    if not _HAS_OPENAI:
        return {"success": False, "transcript": "", "error": "openai package not installed"}

    # Auto-correct model if caller passed an OpenAI-only model
    if model_name in OPENAI_MODELS:
        logger.info("Model %s not available on Groq, using %s", model_name, DEFAULT_GROQ_STT_MODEL)
        model_name = DEFAULT_GROQ_STT_MODEL

    try:
        from openai import OpenAI, APIError, APIConnectionError, APITimeoutError

        client = OpenAI(api_key=api_key, base_url=GROQ_BASE_URL, timeout=30, max_retries=0)

        with open(file_path, "rb") as audio_file:
            transcription = client.audio.transcriptions.create(
                model=model_name,
                file=audio_file,
                response_format="text",
            )

        transcript_text = str(transcription).strip()
        logger.info("Transcribed %s via Groq API (%s, %d chars)",
                    Path(file_path).name, model_name, len(transcript_text))

        return {"success": True, "transcript": transcript_text, "provider": "groq"}

    except PermissionError:
        return {"success": False, "transcript": "", "error": f"Permission denied: {file_path}"}
    except APIConnectionError as e:
        return {"success": False, "transcript": "", "error": f"Connection error: {e}"}
    except APITimeoutError as e:
        return {"success": False, "transcript": "", "error": f"Request timeout: {e}"}
    except APIError as e:
        return {"success": False, "transcript": "", "error": f"API error: {e}"}
    except Exception as e:
        logger.error("Groq transcription failed: %s", e, exc_info=True)
        return {"success": False, "transcript": "", "error": f"Transcription failed: {e}"}
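Both API providers coerce cross-provider model names before calling out. A sketch of the Groq side (the OpenAI-only set matches the models named in the commit message; the Groq default below is a hypothetical stand-in for `DEFAULT_GROQ_STT_MODEL`):

```python
OPENAI_ONLY_MODELS = {"whisper-1", "gpt-4o-mini-transcribe", "gpt-4o-transcribe"}
GROQ_DEFAULT = "whisper-large-v3"  # assumed value, standing in for DEFAULT_GROQ_STT_MODEL

def coerce_groq_model(name: str) -> str:
    """Swap an OpenAI-only model name for the Groq default instead of failing the request."""
    return GROQ_DEFAULT if name in OPENAI_ONLY_MODELS else name
```

`_transcribe_openai` applies the mirror-image check against `GROQ_MODELS`.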
# ---------------------------------------------------------------------------
# Provider: openai (Whisper API)
# ---------------------------------------------------------------------------


def _transcribe_openai(file_path: str, model_name: str) -> Dict[str, Any]:
    """Transcribe using OpenAI Whisper API (paid)."""
    api_key = _resolve_openai_api_key()
    if not api_key:
        return {
            "success": False,
            "transcript": "",
            "error": "Neither VOICE_TOOLS_OPENAI_KEY nor OPENAI_API_KEY is set",
        }

    if not _HAS_OPENAI:
        return {"success": False, "transcript": "", "error": "openai package not installed"}

    # Auto-correct model if caller passed a Groq-only model
    if model_name in GROQ_MODELS:
        logger.info("Model %s not available on OpenAI, using %s", model_name, DEFAULT_STT_MODEL)
        model_name = DEFAULT_STT_MODEL

    try:
        from openai import OpenAI, APIError, APIConnectionError, APITimeoutError

        client = OpenAI(api_key=api_key, base_url=OPENAI_BASE_URL, timeout=30, max_retries=0)

        with open(file_path, "rb") as audio_file:
            transcription = client.audio.transcriptions.create(
                model=model_name,
                file=audio_file,
                response_format="text",
            )

        transcript_text = str(transcription).strip()
        logger.info("Transcribed %s via OpenAI API (%s, %d chars)",
                    Path(file_path).name, model_name, len(transcript_text))

        return {"success": True, "transcript": transcript_text, "provider": "openai"}

    except PermissionError:
        return {"success": False, "transcript": "", "error": f"Permission denied: {file_path}"}
    except APIConnectionError as e:
        return {"success": False, "transcript": "", "error": f"Connection error: {e}"}
    except APITimeoutError as e:
        return {"success": False, "transcript": "", "error": f"Request timeout: {e}"}
    except APIError as e:
        return {"success": False, "transcript": "", "error": f"API error: {e}"}
    except Exception as e:
        logger.error("OpenAI transcription failed: %s", e, exc_info=True)
        return {"success": False, "transcript": "", "error": f"Transcription failed: {e}"}
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------


def transcribe_audio(file_path: str, model: Optional[str] = None) -> Dict[str, Any]:
    """
    Transcribe an audio file using the configured STT provider.

    Provider priority:
    1. User config (``stt.provider`` in config.yaml)
    2. Auto-detect: local faster-whisper (free) > Groq (free tier) > OpenAI (paid)

    Args:
        file_path: Absolute path to the audio file to transcribe.
        model: Override the model. If None, uses config or provider default.

    Returns:
        dict with keys:
        - "success" (bool): Whether transcription succeeded
        - "transcript" (str): The transcribed text (empty on failure)
        - "error" (str, optional): Error message if success is False
        - "provider" (str, optional): Which provider was used
    """
    # Validate input
    error = _validate_audio_file(file_path)
    if error:
        return error

    # Load config and determine provider
    stt_config = _load_stt_config()
    if not is_stt_enabled(stt_config):
        return {
            "success": False,
            "transcript": "",
            "error": "STT is disabled in config.yaml (stt.enabled: false).",
        }

    provider = _get_provider(stt_config)

    if provider == "local":
        local_cfg = stt_config.get("local", {})
        model_name = model or local_cfg.get("model", DEFAULT_LOCAL_MODEL)
        return _transcribe_local(file_path, model_name)

    if provider == "local_command":
        local_cfg = stt_config.get("local", {})
        model_name = _normalize_local_command_model(
            model or local_cfg.get("model", DEFAULT_LOCAL_MODEL)
        )
        return _transcribe_local_command(file_path, model_name)

    if provider == "groq":
        model_name = model or DEFAULT_GROQ_STT_MODEL
        return _transcribe_groq(file_path, model_name)

    if provider == "openai":
        openai_cfg = stt_config.get("openai", {})
        model_name = model or openai_cfg.get("model", DEFAULT_STT_MODEL)
        return _transcribe_openai(file_path, model_name)

    # No provider available
    return {
        "success": False,
        "transcript": "",
        "error": (
            "No STT provider available. Install faster-whisper for free local "
            f"transcription, configure {LOCAL_STT_COMMAND_ENV} or install a local whisper CLI, "
            "set GROQ_API_KEY for free Groq Whisper, or set VOICE_TOOLS_OPENAI_KEY "
            "or OPENAI_API_KEY for the OpenAI Whisper API."
        ),
    }
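A caller consuming the result dict from `transcribe_audio` might turn it into an injected chat message like this (the wrapper and its wording are illustrative, not the module's actual injection code):

```python
from typing import Any, Dict

def format_voice_injection(result: Dict[str, Any]) -> str:
    """Turn a transcribe_audio() result into text the agent can read."""
    if result.get("success"):
        return f"[voice message transcript] {result['transcript']}"
    return f"[voice message could not be transcribed: {result.get('error', 'unknown error')}]"
```

Because every provider returns the same `success`/`transcript`/`error` shape, the caller never needs to know which backend ran.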