Commit Graph

260 Commits

Author SHA1 Message Date
teknium1
780ddd102b fix(docker): gate cwd workspace mount behind config
Keep Docker sandboxes isolated by default. Add an explicit terminal.docker_mount_cwd_to_workspace opt-in, thread it through terminal/file environment creation, and document the security tradeoff and config.yaml workflow clearly.
2026-03-16 05:20:56 -07:00
Teknium
9e845a6e53 feat: major /rollback improvements — enabled by default, diff preview, file-level restore, conversation undo, terminal checkpoints
Checkpoint & rollback upgrades:

1. Enabled by default — checkpoints are now on for all new sessions.
   Zero cost when no file-mutating tools fire. Disable with
   checkpoints.enabled: false in config.yaml.

2. Diff preview — /rollback diff <N> shows a git diff between the
   checkpoint and current working tree before committing to a restore.

3. File-level restore — /rollback <N> <file> restores a single file
   from a checkpoint instead of the entire directory.

4. Conversation undo on rollback — when restoring files, the last
   chat turn is automatically undone so the agent's context matches
   the restored filesystem state.

5. Terminal command checkpoints — destructive terminal commands (rm,
   mv, sed -i, truncate, git reset/clean, output redirects) now
   trigger automatic checkpoints before execution. Previously only
   write_file and patch were covered.

6. Change summary in listing — /rollback now shows file count and
   +insertions/-deletions for each checkpoint.

7. Fixed dead code — removed duplicate _run_git call in
   list_checkpoints with nonsensical --all if False condition.

8. Updated help text — /rollback with no args now shows available
   subcommands (diff, file-level restore).
2026-03-16 04:43:37 -07:00
Teknium
00a0c56598 feat: add persistent CLI status bar and usage details (#1522)
Salvaged from PR #1104 by kshitijk4poor. Closes #683.

Adds a persistent status bar to the CLI showing model name, context
window usage with visual bar, estimated cost, and session duration.
Responsive layout degrades gracefully for narrow terminals.

Changes:
- agent/usage_pricing.py: shared pricing table, cost estimation with
  Decimal arithmetic, duration/token formatting helpers
- agent/insights.py: refactored to reuse usage_pricing (eliminates
  duplicate pricing table and formatting logic)
- cli.py: status bar with FormattedTextControl fragments, color-coded
  context thresholds (green/yellow/orange/red), enhanced /usage with
  cost breakdown, 1Hz idle refresh for status bar updates
- tests/test_cli_status_bar.py: status bar snapshot, width collapsing,
  usage report with/without pricing, zero-priced model handling
- tests/test_insights.py: verify zero-priced providers show as unknown

Salvage fixes:
- Resolved conflict with voice status bar (both coexist in layout)
- Import _format_context_length from hermes_cli.banner (moved since PR)

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-16 04:42:48 -07:00
Teknium
c1da1fdcd5 feat: auto-detect provider when switching models via /model (#1506)
When typing /model deepseek-chat while on a different provider, the
model name now auto-resolves to the correct provider instead of
silently staying on the wrong one and causing API errors.

Detection priority:
1. Direct provider with credentials (e.g. DEEPSEEK_API_KEY set)
2. OpenRouter catalog match with proper slug remapping
3. Direct provider without creds (clear error beats silent failure)

Also adds DeepSeek as a first-class API-key provider — just set
DEEPSEEK_API_KEY and /model deepseek-chat routes directly.

Bare model names get remapped to proper OpenRouter slugs:
  /model gpt-5.4 → openai/gpt-5.4
  /model claude-opus-4.6 → anthropic/claude-opus-4.6

Salvages the concept from PR #1177 by @virtaava with credential
awareness and OpenRouter slug mapping added.

Co-authored-by: virtaava <virtaava@users.noreply.github.com>
2026-03-16 04:34:45 -07:00
teknium1
33ebedc76d feat: enable persistent shell by default for SSH, add config option
SSH persistent shell now defaults to true — non-local backends benefit
most from state persistence across execute() calls. Local backend
remains opt-in via TERMINAL_LOCAL_PERSISTENT env var.

New config.yaml option: terminal.persistent_shell (default: true)
Controls the default for non-local backends. Users can disable with:
  hermes config set terminal.persistent_shell false

Precedence: per-backend env var > TERMINAL_PERSISTENT_SHELL > default.

Wired through cli.py, gateway/run.py, and hermes_cli/config.py so the
config.yaml value reaches terminal_tool via env var bridge.
2026-03-15 20:17:13 -07:00
Teknium
103f7b1ebc fix: verbose mode shows full untruncated output
* fix(cli): silence tirith prefetch install warnings at startup

* fix: verbose mode now shows full untruncated tool args, results, content, and think blocks

When tool progress is set to 'verbose' (via /verbose or config), the display
was still truncating tool arguments to 100 chars, tool results to 100-200 chars,
assistant content to 100 chars, and think blocks to 5 lines. This defeated the
purpose of verbose mode.

Changes:
- Tool args: show full JSON args (not truncated to log_prefix_chars)
- Tool results: show full result content in both display and debug logs
- Assistant content: show full content during tool-call loops
- Think blocks: show full reasoning text (not truncated to 5 lines/100 chars)
- Auto-enable reasoning display when verbose mode is active
- Fix initial agent creation to respect verbose config (was always quiet_mode=True)
- Updated verbose label to mention think blocks
2026-03-15 20:03:37 -07:00
Teknium
5e92a4ce5a fix: auto-reload MCP tools when mcp_servers config changes without restart (#1474)
Fixes #1036

After adding an MCP server to config.yaml, users had to restart Hermes
before the new tools became visible — even though /reload-mcp existed.

Add _check_config_mcp_changes() called from process_loop every 5s:
- stat() config.yaml for mtime changes (fast path, no YAML parse)
- On mtime change, parse and compare mcp_servers section
- If mcp_servers changed, auto-trigger _reload_mcp() and notify user
- Skip check while agent is running to avoid interrupting tool calls
- Throttled to CONFIG_WATCH_INTERVAL=5s to avoid busy-polling

/reload-mcp still works for manual force-reload.

Tests: 6 new tests in TestMCPConfigWatch, all passed

Co-authored-by: teyrebaz33 <hakanerten02@hotmail.com>
2026-03-15 19:03:34 -07:00
Teknium
471c663fdf fix(cli): silence tirith prefetch install warnings at startup (#1452) 2026-03-15 18:07:03 -07:00
teknium1
f24c00a5bf fix(config): reload .env over stale shell overrides
Hermes startup entrypoints now load ~/.hermes/.env and project fallback env files with user config taking precedence over stale shell-exported values. This makes model/provider/base URL changes in .env actually take effect after restarting Hermes. Adds a shared env loader plus regression coverage, and reproduces the original bug case where OPENAI_BASE_URL and HERMES_INFERENCE_PROVIDER remained stuck on old shell values before import.
2026-03-15 06:46:28 -07:00
Teknium
d36b3d498d Merge pull request #1388 from NousResearch/hermes/hermes-0fadff1b
fix: harden .worktreeinclude path containment
2026-03-14 21:53:28 -07:00
teknium1
f4c012873c fix: harden salvaged worktree include checks
Use Path.relative_to-based containment checks for the salvaged .worktreeinclude guard, remove the replayed test logic from the cherry-picked PR, and add real integration regressions for file, directory, and symlink escapes.
2026-03-14 21:51:27 -07:00
Sebastion
12bc86d9c9 fix: prevent path traversal in .worktreeinclude file processing
Resolve .worktreeinclude entries and validate that both the source path
stays within the repository root and the destination path stays within
the worktree directory before copying files or creating symlinks.

A malicious .worktreeinclude in a cloned repository could previously
reference paths like "../../etc/passwd" to copy or symlink arbitrary
files from outside the repo into the worktree.

CWE-22: Improper Limitation of a Pathname to a Restricted Directory
2026-03-14 21:48:19 -07:00
Nyk
b89177668e fix(cli): non-blocking startup update check and banner deduplication
- Add background thread mechanism (prefetch_update_check/get_update_result)
  so git fetch runs in parallel with skill sync and agent init
- Fix repo path fallback in check_for_updates() for dev installs
- Remove duplicate build_welcome_banner (~180 lines) and
  _format_context_length from cli.py — the banner.py version is
  now the single source of truth
- Port skin banner_hero/banner_logo support and terminal width check
  from cli.py's version into banner.py
- Add update status output to hermes version command
- Add unit tests for update check, prefetch, and version string
2026-03-14 21:45:50 -07:00
Teknium
b14a07315b fix: save /plan output in workspace (#1381) 2026-03-14 21:28:51 -07:00
Teknium
ff3473a37c feat: add /plan command (#1372)
* feat: add /plan command

* refactor: back /plan with bundled skill

* docs: document /plan skill
2026-03-14 21:18:17 -07:00
teknium1
9f6bccd76a feat: add direct endpoint overrides for auxiliary and delegation
Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.
2026-03-14 21:11:37 -07:00
Teknium
f8a3e37f54 Merge pull request #1343 from NousResearch/hermes/hermes-5d160594
feat: compress cron management into one tool
2026-03-14 19:34:20 -07:00
teknium1
3229e434b8 Merge origin/main into hermes/hermes-5d160594 2026-03-14 19:34:05 -07:00
Teknium
24f61d006a feat: preload CLI skills on launch (#1359)
* feat: preload CLI skills on launch

* test: cover continue with worktree and skills flags

* feat: show activated skills before CLI banner
2026-03-14 19:33:59 -07:00
teknium1
c3ea620796 feat: add multi-skill cron editing and docs 2026-03-14 19:18:10 -07:00
teknium1
df5c61b37c feat: compress cron management into one tool 2026-03-14 12:21:50 -07:00
teknium1
eb8226daab fix(cli): repair dangerous command approval UI
Move the dangerous-command header onto its own line inside the approval box
so the panel border no longer cuts through it, and restore the long-command
expand path in the active prompt_toolkit approval callback. The CLI already
had a merged 'view full command' feature in fallback/gateway paths, but the
live TUI callback was still using an older choice set and never exposed it.
Add regression tests for long-command view state, in-place expansion, and
panel rendering.
2026-03-14 11:57:44 -07:00
teyrebaz33
fbdce27b9a fix: address prefix matching recursion and skill command coverage
Per teknium1 review on PR #968:

1. Guard against infinite recursion: if expanded name equals the typed
   token (already exact), fall through to Unknown command instead of
   redispatching the same string forever.

2. Include skill slash commands in prefix resolution so execution-time
   matching agrees with tab-completion (set(COMMANDS) | set(_skill_commands)).

3. Add missing test cases:
   - unambiguous prefix with extra args does not recurse
   - exact command with args does not loop
   - skill command prefix matches correctly
   - exact builtin takes priority over skill prefix ambiguity

8 tests passing.
2026-03-14 10:33:58 -07:00
teyrebaz33
a50550fdb4 fix: add prefix matching to slash command dispatcher
Slash commands previously required exact full names. Typing /con
returned 'Unknown command' even though /config was the only match.

Add unambiguous prefix matching in process_command():
- Unique prefix (e.g. /con -> /config): dispatch immediately
- Ambiguous prefix (e.g. /re -> /reset, /retry, /reasoning...):
  show 'Did you mean' suggestions
- No match: existing 'Unknown command' error

Prefix matching uses the COMMANDS dict from hermes_cli/commands.py
(same source as SlashCommandCompleter) so it stays in sync with
any new commands added there.

Closes #928
2026-03-14 10:33:58 -07:00
teknium1
9633ddd8d8 fix: initialize CLI voice state for single-query mode
- initialize voice and interrupt runtime state in HermesCLI.__init__
- prevent chat -q from crashing before run() has executed
- add regression coverage for single-query state initialization
2026-03-14 06:31:32 -07:00
teknium1
7b10881b9e fix: persist clean voice transcripts and /voice off state
- keep CLI voice prefixes API-local while storing the original user text
- persist explicit gateway off state and restore adapter auto-TTS suppression on restart
- add regression coverage for both behaviors
2026-03-14 06:14:22 -07:00
0xbyt4
eb34c0b09a fix: voice pipeline hardening — 7 bug fixes with tests
1. Anthropic + ElevenLabs TTS silence: forward full response to TTS
   callback for non-streaming providers (choices first, then native
   content blocks fallback).

2. Subprocess timeout kill: play_audio_file now kills the process on
   TimeoutExpired instead of leaving zombie processes.

3. Discord disconnect cleanup: leave all voice channels before closing
   the client to prevent leaked state.

4. Audio stream leak: close InputStream if stream.start() fails.

5. Race condition: read/write _on_silence_stop under lock in audio
   callback thread.

6. _vprint force=True: show API error, retry, and truncation messages
   even during streaming TTS.

7. _refresh_level lock: read _voice_recording under _voice_lock.
2026-03-14 14:27:21 +03:00
0xbyt4
cc0a453476 fix: address PR review round 5 — streaming guard, VC auth, history prefix, auto-TTS control
1. Gate _streaming_api_call to chat_completions mode only — Anthropic and
   Codex fall back to _interruptible_api_call. Preserve Anthropic base_url
   across all client rebuild paths (interrupt, fallback, 401 refresh).

2. Discord VC synthetic events now use chat_type="channel" instead of
   defaulting to "dm" — prevents session bleed into DM context.
   Authorization runs before echoing transcript. Sanitize @everyone/@here
   in voice transcripts.

3. CLI voice prefix ("[Voice input...]") is now API-call-local only —
   stripped from returned history so it never persists to session DB or
   resumed sessions.

4. /voice off now disables base adapter auto-TTS via _auto_tts_disabled_chats
   set — voice input no longer triggers TTS when voice mode is off.
2026-03-14 14:27:21 +03:00
0xbyt4
69cb373864 fix: update /voice status to show correct STT provider
Voice status was hardcoded to check API keys only. Now uses the actual
provider resolution (local/groq/openai) so it correctly shows
"local faster-whisper" when installed instead of "Groq" or "MISSING".
2026-03-14 14:27:21 +03:00
0xbyt4
0ff1b4ade2 fix: harden web gateway security and fix error swallowing
- Use hmac.compare_digest for timing-safe token comparison (3 endpoints)
- Default bind to 127.0.0.1 instead of 0.0.0.0
- Sanitize upload filenames with Path.name to prevent path traversal
- Add DOMPurify to sanitize marked.parse() output against XSS
- Replace add_static with authenticated media handler
- Hide token in group chats for /remote-control command
- Use ctypes.util.find_library for Opus instead of hardcoded paths
- Add force=True to 5 interrupt _vprint calls for visibility
- Log Opus decode errors and voice restart failures instead of swallowing
2026-03-14 14:27:21 +03:00
0xbyt4
0a8985acf9 fix: add missing load_config import in _show_voice_status 2026-03-14 14:27:21 +03:00
0xbyt4
39a77431e2 fix: use shutdown() instead of cancel() on CLI exit to release persistent audio stream 2026-03-14 14:27:20 +03:00
0xbyt4
eb79dda04b fix: persistent audio stream and silence detection improvements
- Keep InputStream alive across recordings to avoid CoreAudio hang on
  repeated open/close cycles on macOS.  New _ensure_stream() creates the
  stream once; start()/stop()/cancel() only toggle frame collection.
- Add _close_stream_with_timeout() with daemon thread to prevent
  stream.stop()/close() from blocking indefinitely.
- Add generation counter to detect stale stream-open completions after
  cancel or restart.
- Run recorder.cancel() in background thread from Ctrl+C handler to
  keep the event loop responsive.
- Add shutdown() method called on /voice off to release audio resources.
- Fix silence timer reset during active speech: use dip tolerance for
  _resume_start tracker so natural speech pauses (< 0.3s) don't prevent
  the silence timer from being reset.
- Update tests to match persistent stream behavior.
2026-03-14 14:27:20 +03:00
0xbyt4
c3dc4448bf fix: disable STT retries and stop continuous mode after 3 silent cycles
- Set max_retries=0 on the STT OpenAI client. The SDK default (2) honors
  Groq's retry-after header (often 53s), blocking the thread for up to
  ~106s on rate limits. Voice STT should fail fast, not retry silently.
- Stop continuous recording mode after 3 consecutive no-speech cycles to
  prevent infinite restart loops when nobody is talking.
2026-03-14 14:27:20 +03:00
0xbyt4
0a89933f9b fix: add STT timeout, move finally restart to thread, guard exit on recording
- Set OpenAI client timeout=30s in transcribe_audio() — default 600s
  blocks _voice_processing for 10 min if Groq/OpenAI stalls
- Move _voice_start_recording in _voice_stop_and_transcribe finally
  block to a daemon thread (same pattern as Ctrl+B handler and
  process_loop)
- Add _should_exit guard at top of _voice_start_recording so all 4
  call sites respect shutdown without individual checks
2026-03-14 14:27:20 +03:00
0xbyt4
9d58cafec9 fix: move process_loop voice restart to daemon thread, use _cprint consistently
- process_loop's continuous mode restart called _voice_start_recording()
  directly, blocking the loop if play_beep/sd.wait hangs — queued user
  input would stall silently. Dispatch to daemon thread like Ctrl+B handler.
- Replace print() with _cprint() in _handle_voice_command for consistency
  with the rest of the voice mode code.
2026-03-14 14:27:20 +03:00
0xbyt4
d0e3b39e69 fix: prevent Ctrl+B key handler from blocking prompt_toolkit event loop
The handle_voice_record key binding runs in prompt_toolkit's event-loop
thread. When silence auto-stopped recording, _voice_recording was False
but recorder.stop() still held AudioRecorder._lock. A concurrent Ctrl+B
press entered the START path and blocked on that lock, freezing all
keyboard input.

Three changes:
- Set _voice_processing atomically with _voice_recording=False in
  _voice_stop_and_transcribe to close the race window
- Add _voice_processing guard in the START path to prevent starting
  while stop/transcribe is still running
- Dispatch _voice_start_recording to a daemon thread so play_beep
  (sd.wait) and AudioRecorder.start (lock acquire) never block the
  event loop
2026-03-14 14:27:20 +03:00
0xbyt4
ddfd6e0c59 fix: resolve 6 voice mode bugs found during audit
- edge_tts NameError: _generate_edge_tts now calls _import_edge_tts()
  instead of referencing bare module name (tts_tool.py)
- TTS thread leak: chat() finally block sends sentinel to text_queue,
  sets stop_event, and joins tts_thread on exception paths (cli.py)
- output_stream leak: moved close() into finally block so audio device
  is released even on exception (tts_tool.py)
- Ctrl+C continuous mode: cancel handler now resets _voice_continuous
  to prevent auto-restart after user cancels recording (cli.py)
- _disable_voice_mode: now calls stop_playback() and sets
  _voice_tts_done so TTS stops when voice mode is turned off (cli.py)
- _show_voice_status: reads record key from config instead of
  hardcoding Ctrl+B (cli.py)
2026-03-14 14:27:20 +03:00
0xbyt4
a78249230c fix: address voice mode PR review (streaming TTS, prompt cache, _vprint)
Bug A: Replace stale _HAS_ELEVENLABS/_HAS_AUDIO boolean imports with
lazy import function calls (_import_elevenlabs, _import_sounddevice).
The old constants no longer exist in tts_tool -- the try/except
silently swallowed the ImportError, leaving streaming TTS dead.

Bug B: Use user message prefix instead of modifying system prompt for
voice mode instruction. Changing ephemeral_system_prompt mid-session
invalidates the prompt cache. Now the concise-response hint is
prepended to the user_message passed to run_conversation while
conversation_history keeps the original text.

Minor: Add force parameter to _vprint so critical error messages
(max retries, non-retryable errors, API failures) are always shown
even during streaming TTS playback.

Tests: 15 new tests in test_voice_cli_integration.py covering all
three fixes -- lazy import activation, message prefix behavior,
history cleanliness, system prompt stability, and AST verification
that all critical _vprint calls use force=True.
2026-03-14 14:27:20 +03:00
0xbyt4
fc893f98f4 fix: wrap sd.InputStream in try-except and fix config key name
- AudioRecorder.start() now catches InputStream errors gracefully
  with a clear error message about microphone availability
- Fix config key mismatch: cli.py was reading "push_to_talk_key"
  but config.py defines "record_key" -- now consistent
- Add format conversion from config format ("ctrl+b") to
  prompt_toolkit format ("c-b")
2026-03-14 14:27:20 +03:00
0xbyt4
a8838a7ae5 fix: replace all hardcoded Ctrl+R references with Ctrl+B 2026-03-14 14:27:20 +03:00
0xbyt4
b859dfab16 fix: address voice mode review feedback
1. Fully lazy imports: sounddevice, numpy, elevenlabs, edge_tts, and
   openai are never imported at module level. Each is imported only when
   the feature is explicitly activated, preventing crashes in headless
   environments (SSH, Docker, WSL, no PortAudio).

2. No core agent loop changes: streaming TTS path extracted from
   _interruptible_api_call() into separate _streaming_api_call() method.
   The original method is restored to its upstream form.

3. Configurable key binding: push-to-talk key changed from Ctrl+R
   (conflicts with readline reverse-search) to Ctrl+B by default.
   Configurable via voice.push_to_talk_key in config.yaml.

4. Environment detection: new detect_audio_environment() function checks
   for SSH, Docker, WSL, and missing audio devices before enabling voice
   mode. Auto-disables with clear warnings in incompatible environments.

5. Graceful degradation: every audio touchpoint (sd.play, sd.InputStream,
   sd.OutputStream) wrapped in try/except with ImportError/OSError
   handling. Failures produce warnings, not crashes.
2026-03-14 14:27:20 +03:00
0xbyt4
46db7aeffd fix: streaming tool call parsing, error handling, and fake HA state mutation
- Fix Gemini streaming tool call merge bug: multiple tool calls with same
  index but different IDs are now parsed as separate calls instead of
  concatenating names (e.g. ha_call_serviceha_call_service)
- Handle partial results in voice mode: show error and stop continuous
  mode when agent returns partial/failed results with empty response
- Fix error display during streaming TTS: error messages are shown in
  full response box even when streaming box was already opened
- Add duplicate sentence filter in TTS: skip near-duplicate sentences
  from LLM repetition
- Fix fake HA server state mutation: turn_on/turn_off/set_temperature
  correctly update entity states; temperature sensor simulates change
  when thermostat is adjusted
2026-03-14 14:27:20 +03:00
0xbyt4
404123aea7 feat: add persistent voice mode status bar below input area
Shows voice state (recording, transcribing, TTS/continuous toggles)
as a persistent toolbar using prompt_toolkit ConditionalContainer.
2026-03-14 14:27:20 +03:00
0xbyt4
b00c5949fc fix: suppress verbose logs during streaming TTS, improve hallucination filter, stop continuous mode on errors
- Add _vprint() helper to suppress log output when stream_callback is active
- Expand Whisper hallucination filter with multi-language phrases and regex pattern for repetitive text
- Stop continuous voice mode when agent returns a failed result (e.g. 429 rate limit)
2026-03-14 14:26:55 +03:00
0xbyt4
3a1b35ed92 fix: voice mode race conditions, temp file leak, think tag parsing
- Atomic check-and-set for _voice_recording flag with _voice_lock
- Guard _voice_stop_and_transcribe against concurrent invocation
- Remove premature flag clearing from Ctrl+R handler
- Clean up temp WAV files in finally block (_play_via_tempfile)
- Use buffer-level regex for <think> block filtering (handles chunked tags)
- Prevent /voice on prompt accumulation on repeated calls
- Include Groq in STT key error message
2026-03-14 14:26:55 +03:00
0xbyt4
7d4b4e95f1 feat: sync text display with TTS audio playback
Move screen output from stream_callback to display_callback called by
TTS consumer thread. Text now appears sentence-by-sentence in sync with
audio instead of streaming ahead at LLM speed. Removes quiet_mode hack.
2026-03-14 14:26:55 +03:00
0xbyt4
179d9e1a22 feat: add streaming sentence-by-sentence TTS via ElevenLabs
Stream audio to speaker as the agent generates tokens instead of
waiting for the full response. First sentence plays within ~1-2s
of agent starting to respond.

- run_agent: add stream_callback to run_conversation/chat, streaming
  path in _interruptible_api_call accumulates chunks into mock
  ChatCompletion while forwarding content deltas to callback
- tts_tool: add stream_tts_to_speaker() with sentence buffering,
  think block filtering, markdown stripping, ElevenLabs pcm_24000
  streaming to sounddevice OutputStream
- cli: wire up streaming TTS pipeline in chat(), detect elevenlabs
  provider + sounddevice availability, skip batch TTS when streaming
  is active, signal stop on interrupt

Falls back to batch TTS for Edge/OpenAI providers or when
elevenlabs/sounddevice are not available. Zero impact on non-voice
mode (callback defaults to None).
2026-03-14 14:26:30 +03:00
0xbyt4
d7425343ee fix: fix voice recording stuck in continuous mode
- Track submitted state locally instead of using racy qsize() check
- Allow Ctrl+R to stop recording even while agent is running
- Add double-start guard to prevent concurrent recording attempts
2026-03-14 14:26:30 +03:00
0xbyt4
dad865e920 fix: fix silence detection bugs and add Phase 4 voice mode features
Fix 3 critical bugs in silence detection:
- Micro-pause tolerance now tracks dip duration (not time since speech start)
- Peak RMS check in stop() prevents discarding recordings with real speech
- Reduced min_speech_duration from 0.5s to 0.3s for reliable speech confirmation

Phase 4 features: configurable silence params, visual audio level indicator,
voice system prompt, tool call audio cues, TTS interrupt, continuous mode
auto-restart, interruptable playback via Popen tracking.
2026-03-14 14:26:30 +03:00