Commit Graph

10 Commits

Author SHA1 Message Date
0xbyt4
eb34c0b09a fix: voice pipeline hardening — 7 bug fixes with tests
1. Anthropic + ElevenLabs TTS silence: forward full response to TTS
   callback for non-streaming providers (choices first, then native
   content blocks fallback).

2. Subprocess timeout kill: play_audio_file now kills the process on
   TimeoutExpired instead of leaving zombie processes.

3. Discord disconnect cleanup: leave all voice channels before closing
   the client to prevent leaked state.

4. Audio stream leak: close InputStream if stream.start() fails.

5. Race condition: read/write _on_silence_stop under lock in audio
   callback thread.

6. _vprint force=True: show API error, retry, and truncation messages
   even during streaming TTS.

7. _refresh_level lock: read _voice_recording under _voice_lock.
2026-03-14 14:27:21 +03:00
0xbyt4
34c324ff59 fix(test): use real _strip_markdown_for_tts instead of duplicated copy
- Import from tools.tts_tool instead of reimplementing the logic
- Fix test_truncates_long_text: truncation is the caller's job, not the function's
- Remove unused re import
2026-03-14 14:27:20 +03:00
0xbyt4
eb79dda04b fix: persistent audio stream and silence detection improvements
- Keep InputStream alive across recordings to avoid CoreAudio hang on
  repeated open/close cycles on macOS.  New _ensure_stream() creates the
  stream once; start()/stop()/cancel() only toggle frame collection.
- Add _close_stream_with_timeout() with daemon thread to prevent
  stream.stop()/close() from blocking indefinitely.
- Add generation counter to detect stale stream-open completions after
  cancel or restart.
- Run recorder.cancel() in background thread from Ctrl+C handler to
  keep the event loop responsive.
- Add shutdown() method called on /voice off to release audio resources.
- Fix silence timer reset during active speech: use dip tolerance for
  _resume_start tracker so natural speech pauses (< 0.3s) don't prevent
  the silence timer from being reset.
- Update tests to match persistent stream behavior.
2026-03-14 14:27:20 +03:00
0xbyt4
9d58cafec9 fix: move process_loop voice restart to daemon thread, use _cprint consistently
- process_loop's continuous mode restart called _voice_start_recording()
  directly, blocking the loop if play_beep/sd.wait hangs — queued user
  input would stall silently. Dispatch to daemon thread like Ctrl+B handler.
- Replace print() with _cprint() in _handle_voice_command for consistency
  with the rest of the voice mode code.
2026-03-14 14:27:20 +03:00
0xbyt4
d0e3b39e69 fix: prevent Ctrl+B key handler from blocking prompt_toolkit event loop
The handle_voice_record key binding runs in prompt_toolkit's event-loop
thread. When silence auto-stopped recording, _voice_recording was False
but recorder.stop() still held AudioRecorder._lock. A concurrent Ctrl+B
press entered the START path and blocked on that lock, freezing all
keyboard input.

Three changes:
- Set _voice_processing atomically with _voice_recording=False in
  _voice_stop_and_transcribe to close the race window
- Add _voice_processing guard in the START path to prevent starting
  while stop/transcribe is still running
- Dispatch _voice_start_recording to a daemon thread so play_beep
  (sd.wait) and AudioRecorder.start (lock acquire) never block the
  event loop
2026-03-14 14:27:20 +03:00
0xbyt4
ecc3dd7c63 test: add comprehensive voice mode test coverage (86 tests)
- Add TestStreamingApiCall (11 tests) for _streaming_api_call in test_run_agent.py
- Add regression tests for all 7 bug fixes (edge_tts lazy import, output_stream
  cleanup, ctrl+c continuous reset, disable stops TTS, config key, chat cleanup,
  browser_tool signal handler removal)
- Add real behavior tests for CLI voice methods via _make_voice_cli() fixture:
  TestHandleVoiceCommandReal (7), TestEnableVoiceModeReal (7),
  TestDisableVoiceModeReal (6), TestVoiceSpeakResponseReal (7),
  TestVoiceStopAndTranscribeReal (12)
2026-03-14 14:27:20 +03:00
0xbyt4
6e51729c4c fix: remove browser_tool signal handlers that cause voice mode deadlock
browser_tool.py registered SIGINT/SIGTERM handlers that called sys.exit()
at module import time. When a signal arrived during a lock acquisition
(e.g. AudioRecorder._lock in voice mode), SystemExit was raised inside
prompt_toolkit's async event loop, corrupting coroutine state and making
the process unkillable (required SIGKILL).

atexit handler already ensures browser sessions are cleaned up on any
normal exit path, so the signal handlers were redundant and harmful.
2026-03-14 14:27:20 +03:00
0xbyt4
ddfd6e0c59 fix: resolve 6 voice mode bugs found during audit
- edge_tts NameError: _generate_edge_tts now calls _import_edge_tts()
  instead of referencing bare module name (tts_tool.py)
- TTS thread leak: chat() finally block sends sentinel to text_queue,
  sets stop_event, and joins tts_thread on exception paths (cli.py)
- output_stream leak: moved close() into finally block so audio device
  is released even on exception (tts_tool.py)
- Ctrl+C continuous mode: cancel handler now resets _voice_continuous
  to prevent auto-restart after user cancels recording (cli.py)
- _disable_voice_mode: now calls stop_playback() and sets
  _voice_tts_done so TTS stops when voice mode is turned off (cli.py)
- _show_voice_status: reads record key from config instead of
  hardcoding Ctrl+B (cli.py)
2026-03-14 14:27:20 +03:00
0xbyt4
a78249230c fix: address voice mode PR review (streaming TTS, prompt cache, _vprint)
Bug A: Replace stale _HAS_ELEVENLABS/_HAS_AUDIO boolean imports with
lazy import function calls (_import_elevenlabs, _import_sounddevice).
The old constants no longer exist in tts_tool -- the try/except
silently swallowed the ImportError, leaving streaming TTS dead.

Bug B: Use user message prefix instead of modifying system prompt for
voice mode instruction. Changing ephemeral_system_prompt mid-session
invalidates the prompt cache. Now the concise-response hint is
prepended to the user_message passed to run_conversation while
conversation_history keeps the original text.

Minor: Add force parameter to _vprint so critical error messages
(max retries, non-retryable errors, API failures) are always shown
even during streaming TTS playback.

Tests: 15 new tests in test_voice_cli_integration.py covering all
three fixes -- lazy import activation, message prefix behavior,
history cleanliness, system prompt stability, and AST verification
that all critical _vprint calls use force=True.
2026-03-14 14:27:20 +03:00
0xbyt4
c23928d089 fix: improve voice mode robustness and add integration tests
- Show TTS errors to user instead of silently logging
- Improve markdown stripping: code blocks, URLs, links, horizontal rules
- Fix stripping order: process markdown links before removing URLs
- Add threading.Lock for voice state variables (cross-thread safety)
- Add 14 CLI integration tests (markdown stripping, command parsing, thread safety)
- Total: 47 voice-related tests
2026-03-14 14:25:28 +03:00