Commit Graph

501 Commits

Author SHA1 Message Date
0xbyt4
e50323f730 fix(test): add missing _voice_mode attr to GatewayRunner test stubs 2026-03-14 14:27:20 +03:00
0xbyt4
f6cf4ca826 feat: add /voice slash command to Discord + fix cross-platform send_voice
- Register /voice as Discord slash command with mode choices
- Fix _send_voice_reply to handle adapters that don't accept metadata
  parameter (Discord) by inspecting the method signature at runtime
2026-03-14 14:27:20 +03:00
0xbyt4
d80da5ddd8 feat: add /voice command for auto voice reply in Telegram gateway
- /voice on: reply with voice when user sends voice messages
- /voice tts: reply with voice to all messages
- /voice off: disable, text-only replies
- /voice status: show current mode
- Per-chat state persisted to gateway_voice_mode.json
- Dedup: skips auto-reply if agent already called text_to_speech tool
- drop_pending_updates=True to ignore stale Telegram messages on restart
- 25 tests covering command handler, reply logic, and edge cases
2026-03-14 14:27:20 +03:00
0xbyt4
eb79dda04b fix: persistent audio stream and silence detection improvements
- Keep InputStream alive across recordings to avoid CoreAudio hang on
  repeated open/close cycles on macOS.  New _ensure_stream() creates the
  stream once; start()/stop()/cancel() only toggle frame collection.
- Add _close_stream_with_timeout() with daemon thread to prevent
  stream.stop()/close() from blocking indefinitely.
- Add generation counter to detect stale stream-open completions after
  cancel or restart.
- Run recorder.cancel() in background thread from Ctrl+C handler to
  keep the event loop responsive.
- Add shutdown() method called on /voice off to release audio resources.
- Fix silence timer reset during active speech: use dip tolerance for
  _resume_start tracker so natural speech pauses (< 0.3s) don't prevent
  the silence timer from being reset.
- Update tests to match persistent stream behavior.
2026-03-14 14:27:20 +03:00
0xbyt4
eec04d180a fix(test): update play_beep test to match polling-based implementation
play_beep was changed from sd.wait() to a poll loop + sd.stop() in
302e1fe but the test was not updated. Now asserts sd.stop() instead
of sd.wait().
2026-03-14 14:27:20 +03:00
0xbyt4
9d58cafec9 fix: move process_loop voice restart to daemon thread, use _cprint consistently
- process_loop's continuous mode restart called _voice_start_recording()
  directly, blocking the loop if play_beep/sd.wait hangs — queued user
  input would stall silently. Dispatch to daemon thread like Ctrl+B handler.
- Replace print() with _cprint() in _handle_voice_command for consistency
  with the rest of the voice mode code.
2026-03-14 14:27:20 +03:00
0xbyt4
d0e3b39e69 fix: prevent Ctrl+B key handler from blocking prompt_toolkit event loop
The handle_voice_record key binding runs in prompt_toolkit's event-loop
thread. When silence auto-stopped recording, _voice_recording was False
but recorder.stop() still held AudioRecorder._lock. A concurrent Ctrl+B
press entered the START path and blocked on that lock, freezing all
keyboard input.

Three changes:
- Set _voice_processing atomically with _voice_recording=False in
  _voice_stop_and_transcribe to close the race window
- Add _voice_processing guard in the START path to prevent starting
  while stop/transcribe is still running
- Dispatch _voice_start_recording to a daemon thread so play_beep
  (sd.wait) and AudioRecorder.start (lock acquire) never block the
  event loop
2026-03-14 14:27:20 +03:00
0xbyt4
ecc3dd7c63 test: add comprehensive voice mode test coverage (86 tests)
- Add TestStreamingApiCall (11 tests) for _streaming_api_call in test_run_agent.py
- Add regression tests for all 7 bug fixes (edge_tts lazy import, output_stream
  cleanup, ctrl+c continuous reset, disable stops TTS, config key, chat cleanup,
  browser_tool signal handler removal)
- Add real behavior tests for CLI voice methods via _make_voice_cli() fixture:
  TestHandleVoiceCommandReal (7), TestEnableVoiceModeReal (7),
  TestDisableVoiceModeReal (6), TestVoiceSpeakResponseReal (7),
  TestVoiceStopAndTranscribeReal (12)
2026-03-14 14:27:20 +03:00
0xbyt4
6e51729c4c fix: remove browser_tool signal handlers that cause voice mode deadlock
browser_tool.py registered SIGINT/SIGTERM handlers that called sys.exit()
at module import time. When a signal arrived during a lock acquisition
(e.g. AudioRecorder._lock in voice mode), SystemExit was raised inside
prompt_toolkit's async event loop, corrupting coroutine state and making
the process unkillable (required SIGKILL).

atexit handler already ensures browser sessions are cleaned up on any
normal exit path, so the signal handlers were redundant and harmful.
2026-03-14 14:27:20 +03:00
0xbyt4
ddfd6e0c59 fix: resolve 6 voice mode bugs found during audit
- edge_tts NameError: _generate_edge_tts now calls _import_edge_tts()
  instead of referencing bare module name (tts_tool.py)
- TTS thread leak: chat() finally block sends sentinel to text_queue,
  sets stop_event, and joins tts_thread on exception paths (cli.py)
- output_stream leak: moved close() into finally block so audio device
  is released even on exception (tts_tool.py)
- Ctrl+C continuous mode: cancel handler now resets _voice_continuous
  to prevent auto-restart after user cancels recording (cli.py)
- _disable_voice_mode: now calls stop_playback() and sets
  _voice_tts_done so TTS stops when voice mode is turned off (cli.py)
- _show_voice_status: reads record key from config instead of
  hardcoding Ctrl+B (cli.py)
2026-03-14 14:27:20 +03:00
0xbyt4
a78249230c fix: address voice mode PR review (streaming TTS, prompt cache, _vprint)
Bug A: Replace stale _HAS_ELEVENLABS/_HAS_AUDIO boolean imports with
lazy import function calls (_import_elevenlabs, _import_sounddevice).
The old constants no longer exist in tts_tool -- the try/except
silently swallowed the ImportError, leaving streaming TTS dead.

Bug B: Use user message prefix instead of modifying system prompt for
voice mode instruction. Changing ephemeral_system_prompt mid-session
invalidates the prompt cache. Now the concise-response hint is
prepended to the user_message passed to run_conversation while
conversation_history keeps the original text.

Minor: Add force parameter to _vprint so critical error messages
(max retries, non-retryable errors, API failures) are always shown
even during streaming TTS playback.

Tests: 15 new tests in test_voice_cli_integration.py covering all
three fixes -- lazy import activation, message prefix behavior,
history cleanliness, system prompt stability, and AST verification
that all critical _vprint calls use force=True.
2026-03-14 14:27:20 +03:00
0xbyt4
b859dfab16 fix: address voice mode review feedback
1. Fully lazy imports: sounddevice, numpy, elevenlabs, edge_tts, and
   openai are never imported at module level. Each is imported only when
   the feature is explicitly activated, preventing crashes in headless
   environments (SSH, Docker, WSL, no PortAudio).

2. No core agent loop changes: streaming TTS path extracted from
   _interruptible_api_call() into separate _streaming_api_call() method.
   The original method is restored to its upstream form.

3. Configurable key binding: push-to-talk key changed from Ctrl+R
   (conflicts with readline reverse-search) to Ctrl+B by default.
   Configurable via voice.push_to_talk_key in config.yaml.

4. Environment detection: new detect_audio_environment() function checks
   for SSH, Docker, WSL, and missing audio devices before enabling voice
   mode. Auto-disables with clear warnings in incompatible environments.

5. Graceful degradation: every audio touchpoint (sd.play, sd.InputStream,
   sd.OutputStream) wrapped in try/except with ImportError/OSError
   handling. Failures produce warnings, not crashes.
2026-03-14 14:27:20 +03:00
0xbyt4
143cc68946 fix(test): add /voice to EXPECTED_COMMANDS set in test_commands.py 2026-03-14 14:27:20 +03:00
0xbyt4
46db7aeffd fix: streaming tool call parsing, error handling, and fake HA state mutation
- Fix Gemini streaming tool call merge bug: multiple tool calls with same
  index but different IDs are now parsed as separate calls instead of
  concatenating names (e.g. ha_call_serviceha_call_service)
- Handle partial results in voice mode: show error and stop continuous
  mode when agent returns partial/failed results with empty response
- Fix error display during streaming TTS: error messages are shown in
  full response box even when streaming box was already opened
- Add duplicate sentence filter in TTS: skip near-duplicate sentences
  from LLM repetition
- Fix fake HA server state mutation: turn_on/turn_off/set_temperature
  correctly update entity states; temperature sensor simulates change
  when thermostat is adjusted
2026-03-14 14:27:20 +03:00
0xbyt4
dad865e920 fix: fix silence detection bugs and add Phase 4 voice mode features
Fix 3 critical bugs in silence detection:
- Micro-pause tolerance now tracks dip duration (not time since speech start)
- Peak RMS check in stop() prevents discarding recordings with real speech
- Reduced min_speech_duration from 0.5s to 0.3s for reliable speech confirmation

Phase 4 features: configurable silence params, visual audio level indicator,
voice system prompt, tool call audio cues, TTS interrupt, continuous mode
auto-restart, interruptable playback via Popen tracking.
2026-03-14 14:26:30 +03:00
0xbyt4
32b033c11c feat: add silence filter, hallucination guard, and continuous mode control
- Skip silent recordings before STT call (RMS check in AudioRecorder.stop)
- Filter known Whisper hallucinations ("Thank you.", "Bye." etc.)
- Continuous mode: Ctrl+R starts loop, Ctrl+R during recording exits it
- Wait for TTS to finish before auto-restart to avoid recording speaker
- Silence timeout increased to 3s for natural pauses
- Tests: hallucination filter, silent recording skip, real speech passthrough
2026-03-14 14:25:28 +03:00
0xbyt4
bfd9c97705 feat: add Phase 4 low-latency features for voice mode
- Audio cues: beep on record start (880Hz), double beep on stop (660Hz)
- Silence detection: auto-stop recording after 3s of silence (RMS-based)
- Continuous mode: auto-restart recording after agent responds
  - Ctrl+R starts continuous mode, Ctrl+R during recording exits it
  - Waits for TTS to finish before restarting to avoid recording speaker
- Tests: 7 new tests for beep generation and silence detection
2026-03-14 14:25:28 +03:00
0xbyt4
a69bd55b5a fix: isolate GROQ_API_KEY in test_missing_stt_key test
The test was failing because GROQ_API_KEY leaked from the environment.
Now both VOICE_TOOLS_OPENAI_KEY and GROQ_API_KEY are removed to
properly test the "no STT key" scenario.
2026-03-14 14:25:28 +03:00
0xbyt4
c23928d089 fix: improve voice mode robustness and add integration tests
- Show TTS errors to user instead of silently logging
- Improve markdown stripping: code blocks, URLs, links, horizontal rules
- Fix stripping order: process markdown links before removing URLs
- Add threading.Lock for voice state variables (cross-thread safety)
- Add 14 CLI integration tests (markdown stripping, command parsing, thread safety)
- Total: 47 voice-related tests
2026-03-14 14:25:28 +03:00
0xbyt4
37b01ab964 test: add transcription_tools tests for multi-provider STT
- Provider resolution: OpenAI priority, Groq fallback, no keys
- Model auto-correction: Groq corrects OpenAI models and vice versa
- Success path: transcription, API errors, whitespace stripping
- 12 new tests, 33 total voice-related tests
2026-03-14 14:25:28 +03:00
0xbyt4
1a6fbef8a9 feat: add voice mode with push-to-talk and TTS output for CLI
Implements Issue #314 Phase 2 & 3:
- /voice command to toggle voice mode (on/off/tts/status)
- Ctrl+Space push-to-talk recording via sounddevice
- Whisper STT transcription via existing transcription_tools
- Optional TTS response playback via existing tts_tool
- Visual indicators in prompt (recording/transcribing/voice)
- 21 unit tests, all mocked (no real mic/API)
- Optional deps: sounddevice, numpy (pip install hermes-agent[voice])
2026-03-14 14:25:28 +03:00
Teknium
cf3dceafe1 Merge pull request #1284 from NousResearch/hermes/hermes-de3d4e49-pr964
fix: show effective model and provider in status
2026-03-14 03:42:16 -07:00
teknium1
b5a7e807d0 test: cover provider label formatting 2026-03-14 03:39:12 -07:00
luisv-1
c2c37ef158 Show configured model and provider in status output
Made-with: Cursor
2026-03-14 03:35:37 -07:00
Teknium
2f8dbe4e77 Merge pull request #1283 from NousResearch/hermes/hermes-f48b210a
fix(setup): salvage keep-current provider handling from #951
2026-03-14 03:26:44 -07:00
Teknium
95d49401ee Merge pull request #1282 from NousResearch/hermes/hermes-cc060dd9
fix(cli): make TUI prompt and accent output skin-aware
2026-03-14 03:24:24 -07:00
StefanIsMe
26f8b790c9 fix(setup): persist provider when switching model endpoints 2026-03-14 03:21:46 -07:00
Teknium
7901d863dd Merge pull request #1280 from NousResearch/hermes/hermes-de3d4e49-pr944
fix: make session log writes reuse shared atomic JSON helper
2026-03-14 03:15:52 -07:00
teknium1
e9a7441c9b test: restore default event loop for sync tests 2026-03-14 03:14:34 -07:00
Wayne
41f22de20f fix(cli): make TUI prompt and accent output skin-aware
Salvaged from PR #932 by Wayne onto current main.

Apply skin-aware prompt symbols and live prompt_toolkit color refresh,
replace lingering hardcoded accent output with active-skin colors, keep
ANSI-safe response rendering, preserve secret-capture and approval-prompt
state handling, and add integration coverage for prompt state and style
refresh behavior.
2026-03-14 03:12:52 -07:00
Teknium
b91cac7b4b test: make gateway async tests xdist-safe (#1281)
* feat: improve context compaction handoff summaries

Adapt PR #916 onto current main by replacing the old context summary marker
with a clearer handoff wrapper, updating the summarization prompt for
resume-oriented summaries, and preserving the current call_llm-based
compression path.

* fix: clearer error when docker backend is unavailable

* fix: preserve docker discovery in backend preflight

Follow up on salvaged PR #940 by reusing find_docker() during the new
availability check so non-PATH Docker Desktop installs still work. Add
a regression test covering the resolved executable path.

* test: make gateway async tests xdist-safe

Replace sync test usage of asyncio.get_event_loop().run_until_complete()
with asyncio.run() so tests do not depend on an ambient current event loop.
Also create the email disconnect poll task inside a running loop. This fixes
xdist/CI failures where workers have no current loop in MainThread.

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-14 03:12:15 -07:00
kshitij
0bb7ed1d95 refactor: salvage adapter and CLI cleanup from PR #939
Salvaged from PR #939 by kshitij.

- deduplicate Discord slash command dispatch and local file send helpers
- deduplicate Slack file uploads while preserving thread metadata
- extract shared CLI session relative-time formatting
- hoist browser PATH cleanup constants and throttle screenshot pruning
- tidy small type and import cleanups
2026-03-14 03:07:11 -07:00
Teknium
f279bb004f Merge pull request #1278 from NousResearch/hermes/hermes-f48b210a
test: fix gateway async tests without implicit event loop
2026-03-14 02:57:47 -07:00
teknium1
cbbba87099 fix: reuse shared atomic session log helper 2026-03-14 02:56:13 -07:00
Teknium
6036793f60 fix: clearer docker backend preflight errors (#1276)
* feat: improve context compaction handoff summaries

Adapt PR #916 onto current main by replacing the old context summary marker
with a clearer handoff wrapper, updating the summarization prompt for
resume-oriented summaries, and preserving the current call_llm-based
compression path.

* fix: clearer error when docker backend is unavailable

* fix: preserve docker discovery in backend preflight

Follow up on salvaged PR #940 by reusing find_docker() during the new
availability check so non-PATH Docker Desktop installs still work. Add
a regression test covering the resolved executable path.

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-14 02:53:02 -07:00
teknium1
115dd17b3c test: fix gateway async test event loop usage
Use asyncio.run in sync tests that were relying on an implicit current event loop. This makes the gateway send-image and Slack connect tests pass reliably under Python 3.11+ and xdist workers.
2026-03-14 02:52:47 -07:00
Teknium
486cb772b8 Merge pull request #1275 from NousResearch/hermes/hermes-f48b210a
feat(gateway): salvage reasoning hot reload from #938
2026-03-14 02:47:11 -07:00
Teknium
11e6775f98 Merge pull request #1274 from NousResearch/hermes/hermes-de3d4e49-pr920
fix: handle headless setup flows end-to-end
2026-03-14 02:45:19 -07:00
Verne
52ba940c9b feat(gateway): add reasoning hot reload
Add a /reasoning command across gateway adapters so users can
inspect or change reasoning effort without editing config by hand.

Reload reasoning settings from config.yaml before each agent run,
including background tasks, so the next message picks up the new
value consistently.
2026-03-14 02:42:47 -07:00
teknium1
9492f42aa7 fix: cover headless first-run setup flow 2026-03-14 02:37:29 -07:00
Teknium
5c479eedf1 feat: improve context compaction handoff summaries (#1273)
Adapt PR #916 onto current main by replacing the old context summary marker
with a clearer handoff wrapper, updating the summarization prompt for
resume-oriented summaries, and preserving the current call_llm-based
compression path.
2026-03-14 02:33:31 -07:00
teyrebaz33
4aa94ae7cc fix: detect non-interactive TTY in setup wizard to prevent hang
hermes setup hung indefinitely on headless SSH sessions, Docker
containers, and CI/CD environments because the interactive provider
selection menu could not receive input.

Two-layer fix:
1. sys.stdin.isatty() check — auto-detects non-interactive environments
2. --non-interactive flag support — already in CLI parser, now honored

In both cases the wizard exits immediately with helpful guidance
pointing users to 'hermes config set' commands.

Closes #905
2026-03-14 02:32:46 -07:00
teknium1
1e23d14568 fix: log prompt builder skill parsing fallbacks 2026-03-14 02:22:17 -07:00
Teknium
1117a21065 Merge pull request #1271 from NousResearch/hermes/hermes-de3d4e49
fix: guard init-time stdio writes
2026-03-14 02:21:39 -07:00
teknium1
936040d8f7 fix: guard init-time stdio writes 2026-03-14 02:19:46 -07:00
Teknium
1869e88169 Merge pull request #1256 from NousResearch/hermes/hermes-720acdad
feat(security): add tirith pre-exec command scanning
2026-03-14 00:24:56 -07:00
teknium1
6f1889b0fa fix: preserve current approval semantics for tirith guard
Restore gateway/run.py to current main behavior while keeping tirith startup
and pattern_keys replay, preserve yolo and non-interactive bypass semantics in
the combined guard, and add regression tests for yolo and view-full flows.
2026-03-14 00:17:04 -07:00
teknium1
f5cf1f8a45 fix(cron): tag persisted cron sessions and test wiring
- store cron-run sessions with source=cron instead of falling back to cli
- close the per-run SessionDB after completion
- add regression coverage for cron session_db/platform wiring
2026-03-14 00:12:34 -07:00
sheeki003
375ce8a881 feat(security): add tirith pre-exec command scanning
Integrate tirith as a pre-execution security scanner that detects
homograph URLs, pipe-to-interpreter patterns, terminal injection,
zero-width Unicode, and environment variable manipulation — threats
the existing 50-pattern dangerous command detector doesn't cover.

Architecture: gather-then-decide — both tirith and the dangerous
command detector run before any approval prompt, preventing gateway
force=True replay from bypassing one check when only the other was
shown to the user.

New files:
- tools/tirith_security.py: subprocess wrapper with auto-installer,
  mandatory cosign provenance verification, non-blocking background
  download, disk-persistent failure markers with retryable-cause
  tracking (cosign_missing auto-clears when cosign appears on PATH)
- tests/tools/test_tirith_security.py: 62 tests covering exit code
  mapping, fail_open, cosign verification, background install,
  HERMES_HOME isolation, and failure recovery
- tests/tools/test_command_guards.py: 21 integration tests for the
  combined guard orchestration

Modified files:
- tools/approval.py: add check_all_command_guards() orchestrator,
  add allow_permanent parameter to prompt_dangerous_approval()
- tools/terminal_tool.py: replace _check_dangerous_command with
  consolidated check_all_command_guards
- cli.py: update _approval_callback for allow_permanent kwarg,
  call ensure_installed() at startup
- gateway/run.py: iterate pattern_keys list on replay approval,
  call ensure_installed() at startup
- hermes_cli/config.py: add security config defaults, split
  commented sections for independent fallback
- cli-config.yaml.example: document tirith security config
2026-03-14 00:11:27 -07:00
Teknium
29176f302e fix: sanitize chat payloads and provider precedence (#1253)
fix: sanitize chat payloads and provider precedence
2026-03-14 00:09:14 -07:00