Remove web UI gateway (web.py, tests, docs, toolset, env vars, Platform.WEB
enum) per maintainer request — Nous is building their own official chat UI.
Fix 1: Replace sd.wait() with polling pattern in play_audio_file() to prevent
indefinite hang when audio device stalls (consistent with play_beep()).
Fix 2: Use importlib.util.find_spec() for faster_whisper/openai availability
checks instead of module-level imports that trigger heavy native library
loading (CUDA/cuDNN) at import time.
Fix 3: Remove inspect.signature() hack in _send_voice_reply() — add **kwargs
to Telegram send_voice() so all adapters accept metadata uniformly.
Fix 4: Make session loading resilient to removed platform enum values — skip
entries with unknown platforms instead of crashing the entire gateway.
- web.py: pass stt_model from config like discord.py and run.py do
- run.py: match new error messages (No STT provider / not set)
- _transcribe_local: add missing "provider": "local" to return dict
Move stream close outside the lock in shutdown() to prevent deadlock
when audio callback tries to acquire the same lock. Replace single
t.join(timeout) with a polling loop (0.1s intervals) so KeyboardInterrupt
is not blocked during stream cleanup.
Voice status was hardcoded to check API keys only. Now uses the actual
provider resolution (local/groq/openai) so it correctly shows
"local faster-whisper" when installed instead of "Groq" or "MISSING".
Merge main's faster-whisper (local, free) with our Groq support into a
unified three-provider STT pipeline: local > groq > openai.
Provider priority ensures free options are tried first. Each provider
has its own transcriber function with model auto-correction, env-
overridable endpoints, and proper error handling.
74 tests cover the full provider matrix, fallback chains, model
correction, config loading, validation edge cases, and dispatch.
Duplicated YAML config parsing for stt.model existed in gateway/run.py
and gateway/platforms/discord.py. Moved to a single helper in
transcription_tools.py and added 5 tests covering all edge cases.
Code fixes:
- STT model, Groq base URL, and OpenAI STT base URL are now
configurable via env vars (STT_GROQ_MODEL, STT_OPENAI_MODEL,
GROQ_BASE_URL, STT_OPENAI_BASE_URL) instead of hardcoded
- Gateway and Discord VC now read stt.model from config.yaml
(previously only CLI did this — gateway always used defaults)
Doc fixes:
- voice-mode.md: move Web UI troubleshooting to web.md (was duplicated)
- voice-mode.md: simplify "How It Works" for end users (remove NaCl,
DAVE, RTP internals)
- voice-mode.md: clarify STT priority (OpenAI used first if both keys
set, Groq recommended for free tier)
- voice-mode.md: document new STT env overrides in config reference
- web.md: remove duplicate Quick Start / Step 1-3 sections
- web.md: add mobile HTTPS mic workarounds (moved from voice-mode.md)
- web.md: clarify STT fallback order
The counter was incremented in start/stop/cancel but never read
anywhere in the codebase. The race condition it was meant to guard
against is practically impossible with the persistent stream design.
- Keep InputStream alive across recordings to avoid CoreAudio hang on
repeated open/close cycles on macOS. New _ensure_stream() creates the
stream once; start()/stop()/cancel() only toggle frame collection.
- Add _close_stream_with_timeout() with daemon thread to prevent
stream.stop()/close() from blocking indefinitely.
- Add generation counter to detect stale stream-open completions after
cancel or restart.
- Run recorder.cancel() in background thread from Ctrl+C handler to
keep the event loop responsive.
- Add shutdown() method called on /voice off to release audio resources.
- Fix silence timer reset during active speech: use dip tolerance for
_resume_start tracker so natural speech pauses (< 0.3s) don't prevent
the silence timer from being reset.
- Update tests to match persistent stream behavior.
AudioRecorder now auto-stops after 15 seconds if no speech is detected
(_has_spoken remains False). In quiet environments where ambient RMS
never exceeds the silence threshold (200), the recording would wait
indefinitely. The new _max_wait parameter fires the silence callback
after the timeout, triggering the normal "No speech detected" flow.
- Set max_retries=0 on the STT OpenAI client. The SDK default (2) honors
Groq's retry-after header (often 53s), blocking the thread for up to
~106s on rate limits. Voice STT should fail fast, not retry silently.
- Stop continuous recording mode after 3 consecutive no-speech cycles to
prevent infinite restart loops when nobody is talking.
- Set OpenAI client timeout=30s in transcribe_audio() — default 600s
blocks _voice_processing for 10 min if Groq/OpenAI stalls
- Move _voice_start_recording in _voice_stop_and_transcribe finally
block to a daemon thread (same pattern as Ctrl+B handler and
process_loop)
- Add _should_exit guard at top of _voice_start_recording so all 4
call sites respect shutdown without individual checks
- Replace sd.wait() with a poll loop + sd.stop() in play_beep().
sd.wait() calls Event.wait() without timeout — hangs forever if the
audio device stalls. Poll with a 2s ceiling and force-stop instead.
- Wrap _on_silence callback in try-except so exceptions are logged
instead of silently lost in the daemon thread. Prevents recording
state from becoming inconsistent on unexpected errors.
browser_tool.py registered SIGINT/SIGTERM handlers that called sys.exit()
at module import time. When a signal arrived during a lock acquisition
(e.g. AudioRecorder._lock in voice mode), SystemExit was raised inside
prompt_toolkit's async event loop, corrupting coroutine state and making
the process unkillable (required SIGKILL).
atexit handler already ensures browser sessions are cleaned up on any
normal exit path, so the signal handlers were redundant and harmful.
- edge_tts NameError: _generate_edge_tts now calls _import_edge_tts()
instead of referencing bare module name (tts_tool.py)
- TTS thread leak: chat() finally block sends sentinel to text_queue,
sets stop_event, and joins tts_thread on exception paths (cli.py)
- output_stream leak: moved close() into finally block so audio device
is released even on exception (tts_tool.py)
- Ctrl+C continuous mode: cancel handler now resets _voice_continuous
to prevent auto-restart after user cancels recording (cli.py)
- _disable_voice_mode: now calls stop_playback() and sets
_voice_tts_done so TTS stops when voice mode is turned off (cli.py)
- _show_voice_status: reads record key from config instead of
hardcoding Ctrl+B (cli.py)
- AudioRecorder.start() now catches InputStream errors gracefully
with a clear error message about microphone availability
- Fix config key mismatch: cli.py was reading "push_to_talk_key"
but config.py defines "record_key" -- now consistent
- Add format conversion from config format ("ctrl+b") to
prompt_toolkit format ("c-b")
1. Fully lazy imports: sounddevice, numpy, elevenlabs, edge_tts, and
openai are never imported at module level. Each is imported only when
the feature is explicitly activated, preventing crashes in headless
environments (SSH, Docker, WSL, no PortAudio).
2. No core agent loop changes: streaming TTS path extracted from
_interruptible_api_call() into separate _streaming_api_call() method.
The original method is restored to its upstream form.
3. Configurable key binding: push-to-talk key changed from Ctrl+R
(conflicts with readline reverse-search) to Ctrl+B by default.
Configurable via voice.push_to_talk_key in config.yaml.
4. Environment detection: new detect_audio_environment() function checks
for SSH, Docker, WSL, and missing audio devices before enabling voice
mode. Auto-disables with clear warnings in incompatible environments.
5. Graceful degradation: every audio touchpoint (sd.play, sd.InputStream,
sd.OutputStream) wrapped in try/except with ImportError/OSError
handling. Failures produce warnings, not crashes.
- Fix Gemini streaming tool call merge bug: multiple tool calls with same
index but different IDs are now parsed as separate calls instead of
concatenating names (e.g. ha_call_serviceha_call_service)
- Handle partial results in voice mode: show error and stop continuous
mode when agent returns partial/failed results with empty response
- Fix error display during streaming TTS: error messages are shown in
full response box even when streaming box was already opened
- Add duplicate sentence filter in TTS: skip near-duplicate sentences
from LLM repetition
- Fix fake HA server state mutation: turn_on/turn_off/set_temperature
correctly update entity states; temperature sensor simulates change
when thermostat is adjusted
- Add _vprint() helper to suppress log output when stream_callback is active
- Expand Whisper hallucination filter with multi-language phrases and regex pattern for repetitive text
- Stop continuous voice mode when agent returns a failed result (e.g. 429 rate limit)
- Atomic check-and-set for _voice_recording flag with _voice_lock
- Guard _voice_stop_and_transcribe against concurrent invocation
- Remove premature flag clearing from Ctrl+R handler
- Clean up temp WAV files in finally block (_play_via_tempfile)
- Use buffer-level regex for <think> block filtering (handles chunked tags)
- Prevent /voice on prompt accumulation on repeated calls
- Include Groq in STT key error message
Move screen output from stream_callback to display_callback called by
TTS consumer thread. Text now appears sentence-by-sentence in sync with
audio instead of streaming ahead at LLM speed. Removes quiet_mode hack.
sounddevice raises OSError (not ImportError) when the PortAudio C
library is missing. This broke test collection on CI runners that
have the Python package installed but lack the native library.
Stream audio to speaker as the agent generates tokens instead of
waiting for the full response. First sentence plays within ~1-2s
of agent starting to respond.
- run_agent: add stream_callback to run_conversation/chat, streaming
path in _interruptible_api_call accumulates chunks into mock
ChatCompletion while forwarding content deltas to callback
- tts_tool: add stream_tts_to_speaker() with sentence buffering,
think block filtering, markdown stripping, ElevenLabs pcm_24000
streaming to sounddevice OutputStream
- cli: wire up streaming TTS pipeline in chat(), detect elevenlabs
provider + sounddevice availability, skip batch TTS when streaming
is active, signal stop on interrupt
Falls back to batch TTS for Edge/OpenAI providers or when
elevenlabs/sounddevice are not available. Zero impact on non-voice
mode (callback defaults to None).
- Audio cues: beep on record start (880Hz), double beep on stop (660Hz)
- Silence detection: auto-stop recording after 3s of silence (RMS-based)
- Continuous mode: auto-restart recording after agent responds
- Ctrl+R starts continuous mode, Ctrl+R during recording exits it
- Waits for TTS to finish before restarting to avoid recording speaker
- Tests: 7 new tests for beep generation and silence detection
- Add multi-provider STT support (OpenAI > Groq fallback) in transcription_tools
- Auto-correct model selection when provider doesn't support the configured model
- Change voice record key from Ctrl+Space to Ctrl+R (macOS compatibility)
- Fix duplicate transcript echo in voice pipeline
- Add GROQ_API_KEY to .env.example
* feat: improve context compaction handoff summaries
Adapt PR #916 onto current main by replacing the old context summary marker
with a clearer handoff wrapper, updating the summarization prompt for
resume-oriented summaries, and preserving the current call_llm-based
compression path.
* fix: clearer error when docker backend is unavailable
* fix: preserve docker discovery in backend preflight
Follow up on salvaged PR #940 by reusing find_docker() during the new
availability check so non-PATH Docker Desktop installs still work. Add
a regression test covering the resolved executable path.
---------
Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
Restore gateway/run.py to current main behavior while keeping tirith startup
and pattern_keys replay, preserve yolo and non-interactive bypass semantics in
the combined guard, and add regression tests for yolo and view-full flows.
Integrate tirith as a pre-execution security scanner that detects
homograph URLs, pipe-to-interpreter patterns, terminal injection,
zero-width Unicode, and environment variable manipulation — threats
the existing 50-pattern dangerous command detector doesn't cover.
Architecture: gather-then-decide — both tirith and the dangerous
command detector run before any approval prompt, preventing gateway
force=True replay from bypassing one check when only the other was
shown to the user.
New files:
- tools/tirith_security.py: subprocess wrapper with auto-installer,
mandatory cosign provenance verification, non-blocking background
download, disk-persistent failure markers with retryable-cause
tracking (cosign_missing auto-clears when cosign appears on PATH)
- tests/tools/test_tirith_security.py: 62 tests covering exit code
mapping, fail_open, cosign verification, background install,
HERMES_HOME isolation, and failure recovery
- tests/tools/test_command_guards.py: 21 integration tests for the
combined guard orchestration
Modified files:
- tools/approval.py: add check_all_command_guards() orchestrator,
add allow_permanent parameter to prompt_dangerous_approval()
- tools/terminal_tool.py: replace _check_dangerous_command with
consolidated check_all_command_guards
- cli.py: update _approval_callback for allow_permanent kwarg,
call ensure_installed() at startup
- gateway/run.py: iterate pattern_keys list on replay approval,
call ensure_installed() at startup
- hermes_cli/config.py: add security config defaults, split
commented sections for independent fallback
- cli-config.yaml.example: document tirith security config
Salvage of PR #1246 by ChatGPT (teknium1 session), resolved against
current main which already includes #1239.
Changes:
- Add minisweagent_path.py: worktree-aware helper that finds
mini-swe-agent/src from either the current checkout or the main
checkout behind a git worktree
- Use the helper in tools/terminal_tool.py and mini_swe_runner.py
instead of naive path-relative lookup that fails in worktrees
- Clean up check_terminal_requirements():
- local: return True (no minisweagent dep, per #1239)
- singularity/ssh: remove unnecessary minisweagent imports
- docker/modal: use importlib.util.find_spec with clear error
- Add regression tests for worktree path discovery and tool resolution
* fix: Home Assistant event filtering now closed by default
Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.
Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)
A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.
All 49 gateway HA tests + 52 HA tool tests pass.
* docs: update Home Assistant integration documentation
- homeassistant.md: Fix event filtering docs to reflect closed-by-default
behavior. Add watch_all option. Replace Python dict config example with
YAML. Fix defaults table (was incorrectly showing 'all'). Add required
configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
diagram, platform toolsets table, and Next Steps links.
* fix(terminal): strip provider env vars from background and PTY subprocesses
Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:
- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)
Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.
Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.
Gap identified by PR #1004 (@PeterFile).
* feat(delegate): add observability metadata to subagent results
Enrich delegate_task results with metadata from the child AIAgent:
- model: which model the child used
- exit_reason: completed | interrupted | max_iterations
- tokens.input / tokens.output: token counts
- tool_trace: per-tool-call trace with byte sizes and ok/error status
Tool trace uses tool_call_id matching to correctly pair parallel tool
calls with their results, with a fallback for messages without IDs.
Cherry-picked from PR #872 by @omerkaz, with fixes:
- Fixed parallel tool call trace pairing (was always updating last entry)
- Removed redundant 'iterations' field (identical to existing 'api_calls')
- Added test for parallel tool call trace correctness
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
* feat(stt): add free local whisper transcription via faster-whisper
Replace OpenAI-only STT with a dual-provider system mirroring the TTS
architecture (Edge TTS free / ElevenLabs paid):
STT: faster-whisper local (free, default) / OpenAI Whisper API (paid)
Changes:
- tools/transcription_tools.py: Full rewrite with provider dispatch,
config loading, local faster-whisper backend, and OpenAI API backend.
Auto-downloads model (~150MB for 'base') on first voice message.
Singleton model instance reused across calls.
- pyproject.toml: Add faster-whisper>=1.0.0 as core dependency
- hermes_cli/config.py: Expand stt config to match TTS pattern with
provider selection and per-provider model settings
- agent/context_compressor.py: Fix .strip() crash when LLM returns
non-string content (dict from llama.cpp, None). Fixes#1100 partially.
- tests/: 23 new tests for STT providers + 2 for compressor fix
- docs/: Updated Voice & TTS page with STT provider table, model sizes,
config examples, and fallback behavior
Fallback behavior:
- Local not installed → OpenAI API (if key set)
- OpenAI key not set → local whisper (if installed)
- Neither → graceful error message to user
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
---------
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
* fix: Home Assistant event filtering now closed by default
Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.
Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)
A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.
All 49 gateway HA tests + 52 HA tool tests pass.
* docs: update Home Assistant integration documentation
- homeassistant.md: Fix event filtering docs to reflect closed-by-default
behavior. Add watch_all option. Replace Python dict config example with
YAML. Fix defaults table (was incorrectly showing 'all'). Add required
configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
diagram, platform toolsets table, and Next Steps links.
* fix(terminal): strip provider env vars from background and PTY subprocesses
Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:
- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)
Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.
Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.
Gap identified by PR #1004 (@PeterFile).
* feat(delegate): add observability metadata to subagent results
Enrich delegate_task results with metadata from the child AIAgent:
- model: which model the child used
- exit_reason: completed | interrupted | max_iterations
- tokens.input / tokens.output: token counts
- tool_trace: per-tool-call trace with byte sizes and ok/error status
Tool trace uses tool_call_id matching to correctly pair parallel tool
calls with their results, with a fallback for messages without IDs.
Cherry-picked from PR #872 by @omerkaz, with fixes:
- Fixed parallel tool call trace pairing (was always updating last entry)
- Removed redundant 'iterations' field (identical to existing 'api_calls')
- Added test for parallel tool call trace correctness
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
---------
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
* fix: Home Assistant event filtering now closed by default
Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.
Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)
A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.
All 49 gateway HA tests + 52 HA tool tests pass.
* docs: update Home Assistant integration documentation
- homeassistant.md: Fix event filtering docs to reflect closed-by-default
behavior. Add watch_all option. Replace Python dict config example with
YAML. Fix defaults table (was incorrectly showing 'all'). Add required
configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
diagram, platform toolsets table, and Next Steps links.
* fix(terminal): strip provider env vars from background and PTY subprocesses
Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:
- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)
Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.
Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.
Gap identified by PR #1004 (@PeterFile).
Terminal subprocesses inherit OPENAI_BASE_URL and other provider env
vars loaded from ~/.hermes/.env, silently misrouting external CLIs
like codex. Build a blocklist dynamically from the provider registry
so new providers are automatically covered. Callers that truly need
a blocked var can opt in via the _HERMES_FORCE_ prefix.
Closes#1002
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The old message referenced 'hermes setup' which doesn't handle
skill-specific env vars. Updated to direct users to load the skill
in the local CLI (which triggers the secure prompt) or add the key
to ~/.hermes/.env manually.
When a skill declares required_environment_variables in its YAML
frontmatter, missing env vars trigger a secure TUI prompt (identical
to the sudo password widget) when the skill is loaded. Secrets flow
directly to ~/.hermes/.env, never entering LLM context.
Key changes:
- New required_environment_variables frontmatter field for skills
- Secure TUI widget (masked input, 120s timeout)
- Gateway safety: messaging platforms show local setup guidance
- Legacy prerequisites.env_vars normalized into new format
- Remote backend handling: conservative setup_needed=True
- Env var name validation, file permissions hardened to 0o600
- Redact patterns extended for secret-related JSON fields
- 12 existing skills updated with prerequisites declarations
- ~48 new tests covering skip, timeout, gateway, remote backends
- Dynamic panel widget sizing (fixes hardcoded width from original PR)
Cherry-picked from PR #723 by kshitijk4poor, rebased onto current main
with conflict resolution.
Fixes#688
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
The tinker-atropos submodule and its heavy dependencies (atroposlib, tinker,
wandb, fastapi, uvicorn) were being installed for all users by default,
adding significant install time and disk usage for most users who don't
need RL training capabilities.
Changes:
- install.sh: Only init mini-swe-agent submodule by default; skip
tinker-atropos clone and install entirely
- install.sh: Remove --recurse-submodules from git clone (only fetches
what's needed)
- pyproject.toml: Add [rl] optional dependency group for explicit opt-in
- rl_training_tool.py: Move LOGS_DIR.mkdir() from module-level to lazy
init (_ensure_logs_dir) to avoid side effects on import
- README.md: Update contributor quick start to not auto-fetch
tinker-atropos; add RL opt-in instructions
Users who want RL training can opt in with:
git submodule update --init tinker-atropos
uv pip install -e ./tinker-atropos
The ClawHub API v1 version endpoint only returns file metadata
(path, size, sha256, contentType) without inline content or download
URLs. Our code was looking for inline content in the metadata, which
never existed, causing all ClawHub installs to fail with:
'no inline/raw file content was available'
Fix: Use the /api/v1/download endpoint (same as the official clawhub
CLI) to download skills as ZIP bundles and extract files in-memory.
Changes:
- Add _download_zip() method that downloads and extracts ZIP bundles
- Retry on 429 rate limiting with Retry-After header support
- Path sanitization and binary file filtering for security
- Keep _extract_files() as a fallback for inline/raw content
- Also fix nested file lookup (version_data.version.files)
When a dangerous command is detected and the user is prompted for
approval, long commands are truncated (80 chars in fallback, 70 chars
in the TUI). Users had no way to see the full command before deciding.
This adds a 'View full command' option across all approval interfaces:
- CLI fallback (tools/approval.py): [v]iew option in the prompt menu.
Shows the full command and re-prompts for approval decision.
- CLI TUI (cli.py): 'Show full command' choice in the arrow-key
selection panel. Expands the command display in-place and removes
the view option after use.
- CLI callbacks (callbacks.py): 'view' choice added to the list when
the command exceeds 70 characters.
- Gateway (gateway/run.py): 'full', 'show', 'view' responses reveal
the complete command while keeping the approval pending.
Includes 7 new tests covering view-then-approve, view-then-deny,
short command fallthrough, and double-view behavior.
Closes community feedback about the 80-char cap on dangerous commands.
- gateway/run.py: Take main's _resolve_gateway_model() helper
- hermes_cli/setup.py: Re-apply nous-api removal after merge brought
it back. Fix provider_idx offset (Custom is now index 3, not 4).
- tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)