- Auto-TTS: voice messages get spoken response (audio first, then text)
- STT: Groq Whisper fallback when VOICE_TOOLS_OPENAI_KEY not set
- Futuristic UI: glassmorphism, centered container, purple theme, glow effects
- Voice bubble: custom waveform player with seek and progress
- Invisible TTS playback via play_tts() method (no audio file in chat)
- Add hermes-web toolset with full tool access
- Register Platform.WEB in toolset/config maps
- Update docs for voice conversation feature
Type /remote-control from any platform (Telegram, Discord, etc.) to
instantly start the web UI without restarting the gateway.
- Auto-generates access token if not provided
- Shows URL + token in response
- Optional: /remote-control [port] [token]
- Reports status if already running
- Added to /help command list
New platform adapter that serves a full-featured chat interface via HTTP.
Enables access from any device on the network (phone, tablet, desktop).
Features:
- aiohttp server with WebSocket real-time messaging
- Token-based authentication
- Markdown rendering (marked.js) + code highlighting (highlight.js)
- Voice recording via MediaRecorder API + STT transcription
- Image, voice, and document display
- Typing indicator + message editing (streaming support)
- Mobile responsive dark theme
- Auto-reconnect on disconnect
- Media file cleanup (24h TTL)
Config: WEB_UI_ENABLED=true, WEB_UI_PORT=8765, WEB_UI_TOKEN=<token>
No new dependencies — uses aiohttp already in [messaging] extra.
Phase 2 of voice channel support: bot listens to users speaking in VC,
transcribes speech via Groq Whisper, and processes through the agent pipeline.
- Add VoiceReceiver class for RTP packet capture, NaCl/DAVE decryption, Opus decode
- Add silence detection and per-user PCM buffering
- Wire voice input callback from adapter to GatewayRunner
- Fix adapter dict key: use Platform.DISCORD enum instead of string
- Fix guild_id extraction for synthetic voice events via SimpleNamespace raw_message
- Pause/resume receiver during TTS playback to prevent echo
- Send Discord voice messages with flags=8192 and waveform metadata
so they render as native voice bubbles instead of file attachments
- Use .mp3 output path for TTS so edge-tts opus conversion works
correctly (edge always outputs mp3, convert was skipped for .ogg)
- Use actual file_path from TTS result after potential opus conversion
- Register /voice as Discord slash command with mode choices
- Fix _send_voice_reply to handle adapters that don't accept metadata
parameter (Discord) by inspecting the method signature at runtime
- /voice on: reply with voice when user sends voice messages
- /voice tts: reply with voice to all messages
- /voice off: disable, text-only replies
- /voice status: show current mode
- Per-chat state persisted to gateway_voice_mode.json
- Dedup: skips auto-reply if agent already called text_to_speech tool
- drop_pending_updates=True to ignore stale Telegram messages on restart
- 25 tests covering command handler, reply logic, and edge cases
Add a /reasoning command across gateway adapters so users can
inspect or change reasoning effort without editing config by hand.
Reload reasoning settings from config.yaml before each agent run,
including background tasks, so the next message picks up the new
value consistently.
Restore gateway/run.py to current main behavior while keeping tirith startup
and pattern_keys replay, preserve yolo and non-interactive bypass semantics in
the combined guard, and add regression tests for yolo and view-full flows.
Integrate tirith as a pre-execution security scanner that detects
homograph URLs, pipe-to-interpreter patterns, terminal injection,
zero-width Unicode, and environment variable manipulation — threats
the existing 50-pattern dangerous command detector doesn't cover.
Architecture: gather-then-decide — both tirith and the dangerous
command detector run before any approval prompt, preventing gateway
force=True replay from bypassing one check when only the other was
shown to the user.
New files:
- tools/tirith_security.py: subprocess wrapper with auto-installer,
mandatory cosign provenance verification, non-blocking background
download, disk-persistent failure markers with retryable-cause
tracking (cosign_missing auto-clears when cosign appears on PATH)
- tests/tools/test_tirith_security.py: 62 tests covering exit code
mapping, fail_open, cosign verification, background install,
HERMES_HOME isolation, and failure recovery
- tests/tools/test_command_guards.py: 21 integration tests for the
combined guard orchestration
Modified files:
- tools/approval.py: add check_all_command_guards() orchestrator,
add allow_permanent parameter to prompt_dangerous_approval()
- tools/terminal_tool.py: replace _check_dangerous_command with
consolidated check_all_command_guards
- cli.py: update _approval_callback for allow_permanent kwarg,
call ensure_installed() at startup
- gateway/run.py: iterate pattern_keys list on replay approval,
call ensure_installed() at startup
- hermes_cli/config.py: add security config defaults, split
commented sections for independent fallback
- cli-config.yaml.example: document tirith security config
Session hygiene was firing at the same threshold (50%) as the agent's
own context compressor, causing premature compression on every turn
in long gateway sessions (especially Telegram).
Hygiene is a safety net for pathologically large sessions that would
cause API failures — it should NOT be doing normal compression work.
The agent's own compressor handles that during its tool loop with
accurate real token counts from the API.
Changes:
- Default hygiene threshold: 0.50 → 0.85 (fires only when truly large)
- Hygiene threshold is now independent of compression.threshold config
(that setting controls the agent's compressor, not the pre-agent safety net)
- Removed env var override for hygiene threshold (CONTEXT_COMPRESSION_THRESHOLD
still controls the agent's own compressor)
Critical bug: when the agent's context compressor fires during a tool
loop (_compress_context), it creates a new session_id and writes the
compressed messages there. But the gateway's session_entry still pointed
to the old session_id. On the next message, load_transcript() loaded
the stale pre-compression transcript, causing:
- Context bloat returning every turn
- Repeated compression cycles
- Loss of carefully compressed context
Fix: after run_conversation() returns, check if the agent's session_id
changed (compression split) and sync it back to the session store entry.
Also pass the effective session_id in the result dict so _handle_message
writes transcript entries to the correct session.
This affects ALL gateway adapters, not just webhook.
When a skill declares required_environment_variables in its YAML
frontmatter, missing env vars trigger a secure TUI prompt (identical
to the sudo password widget) when the skill is loaded. Secrets flow
directly to ~/.hermes/.env, never entering LLM context.
Key changes:
- New required_environment_variables frontmatter field for skills
- Secure TUI widget (masked input, 120s timeout)
- Gateway safety: messaging platforms show local setup guidance
- Legacy prerequisites.env_vars normalized into new format
- Remote backend handling: conservative setup_needed=True
- Env var name validation, file permissions hardened to 0o600
- Redact patterns extended for secret-related JSON fields
- 12 existing skills updated with prerequisites declarations
- ~48 new tests covering skip, timeout, gateway, remote backends
- Dynamic panel widget sizing (fixes hardcoded width from original PR)
Cherry-picked from PR #723 by kshitijk4poor, rebased onto current main
with conflict resolution.
Fixes#688
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Root cause: two issues combined to create visual spam on Telegram/Discord:
1. build_tool_preview() preserved newlines from tool arguments. A preview
like 'import os\nprint("...")' rendered as 2+ visual lines per
progress entry on messaging platforms. This affected execute_code most
(code always has newlines), but could also hit terminal, memory,
send_message, session_search, and process tools.
2. No deduplication of identical progress messages. When models iterate
with execute_code using the same boilerplate code (common pattern),
each call produced an identical progress line. 9 calls x 2 visual
lines = 18 lines of identical spam in one message bubble.
Fixes:
- Added _oneline() helper to collapse all whitespace (newlines, tabs) to
single spaces. Applied to ALL code paths in build_tool_preview() —
both the generic path and every early-return path that touches user
content (memory, session_search, send_message, process).
- Added dedup in gateway progress_callback: consecutive identical messages
are collapsed with a repeat counter, e.g. 'execute_code: ... (x9)'
instead of 9 identical lines. The send_progress_messages async loop
handles dedup tuples by updating the last progress_line in-place.
* fix: ClawHub skill install — use /download ZIP endpoint
The ClawHub API v1 version endpoint only returns file metadata
(path, size, sha256, contentType) without inline content or download
URLs. Our code was looking for inline content in the metadata, which
never existed, causing all ClawHub installs to fail with:
'no inline/raw file content was available'
Fix: Use the /api/v1/download endpoint (same as the official clawhub
CLI) to download skills as ZIP bundles and extract files in-memory.
Changes:
- Add _download_zip() method that downloads and extracts ZIP bundles
- Retry on 429 rate limiting with Retry-After header support
- Path sanitization and binary file filtering for security
- Keep _extract_files() as a fallback for inline/raw content
- Also fix nested file lookup (version_data.version.files)
* chore: lower default compression threshold from 85% to 50%
Triggers context compression earlier — at 50% of the model's context
window instead of 85%. Updated in all four places where the default
is defined: context_compressor.py, cli.py, run_agent.py, config.py,
and gateway/run.py.
* fix: use session_key instead of chat_id for adapter interrupt lookups
monitor_for_interrupt() in _run_agent was using source.chat_id to query
the adapter's has_pending_interrupt() and get_pending_message() methods.
But the adapter stores interrupt events under build_session_key(source),
which produces a different string (e.g. 'agent:main:telegram:dm' vs '123456').
This key mismatch meant the interrupt was never detected through the
adapter path, which is the only active interrupt path for all adapter-based
platforms (Telegram, Discord, Slack, etc.). The gateway-level interrupt
path (in dispatch_message) is unreachable because the adapter intercepts
the 2nd message in handle_message() before it reaches dispatch_message().
Result: sending a new message while subagents were running had no effect —
the interrupt was silently lost.
Fix: replace all source.chat_id references in the interrupt-related code
within _run_agent() with the session_key parameter, which matches the
adapter's storage keys.
Also adds regression tests verifying session_key vs chat_id consistency.
* debug: add file-based logging to CLI interrupt path
Temporary instrumentation to diagnose why message-based interrupts
don't seem to work during subagent execution. Logs to
~/.hermes/interrupt_debug.log (immune to redirect_stdout).
Two log points:
1. When Enter handler puts message into _interrupt_queue
2. When chat() reads it and calls agent.interrupt()
This will reveal whether the message reaches the queue and
whether the interrupt is actually fired.
When a dangerous command is detected and the user is prompted for
approval, long commands are truncated (80 chars in fallback, 70 chars
in the TUI). Users had no way to see the full command before deciding.
This adds a 'View full command' option across all approval interfaces:
- CLI fallback (tools/approval.py): [v]iew option in the prompt menu.
Shows the full command and re-prompts for approval decision.
- CLI TUI (cli.py): 'Show full command' choice in the arrow-key
selection panel. Expands the command display in-place and removes
the view option after use.
- CLI callbacks (callbacks.py): 'view' choice added to the list when
the command exceeds 70 characters.
- Gateway (gateway/run.py): 'full', 'show', 'view' responses reveal
the complete command while keeping the approval pending.
Includes 7 new tests covering view-then-approve, view-then-deny,
short command fallthrough, and double-view behavior.
Closes community feedback about the 80-char cap on dangerous commands.
* fix: /reasoning command output ordering, display, and inline think extraction
Three issues with the /reasoning command:
1. Output interleaving: The command echo used print() while feedback
used _cprint(), causing them to render out-of-order under
prompt_toolkit's patch_stdout. Changed echo to use _cprint() so
all output renders through the same path in correct order.
2. Reasoning display not working: /reasoning show toggled a flag
but reasoning never appeared for models that embed thinking in
inline <think> blocks rather than structured API fields. Added
fallback extraction in _build_assistant_message to capture
<think> block content as reasoning when no structured reasoning
fields (reasoning, reasoning_content, reasoning_details) are
present. This feeds into both the reasoning callback (during
tool loops) and the post-response reasoning box display.
3. Feedback clarity: Added checkmarks to confirm actions, persisted
show/hide to config (was session-only before), and aligned the
status display for readability.
Tests: 7 new tests for inline think block extraction (41 total).
* feat: add /reasoning command to gateway (Telegram/Discord/etc)
The /reasoning command only existed in the CLI — messaging platforms
had no way to view or change reasoning settings. This adds:
1. /reasoning command handler in the gateway:
- No args: shows current effort level and display state
- /reasoning <level>: sets reasoning effort (none/low/medium/high/xhigh)
- /reasoning show|hide: toggles reasoning display in responses
- All changes saved to config.yaml immediately
2. Reasoning display in gateway responses:
- When show_reasoning is enabled, prepends a 'Reasoning' block
with the model's last_reasoning content before the response
- Collapses long reasoning (>15 lines) to keep messages readable
- Uses last_reasoning from run_conversation result dict
3. Plumbing:
- Added _show_reasoning attribute loaded from config at startup
- Propagated last_reasoning through _run_agent return dict
- Added /reasoning to help text and known_commands set
- Uses getattr for _show_reasoning to handle test stubs
- gateway/run.py: Take main's _resolve_gateway_model() helper
- hermes_cli/setup.py: Re-apply nous-api removal after merge brought
it back. Fix provider_idx offset (Custom is now index 3, not 4).
- tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)
Model selection now comes exclusively from config.yaml (set via
'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer
read or written anywhere in production code.
Why: env vars are per-process/per-user and would conflict in
multi-agent or multi-tenant setups. Config.yaml is file-based and
can be scoped per-user or eventually per-session.
Changes:
- cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL
- hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL
to .env
- hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...)
calls from all provider setup flows
- gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works
for gateway process runtime)
- cron/scheduler.py: Same
- agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint
model detection
Replace two inline copies of the env/config model resolution pattern
(in _run_agent_sync and _run_agent) with the _resolve_gateway_model()
helper introduced in PR #830.
Left untouched:
- Session hygiene block: different default (sonnet vs opus) + reads
compression config from the same YAML load
- /model command: also reads provider from same config block
Memory flush, /compress, and session hygiene create AIAgent without
model=, falling back to the hardcoded default "anthropic/claude-opus-4.6".
This fails with a 400 error when the active provider is openai-codex
(Codex only accepts its own model names like gpt-5.1-codex-mini).
Add _resolve_gateway_model() that mirrors the env/config resolution
already used by _run_agent_sync, and wire it into all three temporary
agent creation sites.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Post-merge fixes for the email gateway (PR #797):
1. Add Platform.EMAIL to all 4 platform-to-toolset/config mapping
dicts in gateway/run.py. Without this, email sessions silently
fell back to the Telegram toolset because these dicts were added
after the PR branched off main.
2. Add email (and signal) to hermes_cli/tools_config.py and
hermes_cli/skills_config.py PLATFORMS dicts so they appear in
'hermes tools' and 'hermes skills' CLI commands.
3. Add full email setup documentation:
- website/docs/user-guide/messaging/email.md — setup guide with
Gmail/Outlook instructions, configuration, troubleshooting,
security advice, and env var reference
- Update messaging/index.md — add email to architecture diagram,
platform toolset table, security examples, and next steps
Allow users to interact with Hermes by sending and receiving emails.
Uses IMAP polling for incoming messages and SMTP for replies with
proper threading (In-Reply-To, References headers).
Integrates with all 14 gateway extension points: config, adapter
factory, authorization, send_message tool, cron delivery, toolsets,
prompt hints, channel directory, setup wizard, status display, and
env example.
65 tests covering config, parsing, dispatch, threading, IMAP fetch,
SMTP send, attachments, and all integration points.
Isolate Telegram forum topic sessions — each topic gets its own independent session key, history, and interrupt tracking. Progress, hygiene, and cron messages all route to the correct topic.
Authored by teyrebaz33. Closes#643.
- /personality none/default/neutral clears system prompt overlay
- Dict format personalities with description, tone, style fields
- Works in both CLI and gateway
- 18 tests
Add /background <prompt> to the gateway, allowing users on Telegram,
Discord, Slack, etc. to fire off a prompt in a separate agent session.
The result is delivered back to the same chat when done, without
modifying the active conversation history.
Implementation:
- _handle_background_command: validates input, spawns asyncio task
- _run_background_task: creates AIAgent in executor thread, delivers
result (text, images, media files) back via the platform adapter
- Inherits model, toolsets, provider routing from gateway config
- Error handling with user-visible failure messages
Also adds /background to hermes_cli/commands.py registry so it
appears in /help and autocomplete.
Tests: 15 new tests covering usage, task creation, uniqueness,
multi-platform, error paths, and help/autocomplete integration.
Authored by teyrebaz33. Adds config-driven quick commands that execute
shell commands without invoking the LLM — zero token usage, works from
Telegram/Discord/Slack/etc. Closes#744.
Follow-up to 58dbd81 — ensures smooth transition for existing users:
- Backward compat: old session files without last_prompt_tokens
default to 0 via data.get('last_prompt_tokens', 0)
- /compress, /undo, /retry: reset last_prompt_tokens to 0 after
rewriting transcripts (stale token counts would under-report)
- Auto-compression hygiene: reset last_prompt_tokens after rewriting
- update_session: use None sentinel (not 0) as default so callers
can explicitly reset to 0 while normal calls don't clobber
- 6 new tests covering: default value, serialization roundtrip,
old-format migration, set/reset/no-change semantics
- /reset: new SessionEntry naturally gets last_prompt_tokens=0
2942 tests pass.
Root cause of aggressive gateway compression vs CLI:
- CLI: single AIAgent persists across conversation, uses real API-reported
prompt_tokens for compression decisions — accurate
- Gateway: each message creates fresh AIAgent, token count discarded after,
next message pre-check falls back to rough str(msg)//4 estimate which
overestimates 30-50% on tool-heavy conversations
Fix:
- Add last_prompt_tokens field to SessionEntry — stores the actual
API-reported prompt token count from the most recent agent turn
- After run_conversation(), extract context_compressor.last_prompt_tokens
and persist it via update_session()
- Gateway pre-check now uses stored actual tokens when available (exact
same accuracy as CLI), falling back to rough estimate with 1.4x safety
factor only for the first message of a session
This makes gateway compression behave identically to CLI compression
for all turns after the first. Reported by TigerHix.
The gateway's session hygiene pre-check uses a rough char-based token
estimate (total_chars / 4) to decide whether to compress before the
agent starts. This significantly overestimates for tool-heavy and
code-heavy conversations because:
1. str(msg) on dicts includes Python repr overhead (keys, brackets, etc.)
2. Code/JSON tokenizes at 5-7+ chars/token, not the assumed 4
This caused users with 200k context to see compression trigger at
~100-113k actual tokens instead of the expected 170k (85% threshold).
Reported by TigerHix on Twitter.
Fix: apply a 1.4x safety factor to the gateway pre-check threshold.
This pre-check is only meant to catch pathologically large transcripts
— the agent's own compression uses actual API-reported token counts
for precise threshold management.
Three separate code paths all wrote to the same SQLite state.db with
no deduplication, inflating session transcripts by 3-4x:
1. _log_msg_to_db() — wrote each message individually after append
2. _flush_messages_to_session_db() — re-wrote ALL new messages at
every _persist_session() call (~18 exit points), with no tracking
of what was already written
3. gateway append_to_transcript() — wrote everything a third time
after the agent returned
Since load_transcript() prefers SQLite over JSONL, the inflated data
was loaded on every session resume, causing proportional token waste.
Fix:
- Remove _log_msg_to_db() and all 16 call sites (redundant with flush)
- Add _last_flushed_db_idx tracking in _flush_messages_to_session_db()
so repeated _persist_session() calls only write truly new messages
- Reset flush cursor on compression (new session ID)
- Add skip_db parameter to SessionStore.append_to_transcript() so the
gateway skips SQLite writes when the agent already persisted them
- Gateway now passes skip_db=True for agent-managed messages, still
writes to JSONL as backup
Verified: a 12-message CLI session with tool calls produces exactly
12 SQLite rows with zero duplicates (previously would be 36-48).
Tests: 9 new tests covering flush deduplication, skip_db behavior,
compression reset, and initialization. Full suite passes (2869 tests).
Replies in Telegram forum topics (supergroups with topics) now land in
the correct topic thread instead of 'General'.
- base.py: build thread_id metadata from event.source, pass to all
send/media calls; add metadata param to send_typing, send_image,
send_animation, send_voice, send_video, send_document, send_image_file,
_keep_typing
- telegram.py: extract thread_id from metadata and pass as
message_thread_id to all Bot API calls (send_photo, send_voice,
send_audio, send_animation, send_chat_action)
- run.py: pass thread_id metadata to progress/streaming send calls
- discord/slack/whatsapp/homeassistant: update send_typing signature
Based on the fix proposed by @Bitstreamono in PR #656.
Add display.background_process_notifications config option to control
how chatty the gateway process watcher is when using
terminal(background=true, check_interval=...) from messaging platforms.
Modes:
- all: running-output updates + final message (default, current behavior)
- result: only the final completion message
- error: only the final message when exit code != 0
- off: no watcher messages at all
Also supports HERMES_BACKGROUND_NOTIFICATIONS env var override.
Includes 12 tests (5 config loading + 7 watcher behavior).
Inspired by @PeterFile's PR #593. Closes#592.
Automatic filesystem snapshots before destructive file operations,
with user-facing rollback. Inspired by PR #559 (by @alireza78a).
Architecture:
- Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR
- CheckpointManager: take/list/restore, turn-scoped dedup, pruning
- Transparent — the LLM never sees it, no tool schema, no tokens
- Once per turn — only first write_file/patch triggers a snapshot
Integration:
- Config: checkpoints.enabled + checkpoints.max_snapshots
- CLI flag: hermes --checkpoints
- Trigger: run_agent.py _execute_tool_calls() before write_file/patch
- /rollback slash command in CLI + gateway (list, restore by number)
- Pre-rollback snapshot auto-created on restore (undo the undo)
Safety:
- Never blocks file operations — all errors silently logged
- Skips root dir, home dir, dirs >50K files
- Disables gracefully when git not installed
- Shadow repo completely isolated from project git
Tests: 35 new tests, all passing (2798 total suite)
Docs: feature page, config reference, CLI commands reference
Platform.SIGNAL was missing from default_toolset_map and platform_config_key
in gateway/run.py, causing Signal to silently fall back to hermes-telegram
toolset (same bug as HomeAssistant, fixed in PR #538).
Also updates:
- tests/test_toolsets.py: include hermes-signal and hermes-homeassistant in
the platform core-tools consistency check
- cli-config.yaml.example: document signal and homeassistant platform keys
Authored by Himess. Three independent fixes:
- cron/jobs.py: respect HERMES_HOME env var (consistent with scheduler.py)
- gateway/run.py: add Platform.HOMEASSISTANT to toolset mappings
- tools/environments/daytona.py: use time.monotonic() for timeout deadline
Authored by shitcoinsherpa. Adds encoding='utf-8' to all text-mode
open() calls in gateway/run.py, gateway/config.py, hermes_cli/config.py,
hermes_cli/main.py, and hermes_cli/status.py. Prevents encoding errors
on Windows where the default locale is not UTF-8.
Also fixed 4 additional open() calls in gateway/run.py that were added
after the PR branch was created.
The gateway's config.yaml → env var bridge was missing docker_volumes,
so Docker volume mounts configured in config.yaml were ignored for
gateway sessions (Telegram, Discord, etc.) while working in CLI.
Also fixes list serialization: str() produces Python repr with single
quotes which json.loads() in terminal_tool.py can't parse. Now uses
json.dumps() for list values.
Based on PR #431 by @manuelschipper (applied manually due to stale branch).
Closes#643
Changes:
- /personality none|default|neutral — clears system prompt overlay
- Custom personalities in config.yaml support dict format with:
name, description, system_prompt, tone, style directives
- Backwards compatible — existing string format still works
- CLI + gateway both updated
- 18 tests covering none/default/neutral, dict format, string format,
list display, save to config
Three issues caused the gateway to display 'openrouter' instead of
'Custom endpoint' when users configured a custom OAI-compatible endpoint:
1. hermes setup: custom endpoint path saved OPENAI_BASE_URL and
OPENAI_API_KEY to .env but never wrote model.provider to config.yaml.
All other providers (Codex, z.ai, Kimi, etc.) call
_update_config_for_provider() which sets this — custom was the only
path that skipped it. Now writes model.provider='custom' and
model.base_url to config.yaml.
2. hermes model: custom endpoint set model.provider='auto' in config.yaml.
The CLI display had a hack to detect OPENAI_BASE_URL and override to
'custom', but the gateway didn't. Now sets model.provider='custom'
directly.
3. gateway /model and /provider commands: defaulted to 'openrouter' and
read config.yaml — which had no provider set. Added OPENAI_BASE_URL
detection fallback (same pattern the CLI uses) as a defensive catch
for existing users who set up before this fix.