Centralizes two widely-duplicated patterns into hermes_constants.py:
1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var)
- Was copy-pasted inline across 30+ files as:
Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
- Now defined once in hermes_constants.py (zero-dependency module)
- hermes_cli/config.py re-exports it for backward compatibility
- Removed local wrapper functions in honcho_integration/client.py,
tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py
2. parse_reasoning_effort() — Reasoning effort string validation
- Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py
- Same validation logic: check against (xhigh, high, medium, low, minimal, none)
- Now defined once in hermes_constants.py, called from all 3 locations
- Warning log for unknown values kept at call sites (context-specific)
31 files changed, net +31 lines (125 insertions, 94 deletions)
Full test suite: 6179 passed, 0 failed
When a background task (/bg command) prints its output while the main agent
is processing with the thinking spinner visible, the status bar could render
on the same row as the spinner, causing visual overlap.
This fix adds an explicit app.invalidate() call with a brief pause before
printing background task output, ensuring the TUI layout is in a consistent
state before the output is written.
Changes:
- Add TUI refresh before success output in _handle_background_command
- Add TUI refresh before error output in the exception handler
- Add tests for the refresh behavior
Closes#2718
Co-authored-by: Bartok9 <bartokmagic@proton.me>
KeyboardInterrupt inherits from BaseException, not Exception, so the
except Exception: clauses wrapping flush_memories() on exit paths
silently skipped the flush when the user pressed Ctrl+C. This could
lose conversation memory.
Change both call sites to except (Exception, KeyboardInterrupt): so
the memory flush is attempted even during interrupt.
Salvaged from PR #2855 by RufusLin (dropped unrelated bundled changes).
Three improvements to reasoning/thinking display in the CLI:
1. Buffer tiny reasoning chunks: providers like DeepSeek stream reasoning
one word at a time, producing a separate [thinking] line per token.
Add a buffer that coalesces chunks and flushes at natural boundaries
(newlines, sentence endings, terminal width).
2. Fix duplicate reasoning display: centralize callback selection into
_current_reasoning_callback() — one place instead of 4 scattered
inline ternaries. Prevents both the streaming box AND the preview
callback from firing simultaneously.
3. Fix post-response reasoning box guard: change the check from
'not self._stream_started' to 'not self._reasoning_stream_started'
so the final reasoning box is only suppressed when reasoning was
actually streamed live, not when any text was streamed.
Cherry-picked from PR #2781 by juanfradb.
* fix(session): surface silent SessionDB failures that cause session data loss
SessionDB initialization and operation failures were logged at debug level
or silently swallowed, causing sessions to never be indexed in the FTS5
database. This made session_search unable to find affected conversations.
In practice, ~48% of sessions can be lost without any visible indication.
The JSON session files are still written (separate code path), but the
SQLite/FTS5 index gets nothing — making session_search return empty results
for affected sessions.
Changes:
- cli.py: Log warnings (not debug) when SessionDB init fails at both
__init__ and _start_session entry points
- run_agent.py: Log warnings on create_session, append_message, and
compression split failures
- run_agent.py: Set _session_db = None after create_session failure to
fail fast instead of silently dropping every message for the session
Root cause: When gateway restarts or DB lock contention occurs during
SessionDB() init, the exception is caught and swallowed. The agent
continues running normally — JSON session logs are written to disk —
but no messages reach the FTS5 index.
* fix: use module logger instead of root logging for SessionDB warnings
Follow-up to cherry-picked PR #2939 — the original used logging.warning()
(root logger) instead of logger.warning() (module logger) in the 5 new
warning calls. Module logger preserves the logger hierarchy and shows the
correct module name in log output.
---------
Co-authored-by: LucidPaths <lc77@outlook.de>
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* feat(session_search): add recent sessions mode when query is omitted
When session_search is called without a query (or with an empty query),
it now returns metadata for the most recent sessions instead of erroring.
This lets the agent quickly see what was worked on recently without
needing specific keywords.
Returns for each session: session_id, title, source, started_at,
last_active, message_count, preview (first user message).
Zero LLM cost — pure DB query. Current session lineage and child
delegation sessions are excluded.
The agent can then keyword-search specific sessions if it needs
deeper context from any of them.
* docs: clarify two-mode behavior in session_search schema description
* fix(compression): restore sane defaults and cap summary at 12K tokens
- threshold: 0.80 → 0.50 (compress at 50%, not 80%)
- target_ratio: 0.40 → 0.20, now relative to threshold not total context
(20% of 50% = 10% of context as tail budget)
- summary ceiling: 32K → 12K (Gemini can't output more than ~12K)
- Updated DEFAULT_CONFIG, config display, example config, and tests
* fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
* fix(skills): agent-created skills were incorrectly treated as untrusted community content
_resolve_trust_level() didn't handle 'agent-created' source, so it
fell through to 'community' trust level. Community policy blocks on
any caution or dangerous findings, which meant common patterns like
curl with env vars, systemctl, crontab, cloudflared references etc.
would block skill creation/patching.
The agent-created policy row already existed in INSTALL_POLICY with
permissive settings (allow caution, ask on dangerous) but was never
reached. Now it is.
Fixes reports of skill_manage being blocked by security scanner.
* fix(cli): enhance real-time reasoning output by forcing flush of long partial lines
Updated the reasoning output mechanism to emit complete lines and force-flush long partial lines, ensuring reasoning is visible in real-time even without newlines. This improves user experience during reasoning sessions.
* fix: skip KawaiiSpinner when TUI handles tool progress
In the interactive CLI, the agent runs with quiet_mode=True and
tool_progress_callback set. The quiet_mode condition triggered
KawaiiSpinner for every tool call, but the TUI was already handling
progress display via the spinner widget.
The KawaiiSpinner writes carriage-return animation through StdoutProxy,
triggering run_in_terminal() erase/redraw cycles on every flush. These
redundant cycles cause the status bar to ghost into terminal scrollback.
The thinking spinner already had this guard (checks thinking_callback).
This extends the same pattern to the three tool spinner creation sites:
concurrent tools, delegate_task, and single tool execution.
Complete cleanup after dropping the mini-swe-agent submodule (PR #2804):
- Remove MSWEA_SILENT_STARTUP and MSWEA_GLOBAL_CONFIG_DIR env var
settings from cli.py, run_agent.py, hermes_cli/main.py, doctor.py
- Remove mini-swe-agent health check from hermes doctor
- Remove 'minisweagent' from logger suppression lists
- Remove litellm/typer/platformdirs from requirements.txt
- Remove mini-swe-agent install steps from install.ps1 (Windows)
- Remove mini-swe-agent install steps from website docs
- Update all stale comments/docstrings referencing mini-swe-agent
in terminal_tool.py, tools/__init__.py, code_execution_tool.py,
environments/README.md, environments/agent_loop.py
- Remove mini_swe_runner from pyproject.toml py-modules
(still exists as standalone script for RL training use)
- Shrink test_minisweagent_path.py to empty stub
The orphaned mini-swe-agent/ directory on disk needs manual removal:
rm -rf mini-swe-agent/
Drop the mini-swe-agent git submodule. All terminal backends now use
hermes-agent's own environment implementations directly.
Docker backend:
- Inline the `docker run -d` container startup (was 15 lines in
minisweagent's DockerEnvironment). Our wrapper already handled
execute(), cleanup(), security hardening, volumes, and resource limits.
Modal backend:
- Import swe-rex's ModalDeployment directly instead of going through
minisweagent's 90-line passthrough wrapper.
- Bake the _AsyncWorker pattern (from environments/patches.py) directly
into ModalEnvironment for Atropos compatibility without monkey-patching.
Cleanup:
- Remove minisweagent_path.py (submodule path resolution helper)
- Remove submodule init/install from install.sh and setup-hermes.sh
- Remove mini-swe-agent from .gitmodules
- environments/patches.py is now a no-op (kept for backward compat)
- terminal_tool.py no longer does sys.path hacking for minisweagent
- mini_swe_runner.py guards imports (optional, for RL training only)
- Update all affected tests to mock the new direct subprocess calls
- Update README.md, CONTRIBUTING.md
No functionality change — all Docker, Modal, local, SSH, Singularity,
and Daytona backends behave identically. 6093 tests pass.
Phase 4 of the /model command overhaul.
Both the CLI (cli.py) and gateway (gateway/run.py) /model handlers
had ~50 lines of duplicated core logic: parsing, provider detection,
credential resolution, and model validation. This extracts that
pipeline into hermes_cli/model_switch.py.
New module exports:
- ModelSwitchResult: dataclass with all fields both handlers need
- CustomAutoResult: dataclass for bare '/model custom' results
- switch_model(): core pipeline — parse → detect → resolve → validate
- switch_to_custom_provider(): resolve endpoint + auto-detect model
The shared functions are pure (no I/O side effects). Each caller
handles its own platform-specific concerns:
- CLI: sets self.model/provider/etc, calls save_config_value(), prints
- Gateway: writes config.yaml directly, sets env vars, returns markdown
Net result: -244 lines from handlers, +234 lines in shared module.
The handlers are now ~80 lines each (down from ~150+) and can't drift
apart on core logic.
* feat(model): persist base_url on /model switch, auto-detect for bare /model custom
Phase 2+3 of the /model command overhaul:
Phase 2 — Persist base_url on model switch:
- CLI: save model.base_url when switching to a non-OpenRouter endpoint;
clear it when switching away from custom to prevent stale URLs
leaking into the new provider's resolution
- Gateway: same logic using direct YAML write
Phase 3 — Better feedback and edge cases:
- Bare '/model custom' now auto-detects the model from the endpoint
using _auto_detect_local_model() and saves all three config values
(model, provider, base_url) atomically
- Shows endpoint URL in success messages when switching to/from
custom providers (both CLI and gateway)
- Clear error messages when no custom endpoint is configured
- Updated test assertions for the additional save_config_value call
Fixes#2562 (Phase 2+3)
* feat(model): support custom:name:model triple syntax for named custom providers
Phase 5 of the /model command overhaul.
Extends parse_model_input() to handle the triple syntax:
/model custom:local-server:qwen → provider='custom:local-server', model='qwen'
/model custom:my-model → provider='custom', model='my-model' (unchanged)
The 'custom:local-server' provider string is already supported by
_get_named_custom_provider() in runtime_provider.py, which matches
it against the custom_providers list in config.yaml. This just wires
the parsing so users can do it from the /model slash command.
Added 4 tests covering single, triple, whitespace, and empty model cases.
- Updated `_on_tool_gen_start` method in `HermesCLI` to close open streaming boxes exactly once, preventing potential multiple closures.
- Added a check for `_stream_box_opened` to manage the state of the streaming box more effectively, enhancing user experience during large payload streaming.
- Introduced `_on_tool_gen_start` in `HermesCLI` to indicate when tool-call arguments are being generated, enhancing user feedback during streaming.
- Updated `AIAgent` to support a new `tool_gen_callback`, notifying the display layer when tool generation starts, allowing for better user experience during large payloads.
- Ensured that the callback is triggered appropriately during streaming events to prevent user interface freezing.
* feat(config): support ${ENV_VAR} substitution in config.yaml
* fix: extend env var expansion to CLI and gateway config loaders
The original PR (#2680) only wired _expand_env_vars into load_config(),
which is used by 'hermes tools' and 'hermes setup'. The two primary
config paths were missed:
- load_cli_config() in cli.py (interactive CLI)
- Module-level _cfg in gateway/run.py (gateway — bridges api_keys to env vars)
Also:
- Remove redundant 'import re' (already imported at module level)
- Add missing blank lines between top-level functions (PEP 8)
- Add tests for load_cli_config() expansion
---------
Co-authored-by: teyrebaz33 <hakanerten02@hotmail.com>
Previously 'Activated skills: xxx' was printed above the banner in
show_banner(). Now it prints directly after the 'Welcome to Hermes
Agent!' line in run(), which is a more natural placement.
Local LLM servers (llama.cpp, ollama, vLLM, etc.) typically don't
require authentication. When a custom base_url is configured but no
API key is found, use a placeholder instead of failing with
'Provider resolver returned an empty API key.'
The OpenAI SDK accepts any string as api_key, and local servers
simply ignore the Authorization header.
Fixes issue reported by @ThatWolfieGuy — llama.cpp stopped working
after updating because the new runtime provider resolver enforces
non-empty API keys even for keyless local endpoints.
When AsyncOpenAI clients are garbage-collected after the event loop
closes, their AsyncHttpxClientWrapper.__del__ tries to schedule
aclose() on the dead loop, causing RuntimeError: Event loop is closed.
prompt_toolkit catches this as an unhandled exception and shows
'Press ENTER to continue...' which blocks CLI exit.
Fix: Add shutdown_cached_clients() to auxiliary_client.py that marks
all cached async clients' underlying httpx transport as CLOSED before
GC runs. This prevents __del__ from attempting the aclose() call.
- _force_close_async_httpx(): sets httpx AsyncClient._state to CLOSED
- shutdown_cached_clients(): iterates _client_cache, closes sync clients
normally and marks async clients as closed
- Also fix stale client eviction in _get_cached_client to mark evicted
async clients as closed (was just del-ing them, triggering __del__)
- Call shutdown_cached_clients() from _run_cleanup() in cli.py
Closes#2453
The DEFAULT_CONFIG was hardcoding google/gemini-3-flash-preview as the
summary_model for context compression. This caused unexpected OpenRouter
charges for users who configured a different provider/model, because the
compression task would silently fall back to gemini via OpenRouter even
when the user's main model was on a different provider.
Fix: change summary_model default to empty string. When empty,
call_llm() resolves the model through the standard auto-detection chain
(auxiliary.compression config -> env vars -> main provider), which
correctly uses the user's configured provider and model.
Users who want a dedicated cheap model for compression can still
explicitly set compression.summary_model in their config.yaml.
Two fixes:
1. CLI /stop command crashed with 'cannot import name get_registry' —
the code imported a non-existent function. Fixed to use the actual
process_registry singleton and list_sessions() method.
(Reported in #2458 by haiyuzhong1980)
2. Streaming media delivery used undefined 'adapter' variable —
our PR #2382 called _deliver_media_from_response(adapter=adapter)
but 'adapter' wasn't guaranteed to be defined in that scope.
Fixed to resolve via self.adapters.get(source.platform).
(Reported in #2424 by 42-evey)
Two related root causes for the '?[33mTool progress: NEW?[0m' garbling
reported on kitty, alacritty, ghostty and gnome-console:
1. /verbose label printing used self.console.print() with Rich markup
([yellow]...[/]). self.console is a plain Rich Console() whose output
goes directly to sys.stdout, which patch_stdout's StdoutProxy
intercepts and mangles raw ANSI sequences.
2. Context pressure status lines (e.g. 'approaching compaction') from
AIAgent._safe_print() had the same problem -- _safe_print() was a
@staticmethod that always called builtin print(), bypassing the
prompt_toolkit renderer entirely.
Fix:
- Convert AIAgent._safe_print() from @staticmethod to an instance method
that delegates to self._print_fn (defaults to builtin print, preserving
all non-CLI behaviour).
- After the CLI creates its AIAgent instance, wire self.agent._print_fn to
the existing _cprint() helper which routes through
prompt_toolkit.print_formatted_text(ANSI(text)).
- Rewrite the /verbose feedback labels to use hermes_cli.colors.Colors
ANSI constants in f-strings and emit them via _cprint() directly,
removing the Rich-markup-inside-patch_stdout anti-pattern.
Fixes#2262
Co-authored-by: Animesh Mishra <animesh.m.7523@gmail.com>
Add @file:path, @folder:dir, @diff, @staged, @git:N, and @url:
references that expand inline before the message reaches the LLM.
Supports line ranges (@file:main.py:10-50), token budget enforcement
(soft warn at 25%, hard block at 50%), and path sandboxing for gateway.
Core module from PR #2090 by @kshitijk4poor. CLI and gateway wiring
rewritten against current main. Fixed asyncio.run() crash when called
from inside a running event loop (gateway).
Closes#682.
hermes chat -q 'msg' --resume SESSION_ID loaded the session history
but never passed it to run_conversation(), so the model responded
without prior context. The interactive mode already does this correctly.
Based on work by christopher-kapic in PR #2081. Fixes#2106.
Reverts the s-enter and Kitty CSI keybindings from PR #2345/#2346.
The s-enter key notation causes 'Invalid key: s-enter' crash on
some prompt_toolkit versions, breaking hermes startup entirely.
PR #2346 was merged with unresolved git conflict markers (<<<<<<,
=======, >>>>>>>) in cli.py at line 6047, causing SyntaxError on
startup. Resolved by keeping both the Shift+Enter keybindings and
the tab handler.
Kitty-protocol terminals (Ghostty, WezTerm) encode Shift+Enter as
CSI 13;2u instead of plain Enter. Without this binding, raw escape
characters appear in the input buffer. Adds s-enter and the Kitty
escape sequence as newline-insert bindings.
Based on work by ygd58 in PR #1798. Fixes#1795.
Registry.py apostrophe sanitization change excluded (unrelated scope).
Streaming provides a better UX — tokens appear as they arrive instead
of waiting for the full response. show_reasoning remains false so
thinking blocks are not streamed to the user.
Based on PR #1749 by @erosika (reimplemented on current main).
Extracts three protected methods from run() so wrapper CLIs can extend
the TUI without overriding the entire method:
- _get_extra_tui_widgets(): inject widgets between spacer and status bar
- _register_extra_tui_keybindings(kb, input_area): add keybindings
- _build_tui_layout_children(**widgets): full control over ordering
Default implementations reproduce existing layout exactly. The inline
HSplit in run() now delegates to _build_tui_layout_children().
5 tests covering defaults, widget insertion position, and keybinding
registration.
Add explicitly_configured field to HonchoClientConfig — set when the
config has a hosts.hermes block or explicit enabled flag, vs auto-enabled
from a stray HONCHO_API_KEY env var. Banner only shows when this is true.
Based on #1960 by @erosika, reimplemented without duplicating config parsing.
Cherry-picked from PR #2295 by @dlkakbs.
The web_extract auxiliary client api_key env var was literally stored as
'AUXILI..._KEY' (dots in the source) instead of the full name. Users
configuring an auxiliary web_extract model with an API key would have
auth failures because the key was written to a non-existent var.
Added a check to suppress further reasoning rendering once the response box is open, preventing potential overlap of reasoning boxes during late thinking blocks. This enhances the user experience by maintaining a clean output in the CLI.
- Added a user bar separator for improved visual clarity when displaying pasted text and user input in the HermesCLI.
- Ensured consistent formatting for both multi-line and single-line user inputs, enhancing the overall user experience in the command-line interface.
These changes contribute to a more organized and visually appealing output during interactions.
The top-level 'toolsets' key in config.yaml was never read at runtime.
Tool selection uses platform_toolsets (per-platform) or the --toolsets
CLI flag. The key existed in load_cli_config() defaults and the example
config as 'toolsets: [all]', misleading users into thinking it
controlled tool availability.
- Remove from load_cli_config() hardcoded defaults
- Remove from hermes config show output
- Replace in cli-config.yaml.example with deprecation note pointing
to platform_toolsets and hermes tools
- Added support for true-color ANSI escape codes in the HermesCLI to enhance the visual appearance of streamed content.
- Introduced a fallback mechanism for text color in case of errors while retrieving the color from the active skin.
- Updated the output formatting to include the new text color in both line emissions and buffer flushing.
These changes improve the user experience by ensuring consistent and visually appealing text output in the command-line interface.
- Changed the ANSI escape code for gold color in cli.py and banner.py to use true-color format (#FFD700) for better visual consistency.
- Enhanced the _on_tool_progress method in HermesCLI to update the TUI spinner with tool execution status, improving user feedback during operations.
These changes improve the visual representation and user experience in the command-line interface.
Co-authored-by: Test <test@test.com>
- Updated _stream_delta method in HermesCLI to handle None values, flushing the stream and resetting state for clean tool execution.
- Enhanced quiet mode handling in AIAgent to ensure proper display closure before tool execution, preventing display issues with intermediate streamed content.
These changes improve the robustness of the streaming functionality and ensure a smoother user experience during tool interactions.
Adds /queue <prompt> (alias /q) that queues a message for the next
turn while the agent is busy, without interrupting the current run.
- CLI: /queue <prompt> puts it in _pending_input for the next turn
- Gateway: /queue <prompt> creates a pending MessageEvent on the
adapter, picked up after the current agent run finishes
- Enter still interrupts as usual (no behavior change)
- /queue with no prompt shows usage
- /queue when agent is idle tells user to just type normally
Co-authored-by: Test <test@test.com>
When the user is on a custom provider (provider=custom, localhost, or
127.0.0.1 endpoint), /model <name> no longer tries to auto-detect a
provider switch. The model name changes on the current endpoint as-is.
To switch away from a custom endpoint, users must use explicit
provider:model syntax (e.g. /model openai-codex:gpt-5.2-codex).
A helpful tip is printed when changing models on a custom endpoint.
This prevents the confusing case where someone on LM Studio types
/model gpt-5.2-codex, the auto-detection tries to switch providers,
fails or partially succeeds, and requests still go to the old endpoint.
Also fixes the missing prompt_toolkit.auto_suggest mock stub in
test_cli_init.py (same issue already fixed in test_cli_new_session.py).
- Add <thinking> tag to streaming filter's tag list
- When show_reasoning is on, route XML reasoning content to the
reasoning display box instead of silently discarding it
- Expand _strip_think_blocks to handle all tag variants:
<think>, <thinking>, <THINKING>, <reasoning>, <REASONING_SCRATCHPAD>
Two issues with /model preventing proper provider switching:
1. Bare provider names not detected: typing '/model nous' treated 'nous'
as a model name instead of triggering a provider switch. Fixed by adding
step 0 in detect_provider_for_model() that checks if the input matches
a known provider name/alias (excluding 'custom'/'openrouter' which need
explicit model names) and returns that provider's default model.
2. Custom endpoint details hidden: /model (no args) showed '[custom]' with
just a usage hint but no endpoint URL or model name. Now displays the
configured base_url for custom providers in both CLI and gateway.
Note: config base_url and OPENAI_BASE_URL are intentionally NOT cleared on
provider switch — dedicated provider paths (nous, anthropic, codex) have
their own credential resolution that ignores these, and clearing them would
destroy the user's custom endpoint config, preventing switching back.
Co-authored-by: Test <test@test.com>
Previously, Tab only handled dropdown completions. Users seeing gray
ghost text from history-based suggestions had no way to accept them
with Tab - they had to use Right arrow or Ctrl+E.
Now Tab follows priority:
1. Completion menu open → accept selected completion
2. Ghost text suggestion available → accept auto-suggestion
3. Otherwise → start completion menu
This matches user intuition that Tab should 'complete what I see.'
* fix: detect context length for custom model endpoints via fuzzy matching + config override
Custom model endpoints (non-OpenRouter, non-known-provider) were silently
falling back to 2M tokens when the model name didn't exactly match what the
endpoint's /v1/models reported. This happened because:
1. Endpoint metadata lookup used exact match only — model name mismatches
(e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss
2. Single-model servers (common for local inference) required exact name
match even though only one model was loaded
3. No user escape hatch to manually set context length
Changes:
- Add fuzzy matching for endpoint model metadata: single-model servers
use the only available model regardless of name; multi-model servers
try substring matching in both directions
- Add model.context_length config override (highest priority) so users
can explicitly set their model's context length in config.yaml
- Log an informative message when falling back to 2M probe, telling
users about the config override option
- Thread config_context_length through ContextCompressor and AIAgent init
Tests: 6 new tests covering fuzzy match, single-model fallback, config
override (including zero/None edge cases).
* fix: auto-detect local model name and context length for local servers
Cherry-picked from PR #2043 by sudoingX.
- Auto-detect model name from local server's /v1/models when only one
model is loaded (no manual model name config needed)
- Add n_ctx_train and n_ctx to context length detection keys for llama.cpp
- Query llama.cpp /props endpoint for actual allocated context (not just
training context from GGUF metadata)
- Strip .gguf suffix from display in banner and status bar
- _auto_detect_local_model() in runtime_provider.py for CLI init
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
* fix: revert accidental summary_target_tokens change + add docs for context_length config
- Revert summary_target_tokens from 2500 back to 500 (accidental change
during patching)
- Add 'Context Length Detection' section to Custom & Self-Hosted docs
explaining model.context_length config override
---------
Co-authored-by: Test <test@test.com>
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
Adds /statusbar (alias /sb) to show/hide the bottom status bar that
displays model name, context usage, and session duration.
Uses ConditionalContainer so the bar takes zero space when hidden
rather than leaving a blank line.