- Use Optional[List[str]] instead of List[str] | None (consistency)
- Add header, per-platform counts, and checkmark list format
- Matches the visual style of the interactive configurator
Documents the two-tier budget warning system from PR #762:
- Explains caution (70%) and warning (90%) thresholds
- Table showing what the model sees at each tier
- Notes on how injection preserves prompt caching
- Links to max_turns config
Authored by aydnOktay. Replaces print() statements with structured
logging calls (error/warning/info/debug) throughout the Telegram
adapter. Adds exc_info=True for stack traces on failures.
Two-tier warning system that nudges the LLM as it approaches
max_iterations, injected into the last tool result JSON rather
than as a separate system message:
- Caution (70%): {"_budget_warning": "[BUDGET: 42/60...]"}
- Warning (90%): {"_budget_warning": "[BUDGET WARNING: 54/60...]"}
For JSON tool results, adds a _budget_warning field to the existing
dict. For plain text results, appends the warning as text.
Key properties:
- No system messages injected mid-conversation
- No changes to message structure
- Prompt cache stays valid
- Configurable thresholds (0.7 / 0.9)
- Can be disabled: _budget_pressure_enabled = False
Inspired by PR #421 (@Bartok9) and issue #414.
8 tests covering thresholds, edge cases, JSON and text injection.
Authored by aydnOktay. Replaces bare print statements with structured
logger calls (error/warning/info) and adds exc_info=True for stack
traces on failure paths.
Documents the quick_commands config feature from PR #746:
- configuration.md: full section with examples (server status, disk,
gpu, update), behavior notes (timeout, priority, works everywhere)
- cli.md: brief section with config example + link to config guide
Replaces head-only stdout capture with a two-buffer approach (40% head,
60% tail rolling window) so scripts that print() their final results
at the end never lose them. Adds truncation notice between sections.
Cherry-picked from PR #755, conflict resolved (test file additions).
3 new tests for short output, head+tail preservation, and notice format.
Adds configurable bot message filtering via DISCORD_ALLOW_BOTS env var:
- 'none' (default): ignore all bot messages
- 'mentions': accept bots only when they @mention us
- 'all': accept all bot messages
Includes 8 tests.
Authored by teyrebaz33. Adds config-driven quick commands that execute
shell commands without invoking the LLM — zero token usage, works from
Telegram/Discord/Slack/etc. Closes#744.
Post-merge follow-up to PR #752 — adds 10 commands that were added
since the PR was submitted:
Session: /title, /compress, /rollback
Configuration: /provider, /verbose, /skin
Tools & Skills: /reload-mcp (+ full /skills description)
Info: /usage, /insights, /paste
Also preserved existing color formatting (_cprint, _GOLD, _BOLD, _DIM)
and skill commands section from main.
- Organize COMMANDS into COMMANDS_BY_CATEGORY dict
- Group commands: Session, Configuration, Tools & Skills, Info, Exit
- Add visual category headers with spacing
- Maintain backwards compat via flat COMMANDS dict
- Better visual hierarchy and scannability
Before:
/help - Show this help message
/tools - List available tools
... (dense list)
After:
── Session ──
/new Start a new conversation
/reset Reset conversation only
...
── Configuration ──
/config Show current configuration
...
Closes#640
Follow-up to 58dbd81 — ensures smooth transition for existing users:
- Backward compat: old session files without last_prompt_tokens
default to 0 via data.get('last_prompt_tokens', 0)
- /compress, /undo, /retry: reset last_prompt_tokens to 0 after
rewriting transcripts (stale token counts would under-report)
- Auto-compression hygiene: reset last_prompt_tokens after rewriting
- update_session: use None sentinel (not 0) as default so callers
can explicitly reset to 0 while normal calls don't clobber
- 6 new tests covering: default value, serialization roundtrip,
old-format migration, set/reset/no-change semantics
- /reset: new SessionEntry naturally gets last_prompt_tokens=0
2942 tests pass.
Adds two tests for _handle_retry_command: verifies /retry returns the
agent response (not None), and verifies graceful handling when no
previous message exists.
Cherry-picked from PR #731 by teyrebaz33. Regression coverage for
the fix merged in PR #441.
Co-authored-by: teyrebaz33 <teyrebaz33@users.noreply.github.com>
Root cause of aggressive gateway compression vs CLI:
- CLI: single AIAgent persists across conversation, uses real API-reported
prompt_tokens for compression decisions — accurate
- Gateway: each message creates fresh AIAgent, token count discarded after,
next message pre-check falls back to rough str(msg)//4 estimate which
overestimates 30-50% on tool-heavy conversations
Fix:
- Add last_prompt_tokens field to SessionEntry — stores the actual
API-reported prompt token count from the most recent agent turn
- After run_conversation(), extract context_compressor.last_prompt_tokens
and persist it via update_session()
- Gateway pre-check now uses stored actual tokens when available (exact
same accuracy as CLI), falling back to rough estimate with 1.4x safety
factor only for the first message of a session
This makes gateway compression behave identically to CLI compression
for all turns after the first. Reported by TigerHix.
fetch_nous_models() returned models in whatever order the API gave
them, which put sonnet near the top. Add a priority sort so users
see the best models first: opus > pro > other > sonnet.
The gateway's session hygiene pre-check uses a rough char-based token
estimate (total_chars / 4) to decide whether to compress before the
agent starts. This significantly overestimates for tool-heavy and
code-heavy conversations because:
1. str(msg) on dicts includes Python repr overhead (keys, brackets, etc.)
2. Code/JSON tokenizes at 5-7+ chars/token, not the assumed 4
This caused users with 200k context to see compression trigger at
~100-113k actual tokens instead of the expected 170k (85% threshold).
Reported by TigerHix on Twitter.
Fix: apply a 1.4x safety factor to the gateway pre-check threshold.
This pre-check is only meant to catch pathologically large transcripts
— the agent's own compression uses actual API-reported token counts
for precise threshold management.
Authored by dmahan93. Adds HERMES_YOLO_MODE env var and --yolo CLI flag
to auto-approve all dangerous command prompts.
Post-merge: renamed --fuck-it-ship-it to --yolo for brevity,
resolved conflict with --checkpoints flag.
Adds -Q/--quiet to `hermes chat` for use by external orchestrators
(Paperclip, scripts, CI). When combined with -q, suppresses:
- Banner and ASCII art
- Spinner animations
- Tool preview lines (┊ prefix)
Only outputs:
- The agent's final response text
- A parseable 'session_id: <id>' line for session resumption
Usage: hermes chat -q 'Do something' -Q
Used by: Paperclip adapter (@nousresearch/paperclip-adapter-hermes)
On macOS, Docker Desktop installs the CLI to /usr/local/bin/docker, but
when Hermes runs as a gateway service (launchd) or in other non-login
contexts, /usr/local/bin is often not in PATH. This causes the Docker
requirements check to fail with 'No such file or directory: docker' even
though docker works fine from the user's terminal.
Add find_docker() helper that uses shutil.which() first, then probes
common Docker Desktop install paths on macOS (/usr/local/bin,
/opt/homebrew/bin, Docker.app bundle). The resolved path is cached and
passed to mini-swe-agent via its 'executable' parameter.
- tools/environments/docker.py: add find_docker(), use it in
_storage_opt_supported() and pass to _Docker(executable=...)
- tools/terminal_tool.py: use find_docker() in requirements check
- tests/tools/test_docker_find.py: 4 tests (PATH, fallback, not found, cache)
2877 tests pass.
Shows an immediate status message and braille spinner for slow slash
commands (/skills search|browse|inspect|install, /reload-mcp). Makes
input read-only while the command runs so the CLI doesn't appear frozen.
Cherry-picked from PR #714 by vilkasdev, rebased onto current main
with conflict resolution and bug fix (get_hint_text duplicate return).
Fixes#636
Co-authored-by: vilkasdev <vilkasdev@users.noreply.github.com>
Two related bugs prevented users from reliably switching providers:
1. OPENAI_BASE_URL poisoning OpenRouter resolution: When a user with a
custom endpoint ran /model openrouter:model, _resolve_openrouter_runtime
picked up OPENAI_BASE_URL instead of the OpenRouter URL, causing model
validation to probe the wrong API and reject valid models.
Fix: skip OPENAI_BASE_URL when requested_provider is explicitly
'openrouter'.
2. Provider never saved to config: _save_model_choice() could save
config.model as a plain string. All five _model_flow_* functions then
checked isinstance(model, dict) before writing the provider — which
silently failed on strings. With no provider in config, auto-detection
would pick up stale credentials (e.g. Codex desktop app) instead of
the user's explicit choice.
Fix: _save_model_choice() now always saves as dict format. All flow
functions also normalize string->dict as a safety net before writing
provider.
Adds 4 regression tests. 2873 tests pass.
Follow-up to PR #716 (0xbyt4):
- Log the third remaining silent except-pass in scheduler (prefill
messages JSON parse failure)
- Fix test mock: run → run_conversation (matches actual agent API)
- Remove unused imports (asyncio, AsyncMock)
- Add test for prefill_messages parse failure logging
The 4 early-return paths in _spawn_training_run (API exit, trainer
exit, env not found, env exit) were doing manual process.terminate()
or returning without cleanup, leaking open log file handles. Now all
paths call _stop_training_run() which handles both process termination
and file handle closure.
Also adds 12 tests for _stop_training_run covering file handle
cleanup, process termination, status transitions, and edge cases.
Inspired by PR #715 (0xbyt4) which identified the early-return issue.
Core file handle fix was already on main via e28dc13 (memosr.eth).
Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues:
1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only
warn/block on truly consecutive identical calls. Any other tool call
in between (write, patch, terminal, etc.) resets the counter via
notify_other_tool_call(), called from handle_function_call() in
model_tools.py. This prevents false blocks in read→edit→verify flows.
2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on
4th+ consecutive (was 3rd+). Gives the model more room before
intervening.
3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on
search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a
separate read_history set that only tracks file reads.
4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from
web_extract return docs in code_execution_tool.py — the field IS
returned by web_tools.py.
5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover
consecutive-only behavior, notify_other_tool_call, interleaved
read/search, and summary-unaffected-by-searches.
Use rich_box.HORIZONTALS instead of the default ROUNDED box style
for the agent response panel. This keeps the top/bottom horizontal
rules (with title) but removes the vertical │ borders on left and
right, making it much easier to copy-paste response text from the
terminal.
Three separate code paths all wrote to the same SQLite state.db with
no deduplication, inflating session transcripts by 3-4x:
1. _log_msg_to_db() — wrote each message individually after append
2. _flush_messages_to_session_db() — re-wrote ALL new messages at
every _persist_session() call (~18 exit points), with no tracking
of what was already written
3. gateway append_to_transcript() — wrote everything a third time
after the agent returned
Since load_transcript() prefers SQLite over JSONL, the inflated data
was loaded on every session resume, causing proportional token waste.
Fix:
- Remove _log_msg_to_db() and all 16 call sites (redundant with flush)
- Add _last_flushed_db_idx tracking in _flush_messages_to_session_db()
so repeated _persist_session() calls only write truly new messages
- Reset flush cursor on compression (new session ID)
- Add skip_db parameter to SessionStore.append_to_transcript() so the
gateway skips SQLite writes when the agent already persisted them
- Gateway now passes skip_db=True for agent-managed messages, still
writes to JSONL as backup
Verified: a 12-message CLI session with tool calls produces exactly
12 SQLite rows with zero duplicates (previously would be 36-48).
Tests: 9 new tests covering flush deduplication, skip_db behavior,
compression reset, and initialization. Full suite passes (2869 tests).
Signal's send() used 'text' instead of 'content' and 'reply_to_message_id'
instead of 'reply_to', mismatching BasePlatformAdapter.send(). Callers in
gateway/run.py use keyword args matching the base interface, so Signal's
send() was missing its required 'text' positional arg.
Fixes: 'SignalAdapter.send() missing 1 required positional argument: text'
_keep_typing() was called with metadata= for thread-aware typing
indicators, but neither it nor the base send_typing() accepted
that parameter. Most adapter overrides (Slack, Discord, Telegram,
WhatsApp, HA) already accept metadata=None, but the base class
and Signal adapter did not.
- Add metadata=None to BasePlatformAdapter.send_typing()
- Add metadata=None to BasePlatformAdapter._keep_typing(), pass through
- Add metadata=None to SignalAdapter.send_typing()
Fixes TypeError in _process_message_background for Signal.
The Signal adapter was passing image_paths, audio_path, and document_paths
to MessageEvent.__init__(), but those fields don't exist on the dataclass.
MessageEvent uses media_urls (List[str]) and media_types (List[str]).
Changes:
- Replace separate image_paths/audio_path/document_paths with unified
media_urls and media_types lists (matching Discord, Slack, etc.)
- Add _ext_to_mime() helper to map file extensions to MIME types
- Use Signal's contentType from attachment metadata when available,
falling back to extension-based mapping
- Update message type detection to check media_types prefixes
Fixes TypeError: MessageEvent.__init__() got an unexpected keyword
argument 'image_paths'
Major UX improvements:
1. Response box now uses a Rich Panel rendered through ChatConsole
instead of hand-rolled ANSI box-drawing borders. Rich Panels
adapt to terminal width at render time, wrap content inside
the borders properly, and use skin colors natively.
2. ChatConsole now reads terminal width at render time via
shutil.get_terminal_size() instead of defaulting to 80 cols.
All Rich output adapts to the current terminal size.
3. User-input separator reduced to fixed 40-char width so it
never wraps regardless of terminal resize.
4. Approval and clarify countdown repaints throttled to every 5s
(was 1s), dramatically reducing flicker in Kitty/ghostty.
Selection changes still trigger instant repaints via key bindings.
5. Sudo widget now uses dynamic _panel_box_width() instead of
hardcoded border strings.
Tests: 2860 passed.