Extract the inline file-drop detection logic into a standalone
_detect_file_drop() function at module level for testability. The main
loop now calls this function instead of inlining the logic.
Tests cover:
- Slash commands still route correctly (/help, /quit, /xyz)
- Image paths auto-detected (.png, .jpg, .gif, etc.)
- Non-image files detected (.py, .txt, Makefile, etc.)
- Backslash-escaped spaces from macOS drag-and-drop
- Trailing user text preserved as remainder
- Edge cases: directories, symlinks, no-extension files
- Non-string input, empty strings, nonexistent paths
When a user drags a file into the terminal, macOS pastes the absolute
path (e.g. /Users/roland/Desktop/Screenshot.png). Because the path starts
with '/', it was incorrectly routed to process_command(), producing an
'Unknown command' error.
This change adds file-path detection before the slash-command check:
- Parses the first token, handling backslash-escaped spaces from macOS
- Checks if the path exists as a real file via Path.exists()
- Image files (.png, .jpg, etc.) are auto-attached to the message
- Non-image files are reformatted as [User attached file: ...] context
- Falls through to normal slash-command handling if not a real file path
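
A minimal sketch of the detection steps listed above, not the actual
_detect_file_drop() code. The helper names, the returned dict shape, and
the IMAGE_SUFFIXES set are assumptions for illustration, and the sketch
uses Path.is_file() (rather than Path.exists()) so directories fall
through:

```python
from pathlib import Path

# Assumed suffix list for illustration; the real set may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}


def split_first_token(text):
    """Split off the first token, treating backslash-space as a literal space."""
    token = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == " ":
            token.append(" ")  # macOS drag-and-drop escapes spaces this way
            i += 2
            continue
        if ch.isspace():
            break
        token.append(ch)
        i += 1
    return "".join(token), text[i:].strip()


def detect_file_drop(user_input):
    """Return a dict describing a dropped file, or None to fall through."""
    if not isinstance(user_input, str) or not user_input.startswith("/"):
        return None
    raw_path, remainder = split_first_token(user_input)
    path = Path(raw_path)
    if not path.is_file():
        return None  # slash commands, directories, nonexistent paths
    return {
        "path": path,
        "is_image": path.suffix.lower() in IMAGE_SUFFIXES,
        "remainder": remainder,
    }
```

Returning None for anything that is not an existing file keeps /help,
/quit and unknown commands on the normal slash-command path.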
_load_installable_optional_extras() was returning ALL extras from
pyproject.toml except 'all', which included 'rl' and 'yc-bench' —
extras not referenced by [all] that install heavy research deps
(atroposlib, tinker, wandb) from git repos. Changed to parse the
[all] group's references and only retry those 18 extras.
Also moved tomllib import to function-level since it only runs
during the rare fallback path.
- Add logger + debug log to read_nous_access_token() catch-all so token
refresh failures are observable instead of silently swallowed
- Tighten _is_nous_auxiliary_client() domain check to use proper URL
hostname parsing instead of substring match, preventing false-positives
on domains like not-nousresearch.com or nousresearch.com.evil.com
- Add missing `from agent.credential_pool import load_pool` import to
auxiliary_client.py (introduced by the credential pool feature in main)
- Thread `args` through `select_provider_and_model(args=None)` so TLS
options from `cmd_model` reach `_model_flow_nous`
- Mock `_require_tty` in test_cmd_model_forwards_nous_login_tls_options
so it can run in non-interactive test environments
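
For the hostname check in the second bullet above, a rough sketch of
parsing the URL instead of substring matching. The function name
is_nous_host and its signature are hypothetical, not the body of
_is_nous_auxiliary_client():

```python
from urllib.parse import urlparse


def is_nous_host(base_url, domain="nousresearch.com"):
    """True only when the URL's hostname is the domain or a subdomain of it."""
    host = (urlparse(base_url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)
```

A substring test would accept both not-nousresearch.com and
nousresearch.com.evil.com; comparing the parsed hostname rejects them.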
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move e2e tests into tests.yml as a parallel job instead of a separate
workflow. Unit tests now also ignore tests/e2e/ to avoid running them
twice. Both jobs appear as independent checks in the PR.
New test classes:
- TestSessionLifecycle: /new then /status sequence, idempotent resets
- TestAuthorization: unauthorized users get pairing code, not commands
- TestSendFailureResilience: pipeline survives send() failures
Additional command coverage: /provider, /verbose, /personality, /yolo.
Note: /provider test is xfail - found a real bug where model_cfg is
referenced unbound when config.yaml is absent (run.py:3247).
Separate workflow for gateway e2e tests, runs on push/PR to main.
Same Python 3.11 + uv setup as existing tests.yml but targets only
tests/e2e/ with verbose output.
Tests /help, /status, /new, /stop, /commands through the full adapter
background-task pipeline. Validates command dispatch, session lifecycle,
and response delivery without any LLM involvement.
Fixtures and helpers for driving messages through the full async
pipeline: adapter.handle_message → background task → GatewayRunner
command dispatch → adapter.send (mocked).
Uses the established _make_runner pattern (object.__new__) to skip
filesystem side effects while exercising real command dispatch logic.
No model, base_url, or provider is assumed when the user hasn't
configured one. Previously the defaults dict in cli.py, AIAgent
constructor args, and several fallback paths all hardcoded
anthropic/claude-opus-4.6 + openrouter.ai/api/v1 — silently routing
unconfigured users to OpenRouter, which 404s for anyone using a
different provider.
Now empty defaults force the setup wizard to run, and existing users
who already completed setup are unaffected (their config.yaml has
the model they chose).
Files changed:
- cli.py: defaults dict, _DEFAULT_CONFIG_MODEL
- run_agent.py: AIAgent.__init__ defaults, main() defaults
- hermes_cli/config.py: DEFAULT_CONFIG
- hermes_cli/runtime_provider.py: is_fallback sentinel
- acp_adapter/session.py: default_model
- tests: updated to reflect empty defaults
OpenAI's newer models (GPT-5, Codex) give stronger instruction-following
weight to the 'developer' role than to 'system'. Swap the role at the API
boundary in _build_api_kwargs() for the chat_completions path so internal
message representation stays consistent ('system' everywhere).
Applies regardless of provider — OpenRouter, Nous portal, direct, etc.
The codex_responses path (direct OpenAI) uses 'instructions' instead of
message roles, so it's unaffected.
DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching
model name substrings: ('gpt-5', 'codex').
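
A sketch of what the role swap at the API boundary might look like. Only
the DEVELOPER_ROLE_MODELS tuple comes from this change; the helper name
swap_system_role and the case-insensitive matching are assumptions:

```python
DEVELOPER_ROLE_MODELS = ("gpt-5", "codex")


def swap_system_role(messages, model):
    """Return messages with 'system' rewritten to 'developer' for matching models."""
    name = (model or "").lower()  # assumption: match case-insensitively
    if not any(marker in name for marker in DEVELOPER_ROLE_MODELS):
        return messages
    # Build new dicts so the internal message list keeps 'system' everywhere.
    return [
        {**m, "role": "developer"} if m.get("role") == "system" else m
        for m in messages
    ]
```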
When a profile config sets model.model but not model.default, the
hardcoded default (claude-opus-4.6) survived the config merge and
took precedence in HermesCLI.__init__ because it checks model.default
first. Profile model configs were silently ignored.
Now model.model is promoted to model.default during the merge when the
user didn't explicitly set model.default. Fixes #4486.
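
The promotion rule can be sketched as follows. The function name and the
plain-dict config shape are assumptions; the real merge code may be
structured differently:

```python
def promote_profile_model(user_config, merged_config):
    """Copy model.model into model.default when the user set only model.model."""
    user_model = user_config.get("model")
    if not isinstance(user_model, dict):
        return merged_config
    if "model" in user_model and "default" not in user_model:
        merged_model = dict(merged_config.get("model") or {})
        merged_model["default"] = user_model["model"]
        merged_config = {**merged_config, "model": merged_model}
    return merged_config
```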
* fix: force-close TCP sockets on client cleanup, detect and recover dead connections
When a provider drops connections mid-stream (e.g. OpenRouter outage),
httpx's graceful close leaves sockets in CLOSE-WAIT indefinitely. These
zombie connections accumulate and can prevent recovery without restarting.
Changes:
- _force_close_tcp_sockets: walks the httpx connection pool and issues
socket.shutdown(SHUT_RDWR) + close() to force TCP RST on every socket
when a client is closed, preventing CLOSE-WAIT accumulation
- _cleanup_dead_connections: probes the primary client's pool for dead
sockets (recv MSG_PEEK), rebuilds the client if any are found
- Pre-turn health check at the start of each run_conversation call that
auto-recovers with a user-facing status message
- Primary client rebuild after stale stream detection to purge pool
- User-facing messages on streaming connection failures:
"Connection to provider dropped — Reconnecting (attempt 2/3)"
"Connection failed after 3 attempts — try again in a moment"
Made-with: Cursor
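
To illustrate the two socket-level techniques named in the commit above
(a recv MSG_PEEK probe and a shutdown-then-close), here is a sketch on
plain sockets. It does not walk the httpx connection pool, the function
names are hypothetical, and TLS-wrapped sockets would need different
handling:

```python
import socket


def socket_is_dead(sock):
    """Peek one byte without consuming it; an empty read means the peer closed."""
    previous_timeout = sock.gettimeout()
    sock.setblocking(False)
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except (BlockingIOError, InterruptedError):
        return False  # nothing to read yet, connection still open
    except OSError:
        return True  # reset or otherwise unusable
    finally:
        sock.settimeout(previous_timeout)


def force_close(sock):
    """Shut down both directions before closing so the socket does not linger."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # already disconnected
    sock.close()
```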
* fix: pool entry missing base_url for openrouter, clean error messages
- _resolve_runtime_from_pool_entry: add OPENROUTER_BASE_URL fallback
when pool entry has no runtime_base_url (pool entries from auth.json
credential_pool often omit base_url)
- Replace Rich console.print for auth errors with plain print() to
prevent ANSI escape code mangling through prompt_toolkit's stdout patch
- Force-close TCP sockets on client cleanup to prevent CLOSE-WAIT
accumulation after provider outages
- Pre-turn dead connection detection with auto-recovery and user message
- Primary client rebuild after stale stream detection
- User-facing status messages on streaming connection failures/retries
Made-with: Cursor
* fix(gateway): persist memory flush state to prevent redundant re-flushes on restart
The _session_expiry_watcher tracked flushed sessions in an in-memory set
(_pre_flushed_sessions) that was lost on gateway restart. Expired sessions
remained in sessions.json and were re-discovered every restart, causing
redundant AIAgent runs that burned API credits and blocked the event loop.
Fix: Add a memory_flushed boolean field to SessionEntry, persisted in
sessions.json. The watcher sets it after a successful flush. On restart,
the flag survives and the watcher skips already-flushed sessions.
- Add memory_flushed field to SessionEntry with to_dict/from_dict support
- Old sessions.json entries without the field default to False (backward compat)
- Remove the ephemeral _pre_flushed_sessions set from SessionStore
- Update tests: save/load roundtrip, legacy entry compat, auto-reset behavior
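
A reduced sketch of the persisted flag and its backward-compatible
loading. The real SessionEntry has more fields; the single session_id
field used here is an assumption:

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    session_id: str  # stand-in for the real entry's fields
    memory_flushed: bool = False

    def to_dict(self):
        return {
            "session_id": self.session_id,
            "memory_flushed": self.memory_flushed,
        }

    @classmethod
    def from_dict(cls, data):
        # Entries written before this change have no flag and load as False.
        return cls(
            session_id=data["session_id"],
            memory_flushed=bool(data.get("memory_flushed", False)),
        )
```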
The original test file had mock secrets corrupted by secret-redaction
tooling before commit — the test values (sk-ant...l012) didn't actually
trigger the PREFIX_RE regex, so 4 of 10 tests were asserting against
values that never appeared in the input.
- Replace truncated mock values with proper fake keys built via string
concatenation (avoids tool redaction during file writes)
- Add _ensure_redaction_enabled autouse fixture to patch the module-level
_REDACT_ENABLED constant, matching the pattern from test_redact.py
LLM responses from browser snapshot extraction and vision analysis
could echo back secrets that appeared on screen or in page content.
Input redaction alone is insufficient — the LLM may reproduce secrets
it read from screenshots (which cannot be text-redacted).
Now redact outputs from:
- _extract_relevant_content (auxiliary LLM response)
- browser_vision (vision LLM response)
- camofox_vision (vision LLM response)
Three exfiltration vectors closed:
1. Browser URL exfil — agent could embed secrets in URL params and
navigate to attacker-controlled server. Now scans URLs for known
API key patterns before navigating (browser_navigate, web_extract).
2. Browser snapshot leak — page displaying env vars or API keys would
send secrets to auxiliary LLM via _extract_relevant_content before
run_agent.py's redaction layer sees the result. Now redacts snapshot
text before the auxiliary call.
3. Camofox annotation leak — accessibility tree text sent to vision
LLM could contain secrets visible on screen. Now redacts annotation
context before the vision call.
10 new tests covering URL blocking, snapshot redaction, and annotation
redaction for both browser and camofox backends.
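
A rough sketch of the URL scan in vector 1. The patterns below are
illustrative placeholders rather than the project's actual key patterns,
and url_contains_secret is a hypothetical name:

```python
import re
from urllib.parse import unquote

# Illustrative patterns only; the real list of API key patterns may differ.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    re.compile(r"ghp_[A-Za-z0-9]{36}"),
]


def url_contains_secret(url):
    """Check the raw and percent-decoded URL for key-shaped substrings."""
    candidates = (url, unquote(url))
    return any(p.search(c) for p in SECRET_PATTERNS for c in candidates)
```

A caller would refuse to navigate when this returns True.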
Reuse a single SessionDB across requests by caching on self._session_db
with lazy initialization. Avoids creating a new SQLite connection per
request when X-Hermes-Session-Id is used. Updated tests to set
adapter._session_db directly instead of patching the constructor.
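
The caching pattern, sketched with a plain sqlite3 connection standing in
for SessionDB. The class and method names are assumptions; only the
_session_db attribute name comes from this change:

```python
import sqlite3


class ApiAdapterSketch:
    """Hypothetical adapter that opens its database once, on first use."""

    def __init__(self, db_path=":memory:"):
        self._db_path = db_path
        self._session_db = None  # created lazily

    def _get_session_db(self):
        if self._session_db is None:
            self._session_db = sqlite3.connect(self._db_path)
        return self._session_db
```

Because the cache is a plain attribute, tests can assign a fake to
_session_db instead of patching a constructor.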
Allow callers to pass X-Hermes-Session-Id in request headers to continue
an existing conversation. When provided, history is loaded from SessionDB
instead of the request body, and the session_id is echoed in the response
header. Without the header, existing behavior is preserved (new uuid per
request).
This enables web UI clients to maintain thread continuity without modifying
any session state themselves — the same mechanism the gateway uses for IM
platforms (Telegram, Discord, etc.).
When PyYAML is unavailable or YAML frontmatter is malformed, the fallback
parser may return metadata as a string instead of a dict. This causes
AttributeError when calling .get("hermes") on the string.
Added explicit type checks to handle cases where metadata or hermes fields
are not dicts, preventing the crash.
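
The guard might look like the sketch below. The function name and the
assumption that the hermes block lives under a "metadata" key of the
parsed frontmatter are illustrative:

```python
def hermes_metadata(frontmatter):
    """Return the hermes metadata dict, or {} when any level is not a dict."""
    if not isinstance(frontmatter, dict):
        return {}
    metadata = frontmatter.get("metadata")
    if not isinstance(metadata, dict):
        return {}  # e.g. the fallback parser returned a string
    hermes = metadata.get("hermes")
    return hermes if isinstance(hermes, dict) else {}
```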
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
The original PR excluded auth.json from _DEFAULT_EXPORT_EXCLUDE_ROOT and
filtered both auth.json and .env from named profile exports, but missed
adding .env to the default profile exclusion set. Default exports would
still leak .env containing API keys.
Added .env to _DEFAULT_EXPORT_EXCLUDE_ROOT, added test coverage, and
updated the existing test that incorrectly asserted .env presence.
- stderr handler now uses RedactingFormatter to match file handlers
- restart path uses verbose=0 (int) instead of verbose=False (bool)
- test mock updated with new run_gateway(verbose, quiet, replace) signature
By default 'hermes gateway run' now prints WARNING+ to stderr so
connection errors and startup failures are visible in the terminal
without having to tail ~/.hermes/logs/gateway.log.
- gateway/run.py: start_gateway() accepts verbosity: Optional[int]=0.
When not None, attaches a StreamHandler to stderr with level mapped
from the count (0=WARNING, 1=INFO, 2+=DEBUG). Root logger level is
also lowered when DEBUG is requested so records are not swallowed.
- hermes_cli/gateway.py: run_gateway() gains verbose: int and
quiet: bool params. -q translates to verbosity=None (no stderr
handler). Wired through gateway_command().
- hermes_cli/main.py: -v changed from store_true to action=count so
-v/-vv/-vvv each increment the level. -q/--quiet added as a new flag.
Behaviour summary:
  hermes gateway run       -> WARNING+ on stderr (default)
  hermes gateway run -q    -> silent
  hermes gateway run -v    -> INFO+
  hermes gateway run -vv   -> DEBUG
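
A self-contained sketch of the flag-to-level mapping summarised above,
using argparse's count action. The helper names are hypothetical and the
parser is reduced to the two flags:

```python
import argparse
import logging


def build_parser():
    parser = argparse.ArgumentParser(prog="hermes gateway run")
    parser.add_argument("-v", dest="verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="store_true")
    return parser


def stderr_level(verbose, quiet):
    """Map CLI flags to a stderr log level; None means no stderr handler."""
    if quiet:
        return None
    if verbose >= 2:
        return logging.DEBUG
    if verbose == 1:
        return logging.INFO
    return logging.WARNING
```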
The library removed the static get_transcript() method in v1.0.
Migrate to the new instance-based fetch() API and normalize
FetchedTranscriptSnippet objects back to dicts for compatibility
with the rest of the script.
When `sudo hermes gateway install --system --run-as-user <user>` generates
the systemd unit, get_hermes_home() resolves to /root/.hermes because
Path.home() returns root's home under sudo. The unit correctly sets
HOME= and User= via _system_service_identity(), but HERMES_HOME was
computed independently and pointed to root's config directory.
Add _hermes_home_for_target_user() which remaps the current HERMES_HOME
to the equivalent path under the target user's home. This handles:
- Default ~/.hermes → target user's ~/.hermes
- Profiles (e.g. ~/.hermes/profiles/coder) → preserves relative structure
- Custom paths (e.g. /opt/hermes) → kept as-is
Supersedes #3861 which only handled the default case and left profiles
broken (also flagged by Copilot review).
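
The remapping rule for the three cases above can be sketched like this.
The signature is an assumption; the real _hermes_home_for_target_user()
presumably looks up the home directories itself:

```python
from pathlib import Path


def hermes_home_for_target_user(hermes_home, current_home, target_home):
    """Re-root a HERMES_HOME that lives under the invoking user's home."""
    hermes_home = Path(hermes_home)
    try:
        relative = hermes_home.relative_to(current_home)
    except ValueError:
        return hermes_home  # custom path outside the home dir, kept as-is
    return Path(target_home) / relative
```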
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR #4419 was based on pre-credential-pools main where _config_version was 10.
The squash merge downgraded it from 11 (set by #2647) back to 10.
Also fixes the test assertion.
* feat(skills): add content size limits for agent-created skills
Agent writes via skill_manage (create/edit/patch/write_file) are now
constrained to prevent unbounded growth:
- SKILL.md and supporting files: 100,000 character limit
- Supporting files: additional 1 MiB byte limit
- Patches on oversized hand-placed skills that reduce the size are
allowed (shrink path), but patches that grow beyond the limit are
rejected
Hand-placed skills and hub-installed skills have NO hard limit —
they load and function normally regardless of size. Hub installs
get a warning in the log if SKILL.md exceeds 100k chars.
This mirrors the memory system's char_limit pattern. Without this,
the agent auto-grows skills indefinitely through iterative patches
(hermes-agent-dev reached 197k chars / 72k tokens — 40x larger than
the largest skill in the entire skills.sh ecosystem).
Constants: MAX_SKILL_CONTENT_CHARS (100k), MAX_SKILL_FILE_BYTES (1MiB)
Tests: 14 new tests covering all write paths and edge cases
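
A sketch of how the limits and the shrink path could combine. The two
constants are named in this change; the function name, its parameters,
and the error strings are assumptions:

```python
MAX_SKILL_CONTENT_CHARS = 100_000
MAX_SKILL_FILE_BYTES = 1024 * 1024  # 1 MiB


def check_skill_write(new_content, old_content=None, is_supporting_file=False):
    """Return an error string for a rejected write, or None when allowed."""
    if is_supporting_file:
        if len(new_content.encode("utf-8")) > MAX_SKILL_FILE_BYTES:
            return "supporting file exceeds byte limit"
    if len(new_content) <= MAX_SKILL_CONTENT_CHARS:
        return None
    # Shrink path: an already-oversized skill may be patched if it gets smaller.
    if old_content is not None and len(new_content) < len(old_content):
        return None
    return "content exceeds character limit"
```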
* feat(skills): add fuzzy matching to skill patch
_patch_skill now uses the same 8-strategy fuzzy matching engine
(tools/fuzzy_match.py) as the file patch tool. Handles whitespace
normalization, indentation differences, escape sequences, and
block-anchor matching. Eliminates exact-match failures when agents
patch skills with minor formatting mismatches.
Adds two Camofox features:
1. Persistent browser sessions: new `browser.camofox.managed_persistence`
config option. When enabled, Hermes sends a deterministic profile-scoped
userId to Camofox so the server maps it to a persistent browser profile
directory. Cookies, logins, and browser state survive across restarts.
Default remains ephemeral (random userId per session).
2. VNC URL discovery: Camofox /health endpoint returns vncPort when running
in headed mode. Hermes constructs the VNC URL and includes it in navigate
responses so the agent can share it with users.
Also fixes camofox_vision bug where call_llm response object was passed
directly to json.dumps instead of extracting .choices[0].message.content.
Changes from original PR:
- Removed browser_evaluate tool (separate feature, needs own PR)
- Removed snapshot truncation limit change (unrelated)
- Config.yaml only for managed_persistence (no env var, no version bump)
- Rewrote tests to use config mock instead of env var
- Reverted package-lock.json churn
Co-authored-by: analista <psikonetik@gmail.com.com>
The unified skill from PR #4332 was placed at a top-level
skills/hermes-agent/ directory, creating a redundant standalone
category. Move it to skills/autonomous-ai-agents/hermes-agent/
alongside claude-code, codex, and opencode where it belongs.
The total_tokens field includes cache_read + cache_write tokens, but
the display only showed input + output — making the math look wrong
(e.g. 765K + 134K displayed but total said 9.2M). Now shows a cache
line when cache tokens are present so all visible numbers sum to the
displayed total.
Affects both terminal (hermes insights) and gateway (/insights)
formats.
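
As an illustration of the display rule, a sketch that emits a cache line
only when cache tokens are present. The function name, labels, and
layout are assumptions rather than the actual insights output:

```python
def format_token_lines(input_tokens, output_tokens, cache_read=0, cache_write=0):
    """Build display lines whose visible numbers sum to the displayed total."""
    lines = [f"Input: {input_tokens:,}", f"Output: {output_tokens:,}"]
    cache_tokens = cache_read + cache_write
    if cache_tokens:
        lines.append(f"Cache: {cache_tokens:,}")
    total = input_tokens + output_tokens + cache_tokens
    lines.append(f"Total: {total:,}")
    return lines
```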
Show inline diffs in the CLI transcript when write_file, patch, or
skill_manage modifies files. Captures a filesystem snapshot before the
tool runs, computes a unified diff after, and renders it with ANSI
coloring in the activity feed.
Adds tool_start_callback and tool_complete_callback hooks to AIAgent
for pre/post tool execution notifications.
Also fixes _extract_parallel_scope_path to normalize relative paths
to absolute, preventing the parallel overlap detection from missing
conflicts when the same file is referenced with different path styles.
Gated by display.inline_diffs config option (default: true).
Based on PR #3774 by @kshitijk4poor.
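
A minimal sketch of the snapshot-then-diff step using the standard
library's difflib. The function name, the a/ and b/ path prefixes, and
the colour choices are assumptions, not the CLI's actual renderer:

```python
import difflib

GREEN, RED, RESET = "\x1b[32m", "\x1b[31m", "\x1b[0m"


def render_diff(before, after, path):
    """Return unified-diff lines with added/removed lines wrapped in ANSI colour."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        lineterm="",
    )
    rendered = []
    for index, line in enumerate(diff):
        if index < 2:
            rendered.append(line)  # the ---/+++ file headers stay uncoloured
        elif line.startswith("+"):
            rendered.append(GREEN + line + RESET)
        elif line.startswith("-"):
            rendered.append(RED + line + RESET)
        else:
            rendered.append(line)
    return rendered
```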
- Add _DEFAULT_EXPORT_EXCLUDE_ROOT constant with 25+ entries to exclude
from default profile exports: repo checkout (hermes-agent), worktrees,
databases (state.db), caches, runtime state, logs, binaries
- Add _default_export_ignore() with root-level and universal exclusions
(__pycache__, *.sock, *.tmp at any depth)
- Remove redundant shutil/tempfile imports from contributor's if-block
- Block import_profile() from accepting 'default' as target name with
clear guidance to use --name
- Add 7 tests covering: archive creation, inclusion of profile data,
exclusion of infrastructure, nested __pycache__ exclusion, import
rejection without --name, import rejection with --name default,
full export-import roundtrip with a different name
Addresses review feedback on PR #4370.
Replace the per-response padding from PR #4359 (which created a void
between short responses and the prompt) with a one-time initial scroll
at session start. Prints terminal_height newlines before the banner so
the cursor starts at the bottom row — banner, responses, and prompt all
appear pinned to the bottom with empty space above, not below.
patch_stdout naturally keeps the prompt at the bottom from there, so
no per-response padding is needed.
The openai SDK's SyncAPIClient.is_closed is a method, not a property.
getattr(client, 'is_closed', False) returned the bound method object,
which is always truthy — causing _is_openai_client_closed() to report
all clients as closed and triggering unnecessary client recreation
(~100-200ms TCP+TLS overhead per API call).
Fix: check if is_closed is callable and call it, otherwise treat as bool.
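
The check described above, sketched as a standalone function (the name
is_client_closed is a stand-in for _is_openai_client_closed()):

```python
def is_client_closed(client):
    """Handle is_closed being a method, a property, or absent."""
    is_closed = getattr(client, "is_closed", False)
    if callable(is_closed):
        is_closed = is_closed()  # a bound method is always truthy until called
    return bool(is_closed)
```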
Fixes #4377
Co-authored-by: Bartok9 <Bartok9@users.noreply.github.com>
When a dangerous command was blocked and the user approved it via /approve,
the command was executed but the agent loop had already exited — the agent
never received the command output and the task died silently.
Now _handle_approve_command sends immediate feedback to the user, then
creates a synthetic continuation message with the command output and feeds
it through _handle_message so the agent picks up where it left off.
- Send command result to chat immediately via adapter.send()
- Create synthetic MessageEvent with command + output as context
- Spawn asyncio task to re-invoke agent via _handle_message
- Return None (feedback already sent directly)
- Add test for agent re-invocation after approval
- Update existing approval tests for new return behavior
Target the exact state that breaks: when .navbar-sidebar--show is active
on the same <nav> element. This preserves the blur on mobile when the
sidebar is closed, and only removes it when the sidebar is open.
backdrop-filter on .navbar creates a new CSS stacking context that
hides .navbar-sidebar menu content on mobile (only the close button
is visible). Scope the blur effect to min-width: 997px so it only
applies on desktop where the sidebar is not rendered inside the navbar.
Ref: facebook/docusaurus#6996, facebook/docusaurus#6853
Three bugs prevented credential pool rotation from working when multiple
Codex OAuth tokens were configured:
1. credential_pool was dropped during smart model turn routing.
resolve_turn_route() constructed runtime dicts without it, so the
AIAgent was created without pool access. Fixed in smart_model_routing.py
(no-route and fallback paths), cli.py, and gateway/run.py.
2. Eager fallback fired before pool rotation on 429. The rate-limit
handler at line ~7180 switched to a fallback provider immediately,
before _recover_with_credential_pool got a chance to rotate to the
next credential. Now deferred when the pool still has credentials.
3. (Non-issue) Retry budget was reported as too small, but successful
pool rotations already skip retry_count increment — no change needed.
Reported by community member Schinsly who identified all three root
causes and verified the fix locally with multiple Codex accounts.
After a successful write_file or patch, update the stored read
timestamp to match the file's new modification time. Without this,
consecutive edits by the same task (read → write → write) would
trigger a false warning on the second write because the stored
timestamp still reflected the original read, not the first write.
Also renames the internal tracker key from 'file_mtimes' to
'read_timestamps' for clarity.
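
A sketch of the timestamp refresh, assuming a module-level dict keyed by
path and nanosecond mtimes. The function names and return value are
illustrative; only the 'read_timestamps' key name comes from this change:

```python
import os

read_timestamps = {}  # path -> mtime (ns) recorded at last read or own write


def record_read(path):
    read_timestamps[path] = os.stat(path).st_mtime_ns


def write_file(path, content):
    """Write content; return True if the file changed since we last saw it."""
    stored = read_timestamps.get(path)
    changed_externally = (
        stored is not None and os.stat(path).st_mtime_ns != stored
    )
    with open(path, "w") as handle:
        handle.write(content)
    # Refresh the stored timestamp so our own write is not flagged next time.
    read_timestamps[path] = os.stat(path).st_mtime_ns
    return changed_externally
```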