hermes-agent

Author	SHA1	Message	Date
Teknium	2da79b13df	feat: priority-based context file selection + CLAUDE.md support (#2301 ) Previously, all project context files (AGENTS.md, .cursorrules, .hermes.md) were loaded and concatenated into the system prompt. This bloated the prompt with potentially redundant or conflicting instructions. Now only ONE project context type is loaded, using priority order: 1. .hermes.md / HERMES.md (walk to git root) 2. AGENTS.md / agents.md (recursive directory walk) 3. CLAUDE.md / claude.md (cwd only, NEW) 4. .cursorrules / .cursor/rules/*.mdc (cwd only) SOUL.md from HERMES_HOME remains independent and always loads. Also adds CLAUDE.md as a recognized context file format, matching the convention popularized by Claude Code. Refactored the monolithic function into four focused helpers: _load_hermes_md, _load_agents_md, _load_claude_md, _load_cursorrules. Tests: replaced 1 coexistence test with 10 new tests covering priority ordering, CLAUDE.md loading, case sensitivity, injection blocking.	2026-03-21 06:26:20 -07:00
Teknium	88643a1ba9	feat: overhaul context length detection with models.dev and provider-aware resolution (#2158 ) Replace the fragile hardcoded context length system with a multi-source resolution chain that correctly identifies context windows per provider. Key changes: - New agent/models_dev.py: Fetches and caches the models.dev registry (3800+ models across 100+ providers with per-provider context windows). In-memory cache (1hr TTL) + disk cache for cold starts. - Rewritten get_model_context_length() resolution chain: 0. Config override (model.context_length) 1. Custom providers per-model context_length 2. Persistent disk cache 3. Endpoint /models (local servers) 4. Anthropic /v1/models API (max_input_tokens, API-key only) 5. OpenRouter live API (existing, unchanged) 6. Nous suffix-match via OpenRouter (dot/dash normalization) 7. models.dev registry lookup (provider-aware) 8. Thin hardcoded defaults (broad family patterns) 9. 128K fallback (was 2M) - Provider-aware context: same model now correctly resolves to different context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic, 128K on GitHub Copilot). Provider name flows through ContextCompressor. - DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns. models.dev replaces the per-model hardcoding. - CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K] to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M. - hermes model: prompts for context_length when configuring custom endpoints. Supports shorthand (32k, 128K). Saved to custom_providers per-model config. - custom_providers schema extended with optional models dict for per-model context_length (backward compatible). - Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash normalization. Handles all 15 current Nous models. - Anthropic direct: queries /v1/models for max_input_tokens. Only works with regular API keys (sk-ant-api*), not OAuth tokens. Falls through to models.dev for OAuth users. Tests: 5574 passed (18 new tests for models_dev + updated probe tiers) Docs: Updated configuration.md context length section, AGENTS.md Co-authored-by: Test <test@test.com>	2026-03-20 06:04:33 -07:00
Teknium	3ec6c71e43	fix: update claude 4.6 context length from 200K to 1M (#2155 ) * fix: preserve Ollama model:tag colons in context length detection The colon-split logic in get_model_context_length() and _query_local_context_length() assumed any colon meant provider:model format (e.g. "local:my-model"). But Ollama uses model:tag format (e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just "27b" — which matches nothing, causing a fallback to the 2M token probe tier. Now only recognised provider prefixes (local, openrouter, anthropic, etc.) are stripped. Ollama model:tag names pass through intact. * fix: update claude-opus-4-6 and claude-sonnet-4-6 context length from 200K to 1M Both models support 1,000,000 token context windows. The hardcoded defaults were set before Anthropic expanded the context for the 4.6 generation. Verified via models.dev and OpenRouter API data. --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Co-authored-by: Test <test@test.com>	2026-03-20 04:38:59 -07:00
Teknium	471ea81a7d	fix: preserve Ollama model:tag colons in context length detection (#2149 ) The colon-split logic in get_model_context_length() and _query_local_context_length() assumed any colon meant provider:model format (e.g. "local:my-model"). But Ollama uses model:tag format (e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just "27b" — which matches nothing, causing a fallback to the 2M token probe tier. Now only recognised provider prefixes (local, openrouter, anthropic, etc.) are stripped. Ollama model:tag names pass through intact. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-20 03:19:31 -07:00
Teknium	d76fa7fc37	fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051 ) * fix: detect context length for custom model endpoints via fuzzy matching + config override Custom model endpoints (non-OpenRouter, non-known-provider) were silently falling back to 2M tokens when the model name didn't exactly match what the endpoint's /v1/models reported. This happened because: 1. Endpoint metadata lookup used exact match only — model name mismatches (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss 2. Single-model servers (common for local inference) required exact name match even though only one model was loaded 3. No user escape hatch to manually set context length Changes: - Add fuzzy matching for endpoint model metadata: single-model servers use the only available model regardless of name; multi-model servers try substring matching in both directions - Add model.context_length config override (highest priority) so users can explicitly set their model's context length in config.yaml - Log an informative message when falling back to 2M probe, telling users about the config override option - Thread config_context_length through ContextCompressor and AIAgent init Tests: 6 new tests covering fuzzy match, single-model fallback, config override (including zero/None edge cases). * fix: auto-detect local model name and context length for local servers Cherry-picked from PR #2043 by sudoingX. - Auto-detect model name from local server's /v1/models when only one model is loaded (no manual model name config needed) - Add n_ctx_train and n_ctx to context length detection keys for llama.cpp - Query llama.cpp /props endpoint for actual allocated context (not just training context from GGUF metadata) - Strip .gguf suffix from display in banner and status bar - _auto_detect_local_model() in runtime_provider.py for CLI init Co-authored-by: sudo <sudoingx@users.noreply.github.com> * fix: revert accidental summary_target_tokens change + add docs for context_length config - Revert summary_target_tokens from 2500 back to 500 (accidental change during patching) - Add 'Context Length Detection' section to Custom & Self-Hosted docs explaining model.context_length config override --------- Co-authored-by: Test <test@test.com> Co-authored-by: sudo <sudoingx@users.noreply.github.com>	2026-03-19 06:01:16 -07:00
Test	e7844e9c8d	Merge origin/main, resolve conflicts (self._base_url_lower)	2026-03-18 04:09:00 -07:00
Teknium	b70dd51cfa	fix: disabled skills respected across banner, system prompt, slash commands, and skill_view (#1897 ) * fix: banner skill count now respects disabled skills and platform filtering The banner's get_available_skills() was doing a raw rglob scan of ~/.hermes/skills/ without checking: - Whether skills are disabled (skills.disabled config) - Whether skills match the current platform (platforms: frontmatter) This caused the banner to show inflated skill counts (e.g. '100 skills' when many are disabled) and list macOS-only skills on Linux. Fix: delegate to _find_all_skills() from tools/skills_tool which already handles both platform gating and disabled-skill filtering. * fix: system prompt and slash commands now respect disabled skills Two more places where disabled skills were still surfaced: 1. build_skills_system_prompt() in prompt_builder.py — disabled skills appeared in the <available_skills> system prompt section, causing the agent to suggest/load them despite being disabled. 2. scan_skill_commands() in skill_commands.py — disabled skills still registered as /skill-name slash commands in CLI help and could be invoked. Both now load _get_disabled_skill_names() and filter accordingly. * fix: skill_view blocks disabled skills skill_view() checked platform compatibility but not disabled state, so the agent could still load and read disabled skills directly. Now returns a clear error when a disabled skill is requested, telling the user to enable it via hermes skills or inspect the files manually. --------- Co-authored-by: Test <test@test.com>	2026-03-18 03:17:37 -07:00
Teknium	a2440f72f6	feat: use endpoint metadata for custom model context and pricing (#1906 ) * perf: cache base_url.lower() via property, consolidate triple load_config(), hoist set constant run_agent.py: - Add base_url property that auto-caches _base_url_lower on every assignment, eliminating 12+ redundant .lower() calls per API cycle across __init__, _build_api_kwargs, _supports_reasoning_extra_body, and the main conversation loop - Consolidate three separate load_config() disk reads in __init__ (memory, skills, compression) into a single call, reusing the result dict for all three config sections model_tools.py: - Hoist _READ_SEARCH_TOOLS set to module level (was rebuilt inside handle_function_call on every tool invocation) * Use endpoint metadata for custom model context and pricing --------- Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-18 03:04:07 -07:00
max	0c392e7a87	feat: integrate GitHub Copilot providers across Hermes Add first-class GitHub Copilot and Copilot ACP provider support across model selection, runtime provider resolution, CLI sessions, delegated subagents, cron jobs, and the Telegram gateway. This also normalizes Copilot model catalogs and API modes, introduces a Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by resolving Homebrew-installed gh binaries under launchd. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-17 23:40:22 -07:00
Teknium	548cedb869	fix(context_compressor): prevent consecutive same-role messages after compression (#1743 ) compress() checks both the head and tail neighbors when choosing the summary message role. When only the tail collides, the role is flipped. When BOTH roles would create consecutive same-role messages (e.g. head=assistant, tail=user), the summary is merged into the first tail message instead of inserting a standalone message that breaks role alternation and causes API 400 errors. The previous code handled head-side collision but left the tail-side uncovered — long conversations would crash mid-reply with no useful error, forcing the user to /reset and lose session history. Based on PR #1186 by @alireza78a, with improved double-collision handling (merge into tail instead of unconditional 'user' fallback). Co-authored-by: alireza78a <alireza78.crypto@gmail.com>	2026-03-17 05:18:52 -07:00
Teknium	d1d17f4f0a	feat(compression): add summary_base_url + move compression config to YAML-only - Add summary_base_url config option to compression block for custom OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama) - Remove compression env var bridges from cli.py and gateway/run.py (CONTEXT_COMPRESSION_* env vars no longer set from config) - Switch run_agent.py to read compression config directly from config.yaml instead of env vars - Fix backwards-compat block in _resolve_task_provider_model to also fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG sets this, which was silently preventing the compression section's summary_* keys from being read) - Add test for summary_base_url config-to-client flow - Update docs to show compression as config.yaml-only Closes #1591 Based on PR #1702 by @uzaylisak	2026-03-17 04:46:15 -07:00
teknium1	0897e4350e	merge: resolve conflicts with origin/main	2026-03-17 04:30:37 -07:00
ch3ronsa	695eb04243	feat(agent): .hermes.md per-repository project config discovery Adds .hermes.md / HERMES.md discovery for per-project agent configuration. When the agent starts, it walks from cwd to the git root looking for .hermes.md (preferred) or HERMES.md, strips any YAML frontmatter, and injects the markdown body into the system prompt as project context. - Nearest-first discovery (subdirectory configs shadow parent) - Stops at git root boundary (no leaking into parent repos) - YAML frontmatter stripped (structured config deferred to Phase 2) - Same injection scanning and 20K truncation as other context files - 22 comprehensive tests Original implementation by ch3ronsa. Cherry-picked and adapted for current main. Closes #681 (Phase 1)	2026-03-17 04:16:32 -07:00
teknium1	e5fc916814	feat: auto-generate session titles after first exchange After the first user→assistant exchange, Hermes now generates a short descriptive session title via the auxiliary LLM (compression task config). Title generation runs in a background thread so it never delays the user-facing response. Key behaviors: - Fires only on the first 1-2 exchanges (checks user message count) - Skips if a title already exists (user-set titles are never overwritten) - Uses call_llm with compression task config (cheapest/fastest model) - Truncates long messages to keep the title generation request small - Cleans up LLM output: strips quotes, 'Title:' prefixes, enforces 80 char max - Works in both CLI and gateway (Telegram/Discord/etc.) Also updates /title (no args) to show the session ID alongside the title in both CLI and gateway. Implements #1426	2026-03-17 04:14:40 -07:00
crazywriter1	7049dba778	fix(docker): remove container on cleanup when container_persistent=false When container_persistent=false, the inner mini-swe-agent cleanup only runs 'docker stop' in the background, leaving containers in Exited state. Now cleanup() also runs 'docker rm -f' to fully remove the container. Also fixes pre-existing test failures in model_metadata (gpt-4.1 1M context), setup tests (TTS provider step), and adds MockInnerDocker.cleanup(). Original fix by crazywriter1. Cherry-picked and adapted for current main. Fixes #1679	2026-03-17 04:02:01 -07:00
Teknium	6405d389aa	test: align Hermes setup and full-suite expectations (#1710 ) Salvaged from PR #1708 by @kartikkabadi. Cherry-picked with authorship preserved. Fixes pre-existing test failures from setup TTS prompt flow changes and environment-sensitive assumptions. Co-authored-by: Kartik <user2@RentKars-MacBook-Air.local>	2026-03-17 04:01:37 -07:00
Teknium	d417ba2a48	feat: add route-aware pricing estimates (#1695 ) Salvaged from PR #1563 by @kshitijk4poor. Cherry-picked with authorship preserved. - Route-aware pricing architecture replacing static MODEL_PRICING + heuristics - Canonical usage normalization (Anthropic/OpenAI/Codex API shapes) - Cache-aware billing (separate cache_read/cache_write rates) - Cost status tracking (estimated/included/unknown/actual) - OpenRouter live pricing via models API - Schema migration v4→v5 with billing metadata columns - Removed speculative forward-looking entries - Removed cost display from CLI status bar - Threaded OpenRouter metadata pre-warm Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-17 03:44:44 -07:00
Teknium	5e5c92663d	fix: hermes update causes dual gateways on macOS (launchd) (#1567 ) * feat: add optional smart model routing Add a conservative cheap-vs-strong routing option that can send very short/simple turns to a cheaper model across providers while keeping the primary model for complex work. Wire it through CLI, gateway, and cron, and document the config.yaml workflow. * fix(gateway): remove recursive ExecStop from systemd units, extend TimeoutStopSec to 60s * fix(gateway): avoid recursive ExecStop in user systemd unit * fix: extend ExecStop removal and TimeoutStopSec=60 to system unit The cherry-picked PR #1448 fix only covered the user systemd unit. The system unit had the same TimeoutStopSec=15 and could benefit from the same 60s timeout for clean shutdown. Also adds a regression test for the system unit. --------- Co-authored-by: Ninja <ninja@local> * feat(skills): add blender-mcp optional skill for 3D modeling Control a running Blender instance from Hermes via socket connection to the blender-mcp addon (port 9876). Supports creating 3D objects, materials, animations, and running arbitrary bpy code. Placed in optional-skills/ since it requires Blender 4.3+ desktop with a third-party addon manually started each session. * feat(acp): support slash commands in ACP adapter (#1532) Adds /help, /model, /tools, /context, /reset, /compact, /version to the ACP adapter (VS Code, Zed, JetBrains). Commands are handled directly in the server without instantiating the TUI — each command queries agent/session state and returns plain text. Unrecognized /commands fall through to the LLM as normal messages. /model uses detect_provider_for_model() for auto-detection when switching models, matching the CLI and gateway behavior. Fixes #1402 * fix(logging): improve error logging in session search tool (#1533) * fix(gateway): restart on retryable startup failures (#1517) * feat(email): add skip_attachments option via config.yaml * feat(email): add skip_attachments option via config.yaml Adds a config.yaml-driven option to skip email attachments in the gateway email adapter. Useful for malware protection and bandwidth savings. Configure in config.yaml: platforms: email: skip_attachments: true Based on PR #1521 by @an420eth, changed from env var to config.yaml (via PlatformConfig.extra) to match the project's config-first pattern. * docs: document skip_attachments option for email adapter * fix(telegram): retry on transient TLS failures during connect and send Add exponential-backoff retry (3 attempts) around initialize() to handle transient TLS resets during gateway startup. Also catches TimedOut and OSError in addition to NetworkError. Add exponential-backoff retry (3 attempts) around send_message() for NetworkError during message delivery, wrapping the existing Markdown fallback logic. Both imports are guarded with try/except ImportError for test environments where telegram is mocked. Based on PR #1527 by cmd8. Closes #1526. * feat: permissive block_anchor thresholds and unicode normalization (#1539) Salvaged from PR #1528 by an420eth. Closes #517. Improves _strategy_block_anchor in fuzzy_match.py: - Add unicode normalization (smart quotes, em/en-dashes, ellipsis, non-breaking spaces → ASCII) so LLM-produced unicode artifacts don't break anchor line matching - Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for multiple candidates — if first/last lines match exactly, the block is almost certainly correct - Use original (non-normalized) content for offset calculation to preserve correct character positions Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space anchors, very-low-similarity unique matches), zero regressions on all 9 existing fuzzy match tests. Co-authored-by: an420eth <an420eth@users.noreply.github.com> * feat(cli): add file path autocomplete in the input prompt (#1545) When typing a path-like token (./ ../ ~/ / or containing /), the CLI now shows filesystem completions in the dropdown menu. Directories show a trailing slash and 'dir' label; files show their size. Completions are case-insensitive and capped at 30 entries. Triggered by tokens like: edit ./src/ma → shows ./src/main.py, ./src/manifest.json, ... check ~/doc → shows ~/docs/, ~/documents/, ... read /etc/hos → shows /etc/hosts, /etc/hostname, ... open tools/reg → shows tools/registry.py Slash command autocomplete (/help, /model, etc.) is unaffected — it still triggers when the input starts with /. Inspired by OpenCode PR #145 (file path completion menu). Implementation: - hermes_cli/commands.py: _extract_path_word() detects path-like tokens, _path_completions() yields filesystem Completions with size labels, get_completions() routes to paths vs slash commands - tests/hermes_cli/test_path_completion.py: 26 tests covering path extraction, prefix filtering, directory markers, home expansion, case-insensitivity, integration with slash commands * feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled Add privacy.redact_pii config option (boolean, default false). When enabled, the gateway redacts personally identifiable information from the system prompt before sending it to the LLM provider: - Phone numbers (user IDs on WhatsApp/Signal) → hashed to user_<sha256> - User IDs → hashed to user_<sha256> - Chat IDs → numeric portion hashed, platform prefix preserved - Home channel IDs → hashed - Names/usernames → NOT affected (user-chosen, publicly visible) Hashes are deterministic (same user → same hash) so the model can still distinguish users in group chats. Routing and delivery use the original values internally — redaction only affects LLM context. Inspired by OpenClaw PR #47959. * fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs) Discord uses <@user_id> for mentions and Slack uses <@U12345> — the LLM needs the real ID to tag users. Redaction now only applies to WhatsApp, Signal, and Telegram where IDs are pure routing metadata. Add 4 platform-specific tests covering Discord, WhatsApp, Signal, Slack. * feat: smart approvals + /stop command (inspired by OpenAI Codex) * feat: smart approvals — LLM-based risk assessment for dangerous commands Adds a 'smart' approval mode that uses the auxiliary LLM to assess whether a flagged command is genuinely dangerous or a false positive, auto-approving low-risk commands without prompting the user. Inspired by OpenAI Codex's Smart Approvals guardian subagent (openai/codex#13860). Config (config.yaml): approvals: mode: manual # manual (default), smart, off Modes: - manual — current behavior, always prompt the user - smart — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block), or ESCALATE (fall through to manual prompt) - off — skip all approval prompts (equivalent to --yolo) When smart mode auto-approves, the pattern gets session-level approval so subsequent uses of the same pattern don't trigger another LLM call. When it denies, the command is blocked without user prompt. When uncertain, it escalates to the normal manual approval flow. The LLM prompt is carefully scoped: it sees only the command text and the flagged reason, assesses actual risk vs false positive, and returns a single-word verdict. * feat: make smart approval model configurable via config.yaml Adds auxiliary.approval section to config.yaml with the same provider/model/base_url/api_key pattern as other aux tasks (vision, web_extract, compression, etc.). Config: auxiliary: approval: provider: auto model: '' # fast/cheap model recommended base_url: '' api_key: '' Bridged to env vars in both CLI and gateway paths so the aux client picks them up automatically. * feat: add /stop command to kill all background processes Adds a /stop slash command that kills all running background processes at once. Currently users have to process(list) then process(kill) for each one individually. Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current turn) from /stop (cleans up background processes). See openai/codex#14602. Ctrl+C continues to only interrupt the active agent turn — background dev servers, watchers, etc. are preserved. /stop is the explicit way to clean them all up. * feat: first-class plugin architecture + hide status bar cost by default (#1544) The persistent status bar now shows context %, token counts, and duration but NOT $ cost by default. Cost display is opt-in via: display: show_cost: true in config.yaml, or: hermes config set display.show_cost true The /usage command still shows full cost breakdown since the user explicitly asked for it — this only affects the always-visible bar. Status bar without cost: ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m Status bar with show_cost: true: ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m * feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex) * feat: improve memory prioritization — user preferences over procedural knowledge Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493) which focus memory writes on user preferences and recurring patterns rather than procedural task details. Key insight: 'Optimize for reducing future user steering — the most valuable memory prevents the user from having to repeat themselves.' Changes: - MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy and the core principle about reducing user steering - MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put corrections first, added explicit PRIORITY guidance - Memory nudge (run_agent.py): now asks specifically about preferences, corrections, and workflow patterns instead of generic 'anything' - Memory flush (run_agent.py): now instructs to prioritize user preferences and corrections over task-specific details * feat: more aggressive skill creation and update prompting Press harder on skill updates — the agent should proactively patch skills when it encounters issues during use, not wait to be asked. Changes: - SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction to patch skills immediately when found outdated/wrong - Skills header: added instruction to update loaded skills before finishing if they had missing steps or wrong commands - Skill nudge: more assertive ('save the approach' not 'consider saving'), now also prompts for updating existing skills used in the task - Skill nudge interval: lowered default from 15 to 10 iterations - skill_manage schema: added 'patch it immediately' to update triggers * feat: first-class plugin architecture (#1555) Plugin system for extending Hermes with custom tools, hooks, and integrations — no source code changes required. Core system (hermes_cli/plugins.py): - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and pip entry_points (hermes_agent.plugins group) - PluginContext with register_tool() and register_hook() - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call, on_session_start/end - Namespace package handling for relative imports in plugins - Graceful error isolation — broken plugins never crash the agent Integration (model_tools.py): - Plugin discovery runs after built-in + MCP tools - Plugin tools bypass toolset filter via get_plugin_tool_names() - Pre/post tool call hooks fire in handle_function_call() CLI: - /plugins command shows loaded plugins, tool counts, status - Added to COMMANDS dict for autocomplete Docs: - Getting started guide (build-a-hermes-plugin.md) — full tutorial building a calculator plugin step by step - Reference page (features/plugins.md) — quick overview + tables - Covers: file structure, schemas, handlers, hooks, data files, bundled skills, env var gating, pip distribution, common mistakes Tests: 16 tests covering discovery, loading, hooks, tool visibility. * fix: hermes update causes dual gateways on macOS (launchd) Three bugs worked together to create the dual-gateway problem: 1. cmd_update only checked systemd for gateway restart, completely ignoring launchd on macOS. After killing the PID it would print 'Restart it with: hermes gateway run' even when launchd was about to auto-respawn the process. 2. launchd's KeepAlive.SuccessfulExit=false respawns the gateway after SIGTERM (non-zero exit), so the user's manual restart created a second instance. 3. The launchd plist lacked --replace (systemd had it), so the respawned gateway didn't kill stale instances on startup. Fixes: - Add --replace to launchd ProgramArguments (matches systemd) - Add launchd detection to cmd_update's auto-restart logic - Print 'auto-restart via launchd' instead of manual restart hint * fix: add launchd plist auto-refresh + explicit restart in cmd_update Two integration issues with the initial fix: 1. Existing macOS users with old plist (no --replace) would never get the fix until manual uninstall/reinstall. Added refresh_launchd_plist_if_needed() — mirrors the existing refresh_systemd_unit_if_needed(). Called from launchd_start(), launchd_restart(), and cmd_update. 2. cmd_update relied on KeepAlive respawn after SIGTERM rather than explicit launchctl stop/start. This caused races: launchd would respawn the old process before the PID file was cleaned up. Now does explicit stop+start (matching how systemd gets an explicit systemctl restart), with plist refresh first so the new --replace flag is picked up. --------- Co-authored-by: Ninja <ninja@local> Co-authored-by: alireza78a <alireza78a@users.noreply.github.com> Co-authored-by: Oktay Aydin <113846926+aydnOktay@users.noreply.github.com> Co-authored-by: JP Lew <polydegen@protonmail.com> Co-authored-by: an420eth <an420eth@users.noreply.github.com>	2026-03-16 12:36:29 -07:00
teknium1	210d5ade1e	feat(tools): centralize tool emoji metadata in registry + skin integration - Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry - Add emoji= to all 50+ registry.register() calls across tool files - Add get_tool_emoji() helper in agent/display.py with 3-tier resolution: skin override → registry default → hardcoded fallback - Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and gateway/run.py with centralized get_tool_emoji() calls - Add 'tool_emojis' field to SkinConfig so skins can override per-tool emojis (e.g. ares skin could use swords instead of wrenches) - Add 11 tests (5 registry emoji, 6 display/skin integration) - Update AGENTS.md skin docs table Based on the approach from PR #1061 by ForgingAlex (emoji centralization in registry). This salvage fixes several issues from the original: - Does NOT split the cronjob tool (which would crash on missing schemas) - Does NOT change image_generate toolset/requires_env/is_async - Does NOT delete existing tests - Completes the centralization (gateway/run.py was missed) - Hooks into the skin system for full customizability	2026-03-15 20:21:21 -07:00
teknium1	62abb453d3	Merge origin/main into hermes/hermes-daa73839	2026-03-14 23:44:47 -07:00
teknium1	735a6e7651	fix: convert anthropic image content blocks	2026-03-14 23:41:20 -07:00
teknium1	1337c9efd8	test: resolve auxiliary client merge conflict	2026-03-14 22:15:16 -07:00
Teknium	b14a07315b	fix: save /plan output in workspace (#1381 )	2026-03-14 21:28:51 -07:00
Teknium	ff3473a37c	feat: add /plan command (#1372 ) * feat: add /plan command * refactor: back /plan with bundled skill * docs: document /plan skill	2026-03-14 21:18:17 -07:00
teknium1	85ef09e520	Merge origin/main into hermes/hermes-dd253d81	2026-03-14 21:16:29 -07:00
teknium1	db362dbd4c	feat: add native Anthropic auxiliary vision	2026-03-14 21:14:20 -07:00
teknium1	9f6bccd76a	feat: add direct endpoint overrides for auxiliary and delegation Add base_url/api_key overrides for auxiliary tasks and delegation so users can route those flows straight to a custom OpenAI-compatible endpoint without having to rely on provider=main or named custom providers. Also clear gateway session env vars in test isolation so the full suite stays deterministic when run from a messaging-backed agent session.	2026-03-14 21:11:37 -07:00
Teknium	a86b487349	Merge pull request #1373 from NousResearch/hermes/hermes-781f9235 fix: restore config-saved custom endpoint resolution	2026-03-14 21:06:41 -07:00
teknium1	53d1043a50	fix: restore config-saved custom endpoint resolution	2026-03-14 20:58:12 -07:00
Teknium	24f61d006a	feat: preload CLI skills on launch (#1359 ) * feat: preload CLI skills on launch * test: cover continue with worktree and skills flags * feat: show activated skills before CLI banner	2026-03-14 19:33:59 -07:00
teknium1	7b140b31e6	fix: suppress duplicate cron sends to auto-delivery targets Allow cron runs to keep using send_message for additional destinations, but skip same-target sends when the scheduler will already auto-deliver the final response there. Add prompt/tool guidance, docs, and regression coverage for origin/home-channel resolution and thread-aware comparisons.	2026-03-14 19:07:50 -07:00
teknium1	5319bb6ac4	fix: tighten memory and session recall guidance Remove diary-style memory framing from the system prompt and memory tool schema, explicitly steer task/session logs to session_search, and clarify that session_search is for cross-session recall after checking the current conversation first. Add regression tests for the updated guidance text.	2026-03-14 11:36:47 -07:00
teknium1	906e25f299	feat: seed a default global SOUL.md Seed ~/.hermes/SOUL.md when missing, load SOUL only from HERMES_HOME, and inject raw SOUL content without wrapper text. If the file exists but is empty, nothing is added to the system prompt.	2026-03-14 08:05:30 -07:00
Teknium	5c479eedf1	feat: improve context compaction handoff summaries (#1273 ) Adapt PR #916 onto current main by replacing the old context summary marker with a clearer handoff wrapper, updating the summarization prompt for resume-oriented summaries, and preserving the current call_llm-based compression path.	2026-03-14 02:33:31 -07:00
teknium1	1e23d14568	fix: log prompt builder skill parsing fallbacks	2026-03-14 02:22:17 -07:00
brandtcormorant	76efb0153a	fix(cache_control) treat empty text like None to avoid anthropic api cache_control error	2026-03-13 18:08:46 -07:00
Teknium	07927f6bf2	feat(stt): add free local whisper transcription via faster-whisper (#1185 ) * fix: Home Assistant event filtering now closed by default Previously, when no watch_domains or watch_entities were configured, ALL state_changed events passed through to the agent, causing users to be flooded with notifications for every HA entity change. Now events are dropped by default unless the user explicitly configures: - watch_domains: list of domains to monitor (e.g. climate, light) - watch_entities: list of specific entity IDs to monitor - watch_all: true (new option — opt-in to receive all events) A warning is logged at connect time if no filters are configured, guiding users to set up their HA platform config. All 49 gateway HA tests + 52 HA tool tests pass. * docs: update Home Assistant integration documentation - homeassistant.md: Fix event filtering docs to reflect closed-by-default behavior. Add watch_all option. Replace Python dict config example with YAML. Fix defaults table (was incorrectly showing 'all'). Add required configuration warning admonition. - environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section. - messaging/index.md: Add Home Assistant to description, architecture diagram, platform toolsets table, and Next Steps links. * fix(terminal): strip provider env vars from background and PTY subprocesses Extends the env var blocklist from #1157 to also cover the two remaining leaky paths in process_registry.py: - spawn_local() PTY path (line 156) - spawn_local() background Popen path (line 197) Both were still using raw os.environ, leaking provider vars to background processes and interactive PTY sessions. Now uses the same dynamic _HERMES_PROVIDER_ENV_BLOCKLIST from local.py. Explicit env_vars passed to spawn_local() still override the blocklist, matching the existing behavior for callers that intentionally need these. Gap identified by PR #1004 (@PeterFile). * feat(delegate): add observability metadata to subagent results Enrich delegate_task results with metadata from the child AIAgent: - model: which model the child used - exit_reason: completed \| interrupted \| max_iterations - tokens.input / tokens.output: token counts - tool_trace: per-tool-call trace with byte sizes and ok/error status Tool trace uses tool_call_id matching to correctly pair parallel tool calls with their results, with a fallback for messages without IDs. Cherry-picked from PR #872 by @omerkaz, with fixes: - Fixed parallel tool call trace pairing (was always updating last entry) - Removed redundant 'iterations' field (identical to existing 'api_calls') - Added test for parallel tool call trace correctness Co-authored-by: omerkaz <omerkaz@users.noreply.github.com> * feat(stt): add free local whisper transcription via faster-whisper Replace OpenAI-only STT with a dual-provider system mirroring the TTS architecture (Edge TTS free / ElevenLabs paid): STT: faster-whisper local (free, default) / OpenAI Whisper API (paid) Changes: - tools/transcription_tools.py: Full rewrite with provider dispatch, config loading, local faster-whisper backend, and OpenAI API backend. Auto-downloads model (~150MB for 'base') on first voice message. Singleton model instance reused across calls. - pyproject.toml: Add faster-whisper>=1.0.0 as core dependency - hermes_cli/config.py: Expand stt config to match TTS pattern with provider selection and per-provider model settings - agent/context_compressor.py: Fix .strip() crash when LLM returns non-string content (dict from llama.cpp, None). Fixes #1100 partially. - tests/: 23 new tests for STT providers + 2 for compressor fix - docs/: Updated Voice & TTS page with STT provider table, model sizes, config examples, and fallback behavior Fallback behavior: - Local not installed → OpenAI API (if key set) - OpenAI key not set → local whisper (if installed) - Neither → graceful error message to user Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com> --------- Co-authored-by: omerkaz <omerkaz@users.noreply.github.com> Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>	2026-03-13 11:11:05 -07:00
Teknium	11b577671b	fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini (#1189 ) * fix: prevent model/provider mismatch when switching providers during active gateway When _update_config_for_provider() writes the new provider and base_url to config.yaml, the gateway (which re-reads config per-message) can pick up the change before model selection completes. This causes the old model name (e.g. 'anthropic/claude-opus-4.6') to be sent to the new provider's API (e.g. MiniMax), which fails. Changes: - _update_config_for_provider() now accepts an optional default_model parameter. When provided and the current model.default is empty or uses OpenRouter format (contains '/'), it sets a safe default model for the new provider. - All setup.py callers for direct-API providers (zai, kimi, minimax, minimax-cn, anthropic) now pass a provider-appropriate default model. - _setup_provider_model_selection() now validates the 'Keep current' choice: if the current model uses OpenRouter format and wouldn't work with the new provider, it warns and switches to the provider's first default model instead of silently keeping the incompatible name. Reported by a user on Home Assistant whose gateway started sending 'anthropic/claude-opus-4.6' to MiniMax's API after running hermes setup. * fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini When a user runs a local server (e.g. Qwen3.5-9B via OPENAI_BASE_URL), the auxiliary client (context compression, vision, session search) would send requests for 'gpt-4o-mini' or 'google/gemini-3-flash-preview' to the local server, which only serves one model — causing 404 errors mid-task. Changes: - _try_custom_endpoint() now reads the user's configured main model via _read_main_model() (checks OPENAI_MODEL → HERMES_MODEL → LLM_MODEL → config.yaml model.default) instead of hardcoding 'gpt-4o-mini'. - resolve_provider_client() auto mode now detects when an OpenRouter- formatted model override (containing '/') would be sent to a non- OpenRouter provider (like a local server) and drops it in favor of the provider's default model. - Test isolation fixes: properly clear env vars in 'nothing available' tests to prevent host environment leakage.	2026-03-13 10:02:16 -07:00
teknium1	06a5cc484c	fix: improve gateway secret capture guidance message The old message referenced 'hermes setup' which doesn't handle skill-specific env vars. Updated to direct users to load the skill in the local CLI (which triggers the secure prompt) or add the key to ~/.hermes/.env manually.	2026-03-13 04:10:22 -07:00
kshitijk4poor	ccfbf42844	feat: secure skill env setup on load (core #688 ) When a skill declares required_environment_variables in its YAML frontmatter, missing env vars trigger a secure TUI prompt (identical to the sudo password widget) when the skill is loaded. Secrets flow directly to ~/.hermes/.env, never entering LLM context. Key changes: - New required_environment_variables frontmatter field for skills - Secure TUI widget (masked input, 120s timeout) - Gateway safety: messaging platforms show local setup guidance - Legacy prerequisites.env_vars normalized into new format - Remote backend handling: conservative setup_needed=True - Env var name validation, file permissions hardened to 0o600 - Redact patterns extended for secret-related JSON fields - 12 existing skills updated with prerequisites declarations - ~48 new tests covering skip, timeout, gateway, remote backends - Dynamic panel widget sizing (fixes hardcoded width from original PR) Cherry-picked from PR #723 by kshitijk4poor, rebased onto current main with conflict resolution. Fixes #688 Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-13 03:14:04 -07:00
teknium1	2192b17670	merge: resolve conflicts with origin/main - gateway/run.py: Take main's _resolve_gateway_model() helper - hermes_cli/setup.py: Re-apply nous-api removal after merge brought it back. Fix provider_idx offset (Custom is now index 3, not 4). - tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)	2026-03-12 00:29:04 -07:00
teknium1	0aa31cd3cb	feat: call_llm/async_call_llm + config slots + migrate all consumers Add centralized call_llm() and async_call_llm() functions that own the full LLM request lifecycle: 1. Resolve provider + model from task config or explicit args 2. Get or create a cached client for that provider 3. Format request args (max_tokens handling, provider extra_body) 4. Make the API call with max_tokens/max_completion_tokens retry 5. Return the response Config: expanded auxiliary section with provider:model slots for all tasks (compression, vision, web_extract, session_search, skills_hub, mcp, flush_memories). Config version bumped to 7. Migrated all auxiliary consumers: - context_compressor.py: uses call_llm(task='compression') - vision_tools.py: uses async_call_llm(task='vision') - web_tools.py: uses async_call_llm(task='web_extract') - session_search_tool.py: uses async_call_llm(task='session_search') - browser_tool.py: uses call_llm(task='vision'/'web_extract') - mcp_tool.py: uses call_llm(task='mcp') - skills_guard.py: uses call_llm(provider='openrouter') - run_agent.py flush_memories: uses call_llm(task='flush_memories') Tests updated for context_compressor and MCP tool. Some test mocks still need updating (15 remaining failures from mock pattern changes, 2 pre-existing).	2026-03-11 20:52:19 -07:00
Teknium	01d3b31479	Merge PR #785 : feat: conditional skill activation based on tool availability Authored by teyrebaz33. Closes #539. feat: conditional skill activation based on tool availability	2026-03-11 08:43:30 -07:00
teknium1	a34102049b	Merge: vision auto-detection fallback to local endpoints	2026-03-09 15:36:27 -07:00
teknium1	ef5d811aba	fix: vision auto-detection now falls back to custom/local endpoints Vision auto-mode previously only tried OpenRouter, Nous, and Codex for multimodal — deliberately skipping custom endpoints with the assumption they 'may not handle vision input.' This caused silent failures for users running local multimodal models (Qwen-VL, LLaVA, Pixtral, etc.) without any cloud API keys. Now custom endpoints are tried as a last resort in auto mode. If the model doesn't support vision, the API call fails gracefully — but users with local vision models no longer need to manually set auxiliary.vision.provider: main in config.yaml. Reported by @Spadav and @kotyKD.	2026-03-09 15:36:19 -07:00
teyrebaz33	94023e6a85	feat: conditional skill activation based on tool availability Skills can now declare fallback_for_toolsets, fallback_for_tools, requires_toolsets, and requires_tools in their SKILL.md frontmatter. The system prompt builder filters skills automatically based on which tools are available in the current session. - Add _read_skill_conditions() to parse conditional frontmatter fields - Add _skill_should_show() to evaluate conditions against available tools - Update build_skills_system_prompt() to accept and apply tool availability - Pass valid_tool_names and available toolsets from run_agent.py - Backward compatible: skills without conditions always show; calling build_skills_system_prompt() with no args preserves existing behavior Closes #539	2026-03-09 23:13:39 +03:00
teknium1	77da3bbc95	fix: use correct role for summary message in context compressor The summary message was always injected as 'user' role, which causes consecutive user messages when the last preserved head message is also 'user'. Some APIs reject this (400 error), and it produces malformed training data. Fix: check the role of the last head message and pick the opposite role for the summary — 'user' after assistant/tool, 'assistant' after user. Based on PR #328 by johnh4098. Closes #328.	2026-03-08 23:09:04 -07:00
teknium1	2d1a1c1c47	refactor: remove redundant 'openai' auxiliary provider, clean up docs The 'openai' provider was redundant — using OPENAI_BASE_URL + OPENAI_API_KEY with provider: 'main' already covers direct OpenAI API. Provider options are now: auto, openrouter, nous, codex, main. - Removed _try_openai(), _OPENAI_AUX_MODEL, _OPENAI_BASE_URL - Replaced openai tests with codex provider tests - Updated all docs to remove 'openai' option and clarify 'main' - 'main' description now explicitly mentions it works with OpenAI API, local models, and any OpenAI-compatible endpoint Tests: 2467 passed.	2026-03-08 18:50:26 -07:00
teknium1	71e81728ac	feat: Codex OAuth vision support + multimodal content adapter The Codex Responses API (chatgpt.com/backend-api/codex) supports vision via gpt-5.3-codex. This was verified with real API calls using image analysis. Changes to _CodexCompletionsAdapter: - Added _convert_content_for_responses() to translate chat.completions multimodal format to Responses API format: - {type: 'text'} → {type: 'input_text'} - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'} - Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it) - Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them) Provider changes: - Added 'codex' as explicit auxiliary provider option - Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex) since gpt-5.3-codex supports multimodal input - Updated docs with Codex OAuth examples Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed working end-to-end through the full adapter pipeline. Tests: 2459 passed.	2026-03-08 18:44:33 -07:00
teknium1	ae4a674c84	feat: add 'openai' as auxiliary provider option Users can now set provider: "openai" for auxiliary tasks (vision, web extract, compression) to use OpenAI's API directly with their OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the default model — supports vision since GPT-4o handles image input. Provider options are now: auto, openrouter, nous, openai, main. Changes: - agent/auxiliary_client.py: added _try_openai(), "openai" case in _resolve_forced_provider(), updated auxiliary_max_tokens_param() to use max_completion_tokens for OpenAI - Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing configuration.md with Common Setups section showing OpenAI, OpenRouter, and local model examples - 3 new tests for OpenAI provider resolution Tests: 2459 passed (was 2429).	2026-03-08 18:25:30 -07:00

1 2

67 Commits