Files

Teknium 36a4481152 fix: prevent unavailable tool names from leaking into model schemas

* fix: prevent unavailable tool names from leaking into model schemas

When web_search/web_extract fail check_fn (no API key configured), their
names were still leaking into tool descriptions via two paths:

1. execute_code schema: sandbox_enabled was computed from tools_to_include
   (pre-filter) instead of the actual available tools (post-filter), so
   the execute_code description listed web_search/web_extract as available
   sandbox imports even when they weren't.

2. browser_navigate schema: hardcoded description said 'prefer web_search
   or web_extract' regardless of whether those tools existed.

The model saw these references, assumed the tools existed, and tried
calling them directly — triggering 'Unknown tool' errors.

Fix: compute available_tool_names from the filtered result set and use
that for both execute_code sandbox listing and browser_navigate description
patching.

* docs: add pitfall about cross-tool references in schema descriptions

---------

Co-authored-by: Test <test@test.com>

2026-03-19 10:08:14 -07:00

16 KiB

Raw Blame History

Hermes Agent - Development Guide

Instructions for AI coding assistants and developers working on the hermes-agent codebase.

Development Environment

source .venv/bin/activate  # ALWAYS activate before running Python

Project Structure

hermes-agent/
├── run_agent.py          # AIAgent class — core conversation loop
├── model_tools.py        # Tool orchestration, _discover_tools(), handle_function_call()
├── toolsets.py           # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py                # HermesCLI class — interactive CLI orchestrator
├── hermes_state.py       # SessionDB — SQLite session store (FTS5 search)
├── agent/                # Agent internals
│   ├── prompt_builder.py     # System prompt assembly
│   ├── context_compressor.py # Auto context compression
│   ├── prompt_caching.py     # Anthropic prompt caching
│   ├── auxiliary_client.py   # Auxiliary LLM client (vision, summarization)
│   ├── model_metadata.py     # Model context lengths, token estimation
│   ├── display.py            # KawaiiSpinner, tool preview formatting
│   ├── skill_commands.py     # Skill slash commands (shared CLI/gateway)
│   └── trajectory.py         # Trajectory saving helpers
├── hermes_cli/           # CLI subcommands and setup
│   ├── main.py           # Entry point — all `hermes` subcommands
│   ├── config.py         # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│   ├── commands.py       # Slash command definitions + SlashCommandCompleter
│   ├── callbacks.py      # Terminal callbacks (clarify, sudo, approval)
│   ├── setup.py          # Interactive setup wizard
│   ├── skin_engine.py    # Skin/theme engine — CLI visual customization
│   ├── skills_config.py  # `hermes skills` — enable/disable skills per platform
│   ├── tools_config.py   # `hermes tools` — enable/disable tools per platform
│   ├── skills_hub.py     # `/skills` slash command (search, browse, install)
│   ├── models.py         # Model catalog, provider model lists
│   └── auth.py           # Provider credential resolution
├── tools/                # Tool implementations (one file per tool)
│   ├── registry.py       # Central tool registry (schemas, handlers, dispatch)
│   ├── approval.py       # Dangerous command detection
│   ├── terminal_tool.py  # Terminal orchestration
│   ├── process_registry.py # Background process management
│   ├── file_tools.py     # File read/write/search/patch
│   ├── web_tools.py      # Web search/extract (Parallel + Firecrawl)
│   ├── browser_tool.py   # Browserbase browser automation
│   ├── code_execution_tool.py # execute_code sandbox
│   ├── delegate_tool.py  # Subagent delegation
│   ├── mcp_tool.py       # MCP client (~1050 lines)
│   └── environments/     # Terminal backends (local, docker, ssh, modal, daytona, singularity)
├── gateway/              # Messaging platform gateway
│   ├── run.py            # Main loop, slash commands, message dispatch
│   ├── session.py        # SessionStore — conversation persistence
│   └── platforms/        # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal
├── acp_adapter/          # ACP server (VS Code / Zed / JetBrains integration)
├── cron/                 # Scheduler (jobs.py, scheduler.py)
├── environments/         # RL training environments (Atropos)
├── tests/                # Pytest suite (~3000 tests)
└── batch_runner.py       # Parallel batch processing

User config: ~/.hermes/config.yaml (settings), ~/.hermes/.env (API keys)

File Dependency Chain

tools/registry.py  (no deps — imported by all tool files)
       ↑
tools/*.py  (each calls registry.register() at import time)
       ↑
model_tools.py  (imports tools/registry + triggers tool discovery)
       ↑
run_agent.py, cli.py, batch_runner.py, environments/

AIAgent Class (run_agent.py)

class AIAgent:
    def __init__(self,
        model: str = "anthropic/claude-opus-4.6",
        max_iterations: int = 90,
        enabled_toolsets: list = None,
        disabled_toolsets: list = None,
        quiet_mode: bool = False,
        save_trajectories: bool = False,
        platform: str = None,           # "cli", "telegram", etc.
        session_id: str = None,
        skip_context_files: bool = False,
        skip_memory: bool = False,
        # ... plus provider, api_mode, callbacks, routing params
    ): ...

    def chat(self, message: str) -> str:
        """Simple interface — returns final response string."""

    def run_conversation(self, user_message: str, system_message: str = None,
                         conversation_history: list = None, task_id: str = None) -> dict:
        """Full interface — returns dict with final_response + messages."""

Agent Loop

The core loop is inside run_conversation() — entirely synchronous:

while api_call_count < self.max_iterations and self.iteration_budget.remaining > 0:
    response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
    if response.tool_calls:
        for tool_call in response.tool_calls:
            result = handle_function_call(tool_call.name, tool_call.args, task_id)
            messages.append(tool_result_message(result))
        api_call_count += 1
    else:
        return response.content

Messages follow OpenAI format: {"role": "system/user/assistant/tool", ...}. Reasoning content is stored in assistant_msg["reasoning"].

CLI Architecture (cli.py)

Rich for banner/panels, prompt_toolkit for input with autocomplete
KawaiiSpinner (agent/display.py) — animated faces during API calls, ┊ activity feed for tool results
load_cli_config() in cli.py merges hardcoded defaults + user config YAML
Skin engine (hermes_cli/skin_engine.py) — data-driven CLI theming; initialized from display.skin config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
process_command() is a method on HermesCLI — dispatches on canonical command name resolved via resolve_command() from the central registry
Skill slash commands: agent/skill_commands.py scans ~/.hermes/skills/, injects as user message (not system prompt) to preserve prompt caching

Slash Command Registry (`hermes_cli/commands.py`)

All slash commands are defined in a central COMMAND_REGISTRY list of CommandDef objects. Every downstream consumer derives from this registry automatically:

CLI — process_command() resolves aliases via resolve_command(), dispatches on canonical name
Gateway — GATEWAY_KNOWN_COMMANDS frozenset for hook emission, resolve_command() for dispatch
Gateway help — gateway_help_lines() generates /help output
Telegram — telegram_bot_commands() generates the BotCommand menu
Slack — slack_subcommand_map() generates /hermes subcommand routing
Autocomplete — COMMANDS flat dict feeds SlashCommandCompleter
CLI help — COMMANDS_BY_CATEGORY dict feeds show_help()

Adding a Slash Command

Add a CommandDef entry to COMMAND_REGISTRY in hermes_cli/commands.py:

CommandDef("mycommand", "Description of what it does", "Session",
           aliases=("mc",), args_hint="[arg]"),

Add handler in HermesCLI.process_command() in cli.py:

elif canonical == "mycommand":
    self._handle_mycommand(cmd_original)

If the command is available in the gateway, add a handler in gateway/run.py:

if canonical == "mycommand":
    return await self._handle_mycommand(event)

For persistent settings, use save_config_value() in cli.py

CommandDef fields:

name — canonical name without slash (e.g. "background")
description — human-readable description
category — one of "Session", "Configuration", "Tools & Skills", "Info", "Exit"
aliases — tuple of alternative names (e.g. ("bg",))
args_hint — argument placeholder shown in help (e.g. "<prompt>", "[name]")
cli_only — only available in the interactive CLI
gateway_only — only available in messaging platforms

Adding an alias requires only adding it to the aliases tuple on the existing CommandDef. No other file changes needed — dispatch, help text, Telegram menu, Slack mapping, and autocomplete all update automatically.

Adding New Tools

Requires changes in 3 files:

1. Create tools/your_tool.py:

import json, os
from tools.registry import registry

def check_requirements() -> bool:
    return bool(os.getenv("EXAMPLE_API_KEY"))

def example_tool(param: str, task_id: str = None) -> str:
    return json.dumps({"success": True, "data": "..."})

registry.register(
    name="example_tool",
    toolset="example",
    schema={"name": "example_tool", "description": "...", "parameters": {...}},
    handler=lambda args, **kw: example_tool(param=args.get("param", ""), task_id=kw.get("task_id")),
    check_fn=check_requirements,
    requires_env=["EXAMPLE_API_KEY"],
)

2. Add import in model_tools.py _discover_tools() list.

3. Add to toolsets.py — either _HERMES_CORE_TOOLS (all platforms) or a new toolset.

The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.

Agent-level tools (todo, memory): intercepted by run_agent.py before handle_function_call(). See todo_tool.py for the pattern.

Adding Configuration

config.yaml options:

Add to DEFAULT_CONFIG in hermes_cli/config.py
Bump _config_version (currently 5) to trigger migration for existing users

.env variables:

Add to OPTIONAL_ENV_VARS in hermes_cli/config.py with metadata:

"NEW_API_KEY": {
    "description": "What it's for",
    "prompt": "Display name",
    "url": "https://...",
    "password": True,
    "category": "tool",  # provider, tool, messaging, setting
},

Config loaders (two separate systems):

Loader	Used by	Location
`load_cli_config()`	CLI mode	`cli.py`
`load_config()`	`hermes tools`, `hermes setup`	`hermes_cli/config.py`
Direct YAML load	Gateway	`gateway/run.py`

Skin/Theme System

The skin engine (hermes_cli/skin_engine.py) provides data-driven CLI visual customization. Skins are pure data — no code changes needed to add a new skin.

Architecture

hermes_cli/skin_engine.py    # SkinConfig dataclass, built-in skins, YAML loader
~/.hermes/skins/*.yaml       # User-installed custom skins (drop-in)

init_skin_from_config() — called at CLI startup, reads display.skin from config
get_active_skin() — returns cached SkinConfig for the current skin
set_active_skin(name) — switches skin at runtime (used by /skin command)
load_skin(name) — loads from user skins first, then built-ins, then falls back to default
Missing skin values inherit from the default skin automatically

What skins customize

Element	Skin Key	Used By
Banner panel border	`colors.banner_border`	`banner.py`
Banner panel title	`colors.banner_title`	`banner.py`
Banner section headers	`colors.banner_accent`	`banner.py`
Banner dim text	`colors.banner_dim`	`banner.py`
Banner body text	`colors.banner_text`	`banner.py`
Response box border	`colors.response_border`	`cli.py`
Spinner faces (waiting)	`spinner.waiting_faces`	`display.py`
Spinner faces (thinking)	`spinner.thinking_faces`	`display.py`
Spinner verbs	`spinner.thinking_verbs`	`display.py`
Spinner wings (optional)	`spinner.wings`	`display.py`
Tool output prefix	`tool_prefix`	`display.py`
Per-tool emojis	`tool_emojis`	`display.py` → `get_tool_emoji()`
Agent name	`branding.agent_name`	`banner.py`, `cli.py`
Welcome message	`branding.welcome`	`cli.py`
Response box label	`branding.response_label`	`cli.py`
Prompt symbol	`branding.prompt_symbol`	`cli.py`

Built-in skins

default — Classic Hermes gold/kawaii (the current look)
ares — Crimson/bronze war-god theme with custom spinner wings
mono — Clean grayscale monochrome
slate — Cool blue developer-focused theme

Adding a built-in skin

Add to _BUILTIN_SKINS dict in hermes_cli/skin_engine.py:

"mytheme": {
    "name": "mytheme",
    "description": "Short description",
    "colors": { ... },
    "spinner": { ... },
    "branding": { ... },
    "tool_prefix": "┊",
},

User skins (YAML)

Users create ~/.hermes/skins/<name>.yaml:

name: cyberpunk
description: Neon-soaked terminal theme

colors:
  banner_border: "#FF00FF"
  banner_title: "#00FFFF"
  banner_accent: "#FF1493"

spinner:
  thinking_verbs: ["jacking in", "decrypting", "uploading"]
  wings:
    - ["⟨⚡", "⚡⟩"]

branding:
  agent_name: "Cyber Agent"
  response_label: " ⚡ Cyber "

tool_prefix: "▏"

Activate with /skin cyberpunk or display.skin: cyberpunk in config.yaml.

Important Policies

Prompt Caching Must Not Break

Hermes-Agent ensures caching remains valid throughout a conversation. Do NOT implement changes that would:

Alter past context mid-conversation
Change toolsets mid-conversation
Reload memories or rebuild system prompts mid-conversation

Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.

Working Directory Behavior

CLI: Uses current directory (. → os.getcwd())
Messaging: Uses MESSAGING_CWD env var (default: home directory)

Background Process Notifications (Gateway)

When terminal(background=true, check_interval=...) is used, the gateway runs a watcher that pushes status updates to the user's chat. Control verbosity with display.background_process_notifications in config.yaml (or HERMES_BACKGROUND_NOTIFICATIONS env var):

all — running-output updates + final message (default)
result — only the final completion message
error — only the final message when exit code != 0
off — no watcher messages at all

Known Pitfalls

DO NOT use `simple_term_menu` for interactive menus

Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use curses (stdlib) instead. See hermes_cli/tools_config.py for the pattern.

DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code

Leaks as literal ?[K text under prompt_toolkit's patch_stdout. Use space-padding: f"\r{line}{' ' * pad}".

`_last_resolved_tool_names` is a process-global in `model_tools.py`

_run_single_child() in delegate_tool.py saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.

DO NOT hardcode cross-tool references in schema descriptions

Tool schema descriptions must not mention tools from other toolsets by name (e.g., browser_navigate saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in get_tool_definitions() in model_tools.py — see the browser_navigate / execute_code post-processing blocks for the pattern.

Tests must not write to `~/.hermes/`

The _isolate_hermes_home autouse fixture in tests/conftest.py redirects HERMES_HOME to a temp dir. Never hardcode ~/.hermes/ paths in tests.

Testing

source .venv/bin/activate
python -m pytest tests/ -q          # Full suite (~3000 tests, ~3 min)
python -m pytest tests/test_model_tools.py -q   # Toolset resolution
python -m pytest tests/test_cli_init.py -q       # CLI config loading
python -m pytest tests/gateway/ -q               # Gateway tests
python -m pytest tests/tools/ -q                 # Tool-level tests

Always run the full suite before pushing changes.

16 KiB Raw Blame History