docs: deep quality pass — expand 10 thin pages, fix specific issues (#4134)

Developer guide stubs expanded to full documentation:
- trajectory-format.md: 56→233 lines (JSONL format, ShareGPT example,
  normalization rules, reasoning markup, replay code)
- session-storage.md: 66→388 lines (SQLite schema, migration table,
  FTS5 search syntax, lineage queries, Python API examples)
- context-compression-and-caching.md: 72→321 lines (dual compression
  system, config defaults, 4-phase algorithm, before/after example,
  prompt caching mechanics, cache-aware patterns)
- tools-runtime.md: 65→246 lines (registry API, dispatch flow,
  availability checking, error wrapping, approval flow)
- prompt-assembly.md: 89→246 lines (concrete assembled prompt example,
  SOUL.md injection, context file discovery table)

User-facing pages expanded:
- docker.md: 62→224 lines (volumes, env forwarding, docker-compose,
  resource limits, troubleshooting)
- updating.md: 79→167 lines (update behavior, version checking,
  rollback instructions, Nix users)
- skins.md: 80→206 lines (all color/spinner/branding keys, built-in
  skin descriptions, full custom skin YAML template)

Hub pages improved:
- integrations/index.md: 25→82 lines (web search backends table,
  TTS/browser providers, quick config example)
- features/overview.md: added Integrations section with 6 missing links

Specific fixes:
- configuration.md: removed duplicate Gateway Streaming section
- mcp.md: removed internal "PR work" language
- plugins.md: added inline minimal plugin example (self-contained)

13 files changed, ~1700 lines added. Docusaurus build verified clean.
This commit is contained in:
Teknium
2026-03-30 20:30:11 -07:00
committed by GitHub
parent 54b876a5c9
commit 5b0243e6ad
13 changed files with 1735 additions and 174 deletions

@@ -1,72 +1,321 @@
---
sidebar_position: 6
title: "Context Compression & Prompt Caching"
description: "How Hermes compresses long conversations and applies provider-side prompt caching"
---
# Context Compression and Caching
# Context Compression & Prompt Caching
Hermes Agent uses a dual compression system and Anthropic prompt caching to
manage context window usage efficiently across long conversations.
Hermes manages long conversations with two complementary mechanisms:
Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`,
`gateway/run.py` (session hygiene), `run_agent.py` (lines 1146-1204)
- prompt caching
- context compression
Primary files:
## Dual Compression System
- `agent/prompt_caching.py`
- `agent/context_compressor.py`
- `run_agent.py`
Hermes has two separate compression layers that operate independently:
## Prompt caching
```
┌──────────────────────────┐
Incoming message │ Gateway Session Hygiene │ Fires at 85% of context
─────────────────► │ (pre-agent, rough est.) │ Safety net for large sessions
└─────────────┬────────────┘
┌──────────────────────────┐
│ Agent ContextCompressor │ Fires at 50% of context (default)
│ (in-loop, real tokens) │ Normal context management
└──────────────────────────┘
```
For Anthropic/native and Claude-via-OpenRouter flows, Hermes applies Anthropic-style cache markers.
### 1. Gateway Session Hygiene (85% threshold)
Current strategy:
Located in `gateway/run.py` (around line 2220). This is a **safety net** that
runs before the agent processes a message. It prevents API failures when sessions
grow too large between turns (e.g., overnight accumulation in Telegram/Discord).
- cache the system prompt
- cache the last 3 non-system messages
- default TTL is 5 minutes unless explicitly extended
- **Threshold**: Fixed at 85% of model context length
- **Token source**: Prefers actual API-reported tokens from last turn; falls back
to rough character-based estimate (`estimate_messages_tokens_rough`)
- **Fires**: Only when `len(history) >= 4` and compression is enabled
- **Purpose**: Catch sessions that escaped the agent's own compressor
This is implemented in `agent/prompt_caching.py`.
The gateway hygiene threshold is intentionally higher than the agent's compressor.
Setting it at 50% (same as the agent) caused premature compression on every turn
in long gateway sessions.
## Why prompt stability matters
### 2. Agent ContextCompressor (50% threshold, configurable)
Prompt caching only helps when the stable prefix remains stable. That is why Hermes avoids rebuilding or mutating the core system prompt mid-session unless it has to.
Located in `agent/context_compressor.py`. This is the **primary compression
system** that runs inside the agent's tool loop with access to accurate,
API-reported token counts.
## Compression trigger
Hermes can compress context when conversations become large. Configuration defaults live in `config.yaml`, and the compressor also has runtime checks based on actual prompt token counts.
## Configuration
## Compression algorithm
All compression settings are read from `config.yaml` under the `compression` key:
The compressor protects:
```yaml
compression:
enabled: true # Enable/disable compression (default: true)
threshold: 0.50 # Fraction of context window (default: 0.50 = 50%)
target_ratio: 0.20 # How much of threshold to keep as tail (default: 0.20)
protect_last_n: 20 # Minimum protected tail messages (default: 20)
summary_model: null # Override model for summaries (default: uses auxiliary)
```
- the first N turns
- the last N turns
### Parameter Details
and summarizes the middle section.
| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `threshold` | `0.50` | 0.0-1.0 | Compression triggers when prompt tokens ≥ `threshold × context_length` |
| `target_ratio` | `0.20` | 0.10-0.80 | Controls tail protection token budget: `threshold_tokens × target_ratio` |
| `protect_last_n` | `20` | ≥1 | Minimum number of recent messages always preserved |
| `protect_first_n` | `3` | (hardcoded) | System prompt + first exchange always preserved |
It also cleans up structural issues such as orphaned tool-call/result pairs so the API never receives invalid conversation structure after compression.
### Computed Values (for a 200K context model at defaults)
## Pre-compression memory flush
```
context_length = 200,000
threshold_tokens = 200,000 × 0.50 = 100,000
tail_token_budget = 100,000 × 0.20 = 20,000
max_summary_tokens = min(200,000 × 0.05, 12,000) = 10,000
```
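The same arithmetic can be written as a small helper. This is only a sketch: the function name `compression_budgets` and the returned dictionary keys are hypothetical, while the constants (0.05 and 12,000) are taken from the computed values above.

```python
def compression_budgets(context_length: int,
                        threshold: float = 0.50,
                        target_ratio: float = 0.20) -> dict:
    """Derive the token budgets described above (illustrative helper)."""
    # Compression triggers once the prompt reaches this many tokens.
    threshold_tokens = round(context_length * threshold)
    # Token budget reserved for the protected tail of the conversation.
    tail_token_budget = round(threshold_tokens * target_ratio)
    # Summary ceiling: 5% of the context window, capped at 12,000 tokens.
    max_summary_tokens = min(round(context_length * 0.05), 12_000)
    return {
        "threshold_tokens": threshold_tokens,
        "tail_token_budget": tail_token_budget,
        "max_summary_tokens": max_summary_tokens,
    }


budgets = compression_budgets(200_000)
```

For a 200K model at defaults this reproduces the three figures listed above; changing `threshold` or `target_ratio` shows how the tail budget scales with both settings.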
Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away.
## Session lineage after compression
## Compression Algorithm
Compression can split the session into a new session ID while preserving parent lineage in the state DB.
The `ContextCompressor.compress()` method follows a 4-phase algorithm:
This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain.
### Phase 1: Prune Old Tool Results (cheap, no LLM call)
## Re-injected state after compression
Old tool results (>200 chars) outside the protected tail are replaced with:
```
[Old tool output cleared to save context space]
```
After compression, Hermes may re-inject compact operational state such as:
This is a cheap pre-pass that saves significant tokens from verbose tool
outputs (file contents, terminal output, search results).
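A minimal sketch of this pre-pass, assuming messages are plain dicts with `role` and `content` keys. The function name `prune_old_tool_results` and its parameters are hypothetical and do not mirror the real method; only the 200-character cutoff and the placeholder text come from the description above.

```python
PLACEHOLDER = "[Old tool output cleared to save context space]"


def prune_old_tool_results(messages, protected_tail=20, min_chars=200):
    """Replace long tool outputs outside the protected tail with a placeholder."""
    # Messages at or after this index belong to the protected tail.
    cutoff = max(len(messages) - protected_tail, 0)
    pruned = []
    for index, msg in enumerate(messages):
        content = msg.get("content")
        is_old_tool_output = (
            index < cutoff
            and msg.get("role") == "tool"
            and isinstance(content, str)
            and len(content) > min_chars
        )
        if is_old_tool_output:
            # Copy the message so the caller's list is left untouched.
            msg = {**msg, "content": PLACEHOLDER}
        pruned.append(msg)
    return pruned
```

Because no LLM call is involved, a pass like this can run on every compression attempt before the more expensive summarization step.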
- todo snapshot
- prior-read-files summary
### Phase 2: Determine Boundaries
## Related docs
```
┌─────────────────────────────────────────────────────────────┐
│ Message list │
│ │
│ [0..2] ← protect_first_n (system + first exchange) │
│ [3..N] ← middle turns → SUMMARIZED │
│ [N..end] ← tail (by token budget OR protect_last_n) │
│ │
└─────────────────────────────────────────────────────────────┘
```
- [Prompt Assembly](./prompt-assembly.md)
- [Session Storage](./session-storage.md)
- [Agent Loop Internals](./agent-loop.md)
Tail protection is **token-budget based**: the compressor walks backward from
the end, accumulating tokens until the budget is exhausted. It falls back to the
fixed `protect_last_n` count if the budget would protect fewer messages.
Boundaries are aligned to avoid splitting tool_call/tool_result groups.
The `_align_boundary_backward()` method walks past consecutive tool results
to find the parent assistant message, keeping groups intact.
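The boundary search can be sketched roughly as follows. The helper `find_tail_start` and the `count_tokens` callback are assumptions made for illustration; the real `_align_boundary_backward()` may differ in detail, and this sketch does not guard against the tail overlapping the protected head.

```python
def find_tail_start(messages, token_budget, protect_last_n, count_tokens):
    """Return the index where the protected tail begins (illustrative)."""
    total = 0
    start = len(messages)
    # Walk backward, adding messages while they still fit in the budget.
    for index in range(len(messages) - 1, -1, -1):
        total += count_tokens(messages[index])
        if total > token_budget:
            break
        start = index
    # Fall back to the fixed count if the budget protected fewer messages.
    start = min(start, max(len(messages) - protect_last_n, 0))
    # Never begin the tail on a tool result: step back to the assistant
    # message that issued the call so the group stays intact.
    while start > 0 and messages[start].get("role") == "tool":
        start -= 1
    return start
```

Everything from the returned index onward is kept verbatim; the messages between the protected head and that index form the middle section that gets summarized.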
### Phase 3: Generate Structured Summary
The middle turns are summarized using the auxiliary LLM with a structured
template:
```
## Goal
[What the user is trying to accomplish]
## Constraints & Preferences
[User preferences, coding style, constraints, important decisions]
## Progress
### Done
[Completed work — specific file paths, commands run, results]
### In Progress
[Work currently underway]
### Blocked
[Any blockers or issues encountered]
## Key Decisions
[Important technical decisions and why]
## Relevant Files
[Files read, modified, or created — with brief note on each]
## Next Steps
[What needs to happen next]
## Critical Context
[Specific values, error messages, configuration details]
```
Summary budget scales with the amount of content being compressed:
- Formula: `content_tokens × 0.20` (the `_SUMMARY_RATIO` constant)
- Minimum: 2,000 tokens
- Maximum: `min(context_length × 0.05, 12,000)` tokens
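Read together, the three rules amount to a clamp. The sketch below is one plausible reading, with a hypothetical function name; in particular, the assumption that the maximum wins if it ever falls below the 2,000-token minimum (very small context windows) is not confirmed by the description above.

```python
def summary_token_budget(content_tokens: int, context_length: int) -> int:
    """Clamp the proportional summary budget between the floor and the ceiling."""
    # Ceiling: 5% of the context window, never more than 12,000 tokens.
    ceiling = min(round(context_length * 0.05), 12_000)
    # Proportional budget: 20% of the content being compressed.
    proposed = round(content_tokens * 0.20)
    # Apply the 2,000-token floor, then the ceiling (assumed to take priority).
    return min(max(proposed, 2_000), ceiling)
```

With a 200K context window, compressing 30,000 tokens of content yields a 6,000-token budget, small inputs are lifted to the 2,000-token floor, and very large inputs are capped at 10,000.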
### Phase 4: Assemble Compressed Messages
The compressed message list is:
1. Head messages (with a note appended to system prompt on first compression)
2. Summary message (role chosen to avoid consecutive same-role violations)
3. Tail messages (unmodified)
Orphaned tool_call/tool_result pairs are cleaned up by `_sanitize_tool_pairs()`:
- Tool results referencing removed calls → removed
- Tool calls whose results were removed → stub result injected
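The two cleanup rules can be illustrated with a short sketch, assuming OpenAI-style message dicts where assistant messages carry a `tool_calls` list and tool messages carry a `tool_call_id`. The function name and the stub text are hypothetical; the real `_sanitize_tool_pairs()` may word its stub differently.

```python
STUB_RESULT = "[Tool result removed during context compaction]"  # hypothetical text


def sanitize_tool_pairs(messages):
    """Drop orphaned tool results and stub out calls that lost their results."""
    call_ids = {
        call["id"]
        for msg in messages
        if msg.get("role") == "assistant"
        for call in (msg.get("tool_calls") or [])
    }
    result_ids = {
        msg.get("tool_call_id") for msg in messages if msg.get("role") == "tool"
    }
    cleaned = []
    for msg in messages:
        if msg.get("role") == "tool" and msg.get("tool_call_id") not in call_ids:
            continue  # result refers to a call that no longer exists
        cleaned.append(msg)
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                if call["id"] not in result_ids:
                    # Inject a stub so every call still has a matching result.
                    cleaned.append({
                        "role": "tool",
                        "tool_call_id": call["id"],
                        "content": STUB_RESULT,
                    })
    return cleaned
```

After a pass like this, every remaining tool call has a result and every remaining result has a call, which is the structural property chat APIs typically enforce.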
### Iterative Re-compression
On subsequent compressions, the previous summary is passed to the LLM with
instructions to **update** it rather than summarize from scratch. This preserves
information across multiple compactions — items move from "In Progress" to "Done",
new progress is added, and obsolete information is removed.
The `_previous_summary` field on the compressor instance stores the last summary
text for this purpose.
## Before/After Example
### Before Compression (45 messages, ~95K tokens)
```
[0] system: "You are a helpful assistant..." (system prompt)
[1] user: "Help me set up a FastAPI project"
[2] assistant: <tool_call> terminal: mkdir project </tool_call>
[3] tool: "directory created"
[4] assistant: <tool_call> write_file: main.py </tool_call>
[5] tool: "file written (2.3KB)"
... 30 more turns of file editing, testing, debugging ...
[38] assistant: <tool_call> terminal: pytest </tool_call>
[39] tool: "8 passed, 2 failed\n..." (5KB output)
[40] user: "Fix the failing tests"
[41] assistant: <tool_call> read_file: tests/test_api.py </tool_call>
[42] tool: "import pytest\n..." (3KB)
[43] assistant: "I see the issue with the test fixtures..."
[44] user: "Great, also add error handling"
```
### After Compression (25 messages, ~45K tokens)
```
[0] system: "You are a helpful assistant...
[Note: Some earlier conversation turns have been compacted...]"
[1] user: "Help me set up a FastAPI project"
[2] assistant: "[CONTEXT COMPACTION] Earlier turns were compacted...
## Goal
Set up a FastAPI project with tests and error handling
## Progress
### Done
- Created project structure: main.py, tests/, requirements.txt
- Implemented 5 API endpoints in main.py
- Wrote 10 test cases in tests/test_api.py
- 8/10 tests passing
### In Progress
- Fixing 2 failing tests (test_create_user, test_delete_user)
## Relevant Files
- main.py — FastAPI app with 5 endpoints
- tests/test_api.py — 10 test cases
- requirements.txt — fastapi, pytest, httpx
## Next Steps
- Fix failing test fixtures
- Add error handling"
[3] user: "Fix the failing tests"
[4] assistant: <tool_call> read_file: tests/test_api.py </tool_call>
[5] tool: "import pytest\n..."
[6] assistant: "I see the issue with the test fixtures..."
[7] user: "Great, also add error handling"
```
## Prompt Caching (Anthropic)
Source: `agent/prompt_caching.py`
Reduces input token costs by ~75% on multi-turn conversations by caching the
conversation prefix. Uses Anthropic's `cache_control` breakpoints.
### Strategy: system_and_3
Anthropic allows a maximum of 4 `cache_control` breakpoints per request. Hermes
uses the "system_and_3" strategy:
```
Breakpoint 1: System prompt (stable across all turns)
Breakpoint 2: 3rd-to-last non-system message ─┐
Breakpoint 3: 2nd-to-last non-system message ├─ Rolling window
Breakpoint 4: Last non-system message ─┘
```
### How It Works
`apply_anthropic_cache_control()` deep-copies the messages and injects
`cache_control` markers:
```python
# Cache marker format
marker = {"type": "ephemeral"}
# Or for 1-hour TTL:
marker = {"type": "ephemeral", "ttl": "1h"}
```
The marker is applied differently based on content type:
| Content Type | Where Marker Goes |
|-------------|-------------------|
| String content | Converted to `[{"type": "text", "text": ..., "cache_control": ...}]` |
| List content | Added to the last element's dict |
| None/empty | Added as `msg["cache_control"]` |
| Tool messages | Added as `msg["cache_control"]` (native Anthropic only) |
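Putting the breakpoint selection and the placement table together, a simplified sketch might look like the following. The helper names (`pick_breakpoints`, `add_cache_marker`, `apply_cache_markers`) are hypothetical and this is not the body of `apply_anthropic_cache_control()`; it also ignores the native-Anthropic-only restriction on tool messages.

```python
import copy


def pick_breakpoints(messages):
    """Indices for system_and_3: the system prompt plus the last 3 non-system messages."""
    system = [i for i, m in enumerate(messages) if m.get("role") == "system"][:1]
    non_system = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    return system + non_system[-3:]


def add_cache_marker(message, marker):
    """Attach the marker in place, following the content-type table above."""
    content = message.get("content")
    if message.get("role") == "tool" or not content:
        message["cache_control"] = marker
    elif isinstance(content, str):
        message["content"] = [
            {"type": "text", "text": content, "cache_control": marker}
        ]
    elif isinstance(content, list) and isinstance(content[-1], dict):
        content[-1]["cache_control"] = marker


def apply_cache_markers(messages, ttl=None):
    """Return a deep copy of messages with up to 4 cache breakpoints applied."""
    marker = {"type": "ephemeral"}
    if ttl == "1h":
        marker["ttl"] = "1h"
    result = copy.deepcopy(messages)  # never mutate the caller's history
    for index in pick_breakpoints(result):
        add_cache_marker(result[index], dict(marker))
    return result
```

Working on a deep copy keeps the stored conversation free of provider-specific markers, so the same history can be replayed against providers that do not understand `cache_control`.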
### Cache-Aware Design Patterns
1. **Stable system prompt**: The system prompt is breakpoint 1 and cached across
all turns. Avoid mutating it mid-conversation (compression appends a note
only on the first compaction).
2. **Message ordering matters**: Cache hits require prefix matching. Adding or
removing messages in the middle invalidates the cache for everything after.
3. **Compression cache interaction**: After compression, the cache is invalidated
for the compressed region but the system prompt cache survives. The rolling
3-message window re-establishes caching within 1-2 turns.
4. **TTL selection**: Default is `5m` (5 minutes). Use `1h` for long-running
sessions where the user takes breaks between turns.
### Enabling Prompt Caching
Prompt caching is automatically enabled when:
- The model is an Anthropic Claude model (detected by model name)
- The provider supports `cache_control` (native Anthropic API or OpenRouter)
```yaml
# config.yaml — TTL is configurable
model:
cache_ttl: "5m" # "5m" or "1h"
```
The CLI shows caching status at startup:
```
💾 Prompt caching: ENABLED (Claude via OpenRouter, 5m TTL)
```
## Context Pressure Warnings
The agent emits context pressure warnings at 85% of the compression threshold
(not 85% of context — 85% of the threshold which is itself 50% of context):
```
⚠️ Context is 85% to compaction threshold (42,500/50,000 tokens)
```
After compression, if usage drops below 85% of threshold, the warning state
is cleared. If compression fails to reduce below the warning level (the
conversation is too dense), the warning persists but compression won't
re-trigger until the threshold is exceeded again.
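The relationship between the two thresholds is easy to get wrong, so here is a small sketch of the arithmetic. The function name, the returned keys, and the use of `>=` for both comparisons are assumptions; only the 85% and 50% figures come from the text above.

```python
def context_pressure(prompt_tokens: int, context_length: int,
                     threshold: float = 0.50, warn_fraction: float = 0.85) -> dict:
    """Compute the warning and compression levels for a given prompt size."""
    # Compression threshold: a fraction of the full context window.
    threshold_tokens = round(context_length * threshold)
    # Warning level: a fraction of the threshold, not of the context window.
    warn_tokens = round(threshold_tokens * warn_fraction)
    return {
        "threshold_tokens": threshold_tokens,
        "warn_tokens": warn_tokens,
        "warn": prompt_tokens >= warn_tokens,
        "compress": prompt_tokens >= threshold_tokens,
    }


status = context_pressure(prompt_tokens=42_500, context_length=100_000)
```

For a 100K context model this gives a 50,000-token threshold and a 42,500-token warning level, matching the sample warning line.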

@@ -41,6 +41,163 @@ The cached system prompt is assembled in roughly this order:
When `skip_context_files` is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded `DEFAULT_AGENT_IDENTITY` is used instead.
### Concrete example: assembled system prompt
Here is a simplified view of what the final system prompt looks like when all layers are present (comments show the source of each section):
```
# Layer 1: Agent Identity (from ~/.hermes/SOUL.md)
You are Hermes, an AI assistant created by Nous Research.
You are an expert software engineer and researcher.
You value correctness, clarity, and efficiency.
...
# Layer 2: Tool-aware behavior guidance
You have persistent memory across sessions. Save durable facts using
the memory tool: user preferences, environment details, tool quirks,
and stable conventions. Memory is injected into every turn, so keep
it compact and focused on facts that will still matter later.
...
When the user references something from a past conversation or you
suspect relevant cross-session context exists, use session_search
to recall it before asking them to repeat themselves.
# Tool-use enforcement (for GPT/Codex models only)
You MUST use your tools to take action — do not describe what you
would do or plan to do without actually doing it.
...
# Layer 3: Honcho static block (when active)
[Honcho personality/context data]
# Layer 4: Optional system message (from config or API)
[User-configured system message override]
# Layer 5: Frozen MEMORY snapshot
## Persistent Memory
- User prefers Python 3.12, uses pyproject.toml
- Default editor is nvim
- Working on project "atlas" in ~/code/atlas
- Timezone: US/Pacific
# Layer 6: Frozen USER profile snapshot
## User Profile
- Name: Alice
- GitHub: alice-dev
# Layer 7: Skills index
## Skills (mandatory)
Before replying, scan the skills below. If one clearly matches
your task, load it with skill_view(name) and follow its instructions.
...
<available_skills>
software-development:
- code-review: Structured code review workflow
- test-driven-development: TDD methodology
research:
- arxiv: Search and summarize arXiv papers
</available_skills>
# Layer 8: Context files (from project directory)
# Project Context
The following project context files have been loaded and should be followed:
## AGENTS.md
This is the atlas project. Use pytest for testing. The main
entry point is src/atlas/main.py. Always run `make lint` before
committing.
# Layer 9: Timestamp + session
Current time: 2026-03-30T14:30:00-07:00
Session: abc123
# Layer 10: Platform hint
You are a CLI AI Agent. Try not to use markdown but simple text
renderable inside a terminal.
```
## How SOUL.md appears in the prompt
`SOUL.md` lives at `~/.hermes/SOUL.md` and serves as the agent's identity — the very first section of the system prompt. The loading logic in `prompt_builder.py` works as follows:
```python
# From agent/prompt_builder.py (simplified)
def load_soul_md() -> Optional[str]:
soul_path = get_hermes_home() / "SOUL.md"
if not soul_path.exists():
return None
content = soul_path.read_text(encoding="utf-8").strip()
content = _scan_context_content(content, "SOUL.md") # Security scan
content = _truncate_content(content, "SOUL.md") # Cap at 20k chars
return content
```
When `load_soul_md()` returns content, it replaces the hardcoded `DEFAULT_AGENT_IDENTITY`. The `build_context_files_prompt()` function is then called with `skip_soul=True` to prevent SOUL.md from appearing twice (once as identity, once as a context file).
If `SOUL.md` doesn't exist, the system falls back to:
```
You are Hermes Agent, an intelligent AI assistant created by Nous Research.
You are helpful, knowledgeable, and direct. You assist users with a wide
range of tasks including answering questions, writing and editing code,
analyzing information, creative work, and executing actions via your tools.
You communicate clearly, admit uncertainty when appropriate, and prioritize
being genuinely useful over being verbose unless otherwise directed below.
Be targeted and efficient in your exploration and investigations.
```
## How context files are injected
`build_context_files_prompt()` uses a **priority system** — only one project context type is loaded (first match wins):
```python
# From agent/prompt_builder.py (simplified)
def build_context_files_prompt(cwd=None, skip_soul=False):
cwd_path = Path(cwd).resolve()
# Priority: first match wins — only ONE project context loaded
project_context = (
_load_hermes_md(cwd_path) # 1. .hermes.md / HERMES.md (walks to git root)
or _load_agents_md(cwd_path) # 2. AGENTS.md (cwd only)
or _load_claude_md(cwd_path) # 3. CLAUDE.md (cwd only)
or _load_cursorrules(cwd_path) # 4. .cursorrules / .cursor/rules/*.mdc
)
sections = []
if project_context:
sections.append(project_context)
# SOUL.md from HERMES_HOME (independent of project context)
if not skip_soul:
soul_content = load_soul_md()
if soul_content:
sections.append(soul_content)
if not sections:
return ""
return (
"# Project Context\n\n"
"The following project context files have been loaded "
"and should be followed:\n\n"
+ "\n".join(sections)
)
```
### Context file discovery details
| Priority | Files | Search scope | Notes |
|----------|-------|-------------|-------|
| 1 | `.hermes.md`, `HERMES.md` | CWD up to git root | Hermes-native project config |
| 2 | `AGENTS.md` | CWD only | Common agent instruction file |
| 3 | `CLAUDE.md` | CWD only | Claude Code compatibility |
| 4 | `.cursorrules`, `.cursor/rules/*.mdc` | CWD only | Cursor compatibility |
All context files are:
- **Security scanned** — checked for prompt injection patterns (invisible unicode, "ignore previous instructions", credential exfiltration attempts)
- **Truncated** — capped at 20,000 characters using 70/20 head/tail ratio with a truncation marker
- **YAML frontmatter stripped** — `.hermes.md` frontmatter is removed (reserved for future config overrides)
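As an illustration of the truncation rule, the sketch below keeps the first 70% and the last 20% of the character limit and joins them with a marker. The function name, the percentage parameters, and the marker wording are assumptions; the real `_truncate_content()` may format its marker differently.

```python
def truncate_head_tail(text: str, limit: int = 20_000,
                       head_percent: int = 70, tail_percent: int = 20) -> str:
    """Keep the start and end of an oversized file and drop the middle."""
    if len(text) <= limit:
        return text
    # Integer arithmetic avoids floating-point rounding surprises.
    head_len = limit * head_percent // 100
    tail_len = limit * tail_percent // 100
    dropped = len(text) - head_len - tail_len
    marker = f"\n[... {dropped} characters truncated ...]\n"  # hypothetical wording
    return text[:head_len] + marker + text[-tail_len:]
```

Keeping both ends preserves the opening instructions of a context file as well as anything appended near the bottom, which is often where recent project notes end up.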
## API-call-time-only layers
These are intentionally *not* persisted as part of the cached system prompt:

@@ -1,66 +1,388 @@
---
sidebar_position: 8
title: "Session Storage"
description: "How Hermes stores sessions in SQLite, maintains lineage, and exposes recall/search"
---
# Session Storage
Hermes uses a SQLite-backed session store as the main source of truth for historical conversation state.
Hermes Agent uses a SQLite database (`~/.hermes/state.db`) to persist session
metadata, full message history, and model configuration across CLI and gateway
sessions. This replaces the earlier per-session JSONL file approach.
Primary files:
Source file: `hermes_state.py`
- `hermes_state.py`
- `gateway/session.py`
- `tools/session_search_tool.py`
## Main database
## Architecture Overview
The primary store lives at:
```text
~/.hermes/state.db
```
~/.hermes/state.db (SQLite, WAL mode)
├── sessions — Session metadata, token counts, billing
├── messages — Full message history per session
├── messages_fts — FTS5 virtual table for full-text search
└── schema_version — Single-row table tracking migration state
```
It contains:
Key design decisions:
- **WAL mode** for concurrent readers + one writer (gateway multi-platform)
- **FTS5 virtual table** for fast text search across all session messages
- **Session lineage** via `parent_session_id` chains (compression-triggered splits)
- **Source tagging** (`cli`, `telegram`, `discord`, etc.) for platform filtering
- Batch runner and RL trajectories are NOT stored here (separate systems)
- sessions
- messages
- metadata such as token counts and titles
- lineage relationships
- full-text search indexes
## What is stored per session
## SQLite Schema
Examples of important session metadata:
### Sessions Table
- session ID
- source/platform
- title
- created/updated timestamps
- token counts
- tool call counts
- stored system prompt snapshot
- parent session ID after compression splits
```sql
CREATE TABLE IF NOT EXISTS sessions (
id TEXT PRIMARY KEY,
source TEXT NOT NULL,
user_id TEXT,
model TEXT,
model_config TEXT,
system_prompt TEXT,
parent_session_id TEXT,
started_at REAL NOT NULL,
ended_at REAL,
end_reason TEXT,
message_count INTEGER DEFAULT 0,
tool_call_count INTEGER DEFAULT 0,
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
cache_read_tokens INTEGER DEFAULT 0,
cache_write_tokens INTEGER DEFAULT 0,
reasoning_tokens INTEGER DEFAULT 0,
billing_provider TEXT,
billing_base_url TEXT,
billing_mode TEXT,
estimated_cost_usd REAL,
actual_cost_usd REAL,
cost_status TEXT,
cost_source TEXT,
pricing_version TEXT,
title TEXT,
FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
);
## Lineage
CREATE INDEX IF NOT EXISTS idx_sessions_source ON sessions(source);
CREATE INDEX IF NOT EXISTS idx_sessions_parent ON sessions(parent_session_id);
CREATE INDEX IF NOT EXISTS idx_sessions_started ON sessions(started_at DESC);
CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique
ON sessions(title) WHERE title IS NOT NULL;
```
When Hermes compresses a conversation, it can continue in a new session ID while preserving ancestry via `parent_session_id`.
### Messages Table
This means resuming/searching can follow session families instead of treating each compressed shard as unrelated.
```sql
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL REFERENCES sessions(id),
role TEXT NOT NULL,
content TEXT,
tool_call_id TEXT,
tool_calls TEXT,
tool_name TEXT,
timestamp REAL NOT NULL,
token_count INTEGER,
finish_reason TEXT,
reasoning TEXT,
reasoning_details TEXT,
codex_reasoning_items TEXT
);
## Gateway vs CLI persistence
CREATE INDEX IF NOT EXISTS idx_messages_session ON messages(session_id, timestamp);
```
- CLI uses the state DB directly for resume/history/search
- gateway keeps active-session mappings and may also maintain additional platform transcript/state files
- some legacy JSON/JSONL artifacts still exist for compatibility, but SQLite is the main historical store
Notes:
- `tool_calls` is stored as a JSON string (serialized list of tool call objects)
- `reasoning_details` and `codex_reasoning_items` are stored as JSON strings
- `reasoning` stores the raw reasoning text for providers that expose it
- Timestamps are Unix epoch floats (`time.time()`)
## Session search
### FTS5 Full-Text Search
The `session_search` tool uses the session DB's search features to retrieve and summarize relevant past work.
```sql
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
content,
content=messages,
content_rowid=id
);
```
## Related docs
The FTS5 table is kept in sync via three triggers that fire on INSERT, UPDATE,
and DELETE of the `messages` table:
- [Gateway Internals](./gateway-internals.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
```sql
CREATE TRIGGER IF NOT EXISTS messages_fts_insert AFTER INSERT ON messages BEGIN
INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_delete AFTER DELETE ON messages BEGIN
INSERT INTO messages_fts(messages_fts, rowid, content)
VALUES('delete', old.id, old.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
INSERT INTO messages_fts(messages_fts, rowid, content)
VALUES('delete', old.id, old.content);
INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
```
## Schema Version and Migrations
Current schema version: **6**
The `schema_version` table stores a single integer. On initialization,
`_init_schema()` checks the current version and applies migrations sequentially:
| Version | Change |
|---------|--------|
| 1 | Initial schema (sessions, messages, FTS5) |
| 2 | Add `finish_reason` column to messages |
| 3 | Add `title` column to sessions |
| 4 | Add unique index on `title` (NULLs allowed, non-NULL must be unique) |
| 5 | Add billing columns: `cache_read_tokens`, `cache_write_tokens`, `reasoning_tokens`, `billing_provider`, `billing_base_url`, `billing_mode`, `estimated_cost_usd`, `actual_cost_usd`, `cost_status`, `cost_source`, `pricing_version` |
| 6 | Add reasoning columns to messages: `reasoning`, `reasoning_details`, `codex_reasoning_items` |
Each migration uses `ALTER TABLE ADD COLUMN` wrapped in try/except to handle
the column-already-exists case (idempotent). The version number is bumped after
each successful migration block.
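The idempotent pattern can be sketched with the standard `sqlite3` module. The helper name `add_column_if_missing` is hypothetical, and matching on the "duplicate column name" error text is one possible way to detect the already-migrated case, not necessarily what `_init_schema()` does.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column_def: str) -> None:
    """Run ALTER TABLE ADD COLUMN and ignore the column-already-exists error."""
    try:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column_def}")
    except sqlite3.OperationalError as exc:
        # SQLite reports "duplicate column name: <name>" when the column exists.
        if "duplicate column name" not in str(exc).lower():
            raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")
# Running the same migration twice is harmless.
add_column_if_missing(conn, "messages", "finish_reason TEXT")
add_column_if_missing(conn, "messages", "finish_reason TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(messages)")]
```

Re-raising any other `OperationalError` keeps genuine failures, such as a missing table, from being silently swallowed.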
## Write Contention Handling
Multiple hermes processes (gateway + CLI sessions + worktree agents) share one
`state.db`. The `SessionDB` class handles write contention with:
- **Short SQLite timeout** (1 second) instead of the default 30s
- **Application-level retry** with random jitter (20-150ms, up to 15 retries)
- **BEGIN IMMEDIATE** transactions to surface lock contention at transaction start
- **Periodic WAL checkpoints** every 50 successful writes (PASSIVE mode)
This avoids the "convoy effect" where SQLite's deterministic internal backoff
causes all competing writers to retry at the same intervals.
```python
_WRITE_MAX_RETRIES = 15
_WRITE_RETRY_MIN_S = 0.020 # 20ms
_WRITE_RETRY_MAX_S = 0.150 # 150ms
_CHECKPOINT_EVERY_N_WRITES = 50
```
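A rough sketch of how these constants could drive a jittered retry loop is shown below. The function `write_with_retry` is hypothetical and is not the `SessionDB` implementation; it assumes a connection opened with `isolation_level=None` so that transactions are controlled explicitly.

```python
import random
import sqlite3
import time


def write_with_retry(conn, sql, params=(), max_retries=15,
                     min_sleep_s=0.020, max_sleep_s=0.150):
    """Execute one write inside BEGIN IMMEDIATE, retrying on lock contention."""
    for attempt in range(max_retries + 1):
        try:
            # BEGIN IMMEDIATE takes the write lock up front, so contention
            # surfaces here rather than midway through the transaction.
            conn.execute("BEGIN IMMEDIATE")
            try:
                conn.execute(sql, params)
                conn.execute("COMMIT")
                return
            except Exception:
                conn.execute("ROLLBACK")
                raise
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            contended = "locked" in message or "busy" in message
            if not contended or attempt == max_retries:
                raise
            # Random jitter keeps competing writers from retrying in lockstep.
            time.sleep(random.uniform(min_sleep_s, max_sleep_s))


# Short timeout plus manual transaction control, as described above.
conn = sqlite3.connect(":memory:", timeout=1.0, isolation_level=None)
conn.execute("CREATE TABLE notes (body TEXT)")
write_with_retry(conn, "INSERT INTO notes (body) VALUES (?)", ("hello",))
```

The short SQLite timeout hands control back to the application quickly, which is what lets the randomized sleep, rather than SQLite's internal backoff, decide when each writer tries again.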
## Common Operations
### Initialize
```python
from hermes_state import SessionDB
db = SessionDB() # Default: ~/.hermes/state.db
db = SessionDB(db_path=Path("/tmp/test.db")) # Custom path
```
### Create and Manage Sessions
```python
# Create a new session
db.create_session(
session_id="sess_abc123",
source="cli",
model="anthropic/claude-sonnet-4.6",
user_id="user_1",
parent_session_id=None, # or previous session ID for lineage
)
# End a session
db.end_session("sess_abc123", end_reason="user_exit")
# Reopen a session (clear ended_at/end_reason)
db.reopen_session("sess_abc123")
```
### Store Messages
```python
msg_id = db.append_message(
session_id="sess_abc123",
role="assistant",
content="Here's the answer...",
tool_calls=[{"id": "call_1", "function": {"name": "terminal", "arguments": "{}"}}],
token_count=150,
finish_reason="stop",
reasoning="Let me think about this...",
)
```
### Retrieve Messages
```python
# Raw messages with all metadata
messages = db.get_messages("sess_abc123")
# OpenAI conversation format (for API replay)
conversation = db.get_messages_as_conversation("sess_abc123")
# Returns: [{"role": "user", "content": "..."}, {"role": "assistant", ...}]
```
### Session Titles
```python
# Set a title (must be unique among non-NULL titles)
db.set_session_title("sess_abc123", "Fix Docker Build")
# Resolve by title (returns most recent in lineage)
session_id = db.resolve_session_by_title("Fix Docker Build")
# Auto-generate next title in lineage
next_title = db.get_next_title_in_lineage("Fix Docker Build")
# Returns: "Fix Docker Build #2"
```
## Full-Text Search
The `search_messages()` method supports FTS5 query syntax with automatic
sanitization of user input.
### Basic Search
```python
results = db.search_messages("docker deployment")
```
### FTS5 Query Syntax
| Syntax | Example | Meaning |
|--------|---------|---------|
| Keywords | `docker deployment` | Both terms (implicit AND) |
| Quoted phrase | `"exact phrase"` | Exact phrase match |
| Boolean OR | `docker OR kubernetes` | Either term |
| Boolean NOT | `python NOT java` | Exclude term |
| Prefix | `deploy*` | Prefix match |
### Filtered Search
```python
# Search only CLI sessions
results = db.search_messages("error", source_filter=["cli"])
# Exclude gateway sessions
results = db.search_messages("bug", exclude_sources=["telegram", "discord"])
# Search only user messages
results = db.search_messages("help", role_filter=["user"])
```
### Search Results Format
Each result includes:
- `id`, `session_id`, `role`, `timestamp`
- `snippet` — FTS5-generated snippet with `>>>match<<<` markers
- `context` — 1 message before and after the match (content truncated to 200 chars)
- `source`, `model`, `session_started` — from the parent session
The `_sanitize_fts5_query()` method handles edge cases:
- Strips unmatched quotes and special characters
- Wraps hyphenated terms in quotes (`chat-send` → `"chat-send"`)
- Removes dangling boolean operators (`hello AND` → `hello`)
## Session Lineage
Sessions can form chains via `parent_session_id`. This happens when context
compression triggers a session split in the gateway.
### Query: Find Session Lineage
```sql
-- Find all ancestors of a session
WITH RECURSIVE lineage AS (
SELECT * FROM sessions WHERE id = ?
UNION ALL
SELECT s.* FROM sessions s
JOIN lineage l ON s.id = l.parent_session_id
)
SELECT id, title, started_at, parent_session_id FROM lineage;
-- Find all descendants of a session
WITH RECURSIVE descendants AS (
SELECT * FROM sessions WHERE id = ?
UNION ALL
SELECT s.* FROM sessions s
JOIN descendants d ON s.parent_session_id = d.id
)
SELECT id, title, started_at FROM descendants;
```
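The ancestor query can be tried against a throwaway in-memory database. The sketch below uses a cut-down `sessions` table with only the columns the query needs; the `ancestors` helper and the sample rows are illustrative and not part of `hermes_state.py`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    " id TEXT PRIMARY KEY, parent_session_id TEXT, title TEXT, started_at REAL)"
)
# A three-session chain produced by two compression splits (sample data).
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        ("s1", None, "Fix Docker Build", 1.0),
        ("s2", "s1", "Fix Docker Build #2", 2.0),
        ("s3", "s2", "Fix Docker Build #3", 3.0),
    ],
)


def ancestors(conn, session_id):
    """Return the session itself plus every ancestor reachable via parent links."""
    rows = conn.execute(
        """
        WITH RECURSIVE lineage AS (
            SELECT id, parent_session_id FROM sessions WHERE id = ?
            UNION ALL
            SELECT s.id, s.parent_session_id FROM sessions s
            JOIN lineage l ON s.id = l.parent_session_id
        )
        SELECT id FROM lineage
        """,
        (session_id,),
    ).fetchall()
    return [row[0] for row in rows]


chain = ancestors(conn, "s3")
```

Row order from a recursive CTE without `ORDER BY` is not guaranteed, so sort by `started_at` in the outer query if the chain needs to be displayed oldest-first.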
### Query: Recent Sessions with Preview
```sql
SELECT s.*,
COALESCE(
(SELECT SUBSTR(m.content, 1, 63)
FROM messages m
WHERE m.session_id = s.id AND m.role = 'user' AND m.content IS NOT NULL
ORDER BY m.timestamp, m.id LIMIT 1),
''
) AS preview,
COALESCE(
(SELECT MAX(m2.timestamp) FROM messages m2 WHERE m2.session_id = s.id),
s.started_at
) AS last_active
FROM sessions s
ORDER BY s.started_at DESC
LIMIT 20;
```
### Query: Token Usage Statistics
```sql
-- Total tokens by model
SELECT model,
COUNT(*) as session_count,
SUM(input_tokens) as total_input,
SUM(output_tokens) as total_output,
SUM(estimated_cost_usd) as total_cost
FROM sessions
WHERE model IS NOT NULL
GROUP BY model
ORDER BY total_cost DESC;
-- Sessions with highest token usage
SELECT id, title, model, input_tokens + output_tokens AS total_tokens,
estimated_cost_usd
FROM sessions
ORDER BY total_tokens DESC
LIMIT 10;
```
## Export and Cleanup
```python
# Export a single session with messages
data = db.export_session("sess_abc123")
# Export all sessions (with messages) as list of dicts
all_data = db.export_all(source="cli")
# Delete old sessions (only ended sessions)
deleted_count = db.prune_sessions(older_than_days=90)
deleted_count = db.prune_sessions(older_than_days=30, source="telegram")
# Clear messages but keep the session record
db.clear_messages("sess_abc123")
# Delete session and all messages
db.delete_session("sess_abc123")
```
## Database Location
Default path: `~/.hermes/state.db`
This is derived from `hermes_constants.get_hermes_home()`, which resolves to
`~/.hermes/` by default, or to the value of the `HERMES_HOME` environment variable if it is set.
The database file, WAL file (`state.db-wal`), and shared-memory file
(`state.db-shm`) are all created in the same directory.


@@ -22,6 +22,89 @@ Each tool module calls `registry.register(...)` at import time.
`model_tools.py` is responsible for importing/discovering tool modules and building the schema list used by the model.
### How `registry.register()` works
Every tool file in `tools/` calls `registry.register()` at module level to declare itself. The function signature is:
```python
registry.register(
    name="terminal",               # Unique tool name (used in API schemas)
    toolset="terminal",            # Toolset this tool belongs to
    schema={...},                  # OpenAI function-calling schema (description, parameters)
    handler=handle_terminal,       # The function that executes when the tool is called
    check_fn=check_terminal,       # Optional: returns True/False for availability
    requires_env=["SOME_VAR"],     # Optional: env vars needed (for UI display)
    is_async=False,                # Whether the handler is an async coroutine
    description="Run commands",    # Human-readable description
    emoji="💻",                    # Emoji for spinner/progress display
)
```
Each call creates a `ToolEntry` stored in the singleton `ToolRegistry._tools` dict keyed by tool name. If a name collision occurs across toolsets, a warning is logged and the later registration wins.
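To make the "later registration wins" behavior concrete, here is a minimal registry sketch. Only the names `ToolEntry`, `ToolRegistry`, `_tools`, and `register` come from the text above; the field list is reduced, and the warning wording and the `echo` demo tool are assumptions for this example rather than the real implementation.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("tool_registry_sketch")


@dataclass
class ToolEntry:
    name: str
    toolset: str
    schema: dict
    handler: Callable[..., Any]
    check_fn: Optional[Callable[[], bool]] = None
    is_async: bool = False


class ToolRegistry:
    """Illustrative stand-in; the real registry stores more fields."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolEntry] = {}

    def register(self, name, toolset, schema, handler, check_fn=None, is_async=False):
        existing = self._tools.get(name)
        if existing is not None and existing.toolset != toolset:
            # Name collision across toolsets: warn, then let the new entry win.
            logger.warning(
                "Tool %r re-registered by toolset %r (was %r)",
                name, toolset, existing.toolset,
            )
        self._tools[name] = ToolEntry(name, toolset, schema, handler, check_fn, is_async)


registry = ToolRegistry()
registry.register("echo", "demo", {"name": "echo"}, lambda args: args["text"])
registry.register("echo", "demo_v2", {"name": "echo"}, lambda args: args["text"].upper())
```

After the second call, the `echo` entry belongs to `demo_v2` and the first handler is no longer reachable.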
### Discovery: `_discover_tools()`
When `model_tools.py` is imported, it calls `_discover_tools()` which imports every tool module in order:
```python
_modules = [
"tools.web_tools",
"tools.terminal_tool",
"tools.file_tools",
"tools.vision_tools",
"tools.mixture_of_agents_tool",
"tools.image_generation_tool",
"tools.skills_tool",
"tools.browser_tool",
"tools.cronjob_tools",
"tools.rl_training_tool",
"tools.tts_tool",
"tools.todo_tool",
"tools.memory_tool",
"tools.session_search_tool",
"tools.clarify_tool",
"tools.code_execution_tool",
"tools.delegate_tool",
"tools.process_registry",
"tools.send_message_tool",
"tools.honcho_tools",
"tools.homeassistant_tool",
]
```
Each import triggers the module's `registry.register()` calls. Errors in optional tools (e.g., missing `fal_client` for image generation) are caught and logged — they don't prevent other tools from loading.
After core tool discovery, MCP tools and plugin tools are also discovered:
1. **MCP tools**: `tools.mcp_tool.discover_mcp_tools()` reads MCP server config and registers tools from external servers.
2. **Plugin tools**: `hermes_cli.plugins.discover_plugins()` loads user/project/pip plugins that may register additional tools.
## Tool availability checking (`check_fn`)
Each tool can optionally provide a `check_fn` — a callable that returns `True` when the tool is available and `False` otherwise. Typical checks include:
- **API key present** — e.g., `lambda: bool(os.environ.get("SERP_API_KEY"))` for web search
- **Service running** — e.g., checking if the Honcho server is configured
- **Binary installed** — e.g., verifying `playwright` is available for browser tools
When `registry.get_definitions()` builds the schema list for the model, it runs each tool's `check_fn()`:
```python
# Simplified from registry.py
if entry.check_fn:
    try:
        available = bool(entry.check_fn())
    except Exception:
        available = False   # Exceptions = unavailable
    if not available:
        continue            # Skip this tool entirely
```
Key behaviors:
- Check results are **cached per-call** — if multiple tools share the same `check_fn`, it only runs once.
- Exceptions in `check_fn()` are treated as "unavailable" (fail-safe).
- The `is_toolset_available()` method checks whether a toolset's `check_fn` passes, used for UI display and toolset resolution.
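The per-call caching described above could look roughly like the sketch below. The function name `available_tools` and the `(name, check_fn)` tuple input are assumptions for illustration; the real `registry.get_definitions()` works on `ToolEntry` objects and returns schemas rather than names.

```python
from typing import Callable, Dict, List, Optional, Tuple

CheckFn = Callable[[], bool]


def available_tools(entries: List[Tuple[str, Optional[CheckFn]]]) -> List[str]:
    """Return the names of tools whose check_fn passes (or that have none)."""
    cache: Dict[CheckFn, bool] = {}  # lives only for this call
    result = []
    for name, check_fn in entries:
        if check_fn is not None:
            if check_fn not in cache:
                try:
                    cache[check_fn] = bool(check_fn())
                except Exception:
                    cache[check_fn] = False  # a failing check means "unavailable"
            if not cache[check_fn]:
                continue
        result.append(name)
    return result


calls = {"count": 0}


def has_web_key() -> bool:
    calls["count"] += 1
    return True


def broken_check() -> bool:
    raise RuntimeError("service unreachable")


names = available_tools([
    ("web_search", has_web_key),
    ("web_extract", has_web_key),   # shares the check, so it is not re-run
    ("honcho_query", broken_check), # exception is treated as unavailable
    ("todo", None),                 # no check_fn: always included
])
```

Because the cache is rebuilt on every call, a key added to the environment later is picked up the next time the schema list is built.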
## Toolset resolution
Toolsets are named bundles of tools. Hermes resolves them through:
@@ -31,10 +114,108 @@ Toolsets are named bundles of tools. Hermes resolves them through:
- dynamic MCP toolsets
- curated special-purpose sets like `hermes-acp`
### How `get_tool_definitions()` filters tools
The main entry point is `model_tools.get_tool_definitions(enabled_toolsets, disabled_toolsets, quiet_mode)`:
1. **If `enabled_toolsets` is provided** — only tools from those toolsets are included. Each toolset name is resolved via `resolve_toolset()` which expands composite toolsets into individual tool names.
2. **If `disabled_toolsets` is provided** — start with ALL toolsets, then subtract the disabled ones.
3. **If neither** — include all known toolsets.
4. **Registry filtering** — the resolved tool name set is passed to `registry.get_definitions()`, which applies `check_fn` filtering and returns OpenAI-format schemas.
5. **Dynamic schema patching** — after filtering, `execute_code` and `browser_navigate` schemas are dynamically adjusted to only reference tools that actually passed filtering (prevents model hallucination of unavailable tools).
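Steps 1 to 3 amount to simple set arithmetic over toolsets. The sketch below assumes a flat, hypothetical `TOOLSETS` mapping and a helper named `resolve_tool_names`; the real `resolve_toolset()` also expands composite toolsets, which is left out here.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toolset table for illustration only.
TOOLSETS: Dict[str, Set[str]] = {
    "web": {"web_search", "web_extract"},
    "terminal": {"terminal"},
    "file": {"read_file", "write_file"},
}


def resolve_tool_names(
    enabled_toolsets: Optional[Iterable[str]] = None,
    disabled_toolsets: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Pick toolsets by the enabled/disabled rules, then flatten to tool names."""
    if enabled_toolsets:
        selected = [name for name in enabled_toolsets if name in TOOLSETS]
    elif disabled_toolsets:
        blocked = set(disabled_toolsets)
        selected = [name for name in TOOLSETS if name not in blocked]
    else:
        selected = list(TOOLSETS)
    names: Set[str] = set()
    for toolset in selected:
        names |= TOOLSETS[toolset]
    return names
```

The resulting name set is what would then be handed to the registry for `check_fn` filtering.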
### Legacy toolset names
Old toolset names with `_tools` suffixes (e.g., `web_tools`, `terminal_tools`) are mapped to their modern tool names via `_LEGACY_TOOLSET_MAP` for backward compatibility.
## Dispatch
At runtime, tools are dispatched through the central registry, with agent-loop exceptions for some agent-level tools such as memory/todo/session-search handling.
### Dispatch flow: model tool_call → handler execution
When the model returns a `tool_call`, the flow is:
```
Model response with tool_call
    ↓
run_agent.py agent loop
    ↓
model_tools.handle_function_call(name, args, task_id, user_task)
    ↓
[Agent-loop tools?] → handled directly by agent loop (todo, memory, session_search, delegate_task)
    ↓
[Plugin pre-hook] → invoke_hook("pre_tool_call", ...)
    ↓
registry.dispatch(name, args, **kwargs)
    ↓
Look up ToolEntry by name
    ↓
[Async handler?] → bridge via _run_async()
[Sync handler?]  → call directly
    ↓
Return result string (or JSON error)
    ↓
[Plugin post-hook] → invoke_hook("post_tool_call", ...)
```
### Error wrapping
All tool execution is wrapped in error handling at two levels:
1. **`registry.dispatch()`** — catches any exception from the handler and returns `{"error": "Tool execution failed: ExceptionType: message"}` as JSON.
2. **`handle_function_call()`** — wraps the entire dispatch in a secondary try/except that returns `{"error": "Error executing tool_name: message"}`.
This ensures the model always receives a well-formed JSON string, never an unhandled exception.
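The two layers can be sketched as nested try/except blocks. This is a simplified illustration: the `handlers` dict parameter and the "Unknown tool" message are assumptions, and only the two error string formats are taken from the description above.

```python
import json
from typing import Callable, Dict


def dispatch(handlers: Dict[str, Callable[[dict], str]], name: str, args: dict) -> str:
    """Inner layer: run the handler and convert any exception to a JSON error."""
    handler = handlers.get(name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {name}"})  # assumed wording
    try:
        return handler(args)
    except Exception as exc:
        return json.dumps(
            {"error": f"Tool execution failed: {type(exc).__name__}: {exc}"}
        )


def handle_function_call(handlers: Dict[str, Callable[[dict], str]], name: str, args: dict) -> str:
    """Outer layer: a second safety net around the whole dispatch path."""
    try:
        return dispatch(handlers, name, args)
    except Exception as exc:
        return json.dumps({"error": f"Error executing {name}: {exc}"})


def failing_tool(args: dict) -> str:
    raise ValueError("boom")


result = handle_function_call({"fail": failing_tool}, "fail", {})
```

Here `result` is a JSON string, so the agent loop can append it to the conversation like any other tool output.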
### Agent-loop tools
Four tools are intercepted before registry dispatch because they need agent-level state (TodoStore, MemoryStore, etc.):
- `todo` — planning/task tracking
- `memory` — persistent memory writes
- `session_search` — cross-session recall
- `delegate_task` — spawns subagent sessions
These tools' schemas are still registered in the registry (for `get_tool_definitions`), but their handlers return a stub error if dispatch somehow reaches them directly.
### Async bridging
When a tool handler is async, `_run_async()` bridges it to the sync dispatch path:
- **CLI path (no running loop)** — uses a persistent event loop to keep cached async clients alive
- **Gateway path (running loop)** — spins up a disposable thread with `asyncio.run()`
- **Worker threads (parallel tools)** — uses per-thread persistent loops stored in thread-local storage
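A stripped-down version of the sync-to-async bridge might look like this. It is a sketch under simplifying assumptions: the name `run_async` is hypothetical, and it uses `asyncio.run()` in both branches, whereas the text above says the real CLI and worker-thread paths keep persistent loops.

```python
import asyncio
import threading


def run_async(coro):
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so we can drive one ourselves.
        return asyncio.run(coro)

    # A loop is already running here (gateway-style path): run the coroutine
    # on a fresh loop in a disposable thread and wait for it to finish.
    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def add(a, b):
    await asyncio.sleep(0)
    return a + b


async def called_from_running_loop():
    # Simulates a sync tool dispatch that happens inside an async gateway.
    return run_async(add(2, 3))
```

Note that joining the thread blocks the calling loop until the tool finishes, which is acceptable for a sketch but is one reason a real implementation may choose a different strategy.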
## The DANGEROUS_PATTERNS approval flow
The terminal tool integrates a dangerous-command approval system defined in `tools/approval.py`:
1. **Pattern detection**: `DANGEROUS_PATTERNS` is a list of `(regex, description)` tuples covering destructive operations:
   - Recursive deletes (`rm -rf`)
   - Filesystem formatting (`mkfs`, `dd`)
   - SQL destructive operations (`DROP TABLE`, `DELETE FROM` without `WHERE`)
   - System config overwrites (`> /etc/`)
   - Service manipulation (`systemctl stop`)
   - Remote code execution (`curl | sh`)
   - Fork bombs, process kills, etc.
2. **Detection** — before executing any terminal command, `detect_dangerous_command(command)` checks against all patterns.
3. **Approval prompt** — if a match is found:
   - **CLI mode** — an interactive prompt asks the user to approve, deny, or allow permanently
   - **Gateway mode** — an async approval callback sends the request to the messaging platform
   - **Smart approval** — optionally, an auxiliary LLM can auto-approve low-risk commands that match patterns (e.g., `rm -rf node_modules/` is safe but matches "recursive delete")
4. **Session state** — approvals are tracked per-session. Once you approve "recursive delete" for a session, subsequent `rm -rf` commands don't re-prompt.
5. **Permanent allowlist** — the "allow permanently" option writes the pattern to `config.yaml`'s `command_allowlist`, persisting across sessions.
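The detection and per-session tracking steps can be illustrated with a few regexes. The patterns below are simplified examples written for this sketch, not the contents of `tools/approval.py`, and `needs_approval` is a hypothetical helper.

```python
import re
from typing import Optional, Set

# Illustrative (regex, description) pairs; the real list is longer and stricter.
DANGEROUS_PATTERNS = [
    (r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*f|-[a-zA-Z]*f[a-zA-Z]*r)\b", "recursive delete"),
    (r"\bmkfs(\.\w+)?\b", "filesystem format"),
    (r"\bDROP\s+TABLE\b", "SQL DROP TABLE"),
    (r"\b(curl|wget)\b[^|]*\|\s*(sh|bash)\b", "pipe remote script to shell"),
]


def detect_dangerous_command(command: str) -> Optional[str]:
    """Return the description of the first matching pattern, or None."""
    for pattern, description in DANGEROUS_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            return description
    return None


def needs_approval(command: str, approved: Set[str]) -> bool:
    """True if the command matches a pattern not yet approved this session."""
    description = detect_dangerous_command(command)
    return description is not None and description not in approved


session_approved: Set[str] = set()
first_prompt = needs_approval("rm -rf build/", session_approved)
session_approved.add("recursive delete")  # user approved for this session
second_prompt = needs_approval("rm -rf dist/", session_approved)
```

Keying approvals on the pattern description rather than the exact command is what lets one approval cover later commands of the same kind.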
## Terminal/runtime environments
The terminal system supports multiple backends:


@@ -1,56 +1,233 @@
---
sidebar_position: 10
title: "Trajectories & Training Format"
description: "How Hermes saves trajectories, normalizes tool calls, and produces training-friendly outputs"
---
# Trajectory Format
# Trajectories & Training Format
Hermes Agent saves conversation trajectories in ShareGPT-compatible JSONL format
for use as training data, debugging artifacts, and reinforcement learning datasets.
Hermes can save conversation trajectories for training, evaluation, and batch data generation workflows.
Source files: `agent/trajectory.py`, `run_agent.py` (lines 1788-1975), `batch_runner.py`
Primary files:
- `agent/trajectory.py`
- `run_agent.py`
- `batch_runner.py`
- `trajectory_compressor.py`
## File Naming Convention
## What trajectories are for
Trajectories are written to files in the current working directory:
Trajectory outputs are used for:
| File | When |
|------|------|
| `trajectory_samples.jsonl` | Conversations that completed successfully (`completed=True`) |
| `failed_trajectories.jsonl` | Conversations that failed or were interrupted (`completed=False`) |
- SFT data generation
- debugging agent behavior
- benchmark/evaluation artifact capture
- post-processing and compression pipelines
The batch runner (`batch_runner.py`) writes to a custom output file per batch
(e.g., `batch_001_output.jsonl`) with additional metadata fields.
## Normalization strategy
You can override the filename via the `filename` parameter in `save_trajectory()`.
Hermes converts live conversation structure into a training-friendly format.
Important behaviors include:
## JSONL Entry Format
- representing reasoning in explicit markup
- converting tool calls into structured XML-like regions for dataset compatibility
- grouping tool outputs appropriately
- separating successful and failed trajectories
Each line in the file is a self-contained JSON object. There are two variants:
## Persistence boundaries
### CLI/Interactive Format (from `_save_trajectory`)
Trajectory files do **not** blindly mirror all runtime prompt state.
```json
{
"conversations": [ ... ],
"timestamp": "2026-03-30T14:22:31.456789",
"model": "anthropic/claude-sonnet-4.6",
"completed": true
}
```
Some prompt-time-only layers are intentionally excluded from persisted trajectory content so datasets are cleaner and less environment-specific.
### Batch Runner Format (from `batch_runner.py`)
## Batch runner
```json
{
"prompt_index": 42,
"conversations": [ ... ],
"metadata": { "prompt_source": "gsm8k", "difficulty": "hard" },
"completed": true,
"partial": false,
"api_calls": 7,
"toolsets_used": ["code_tools", "file_tools"],
"tool_stats": {
"terminal": {"count": 3, "success": 3, "failure": 0},
"read_file": {"count": 2, "success": 2, "failure": 0},
"write_file": {"count": 0, "success": 0, "failure": 0}
},
"tool_error_counts": {
"terminal": 0,
"read_file": 0,
"write_file": 0
}
}
```
`batch_runner.py` emits richer metadata than single-session trajectory saving, including:
The `tool_stats` and `tool_error_counts` dictionaries are normalized to include
ALL possible tools (from `model_tools.TOOL_TO_TOOLSET_MAP`) with zero defaults,
ensuring consistent schema across entries for HuggingFace dataset loading.
- model/provider metadata
- toolset info
- partial/failure markers
- tool statistics
## Related docs
## Conversations Array (ShareGPT Format)
- [Environments, Benchmarks & Data Generation](./environments.md)
- [Agent Loop Internals](./agent-loop.md)
The `conversations` array uses ShareGPT role conventions:
| API Role | ShareGPT `from` |
|----------|-----------------|
| system | `"system"` |
| user | `"human"` |
| assistant | `"gpt"` |
| tool | `"tool"` |
### Complete Example
```json
{
"conversations": [
{
"from": "system",
"value": "You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. If available tools are not relevant in assisting with user query, just respond in natural conversational language. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. Here are the available tools:\n<tools>\n[{\"name\": \"terminal\", \"description\": \"Execute shell commands\", \"parameters\": {\"type\": \"object\", \"properties\": {\"command\": {\"type\": \"string\"}}}, \"required\": null}]\n</tools>\nFor each function call return a JSON object, with the following pydantic model json schema for each:\n{'title': 'FunctionCall', 'type': 'object', 'properties': {'name': {'title': 'Name', 'type': 'string'}, 'arguments': {'title': 'Arguments', 'type': 'object'}}, 'required': ['name', 'arguments']}\nEach function call should be enclosed within <tool_call> </tool_call> XML tags.\nExample:\n<tool_call>\n{'name': <function-name>,'arguments': <args-dict>}\n</tool_call>"
},
{
"from": "human",
"value": "What Python version is installed?"
},
{
"from": "gpt",
"value": "<think>\nThe user wants to know the Python version. I should run python3 --version.\n</think>\n<tool_call>\n{\"name\": \"terminal\", \"arguments\": {\"command\": \"python3 --version\"}}\n</tool_call>"
},
{
"from": "tool",
"value": "<tool_response>\n{\"tool_call_id\": \"call_abc123\", \"name\": \"terminal\", \"content\": \"Python 3.11.6\"}\n</tool_response>"
},
{
"from": "gpt",
"value": "<think>\nGot the version. I can now answer the user.\n</think>\nPython 3.11.6 is installed on this system."
}
],
"timestamp": "2026-03-30T14:22:31.456789",
"model": "anthropic/claude-sonnet-4.6",
"completed": true
}
```
## Normalization Rules
### Reasoning Content Markup
The trajectory converter normalizes ALL reasoning into `<think>` tags, regardless
of how the model originally produced it:
1. **Native thinking tokens** (`msg["reasoning"]` field from providers like
Anthropic, OpenAI o-series): Wrapped as `<think>\n{reasoning}\n</think>\n`
and prepended before the content.
2. **REASONING_SCRATCHPAD XML** (when native thinking is disabled and the model
reasons via system-prompt-instructed XML): `<REASONING_SCRATCHPAD>` tags are
converted to `<think>` via `convert_scratchpad_to_think()`.
3. **Empty think blocks**: Every `gpt` turn is guaranteed to have a `<think>`
block. If no reasoning was produced, an empty block is inserted:
`<think>\n</think>\n` — this ensures consistent format for training data.
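The three reasoning rules can be sketched in a few lines. The function `convert_scratchpad_to_think()` is named in the text, but the body given here is an assumed plain tag replacement, and `normalize_assistant_turn` is a hypothetical helper for this example.

```python
from typing import Optional


def convert_scratchpad_to_think(text: str) -> str:
    """Assumed behavior: rename scratchpad tags to think tags."""
    return (
        text.replace("<REASONING_SCRATCHPAD>", "<think>")
        .replace("</REASONING_SCRATCHPAD>", "</think>")
    )


def normalize_assistant_turn(content: Optional[str], reasoning: Optional[str] = None) -> str:
    """Produce a gpt-turn value that always contains a <think> block."""
    content = convert_scratchpad_to_think(content or "")
    if reasoning:
        # Rule 1: native reasoning is wrapped and placed before the content.
        return f"<think>\n{reasoning}\n</think>\n{content}"
    if "<think>" not in content:
        # Rule 3: no reasoning at all, so insert an empty block.
        return f"<think>\n</think>\n{content}"
    # Rule 2: scratchpad reasoning was already converted above.
    return content
```

For example, `normalize_assistant_turn("Done.")` yields a turn that starts with an empty think block followed by `Done.`.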
### Tool Call Normalization
Tool calls from the API format (with `tool_call_id`, function name, arguments as
JSON string) are converted to XML-wrapped JSON:
```
<tool_call>
{"name": "terminal", "arguments": {"command": "ls -la"}}
</tool_call>
```
- Arguments are parsed from JSON strings back to objects (not double-encoded)
- If JSON parsing fails (shouldn't happen — validated during conversation),
an empty `{}` is used with a warning logged
- Multiple tool calls in one assistant turn produce multiple `<tool_call>` blocks
in a single `gpt` message
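A possible shape for this conversion is shown below. The input is assumed to follow the OpenAI-style `tool_calls` structure (`{"id": ..., "function": {"name": ..., "arguments": "<json string>"}}`), and `format_tool_calls` is a hypothetical name, not the function used in `agent/trajectory.py`.

```python
import json
import logging
from typing import List

logger = logging.getLogger("trajectory_sketch")


def format_tool_calls(tool_calls: List[dict]) -> str:
    """Render API-style tool calls as <tool_call> blocks, one per call."""
    blocks = []
    for call in tool_calls:
        function = call["function"]
        try:
            # Arguments arrive as a JSON string; decode to avoid double encoding.
            arguments = json.loads(function["arguments"])
        except (json.JSONDecodeError, TypeError):
            logger.warning("Unparseable arguments for tool %r", function.get("name"))
            arguments = {}
        payload = json.dumps({"name": function["name"], "arguments": arguments})
        blocks.append(f"<tool_call>\n{payload}\n</tool_call>")
    return "\n".join(blocks)


example = format_tool_calls([
    {"id": "call_1", "function": {"name": "terminal", "arguments": '{"command": "ls -la"}'}},
    {"id": "call_2", "function": {"name": "read_file", "arguments": "not json"}},
])
```

The second call in `example` falls back to an empty arguments object, matching the fallback described in the bullets above.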
### Tool Response Normalization
All tool results following an assistant message are grouped into a single `tool`
turn with XML-wrapped JSON responses:
```
<tool_response>
{"tool_call_id": "call_abc123", "name": "terminal", "content": "output here"}
</tool_response>
```
- If tool content looks like JSON (starts with `{` or `[`), it's parsed so the
content field contains a JSON object/array rather than a string
- Multiple tool results are joined with newlines in one message
- The tool name is matched by position against the parent assistant's `tool_calls`
array
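The grouping and positional name matching could be sketched as follows. The message shapes are assumed to follow the OpenAI chat format, and `format_tool_responses` plus the `"unknown"` fallback name are inventions for this example.

```python
import json
from typing import List


def format_tool_responses(tool_messages: List[dict], tool_calls: List[dict]) -> str:
    """Merge consecutive tool results into one value of <tool_response> blocks."""
    blocks = []
    for index, message in enumerate(tool_messages):
        content = message.get("content", "")
        # JSON-looking content is decoded so it is stored as an object or array.
        if isinstance(content, str) and content.lstrip().startswith(("{", "[")):
            try:
                content = json.loads(content)
            except json.JSONDecodeError:
                pass  # keep the raw string if it only looked like JSON
        # The tool name comes from the call at the same position.
        if index < len(tool_calls):
            name = tool_calls[index]["function"]["name"]
        else:
            name = "unknown"  # assumed fallback
        payload = json.dumps(
            {"tool_call_id": message.get("tool_call_id", ""), "name": name, "content": content}
        )
        blocks.append(f"<tool_response>\n{payload}\n</tool_response>")
    return "\n".join(blocks)


calls = [
    {"id": "call_1", "function": {"name": "terminal", "arguments": "{}"}},
    {"id": "call_2", "function": {"name": "read_file", "arguments": "{}"}},
]
responses = format_tool_responses(
    [
        {"tool_call_id": "call_1", "content": "Python 3.11.6"},
        {"tool_call_id": "call_2", "content": '{"lines": 3}'},
    ],
    calls,
)
```

In `responses`, the first block keeps its content as a string while the second stores a decoded object.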
### System Message
The system message is generated at save time (not taken from the conversation).
It follows the Hermes function-calling prompt template with:
- Preamble explaining the function-calling protocol
- `<tools>` XML block containing the JSON tool definitions
- Schema reference for `FunctionCall` objects
- `<tool_call>` example
Tool definitions include `name`, `description`, `parameters`, and `required`
(set to `null` to match the canonical format).
## Loading Trajectories
Trajectories are standard JSONL — load with any JSON-lines reader:
```python
import json
def load_trajectories(path: str):
    """Load trajectory entries from a JSONL file."""
    entries = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries

# Filter to successful completions only
successful = [e for e in load_trajectories("trajectory_samples.jsonl")
              if e.get("completed")]

# Extract just the conversations for training
training_data = [e["conversations"] for e in successful]
```
### Loading for HuggingFace Datasets
```python
from datasets import load_dataset
ds = load_dataset("json", data_files="trajectory_samples.jsonl")
```
The normalized `tool_stats` schema ensures all entries have the same columns,
preventing Arrow schema mismatch errors during dataset loading.
## Controlling Trajectory Saving
In the CLI, trajectory saving is controlled by:
```yaml
# config.yaml
agent:
  save_trajectories: true # default: false
```
Or via the `--save-trajectories` flag. When the agent initializes with
`save_trajectories=True`, the `_save_trajectory()` method is called at the end
of each conversation turn.
The batch runner always saves trajectories (that's its primary purpose).
Samples with zero reasoning across all turns are automatically discarded by the
batch runner to avoid polluting training data with non-reasoning examples.


@@ -20,6 +20,43 @@ This pulls the latest code, updates dependencies, and prompts you to configure a
`hermes update` automatically detects new configuration options and prompts you to add them. If you skipped that prompt, you can manually run `hermes config check` to see missing options, then `hermes config migrate` to interactively add them.
:::
### What happens during an update
When you run `hermes update`, the following steps occur:
1. **Git pull** — pulls the latest code from the `main` branch and updates submodules
2. **Dependency install** — runs `uv pip install -e ".[all]"` to pick up new or changed dependencies
3. **Config migration** — detects new config options added since your version and prompts you to set them
4. **Gateway auto-restart** — if the gateway service is running (systemd on Linux, launchd on macOS), it is **automatically restarted** after the update completes so the new code takes effect immediately
Expected output looks like:
```
$ hermes update
Updating Hermes Agent...
📥 Pulling latest code...
Already up to date. (or: Updating abc1234..def5678)
📦 Updating dependencies...
✅ Dependencies updated
🔍 Checking for new config options...
✅ Config is up to date (or: Found 2 new options — running migration...)
🔄 Restarting gateway service...
✅ Gateway restarted
✅ Hermes Agent updated successfully!
```
### Checking your current version
```bash
hermes version
```
Compare against the latest release at the [GitHub releases page](https://github.com/NousResearch/hermes-agent/releases) or check for available updates:
```bash
hermes update --check
```
### Updating from Messaging Platforms
You can also update directly from Telegram, Discord, Slack, or WhatsApp by sending:
@@ -28,7 +65,7 @@ You can also update directly from Telegram, Discord, Slack, or WhatsApp by sendi
/update
```
This pulls the latest code, updates dependencies, and restarts the gateway.
This pulls the latest code, updates dependencies, and restarts the gateway. The bot will briefly go offline during the restart (typically 5 to 15 seconds) and then resume.
### Manual Update
@@ -51,6 +88,57 @@ hermes config check
hermes config migrate # Interactively add any missing options
```
### Rollback instructions
If an update introduces a problem, you can roll back to a previous version:
```bash
cd /path/to/hermes-agent
# List recent versions
git log --oneline -10
# Roll back to a specific commit
git checkout <commit-hash>
git submodule update --init --recursive
uv pip install -e ".[all]"
# Restart the gateway if running
hermes gateway restart
```
To roll back to a specific release tag:
```bash
git checkout v0.6.0
git submodule update --init --recursive
uv pip install -e ".[all]"
```
:::warning
Rolling back may cause config incompatibilities if new options were added. Run `hermes config check` after rolling back and remove any unrecognized options from `config.yaml` if you encounter errors.
:::
### Note for Nix users
If you installed via Nix flake, updates are managed through the Nix package manager:
```bash
# Update the flake input
nix flake update hermes-agent
# Or rebuild with the latest
nix profile upgrade hermes-agent
```
Nix installations are immutable — rollback is handled by Nix's generation system:
```bash
nix profile rollback
```
See [Nix Setup](./nix-setup.md) for more details.
---
## Uninstalling


@@ -8,18 +8,75 @@ sidebar_position: 0
Hermes Agent connects to external systems for AI inference, tool servers, IDE workflows, programmatic access, and more. These integrations extend what Hermes can do and where it can run.
## Available Integrations
## AI Providers & Routing
- **[AI Providers](/docs/user-guide/features/provider-routing)** — Set up and configure inference providers. Hermes works with OpenRouter, Anthropic, OpenAI, Google, and any OpenAI-compatible endpoint. Use `hermes model` to configure interactively.
Hermes supports multiple AI inference providers out of the box. Use `hermes model` to configure interactively, or set them in `config.yaml`.
- **[MCP Servers](/docs/user-guide/features/mcp)** — Connect Hermes to external tool servers via Model Context Protocol. Access tools from GitHub, databases, file systems, browser stacks, internal APIs, and more without writing native Hermes tools.
- **[AI Providers](/docs/user-guide/features/provider-routing)** — OpenRouter, Anthropic, OpenAI, Google, and any OpenAI-compatible endpoint. Hermes auto-detects capabilities like vision, streaming, and tool use per provider.
- **[Provider Routing](/docs/user-guide/features/provider-routing)** — Fine-grained control over which underlying providers handle your OpenRouter requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and explicit priority ordering.
- **[Fallback Providers](/docs/user-guide/features/fallback-providers)** — Automatic failover to backup LLM providers when your primary model encounters errors. Includes primary model fallback and independent auxiliary task fallback for vision, compression, and web extraction.
## Tool Servers (MCP)
- **[MCP Servers](/docs/user-guide/features/mcp)** — Connect Hermes to external tool servers via Model Context Protocol. Access tools from GitHub, databases, file systems, browser stacks, internal APIs, and more without writing native Hermes tools. Supports both stdio and SSE transports, per-server tool filtering, and capability-aware resource/prompt registration.
## Web Search Backends
The `web_search`, `web_extract`, and `web_crawl` tools support four backend providers, configured via `config.yaml` or `hermes tools`:
| Backend | Env Var | Search | Extract | Crawl |
|---------|---------|--------|---------|-------|
| **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
| **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
| **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
| **Exa** | `EXA_API_KEY` | ✔ | ✔ | — |
Quick setup example:
```yaml
web:
  backend: firecrawl # firecrawl | parallel | tavily | exa
```
If `web.backend` is not set, the backend is auto-detected from whichever API key is available. Self-hosted Firecrawl is also supported via `FIRECRAWL_API_URL`.
## Browser Automation
Hermes includes full browser automation with multiple backend options for navigating websites, filling forms, and extracting information:
- **Browserbase** — Managed cloud browsers with anti-bot tooling, CAPTCHA solving, and residential proxies
- **Browser Use** — Alternative cloud browser provider
- **Local Chrome via CDP** — Connect to your running Chrome instance using `/browser connect`
- **Local Chromium** — Headless local browser via the `agent-browser` CLI
See [Browser Automation](/docs/user-guide/features/browser) for setup and usage.
## Voice & TTS Providers
Text-to-speech and speech-to-text across all messaging platforms:
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
| **NeuTTS** | Good | Free | None needed |
Speech-to-text uses Whisper for voice message transcription on Telegram, Discord, and WhatsApp. See [Voice & TTS](/docs/user-guide/features/tts) and [Voice Mode](/docs/user-guide/features/voice-mode) for details.
## IDE & Editor Integration
- **[IDE Integration (ACP)](/docs/user-guide/features/acp)** — Use Hermes Agent inside ACP-compatible editors such as VS Code, Zed, and JetBrains. Hermes runs as an ACP server, rendering chat messages, tool activity, file diffs, and terminal commands inside your editor.
## Programmatic Access
- **[API Server](/docs/user-guide/features/api-server)** — Expose Hermes as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox — can connect and use Hermes as a backend with its full toolset.
## Memory & Personalization
- **[Honcho Memory](/docs/user-guide/features/honcho)** — AI-native persistent memory for cross-session user modeling and personalization. Honcho adds deep user modeling via dialectic reasoning on top of Hermes's built-in memory system.
- **[Provider Routing](/docs/user-guide/features/provider-routing)** — Fine-grained control over which underlying AI providers handle your OpenRouter requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and explicit priority ordering.
## Training & Evaluation
- **[Fallback Providers](/docs/user-guide/features/fallback-providers)** — Automatic failover to backup LLM providers when your primary model encounters errors. Includes primary model fallback and independent auxiliary task fallback for vision, compression, and web extraction.
- **[RL Training](/docs/user-guide/features/rl-training)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning.
- **[Batch Processing](/docs/user-guide/features/batch-processing)** — Run the agent across hundreds of prompts in parallel, generating structured ShareGPT-format trajectory data for training data generation or evaluation.


@@ -860,12 +860,15 @@ When enabled, responses appear token-by-token inside a streaming box. Tool calls
```yaml
streaming:
  enabled: true # Enable progressive message editing
  transport: edit # "edit" (progressive message editing) or "off"
  edit_interval: 0.3 # Seconds between message edits
  buffer_threshold: 40 # Characters before forcing an edit flush
  cursor: " ▉" # Cursor shown during streaming
```
When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email) gracefully skip streaming and deliver the final response normally.
When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.
**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.
:::note
Streaming is disabled by default. Enable it in `~/.hermes/config.yaml` to try the streaming UX.
@@ -929,23 +932,6 @@ Usage: type `/status`, `/disk`, `/update`, or `/gpu` in the CLI or any messaging
- **Type** — only `exec` is supported (runs a shell command); other types show an error
- **Works everywhere** — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant
## Gateway Streaming
Enable progressive token delivery on messaging platforms. When streaming is enabled, responses appear character-by-character in Telegram, Discord, and Slack via message editing, rather than waiting for the full response.
```yaml
streaming:
  enabled: false # Enable streaming token delivery (default: off)
  transport: edit # "edit" (progressive message editing) or "off"
  edit_interval: 0.3 # Min seconds between message edits
  buffer_threshold: 40 # Characters accumulated before forcing an edit
  cursor: " ▉" # Cursor character shown during streaming
```
**Platform support:** Telegram, Discord, and Slack support edit-based streaming. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.
**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.
## Human Delay
Simulate human-like response pacing in messaging platforms:


@@ -1,10 +1,17 @@
---
sidebar_position: 7
title: "Docker"
description: "Running Hermes Agent in Docker and using Docker as a terminal backend"
---
# Hermes Agent — Docker
Want to run Hermes Agent, but without installing packages on your host? This'll sort you out.
There are two distinct ways Docker intersects with Hermes Agent:
This will let you run the agent in a container, with the most relevant modes outlined below.
1. **Running Hermes IN Docker** — the agent itself runs inside a container (this page's primary focus)
2. **Docker as a terminal backend** — the agent runs on your host but executes commands inside a Docker sandbox (see [Configuration → terminal.backend](./configuration.md))
The container stores all user data (config, API keys, sessions, skills, memories) in a single directory mounted from the host at `/opt/data`. The image itself is stateless and can be upgraded by pulling a new version without losing any configuration.
This page covers option 1. The container stores all user data (config, API keys, sessions, skills, memories) in a single directory mounted from the host at `/opt/data`. The image itself is stateless and can be upgraded by pulling a new version without losing any configuration.
## Quick start
@@ -41,6 +48,110 @@ docker run -it --rm \
nousresearch/hermes-agent
```
## Persistent volumes
The `/opt/data` volume is the single source of truth for all Hermes state. It maps to your host's `~/.hermes/` directory and contains:
| Path | Contents |
|------|----------|
| `.env` | API keys and secrets |
| `config.yaml` | All Hermes configuration |
| `SOUL.md` | Agent personality/identity |
| `sessions/` | Conversation history |
| `memories/` | Persistent memory store |
| `skills/` | Installed skills |
| `cron/` | Scheduled job definitions |
| `hooks/` | Event hooks |
| `logs/` | Runtime logs |
| `skins/` | Custom CLI skins |
:::warning
Never run two Hermes containers against the same data directory simultaneously — session files and memory stores are not designed for concurrent access.
:::
## Environment variable forwarding
API keys are read from `/opt/data/.env` inside the container. You can also pass environment variables directly:
```sh
docker run -it --rm \
-v ~/.hermes:/opt/data \
-e ANTHROPIC_API_KEY="sk-ant-..." \
-e OPENAI_API_KEY="sk-..." \
nousresearch/hermes-agent
```
Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.
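The precedence rule can be pictured as a simple merge. The sketch below is an illustration under assumptions, not how Hermes loads its environment: `parse_dotenv` and `effective_env` are hypothetical helper names, and the parser handles only plain `KEY=VALUE` lines.

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def effective_env(dotenv_text: str, container_env: dict[str, str]) -> dict[str, str]:
    """Merge so that variables passed with -e win over .env entries."""
    merged = parse_dotenv(dotenv_text)
    merged.update(container_env)  # later update wins: -e overrides the file
    return merged
```

With a `.env` that sets `OPENAI_API_KEY` and a container started with `-e OPENAI_API_KEY=...`, the merged result carries the `-e` value, while keys present only in the file are kept.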
## Docker Compose example
For persistent gateway deployment, a `docker-compose.yaml` is convenient:
```yaml
version: "3.8"
services:
hermes:
image: nousresearch/hermes-agent:latest
container_name: hermes
restart: unless-stopped
command: gateway run
volumes:
- ~/.hermes:/opt/data
# Uncomment to forward specific env vars instead of using .env file:
# environment:
# - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
# - OPENAI_API_KEY=${OPENAI_API_KEY}
# - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
deploy:
resources:
limits:
memory: 4G
cpus: "2.0"
```
Start with `docker compose up -d` and view logs with `docker compose logs -f hermes`.
## Resource limits
The Hermes container needs moderate resources. Minimum and recommended allocations:
| Resource | Minimum | Recommended |
|----------|---------|-------------|
| Memory | 1 GB | 2–4 GB |
| CPU | 1 core | 2 cores |
| Disk (data volume) | 500 MB | 2+ GB (grows with sessions/skills) |
Browser automation (Playwright/Chromium) is the most memory-hungry feature. If you don't need browser tools, 1 GB is sufficient. With browser tools active, allocate at least 2 GB.
Set limits in Docker:
```sh
docker run -d \
--name hermes \
--restart unless-stopped \
--memory=4g --cpus=2 \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent gateway run
```
## What the Dockerfile does
The official image is based on `debian:13.4` and includes:
- Python 3 with all Hermes dependencies (`pip install -e ".[all]"`)
- Node.js + npm (for browser automation and WhatsApp bridge)
- Playwright with Chromium (`npx playwright install --with-deps chromium`)
- ripgrep and ffmpeg as system utilities
- The WhatsApp bridge (`scripts/whatsapp-bridge/`)
The entrypoint script (`docker/entrypoint.sh`) bootstraps the data volume on first run:
- Creates the directory structure (`sessions/`, `memories/`, `skills/`, etc.)
- Copies `.env.example` to `.env` if no `.env` exists
- Copies default `config.yaml` if missing
- Copies default `SOUL.md` if missing
- Syncs bundled skills using a manifest-based approach (preserves user edits)
- Then runs `hermes` with whatever arguments you pass
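The "copy only if missing" behaviour in the list above is what makes the bootstrap safe to run on every start. The real entrypoint is a shell script; the Python sketch below only illustrates the idea. The function name `bootstrap_data_dir`, the `defaults_dir` parameter, and the subset of directories are assumptions, and skill syncing is left out.

```python
import shutil
from pathlib import Path

# Subset of the directories named on this page (illustrative, not exhaustive).
SUBDIRS = ("sessions", "memories", "skills")

# (file name in defaults_dir, file name in data_dir)
DEFAULT_FILES = (
    (".env.example", ".env"),
    ("config.yaml", "config.yaml"),
    ("SOUL.md", "SOUL.md"),
)


def bootstrap_data_dir(data_dir: Path, defaults_dir: Path) -> list[str]:
    """Create missing directories and copy defaults that do not exist yet.

    Existing files are never overwritten. Returns the names of the files
    that were copied, so a second run on the same directory returns [].
    """
    for name in SUBDIRS:
        (data_dir / name).mkdir(parents=True, exist_ok=True)
    copied = []
    for src_name, dest_name in DEFAULT_FILES:
        src = defaults_dir / src_name
        dest = data_dir / dest_name
        if not dest.exists() and src.exists():
            shutil.copyfile(src, dest)
            copied.append(dest_name)
    return copied
```

Because nothing is overwritten, edits made to `config.yaml` or `SOUL.md` on the host survive container restarts and image upgrades.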
## Upgrading
Pull the latest image and recreate the container. Your data directory is untouched.
@@ -52,7 +163,14 @@ docker run -d \
--name hermes \
--restart unless-stopped \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent
nousresearch/hermes-agent gateway run
```
Or with Docker Compose:
```sh
docker compose pull
docker compose up -d
```
## Skills and credential files
@@ -60,3 +178,47 @@ docker run -d \
When Docker is the execution environment (that is, the agent runs commands inside a Docker sandbox, rather than running in a container itself as described above), Hermes automatically bind-mounts the skills directory (`~/.hermes/skills/`) and any credential files declared by skills into the container as read-only volumes. This means skill scripts, templates, and references are available inside the sandbox without manual configuration.
The same syncing happens for SSH and Modal backends — skills and credential files are uploaded via rsync or the Modal mount API before each command.
## Troubleshooting
### Container exits immediately
Check logs: `docker logs hermes`. Common causes:
- Missing or invalid `.env` file — run interactively first to complete setup
- Port conflicts if running with exposed ports
### "Permission denied" errors
The container runs as root by default. If your host `~/.hermes/` was created by a non-root user, permissions should work. If you get errors, ensure the data directory is writable:
```sh
chmod -R 755 ~/.hermes
```
### Browser tools not working
Playwright needs shared memory. Add `--shm-size=1g` to your Docker run command:
```sh
docker run -d \
--name hermes \
--shm-size=1g \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent gateway run
```
### Gateway not reconnecting after network issues
The `--restart unless-stopped` flag handles most transient failures. If the gateway is stuck, restart the container:
```sh
docker restart hermes
```
### Checking container health
```sh
docker logs --tail 50 hermes # Recent logs
docker exec hermes hermes version # Verify version
docker stats hermes # Resource usage
```

@@ -168,9 +168,7 @@ So a server that exposes callable tools but no resources/prompts will not get th
## Per-server filtering
This is the main feature added by the PR work.
You can now control which tools each MCP server contributes to Hermes.
You can control which tools each MCP server contributes to Hermes, allowing fine-grained management of your tool namespace.
### Disable a server entirely

@@ -33,6 +33,15 @@ Hermes Agent includes a rich set of capabilities that extend far beyond basic ch
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai's FLUX 2 Pro model with automatic 2x upscaling via the Clarity Upscaler.
- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with four provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, and NeuTTS.
## Integrations
- **[Provider Routing](provider-routing.md)** — Fine-grained control over which AI providers handle your requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and priority ordering.
- **[Fallback Providers](fallback-providers.md)** — Automatic failover to backup LLM providers when your primary model encounters errors, including independent fallback for auxiliary tasks like vision and compression.
- **[API Server](api-server.md)** — Expose Hermes as an OpenAI-compatible HTTP endpoint. Connect any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, and more.
- **[IDE Integration (ACP)](acp.md)** — Use Hermes inside ACP-compatible editors such as VS Code, Zed, and JetBrains. Chat, tool activity, file diffs, and terminal commands render inside your editor.
- **[Honcho Memory](honcho.md)** — AI-native persistent memory for cross-session user modeling and personalization via dialectic reasoning.
- **[RL Training](rl-training.md)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning.
## Customization
- **[Personality & SOUL.md](personality.md)** — Fully customizable agent personality. `SOUL.md` is the primary identity file — the first thing in the system prompt — and you can swap in built-in or custom `/personality` presets per session.

@@ -25,6 +25,56 @@ Drop a directory into `~/.hermes/plugins/` with a `plugin.yaml` and Python code:
Start Hermes — your tools appear alongside built-in tools. The model can call them immediately.
### Minimal working example
Here is a complete plugin that adds a `hello_world` tool and logs every tool call via a hook.
**`~/.hermes/plugins/hello-world/plugin.yaml`**
```yaml
name: hello-world
version: "1.0"
description: A minimal example plugin
```
**`~/.hermes/plugins/hello-world/__init__.py`**
```python
"""Minimal Hermes plugin — registers a tool and a hook."""
def register(ctx):
# --- Tool: hello_world ---
schema = {
"name": "hello_world",
"description": "Returns a friendly greeting for the given name.",
"parameters": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Name to greet",
}
},
"required": ["name"],
},
}
def handle_hello(params):
name = params.get("name", "World")
return f"Hello, {name}! 👋 (from the hello-world plugin)"
ctx.register_tool("hello_world", schema, handle_hello)
# --- Hook: log every tool call ---
def on_tool_call(tool_name, params, result):
print(f"[hello-world] tool called: {tool_name}")
ctx.register_hook("post_tool_call", on_tool_call)
```
Drop both files into `~/.hermes/plugins/hello-world/`, restart Hermes, and the model can immediately call `hello_world`. The hook prints a log line after every tool invocation.
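To check a plugin's wiring without starting Hermes, you can call `register()` with a stand-in context. The sketch below is a hypothetical test harness: `FakeContext` is not part of Hermes, and it assumes only the two calls used in the example above (`register_tool(name, schema, handler)` and `register_hook(event, callback)`). The real `ctx` object may expose more.

```python
class FakeContext:
    """Hypothetical stand-in for the ctx object that Hermes passes to register()."""

    def __init__(self):
        self.tools = {}  # tool name -> (schema, handler)
        self.hooks = {}  # event name -> list of callbacks

    def register_tool(self, name, schema, handler):
        self.tools[name] = (schema, handler)

    def register_hook(self, event, callback):
        self.hooks.setdefault(event, []).append(callback)


def register(ctx):
    """Condensed copy of the plugin's register() so this block runs on its own."""
    schema = {
        "name": "hello_world",
        "description": "Returns a friendly greeting for the given name.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string", "description": "Name to greet"}},
            "required": ["name"],
        },
    }

    def handle_hello(params):
        name = params.get("name", "World")
        return f"Hello, {name}! 👋 (from the hello-world plugin)"

    def on_tool_call(tool_name, params, result):
        print(f"[hello-world] tool called: {tool_name}")

    ctx.register_tool("hello_world", schema, handle_hello)
    ctx.register_hook("post_tool_call", on_tool_call)


# Drive the plugin by hand: register, call the tool, then fire the hook.
ctx = FakeContext()
register(ctx)
schema, handler = ctx.tools["hello_world"]
result = handler({"name": "Ada"})
for hook in ctx.hooks["post_tool_call"]:
    hook("hello_world", {"name": "Ada"}, result)
print(result)
```

In a real plugin you would import `register` from the plugin's `__init__.py` instead of copying it.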
Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable them only for trusted repositories by setting `HERMES_ENABLE_PROJECT_PLUGINS=true` before starting Hermes.
## What plugins can do

@@ -30,28 +30,150 @@ display:
## Built-in skins
| Skin | Description | Agent branding |
|------|-------------|----------------|
| `default` | Classic Hermes — gold and kawaii | `Hermes Agent` |
| `ares` | War-god theme — crimson and bronze | `Ares Agent` |
| `mono` | Monochrome — clean grayscale | `Hermes Agent` |
| `slate` | Cool blue — developer-focused | `Hermes Agent` |
| `poseidon` | Ocean-god theme — deep blue and seafoam | `Poseidon Agent` |
| `sisyphus` | Sisyphean theme — austere grayscale with persistence | `Sisyphus Agent` |
| `charizard` | Volcanic theme — burnt orange and ember | `Charizard Agent` |
| Skin | Description | Agent branding | Visual character |
|------|-------------|----------------|------------------|
| `default` | Classic Hermes — gold and kawaii | `Hermes Agent` | Warm gold borders, cornsilk text, kawaii faces in spinners. The familiar caduceus banner. Clean and inviting. |
| `ares` | War-god theme — crimson and bronze | `Ares Agent` | Deep crimson borders with bronze accents. Aggressive spinner verbs ("forging", "marching", "tempering steel"). Custom sword-and-shield ASCII art banner. |
| `mono` | Monochrome — clean grayscale | `Hermes Agent` | All grays — no color. Borders are `#555555`, text is `#c9d1d9`. Ideal for minimal terminal setups or screen recordings. |
| `slate` | Cool blue — developer-focused | `Hermes Agent` | Royal blue borders (`#4169e1`), soft blue text. Calm and professional. No custom spinner — uses default faces. |
| `poseidon` | Ocean-god theme — deep blue and seafoam | `Poseidon Agent` | Deep blue to seafoam gradient. Ocean-themed spinners ("charting currents", "sounding the depth"). Trident ASCII art banner. |
| `sisyphus` | Sisyphean theme — austere grayscale with persistence | `Sisyphus Agent` | Light grays with stark contrast. Boulder-themed spinners ("pushing uphill", "resetting the boulder", "enduring the loop"). Boulder-and-hill ASCII art banner. |
| `charizard` | Volcanic theme — burnt orange and ember | `Charizard Agent` | Warm burnt orange to ember gradient. Fire-themed spinners ("banking into the draft", "measuring burn"). Dragon-silhouette ASCII art banner. |
## What a skin can customize
## Complete list of configurable keys
| Area | Keys |
|------|------|
| Banner + response colors | `colors.banner_*`, `colors.response_border` |
| Spinner animation | `spinner.waiting_faces`, `spinner.thinking_faces`, `spinner.thinking_verbs`, `spinner.wings` |
| Branding text | `branding.agent_name`, `branding.welcome`, `branding.response_label`, `branding.prompt_symbol` |
| Tool activity prefix | `tool_prefix` |
### Colors (`colors:`)
Controls all color values throughout the CLI. Values are hex color strings.
| Key | Description | Default (`default` skin) |
|-----|-------------|--------------------------|
| `banner_border` | Panel border around the startup banner | `#CD7F32` (bronze) |
| `banner_title` | Title text color in the banner | `#FFD700` (gold) |
| `banner_accent` | Section headers in the banner (Available Tools, etc.) | `#FFBF00` (amber) |
| `banner_dim` | Muted text in the banner (separators, secondary labels) | `#B8860B` (dark goldenrod) |
| `banner_text` | Body text in the banner (tool names, skill names) | `#FFF8DC` (cornsilk) |
| `ui_accent` | General UI accent color (highlights, active elements) | `#FFBF00` |
| `ui_label` | UI labels and tags | `#4dd0e1` (teal) |
| `ui_ok` | Success indicators (checkmarks, completion) | `#4caf50` (green) |
| `ui_error` | Error indicators (failures, blocked) | `#ef5350` (red) |
| `ui_warn` | Warning indicators (caution, approval prompts) | `#ffa726` (orange) |
| `prompt` | Interactive prompt text color | `#FFF8DC` |
| `input_rule` | Horizontal rule above the input area | `#CD7F32` |
| `response_border` | Border around the agent's response box (ANSI escape) | `#FFD700` |
| `session_label` | Session label color | `#DAA520` |
| `session_border` | Session ID dim border color | `#8B8682` |
### Spinner (`spinner:`)
Controls the animated spinner shown while waiting for API responses.
| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `waiting_faces` | list of strings | Faces cycled while waiting for API response | `["(⚔)", "(⛨)", "(▲)"]` |
| `thinking_faces` | list of strings | Faces cycled during model reasoning | `["(⚔)", "(⌁)", "(<>)"]` |
| `thinking_verbs` | list of strings | Verbs shown in spinner messages | `["forging", "plotting", "hammering plans"]` |
| `wings` | list of [left, right] pairs | Decorative brackets around the spinner | `[["⟪⚔", "⚔⟫"], ["⟪▲", "▲⟫"]]` |
When spinner values are empty (as in the `default` and `mono` skins), the hardcoded defaults from `display.py` are used.
### Branding (`branding:`)
Text strings used throughout the CLI interface.
| Key | Description | Default |
|-----|-------------|---------|
| `agent_name` | Name shown in banner title and status display | `Hermes Agent` |
| `welcome` | Welcome message shown at CLI startup | `Welcome to Hermes Agent! Type your message or /help for commands.` |
| `goodbye` | Message shown on exit | `Goodbye! ⚕` |
| `response_label` | Label on the response box header | ` ⚕ Hermes ` |
| `prompt_symbol` | Symbol before the user input prompt | ` ` |
| `help_header` | Header text for the `/help` command output | `(^_^)? Available Commands` |
### Other top-level keys
| Key | Type | Description | Default |
|-----|------|-------------|---------|
| `tool_prefix` | string | Character prefixed to tool output lines in the CLI | `┊` |
| `tool_emojis` | dict | Per-tool emoji overrides for spinners and progress (`{tool_name: emoji}`) | `{}` |
| `banner_logo` | string | Rich-markup ASCII art logo (replaces the default HERMES_AGENT banner) | `""` |
| `banner_hero` | string | Rich-markup hero art (replaces the default caduceus art) | `""` |
## Custom skins
Create YAML files under `~/.hermes/skins/`. User skins inherit missing values from the built-in `default` skin.
Create YAML files under `~/.hermes/skins/`. User skins inherit missing values from the built-in `default` skin, so you only need to specify the keys you want to change.
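One way to picture this inheritance is a recursive merge of the user skin over the default. The sketch below is an assumption about the behaviour, not the code in `hermes_cli/skin_engine.py`: `merge_skin` is a hypothetical helper, and `DEFAULT_SKIN` holds only a few of the default values documented on this page.

```python
import copy

# A small excerpt of the documented defaults, for illustration only.
DEFAULT_SKIN = {
    "colors": {"banner_border": "#CD7F32", "banner_title": "#FFD700"},
    "branding": {"agent_name": "Hermes Agent", "goodbye": "Goodbye! ⚕"},
    "tool_prefix": "┊",
}


def merge_skin(default: dict, user: dict) -> dict:
    """Return a new skin: user values win, missing keys come from default."""
    merged = copy.deepcopy(default)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections (colors, branding, ...) key by key.
            merged[key] = merge_skin(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


user_skin = {"name": "cyan", "colors": {"banner_title": "#00FFFF"}}
skin = merge_skin(DEFAULT_SKIN, user_skin)
```

Here `skin` takes the overridden `banner_title` from the user skin and keeps `banner_border`, all of `branding`, and `tool_prefix` from the default.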
### Full custom skin YAML template
```yaml
# ~/.hermes/skins/mytheme.yaml
# Complete skin template — all keys shown. Delete any you don't need;
# missing values automatically inherit from the 'default' skin.
name: mytheme
description: My custom theme
colors:
banner_border: "#CD7F32"
banner_title: "#FFD700"
banner_accent: "#FFBF00"
banner_dim: "#B8860B"
banner_text: "#FFF8DC"
ui_accent: "#FFBF00"
ui_label: "#4dd0e1"
ui_ok: "#4caf50"
ui_error: "#ef5350"
ui_warn: "#ffa726"
prompt: "#FFF8DC"
input_rule: "#CD7F32"
response_border: "#FFD700"
session_label: "#DAA520"
session_border: "#8B8682"
spinner:
waiting_faces:
- "(⚔)"
- "(⛨)"
- "(▲)"
thinking_faces:
- "(⚔)"
- "(⌁)"
- "(<>)"
thinking_verbs:
- "processing"
- "analyzing"
- "computing"
- "evaluating"
wings:
- ["⟪⚡", "⚡⟫"]
- ["⟪●", "●⟫"]
branding:
agent_name: "My Agent"
welcome: "Welcome to My Agent! Type your message or /help for commands."
goodbye: "See you later! ⚡"
response_label: " ⚡ My Agent "
prompt_symbol: "⚡ "
help_header: "(⚡) Available Commands"
tool_prefix: "┊"
# Per-tool emoji overrides (optional)
tool_emojis:
terminal: "⚔"
web_search: "🔮"
read_file: "📄"
# Custom ASCII art banners (optional, Rich markup supported)
# banner_logo: |
# [bold #FFD700] MY AGENT [/]
# banner_hero: |
# [#FFD700] Custom art here [/]
```
### Minimal custom skin example
Since everything inherits from `default`, a minimal skin only needs to change what's different:
```yaml
name: cyberpunk
@@ -78,4 +200,7 @@ tool_prefix: "▏"
- Built-in skins load from `hermes_cli/skin_engine.py`.
- Unknown skins automatically fall back to `default`.
- `/skin` updates the active CLI theme immediately for the current session.
- User skins in `~/.hermes/skins/` take precedence over built-in skins with the same name.
- Skin changes via `/skin` are session-only. To make a skin your permanent default, set it in `config.yaml`.
- The `banner_logo` and `banner_hero` fields support Rich console markup (e.g., `[bold #FF0000]text[/]`) for colored ASCII art.