docs: deep quality pass — expand 10 thin pages, fix specific issues (#4134)

Developer guide stubs expanded to full documentation:
- trajectory-format.md: 56→233 lines (JSONL format, ShareGPT example, normalization rules, reasoning markup, replay code)
- session-storage.md: 66→388 lines (SQLite schema, migration table, FTS5 search syntax, lineage queries, Python API examples)
- context-compression-and-caching.md: 72→321 lines (dual compression system, config defaults, 4-phase algorithm, before/after example, prompt caching mechanics, cache-aware patterns)
- tools-runtime.md: 65→246 lines (registry API, dispatch flow, availability checking, error wrapping, approval flow)
- prompt-assembly.md: 89→246 lines (concrete assembled prompt example, SOUL.md injection, context file discovery table)

User-facing pages expanded:
- docker.md: 62→224 lines (volumes, env forwarding, docker-compose, resource limits, troubleshooting)
- updating.md: 79→167 lines (update behavior, version checking, rollback instructions, Nix users)
- skins.md: 80→206 lines (all color/spinner/branding keys, built-in skin descriptions, full custom skin YAML template)

Hub pages improved:
- integrations/index.md: 25→82 lines (web search backends table, TTS/browser providers, quick config example)
- features/overview.md: added Integrations section with 6 missing links

Specific fixes:
- configuration.md: removed duplicate Gateway Streaming section
- mcp.md: removed internal "PR work" language
- plugins.md: added inline minimal plugin example (self-contained)

13 files changed, ~1700 lines added. Docusaurus build verified clean.
2026-03-30 20:30:11 -07:00
parent 54b876a5c9
commit 5b0243e6ad
13 changed files with 1735 additions and 174 deletions
--- a/website/docs/developer-guide/context-compression-and-caching.md
+++ b/website/docs/developer-guide/context-compression-and-caching.md
@@ -1,72 +1,321 @@
---
-sidebar_position: 6
-title: "Context Compression & Prompt Caching"
-description: "How Hermes compresses long conversations and applies provider-side prompt caching"
---
+# Context Compression and Caching

-# Context Compression & Prompt Caching
+Hermes Agent uses a dual compression system and Anthropic prompt caching to
+manage context window usage efficiently across long conversations.

-Hermes manages long conversations with two complementary mechanisms:
+Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`,
+`gateway/run.py` (session hygiene), `run_agent.py` (lines 1146-1204)

- prompt caching
- context compression

-Primary files:
+## Dual Compression System

- `agent/prompt_caching.py`
- `agent/context_compressor.py`
- `run_agent.py`
+Hermes has two separate compression layers that operate independently:

-## Prompt caching
+```
+                     ┌───────────────────────────┐
+  Incoming message   │   Gateway Session Hygiene │  Fires at 85% of context
+  ─────────────────► │   (pre-agent, rough est.) │  Safety net for large sessions
+                     └─────────────┬─────────────┘
+                                   │
+                                   ▼
+                     ┌───────────────────────────┐
+                     │   Agent ContextCompressor │  Fires at 50% of context (default)
+                     │   (in-loop, real tokens)  │  Normal context management
+                     └───────────────────────────┘
+```

-For Anthropic/native and Claude-via-OpenRouter flows, Hermes applies Anthropic-style cache markers.
+### 1. Gateway Session Hygiene (85% threshold)

-Current strategy:
+Located in `gateway/run.py` (around line 2220). This is a **safety net** that
+runs before the agent processes a message. It prevents API failures when sessions
+grow too large between turns (e.g., overnight accumulation in Telegram/Discord).

- cache the system prompt
- cache the last 3 non-system messages
- default TTL is 5 minutes unless explicitly extended
+- **Threshold**: Fixed at 85% of model context length
+- **Token source**: Prefers actual API-reported tokens from last turn; falls back
+  to rough character-based estimate (`estimate_messages_tokens_rough`)
+- **Fires**: Only when `len(history) >= 4` and compression is enabled
+- **Purpose**: Catch sessions that escaped the agent's own compressor
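The gate above can be sketched as follows. This is a minimal illustration, not the code in `gateway/run.py`: the helper names `should_run_hygiene()` and `rough_token_estimate()` are hypothetical, and the ~4-characters-per-token estimate is an assumption standing in for `estimate_messages_tokens_rough`.

```python
from typing import Optional

HYGIENE_THRESHOLD = 0.85  # fixed fraction of the model's context length


def rough_token_estimate(messages: list[dict]) -> int:
    """Hypothetical character-based estimate: roughly 4 characters per token."""
    chars = sum(len(str(m.get("content") or "")) for m in messages)
    return chars // 4


def should_run_hygiene(
    history: list[dict],
    context_length: int,
    compression_enabled: bool = True,
    last_api_prompt_tokens: Optional[int] = None,
) -> bool:
    """Decide whether the gateway should compress before the agent runs."""
    if not compression_enabled or len(history) < 4:
        return False
    # Prefer the provider-reported prompt size from the previous turn,
    # and only fall back to the rough estimate when it is unavailable.
    tokens = (
        last_api_prompt_tokens
        if last_api_prompt_tokens is not None
        else rough_token_estimate(history)
    )
    return tokens >= HYGIENE_THRESHOLD * context_length
```

For a 200K-context model, a previous turn that reported 180,000 prompt tokens is above the 170,000-token hygiene line, so the gateway would compress before handing the message to the agent.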

-This is implemented in `agent/prompt_caching.py`.
+The gateway hygiene threshold is intentionally higher than the agent's compressor.
+Setting it at 50% (same as the agent) caused premature compression on every turn
+in long gateway sessions.

-## Why prompt stability matters
+### 2. Agent ContextCompressor (50% threshold, configurable)

-Prompt caching only helps when the stable prefix remains stable. That is why Hermes avoids rebuilding or mutating the core system prompt mid-session unless it has to.
+Located in `agent/context_compressor.py`. This is the **primary compression
+system** that runs inside the agent's tool loop with access to accurate,
+API-reported token counts.

-## Compression trigger

-Hermes can compress context when conversations become large. Configuration defaults live in `config.yaml`, and the compressor also has runtime checks based on actual prompt token counts.
+## Configuration

-## Compression algorithm
+All compression settings are read from `config.yaml` under the `compression` key:

-The compressor protects:
+```yaml
+compression:
+  enabled: true              # Enable/disable compression (default: true)
+  threshold: 0.50            # Fraction of context window (default: 0.50 = 50%)
+  target_ratio: 0.20         # How much of threshold to keep as tail (default: 0.20)
+  protect_last_n: 20         # Minimum protected tail messages (default: 20)
+  summary_model: null        # Override model for summaries (default: uses auxiliary)
+```

- the first N turns
- the last N turns
+### Parameter Details

-and summarizes the middle section.
+| Parameter | Default | Range | Description |
+|-----------|---------|-------|-------------|
+| `threshold` | `0.50` | 0.0-1.0 | Compression triggers when prompt tokens ≥ `threshold × context_length` |
+| `target_ratio` | `0.20` | 0.10-0.80 | Controls tail protection token budget: `threshold_tokens × target_ratio` |
+| `protect_last_n` | `20` | ≥1 | Minimum number of recent messages always preserved |
+| `protect_first_n` | `3` | (hardcoded) | System prompt + first exchange always preserved |

-It also cleans up structural issues such as orphaned tool-call/result pairs so the API never receives invalid conversation structure after compression.
+### Computed Values (for a 200K context model at defaults)

-## Pre-compression memory flush
+```
+context_length       = 200,000
+threshold_tokens     = 200,000 × 0.50 = 100,000
+tail_token_budget    = 100,000 × 0.20 = 20,000
+max_summary_tokens   = min(200,000 × 0.05, 12,000) = 10,000
+```
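To reproduce these numbers for other context sizes, a small helper like the one below is enough. The function name `compression_budgets()` is hypothetical; it only restates the formulas shown above, and rounding to whole tokens is an assumption.

```python
def compression_budgets(
    context_length: int,
    threshold: float = 0.50,
    target_ratio: float = 0.20,
) -> dict[str, int]:
    """Derive the compressor's token budgets from config values (illustrative)."""
    threshold_tokens = round(context_length * threshold)
    tail_token_budget = round(threshold_tokens * target_ratio)
    # Summary cap: 5% of the context window, but never more than 12,000 tokens.
    max_summary_tokens = min(round(context_length * 0.05), 12_000)
    return {
        "threshold_tokens": threshold_tokens,
        "tail_token_budget": tail_token_budget,
        "max_summary_tokens": max_summary_tokens,
    }


print(compression_budgets(200_000))
# {'threshold_tokens': 100000, 'tail_token_budget': 20000, 'max_summary_tokens': 10000}
```

On a 1M-context model the 12,000-token cap takes over, since 5% of the window would be 50,000 tokens.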

-Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away.

-## Session lineage after compression
+## Compression Algorithm

-Compression can split the session into a new session ID while preserving parent lineage in the state DB.
+The `ContextCompressor.compress()` method follows a 4-phase algorithm:

-This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain.
+### Phase 1: Prune Old Tool Results (cheap, no LLM call)

-## Re-injected state after compression
+Old tool results (>200 chars) outside the protected tail are replaced with:
+```
+[Old tool output cleared to save context space]
+```

-After compression, Hermes may re-inject compact operational state such as:
+This is a cheap pre-pass that saves significant tokens from verbose tool
+outputs (file contents, terminal output, search results).
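Phase 1 amounts to a single pass over the message list. In this sketch, `prune_old_tool_results()` and its `tail_start` argument (the index where the protected tail begins) are hypothetical names; only the 200-character cutoff and the placeholder text come from the description above.

```python
PRUNED_PLACEHOLDER = "[Old tool output cleared to save context space]"


def prune_old_tool_results(
    messages: list[dict], tail_start: int, min_chars: int = 200
) -> list[dict]:
    """Replace long tool outputs before the protected tail with a placeholder.

    Returns a new list; the caller's message dicts are not modified.
    """
    pruned = []
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if (
            i < tail_start
            and msg.get("role") == "tool"
            and isinstance(content, str)
            and len(content) > min_chars
        ):
            # Shallow copy keeps tool_call_id etc., swaps only the content.
            msg = {**msg, "content": PRUNED_PLACEHOLDER}
        pruned.append(msg)
    return pruned
```

Keeping the `tool_call_id` on the pruned message matters: the call/result pairing stays valid, so this pass never creates orphans on its own.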

- todo snapshot
- prior-read-files summary
+### Phase 2: Determine Boundaries

-## Related docs
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Message list                                               │
+│                                                             │
+│  [0..2]  ← protect_first_n (system + first exchange)       │
+│  [3..N]  ← middle turns → SUMMARIZED                       │
+│  [N..end] ← tail (by token budget OR protect_last_n)       │
+│                                                             │
+└─────────────────────────────────────────────────────────────┘
+```

- [Prompt Assembly](./prompt-assembly.md)
- [Session Storage](./session-storage.md)
- [Agent Loop Internals](./agent-loop.md)
+Tail protection is **token-budget based**: the compressor walks backward from
+the end, accumulating tokens until the budget is exhausted. If the budget would
+protect fewer than `protect_last_n` messages, it falls back to that fixed count.
+
+Boundaries are aligned to avoid splitting tool_call/tool_result groups.
+The `_align_boundary_backward()` method walks past consecutive tool results
+to find the parent assistant message, keeping groups intact.
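The boundary logic can be sketched as two small functions. Both names are hypothetical (the real method is `_align_boundary_backward()`, whose signature is not shown here), and the sketch assumes per-message token counts have already been computed.

```python
def find_tail_start(
    token_counts: list[int],
    tail_token_budget: int,
    protect_last_n: int,
    head_end: int,
) -> int:
    """Index of the first message in the protected tail.

    token_counts[i] is the token size of message i; messages before
    head_end belong to the protected head and never join the tail.
    """
    n = len(token_counts)
    start, used = n, 0
    # Walk backward from the newest message while the token budget allows.
    for i in range(n - 1, head_end - 1, -1):
        if used + token_counts[i] > tail_token_budget:
            break
        used += token_counts[i]
        start = i
    # Fall back to the fixed count if the budget protected fewer messages.
    return min(start, max(n - protect_last_n, head_end))


def align_boundary_backward(messages: list[dict], idx: int, head_end: int) -> int:
    """Move a boundary off tool results so a call and its results stay together."""
    while head_end < idx < len(messages) and messages[idx].get("role") == "tool":
        idx -= 1  # step back toward the assistant message that issued the call
    return idx
```

With ten 10-token messages, a 35-token budget and `protect_last_n=2`, the budget covers the last three messages, so the tail starts at index 7; raising `protect_last_n` to 5 pulls it back to index 5.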
+
+### Phase 3: Generate Structured Summary
+
+The middle turns are summarized using the auxiliary LLM with a structured
+template:
+
+```
+## Goal
+[What the user is trying to accomplish]
+
+## Constraints & Preferences
+[User preferences, coding style, constraints, important decisions]
+
+## Progress
+### Done
+[Completed work — specific file paths, commands run, results]
+### In Progress
+[Work currently underway]
+### Blocked
+[Any blockers or issues encountered]
+
+## Key Decisions
+[Important technical decisions and why]
+
+## Relevant Files
+[Files read, modified, or created — with brief note on each]
+
+## Next Steps
+[What needs to happen next]
+
+## Critical Context
+[Specific values, error messages, configuration details]
+```
+
+Summary budget scales with the amount of content being compressed:
+- Formula: `content_tokens × 0.20` (the `_SUMMARY_RATIO` constant)
+- Minimum: 2,000 tokens
+- Maximum: `min(context_length × 0.05, 12,000)` tokens
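The clamping can be written as below. `_SUMMARY_RATIO` is the constant named above; `summary_budget()` and the other constant names are hypothetical, and the order of the clamps (the 2,000-token floor applied last) is an assumption the text does not pin down.

```python
_SUMMARY_RATIO = 0.20
_MIN_SUMMARY_TOKENS = 2_000
_MAX_SUMMARY_TOKENS = 12_000


def summary_budget(content_tokens: int, context_length: int) -> int:
    """Token budget for the structured summary of the compressed middle."""
    ceiling = min(round(context_length * 0.05), _MAX_SUMMARY_TOKENS)
    budget = round(content_tokens * _SUMMARY_RATIO)
    # Clamp into [floor, ceiling]; in this sketch the floor wins if the
    # ceiling is smaller (only possible for very small context windows).
    return max(_MIN_SUMMARY_TOKENS, min(budget, ceiling))
```

On a 200K-context model, compressing 30,000 tokens yields a 6,000-token summary budget, 5,000 tokens is raised to the 2,000 floor, and 80,000 tokens is capped at 10,000.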
+
+### Phase 4: Assemble Compressed Messages
+
+The compressed message list is:
+1. Head messages (with a note appended to the system prompt on first compression)
+2. Summary message (role chosen to avoid consecutive same-role violations)
+3. Tail messages (unmodified)
+
+Orphaned tool_call/tool_result pairs are cleaned up by `_sanitize_tool_pairs()`:
+- Tool results referencing removed calls → removed
+- Tool calls whose results were removed → stub result injected
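The two cleanup rules can be sketched as a single pass. `sanitize_tool_pairs()` here is a hypothetical stand-in for `_sanitize_tool_pairs()`, the stub text is invented for illustration, and the messages are assumed to use the OpenAI-style `tool_calls` / `tool_call_id` fields.

```python
COMPACTION_STUB = "[Result unavailable: removed during context compaction]"  # hypothetical text


def sanitize_tool_pairs(messages: list[dict]) -> list[dict]:
    """Drop orphaned tool results and stub out tool calls whose results are gone."""
    call_ids = {
        call["id"]
        for msg in messages
        if msg.get("role") == "assistant"
        for call in msg.get("tool_calls") or []
    }
    result_ids = {msg.get("tool_call_id") for msg in messages if msg.get("role") == "tool"}

    cleaned = []
    for msg in messages:
        if msg.get("role") == "tool" and msg.get("tool_call_id") not in call_ids:
            continue  # its tool_call was summarized away
        cleaned.append(msg)
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                if call["id"] not in result_ids:
                    # Inject a stub so every tool_call still has a matching result.
                    cleaned.append(
                        {"role": "tool", "tool_call_id": call["id"], "content": COMPACTION_STUB}
                    )
    return cleaned
```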
+
+### Iterative Re-compression
+
+On subsequent compressions, the previous summary is passed to the LLM with
+instructions to **update** it rather than summarize from scratch. This preserves
+information across multiple compactions — items move from "In Progress" to "Done",
+new progress is added, and obsolete information is removed.
+
+The `_previous_summary` field on the compressor instance stores the last summary
+text for this purpose.
+
+
+## Before/After Example
+
+### Before Compression (45 messages, ~95K tokens)
+
+```
+[0] system:    "You are a helpful assistant..." (system prompt)
+[1] user:      "Help me set up a FastAPI project"
+[2] assistant: <tool_call> terminal: mkdir project </tool_call>
+[3] tool:      "directory created"
+[4] assistant: <tool_call> write_file: main.py </tool_call>
+[5] tool:      "file written (2.3KB)"
+    ... 30 more turns of file editing, testing, debugging ...
+[38] assistant: <tool_call> terminal: pytest </tool_call>
+[39] tool:      "8 passed, 2 failed\n..."  (5KB output)
+[40] user:      "Fix the failing tests"
+[41] assistant: <tool_call> read_file: tests/test_api.py </tool_call>
+[42] tool:      "import pytest\n..."  (3KB)
+[43] assistant: "I see the issue with the test fixtures..."
+[44] user:      "Great, also add error handling"
+```
+
+### After Compression (25 messages, ~45K tokens)
+
+```
+[0] system:    "You are a helpful assistant...
+               [Note: Some earlier conversation turns have been compacted...]"
+[1] user:      "Help me set up a FastAPI project"
+[2] assistant: "[CONTEXT COMPACTION] Earlier turns were compacted...
+
+               ## Goal
+               Set up a FastAPI project with tests and error handling
+
+               ## Progress
+               ### Done
+               - Created project structure: main.py, tests/, requirements.txt
+               - Implemented 5 API endpoints in main.py
+               - Wrote 10 test cases in tests/test_api.py
+               - 8/10 tests passing
+
+               ### In Progress
+               - Fixing 2 failing tests (test_create_user, test_delete_user)
+
+               ## Relevant Files
+               - main.py — FastAPI app with 5 endpoints
+               - tests/test_api.py — 10 test cases
+               - requirements.txt — fastapi, pytest, httpx
+
+               ## Next Steps
+               - Fix failing test fixtures
+               - Add error handling"
+[3] user:      "Fix the failing tests"
+[4] assistant: <tool_call> read_file: tests/test_api.py </tool_call>
+[5] tool:      "import pytest\n..."
+[6] assistant: "I see the issue with the test fixtures..."
+[7] user:      "Great, also add error handling"
+```
+
+
+## Prompt Caching (Anthropic)
+
+Source: `agent/prompt_caching.py`
+
+Reduces input token costs by ~75% on multi-turn conversations by caching the
+conversation prefix. Uses Anthropic's `cache_control` breakpoints.
+
+### Strategy: system_and_3
+
+Anthropic allows a maximum of 4 `cache_control` breakpoints per request. Hermes
+uses the "system_and_3" strategy:
+
+```
+Breakpoint 1: System prompt           (stable across all turns)
+Breakpoint 2: 3rd-to-last non-system message  ─┐
+Breakpoint 3: 2nd-to-last non-system message   ├─ Rolling window
+Breakpoint 4: Last non-system message          ─┘
+```
+
+### How It Works
+
+`apply_anthropic_cache_control()` deep-copies the messages and injects
+`cache_control` markers:
+
+```python
+# Cache marker format
+marker = {"type": "ephemeral"}
+# Or for 1-hour TTL:
+marker = {"type": "ephemeral", "ttl": "1h"}
+```
+
+The marker is applied differently based on content type:
+
+| Content Type | Where Marker Goes |
+|-------------|-------------------|
+| String content | Converted to `[{"type": "text", "text": ..., "cache_control": ...}]` |
+| List content | Added to the last element's dict |
+| None/empty | Added as `msg["cache_control"]` |
+| Tool messages | Added as `msg["cache_control"]` (native Anthropic only) |
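The placement rules in the table can be sketched as follows. `apply_cache_markers()` is a hypothetical helper written for illustration, not the real `apply_anthropic_cache_control()`; in particular, its `native_anthropic` flag is an assumption about how the provider check is passed in.

```python
import copy


def _mark(msg: dict, marker: dict, native_anthropic: bool) -> None:
    """Attach a cache_control marker to one message, following the table above."""
    content = msg.get("content")
    if msg.get("role") == "tool":
        if native_anthropic:  # tool messages are only marked for native Anthropic
            msg["cache_control"] = dict(marker)
    elif isinstance(content, str) and content:
        msg["content"] = [{"type": "text", "text": content, "cache_control": dict(marker)}]
    elif isinstance(content, list) and content:
        content[-1]["cache_control"] = dict(marker)
    else:  # None or empty content
        msg["cache_control"] = dict(marker)


def apply_cache_markers(
    messages: list[dict], ttl: str = "5m", native_anthropic: bool = True
) -> list[dict]:
    """system_and_3: mark the system prompt plus the last 3 non-system messages."""
    marker = {"type": "ephemeral"}
    if ttl == "1h":
        marker["ttl"] = "1h"
    msgs = copy.deepcopy(messages)  # never mutate the caller's history
    targets = [i for i, m in enumerate(msgs) if m.get("role") == "system"][:1]
    targets += [i for i, m in enumerate(msgs) if m.get("role") != "system"][-3:]
    for i in targets:  # at most 4 breakpoints, Anthropic's per-request limit
        _mark(msgs[i], marker, native_anthropic)
    return msgs
```

Deep-copying first keeps the stored history free of provider-specific markers, so the same messages can be replayed against a non-Anthropic model.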
+
+### Cache-Aware Design Patterns
+
+1. **Stable system prompt**: The system prompt is breakpoint 1 and cached across
+   all turns. Avoid mutating it mid-conversation (compression appends a note
+   only on the first compaction).
+
+2. **Message ordering matters**: Cache hits require prefix matching. Adding or
+   removing messages in the middle invalidates the cache for everything after.
+
+3. **Compression cache interaction**: After compression, the cache is invalidated
+   for the compressed region but the system prompt cache survives. The rolling
+   3-message window re-establishes caching within 1-2 turns.
+
+4. **TTL selection**: Default is `5m` (5 minutes). Use `1h` for long-running
+   sessions where the user takes breaks between turns.
+
+### Enabling Prompt Caching
+
+Prompt caching is automatically enabled when:
+- The model is an Anthropic Claude model (detected by model name)
+- The provider supports `cache_control` (native Anthropic API or OpenRouter)
+
+```yaml
+# config.yaml — TTL is configurable
+model:
+  cache_ttl: "5m"   # "5m" or "1h"
+```
+
+The CLI shows caching status at startup:
+```
+💾 Prompt caching: ENABLED (Claude via OpenRouter, 5m TTL)
+```
+
+
+## Context Pressure Warnings
+
+The agent emits context pressure warnings at 85% of the compression threshold,
+not at 85% of the context window. With the default 50% threshold, that is 42.5%
+of the context window. For example, on a 100K-context model (threshold 50,000
+tokens):
+
+```
+⚠️  Context is 85% to compaction threshold (42,500/50,000 tokens)
+```
+
+After compression, if usage drops below 85% of threshold, the warning state
+is cleared. If compression fails to reduce below the warning level (the
+conversation is too dense), the warning persists but compression won't
+re-trigger until the threshold is exceeded again.
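The warning state described above can be modeled with a small tracker. `ContextPressure` and its methods are hypothetical; the text does not say whether the real agent repeats the warning on every turn above the line, so this sketch assumes it is emitted once per crossing.

```python
from typing import Optional


class ContextPressure:
    """Track the 85%-of-threshold warning state (illustrative)."""

    WARN_PERCENT = 85

    def __init__(self, context_length: int, threshold: float = 0.50) -> None:
        self.threshold_tokens = round(context_length * threshold)
        # Integer arithmetic avoids float rounding right at the boundary.
        self.warn_tokens = self.threshold_tokens * self.WARN_PERCENT // 100
        self.warned = False

    def check(self, prompt_tokens: int) -> Optional[str]:
        """Return a warning when usage first crosses the warning level."""
        if prompt_tokens < self.warn_tokens:
            self.warned = False  # e.g. after a compression that freed enough space
            return None
        if self.warned:
            return None  # still above the warning level; don't repeat it
        self.warned = True
        pct = round(100 * prompt_tokens / self.threshold_tokens)
        return (
            f"⚠️  Context is {pct}% to compaction threshold "
            f"({prompt_tokens:,}/{self.threshold_tokens:,} tokens)"
        )

    def should_compress(self, prompt_tokens: int) -> bool:
        """Compression itself only triggers at the full threshold."""
        return prompt_tokens >= self.threshold_tokens
```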
--- a/website/docs/developer-guide/prompt-assembly.md
+++ b/website/docs/developer-guide/prompt-assembly.md
@@ -41,6 +41,163 @@ The cached system prompt is assembled in roughly this order:

 When `skip_context_files` is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded `DEFAULT_AGENT_IDENTITY` is used instead.

+### Concrete example: assembled system prompt
+
+Here is a simplified view of what the final system prompt looks like when all layers are present (comments show the source of each section):
+
+```
+# Layer 1: Agent Identity (from ~/.hermes/SOUL.md)
+You are Hermes, an AI assistant created by Nous Research.
+You are an expert software engineer and researcher.
+You value correctness, clarity, and efficiency.
+...
+
+# Layer 2: Tool-aware behavior guidance
+You have persistent memory across sessions. Save durable facts using
+the memory tool: user preferences, environment details, tool quirks,
+and stable conventions. Memory is injected into every turn, so keep
+it compact and focused on facts that will still matter later.
+...
+When the user references something from a past conversation or you
+suspect relevant cross-session context exists, use session_search
+to recall it before asking them to repeat themselves.
+
+# Tool-use enforcement (for GPT/Codex models only)
+You MUST use your tools to take action — do not describe what you
+would do or plan to do without actually doing it.
+...
+
+# Layer 3: Honcho static block (when active)
+[Honcho personality/context data]
+
+# Layer 4: Optional system message (from config or API)
+[User-configured system message override]
+
+# Layer 5: Frozen MEMORY snapshot
+## Persistent Memory
+- User prefers Python 3.12, uses pyproject.toml
+- Default editor is nvim
+- Working on project "atlas" in ~/code/atlas
+- Timezone: US/Pacific
+
+# Layer 6: Frozen USER profile snapshot
+## User Profile
+- Name: Alice
+- GitHub: alice-dev
+
+# Layer 7: Skills index
+## Skills (mandatory)
+Before replying, scan the skills below. If one clearly matches
+your task, load it with skill_view(name) and follow its instructions.
+...
+<available_skills>
+  software-development:
+    - code-review: Structured code review workflow
+    - test-driven-development: TDD methodology
+  research:
+    - arxiv: Search and summarize arXiv papers
+</available_skills>
+
+# Layer 8: Context files (from project directory)
+# Project Context
+The following project context files have been loaded and should be followed:
+
+## AGENTS.md
+This is the atlas project. Use pytest for testing. The main
+entry point is src/atlas/main.py. Always run `make lint` before
+committing.
+
+# Layer 9: Timestamp + session
+Current time: 2026-03-30T14:30:00-07:00
+Session: abc123
+
+# Layer 10: Platform hint
+You are a CLI AI Agent. Try not to use markdown but simple text
+renderable inside a terminal.
+```
+
+## How SOUL.md appears in the prompt
+
+`SOUL.md` lives at `~/.hermes/SOUL.md` and serves as the agent's identity — the very first section of the system prompt. The loading logic in `prompt_builder.py` works as follows:
+
+```python
+# From agent/prompt_builder.py (simplified)
+from typing import Optional
+
+def load_soul_md() -> Optional[str]:
+    soul_path = get_hermes_home() / "SOUL.md"
+    if not soul_path.exists():
+        return None
+    content = soul_path.read_text(encoding="utf-8").strip()
+    content = _scan_context_content(content, "SOUL.md")  # Security scan
+    content = _truncate_content(content, "SOUL.md")       # Cap at 20k chars
+    return content
+```
+
+When `load_soul_md()` returns content, it replaces the hardcoded `DEFAULT_AGENT_IDENTITY`. The `build_context_files_prompt()` function is then called with `skip_soul=True` to prevent SOUL.md from appearing twice (once as identity, once as a context file).
+
+If `SOUL.md` doesn't exist, the system falls back to:
+
+```
+You are Hermes Agent, an intelligent AI assistant created by Nous Research.
+You are helpful, knowledgeable, and direct. You assist users with a wide
+range of tasks including answering questions, writing and editing code,
+analyzing information, creative work, and executing actions via your tools.
+You communicate clearly, admit uncertainty when appropriate, and prioritize
+being genuinely useful over being verbose unless otherwise directed below.
+Be targeted and efficient in your exploration and investigations.
+```
+
+## How context files are injected
+
+`build_context_files_prompt()` uses a **priority system** — only one project context type is loaded (first match wins):
+
+```python
+# From agent/prompt_builder.py (simplified)
+import os
+from pathlib import Path
+
+def build_context_files_prompt(cwd=None, skip_soul=False):
+    cwd_path = Path(cwd or os.getcwd()).resolve()  # default to the current directory
+
+    # Priority: first match wins — only ONE project context loaded
+    project_context = (
+        _load_hermes_md(cwd_path)       # 1. .hermes.md / HERMES.md (walks to git root)
+        or _load_agents_md(cwd_path)    # 2. AGENTS.md (cwd only)
+        or _load_claude_md(cwd_path)    # 3. CLAUDE.md (cwd only)
+        or _load_cursorrules(cwd_path)  # 4. .cursorrules / .cursor/rules/*.mdc
+    )
+
+    sections = []
+    if project_context:
+        sections.append(project_context)
+
+    # SOUL.md from HERMES_HOME (independent of project context)
+    if not skip_soul:
+        soul_content = load_soul_md()
+        if soul_content:
+            sections.append(soul_content)
+
+    if not sections:
+        return ""
+
+    return (
+        "# Project Context\n\n"
+        "The following project context files have been loaded "
+        "and should be followed:\n\n"
+        + "\n".join(sections)
+    )
+```
+
+### Context file discovery details
+
+| Priority | Files | Search scope | Notes |
+|----------|-------|-------------|-------|
+| 1 | `.hermes.md`, `HERMES.md` | CWD up to git root | Hermes-native project config |
+| 2 | `AGENTS.md` | CWD only | Common agent instruction file |
+| 3 | `CLAUDE.md` | CWD only | Claude Code compatibility |
+| 4 | `.cursorrules`, `.cursor/rules/*.mdc` | CWD only | Cursor compatibility |
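Only priority 1 searches beyond the current directory. A minimal sketch of that upward walk follows, assuming the search stops once it reaches the directory containing `.git` (or the filesystem root when there is no repository). The helper name `find_hermes_md()` and the exact stop rule are assumptions, not the real `_load_hermes_md()`.

```python
from pathlib import Path
from typing import Optional


def find_hermes_md(cwd: str) -> Optional[Path]:
    """Search cwd and its parents, stopping at the git root, for a Hermes context file."""
    current = Path(cwd).resolve()
    while True:
        for name in (".hermes.md", "HERMES.md"):
            candidate = current / name
            if candidate.is_file():
                return candidate
        # Stop at the repository root (.git may be a directory or, in
        # worktrees, a file) or at the filesystem root.
        if (current / ".git").exists() or current.parent == current:
            return None
        current = current.parent
```

Stopping at the git root keeps a `HERMES.md` in a parent directory (for example, your home directory) from leaking into unrelated repositories.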
+
+All context files are:
+- **Security scanned** — checked for prompt injection patterns (invisible unicode, "ignore previous instructions", credential exfiltration attempts)
+- **Truncated** — capped at 20,000 characters using 70/20 head/tail ratio with a truncation marker
+- **YAML frontmatter stripped** — `.hermes.md` frontmatter is removed (reserved for future config overrides)
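The 70/20 head/tail truncation can be sketched as follows. `truncate_content()` is a hypothetical stand-in for `_truncate_content()`, and the marker wording is invented; only the 20,000-character cap and the 70/20 split come from the text above.

```python
MAX_CONTEXT_FILE_CHARS = 20_000
HEAD_PERCENT, TAIL_PERCENT = 70, 20


def truncate_content(content: str, filename: str, max_chars: int = MAX_CONTEXT_FILE_CHARS) -> str:
    """Keep the first 70% and last 20% of the character budget, with a marker between."""
    if len(content) <= max_chars:
        return content
    head = content[: max_chars * HEAD_PERCENT // 100]
    tail_len = max_chars * TAIL_PERCENT // 100
    tail = content[len(content) - tail_len :]
    marker = f"\n\n[... {filename} truncated: {len(content):,} chars, showing head and tail ...]\n\n"
    return head + marker + tail
```

The remaining 10% of the budget leaves room for the marker, and keeping the tail preserves instructions that are often appended at the end of a file.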
+
 ## API-call-time-only layers

 These are intentionally *not* persisted as part of the cached system prompt:
--- a/website/docs/developer-guide/session-storage.md
+++ b/website/docs/developer-guide/session-storage.md
@@ -1,66 +1,388 @@
---
-sidebar_position: 8
-title: "Session Storage"
-description: "How Hermes stores sessions in SQLite, maintains lineage, and exposes recall/search"
---
-
 # Session Storage

-Hermes uses a SQLite-backed session store as the main source of truth for historical conversation state.
+Hermes Agent uses a SQLite database (`~/.hermes/state.db`) to persist session
+metadata, full message history, and model configuration across CLI and gateway
+sessions. This replaces the earlier per-session JSONL file approach.

-Primary files:
+Source file: `hermes_state.py`

- `hermes_state.py`
- `gateway/session.py`
- `tools/session_search_tool.py`

-## Main database
+## Architecture Overview

-The primary store lives at:
-
-```text
-~/.hermes/state.db
+```
+~/.hermes/state.db (SQLite, WAL mode)
+├── sessions          — Session metadata, token counts, billing
+├── messages          — Full message history per session
+├── messages_fts      — FTS5 virtual table for full-text search
+└── schema_version    — Single-row table tracking migration state
 ```

-It contains:
+Key design decisions:
+- **WAL mode** for concurrent readers + one writer (gateway multi-platform)
+- **FTS5 virtual table** for fast text search across all session messages
+- **Session lineage** via `parent_session_id` chains (compression-triggered splits)
+- **Source tagging** (`cli`, `telegram`, `discord`, etc.) for platform filtering
+- Batch runner and RL trajectories are NOT stored here (separate systems)

- sessions
- messages
- metadata such as token counts and titles
- lineage relationships
- full-text search indexes

-## What is stored per session
+## SQLite Schema

-Examples of important session metadata:
+### Sessions Table

- session ID
- source/platform
- title
- created/updated timestamps
- token counts
- tool call counts
- stored system prompt snapshot
- parent session ID after compression splits
+```sql
+CREATE TABLE IF NOT EXISTS sessions (
+    id TEXT PRIMARY KEY,
+    source TEXT NOT NULL,
+    user_id TEXT,
+    model TEXT,
+    model_config TEXT,
+    system_prompt TEXT,
+    parent_session_id TEXT,
+    started_at REAL NOT NULL,
+    ended_at REAL,
+    end_reason TEXT,
+    message_count INTEGER DEFAULT 0,
+    tool_call_count INTEGER DEFAULT 0,
+    input_tokens INTEGER DEFAULT 0,
+    output_tokens INTEGER DEFAULT 0,
+    cache_read_tokens INTEGER DEFAULT 0,
+    cache_write_tokens INTEGER DEFAULT 0,
+    reasoning_tokens INTEGER DEFAULT 0,
+    billing_provider TEXT,
+    billing_base_url TEXT,
+    billing_mode TEXT,
+    estimated_cost_usd REAL,
+    actual_cost_usd REAL,
+    cost_status TEXT,
+    cost_source TEXT,
+    pricing_version TEXT,
+    title TEXT,
+    FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
+);

-## Lineage
+CREATE INDEX IF NOT EXISTS idx_sessions_source ON sessions(source);
+CREATE INDEX IF NOT EXISTS idx_sessions_parent ON sessions(parent_session_id);
+CREATE INDEX IF NOT EXISTS idx_sessions_started ON sessions(started_at DESC);
+CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique
+    ON sessions(title) WHERE title IS NOT NULL;
+```

-When Hermes compresses a conversation, it can continue in a new session ID while preserving ancestry via `parent_session_id`.
+### Messages Table

-This means resuming/searching can follow session families instead of treating each compressed shard as unrelated.
+```sql
+CREATE TABLE IF NOT EXISTS messages (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    session_id TEXT NOT NULL REFERENCES sessions(id),
+    role TEXT NOT NULL,
+    content TEXT,
+    tool_call_id TEXT,
+    tool_calls TEXT,
+    tool_name TEXT,
+    timestamp REAL NOT NULL,
+    token_count INTEGER,
+    finish_reason TEXT,
+    reasoning TEXT,
+    reasoning_details TEXT,
+    codex_reasoning_items TEXT
+);

-## Gateway vs CLI persistence
+CREATE INDEX IF NOT EXISTS idx_messages_session ON messages(session_id, timestamp);
+```

- CLI uses the state DB directly for resume/history/search
- gateway keeps active-session mappings and may also maintain additional platform transcript/state files
- some legacy JSON/JSONL artifacts still exist for compatibility, but SQLite is the main historical store
+Notes:
+- `tool_calls` is stored as a JSON string (serialized list of tool call objects)
+- `reasoning_details` and `codex_reasoning_items` are stored as JSON strings
+- `reasoning` stores the raw reasoning text for providers that expose it
+- Timestamps are Unix epoch floats (`time.time()`)

-## Session search
+### FTS5 Full-Text Search

-The `session_search` tool uses the session DB's search features to retrieve and summarize relevant past work.
+```sql
+CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
+    content,
+    content=messages,
+    content_rowid=id
+);
+```

-## Related docs
+The FTS5 table is kept in sync via three triggers that fire on INSERT, UPDATE,
+and DELETE of the `messages` table:

- [Gateway Internals](./gateway-internals.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
+```sql
+CREATE TRIGGER IF NOT EXISTS messages_fts_insert AFTER INSERT ON messages BEGIN
+    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
+END;
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_delete AFTER DELETE ON messages BEGIN
+    INSERT INTO messages_fts(messages_fts, rowid, content)
+        VALUES('delete', old.id, old.content);
+END;
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
+    INSERT INTO messages_fts(messages_fts, rowid, content)
+        VALUES('delete', old.id, old.content);
+    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
+END;
+```
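To check the trigger behavior in isolation, the self-contained sketch below creates the same external-content FTS5 table and triggers in an in-memory database. The `messages` table is trimmed to the columns this demo needs (the real schema has more). The sketch assumes Python's `sqlite3` was built with FTS5, which is true of most builds but not guaranteed.

```python
import sqlite3

SCHEMA = """
CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT
);
CREATE VIRTUAL TABLE messages_fts USING fts5(content, content=messages, content_rowid=id);
CREATE TRIGGER messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER messages_fts_delete AFTER DELETE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content)
        VALUES('delete', old.id, old.content);
END;
CREATE TRIGGER messages_fts_update AFTER UPDATE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content)
        VALUES('delete', old.id, old.content);
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
"""


def open_demo_db() -> sqlite3.Connection:
    """In-memory database with three sample messages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        [
            ("s1", "user", "docker deployment fails on startup"),
            ("s1", "assistant", "check the kubernetes manifest"),
            ("s2", "user", "python packaging question"),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> list[tuple]:
    """Return (message id, session id, highlighted snippet) for each match."""
    return conn.execute(
        """
        SELECT m.id, m.session_id,
               snippet(messages_fts, 0, '>>>', '<<<', '...', 8)
        FROM messages_fts
        JOIN messages AS m ON m.id = messages_fts.rowid
        WHERE messages_fts MATCH ?
        ORDER BY m.id
        """,
        (query,),
    ).fetchall()
```

Updating or deleting a row in `messages` immediately changes what `search()` returns, because the triggers remove the old index entry before adding the new one.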
+
+
+## Schema Version and Migrations
+
+Current schema version: **6**
+
+The `schema_version` table stores a single integer. On initialization,
+`_init_schema()` checks the current version and applies migrations sequentially:
+
+| Version | Change |
+|---------|--------|
+| 1 | Initial schema (sessions, messages, FTS5) |
+| 2 | Add `finish_reason` column to messages |
+| 3 | Add `title` column to sessions |
+| 4 | Add unique index on `title` (NULLs allowed, non-NULL must be unique) |
+| 5 | Add billing columns: `cache_read_tokens`, `cache_write_tokens`, `reasoning_tokens`, `billing_provider`, `billing_base_url`, `billing_mode`, `estimated_cost_usd`, `actual_cost_usd`, `cost_status`, `cost_source`, `pricing_version` |
+| 6 | Add reasoning columns to messages: `reasoning`, `reasoning_details`, `codex_reasoning_items` |
+
+Column-adding migrations use `ALTER TABLE ADD COLUMN` wrapped in try/except to
+handle the column-already-exists case, so they are idempotent. The version
+number is bumped after each successful migration block.
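The migration loop can be sketched as follows. `migrate()` and the three-entry `MIGRATIONS` map are hypothetical, covering only versions 2 to 4 from the table above. This sketch only swallows SQLite's "duplicate column name" error rather than every failure.

```python
import sqlite3

# Hypothetical subset of the migration table above: version -> statements.
MIGRATIONS = {
    2: ["ALTER TABLE messages ADD COLUMN finish_reason TEXT"],
    3: ["ALTER TABLE sessions ADD COLUMN title TEXT"],
    4: [
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
        "ON sessions(title) WHERE title IS NOT NULL"
    ],
}


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order and return the resulting version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    current = row[0] if row else 1
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (1)")
    for version in sorted(v for v in MIGRATIONS if v > current):
        for stmt in MIGRATIONS[version]:
            try:
                conn.execute(stmt)
            except sqlite3.OperationalError as exc:
                # Re-running ADD COLUMN on an existing column is harmless here.
                if "duplicate column name" not in str(exc):
                    raise
        conn.execute("UPDATE schema_version SET version = ?", (version,))
        current = version
    conn.commit()
    return current
```

Running `migrate()` twice is safe: the second call finds the stored version already current and applies nothing.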
+
+
+## Write Contention Handling
+
+Multiple hermes processes (gateway + CLI sessions + worktree agents) share one
+`state.db`. The `SessionDB` class handles write contention with:
+
+- **Short SQLite timeout** (1 second) instead of the default 30s
+- **Application-level retry** with random jitter (20-150ms, up to 15 retries)
+- **BEGIN IMMEDIATE** transactions to surface lock contention at transaction start
+- **Periodic WAL checkpoints** every 50 successful writes (PASSIVE mode)
+
+This avoids the "convoy effect" where SQLite's deterministic internal backoff
+causes all competing writers to retry at the same intervals.
+
+```
+_WRITE_MAX_RETRIES = 15
+_WRITE_RETRY_MIN_S = 0.020   # 20ms
+_WRITE_RETRY_MAX_S = 0.150   # 150ms
+_CHECKPOINT_EVERY_N_WRITES = 50
+```
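The retry pattern can be sketched as below. `execute_write()` is a hypothetical helper, not the `SessionDB` method. It assumes the connection was opened in autocommit mode (`isolation_level=None`, e.g. `sqlite3.connect(path, timeout=1.0, isolation_level=None)`) so that `BEGIN IMMEDIATE` is issued explicitly. The injectable `sleep` exists only to make the sketch testable, and periodic WAL checkpoints are omitted.

```python
import random
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")

_WRITE_MAX_RETRIES = 15
_WRITE_RETRY_MIN_S = 0.020   # 20ms
_WRITE_RETRY_MAX_S = 0.150   # 150ms


def execute_write(
    conn: sqlite3.Connection,
    fn: Callable[[sqlite3.Connection], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run fn inside BEGIN IMMEDIATE, retrying lock errors with random jitter."""
    last_exc = None
    for _ in range(_WRITE_MAX_RETRIES):
        try:
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            try:
                result = fn(conn)
            except BaseException:
                conn.rollback()
                raise
            conn.commit()
            return result
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc):
                raise  # not a contention error
            if conn.in_transaction:  # e.g. COMMIT itself hit a lock
                conn.rollback()
            last_exc = exc
            # Random jitter spreads competing writers out instead of
            # having them retry in lockstep.
            sleep(random.uniform(_WRITE_RETRY_MIN_S, _WRITE_RETRY_MAX_S))
    raise last_exc
```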
+
+
+## Common Operations
+
+### Initialize
+
+```python
+from pathlib import Path
+
+from hermes_state import SessionDB
+
+db = SessionDB()                           # Default: ~/.hermes/state.db
+db = SessionDB(db_path=Path("/tmp/test.db"))  # Custom path
+```
+
+### Create and Manage Sessions
+
+```python
+# Create a new session
+db.create_session(
+    session_id="sess_abc123",
+    source="cli",
+    model="anthropic/claude-sonnet-4.6",
+    user_id="user_1",
+    parent_session_id=None,  # or previous session ID for lineage
+)
+
+# End a session
+db.end_session("sess_abc123", end_reason="user_exit")
+
+# Reopen a session (clear ended_at/end_reason)
+db.reopen_session("sess_abc123")
+```
+
+### Store Messages
+
+```python
+msg_id = db.append_message(
+    session_id="sess_abc123",
+    role="assistant",
+    content="Here's the answer...",
+    tool_calls=[{"id": "call_1", "function": {"name": "terminal", "arguments": "{}"}}],
+    token_count=150,
+    finish_reason="stop",
+    reasoning="Let me think about this...",
+)
+```
+
+### Retrieve Messages
+
+```python
+# Raw messages with all metadata
+messages = db.get_messages("sess_abc123")
+
+# OpenAI conversation format (for API replay)
+conversation = db.get_messages_as_conversation("sess_abc123")
+# Returns: [{"role": "user", "content": "..."}, {"role": "assistant", ...}]
+```
+
+### Session Titles
+
+```python
+# Set a title (must be unique among non-NULL titles)
+db.set_session_title("sess_abc123", "Fix Docker Build")
+
+# Resolve by title (returns most recent in lineage)
+session_id = db.resolve_session_by_title("Fix Docker Build")
+
+# Auto-generate next title in lineage
+next_title = db.get_next_title_in_lineage("Fix Docker Build")
+# Returns: "Fix Docker Build #2"
+```
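A numbering rule consistent with the `#2` example above could look like this. `next_title_in_lineage()` is a hypothetical helper: the real method reads titles from `state.db`, while this sketch takes them as a list, and the rule for numbers beyond `#2` is an assumption.

```python
import re

_NUMBERED = re.compile(r"^(?P<base>.+?) #(?P<n>\d+)$")


def next_title_in_lineage(title: str, existing_titles: list[str]) -> str:
    """Return '<base> #N' with N one higher than any numbered sibling title."""
    match = _NUMBERED.match(title)
    base = match.group("base") if match else title
    highest = 1  # the un-numbered original counts as #1
    for existing in existing_titles:
        m = _NUMBERED.match(existing)
        if m and m.group("base") == base:
            highest = max(highest, int(m.group("n")))
    return f"{base} #{highest + 1}"
```

Deriving the next number from existing titles, rather than from a counter, keeps the result compatible with the unique index on `title`.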
+
+
+## Full-Text Search
+
+The `search_messages()` method supports FTS5 query syntax with automatic
+sanitization of user input.
+
+### Basic Search
+
+```python
+results = db.search_messages("docker deployment")
+```
+
+### FTS5 Query Syntax
+
+| Syntax | Example | Meaning |
+|--------|---------|---------|
+| Keywords | `docker deployment` | Both terms (implicit AND) |
+| Quoted phrase | `"exact phrase"` | Exact phrase match |
+| Boolean OR | `docker OR kubernetes` | Either term |
+| Boolean NOT | `python NOT java` | Exclude term |
+| Prefix | `deploy*` | Prefix match |
+
+### Filtered Search
+
+```python
+# Search only CLI sessions
+results = db.search_messages("error", source_filter=["cli"])
+
+# Exclude gateway sessions
+results = db.search_messages("bug", exclude_sources=["telegram", "discord"])
+
+# Search only user messages
+results = db.search_messages("help", role_filter=["user"])
+```
+
+### Search Results Format
+
+Each result includes:
+- `id`, `session_id`, `role`, `timestamp`
+- `snippet` — FTS5-generated snippet with `>>>match<<<` markers
+- `context` — 1 message before and after the match (content truncated to 200 chars)
+- `source`, `model`, `session_started` — from the parent session
+
+The `_sanitize_fts5_query()` method handles edge cases:
+- Strips unmatched quotes and special characters
+- Wraps hyphenated terms in quotes (`chat-send` → `"chat-send"`)
+- Removes dangling boolean operators (`hello AND` → `hello`)
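+
+A rough sketch of those three rules helps predict how a raw query will be rewritten. It is a hypothetical stand-in for `_sanitize_fts5_query()`, not a copy of it. The exact set of characters stripped here is an assumption: anything other than word characters, whitespace, `"`, `*`, and `-`.
+
+```python
+import re
+
+_OPERATORS = {"AND", "OR", "NOT"}
+# Keep word characters, whitespace, quotes, the prefix star and hyphens.
+_SPECIAL = re.compile(r'[^\w\s"*-]')
+_TOKEN = re.compile(r'"[^"]*"|\S+')
+
+
+def sanitize_fts5_query(query: str) -> str:
+    cleaned = _SPECIAL.sub(" ", query)
+    if cleaned.count('"') % 2:                     # unmatched quote: drop them all
+        cleaned = cleaned.replace('"', " ")
+
+    tokens = []
+    for tok in _TOKEN.findall(cleaned):
+        if len(tok) > 1 and tok[0] == tok[-1] == '"':
+            if tok.strip('"').strip():             # keep non-empty phrases as-is
+                tokens.append(tok)
+            continue
+        if tok in _OPERATORS:
+            # An operator needs a left operand: drop leading or doubled ones.
+            if tokens and tokens[-1] not in _OPERATORS:
+                tokens.append(tok)
+            continue
+        star = "*" if tok.endswith("*") else ""
+        core = tok.replace("*", "").replace('"', "").strip("-")
+        if not core:
+            continue
+        if "-" in core:                            # "-" is not an FTS5 bareword char
+            core = f'"{core}"'
+        tokens.append(core + star)
+
+    while tokens and tokens[-1] in _OPERATORS:     # dangling trailing operator
+        tokens.pop()
+    return " ".join(tokens)
+```
+
+With this sketch, `chat-send AND` becomes `"chat-send"`, and `OR docker (kubernetes)` becomes `docker kubernetes`.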
+
+
+## Session Lineage
+
+Sessions can form chains via `parent_session_id`. This happens when context
+compression triggers a session split in the gateway.
+
+### Query: Find Session Lineage
+
+```sql
+-- Find all ancestors of a session
+WITH RECURSIVE lineage AS (
+    SELECT * FROM sessions WHERE id = ?
+    UNION ALL
+    SELECT s.* FROM sessions s
+    JOIN lineage l ON s.id = l.parent_session_id
+)
+SELECT id, title, started_at, parent_session_id FROM lineage;
+
+-- Find all descendants of a session
+WITH RECURSIVE descendants AS (
+    SELECT * FROM sessions WHERE id = ?
+    UNION ALL
+    SELECT s.* FROM sessions s
+    JOIN descendants d ON s.parent_session_id = d.id
+)
+SELECT id, title, started_at FROM descendants;
+```
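+
+The recursive CTEs can be tried against a throwaway in-memory database. The table below is a hypothetical stand-in that carries only the columns these queries touch, not the real `sessions` schema. The queries are the ones above, selecting only `id`.
+
+```python
+import sqlite3
+from typing import Set
+
+ANCESTORS = """
+WITH RECURSIVE lineage AS (
+    SELECT * FROM sessions WHERE id = ?
+    UNION ALL
+    SELECT s.* FROM sessions s
+    JOIN lineage l ON s.id = l.parent_session_id
+)
+SELECT id FROM lineage
+"""
+
+DESCENDANTS = """
+WITH RECURSIVE descendants AS (
+    SELECT * FROM sessions WHERE id = ?
+    UNION ALL
+    SELECT s.* FROM sessions s
+    JOIN descendants d ON s.parent_session_id = d.id
+)
+SELECT id FROM descendants
+"""
+
+
+def make_demo_db() -> sqlite3.Connection:
+    conn = sqlite3.connect(":memory:")
+    # Stand-in table: only the columns the lineage queries use.
+    conn.execute(
+        "CREATE TABLE sessions (id TEXT PRIMARY KEY, title TEXT, "
+        "started_at REAL, parent_session_id TEXT)"
+    )
+    conn.executemany(
+        "INSERT INTO sessions VALUES (?, ?, ?, ?)",
+        [
+            ("sess_a", "Fix Docker Build", 1.0, None),
+            ("sess_b", "Fix Docker Build #2", 2.0, "sess_a"),
+            ("sess_c", "Fix Docker Build #3", 3.0, "sess_b"),
+            ("sess_x", "Unrelated", 4.0, None),
+        ],
+    )
+    return conn
+
+
+def session_ids(conn: sqlite3.Connection, sql: str, session_id: str) -> Set[str]:
+    return {row[0] for row in conn.execute(sql, (session_id,))}
+```
+
+SQL does not guarantee the row order of a recursive CTE without `ORDER BY`, so the helper returns a set. Add `ORDER BY started_at` to the final `SELECT` when order matters.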
+
+### Query: Recent Sessions with Preview
+
+```sql
+SELECT s.*,
+    COALESCE(
+        (SELECT SUBSTR(m.content, 1, 63)
+         FROM messages m
+         WHERE m.session_id = s.id AND m.role = 'user' AND m.content IS NOT NULL
+         ORDER BY m.timestamp, m.id LIMIT 1),
+        ''
+    ) AS preview,
+    COALESCE(
+        (SELECT MAX(m2.timestamp) FROM messages m2 WHERE m2.session_id = s.id),
+        s.started_at
+    ) AS last_active
+FROM sessions s
+ORDER BY s.started_at DESC
+LIMIT 20;
+```
+
+### Query: Token Usage Statistics
+
+```sql
+-- Total tokens by model
+SELECT model,
+       COUNT(*) as session_count,
+       SUM(input_tokens) as total_input,
+       SUM(output_tokens) as total_output,
+       SUM(estimated_cost_usd) as total_cost
+FROM sessions
+WHERE model IS NOT NULL
+GROUP BY model
+ORDER BY total_cost DESC;
+
+-- Sessions with highest token usage
+SELECT id, title, model, input_tokens + output_tokens AS total_tokens,
+       estimated_cost_usd
+FROM sessions
+ORDER BY total_tokens DESC
+LIMIT 10;
+```
+
+
+## Export and Cleanup
+
+```python
+# Export a single session with messages
+data = db.export_session("sess_abc123")
+
+# Export all sessions (with messages) as list of dicts
+all_data = db.export_all(source="cli")
+
+# Delete old sessions (only ended sessions)
+deleted_count = db.prune_sessions(older_than_days=90)
+deleted_count = db.prune_sessions(older_than_days=30, source="telegram")
+
+# Clear messages but keep the session record
+db.clear_messages("sess_abc123")
+
+# Delete session and all messages
+db.delete_session("sess_abc123")
+```
+
+
+## Database Location
+
+Default path: `~/.hermes/state.db`
+
+This is derived from `hermes_constants.get_hermes_home()`, which resolves to
+`~/.hermes/` by default, or to the value of the `HERMES_HOME` environment variable when it is set.
+
+The database file, WAL file (`state.db-wal`), and shared-memory file
+(`state.db-shm`) are all created in the same directory.
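+
+Because the three files live together, a backup or move has to treat them as a unit. The following is a minimal sketch of locating them, assuming the `HERMES_HOME` fallback described above. `state_db_files` is a hypothetical helper, not part of `hermes_constants`.
+
+```python
+import os
+from pathlib import Path
+from typing import List
+
+
+def state_db_files() -> List[Path]:
+    """Return the database, WAL, and shared-memory file paths."""
+    home = os.environ.get("HERMES_HOME")
+    base = Path(home) if home else Path.home() / ".hermes"
+    return [base / "state.db", base / "state.db-wal", base / "state.db-shm"]
+```
+
+Hermes may be writing to the database while you copy it. For a consistent copy in that case, Python's `sqlite3.Connection.backup()` is safer than copying the three files by hand.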
--- a/website/docs/developer-guide/tools-runtime.md
+++ b/website/docs/developer-guide/tools-runtime.md
@@ -22,6 +22,89 @@ Each tool module calls `registry.register(...)` at import time.

 `model_tools.py` is responsible for importing/discovering tool modules and building the schema list used by the model.

+### How `registry.register()` works
+
+Every tool file in `tools/` calls `registry.register()` at module level to declare itself. A call that passes every supported argument looks like this:
+
+```python
+registry.register(
+    name="terminal",               # Unique tool name (used in API schemas)
+    toolset="terminal",            # Toolset this tool belongs to
+    schema={...},                  # OpenAI function-calling schema (description, parameters)
+    handler=handle_terminal,       # The function that executes when the tool is called
+    check_fn=check_terminal,       # Optional: returns True/False for availability
+    requires_env=["SOME_VAR"],     # Optional: env vars needed (for UI display)
+    is_async=False,                # Whether the handler is an async coroutine
+    description="Run commands",    # Human-readable description
+    emoji="💻",                    # Emoji for spinner/progress display
+)
+```
+
+Each call creates a `ToolEntry` stored in the singleton `ToolRegistry._tools` dict keyed by tool name. If a name collision occurs across toolsets, a warning is logged and the later registration wins.
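+
+The registration mechanics can be sketched in a few lines. This is a simplified illustration, not the actual `registry.py`. The `ToolEntry` fields mirror the `register()` arguments listed above, and the collision warning follows the behavior described in the text.
+
+```python
+import logging
+from dataclasses import dataclass, field
+from typing import Any, Callable, Dict, List, Optional
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class ToolEntry:
+    name: str
+    toolset: str
+    schema: Dict[str, Any]
+    handler: Callable[..., Any]
+    check_fn: Optional[Callable[[], bool]] = None
+    requires_env: List[str] = field(default_factory=list)
+    is_async: bool = False
+    description: str = ""
+    emoji: str = ""
+
+
+class ToolRegistry:
+    def __init__(self) -> None:
+        self._tools: Dict[str, ToolEntry] = {}    # keyed by tool name
+
+    def register(self, name: str, toolset: str, schema: Dict[str, Any],
+                 handler: Callable[..., Any], **options: Any) -> None:
+        previous = self._tools.get(name)
+        if previous is not None and previous.toolset != toolset:
+            logger.warning("Tool %r from toolset %r replaces the one from %r",
+                           name, toolset, previous.toolset)
+        # The later registration always wins.
+        self._tools[name] = ToolEntry(name, toolset, schema, handler, **options)
+
+
+registry = ToolRegistry()   # module-level singleton that tool modules import
+```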
+
+### Discovery: `_discover_tools()`
+
+When `model_tools.py` is imported, it calls `_discover_tools()`, which imports every tool module in this order:
+
+```python
+_modules = [
+    "tools.web_tools",
+    "tools.terminal_tool",
+    "tools.file_tools",
+    "tools.vision_tools",
+    "tools.mixture_of_agents_tool",
+    "tools.image_generation_tool",
+    "tools.skills_tool",
+    "tools.browser_tool",
+    "tools.cronjob_tools",
+    "tools.rl_training_tool",
+    "tools.tts_tool",
+    "tools.todo_tool",
+    "tools.memory_tool",
+    "tools.session_search_tool",
+    "tools.clarify_tool",
+    "tools.code_execution_tool",
+    "tools.delegate_tool",
+    "tools.process_registry",
+    "tools.send_message_tool",
+    "tools.honcho_tools",
+    "tools.homeassistant_tool",
+]
+```
+
+Each import triggers the module's `registry.register()` calls. Errors in optional tools (e.g., missing `fal_client` for image generation) are caught and logged — they don't prevent other tools from loading.
+
+After core tool discovery, MCP tools and plugin tools are also discovered:
+
+1. **MCP tools** — `tools.mcp_tool.discover_mcp_tools()` reads MCP server config and registers tools from external servers.
+2. **Plugin tools** — `hermes_cli.plugins.discover_plugins()` loads user/project/pip plugins that may register additional tools.
+
+## Tool availability checking (`check_fn`)
+
+Each tool can optionally provide a `check_fn` — a callable that returns `True` when the tool is available and `False` otherwise. Typical checks include:
+
+- **API key present** — e.g., `lambda: bool(os.environ.get("SERP_API_KEY"))` for web search
+- **Service running** — e.g., checking if the Honcho server is configured
+- **Binary installed** — e.g., verifying `playwright` is available for browser tools
+
+When `registry.get_definitions()` builds the schema list for the model, it runs each tool's `check_fn()`:
+
+```python
+# Simplified from registry.py
+if entry.check_fn:
+    try:
+        available = bool(entry.check_fn())
+    except Exception:
+        available = False   # Exceptions = unavailable
+    if not available:
+        continue            # Skip this tool entirely
+```
+
+Key behaviors:
+- Check results are **cached within a single `get_definitions()` call**: if multiple tools share the same `check_fn`, it runs only once per call.
+- Exceptions in `check_fn()` are treated as "unavailable" (fail-safe).
+- The `is_toolset_available()` method checks whether a toolset's `check_fn` passes, used for UI display and toolset resolution.
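+
+The per-call caching of `check_fn` results can be sketched as follows. This is a simplified, hypothetical version of `get_definitions()` that takes the tool table and the wanted names as arguments. Sorting the names is an assumption made here for deterministic output.
+
+```python
+from dataclasses import dataclass
+from typing import Any, Callable, Dict, Iterable, List, Optional
+
+
+@dataclass
+class ToolEntry:
+    name: str
+    schema: Dict[str, Any]                        # description + parameters
+    check_fn: Optional[Callable[[], bool]] = None
+
+
+def get_definitions(tools: Dict[str, ToolEntry],
+                    wanted: Iterable[str]) -> List[Dict[str, Any]]:
+    """Return OpenAI-style schemas for the wanted tools that pass check_fn."""
+    check_results: Dict[Callable[[], bool], bool] = {}   # lives for this call only
+    definitions: List[Dict[str, Any]] = []
+    for name in sorted(wanted):
+        entry = tools.get(name)
+        if entry is None:
+            continue
+        if entry.check_fn is not None:
+            if entry.check_fn not in check_results:
+                try:
+                    check_results[entry.check_fn] = bool(entry.check_fn())
+                except Exception:
+                    check_results[entry.check_fn] = False  # exception = unavailable
+            if not check_results[entry.check_fn]:
+                continue                          # skip this tool entirely
+        definitions.append({"type": "function",
+                            "function": {"name": name, **entry.schema}})
+    return definitions
+```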
+
 ## Toolset resolution

 Toolsets are named bundles of tools. Hermes resolves them through:
@@ -31,10 +114,108 @@ Toolsets are named bundles of tools. Hermes resolves them through:
 - dynamic MCP toolsets
 - curated special-purpose sets like `hermes-acp`

+### How `get_tool_definitions()` filters tools
+
+The main entry point is `model_tools.get_tool_definitions(enabled_toolsets, disabled_toolsets, quiet_mode)`:
+
+1. **If `enabled_toolsets` is provided** — only tools from those toolsets are included. Each toolset name is resolved via `resolve_toolset()` which expands composite toolsets into individual tool names.
+
+2. **If `disabled_toolsets` is provided** — start with ALL toolsets, then subtract the disabled ones.
+
+3. **If neither** — include all known toolsets.
+
+4. **Registry filtering** — the resolved tool name set is passed to `registry.get_definitions()`, which applies `check_fn` filtering and returns OpenAI-format schemas.
+
+5. **Dynamic schema patching** — after filtering, `execute_code` and `browser_navigate` schemas are dynamically adjusted to only reference tools that actually passed filtering (prevents model hallucination of unavailable tools).
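+
+Steps 1 to 3 can be sketched with a toy toolset table. The table contents, the `@name` include syntax for composite toolsets, and the helper names `expand_toolset` and `select_tool_names` are all hypothetical. The real resolution goes through `resolve_toolset()`.
+
+```python
+from typing import Dict, Iterable, List, Optional, Set
+
+# Hypothetical toolset table; "@name" pulls in another toolset (composite).
+TOOLSETS: Dict[str, List[str]] = {
+    "web": ["web_search", "web_extract"],
+    "terminal": ["terminal", "process"],
+    "file": ["read_file", "write_file"],
+    "dev": ["@terminal", "@file"],
+}
+
+
+def expand_toolset(name: str, seen: Optional[Set[str]] = None) -> Set[str]:
+    """Expand a (possibly composite) toolset into individual tool names."""
+    seen = set() if seen is None else seen
+    if name in seen:                              # guard against include cycles
+        return set()
+    seen.add(name)
+    tools: Set[str] = set()
+    for item in TOOLSETS.get(name, []):
+        if item.startswith("@"):
+            tools |= expand_toolset(item[1:], seen)
+        else:
+            tools.add(item)
+    return tools
+
+
+def select_tool_names(enabled: Optional[Iterable[str]] = None,
+                      disabled: Optional[Iterable[str]] = None) -> Set[str]:
+    if enabled:                                   # 1. only the enabled toolsets
+        return set().union(*(expand_toolset(t) for t in enabled))
+    names = set().union(*(expand_toolset(t) for t in TOOLSETS))  # 3. everything
+    for t in disabled or []:                      # 2. minus the disabled ones
+        names -= expand_toolset(t)
+    return names
+```
+
+The returned set is what step 4 would hand to the registry for `check_fn` filtering.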
+
+### Legacy toolset names
+
+Old toolset names with `_tools` suffixes (e.g., `web_tools`, `terminal_tools`) are mapped to their modern tool names via `_LEGACY_TOOLSET_MAP` for backward compatibility.
+
 ## Dispatch

 At runtime, tools are dispatched through the central registry, with agent-loop exceptions for some agent-level tools such as memory/todo/session-search handling.

+### Dispatch flow: model tool_call → handler execution
+
+When the model returns a `tool_call`, the flow is:
+
+```
+Model response with tool_call
+    ↓
+run_agent.py agent loop
+    ↓
+model_tools.handle_function_call(name, args, task_id, user_task)
+    ↓
+[Agent-loop tools?] → handled directly by agent loop (todo, memory, session_search, delegate_task)
+    ↓
+[Plugin pre-hook] → invoke_hook("pre_tool_call", ...)
+    ↓
+registry.dispatch(name, args, **kwargs)
+    ↓
+Look up ToolEntry by name
+    ↓
+[Async handler?] → bridge via _run_async()
+[Sync handler?]  → call directly
+    ↓
+Return result string (or JSON error)
+    ↓
+[Plugin post-hook] → invoke_hook("post_tool_call", ...)
+```
+
+### Error wrapping
+
+All tool execution is wrapped in error handling at two levels:
+
+1. **`registry.dispatch()`** — catches any exception from the handler and returns `{"error": "Tool execution failed: ExceptionType: message"}` as JSON.
+
+2. **`handle_function_call()`** — wraps the entire dispatch in a secondary try/except that returns `{"error": "Error executing tool_name: message"}`.
+
+This ensures the model always receives a well-formed JSON string, never an unhandled exception.
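+
+The two layers can be sketched with a plain dictionary standing in for the registry. The function names echo the ones above, but the bodies are simplified illustrations. The `Unknown tool` message is an assumption of this sketch.
+
+```python
+import json
+from typing import Any, Callable, Dict
+
+Handler = Callable[[Dict[str, Any]], str]
+
+
+def registry_dispatch(handlers: Dict[str, Handler], name: str,
+                      args: Dict[str, Any]) -> str:
+    """Level 1: turn any handler exception into a JSON error string."""
+    handler = handlers.get(name)
+    if handler is None:
+        return json.dumps({"error": f"Unknown tool: {name}"})
+    try:
+        return handler(args)
+    except Exception as exc:
+        return json.dumps(
+            {"error": f"Tool execution failed: {type(exc).__name__}: {exc}"})
+
+
+def handle_function_call(handlers: Dict[str, Handler], name: str,
+                         args: Dict[str, Any]) -> str:
+    """Level 2: backstop for failures outside the handler itself."""
+    try:
+        return registry_dispatch(handlers, name, args)
+    except Exception as exc:
+        return json.dumps({"error": f"Error executing {name}: {exc}"})
+```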
+
+### Agent-loop tools
+
+Four tools are intercepted before registry dispatch because they need agent-level state (TodoStore, MemoryStore, etc.):
+
+- `todo` — planning/task tracking
+- `memory` — persistent memory writes
+- `session_search` — cross-session recall
+- `delegate_task` — spawns subagent sessions
+
+These tools' schemas are still registered in the registry (for `get_tool_definitions`), but their handlers return a stub error if dispatch somehow reaches them directly.
+
+### Async bridging
+
+When a tool handler is async, `_run_async()` bridges it to the sync dispatch path:
+
+- **CLI path (no running loop)** — uses a persistent event loop to keep cached async clients alive
+- **Gateway path (running loop)** — spins up a disposable thread with `asyncio.run()`
+- **Worker threads (parallel tools)** — uses per-thread persistent loops stored in thread-local storage
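+
+The three cases can be sketched as below. The sketch assumes a persistent thread-local loop for any thread without a running loop, which covers both the CLI main thread and worker threads. When a loop is already running, it falls back to a one-off worker thread. `run_async` and `_thread_loop` are hypothetical names, not the actual `_run_async()`.
+
+```python
+import asyncio
+import concurrent.futures
+import threading
+from typing import Any, Coroutine, TypeVar
+
+T = TypeVar("T")
+_thread_state = threading.local()
+
+
+def _thread_loop() -> asyncio.AbstractEventLoop:
+    """Return this thread's long-lived event loop, creating it on first use."""
+    loop = getattr(_thread_state, "loop", None)
+    if loop is None or loop.is_closed():
+        loop = asyncio.new_event_loop()
+        _thread_state.loop = loop
+    return loop
+
+
+def run_async(coro: Coroutine[Any, Any, T]) -> T:
+    try:
+        asyncio.get_running_loop()
+    except RuntimeError:
+        # No loop running in this thread (CLI main thread or a worker thread):
+        # reuse the thread's persistent loop so cached async clients stay valid.
+        return _thread_loop().run_until_complete(coro)
+    # A loop is already running here (gateway) and cannot be re-entered with
+    # run_until_complete(), so run the coroutine on a disposable thread.
+    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
+        return pool.submit(asyncio.run, coro).result()
+```
+
+In the running-loop case, the calling loop stays blocked until the worker thread returns. The sketch accepts that cost because the dispatch path it bridges into is synchronous.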
+
+## The DANGEROUS_PATTERNS approval flow
+
+The terminal tool integrates a dangerous-command approval system defined in `tools/approval.py`:
+
+1. **Pattern detection** — `DANGEROUS_PATTERNS` is a list of `(regex, description)` tuples covering destructive operations:
+   - Recursive deletes (`rm -rf`)
+   - Filesystem formatting (`mkfs`, `dd`)
+   - SQL destructive operations (`DROP TABLE`, `DELETE FROM` without `WHERE`)
+   - System config overwrites (`> /etc/`)
+   - Service manipulation (`systemctl stop`)
+   - Remote code execution (`curl | sh`)
+   - Fork bombs, process kills, etc.
+
+2. **Detection** — before executing any terminal command, `detect_dangerous_command(command)` checks against all patterns.
+
+3. **Approval prompt** — if a match is found:
+   - **CLI mode** — an interactive prompt asks the user to approve, deny, or allow permanently
+   - **Gateway mode** — an async approval callback sends the request to the messaging platform
+   - **Smart approval** — optionally, an auxiliary LLM can auto-approve low-risk commands that match patterns (e.g., `rm -rf node_modules/` is safe but matches "recursive delete")
+
+4. **Session state** — approvals are tracked per-session. Once you approve "recursive delete" for a session, subsequent `rm -rf` commands don't re-prompt.
+
+5. **Permanent allowlist** — the "allow permanently" option writes the pattern to `config.yaml`'s `command_allowlist`, persisting across sessions.
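+
+Detection and per-session approval can be sketched as below. Only a few illustrative patterns are included, and `detect_dangerous` and `ApprovalState` are hypothetical names. The persisted `command_allowlist` stores patterns, whereas this sketch keys approvals by description to keep it readable.
+
+```python
+import re
+from typing import Iterable, List, Optional, Set, Tuple
+
+# A handful of illustrative (regex, description) pairs; the real list is longer.
+DANGEROUS_PATTERNS: List[Tuple[str, str]] = [
+    (r"\brm\s+(?:-\S*\s+)*(?:-[a-z]*r|--recursive)", "recursive delete"),
+    (r"\bmkfs(?:\.\w+)?\b", "format filesystem"),
+    (r"\b(?:curl|wget)\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b", "pipe remote script to shell"),
+    (r"\bDROP\s+TABLE\b", "SQL DROP TABLE"),
+]
+
+
+def detect_dangerous(command: str) -> Optional[str]:
+    """Return the description of the first matching pattern, or None."""
+    for pattern, description in DANGEROUS_PATTERNS:
+        if re.search(pattern, command, re.IGNORECASE):
+            return description
+    return None
+
+
+class ApprovalState:
+    """Per-session approvals plus a permanent allowlist, keyed by description."""
+
+    def __init__(self, allowlist: Iterable[str] = ()) -> None:
+        self.session_approved: Set[str] = set()
+        self.allowlist: Set[str] = set(allowlist)
+
+    def needs_prompt(self, command: str) -> Optional[str]:
+        description = detect_dangerous(command)
+        if description is None:
+            return None                       # not dangerous: run it
+        if description in self.session_approved or description in self.allowlist:
+            return None                       # already approved
+        return description                    # caller must ask the user
+
+    def approve(self, description: str, permanently: bool = False) -> None:
+        self.session_approved.add(description)
+        if permanently:
+            self.allowlist.add(description)   # real code persists to config.yaml
+```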
+
 ## Terminal/runtime environments

 The terminal system supports multiple backends:
--- a/website/docs/developer-guide/trajectory-format.md
+++ b/website/docs/developer-guide/trajectory-format.md
@@ -1,56 +1,233 @@
---
-sidebar_position: 10
-title: "Trajectories & Training Format"
-description: "How Hermes saves trajectories, normalizes tool calls, and produces training-friendly outputs"
---
+# Trajectory Format

-# Trajectories & Training Format
+Hermes Agent saves conversation trajectories in ShareGPT-compatible JSONL format
+for use as training data, debugging artifacts, and reinforcement learning datasets.

-Hermes can save conversation trajectories for training, evaluation, and batch data generation workflows.
+Source files: `agent/trajectory.py`, `run_agent.py` (lines 1788-1975), `batch_runner.py`

-Primary files:

- `agent/trajectory.py`
- `run_agent.py`
- `batch_runner.py`
- `trajectory_compressor.py`
+## File Naming Convention

-## What trajectories are for
+Trajectories are written to files in the current working directory:

-Trajectory outputs are used for:
+| File | When |
+|------|------|
+| `trajectory_samples.jsonl` | Conversations that completed successfully (`completed=True`) |
+| `failed_trajectories.jsonl` | Conversations that failed or were interrupted (`completed=False`) |

- SFT data generation
- debugging agent behavior
- benchmark/evaluation artifact capture
- post-processing and compression pipelines
+The batch runner (`batch_runner.py`) writes to a custom output file per batch
+(e.g., `batch_001_output.jsonl`) with additional metadata fields.

-## Normalization strategy
+You can override the filename via the `filename` parameter in `save_trajectory()`.

-Hermes converts live conversation structure into a training-friendly format.

-Important behaviors include:
+## JSONL Entry Format

- representing reasoning in explicit markup
- converting tool calls into structured XML-like regions for dataset compatibility
- grouping tool outputs appropriately
- separating successful and failed trajectories
+Each line in the file is a self-contained JSON object. There are two variants:

-## Persistence boundaries
+### CLI/Interactive Format (from `_save_trajectory`)

-Trajectory files do **not** blindly mirror all runtime prompt state.
+```json
+{
+  "conversations": [ ... ],
+  "timestamp": "2026-03-30T14:22:31.456789",
+  "model": "anthropic/claude-sonnet-4.6",
+  "completed": true
+}
+```

-Some prompt-time-only layers are intentionally excluded from persisted trajectory content so datasets are cleaner and less environment-specific.
+### Batch Runner Format (from `batch_runner.py`)

-## Batch runner
+```json
+{
+  "prompt_index": 42,
+  "conversations": [ ... ],
+  "metadata": { "prompt_source": "gsm8k", "difficulty": "hard" },
+  "completed": true,
+  "partial": false,
+  "api_calls": 7,
+  "toolsets_used": ["code_tools", "file_tools"],
+  "tool_stats": {
+    "terminal": {"count": 3, "success": 3, "failure": 0},
+    "read_file": {"count": 2, "success": 2, "failure": 0},
+    "write_file": {"count": 0, "success": 0, "failure": 0}
+  },
+  "tool_error_counts": {
+    "terminal": 0,
+    "read_file": 0,
+    "write_file": 0
+  }
+}
+```

-`batch_runner.py` emits richer metadata than single-session trajectory saving, including:
+The `tool_stats` and `tool_error_counts` dictionaries are normalized to include
+ALL possible tools (from `model_tools.TOOL_TO_TOOLSET_MAP`) with zero defaults,
+ensuring consistent schema across entries for HuggingFace dataset loading.
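+
+The zero-filling step can be sketched as below. The helper names are hypothetical. `all_tools` stands in for the tool names the real code takes from `model_tools.TOOL_TO_TOOLSET_MAP`.
+
+```python
+from typing import Dict, Iterable, Mapping
+
+_ZERO_STATS = {"count": 0, "success": 0, "failure": 0}
+
+
+def normalize_tool_stats(raw: Mapping[str, Mapping[str, int]],
+                         all_tools: Iterable[str]) -> Dict[str, Dict[str, int]]:
+    """Give every known tool a stats entry, filling missing counters with 0."""
+    return {tool: {**_ZERO_STATS, **raw.get(tool, {})} for tool in sorted(all_tools)}
+
+
+def normalize_error_counts(raw: Mapping[str, int],
+                           all_tools: Iterable[str]) -> Dict[str, int]:
+    """Same idea for the flat tool -> error count mapping."""
+    return {tool: int(raw.get(tool, 0)) for tool in sorted(all_tools)}
+```
+
+Sorting is only for readability. What the normalization guarantees is that every entry carries the same keys.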

- model/provider metadata
- toolset info
- partial/failure markers
- tool statistics

-## Related docs
+## Conversations Array (ShareGPT Format)

- [Environments, Benchmarks & Data Generation](./environments.md)
- [Agent Loop Internals](./agent-loop.md)
+The `conversations` array uses ShareGPT role conventions:
+
+| API Role | ShareGPT `from` |
+|----------|-----------------|
+| system | `"system"` |
+| user | `"human"` |
+| assistant | `"gpt"` |
+| tool | `"tool"` |
+
+### Complete Example
+
+```json
+{
+  "conversations": [
+    {
+      "from": "system",
+      "value": "You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. If available tools are not relevant in assisting with user query, just respond in natural conversational language. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. Here are the available tools:\n<tools>\n[{\"name\": \"terminal\", \"description\": \"Execute shell commands\", \"parameters\": {\"type\": \"object\", \"properties\": {\"command\": {\"type\": \"string\"}}}, \"required\": null}]\n</tools>\nFor each function call return a JSON object, with the following pydantic model json schema for each:\n{'title': 'FunctionCall', 'type': 'object', 'properties': {'name': {'title': 'Name', 'type': 'string'}, 'arguments': {'title': 'Arguments', 'type': 'object'}}, 'required': ['name', 'arguments']}\nEach function call should be enclosed within <tool_call> </tool_call> XML tags.\nExample:\n<tool_call>\n{'name': <function-name>,'arguments': <args-dict>}\n</tool_call>"
+    },
+    {
+      "from": "human",
+      "value": "What Python version is installed?"
+    },
+    {
+      "from": "gpt",
+      "value": "<think>\nThe user wants to know the Python version. I should run python3 --version.\n</think>\n<tool_call>\n{\"name\": \"terminal\", \"arguments\": {\"command\": \"python3 --version\"}}\n</tool_call>"
+    },
+    {
+      "from": "tool",
+      "value": "<tool_response>\n{\"tool_call_id\": \"call_abc123\", \"name\": \"terminal\", \"content\": \"Python 3.11.6\"}\n</tool_response>"
+    },
+    {
+      "from": "gpt",
+      "value": "<think>\nGot the version. I can now answer the user.\n</think>\nPython 3.11.6 is installed on this system."
+    }
+  ],
+  "timestamp": "2026-03-30T14:22:31.456789",
+  "model": "anthropic/claude-sonnet-4.6",
+  "completed": true
+}
+```
+
+
+## Normalization Rules
+
+### Reasoning Content Markup
+
+The trajectory converter normalizes ALL reasoning into `<think>` tags, regardless
+of how the model originally produced it:
+
+1. **Native thinking tokens** (`msg["reasoning"]` field from providers like
+   Anthropic, OpenAI o-series): Wrapped as `<think>\n{reasoning}\n</think>\n`
+   and prepended before the content.
+
+2. **REASONING_SCRATCHPAD XML** (when native thinking is disabled and the model
+   reasons via system-prompt-instructed XML): `<REASONING_SCRATCHPAD>` tags are
+   converted to `<think>` via `convert_scratchpad_to_think()`.
+
+3. **Empty think blocks**: Every `gpt` turn is guaranteed to have a `<think>`
+   block. If no reasoning was produced, an empty block is inserted:
+   `<think>\n</think>\n` — this ensures consistent format for training data.
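+
+The three rules can be sketched as one function that produces the start of a `gpt` turn's value, with any tool-call blocks appended after it. `normalize_assistant_content` and `scratchpad_to_think` are hypothetical stand-ins, not `convert_scratchpad_to_think()` itself.
+
+```python
+from typing import Any, Dict
+
+EMPTY_THINK = "<think>\n</think>\n"
+
+
+def scratchpad_to_think(text: str) -> str:
+    """Rename <REASONING_SCRATCHPAD> tags to <think> tags."""
+    return (text.replace("<REASONING_SCRATCHPAD>", "<think>")
+                .replace("</REASONING_SCRATCHPAD>", "</think>"))
+
+
+def normalize_assistant_content(msg: Dict[str, Any]) -> str:
+    """Return the assistant content with reasoning normalized into <think>."""
+    content = msg.get("content") or ""
+    reasoning = msg.get("reasoning")
+    if reasoning:                                  # 1. native thinking tokens
+        return f"<think>\n{reasoning}\n</think>\n{content}"
+    content = scratchpad_to_think(content)         # 2. scratchpad XML
+    if "<think>" not in content:                   # 3. guarantee a think block
+        content = EMPTY_THINK + content
+    return content
+```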
+
+### Tool Call Normalization
+
+Tool calls from the API format (with `tool_call_id`, function name, arguments as
+JSON string) are converted to XML-wrapped JSON:
+
+```
+<tool_call>
+{"name": "terminal", "arguments": {"command": "ls -la"}}
+</tool_call>
+```
+
+- Arguments are parsed from JSON strings back to objects (not double-encoded)
+- If JSON parsing fails (shouldn't happen — validated during conversation),
+  an empty `{}` is used with a warning logged
+- Multiple tool calls in one assistant turn produce multiple `<tool_call>` blocks
+  in a single `gpt` message
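+
+A minimal sketch of this conversion follows. It assumes API-style tool calls shaped like `{"id": ..., "function": {"name": ..., "arguments": "<json string>"}}` and joins multiple blocks with a newline. `tool_calls_to_xml` is a hypothetical name.
+
+```python
+import json
+import logging
+from typing import Any, Dict, List
+
+logger = logging.getLogger(__name__)
+
+
+def tool_calls_to_xml(tool_calls: List[Dict[str, Any]]) -> str:
+    """Render API tool calls as <tool_call> blocks with parsed arguments."""
+    blocks = []
+    for call in tool_calls:
+        fn = call["function"]
+        try:
+            # Arguments arrive as a JSON string; decode so they are not double-encoded.
+            arguments = json.loads(fn.get("arguments") or "{}")
+        except json.JSONDecodeError:
+            logger.warning("Could not parse arguments for %s; using {}", fn["name"])
+            arguments = {}
+        payload = json.dumps({"name": fn["name"], "arguments": arguments},
+                             ensure_ascii=False)
+        blocks.append(f"<tool_call>\n{payload}\n</tool_call>")
+    return "\n".join(blocks)
+```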
+
+### Tool Response Normalization
+
+All tool results following an assistant message are grouped into a single `tool`
+turn with XML-wrapped JSON responses:
+
+```
+<tool_response>
+{"tool_call_id": "call_abc123", "name": "terminal", "content": "output here"}
+</tool_response>
+```
+
+- If tool content looks like JSON (starts with `{` or `[`), it's parsed so the
+  content field contains a JSON object/array rather than a string
+- Multiple tool results are joined with newlines in one message
+- The tool name is matched by position against the parent assistant's `tool_calls`
+  array
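+
+The grouping can be sketched as follows. `tool_results_to_turn` is a hypothetical helper. The sketch uses the name `"unknown"` when there are more results than tool calls, which is its own assumption.
+
+```python
+import json
+from typing import Any, Dict, List
+
+
+def _maybe_json(text: str) -> Any:
+    """Parse text that looks like JSON; otherwise return it unchanged."""
+    stripped = text.strip()
+    if stripped.startswith(("{", "[")):
+        try:
+            return json.loads(stripped)
+        except json.JSONDecodeError:
+            pass                                   # looked like JSON but wasn't
+    return text
+
+
+def tool_results_to_turn(assistant_msg: Dict[str, Any],
+                         tool_msgs: List[Dict[str, Any]]) -> Dict[str, str]:
+    """Group the tool results that follow one assistant message into one turn."""
+    calls = assistant_msg.get("tool_calls") or []
+    blocks = []
+    for i, msg in enumerate(tool_msgs):
+        # Match the tool name by position against the parent's tool_calls.
+        name = calls[i]["function"]["name"] if i < len(calls) else "unknown"
+        payload = {
+            "tool_call_id": msg.get("tool_call_id", ""),
+            "name": name,
+            "content": _maybe_json(msg.get("content") or ""),
+        }
+        blocks.append("<tool_response>\n"
+                      + json.dumps(payload, ensure_ascii=False)
+                      + "\n</tool_response>")
+    return {"from": "tool", "value": "\n".join(blocks)}
+```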
+
+### System Message
+
+The system message is generated at save time (not taken from the conversation).
+It follows the Hermes function-calling prompt template with:
+
+- Preamble explaining the function-calling protocol
+- `<tools>` XML block containing the JSON tool definitions
+- Schema reference for `FunctionCall` objects
+- `<tool_call>` example
+
+Tool definitions include `name`, `description`, `parameters`, and `required`
+(set to `null` to match the canonical format).
+
+
+## Loading Trajectories
+
+Trajectories are standard JSONL — load with any JSON-lines reader:
+
+```python
+import json
+
+def load_trajectories(path: str):
+    """Load trajectory entries from a JSONL file."""
+    entries = []
+    with open(path, "r", encoding="utf-8") as f:
+        for line in f:
+            line = line.strip()
+            if line:
+                entries.append(json.loads(line))
+    return entries
+
+# Filter to successful completions only
+successful = [e for e in load_trajectories("trajectory_samples.jsonl")
+              if e.get("completed")]
+
+# Extract just the conversations for training
+training_data = [e["conversations"] for e in successful]
+```
+
+### Loading for HuggingFace Datasets
+
+```python
+from datasets import load_dataset
+
+ds = load_dataset("json", data_files="trajectory_samples.jsonl")
+```
+
+For batch runner output, the normalized `tool_stats` and `tool_error_counts` dictionaries give every entry the same columns,
+preventing Arrow schema mismatch errors during dataset loading.
+
+
+## Controlling Trajectory Saving
+
+In the CLI, trajectory saving is controlled by:
+
+```yaml
+# config.yaml
+agent:
+  save_trajectories: true  # default: false
+```
+
+Or via the `--save-trajectories` flag. When the agent initializes with
+`save_trajectories=True`, the `_save_trajectory()` method is called at the end
+of each conversation turn.
+
+The batch runner always saves trajectories (that's its primary purpose).
+
+Samples with zero reasoning across all turns are automatically discarded by the
+batch runner to avoid polluting training data with non-reasoning examples.
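+
+The same kind of filter can be applied to CLI trajectories you collect yourself. Here is a minimal sketch that treats "zero reasoning" as every `<think>` block in every `gpt` turn being empty. That definition is an assumption, and the batch runner's exact criterion may differ.
+
+```python
+import re
+from typing import Any, Dict
+
+_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
+
+
+def has_reasoning(entry: Dict[str, Any]) -> bool:
+    """True if any gpt turn contains a non-empty <think> block."""
+    for turn in entry.get("conversations", []):
+        if turn.get("from") != "gpt":
+            continue
+        if any(block.strip() for block in _THINK_BLOCK.findall(turn.get("value", ""))):
+            return True
+    return False
+```
+
+Combined with the loader above, `[e for e in load_trajectories("trajectory_samples.jsonl") if has_reasoning(e)]` keeps only entries with actual reasoning.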
--- a/website/docs/getting-started/updating.md
+++ b/website/docs/getting-started/updating.md
@@ -20,6 +20,43 @@ This pulls the latest code, updates dependencies, and prompts you to configure a
 `hermes update` automatically detects new configuration options and prompts you to add them. If you skipped that prompt, you can manually run `hermes config check` to see missing options, then `hermes config migrate` to interactively add them.
 :::

+### What happens during an update
+
+When you run `hermes update`, the following steps occur:
+
+1. **Git pull** — pulls the latest code from the `main` branch and updates submodules
+2. **Dependency install** — runs `uv pip install -e ".[all]"` to pick up new or changed dependencies
+3. **Config migration** — detects new config options added since your version and prompts you to set them
+4. **Gateway auto-restart** — if the gateway service is running (systemd on Linux, launchd on macOS), it is **automatically restarted** after the update completes so the new code takes effect immediately
+
+Expected output looks like:
+
+```
+$ hermes update
+Updating Hermes Agent...
+📥 Pulling latest code...
+Already up to date.  (or: Updating abc1234..def5678)
+📦 Updating dependencies...
+✅ Dependencies updated
+🔍 Checking for new config options...
+✅ Config is up to date  (or: Found 2 new options — running migration...)
+🔄 Restarting gateway service...
+✅ Gateway restarted
+✅ Hermes Agent updated successfully!
+```
+
+### Checking your current version
+
+```bash
+hermes version
+```
+
+Compare against the latest release at the [GitHub releases page](https://github.com/NousResearch/hermes-agent/releases) or check for available updates:
+
+```bash
+hermes update --check
+```
+
 ### Updating from Messaging Platforms

 You can also update directly from Telegram, Discord, Slack, or WhatsApp by sending:
@@ -28,7 +65,7 @@ You can also update directly from Telegram, Discord, Slack, or WhatsApp by sendi
 /update
 ```

-This pulls the latest code, updates dependencies, and restarts the gateway.
+This pulls the latest code, updates dependencies, and restarts the gateway. The bot will briefly go offline during the restart (typically 5–15 seconds) and then resume.

 ### Manual Update

@@ -51,6 +88,57 @@ hermes config check
 hermes config migrate   # Interactively add any missing options
 ```

+### Rollback instructions
+
+If an update introduces a problem, you can roll back to a previous version:
+
+```bash
+cd /path/to/hermes-agent
+
+# List recent versions
+git log --oneline -10
+
+# Roll back to a specific commit
+git checkout <commit-hash>
+git submodule update --init --recursive
+uv pip install -e ".[all]"
+
+# Restart the gateway if running
+hermes gateway restart
+```
+
+To roll back to a specific release tag:
+
+```bash
+git checkout v0.6.0
+git submodule update --init --recursive
+uv pip install -e ".[all]"
+```
+
+:::warning
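+Rolling back may cause config incompatibilities if new options were added. Run `hermes config check` after rolling back and remove any unrecognized options from `config.yaml` if you encounter errors. Checking out a commit or tag also leaves the repository on a detached HEAD, so switch back with `git checkout main` before running `hermes update` again.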
+:::
+
+### Note for Nix users
+
+If you installed via Nix flake, updates are managed through the Nix package manager:
+
+```bash
+# If Hermes is an input of your own flake, update that input and rebuild
+nix flake update hermes-agent
+
+# If you installed it with `nix profile install`, upgrade the profile entry
+nix profile upgrade hermes-agent
+```
+
+Nix installations are immutable — rollback is handled by Nix's generation system:
+
+```bash
+nix profile rollback
+```
+
+See [Nix Setup](./nix-setup.md) for more details.
+
 ---

 ## Uninstalling
--- a/website/docs/integrations/index.md
+++ b/website/docs/integrations/index.md
@@ -8,18 +8,75 @@ sidebar_position: 0

 Hermes Agent connects to external systems for AI inference, tool servers, IDE workflows, programmatic access, and more. These integrations extend what Hermes can do and where it can run.

-## Available Integrations
+## AI Providers & Routing

- **[AI Providers](/docs/user-guide/features/provider-routing)** — Set up and configure inference providers. Hermes works with OpenRouter, Anthropic, OpenAI, Google, and any OpenAI-compatible endpoint. Use `hermes model` to configure interactively.
+Hermes supports multiple AI inference providers out of the box. Use `hermes model` to configure interactively, or set them in `config.yaml`.

- **[MCP Servers](/docs/user-guide/features/mcp)** — Connect Hermes to external tool servers via Model Context Protocol. Access tools from GitHub, databases, file systems, browser stacks, internal APIs, and more without writing native Hermes tools.
+- **[AI Providers](/docs/user-guide/features/provider-routing)** — OpenRouter, Anthropic, OpenAI, Google, and any OpenAI-compatible endpoint. Hermes auto-detects capabilities like vision, streaming, and tool use per provider.
+- **[Provider Routing](/docs/user-guide/features/provider-routing)** — Fine-grained control over which underlying providers handle your OpenRouter requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and explicit priority ordering.
+- **[Fallback Providers](/docs/user-guide/features/fallback-providers)** — Automatic failover to backup LLM providers when your primary model encounters errors. Includes primary model fallback and independent auxiliary task fallback for vision, compression, and web extraction.
+
+## Tool Servers (MCP)
+
+- **[MCP Servers](/docs/user-guide/features/mcp)** — Connect Hermes to external tool servers via Model Context Protocol. Access tools from GitHub, databases, file systems, browser stacks, internal APIs, and more without writing native Hermes tools. Supports both stdio and SSE transports, per-server tool filtering, and capability-aware resource/prompt registration.
+
+## Web Search Backends
+
+The `web_search`, `web_extract`, and `web_crawl` tools support four backend providers, configured via `config.yaml` or `hermes tools`:
+
+| Backend | Env Var | Search | Extract | Crawl |
+|---------|---------|--------|---------|-------|
+| **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
+| **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
+| **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
+| **Exa** | `EXA_API_KEY` | ✔ | ✔ | — |
+
+Quick setup example:
+
+```yaml
+web:
+  backend: firecrawl    # firecrawl | parallel | tavily | exa
+```
+
+If `web.backend` is not set, the backend is auto-detected from whichever API key is available. Self-hosted Firecrawl is also supported via `FIRECRAWL_API_URL`.
+
+## Browser Automation
+
+Hermes includes full browser automation with multiple backend options for navigating websites, filling forms, and extracting information:
+
+- **Browserbase** — Managed cloud browsers with anti-bot tooling, CAPTCHA solving, and residential proxies
+- **Browser Use** — Alternative cloud browser provider
+- **Local Chrome via CDP** — Connect to your running Chrome instance using `/browser connect`
+- **Local Chromium** — Headless local browser via the `agent-browser` CLI
+
+See [Browser Automation](/docs/user-guide/features/browser) for setup and usage.
+
+## Voice & TTS Providers
+
+Hermes supports several text-to-speech providers:
+
+| Provider | Quality | Cost | API Key |
+|----------|---------|------|---------|
+| **Edge TTS** (default) | Good | Free | None needed |
+| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
+| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
+| **NeuTTS** | Good | Free | None needed |
+
+Speech-to-text uses Whisper for voice message transcription on Telegram, Discord, and WhatsApp. See [Voice & TTS](/docs/user-guide/features/tts) and [Voice Mode](/docs/user-guide/features/voice-mode) for details.
+
+## IDE & Editor Integration

 - **[IDE Integration (ACP)](/docs/user-guide/features/acp)** — Use Hermes Agent inside ACP-compatible editors such as VS Code, Zed, and JetBrains. Hermes runs as an ACP server, rendering chat messages, tool activity, file diffs, and terminal commands inside your editor.

+## Programmatic Access
+
 - **[API Server](/docs/user-guide/features/api-server)** — Expose Hermes as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox — can connect and use Hermes as a backend with its full toolset.

+## Memory & Personalization
+
 - **[Honcho Memory](/docs/user-guide/features/honcho)** — AI-native persistent memory for cross-session user modeling and personalization. Honcho adds deep user modeling via dialectic reasoning on top of Hermes's built-in memory system.

- **[Provider Routing](/docs/user-guide/features/provider-routing)** — Fine-grained control over which underlying AI providers handle your OpenRouter requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and explicit priority ordering.
+## Training & Evaluation

- **[Fallback Providers](/docs/user-guide/features/fallback-providers)** — Automatic failover to backup LLM providers when your primary model encounters errors. Includes primary model fallback and independent auxiliary task fallback for vision, compression, and web extraction.
+- **[RL Training](/docs/user-guide/features/rl-training)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning.
+- **[Batch Processing](/docs/user-guide/features/batch-processing)** — Run the agent across hundreds of prompts in parallel, generating structured ShareGPT-format trajectory data for training data generation or evaluation.
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -860,12 +860,15 @@ When enabled, responses appear token-by-token inside a streaming box. Tool calls
 ```yaml
 streaming:
  enabled: true           # Enable progressive message editing
+  transport: edit         # "edit" (progressive message editing) or "off"
  edit_interval: 0.3      # Seconds between message edits
  buffer_threshold: 40    # Characters before forcing an edit flush
  cursor: " ▉"            # Cursor shown during streaming
 ```

-When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email) gracefully skip streaming and deliver the final response normally.
+When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt. Streaming is then disabled for the rest of that session, and the final response is delivered normally instead of as a flood of partial messages.
+
+**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.
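The interplay of `edit_interval`, `buffer_threshold`, `cursor`, and the overflow rule can be modelled in a few lines. This is a minimal, hypothetical sketch, not Hermes's gateway code: the `StreamEditor` class and its `send`/`edit` callbacks are illustrative names. It assumes an edit is flushed when either `edit_interval` seconds have passed since the last edit or `buffer_threshold` new characters have piled up, and that a message is finalized (cursor removed) before it would exceed `max_len`.

```python
import time
from typing import Callable


class StreamEditor:
    """Hypothetical model of edit-based streaming (not Hermes's gateway code)."""

    def __init__(
        self,
        send: Callable[[str], None],   # posts a new message
        edit: Callable[[str], None],   # edits the most recently posted message
        edit_interval: float = 0.3,    # seconds between edits
        buffer_threshold: int = 40,    # characters that force an early edit
        cursor: str = " ▉",
        max_len: int = 4096,           # assumed platform message limit
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.send, self.edit = send, edit
        self.edit_interval = edit_interval
        self.buffer_threshold = buffer_threshold
        self.cursor = cursor
        self.max_len = max_len
        self.clock = clock
        self.text = ""        # text of the message currently being streamed
        self.pending = 0      # characters received since the last send/edit
        self.started = False  # whether the current message has been posted
        self.last_edit = 0.0

    def _flush(self, final: bool = False) -> None:
        body = self.text if final else self.text + self.cursor
        if self.started:
            self.edit(body)
        else:
            self.send(body)   # first token of a message: post it
            self.started = True
        self.pending = 0
        self.last_edit = self.clock()

    def feed(self, token: str) -> None:
        # Overflow: finalize the current message (no cursor) and start a new one.
        if self.text and len(self.text) + len(token) + len(self.cursor) > self.max_len:
            self._flush(final=True)
            self.text, self.started = "", False
        self.text += token
        self.pending += len(token)
        due = self.clock() - self.last_edit >= self.edit_interval
        if not self.started or due or self.pending >= self.buffer_threshold:
            self._flush()

    def close(self) -> None:
        """Final edit without the cursor once the response is complete."""
        self._flush(final=True)
```

Each response produces one `send` per message followed by throttled `edit` calls; a single token longer than `max_len` is not split in this simplified model.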

 :::note
 Streaming is disabled by default. Enable it in `~/.hermes/config.yaml` to try the streaming UX.
@@ -929,23 +932,6 @@ Usage: type `/status`, `/disk`, `/update`, or `/gpu` in the CLI or any messaging
 - **Type** — only `exec` is supported (runs a shell command); other types show an error
 - **Works everywhere** — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant

-## Gateway Streaming
-
-Enable progressive token delivery on messaging platforms. When streaming is enabled, responses appear character-by-character in Telegram, Discord, and Slack via message editing, rather than waiting for the full response.
-
-```yaml
-streaming:
-  enabled: false              # Enable streaming token delivery (default: off)
-  transport: edit             # "edit" (progressive message editing) or "off"
-  edit_interval: 0.3          # Min seconds between message edits
-  buffer_threshold: 40        # Characters accumulated before forcing an edit
-  cursor: " ▉"               # Cursor character shown during streaming
-```
-
-**Platform support:** Telegram, Discord, and Slack support edit-based streaming. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.
-
-**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.
-
 ## Human Delay

 Simulate human-like response pacing in messaging platforms:
--- a/website/docs/user-guide/docker.md
+++ b/website/docs/user-guide/docker.md
@@ -1,10 +1,17 @@
+---
+sidebar_position: 7
+title: "Docker"
+description: "Running Hermes Agent in Docker and using Docker as a terminal backend"
+---
+
 # Hermes Agent — Docker

-Want to run Hermes Agent, but without installing packages on your host? This'll sort you out.
+There are two distinct ways Docker intersects with Hermes Agent:

-This will let you run the agent in a container, with the most relevant modes outlined below.
+1. **Running Hermes IN Docker** — the agent itself runs inside a container (this page's primary focus)
+2. **Docker as a terminal backend** — the agent runs on your host but executes commands inside a Docker sandbox (see [Configuration → terminal.backend](./configuration.md))

-The container stores all user data (config, API keys, sessions, skills, memories) in a single directory mounted from the host at `/opt/data`. The image itself is stateless and can be upgraded by pulling a new version without losing any configuration.
+This page covers option 1. The container stores all user data (config, API keys, sessions, skills, memories) in a single directory mounted from the host at `/opt/data`. The image itself is stateless and can be upgraded by pulling a new version without losing any configuration.

 ## Quick start

@@ -41,6 +48,110 @@ docker run -it --rm \
  nousresearch/hermes-agent
 ```

+## Persistent volumes
+
+The `/opt/data` volume is the single source of truth for all Hermes state. It maps to your host's `~/.hermes/` directory and contains:
+
+| Path | Contents |
+|------|----------|
+| `.env` | API keys and secrets |
+| `config.yaml` | All Hermes configuration |
+| `SOUL.md` | Agent personality/identity |
+| `sessions/` | Conversation history |
+| `memories/` | Persistent memory store |
+| `skills/` | Installed skills |
+| `cron/` | Scheduled job definitions |
+| `hooks/` | Event hooks |
+| `logs/` | Runtime logs |
+| `skins/` | Custom CLI skins |
+
+:::warning
+Never run two Hermes containers against the same data directory simultaneously — session files and memory stores are not designed for concurrent access.
+:::
+
+## Environment variable forwarding
+
+API keys are read from `/opt/data/.env` inside the container. You can also pass environment variables directly:
+
+```sh
+docker run -it --rm \
+  -v ~/.hermes:/opt/data \
+  -e ANTHROPIC_API_KEY="sk-ant-..." \
+  -e OPENAI_API_KEY="sk-..." \
+  nousresearch/hermes-agent
+```
+
+Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.
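The precedence rule behaves like a `.env` loader that never overwrites a variable already present in the process environment. The sketch below assumes Hermes follows the common python-dotenv convention (`load_dotenv(override=False)`); `parse_env` and `apply_dotenv` are hypothetical helpers, and real `.env` parsers accept more syntax (escapes, `export` prefixes, interpolation) than this one.

```python
import os
from typing import Dict, MutableMapping


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_dotenv(text: str, environ: MutableMapping[str, str] = os.environ) -> None:
    """Apply .env values without overriding variables already set (e.g. by docker -e)."""
    for key, value in parse_env(text).items():
        environ.setdefault(key, value)
```

Under this model, a key passed with `-e` is already in the environment when `.env` is read, so the file's value is ignored for that key while every other key still comes from the file.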
+
+## Docker Compose example
+
+For persistent gateway deployment, a `docker-compose.yaml` is convenient:
+
+```yaml
+services:
+  hermes:
+    image: nousresearch/hermes-agent:latest
+    container_name: hermes
+    restart: unless-stopped
+    command: gateway run
+    volumes:
+      - ~/.hermes:/opt/data
+    # Uncomment to forward specific env vars instead of using .env file:
+    # environment:
+    #   - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
+    #   - OPENAI_API_KEY=${OPENAI_API_KEY}
+    #   - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
+    deploy:
+      resources:
+        limits:
+          memory: 4G
+          cpus: "2.0"
+```
+
+Start with `docker compose up -d` and view logs with `docker compose logs -f hermes`.
+
+## Resource limits
+
+The Hermes container needs moderate resources. Rough sizing guidelines:
+
+| Resource | Minimum | Recommended |
+|----------|---------|-------------|
+| Memory | 1 GB | 2–4 GB |
+| CPU | 1 core | 2 cores |
+| Disk (data volume) | 500 MB | 2+ GB (grows with sessions/skills) |
+
+Browser automation (Playwright/Chromium) is the most memory-hungry feature. If you don't need browser tools, 1 GB is sufficient. With browser tools active, allocate at least 2 GB.
+
+Set limits in Docker:
+
+```sh
+docker run -d \
+  --name hermes \
+  --restart unless-stopped \
+  --memory=4g --cpus=2 \
+  -v ~/.hermes:/opt/data \
+  nousresearch/hermes-agent gateway run
+```
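To confirm that a memory limit actually took effect, you can read it from inside the container. A minimal sketch, assuming the host uses cgroup v2, where the limit is exposed in `/sys/fs/cgroup/memory.max` as either a byte count or the string `max` (unlimited). `parse_memory_max` and `enough_for_browser` are hypothetical helpers, and the 2 GB threshold is the browser-tools guidance above.

```python
from pathlib import Path
from typing import Optional

GIB = 1024 ** 3
CGROUP_MEMORY_MAX = Path("/sys/fs/cgroup/memory.max")  # cgroup v2 location


def parse_memory_max(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when the cgroup is unlimited ("max")."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def enough_for_browser(limit: Optional[int], minimum: int = 2 * GIB) -> bool:
    """Browser tools want at least ~2 GB; an unlimited cgroup counts as enough."""
    return limit is None or limit >= minimum


if __name__ == "__main__":
    if CGROUP_MEMORY_MAX.exists():
        limit = parse_memory_max(CGROUP_MEMORY_MAX.read_text())
        print("limit:", "unlimited" if limit is None else f"{limit / GIB:.1f} GiB")
        print("ok for browser tools:", enough_for_browser(limit))
    else:
        print("memory.max not found (cgroup v1 host, or not running in a container)")
```

For example, save it as `~/.hermes/check_mem.py` and run `docker exec hermes python3 /opt/data/check_mem.py`, assuming `python3` is on the image's `PATH`. With `--memory=4g`, `memory.max` should read `4294967296`.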
+
+## What the Dockerfile does
+
+The official image is based on `debian:13.4` and includes:
+
+- Python 3 with all Hermes dependencies (`pip install -e ".[all]"`)
+- Node.js + npm (for browser automation and WhatsApp bridge)
+- Playwright with Chromium (`npx playwright install --with-deps chromium`)
+- ripgrep and ffmpeg as system utilities
+- The WhatsApp bridge (`scripts/whatsapp-bridge/`)
+
+The entrypoint script (`docker/entrypoint.sh`) bootstraps the data volume on first run:
+- Creates the directory structure (`sessions/`, `memories/`, `skills/`, etc.)
+- Copies `.env.example` → `.env` if no `.env` exists
+- Copies default `config.yaml` if missing
+- Copies default `SOUL.md` if missing
+- Syncs bundled skills using a manifest-based approach (preserves user edits)
+- Then runs `hermes` with whatever arguments you pass
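One way to implement "manifest-based sync that preserves user edits" is a hash manifest: a bundled skill file is (re)copied only when the user's copy is missing or still identical to the version last installed. This is a hypothetical sketch of that idea, not the contents of `docker/entrypoint.sh`; the `sync_skills` function and the `.bundled_manifest.json` filename are invented for illustration.

```python
import hashlib
import json
import shutil
from pathlib import Path
from typing import List


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_skills(bundled: Path, user: Path, manifest_name: str = ".bundled_manifest.json") -> List[str]:
    """Copy bundled skill files into the user dir without clobbering user edits.

    The manifest records the hash of each file as last installed. A user file
    whose hash still matches the manifest is unmodified and safe to update;
    anything else is treated as a user edit (or user-created file) and kept.
    """
    manifest_path = user / manifest_name
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    updated: List[str] = []
    for src in sorted(p for p in bundled.rglob("*") if p.is_file()):
        rel = src.relative_to(bundled).as_posix()
        dst = user / rel
        new_hash = sha256(src)
        if dst.exists():
            current = sha256(dst)
            if current == new_hash:
                manifest[rel] = new_hash   # already up to date
                continue
            if manifest.get(rel) != current:
                continue                   # user edited it: preserve
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        manifest[rel] = new_hash
        updated.append(rel)
    user.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return updated
```

The design choice is that "unchanged since we installed it" is the only condition under which an upgrade may overwrite a file, so local customizations survive image upgrades.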
+
 ## Upgrading

 Pull the latest image and recreate the container. Your data directory is untouched.
@@ -52,7 +163,14 @@ docker run -d \
  --name hermes \
  --restart unless-stopped \
  -v ~/.hermes:/opt/data \
-  nousresearch/hermes-agent
+  nousresearch/hermes-agent gateway run
+```
+
+Or with Docker Compose:
+
+```sh
+docker compose pull
+docker compose up -d
 ```

 ## Skills and credential files
@@ -60,3 +178,47 @@ docker run -d \
 When using Docker as the execution environment (not the methods above, but when the agent runs commands inside a Docker sandbox), Hermes automatically bind-mounts the skills directory (`~/.hermes/skills/`) and any credential files declared by skills into the container as read-only volumes. This means skill scripts, templates, and references are available inside the sandbox without manual configuration.

 The same syncing happens for SSH and Modal backends — skills and credential files are uploaded via rsync or the Modal mount API before each command.
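Conceptually, the read-only bind mounts for the Docker sandbox translate into `-v host:container:ro` flags on the sandbox's `docker run` command. A minimal sketch, assuming hypothetical in-container targets `/root/.hermes/skills` and `/root/.hermes/credentials/`; Hermes chooses the real target paths, which may differ, and `readonly_mount_args` is an illustrative helper.

```python
import shlex
from pathlib import Path
from typing import Iterable, List

# Hypothetical in-container locations; Hermes may use different paths.
SKILLS_TARGET = "/root/.hermes/skills"
CREDENTIALS_TARGET = "/root/.hermes/credentials"


def readonly_mount_args(skills_dir: Path, credential_files: Iterable[Path]) -> List[str]:
    """Build read-only -v flags for the skills dir and each credential file."""
    args = ["-v", f"{skills_dir}:{SKILLS_TARGET}:ro"]
    for cred in credential_files:
        args += ["-v", f"{cred}:{CREDENTIALS_TARGET}/{cred.name}:ro"]
    return args


if __name__ == "__main__":
    skills = Path.home() / ".hermes" / "skills"
    creds = [skills / "my-skill" / "token.json"]  # example credential file
    cmd = ["docker", "run", "--rm", *readonly_mount_args(skills, creds),
           "debian:13", "ls", SKILLS_TARGET]
    print(shlex.join(cmd))
```

The `:ro` suffix is what keeps sandboxed commands from modifying skill scripts or credentials on the host.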
+
+## Troubleshooting
+
+### Container exits immediately
+
+Check logs: `docker logs hermes`. Common causes:
+- Missing or invalid `.env` file — run interactively first to complete setup
+- Port conflicts if running with exposed ports
+
+### "Permission denied" errors
+
+The container runs as root by default, so it can normally write to a host `~/.hermes/` owned by a regular user. If you still get errors, make sure the data directory is writable by its owner without making it readable by other users, since `.env` holds your API keys:
+
+```sh
+chmod -R u+rwX ~/.hermes
+```
+
+### Browser tools not working
+
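+Chromium (launched by Playwright) relies on shared memory in `/dev/shm`, which Docker caps at 64 MB by default, often too little for the browser. Add `--shm-size=1g` to your `docker run` command: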
+
+```sh
+docker run -d \
+  --name hermes \
+  --shm-size=1g \
+  -v ~/.hermes:/opt/data \
+  nousresearch/hermes-agent gateway run
+```
+
+### Gateway not reconnecting after network issues
+
+The `--restart unless-stopped` flag handles most transient failures. If the gateway is stuck, restart the container:
+
+```sh
+docker restart hermes
+```
+
+### Checking container health
+
+```sh
+docker logs --tail 50 hermes          # Recent logs
+docker exec hermes hermes version     # Verify version
+docker stats hermes                   # Resource usage
+```
--- a/website/docs/user-guide/features/mcp.md
+++ b/website/docs/user-guide/features/mcp.md
@@ -168,9 +168,7 @@ So a server that exposes callable tools but no resources/prompts will not get th

 ## Per-server filtering

-This is the main feature added by the PR work.
-
-You can now control which tools each MCP server contributes to Hermes.
+You can control which tools each MCP server contributes to Hermes, allowing fine-grained management of your tool namespace.

 ### Disable a server entirely

--- a/website/docs/user-guide/features/overview.md
+++ b/website/docs/user-guide/features/overview.md
@@ -33,6 +33,15 @@ Hermes Agent includes a rich set of capabilities that extend far beyond basic ch
 - **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai's FLUX 2 Pro model with automatic 2x upscaling via the Clarity Upscaler.
 - **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with four provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, and NeuTTS.

+## Integrations
+
+- **[Provider Routing](provider-routing.md)** — Fine-grained control over which AI providers handle your requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and priority ordering.
+- **[Fallback Providers](fallback-providers.md)** — Automatic failover to backup LLM providers when your primary model encounters errors, including independent fallback for auxiliary tasks like vision and compression.
+- **[API Server](api-server.md)** — Expose Hermes as an OpenAI-compatible HTTP endpoint. Connect any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, and more.
+- **[IDE Integration (ACP)](acp.md)** — Use Hermes inside ACP-compatible editors such as VS Code, Zed, and JetBrains. Chat, tool activity, file diffs, and terminal commands render inside your editor.
+- **[Honcho Memory](honcho.md)** — AI-native persistent memory for cross-session user modeling and personalization via dialectic reasoning.
+- **[RL Training](rl-training.md)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning.
+
 ## Customization

 - **[Personality & SOUL.md](personality.md)** — Fully customizable agent personality. `SOUL.md` is the primary identity file — the first thing in the system prompt — and you can swap in built-in or custom `/personality` presets per session.
--- a/website/docs/user-guide/features/plugins.md
+++ b/website/docs/user-guide/features/plugins.md
@@ -25,6 +25,56 @@ Drop a directory into `~/.hermes/plugins/` with a `plugin.yaml` and Python code:

 Start Hermes — your tools appear alongside built-in tools. The model can call them immediately.

+### Minimal working example
+
+Here is a complete plugin that adds a `hello_world` tool and logs every tool call via a hook.
+
+**`~/.hermes/plugins/hello-world/plugin.yaml`**
+
+```yaml
+name: hello-world
+version: "1.0"
+description: A minimal example plugin
+```
+
+**`~/.hermes/plugins/hello-world/__init__.py`**
+
+```python
+"""Minimal Hermes plugin — registers a tool and a hook."""
+
+
+def register(ctx):
+    # --- Tool: hello_world ---
+    schema = {
+        "name": "hello_world",
+        "description": "Returns a friendly greeting for the given name.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "name": {
+                    "type": "string",
+                    "description": "Name to greet",
+                }
+            },
+            "required": ["name"],
+        },
+    }
+
+    def handle_hello(params):
+        name = params.get("name", "World")
+        return f"Hello, {name}! 👋  (from the hello-world plugin)"
+
+    ctx.register_tool("hello_world", schema, handle_hello)
+
+    # --- Hook: log every tool call ---
+    def on_tool_call(tool_name, params, result):
+        print(f"[hello-world] tool called: {tool_name}")
+
+    ctx.register_hook("post_tool_call", on_tool_call)
+```
+
+Drop both files into `~/.hermes/plugins/hello-world/`, restart Hermes, and the model can immediately call `hello_world`. The hook prints a log line after every tool invocation.
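Before restarting Hermes, you can exercise a plugin's `register()` against a stand-in context. This is a hypothetical test harness: `FakeContext` only mimics the two methods the example calls (`register_tool(name, schema, handler)` and `register_hook(event, fn)`), and the order in `call_tool` (handler first, then `post_tool_call` hooks) is an assumption for testing, not Hermes's dispatcher. A trimmed copy of the example's `register()` is inlined so the snippet runs on its own; in practice you would import it from your plugin directory.

```python
from typing import Any, Callable, Dict, List


class FakeContext:
    """Minimal stand-in for the plugin context passed to register()."""

    def __init__(self) -> None:
        self.tools: Dict[str, Dict[str, Any]] = {}
        self.hooks: Dict[str, List[Callable]] = {}

    def register_tool(self, name: str, schema: dict, handler: Callable) -> None:
        self.tools[name] = {"schema": schema, "handler": handler}

    def register_hook(self, event: str, fn: Callable) -> None:
        self.hooks.setdefault(event, []).append(fn)

    def call_tool(self, name: str, params: dict) -> Any:
        # Assumed order: run the handler, then fire post_tool_call hooks.
        result = self.tools[name]["handler"](params)
        for hook in self.hooks.get("post_tool_call", []):
            hook(name, params, result)
        return result


def register(ctx):
    """Trimmed copy of the hello-world plugin's register()."""
    schema = {
        "name": "hello_world",
        "description": "Returns a friendly greeting for the given name.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    }
    ctx.register_tool("hello_world", schema, lambda p: f"Hello, {p.get('name', 'World')}!")
    ctx.register_hook(
        "post_tool_call",
        lambda tool, params, result: print(f"[hello-world] tool called: {tool}"),
    )


if __name__ == "__main__":
    ctx = FakeContext()
    register(ctx)
    print(ctx.call_tool("hello_world", {"name": "Ada"}))
```

This catches schema typos and handler exceptions quickly; behavior that depends on Hermes itself (tool availability, approval flow) still needs a real run.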
+
 Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable them only for trusted repositories by setting `HERMES_ENABLE_PROJECT_PLUGINS=true` before starting Hermes.

 ## What plugins can do
--- a/website/docs/user-guide/features/skins.md
+++ b/website/docs/user-guide/features/skins.md
@@ -30,28 +30,150 @@ display:

 ## Built-in skins

-| Skin | Description | Agent branding |
-|------|-------------|----------------|
-| `default` | Classic Hermes — gold and kawaii | `Hermes Agent` |
-| `ares` | War-god theme — crimson and bronze | `Ares Agent` |
-| `mono` | Monochrome — clean grayscale | `Hermes Agent` |
-| `slate` | Cool blue — developer-focused | `Hermes Agent` |
-| `poseidon` | Ocean-god theme — deep blue and seafoam | `Poseidon Agent` |
-| `sisyphus` | Sisyphean theme — austere grayscale with persistence | `Sisyphus Agent` |
-| `charizard` | Volcanic theme — burnt orange and ember | `Charizard Agent` |
+| Skin | Description | Agent branding | Visual character |
+|------|-------------|----------------|------------------|
+| `default` | Classic Hermes — gold and kawaii | `Hermes Agent` | Warm gold borders, cornsilk text, kawaii faces in spinners. The familiar caduceus banner. Clean and inviting. |
+| `ares` | War-god theme — crimson and bronze | `Ares Agent` | Deep crimson borders with bronze accents. Aggressive spinner verbs ("forging", "marching", "tempering steel"). Custom sword-and-shield ASCII art banner. |
+| `mono` | Monochrome — clean grayscale | `Hermes Agent` | All grays — no color. Borders are `#555555`, text is `#c9d1d9`. Ideal for minimal terminal setups or screen recordings. |
+| `slate` | Cool blue — developer-focused | `Hermes Agent` | Royal blue borders (`#4169e1`), soft blue text. Calm and professional. No custom spinner — uses default faces. |
+| `poseidon` | Ocean-god theme — deep blue and seafoam | `Poseidon Agent` | Deep blue to seafoam gradient. Ocean-themed spinners ("charting currents", "sounding the depth"). Trident ASCII art banner. |
+| `sisyphus` | Sisyphean theme — austere grayscale with persistence | `Sisyphus Agent` | Light grays with stark contrast. Boulder-themed spinners ("pushing uphill", "resetting the boulder", "enduring the loop"). Boulder-and-hill ASCII art banner. |
+| `charizard` | Volcanic theme — burnt orange and ember | `Charizard Agent` | Warm burnt orange to ember gradient. Fire-themed spinners ("banking into the draft", "measuring burn"). Dragon-silhouette ASCII art banner. |

-## What a skin can customize
+## Complete list of configurable keys

-| Area | Keys |
-|------|------|
-| Banner + response colors | `colors.banner_*`, `colors.response_border` |
-| Spinner animation | `spinner.waiting_faces`, `spinner.thinking_faces`, `spinner.thinking_verbs`, `spinner.wings` |
-| Branding text | `branding.agent_name`, `branding.welcome`, `branding.response_label`, `branding.prompt_symbol` |
-| Tool activity prefix | `tool_prefix` |
+### Colors (`colors:`)
+
+Controls all color values throughout the CLI. Values are hex color strings.
+
+| Key | Description | Default (`default` skin) |
+|-----|-------------|--------------------------|
+| `banner_border` | Panel border around the startup banner | `#CD7F32` (bronze) |
+| `banner_title` | Title text color in the banner | `#FFD700` (gold) |
+| `banner_accent` | Section headers in the banner (Available Tools, etc.) | `#FFBF00` (amber) |
+| `banner_dim` | Muted text in the banner (separators, secondary labels) | `#B8860B` (dark goldenrod) |
+| `banner_text` | Body text in the banner (tool names, skill names) | `#FFF8DC` (cornsilk) |
+| `ui_accent` | General UI accent color (highlights, active elements) | `#FFBF00` |
+| `ui_label` | UI labels and tags | `#4dd0e1` (teal) |
+| `ui_ok` | Success indicators (checkmarks, completion) | `#4caf50` (green) |
+| `ui_error` | Error indicators (failures, blocked) | `#ef5350` (red) |
+| `ui_warn` | Warning indicators (caution, approval prompts) | `#ffa726` (orange) |
+| `prompt` | Interactive prompt text color | `#FFF8DC` |
+| `input_rule` | Horizontal rule above the input area | `#CD7F32` |
+| `response_border` | Border around the agent's response box (applied as an ANSI escape code) | `#FFD700` |
+| `session_label` | Session label color | `#DAA520` |
+| `session_border` | Session ID dim border color | `#8B8682` |
+
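The table notes that `response_border` is applied as an ANSI escape, so its hex value has to be turned into a 24-bit ("truecolor") SGR sequence of the form `ESC[38;2;R;G;Bm`. A minimal sketch of that conversion, with strict `#RRGGBB` validation; `hex_to_rgb` and `hex_to_ansi_fg` are hypothetical helpers, and the output only renders correctly in terminals with truecolor support.

```python
import re
from typing import Tuple

HEX_COLOR = re.compile(r"^#([0-9a-fA-F]{6})$")
RESET = "\x1b[0m"


def hex_to_rgb(value: str) -> Tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple, rejecting other formats."""
    match = HEX_COLOR.match(value)
    if not match:
        raise ValueError(f"expected #RRGGBB, got {value!r}")
    digits = match.group(1)
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def hex_to_ansi_fg(value: str) -> str:
    """24-bit foreground SGR escape for a hex color."""
    r, g, b = hex_to_rgb(value)
    return f"\x1b[38;2;{r};{g};{b}m"


if __name__ == "__main__":
    print(hex_to_ansi_fg("#FFD700") + "╭─ response border ─╮" + RESET)
```

Validating the format up front is useful because a typo such as `#FFD70` in a skin file would otherwise surface only as a rendering glitch.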
+### Spinner (`spinner:`)
+
+Controls the animated spinner shown while waiting for API responses.
+
+| Key | Type | Description | Example |
+|-----|------|-------------|---------|
+| `waiting_faces` | list of strings | Faces cycled while waiting for API response | `["(⚔)", "(⛨)", "(▲)"]` |
+| `thinking_faces` | list of strings | Faces cycled during model reasoning | `["(⚔)", "(⌁)", "(<>)"]` |
+| `thinking_verbs` | list of strings | Verbs shown in spinner messages | `["forging", "plotting", "hammering plans"]` |
+| `wings` | list of [left, right] pairs | Decorative brackets around the spinner | `[["⟪⚔", "⚔⟫"], ["⟪▲", "▲⟫"]]` |
+
+When spinner values are empty (as in `default`, `mono`, and `slate`), the hardcoded defaults from `display.py` are used.
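The fallback rule amounts to "use the skin's list if it is non-empty, otherwise the hardcoded default". A minimal sketch of that selection, plus how a `[left, right]` wings pair could frame each spinner label. The `FALLBACK_*` values and the label layout are hypothetical placeholders, not the actual defaults or rendering in `display.py`.

```python
import itertools
from typing import Dict, Iterator, List, Optional, Sequence

# Hypothetical placeholders; the real hardcoded defaults live in display.py.
FALLBACK_FACES = ["(^_^)", "(o_o)"]
FALLBACK_VERBS = ["thinking", "pondering"]
NO_WINGS = [["", ""]]


def pick(values: Optional[Sequence], fallback: Sequence) -> List:
    """Use the skin's list when it is non-empty, otherwise the fallback."""
    return list(values) if values else list(fallback)


def spinner_frames(spinner: Dict) -> Iterator[str]:
    """Yield spinner labels, cycling faces, verbs, and [left, right] wing pairs."""
    faces = pick(spinner.get("thinking_faces"), FALLBACK_FACES)
    verbs = pick(spinner.get("thinking_verbs"), FALLBACK_VERBS)
    wings = pick(spinner.get("wings"), NO_WINGS)
    for face, verb, (left, right) in zip(
        itertools.cycle(faces), itertools.cycle(verbs), itertools.cycle(wings)
    ):
        yield f"{left} {face} {verb}... {right}".strip()
```

Treating an empty list and a missing key the same way is what lets `default` and `mono` leave the spinner section blank.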
+
+### Branding (`branding:`)
+
+Text strings used throughout the CLI interface.
+
+| Key | Description | Default |
+|-----|-------------|---------|
+| `agent_name` | Name shown in banner title and status display | `Hermes Agent` |
+| `welcome` | Welcome message shown at CLI startup | `Welcome to Hermes Agent! Type your message or /help for commands.` |
+| `goodbye` | Message shown on exit | `Goodbye! ⚕` |
+| `response_label` | Label on the response box header | ` ⚕ Hermes ` |
+| `prompt_symbol` | Symbol before the user input prompt | `❯ ` |
+| `help_header` | Header text for the `/help` command output | `(^_^)? Available Commands` |
+
+### Other top-level keys
+
+| Key | Type | Description | Default |
+|-----|------|-------------|---------|
+| `tool_prefix` | string | Character prefixed to tool output lines in the CLI | `┊` |
+| `tool_emojis` | dict | Per-tool emoji overrides for spinners and progress (`{tool_name: emoji}`) | `{}` |
+| `banner_logo` | string | Rich-markup ASCII art logo (replaces the default HERMES_AGENT banner) | `""` |
+| `banner_hero` | string | Rich-markup hero art (replaces the default caduceus art) | `""` |

 ## Custom skins

-Create YAML files under `~/.hermes/skins/`. User skins inherit missing values from the built-in `default` skin.
+Create YAML files under `~/.hermes/skins/`. User skins inherit missing values from the built-in `default` skin, so you only need to specify the keys you want to change.
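Inheritance can be pictured as a recursive merge of your parsed YAML over the `default` skin: nested sections such as `colors:` merge key by key, so overriding one color keeps all the others. A minimal sketch, using a trimmed, hypothetical `DEFAULT_SKIN` dict in place of the real built-in definition in `hermes_cli/skin_engine.py`; whether lists such as `thinking_verbs` are replaced wholesale (as here) or combined is also an assumption.

```python
import copy
from typing import Any, Dict

# Trimmed, hypothetical stand-in for the built-in 'default' skin.
DEFAULT_SKIN: Dict[str, Any] = {
    "name": "default",
    "colors": {"banner_border": "#CD7F32", "banner_title": "#FFD700", "ui_ok": "#4caf50"},
    "spinner": {"thinking_verbs": []},
    "branding": {"agent_name": "Hermes Agent", "prompt_symbol": "❯ "},
    "tool_prefix": "┊",
}


def merge_skin(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `override` on `base`; nested dicts merge, other values replace."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_skin(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```

To try it on a real file, parse the YAML into a dict (for example with PyYAML's `yaml.safe_load`) and pass it as `override`. Deep-copying keeps the built-in definition unchanged across merges.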
+
+### Full custom skin YAML template
+
+```yaml
+# ~/.hermes/skins/mytheme.yaml
+# Complete skin template — all keys shown. Delete any you don't need;
+# missing values automatically inherit from the 'default' skin.
+
+name: mytheme
+description: My custom theme
+
+colors:
+  banner_border: "#CD7F32"
+  banner_title: "#FFD700"
+  banner_accent: "#FFBF00"
+  banner_dim: "#B8860B"
+  banner_text: "#FFF8DC"
+  ui_accent: "#FFBF00"
+  ui_label: "#4dd0e1"
+  ui_ok: "#4caf50"
+  ui_error: "#ef5350"
+  ui_warn: "#ffa726"
+  prompt: "#FFF8DC"
+  input_rule: "#CD7F32"
+  response_border: "#FFD700"
+  session_label: "#DAA520"
+  session_border: "#8B8682"
+
+spinner:
+  waiting_faces:
+    - "(⚔)"
+    - "(⛨)"
+    - "(▲)"
+  thinking_faces:
+    - "(⚔)"
+    - "(⌁)"
+    - "(<>)"
+  thinking_verbs:
+    - "processing"
+    - "analyzing"
+    - "computing"
+    - "evaluating"
+  wings:
+    - ["⟪⚡", "⚡⟫"]
+    - ["⟪●", "●⟫"]
+
+branding:
+  agent_name: "My Agent"
+  welcome: "Welcome to My Agent! Type your message or /help for commands."
+  goodbye: "See you later! ⚡"
+  response_label: " ⚡ My Agent "
+  prompt_symbol: "⚡ ❯ "
+  help_header: "(⚡) Available Commands"
+
+tool_prefix: "┊"
+
+# Per-tool emoji overrides (optional)
+tool_emojis:
+  terminal: "⚔"
+  web_search: "🔮"
+  read_file: "📄"
+
+# Custom ASCII art banners (optional, Rich markup supported)
+# banner_logo: |
+#   [bold #FFD700] MY AGENT [/]
+# banner_hero: |
+#   [#FFD700]  Custom art here  [/]
+```
+
+### Minimal custom skin example
+
+Since everything inherits from `default`, a minimal skin only needs to change what's different:

 ```yaml
 name: cyberpunk
@@ -78,4 +200,7 @@ tool_prefix: "▏"

 - Built-in skins load from `hermes_cli/skin_engine.py`.
 - Unknown skins automatically fall back to `default`.
- `/skin` updates the active CLI theme immediately for the current session.
+- `/skin` updates the active CLI theme immediately for the current session.
+- User skins in `~/.hermes/skins/` take precedence over built-in skins with the same name.
+- Skin changes via `/skin` are session-only. To make a skin your permanent default, set it in `config.yaml`.
+- The `banner_logo` and `banner_hero` fields support Rich console markup (e.g., `[bold #FF0000]text[/]`) for colored ASCII art.
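Taken together, these notes imply a lookup order: a user file in `~/.hermes/skins/` wins, then a built-in skin of the same name, then `default`. A minimal sketch of that resolution, assuming user skins are found by filename (`<name>.yaml`); `resolve_skin` is a hypothetical helper, not the skin engine's API, and the built-in names are taken from the table above.

```python
from pathlib import Path
from typing import Optional, Tuple

BUILTIN_SKINS = {"default", "ares", "mono", "slate", "poseidon", "sisyphus", "charizard"}


def resolve_skin(
    name: str, user_dir: Path = Path.home() / ".hermes" / "skins"
) -> Tuple[str, Optional[Path]]:
    """Return (skin name, user file or None), following user > built-in > default."""
    user_file = user_dir / f"{name}.yaml"
    if user_file.is_file():
        return name, user_file   # user skins shadow built-ins of the same name
    if name in BUILTIN_SKINS:
        return name, None        # load from the built-in definitions
    return "default", None       # unknown skins fall back to default
```

A user file named after a built-in (for example `mono.yaml`) therefore replaces that built-in for lookup purposes, while still inheriting any missing keys from `default`.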