Files
Teknium 43d468cea8 docs: comprehensive documentation audit — fix stale info, expand thin pages, add depth (#5393)
Major changes across 20 documentation pages:

Staleness fixes:
- Fix FAQ: wrong import path (hermes.agent → run_agent)
- Fix FAQ: stale Gemini 2.0 model → Gemini 3 Flash
- Fix integrations/index: missing MiniMax TTS provider
- Fix integrations/index: web_crawl is not a registered tool
- Fix sessions: add all 19 session sources (was only 5)
- Fix cron: add all 18 delivery targets (was only telegram/discord)
- Fix webhooks: add all delivery targets
- Fix overview: add missing MCP, memory providers, credential pools
- Fix all line-number references → use function name searches instead
- Update file size estimates (run_agent ~9200, gateway ~7200, cli ~8500)

Expanded thin pages (< 150 lines → substantial depth):
- honcho.md: 43 → 108 lines — added feature comparison, tools, config, CLI
- overview.md: 49 → 55 lines — added MCP, memory providers, credential pools
- toolsets-reference.md: 57 → 175 lines — added explanations, config examples,
  custom toolsets, wildcards, platform differences table
- optional-skills-catalog.md: 74 → 153 lines — added 25+ missing skills across
  communication, devops, mlops (18!), productivity, research categories
- integrations/index.md: 82 → 115 lines — added messaging, HA, plugins sections
- cron-internals.md: 90 → 195 lines — added job JSON example, lifecycle states,
  tick cycle, delivery targets, script-backed jobs, CLI interface
- gateway-internals.md: 111 → 250 lines — added architecture diagram, message
  flow, two-level guard, platform adapters, token locks, process management
- agent-loop.md: 112 → 235 lines — added entry points, API mode resolution,
  turn lifecycle detail, message alternation rules, tool execution flow,
  callback table, budget tracking, compression details
- architecture.md: 152 → 295 lines — added system overview diagram, data flow
  diagrams, design principles table, dependency chain

Other depth additions:
- context-references.md: added platform availability, compression interaction,
  common patterns sections
- slash-commands.md: added quick commands config example, alias resolution
- image-generation.md: added platform delivery table
- tools-reference.md: added tool counts, MCP tools note
- index.md: updated platform count (5 → 14+), tool count (40+ → 47)
2026-04-05 19:45:50 -07:00

16 KiB

sidebar_position, title, description
sidebar_position title description
1 Architecture Hermes Agent internals — major subsystems, execution paths, data flow, and where to read next

Architecture

This page is the top-level map of Hermes Agent internals. Use it to orient yourself in the codebase, then dive into subsystem-specific docs for implementation details.

System Overview

┌─────────────────────────────────────────────────────────────────────┐
│                        Entry Points                                  │
│                                                                      │
│  CLI (cli.py)    Gateway (gateway/run.py)    ACP (acp_adapter/)     │
│  Batch Runner    API Server                  Python Library          │
└──────────┬──────────────┬───────────────────────┬────────────────────┘
           │              │                       │
           ▼              ▼                       ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     AIAgent (run_agent.py)                           │
│                                                                      │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐                │
│  │ Prompt        │ │ Provider     │ │ Tool         │                │
│  │ Builder       │ │ Resolution   │ │ Dispatch     │                │
│  │ (prompt_      │ │ (runtime_    │ │ (model_      │                │
│  │  builder.py)  │ │  provider.py)│ │  tools.py)   │                │
│  └──────┬───────┘ └──────┬───────┘ └──────┬───────┘                │
│         │                │                │                          │
│  ┌──────┴───────┐ ┌──────┴───────┐ ┌──────┴───────┐                │
│  │ Compression  │ │ 3 API Modes  │ │ Tool Registry│                │
│  │ & Caching    │ │ chat_compl.  │ │ (registry.py)│                │
│  │              │ │ codex_resp.  │ │ 47 tools     │                │
│  │              │ │ anthropic    │ │ 37 toolsets   │                │
│  └──────────────┘ └──────────────┘ └──────────────┘                │
└─────────────────────────────────────────────────────────────────────┘
           │                                    │
           ▼                                    ▼
┌───────────────────┐              ┌──────────────────────┐
│ Session Storage   │              │ Tool Backends         │
│ (SQLite + FTS5)   │              │ Terminal (6 backends) │
│ hermes_state.py   │              │ Browser (5 backends)  │
│ gateway/session.py│              │ Web (4 backends)      │
└───────────────────┘              │ MCP (dynamic)         │
                                   │ File, Vision, etc.    │
                                   └──────────────────────┘

Directory Structure

hermes-agent/
├── run_agent.py              # AIAgent — core conversation loop (~9,200 lines)
├── cli.py                    # HermesCLI — interactive terminal UI (~8,500 lines)
├── model_tools.py            # Tool discovery, schema collection, dispatch
├── toolsets.py               # Tool groupings and platform presets
├── hermes_state.py           # SQLite session/state database with FTS5
├── hermes_constants.py       # HERMES_HOME, profile-aware paths
├── batch_runner.py           # Batch trajectory generation
│
├── agent/                    # Agent internals
│   ├── prompt_builder.py     # System prompt assembly
│   ├── context_compressor.py # Conversation compression algorithm
│   ├── prompt_caching.py     # Anthropic prompt caching
│   ├── auxiliary_client.py   # Auxiliary LLM for side tasks (vision, summarization)
│   ├── model_metadata.py     # Model context lengths, token estimation
│   ├── models_dev.py         # models.dev registry integration
│   ├── anthropic_adapter.py  # Anthropic Messages API format conversion
│   ├── display.py            # KawaiiSpinner, tool preview formatting
│   ├── skill_commands.py     # Skill slash commands
│   ├── memory_store.py       # Persistent memory read/write
│   └── trajectory.py         # Trajectory saving helpers
│
├── hermes_cli/               # CLI subcommands and setup
│   ├── main.py               # Entry point — all `hermes` subcommands (~4,200 lines)
│   ├── config.py             # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│   ├── commands.py           # COMMAND_REGISTRY — central slash command definitions
│   ├── auth.py               # PROVIDER_REGISTRY, credential resolution
│   ├── runtime_provider.py   # Provider → api_mode + credentials
│   ├── models.py             # Model catalog, provider model lists
│   ├── model_switch.py       # /model command logic (CLI + gateway shared)
│   ├── setup.py              # Interactive setup wizard (~3,500 lines)
│   ├── skin_engine.py        # CLI theming engine
│   ├── skills_config.py      # hermes skills — enable/disable per platform
│   ├── skills_hub.py         # /skills slash command
│   ├── tools_config.py       # hermes tools — enable/disable per platform
│   ├── plugins.py            # PluginManager — discovery, loading, hooks
│   ├── callbacks.py          # Terminal callbacks (clarify, sudo, approval)
│   └── gateway.py            # hermes gateway start/stop
│
├── tools/                    # Tool implementations (one file per tool)
│   ├── registry.py           # Central tool registry
│   ├── approval.py           # Dangerous command detection
│   ├── terminal_tool.py      # Terminal orchestration
│   ├── process_registry.py   # Background process management
│   ├── file_tools.py         # read_file, write_file, patch, search_files
│   ├── web_tools.py          # web_search, web_extract
│   ├── browser_tool.py       # 11 browser automation tools
│   ├── code_execution_tool.py # execute_code sandbox
│   ├── delegate_tool.py      # Subagent delegation
│   ├── mcp_tool.py           # MCP client (~1,050 lines)
│   ├── credential_files.py   # File-based credential passthrough
│   ├── env_passthrough.py    # Env var passthrough for sandboxes
│   ├── ansi_strip.py         # ANSI escape stripping
│   └── environments/         # Terminal backends (local, docker, ssh, modal, daytona, singularity)
│
├── gateway/                  # Messaging platform gateway
│   ├── run.py                # GatewayRunner — message dispatch (~5,800 lines)
│   ├── session.py            # SessionStore — conversation persistence
│   ├── delivery.py           # Outbound message delivery
│   ├── pairing.py            # DM pairing authorization
│   ├── hooks.py              # Hook discovery and lifecycle events
│   ├── mirror.py             # Cross-session message mirroring
│   ├── status.py             # Token locks, profile-scoped process tracking
│   ├── builtin_hooks/        # Always-registered hooks
│   └── platforms/            # 14 adapters: telegram, discord, slack, whatsapp,
│                             #   signal, matrix, mattermost, email, sms,
│                             #   dingtalk, feishu, wecom, homeassistant, webhook
│
├── acp_adapter/              # ACP server (VS Code / Zed / JetBrains)
├── cron/                     # Scheduler (jobs.py, scheduler.py)
├── plugins/memory/           # Memory provider plugins
├── environments/             # RL training environments (Atropos)
├── skills/                   # Bundled skills (always available)
├── optional-skills/          # Official optional skills (install explicitly)
├── website/                  # Docusaurus documentation site
└── tests/                    # Pytest suite (~3,000+ tests)

Data Flow

CLI Session

User input → HermesCLI.process_input()
  → AIAgent.run_conversation()
    → prompt_builder.build_system_prompt()
    → runtime_provider.resolve_runtime_provider()
    → API call (chat_completions / codex_responses / anthropic_messages)
    → tool_calls? → model_tools.handle_function_call() → loop
    → final response → display → save to SessionDB

Gateway Message

Platform event → Adapter.on_message() → MessageEvent
  → GatewayRunner._handle_message()
    → authorize user
    → resolve session key
    → create AIAgent with session history
    → AIAgent.run_conversation()
    → deliver response back through adapter

Cron Job

Scheduler tick → load due jobs from jobs.json
  → create fresh AIAgent (no history)
  → inject attached skills as context
  → run job prompt
  → deliver response to target platform
  → update job state and next_run

If you are new to the codebase:

  1. This page — orient yourself
  2. Agent Loop Internals — how AIAgent works
  3. Prompt Assembly — system prompt construction
  4. Provider Runtime Resolution — how providers are selected
  5. Adding Providers — practical guide to adding a new provider
  6. Tools Runtime — tool registry, dispatch, environments
  7. Session Storage — SQLite schema, FTS5, session lineage
  8. Gateway Internals — messaging platform gateway
  9. Context Compression & Prompt Caching — compression and caching
  10. ACP Internals — IDE integration
  11. Environments, Benchmarks & Data Generation — RL training

Major Subsystems

Agent Loop

The synchronous orchestration engine (AIAgent in run_agent.py). Handles provider selection, prompt construction, tool execution, retries, fallback, callbacks, compression, and persistence. Supports three API modes for different provider backends.

Agent Loop Internals

Prompt System

Prompt construction and maintenance across the conversation lifecycle:

  • prompt_builder.py — Assembles the system prompt from: personality (SOUL.md), memory (MEMORY.md, USER.md), skills, context files (AGENTS.md, .hermes.md), tool-use guidance, and model-specific instructions
  • prompt_caching.py — Applies Anthropic cache breakpoints for prefix caching
  • context_compressor.py — Summarizes middle conversation turns when context exceeds thresholds

Prompt Assembly, Context Compression & Prompt Caching

Provider Resolution

A shared runtime resolver used by CLI, gateway, cron, ACP, and auxiliary calls. Maps (provider, model) tuples to (api_mode, api_key, base_url). Handles 18+ providers, OAuth flows, credential pools, and alias resolution.

Provider Runtime Resolution

Tool System

Central tool registry (tools/registry.py) with 47 registered tools across 20 toolsets. Each tool file self-registers at import time. The registry handles schema collection, dispatch, availability checking, and error wrapping. Terminal tools support 6 backends (local, Docker, SSH, Daytona, Modal, Singularity).

Tools Runtime

Session Persistence

SQLite-based session storage with FTS5 full-text search. Sessions have lineage tracking (parent/child across compressions), per-platform isolation, and atomic writes with contention handling.

Session Storage

Messaging Gateway

Long-running process with 14 platform adapters, unified session routing, user authorization (allowlists + DM pairing), slash command dispatch, hook system, cron ticking, and background maintenance.

Gateway Internals

Plugin System

Three discovery sources: ~/.hermes/plugins/ (user), .hermes/plugins/ (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Memory providers are a specialized plugin type under plugins/memory/.

Plugin Guide, Memory Provider Plugin

Cron

First-class agent tasks (not shell tasks). Jobs store in JSON, support multiple schedule formats, can attach skills and scripts, and deliver to any platform.

Cron Internals

ACP Integration

Exposes Hermes as an editor-native agent over stdio/JSON-RPC for VS Code, Zed, and JetBrains.

ACP Internals

RL / Environments / Trajectories

Full environment framework for evaluation and RL training. Integrates with Atropos, supports multiple tool-call parsers, and generates ShareGPT-format trajectories.

Environments, Benchmarks & Data Generation, Trajectories & Training Format

Design Principles

Principle What it means in practice
Prompt stability System prompt doesn't change mid-conversation. No cache-breaking mutations except explicit user actions (/model).
Observable execution Every tool call is visible to the user via callbacks. Progress updates in CLI (spinner) and gateway (chat messages).
Interruptible API calls and tool execution can be cancelled mid-flight by user input or signals.
Platform-agnostic core One AIAgent class serves CLI, gateway, ACP, batch, and API server. Platform differences live in the entry point, not the agent.
Loose coupling Optional subsystems (MCP, plugins, memory providers, RL environments) use registry patterns and check_fn gating, not hard dependencies.
Profile isolation Each profile (hermes -p <name>) gets its own HERMES_HOME, config, memory, sessions, and gateway PID. Multiple profiles run concurrently.

File Dependency Chain

tools/registry.py  (no deps — imported by all tool files)
       ↑
tools/*.py  (each calls registry.register() at import time)
       ↑
model_tools.py  (imports tools/registry + triggers tool discovery)
       ↑
run_agent.py, cli.py, batch_runner.py, environments/

This chain means tool registration happens at import time, before any agent instance is created. Adding a new tool requires an import in model_tools.py's _discover_tools() list.