---
sidebar_position: 4
title: MCP (Model Context Protocol)
description: Connect Hermes Agent to external tool servers via MCP — databases, APIs, filesystems, and more
---
# MCP (Model Context Protocol)
MCP lets Hermes Agent connect to external tool servers — giving the agent access to databases, APIs, filesystems, and more without any code changes.
## Overview
The Model Context Protocol (MCP) is an open standard for connecting AI agents to external tools and data sources. MCP servers expose tools over a lightweight RPC protocol, and Hermes Agent can connect to any compliant server automatically.
What this means for you:

- Thousands of ready-made tools — browse the MCP server directory for servers covering GitHub, Slack, databases, file systems, web scraping, and more
- No code changes needed — add a few lines to `~/.hermes/config.yaml` and the tools appear alongside built-in ones
- Mix and match — run multiple MCP servers simultaneously, combining stdio-based and HTTP-based servers
- Secure by default — environment variables are filtered and credentials are stripped from error messages
## Prerequisites

Install the MCP extras:

```bash
pip install hermes-agent[mcp]
```
| Server Type | Runtime Needed | Example |
|---|---|---|
| HTTP/remote | Nothing extra | url: "https://mcp.example.com" |
| npm-based (npx) | Node.js 18+ | command: "npx" |
| Python-based | uv (recommended) | command: "uvx" |
## Configuration

MCP servers are configured in `~/.hermes/config.yaml` under the `mcp_servers` key.
### Stdio Servers
Stdio servers run as local subprocesses, communicating over stdin/stdout:
```yaml
mcp_servers:
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
    env: {}
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
```
| Key | Required | Description |
|---|---|---|
| `command` | Yes | Executable to run (`npx`, `uvx`, `python`) |
| `args` | No | Command-line arguments |
| `env` | No | Environment variables for the subprocess |
:::info Security
Only explicitly listed `env` variables plus a safe baseline (`PATH`, `HOME`, `USER`, `LANG`, `SHELL`, `TMPDIR`, `XDG_*`) are passed to the subprocess. Your API keys and secrets are not leaked.
:::
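The filtering described in the note above can be sketched roughly as follows — a hypothetical helper for illustration, not the actual Hermes implementation:

```python
import os

# Safe baseline variables always passed through (per the security note above)
SAFE_BASELINE = {"PATH", "HOME", "USER", "LANG", "SHELL", "TMPDIR"}

def subprocess_env(explicit):
    """Build the environment for an MCP subprocess: keep only the safe
    baseline (plus XDG_*) from the parent environment, then overlay the
    variables listed under the server's `env` key."""
    env = {
        k: v for k, v in os.environ.items()
        if k in SAFE_BASELINE or k.startswith("XDG_")
    }
    env.update(explicit)
    return env
```

Anything else in the parent environment (cloud credentials, unrelated API keys) never reaches the subprocess.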
### HTTP Servers

```yaml
mcp_servers:
  remote_api:
    url: "https://my-mcp-server.example.com/mcp"
    headers:
      Authorization: "Bearer sk-xxxxxxxxxxxx"
```
### Per-Server Timeouts

```yaml
mcp_servers:
  slow_database:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-postgres"]
    env:
      DATABASE_URL: "postgres://user:pass@localhost/mydb"
    timeout: 300          # Tool call timeout (default: 120s)
    connect_timeout: 90   # Initial connection timeout (default: 60s)
```
### Mixed Configuration Example

```yaml
mcp_servers:
  # Local filesystem via stdio
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]

  # GitHub API via stdio with auth
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"

  # Remote database via HTTP
  company_db:
    url: "https://mcp.internal.company.com/db"
    headers:
      Authorization: "Bearer sk-xxxxxxxxxxxx"
    timeout: 180

  # Python-based server via uvx
  memory:
    command: "uvx"
    args: ["mcp-server-memory"]
```
### Translating from Claude Desktop Config
Many MCP server docs show Claude Desktop JSON format. Here's the translation:
**Claude Desktop JSON:**

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  }
}
```
**Hermes YAML:**

```yaml
mcp_servers:   # mcpServers → mcp_servers (snake_case)
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
```
Rules: `mcpServers` → `mcp_servers` (snake_case), JSON → YAML. Keys like `command`, `args`, and `env` are identical.
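Because the translation is purely mechanical, a small script can do it; this sketch (hypothetical, not shipped with Hermes) renames the top-level key and leaves everything else untouched:

```python
import json

def claude_to_hermes(claude_json):
    """Convert Claude Desktop's `mcpServers` JSON into the structure Hermes
    expects under `mcp_servers`; command/args/env carry over unchanged."""
    config = json.loads(claude_json)
    return {"mcp_servers": config["mcpServers"]}

hermes = claude_to_hermes("""
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  }
}
""")
# Dump `hermes` with a YAML library (e.g. yaml.safe_dump) to get the config
```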
## How It Works

### Tool Registration
Each MCP tool is registered with a prefixed name:

```
mcp_{server_name}_{tool_name}
```

| Server Name | MCP Tool Name | Registered As |
|---|---|---|
| `filesystem` | `read_file` | `mcp_filesystem_read_file` |
| `github` | `create-issue` | `mcp_github_create_issue` |
| `my-api` | `query.data` | `mcp_my_api_query_data` |
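As the table shows, hyphens and dots in server and tool names are normalized to underscores. That normalization can be sketched as (a hypothetical helper, not the actual Hermes internals):

```python
import re

def registered_name(server, tool):
    """Prefix an MCP tool name with its server, normalizing both parts to
    snake_case: any character that isn't a letter, digit, or underscore
    (e.g. '-' or '.') becomes an underscore."""
    def normalize(name):
        return re.sub(r"[^0-9A-Za-z_]", "_", name)
    return f"mcp_{normalize(server)}_{normalize(tool)}"
```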
Tools appear alongside built-in tools — the agent calls them like any other tool.
:::info
In addition to the server's own tools, each MCP server also gets four utility tools auto-registered: `list_resources`, `read_resource`, `list_prompts`, and `get_prompt`. These allow the agent to discover and use MCP resources and prompts exposed by the server.
:::
### Reconnection
If an MCP server disconnects, Hermes automatically reconnects with exponential backoff (1s, 2s, 4s, 8s, 16s — max 5 attempts). Initial connection failures are reported immediately.
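The backoff schedule above can be sketched as follows (hypothetical helper names, not the Hermes internals):

```python
import asyncio

def backoff_delays(base=1.0, attempts=5):
    """The delay schedule described above: 1s, 2s, 4s, 8s, 16s."""
    return [base * 2 ** i for i in range(attempts)]

async def reconnect(connect, attempts=5):
    """Retry `connect`, sleeping the next backoff delay between failures
    and re-raising once all attempts are exhausted."""
    delays = backoff_delays(attempts=attempts)
    for i, delay in enumerate(delays):
        try:
            return await connect()
        except ConnectionError:
            if i == len(delays) - 1:
                raise
            await asyncio.sleep(delay)
```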
### Shutdown
On agent exit, all MCP server connections are cleanly shut down.
## Popular MCP Servers
| Server | Package | Description |
|---|---|---|
| Filesystem | `@modelcontextprotocol/server-filesystem` | Read/write/search local files |
| GitHub | `@modelcontextprotocol/server-github` | Issues, PRs, repos, code search |
| Git | `@modelcontextprotocol/server-git` | Git operations on local repos |
| Fetch | `@modelcontextprotocol/server-fetch` | HTTP fetching and web content |
| Memory | `@modelcontextprotocol/server-memory` | Persistent key-value memory |
| SQLite | `@modelcontextprotocol/server-sqlite` | Query SQLite databases |
| PostgreSQL | `@modelcontextprotocol/server-postgres` | Query PostgreSQL databases |
| Brave Search | `@modelcontextprotocol/server-brave-search` | Web search via Brave API |
| Puppeteer | `@modelcontextprotocol/server-puppeteer` | Browser automation |
### Example Configs

```yaml
mcp_servers:
  # No API key needed
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
  git:
    command: "uvx"
    args: ["mcp-server-git", "--repository", "/home/user/my-repo"]
  fetch:
    command: "uvx"
    args: ["mcp-server-fetch"]
  sqlite:
    command: "uvx"
    args: ["mcp-server-sqlite", "--db-path", "/home/user/data.db"]

  # Requires API key
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
  brave_search:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-brave-search"]
    env:
      BRAVE_API_KEY: "BSA_xxxxxxxxxxxx"
```
## Troubleshooting

### "MCP SDK not available"

Install the MCP extras:

```bash
pip install hermes-agent[mcp]
```
### Server fails to start

The MCP server command (`npx`, `uvx`) is not on `PATH`. Install the required runtime:

```bash
# For npm-based servers
npm install -g npx  # or ensure Node.js 18+ is installed

# For Python-based servers
pip install uv      # then use "uvx" as the command
```
### Server connects but tools fail with auth errors

Ensure the key is in the server's `env` block:

```yaml
mcp_servers:
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_your_actual_token"  # Check this
```
### Connection timeout

Increase `connect_timeout` for slow-starting servers:

```yaml
mcp_servers:
  slow_server:
    command: "npx"
    args: ["-y", "heavy-server-package"]
    connect_timeout: 120  # default is 60
```
## Reload MCP Servers
You can reload MCP servers without restarting Hermes:
- In the CLI: the agent reconnects automatically
- In messaging: send `/reload-mcp`
## Sampling (Server-Initiated LLM Requests)

MCP's `sampling/createMessage` capability allows MCP servers to request LLM completions through the Hermes agent. This enables agent-in-the-loop workflows where servers can leverage the LLM during tool execution — for example, a database server asking the LLM to interpret query results, or a code analysis server requesting an LLM review of its findings.
### How It Works
When an MCP server sends a `sampling/createMessage` request, the sampling callback:

1. Validates the request against rate limits and the model whitelist
2. Resolves which model to use (config override > server hint > default)
3. Converts MCP messages to OpenAI-compatible format
4. Offloads the LLM call to a thread via `asyncio.to_thread()` (non-blocking)
5. Returns the response (text or tool use) back to the server
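The model-resolution and message-conversion steps above can be sketched as follows — hypothetical message shapes for illustration, not the actual Hermes internals:

```python
def resolve_model(config_override, server_hint, default):
    """Model precedence: config override > server hint > default."""
    return config_override or server_hint or default

def mcp_to_openai(messages):
    """Flatten MCP-style sampling messages (role plus a content block like
    {"type": "text", "text": ...}) into OpenAI-style chat messages."""
    out = []
    for msg in messages:
        content = msg["content"]
        text = content["text"] if isinstance(content, dict) else content
        out.append({"role": msg["role"], "content": text})
    return out
```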
### Configuration
Sampling is enabled by default for all MCP servers. No extra setup needed — if you have an auxiliary LLM client configured, sampling works automatically.
```yaml
mcp_servers:
  analysis_server:
    command: "npx"
    args: ["-y", "my-analysis-server"]
    sampling:
      enabled: true            # default: true
      model: "gemini-3-flash"  # override model (optional)
      max_tokens_cap: 4096     # max tokens per request (default: 4096)
      timeout: 30              # LLM call timeout in seconds (default: 30)
      max_rpm: 10              # max requests per minute (default: 10)
      allowed_models: []       # model whitelist (empty = allow all)
      max_tool_rounds: 5       # max consecutive tool use rounds (0 = disable)
      log_level: "info"        # audit verbosity: debug, info, warning
```
### Tool Use in Sampling

Servers can include `tools` and `toolChoice` in sampling requests, enabling multi-turn tool-augmented workflows within a single sampling session. The callback forwards tool definitions to the LLM, handles tool use responses with proper `ToolUseContent` types, and enforces `max_tool_rounds` to prevent infinite loops.
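The shape of such a capped tool loop can be sketched as follows — hypothetical reply and message shapes, not the actual Hermes internals:

```python
def run_sampling(llm, tools, messages, max_tool_rounds=5):
    """Call the LLM repeatedly, executing requested tools and feeding
    results back, until a text answer arrives or the round cap is hit."""
    for _ in range(max_tool_rounds + 1):
        reply = llm(messages, tools)
        if reply["type"] == "text":
            return reply["text"]
        # Tool use requested: execute the tool and append its result
        result = tools[reply["name"]](**reply["args"])
        messages = messages + [
            {"role": "tool", "name": reply["name"], "content": result}
        ]
    raise RuntimeError("max_tool_rounds exceeded")
```

The cap guarantees a misbehaving server (or a model stuck requesting tools) cannot spin forever inside a single sampling request.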
### Security

- Rate limiting: Per-server sliding window (default: 10 req/min)
- Token cap: Servers can't request more than `max_tokens_cap` (default: 4096)
- Model whitelist: `allowed_models` restricts which models a server can use
- Tool loop limit: `max_tool_rounds` caps consecutive tool use rounds
- Credential stripping: LLM responses are sanitized before returning to the server
- Non-blocking: LLM calls run in a separate thread via `asyncio.to_thread()`
- Typed errors: All failures return structured `ErrorData` per MCP spec
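The sliding-window rate limiting described above can be sketched as a small standalone class (hypothetical, not the Hermes implementation):

```python
import time
from collections import deque

class SlidingWindowRateLimiter:
    """Allow at most `max_rpm` requests in any 60-second window,
    tracked per server."""

    def __init__(self, max_rpm=10):
        self.max_rpm = max_rpm
        self.timestamps = deque()

    def allow(self, now=None):
        """Return True and record the request if under the limit."""
        now = time.monotonic() if now is None else now
        # Drop timestamps older than the 60s window
        while self.timestamps and now - self.timestamps[0] >= 60.0:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_rpm:
            return False
        self.timestamps.append(now)
        return True
```

Unlike a fixed-minute counter, the sliding window prevents a server from bursting double the limit across a minute boundary.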
To disable sampling for untrusted servers:
```yaml
mcp_servers:
  untrusted:
    command: "npx"
    args: ["-y", "untrusted-server"]
    sampling:
      enabled: false
```