# Hermes Agent
An AI agent with advanced tool-calling capabilities, featuring a flexible toolsets system for organizing and managing tools.
## Features
- **Web Tools**: Search, extract content, and crawl websites
- **Terminal Tools**: Execute commands via mini-swe-agent (local, Docker, or Modal backends)
- **Browser Tools**: Automate web browsers to navigate, click, type, and extract content
- **Vision Tools**: Analyze images from URLs
- **Reasoning Tools**: Advanced multi-model reasoning (Mixture of Agents)
- **Creative Tools**: Generate images from text prompts
- **Skills Tools**: On-demand knowledge documents with progressive disclosure
- **Toolsets System**: Organize tools into logical groups for different scenarios
- **Batch Processing**: Process datasets in parallel with checkpointing and statistics tracking
- **Ephemeral System Prompts**: Guide model behavior without polluting training datasets
## Setup
### 1. Clone the Repository
```bash
# Clone with submodules (recommended)
git clone --recurse-submodules https://github.com/NousResearch/Hermes-Agent.git
cd Hermes-Agent
# Or if already cloned without submodules:
git submodule update --init --recursive
```
### 2. Install Dependencies
```bash
# Create and activate virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Python packages
pip install -r requirements.txt
# Install mini-swe-agent for terminal tools
pip install -e ./mini-swe-agent
# Install Node.js dependencies for browser tools (requires Node.js)
npm install
```
### 3. Configure Environment Variables
```bash
# Copy the example environment file
cp .env.example .env
# Edit .env and add your API keys
nano .env # or use your preferred editor
```
**Required API Keys:**
- `OPENROUTER_API_KEY` - LLM access via OpenRouter (get at: https://openrouter.ai/keys)
- `FIRECRAWL_API_KEY` - Web tools (get at: https://firecrawl.dev/)
- `NOUS_API_KEY` - Vision & reasoning tools (get at: https://inference-api.nousresearch.com/)
- `FAL_KEY` - Image generation (get at: https://fal.ai/)
**Optional API Keys (for specific features):**
- `BROWSERBASE_API_KEY` - Browser automation (get at: https://browserbase.com/)
- `BROWSERBASE_PROJECT_ID` - From Browserbase dashboard
- `MORPH_API_KEY` - For legacy Hecate terminal backend (get at: https://morph.so/)
### 4. Configure Terminal Backend
The terminal tool uses **mini-swe-agent** environments. Configure in `.env`:
```bash
# Backend: "local", "docker", "singularity", or "modal"
TERMINAL_ENV=local # Default: runs on host machine (no isolation)
TERMINAL_ENV=singularity # Recommended for HPC: Apptainer/Singularity containers
TERMINAL_ENV=docker # Isolated Docker containers
TERMINAL_ENV=modal # Cloud execution via Modal
# Container image (for docker/singularity/modal backends)
TERMINAL_DOCKER_IMAGE=python:3.11-slim
TERMINAL_SINGULARITY_IMAGE=docker://python:3.11-slim
TERMINAL_TIMEOUT=60
```
**Backend Requirements:**
- **local**: No extra setup (runs directly on your machine, no isolation)
- **singularity**: Requires Apptainer or Singularity installed (common on HPC clusters, no root needed)
- **docker**: Requires Docker installed and user in `docker` group
- **modal**: Requires Modal account (see setup below)
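The backend choice above comes down to a single environment variable. A minimal sketch of how a dispatcher might validate it — the function and constant names here are illustrative, not the repo's actual API:

```python
# Hypothetical sketch of TERMINAL_ENV validation; the real dispatch
# lives inside the terminal tool, which may behave differently.
BACKENDS = {"local", "docker", "singularity", "modal"}

def resolve_backend(env: dict) -> str:
    """Return the configured terminal backend, defaulting to 'local'."""
    backend = env.get("TERMINAL_ENV", "local").strip().lower()
    if backend not in BACKENDS:
        raise ValueError(
            f"Unknown TERMINAL_ENV {backend!r}; expected one of {sorted(BACKENDS)}"
        )
    return backend

print(resolve_backend({}))                          # local
print(resolve_backend({"TERMINAL_ENV": "docker"}))  # docker
```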
### Singularity/Apptainer Setup (Recommended for HPC)
Singularity/Apptainer provides rootless container execution, ideal for HPC clusters:
```bash
# 1. Verify Apptainer is installed
apptainer --version # or: singularity --version
# 2. Set up cache directories (important for parallel workers)
# Use /scratch if available (HPC), otherwise /tmp
export APPTAINER_CACHEDIR=/scratch/$USER/.apptainer
export APPTAINER_TMPDIR=/scratch/$USER/.apptainer/tmp
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
# 3. Pre-build SIF image (recommended for parallel batch processing)
# This avoids race conditions when multiple workers start simultaneously
apptainer build $APPTAINER_CACHEDIR/python-nodejs.sif docker://nikolaik/python-nodejs:python3.11-nodejs20
# 4. Configure .env to use the local SIF
TERMINAL_ENV=singularity
TERMINAL_SINGULARITY_IMAGE=/scratch/$USER/.apptainer/python-nodejs.sif
```
**Tip:** The batch scripts in `configs/` automatically handle SIF pre-building if `/scratch` is available.
### Modal Cloud Backend Setup
[Modal](https://modal.com) provides serverless cloud compute for running sandboxed environments at scale.
```bash
# 1. Install Modal and dependencies
pip install modal boto3
# 2. Authenticate with Modal (opens browser)
modal setup
# 3. Set terminal backend to modal in .env
TERMINAL_ENV=modal
```
Modal uses CLI-based authentication (stored in `~/.modal/`), so no API key is needed in `.env`. After running `modal setup`, commands will automatically execute in Modal's cloud sandboxes.
### Browser Tools Setup
Browser tools enable the agent to navigate websites, fill forms, click buttons, and extract content. They use the [agent-browser](https://github.com/vercel-labs/agent-browser) CLI with [Browserbase](https://browserbase.com) cloud execution.
```bash
# 1. Install Node.js (if not already installed)
# Use nvm (recommended) or your package manager
# 2. Install agent-browser CLI (choose one option):
npm install -g agent-browser # Option A: Global install (recommended)
npm install # Option B: Local install (uses npx fallback)
# 3. Get Browserbase credentials
# Sign up at https://browserbase.com/ and get your:
# - API Key (from Settings → API Keys)
# - Project ID (from your project dashboard)
# 4. Add to your .env file:
BROWSERBASE_API_KEY=your_api_key_here
BROWSERBASE_PROJECT_ID=your_project_id_here
```
**Available Browser Tools:**
| Tool | Description |
|------|-------------|
| `browser_navigate` | Navigate to a URL |
| `browser_snapshot` | Get text-based page snapshot with element refs |
| `browser_click` | Click an element by ref (e.g., `@e5`) |
| `browser_type` | Type text into an input field |
| `browser_scroll` | Scroll up or down |
| `browser_back` | Go back in browser history |
| `browser_press` | Press a keyboard key (Enter, Tab, etc.) |
| `browser_close` | Close the browser session |
| `browser_get_images` | Get list of images on the page |
**Example Usage:**
```bash
# Use browser tools with web search and vision
python run_agent.py \
--query "Go to amazon.com and find the price of the latest Kindle" \
--enabled_toolsets=browser,web,vision
# Use browser-focused distribution
python batch_runner.py \
--dataset_file=browser_tasks.jsonl \
--distribution=browser_use \
--run_name=browser_run
```
See `.env.example` for all available configuration options including debug settings.
### Skills Tools
Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage:
```
skills/
├── mlops/ # Category folder
│ ├── axolotl/ # Skill folder
│ │ ├── SKILL.md # Main instructions (required)
│ │ ├── references/ # Additional docs, API specs
│ │ └── templates/ # Output formats, configs
│ └── vllm/
│ └── SKILL.md
```
**Available Skills Tools:**
| Tool | Description |
|------|-------------|
| `skills_categories` | List available skill categories (~50 tokens) |
| `skills_list` | List skills with name + description (~3k tokens for 40 skills) |
| `skill_view` | Load full skill content, tags, and linked files |
**Example Usage:**
```bash
# Use skills tools
python run_agent.py \
--query "What skills do you have for fine-tuning? Show me the axolotl skill." \
--enabled_toolsets=skills
```
**Creating Skills:**
Skills use YAML frontmatter for metadata:
```yaml
---
name: my-skill
description: Brief description shown in skills_list
tags: [tag1, tag2]
related_skills: [other-skill]
version: 1.0.0
---
# Skill Content
Instructions, examples, and guidelines here...
```
Skills can include:
- `references/` - Additional documentation, API specs, examples
- `templates/` - Output formats, config files, boilerplate code
- `scripts/` - Executable helpers (Python, shell scripts)
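As a rough illustration of how the frontmatter above could be separated from the skill body, here is a minimal sketch. It is hypothetical — the repo's `skills_tool.py` may parse differently, and list-valued fields like `tags` would need a real YAML parser:

```python
# Hypothetical parsing sketch; only handles flat "key: value" frontmatter.
def split_skill(text: str):
    """Split a SKILL.md into (metadata dict, markdown body)."""
    _, raw_meta, body = text.split("---\n", 2)
    meta = {}
    for line in raw_meta.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, body

doc = """---
name: my-skill
description: Brief description shown in skills_list
version: 1.0.0
---
# Skill Content
Instructions, examples, and guidelines here...
"""
meta, body = split_skill(doc)
print(meta["name"], "|", body.splitlines()[0])  # my-skill | # Skill Content
```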
## Toolsets System
The agent uses a toolsets system for organizing and managing tools. All tools must be part of a toolset to be accessible; individual tool selection is not supported. This ensures consistent and logical grouping of capabilities.
### Key Concepts
- **Toolsets**: Logical groups of tools for specific use cases (e.g., "research", "development", "debugging")
- **Composition**: Toolsets can include other toolsets for powerful combinations
- **Custom Toolsets**: Create your own toolsets at runtime or by editing `toolsets.py`
- **Toolset-Only Access**: Tools are only accessible through toolsets, not individually
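Composition can be pictured as recursively flattening included toolsets into one set of tools. A hypothetical sketch — the real definitions and resolution logic live in `toolsets.py`:

```python
# Illustrative toolset specs, not the ones shipped in toolsets.py.
TOOLSETS = {
    "web": {"tools": ["web_search", "web_extract"], "includes": []},
    "vision": {"tools": ["analyze_image"], "includes": []},
    "research": {"tools": [], "includes": ["web", "vision"]},
}

def resolve(name, seen=None):
    """Flatten a toolset and everything it includes into a set of tool names."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against include cycles
        return set()
    seen.add(name)
    spec = TOOLSETS[name]
    tools = set(spec["tools"])
    for included in spec["includes"]:
        tools |= resolve(included, seen)
    return tools

print(sorted(resolve("research")))  # ['analyze_image', 'web_extract', 'web_search']
```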
### Available Toolsets
See `toolsets.py` for the complete list of predefined toolsets including:
- Basic toolsets (web, terminal, vision, creative, reasoning)
- Composite toolsets (research, development, analysis, etc.)
- Scenario-specific toolsets (debugging, documentation, API testing, etc.)
- Special toolsets (safe mode without terminal, minimal, offline)
### Using Toolsets
```bash
# Use a predefined toolset
python run_agent.py --enabled_toolsets=research --query "Find latest AI papers"
# Combine multiple toolsets
python run_agent.py --enabled_toolsets=web,vision --query "Analyze this website"
# Enable all toolsets explicitly (same as omitting the flag)
python run_agent.py --enabled_toolsets=all --query "Do web research and run commands if helpful"
# Safe mode (no terminal access)
python run_agent.py --enabled_toolsets=safe --query "Help without running commands"
# List all available toolsets and tools
python run_agent.py --list_tools
```
## Basic Usage
### Default (all tools enabled)
```bash
# Uses OpenRouter by default - just set OPENROUTER_API_KEY in .env
python run_agent.py \
--query "search up the latest docs on jit in python 3.13 and write me basic example that's not in their docs. profile its perf" \
--max_turns 20 \
--model anthropic/claude-sonnet-4-20250514
```
### With specific toolset
```bash
python run_agent.py \
--query "Debug this Python error" \
--enabled_toolsets=debugging \
--model anthropic/claude-sonnet-4-20250514
```
### Python API
```python
from run_agent import AIAgent
# Uses OpenRouter by default (reads OPENROUTER_API_KEY from .env)
agent = AIAgent(
model="anthropic/claude-sonnet-4-20250514",
enabled_toolsets=["research"]
)
response = agent.chat("Find information about quantum computing")
# Create custom toolset at runtime
from toolsets import create_custom_toolset
create_custom_toolset(
name="my_tools",
description="My custom toolkit",
tools=["web_search"],
includes=["terminal", "vision"]
)
agent = AIAgent(enabled_toolsets=["my_tools"])
```
## Batch Processing
Process multiple prompts from a dataset in parallel with automatic checkpointing and statistics tracking:
```bash
# Basic batch processing
python batch_runner.py \
--dataset_file=prompts.jsonl \
--batch_size=20 \
--run_name=my_run
# With specific distribution
python batch_runner.py \
--dataset_file=prompts.jsonl \
--batch_size=20 \
--run_name=image_run \
--distribution=image_gen \
--num_workers=4
```
**Key Features:**
- Parallel processing with configurable workers
- Toolset distributions for varied data generation
- Automatic checkpointing and resume capability
- Combined output in `data/<run_name>/trajectories.jsonl`
- Tool usage statistics and success rates
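The dataset file holds one JSON object per line. The exact field names `batch_runner.py` reads are an assumption here (`prompt` is a plausible choice), so check the script before relying on this layout:

```python
import json

# Writes a minimal prompts.jsonl; the "prompt" key is assumed, not confirmed.
rows = [
    {"prompt": "Summarize the latest Python 3.13 release notes"},
    {"prompt": "Generate an image of a sunset over mountains"},
]
with open("prompts.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```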
Use `--list_distributions` to see available toolset distributions for varied data generation.
### Trajectory Compression
Post-process trajectories to fit within token budgets for training:
```bash
# Compress a directory of JSONL files
python trajectory_compressor.py --input=data/my_run
# Compress a single JSONL file
python trajectory_compressor.py --input=data/trajectories.jsonl
# Compress a 15% sample (useful for creating smaller training sets)
python trajectory_compressor.py --input=data/trajectories.jsonl --sample_percent=15
# Custom output and token target
python trajectory_compressor.py \
--input=data/trajectories.jsonl \
--output=data/compressed.jsonl \
--target_max_tokens=16000
```
**Features:**
- Protects first turns (system, human, first GPT response, first tool call)
- Protects last N turns (configurable)
- Summarizes middle turns using LLM to fit target token budget
- Supports both directory and single file input
- Optional random sampling with `--sample_percent`
- Configurable via `configs/trajectory_compression.yaml`
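The protect-first/protect-last selection amounts to partitioning the turn list; the real compressor then LLM-summarizes only the middle slice to hit the token target. This sketch is illustrative, not `trajectory_compressor.py`'s actual API:

```python
# Partition turns into protected head, compressible middle, protected tail.
# Default counts are assumptions; the real defaults live in the config YAML.
def split_turns(turns, protect_first=4, protect_last=2):
    """Return (head, middle, tail); only `middle` is a compression candidate."""
    if len(turns) <= protect_first + protect_last:
        return list(turns), [], []
    return (list(turns[:protect_first]),
            list(turns[protect_first:len(turns) - protect_last]),
            list(turns[len(turns) - protect_last:]))

head, middle, tail = split_turns([f"turn_{i}" for i in range(10)])
print(len(head), len(middle), len(tail))  # 4 4 2
```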
### Ephemeral System Prompts
The ephemeral system prompt feature allows you to guide the model's behavior during batch processing **without** saving that prompt to the training dataset trajectories. This is useful for:
- Guiding model behavior during data collection
- Adding task-specific instructions
- Keeping saved trajectories clean and focused on tool-calling format
**Example:**
```bash
python batch_runner.py \
--dataset_file=prompts.jsonl \
--batch_size=10 \
--run_name=my_run \
--ephemeral_system_prompt="You are a helpful assistant focused on image generation."
```
The ephemeral prompt will influence the model's behavior during execution, but **only the standard tool-calling system prompt** will be saved in the trajectory files.
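Conceptually, the runtime message list and the saved trajectory differ only in the system message. A hypothetical sketch — these names are not `batch_runner.py`'s actual internals, and the placeholder prompt is not the real one:

```python
# Assumed placeholder, not the agent's real tool-calling system prompt.
STANDARD_PROMPT = "You are a tool-calling assistant."

def messages_for_run(user_msg, ephemeral=None):
    """Build the message list actually sent to the model."""
    return [{"role": "system", "content": ephemeral or STANDARD_PROMPT},
            {"role": "user", "content": user_msg}]

def messages_for_save(messages):
    """Copy the messages, swapping the ephemeral prompt out before saving."""
    saved = [dict(m) for m in messages]
    saved[0]["content"] = STANDARD_PROMPT
    return saved

run = messages_for_run("Make an image", ephemeral="Focus on image generation.")
saved = messages_for_save(run)
print(run[0]["content"])    # Focus on image generation.
print(saved[0]["content"])  # You are a tool-calling assistant.
```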
## Command Line Arguments
**Single Agent (`run_agent.py`):**
- `--query`: The question or task for the agent
- `--model`: Model to use (default: claude-opus-4-20250514)
- `--api_key`: API key for authentication
- `--base_url`: API endpoint URL
- `--max_turns`: Maximum number of tool-calling iterations
- `--enabled_toolsets`: Comma-separated list of toolsets to enable. Use `all` (or `*`) to enable everything. If omitted, all toolsets are enabled by default.
- `--disabled_toolsets`: Comma-separated list of toolsets to disable
- `--list_tools`: List all available toolsets and tools
- `--save_trajectories`: Save conversation trajectories to JSONL files
**Batch Processing (`batch_runner.py`):**
- `--dataset_file`: Path to JSONL file with prompts
- `--batch_size`: Number of prompts per batch
- `--run_name`: Name for this run (for output/checkpointing)
- `--distribution`: Toolset distribution to use (default: "default")
- `--num_workers`: Number of parallel workers (default: 4)
- `--resume`: Resume from checkpoint if interrupted
- `--ephemeral_system_prompt`: System prompt used during execution but NOT saved to trajectories
- `--list_distributions`: List available toolset distributions
## Environment Variables
All environment variables can be configured in the `.env` file (copy from `.env.example`).
**LLM Provider (OpenRouter):**
- `OPENROUTER_API_KEY`: Primary LLM access via OpenRouter (supports Claude, GPT-4, Gemini, etc.)
- `LLM_MODEL`: Default model (e.g., `anthropic/claude-sonnet-4`, `openai/gpt-4o`)
**Tool API Keys:**
- `FIRECRAWL_API_KEY`: Web tools (search, extract, crawl)
- `NOUS_API_KEY`: Vision and reasoning tools
- `FAL_KEY`: Image generation tools
**Terminal Tool Configuration (mini-swe-agent backend):**
- `TERMINAL_ENV`: Backend type - `local`, `docker`, `singularity`, or `modal` (default: `local`)
- `TERMINAL_DOCKER_IMAGE`: Docker image for docker backend (default: `python:3.11-slim`)
- `TERMINAL_SINGULARITY_IMAGE`: Singularity/Apptainer image (can be `docker://...` URL or local `.sif` path)
- `TERMINAL_TIMEOUT`: Command timeout in seconds (default: `60`)
- `TERMINAL_LIFETIME_SECONDS`: Clean up inactive environments after this time (default: `300`)
- `TERMINAL_CWD`: Working directory inside containers (default: `/tmp`)
- `TERMINAL_SCRATCH_DIR`: Custom scratch directory for sandbox storage (optional; auto-detects `/scratch`)
**Browser Tool Configuration (agent-browser + Browserbase):**
- `BROWSERBASE_API_KEY`: Browserbase API key for cloud browser execution
- `BROWSERBASE_PROJECT_ID`: Browserbase project ID
- `BROWSER_SESSION_TIMEOUT`: Session timeout in seconds (default: `300`)
**Legacy Hecate Terminal Backend (optional):**
- `MORPH_API_KEY`: For Hecate/MorphCloud terminal backend
- `HECATE_VM_LIFETIME_SECONDS`: VM lifetime (default: 300)
- `HECATE_DEFAULT_SNAPSHOT_ID`: Default snapshot (default: snapshot_p5294qxt)
**Debug Options:**
- `WEB_TOOLS_DEBUG`, `VISION_TOOLS_DEBUG`, `MOA_TOOLS_DEBUG`, `IMAGE_TOOLS_DEBUG`: Enable debug logging
## Key Files
| File | Purpose |
|------|---------|
| `run_agent.py` | Main agent runner - single query execution |
| `batch_runner.py` | Parallel batch processing with checkpointing |
| `model_tools.py` | Core tool definitions and handlers |
| `toolsets.py` | Toolset definitions and composition |
| `toolset_distributions.py` | Probability distributions for data generation |
| `trajectory_compressor.py` | Post-process trajectories for training |
| `tools/` | Individual tool implementations |
| `tools/skills_tool.py` | Skills system with progressive disclosure |
| `skills/` | On-demand knowledge documents |
| `architecture/` | Design documentation |
| `configs/` | Example batch run scripts |