Files
hermes-agent/README.md
teknium1 a3ba41fce2 Implement cron job management system for scheduled tasks (similar to OpenAI's Pulse but the AI can also schedule jobs)
- Introduced a new cron job system allowing users to schedule automated tasks via the CLI, supporting one-time reminders and recurring jobs.
- Added commands for managing cron jobs: `/cron` to list jobs, `/cron add` to create new jobs, and `/cron remove` to delete jobs.
- Implemented job storage in `~/.hermes/cron/jobs.json` with output saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md`.
- Enhanced the CLI and README documentation to include detailed usage instructions and examples for cron job management.
- Integrated cron job tools into the hermes-cli toolset, ensuring they are only available in interactive CLI mode.
- Added support for cron expression parsing with the `croniter` package, enabling flexible scheduling options.
2026-02-02 08:26:42 -08:00

25 KiB

Hermes Agent

An AI agent with advanced tool-calling capabilities, featuring a flexible toolsets system for organizing and managing tools.

Features

  • Interactive CLI: Beautiful terminal interface with animated feedback, personalities, and session management
  • Web Tools: Search, extract content, and crawl websites
  • Terminal Tools: Execute commands via local, Docker, Singularity, Modal, or SSH backends
  • Browser Tools: Automate web browsers to navigate, click, type, and extract content
  • Vision Tools: Analyze images from URLs
  • Reasoning Tools: Advanced multi-model reasoning (Mixture of Agents)
  • Creative Tools: Generate images from text prompts
  • Skills Tools: On-demand knowledge documents with progressive disclosure
  • Toolsets System: Organize tools into logical groups for different scenarios
  • Batch Processing: Process datasets in parallel with checkpointing and statistics tracking
  • Ephemeral System Prompts: Guide model behavior without polluting training datasets

Quick Start (CLI)

# After setup (see below), just run:
./hermes

# Or with options:
./hermes --model "anthropic/claude-sonnet-4" --toolsets "web,terminal"

The CLI provides:

  • Animated spinners during thinking and tool execution
  • Kawaii-style feedback messages
  • /commands for configuration, history, and session management
  • Customizable personalities (/personality kawaii, /personality pirate, etc.)
  • Persistent configuration via cli-config.yaml

Setup

1. Clone the Repository

# Clone with submodules (recommended)
git clone --recurse-submodules https://github.com/NousResearch/Hermes-Agent.git
cd Hermes-Agent

# Or if already cloned without submodules:
git submodule update --init --recursive

2. Install Dependencies

# Create and activate virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Python packages
pip install -r requirements.txt

# Install mini-swe-agent for terminal tools
pip install -e ./mini-swe-agent

# Install Node.js dependencies for browser tools (requires Node.js)
npm install

3. Configure Environment Variables

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys
nano .env  # or use your preferred editor

Required API Keys:

Optional API Keys (for specific features):

  • BROWSERBASE_API_KEY - Browser automation (get at: https://browserbase.com/)
  • BROWSERBASE_PROJECT_ID - From Browserbase dashboard
  • MORPH_API_KEY - For legacy Hecate terminal backend (get at: https://morph.so/)

4. Configure Terminal Backend

The terminal tool uses mini-swe-agent environments. Configure in .env or cli-config.yaml:

# Backend: "local", "docker", "singularity", "modal", or "ssh"
TERMINAL_ENV=local          # Default: runs on host machine (no isolation)
TERMINAL_ENV=ssh            # Remote execution via SSH (agent code stays local)
TERMINAL_ENV=singularity    # Recommended for HPC: Apptainer/Singularity containers
TERMINAL_ENV=docker         # Isolated Docker containers
TERMINAL_ENV=modal          # Cloud execution via Modal

# Container image (for docker/singularity/modal backends)
TERMINAL_DOCKER_IMAGE=python:3.11-slim
TERMINAL_SINGULARITY_IMAGE=docker://python:3.11-slim
TERMINAL_TIMEOUT=60

# SSH backend (for ssh)
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa  # Optional, uses ssh-agent if not set

Backend Requirements:

  • local: No extra setup (runs directly on your machine, no isolation)
  • ssh: SSH access to remote machine (great for sandboxing - agent can't touch its own code)
  • singularity: Requires Apptainer or Singularity installed (common on HPC clusters, no root needed)
  • docker: Requires Docker installed and user in docker group
  • modal: Requires Modal account (see setup below)

Singularity/Apptainer provides rootless container execution, ideal for HPC clusters:

# 1. Verify Apptainer is installed
apptainer --version  # or: singularity --version

# 2. Set up cache directories (important for parallel workers)
# Use /scratch if available (HPC), otherwise /tmp
export APPTAINER_CACHEDIR=/scratch/$USER/.apptainer
export APPTAINER_TMPDIR=/scratch/$USER/.apptainer/tmp
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"

# 3. Pre-build SIF image (recommended for parallel batch processing)
# This avoids race conditions when multiple workers start simultaneously
apptainer build $APPTAINER_CACHEDIR/python-nodejs.sif docker://nikolaik/python-nodejs:python3.11-nodejs20

# 4. Configure .env to use the local SIF
TERMINAL_ENV=singularity
TERMINAL_SINGULARITY_IMAGE=/scratch/$USER/.apptainer/python-nodejs.sif

Tip: The batch scripts in configs/ automatically handle SIF pre-building if /scratch is available.

Modal Cloud Backend Setup

Modal provides serverless cloud compute for running sandboxed environments at scale.

# 1. Install Modal and dependencies
pip install modal boto3

# 2. Authenticate with Modal (opens browser)
modal setup

# 3. Set terminal backend to modal in .env
TERMINAL_ENV=modal

Modal uses CLI-based authentication (stored in ~/.modal/), so no API key is needed in .env. After running modal setup, commands will automatically execute in Modal's cloud sandboxes.

Browser Tools Setup

Browser tools enable the agent to navigate websites, fill forms, click buttons, and extract content. They use agent-browser CLI with Browserbase cloud execution.

# 1. Install Node.js (if not already installed)
# Use nvm (recommended) or your package manager

# 2. Install agent-browser CLI (choose one option):
npm install -g agent-browser     # Option A: Global install (recommended)
npm install                      # Option B: Local install (uses npx fallback)

# 3. Get Browserbase credentials
# Sign up at https://browserbase.com/ and get your:
# - API Key (from Settings → API Keys)
# - Project ID (from your project dashboard)

# 4. Add to your .env file:
BROWSERBASE_API_KEY=your_api_key_here
BROWSERBASE_PROJECT_ID=your_project_id_here

Available Browser Tools:

Tool Description
browser_navigate Navigate to a URL
browser_snapshot Get text-based page snapshot with element refs
browser_click Click an element by ref (e.g., @e5)
browser_type Type text into an input field
browser_scroll Scroll up or down
browser_back Go back in browser history
browser_press Press a keyboard key (Enter, Tab, etc.)
browser_close Close the browser session
browser_get_images Get list of images on the page

Example Usage:

# Use browser tools with web search and vision
python run_agent.py \
  --query "Go to amazon.com and find the price of the latest Kindle" \
  --enabled_toolsets=browser,web,vision

# Use browser-focused distribution
python batch_runner.py \
  --dataset_file=browser_tasks.jsonl \
  --distribution=browser_use \
  --run_name=browser_run

See .env.example for all available configuration options including debug settings.

Skills Tools

Skills are on-demand knowledge documents the agent can load when needed. They follow a progressive disclosure pattern to minimize token usage:

skills/
├── mlops/                    # Category folder
│   ├── axolotl/             # Skill folder
│   │   ├── SKILL.md         # Main instructions (required)
│   │   ├── references/      # Additional docs, API specs
│   │   └── templates/       # Output formats, configs
│   └── vllm/
│       └── SKILL.md

Available Skills Tools:

Tool Description
skills_categories List available skill categories (~50 tokens)
skills_list List skills with name + description (~3k tokens for 40 skills)
skill_view Load full skill content, tags, and linked files

Example Usage:

# Use skills tools
python run_agent.py \
  --query "What skills do you have for fine-tuning? Show me the axolotl skill." \
  --enabled_toolsets=skills

Creating Skills:

Skills use YAML frontmatter for metadata:

---
name: my-skill
description: Brief description shown in skills_list
tags: [tag1, tag2]
related_skills: [other-skill]
version: 1.0.0
---
# Skill Content

Instructions, examples, and guidelines here...

Skills can include:

  • references/ - Additional documentation, API specs, examples
  • templates/ - Output formats, config files, boilerplate code
  • scripts/ - Executable helpers (Python, shell scripts)

Session Logging

Every conversation is automatically logged to logs/ for debugging and inspection:

logs/
├── session_20260201_143052_a1b2c3.json
├── session_20260201_150217_d4e5f6.json
└── ...

Log Format:

{
  "session_id": "20260201_143052_a1b2c3",
  "model": "anthropic/claude-sonnet-4",
  "session_start": "2026-02-01T14:30:52.123456",
  "last_updated": "2026-02-01T14:35:12.789012",
  "message_count": 8,
  "conversations": [
    {"from": "system", "value": "..."},
    {"from": "human", "value": "..."},
    {"from": "gpt", "value": "..."},
    {"from": "tool", "value": "..."}
  ]
}
  • Automatic: Logs are created and updated automatically after each conversation turn
  • Session ID in Banner: The CLI displays the session ID in the welcome banner
  • Trajectory Format: Uses the same format as batch processing for consistency
  • Git Ignored: logs/ is in .gitignore so logs aren't committed

Context Compression

Long conversations can exceed the model's context limit. Hermes Agent automatically compresses context when approaching the limit:

How it works:

  1. Tracks actual token usage from API responses (usage.prompt_tokens)
  2. When tokens reach 85% of model's context limit, triggers compression
  3. Protects first 3 turns (system prompt, initial request, first response)
  4. Protects last 4 turns (recent context is most relevant)
  5. Summarizes middle turns using a fast/cheap model (Gemini Flash)
  6. Inserts summary as a user message, conversation continues seamlessly

Configuration (cli-config.yaml):

compression:
  enabled: true                    # Enable auto-compression (default)
  threshold: 0.85                  # Compress at 85% of context limit
  summary_model: "google/gemini-2.0-flash-001"

Or via environment variables:

CONTEXT_COMPRESSION_ENABLED=true
CONTEXT_COMPRESSION_THRESHOLD=0.85
CONTEXT_COMPRESSION_MODEL=google/gemini-2.0-flash-001

When compression triggers, you'll see:

📦 Context compression triggered (170,000 tokens ≥ 170,000 threshold)
   📊 Model context limit: 200,000 tokens (85% = 170,000)
   🗜️  Summarizing turns 4-15 (12 turns)
   ✅ Compressed: 20 → 9 messages (~45,000 tokens saved)

Scheduled Tasks (Cron Jobs)

Hermes Agent can schedule automated tasks to run in the future - either one-time reminders or recurring jobs.

CLI Commands

# List scheduled jobs
/cron

# Add a one-shot reminder (runs once in 30 minutes)
/cron add 30m Remind me to check the build status

# Add a recurring job (every 2 hours)
/cron add "every 2h" Check server status at 192.168.1.100 and report any issues

# Add a cron expression (daily at 9am)
/cron add "0 9 * * *" Generate a morning briefing summarizing GitHub notifications

# Remove a job
/cron remove abc123def456

Agent Self-Scheduling

The agent can also schedule its own follow-up tasks using tools:

# Available when using hermes-cli toolset (default for CLI)
schedule_cronjob(prompt="...", schedule="30m", repeat=1)  # One-shot
schedule_cronjob(prompt="...", schedule="every 2h")       # Recurring
list_cronjobs()                                            # View all jobs
remove_cronjob(job_id="...")                              # Cancel a job

⚠️ Important: Cronjobs run in isolated sessions with NO prior context. The prompt must be completely self-contained with all necessary information (file paths, URLs, server addresses, etc.). The future agent will not remember anything from the current conversation.

Schedule Formats

Format Example Description
Duration 30m, 2h, 1d One-shot delay from now
Interval every 30m, every 2h Recurring at fixed intervals
Cron 0 9 * * * Cron expression (requires croniter)
Timestamp 2026-02-03T14:00 One-shot at specific time

Repeat Options

repeat Behavior
(omitted) One-shot schedules run once; intervals/cron run forever
1 Run once then auto-delete
N Run N times then auto-delete

Running the Cron Daemon

Jobs are stored in ~/.hermes/cron/jobs.json and executed by a scheduler:

# Option 1: Built-in daemon (checks every 60 seconds)
python cli.py --cron-daemon

# Option 2: System cron integration (run once per minute)
# Add to crontab: crontab -e
*/1 * * * * cd ~/hermes-agent && python cli.py --cron-tick-once >> ~/.hermes/cron/cron.log 2>&1

Job Output

Job outputs are saved to ~/.hermes/cron/output/{job_id}/{timestamp}.md for review.

Interactive CLI

The CLI provides a rich interactive experience for working with the agent.

Running the CLI

# Basic usage
./hermes

# With specific model
./hermes --model "anthropic/claude-sonnet-4"

# With specific toolsets
./hermes --toolsets "web,terminal,skills"

CLI Commands

Command Description
/help Show available commands
/tools List available tools by toolset
/toolsets List available toolsets
/model [name] Show or change the current model
/prompt [text] View/set custom system prompt
/personality [name] Set a predefined personality
/clear Clear screen and reset conversation
/reset Reset conversation only
/history Show conversation history
/save Save current conversation to file
/config Show current configuration
/cron Manage scheduled tasks (list, add, remove)
/quit Exit the CLI

Configuration

Copy cli-config.yaml.example to cli-config.yaml and customize:

# Model settings
model:
  default: "anthropic/claude-sonnet-4"

# Terminal backend (local, docker, singularity, modal, or ssh)
terminal:
  env_type: "local"
  cwd: "."  # Use current directory

# Or use SSH for remote execution (keeps agent code isolated)
# terminal:
#   env_type: "ssh"
#   ssh_host: "my-server.example.com"
#   ssh_user: "myuser"
#   ssh_key: "~/.ssh/id_rsa"
#   cwd: "/home/myuser/project"

# Enable specific toolsets
toolsets:
  - all  # or: web, terminal, browser, vision, etc.

# Custom personalities (use with /personality command)
agent:
  personalities:
    helpful: "You are a helpful assistant."
    kawaii: "You are a kawaii assistant! Use cute expressions..."

Personalities

Built-in personalities available via /personality:

  • helpful, concise, technical, creative, teacher
  • kawaii, catgirl, pirate, shakespeare, surfer
  • noir, uwu, philosopher, hype

Toolsets System

The agent uses a toolsets system for organizing and managing tools. All tools must be part of a toolset to be accessible - individual tool selection is not supported. This ensures consistent and logical grouping of capabilities.

Key Concepts

  • Toolsets: Logical groups of tools for specific use cases (e.g., "research", "development", "debugging")
  • Composition: Toolsets can include other toolsets for powerful combinations
  • Custom Toolsets: Create your own toolsets at runtime or by editing toolsets.py
  • Toolset-Only Access: Tools are only accessible through toolsets, not individually

Available Toolsets

See toolsets.py for the complete list of predefined toolsets including:

  • Basic toolsets (web, terminal, vision, creative, reasoning)
  • Composite toolsets (research, development, analysis, etc.)
  • Scenario-specific toolsets (debugging, documentation, API testing, etc.)
  • Special toolsets (safe mode without terminal, minimal, offline)

Using Toolsets

# Use a predefined toolset
python run_agent.py --enabled_toolsets=research --query "Find latest AI papers"

# Combine multiple toolsets
python run_agent.py --enabled_toolsets=web,vision --query "Analyze this website"

# Enable all toolsets explicitly (same as omitting the flag)
python run_agent.py --enabled_toolsets=all --query "Do web research and run commands if helpful"

# Safe mode (no terminal access)
python run_agent.py --enabled_toolsets=safe --query "Help without running commands"

# List all available toolsets and tools
python run_agent.py --list_tools

See toolsets.py for the complete list of available toolsets and how to create custom ones.

Basic Usage

Default (all tools enabled)

# Uses OpenRouter by default - just set OPENROUTER_API_KEY in .env
python run_agent.py \
  --query "search up the latest docs on jit in python 3.13 and write me basic example that's not in their docs. profile its perf" \
  --max_turns 20 \
  --model anthropic/claude-sonnet-4-20250514

With specific toolset

python run_agent.py \
  --query "Debug this Python error" \
  --enabled_toolsets=debugging \
  --model anthropic/claude-sonnet-4-20250514

Python API

from run_agent import AIAgent

# Uses OpenRouter by default (reads OPENROUTER_API_KEY from .env)
agent = AIAgent(
    model="anthropic/claude-sonnet-4-20250514",
    enabled_toolsets=["research"]
)
response = agent.chat("Find information about quantum computing")

# Create custom toolset at runtime
from toolsets import create_custom_toolset

create_custom_toolset(
    name="my_tools",
    description="My custom toolkit",
    tools=["web_search"],
    includes=["terminal", "vision"]
)

agent = AIAgent(enabled_toolsets=["my_tools"])

Batch Processing

Process multiple prompts from a dataset in parallel with automatic checkpointing and statistics tracking:

# Basic batch processing
python batch_runner.py \
  --dataset_file=prompts.jsonl \
  --batch_size=20 \
  --run_name=my_run

# With specific distribution
python batch_runner.py \
  --dataset_file=prompts.jsonl \
  --batch_size=20 \
  --run_name=image_run \
  --distribution=image_gen \
  --num_workers=4

Key Features:

  • Parallel processing with configurable workers
  • Toolset distributions for varied data generation
  • Automatic checkpointing and resume capability
  • Combined output in data/<run_name>/trajectories.jsonl
  • Tool usage statistics and success rates

Use --list_distributions to see available toolset distributions for varied data generation.

Trajectory Compression

Post-process trajectories to fit within token budgets for training:

# Compress a directory of JSONL files
python trajectory_compressor.py --input=data/my_run

# Compress a single JSONL file
python trajectory_compressor.py --input=data/trajectories.jsonl

# Compress a 15% sample (useful for creating smaller training sets)
python trajectory_compressor.py --input=data/trajectories.jsonl --sample_percent=15

# Custom output and token target
python trajectory_compressor.py \
  --input=data/trajectories.jsonl \
  --output=data/compressed.jsonl \
  --target_max_tokens=16000

Features:

  • Protects first turns (system, human, first GPT response, first tool call)
  • Protects last N turns (configurable)
  • Summarizes middle turns using LLM to fit target token budget
  • Supports both directory and single file input
  • Optional random sampling with --sample_percent
  • Configurable via configs/trajectory_compression.yaml

Ephemeral System Prompts

The ephemeral system prompt feature allows you to guide the model's behavior during batch processing without saving that prompt to the training dataset trajectories. This is useful for:

  • Guiding model behavior during data collection
  • Adding task-specific instructions
  • Keeping saved trajectories clean and focused on tool-calling format

Example:

python batch_runner.py \
  --dataset_file=prompts.jsonl \
  --batch_size=10 \
  --run_name=my_run \
  --ephemeral_system_prompt="You are a helpful assistant focused on image generation."

The ephemeral prompt will influence the model's behavior during execution, but only the standard tool-calling system prompt will be saved in the trajectory files.

The ephemeral prompt influences model behavior during execution, but only the standard tool-calling system prompt is saved in trajectory files.

Command Line Arguments

Single Agent (run_agent.py):

  • --query: The question or task for the agent
  • --model: Model to use (default: claude-opus-4-20250514)
  • --api_key: API key for authentication
  • --base_url: API endpoint URL
  • --max_turns: Maximum number of tool-calling iterations
  • --enabled_toolsets: Comma-separated list of toolsets to enable. Use all (or *) to enable everything. If omitted, all toolsets are enabled by default.
  • --disabled_toolsets: Comma-separated list of toolsets to disable
  • --list_tools: List all available toolsets and tools
  • --save_trajectories: Save conversation trajectories to JSONL files

Batch Processing (batch_runner.py):

  • --dataset_file: Path to JSONL file with prompts
  • --batch_size: Number of prompts per batch
  • --run_name: Name for this run (for output/checkpointing)
  • --distribution: Toolset distribution to use (default: "default")
  • --num_workers: Number of parallel workers (default: 4)
  • --resume: Resume from checkpoint if interrupted
  • --ephemeral_system_prompt: System prompt used during execution but NOT saved to trajectories
  • --list_distributions: List available toolset distributions

Environment Variables

All environment variables can be configured in the .env file (copy from .env.example).

LLM Provider (OpenRouter):

  • OPENROUTER_API_KEY: Primary LLM access via OpenRouter (supports Claude, GPT-4, Gemini, etc.)
  • LLM_MODEL: Default model (e.g., anthropic/claude-sonnet-4, openai/gpt-4o)

Tool API Keys:

  • FIRECRAWL_API_KEY: Web tools (search, extract, crawl)
  • NOUS_API_KEY: Vision and reasoning tools
  • FAL_KEY: Image generation tools

Terminal Tool Configuration (mini-swe-agent backend):

  • TERMINAL_ENV: Backend type - local, docker, singularity, modal, or ssh (default: local)
  • TERMINAL_DOCKER_IMAGE: Docker image for docker backend (default: python:3.11-slim)
  • TERMINAL_SINGULARITY_IMAGE: Singularity/Apptainer image (can be docker://... URL or local .sif path)
  • TERMINAL_TIMEOUT: Command timeout in seconds (default: 60)
  • TERMINAL_LIFETIME_SECONDS: Cleanup inactive environments after this time (default: 300)
  • TERMINAL_CWD: Working directory inside containers (default: /tmp)
  • TERMINAL_SCRATCH_DIR: Custom scratch directory for sandbox storage (optional, auto-detects /scratch)
  • SUDO_PASSWORD: Enable sudo commands by piping password via sudo -S (works with all backends)
    • If unset in CLI mode, you'll be prompted interactively when sudo is needed (45s timeout)

SSH Backend Configuration (for remote execution):

  • TERMINAL_SSH_HOST: Remote server hostname or IP
  • TERMINAL_SSH_USER: SSH username
  • TERMINAL_SSH_PORT: SSH port (default: 22)
  • TERMINAL_SSH_KEY: Path to SSH private key (optional, uses ssh-agent if not set)

Context Compression (auto-shrinks long conversations):

  • CONTEXT_COMPRESSION_ENABLED: Enable auto-compression (default: true)
  • CONTEXT_COMPRESSION_THRESHOLD: Compress at this % of context limit (default: 0.85)
  • CONTEXT_COMPRESSION_MODEL: Model for generating summaries (default: google/gemini-2.0-flash-001)

Browser Tool Configuration (agent-browser + Browserbase):

  • BROWSERBASE_API_KEY: Browserbase API key for cloud browser execution
  • BROWSERBASE_PROJECT_ID: Browserbase project ID
  • BROWSER_SESSION_TIMEOUT: Session timeout in seconds (default: 300)

Legacy Hecate Terminal Backend (optional):

  • MORPH_API_KEY: For Hecate/MorphCloud terminal backend
  • HECATE_VM_LIFETIME_SECONDS: VM lifetime (default: 300)
  • HECATE_DEFAULT_SNAPSHOT_ID: Default snapshot (default: snapshot_p5294qxt)

Debug Options:

  • WEB_TOOLS_DEBUG, VISION_TOOLS_DEBUG, MOA_TOOLS_DEBUG, IMAGE_TOOLS_DEBUG: Enable debug logging

Key Files

File Purpose
hermes CLI launcher script (run with ./hermes)
cli.py Interactive CLI implementation
cli-config.yaml CLI configuration (copy from .example)
run_agent.py Main agent runner - single query execution
batch_runner.py Parallel batch processing with checkpointing
model_tools.py Core tool definitions and handlers
toolsets.py Toolset definitions and composition
toolset_distributions.py Probability distributions for data generation
trajectory_compressor.py Post-process trajectories for training
tools/ Individual tool implementations
tools/skills_tool.py Skills system with progressive disclosure
skills/ On-demand knowledge documents
docs/ Documentation
configs/ Example batch run scripts