Hermes Agent — Sovereign Deployment Runbook
Goal: A new VPS can go from bare OS to a running Hermes instance in under 30 minutes using only this document.
Table of Contents
- Prerequisites
- Environment Setup
- Secret Injection
- Installation
- Starting the Stack
- Health Checks
- Stop / Restart Procedures
- Zero-Downtime Restart
- Rollback Procedure
- Database / State Migrations
- Docker Compose Deployment
- systemd Deployment
- Monitoring & Logs
- Security Checklist
- Troubleshooting
1. Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Ubuntu 22.04 LTS | Ubuntu 24.04 LTS |
| RAM | 512 MB | 2 GB |
| CPU | 1 vCPU | 2 vCPU |
| Disk | 5 GB | 20 GB |
| Python | 3.11 | 3.12 |
| Node.js | 18 | 20 |
| Git | any | any |
Optional but recommended:
- Docker Engine ≥ 24 + Compose plugin (for containerised deployment)
- curl, jq (for health-check scripting)
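The minimum versions in the table can be checked by a script before the installer runs. The sketch below is illustrative only: `parse_version`, `meets_minimum`, and the `MINIMUMS` mapping are hypothetical names, not part of Hermes, and it assumes tools print plain dotted version numbers such as `Python 3.12.3` or `v20.11.0`.

```python
import re

# Minimum versions taken from the prerequisites table.
MINIMUMS = {"python": "3.11", "node": "18"}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted number from strings like 'v20.11.0'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

Comparing integer tuples rather than strings matters here: as text, "3.9" sorts after "3.11", which would wrongly accept an outdated Python.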
2. Environment Setup
2a. Create a dedicated system user (bare-metal deployments)
sudo useradd -m -s /bin/bash hermes
sudo su - hermes
2b. Install Hermes
# Official one-liner installer
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# Reload PATH so `hermes` is available
source ~/.bashrc
The installer places:
- The agent code at ~/.local/lib/python3.x/site-packages/ (pip editable install)
- The hermes entry point at ~/.local/bin/hermes
- Default config directory at ~/.hermes/
2c. Verify installation
hermes --version
hermes doctor
3. Secret Injection
Rule: secrets never live in the repository. They live only in ~/.hermes/.env.
# Copy the template (do NOT edit the repo copy)
cp /path/to/hermes-agent/.env.example ~/.hermes/.env
chmod 600 ~/.hermes/.env
# Edit with your preferred editor
nano ~/.hermes/.env
Minimum required keys
| Variable | Purpose | Where to get it |
|---|---|---|
| OPENROUTER_API_KEY | LLM inference | https://openrouter.ai/keys |
| TELEGRAM_BOT_TOKEN | Telegram gateway | @BotFather on Telegram |
Optional but common keys
| Variable | Purpose |
|---|---|
| DISCORD_BOT_TOKEN | Discord gateway |
| SLACK_BOT_TOKEN + SLACK_APP_TOKEN | Slack gateway |
| EXA_API_KEY | Web search tool |
| FAL_KEY | Image generation |
| ANTHROPIC_API_KEY | Direct Anthropic inference |
Pre-flight validation
Before starting the stack, run:
python scripts/deploy-validate --check-ports --skip-health
This catches missing keys, placeholder values, and misconfigurations without touching running services.
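A rough sketch of the kind of check this pre-flight step performs is shown below. It is not the actual scripts/deploy-validate implementation: the `parse_env` and `find_env_problems` helpers and the placeholder markers are assumptions made for illustration, and only the two required key names come from the table above.

```python
REQUIRED_KEYS = ("OPENROUTER_API_KEY", "TELEGRAM_BOT_TOKEN")
# Assumed placeholder markers; the real .env.example may use different ones.
PLACEHOLDER_MARKERS = ("your_", "changeme", "xxx", "<")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_env_problems(text: str) -> list[str]:
    """Return one message per required key that is missing or a placeholder."""
    env = parse_env(text)
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "")
        if not value:
            problems.append(f"{key}: missing or empty")
        elif any(marker in value.lower() for marker in PLACEHOLDER_MARKERS):
            problems.append(f"{key}: looks like a placeholder")
    return problems
```

Passing the contents of ~/.hermes/.env to `find_env_problems` returns an empty list when both required keys hold real-looking values.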
4. Installation
4a. Clone the repository (if not using the installer)
git clone https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent.git
cd hermes-agent
pip install -e ".[all]" --user
npm install
4b. Run the setup wizard
hermes setup
The wizard configures your LLM provider, messaging platforms, and data directory interactively.
5. Starting the Stack
Bare-metal (foreground — useful for first run)
# Agent + gateway combined
hermes gateway start
# Or just the CLI agent (no messaging)
hermes
Bare-metal (background daemon)
hermes gateway start &
echo $! > ~/.hermes/gateway.pid
Via systemd (recommended for production)
See Section 12.
Via Docker Compose
See Section 11.
6. Health Checks
6a. API server liveness probe
The API server (enabled via api_server platform in gateway config) exposes /health:
curl -s http://127.0.0.1:8642/health | jq .
Expected response:
{
"status": "ok",
"platform": "hermes-agent",
"version": "0.5.0",
"uptime_seconds": 123,
"gateway_state": "running",
"platforms": {
"telegram": {"state": "connected"},
"discord": {"state": "connected"}
}
}
| Field | Meaning |
|---|---|
| status | "ok": HTTP server is alive. Any non-200 response means down. |
| gateway_state | "running": all platforms started. "starting": still initialising. |
| platforms | Per-adapter connection state. |
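For scripted checks, the three fields above can be reduced to a single ready/not-ready answer. The following is a minimal sketch, assuming the payload shape shown in the expected response; `summarize_health` is a hypothetical helper, and "connected" is the only platform state this runbook documents, so any other value is treated as not connected.

```python
def summarize_health(payload: dict) -> dict:
    """Reduce a /health payload to a ready flag plus platforms not connected."""
    platforms = payload.get("platforms", {})
    not_connected = sorted(
        name
        for name, info in platforms.items()
        if info.get("state") != "connected"
    )
    ready = (
        payload.get("status") == "ok"
        and payload.get("gateway_state") == "running"
        and not not_connected
    )
    return {"ready": ready, "not_connected": not_connected}
```

A gateway that is still initialising, or one whose Discord adapter has dropped, yields `ready` set to False, with the affected adapters listed by name.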
6b. Gateway runtime status file
jq '{state: .gateway_state, platforms: .platforms}' ~/.hermes/gateway_state.json
6c. Deploy-validate script
python scripts/deploy-validate
Runs all checks and prints a pass/fail summary. Exit code 0 = healthy.
6d. systemd health
systemctl status hermes-gateway
journalctl -u hermes-gateway --since "5 minutes ago"
7. Stop / Restart Procedures
Graceful stop
# systemd
sudo systemctl stop hermes-gateway
# Docker Compose
docker compose -f deploy/docker-compose.yml down
# Process signal (if running ad-hoc)
kill -TERM $(cat ~/.hermes/gateway.pid)
Restart
# systemd
sudo systemctl restart hermes-gateway
# Docker Compose
docker compose -f deploy/docker-compose.yml restart hermes
# Ad-hoc
hermes gateway start --replace
The --replace flag removes stale PID/lock files from an unclean shutdown before starting.
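How Hermes decides that a PID file is stale is not documented in this runbook. The sketch below shows the general technique on Linux, sending signal 0 to test whether the recorded process still exists; `pid_file_is_stale` is a hypothetical helper, not a Hermes function.

```python
import os
from pathlib import Path


def pid_file_is_stale(pid_path: Path) -> bool:
    """Return True if the PID file exists but points at no live process."""
    if not pid_path.exists():
        return False  # nothing to clean up
    try:
        pid = int(pid_path.read_text().strip())
    except ValueError:
        return True  # unreadable contents count as stale
    if pid <= 0:
        return True
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return True  # no such process
    except PermissionError:
        return False  # process exists but belongs to another user
    return False
```

Running this against ~/.hermes/gateway.pid before an ad-hoc start shows whether a leftover file from an unclean shutdown is present. A reused PID can still produce a false "alive" result, which is a known limit of this approach.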
8. Zero-Downtime Restart
Hermes is a stateful long-running process (persistent sessions, active cron jobs). True zero-downtime requires careful sequencing.
Strategy A — systemd rolling restart (recommended)
systemd's Restart=on-failure with a 5-second back-off ensures automatic recovery from crashes. For intentional restarts, use:
sudo systemctl reload-or-restart hermes-gateway
hermes-gateway.service uses TimeoutStopSec=30 so in-flight agent turns finish before the old process dies.
Note: Active messaging conversations will see a brief pause (< 30 s) while the gateway reconnects to platforms. The session store is file-based and persists across restarts — conversations resume where they left off.
Strategy B — Blue/green with two HERMES_HOME directories
For zero-downtime where even a brief pause is unacceptable:
# 1. Prepare the new environment (different HERMES_HOME)
export HERMES_HOME=/home/hermes/.hermes-green
hermes setup # configure green env with same .env
# 2. Start green on a different port (e.g. 8643)
API_SERVER_PORT=8643 hermes gateway start &
# 3. Verify green is healthy
curl -s http://127.0.0.1:8643/health | jq .gateway_state
# 4. Switch load balancer (nginx/caddy) to port 8643
# 5. Gracefully stop blue
kill -TERM $(cat ~/.hermes/gateway.pid)
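Step 3 of Strategy B can be scripted so that the load balancer is switched only once green reports "running". The sketch below is one way to do that with the standard library; `wait_until_running` and `http_probe` are hypothetical helpers, and the 60-second timeout and 2-second interval are assumed defaults rather than values taken from Hermes.

```python
import json
import time
import urllib.request
from typing import Callable


def http_probe(url: str = "http://127.0.0.1:8643/health") -> str:
    """Fetch /health and return gateway_state ("" if the field is absent)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp).get("gateway_state", "")


def wait_until_running(
    probe: Callable[[], str] = http_probe,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns "running" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe() == "running":
                return True
        except (OSError, ValueError):
            pass  # connection refused or invalid JSON while starting up
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The probe and sleep functions are parameters so the loop can be exercised without a running gateway. In a deployment script, a False return would be the signal to leave blue in place and investigate green.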
Strategy C — Docker Compose rolling update
# Pull the new image
docker compose -f deploy/docker-compose.yml pull hermes
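# Recreate only the hermes service without touching its dependencies
docker compose -f deploy/docker-compose.yml up -d --no-deps hermes
Plain Docker Compose stops the old container before starting its replacement, so a single-container service sees a brief gap. Start-first behaviour, where the old container is kept until the new one passes its healthcheck, needs a replicated setup behind an orchestrator or load balancer.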
9. Rollback Procedure
9a. Code rollback (pip install)
# Find the previous version tag
git tag --sort=-creatordate | head -10
# Roll back to a specific tag
git checkout v0.4.0
pip install -e ".[all]" --user --quiet
# Restart the gateway
sudo systemctl restart hermes-gateway
9b. Docker image rollback
# Pull a specific version
docker pull ghcr.io/nousresearch/hermes-agent:v0.4.0
# Update docker-compose.yml image tag, then:
docker compose -f deploy/docker-compose.yml up -d
9c. State / data rollback
The data directory (~/.hermes/ or the Docker volume hermes_data) contains sessions, memories, cron jobs, and the response store. Back it up before every update:
# Backup (run BEFORE updating); -C ~ stores paths relative to $HOME so the restore extracts into place
mkdir -p ~/backups
tar czf ~/backups/hermes_data_$(date +%F_%H%M).tar.gz -C ~ .hermes
# Restore from backup
sudo systemctl stop hermes-gateway
rm -rf ~/.hermes/
tar xzf ~/backups/hermes_data_2026-04-06_1200.tar.gz -C ~/
sudo systemctl start hermes-gateway
Tested rollback: The rollback procedure above was validated in staging on 2026-04-06. Data integrity was confirmed by checking session count before/after:
ls ~/.hermes/sessions/ | wc -l.
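The same before/after comparison can be made against the backup archive itself, without restoring it first. This is a minimal sketch assuming session files are named *.json inside a sessions/ directory, as listed in Section 10; the two helper functions are hypothetical. Unlike `ls | wc -l`, it counts only JSON files.

```python
import tarfile
from pathlib import Path


def count_sessions_on_disk(hermes_home: Path) -> int:
    """Count session JSON files under <hermes_home>/sessions/."""
    return sum(1 for _ in (hermes_home / "sessions").glob("*.json"))


def count_sessions_in_backup(archive: Path) -> int:
    """Count session JSON files stored in a .tar.gz backup."""
    with tarfile.open(archive, "r:gz") as tar:
        return sum(
            1
            for member in tar.getmembers()
            if member.isfile()
            and Path(member.name).parent.name == "sessions"
            and member.name.endswith(".json")
        )
```

If the two counts differ straight after a backup, the archive is incomplete and should not be relied on for a rollback.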
10. Database / State Migrations
Hermes keeps persistent state in five stores:
| Store | Location | Format |
|---|---|---|
| Session store | ~/.hermes/sessions/*.json | JSON files |
| Response store (API server) | ~/.hermes/response_store.db | SQLite WAL |
| Gateway state | ~/.hermes/gateway_state.json | JSON |
| Memories | ~/.hermes/memories/*.md | Markdown files |
| Cron jobs | ~/.hermes/cron/*.json | JSON files |
Migration steps (between versions)
- Stop the gateway before migrating.
- Back up the data directory (see Section 9c).
- Check release notes for migration instructions (see RELEASE_*.md).
- Run hermes doctor after starting the new version; it validates state compatibility.
- Verify health via python scripts/deploy-validate.
There are currently no SQL migrations to run manually. The SQLite schema is
created automatically on first use with CREATE TABLE IF NOT EXISTS.
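The pattern described here, a schema created on first use and left untouched afterwards, looks roughly like the sketch below. The `responses` table and its columns are hypothetical, since the real response store schema is not documented in this runbook; only the WAL journal mode and the CREATE TABLE IF NOT EXISTS idiom come from the text above.

```python
import sqlite3


def open_store(db_path: str) -> sqlite3.Connection:
    """Open the database, enable WAL, and create the schema if absent."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical table; stands in for whatever the response store defines.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        " id TEXT PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " created_at REAL NOT NULL)"
    )
    conn.commit()
    return conn
```

Because the statement is a no-op when the table already exists, calling `open_store` on every start is safe. It does not alter an existing table, so a release that changes column definitions would still need explicit migration steps.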
11. Docker Compose Deployment
First-time setup
# 1. Copy .env.example to .env in the repo root
cp .env.example .env
nano .env # fill in your API keys
# 2. Validate config before starting
python scripts/deploy-validate --skip-health
# 3. Start the stack
docker compose -f deploy/docker-compose.yml up -d
# 4. Watch startup logs
docker compose -f deploy/docker-compose.yml logs -f
# 5. Verify health
curl -s http://127.0.0.1:8642/health | jq .
Updating to a new version
# Pull latest image
docker compose -f deploy/docker-compose.yml pull
# Recreate the container from the new image (expect a brief gap while it is replaced)
docker compose -f deploy/docker-compose.yml up -d
# Watch logs
docker compose -f deploy/docker-compose.yml logs -f --since 2m
Data backup (Docker)
docker run --rm \
-v hermes_data:/data \
-v $(pwd)/backups:/backup \
alpine tar czf /backup/hermes_data_$(date +%F).tar.gz /data
12. systemd Deployment
Install unit files
# From the repo root
sudo cp deploy/hermes-agent.service /etc/systemd/system/
sudo cp deploy/hermes-gateway.service /etc/systemd/system/
sudo systemctl daemon-reload
# Enable on boot + start now
sudo systemctl enable --now hermes-gateway
# (Optional) also run the CLI agent as a background service
# sudo systemctl enable --now hermes-agent
Adjust the unit file for your user/paths
Edit /etc/systemd/system/hermes-gateway.service:
[Service]
# Change from 'hermes' (systemd does not allow trailing comments on a directive line)
User=youruser
WorkingDirectory=/home/youruser
EnvironmentFile=/home/youruser/.hermes/.env
ExecStart=/home/youruser/.local/bin/hermes gateway start --replace
Then:
sudo systemctl daemon-reload
sudo systemctl restart hermes-gateway
Verify
systemctl status hermes-gateway
journalctl -u hermes-gateway -f
13. Monitoring & Logs
Log locations
| Log | Location |
|---|---|
| Gateway (systemd) | journalctl -u hermes-gateway |
| Gateway (Docker) | docker compose logs hermes |
| Session trajectories | ~/.hermes/logs/session_*.json |
| Deploy events | ~/.hermes/logs/deploy.log |
| Runtime state | ~/.hermes/gateway_state.json |
Useful log commands
# Last 100 lines, follow
journalctl -u hermes-gateway -n 100 -f
# Errors only
journalctl -u hermes-gateway -p err --since today
# Docker: structured logs with timestamps
docker compose -f deploy/docker-compose.yml logs --timestamps hermes
Alerting
Add a cron job on the host to page you if the health check fails:
# /etc/cron.d/hermes-healthcheck
# Each cron entry must stay on one line; backslash continuation is not supported.
* * * * * root curl -sf http://127.0.0.1:8642/health > /dev/null || echo "Hermes unhealthy at $(date)" | mail -s "ALERT: Hermes down" ops@example.com
14. Security Checklist
- .env has permissions 600 and is not tracked by git (git ls-files .env returns nothing).
- API_SERVER_KEY is set if the API server is exposed beyond 127.0.0.1.
- API server is bound to 127.0.0.1 (not 0.0.0.0) unless behind a TLS-terminating reverse proxy.
- Firewall allows only the ports your platforms require (no unnecessary open ports).
- systemd unit uses NoNewPrivileges=true, PrivateTmp=true, ProtectSystem=strict.
- Docker container has resource limits set (deploy.resources.limits).
- Backups of ~/.hermes/ are stored outside the server (e.g. S3, remote NAS).
- hermes doctor returns no errors on the running instance.
- python scripts/deploy-validate exits 0 after every configuration change.
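The first item on the checklist is easy to automate. The sketch below checks only the permission bits, on a POSIX filesystem; `env_file_is_private` is a hypothetical helper, and the git-tracking half of the item still needs the git ls-files check.

```python
import stat
from pathlib import Path


def env_file_is_private(env_path: Path) -> bool:
    """Return True when the file mode is exactly 0600 (owner read/write)."""
    mode = stat.S_IMODE(env_path.stat().st_mode)
    return mode == 0o600
```

Calling it with `Path.home() / ".hermes" / ".env"` after each configuration change catches an editor or copy command that silently widened the permissions.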
15. Troubleshooting
Gateway won't start
hermes gateway start --replace # clears stale PID files
# Check for port conflicts
ss -tlnp | grep 8642
# Verbose logs
HERMES_LOG_LEVEL=DEBUG hermes gateway start
Health check returns gateway_state: "starting" for more than 60 s
Platform adapters take time to authenticate (especially Telegram + Discord). Check logs for auth errors:
journalctl -u hermes-gateway --since "2 minutes ago" | grep -i "error\|token\|auth"
/health returns connection refused
The API server platform may not be enabled. Verify your gateway config (~/.hermes/config.yaml) includes:
gateway:
platforms:
- api_server
Rollback needed after failed update
See Section 9. If you backed up before updating, rollback takes < 5 minutes.
Sessions lost after restart
Sessions are file-based in ~/.hermes/sessions/. They persist across restarts. If they are gone, check:
ls -la ~/.hermes/sessions/
# Verify the volume is mounted (Docker):
docker exec hermes-agent ls /opt/data/sessions/
This runbook is owned by the Bezalel epic backlog. Update it whenever deployment procedures change.