# Hermes Agent — Sovereign Deployment Runbook

> **Goal**: A new VPS can go from bare OS to a running Hermes instance in under 30 minutes using only this document.

---

## Table of Contents

1. [Prerequisites](#1-prerequisites)
2. [Environment Setup](#2-environment-setup)
3. [Secret Injection](#3-secret-injection)
4. [Installation](#4-installation)
5. [Starting the Stack](#5-starting-the-stack)
6. [Health Checks](#6-health-checks)
7. [Stop / Restart Procedures](#7-stop--restart-procedures)
8. [Zero-Downtime Restart](#8-zero-downtime-restart)
9. [Rollback Procedure](#9-rollback-procedure)
10. [Database / State Migrations](#10-database--state-migrations)
11. [Docker Compose Deployment](#11-docker-compose-deployment)
12. [systemd Deployment](#12-systemd-deployment)
13. [Monitoring & Logs](#13-monitoring--logs)
14. [Security Checklist](#14-security-checklist)
15. [Troubleshooting](#15-troubleshooting)

---

## 1. Prerequisites

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| OS | Ubuntu 22.04 LTS | Ubuntu 24.04 LTS |
| RAM | 512 MB | 2 GB |
| CPU | 1 vCPU | 2 vCPU |
| Disk | 5 GB | 20 GB |
| Python | 3.11 | 3.12 |
| Node.js | 18 | 20 |
| Git | any | any |

**Optional but recommended:**

- Docker Engine ≥ 24 + Compose plugin (for containerised deployment)
- `curl`, `jq` (for health-check scripting)

---

## 2. Environment Setup

### 2a. Create a dedicated system user (bare-metal deployments)

```bash
sudo useradd -m -s /bin/bash hermes
sudo su - hermes
```

### 2b. Install Hermes

```bash
# Official one-liner installer
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Reload PATH so `hermes` is available
source ~/.bashrc
```

The installer places:

- The agent code at `~/.local/lib/python3.x/site-packages/` (pip editable install)
- The `hermes` entry point at `~/.local/bin/hermes`
- The default config directory at `~/.hermes/`

### 2c. Verify installation

```bash
hermes --version
hermes doctor
```

---

## 3. Secret Injection

**Rule: secrets never live in the repository. They live only in `~/.hermes/.env`.**

```bash
# Copy the template (do NOT edit the repo copy)
cp /path/to/hermes-agent/.env.example ~/.hermes/.env
chmod 600 ~/.hermes/.env

# Edit with your preferred editor
nano ~/.hermes/.env
```

### Minimum required keys

| Variable | Purpose | Where to get it |
|----------|---------|-----------------|
| `OPENROUTER_API_KEY` | LLM inference | https://openrouter.ai/keys |
| `TELEGRAM_BOT_TOKEN` | Telegram gateway | @BotFather on Telegram |

### Optional but common keys

| Variable | Purpose |
|----------|---------|
| `DISCORD_BOT_TOKEN` | Discord gateway |
| `SLACK_BOT_TOKEN` + `SLACK_APP_TOKEN` | Slack gateway |
| `EXA_API_KEY` | Web search tool |
| `FAL_KEY` | Image generation |
| `ANTHROPIC_API_KEY` | Direct Anthropic inference |

### Pre-flight validation

Before starting the stack, run from the repository root:

```bash
python scripts/deploy-validate --check-ports --skip-health
```

This catches missing keys, placeholder values, and misconfigurations without touching running services.

---

## 4. Installation

### 4a. Clone the repository (if not using the installer)

```bash
git clone https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent.git
cd hermes-agent
pip install -e ".[all]" --user
npm install
```

### 4b. Run the setup wizard

```bash
hermes setup
```

The wizard interactively configures your LLM provider, messaging platforms, and data directory.

---

## 5. Starting the Stack

### Bare-metal (foreground — useful for first run)

```bash
# Agent + gateway combined
hermes gateway start

# Or just the CLI agent (no messaging)
hermes
```

### Bare-metal (background daemon)

```bash
# nohup keeps the gateway alive if your SSH session drops
nohup hermes gateway start &
echo $! > ~/.hermes/gateway.pid
```

### Via systemd (recommended for production)

See [Section 12](#12-systemd-deployment).

### Via Docker Compose

See [Section 11](#11-docker-compose-deployment).

---

## 6. Health Checks

### 6a. API server liveness probe

The API server (enabled via the `api_server` platform in the gateway config) exposes `/health`:

```bash
curl -s http://127.0.0.1:8642/health | jq .
```

Expected response:

```json
{
  "status": "ok",
  "platform": "hermes-agent",
  "version": "0.5.0",
  "uptime_seconds": 123,
  "gateway_state": "running",
  "platforms": {
    "telegram": {"state": "connected"},
    "discord": {"state": "connected"}
  }
}
```

| Field | Meaning |
|-------|---------|
| `status` | `"ok"`: the HTTP server is alive. Any non-200 response means it is down. |
| `gateway_state` | `"running"`: all platforms started. `"starting"`: still initialising. |
| `platforms` | Per-adapter connection state. |

### 6b. Gateway runtime status file

```bash
jq '{state: .gateway_state, platforms: .platforms}' ~/.hermes/gateway_state.json
```

### 6c. Deploy-validate script

```bash
python scripts/deploy-validate
```

Runs all checks and prints a pass/fail summary. Exit code 0 means healthy.

### 6d. systemd health

```bash
systemctl status hermes-gateway
journalctl -u hermes-gateway --since "5 minutes ago"
```

---

## 7. Stop / Restart Procedures

### Graceful stop

```bash
# systemd
sudo systemctl stop hermes-gateway

# Docker Compose
docker compose -f deploy/docker-compose.yml down

# Process signal (if running ad-hoc)
kill -TERM "$(cat ~/.hermes/gateway.pid)"
```

### Restart

```bash
# systemd
sudo systemctl restart hermes-gateway

# Docker Compose
docker compose -f deploy/docker-compose.yml restart hermes

# Ad-hoc
hermes gateway start --replace
```

The `--replace` flag removes stale PID/lock files left by an unclean shutdown before starting.

---

## 8. Zero-Downtime Restart

Hermes is a stateful, long-running process (persistent sessions, active cron jobs), so true zero-downtime requires careful sequencing.

### Strategy A — systemd rolling restart (recommended)

systemd's `Restart=on-failure` with a 5-second back-off restarts the gateway automatically after a crash.
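If you want to pin those restart settings without editing the shipped unit file, a systemd drop-in (`<unit>.service.d/*.conf`, merged over the main unit) is the usual mechanism. The sketch below assumes the values this section describes (`Restart=on-failure`, a 5-second `RestartSec`, `TimeoutStopSec=30`); the function name `write_restart_dropin` and the file name `10-restart.conf` are hypothetical, and the unit directory is a parameter so you can try it against a scratch directory first.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Hermes): writes a systemd drop-in that
# pins the restart policy described in this runbook for hermes-gateway.
write_restart_dropin() {
  # $1: unit directory; defaults to the system unit path (needs root)
  local unit_dir="${1:-/etc/systemd/system}"
  local dropin_dir="${unit_dir}/hermes-gateway.service.d"

  mkdir -p "$dropin_dir"
  # Quoted 'EOF' prevents any variable expansion inside the unit text.
  cat > "${dropin_dir}/10-restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
EOF
  # Print the path so callers can inspect or version the file.
  echo "${dropin_dir}/10-restart.conf"
}
```

Because it is a shell function, `sudo write_restart_dropin` will not work; run it from a root shell (`sudo -i`), then `systemctl daemon-reload` and confirm the merged values with `systemctl show hermes-gateway -p Restart -p RestartUSec -p TimeoutStopUSec`.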
For intentional restarts, use:

```bash
sudo systemctl reload-or-restart hermes-gateway
```

`hermes-gateway.service` sets `TimeoutStopSec=30`, which gives in-flight agent turns up to 30 seconds to finish before systemd force-kills the old process.

> **Note:** Active messaging conversations will see a brief pause (< 30 s) while the gateway reconnects to platforms. The session store is file-based and persists across restarts, so conversations resume where they left off.

### Strategy B — Blue/green with two HERMES_HOME directories

For zero-downtime where even a brief pause is unacceptable:

```bash
# 1. Prepare the new environment (different HERMES_HOME) with blue's secrets
export HERMES_HOME=/home/hermes/.hermes-green
mkdir -p "$HERMES_HOME"
cp ~/.hermes/.env "$HERMES_HOME/.env"
hermes setup   # configure the green environment

# 2. Start green on a different port (e.g. 8643) and record its PID
API_SERVER_PORT=8643 hermes gateway start &
echo $! > "$HERMES_HOME/gateway.pid"

# 3. Verify green is healthy
curl -s http://127.0.0.1:8643/health | jq .gateway_state

# 4. Switch the load balancer (nginx/caddy) to port 8643

# 5. Gracefully stop blue (its PID file is still in the original ~/.hermes)
kill -TERM "$(cat ~/.hermes/gateway.pid)"
```

### Strategy C — Docker Compose rolling update

```bash
# Pull the new image
docker compose -f deploy/docker-compose.yml pull hermes

# Recreate only the hermes service, leaving its dependencies untouched
docker compose -f deploy/docker-compose.yml up -d --no-deps hermes
```

Plain `docker compose up -d` stops the old container before starting the new one, so a single-replica setup still sees a short outage; the healthcheck only controls when the new container is reported healthy. A true rolling update needs several replicas behind a reverse proxy, or Docker Swarm with `update_config: order: start-first`.

---

## 9. Rollback Procedure

### 9a. Code rollback (pip install)

```bash
# From your hermes-agent checkout: list the ten most recent version tags
git tag --sort=-v:refname | head -10

# Roll back to a specific tag
git checkout v0.4.0
pip install -e ".[all]" --user --quiet

# Restart the gateway
sudo systemctl restart hermes-gateway
```

### 9b. Docker image rollback

```bash
# Pull a specific version
docker pull ghcr.io/nousresearch/hermes-agent:v0.4.0

# Update the image tag in docker-compose.yml, then:
docker compose -f deploy/docker-compose.yml up -d
```

### 9c. State / data rollback

The data directory (`~/.hermes/` or the Docker volume `hermes_data`) contains sessions, memories, cron jobs, and the response store. Back it up before every update, ideally with the gateway stopped so the SQLite response store is captured in a consistent state:

```bash
# Backup (run BEFORE updating). -C ~ stores paths as ".hermes/..." rather than
# "home/<user>/.hermes/...", so the restore below lands in the right place.
mkdir -p ~/backups
tar czf ~/backups/hermes_data_$(date +%F_%H%M).tar.gz -C ~ .hermes

# Restore from backup
sudo systemctl stop hermes-gateway
rm -rf ~/.hermes/
tar xzf ~/backups/hermes_data_2026-04-06_1200.tar.gz -C ~
sudo systemctl start hermes-gateway
```

> **Tested rollback**: The rollback procedure above was validated in staging on 2026-04-06. Data integrity was confirmed by checking session count before/after: `ls ~/.hermes/sessions/ | wc -l`.

---

## 10. Database / State Migrations

Hermes keeps its persistent state in the following stores:

| Store | Location | Format |
|-------|----------|--------|
| Session store | `~/.hermes/sessions/*.json` | JSON files |
| Response store (API server) | `~/.hermes/response_store.db` | SQLite WAL |
| Gateway state | `~/.hermes/gateway_state.json` | JSON |
| Memories | `~/.hermes/memories/*.md` | Markdown files |
| Cron jobs | `~/.hermes/cron/*.json` | JSON files |

### Migration steps (between versions)

1. **Stop** the gateway before migrating.
2. **Back up** the data directory (see Section 9c).
3. **Check the release notes** for migration instructions (see `RELEASE_*.md`).
4. **Run** `hermes doctor` after installing and starting the new version; it validates state compatibility.
5. **Verify** health via `python scripts/deploy-validate`.

There are currently no SQL migrations to run manually. The SQLite schema is created automatically on first use with `CREATE TABLE IF NOT EXISTS`.

---

## 11. Docker Compose Deployment

### First-time setup

```bash
# 1. Copy .env.example to .env in the repo root
cp .env.example .env
chmod 600 .env
nano .env   # fill in your API keys

# 2. Validate config before starting
python scripts/deploy-validate --skip-health

# 3. Start the stack
docker compose -f deploy/docker-compose.yml up -d

# 4. Watch startup logs (Ctrl-C stops following; the stack keeps running)
docker compose -f deploy/docker-compose.yml logs -f

# 5. Verify health
curl -s http://127.0.0.1:8642/health | jq .
```

### Updating to a new version

```bash
# Pull latest image
docker compose -f deploy/docker-compose.yml pull

# Recreate the container (the old one is stopped first, so expect a brief restart)
docker compose -f deploy/docker-compose.yml up -d

# Watch logs
docker compose -f deploy/docker-compose.yml logs -f --since 2m
```

### Data backup (Docker)

Compose usually prefixes named volumes with the project name (for example `deploy_hermes_data`) unless the volume declares an explicit `name:`, so confirm the real name with `docker volume ls` first.

```bash
mkdir -p backups
docker run --rm \
  -v hermes_data:/data \
  -v "$(pwd)/backups":/backup \
  alpine tar czf /backup/hermes_data_$(date +%F).tar.gz /data
```

---

## 12. systemd Deployment

### Install unit files

```bash
# From the repo root
sudo cp deploy/hermes-agent.service /etc/systemd/system/
sudo cp deploy/hermes-gateway.service /etc/systemd/system/
sudo systemctl daemon-reload

# Enable on boot + start now
sudo systemctl enable --now hermes-gateway

# (Optional) also run the CLI agent as a background service
# sudo systemctl enable --now hermes-agent
```

### Adjust the unit file for your user/paths

Edit `/etc/systemd/system/hermes-gateway.service`:

```ini
[Service]
# Change User= from 'hermes' to your account. systemd only treats whole lines
# starting with '#' or ';' as comments, so never put a comment after a value.
User=youruser
WorkingDirectory=/home/youruser
EnvironmentFile=/home/youruser/.hermes/.env
ExecStart=/home/youruser/.local/bin/hermes gateway start --replace
```

Then:

```bash
sudo systemctl daemon-reload
sudo systemctl restart hermes-gateway
```

### Verify

```bash
systemctl status hermes-gateway
journalctl -u hermes-gateway -f
```

---

## 13. Monitoring & Logs

### Log locations

| Log | Location |
|-----|----------|
| Gateway (systemd) | `journalctl -u hermes-gateway` |
| Gateway (Docker) | `docker compose logs hermes` |
| Session trajectories | `~/.hermes/logs/session_*.json` |
| Deploy events | `~/.hermes/logs/deploy.log` |
| Runtime state | `~/.hermes/gateway_state.json` |

### Useful log commands

```bash
# Last 100 lines, follow
journalctl -u hermes-gateway -n 100 -f

# Errors only
journalctl -u hermes-gateway -p err --since today

# Docker: logs with timestamps
docker compose -f deploy/docker-compose.yml logs --timestamps hermes
```

### Alerting

Add a cron job on the host that emails you when the health check fails:

```bash
# /etc/cron.d/hermes-healthcheck
# cron has no line continuation, so the entry must stay on a single line.
# Requires a working `mail` command (e.g. from the mailutils package).
* * * * * root curl -sf --max-time 10 http://127.0.0.1:8642/health > /dev/null || echo "Hermes unhealthy at $(date)" | mail -s "ALERT: Hermes down" ops@example.com
```

---

## 14. Security Checklist

- [ ] `.env` has permissions `600` and is **not** tracked by git (`git ls-files .env` returns nothing).
- [ ] `API_SERVER_KEY` is set if the API server is exposed beyond `127.0.0.1`.
- [ ] API server is bound to `127.0.0.1` (not `0.0.0.0`) unless behind a TLS-terminating reverse proxy.
- [ ] Firewall allows only the ports your platforms require (no unnecessary open ports).
- [ ] systemd unit uses `NoNewPrivileges=true`, `PrivateTmp=true`, `ProtectSystem=strict`.
- [ ] Docker container has resource limits set (`deploy.resources.limits`).
- [ ] Backups of `~/.hermes/` are stored outside the server (e.g. S3, remote NAS).
- [ ] `hermes doctor` returns no errors on the running instance.
- [ ] `python scripts/deploy-validate` exits 0 after every configuration change.

---

## 15. Troubleshooting

### Gateway won't start

```bash
hermes gateway start --replace   # clears stale PID files

# Check for port conflicts (sudo shows the owning process)
sudo ss -tlnp | grep 8642

# Verbose logs
HERMES_LOG_LEVEL=DEBUG hermes gateway start
```

### Health check returns `gateway_state: "starting"` for more than 60 s

Platform adapters need time to authenticate (especially Telegram and Discord), but a gateway stuck in `starting` for over a minute usually points to an authentication failure. Check the logs for auth errors:

```bash
journalctl -u hermes-gateway --since "2 minutes ago" | grep -iE "error|token|auth"
```

### `/health` returns connection refused

The API server platform may not be enabled. Verify that your gateway config (`~/.hermes/config.yaml`) includes:

```yaml
gateway:
  platforms:
    - api_server
```

### Rollback needed after failed update

See [Section 9](#9-rollback-procedure). If you backed up before updating, rollback takes < 5 minutes.

### Sessions lost after restart

Sessions are file-based in `~/.hermes/sessions/` and persist across restarts. If they are gone, check:

```bash
ls -la ~/.hermes/sessions/

# Verify the volume is mounted (Docker):
docker exec hermes-agent ls /opt/data/sessions/
```

---

*This runbook is owned by the Bezalel epic backlog. Update it whenever deployment procedures change.*