6.9 KiB
Forge Operations Guide
Audience: Forge wizards joining the hermes-agent project Purpose: Practical patterns, common pitfalls, and operational wisdom Companion to:
WIZARD_ENVIRONMENT_CONTRACT.md
The One Rule
Read the actual state before acting.
Before touching any service, config, or codebase: ps aux | grep hermes, cat ~/.hermes/gateway_state.json, curl http://127.0.0.1:8642/health. The forge punishes assumptions harder than it rewards speed. Evidence always beats intuition.
First 15 Minutes on a New System
# 1. Validate your environment
python wizard-bootstrap/wizard_bootstrap.py
# 2. Check what is actually running
ps aux | grep -E 'hermes|python|gateway'
# 3. Check the data directory
ls -la ~/.hermes/
cat ~/.hermes/gateway_state.json 2>/dev/null | python3 -m json.tool
# 4. Verify health endpoints (if gateway is up)
curl -sf http://127.0.0.1:8642/health | python3 -m json.tool
# 5. Run the smoke test
source venv/bin/activate
python -m pytest tests/ -q -x --timeout=60 2>&1 | tail -20
Do not begin work until all five steps return clean output.
Import Chain — Know It, Respect It
The dependency order is load-bearing. Violating it causes silent failures:
tools/registry.py ← no deps; imported by everything
↑
tools/*.py ← each calls registry.register() at import time
↑
model_tools.py ← imports registry; triggers tool discovery
↑
run_agent.py / cli.py / batch_runner.py
If you add a tool file, you must also:
- Add its import to
model_tools.py_discover_tools() - Add it to
toolsets.py(core or a named toolset)
Missing either step causes the tool to silently not appear — no error, just absence.
The Five Profile Rules
Hermes supports isolated profiles (hermes -p myprofile). Profile-unsafe code has caused repeated bugs. Memorize these:
| Do this | Not this |
|---|---|
get_hermes_home() |
Path.home() / ".hermes" |
display_hermes_home() in user messages |
hardcoded ~/.hermes strings |
get_hermes_home() / "sessions" in tests |
~/.hermes/sessions in tests |
Import both from hermes_constants. Every ~/.hermes hardcode is a latent profile bug.
Prompt Caching — Do Not Break It
The agent caches system prompts. Cache breaks force re-billing of the entire context window on every turn. The following actions break caching mid-conversation and are forbidden:
- Altering past context
- Changing the active toolset
- Reloading memories or rebuilding the system prompt
The only sanctioned context alteration is the context compressor (agent/context_compressor.py). If your feature touches the message history, read that file first.
Adding a Slash Command (Checklist)
Four files, in order:
hermes_cli/commands.py— addCommandDeftoCOMMAND_REGISTRYcli.py— add handler branch inHermesCLI.process_command()gateway/run.py— add handler if it should work in messaging platforms- Aliases — add to the
aliasestuple on theCommandDef; everything else updates automatically
All downstream consumers (Telegram menu, Slack routing, autocomplete, help text) derive from COMMAND_REGISTRY. You never touch them directly.
Tool Schema Pitfalls
Do NOT cross-reference other toolsets in schema descriptions.
Writing "prefer web_search over this tool" in a browser tool's description will cause the model to hallucinate calls to web_search when it's not loaded. Cross-references belong in get_tool_definitions() post-processing blocks in model_tools.py.
Do NOT use \033[K (ANSI erase-to-EOL) in display code.
Under prompt_toolkit's patch_stdout, it leaks as literal ?[K. Use space-padding instead: f"\r{line}{' ' * pad}".
Do NOT use simple_term_menu for interactive menus.
It ghosts on scroll in tmux/iTerm2. Use curses (stdlib). See hermes_cli/tools_config.py for the pattern.
Health Check Anatomy
A healthy instance returns:
{
"status": "ok",
"gateway_state": "running",
"platforms": {
"telegram": {"state": "connected"}
}
}
| Field | Healthy value | What a bad value means |
|---|---|---|
status |
"ok" |
HTTP server down |
gateway_state |
"running" |
Still starting or crashed |
platforms.<name>.state |
"connected" |
Auth failure or network issue |
gateway_state: "starting" is normal for up to 60 s on boot. Beyond that, check logs for auth errors:
journalctl -u hermes-gateway --since "2 minutes ago" | grep -i "error\|token\|auth"
Gateway Won't Start — Diagnosis Order
ss -tlnp | grep 8642— port conflict?cat ~/.hermes/gateway.pid→ps -p <pid>— stale PID file?hermes gateway start --replace— clears stale locks and PIDsHERMES_LOG_LEVEL=DEBUG hermes gateway start— verbose output- Check
~/.hermes/.env— missing or placeholder token?
Before Every PR
source venv/bin/activate
python -m pytest tests/ -q # full suite: ~3 min, ~3000 tests
python scripts/deploy-validate # deployment health check
python wizard-bootstrap/wizard_bootstrap.py # environment sanity
All three must exit 0. Do not skip. "It works locally" is not sufficient evidence.
Session and State Files
| Store | Location | Notes |
|---|---|---|
| Sessions | ~/.hermes/sessions/*.json |
Persisted across restarts |
| Memories | ~/.hermes/memories/*.md |
Written by the agent's memory tool |
| Cron jobs | ~/.hermes/cron/*.json |
Scheduler state |
| Gateway state | ~/.hermes/gateway_state.json |
Live platform connection status |
| Response store | ~/.hermes/response_store.db |
SQLite WAL — API server only |
All paths go through get_hermes_home(). Never hardcode. Always backup before a major update:
tar czf ~/backups/hermes_$(date +%F_%H%M).tar.gz ~/.hermes/
Writing Tests
python -m pytest tests/path/to/test.py -q # single file
python -m pytest tests/ -q -k "test_name" # by name
python -m pytest tests/ -q -x # stop on first failure
Test isolation rules:
tests/conftest.pyhas an autouse fixture that redirectsHERMES_HOMEto a temp dir. Never write to~/.hermes/in tests.- Profile tests must mock both
Path.home()andHERMES_HOME. Seetests/hermes_cli/test_profiles.pyfor the pattern. - Do not mock the database. Integration tests should use real SQLite with a temp path.
Commit Conventions
feat: add X # new capability
fix: correct Y # bug fix
refactor: restructure Z # no behaviour change
test: add tests for W # test-only
chore: update deps # housekeeping
docs: clarify X # documentation only
Include Fixes #NNN or Refs #NNN in the commit message body to close or reference issues automatically.
This guide lives in wizard-bootstrap/. Update it when you discover a new pitfall or pattern worth preserving.