# Forge Operations Guide

> **Audience:** Forge wizards joining the hermes-agent project
> **Purpose:** Practical patterns, common pitfalls, and operational wisdom
> **Companion to:** `WIZARD_ENVIRONMENT_CONTRACT.md`

---

## The One Rule

**Read the actual state before acting.**

Before touching any service, config, or codebase: `ps aux | grep hermes`, `cat ~/.hermes/gateway_state.json`, `curl http://127.0.0.1:8642/health`. The forge punishes assumptions harder than it rewards speed. Evidence always beats intuition.

---

## First 15 Minutes on a New System

```bash
# 1. Validate your environment
python wizard-bootstrap/wizard_bootstrap.py

# 2. Check what is actually running
ps aux | grep -E 'hermes|python|gateway'

# 3. Check the data directory
ls -la ~/.hermes/
cat ~/.hermes/gateway_state.json 2>/dev/null | python3 -m json.tool

# 4. Verify health endpoints (if gateway is up)
curl -sf http://127.0.0.1:8642/health | python3 -m json.tool

# 5. Run the smoke test
source venv/bin/activate
python -m pytest tests/ -q -x --timeout=60 2>&1 | tail -20
```

Do not begin work until all five steps return clean output.

---

## Import Chain: Know It, Respect It

The dependency order is load-bearing. Violating it causes silent failures:

```
tools/registry.py        ← no deps; imported by everything
        ↑
tools/*.py               ← each calls registry.register() at import time
        ↑
model_tools.py           ← imports registry; triggers tool discovery
        ↑
run_agent.py / cli.py / batch_runner.py
```

**If you add a tool file**, you must also:

1. Add its import to `model_tools.py` `_discover_tools()`
2. Add it to `toolsets.py` (core or a named toolset)

Missing either step causes the tool to silently not appear: no error, just absence.

---

## The Five Profile Rules

Hermes supports isolated profiles (`hermes -p myprofile`). Profile-unsafe code has caused repeated bugs.
Memorize these:

| Do this | Not this |
|---------|----------|
| `get_hermes_home()` | `Path.home() / ".hermes"` |
| `display_hermes_home()` in user messages | hardcoded `~/.hermes` strings |
| `get_hermes_home() / "sessions"` in tests | `~/.hermes/sessions` in tests |

Import both from `hermes_constants`. Every `~/.hermes` hardcode is a latent profile bug.

---

## Prompt Caching: Do Not Break It

The agent caches system prompts. Cache breaks force re-billing of the entire context window on every turn. The following actions break caching mid-conversation and are forbidden:

- Altering past context
- Changing the active toolset
- Reloading memories or rebuilding the system prompt

The only sanctioned context alteration is the context compressor (`agent/context_compressor.py`). If your feature touches the message history, read that file first.

---

## Adding a Slash Command (Checklist)

Four files, in order:

1. **`hermes_cli/commands.py`**: add `CommandDef` to `COMMAND_REGISTRY`
2. **`cli.py`**: add handler branch in `HermesCLI.process_command()`
3. **`gateway/run.py`**: add handler if it should work in messaging platforms
4. **Aliases**: add to the `aliases` tuple on the `CommandDef`; everything else updates automatically

All downstream consumers (Telegram menu, Slack routing, autocomplete, help text) derive from `COMMAND_REGISTRY`. You never touch them directly.

---

## Tool Schema Pitfalls

**Do NOT cross-reference other toolsets in schema descriptions.** Writing "prefer `web_search` over this tool" in a browser tool's description will cause the model to hallucinate calls to `web_search` when it's not loaded. Cross-references belong in `get_tool_definitions()` post-processing blocks in `model_tools.py`.

**Do NOT use `\033[K` (ANSI erase-to-EOL) in display code.** Under `prompt_toolkit`'s `patch_stdout`, it leaks as literal `?[K`. Use space-padding instead: `f"\r{line}{' ' * pad}"`.

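
The space-padding approach can be sketched as follows. This is a minimal illustration, not code from the project: `render_status_line` is a hypothetical helper name, and the pad width is taken from the terminal size, which is an assumption about how `pad` would be computed.

```python
import shutil
import sys


def render_status_line(line: str, width: int) -> str:
    """Return `line` prefixed with a carriage return and space-padded to `width`.

    Overwriting leftovers of a longer previous line with spaces avoids the
    ANSI erase-to-EOL sequence entirely. Note: len() counts code points,
    so wide characters would need a display-width aware measure instead.
    """
    width = max(width, 0)
    visible = line[:width]          # truncate so the line never wraps
    pad = width - len(visible)      # spaces needed to blank the remainder
    return f"\r{visible}{' ' * pad}"


if __name__ == "__main__":
    # Leave one column free so the cursor does not wrap onto the next row.
    columns = shutil.get_terminal_size(fallback=(80, 24)).columns - 1
    for status in ("Connecting to gateway...", "Connected."):
        sys.stdout.write(render_status_line(status, columns))
        sys.stdout.flush()
    sys.stdout.write("\n")
```

The second status is shorter than the first, so the padding is what clears the trailing characters of the earlier line.
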
**Do NOT use `simple_term_menu` for interactive menus.** It ghosts on scroll in tmux/iTerm2. Use `curses` (stdlib). See `hermes_cli/tools_config.py` for the pattern.

---

## Health Check Anatomy

A healthy instance returns:

```json
{
  "status": "ok",
  "gateway_state": "running",
  "platforms": {
    "telegram": {"state": "connected"}
  }
}
```

| Field | Healthy value | What a bad value means |
|-------|--------------|----------------------|
| `status` | `"ok"` | HTTP server down |
| `gateway_state` | `"running"` | Still starting or crashed |
| `platforms.<platform>.state` | `"connected"` | Auth failure or network issue |

`gateway_state: "starting"` is normal for up to 60 s on boot. Beyond that, check logs for auth errors:

```bash
journalctl -u hermes-gateway --since "2 minutes ago" | grep -i "error\|token\|auth"
```

---

## Gateway Won't Start: Diagnosis Order

1. `ss -tlnp | grep 8642`: port conflict?
2. `cat ~/.hermes/gateway.pid` → `ps -p <pid>`: stale PID file?
3. `hermes gateway start --replace`: clears stale locks and PIDs
4. `HERMES_LOG_LEVEL=DEBUG hermes gateway start`: verbose output
5. Check `~/.hermes/.env`: missing or placeholder token?

---

## Before Every PR

```bash
source venv/bin/activate
python -m pytest tests/ -q                    # full suite: ~3 min, ~3000 tests
python scripts/deploy-validate                # deployment health check
python wizard-bootstrap/wizard_bootstrap.py   # environment sanity
```

All three must exit 0. Do not skip. "It works locally" is not sufficient evidence.

---

## Session and State Files

| Store | Location | Notes |
|-------|----------|-------|
| Sessions | `~/.hermes/sessions/*.json` | Persisted across restarts |
| Memories | `~/.hermes/memories/*.md` | Written by the agent's memory tool |
| Cron jobs | `~/.hermes/cron/*.json` | Scheduler state |
| Gateway state | `~/.hermes/gateway_state.json` | Live platform connection status |
| Response store | `~/.hermes/response_store.db` | SQLite WAL; API server only |

All paths go through `get_hermes_home()`. Never hardcode.

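
As a rough sketch of what deriving these locations from a single home directory looks like, the snippet below builds every path in the table from one root. `resolve_hermes_home` is a hypothetical stand-in for `get_hermes_home()`, written only so the example runs on its own; the assumption that the profile directory is read from the `HERMES_HOME` environment variable is an illustration and may not match the real implementation in `hermes_constants`.

```python
import os
from pathlib import Path


def resolve_hermes_home() -> Path:
    """Hypothetical stand-in for hermes_constants.get_hermes_home().

    Assumption: the active profile directory comes from HERMES_HOME,
    with ~/.hermes as the default. The fallback lives in this one
    function so that no caller needs to spell out the path itself.
    """
    override = os.environ.get("HERMES_HOME")
    return Path(override) if override else Path.home() / ".hermes"


def state_paths(home: Path) -> dict[str, Path]:
    """Map each store from the table above to a path under `home`."""
    return {
        "sessions": home / "sessions",
        "memories": home / "memories",
        "cron": home / "cron",
        "gateway_state": home / "gateway_state.json",
        "response_store": home / "response_store.db",
    }


if __name__ == "__main__":
    for store, path in state_paths(resolve_hermes_home()).items():
        print(f"{store:15} {path}")
```

In project code the real `get_hermes_home()` would take the place of the stand-in; the point is that every location is computed from the resolved home rather than written out as a literal.
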
Always back up before a major update:

```bash
tar czf ~/backups/hermes_$(date +%F_%H%M).tar.gz ~/.hermes/
```

---

## Writing Tests

```bash
python -m pytest tests/path/to/test.py -q    # single file
python -m pytest tests/ -q -k "test_name"    # by name
python -m pytest tests/ -q -x                # stop on first failure
```

**Test isolation rules:**

- `tests/conftest.py` has an autouse fixture that redirects `HERMES_HOME` to a temp dir. Never write to `~/.hermes/` in tests.
- Profile tests must mock both `Path.home()` and `HERMES_HOME`. See `tests/hermes_cli/test_profiles.py` for the pattern.
- Do not mock the database. Integration tests should use real SQLite with a temp path.

---

## Commit Conventions

```
feat: add X               # new capability
fix: correct Y            # bug fix
refactor: restructure Z   # no behaviour change
test: add tests for W     # test-only
chore: update deps        # housekeeping
docs: clarify X           # documentation only
```

Include `Fixes #NNN` or `Refs #NNN` in the commit message body to close or reference issues automatically.

---

*This guide lives in `wizard-bootstrap/`. Update it when you discover a new pitfall or pattern worth preserving.*