Compare commits
1 Commits
claude/iss
...
claude/iss
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
17048c7dff |
215
wizard-bootstrap/FORGE_OPERATIONS_GUIDE.md
Normal file
215
wizard-bootstrap/FORGE_OPERATIONS_GUIDE.md
Normal file
@@ -0,0 +1,215 @@
|
||||
# Forge Operations Guide
|
||||
|
||||
> **Audience:** Forge wizards joining the hermes-agent project
|
||||
> **Purpose:** Practical patterns, common pitfalls, and operational wisdom
|
||||
> **Companion to:** `WIZARD_ENVIRONMENT_CONTRACT.md`
|
||||
|
||||
---
|
||||
|
||||
## The One Rule
|
||||
|
||||
**Read the actual state before acting.**
|
||||
|
||||
Before touching any service, config, or codebase: `ps aux | grep hermes`, `cat ~/.hermes/gateway_state.json`, `curl http://127.0.0.1:8642/health`. The forge punishes assumptions harder than it rewards speed. Evidence always beats intuition.
|
||||
|
||||
---
|
||||
|
||||
## First 15 Minutes on a New System
|
||||
|
||||
```bash
|
||||
# 1. Validate your environment
|
||||
python wizard-bootstrap/wizard_bootstrap.py
|
||||
|
||||
# 2. Check what is actually running
|
||||
ps aux | grep -E 'hermes|python|gateway'
|
||||
|
||||
# 3. Check the data directory
|
||||
ls -la ~/.hermes/
|
||||
cat ~/.hermes/gateway_state.json 2>/dev/null | python3 -m json.tool
|
||||
|
||||
# 4. Verify health endpoints (if gateway is up)
|
||||
curl -sf http://127.0.0.1:8642/health | python3 -m json.tool
|
||||
|
||||
# 5. Run the smoke test
|
||||
source venv/bin/activate
|
||||
python -m pytest tests/ -q -x --timeout=60 2>&1 | tail -20
|
||||
```
|
||||
|
||||
Do not begin work until all five steps return clean output.
|
||||
|
||||
---
|
||||
|
||||
## Import Chain — Know It, Respect It
|
||||
|
||||
The dependency order is load-bearing. Violating it causes silent failures:
|
||||
|
||||
```
|
||||
tools/registry.py ← no deps; imported by everything
|
||||
↑
|
||||
tools/*.py ← each calls registry.register() at import time
|
||||
↑
|
||||
model_tools.py ← imports registry; triggers tool discovery
|
||||
↑
|
||||
run_agent.py / cli.py / batch_runner.py
|
||||
```
|
||||
|
||||
**If you add a tool file**, you must also:
|
||||
1. Add its import to `model_tools.py` `_discover_tools()`
|
||||
2. Add it to `toolsets.py` (core or a named toolset)
|
||||
|
||||
Missing either step causes the tool to silently not appear — no error, just absence.
|
||||
|
||||
---
|
||||
|
||||
## The Five Profile Rules
|
||||
|
||||
Hermes supports isolated profiles (`hermes -p myprofile`). Profile-unsafe code has caused repeated bugs. Memorize these:
|
||||
|
||||
| Do this | Not this |
|
||||
|---------|----------|
|
||||
| `get_hermes_home()` | `Path.home() / ".hermes"` |
|
||||
| `display_hermes_home()` in user messages | hardcoded `~/.hermes` strings |
|
||||
| `get_hermes_home() / "sessions"` in tests | `~/.hermes/sessions` in tests |
|
||||
|
||||
Import both from `hermes_constants`. Every `~/.hermes` hardcode is a latent profile bug.
|
||||
|
||||
---
|
||||
|
||||
## Prompt Caching — Do Not Break It
|
||||
|
||||
The agent caches system prompts. Cache breaks force re-billing of the entire context window on every turn. The following actions break caching mid-conversation and are forbidden:
|
||||
|
||||
- Altering past context
|
||||
- Changing the active toolset
|
||||
- Reloading memories or rebuilding the system prompt
|
||||
|
||||
The only sanctioned context alteration is the context compressor (`agent/context_compressor.py`). If your feature touches the message history, read that file first.
|
||||
|
||||
---
|
||||
|
||||
## Adding a Slash Command (Checklist)
|
||||
|
||||
Four files, in order:
|
||||
|
||||
1. **`hermes_cli/commands.py`** — add `CommandDef` to `COMMAND_REGISTRY`
|
||||
2. **`cli.py`** — add handler branch in `HermesCLI.process_command()`
|
||||
3. **`gateway/run.py`** — add handler if it should work in messaging platforms
|
||||
4. **Aliases** — add to the `aliases` tuple on the `CommandDef`; everything else updates automatically
|
||||
|
||||
All downstream consumers (Telegram menu, Slack routing, autocomplete, help text) derive from `COMMAND_REGISTRY`. You never touch them directly.
|
||||
|
||||
---
|
||||
|
||||
## Tool Schema Pitfalls
|
||||
|
||||
**Do NOT cross-reference other toolsets in schema descriptions.**
|
||||
Writing "prefer `web_search` over this tool" in a browser tool's description will cause the model to hallucinate calls to `web_search` when it's not loaded. Cross-references belong in `get_tool_definitions()` post-processing blocks in `model_tools.py`.
|
||||
|
||||
**Do NOT use `\033[K` (ANSI erase-to-EOL) in display code.**
|
||||
Under `prompt_toolkit`'s `patch_stdout`, it leaks as literal `?[K`. Use space-padding instead: `f"\r{line}{' ' * pad}"`.
|
||||
|
||||
**Do NOT use `simple_term_menu` for interactive menus.**
|
||||
It ghosts on scroll in tmux/iTerm2. Use `curses` (stdlib). See `hermes_cli/tools_config.py` for the pattern.
|
||||
|
||||
---
|
||||
|
||||
## Health Check Anatomy
|
||||
|
||||
A healthy instance returns:
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "ok",
|
||||
"gateway_state": "running",
|
||||
"platforms": {
|
||||
"telegram": {"state": "connected"}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Healthy value | What a bad value means |
|
||||
|-------|--------------|----------------------|
|
||||
| `status` | `"ok"` | HTTP server down |
|
||||
| `gateway_state` | `"running"` | Still starting or crashed |
|
||||
| `platforms.<name>.state` | `"connected"` | Auth failure or network issue |
|
||||
|
||||
`gateway_state: "starting"` is normal for up to 60 s on boot. Beyond that, check logs for auth errors:
|
||||
|
||||
```bash
|
||||
journalctl -u hermes-gateway --since "2 minutes ago" | grep -i "error\|token\|auth"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Gateway Won't Start — Diagnosis Order
|
||||
|
||||
1. `ss -tlnp | grep 8642` — port conflict?
|
||||
2. `cat ~/.hermes/gateway.pid` → `ps -p <pid>` — stale PID file?
|
||||
3. `hermes gateway start --replace` — clears stale locks and PIDs
|
||||
4. `HERMES_LOG_LEVEL=DEBUG hermes gateway start` — verbose output
|
||||
5. Check `~/.hermes/.env` — missing or placeholder token?
|
||||
|
||||
---
|
||||
|
||||
## Before Every PR
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m pytest tests/ -q # full suite: ~3 min, ~3000 tests
|
||||
python scripts/deploy-validate # deployment health check
|
||||
python wizard-bootstrap/wizard_bootstrap.py # environment sanity
|
||||
```
|
||||
|
||||
All three must exit 0. Do not skip. "It works locally" is not sufficient evidence.
|
||||
|
||||
---
|
||||
|
||||
## Session and State Files
|
||||
|
||||
| Store | Location | Notes |
|
||||
|-------|----------|-------|
|
||||
| Sessions | `~/.hermes/sessions/*.json` | Persisted across restarts |
|
||||
| Memories | `~/.hermes/memories/*.md` | Written by the agent's memory tool |
|
||||
| Cron jobs | `~/.hermes/cron/*.json` | Scheduler state |
|
||||
| Gateway state | `~/.hermes/gateway_state.json` | Live platform connection status |
|
||||
| Response store | `~/.hermes/response_store.db` | SQLite WAL — API server only |
|
||||
|
||||
All paths go through `get_hermes_home()`. Never hardcode. Always backup before a major update:
|
||||
|
||||
```bash
|
||||
tar czf ~/backups/hermes_$(date +%F_%H%M).tar.gz ~/.hermes/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Writing Tests
|
||||
|
||||
```bash
|
||||
python -m pytest tests/path/to/test.py -q # single file
|
||||
python -m pytest tests/ -q -k "test_name" # by name
|
||||
python -m pytest tests/ -q -x # stop on first failure
|
||||
```
|
||||
|
||||
**Test isolation rules:**
|
||||
- `tests/conftest.py` has an autouse fixture that redirects `HERMES_HOME` to a temp dir. Never write to `~/.hermes/` in tests.
|
||||
- Profile tests must mock both `Path.home()` and `HERMES_HOME`. See `tests/hermes_cli/test_profiles.py` for the pattern.
|
||||
- Do not mock the database. Integration tests should use real SQLite with a temp path.
|
||||
|
||||
---
|
||||
|
||||
## Commit Conventions
|
||||
|
||||
```
|
||||
feat: add X # new capability
|
||||
fix: correct Y # bug fix
|
||||
refactor: restructure Z # no behaviour change
|
||||
test: add tests for W # test-only
|
||||
chore: update deps # housekeeping
|
||||
docs: clarify X # documentation only
|
||||
```
|
||||
|
||||
Include `Fixes #NNN` or `Refs #NNN` in the commit message body to close or reference issues automatically.
|
||||
|
||||
---
|
||||
|
||||
*This guide lives in `wizard-bootstrap/`. Update it when you discover a new pitfall or pattern worth preserving.*
|
||||
Reference in New Issue
Block a user