Compare commits

..

1 Commits

Author SHA1 Message Date
Alexander Whitestone
17048c7dff docs: add Forge Operations Guide for wizard onboarding
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Failing after 18s
Secret Scan / Scan for secrets (pull_request) Failing after 2s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Failing after 1s
Tests / test (pull_request) Failing after 3s
Captures practical patterns, pitfalls, and operational wisdom for
forge wizards joining the hermes-agent project. Covers:

- First-15-minutes system inspection checklist
- Import chain order and tool registration requirements
- Profile safety rules (get_hermes_home vs hardcoded paths)
- Prompt caching constraints
- Slash command addition checklist
- Tool schema pitfalls (ANSI codes, cross-toolset references)
- Health check anatomy and gateway diagnosis order
- Pre-PR test gate (pytest + deploy-validate + bootstrap)
- Test isolation and commit conventions

Companion document to WIZARD_ENVIRONMENT_CONTRACT.md.

Refs #142

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 22:05:12 -04:00
7 changed files with 215 additions and 451 deletions

View File

@@ -1,44 +0,0 @@
name: Notebook CI
on:
push:
paths:
- 'notebooks/**'
pull_request:
paths:
- 'notebooks/**'
jobs:
notebook-smoke:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install dependencies
run: |
pip install papermill jupytext nbformat
python -m ipykernel install --user --name python3
- name: Execute system health notebook
run: |
papermill notebooks/agent_task_system_health.ipynb /tmp/output.ipynb \
-p threshold 0.5 \
-p hostname ci-runner
- name: Verify output has results
run: |
python -c "
import json
nb = json.load(open('/tmp/output.ipynb'))
code_cells = [c for c in nb['cells'] if c['cell_type'] == 'code']
outputs = [c.get('outputs', []) for c in code_cells]
total_outputs = sum(len(o) for o in outputs)
assert total_outputs > 0, 'Notebook produced no outputs'
print(f'Notebook executed successfully with {total_outputs} output(s)')
"

View File

@@ -1,57 +0,0 @@
# Notebook Workflow for Agent Tasks
This directory demonstrates a sovereign, version-controlled workflow for LLM agent tasks using Jupyter notebooks.
## Philosophy
- **`.py` files are the source of truth`** — authored and reviewed as plain Python with `# %%` cell markers (via Jupytext)
- **`.ipynb` files are generated artifacts** — auto-created from `.py` for execution and rich viewing
- **Papermill parameterizes and executes** — each run produces an output notebook with code, narrative, and results preserved
- **Output notebooks are audit artifacts** — every execution leaves a permanent, replayable record
## File Layout
```
notebooks/
agent_task_system_health.py # Source of truth (Jupytext)
agent_task_system_health.ipynb # Generated from .py
docs/
NOTEBOOK_WORKFLOW.md # This document
.gitea/workflows/
notebook-ci.yml # CI gate: executes notebooks on PR/push
```
## How Agents Work With Notebooks
1. **Create** — Agent generates a `.py` notebook using `# %% [markdown]` and `# %%` code blocks
2. **Review** — PR reviewers see clean diffs in Gitea (no JSON noise)
3. **Generate**`jupytext --to ipynb` produces the `.ipynb` before merge
4. **Execute** — Papermill runs the notebook with injected parameters
5. **Archive** — Output notebook is committed to a `reports/` branch or artifact store
## Converting Between Formats
```bash
# .py -> .ipynb
jupytext --to ipynb notebooks/agent_task_system_health.py
# .ipynb -> .py
jupytext --to py notebooks/agent_task_system_health.ipynb
# Execute with parameters
papermill notebooks/agent_task_system_health.ipynb output.ipynb \
-p threshold 1.0 -p hostname forge-vps-01
```
## CI Gate
The `notebook-ci.yml` workflow executes all notebooks in `notebooks/` on every PR and push, ensuring that checked-in notebooks still run and produce outputs.
## Why This Matters
| Problem | Notebook Solution |
|---|---|
| Ephemeral agent reasoning | Markdown cells narrate the thought process |
| Stateless single-turn tools | Stateful cells persist variables across steps |
| Unreviewable binary artifacts | `.py` source is diffable and PR-friendly |
| No execution audit trail | Output notebook preserves code + outputs + metadata |

View File

@@ -1,57 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Parameterized Agent Task: System Health Check\n",
"\n",
"This notebook demonstrates how an LLM agent can generate a task notebook,\n",
"a scheduler can parameterize and execute it via papermill,\n",
"and the output becomes a persistent audit artifact."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {"tags": ["parameters"]},
"outputs": [],
"source": [
"# Default parameters — papermill will inject overrides here\n",
"threshold = 1.0\n",
"hostname = \"localhost\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json, subprocess, datetime\n",
"gather_time = datetime.datetime.now().isoformat()\n",
"load_avg = subprocess.check_output([\"cat\", \"/proc/loadavg\"]).decode().strip()\n",
"load_values = [float(x) for x in load_avg.split()[:3]]\n",
"avg_load = sum(load_values) / len(load_values)\n",
"intervention_needed = avg_load > threshold\n",
"report = {\n",
" \"hostname\": hostname,\n",
" \"threshold\": threshold,\n",
" \"avg_load\": round(avg_load, 3),\n",
" \"intervention_needed\": intervention_needed,\n",
" \"gathered_at\": gather_time\n",
"}\n",
"print(json.dumps(report, indent=2))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,41 +0,0 @@
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.19.1
# kernelspec:
# display_name: Python 3
# language: python
# name: python3
# ---
# %% [markdown]
# # Parameterized Agent Task: System Health Check
#
# This notebook demonstrates how an LLM agent can generate a task notebook,
# a scheduler can parameterize and execute it via papermill,
# and the output becomes a persistent audit artifact.
# %% tags=["parameters"]
# Default parameters — papermill will inject overrides here
threshold = 1.0
hostname = "localhost"
# %%
import json, subprocess, datetime
gather_time = datetime.datetime.now().isoformat()
load_avg = subprocess.check_output(["cat", "/proc/loadavg"]).decode().strip()
load_values = [float(x) for x in load_avg.split()[:3]]
avg_load = sum(load_values) / len(load_values)
intervention_needed = avg_load > threshold
report = {
"hostname": hostname,
"threshold": threshold,
"avg_load": round(avg_load, 3),
"intervention_needed": intervention_needed,
"gathered_at": gather_time
}
print(json.dumps(report, indent=2))

View File

@@ -1,252 +0,0 @@
# Ezra — Quarterly Technical & Strategic Report
**April 2026**
---
## Executive Summary
This report consolidates the principal technical and strategic outputs from Q1/Q2 2026. Three major workstreams are covered:
1. **Security & Performance Hardening** — Shipped V-011 obfuscation detection and context-compressor tuning.
2. **System Formalization Audit** — Identified ~6,300 lines of homegrown infrastructure that can be replaced by well-maintained open-source projects.
3. **Business Development** — Formalized a pure-contracting go-to-market plan ("Operation Get A Job") to monetize the engineering collective.
---
## 1. Recent Deliverables
### 1.1 V-011 Obfuscation Bypass Detection
A significant security enhancement was shipped to the skills-guard subsystem to defeat obfuscated malicious skill code.
**Technical additions:**
- `normalize_input()` with NFKC normalization, case folding, and zero-width character removal to defeat homoglyph and ZWSP evasion.
- `PythonSecurityAnalyzer` AST visitor detecting `eval`/`exec`/`compile`, `getattr` dunder access, and imports of `base64`/`codecs`/`marshal`/`types`/`ctypes`.
- Additional regex patterns for `getattr` builtins chains, `__import__` os/subprocess, and nested base64 decoding.
- Full integration into `scan_file()`; Python files now receive both normalized regex scanning and AST-based analysis.
**Verification:** All tests passing (`103 passed, 4 warnings`).
**Reference:** Forge PR #131`[EPIC-999/Phase II] The Forge — V-011 obfuscation fix + compressor tuning`
### 1.2 Context Compressor Tuning
The default `protect_last_n` parameter was reduced from `20` to `5`. The previous default was overly conservative, preventing meaningful compression on long sessions. The new default preserves the five most recent conversational turns while allowing the compressor to effectively reduce token pressure.
A regression test was added verifying that the last five turns are never summarized away.
### 1.3 Burn Mode Resilience
The agent loop was enhanced with a configurable `burn_mode` flag that increases concurrent tool execution capacity and adds transient-failure retry logic.
**Changes:**
- `max_tool_workers` increased from `8` to `16` in burn mode.
- Expanded parallel tool coverage to include browser, vision, skill, and session-search tools.
- Added batch timeout protection (300s in burn mode / 180s normal) to prevent hung threads from blocking the agent loop.
- Thread-pool shutdown now uses `executor.shutdown(wait=False)` for immediate control return.
- Transient errors (timeouts, rate limits, 502/503/504) trigger one automatic retry in burn mode.
---
## 2. System Formalization Audit
A comprehensive audit was performed across the `hermes-agent` codebase to identify homegrown modules that could be replaced by mature open-source alternatives. The objective is efficiency: reduce maintenance burden, leverage community expertise, and improve reliability.
### 2.1 Candidate Matrix
| Priority | Component | Lines | Current State | Proposed Replacement | Effort | ROI |
|:--------:|-----------|------:|---------------|----------------------|:------:|:---:|
| **P0** | MCP Client | 2,176 | Custom asyncio transport, sampling, schema translation | `mcp` (official Python SDK) | 2-3 wks | Very High |
| **P0** | Cron Scheduler | ~1,500 | Custom JSON job store, manual tick loop | `APScheduler` | 1-2 wks | Very High |
| **P0** | Config Management | 2,589 | Manual YAML loader, no type safety | `pydantic-settings` + Pydantic v2 | 3-4 wks | High |
| **P1** | Checkpoint Manager | 548 | Shells out to `git` binary | `dulwich` (pure-Python git) | 1 wk | Medium-High |
| **P1** | Auth / Credential Pool | ~3,800 | Custom JWT decode, OAuth refresh, JSON auth store | `authlib` + `keyring` + `PyJWT` | 2-3 wks | Medium |
| **P1** | Batch Runner | 1,285 | Custom `multiprocessing.Pool` wrapper | `joblib` (local) or `celery` (distributed) | 1-2 wks | Medium |
| **P2** | SQLite Session Store | ~2,400 | Raw SQLite + FTS5, manual schema | SQLAlchemy ORM + Alembic | 2-3 wks | Medium |
| **P2** | Trajectory Compressor | 1,518 | Custom tokenizer + summarization pipeline | Keep core logic; add `zstandard` for binary storage | 3 days | Low-Medium |
| **P2** | Process Registry | 889 | Custom background process tracking | Keep (adds too much ops complexity) | — | Low |
| **P2** | Web Tools | 2,080+ | Firecrawl + Parallel wrappers | Keep (Firecrawl is already best-in-class) | — | Low |
### 2.2 P0 Replacements
#### MCP Client → Official `mcp` Python SDK
**Current:** `tools/mcp_tool.py` (2,176 lines) contains custom stdio/HTTP transport lifecycle, manual `anyio` cancel-scope cleanup, hand-rolled schema translation, custom sampling bridge, credential stripping, and reconnection backoff.
**Problem:** The Model Context Protocol is evolving rapidly. Maintaining a custom 2K-line client means every protocol revision requires manual patches. The official SDK already handles transport negotiation, lifecycle management, and type-safe schema generation.
**Migration Plan:**
1. Add `mcp>=1.0.0` to dependencies.
2. Build a thin `HermesMCPBridge` class that instantiates `mcp.ClientSession`, maps MCP `Tool` schemas to Hermes registry calls, forwards tool invocations, and preserves the sampling callback.
3. Deprecate the `_mcp_loop` background thread and `anyio`-based transport code.
4. Add integration tests against a test MCP server.
**Lines Saved:** ~1,600
**Risk:** Medium — sampling and timeout behavior need parity testing.
#### Cron Scheduler → APScheduler
**Current:** `cron/jobs.py` (753 lines) + `cron/scheduler.py` (~740 lines) use a JSON file as the job store, custom `parse_duration` and `compute_next_run` logic, a manual tick loop, and ad-hoc delivery orchestration.
**Problem:** Scheduling is a solved problem. The homegrown system lacks timezone support, job concurrency controls, graceful clustering, and durable execution guarantees.
**Migration Plan:**
1. Introduce `APScheduler` with a `SQLAlchemyJobStore` (or custom JSON store).
2. Refactor each Hermes cron job into an APScheduler `Job` function.
3. Preserve existing delivery logic (`_deliver_result`, `_build_job_prompt`, `_run_job_script`) as the job body.
4. Migrate `jobs.json` entries into APScheduler jobs on first run.
5. Expose `/cron` status via a thin CLI wrapper.
**Lines Saved:** ~700
**Risk:** Low — delivery logic is preserved; only the trigger mechanism changes.
#### Config Management → `pydantic-settings`
**Current:** `hermes_cli/config.py` (2,589 lines) uses manual YAML parsing with hardcoded defaults, a complex migration chain (`_config_version` currently at 11), no runtime type validation, and stringly-typed env var resolution.
**Problem:** Every new config option requires touching multiple places. Migration logic is ~400 lines and growing. Typo'd config values are only caught at runtime, often deep in the agent loop.
**Migration Plan:**
1. Define a `HermesConfig` Pydantic model with nested sections (`ModelConfig`, `ProviderConfig`, `AgentConfig`, `CompressionConfig`, etc.).
2. Use `pydantic-settings`'s `SettingsConfigDict(yaml_file="~/.hermes/config.yaml")` to auto-load.
3. Map env vars via `env_prefix="HERMES_"` or field-level `validation_alias`.
4. Keep the migration layer as a one-time upgrade function, then remove it after two releases.
5. Replace `load_config()` call sites with `HermesConfig()` instantiation.
**Lines Saved:** ~1,500
**Risk:** Medium-High — large blast radius; every module reads config. Requires backward compatibility.
### 2.3 P1 Replacements
**Checkpoint Manager → `dulwich`**
- Replace `subprocess.run(["git", ...])` calls with `dulwich.porcelain` equivalents.
- Use `dulwich.repo.Repo.init_bare()` for shadow repos.
- Snapshotting becomes an in-memory `Index` write + `commit()`.
- **Lines Saved:** ~200
- **Risk:** Low
**Auth / Credential Pool → `authlib` + `keyring` + `PyJWT`**
- Use `authlib` for OAuth2 session and token refresh.
- Replace custom JWT decoding with `PyJWT`.
- Migrate the auth store JSON to `keyring`-backed secure storage where available.
- Keep Hermes-specific credential pool strategies (round-robin, least-used, etc.).
- **Lines Saved:** ~800
- **Risk:** Medium
**Batch Runner → `joblib`**
- For typical local batch sizes, `joblib.Parallel(n_jobs=-1, backend='loky')` replaces the custom worker pool.
- Only migrate to Celery if cross-machine distribution is required.
- **Lines Saved:** ~400
- **Risk:** Low for `joblib`
### 2.4 Execution Roadmap
1. **Week 1-2:** Migrate Checkpoint Manager to `dulwich` (quick win, low risk)
2. **Week 3-4:** Migrate Cron Scheduler to `APScheduler` (high value, well-contained)
3. **Week 5-8:** Migrate MCP Client to official `mcp` SDK (highest complexity, highest payoff)
4. **Week 9-12:** Migrate Config Management to `pydantic-settings` (largest blast radius, do last)
5. **Ongoing:** Evaluate Auth/Credential Pool and Batch Runner replacements as follow-up epics.
### 2.5 Cost-Benefit Summary
| Metric | Value |
|--------|-------|
| Total homebrew lines audited | ~17,000 |
| Lines recommended for replacement | ~6,300 |
| Estimated dev weeks (P0 + P1) | 10-14 weeks |
| New runtime dependencies added | 4-6 well-maintained packages |
| Maintenance burden reduction | Very High |
| Risk level | Medium (mitigated by strong test coverage) |
---
## 3. Strategic Initiative: Operation Get A Job
### 3.1 Thesis
The engineering collective is capable of 10x delivery velocity compared to typical market offerings. The strategic opportunity is to monetize this capability through pure contracting — high-tempo, fixed-scope engagements with no exclusivity or employer-like constraints.
### 3.2 Service Menu
**Tier A — White-Glove Agent Infrastructure ($400-600/hr)**
- Custom AI agent deployment with tool use (Slack, Discord, Telegram, webhooks)
- MCP server development
- Local LLM stack setup (on-premise / VPC)
- Agent security audit and red teaming
**Tier B — Security Hardening & Code Review ($250-400/hr)**
- Security backlog burn-down (CVE-class bugs)
- Skills-guard / sandbox hardening
- Architecture review
**Tier C — Automation & Integration ($150-250/hr)**
- Webhook-to-action pipelines
- Research and intelligence reporting
- Content-to-code workflows
### 3.3 Engagement Packages
| Service | Description | Timeline | Investment |
|---------|-------------|----------|------------|
| Agent Security Audit | Review of one AI agent pipeline + written findings | 2-3 business days | $4,500 |
| MCP Server Build | One custom MCP server with 3-5 tools + docs + tests | 1-2 weeks | $8,000 |
| Custom Bot Deployment | End-to-end bot with up to 5 tools, deployed to client platform | 2-3 weeks | $12,000 |
| Security Sprint | Close top 5 security issues in a Python/JS repo | 1-2 weeks | $6,500 |
| Monthly Retainer — Core | 20 hrs/month prioritized engineering + triage | Ongoing | $6,000/mo |
| Monthly Retainer — Scale | 40 hrs/month prioritized engineering + on-call | Ongoing | $11,000/mo |
### 3.4 Go-to-Market Motion
**Immediate channels:**
- Cold outbound to CTOs/VPEs at Series A-C AI startups
- LinkedIn authority content (architecture reviews, security bulletins)
- Platform presence (Gun.io, Toptal, Upwork for specific niche keywords)
**Lead magnet:** Free 15-minute architecture review. No pitch. One concrete risk identified.
### 3.5 Infrastructure Foundation
The Hermes Agent framework serves as both the delivery platform and the portfolio piece:
- Open-source runtime with ~3,000 tests
- Gateway architecture supporting 8+ messaging platforms
- Native MCP client, cron scheduling, subagent delegation
- Self-hosted Forge (Gitea) with CI and automated PR review
- Local Gemma 4 inference stack on bare metal
### 3.6 90-Day Revenue Model
| Month | Target |
|-------|--------|
| Month 1 | $9-12K (1x retainer or 2x audits) |
| Month 2 | $17K (+ 1x MCP build) |
| Month 3 | $29K (+ 1x bot deployment + new retainer) |
### 3.7 Immediate Action Items
- File Wyoming LLC and obtain EIN
- Open Mercury business bank account
- Secure E&O insurance
- Update LinkedIn profile and publish first authority post
- Customize capabilities deck and begin warm outbound
---
## 4. Fleet Status Summary
| House | Host | Model / Provider | Gateway Status |
|-------|------|------------------|----------------|
| Ezra | Hermes VPS | `kimi-for-coding` (Kimi K2.5) | API `8658`, webhook `8648` — Active |
| Bezalel | Hermes VPS | Claude Opus 4.6 (Anthropic) | Port `8645` — Active |
| Allegro-Primus | Hermes VPS | Kimi K2.5 | Port `8644` — Requires restart |
| Bilbo | External | Gemma 4B (local) | Telegram dual-mode — Active |
**Network:** Hermes VPS public IP `143.198.27.163` (Ubuntu 24.04.3 LTS). Local Gemma 4 fallback on `127.0.0.1:11435`.
---
## 5. Conclusion
The codebase is in a strong position: security is hardened, the agent loop is more resilient, and a clear roadmap exists to replace high-maintenance homegrown infrastructure with battle-tested open-source projects. The commercialization strategy is formalized and ready for execution. The next critical path is the human-facing work of entity formation, sales outreach, and closing the first fixed-scope engagement.
Prepared by **Ezra**
April 2026

View File

@@ -0,0 +1,215 @@
# Forge Operations Guide
> **Audience:** Forge wizards joining the hermes-agent project
> **Purpose:** Practical patterns, common pitfalls, and operational wisdom
> **Companion to:** `WIZARD_ENVIRONMENT_CONTRACT.md`
---
## The One Rule
**Read the actual state before acting.**
Before touching any service, config, or codebase: `ps aux | grep hermes`, `cat ~/.hermes/gateway_state.json`, `curl http://127.0.0.1:8642/health`. The forge punishes assumptions harder than it rewards speed. Evidence always beats intuition.
---
## First 15 Minutes on a New System
```bash
# 1. Validate your environment
python wizard-bootstrap/wizard_bootstrap.py
# 2. Check what is actually running
ps aux | grep -E 'hermes|python|gateway'
# 3. Check the data directory
ls -la ~/.hermes/
cat ~/.hermes/gateway_state.json 2>/dev/null | python3 -m json.tool
# 4. Verify health endpoints (if gateway is up)
curl -sf http://127.0.0.1:8642/health | python3 -m json.tool
# 5. Run the smoke test
source venv/bin/activate
python -m pytest tests/ -q -x --timeout=60 2>&1 | tail -20
```
Do not begin work until all five steps return clean output.
---
## Import Chain — Know It, Respect It
The dependency order is load-bearing. Violating it causes silent failures:
```
tools/registry.py ← no deps; imported by everything
tools/*.py ← each calls registry.register() at import time
model_tools.py ← imports registry; triggers tool discovery
run_agent.py / cli.py / batch_runner.py
```
**If you add a tool file**, you must also:
1. Add its import to `model_tools.py` `_discover_tools()`
2. Add it to `toolsets.py` (core or a named toolset)
Missing either step causes the tool to silently not appear — no error, just absence.
---
## The Five Profile Rules
Hermes supports isolated profiles (`hermes -p myprofile`). Profile-unsafe code has caused repeated bugs. Memorize these:
| Do this | Not this |
|---------|----------|
| `get_hermes_home()` | `Path.home() / ".hermes"` |
| `display_hermes_home()` in user messages | hardcoded `~/.hermes` strings |
| `get_hermes_home() / "sessions"` in tests | `~/.hermes/sessions` in tests |
Import both from `hermes_constants`. Every `~/.hermes` hardcode is a latent profile bug.
---
## Prompt Caching — Do Not Break It
The agent caches system prompts. Cache breaks force re-billing of the entire context window on every turn. The following actions break caching mid-conversation and are forbidden:
- Altering past context
- Changing the active toolset
- Reloading memories or rebuilding the system prompt
The only sanctioned context alteration is the context compressor (`agent/context_compressor.py`). If your feature touches the message history, read that file first.
---
## Adding a Slash Command (Checklist)
Four files, in order:
1. **`hermes_cli/commands.py`** — add `CommandDef` to `COMMAND_REGISTRY`
2. **`cli.py`** — add handler branch in `HermesCLI.process_command()`
3. **`gateway/run.py`** — add handler if it should work in messaging platforms
4. **Aliases** — add to the `aliases` tuple on the `CommandDef`; everything else updates automatically
All downstream consumers (Telegram menu, Slack routing, autocomplete, help text) derive from `COMMAND_REGISTRY`. You never touch them directly.
---
## Tool Schema Pitfalls
**Do NOT cross-reference other toolsets in schema descriptions.**
Writing "prefer `web_search` over this tool" in a browser tool's description will cause the model to hallucinate calls to `web_search` when it's not loaded. Cross-references belong in `get_tool_definitions()` post-processing blocks in `model_tools.py`.
**Do NOT use `\033[K` (ANSI erase-to-EOL) in display code.**
Under `prompt_toolkit`'s `patch_stdout`, it leaks as literal `?[K`. Use space-padding instead: `f"\r{line}{' ' * pad}"`.
**Do NOT use `simple_term_menu` for interactive menus.**
It ghosts on scroll in tmux/iTerm2. Use `curses` (stdlib). See `hermes_cli/tools_config.py` for the pattern.
---
## Health Check Anatomy
A healthy instance returns:
```json
{
"status": "ok",
"gateway_state": "running",
"platforms": {
"telegram": {"state": "connected"}
}
}
```
| Field | Healthy value | What a bad value means |
|-------|--------------|----------------------|
| `status` | `"ok"` | HTTP server down |
| `gateway_state` | `"running"` | Still starting or crashed |
| `platforms.<name>.state` | `"connected"` | Auth failure or network issue |
`gateway_state: "starting"` is normal for up to 60 s on boot. Beyond that, check logs for auth errors:
```bash
journalctl -u hermes-gateway --since "2 minutes ago" | grep -i "error\|token\|auth"
```
---
## Gateway Won't Start — Diagnosis Order
1. `ss -tlnp | grep 8642` — port conflict?
2. `cat ~/.hermes/gateway.pid``ps -p <pid>` — stale PID file?
3. `hermes gateway start --replace` — clears stale locks and PIDs
4. `HERMES_LOG_LEVEL=DEBUG hermes gateway start` — verbose output
5. Check `~/.hermes/.env` — missing or placeholder token?
---
## Before Every PR
```bash
source venv/bin/activate
python -m pytest tests/ -q # full suite: ~3 min, ~3000 tests
python scripts/deploy-validate # deployment health check
python wizard-bootstrap/wizard_bootstrap.py # environment sanity
```
All three must exit 0. Do not skip. "It works locally" is not sufficient evidence.
---
## Session and State Files
| Store | Location | Notes |
|-------|----------|-------|
| Sessions | `~/.hermes/sessions/*.json` | Persisted across restarts |
| Memories | `~/.hermes/memories/*.md` | Written by the agent's memory tool |
| Cron jobs | `~/.hermes/cron/*.json` | Scheduler state |
| Gateway state | `~/.hermes/gateway_state.json` | Live platform connection status |
| Response store | `~/.hermes/response_store.db` | SQLite WAL — API server only |
All paths go through `get_hermes_home()`. Never hardcode. Always backup before a major update:
```bash
tar czf ~/backups/hermes_$(date +%F_%H%M).tar.gz ~/.hermes/
```
---
## Writing Tests
```bash
python -m pytest tests/path/to/test.py -q # single file
python -m pytest tests/ -q -k "test_name" # by name
python -m pytest tests/ -q -x # stop on first failure
```
**Test isolation rules:**
- `tests/conftest.py` has an autouse fixture that redirects `HERMES_HOME` to a temp dir. Never write to `~/.hermes/` in tests.
- Profile tests must mock both `Path.home()` and `HERMES_HOME`. See `tests/hermes_cli/test_profiles.py` for the pattern.
- Do not mock the database. Integration tests should use real SQLite with a temp path.
---
## Commit Conventions
```
feat: add X # new capability
fix: correct Y # bug fix
refactor: restructure Z # no behaviour change
test: add tests for W # test-only
chore: update deps # housekeeping
docs: clarify X # documentation only
```
Include `Fixes #NNN` or `Refs #NNN` in the commit message body to close or reference issues automatically.
---
*This guide lives in `wizard-bootstrap/`. Update it when you discover a new pitfall or pattern worth preserving.*