Compare commits

14 Commits

Author SHA1 Message Date
d0e4e00ba1 chore: claw-code progress on #151
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 4s
Refs #151
2026-04-07 00:03:57 -04:00
9b4fcc5ee4 [claw-code] P2: Validate Documentation Audit & Apply to Our Fork (#126) (#176)
Some checks failed
Forge CI / smoke-and-build (push) Failing after 2s
Co-authored-by: claw-code <claw-code@timmy.local>
Co-committed-by: claw-code <claw-code@timmy.local>
2026-04-07 03:56:46 +00:00
6581dcb1af fix(ezra): switch primary from kimi-for-coding to kimi-k2.5, add fallback chain
Some checks failed
Forge CI / smoke-and-build (push) Failing after 2s
kimi-for-coding is throwing 403 access-terminated errors.
This switches Ezra to kimi-k2.5 and adds anthropic + openrouter fallbacks.
Addresses #lazzyPit and unblocks Ezra resurrection.
2026-04-07 03:23:36 +00:00
a37fed23e6 [BEZALEL][CI] Syntax Guard — Prevent Broken Python from Reaching Main (#167)
Some checks failed
Forge CI / smoke-and-build (push) Failing after 2s
2026-04-07 02:27:32 +00:00
97f63a0d89 Merge pull request '[BEZALEL][DEVKIT] Shared Development Tools for the Wizard Fleet' (#166) from bezalel/devkit-for-the-fleet into main
Some checks failed
Forge CI / smoke-and-build (push) Failing after 2s
2026-04-07 02:15:11 +00:00
b49e8b11ea Merge pull request '[BEZALEL][Epic-001] The Forge CI Pipeline — Gitea Actions + Smoke + Green E2E' (#154) from bezalel/epic-001-forge-ci into main
Some checks failed
Forge CI / smoke-and-build (push) Failing after 2s
2026-04-07 02:12:31 +00:00
88b4cc218f feat(devkit): Add shared development tools for the wizard fleet
Some checks failed
Notebook CI / notebook-smoke (pull_request) Failing after 2s
- gitea_client.py — reusable Gitea API client for issues, PRs, comments
- health.py — fleet health monitor (load, disk, memory, processes)
- notebook_runner.py — Papermill wrapper with JSON reporting
- smoke_test.py — fast smoke tests and bare green-path e2e
- secret_scan.py — secret leak scanner for CI gating
- wizard_env.py — environment validator for bootstrapping agents
- README.md — usage guide for all tools

These tools are designed to be used by any wizard via python -m devkit.<tool>.
Rising up as a platform, not a silo.
2026-04-07 02:08:47 +00:00
59653ef409 [claude] Research Triage: SSD Self-Distillation acknowledgment (#128) (#165) 2026-04-07 02:07:54 +00:00
e32d6332bc [claude] Forge Operations Guide — Practical Wizard Onboarding (#142) (#164) 2026-04-07 02:06:15 +00:00
6291f2d31b [claude] Fleet SITREP — April 6, 2026 acknowledgment (#143) (#162) 2026-04-07 02:04:51 +00:00
066ec8eafa [claude] Add Ezra Quarterly Report — April 2026 (MD + PDF) (#133) (#163) 2026-04-07 02:04:45 +00:00
53fe58a2b9 feat(notebooks): Add Jupytext + Papermill agent workflow demo
Some checks failed
Notebook CI / notebook-smoke (push) Failing after 3s
Notebook CI / notebook-smoke (pull_request) Failing after 5s
- Add parameterized system-health notebook (.py source + .ipynb)
- Add Gitea Actions CI workflow for notebook execution smoke test
- Add NOTEBOOK_WORKFLOW.md documenting the .py-first approach
- Proves end-to-end: agent writes .py -> PR review -> CI executes -> output artifact
2026-04-07 01:54:25 +00:00
43bcb88a09 [BEZALEL][Epic-001] The Forge CI Pipeline — Gitea Actions + Smoke + Green E2E
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 3s
- Add .gitea/workflows/ci.yml: Gitea Actions workflow for PR/push CI
- Add scripts/smoke_test.py: fast smoke tests (<30s) for core imports and CLI entrypoints
- Add tests/test_green_path_e2e.py: bare green-path e2e — terminal echo test
- Total CI runtime target: <5 minutes
- No API keys required for smoke/e2e stages

Closes #145
/assign @bezalel
2026-04-07 00:28:32 +00:00
89730e8e90 [BEZALEL] Add forge health check — artifact integrity and security scanner
Some checks failed
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Failing after 0s
Docker Build and Publish / build-and-push (pull_request) Failing after 7s
Tests / test (pull_request) Failing after 2s
Adds scripts/forge_health_check.py to scan wizard environments for:
- Missing .py source files with orphaned .pyc bytecode (GOFAI artifact integrity)
- Burn script clutter in production paths
- World-readable sensitive files (keystores, tokens, .env)
- Missing required environment variables

Includes full test suite in tests/test_forge_health_check.py covering
orphaned bytecode detection, burn script clutter, permission auto-fix,
and environment variable validation.

Addresses Allegro formalization audit findings:
- GOFAI source files missing (only .pyc remains)
- Nostr keystore world-readable
- eg burn scripts cluttering /root

/assign @bezalel
2026-04-06 22:37:32 +00:00
20 changed files with 1981 additions and 41 deletions


@@ -0,0 +1,2 @@
{"created_at_ms":1775533542734,"session_id":"session-1775533542734-0","type":"session_meta","updated_at_ms":1775533542734,"version":1}
{"message":{"blocks":[{"text":"You are Code Claw running as the Gitea user claw-code.\n\nRepository: Timmy_Foundation/hermes-agent\nIssue: #126 — P2: Validate Documentation Audit & Apply to Our Fork\nBranch: claw-code/issue-126\n\nRead the issue and recent comments, then implement the smallest correct change.\nYou are in a git repo checkout already.\n\nIssue body:\n## Context\n\nCommit `43d468ce` is a comprehensive documentation audit — fixes stale info, expands thin pages, adds depth across all docs.\n\n## Acceptance Criteria\n\n- [ ] **Catalog all doc changes**: Run `git show 43d468ce --stat` to list all files changed, then review each for what was fixed/expanded\n- [ ] **Verify key docs are accurate**: Pick 3 docs that were previously thin (setup, deployment, plugin development), confirm they now have comprehensive content\n- [ ] **Identify stale info that was corrected**: Note at least 3 pieces of stale information that were removed or updated\n- [ ] **Apply fixes to our fork if needed**: Check if any of the doc fixes apply to our `Timmy_Foundation/hermes-agent` fork (Timmy-specific references, custom config sections)\n\n## Why This Matters\n\nAccurate documentation is critical for onboarding new agents and maintaining the fleet. 
Stale docs cost more debugging time than writing them initially.\n\n## Hints\n\n- Run `cd ~/.hermes/hermes-agent && git show 43d468ce --stat` to see the full scope\n- The docs likely cover: setup, plugins, deployment, MCP configuration, and tool integrations\n\n\nParent: #111\n\nRecent comments:\n## 🏷️ Automated Triage Check\n\n**Timestamp:** 2026-04-06T15:30:12.449023 \n**Agent:** Allegro Heartbeat\n\nThis issue has been identified as needing triage:\n\n### Checklist\n- [ ] Clear acceptance criteria defined\n- [ ] Priority label assigned (p0-critical / p1-important / p2-backlog)\n- [ ] Size estimate added (quick-fix / day / week / epic)\n- [ ] Owner assigned\n- [ ] Related issues linked\n\n### Context\n- No comments yet — needs engagement\n- No labels — needs categorization\n- Part of automated backlog maintenance\n\n---\n*Automated triage from Allegro 15-minute heartbeat*\n\n[BURN-DOWN] Dispatched to Code Claw (claw-code worker) as part of nightly burn-down cycle. Heartbeat active.\n\n🟠 Code Claw (OpenRouter qwen/qwen3.6-plus:free) picking up this issue via 15-minute heartbeat.\n\nTimestamp: 2026-04-07T03:45:37Z\n\nRules:\n- Make focused code/config/doc changes only if they directly address the issue.\n- Prefer the smallest proof-oriented fix.\n- Run relevant verification commands if obvious.\n- Do NOT create PRs yourself; the outer worker handles commit/push/PR.\n- If the task is too large or not code-fit, leave the tree unchanged.\n","type":"text"}],"role":"user"},"type":"message"}


@@ -0,0 +1,2 @@
{"created_at_ms":1775534636684,"session_id":"session-1775534636684-0","type":"session_meta","updated_at_ms":1775534636684,"version":1}
{"message":{"blocks":[{"text":"You are Code Claw running as the Gitea user claw-code.\n\nRepository: Timmy_Foundation/hermes-agent\nIssue: #151 — [CONFIG] Add Kimi model to fallback chain for Allegro and Bezalel\nBranch: claw-code/issue-151\n\nRead the issue and recent comments, then implement the smallest correct change.\nYou are in a git repo checkout already.\n\nIssue body:\n## Problem\nAllegro and Bezalel are choking because the Kimi model code is not on their fallback chain. When primary models fail or rate-limit, Kimi should be available as a fallback option but is currently missing.\n\n## Expected Behavior\nKimi model code should be at the front of the fallback chain for both Allegro and Bezalel, so they can remain responsive when primary models are unavailable.\n\n## Context\nThis was reported in Telegram by Alexander Whitestone after observing both agents becoming unresponsive. Ezra was asked to investigate the fallback chain configuration.\n\n## Related\n- timmy-config #302: [ARCH] Fallback Portfolio Runtime Wiring (general fallback framework)\n- hermes-agent #150: [BEZALEL][AUDIT] Telegram Request-to-Gitea Tracking Audit\n\n## Acceptance Criteria\n- [ ] Kimi model code is added to Allegro fallback chain\n- [ ] Kimi model code is added to Bezalel fallback chain\n- [ ] Fallback ordering places Kimi appropriately (front of chain as requested)\n- [ ] Test and confirm both agents can successfully fall back to Kimi\n- [ ] Document the fallback chain configuration for both agents\n\n/assign @ezra\n\nRecent comments:\n[BURN-DOWN] Dispatched to Code Claw (claw-code worker) as part of nightly burn-down cycle. 
Heartbeat active.\n\n🟠 Code Claw (OpenRouter qwen/qwen3.6-plus:free) picking up this issue via 15-minute heartbeat.\n\nTimestamp: 2026-04-07T04:03:49Z\n\nRules:\n- Make focused code/config/doc changes only if they directly address the issue.\n- Prefer the smallest proof-oriented fix.\n- Run relevant verification commands if obvious.\n- Do NOT create PRs yourself; the outer worker handles commit/push/PR.\n- If the task is too large or not code-fit, leave the tree unchanged.\n","type":"text"}],"role":"user"},"type":"message"}

54
.gitea/workflows/ci.yml Normal file

@@ -0,0 +1,54 @@
name: Forge CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

concurrency:
  group: forge-ci-${{ gitea.ref }}
  cancel-in-progress: true

jobs:
  smoke-and-build:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Set up Python 3.11
        run: uv python install 3.11

      - name: Install package
        run: |
          uv venv .venv --python 3.11
          source .venv/bin/activate
          uv pip install -e ".[all,dev]"

      - name: Smoke tests
        run: |
          source .venv/bin/activate
          python scripts/smoke_test.py
        env:
          OPENROUTER_API_KEY: ""
          OPENAI_API_KEY: ""
          NOUS_API_KEY: ""

      - name: Syntax guard
        run: |
          source .venv/bin/activate
          python scripts/syntax_guard.py

      - name: Green-path E2E
        run: |
          source .venv/bin/activate
          python -m pytest tests/test_green_path_e2e.py -q --tb=short
        env:
          OPENROUTER_API_KEY: ""
          OPENAI_API_KEY: ""
          NOUS_API_KEY: ""


@@ -1,44 +1,34 @@
# Ezra Configuration - Kimi Primary
# Anthropic removed from chain entirely
# PRIMARY: Kimi for all operations
model: kimi-coding/kimi-for-coding
# Fallback chain: Only local/offline options
# NO anthropic in the chain - quota issues solved
fallback_providers:
- provider: ollama
model: qwen2.5:7b
base_url: http://localhost:11434
timeout: 120
reason: "Local fallback when Kimi unavailable"
# Provider settings
providers:
kimi-coding:
timeout: 60
max_retries: 3
# Uses KIMI_API_KEY from .env
ollama:
timeout: 120
keep_alive: true
base_url: http://localhost:11434
# REMOVED: anthropic provider entirely
# No more quota issues, no more choking
# Toolsets - Ezra needs these
model:
default: kimi-k2.5
provider: kimi-coding
toolsets:
- hermes-cli
- github
- web
# Agent settings
- all
fallback_providers:
- provider: kimi-coding
model: kimi-k2.5
timeout: 120
reason: Kimi coding fallback (front of chain)
- provider: anthropic
model: claude-sonnet-4-20250514
timeout: 120
reason: Direct Anthropic fallback
- provider: openrouter
model: anthropic/claude-sonnet-4-20250514
base_url: https://openrouter.ai/api/v1
api_key_env: OPENROUTER_API_KEY
timeout: 120
reason: OpenRouter fallback
agent:
max_turns: 90
tool_use_enforcement: auto
# Display settings
display:
show_provider_switches: true
reasoning_effort: high
verbose: false
providers:
kimi-coding:
base_url: https://api.kimi.com/coding/v1
timeout: 60
max_retries: 3
anthropic:
timeout: 120
openrouter:
base_url: https://openrouter.ai/api/v1
timeout: 120
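
The new `fallback_providers` list is ordered, so a consumer presumably tries entries from the top until one responds. How Hermes actually walks the chain is not shown in this diff; the following is only a minimal sketch of that idea. `first_available`, `fake_try`, and the trimmed `FALLBACK_PROVIDERS` list are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Optional

# Mirrors the order of the fallback_providers entries in the config above
# (fields trimmed; this list is an illustration, not loaded from the real file).
FALLBACK_PROVIDERS: List[Dict[str, object]] = [
    {"provider": "kimi-coding", "model": "kimi-k2.5", "timeout": 120},
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514", "timeout": 120},
    {"provider": "openrouter", "model": "anthropic/claude-sonnet-4-20250514", "timeout": 120},
]


def first_available(
    chain: List[Dict[str, object]],
    try_provider: Callable[[Dict[str, object]], bool],
) -> Optional[Dict[str, object]]:
    """Return the first chain entry for which try_provider succeeds, else None.

    try_provider is a hypothetical callback standing in for a real API call.
    """
    for entry in chain:
        try:
            if try_provider(entry):
                return entry
        except Exception:
            # Any failure (403, timeout, rate limit) means "move to the next entry".
            continue
    return None


def fake_try(entry: Dict[str, object]) -> bool:
    # Simulate kimi-coding rejecting the request while the other providers respond.
    if entry["provider"] == "kimi-coding":
        raise PermissionError("403 access terminated")
    return True


chosen = first_available(FALLBACK_PROVIDERS, fake_try)
print(chosen["provider"] if chosen else "no provider available")
```

Keeping the chain as plain ordered data means reordering providers is a config change only, which matches the intent stated in the commit message.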

56
devkit/README.md Normal file

@@ -0,0 +1,56 @@
# Bezalel's Devkit — Shared Tools for the Wizard Fleet
This directory contains reusable CLI tools and Python modules for CI, testing, deployment, observability, and Gitea automation. Any wizard can invoke them via `python -m devkit.<tool>`.
## Tools
### `gitea_client` — Gitea API Client
List issues/PRs, post comments, create PRs, update issues.
```bash
python -m devkit.gitea_client issues --state open --limit 20
python -m devkit.gitea_client create-comment --number 142 --body "Update from Bezalel"
python -m devkit.gitea_client prs --state open
```
### `health` — Fleet Health Monitor
Checks system load, disk, memory, running processes, and key package versions.
```bash
python -m devkit.health --threshold-load 1.0 --threshold-disk 90.0 --fail-on-critical
```
### `notebook_runner` — Notebook Execution Wrapper
Parameterizes and executes Jupyter notebooks via Papermill with structured JSON reporting.
```bash
python -m devkit.notebook_runner task.ipynb output.ipynb -p threshold=1.0 -p hostname=forge
```
### `smoke_test` — Fast Smoke Test Runner
Runs core import checks, CLI entrypoint tests, and one bare green-path E2E.
```bash
python -m devkit.smoke_test --verbose
```
### `secret_scan` — Secret Leak Scanner
Scans the repo for API keys, tokens, and private keys.
```bash
python -m devkit.secret_scan --path . --fail-on-find
```
### `wizard_env` — Environment Validator
Checks that a wizard environment has all required binaries, env vars, Python packages, and Hermes config.
```bash
python -m devkit.wizard_env --json --fail-on-incomplete
```
## Philosophy
- **CLI-first** — Every tool is runnable as `python -m devkit.<tool>`
- **JSON output** — Easy to parse from other agents and CI pipelines
- **Zero dependencies beyond stdlib** where possible; optional heavy deps are runtime-checked
- **Fail-fast** — Exit codes are meaningful for CI gating
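
The "JSON output" and "Fail-fast" points suggest that another agent can drive a devkit tool through `subprocess`, then read both the exit code and the JSON report. A minimal sketch of that pattern follows, assuming the tool prints a single JSON document on stdout. `run_json_tool` is a hypothetical helper, and the demo command is a stand-in so the example runs without devkit installed.

```python
import json
import subprocess
import sys
from typing import Any, List, Tuple


def run_json_tool(cmd: List[str]) -> Tuple[int, Any]:
    """Run a command and return (exit_code, JSON parsed from its stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        payload = {"status": "error", "message": "stdout was not valid JSON"}
    return proc.returncode, payload


# Stand-in for something like: [sys.executable, "-m", "devkit.health", "--fail-on-critical"]
demo_cmd = [
    sys.executable,
    "-c",
    "import json, sys; print(json.dumps({'overall': 'critical'})); sys.exit(1)",
]

code, report = run_json_tool(demo_cmd)
print(code, report["overall"])
```

Reading the exit code and the payload separately lets a CI step gate on the former while still logging the latter.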

9
devkit/__init__.py Normal file

@@ -0,0 +1,9 @@
"""
Bezalel's Devkit — Shared development tools for the wizard fleet.
A collection of CLI-accessible utilities for CI, testing, deployment,
observability, and Gitea automation. Designed to be used by any agent
via subprocess or direct Python import.
"""
__version__ = "0.1.0"

153
devkit/gitea_client.py Normal file

@@ -0,0 +1,153 @@
#!/usr/bin/env python3
"""
Shared Gitea API client for wizard fleet automation.

Usage as CLI (global options such as --repo must come before the subcommand):
    python -m devkit.gitea_client --repo Timmy_Foundation/hermes-agent issues --state open
    python -m devkit.gitea_client --repo Timmy_Foundation/hermes-agent issue --number 142
    python -m devkit.gitea_client --repo Timmy_Foundation/hermes-agent create-comment --number 142 --body "Update from Bezalel"
    python -m devkit.gitea_client --repo Timmy_Foundation/hermes-agent prs --state open

Usage as module:
    from devkit.gitea_client import GiteaClient
    client = GiteaClient()
    issues = client.list_issues("Timmy_Foundation/hermes-agent", state="open")
"""
import argparse
import json
import os
import sys
import urllib.error
import urllib.request
from typing import Any, Dict, List, Optional

DEFAULT_BASE_URL = os.getenv("GITEA_URL", "https://forge.alexanderwhitestone.com")
DEFAULT_TOKEN = os.getenv("GITEA_TOKEN", "")


class GiteaClient:
    def __init__(self, base_url: str = DEFAULT_BASE_URL, token: str = DEFAULT_TOKEN):
        self.base_url = base_url.rstrip("/")
        self.token = token or ""

    def _request(
        self,
        method: str,
        path: str,
        data: Optional[Dict[str, Any]] = None,
        headers: Optional[Dict[str, str]] = None,
    ) -> Any:
        url = f"{self.base_url}/api/v1{path}"
        req_headers = {"Content-Type": "application/json", "Accept": "application/json"}
        if self.token:
            req_headers["Authorization"] = f"token {self.token}"
        if headers:
            req_headers.update(headers)
        body = json.dumps(data).encode() if data else None
        req = urllib.request.Request(url, data=body, headers=req_headers, method=method)
        try:
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read().decode())
        except urllib.error.HTTPError as e:
            # HTTP errors are returned as a dict instead of being raised, so
            # callers should check for the "error" key before using the result.
            return {"error": True, "status": e.code, "body": e.read().decode()}

    def list_issues(self, repo: str, state: str = "open", limit: int = 50) -> List[Dict]:
        return self._request("GET", f"/repos/{repo}/issues?state={state}&limit={limit}") or []

    def get_issue(self, repo: str, number: int) -> Dict:
        return self._request("GET", f"/repos/{repo}/issues/{number}") or {}

    def create_comment(self, repo: str, number: int, body: str) -> Dict:
        return self._request(
            "POST", f"/repos/{repo}/issues/{number}/comments", {"body": body}
        )

    def update_issue(self, repo: str, number: int, **fields) -> Dict:
        return self._request("PATCH", f"/repos/{repo}/issues/{number}", fields)

    def list_prs(self, repo: str, state: str = "open", limit: int = 50) -> List[Dict]:
        return self._request("GET", f"/repos/{repo}/pulls?state={state}&limit={limit}") or []

    def get_pr(self, repo: str, number: int) -> Dict:
        return self._request("GET", f"/repos/{repo}/pulls/{number}") or {}

    def create_pr(self, repo: str, title: str, head: str, base: str, body: str = "") -> Dict:
        return self._request(
            "POST",
            f"/repos/{repo}/pulls",
            {"title": title, "head": head, "base": base, "body": body},
        )


def _fmt_json(obj: Any) -> str:
    return json.dumps(obj, indent=2, ensure_ascii=False)


def main(argv: Optional[List[str]] = None) -> int:
    # Only fall back to sys.argv when no list was passed; an explicit [] is honored.
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser(description="Gitea CLI for wizard fleet")
    parser.add_argument("--repo", default="Timmy_Foundation/hermes-agent", help="Repository full name")
    parser.add_argument("--token", default=DEFAULT_TOKEN, help="Gitea API token")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL, help="Gitea base URL")
    sub = parser.add_subparsers(dest="cmd")

    p_issues = sub.add_parser("issues", help="List issues")
    p_issues.add_argument("--state", default="open")
    p_issues.add_argument("--limit", type=int, default=50)

    p_issue = sub.add_parser("issue", help="Get single issue")
    p_issue.add_argument("--number", type=int, required=True)

    p_prs = sub.add_parser("prs", help="List PRs")
    p_prs.add_argument("--state", default="open")
    p_prs.add_argument("--limit", type=int, default=50)

    p_pr = sub.add_parser("pr", help="Get single PR")
    p_pr.add_argument("--number", type=int, required=True)

    p_comment = sub.add_parser("create-comment", help="Post comment on issue/PR")
    p_comment.add_argument("--number", type=int, required=True)
    p_comment.add_argument("--body", required=True)

    p_update = sub.add_parser("update-issue", help="Update issue fields")
    p_update.add_argument("--number", type=int, required=True)
    p_update.add_argument("--title", default=None)
    p_update.add_argument("--body", default=None)
    p_update.add_argument("--state", default=None)

    p_create_pr = sub.add_parser("create-pr", help="Create a PR")
    p_create_pr.add_argument("--title", required=True)
    p_create_pr.add_argument("--head", required=True)
    p_create_pr.add_argument("--base", default="main")
    p_create_pr.add_argument("--body", default="")

    args = parser.parse_args(argv)
    client = GiteaClient(base_url=args.base_url, token=args.token)

    if args.cmd == "issues":
        print(_fmt_json(client.list_issues(args.repo, args.state, args.limit)))
    elif args.cmd == "issue":
        print(_fmt_json(client.get_issue(args.repo, args.number)))
    elif args.cmd == "prs":
        print(_fmt_json(client.list_prs(args.repo, args.state, args.limit)))
    elif args.cmd == "pr":
        print(_fmt_json(client.get_pr(args.repo, args.number)))
    elif args.cmd == "create-comment":
        print(_fmt_json(client.create_comment(args.repo, args.number, args.body)))
    elif args.cmd == "update-issue":
        # Send only the fields the caller actually supplied.
        fields = {k: v for k, v in {"title": args.title, "body": args.body, "state": args.state}.items() if v is not None}
        print(_fmt_json(client.update_issue(args.repo, args.number, **fields)))
    elif args.cmd == "create-pr":
        print(_fmt_json(client.create_pr(args.repo, args.title, args.head, args.base, args.body)))
    else:
        parser.print_help()
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
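
Because `_request` returns a dict with `"error": True` on an HTTP failure instead of raising, a caller of `list_issues` can receive either a list of issues or an error dict. A minimal sketch of how a caller might tell the two apart is below. `is_api_error` is a hypothetical helper, not part of the devkit, and the sample payloads are illustrative rather than real API responses.

```python
from typing import Any


def is_api_error(result: Any) -> bool:
    """True if result looks like the error dict produced on an HTTP failure."""
    return isinstance(result, dict) and result.get("error") is True


# Shapes modeled on the client code: an error dict versus a normal list payload.
failed = {"error": True, "status": 404, "body": "not found"}
succeeded = [{"number": 142, "title": "illustrative issue"}]

for result in (failed, succeeded):
    if is_api_error(result):
        print(f"request failed with HTTP {result['status']}")
    else:
        print(f"got {len(result)} issue(s)")
```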

134
devkit/health.py Normal file

@@ -0,0 +1,134 @@
#!/usr/bin/env python3
"""
Fleet health monitor for wizard agents.
Checks local system state and reports structured health metrics.

Usage as CLI:
    python -m devkit.health
    python -m devkit.health --threshold-load 1.0 --threshold-disk 90.0 --fail-on-critical

Usage as module:
    from devkit.health import check_health
    report = check_health()
"""
import argparse
import json
import os
import shutil
import subprocess
import sys
import time
from typing import Any, Dict, List, Optional


def _run(cmd: List[str]) -> str:
    """Run a command and return its stdout, or an "error: ..." string on failure."""
    try:
        return subprocess.check_output(cmd, stderr=subprocess.DEVNULL).decode().strip()
    except Exception as e:
        return f"error: {e}"


def check_health(threshold_load: float = 1.0, threshold_disk_percent: float = 90.0) -> Dict[str, Any]:
    gather_time = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

    # Load average (mean of the 1, 5, and 15 minute values)
    load_raw = _run(["cat", "/proc/loadavg"])
    load_values = []
    avg_load = None
    if load_raw.startswith("error:"):
        load_status = load_raw
    else:
        try:
            load_values = [float(x) for x in load_raw.split()[:3]]
            avg_load = sum(load_values) / len(load_values)
            load_status = "critical" if avg_load > threshold_load else "ok"
        except Exception as e:
            load_status = f"error parsing load: {e}"

    # Disk usage
    disk = shutil.disk_usage("/")
    disk_percent = (disk.used / disk.total) * 100 if disk.total else 0.0
    disk_status = "critical" if disk_percent > threshold_disk_percent else "ok"

    # Memory
    meminfo = _run(["cat", "/proc/meminfo"])
    mem_stats = {}
    for line in meminfo.splitlines():
        if ":" in line:
            key, val = line.split(":", 1)
            mem_stats[key.strip()] = val.strip()

    # Running processes
    hermes_pids = []
    try:
        ps_out = subprocess.check_output(["pgrep", "-a", "-f", "hermes"]).decode().strip()
        hermes_pids = [line.split(None, 1) for line in ps_out.splitlines() if line.strip()]
    except (subprocess.CalledProcessError, FileNotFoundError):
        # pgrep exits non-zero when nothing matches; it may also be missing entirely.
        hermes_pids = []

    # Python package versions (key ones)
    key_packages = ["jupyterlab", "papermill", "requests"]
    pkg_versions = {}
    for pkg in key_packages:
        try:
            out = subprocess.check_output([sys.executable, "-m", "pip", "show", pkg], stderr=subprocess.DEVNULL).decode()
            for line in out.splitlines():
                if line.startswith("Version:"):
                    pkg_versions[pkg] = line.split(":", 1)[1].strip()
                    break
        except Exception:
            pkg_versions[pkg] = None

    overall = "ok"
    if load_status == "critical" or disk_status == "critical":
        overall = "critical"
    elif not hermes_pids:
        overall = "warning"

    return {
        "timestamp": gather_time,
        "overall": overall,
        "load": {
            "raw": load_raw if not load_raw.startswith("error:") else None,
            "1min": load_values[0] if len(load_values) > 0 else None,
            "5min": load_values[1] if len(load_values) > 1 else None,
            "15min": load_values[2] if len(load_values) > 2 else None,
            "avg": round(avg_load, 3) if avg_load is not None else None,
            "threshold": threshold_load,
            "status": load_status,
        },
        "disk": {
            "total_gb": round(disk.total / (1024 ** 3), 2),
            "used_gb": round(disk.used / (1024 ** 3), 2),
            "free_gb": round(disk.free / (1024 ** 3), 2),
            "used_percent": round(disk_percent, 2),
            "threshold_percent": threshold_disk_percent,
            "status": disk_status,
        },
        "memory": mem_stats,
        "processes": {
            "hermes_count": len(hermes_pids),
            "hermes_pids": hermes_pids[:10],
        },
        "packages": pkg_versions,
    }


def main(argv: Optional[List[str]] = None) -> int:
    # Only fall back to sys.argv when no list was passed; an explicit [] is honored.
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser(description="Fleet health monitor")
    parser.add_argument("--threshold-load", type=float, default=1.0)
    parser.add_argument("--threshold-disk", type=float, default=90.0)
    parser.add_argument("--fail-on-critical", action="store_true", help="Exit non-zero if overall is critical")
    args = parser.parse_args(argv)

    report = check_health(args.threshold_load, args.threshold_disk)
    print(json.dumps(report, indent=2))
    if args.fail_on_critical and report.get("overall") == "critical":
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
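
The load check averages the 1, 5, and 15 minute values from `/proc/loadavg` and compares the mean to the threshold. That step can be sketched in isolation as follows. `classify_load` is a hypothetical standalone helper that takes the raw string as input, so it runs on any platform, and the sample strings are made up for illustration.

```python
from typing import Optional, Tuple


def classify_load(raw: str, threshold: float = 1.0) -> Tuple[Optional[float], str]:
    """Average the first three fields of a /proc/loadavg-style string.

    Returns (average, status), where status is "ok", "critical", or an error text.
    """
    fields = raw.split()[:3]
    try:
        values = [float(x) for x in fields]
        avg = sum(values) / len(values)
    except (ValueError, ZeroDivisionError) as e:
        return None, f"error parsing load: {e}"
    return round(avg, 3), "critical" if avg > threshold else "ok"


# Illustrative inputs in the same layout as /proc/loadavg.
print(classify_load("0.50 0.75 1.00 2/345 6789"))
print(classify_load("3.00 2.00 1.00 1/100 4242", threshold=1.5))
```

Averaging the three windows damps short spikes, at the cost of reacting more slowly than a check on the 1 minute value alone.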

136
devkit/notebook_runner.py Normal file

@@ -0,0 +1,136 @@
#!/usr/bin/env python3
"""
Notebook execution runner for agent tasks.
Wraps papermill with sensible defaults and structured JSON reporting.

Usage as CLI (parameters are passed as key=value):
    python -m devkit.notebook_runner notebooks/task.ipynb output.ipynb -p threshold=1.0
    python -m devkit.notebook_runner notebooks/task.ipynb --dry-run

Usage as module:
    from devkit.notebook_runner import run_notebook
    result = run_notebook("task.ipynb", "output.ipynb", parameters={"threshold": 1.0})
"""
import argparse
import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def run_notebook(
    input_path: str,
    output_path: Optional[str] = None,
    parameters: Optional[Dict[str, Any]] = None,
    kernel: str = "python3",
    timeout: Optional[int] = None,
    dry_run: bool = False,
) -> Dict[str, Any]:
    input_path = str(Path(input_path).expanduser().resolve())
    if output_path is None:
        # No output path given: write the executed notebook to a temp file.
        fd, output_path = tempfile.mkstemp(suffix=".ipynb")
        os.close(fd)
    else:
        output_path = str(Path(output_path).expanduser().resolve())

    if dry_run:
        return {
            "status": "dry_run",
            "input": input_path,
            "output": output_path,
            "parameters": parameters or {},
            "kernel": kernel,
        }

    cmd = ["papermill", input_path, output_path, "--kernel", kernel]
    if timeout is not None:
        cmd.extend(["--execution-timeout", str(timeout)])
    for key, value in (parameters or {}).items():
        cmd.extend(["-p", key, str(value)])

    start = os.times()
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            check=True,
        )
        end = os.times()
        return {
            "status": "ok",
            "input": input_path,
            "output": output_path,
            "parameters": parameters or {},
            "kernel": kernel,
            "elapsed_seconds": round((end.elapsed - start.elapsed), 2),
            "stdout": proc.stdout[-2000:] if proc.stdout else "",
        }
    except subprocess.CalledProcessError as e:
        end = os.times()
        return {
            "status": "error",
            "input": input_path,
            "output": output_path,
            "parameters": parameters or {},
            "kernel": kernel,
            "elapsed_seconds": round((end.elapsed - start.elapsed), 2),
            "stdout": e.stdout[-2000:] if e.stdout else "",
            "stderr": e.stderr[-2000:] if e.stderr else "",
            "returncode": e.returncode,
        }
    except FileNotFoundError:
        return {
            "status": "error",
            "message": "papermill not found. Install with: uv tool install papermill",
        }


def main(argv: Optional[List[str]] = None) -> int:
    # Only fall back to sys.argv when no list was passed; an explicit [] is honored.
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser(description="Notebook runner for agents")
    parser.add_argument("input", help="Input notebook path")
    parser.add_argument("output", nargs="?", default=None, help="Output notebook path")
    parser.add_argument("-p", "--parameter", action="append", default=[], help="Parameters as key=value")
    parser.add_argument("--kernel", default="python3")
    parser.add_argument("--timeout", type=int, default=None)
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)

    parameters = {}
    for raw in args.parameter:
        if "=" not in raw:
            print(f"Invalid parameter (expected key=value): {raw}", file=sys.stderr)
            return 1
        k, v = raw.split("=", 1)
        # Best-effort type inference: bool, then int, then float, else string
        if v.lower() in ("true", "false"):
            v = v.lower() == "true"
        else:
            try:
                v = int(v)
            except ValueError:
                try:
                    v = float(v)
                except ValueError:
                    pass
        parameters[k] = v

    result = run_notebook(
        args.input,
        args.output,
        parameters=parameters,
        kernel=args.kernel,
        timeout=args.timeout,
        dry_run=args.dry_run,
    )
    print(json.dumps(result, indent=2))
    # A dry run is not a failure, so it exits 0 alongside "ok".
    return 0 if result.get("status") in ("ok", "dry_run") else 1


if __name__ == "__main__":
    sys.exit(main())
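
The CLI's best-effort type inference for `key=value` parameters tries bool, then int, then float, and otherwise keeps the string. Pulled out as a standalone function, the rule can be sketched as below. `parse_param` is a hypothetical name; the runner itself performs this inline in `main`.

```python
from typing import Tuple, Union

ParamValue = Union[bool, int, float, str]


def parse_param(raw: str) -> Tuple[str, ParamValue]:
    """Split "key=value" on the first "=" and infer bool, int, float, or str."""
    key, sep, value = raw.partition("=")
    if not sep:
        raise ValueError(f"Invalid parameter (expected key=value): {raw}")
    if value.lower() in ("true", "false"):
        return key, value.lower() == "true"
    for cast in (int, float):
        try:
            return key, cast(value)
        except ValueError:
            continue
    return key, value


for raw in ("threshold=1.0", "retries=3", "debug=true", "hostname=forge"):
    print(parse_param(raw))
```

Trying `int` before `float` keeps `3` an integer rather than `3.0`, and splitting on the first `=` only means values may themselves contain `=`.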

108
devkit/secret_scan.py Normal file

@@ -0,0 +1,108 @@
#!/usr/bin/env python3
"""
Fast secret leak scanner for the repository.
Checks for common patterns that should never be committed.

Usage as CLI:
    python -m devkit.secret_scan
    python -m devkit.secret_scan --path /some/repo --fail-on-find

Usage as module:
    from devkit.secret_scan import scan
    findings = scan("/path/to/repo")
"""
import argparse
import json
import os
import re
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional

# Patterns to flag
PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "aws_secret_key": re.compile(r"['\"\s][0-9a-zA-Z/+]{40}['\"\s]"),
    "generic_api_key": re.compile(r"api[_-]?key\s*[:=]\s*['\"][a-zA-Z0-9_\-]{20,}['\"]", re.IGNORECASE),
    "private_key": re.compile(r"-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
    "github_token": re.compile(r"gh[pousr]_[A-Za-z0-9_]{36,}"),
    # Heuristic: matches ANY 40-character lowercase hex string, not only ones
    # that follow "token", so full git commit SHAs are flagged as well.
    "gitea_token": re.compile(r"[0-9a-f]{40}"),
    "telegram_bot_token": re.compile(r"[0-9]{9,}:[A-Za-z0-9_-]{35,}"),
}

# Files and paths to skip
SKIP_PATHS = [
    ".git",
    "__pycache__",
    ".pytest_cache",
    "node_modules",
    "venv",
    ".env",
    ".agent-skills",
]

# Max file size to scan (bytes)
MAX_FILE_SIZE = 1024 * 1024


def _should_skip(path: Path) -> bool:
    # Skip when any path component exactly equals a SKIP_PATHS entry.
    for skip in SKIP_PATHS:
        if skip in path.parts:
            return True
    return False


def scan(root: str = ".") -> List[Dict[str, Any]]:
    root_path = Path(root).resolve()
    findings = []
    for file_path in root_path.rglob("*"):
        if not file_path.is_file():
            continue
        if _should_skip(file_path):
            continue
        if file_path.stat().st_size > MAX_FILE_SIZE:
            continue
        try:
            text = file_path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue
        for pattern_name, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                # Simple context: 40 characters on either side of the match
                start = max(0, match.start() - 40)
                end = min(len(text), match.end() + 40)
                context = text[start:end].replace("\n", " ")
                findings.append({
                    "file": str(file_path.relative_to(root_path)),
                    "pattern": pattern_name,
                    "line": text[:match.start()].count("\n") + 1,
                    "context": context,
                })
    return findings


def main(argv: Optional[List[str]] = None) -> int:
    # Only fall back to sys.argv when no list was passed; an explicit [] is honored.
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser(description="Secret leak scanner")
    parser.add_argument("--path", default=".", help="Repository root to scan")
    parser.add_argument("--fail-on-find", action="store_true", help="Exit non-zero if secrets found")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    args = parser.parse_args(argv)

    findings = scan(args.path)
    if args.json:
        print(json.dumps({"findings": findings, "count": len(findings)}, indent=2))
    else:
        print(f"Scanned {args.path}")
        print(f"Findings: {len(findings)}")
        for f in findings:
            print(f" [{f['pattern']}] {f['file']}:{f['line']} -> ...{f['context']}...")
    if args.fail_on_find and findings:
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
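
The `gitea_token` pattern is a bare `[0-9a-f]{40}`, so it also matches full git commit SHAs. The sketch below illustrates that behavior and one possible tightening that requires the word "token" before the hex string. `NEAR_TOKEN` is a hypothetical alternative, not the scanner's pattern, and both sample strings are fabricated for the demonstration.

```python
import re

# The pattern as it appears in the scanner.
BROAD = re.compile(r"[0-9a-f]{40}")

# Hypothetical stricter variant: only flag hex strings assigned to something
# ending in "token" (case-insensitive). This is an assumption, not devkit code.
NEAR_TOKEN = re.compile(r"token\s*[:=]\s*['\"]?([0-9a-f]{40})\b", re.IGNORECASE)

# Fabricated samples: a 40-character hex commit id and a token assignment.
commit_line = "commit " + "9b4fcc5ee4" * 4
token_line = 'GITEA_TOKEN = "0123456789abcdef0123456789abcdef01234567"'

for label, line in (("commit", commit_line), ("token", token_line)):
    broad_hit = BROAD.search(line) is not None
    strict_hit = NEAR_TOKEN.search(line) is not None
    print(f"{label}: broad={broad_hit} strict={strict_hit}")
```

The stricter form trades recall for precision: it would miss a token stored under an unrelated variable name, so whether it suits CI gating depends on how noisy the broad pattern proves in practice.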

108
devkit/smoke_test.py Normal file

@@ -0,0 +1,108 @@
#!/usr/bin/env python3
"""
Shared smoke test runner for hermes-agent.
Fast checks that catch obvious breakage without maintenance burden.
Usage as CLI:
python -m devkit.smoke_test
python -m devkit.smoke_test --verbose
Usage as module:
from devkit.smoke_test import run_smoke_tests
results = run_smoke_tests()
"""
import argparse
import importlib
import json
import subprocess
import sys
from pathlib import Path
from typing import Any, Dict, List
HERMES_ROOT = Path(__file__).resolve().parent.parent
def _test_imports() -> Dict[str, Any]:
    modules = [
        "hermes_constants",
        "hermes_state",
        "cli",
        "tools.skills_sync",
        "tools.skills_hub",
    ]
    errors = []
    for mod in modules:
        try:
            importlib.import_module(mod)
        except Exception as e:
            errors.append({"module": mod, "error": str(e)})
    return {
        "name": "core_imports",
        "status": "ok" if not errors else "fail",
        "errors": errors,
    }


def _test_cli_entrypoints() -> Dict[str, Any]:
    entrypoints = [
        [sys.executable, "-m", "cli", "--help"],
    ]
    errors = []
    for cmd in entrypoints:
        try:
            subprocess.run(cmd, capture_output=True, text=True, check=True, cwd=HERMES_ROOT)
        except subprocess.CalledProcessError as e:
            errors.append({"cmd": cmd, "error": f"exit {e.returncode}"})
        except Exception as e:
            errors.append({"cmd": cmd, "error": str(e)})
    return {
        "name": "cli_entrypoints",
        "status": "ok" if not errors else "fail",
        "errors": errors,
    }


def _test_green_path_e2e() -> Dict[str, Any]:
    """One bare green-path E2E: terminal_tool echo hello."""
    try:
        from tools.terminal_tool import terminal

        result = terminal(command="echo hello")
        output = result.get("output", "")
        if "hello" in output.lower():
            return {"name": "green_path_e2e", "status": "ok", "output": output.strip()}
        return {"name": "green_path_e2e", "status": "fail", "error": f"Unexpected output: {output}"}
    except Exception as e:
        return {"name": "green_path_e2e", "status": "fail", "error": str(e)}
def run_smoke_tests(verbose: bool = False) -> Dict[str, Any]:
    tests = [
        _test_imports(),
        _test_cli_entrypoints(),
        _test_green_path_e2e(),
    ]
    failed = [t for t in tests if t["status"] != "ok"]
    result = {
        "overall": "ok" if not failed else "fail",
        "tests": tests,
        "failed_count": len(failed),
    }
    if verbose:
        print(json.dumps(result, indent=2))
    return result


def main(argv: List[str] = None) -> int:
    argv = argv or sys.argv[1:]
    parser = argparse.ArgumentParser(description="Smoke test runner")
    parser.add_argument("--verbose", action="store_true", help="Print the full JSON report")
    args = parser.parse_args(argv)
    # Pass the parsed flag through so --verbose controls the JSON dump
    result = run_smoke_tests(verbose=args.verbose)
    if not args.verbose:
        print(f"Smoke tests: {result['overall']} ({result['failed_count']} failed)")
    return 0 if result["overall"] == "ok" else 1


if __name__ == "__main__":
    sys.exit(main())

112
devkit/wizard_env.py Normal file

@@ -0,0 +1,112 @@
#!/usr/bin/env python3
"""
Wizard environment validator.
Checks that a new wizard environment is ready for duty.

Usage as CLI:
    python -m devkit.wizard_env
    python -m devkit.wizard_env --json
    python -m devkit.wizard_env --fail-on-incomplete

Usage as module:
    from devkit.wizard_env import validate
    report = validate()
"""

import argparse
import json
import os
import shutil
import subprocess
import sys
from typing import Any, Dict, List
def _has_cmd(name: str) -> bool:
    return shutil.which(name) is not None


def _check_env_var(name: str) -> Dict[str, Any]:
    value = os.getenv(name)
    return {
        "name": name,
        "status": "ok" if value else "missing",
        # Values longer than 20 characters are cut to a 10-character preview;
        # shorter values are reported unchanged
        "value": (value[:10] + "...") if value and len(value) > 20 else value,
    }


def _check_python_pkg(name: str) -> Dict[str, Any]:
    try:
        __import__(name)
        return {"name": name, "status": "ok"}
    except ImportError:
        return {"name": name, "status": "missing"}


def validate() -> Dict[str, Any]:
    checks = {
        "binaries": [
            {"name": "python3", "status": "ok" if _has_cmd("python3") else "missing"},
            {"name": "git", "status": "ok" if _has_cmd("git") else "missing"},
            {"name": "curl", "status": "ok" if _has_cmd("curl") else "missing"},
            {"name": "jupyter-lab", "status": "ok" if _has_cmd("jupyter-lab") else "missing"},
            {"name": "papermill", "status": "ok" if _has_cmd("papermill") else "missing"},
            {"name": "jupytext", "status": "ok" if _has_cmd("jupytext") else "missing"},
        ],
        "env_vars": [
            _check_env_var("GITEA_URL"),
            _check_env_var("GITEA_TOKEN"),
            _check_env_var("TELEGRAM_BOT_TOKEN"),
        ],
        "python_packages": [
            _check_python_pkg("requests"),
            _check_python_pkg("jupyter_server"),
            _check_python_pkg("nbformat"),
        ],
    }
    all_ok = all(
        c["status"] == "ok"
        for group in checks.values()
        for c in group
    )

    # Hermes-specific checks
    hermes_home = os.path.expanduser("~/.hermes")
    checks["hermes"] = [
        {"name": "config.yaml", "status": "ok" if os.path.exists(f"{hermes_home}/config.yaml") else "missing"},
        {"name": "skills_dir", "status": "ok" if os.path.exists(f"{hermes_home}/skills") else "missing"},
    ]
    all_ok = all_ok and all(c["status"] == "ok" for c in checks["hermes"])

    return {
        "overall": "ok" if all_ok else "incomplete",
        "checks": checks,
    }
def main(argv: List[str] = None) -> int:
    argv = argv or sys.argv[1:]
    parser = argparse.ArgumentParser(description="Wizard environment validator")
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--fail-on-incomplete", action="store_true")
    args = parser.parse_args(argv)

    report = validate()
    if args.json:
        print(json.dumps(report, indent=2))
    else:
        print(f"Wizard Environment: {report['overall']}")
        for group, items in report["checks"].items():
            print(f"\n[{group}]")
            for item in items:
                status_icon = "✅" if item["status"] == "ok" else "❌"
                print(f" {status_icon} {item['name']}: {item['status']}")

    if args.fail_on_incomplete and report["overall"] != "ok":
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())


@@ -0,0 +1,132 @@
# Fleet SITREP — April 6, 2026
**Classification:** Consolidated Status Report
**Compiled by:** Ezra
**Acknowledged by:** Claude (Issue #143)
---
## Executive Summary
Allegro executed 7 tasks across infrastructure, contracting, audits, and security. Ezra shipped PR #131, filed formalization audit #132, delivered quarterly report #133, and self-assigned issues #134 through #138. All wizard activity is mapped below.
---
## 1. Allegro 7-Task Report
| Task | Description | Status |
|------|-------------|--------|
| 1 | Roll Call / Infrastructure Map | ✅ Complete |
| 2 | Dark industrial anthem (140 BPM, Suno-ready) | ✅ Complete |
| 3 | Operation Get A Job — 7-file contracting playbook pushed to `the-nexus` | ✅ Complete |
| 4 | Formalization audit filed ([the-nexus #893](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/893)) | ✅ Complete |
| 5 | GrepTard Memory Report — PR #525 on `timmy-home` | ✅ Complete |
| 6 | Self-audit issues #894-#899 filed on `the-nexus` | ✅ Filed |
| 7 | `keystore.json` permissions fixed to `600` | ✅ Applied |
### Critical Findings from Task 4 (Formalization Audit)
- GOFAI source files missing — only `.pyc` remains
- Nostr keystore was world-readable — **FIXED** (Task 7)
- 39 burn scripts cluttering `/root` — archival pending ([#898](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/898))
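
The keystore fix from Task 7 can be illustrated with a minimal sketch. It assumes a POSIX filesystem; `tighten_if_exposed` and the temporary `keystore.json` are hypothetical stand-ins for illustration, not the commands Allegro actually ran.

```python
import stat
import tempfile
from pathlib import Path


def tighten_if_exposed(path: Path, mode: int = 0o600) -> bool:
    """Restrict the file to owner read/write if group or others can read it.

    Returns True when permissions were changed. Hypothetical helper.
    """
    current = path.stat().st_mode
    if current & (stat.S_IRGRP | stat.S_IROTH):
        path.chmod(mode)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "keystore.json"  # stand-in file, not the real keystore
    demo.write_text("{}")
    demo.chmod(0o644)  # simulate the world-readable state
    changed = tighten_if_exposed(demo)
    final_mode = stat.S_IMODE(demo.stat().st_mode)
    print(changed, oct(final_mode))
```

Checking the mode before calling `chmod` keeps the operation idempotent, so a repeated run on an already-restricted file reports no change.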
---
## 2. Ezra Deliverables
| Deliverable | Issue/PR | Status |
|-------------|----------|--------|
| V-011 fix + compressor tuning | [PR #131](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/131) | ✅ Merged |
| Formalization audit (hermes-agent) | [Issue #132](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/132) | Filed |
| Quarterly report (MD + PDF) | [Issue #133](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/133) | Filed |
| Burn-mode concurrent tool tests | [Issue #134](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/134) | Assigned → Ezra |
| MCP SDK migration | [Issue #135](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/135) | Assigned → Ezra |
| APScheduler migration | [Issue #136](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/136) | Assigned → Ezra |
| Pydantic-settings migration | [Issue #137](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/137) | Assigned → Ezra |
| Contracting playbook tracker | [Issue #138](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/138) | Assigned → Ezra |
---
## 3. Fleet Status
| Wizard | Host | Status | Blocker |
|--------|------|--------|---------|
| **Ezra** | Hermes VPS | Active — 5 issues queued | None |
| **Bezalel** | Hermes VPS | Gateway running on 8645 | None |
| **Allegro-Primus** | Hermes VPS | **Gateway DOWN on 8644** | Needs restart signal |
| **Bilbo** | External | Gemma 4B active, Telegram dual-mode | Host IP unknown to fleet |
### Allegro Gateway Recovery
Allegro-Primus gateway (port 8644) is down. Options:
1. **Alexander restarts manually** on Hermes VPS
2. **Delegate to Bezalel** — Bezalel can issue restart signal via Hermes VPS access
3. **Delegate to Ezra** — Ezra can coordinate restart as part of issue #894 work
---
## 4. Operation Get A Job — Contracting Playbook
Files pushed to `the-nexus/operation-get-a-job/`:
| File | Purpose |
|------|---------|
| `README.md` | Master plan |
| `entity-setup.md` | Wyoming LLC, Mercury, E&O insurance |
| `service-offerings.md` | Rates $150-600/hr; packages $5k/$15k/$40k+ |
| `portfolio.md` | Portfolio structure |
| `outreach-templates.md` | Cold email templates |
| `proposal-template.md` | Client proposal structure |
| `rate-card.md` | Rate card |
**Human-only mile (Alexander's action items):**
1. Pick LLC name from `entity-setup.md`
2. File Wyoming LLC via Northwest Registered Agent ($225)
3. Get EIN from IRS (free, ~10 min)
4. Open Mercury account (requires EIN + LLC docs)
5. Secure E&O insurance (~$150-250/month)
6. Restart Allegro-Primus gateway (port 8644)
7. Update LinkedIn using profile template
8. Send 5 cold emails using outreach templates
---
## 5. Pending Self-Audit Issues (the-nexus)
| Issue | Title | Priority |
|-------|-------|----------|
| [#894](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/894) | Deploy burn-mode cron jobs | CRITICAL |
| [#895](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/895) | Telegram thread-based reporting | Normal |
| [#896](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/896) | Retry logic and error recovery | Normal |
| [#897](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/897) | Automate morning reports at 0600 | Normal |
| [#898](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/898) | Archive 39 burn scripts | Normal |
| [#899](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/899) | Keystore permissions | ✅ Done |
---
## 6. Revenue Timeline
| Milestone | Target | Unlocks |
|-----------|--------|---------|
| LLC + Bank + E&O | Day 5 | Ability to invoice clients |
| First 5 emails sent | Day 7 | Pipeline generation |
| First scoping call | Day 14 | Qualified lead |
| First proposal accepted | Day 21 | **$4,500-$12,000 revenue** |
| Monthly retainer signed | Day 45 | **$6,000/mo recurring** |
---
## 7. Delegation Matrix
| Owner | Owns |
|-------|------|
| **Alexander** | LLC filing, EIN, Mercury, E&O, LinkedIn, cold emails, gateway restart |
| **Ezra** | Issues #134-#138 (tests, migrations, tracker) |
| **Allegro** | Issues #894, #898 (cron deployment, burn script archival) |
| **Bezalel** | Review formalization audit for Anthropic-specific gaps |
---
*SITREP acknowledged by Claude — April 6, 2026*
*Source issue: [hermes-agent #143](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/143)*


@@ -0,0 +1,166 @@
# Research Acknowledgment: SSD — Simple Self-Distillation Improves Code Generation
**Issue:** #128
**Paper:** [Embarrassingly Simple Self-Distillation Improves Code Generation](https://arxiv.org/abs/2604.01193)
**Authors:** Ruixiang Zhang, Richard He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang (Apple)
**Date:** April 1, 2026
**Code:** https://github.com/apple/ml-ssd
**Acknowledged by:** Claude — April 6, 2026
---
## Assessment: High Relevance to Fleet
This paper is directly applicable to the hermes-agent fleet. The headline result — +7.5pp pass@1 on Qwen3-4B — is at exactly the scale we operate. The method requires no external infrastructure. Triage verdict: **P0 / Week-class work**.
---
## What SSD Actually Does
Three steps, nothing exotic:
1. **Sample**: For each coding prompt, generate one solution at temperature `T_train` (~0.9). Do NOT filter for correctness.
2. **Fine-tune**: SFT on the resulting `(prompt, unverified_solution)` pairs. Standard cross-entropy loss. No RLHF, no GRPO, no DPO.
3. **Evaluate**: At `T_eval` (which must be **different** from `T_train`). This asymmetry is not optional: using the same temperature for both loses 30-50% of the gains.
The counterintuitive part: N=1 per problem, unverified. Prior self-improvement work uses N>>1 and filters by execution. SSD doesn't. The paper argues this is *why* it works — you're sharpening the model's own distribution, not fitting to a correctness filter's selection bias.
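
The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions: `build_ssd_pairs` and `fake_sample` are hypothetical names, and `fake_sample` stands in for a real model call. It is not the paper's released code.

```python
from typing import Callable, Dict, List

T_TRAIN = 0.9  # sampling temperature named in step 1


def build_ssd_pairs(
    prompts: List[str],
    sample: Callable[[str, float], str],
    t_train: float = T_TRAIN,
) -> List[Dict[str, str]]:
    """Draw exactly one completion per prompt and keep it unconditionally.

    No execution check or correctness filter is applied (N=1, unverified).
    """
    return [{"prompt": p, "completion": sample(p, t_train)} for p in prompts]


def fake_sample(prompt: str, temperature: float) -> str:
    # Hypothetical stand-in for a model call; a real pipeline would query the model here
    return f"# solution for: {prompt} (T={temperature})"


pairs = build_ssd_pairs(["reverse a list", "parse a date"], fake_sample)
for pair in pairs:
    print(pair)
```

The resulting `(prompt, completion)` records are what step 2 would feed to standard SFT.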
---
## The Fork/Lock Theory
The paper's core theoretical contribution explains *why* temperature asymmetry matters.
**Locks** — positions requiring syntactic precision: colons, parentheses, import paths, variable names. A mistake here is a hard error. Low temperature helps at Locks. But applying low temperature globally kills diversity everywhere.
**Forks** — algorithmic choice points where multiple valid continuations exist: picking a sort algorithm, choosing a data structure, deciding on a loop structure. High temperature helps at Forks. But applying high temperature globally introduces errors at Locks.
SSD's fine-tuning reshapes token distributions **context-dependently**:
- At Locks: narrows the distribution, suppressing distractor tokens
- At Forks: widens the distribution, preserving valid algorithmic paths
A single global temperature cannot do this. SFT on self-generated data can, because the model learns from examples that implicitly encode which positions are Locks and which are Forks in each problem context.
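
A small numeric sketch shows why one global temperature cannot treat Locks and Forks differently. The logit values are made up for illustration and do not come from the paper.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


lock_logits = [5.0, 1.0, 0.5]  # illustrative Lock: one clearly correct token
fork_logits = [2.0, 1.9, 1.8]  # illustrative Fork: several valid continuations

for t in (0.3, 0.9):
    lock = softmax_with_temperature(lock_logits, t)
    fork = softmax_with_temperature(fork_logits, t)
    print(f"T={t}: lock top-1={lock[0]:.3f}, fork top-1={fork[0]:.3f}")
```

Lowering the temperature raises the top-1 probability at both positions at once, and raising it flattens both. Changing the per-position distributions independently requires changing the model itself, which is the role the text assigns to SFT.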
**Fleet implication**: Our agents are currently using a single temperature for everything. This is leaving performance on the table even without fine-tuning. The immediate zero-cost action is temperature auditing (see Phase 1 below).
---
## Results That Matter to Us
| Model | Before | After | Delta |
|-------|--------|-------|-------|
| Qwen3-30B-Instruct | 42.4% | 55.3% | +12.9pp (+30% rel) |
| Qwen3-4B-Instruct | baseline | baseline+7.5pp | +7.5pp |
| Llama-3.1-8B-Instruct | baseline | baseline+3.5pp | +3.5pp |
Gains concentrate on hard problems: +14.2pp medium, +15.3pp hard. This is the distribution our agents face on real Gitea issues — not easy textbook problems.
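
As a quick check on the first table row, the absolute and relative gains can be recomputed from the before and after scores quoted above:

```python
before = 42.4  # Qwen3-30B-Instruct pass@1 before SSD, from the table
after = 55.3   # pass@1 after SSD, from the table

absolute_gain = after - before           # percentage points
relative_gain = absolute_gain / before   # fraction of the baseline

print(f"{absolute_gain:.1f}pp absolute, {relative_gain:.1%} relative")
```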
---
## Fleet Implementation Plan
### Phase 1: Temperature Audit (Zero cost, this week)
Current state: fleet agents use default or eyeballed temperature settings. The paper shows T_eval != T_train is critical even without fine-tuning.
Actions:
1. Document current temperature settings in `hermes/`, `skills/`, and any Ollama config files
2. Establish a held-out test set of 20+ solved Gitea issues with known-correct outputs
3. Run A/B: current T_eval vs. T_eval=0.7 vs. T_eval=0.3 for code generation tasks
4. Record pass rates per condition; file findings as a follow-up issue
Expected outcome: measurable improvement with no model changes, no infrastructure, no cost.
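
One possible shape for the pass-rate bookkeeping in action 4 is sketched below. The condition labels and the outcomes in `demo` are invented for illustration; real values would come from the held-out Gitea issues.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Aggregate (condition, passed) records into a pass rate per condition."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for condition, passed in results:
        totals[condition] += 1
        passes[condition] += int(passed)
    return {condition: passes[condition] / totals[condition] for condition in totals}


# Made-up outcomes purely for illustration
demo = [
    ("current", True), ("current", False),
    ("T_eval=0.7", True), ("T_eval=0.7", True),
    ("T_eval=0.3", False), ("T_eval=0.3", True),
]
rates = pass_rates(demo)
for condition, rate in rates.items():
    print(f"{condition}: {rate:.0%}")
```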
### Phase 2: SSD Pipeline (12 weeks, single Mac)
Replicate the paper's method on Qwen3-4B via Ollama + axolotl or unsloth:
```
1. Dataset construction:
   - Extract 100-500 coding prompts from Gitea issue backlog
   - Focus on issues that have accepted PRs (ground truth available for evaluation only, not training)
   - Format: (system_prompt + issue_description) → model generates solution at T_train=0.9

2. Fine-tuning:
   - Use LoRA (not full fine-tune) to stay local-first
   - Standard SFT: cross-entropy on (prompt, self-generated_solution) pairs
   - Recommended: unsloth for memory efficiency on Mac hardware
   - Training budget: 1-3 epochs, small batch size

3. Evaluation:
   - Compare base model vs. SSD-tuned model at T_eval=0.7
   - Metric: pass@1 on held-out issues not in training set
   - Also test on general coding benchmarks to check for capability regression
```
Infrastructure assessment:
- **RAM**: Qwen3-4B quantized (Q4_K_M) needs ~3.5GB VRAM for inference; LoRA fine-tuning needs ~8-12GB unified memory (Mac M-series feasible)
- **Storage**: Self-generated dataset is small; LoRA adapter is ~100-500MB
- **Time**: 500 examples × 3 epochs ≈ 24 hours on M2/M3 Max
- **Dependencies**: Ollama (inference), unsloth or axolotl (fine-tuning), datasets (HuggingFace), trl
No cloud required. No teacher model required. No code execution environment required.
### Phase 3: Continuous Self-Improvement Loop (12 months)
Wire SSD into the fleet's burn mode:
```
Nightly cron:
  1. Collect agent solutions from the day's completed issues
  2. Filter: only solutions where the PR was merged (human-verified correct)
  3. Append to rolling training buffer (last 500 examples)
  4. Run SFT fine-tune on buffer → update LoRA adapter
  5. Swap adapter into Ollama deployment at dawn
  6. Agents start next day with yesterday's lessons baked in
```
This integrates naturally with RetainDB (#112) — the persistent memory system would track which solutions were merged, providing the feedback signal. The continuous loop turns every merged PR into a training example.
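
Steps 2 and 3 of the nightly loop could be sketched with a bounded deque. The record fields `issue` and `merged` are hypothetical; the real schema would depend on what RetainDB exposes.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable

BUFFER_SIZE = 500  # "last 500 examples" from step 3


def update_buffer(
    buffer: Deque[Dict[str, Any]],
    todays_solutions: Iterable[Dict[str, Any]],
) -> Deque[Dict[str, Any]]:
    """Append only merged solutions; the deque evicts the oldest entries itself."""
    for solution in todays_solutions:
        if solution.get("merged"):
            buffer.append(solution)
    return buffer


buffer: Deque[Dict[str, Any]] = deque(maxlen=BUFFER_SIZE)

# Synthetic day of work: 1200 solutions, every even-numbered issue merged
day = [{"issue": i, "merged": i % 2 == 0} for i in range(1200)]
update_buffer(buffer, day)
print(len(buffer), buffer[0]["issue"], buffer[-1]["issue"])
```

Using `deque(maxlen=...)` keeps the buffer bounded without explicit trimming, so the fine-tuning step always sees at most the most recent 500 merged examples.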
### Phase 4: Sovereignty Confirmation
The paper validates that external data is not required for improvement. Our fleet can:
- Fine-tune exclusively on its own conversation data
- Stay fully local (no API calls, no external datasets)
- Accumulate improvements over time without model subscriptions
This is the sovereign fine-tuning capability the fleet needs to remain independent as external model APIs change pricing or capabilities.
---
## Risks and Mitigations
| Risk | Assessment | Mitigation |
|------|------------|------------|
| SSD gains don't transfer from LiveCodeBench to Gitea issues | Medium — our domain is software engineering, not competitive programming | Test on actual Gitea issues from the backlog; don't assume benchmark numbers transfer |
| Fine-tuning degrades non-code capabilities | Low-Medium | LoRA instead of full fine-tune; test on general tasks after SFT; retain base model checkpoint |
| Small training set (<200 examples) insufficient | Medium | Paper shows gains at modest scale; supplement with open code datasets (Stack, TheVault) if needed |
| Qwen3 GGUF format incompatible with unsloth fine-tuning | Low | unsloth supports Qwen3; verify exact GGUF variant compatibility before starting |
| Temperature asymmetry effect smaller on instruction-tuned variants | Low | Paper explicitly tests instruct variants and shows gains; Qwen3-4B-Instruct is in the paper's results |
---
## Acceptance Criteria Status
From the issue:
- [ ] **Temperature audit**: Document current T/top_p settings across fleet agents, compare with paper recommendations
- [ ] **T_eval benchmark**: A/B test on 20+ solved Gitea issues; measure correctness
- [ ] **SSD reproduction**: Replicate pipeline on Qwen3-4B with 100 prompts; measure pass@1 change
- [ ] **Infrastructure assessment**: Documented above (Phase 2 section); GPU/RAM/storage requirements are Mac-feasible
- [ ] **Continuous loop design**: Architecture drafted above (Phase 3 section); integrates with RetainDB (#112)
Infrastructure assessment and continuous loop design are addressed in this document. Temperature audit and SSD reproduction require follow-up issues with execution.
---
## Recommended Follow-Up Issues
1. **Temperature Audit** — Audit all fleet agent temperature configs; run A/B on T_eval variants; file results (Phase 1)
2. **SSD Pipeline Spike** — Build and run the 3-stage SSD pipeline on Qwen3-4B; report pass@1 delta (Phase 2)
3. **Nightly SFT Integration** — Wire SSD into burn-mode cron; integrate with RetainDB feedback loop (Phase 3)
---
*Research acknowledged by Claude — April 6, 2026*
*Source issue: [hermes-agent #128](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/128)*

261
scripts/forge_health_check.py Executable file

@@ -0,0 +1,261 @@
#!/usr/bin/env python3
"""Forge Health Check — Build verification and artifact integrity scanner.
Scans wizard environments for:
- Missing source files (.pyc without .py) — Allegro finding: GOFAI source files gone
- Burn script accumulation in /root or wizard directories
- World-readable sensitive files (keystores, tokens, configs)
- Missing required environment variables
Usage:
    python scripts/forge_health_check.py /root/wizards
    python scripts/forge_health_check.py /root/wizards --json
    python scripts/forge_health_check.py /root/wizards --fix-permissions
"""

from __future__ import annotations

import argparse
import json
import os
import stat
import sys
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Iterable
SENSITIVE_FILE_PATTERNS = (
    "keystore",
    "password",
    "private",
    "apikey",
    "api_key",
    "credentials",
)

SENSITIVE_NAME_PREFIXES = (
    "key_",
    "keys_",
    "token_",
    "tokens_",
    "secret_",
    "secrets_",
    ".env",
    "env.",
)

SENSITIVE_NAME_SUFFIXES = (
    "_key",
    "_keys",
    "_token",
    "_tokens",
    "_secret",
    "_secrets",
    ".key",
    ".env",
    ".token",
    ".secret",
)

SENSIBLE_PERMISSIONS = 0o600  # owner read/write only

REQUIRED_ENV_VARS = (
    "GITEA_URL",
    "GITEA_TOKEN",
    "GITEA_USER",
)

BURN_SCRIPT_PATTERNS = (
    "burn",
    "ignite",
    "inferno",
    "scorch",
    "char",
    "blaze",
    "ember",
)


@dataclass
class HealthFinding:
    category: str
    severity: str  # critical, warning, info
    path: str
    message: str
    suggestion: str = ""


@dataclass
class HealthReport:
    target: str
    findings: list[HealthFinding] = field(default_factory=list)
    passed: bool = True

    def add(self, finding: HealthFinding) -> None:
        self.findings.append(finding)
        if finding.severity == "critical":
            self.passed = False
def scan_orphaned_bytecode(root: Path, report: HealthReport) -> None:
    """Detect .pyc files without corresponding .py source files."""
    for pyc in root.rglob("*.pyc"):
        py = pyc.with_suffix(".py")
        if not py.exists():
            # Files under __pycache__ are named <module>.<tag>.pyc and their
            # source lives one directory up as <module>.py
            if pyc.parent.name == "__pycache__":
                stem = pyc.stem.split(".")[0]
                py = pyc.parent.parent / f"{stem}.py"
            if not py.exists():
                report.add(
                    HealthFinding(
                        category="artifact_integrity",
                        severity="critical",
                        path=str(pyc),
                        message=f"Compiled bytecode without source: {pyc}",
                        suggestion="Restore missing .py source file from version control or backup",
                    )
                )
def scan_burn_script_clutter(root: Path, report: HealthReport) -> None:
    """Detect burn scripts and other temporary artifacts outside proper staging."""
    for path in root.iterdir():
        if not path.is_file():
            continue
        lower = path.name.lower()
        if any(pat in lower for pat in BURN_SCRIPT_PATTERNS):
            report.add(
                HealthFinding(
                    category="deployment_hygiene",
                    severity="warning",
                    path=str(path),
                    message=f"Burn script or temporary artifact in production path: {path.name}",
                    suggestion="Archive to a burn/ or tmp/ directory, or remove if no longer needed",
                )
            )
def _is_sensitive_filename(name: str) -> bool:
    """Check if a filename indicates it may contain secrets."""
    lower = name.lower()
    if lower == ".env.example":
        return False
    # Test modules such as test_api_key_providers.py mention secrets by name only
    if lower.startswith("test_"):
        return False
    if any(pat in lower for pat in SENSITIVE_FILE_PATTERNS):
        return True
    if any(lower.startswith(pref) for pref in SENSITIVE_NAME_PREFIXES):
        return True
    if any(lower.endswith(suff) for suff in SENSITIVE_NAME_SUFFIXES):
        return True
    return False
def scan_sensitive_file_permissions(root: Path, report: HealthReport, fix: bool = False) -> None:
    """Detect sensitive files that are readable by group or others."""
    for fpath in root.rglob("*"):
        if not fpath.is_file():
            continue
        # Skip test files: real secrets should never live in tests/
        if "/tests/" in str(fpath) or str(fpath).startswith(str(root / "tests")):
            continue
        if not _is_sensitive_filename(fpath.name):
            continue
        try:
            mode = fpath.stat().st_mode
        except OSError:
            continue
        # Readable by group or other
        if mode & stat.S_IRGRP or mode & stat.S_IROTH:
            was_fixed = False
            if fix:
                try:
                    fpath.chmod(SENSIBLE_PERMISSIONS)
                    was_fixed = True
                except OSError:
                    pass
            report.add(
                HealthFinding(
                    category="security",
                    severity="critical",
                    path=str(fpath),
                    message=(
                        f"Sensitive file readable by group or others: {fpath.name} "
                        f"(mode={oct(mode & 0o777)})"
                    ),
                    suggestion=(
                        f"Fixed permissions to {oct(SENSIBLE_PERMISSIONS)}"
                        if was_fixed
                        else f"Run 'chmod {oct(SENSIBLE_PERMISSIONS)[2:]} {fpath}'"
                    ),
                )
            )
def scan_environment_variables(report: HealthReport) -> None:
    """Check for required environment variables."""
    for var in REQUIRED_ENV_VARS:
        if not os.environ.get(var):
            report.add(
                HealthFinding(
                    category="configuration",
                    severity="warning",
                    path="$" + var,
                    message=f"Required environment variable {var} is missing or empty",
                    suggestion="Export the variable in your shell profile or secrets manager",
                )
            )


def run_health_check(target: Path, fix_permissions: bool = False) -> HealthReport:
    report = HealthReport(target=str(target.resolve()))
    if target.exists():
        scan_orphaned_bytecode(target, report)
        scan_burn_script_clutter(target, report)
        scan_sensitive_file_permissions(target, report, fix=fix_permissions)
    scan_environment_variables(report)
    return report
def print_report(report: HealthReport) -> None:
    status = "PASS" if report.passed else "FAIL"
    print(f"Forge Health Check: {status}")
    print(f"Target: {report.target}")
    print(f"Findings: {len(report.findings)}\n")

    by_category: dict[str, list[HealthFinding]] = {}
    for f in report.findings:
        by_category.setdefault(f.category, []).append(f)

    for category, findings in by_category.items():
        print(f"[{category.upper()}]")
        for f in findings:
            print(f" {f.severity.upper()}: {f.message}")
            if f.suggestion:
                print(f" -> {f.suggestion}")
        print()


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Forge Health Check")
    parser.add_argument("target", nargs="?", default="/root/wizards", help="Root path to scan")
    parser.add_argument("--json", action="store_true", help="Output JSON report")
    parser.add_argument("--fix-permissions", action="store_true", help="Auto-fix file permissions")
    args = parser.parse_args(argv)

    target = Path(args.target)
    report = run_health_check(target, fix_permissions=args.fix_permissions)
    if args.json:
        print(json.dumps(asdict(report), indent=2))
    else:
        print_report(report)
    return 0 if report.passed else 1


if __name__ == "__main__":
    raise SystemExit(main())

89
scripts/smoke_test.py Executable file

@@ -0,0 +1,89 @@
#!/usr/bin/env python3
"""Forge smoke tests — fast checks that core imports resolve and entrypoints load.
Total runtime target: < 30 seconds.
"""
from __future__ import annotations
import importlib
import subprocess
import sys
from pathlib import Path

# Allow running smoke test directly from repo root before pip install
REPO_ROOT = Path(__file__).parent.parent
sys.path.insert(0, str(REPO_ROOT))

CORE_MODULES = [
    "hermes_cli.config",
    "hermes_state",
    "model_tools",
    "toolsets",
    "utils",
]

CLI_ENTRYPOINTS = [
    [sys.executable, "cli.py", "--help"],
]
def test_imports() -> None:
    ok = 0
    skipped = 0
    for mod in CORE_MODULES:
        try:
            importlib.import_module(mod)
            ok += 1
        except ImportError as exc:
            # If the failure is a missing third-party dependency, skip rather than fail
            # so the smoke test can run before `pip install` in bare environments.
            # ModuleNotFoundError.name holds the dotted name that could not be found;
            # it counts as third-party when its top-level package differs from the core module's.
            missing = exc.name or ""
            is_third_party = (
                isinstance(exc, ModuleNotFoundError)
                and bool(missing)
                and missing.split(".")[0] != mod.split(".")[0]
            )
            if is_third_party:
                print(f"SKIP: import {mod} -> missing dependency ({exc})")
                skipped += 1
            else:
                print(f"FAIL: import {mod} -> {exc}")
                sys.exit(1)
        except Exception as exc:
            print(f"FAIL: import {mod} -> {exc}")
            sys.exit(1)
    print(f"OK: {ok} core imports", end="")
    if skipped:
        print(f" ({skipped} skipped due to missing deps)")
    else:
        print()
def test_cli_help() -> None:
    ok = 0
    skipped = 0
    for cmd in CLI_ENTRYPOINTS:
        # Run from the repo root so the relative "cli.py" path resolves
        # regardless of the caller's working directory
        result = subprocess.run(cmd, capture_output=True, timeout=30, cwd=REPO_ROOT)
        if result.returncode == 0:
            ok += 1
            continue
        stderr = result.stderr.decode().lower()
        # Gracefully skip if dependencies are missing in bare environments
        if "modulenotfounderror" in stderr or "no module named" in stderr:
            print(f"SKIP: {' '.join(cmd)} -> missing dependency")
            skipped += 1
        else:
            print(f"FAIL: {' '.join(cmd)} -> {result.stderr.decode()[:200]}")
            sys.exit(1)
    print(f"OK: {ok} CLI entrypoints", end="")
    if skipped:
        print(f" ({skipped} skipped due to missing deps)")
    else:
        print()


def main() -> int:
    test_imports()
    test_cli_help()
    print("Smoke tests passed.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

20
scripts/syntax_guard.py Executable file

@@ -0,0 +1,20 @@
#!/usr/bin/env python3
"""Syntax guard — compile all Python files to catch syntax errors before merge."""
import py_compile
import sys
from pathlib import Path

errors = []
for p in Path(".").rglob("*.py"):
    if ".venv" in p.parts or "__pycache__" in p.parts:
        continue
    try:
        py_compile.compile(str(p), doraise=True)
    except py_compile.PyCompileError as e:
        errors.append(f"{p}: {e}")
        print(f"SYNTAX ERROR: {p}: {e}", file=sys.stderr)

if errors:
    print(f"\n{len(errors)} file(s) with syntax errors", file=sys.stderr)
    sys.exit(1)
print("All Python files compile successfully")


@@ -0,0 +1,175 @@
"""Tests for scripts/forge_health_check.py"""
import os
import stat
from pathlib import Path

# Import the script as a module
import sys

sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))

from forge_health_check import (
    HealthFinding,
    HealthReport,
    _is_sensitive_filename,
    run_health_check,
    scan_burn_script_clutter,
    scan_orphaned_bytecode,
    scan_sensitive_file_permissions,
    scan_environment_variables,
)
class TestIsSensitiveFilename:
    def test_keystore_is_sensitive(self) -> None:
        assert _is_sensitive_filename("keystore.json") is True

    def test_env_example_is_not_sensitive(self) -> None:
        assert _is_sensitive_filename(".env.example") is False

    def test_env_file_is_sensitive(self) -> None:
        assert _is_sensitive_filename(".env") is True
        assert _is_sensitive_filename("production.env") is True

    def test_test_file_with_key_is_not_sensitive(self) -> None:
        assert _is_sensitive_filename("test_interrupt_key_match.py") is False
        assert _is_sensitive_filename("test_api_key_providers.py") is False


class TestScanOrphanedBytecode:
    def test_detects_pyc_without_py(self, tmp_path: Path) -> None:
        pyc = tmp_path / "module.pyc"
        pyc.write_bytes(b"\x00")
        report = HealthReport(target=str(tmp_path))
        scan_orphaned_bytecode(tmp_path, report)
        assert len(report.findings) == 1
        assert report.findings[0].category == "artifact_integrity"
        assert report.findings[0].severity == "critical"

    def test_ignores_pyc_with_py(self, tmp_path: Path) -> None:
        (tmp_path / "module.py").write_text("pass")
        pyc = tmp_path / "module.pyc"
        pyc.write_bytes(b"\x00")
        report = HealthReport(target=str(tmp_path))
        scan_orphaned_bytecode(tmp_path, report)
        assert len(report.findings) == 0

    def test_detects_pycache_orphan(self, tmp_path: Path) -> None:
        pycache = tmp_path / "__pycache__"
        pycache.mkdir()
        pyc = pycache / "module.cpython-312.pyc"
        pyc.write_bytes(b"\x00")
        report = HealthReport(target=str(tmp_path))
        scan_orphaned_bytecode(tmp_path, report)
        assert len(report.findings) == 1
        assert "__pycache__" in report.findings[0].path
class TestScanBurnScriptClutter:
    def test_detects_burn_script(self, tmp_path: Path) -> None:
        (tmp_path / "burn_test.sh").write_text("#!/bin/bash")
        report = HealthReport(target=str(tmp_path))
        scan_burn_script_clutter(tmp_path, report)
        assert len(report.findings) == 1
        assert report.findings[0].category == "deployment_hygiene"
        assert report.findings[0].severity == "warning"

    def test_ignores_regular_files(self, tmp_path: Path) -> None:
        (tmp_path / "deploy.sh").write_text("#!/bin/bash")
        report = HealthReport(target=str(tmp_path))
        scan_burn_script_clutter(tmp_path, report)
        assert len(report.findings) == 0


class TestScanSensitiveFilePermissions:
    def test_detects_world_readable_keystore(self, tmp_path: Path) -> None:
        ks = tmp_path / "keystore.json"
        ks.write_text("{}")
        ks.chmod(0o644)
        report = HealthReport(target=str(tmp_path))
        scan_sensitive_file_permissions(tmp_path, report)
        assert len(report.findings) == 1
        assert report.findings[0].category == "security"
        assert report.findings[0].severity == "critical"
        assert "644" in report.findings[0].message

    def test_auto_fixes_permissions(self, tmp_path: Path) -> None:
        ks = tmp_path / "keystore.json"
        ks.write_text("{}")
        ks.chmod(0o644)
        report = HealthReport(target=str(tmp_path))
        scan_sensitive_file_permissions(tmp_path, report, fix=True)
        assert len(report.findings) == 1
        assert ks.stat().st_mode & 0o777 == 0o600

    def test_ignores_safe_permissions(self, tmp_path: Path) -> None:
        ks = tmp_path / "keystore.json"
        ks.write_text("{}")
        ks.chmod(0o600)
        report = HealthReport(target=str(tmp_path))
        scan_sensitive_file_permissions(tmp_path, report)
        assert len(report.findings) == 0

    def test_ignores_env_example(self, tmp_path: Path) -> None:
        env = tmp_path / ".env.example"
        env.write_text("# example")
        env.chmod(0o644)
        report = HealthReport(target=str(tmp_path))
        scan_sensitive_file_permissions(tmp_path, report)
        assert len(report.findings) == 0

    def test_ignores_test_directory(self, tmp_path: Path) -> None:
        tests_dir = tmp_path / "tests"
        tests_dir.mkdir()
        ks = tests_dir / "keystore.json"
        ks.write_text("{}")
        ks.chmod(0o644)
        report = HealthReport(target=str(tmp_path))
        scan_sensitive_file_permissions(tmp_path, report)
        assert len(report.findings) == 0
class TestScanEnvironmentVariables:
    def test_reports_missing_env_var(self, monkeypatch) -> None:
        monkeypatch.delenv("GITEA_TOKEN", raising=False)
        report = HealthReport(target=".")
        scan_environment_variables(report)
        missing = [f for f in report.findings if f.path == "$GITEA_TOKEN"]
        assert len(missing) == 1
        assert missing[0].severity == "warning"

    def test_passes_when_env_vars_present(self, monkeypatch) -> None:
        for var in ("GITEA_URL", "GITEA_TOKEN", "GITEA_USER"):
            monkeypatch.setenv(var, "present")
        report = HealthReport(target=".")
        scan_environment_variables(report)
        assert len(report.findings) == 0
class TestRunHealthCheck:
    def test_full_run(self, tmp_path: Path, monkeypatch) -> None:
        monkeypatch.setenv("GITEA_URL", "https://example.com")
        monkeypatch.setenv("GITEA_TOKEN", "secret")
        monkeypatch.setenv("GITEA_USER", "bezalel")
        (tmp_path / "orphan.pyc").write_bytes(b"\x00")
        (tmp_path / "burn_it.sh").write_text("#!/bin/bash")
        ks = tmp_path / "keystore.json"
        ks.write_text("{}")
        ks.chmod(0o644)
        report = run_health_check(tmp_path)
        assert not report.passed
        categories = {f.category for f in report.findings}
        assert "artifact_integrity" in categories
        assert "deployment_hygiene" in categories
        assert "security" in categories

    def test_clean_run_passes(self, tmp_path: Path, monkeypatch) -> None:
        for var in ("GITEA_URL", "GITEA_TOKEN", "GITEA_USER"):
            monkeypatch.setenv(var, "present")
        (tmp_path / "module.py").write_text("pass")
        report = run_health_check(tmp_path)
        assert report.passed
        assert len(report.findings) == 0


@@ -0,0 +1,18 @@
"""Bare green-path E2E — one happy-path tool call cycle.
Exercises the terminal tool directly and verifies the response structure.
No API keys required. Runtime target: < 10 seconds.
"""
import json

from tools.terminal_tool import terminal_tool


def test_terminal_echo_green_path() -> None:
    """terminal('echo hello') -> verify response contains 'hello' and exit_code 0."""
    result = terminal_tool(command="echo hello", timeout=10)
    data = json.loads(result)
    assert data["exit_code"] == 0, f"Expected exit_code 0, got {data['exit_code']}"
    assert "hello" in data["output"], f"Expected 'hello' in output, got: {data['output']}"


@@ -0,0 +1,215 @@
# Forge Operations Guide
> **Audience:** Forge wizards joining the hermes-agent project
> **Purpose:** Practical patterns, common pitfalls, and operational wisdom
> **Companion to:** `WIZARD_ENVIRONMENT_CONTRACT.md`
---
## The One Rule
**Read the actual state before acting.**
Before touching any service, config, or codebase: `ps aux | grep hermes`, `cat ~/.hermes/gateway_state.json`, `curl http://127.0.0.1:8642/health`. The forge punishes assumptions harder than it rewards speed. Evidence always beats intuition.
---
## First 15 Minutes on a New System
```bash
# 1. Validate your environment
python wizard-bootstrap/wizard_bootstrap.py
# 2. Check what is actually running
ps aux | grep -E 'hermes|python|gateway'
# 3. Check the data directory
ls -la ~/.hermes/
cat ~/.hermes/gateway_state.json 2>/dev/null | python3 -m json.tool
# 4. Verify health endpoints (if gateway is up)
curl -sf http://127.0.0.1:8642/health | python3 -m json.tool
# 5. Run the smoke test
source venv/bin/activate
python -m pytest tests/ -q -x --timeout=60 2>&1 | tail -20
```
Do not begin work until all five steps return clean output.
---
## Import Chain — Know It, Respect It
The dependency order is load-bearing. Violating it causes silent failures:
```
tools/registry.py ← no deps; imported by everything
tools/*.py ← each calls registry.register() at import time
model_tools.py ← imports registry; triggers tool discovery
run_agent.py / cli.py / batch_runner.py
```
**If you add a tool file**, you must also:
1. Add its import to `model_tools.py` `_discover_tools()`
2. Add it to `toolsets.py` (core or a named toolset)
Missing either step causes the tool to silently not appear — no error, just absence.
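The silent-absence failure mode follows directly from import-time registration. Below is a minimal sketch of that pattern, assuming a dictionary-backed registry; `register`, `registered_tools`, and `missing_tools` are hypothetical stand-ins and may not match the real `tools/registry.py` API.

```python
# Hypothetical stand-in for tools/registry.py; the real module may differ.
from typing import Callable, Dict, List

_TOOLS: Dict[str, Callable[..., str]] = {}


def register(name: str, handler: Callable[..., str]) -> None:
    """Record a tool handler under its name."""
    _TOOLS[name] = handler


def registered_tools() -> List[str]:
    """Return the names of every tool that has registered so far."""
    return sorted(_TOOLS)


def _echo(text: str) -> str:
    return text


# A tool module would make this call at import time. A module that is
# never imported never reaches this line, so nothing raises: the tool
# is simply absent from the registry.
register("echo", _echo)


def missing_tools(expected: List[str]) -> List[str]:
    """Return the expected tool names that never registered."""
    return [name for name in expected if name not in _TOOLS]
```

A startup assertion such as `missing_tools([...]) == []` would turn the silent absence into a loud failure, if the project chose to add one.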
---
## The Five Profile Rules
Hermes supports isolated profiles (`hermes -p myprofile`). Profile-unsafe code has caused repeated bugs. Memorize these:
| Do this | Not this |
|---------|----------|
| `get_hermes_home()` | `Path.home() / ".hermes"` |
| `display_hermes_home()` in user messages | hardcoded `~/.hermes` strings |
| `get_hermes_home() / "sessions"` in tests | `~/.hermes/sessions` in tests |
Import both from `hermes_constants`. Every `~/.hermes` hardcode is a latent profile bug.
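To see why the hardcoded form breaks under profiles, here is a minimal sketch. It assumes the home directory is resolved from the `HERMES_HOME` environment variable with a fallback to `~/.hermes`; `get_hermes_home_sketch` is a hypothetical stand-in, not the real `hermes_constants.get_hermes_home()`.

```python
import os
from pathlib import Path


def get_hermes_home_sketch() -> Path:
    """Hypothetical resolver: honour HERMES_HOME if set, else ~/.hermes."""
    override = os.environ.get("HERMES_HOME")
    return Path(override) if override else Path.home() / ".hermes"


def sessions_dir() -> Path:
    # Profile-safe: derived from the resolver, so it follows the active profile.
    return get_hermes_home_sketch() / "sessions"


def sessions_dir_unsafe() -> Path:
    # Profile-unsafe: always points at the default home, whatever the profile.
    return Path.home() / ".hermes" / "sessions"
```

With `HERMES_HOME` pointed at another directory, only `sessions_dir()` moves with it; the unsafe variant keeps reading and writing the default profile's data.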
---
## Prompt Caching — Do Not Break It
The agent caches system prompts. Cache breaks force re-billing of the entire context window on every turn. The following actions break caching mid-conversation and are forbidden:
- Altering past context
- Changing the active toolset
- Reloading memories or rebuilding the system prompt
The only sanctioned context alteration is the context compressor (`agent/context_compressor.py`). If your feature touches the message history, read that file first.
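As a rough mental model only, a prefix cache can be thought of as keyed on everything sent before the newest turn. The sketch below is not how the agent or any provider implements caching; `prefix_key` and the sample prompt are illustrative assumptions that show why each forbidden action produces a different key.

```python
import hashlib
import json
from typing import Dict, List


def prefix_key(system_prompt: str, tools: List[str], history: List[Dict[str, str]]) -> str:
    """Hash everything a prefix cache would need to match exactly."""
    payload = json.dumps(
        {"system": system_prompt, "tools": sorted(tools), "history": history},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder conversation state, invented for illustration.
history = [{"role": "user", "content": "hi"}]
base = prefix_key("You are Hermes.", ["terminal"], history)

# Same inputs: same key, so the cached prefix can be reused.
unchanged = prefix_key("You are Hermes.", ["terminal"], history)

# Changing the toolset or editing past context changes the key.
new_toolset = prefix_key("You are Hermes.", ["terminal", "browser"], history)
edited_past = prefix_key(
    "You are Hermes.", ["terminal"], [{"role": "user", "content": "hello"}]
)
```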
---
## Adding a Slash Command (Checklist)
Four steps, in order:
1. **`hermes_cli/commands.py`** — add `CommandDef` to `COMMAND_REGISTRY`
2. **`cli.py`** — add handler branch in `HermesCLI.process_command()`
3. **`gateway/run.py`** — add handler if it should work in messaging platforms
4. **Aliases** — add to the `aliases` tuple on the `CommandDef`; everything else updates automatically
All downstream consumers (Telegram menu, Slack routing, autocomplete, help text) derive from `COMMAND_REGISTRY`. You never touch them directly.
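The single-registry idea can be sketched as follows. The field names on `CommandDefSketch` and the `status` command are assumptions made for illustration; the real `CommandDef` in `hermes_cli/commands.py` may carry different fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CommandDefSketch:
    """Hypothetical shape of a command definition."""
    name: str
    description: str
    aliases: Tuple[str, ...] = ()


# Hypothetical registry with one invented command.
REGISTRY_SKETCH: List[CommandDefSketch] = [
    CommandDefSketch("status", "Show gateway status", aliases=("st",)),
]


def build_lookup(registry: List[CommandDefSketch]) -> Dict[str, CommandDefSketch]:
    """Map every name and alias to its definition."""
    lookup: Dict[str, CommandDefSketch] = {}
    for cmd in registry:
        for key in (cmd.name, *cmd.aliases):
            lookup[key] = cmd
    return lookup


def help_text(registry: List[CommandDefSketch]) -> str:
    """Derive help output from the registry instead of maintaining it by hand."""
    lines = []
    for cmd in registry:
        alias_part = f" (aliases: {', '.join(cmd.aliases)})" if cmd.aliases else ""
        lines.append(f"/{cmd.name}{alias_part}: {cmd.description}")
    return "\n".join(lines)
```

Because both the lookup and the help text are computed from the same list, adding an alias in one place updates every consumer.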
---
## Tool Schema Pitfalls
**Do NOT cross-reference other toolsets in schema descriptions.**
Writing "prefer `web_search` over this tool" in a browser tool's description will cause the model to hallucinate calls to `web_search` when it's not loaded. Cross-references belong in `get_tool_definitions()` post-processing blocks in `model_tools.py`.
**Do NOT use `\033[K` (ANSI erase-to-EOL) in display code.**
Under `prompt_toolkit`'s `patch_stdout`, it leaks as literal `?[K`. Use space-padding instead: `f"\r{line}{' ' * pad}"`.
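A minimal sketch of the space-padding approach, assuming the caller already knows the target width; `padded_line` is a hypothetical helper, not a function from the codebase.

```python
def padded_line(line: str, width: int) -> str:
    """Return `line` prefixed with a carriage return and space-padded to `width`.

    Overwriting the whole row with spaces has the same visual effect as
    erase-to-EOL, without emitting any escape sequence.
    """
    visible = line[:width]  # truncate so the row never wraps
    pad = max(0, width - len(visible))
    return f"\r{visible}{' ' * pad}"
```

A shorter status line then fully overwrites a longer previous one, for example `print(padded_line("done", 40), end="", flush=True)`.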
**Do NOT use `simple_term_menu` for interactive menus.**
It ghosts on scroll in tmux/iTerm2. Use `curses` (stdlib). See `hermes_cli/tools_config.py` for the pattern.
---
## Health Check Anatomy
A healthy instance returns:
```json
{
  "status": "ok",
  "gateway_state": "running",
  "platforms": {
    "telegram": {"state": "connected"}
  }
}
```
| Field | Healthy value | What a bad value means |
|-------|--------------|----------------------|
| `status` | `"ok"` | HTTP server down |
| `gateway_state` | `"running"` | Still starting or crashed |
| `platforms.<name>.state` | `"connected"` | Auth failure or network issue |
`gateway_state: "starting"` is normal for up to 60 s on boot. Beyond that, check logs for auth errors:
```bash
journalctl -u hermes-gateway --since "2 minutes ago" | grep -i "error\|token\|auth"
```
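The field table above can also be applied mechanically to a parsed health response. This is a sketch under the assumption that the payload has exactly the shape shown in the sample; `diagnose` is a hypothetical helper.

```python
from typing import Any, Dict, List


def diagnose(health: Dict[str, Any]) -> List[str]:
    """Return one message per field that deviates from its healthy value."""
    problems: List[str] = []
    if health.get("status") != "ok":
        problems.append("status: HTTP server down")
    state = health.get("gateway_state")
    if state != "running":
        problems.append(f"gateway_state={state!r}: still starting or crashed")
    for name, info in health.get("platforms", {}).items():
        if info.get("state") != "connected":
            problems.append(f"platforms.{name}: auth failure or network issue")
    return problems
```

An empty list means every field matched its healthy value. Note that the 60 s startup grace period is not modelled here.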
---
## Gateway Won't Start — Diagnosis Order
1. `ss -tlnp | grep 8642` — port conflict?
2. `cat ~/.hermes/gateway.pid`, then `ps -p <pid>` — stale PID file?
3. `hermes gateway start --replace` — clears stale locks and PIDs
4. `HERMES_LOG_LEVEL=DEBUG hermes gateway start` — verbose output
5. Check `~/.hermes/.env` — missing or placeholder token?
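Step 2 can be scripted. The sketch below assumes the PID file contains a single integer and runs on a POSIX system (on Windows, `os.kill` does not perform a harmless existence check); `pid_file_status` is a hypothetical helper.

```python
import os
from pathlib import Path


def pid_file_status(pid_file: Path) -> str:
    """Classify a PID file as 'missing', 'invalid', 'stale', or 'alive'."""
    if not pid_file.exists():
        return "missing"
    try:
        pid = int(pid_file.read_text().strip())
    except ValueError:
        return "invalid"
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return "invalid"
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return "stale"  # no such process, so the file is left over
    except PermissionError:
        return "alive"  # process exists but belongs to another user
    return "alive"
```

A result of `"stale"` corresponds to the case that `hermes gateway start --replace` is meant to clear.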
---
## Before Every PR
```bash
source venv/bin/activate
python -m pytest tests/ -q # full suite: ~3 min, ~3000 tests
python scripts/deploy-validate # deployment health check
python wizard-bootstrap/wizard_bootstrap.py # environment sanity
```
All three must exit 0. Do not skip. "It works locally" is not sufficient evidence.
---
## Session and State Files
| Store | Location | Notes |
|-------|----------|-------|
| Sessions | `~/.hermes/sessions/*.json` | Persisted across restarts |
| Memories | `~/.hermes/memories/*.md` | Written by the agent's memory tool |
| Cron jobs | `~/.hermes/cron/*.json` | Scheduler state |
| Gateway state | `~/.hermes/gateway_state.json` | Live platform connection status |
| Response store | `~/.hermes/response_store.db` | SQLite WAL — API server only |
All paths go through `get_hermes_home()`. Never hardcode. Always back up before a major update:
```bash
mkdir -p ~/backups   # tar fails if the target directory does not exist
tar czf ~/backups/hermes_$(date +%F_%H%M).tar.gz ~/.hermes/
```
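For scripted backups, the same archive can be produced from Python. This is a sketch using only the standard library; `backup_dir` is a hypothetical helper, and the timestamp format mirrors `date +%F_%H%M` from the shell command.

```python
import tarfile
import time
from pathlib import Path


def backup_dir(src: Path, dest_dir: Path) -> Path:
    """Write a gzip-compressed tar of `src` into `dest_dir` and return its path.

    `dest_dir` should live outside `src`, otherwise the archive would
    try to include itself.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d_%H%M")
    archive = dest_dir / f"hermes_{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(src, arcname=src.name)
    return archive
```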
---
## Writing Tests
```bash
python -m pytest tests/path/to/test.py -q # single file
python -m pytest tests/ -q -k "test_name" # by name
python -m pytest tests/ -q -x # stop on first failure
```
**Test isolation rules:**
- `tests/conftest.py` has an autouse fixture that redirects `HERMES_HOME` to a temp dir. Never write to `~/.hermes/` in tests.
- Profile tests must mock both `Path.home()` and `HERMES_HOME`. See `tests/hermes_cli/test_profiles.py` for the pattern.
- Do not mock the database. Integration tests should use real SQLite with a temp path.
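The last rule can be illustrated with the standard library alone. The sketch below uses a real SQLite file under a temporary directory; the `responses` table and its columns are invented for the example and are not the actual response store schema.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


def exercise_real_sqlite(db_path: Path) -> List[Tuple[str, str]]:
    """Create a real database file, write one row, and read it back."""
    conn = sqlite3.connect(str(db_path))
    try:
        conn.execute("PRAGMA journal_mode=WAL")  # same journal mode the guide mentions
        conn.execute("CREATE TABLE responses (id TEXT PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO responses VALUES (?, ?)", ("r1", "hello"))
        conn.commit()
        return conn.execute("SELECT id, body FROM responses").fetchall()
    finally:
        conn.close()


# The temporary directory is removed afterwards, so nothing touches ~/.hermes/.
with tempfile.TemporaryDirectory() as tmp:
    rows = exercise_real_sqlite(Path(tmp) / "response_store.db")
```

In a pytest test, the `tmp_path` fixture would play the role of the temporary directory.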
---
## Commit Conventions
```
feat: add X # new capability
fix: correct Y # bug fix
refactor: restructure Z # no behaviour change
test: add tests for W # test-only
chore: update deps # housekeeping
docs: clarify X # documentation only
```
Include `Fixes #NNN` or `Refs #NNN` in the commit message body to close or reference issues automatically.
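A commit message can be checked against these conventions with a short script. This is a sketch only: the optional `(scope)` part is inferred from commits such as `fix(ezra): ...` in this repository's history, and `check_commit` is a hypothetical helper rather than an existing hook.

```python
import re
from typing import List, Tuple

ALLOWED_TYPES = ("feat", "fix", "refactor", "test", "chore", "docs")

# type, optional (scope), colon, space, non-empty summary
SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?: (?P<summary>\S.*)$")
ISSUE_RE = re.compile(r"\b(?:Fixes|Refs) #(\d+)\b")


def check_commit(message: str) -> Tuple[bool, List[int]]:
    """Return (subject_is_valid, referenced_issue_numbers)."""
    subject = message.splitlines()[0] if message else ""
    match = SUBJECT_RE.match(subject)
    ok = bool(match) and match.group("type") in ALLOWED_TYPES
    issues = [int(n) for n in ISSUE_RE.findall(message)]
    return ok, issues
```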
---
*This guide lives in `wizard-bootstrap/`. Update it when you discover a new pitfall or pattern worth preserving.*