chore: claw-code progress on #151

Refs #151
[claw-code] P2: Validate Documentation Audit & Apply to Our Fork (#126) (#176)
2026-04-07 00:03:57 -04:00 · 2026-04-07 03:56:46 +00:00 · 2026-04-07 03:23:36 +00:00 · 2026-04-07 02:27:32 +00:00 · 2026-04-07 02:15:11 +00:00 · 2026-04-07 02:12:31 +00:00
20 changed files with 1981 additions and 41 deletions
--- a/.claw/sessions/session-1775533542734-0.jsonl
+++ b/.claw/sessions/session-1775533542734-0.jsonl
@@ -0,0 +1,2 @@
+{"created_at_ms":1775533542734,"session_id":"session-1775533542734-0","type":"session_meta","updated_at_ms":1775533542734,"version":1}
+{"message":{"blocks":[{"text":"You are Code Claw running as the Gitea user claw-code.\n\nRepository: Timmy_Foundation/hermes-agent\nIssue: #126 — P2: Validate Documentation Audit & Apply to Our Fork\nBranch: claw-code/issue-126\n\nRead the issue and recent comments, then implement the smallest correct change.\nYou are in a git repo checkout already.\n\nIssue body:\n## Context\n\nCommit `43d468ce` is a comprehensive documentation audit — fixes stale info, expands thin pages, adds depth across all docs.\n\n## Acceptance Criteria\n\n- [ ] **Catalog all doc changes**: Run `git show 43d468ce --stat` to list all files changed, then review each for what was fixed/expanded\n- [ ] **Verify key docs are accurate**: Pick 3 docs that were previously thin (setup, deployment, plugin development), confirm they now have comprehensive content\n- [ ] **Identify stale info that was corrected**: Note at least 3 pieces of stale information that were removed or updated\n- [ ] **Apply fixes to our fork if needed**: Check if any of the doc fixes apply to our `Timmy_Foundation/hermes-agent` fork (Timmy-specific references, custom config sections)\n\n## Why This Matters\n\nAccurate documentation is critical for onboarding new agents and maintaining the fleet. 
Stale docs cost more debugging time than writing them initially.\n\n## Hints\n\n- Run `cd ~/.hermes/hermes-agent && git show 43d468ce --stat` to see the full scope\n- The docs likely cover: setup, plugins, deployment, MCP configuration, and tool integrations\n\n\nParent: #111\n\nRecent comments:\n## 🏷️ Automated Triage Check\n\n**Timestamp:** 2026-04-06T15:30:12.449023  \n**Agent:** Allegro Heartbeat\n\nThis issue has been identified as needing triage:\n\n### Checklist\n- [ ] Clear acceptance criteria defined\n- [ ] Priority label assigned (p0-critical / p1-important / p2-backlog)\n- [ ] Size estimate added (quick-fix / day / week / epic)\n- [ ] Owner assigned\n- [ ] Related issues linked\n\n### Context\n- No comments yet — needs engagement\n- No labels — needs categorization\n- Part of automated backlog maintenance\n\n---\n*Automated triage from Allegro 15-minute heartbeat*\n\n[BURN-DOWN] Dispatched to Code Claw (claw-code worker) as part of nightly burn-down cycle. Heartbeat active.\n\n🟠 Code Claw (OpenRouter qwen/qwen3.6-plus:free) picking up this issue via 15-minute heartbeat.\n\nTimestamp: 2026-04-07T03:45:37Z\n\nRules:\n- Make focused code/config/doc changes only if they directly address the issue.\n- Prefer the smallest proof-oriented fix.\n- Run relevant verification commands if obvious.\n- Do NOT create PRs yourself; the outer worker handles commit/push/PR.\n- If the task is too large or not code-fit, leave the tree unchanged.\n","type":"text"}],"role":"user"},"type":"message"}
--- a/.claw/sessions/session-1775534636684-0.jsonl
+++ b/.claw/sessions/session-1775534636684-0.jsonl
@@ -0,0 +1,2 @@
+{"created_at_ms":1775534636684,"session_id":"session-1775534636684-0","type":"session_meta","updated_at_ms":1775534636684,"version":1}
+{"message":{"blocks":[{"text":"You are Code Claw running as the Gitea user claw-code.\n\nRepository: Timmy_Foundation/hermes-agent\nIssue: #151 — [CONFIG] Add Kimi model to fallback chain for Allegro and Bezalel\nBranch: claw-code/issue-151\n\nRead the issue and recent comments, then implement the smallest correct change.\nYou are in a git repo checkout already.\n\nIssue body:\n## Problem\nAllegro and Bezalel are choking because the Kimi model code is not on their fallback chain. When primary models fail or rate-limit, Kimi should be available as a fallback option but is currently missing.\n\n## Expected Behavior\nKimi model code should be at the front of the fallback chain for both Allegro and Bezalel, so they can remain responsive when primary models are unavailable.\n\n## Context\nThis was reported in Telegram by Alexander Whitestone after observing both agents becoming unresponsive. Ezra was asked to investigate the fallback chain configuration.\n\n## Related\n- timmy-config #302: [ARCH] Fallback Portfolio Runtime Wiring (general fallback framework)\n- hermes-agent #150: [BEZALEL][AUDIT] Telegram Request-to-Gitea Tracking Audit\n\n## Acceptance Criteria\n- [ ] Kimi model code is added to Allegro fallback chain\n- [ ] Kimi model code is added to Bezalel fallback chain\n- [ ] Fallback ordering places Kimi appropriately (front of chain as requested)\n- [ ] Test and confirm both agents can successfully fall back to Kimi\n- [ ] Document the fallback chain configuration for both agents\n\n/assign @ezra\n\nRecent comments:\n[BURN-DOWN] Dispatched to Code Claw (claw-code worker) as part of nightly burn-down cycle. 
Heartbeat active.\n\n🟠 Code Claw (OpenRouter qwen/qwen3.6-plus:free) picking up this issue via 15-minute heartbeat.\n\nTimestamp: 2026-04-07T04:03:49Z\n\nRules:\n- Make focused code/config/doc changes only if they directly address the issue.\n- Prefer the smallest proof-oriented fix.\n- Run relevant verification commands if obvious.\n- Do NOT create PRs yourself; the outer worker handles commit/push/PR.\n- If the task is too large or not code-fit, leave the tree unchanged.\n","type":"text"}],"role":"user"},"type":"message"}
--- a/.gitea/workflows/ci.yml
+++ b/.gitea/workflows/ci.yml
@@ -0,0 +1,54 @@
+name: Forge CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+concurrency:
+  group: forge-ci-${{ gitea.ref }}
+  cancel-in-progress: true
+
+jobs:
+  smoke-and-build:
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+
+      - name: Set up Python 3.11
+        run: uv python install 3.11
+
+      - name: Install package
+        run: |
+          uv venv .venv --python 3.11
+          source .venv/bin/activate
+          uv pip install -e ".[all,dev]"
+
+      - name: Smoke tests
+        run: |
+          source .venv/bin/activate
+          python scripts/smoke_test.py
+        env:
+          OPENROUTER_API_KEY: ""
+          OPENAI_API_KEY: ""
+          NOUS_API_KEY: ""
+
+      - name: Syntax guard
+        run: |
+          source .venv/bin/activate
+          python scripts/syntax_guard.py
+
+      - name: Green-path E2E
+        run: |
+          source .venv/bin/activate
+          python -m pytest tests/test_green_path_e2e.py -q --tb=short
+        env:
+          OPENROUTER_API_KEY: ""
+          OPENAI_API_KEY: ""
+          NOUS_API_KEY: ""
--- a/config/ezra-kimi-primary.yaml
+++ b/config/ezra-kimi-primary.yaml
@@ -1,44 +1,34 @@
-# Ezra Configuration - Kimi Primary
-# Anthropic removed from chain entirely
-
-# PRIMARY: Kimi for all operations
-model: kimi-coding/kimi-for-coding
-
-# Fallback chain: Only local/offline options
-# NO anthropic in the chain - quota issues solved
-fallback_providers:
-  - provider: ollama
-    model: qwen2.5:7b
-    base_url: http://localhost:11434
-    timeout: 120
-    reason: "Local fallback when Kimi unavailable"
-
-# Provider settings
-providers:
-  kimi-coding:
-    timeout: 60
-    max_retries: 3
-    # Uses KIMI_API_KEY from .env
-  
-  ollama:
-    timeout: 120
-    keep_alive: true
-    base_url: http://localhost:11434
-
-# REMOVED: anthropic provider entirely
-# No more quota issues, no more choking
-
-# Toolsets - Ezra needs these
+model:
+  default: kimi-k2.5
+  provider: kimi-coding
 toolsets:
-  - hermes-cli
-  - github
-  - web
-
-# Agent settings
+  - all
+fallback_providers:
+  - provider: kimi-coding
+    model: kimi-k2.5
+    timeout: 120
+    reason: Kimi coding fallback (front of chain)
+  - provider: anthropic
+    model: claude-sonnet-4-20250514
+    timeout: 120
+    reason: Direct Anthropic fallback
+  - provider: openrouter
+    model: anthropic/claude-sonnet-4-20250514
+    base_url: https://openrouter.ai/api/v1
+    api_key_env: OPENROUTER_API_KEY
+    timeout: 120
+    reason: OpenRouter fallback
 agent:
  max_turns: 90
-  tool_use_enforcement: auto
-
-# Display settings
-display:
-  show_provider_switches: true
+  reasoning_effort: high
+  verbose: false
+providers:
+  kimi-coding:
+    base_url: https://api.kimi.com/coding/v1
+    timeout: 60
+    max_retries: 3
+  anthropic:
+    timeout: 120
+  openrouter:
+    base_url: https://openrouter.ai/api/v1
+    timeout: 120
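Review note: the diff does not show how hermes-agent consumes `fallback_providers`, so the following is only a minimal sketch of the ordering the config implies, assuming the chain is tried top to bottom. The helper `first_available` and the availability probe are hypothetical names introduced for illustration; they are not part of this change.

```python
from typing import Callable, Dict, List, Optional

# Entries mirror the fallback_providers list in config/ezra-kimi-primary.yaml.
FALLBACK_PROVIDERS: List[Dict[str, object]] = [
    {"provider": "kimi-coding", "model": "kimi-k2.5", "timeout": 120},
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514", "timeout": 120},
    {"provider": "openrouter", "model": "anthropic/claude-sonnet-4-20250514", "timeout": 120},
]


def first_available(
    chain: List[Dict[str, object]],
    is_available: Callable[[Dict[str, object]], bool],
) -> Optional[Dict[str, object]]:
    """Return the first chain entry accepted by the (hypothetical) probe."""
    for entry in chain:
        if is_available(entry):
            return entry
    return None


# Simulate Kimi being rate-limited: the next entry in the chain is chosen.
chosen = first_available(FALLBACK_PROVIDERS, lambda e: e["provider"] != "kimi-coding")
print(chosen["provider"])
```

With every provider healthy, the same call returns the Kimi entry, which is the "front of chain" ordering requested in #151.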
--- a/devkit/README.md
+++ b/devkit/README.md
@@ -0,0 +1,56 @@
+# Bezalel's Devkit — Shared Tools for the Wizard Fleet
+
+This directory contains reusable CLI tools and Python modules for CI, testing, deployment, observability, and Gitea automation. Any wizard can invoke them via `python -m devkit.<tool>`.
+
+## Tools
+
+### `gitea_client` — Gitea API Client
+List issues/PRs, post comments, create PRs, update issues.
+
+```bash
+python -m devkit.gitea_client issues --state open --limit 20
+python -m devkit.gitea_client create-comment --number 142 --body "Update from Bezalel"
+python -m devkit.gitea_client prs --state open
+```
+
+### `health` — Fleet Health Monitor
+Checks system load, disk, memory, running processes, and key package versions.
+
+```bash
+python -m devkit.health --threshold-load 1.0 --threshold-disk 90.0 --fail-on-critical
+```
+
+### `notebook_runner` — Notebook Execution Wrapper
+Parameterizes and executes Jupyter notebooks via Papermill with structured JSON reporting.
+
+```bash
+python -m devkit.notebook_runner task.ipynb output.ipynb -p threshold=1.0 -p hostname=forge
+```
+
+### `smoke_test` — Fast Smoke Test Runner
+Runs core import checks, CLI entrypoint tests, and one bare green-path E2E.
+
+```bash
+python -m devkit.smoke_test --verbose
+```
+
+### `secret_scan` — Secret Leak Scanner
+Scans the repo for API keys, tokens, and private keys.
+
+```bash
+python -m devkit.secret_scan --path . --fail-on-find
+```
+
+### `wizard_env` — Environment Validator
+Checks that a wizard environment has all required binaries, env vars, Python packages, and Hermes config.
+
+```bash
+python -m devkit.wizard_env --json --fail-on-incomplete
+```
+
+## Philosophy
+
+- **CLI-first** — Every tool is runnable as `python -m devkit.<tool>`
+- **JSON output** — Easy to parse from other agents and CI pipelines
+- **Zero dependencies beyond stdlib** where possible; optional heavy deps are runtime-checked
+- **Fail-fast** — Exit codes are meaningful for CI gating
--- a/devkit/__init__.py
+++ b/devkit/__init__.py
@@ -0,0 +1,9 @@
+"""
+Bezalel's Devkit — Shared development tools for the wizard fleet.
+
+A collection of CLI-accessible utilities for CI, testing, deployment,
+observability, and Gitea automation. Designed to be used by any agent
+via subprocess or direct Python import.
+"""
+
+__version__ = "0.1.0"
--- a/devkit/gitea_client.py
+++ b/devkit/gitea_client.py
@@ -0,0 +1,153 @@
+#!/usr/bin/env python3
+"""
+Shared Gitea API client for wizard fleet automation.
+
+Usage as CLI:
+    python -m devkit.gitea_client --repo Timmy_Foundation/hermes-agent issues --state open
+    python -m devkit.gitea_client --repo Timmy_Foundation/hermes-agent issue --number 142
+    python -m devkit.gitea_client --repo Timmy_Foundation/hermes-agent create-comment --number 142 --body "Update from Bezalel"
+    python -m devkit.gitea_client --repo Timmy_Foundation/hermes-agent prs --state open
+
+Usage as module:
+    from devkit.gitea_client import GiteaClient
+    client = GiteaClient()
+    issues = client.list_issues("Timmy_Foundation/hermes-agent", state="open")
+"""
+
+import argparse
+import json
+import os
+import sys
+from typing import Any, Dict, List, Optional
+import urllib.error
+import urllib.request
+
+
+DEFAULT_BASE_URL = os.getenv("GITEA_URL", "https://forge.alexanderwhitestone.com")
+DEFAULT_TOKEN = os.getenv("GITEA_TOKEN", "")
+
+
+class GiteaClient:
+    def __init__(self, base_url: str = DEFAULT_BASE_URL, token: str = DEFAULT_TOKEN):
+        self.base_url = base_url.rstrip("/")
+        self.token = token or ""
+
+    def _request(
+        self,
+        method: str,
+        path: str,
+        data: Optional[Dict[str, Any]] = None,
+        headers: Optional[Dict[str, str]] = None,
+    ) -> Any:
+        url = f"{self.base_url}/api/v1{path}"
+        req_headers = {"Content-Type": "application/json", "Accept": "application/json"}
+        if self.token:
+            req_headers["Authorization"] = f"token {self.token}"
+        if headers:
+            req_headers.update(headers)
+
+        body = json.dumps(data).encode() if data is not None else None
+        req = urllib.request.Request(url, data=body, headers=req_headers, method=method)
+
+        try:
+            with urllib.request.urlopen(req) as resp:
+                return json.loads(resp.read().decode())
+        except urllib.error.HTTPError as e:
+            return {"error": True, "status": e.code, "body": e.read().decode()}
+
+    def list_issues(self, repo: str, state: str = "open", limit: int = 50) -> List[Dict]:
+        return self._request("GET", f"/repos/{repo}/issues?state={state}&limit={limit}") or []
+
+    def get_issue(self, repo: str, number: int) -> Dict:
+        return self._request("GET", f"/repos/{repo}/issues/{number}") or {}
+
+    def create_comment(self, repo: str, number: int, body: str) -> Dict:
+        return self._request(
+            "POST", f"/repos/{repo}/issues/{number}/comments", {"body": body}
+        )
+
+    def update_issue(self, repo: str, number: int, **fields) -> Dict:
+        return self._request("PATCH", f"/repos/{repo}/issues/{number}", fields)
+
+    def list_prs(self, repo: str, state: str = "open", limit: int = 50) -> List[Dict]:
+        return self._request("GET", f"/repos/{repo}/pulls?state={state}&limit={limit}") or []
+
+    def get_pr(self, repo: str, number: int) -> Dict:
+        return self._request("GET", f"/repos/{repo}/pulls/{number}") or {}
+
+    def create_pr(self, repo: str, title: str, head: str, base: str, body: str = "") -> Dict:
+        return self._request(
+            "POST",
+            f"/repos/{repo}/pulls",
+            {"title": title, "head": head, "base": base, "body": body},
+        )
+
+
+def _fmt_json(obj: Any) -> str:
+    return json.dumps(obj, indent=2, ensure_ascii=False)
+
+
+def main(argv: Optional[List[str]] = None) -> int:
+    argv = argv or sys.argv[1:]
+    parser = argparse.ArgumentParser(description="Gitea CLI for wizard fleet")
+    parser.add_argument("--repo", default="Timmy_Foundation/hermes-agent", help="Repository full name")
+    parser.add_argument("--token", default=DEFAULT_TOKEN, help="Gitea API token")
+    parser.add_argument("--base-url", default=DEFAULT_BASE_URL, help="Gitea base URL")
+    sub = parser.add_subparsers(dest="cmd")
+
+    p_issues = sub.add_parser("issues", help="List issues")
+    p_issues.add_argument("--state", default="open")
+    p_issues.add_argument("--limit", type=int, default=50)
+
+    p_issue = sub.add_parser("issue", help="Get single issue")
+    p_issue.add_argument("--number", type=int, required=True)
+
+    p_prs = sub.add_parser("prs", help="List PRs")
+    p_prs.add_argument("--state", default="open")
+    p_prs.add_argument("--limit", type=int, default=50)
+
+    p_pr = sub.add_parser("pr", help="Get single PR")
+    p_pr.add_argument("--number", type=int, required=True)
+
+    p_comment = sub.add_parser("create-comment", help="Post comment on issue/PR")
+    p_comment.add_argument("--number", type=int, required=True)
+    p_comment.add_argument("--body", required=True)
+
+    p_update = sub.add_parser("update-issue", help="Update issue fields")
+    p_update.add_argument("--number", type=int, required=True)
+    p_update.add_argument("--title", default=None)
+    p_update.add_argument("--body", default=None)
+    p_update.add_argument("--state", default=None)
+
+    p_create_pr = sub.add_parser("create-pr", help="Create a PR")
+    p_create_pr.add_argument("--title", required=True)
+    p_create_pr.add_argument("--head", required=True)
+    p_create_pr.add_argument("--base", default="main")
+    p_create_pr.add_argument("--body", default="")
+
+    args = parser.parse_args(argv)
+    client = GiteaClient(base_url=args.base_url, token=args.token)
+
+    if args.cmd == "issues":
+        print(_fmt_json(client.list_issues(args.repo, args.state, args.limit)))
+    elif args.cmd == "issue":
+        print(_fmt_json(client.get_issue(args.repo, args.number)))
+    elif args.cmd == "prs":
+        print(_fmt_json(client.list_prs(args.repo, args.state, args.limit)))
+    elif args.cmd == "pr":
+        print(_fmt_json(client.get_pr(args.repo, args.number)))
+    elif args.cmd == "create-comment":
+        print(_fmt_json(client.create_comment(args.repo, args.number, args.body)))
+    elif args.cmd == "update-issue":
+        fields = {k: v for k, v in {"title": args.title, "body": args.body, "state": args.state}.items() if v is not None}
+        print(_fmt_json(client.update_issue(args.repo, args.number, **fields)))
+    elif args.cmd == "create-pr":
+        print(_fmt_json(client.create_pr(args.repo, args.title, args.head, args.base, args.body)))
+    else:
+        parser.print_help()
+        return 1
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
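Review note: `--repo`, `--token`, and `--base-url` are declared on the top-level parser in `main()`, while `--state`, `--number`, and the rest belong to subparsers. With standard `argparse`, that means the top-level options are expected before the subcommand. The sketch below reproduces only that parser shape (it is a reduced stand-in, not the module itself) to show the behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same shape as devkit.gitea_client.main(): --repo on the top-level
    # parser, --state on the "issues" subparser.
    parser = argparse.ArgumentParser(prog="gitea_client")
    parser.add_argument("--repo", default="Timmy_Foundation/hermes-agent")
    sub = parser.add_subparsers(dest="cmd")
    p_issues = sub.add_parser("issues")
    p_issues.add_argument("--state", default="open")
    return parser


parser = build_parser()

# Top-level option before the subcommand: accepted.
ok = parser.parse_args(["--repo", "owner/name", "issues", "--state", "closed"])
print(ok.repo, ok.cmd, ok.state)

# Top-level option after the subcommand: argparse exits with a usage error.
try:
    parser.parse_args(["issues", "--repo", "owner/name"])
    rejected = False
except SystemExit:
    rejected = True
print(rejected)
```

If accepting `--repo` in either position is wanted, one option is to add it to each subparser as well; that is a design choice for the author rather than something this diff does.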
--- a/devkit/health.py
+++ b/devkit/health.py
@@ -0,0 +1,134 @@
+#!/usr/bin/env python3
+"""
+Fleet health monitor for wizard agents.
+Checks local system state and reports structured health metrics.
+
+Usage as CLI:
+    python -m devkit.health
+    python -m devkit.health --threshold-load 1.0 --threshold-disk 90.0
+
+Usage as module:
+    from devkit.health import check_health
+    report = check_health()
+"""
+
+import argparse
+import json
+import os
+import shutil
+import subprocess
+import sys
+import time
+from typing import Any, Dict, List
+
+
+def _run(cmd: List[str]) -> str:
+    try:
+        return subprocess.check_output(cmd, stderr=subprocess.DEVNULL).decode().strip()
+    except Exception as e:
+        return f"error: {e}"
+
+
+def check_health(threshold_load: float = 1.0, threshold_disk_percent: float = 90.0) -> Dict[str, Any]:
+    gather_time = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
+
+    # Load average
+    load_raw = _run(["cat", "/proc/loadavg"])
+    load_values = []
+    avg_load = None
+    if load_raw.startswith("error:"):
+        load_status = load_raw
+    else:
+        try:
+            load_values = [float(x) for x in load_raw.split()[:3]]
+            avg_load = sum(load_values) / len(load_values)
+            load_status = "critical" if avg_load > threshold_load else "ok"
+        except Exception as e:
+            load_status = f"error parsing load: {e}"
+
+    # Disk usage
+    disk = shutil.disk_usage("/")
+    disk_percent = (disk.used / disk.total) * 100 if disk.total else 0.0
+    disk_status = "critical" if disk_percent > threshold_disk_percent else "ok"
+
+    # Memory
+    meminfo = _run(["cat", "/proc/meminfo"])
+    mem_stats = {}
+    for line in meminfo.splitlines():
+        if ":" in line:
+            key, val = line.split(":", 1)
+            mem_stats[key.strip()] = val.strip()
+
+    # Running processes
+    hermes_pids = []
+    try:
+        ps_out = subprocess.check_output(["pgrep", "-a", "-f", "hermes"]).decode().strip()
+        hermes_pids = [line.split(None, 1) for line in ps_out.splitlines() if line.strip()]
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        hermes_pids = []
+
+    # Python package versions (key ones)
+    key_packages = ["jupyterlab", "papermill", "requests"]
+    pkg_versions = {}
+    for pkg in key_packages:
+        try:
+            out = subprocess.check_output([sys.executable, "-m", "pip", "show", pkg], stderr=subprocess.DEVNULL).decode()
+            for line in out.splitlines():
+                if line.startswith("Version:"):
+                    pkg_versions[pkg] = line.split(":", 1)[1].strip()
+                    break
+        except Exception:
+            pkg_versions[pkg] = None
+
+    overall = "ok"
+    if load_status == "critical" or disk_status == "critical":
+        overall = "critical"
+    elif not hermes_pids:
+        overall = "warning"
+
+    return {
+        "timestamp": gather_time,
+        "overall": overall,
+        "load": {
+            "raw": load_raw if not load_raw.startswith("error:") else None,
+            "1min": load_values[0] if len(load_values) > 0 else None,
+            "5min": load_values[1] if len(load_values) > 1 else None,
+            "15min": load_values[2] if len(load_values) > 2 else None,
+            "avg": round(avg_load, 3) if avg_load is not None else None,
+            "threshold": threshold_load,
+            "status": load_status,
+        },
+        "disk": {
+            "total_gb": round(disk.total / (1024 ** 3), 2),
+            "used_gb": round(disk.used / (1024 ** 3), 2),
+            "free_gb": round(disk.free / (1024 ** 3), 2),
+            "used_percent": round(disk_percent, 2),
+            "threshold_percent": threshold_disk_percent,
+            "status": disk_status,
+        },
+        "memory": mem_stats,
+        "processes": {
+            "hermes_count": len(hermes_pids),
+            "hermes_pids": hermes_pids[:10],
+        },
+        "packages": pkg_versions,
+    }
+
+
+def main(argv: List[str] = None) -> int:
+    argv = argv or sys.argv[1:]
+    parser = argparse.ArgumentParser(description="Fleet health monitor")
+    parser.add_argument("--threshold-load", type=float, default=1.0)
+    parser.add_argument("--threshold-disk", type=float, default=90.0)
+    parser.add_argument("--fail-on-critical", action="store_true", help="Exit non-zero if overall is critical")
+    args = parser.parse_args(argv)
+
+    report = check_health(args.threshold_load, args.threshold_disk)
+    print(json.dumps(report, indent=2))
+    if args.fail_on_critical and report.get("overall") == "critical":
+        return 1
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
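Review note: `check_health()` averages the 1, 5, and 15 minute load values and compares that average to `--threshold-load` with a strict `>`. The sketch below isolates that rule so the boundary case is easy to see; `load_status` is a hypothetical helper name, and the sample line is made-up input in the `/proc/loadavg` format.

```python
from typing import Dict, Union


def load_status(raw: str, threshold: float = 1.0) -> Dict[str, Union[str, float]]:
    """Classify a /proc/loadavg line using the same rule as check_health()."""
    # The first three fields are the 1, 5, and 15 minute load averages.
    values = [float(x) for x in raw.split()[:3]]
    avg = sum(values) / len(values)
    return {"avg": round(avg, 3), "status": "critical" if avg > threshold else "ok"}


sample = "0.50 1.00 1.50 1/234 5678"
at_threshold = load_status(sample)            # average equals the threshold
below_default = load_status(sample, threshold=0.9)
print(at_threshold)
print(below_default)
```

Note that the threshold is an absolute load figure, not scaled by CPU count, so a default of 1.0 may be strict on multi-core hosts.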
--- a/devkit/notebook_runner.py
+++ b/devkit/notebook_runner.py
@@ -0,0 +1,136 @@
+#!/usr/bin/env python3
+"""
+Notebook execution runner for agent tasks.
+Wraps papermill with sensible defaults and structured JSON reporting.
+
+Usage as CLI:
+    python -m devkit.notebook_runner notebooks/task.ipynb output.ipynb -p threshold=1.0
+    python -m devkit.notebook_runner notebooks/task.ipynb --dry-run
+
+Usage as module:
+    from devkit.notebook_runner import run_notebook
+    result = run_notebook("task.ipynb", "output.ipynb", parameters={"threshold": 1.0})
+"""
+
+import argparse
+import json
+import os
+import subprocess
+import sys
+import tempfile
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+
+def run_notebook(
+    input_path: str,
+    output_path: Optional[str] = None,
+    parameters: Optional[Dict[str, Any]] = None,
+    kernel: str = "python3",
+    timeout: Optional[int] = None,
+    dry_run: bool = False,
+) -> Dict[str, Any]:
+    input_path = str(Path(input_path).expanduser().resolve())
+    if output_path is None:
+        fd, output_path = tempfile.mkstemp(suffix=".ipynb")
+        os.close(fd)
+    else:
+        output_path = str(Path(output_path).expanduser().resolve())
+
+    if dry_run:
+        return {
+            "status": "dry_run",
+            "input": input_path,
+            "output": output_path,
+            "parameters": parameters or {},
+            "kernel": kernel,
+        }
+
+    cmd = ["papermill", input_path, output_path, "--kernel", kernel]
+    if timeout is not None:
+        cmd.extend(["--execution-timeout", str(timeout)])
+    for key, value in (parameters or {}).items():
+        cmd.extend(["-p", key, str(value)])
+
+    start = os.times()
+    try:
+        proc = subprocess.run(
+            cmd,
+            capture_output=True,
+            text=True,
+            check=True,
+        )
+        end = os.times()
+        return {
+            "status": "ok",
+            "input": input_path,
+            "output": output_path,
+            "parameters": parameters or {},
+            "kernel": kernel,
+            "elapsed_seconds": round((end.elapsed - start.elapsed), 2),
+            "stdout": proc.stdout[-2000:] if proc.stdout else "",
+        }
+    except subprocess.CalledProcessError as e:
+        end = os.times()
+        return {
+            "status": "error",
+            "input": input_path,
+            "output": output_path,
+            "parameters": parameters or {},
+            "kernel": kernel,
+            "elapsed_seconds": round((end.elapsed - start.elapsed), 2),
+            "stdout": e.stdout[-2000:] if e.stdout else "",
+            "stderr": e.stderr[-2000:] if e.stderr else "",
+            "returncode": e.returncode,
+        }
+    except FileNotFoundError:
+        return {
+            "status": "error",
+            "message": "papermill not found. Install with: uv tool install papermill",
+        }
+
+
+def main(argv: Optional[List[str]] = None) -> int:
+    argv = argv or sys.argv[1:]
+    parser = argparse.ArgumentParser(description="Notebook runner for agents")
+    parser.add_argument("input", help="Input notebook path")
+    parser.add_argument("output", nargs="?", default=None, help="Output notebook path")
+    parser.add_argument("-p", "--parameter", action="append", default=[], help="Parameters as key=value")
+    parser.add_argument("--kernel", default="python3")
+    parser.add_argument("--timeout", type=int, default=None)
+    parser.add_argument("--dry-run", action="store_true")
+    args = parser.parse_args(argv)
+
+    parameters = {}
+    for raw in args.parameter:
+        if "=" not in raw:
+            print(f"Invalid parameter (expected key=value): {raw}", file=sys.stderr)
+            return 1
+        k, v = raw.split("=", 1)
+        # Best-effort type inference
+        if v.lower() in ("true", "false"):
+            v = v.lower() == "true"
+        else:
+            try:
+                v = int(v)
+            except ValueError:
+                try:
+                    v = float(v)
+                except ValueError:
+                    pass
+        parameters[k] = v
+
+    result = run_notebook(
+        args.input,
+        args.output,
+        parameters=parameters,
+        kernel=args.kernel,
+        timeout=args.timeout,
+        dry_run=args.dry_run,
+    )
+    print(json.dumps(result, indent=2))
+    return 0 if result.get("status") == "ok" else 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
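Review note: the CLI takes parameters as `key=value` and applies best-effort type inference (bool, then int, then float, else string). The sketch below restates that logic as two standalone functions so it can be checked in isolation; `coerce` and `parse_parameters` are hypothetical names and do not exist in `devkit.notebook_runner`.

```python
from typing import Dict, List, Union

Scalar = Union[bool, int, float, str]


def coerce(value: str) -> Scalar:
    """Infer bool, int, or float from a string; fall back to the string itself."""
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_parameters(pairs: List[str]) -> Dict[str, Scalar]:
    """Turn ["k=v", ...] into a dict, splitting on the first '=' only."""
    parameters: Dict[str, Scalar] = {}
    for raw in pairs:
        if "=" not in raw:
            raise ValueError(f"Invalid parameter (expected key=value): {raw}")
        key, value = raw.split("=", 1)
        parameters[key] = coerce(value)
    return parameters


params = parse_parameters(["threshold=1.0", "retries=3", "dry=true", "hostname=forge"])
print(params)
```

Splitting on the first `=` only means a value such as `url=a=b` keeps its remaining `=` characters.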
--- a/devkit/secret_scan.py
+++ b/devkit/secret_scan.py
@@ -0,0 +1,108 @@
+#!/usr/bin/env python3
+"""
+Fast secret leak scanner for the repository.
+Checks for common patterns that should never be committed.
+
+Usage as CLI:
+    python -m devkit.secret_scan
+    python -m devkit.secret_scan --path /some/repo --fail-on-find
+
+Usage as module:
+    from devkit.secret_scan import scan
+    findings = scan("/path/to/repo")
+"""
+
+import argparse
+import json
+import os
+import re
+import sys
+from pathlib import Path
+from typing import Any, Dict, List
+
+# Patterns to flag
+PATTERNS = {
+    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
+    "aws_secret_key": re.compile(r"['\"\s][0-9a-zA-Z/+]{40}['\"\s]"),
+    "generic_api_key": re.compile(r"api[_-]?key\s*[:=]\s*['\"][a-zA-Z0-9_\-]{20,}['\"]", re.IGNORECASE),
+    "private_key": re.compile(r"-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
+    "github_token": re.compile(r"gh[pousr]_[A-Za-z0-9_]{36,}"),
+    "gitea_token": re.compile(r"[0-9a-f]{40}"),  # heuristic: any 40-char lowercase hex run (also matches git SHA-1 hashes)
+    "telegram_bot_token": re.compile(r"[0-9]{9,}:[A-Za-z0-9_-]{35,}"),
+}
+
+# Files and paths to skip
+SKIP_PATHS = [
+    ".git",
+    "__pycache__",
+    ".pytest_cache",
+    "node_modules",
+    "venv",
+    ".env",
+    ".agent-skills",
+]
+
+# Max file size to scan (bytes)
+MAX_FILE_SIZE = 1024 * 1024
+
+
+def _should_skip(path: Path) -> bool:
+    for skip in SKIP_PATHS:
+        if skip in path.parts:
+            return True
+    return False
+
+
+def scan(root: str = ".") -> List[Dict[str, Any]]:
+    root_path = Path(root).resolve()
+    findings = []
+    for file_path in root_path.rglob("*"):
+        if not file_path.is_file():
+            continue
+        if _should_skip(file_path.relative_to(root_path)):
+            continue
+        if file_path.stat().st_size > MAX_FILE_SIZE:
+            continue
+        try:
+            text = file_path.read_text(encoding="utf-8", errors="ignore")
+        except Exception:
+            continue
+        for pattern_name, pattern in PATTERNS.items():
+            for match in pattern.finditer(text):
+                # Simple context: line around match
+                start = max(0, match.start() - 40)
+                end = min(len(text), match.end() + 40)
+                context = text[start:end].replace("\n", " ")
+                findings.append({
+                    "file": str(file_path.relative_to(root_path)),
+                    "pattern": pattern_name,
+                    "line": text[:match.start()].count("\n") + 1,
+                    "context": context,
+                })
+    return findings
+
+
+def main(argv: List[str] = None) -> int:
+    argv = argv or sys.argv[1:]
+    parser = argparse.ArgumentParser(description="Secret leak scanner")
+    parser.add_argument("--path", default=".", help="Repository root to scan")
+    parser.add_argument("--fail-on-find", action="store_true", help="Exit non-zero if secrets found")
+    parser.add_argument("--json", action="store_true", help="Output as JSON")
+    args = parser.parse_args(argv)
+
+    findings = scan(args.path)
+    if args.json:
+        print(json.dumps({"findings": findings, "count": len(findings)}, indent=2))
+    else:
+        print(f"Scanned {args.path}")
+        print(f"Findings: {len(findings)}")
+        for f in findings:
+            print(f"  [{f['pattern']}] {f['file']}:{f['line']} -> ...{f['context']}...")
+
+    if args.fail_on_find and findings:
+        return 1
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
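Review note: the `gitea_token` pattern is a bare 40-character hex match, so it may flag git commit hashes as well as tokens. The sketch below contrasts it with a stricter variant that requires the word "token" shortly before the hex run. The stricter regex is a hypothetical alternative, not something in this diff, and both sample strings are synthetic.

```python
import re

# Same pattern as PATTERNS["gitea_token"] in devkit/secret_scan.py.
HEX40 = re.compile(r"[0-9a-f]{40}")

# Hypothetical stricter variant: "token", up to four separator characters,
# then exactly 40 hex characters.
HEX40_AFTER_TOKEN = re.compile(r"token\W{0,4}[0-9a-f]{40}\b", re.IGNORECASE)

# Synthetic inputs: a commit-hash-like line and a token-like assignment.
commit_line = "commit " + "a1b2c3d4e5" * 4
config_line = 'GITEA_TOKEN="' + "0123456789abcdef" * 2 + "01234567" + '"'

broad_hits = [bool(HEX40.search(s)) for s in (commit_line, config_line)]
strict_hits = [bool(HEX40_AFTER_TOKEN.search(s)) for s in (commit_line, config_line)]
print(broad_hits)
print(strict_hits)
```

The trade-off is recall: the stricter form would miss a token stored under a key that does not contain "token".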
--- a/devkit/smoke_test.py
+++ b/devkit/smoke_test.py
@@ -0,0 +1,108 @@
+#!/usr/bin/env python3
+"""
+Shared smoke test runner for hermes-agent.
+Fast checks that catch obvious breakage without maintenance burden.
+
+Usage as CLI:
+    python -m devkit.smoke_test
+    python -m devkit.smoke_test --verbose
+
+Usage as module:
+    from devkit.smoke_test import run_smoke_tests
+    results = run_smoke_tests()
+"""
+
+import argparse
+import importlib
+import json
+import subprocess
+import sys
+from pathlib import Path
+from typing import Any, Dict, List
+
+
+HERMES_ROOT = Path(__file__).resolve().parent.parent
+
+
+def _test_imports() -> Dict[str, Any]:
+    modules = [
+        "hermes_constants",
+        "hermes_state",
+        "cli",
+        "tools.skills_sync",
+        "tools.skills_hub",
+    ]
+    errors = []
+    for mod in modules:
+        try:
+            importlib.import_module(mod)
+        except Exception as e:
+            errors.append({"module": mod, "error": str(e)})
+    return {
+        "name": "core_imports",
+        "status": "ok" if not errors else "fail",
+        "errors": errors,
+    }
+
+
+def _test_cli_entrypoints() -> Dict[str, Any]:
+    entrypoints = [
+        [sys.executable, "-m", "cli", "--help"],
+    ]
+    errors = []
+    for cmd in entrypoints:
+        try:
+            subprocess.run(cmd, capture_output=True, text=True, check=True, cwd=HERMES_ROOT)
+        except subprocess.CalledProcessError as e:
+            errors.append({"cmd": cmd, "error": f"exit {e.returncode}"})
+        except Exception as e:
+            errors.append({"cmd": cmd, "error": str(e)})
+    return {
+        "name": "cli_entrypoints",
+        "status": "ok" if not errors else "fail",
+        "errors": errors,
+    }
+
+
+def _test_green_path_e2e() -> Dict[str, Any]:
+    """One bare green-path E2E: terminal_tool echo hello."""
+    try:
+        from tools.terminal_tool import terminal
+        result = terminal(command="echo hello")
+        output = result.get("output", "")
+        if "hello" in output.lower():
+            return {"name": "green_path_e2e", "status": "ok", "output": output.strip()}
+        return {"name": "green_path_e2e", "status": "fail", "error": f"Unexpected output: {output}"}
+    except Exception as e:
+        return {"name": "green_path_e2e", "status": "fail", "error": str(e)}
+
+
+def run_smoke_tests(verbose: bool = False) -> Dict[str, Any]:
+    tests = [
+        _test_imports(),
+        _test_cli_entrypoints(),
+        _test_green_path_e2e(),
+    ]
+    failed = [t for t in tests if t["status"] != "ok"]
+    result = {
+        "overall": "ok" if not failed else "fail",
+        "tests": tests,
+        "failed_count": len(failed),
+    }
+    if verbose:
+        print(json.dumps(result, indent=2))
+    return result
+
+
+def main(argv: List[str] = None) -> int:
+    argv = sys.argv[1:] if argv is None else argv
+    parser = argparse.ArgumentParser(description="Smoke test runner")
+    parser.add_argument("--verbose", action="store_true")
+    args = parser.parse_args(argv)
+
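+    result = run_smoke_tests(verbose=args.verbose)
+    if not args.verbose:
+        print(f"Smoke tests: {result['overall']} ({result['failed_count']} failed)")
+    return 0 if result["overall"] == "ok" else 1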
+
+
+if __name__ == "__main__":
+    sys.exit(main())
--- a/devkit/wizard_env.py
+++ b/devkit/wizard_env.py
@@ -0,0 +1,112 @@
+#!/usr/bin/env python3
+"""
+Wizard environment validator.
+Checks that a new wizard environment is ready for duty.
+
+Usage as CLI:
+    python -m devkit.wizard_env
+    python -m devkit.wizard_env --json --fail-on-incomplete
+
+Usage as module:
+    from devkit.wizard_env import validate
+    report = validate()
+"""
+
+import argparse
+import json
+import os
+import shutil
+import subprocess
+import sys
+from typing import Any, Dict, List
+
+
+def _has_cmd(name: str) -> bool:
+    return shutil.which(name) is not None
+
+
+def _check_env_var(name: str) -> Dict[str, Any]:
+    value = os.getenv(name)
+    return {
+        "name": name,
+        "status": "ok" if value else "missing",
+        "value": value[:10] + "..." if value and len(value) > 20 else value,
+    }
+
+
+def _check_python_pkg(name: str) -> Dict[str, Any]:
+    try:
+        __import__(name)
+        return {"name": name, "status": "ok"}
+    except ImportError:
+        return {"name": name, "status": "missing"}
+
+
+def validate() -> Dict[str, Any]:
+    checks = {
+        "binaries": [
+            {"name": "python3", "status": "ok" if _has_cmd("python3") else "missing"},
+            {"name": "git", "status": "ok" if _has_cmd("git") else "missing"},
+            {"name": "curl", "status": "ok" if _has_cmd("curl") else "missing"},
+            {"name": "jupyter-lab", "status": "ok" if _has_cmd("jupyter-lab") else "missing"},
+            {"name": "papermill", "status": "ok" if _has_cmd("papermill") else "missing"},
+            {"name": "jupytext", "status": "ok" if _has_cmd("jupytext") else "missing"},
+        ],
+        "env_vars": [
+            _check_env_var("GITEA_URL"),
+            _check_env_var("GITEA_TOKEN"),
+            _check_env_var("TELEGRAM_BOT_TOKEN"),
+        ],
+        "python_packages": [
+            _check_python_pkg("requests"),
+            _check_python_pkg("jupyter_server"),
+            _check_python_pkg("nbformat"),
+        ],
+    }
+
+    all_ok = all(
+        c["status"] == "ok"
+        for group in checks.values()
+        for c in group
+    )
+
+    # Hermes-specific checks
+    hermes_home = os.path.expanduser("~/.hermes")
+    checks["hermes"] = [
+        {"name": "config.yaml", "status": "ok" if os.path.exists(f"{hermes_home}/config.yaml") else "missing"},
+        {"name": "skills_dir", "status": "ok" if os.path.exists(f"{hermes_home}/skills") else "missing"},
+    ]
+
+    all_ok = all_ok and all(c["status"] == "ok" for c in checks["hermes"])
+
+    return {
+        "overall": "ok" if all_ok else "incomplete",
+        "checks": checks,
+    }
+
+
+def main(argv: List[str] = None) -> int:
+    argv = sys.argv[1:] if argv is None else argv
+    parser = argparse.ArgumentParser(description="Wizard environment validator")
+    parser.add_argument("--json", action="store_true")
+    parser.add_argument("--fail-on-incomplete", action="store_true")
+    args = parser.parse_args(argv)
+
+    report = validate()
+    if args.json:
+        print(json.dumps(report, indent=2))
+    else:
+        print(f"Wizard Environment: {report['overall']}")
+        for group, items in report["checks"].items():
+            print(f"\n[{group}]")
+            for item in items:
+                status_icon = "✅" if item["status"] == "ok" else "❌"
+                print(f"  {status_icon} {item['name']}: {item['status']}")
+
+    if args.fail_on_incomplete and report["overall"] != "ok":
+        return 1
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
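Review note (devkit/wizard_env.py): `_check_env_var` reports the first ten characters of long values, and a value of 20 characters or fewer is reported in full, so a short `GITEA_TOKEN` would appear verbatim in the `--json` output. A minimal sketch of a stricter mask follows. `mask_secret` is a hypothetical helper, not part of this change, and the four-character tail is an arbitrary choice.

```python
from typing import Optional


def mask_secret(value: Optional[str], keep: int = 4) -> Optional[str]:
    """Return a display-safe form of a secret (hypothetical helper).

    Unset or empty values come back as None. Values of 2 * keep characters
    or fewer are fully masked; longer values keep only their last `keep`
    characters so an operator can still tell two tokens apart.
    """
    if not value:
        return None
    if len(value) <= keep * 2:
        return "*" * len(value)
    return "*" * 8 + value[-keep:]


if __name__ == "__main__":
    print(mask_secret("abc123"))
    print(mask_secret("0123456789abcdef0123"))
```

The report dictionary could then carry `mask_secret(value)` in its `"value"` field while the `"status"` field keeps signalling whether the variable is set.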
--- a/docs/fleet-sitrep-2026-04-06.md
+++ b/docs/fleet-sitrep-2026-04-06.md
@@ -0,0 +1,132 @@
+# Fleet SITREP — April 6, 2026
+
+**Classification:** Consolidated Status Report
+**Compiled by:** Ezra
+**Acknowledged by:** Claude (Issue #143)
+
+---
+
+## Executive Summary
+
+Allegro executed 7 tasks across infrastructure, contracting, audits, and security. Ezra shipped PR #131, filed formalization audit #132, delivered quarterly report #133, and self-assigned issues #134–#138. All wizard activity mapped below.
+
+---
+
+## 1. Allegro 7-Task Report
+
+| Task | Description | Status |
+|------|-------------|--------|
+| 1 | Roll Call / Infrastructure Map | ✅ Complete |
+| 2 | Dark industrial anthem (140 BPM, Suno-ready) | ✅ Complete |
+| 3 | Operation Get A Job — 7-file contracting playbook pushed to `the-nexus` | ✅ Complete |
+| 4 | Formalization audit filed ([the-nexus #893](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/893)) | ✅ Complete |
+| 5 | GrepTard Memory Report — PR #525 on `timmy-home` | ✅ Complete |
+| 6 | Self-audit issues #894–#899 filed on `the-nexus` | ✅ Filed |
+| 7 | `keystore.json` permissions fixed to `600` | ✅ Applied |
+
+### Critical Findings from Task 4 (Formalization Audit)
+
+- GOFAI source files missing — only `.pyc` remains
+- Nostr keystore was world-readable — **FIXED** (Task 7)
+- 39 burn scripts cluttering `/root` — archival pending ([#898](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/898))
+
+---
+
+## 2. Ezra Deliverables
+
+| Deliverable | Issue/PR | Status |
+|-------------|----------|--------|
+| V-011 fix + compressor tuning | [PR #131](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/131) | ✅ Merged |
+| Formalization audit (hermes-agent) | [Issue #132](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/132) | Filed |
+| Quarterly report (MD + PDF) | [Issue #133](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/133) | Filed |
+| Burn-mode concurrent tool tests | [Issue #134](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/134) | Assigned → Ezra |
+| MCP SDK migration | [Issue #135](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/135) | Assigned → Ezra |
+| APScheduler migration | [Issue #136](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/136) | Assigned → Ezra |
+| Pydantic-settings migration | [Issue #137](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/137) | Assigned → Ezra |
+| Contracting playbook tracker | [Issue #138](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/138) | Assigned → Ezra |
+
+---
+
+## 3. Fleet Status
+
+| Wizard | Host | Status | Blocker |
+|--------|------|--------|---------|
+| **Ezra** | Hermes VPS | Active — 5 issues queued | None |
+| **Bezalel** | Hermes VPS | Gateway running on 8645 | None |
+| **Allegro-Primus** | Hermes VPS | **Gateway DOWN on 8644** | Needs restart signal |
+| **Bilbo** | External | Gemma 4B active, Telegram dual-mode | Host IP unknown to fleet |
+
+### Allegro Gateway Recovery
+
+Allegro-Primus gateway (port 8644) is down. Options:
+1. **Alexander restarts manually** on Hermes VPS
+2. **Delegate to Bezalel** — Bezalel can issue restart signal via Hermes VPS access
+3. **Delegate to Ezra** — Ezra can coordinate restart as part of issue #894 work
+
+---
+
+## 4. Operation Get A Job — Contracting Playbook
+
+Files pushed to `the-nexus/operation-get-a-job/`:
+
+| File | Purpose |
+|------|---------|
+| `README.md` | Master plan |
+| `entity-setup.md` | Wyoming LLC, Mercury, E&O insurance |
+| `service-offerings.md` | Rates $150–600/hr; packages $5k/$15k/$40k+ |
+| `portfolio.md` | Portfolio structure |
+| `outreach-templates.md` | Cold email templates |
+| `proposal-template.md` | Client proposal structure |
+| `rate-card.md` | Rate card |
+
+**Human-only mile (Alexander's action items):**
+
+1. Pick LLC name from `entity-setup.md`
+2. File Wyoming LLC via Northwest Registered Agent ($225)
+3. Get EIN from IRS (free, ~10 min)
+4. Open Mercury account (requires EIN + LLC docs)
+5. Secure E&O insurance (~$150–250/month)
+6. Restart Allegro-Primus gateway (port 8644)
+7. Update LinkedIn using profile template
+8. Send 5 cold emails using outreach templates
+
+---
+
+## 5. Pending Self-Audit Issues (the-nexus)
+
+| Issue | Title | Priority |
+|-------|-------|----------|
+| [#894](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/894) | Deploy burn-mode cron jobs | CRITICAL |
+| [#895](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/895) | Telegram thread-based reporting | Normal |
+| [#896](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/896) | Retry logic and error recovery | Normal |
+| [#897](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/897) | Automate morning reports at 0600 | Normal |
+| [#898](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/898) | Archive 39 burn scripts | Normal |
+| [#899](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/899) | Keystore permissions | ✅ Done |
+
+---
+
+## 6. Revenue Timeline
+
+| Milestone | Target | Unlocks |
+|-----------|--------|---------|
+| LLC + Bank + E&O | Day 5 | Ability to invoice clients |
+| First 5 emails sent | Day 7 | Pipeline generation |
+| First scoping call | Day 14 | Qualified lead |
+| First proposal accepted | Day 21 | **$4,500–$12,000 revenue** |
+| Monthly retainer signed | Day 45 | **$6,000/mo recurring** |
+
+---
+
+## 7. Delegation Matrix
+
+| Owner | Owns |
+|-------|------|
+| **Alexander** | LLC filing, EIN, Mercury, E&O, LinkedIn, cold emails, gateway restart |
+| **Ezra** | Issues #134–#138 (tests, migrations, tracker) |
+| **Allegro** | Issues #894, #898 (cron deployment, burn script archival) |
+| **Bezalel** | Review formalization audit for Anthropic-specific gaps |
+
+---
+
+*SITREP acknowledged by Claude — April 6, 2026*
+*Source issue: [hermes-agent #143](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/143)*
--- a/docs/research-ssd-self-distillation-2026-04.md
+++ b/docs/research-ssd-self-distillation-2026-04.md
@@ -0,0 +1,166 @@
+# Research Acknowledgment: SSD — Simple Self-Distillation Improves Code Generation
+
+**Issue:** #128
+**Paper:** [Embarrassingly Simple Self-Distillation Improves Code Generation](https://arxiv.org/abs/2604.01193)
+**Authors:** Ruixiang Zhang, Richard He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang (Apple)
+**Date:** April 1, 2026
+**Code:** https://github.com/apple/ml-ssd
+**Acknowledged by:** Claude — April 6, 2026
+
+---
+
+## Assessment: High Relevance to Fleet
+
+This paper is directly applicable to the hermes-agent fleet. The headline result — +7.5pp pass@1 on Qwen3-4B — is at exactly the scale we operate. The method requires no external infrastructure. Triage verdict: **P0 / Week-class work**.
+
+---
+
+## What SSD Actually Does
+
+Three steps, nothing exotic:
+
+1. **Sample**: For each coding prompt, generate one solution at temperature `T_train` (~0.9). Do NOT filter for correctness.
+2. **Fine-tune**: SFT on the resulting `(prompt, unverified_solution)` pairs. Standard cross-entropy loss. No RLHF, no GRPO, no DPO.
+3. **Evaluate**: At `T_eval` (which must be **different** from `T_train`). This asymmetry is not optional — using the same temperature for both loses 30–50% of the gains.
+
+The counterintuitive part: N=1 per problem, unverified. Prior self-improvement work uses N>>1 and filters by execution. SSD doesn't. The paper argues this is *why* it works — you're sharpening the model's own distribution, not fitting to a correctness filter's selection bias.
+
+---
+
+## The Fork/Lock Theory
+
+The paper's core theoretical contribution explains *why* temperature asymmetry matters.
+
+**Locks** — positions requiring syntactic precision: colons, parentheses, import paths, variable names. A mistake here is a hard error. Low temperature helps at Locks. But applying low temperature globally kills diversity everywhere.
+
+**Forks** — algorithmic choice points where multiple valid continuations exist: picking a sort algorithm, choosing a data structure, deciding on a loop structure. High temperature helps at Forks. But applying high temperature globally introduces errors at Locks.
+
+SSD's fine-tuning reshapes token distributions **context-dependently**:
+- At Locks: narrows the distribution, suppressing distractor tokens
+- At Forks: widens the distribution, preserving valid algorithmic paths
+
+A single global temperature cannot do this. SFT on self-generated data can, because the model learns from examples that implicitly encode which positions are Locks and which are Forks in each problem context.
+
+**Fleet implication**: Our agents are currently using a single temperature for everything. This is leaving performance on the table even without fine-tuning. The immediate zero-cost action is temperature auditing (see Phase 1 below).
+
+---
+
+## Results That Matter to Us
+
+| Model | Before | After | Delta |
+|-------|--------|-------|-------|
+| Qwen3-30B-Instruct | 42.4% | 55.3% | +12.9pp (+30% rel) |
+| Qwen3-4B-Instruct | baseline | baseline+7.5pp | +7.5pp |
+| Llama-3.1-8B-Instruct | baseline | baseline+3.5pp | +3.5pp |
+
+Gains concentrate on hard problems: +14.2pp medium, +15.3pp hard. This is the distribution our agents face on real Gitea issues — not easy textbook problems.
+
+---
+
+## Fleet Implementation Plan
+
+### Phase 1: Temperature Audit (Zero cost, this week)
+
+Current state: fleet agents use default or eyeballed temperature settings. The paper shows T_eval != T_train is critical even without fine-tuning.
+
+Actions:
+1. Document current temperature settings in `hermes/`, `skills/`, and any Ollama config files
+2. Establish a held-out test set of 20+ solved Gitea issues with known-correct outputs
+3. Run A/B: current T_eval vs. T_eval=0.7 vs. T_eval=0.3 for code generation tasks
+4. Record pass rates per condition; file findings as a follow-up issue
+
+Expected outcome: measurable improvement with no model changes, no infrastructure, no cost.
+
+### Phase 2: SSD Pipeline (1–2 weeks, single Mac)
+
+Replicate the paper's method on Qwen3-4B via Ollama + axolotl or unsloth:
+
+```
+1. Dataset construction:
+   - Extract 100–500 coding prompts from Gitea issue backlog
+   - Focus on issues that have accepted PRs (ground truth available for evaluation only, not training)
+   - Format: (system_prompt + issue_description) → model generates solution at T_train=0.9
+
+2. Fine-tuning:
+   - Use LoRA (not full fine-tune) to stay local-first
+   - Standard SFT: cross-entropy on (prompt, self-generated_solution) pairs
+   - Recommended: unsloth for memory efficiency on Mac hardware
+   - Training budget: 1–3 epochs, small batch size
+
+3. Evaluation:
+   - Compare base model vs. SSD-tuned model at T_eval=0.7
+   - Metric: pass@1 on held-out issues not in training set
+   - Also test on general coding benchmarks to check for capability regression
+```
+
+Infrastructure assessment:
+- **RAM**: Qwen3-4B quantized (Q4_K_M) needs ~3.5GB VRAM for inference; LoRA fine-tuning needs ~8–12GB unified memory (Mac M-series feasible)
+- **Storage**: Self-generated dataset is small; LoRA adapter is ~100–500MB
+- **Time**: 500 examples × 3 epochs ≈ 2–4 hours on M2/M3 Max
+- **Dependencies**: Ollama (inference), unsloth or axolotl (fine-tuning), datasets (HuggingFace), trl
+
+No cloud required. No teacher model required. No code execution environment required.
+
+### Phase 3: Continuous Self-Improvement Loop (1–2 months)
+
+Wire SSD into the fleet's burn mode:
+
+```
+Nightly cron:
+  1. Collect agent solutions from the day's completed issues
+  2. Filter: only solutions where the PR was merged (human-verified correct)
+  3. Append to rolling training buffer (last 500 examples)
+  4. Run SFT fine-tune on buffer → update LoRA adapter
+  5. Swap adapter into Ollama deployment at dawn
+  6. Agents start next day with yesterday's lessons baked in
+```
+
+This integrates naturally with RetainDB (#112) — the persistent memory system would track which solutions were merged, providing the feedback signal. The continuous loop turns every merged PR into a training example.
+
+### Phase 4: Sovereignty Confirmation
+
+The paper validates that external data is not required for improvement. Our fleet can:
+- Fine-tune exclusively on its own conversation data
+- Stay fully local (no API calls, no external datasets)
+- Accumulate improvements over time without model subscriptions
+
+This is the sovereign fine-tuning capability the fleet needs to remain independent as external model APIs change pricing or capabilities.
+
+---
+
+## Risks and Mitigations
+
+| Risk | Assessment | Mitigation |
+|------|------------|------------|
+| SSD gains don't transfer from LiveCodeBench to Gitea issues | Medium — our domain is software engineering, not competitive programming | Test on actual Gitea issues from the backlog; don't assume benchmark numbers transfer |
+| Fine-tuning degrades non-code capabilities | Low-Medium | LoRA instead of full fine-tune; test on general tasks after SFT; retain base model checkpoint |
+| Small training set (<200 examples) insufficient | Medium | Paper shows gains at modest scale; supplement with open code datasets (The Stack, The Vault) if needed |
+| Qwen3 GGUF format incompatible with unsloth fine-tuning | Low | unsloth supports Qwen3; verify exact GGUF variant compatibility before starting |
+| Temperature asymmetry effect smaller on instruction-tuned variants | Low | Paper explicitly tests instruct variants and shows gains; Qwen3-4B-Instruct is in the paper's results |
+
+---
+
+## Acceptance Criteria Status
+
+From the issue:
+
+- [ ] **Temperature audit** — Document current T/top_p settings across fleet agents, compare with paper recommendations
+- [ ] **T_eval benchmark** — A/B test on 20+ solved Gitea issues; measure correctness
+- [ ] **SSD reproduction** — Replicate pipeline on Qwen3-4B with 100 prompts; measure pass@1 change
+- [ ] **Infrastructure assessment** — Documented above (Phase 2 section); GPU/RAM/storage requirements are Mac-feasible
+- [ ] **Continuous loop design** — Architecture drafted above (Phase 3 section); integrates with RetainDB (#112)
+
+Infrastructure assessment and continuous loop design are addressed in this document. Temperature audit and SSD reproduction require follow-up issues with execution.
+
+---
+
+## Recommended Follow-Up Issues
+
+1. **Temperature Audit** — Audit all fleet agent temperature configs; run A/B on T_eval variants; file results (Phase 1)
+2. **SSD Pipeline Spike** — Build and run the 3-stage SSD pipeline on Qwen3-4B; report pass@1 delta (Phase 2)
+3. **Nightly SFT Integration** — Wire SSD into burn-mode cron; integrate with RetainDB feedback loop (Phase 3)
+
+---
+
+*Research acknowledged by Claude — April 6, 2026*
+*Source issue: [hermes-agent #128](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/128)*
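Review note (docs/research-ssd-self-distillation-2026-04.md): the Fork/Lock section argues that one global temperature trades Lock precision against Fork diversity. That trade-off can be illustrated with plain temperature-scaled softmax. The sketch below is not taken from the paper or its code; the logits are made-up numbers chosen to resemble a Lock (one dominant token) and a Fork (three near-equal tokens).

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher means a more diverse distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Illustrative logits only (assumptions, not measured values).
lock_logits = [5.0, 1.0, 0.5]   # one syntactically correct token dominates
fork_logits = [2.0, 1.9, 1.8]   # several valid algorithmic choices

for t in (0.3, 0.9):
    lock = softmax_with_temperature(lock_logits, t)
    fork = softmax_with_temperature(fork_logits, t)
    print(
        f"T={t}: lock distractor mass={1 - lock[0]:.4f}, "
        f"fork entropy={entropy(fork):.3f}"
    )
```

Raising the temperature increases both numbers at once: more diversity at the Fork, but also more probability on distractors at the Lock. That coupling is what the document says a global temperature cannot avoid.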
--- a/scripts/forge_health_check.py
+++ b/scripts/forge_health_check.py
@@ -0,0 +1,261 @@
+#!/usr/bin/env python3
+"""Forge Health Check — Build verification and artifact integrity scanner.
+
+Scans wizard environments for:
+- Missing source files (.pyc without .py) — Allegro finding: GOFAI source files gone
+- Burn script accumulation in /root or wizard directories
+- World-readable sensitive files (keystores, tokens, configs)
+- Missing required environment variables
+
+Usage:
+    python scripts/forge_health_check.py /root/wizards
+    python scripts/forge_health_check.py /root/wizards --json
+    python scripts/forge_health_check.py /root/wizards --fix-permissions
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import stat
+import sys
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+from typing import Iterable
+
+
+SENSITIVE_FILE_PATTERNS = (
+    "keystore",
+    "password",
+    "private",
+    "apikey",
+    "api_key",
+    "credentials",
+)
+
+SENSITIVE_NAME_PREFIXES = (
+    "key_",
+    "keys_",
+    "token_",
+    "tokens_",
+    "secret_",
+    "secrets_",
+    ".env",
+    "env.",
+)
+
+SENSITIVE_NAME_SUFFIXES = (
+    "_key",
+    "_keys",
+    "_token",
+    "_tokens",
+    "_secret",
+    "_secrets",
+    ".key",
+    ".env",
+    ".token",
+    ".secret",
+)
+
+SENSIBLE_PERMISSIONS = 0o600  # owner read/write only
+
+REQUIRED_ENV_VARS = (
+    "GITEA_URL",
+    "GITEA_TOKEN",
+    "GITEA_USER",
+)
+
+BURN_SCRIPT_PATTERNS = (
+    "burn",
+    "ignite",
+    "inferno",
+    "scorch",
+    "char",
+    "blaze",
+    "ember",
+)
+
+
+@dataclass
+class HealthFinding:
+    category: str
+    severity: str  # critical, warning, info
+    path: str
+    message: str
+    suggestion: str = ""
+
+
+@dataclass
+class HealthReport:
+    target: str
+    findings: list[HealthFinding] = field(default_factory=list)
+    passed: bool = True
+
+    def add(self, finding: HealthFinding) -> None:
+        self.findings.append(finding)
+        if finding.severity == "critical":
+            self.passed = False
+
+
+def scan_orphaned_bytecode(root: Path, report: HealthReport) -> None:
+    """Detect .pyc files without corresponding .py source files."""
+    for pyc in root.rglob("*.pyc"):
+        py = pyc.with_suffix(".py")
+        if not py.exists():
+            # Also check __pycache__ naming convention
+            if pyc.parent.name == "__pycache__":
+                stem = pyc.stem.split(".")[0]
+                py = pyc.parent.parent / f"{stem}.py"
+            if not py.exists():
+                report.add(
+                    HealthFinding(
+                        category="artifact_integrity",
+                        severity="critical",
+                        path=str(pyc),
+                        message=f"Compiled bytecode without source: {pyc}",
+                        suggestion="Restore missing .py source file from version control or backup",
+                    )
+                )
+
+
+def scan_burn_script_clutter(root: Path, report: HealthReport) -> None:
+    """Detect burn scripts and other temporary artifacts outside proper staging."""
+    for path in root.iterdir():
+        if not path.is_file():
+            continue
+        lower = path.name.lower()
+        if any(pat in lower for pat in BURN_SCRIPT_PATTERNS):
+            report.add(
+                HealthFinding(
+                    category="deployment_hygiene",
+                    severity="warning",
+                    path=str(path),
+                    message=f"Burn script or temporary artifact in production path: {path.name}",
+                    suggestion="Archive to a burn/ or tmp/ directory, or remove if no longer needed",
+                )
+            )
+
+
+def _is_sensitive_filename(name: str) -> bool:
+    """Check if a filename indicates it may contain secrets."""
+    lower = name.lower()
+    if lower == ".env.example" or (lower.startswith("test_") and lower.endswith(".py")):
+        return False
+    if any(pat in lower for pat in SENSITIVE_FILE_PATTERNS):
+        return True
+    if any(lower.startswith(pref) for pref in SENSITIVE_NAME_PREFIXES):
+        return True
+    if any(lower.endswith(suff) for suff in SENSITIVE_NAME_SUFFIXES):
+        return True
+    return False
+
+
+def scan_sensitive_file_permissions(root: Path, report: HealthReport, fix: bool = False) -> None:
+    """Detect world-readable sensitive files."""
+    for fpath in root.rglob("*"):
+        if not fpath.is_file():
+            continue
+        # Skip test files — real secrets should never live in tests/
+        if "tests" in fpath.parts[:-1]:
+            continue
+        if not _is_sensitive_filename(fpath.name):
+            continue
+
+        try:
+            mode = fpath.stat().st_mode
+        except OSError:
+            continue
+
+        # Readable by group or other
+        if mode & stat.S_IRGRP or mode & stat.S_IROTH:
+            was_fixed = False
+            if fix:
+                try:
+                    fpath.chmod(SENSIBLE_PERMISSIONS)
+                    was_fixed = True
+                except OSError:
+                    pass
+
+            report.add(
+                HealthFinding(
+                    category="security",
+                    severity="critical",
+                    path=str(fpath),
+                    message=(
+                        f"Sensitive file world-readable: {fpath.name} "
+                        f"(mode={oct(mode & 0o777)})"
+                    ),
+                    suggestion=(
+                        f"Fixed permissions to {oct(SENSIBLE_PERMISSIONS)}"
+                        if was_fixed
+                        else f"Run 'chmod {oct(SENSIBLE_PERMISSIONS)[2:]} {fpath}'"
+                    ),
+                )
+            )
+
+
+def scan_environment_variables(report: HealthReport) -> None:
+    """Check for required environment variables."""
+    for var in REQUIRED_ENV_VARS:
+        if not os.environ.get(var):
+            report.add(
+                HealthFinding(
+                    category="configuration",
+                    severity="warning",
+                    path="$" + var,
+                    message=f"Required environment variable {var} is missing or empty",
+                    suggestion="Export the variable in your shell profile or secrets manager",
+                )
+            )
+
+
+def run_health_check(target: Path, fix_permissions: bool = False) -> HealthReport:
+    report = HealthReport(target=str(target.resolve()))
+    if target.is_dir():
+        scan_orphaned_bytecode(target, report)
+        scan_burn_script_clutter(target, report)
+        scan_sensitive_file_permissions(target, report, fix=fix_permissions)
+    scan_environment_variables(report)
+    return report
+
+
+def print_report(report: HealthReport) -> None:
+    status = "PASS" if report.passed else "FAIL"
+    print(f"Forge Health Check: {status}")
+    print(f"Target: {report.target}")
+    print(f"Findings: {len(report.findings)}\n")
+
+    by_category: dict[str, list[HealthFinding]] = {}
+    for f in report.findings:
+        by_category.setdefault(f.category, []).append(f)
+
+    for category, findings in by_category.items():
+        print(f"[{category.upper()}]")
+        for f in findings:
+            print(f"  {f.severity.upper()}: {f.message}")
+            if f.suggestion:
+                print(f"    -> {f.suggestion}")
+        print()
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description="Forge Health Check")
+    parser.add_argument("target", nargs="?", default="/root/wizards", help="Root path to scan")
+    parser.add_argument("--json", action="store_true", help="Output JSON report")
+    parser.add_argument("--fix-permissions", action="store_true", help="Auto-fix file permissions")
+    args = parser.parse_args(argv)
+
+    target = Path(args.target)
+    report = run_health_check(target, fix_permissions=args.fix_permissions)
+
+    if args.json:
+        print(json.dumps(asdict(report), indent=2))
+    else:
+        print_report(report)
+
+    return 0 if report.passed else 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
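Review note (scripts/forge_health_check.py): `scan_orphaned_bytecode` maps a cached `.pyc` back to its source by splitting the stem by hand. The standard library exposes the same mapping as `importlib.util.source_from_cache`. A minimal sketch, assuming the PEP 3147 `__pycache__` layout; `expected_source` is a hypothetical helper name and is not part of this change.

```python
import importlib.util
from pathlib import Path


def expected_source(pyc: Path) -> Path:
    """Return the .py path a bytecode file should have been compiled from.

    Files under __pycache__ (e.g. mod.cpython-311.pyc) map to ../mod.py.
    Anything else is treated as a legacy sibling .pyc and maps to the
    same directory with a .py suffix.
    """
    if pyc.parent.name == "__pycache__":
        try:
            return Path(importlib.util.source_from_cache(str(pyc)))
        except (ValueError, NotImplementedError):
            # Name does not follow PEP 3147; fall back to the sibling rule.
            pass
    return pyc.with_suffix(".py")


if __name__ == "__main__":
    print(expected_source(Path("pkg/__pycache__/mod.cpython-311.pyc")))
    print(expected_source(Path("pkg/legacy.pyc")))
```

A scanner could then report a finding only when `expected_source(pyc)` does not exist, which would cover both layouts with a single check.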
--- a/scripts/smoke_test.py
+++ b/scripts/smoke_test.py
@@ -0,0 +1,89 @@
+#!/usr/bin/env python3
+"""Forge smoke tests — fast checks that core imports resolve and entrypoints load.
+
+Total runtime target: < 30 seconds.
+"""
+
+from __future__ import annotations
+
+import importlib
+import subprocess
+import sys
+from pathlib import Path
+
+# Allow running smoke test directly from repo root before pip install
+REPO_ROOT = Path(__file__).parent.parent
+sys.path.insert(0, str(REPO_ROOT))
+
+CORE_MODULES = [
+    "hermes_cli.config",
+    "hermes_state",
+    "model_tools",
+    "toolsets",
+    "utils",
+]
+
+CLI_ENTRYPOINTS = [
+    [sys.executable, "cli.py", "--help"],
+]
+
+
+def test_imports() -> None:
+    ok = 0
+    skipped = 0
+    for mod in CORE_MODULES:
+        try:
+            importlib.import_module(mod)
+            ok += 1
+        except ImportError as exc:
+            # If the failure is a missing third-party dependency, skip rather than fail
+            # so the smoke test can run before `pip install` in bare environments.
+            missing = exc.name if isinstance(exc, ModuleNotFoundError) else None
+            if missing and mod != missing and not mod.startswith(missing + "."):
+                print(f"SKIP: import {mod} -> missing dependency ({exc})")
+                skipped += 1
+            else:
+                print(f"FAIL: import {mod} -> {exc}")
+                sys.exit(1)
+        except Exception as exc:
+            print(f"FAIL: import {mod} -> {exc}")
+            sys.exit(1)
+    print(f"OK: {ok} core imports", end="")
+    if skipped:
+        print(f" ({skipped} skipped due to missing deps)")
+    else:
+        print()
+
+
+def test_cli_help() -> None:
+    ok = 0
+    skipped = 0
+    for cmd in CLI_ENTRYPOINTS:
+        result = subprocess.run(cmd, capture_output=True, timeout=30, cwd=REPO_ROOT)
+        if result.returncode == 0:
+            ok += 1
+            continue
+        stderr = result.stderr.decode().lower()
+        # Gracefully skip if dependencies are missing in bare environments
+        if "modulenotfounderror" in stderr or "no module named" in stderr:
+            print(f"SKIP: {' '.join(cmd)} -> missing dependency")
+            skipped += 1
+        else:
+            print(f"FAIL: {' '.join(cmd)} -> {result.stderr.decode()[:200]}")
+            sys.exit(1)
+    print(f"OK: {ok} CLI entrypoints", end="")
+    if skipped:
+        print(f" ({skipped} skipped due to missing deps)")
+    else:
+        print()
+
+
+def main() -> int:
+    test_imports()
+    test_cli_help()
+    print("Smoke tests passed.")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/scripts/syntax_guard.py
+++ b/scripts/syntax_guard.py
@@ -0,0 +1,20 @@
+#!/usr/bin/env python3
+"""Syntax guard — compile all Python files to catch syntax errors before merge."""
+import py_compile
+import sys
+from pathlib import Path
+
+errors = []
+for p in Path(".").rglob("*.py"):
+    if ".venv" in p.parts or "__pycache__" in p.parts:
+        continue
+    try:
+        py_compile.compile(str(p), doraise=True)
+    except py_compile.PyCompileError as e:
+        errors.append(f"{p}: {e}")
+        print(f"SYNTAX ERROR: {p}: {e}", file=sys.stderr)
+
+if errors:
+    print(f"\n{len(errors)} file(s) with syntax errors", file=sys.stderr)
+    sys.exit(1)
+print("All Python files compile successfully")
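Review note: `py_compile.compile()` writes a bytecode file as a side effect of checking each source. The sketch below is a variant that swaps in the built-in `compile()` so nothing is written to disk; it is an illustration under that assumption, not the script in this PR. `find_syntax_errors` and `SKIP_DIRS` are hypothetical names.

```python
from pathlib import Path

SKIP_DIRS = {".venv", "__pycache__"}


def find_syntax_errors(root: Path) -> list:
    """Parse every .py file under root and collect syntax errors as strings."""
    errors = []
    for path in sorted(root.rglob("*.py")):
        if SKIP_DIRS.intersection(path.parts):
            continue
        try:
            source = path.read_text(encoding="utf-8")
            # Built-in compile() parses the source without writing bytecode.
            compile(source, str(path), "exec")
        except (SyntaxError, UnicodeDecodeError, ValueError) as exc:
            errors.append(f"{path}: {exc}")
    return errors
```

Returning the list instead of exiting makes the check easy to call from a test.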
--- a/tests/test_forge_health_check.py
+++ b/tests/test_forge_health_check.py
@@ -0,0 +1,175 @@
+"""Tests for scripts/forge_health_check.py"""
+
+import os
+import stat
+from pathlib import Path
+
+# Import the script as a module
+import sys
+sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
+
+from forge_health_check import (
+    HealthFinding,
+    HealthReport,
+    _is_sensitive_filename,
+    run_health_check,
+    scan_burn_script_clutter,
+    scan_orphaned_bytecode,
+    scan_sensitive_file_permissions,
+    scan_environment_variables,
+)
+
+
+class TestIsSensitiveFilename:
+    def test_keystore_is_sensitive(self) -> None:
+        assert _is_sensitive_filename("keystore.json") is True
+
+    def test_env_example_is_not_sensitive(self) -> None:
+        assert _is_sensitive_filename(".env.example") is False
+
+    def test_env_file_is_sensitive(self) -> None:
+        assert _is_sensitive_filename(".env") is True
+        assert _is_sensitive_filename("production.env") is True
+
+    def test_test_file_with_key_is_not_sensitive(self) -> None:
+        assert _is_sensitive_filename("test_interrupt_key_match.py") is False
+        assert _is_sensitive_filename("test_api_key_providers.py") is False
+
+
+class TestScanOrphanedBytecode:
+    def test_detects_pyc_without_py(self, tmp_path: Path) -> None:
+        pyc = tmp_path / "module.pyc"
+        pyc.write_bytes(b"\x00")
+        report = HealthReport(target=str(tmp_path))
+        scan_orphaned_bytecode(tmp_path, report)
+        assert len(report.findings) == 1
+        assert report.findings[0].category == "artifact_integrity"
+        assert report.findings[0].severity == "critical"
+
+    def test_ignores_pyc_with_py(self, tmp_path: Path) -> None:
+        (tmp_path / "module.py").write_text("pass")
+        pyc = tmp_path / "module.pyc"
+        pyc.write_bytes(b"\x00")
+        report = HealthReport(target=str(tmp_path))
+        scan_orphaned_bytecode(tmp_path, report)
+        assert len(report.findings) == 0
+
+    def test_detects_pycache_orphan(self, tmp_path: Path) -> None:
+        pycache = tmp_path / "__pycache__"
+        pycache.mkdir()
+        pyc = pycache / "module.cpython-312.pyc"
+        pyc.write_bytes(b"\x00")
+        report = HealthReport(target=str(tmp_path))
+        scan_orphaned_bytecode(tmp_path, report)
+        assert len(report.findings) == 1
+        assert "__pycache__" in report.findings[0].path
+
+
+class TestScanBurnScriptClutter:
+    def test_detects_burn_script(self, tmp_path: Path) -> None:
+        (tmp_path / "burn_test.sh").write_text("#!/bin/bash")
+        report = HealthReport(target=str(tmp_path))
+        scan_burn_script_clutter(tmp_path, report)
+        assert len(report.findings) == 1
+        assert report.findings[0].category == "deployment_hygiene"
+        assert report.findings[0].severity == "warning"
+
+    def test_ignores_regular_files(self, tmp_path: Path) -> None:
+        (tmp_path / "deploy.sh").write_text("#!/bin/bash")
+        report = HealthReport(target=str(tmp_path))
+        scan_burn_script_clutter(tmp_path, report)
+        assert len(report.findings) == 0
+
+
+class TestScanSensitiveFilePermissions:
+    def test_detects_world_readable_keystore(self, tmp_path: Path) -> None:
+        ks = tmp_path / "keystore.json"
+        ks.write_text("{}")
+        ks.chmod(0o644)
+        report = HealthReport(target=str(tmp_path))
+        scan_sensitive_file_permissions(tmp_path, report)
+        assert len(report.findings) == 1
+        assert report.findings[0].category == "security"
+        assert report.findings[0].severity == "critical"
+        assert "644" in report.findings[0].message
+
+    def test_auto_fixes_permissions(self, tmp_path: Path) -> None:
+        ks = tmp_path / "keystore.json"
+        ks.write_text("{}")
+        ks.chmod(0o644)
+        report = HealthReport(target=str(tmp_path))
+        scan_sensitive_file_permissions(tmp_path, report, fix=True)
+        assert len(report.findings) == 1
+        assert ks.stat().st_mode & 0o777 == 0o600
+
+    def test_ignores_safe_permissions(self, tmp_path: Path) -> None:
+        ks = tmp_path / "keystore.json"
+        ks.write_text("{}")
+        ks.chmod(0o600)
+        report = HealthReport(target=str(tmp_path))
+        scan_sensitive_file_permissions(tmp_path, report)
+        assert len(report.findings) == 0
+
+    def test_ignores_env_example(self, tmp_path: Path) -> None:
+        env = tmp_path / ".env.example"
+        env.write_text("# example")
+        env.chmod(0o644)
+        report = HealthReport(target=str(tmp_path))
+        scan_sensitive_file_permissions(tmp_path, report)
+        assert len(report.findings) == 0
+
+    def test_ignores_test_directory(self, tmp_path: Path) -> None:
+        tests_dir = tmp_path / "tests"
+        tests_dir.mkdir()
+        ks = tests_dir / "keystore.json"
+        ks.write_text("{}")
+        ks.chmod(0o644)
+        report = HealthReport(target=str(tmp_path))
+        scan_sensitive_file_permissions(tmp_path, report)
+        assert len(report.findings) == 0
+
+
+class TestScanEnvironmentVariables:
+    def test_reports_missing_env_var(self, monkeypatch) -> None:
+        monkeypatch.delenv("GITEA_TOKEN", raising=False)
+        report = HealthReport(target=".")
+        scan_environment_variables(report)
+        missing = [f for f in report.findings if f.path == "$GITEA_TOKEN"]
+        assert len(missing) == 1
+        assert missing[0].severity == "warning"
+
+    def test_passes_when_env_vars_present(self, monkeypatch) -> None:
+        for var in ("GITEA_URL", "GITEA_TOKEN", "GITEA_USER"):
+            monkeypatch.setenv(var, "present")
+        report = HealthReport(target=".")
+        scan_environment_variables(report)
+        assert len(report.findings) == 0
+
+
+class TestRunHealthCheck:
+    def test_full_run(self, tmp_path: Path, monkeypatch) -> None:
+        monkeypatch.setenv("GITEA_URL", "https://example.com")
+        monkeypatch.setenv("GITEA_TOKEN", "secret")
+        monkeypatch.setenv("GITEA_USER", "bezalel")
+
+        (tmp_path / "orphan.pyc").write_bytes(b"\x00")
+        (tmp_path / "burn_it.sh").write_text("#!/bin/bash")
+        ks = tmp_path / "keystore.json"
+        ks.write_text("{}")
+        ks.chmod(0o644)
+
+        report = run_health_check(tmp_path)
+        assert not report.passed
+        categories = {f.category for f in report.findings}
+        assert "artifact_integrity" in categories
+        assert "deployment_hygiene" in categories
+        assert "security" in categories
+
+    def test_clean_run_passes(self, tmp_path: Path, monkeypatch) -> None:
+        for var in ("GITEA_URL", "GITEA_TOKEN", "GITEA_USER"):
+            monkeypatch.setenv(var, "present")
+
+        (tmp_path / "module.py").write_text("pass")
+        report = run_health_check(tmp_path)
+        assert report.passed
+        assert len(report.findings) == 0
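Review note: the orphaned-bytecode tests above cover two layouts, a `.pyc` next to its source and a `.pyc` inside `__pycache__`. The sketch below shows one way that mapping could be computed. It is an assumption about the technique, not the code in `scripts/forge_health_check.py`; `source_for` and `find_orphaned_bytecode` are hypothetical names.

```python
from pathlib import Path


def source_for(pyc: Path) -> Path:
    """Map a bytecode file to the source file it should have been built from."""
    if pyc.parent.name == "__pycache__":
        # __pycache__/module.cpython-312.pyc -> ../module.py
        stem = pyc.name.split(".", 1)[0]
        return pyc.parent.parent / f"{stem}.py"
    # module.pyc -> module.py in the same directory
    return pyc.with_suffix(".py")


def find_orphaned_bytecode(root: Path) -> list:
    """Return every .pyc under root whose source file is missing."""
    return sorted(p for p in root.rglob("*.pyc") if not source_for(p).exists())
```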
--- a/tests/test_green_path_e2e.py
+++ b/tests/test_green_path_e2e.py
@@ -0,0 +1,18 @@
+"""Bare green-path E2E — one happy-path tool call cycle.
+
+Exercises the terminal tool directly and verifies the response structure.
+No API keys required. Runtime target: < 10 seconds.
+"""
+
+import json
+
+from tools.terminal_tool import terminal_tool
+
+
+def test_terminal_echo_green_path() -> None:
+    """terminal('echo hello') -> verify response contains 'hello' and exit_code 0."""
+    result = terminal_tool(command="echo hello", timeout=10)
+    data = json.loads(result)
+
+    assert data["exit_code"] == 0, f"Expected exit_code 0, got {data['exit_code']}"
+    assert "hello" in data["output"], f"Expected 'hello' in output, got: {data['output']}"
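Review note: the test only relies on two keys in the JSON response, `exit_code` and `output`. The stand-in below produces that shape so the contract can be exercised without the real tool. `fake_terminal_tool` is hypothetical; the actual `tools.terminal_tool.terminal_tool` may return more fields and run commands differently.

```python
import json
import subprocess


def fake_terminal_tool(command: str, timeout: int = 10) -> str:
    """Run a shell command and return a JSON string with exit_code and output."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return json.dumps(
        {"exit_code": proc.returncode, "output": proc.stdout + proc.stderr}
    )
```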
--- a/wizard-bootstrap/FORGE_OPERATIONS_GUIDE.md
+++ b/wizard-bootstrap/FORGE_OPERATIONS_GUIDE.md
@@ -0,0 +1,215 @@
+# Forge Operations Guide
+
+> **Audience:** Forge wizards joining the hermes-agent project
+> **Purpose:** Practical patterns, common pitfalls, and operational wisdom
+> **Companion to:** `WIZARD_ENVIRONMENT_CONTRACT.md`
+
+---
+
+## The One Rule
+
+**Read the actual state before acting.**
+
+Before touching any service, config, or codebase, run `ps aux | grep hermes`, `cat ~/.hermes/gateway_state.json`, and `curl http://127.0.0.1:8642/health`. The forge punishes assumptions harder than it rewards speed. Evidence always beats intuition.
+
+---
+
+## First 15 Minutes on a New System
+
+```bash
+# 1. Validate your environment
+python wizard-bootstrap/wizard_bootstrap.py
+
+# 2. Check what is actually running
+ps aux | grep -E 'hermes|python|gateway'
+
+# 3. Check the data directory
+ls -la ~/.hermes/
+cat ~/.hermes/gateway_state.json 2>/dev/null | python3 -m json.tool
+
+# 4. Verify health endpoints (if gateway is up)
+curl -sf http://127.0.0.1:8642/health | python3 -m json.tool
+
+# 5. Run the smoke test
+source venv/bin/activate
+python -m pytest tests/ -q -x --timeout=60 2>&1 | tail -20
+```
+
+Do not begin work until all five steps return clean output.
+
+---
+
+## Import Chain — Know It, Respect It
+
+The dependency order is load-bearing. Violating it causes silent failures:
+
+```
+tools/registry.py   ← no deps; imported by everything
+       ↑
+tools/*.py          ← each calls registry.register() at import time
+       ↑
+model_tools.py      ← imports registry; triggers tool discovery
+       ↑
+run_agent.py / cli.py / batch_runner.py
+```
+
+**If you add a tool file**, you must also:
+1. Add its import to `model_tools.py` `_discover_tools()`
+2. Add it to `toolsets.py` (core or a named toolset)
+
+Missing either step causes the tool to silently not appear — no error, just absence.
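Review note: the "silent absence" failure mode can be reproduced with a toy registry. This is a sketch of the register-at-import pattern only; `ToolRegistry`, the loader functions, and `discover` are hypothetical and do not mirror the real `tools/registry.py` API.

```python
from typing import Callable, Dict, Iterable, List


class ToolRegistry:
    """Toy registry: tools add themselves by calling register()."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = handler

    def names(self) -> List[str]:
        return sorted(self._tools)


# Each loader stands in for "importing a tool module", which registers the tool.
def load_echo_tool(reg: ToolRegistry) -> None:
    reg.register("echo", lambda text: text)


def load_upper_tool(reg: ToolRegistry) -> None:
    reg.register("upper", lambda text: text.upper())


def discover(reg: ToolRegistry, loaders: Iterable[Callable[[ToolRegistry], None]]) -> None:
    for load in loaders:
        load(reg)


registry = ToolRegistry()
discover(registry, [load_echo_tool])  # load_upper_tool was never listed
print(registry.names())  # 'upper' is simply missing; nothing raises
```

Nothing fails when a loader is left out of the discovery list, which is why the two manual steps above matter.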
+
+---
+
+## The Profile Rules
+
+Hermes supports isolated profiles (`hermes -p myprofile`). Profile-unsafe code has caused repeated bugs. Memorize these:
+
+| Do this | Not this |
+|---------|----------|
+| `get_hermes_home()` | `Path.home() / ".hermes"` |
+| `display_hermes_home()` in user messages | hardcoded `~/.hermes` strings |
+| `get_hermes_home() / "sessions"` in tests | `~/.hermes/sessions` in tests |
+
+Import both from `hermes_constants`. Every `~/.hermes` hardcode is a latent profile bug.
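Review note: a minimal sketch of why the helper matters, assuming the home directory can be overridden through the `HERMES_HOME` environment variable mentioned under "Writing Tests". `get_hermes_home_sketch` is a hypothetical stand-in; the real `get_hermes_home()` in `hermes_constants` may resolve profiles differently.

```python
import os
from pathlib import Path


def get_hermes_home_sketch() -> Path:
    """Resolve the data directory, honouring an override if one is set."""
    override = os.environ.get("HERMES_HOME")
    return Path(override) if override else Path.home() / ".hermes"


def sessions_dir() -> Path:
    # Profile-safe: derived from the resolver, never from a hardcoded ~/.hermes.
    return get_hermes_home_sketch() / "sessions"
```

Code that builds `Path.home() / ".hermes"` directly would ignore the override and read or write the wrong profile.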
+
+---
+
+## Prompt Caching — Do Not Break It
+
+The agent caches system prompts. Cache breaks force re-billing of the entire context window on every turn. The following actions break caching mid-conversation and are forbidden:
+
+- Altering past context
+- Changing the active toolset
+- Reloading memories or rebuilding the system prompt
+
+The only sanctioned context alteration is the context compressor (`agent/context_compressor.py`). If your feature touches the message history, read that file first.
+
+---
+
+## Adding a Slash Command (Checklist)
+
+Four steps, in order:
+
+1. **`hermes_cli/commands.py`** — add `CommandDef` to `COMMAND_REGISTRY`
+2. **`cli.py`** — add handler branch in `HermesCLI.process_command()`
+3. **`gateway/run.py`** — add handler if it should work in messaging platforms
+4. **Aliases** — add to the `aliases` tuple on the `CommandDef`; everything else updates automatically
+
+All downstream consumers (Telegram menu, Slack routing, autocomplete, help text) derive from `COMMAND_REGISTRY`. You never touch them directly.
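Review note: the idea that every consumer derives from one registry can be sketched as follows. Only the names `CommandDef`, `COMMAND_REGISTRY`, and `aliases` come from the guide; the `name` and `description` fields, the sample commands, and both helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CommandDef:
    name: str
    description: str
    aliases: Tuple[str, ...] = ()


# Hypothetical entries, for illustration only.
COMMAND_REGISTRY: List[CommandDef] = [
    CommandDef("status", "Show gateway status", aliases=("st",)),
    CommandDef("quit", "Exit the session", aliases=("q", "exit")),
]


def build_lookup(registry: List[CommandDef]) -> Dict[str, CommandDef]:
    """Map every name and alias to its command definition."""
    lookup: Dict[str, CommandDef] = {}
    for cmd in registry:
        for token in (cmd.name, *cmd.aliases):
            if token in lookup:
                raise ValueError(f"duplicate command token: {token}")
            lookup[token] = cmd
    return lookup


def help_text(registry: List[CommandDef]) -> str:
    """Derive help output from the same registry."""
    lines = []
    for cmd in registry:
        note = ""
        if cmd.aliases:
            note = " (aliases: " + ", ".join("/" + a for a in cmd.aliases) + ")"
        lines.append(f"/{cmd.name}: {cmd.description}{note}")
    return "\n".join(lines)
```

Adding an alias to one `CommandDef` updates both the lookup and the help text, with no second place to edit.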
+
+---
+
+## Tool Schema Pitfalls
+
+**Do NOT cross-reference other toolsets in schema descriptions.**
+Writing "prefer `web_search` over this tool" in a browser tool's description will cause the model to hallucinate calls to `web_search` when it's not loaded. Cross-references belong in `get_tool_definitions()` post-processing blocks in `model_tools.py`.
+
+**Do NOT use `\033[K` (ANSI erase-to-EOL) in display code.**
+Under `prompt_toolkit`'s `patch_stdout`, it leaks as literal `?[K`. Use space-padding instead: `f"\r{line}{' ' * pad}"`.
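Review note: the space-padding approach can be wrapped in a small helper. `render_status` is a hypothetical name; the only assumption is that the caller tracks the length of the previously drawn line.

```python
def render_status(line: str, previous_len: int) -> str:
    """Return a carriage-return-prefixed line padded to cover the old one."""
    pad = max(0, previous_len - len(line))
    return f"\r{line}{' ' * pad}"
```

For example, `render_status("ok", 5)` yields `"\rok"` followed by three spaces, which overwrites a five-character line without any escape sequence.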
+
+**Do NOT use `simple_term_menu` for interactive menus.**
+It ghosts on scroll in tmux/iTerm2. Use `curses` (stdlib). See `hermes_cli/tools_config.py` for the pattern.
+
+---
+
+## Health Check Anatomy
+
+A healthy instance returns:
+
+```json
+{
+  "status": "ok",
+  "gateway_state": "running",
+  "platforms": {
+    "telegram": {"state": "connected"}
+  }
+}
+```
+
+| Field | Healthy value | What a bad value means |
+|-------|--------------|----------------------|
+| `status` | `"ok"` | HTTP server down |
+| `gateway_state` | `"running"` | Still starting or crashed |
+| `platforms.<name>.state` | `"connected"` | Auth failure or network issue |
+
+`gateway_state: "starting"` is normal for up to 60 s on boot. Beyond that, check logs for auth errors:
+
+```bash
+journalctl -u hermes-gateway --since "2 minutes ago" | grep -i "error\|token\|auth"
+```
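Review note: the table above translates directly into a small checker. This is a sketch that assumes the health payload has exactly the shape shown in this section; `diagnose` is a hypothetical helper, not part of the codebase.

```python
import json


def diagnose(health: dict) -> list:
    """Return human-readable problems found in a /health response."""
    problems = []
    if health.get("status") != "ok":
        problems.append("status: HTTP server down")
    state = health.get("gateway_state")
    if state != "running":
        problems.append(f"gateway_state={state!r}: still starting or crashed")
    for name, info in health.get("platforms", {}).items():
        if info.get("state") != "connected":
            problems.append(f"platform {name}: auth failure or network issue")
    return problems


sample = json.loads(
    '{"status": "ok", "gateway_state": "running",'
    ' "platforms": {"telegram": {"state": "connected"}}}'
)
print(diagnose(sample))  # []
```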
+
+---
+
+## Gateway Won't Start — Diagnosis Order
+
+1. `ss -tlnp | grep 8642` — port conflict?
+2. `cat ~/.hermes/gateway.pid` → `ps -p <pid>` — stale PID file?
+3. `hermes gateway start --replace` — clears stale locks and PIDs
+4. `HERMES_LOG_LEVEL=DEBUG hermes gateway start` — verbose output
+5. Check `~/.hermes/.env` — missing or placeholder token?
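Review note: steps 1 and 2 can also be checked from Python. The sketch assumes a POSIX system and assumes the PID file holds a single integer, which this guide does not confirm. All three helper names are hypothetical.

```python
import os
import socket
from pathlib import Path


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def pid_is_alive(pid: int) -> bool:
    """Signal 0 checks for existence without affecting the process."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def pid_file_is_stale(pid_file: Path) -> bool:
    """True if the file names a PID that is no longer running."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no usable PID recorded, so nothing to call stale
    return not pid_is_alive(pid)
```

A stale PID file combined with a free port 8642 points at step 3 rather than a real conflict.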
+
+---
+
+## Before Every PR
+
+```bash
+source venv/bin/activate
+python -m pytest tests/ -q          # full suite: ~3 min, ~3000 tests
+python scripts/deploy-validate       # deployment health check
+python wizard-bootstrap/wizard_bootstrap.py  # environment sanity
+```
+
+All three must exit 0. Do not skip. "It works locally" is not sufficient evidence.
+
+---
+
+## Session and State Files
+
+| Store | Location | Notes |
+|-------|----------|-------|
+| Sessions | `~/.hermes/sessions/*.json` | Persisted across restarts |
+| Memories | `~/.hermes/memories/*.md` | Written by the agent's memory tool |
+| Cron jobs | `~/.hermes/cron/*.json` | Scheduler state |
+| Gateway state | `~/.hermes/gateway_state.json` | Live platform connection status |
+| Response store | `~/.hermes/response_store.db` | SQLite WAL — API server only |
+
+All paths go through `get_hermes_home()`. Never hardcode. Always back up before a major update:
+
+```bash
+tar czf ~/backups/hermes_$(date +%F_%H%M).tar.gz ~/.hermes/
+```
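Review note: a Python equivalent of the `tar` command, as a sketch. Unlike the shell one-liner, it creates the destination directory if it is missing. `backup_dir` is a hypothetical helper; in project code the `source` argument would come from `get_hermes_home()`.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def backup_dir(source: Path, dest_dir: Path) -> Path:
    """Write a timestamped .tar.gz of source into dest_dir and return its path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M")
    archive = dest_dir / f"hermes_{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the directory name.
        tar.add(source, arcname=source.name)
    return archive
```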
+
+---
+
+## Writing Tests
+
+```bash
+python -m pytest tests/path/to/test.py -q    # single file
+python -m pytest tests/ -q -k "test_name"    # by name
+python -m pytest tests/ -q -x               # stop on first failure
+```
+
+**Test isolation rules:**
+- `tests/conftest.py` has an autouse fixture that redirects `HERMES_HOME` to a temp dir. Never write to `~/.hermes/` in tests.
+- Profile tests must mock both `Path.home()` and `HERMES_HOME`. See `tests/hermes_cli/test_profiles.py` for the pattern.
+- Do not mock the database. Integration tests should use real SQLite with a temp path.
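Review note: the "real SQLite with a temp path" rule might look like this in practice. The table name and columns are invented for the example and are not the actual response store schema.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_store(db_path: Path) -> sqlite3.Connection:
    """Open (or create) a real SQLite database at db_path."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def roundtrip(tmp_dir: Path) -> str:
    """Write one row to a database under tmp_dir and read it back."""
    conn = open_store(tmp_dir / "response_store.db")
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO responses (id, body) VALUES (?, ?)", ("r1", "hello"))
    row = conn.execute("SELECT body FROM responses WHERE id = ?", ("r1",)).fetchone()
    conn.close()
    return row[0]
```

In a pytest test, `tmp_dir` would normally be the built-in `tmp_path` fixture, so nothing touches `~/.hermes/`.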
+
+---
+
+## Commit Conventions
+
+```
+feat: add X           # new capability
+fix: correct Y        # bug fix
+refactor: restructure Z  # no behaviour change
+test: add tests for W    # test-only
+chore: update deps       # housekeeping
+docs: clarify X          # documentation only
+```
+
+Include `Fixes #NNN` or `Refs #NNN` in the commit message body to close or reference issues automatically.
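Review note: the conventions above could be checked mechanically, for example in a commit hook. This is a sketch; the optional `(scope)` group is an assumption based on subjects such as `fix(ezra): ...` in this PR's history, and `check_commit` is a hypothetical helper.

```python
import re

COMMIT_SUBJECT = re.compile(r"^(feat|fix|refactor|test|chore|docs)(\([\w.-]+\))?: \S")
ISSUE_REF = re.compile(r"\b(Fixes|Refs) #\d+\b")


def check_commit(message: str) -> list:
    """Return a list of convention problems; an empty list means the message passes."""
    subject, _, body = message.partition("\n")
    problems = []
    if not COMMIT_SUBJECT.match(subject):
        problems.append("subject lacks a recognised type prefix")
    if not ISSUE_REF.search(body):
        problems.append("body has no 'Fixes #NNN' or 'Refs #NNN'")
    return problems
```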
+
+---
+
+*This guide lives in `wizard-bootstrap/`. Update it when you discover a new pitfall or pattern worth preserving.*
Author	SHA1	Message	Date
claw-code	d0e4e00ba1	chore: claw-code progress on #151 [CI failed: Forge CI / smoke-and-build (pull_request), failing after 4s] Refs #151	2026-04-07 00:03:57 -04:00
claw-code	9b4fcc5ee4	[claw-code] P2: Validate Documentation Audit & Apply to Our Fork (#126) (#176) [CI failed: Forge CI / smoke-and-build (push), failing after 2s] Co-authored-by: claw-code <claw-code@timmy.local> Co-committed-by: claw-code <claw-code@timmy.local>	2026-04-07 03:56:46 +00:00
Timmy Time	6581dcb1af	fix(ezra): switch primary from kimi-for-coding to kimi-k2.5, add fallback chain [CI failed: Forge CI / smoke-and-build (push), failing after 2s] kimi-for-coding is throwing 403 access-terminated errors. This switches Ezra to kimi-k2.5 and adds anthropic + openrouter fallbacks. Addresses #lazzyPit and unblocks Ezra resurrection.	2026-04-07 03:23:36 +00:00
Timmy Time	a37fed23e6	[BEZALEL][CI] Syntax Guard — Prevent Broken Python from Reaching Main (#167) [CI failed: Forge CI / smoke-and-build (push), failing after 2s]	2026-04-07 02:27:32 +00:00
Timmy Time	97f63a0d89	Merge pull request '[BEZALEL][DEVKIT] Shared Development Tools for the Wizard Fleet' (#166) from bezalel/devkit-for-the-fleet into main [CI failed: Forge CI / smoke-and-build (push), failing after 2s]	2026-04-07 02:15:11 +00:00
Timmy Time	b49e8b11ea	Merge pull request '[BEZALEL][Epic-001] The Forge CI Pipeline — Gitea Actions + Smoke + Green E2E' (#154) from bezalel/epic-001-forge-ci into main [CI failed: Forge CI / smoke-and-build (push), failing after 2s]	2026-04-07 02:12:31 +00:00
Bezalel	88b4cc218f	feat(devkit): Add shared development tools for the wizard fleet [CI failed: Notebook CI / notebook-smoke (pull_request), failing after 2s] - gitea_client.py — reusable Gitea API client for issues, PRs, comments - health.py — fleet health monitor (load, disk, memory, processes) - notebook_runner.py — Papermill wrapper with JSON reporting - smoke_test.py — fast smoke tests and bare green-path e2e - secret_scan.py — secret leak scanner for CI gating - wizard_env.py — environment validator for bootstrapping agents - README.md — usage guide for all tools These tools are designed to be used by any wizard via python -m devkit.<tool>. Rising up as a platform, not a silo.	2026-04-07 02:08:47 +00:00
Claude (Opus 4.6)	59653ef409	[claude] Research Triage: SSD Self-Distillation acknowledgment (#128) (#165)	2026-04-07 02:07:54 +00:00
Claude (Opus 4.6)	e32d6332bc	[claude] Forge Operations Guide — Practical Wizard Onboarding (#142) (#164)	2026-04-07 02:06:15 +00:00
Claude (Opus 4.6)	6291f2d31b	[claude] Fleet SITREP — April 6, 2026 acknowledgment (#143) (#162)	2026-04-07 02:04:51 +00:00
Claude (Opus 4.6)	066ec8eafa	[claude] Add Ezra Quarterly Report — April 2026 (MD + PDF) (#133) (#163)	2026-04-07 02:04:45 +00:00
Bezalel	53fe58a2b9	feat(notebooks): Add Jupytext + Papermill agent workflow demo [CI failed: Notebook CI / notebook-smoke (push), failing after 3s; Notebook CI / notebook-smoke (pull_request), failing after 5s] - Add parameterized system-health notebook (.py source + .ipynb) - Add Gitea Actions CI workflow for notebook execution smoke test - Add NOTEBOOK_WORKFLOW.md documenting the .py-first approach - Proves end-to-end: agent writes .py -> PR review -> CI executes -> output artifact	2026-04-07 01:54:25 +00:00
Bezalel	43bcb88a09	[BEZALEL][Epic-001] The Forge CI Pipeline — Gitea Actions + Smoke + Green E2E [CI failed: Forge CI / smoke-and-build (pull_request), failing after 3s] - Add .gitea/workflows/ci.yml: Gitea Actions workflow for PR/push CI - Add scripts/smoke_test.py: fast smoke tests (<30s) for core imports and CLI entrypoints - Add tests/test_green_path_e2e.py: bare green-path e2e — terminal echo test - Total CI runtime target: <5 minutes - No API keys required for smoke/e2e stages Closes #145 /assign @bezalel	2026-04-07 00:28:32 +00:00
Bezalel	89730e8e90	[BEZALEL] Add forge health check — artifact integrity and security scanner [CI failed: Supply Chain Audit / Scan PR for supply chain risks (pull_request), failing after 0s; Docker Build and Publish / build-and-push (pull_request), failing after 7s; Tests / test (pull_request), failing after 2s] Adds scripts/forge_health_check.py to scan wizard environments for: - Missing .py source files with orphaned .pyc bytecode (GOFAI artifact integrity) - Burn script clutter in production paths - World-readable sensitive files (keystores, tokens, .env) - Missing required environment variables Includes full test suite in tests/test_forge_health_check.py covering orphaned bytecode detection, burn script clutter, permission auto-fix, and environment variable validation. Addresses Allegro formalization audit findings: - GOFAI source files missing (only .pyc remains) - Nostr keystore world-readable - eg burn scripts cluttering /root /assign @bezalel	2026-04-06 22:37:32 +00:00