Compare commits
19 Commits
claude/iss
...
claw-code/
| Author | SHA1 | Date | |
|---|---|---|---|
| 1156a9f55e | |||
| 6581dcb1af | |||
| a37fed23e6 | |||
| 97f63a0d89 | |||
| b49e8b11ea | |||
| 88b4cc218f | |||
| 59653ef409 | |||
| e32d6332bc | |||
| 6291f2d31b | |||
| 066ec8eafa | |||
| 069d5404a0 | |||
| 258d02eb9b | |||
| a89c0a2ea4 | |||
| c994c01c9f | |||
| 8150b5c66b | |||
| 53fe58a2b9 | |||
| 35be02ad15 | |||
| 43bcb88a09 | |||
| 89730e8e90 |
2
.claw/sessions/session-1775533542734-0.jsonl
Normal file
2
.claw/sessions/session-1775533542734-0.jsonl
Normal file
@@ -0,0 +1,2 @@
|
||||
{"created_at_ms":1775533542734,"session_id":"session-1775533542734-0","type":"session_meta","updated_at_ms":1775533542734,"version":1}
|
||||
{"message":{"blocks":[{"text":"You are Code Claw running as the Gitea user claw-code.\n\nRepository: Timmy_Foundation/hermes-agent\nIssue: #126 — P2: Validate Documentation Audit & Apply to Our Fork\nBranch: claw-code/issue-126\n\nRead the issue and recent comments, then implement the smallest correct change.\nYou are in a git repo checkout already.\n\nIssue body:\n## Context\n\nCommit `43d468ce` is a comprehensive documentation audit — fixes stale info, expands thin pages, adds depth across all docs.\n\n## Acceptance Criteria\n\n- [ ] **Catalog all doc changes**: Run `git show 43d468ce --stat` to list all files changed, then review each for what was fixed/expanded\n- [ ] **Verify key docs are accurate**: Pick 3 docs that were previously thin (setup, deployment, plugin development), confirm they now have comprehensive content\n- [ ] **Identify stale info that was corrected**: Note at least 3 pieces of stale information that were removed or updated\n- [ ] **Apply fixes to our fork if needed**: Check if any of the doc fixes apply to our `Timmy_Foundation/hermes-agent` fork (Timmy-specific references, custom config sections)\n\n## Why This Matters\n\nAccurate documentation is critical for onboarding new agents and maintaining the fleet. Stale docs cost more debugging time than writing them initially.\n\n## Hints\n\n- Run `cd ~/.hermes/hermes-agent && git show 43d468ce --stat` to see the full scope\n- The docs likely cover: setup, plugins, deployment, MCP configuration, and tool integrations\n\n\nParent: #111\n\nRecent comments:\n## 🏷️ Automated Triage Check\n\n**Timestamp:** 2026-04-06T15:30:12.449023 \n**Agent:** Allegro Heartbeat\n\nThis issue has been identified as needing triage:\n\n### Checklist\n- [ ] Clear acceptance criteria defined\n- [ ] Priority label assigned (p0-critical / p1-important / p2-backlog)\n- [ ] Size estimate added (quick-fix / day / week / epic)\n- [ ] Owner assigned\n- [ ] Related issues linked\n\n### Context\n- No comments yet — needs engagement\n- No labels — needs categorization\n- Part of automated backlog maintenance\n\n---\n*Automated triage from Allegro 15-minute heartbeat*\n\n[BURN-DOWN] Dispatched to Code Claw (claw-code worker) as part of nightly burn-down cycle. Heartbeat active.\n\n🟠 Code Claw (OpenRouter qwen/qwen3.6-plus:free) picking up this issue via 15-minute heartbeat.\n\nTimestamp: 2026-04-07T03:45:37Z\n\nRules:\n- Make focused code/config/doc changes only if they directly address the issue.\n- Prefer the smallest proof-oriented fix.\n- Run relevant verification commands if obvious.\n- Do NOT create PRs yourself; the outer worker handles commit/push/PR.\n- If the task is too large or not code-fit, leave the tree unchanged.\n","type":"text"}],"role":"user"},"type":"message"}
|
||||
54
.gitea/workflows/ci.yml
Normal file
54
.gitea/workflows/ci.yml
Normal file
@@ -0,0 +1,54 @@
|
||||
name: Forge CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main]
|
||||
pull_request:
|
||||
branches: [main]
|
||||
|
||||
concurrency:
|
||||
group: forge-ci-${{ gitea.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
smoke-and-build:
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 5
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Install uv
|
||||
uses: astral-sh/setup-uv@v5
|
||||
|
||||
- name: Set up Python 3.11
|
||||
run: uv python install 3.11
|
||||
|
||||
- name: Install package
|
||||
run: |
|
||||
uv venv .venv --python 3.11
|
||||
source .venv/bin/activate
|
||||
uv pip install -e ".[all,dev]"
|
||||
|
||||
- name: Smoke tests
|
||||
run: |
|
||||
source .venv/bin/activate
|
||||
python scripts/smoke_test.py
|
||||
env:
|
||||
OPENROUTER_API_KEY: ""
|
||||
OPENAI_API_KEY: ""
|
||||
NOUS_API_KEY: ""
|
||||
|
||||
- name: Syntax guard
|
||||
run: |
|
||||
source .venv/bin/activate
|
||||
python scripts/syntax_guard.py
|
||||
|
||||
- name: Green-path E2E
|
||||
run: |
|
||||
source .venv/bin/activate
|
||||
python -m pytest tests/test_green_path_e2e.py -q --tb=short
|
||||
env:
|
||||
OPENROUTER_API_KEY: ""
|
||||
OPENAI_API_KEY: ""
|
||||
NOUS_API_KEY: ""
|
||||
44
.gitea/workflows/notebook-ci.yml
Normal file
44
.gitea/workflows/notebook-ci.yml
Normal file
@@ -0,0 +1,44 @@
|
||||
name: Notebook CI
|
||||
|
||||
on:
|
||||
push:
|
||||
paths:
|
||||
- 'notebooks/**'
|
||||
pull_request:
|
||||
paths:
|
||||
- 'notebooks/**'
|
||||
|
||||
jobs:
|
||||
notebook-smoke:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: '3.12'
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
pip install papermill jupytext nbformat
|
||||
python -m ipykernel install --user --name python3
|
||||
|
||||
- name: Execute system health notebook
|
||||
run: |
|
||||
papermill notebooks/agent_task_system_health.ipynb /tmp/output.ipynb \
|
||||
-p threshold 0.5 \
|
||||
-p hostname ci-runner
|
||||
|
||||
- name: Verify output has results
|
||||
run: |
|
||||
python -c "
|
||||
import json
|
||||
nb = json.load(open('/tmp/output.ipynb'))
|
||||
code_cells = [c for c in nb['cells'] if c['cell_type'] == 'code']
|
||||
outputs = [c.get('outputs', []) for c in code_cells]
|
||||
total_outputs = sum(len(o) for o in outputs)
|
||||
assert total_outputs > 0, 'Notebook produced no outputs'
|
||||
print(f'Notebook executed successfully with {total_outputs} output(s)')
|
||||
"
|
||||
13
.github/CODEOWNERS
vendored
Normal file
13
.github/CODEOWNERS
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
# Default owners for all files
|
||||
* @Timmy
|
||||
|
||||
# Critical paths require explicit review
|
||||
/gateway/ @Timmy
|
||||
/tools/ @Timmy
|
||||
/agent/ @Timmy
|
||||
/config/ @Timmy
|
||||
/scripts/ @Timmy
|
||||
/.github/workflows/ @Timmy
|
||||
/pyproject.toml @Timmy
|
||||
/requirements.txt @Timmy
|
||||
/Dockerfile @Timmy
|
||||
99
.github/ISSUE_TEMPLATE/security_pr_checklist.yml
vendored
Normal file
99
.github/ISSUE_TEMPLATE/security_pr_checklist.yml
vendored
Normal file
@@ -0,0 +1,99 @@
|
||||
name: "🔒 Security PR Checklist"
|
||||
description: "Use this when your PR touches authentication, file I/O, external API calls, or other sensitive paths."
|
||||
title: "[Security Review]: "
|
||||
labels: ["security", "needs-review"]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
## Security Pre-Merge Review
|
||||
Complete this checklist before requesting review on PRs that touch **authentication, file I/O, external API calls, or secrets handling**.
|
||||
|
||||
- type: input
|
||||
id: pr-link
|
||||
attributes:
|
||||
label: Pull Request
|
||||
description: Link to the PR being reviewed
|
||||
placeholder: "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/XXX"
|
||||
validations:
|
||||
required: true
|
||||
|
||||
- type: dropdown
|
||||
id: change-type
|
||||
attributes:
|
||||
label: Change Category
|
||||
description: What kind of sensitive change does this PR make?
|
||||
multiple: true
|
||||
options:
|
||||
- Authentication / Authorization
|
||||
- File I/O (read/write/delete)
|
||||
- External API calls (outbound HTTP/network)
|
||||
- Secret / credential handling
|
||||
- Command execution (subprocess/shell)
|
||||
- Dependency addition or update
|
||||
- Configuration changes
|
||||
- CI/CD pipeline changes
|
||||
validations:
|
||||
required: true
|
||||
|
||||
- type: checkboxes
|
||||
id: secrets-checklist
|
||||
attributes:
|
||||
label: Secrets & Credentials
|
||||
options:
|
||||
- label: No secrets, API keys, or credentials are hardcoded
|
||||
required: true
|
||||
- label: All sensitive values are loaded from environment variables or a secrets manager
|
||||
required: true
|
||||
- label: Test fixtures use fake/placeholder values, not real credentials
|
||||
required: true
|
||||
|
||||
- type: checkboxes
|
||||
id: input-validation-checklist
|
||||
attributes:
|
||||
label: Input Validation
|
||||
options:
|
||||
- label: All external input (user, API, file) is validated before use
|
||||
required: true
|
||||
- label: File paths are validated against path traversal (`../`, null bytes, absolute paths)
|
||||
- label: URLs are validated for SSRF (blocked private/metadata IPs)
|
||||
- label: Shell commands do not use `shell=True` with user-controlled input
|
||||
|
||||
- type: checkboxes
|
||||
id: auth-checklist
|
||||
attributes:
|
||||
label: Authentication & Authorization (if applicable)
|
||||
options:
|
||||
- label: Authentication tokens are not logged or exposed in error messages
|
||||
- label: Authorization checks happen server-side, not just client-side
|
||||
- label: Session tokens are properly scoped and have expiry
|
||||
|
||||
- type: checkboxes
|
||||
id: supply-chain-checklist
|
||||
attributes:
|
||||
label: Supply Chain
|
||||
options:
|
||||
- label: New dependencies are pinned to a specific version range
|
||||
- label: Dependencies come from trusted sources (PyPI, npm, official repos)
|
||||
- label: No `.pth` files or install hooks that execute arbitrary code
|
||||
- label: "`pip-audit` passes (no known CVEs in added dependencies)"
|
||||
|
||||
- type: textarea
|
||||
id: threat-model
|
||||
attributes:
|
||||
label: Threat Model Notes
|
||||
description: |
|
||||
Briefly describe the attack surface this change introduces or modifies, and how it is mitigated.
|
||||
placeholder: |
|
||||
This PR adds a new outbound HTTP call to the OpenRouter API.
|
||||
Mitigation: URL is hardcoded (no user input), response is parsed with strict schema validation.
|
||||
|
||||
- type: textarea
|
||||
id: testing
|
||||
attributes:
|
||||
label: Security Testing Done
|
||||
description: What security testing did you perform?
|
||||
placeholder: |
|
||||
- Ran validate_security.py — all checks pass
|
||||
- Tested path traversal attempts manually
|
||||
- Verified no secrets in git diff
|
||||
82
.github/workflows/dependency-audit.yml
vendored
Normal file
82
.github/workflows/dependency-audit.yml
vendored
Normal file
@@ -0,0 +1,82 @@
|
||||
name: Dependency Audit
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
branches: [main]
|
||||
paths:
|
||||
- 'requirements.txt'
|
||||
- 'pyproject.toml'
|
||||
- 'uv.lock'
|
||||
schedule:
|
||||
- cron: '0 8 * * 1' # Weekly on Monday
|
||||
workflow_dispatch:
|
||||
|
||||
permissions:
|
||||
pull-requests: write
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
audit:
|
||||
name: Audit Python dependencies
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: astral-sh/setup-uv@v5
|
||||
- name: Set up Python
|
||||
run: uv python install 3.11
|
||||
- name: Install pip-audit
|
||||
run: uv pip install --system pip-audit
|
||||
- name: Run pip-audit
|
||||
id: audit
|
||||
run: |
|
||||
set -euo pipefail
|
||||
# Run pip-audit against the lock file/requirements
|
||||
if pip-audit --requirement requirements.txt -f json -o /tmp/audit-results.json 2>/tmp/audit-stderr.txt; then
|
||||
echo "found=false" >> "$GITHUB_OUTPUT"
|
||||
else
|
||||
echo "found=true" >> "$GITHUB_OUTPUT"
|
||||
# Check severity
|
||||
CRITICAL=$(python3 -c "
|
||||
import json, sys
|
||||
data = json.load(open('/tmp/audit-results.json'))
|
||||
vulns = data.get('dependencies', [])
|
||||
for d in vulns:
|
||||
for v in d.get('vulns', []):
|
||||
aliases = v.get('aliases', [])
|
||||
# Check for critical/high CVSS
|
||||
if any('CVSS' in str(a) for a in aliases):
|
||||
print('true')
|
||||
sys.exit(0)
|
||||
print('false')
|
||||
" 2>/dev/null || echo 'false')
|
||||
echo "critical=${CRITICAL}" >> "$GITHUB_OUTPUT"
|
||||
fi
|
||||
continue-on-error: true
|
||||
- name: Post results comment
|
||||
if: steps.audit.outputs.found == 'true' && github.event_name == 'pull_request'
|
||||
env:
|
||||
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
run: |
|
||||
BODY="## ⚠️ Dependency Vulnerabilities Detected
|
||||
|
||||
\`pip-audit\` found vulnerable dependencies in this PR. Review and update before merging.
|
||||
|
||||
\`\`\`
|
||||
$(cat /tmp/audit-results.json | python3 -c "
|
||||
import json, sys
|
||||
data = json.load(sys.stdin)
|
||||
for dep in data.get('dependencies', []):
|
||||
for v in dep.get('vulns', []):
|
||||
print(f\" {dep['name']}=={dep['version']}: {v['id']} - {v.get('description', '')[:120]}\")
|
||||
" 2>/dev/null || cat /tmp/audit-stderr.txt)
|
||||
\`\`\`
|
||||
|
||||
---
|
||||
*Automated scan by [dependency-audit](/.github/workflows/dependency-audit.yml)*"
|
||||
gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY"
|
||||
- name: Fail on vulnerabilities
|
||||
if: steps.audit.outputs.found == 'true'
|
||||
run: |
|
||||
echo "::error::Vulnerable dependencies detected. See PR comment for details."
|
||||
cat /tmp/audit-results.json | python3 -m json.tool || true
|
||||
exit 1
|
||||
114
.github/workflows/quarterly-security-audit.yml
vendored
Normal file
114
.github/workflows/quarterly-security-audit.yml
vendored
Normal file
@@ -0,0 +1,114 @@
|
||||
name: Quarterly Security Audit
|
||||
|
||||
on:
|
||||
schedule:
|
||||
# Run at 08:00 UTC on the first day of each quarter (Jan, Apr, Jul, Oct)
|
||||
- cron: '0 8 1 1,4,7,10 *'
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
reason:
|
||||
description: 'Reason for manual trigger'
|
||||
required: false
|
||||
default: 'Manual quarterly audit'
|
||||
|
||||
permissions:
|
||||
issues: write
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
create-audit-issue:
|
||||
name: Create quarterly security audit issue
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Get quarter info
|
||||
id: quarter
|
||||
run: |
|
||||
MONTH=$(date +%-m)
|
||||
YEAR=$(date +%Y)
|
||||
QUARTER=$(( (MONTH - 1) / 3 + 1 ))
|
||||
echo "quarter=Q${QUARTER}-${YEAR}" >> "$GITHUB_OUTPUT"
|
||||
echo "year=${YEAR}" >> "$GITHUB_OUTPUT"
|
||||
echo "q=${QUARTER}" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Create audit issue
|
||||
env:
|
||||
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
run: |
|
||||
QUARTER="${{ steps.quarter.outputs.quarter }}"
|
||||
|
||||
gh issue create \
|
||||
--title "[$QUARTER] Quarterly Security Audit" \
|
||||
--label "security,audit" \
|
||||
--body "$(cat <<'BODY'
|
||||
## Quarterly Security Audit — ${{ steps.quarter.outputs.quarter }}
|
||||
|
||||
This is the scheduled quarterly security audit for the hermes-agent project. Complete each section and close this issue when the audit is done.
|
||||
|
||||
**Audit Period:** ${{ steps.quarter.outputs.quarter }}
|
||||
**Due:** End of quarter
|
||||
**Owner:** Assign to a maintainer
|
||||
|
||||
---
|
||||
|
||||
## 1. Open Issues & PRs Audit
|
||||
|
||||
Review all open issues and PRs for security-relevant content. Tag any that touch attack surfaces with the `security` label.
|
||||
|
||||
- [ ] Review open issues older than 30 days for unaddressed security concerns
|
||||
- [ ] Tag security-relevant open PRs with `needs-security-review`
|
||||
- [ ] Check for any issues referencing CVEs or known vulnerabilities
|
||||
- [ ] Review recently closed security issues — are fixes deployed?
|
||||
|
||||
## 2. Dependency Audit
|
||||
|
||||
- [ ] Run `pip-audit` against current `requirements.txt` / `pyproject.toml`
|
||||
- [ ] Check `uv.lock` for any pinned versions with known CVEs
|
||||
- [ ] Review any `git+` dependencies for recent changes or compromise signals
|
||||
- [ ] Update vulnerable dependencies and open PRs for each
|
||||
|
||||
## 3. Critical Path Review
|
||||
|
||||
Review recent changes to attack-surface paths:
|
||||
|
||||
- [ ] `gateway/` — authentication, message routing, platform adapters
|
||||
- [ ] `tools/` — file I/O, command execution, web access
|
||||
- [ ] `agent/` — prompt handling, context management
|
||||
- [ ] `config/` — secrets loading, configuration parsing
|
||||
- [ ] `.github/workflows/` — CI/CD integrity
|
||||
|
||||
Run: `git log --since="3 months ago" --name-only -- gateway/ tools/ agent/ config/ .github/workflows/`
|
||||
|
||||
## 4. Secret Scan
|
||||
|
||||
- [ ] Run secret scanner on the full codebase (not just diffs)
|
||||
- [ ] Verify no credentials are present in git history
|
||||
- [ ] Confirm all API keys/tokens in use are rotated on a regular schedule
|
||||
|
||||
## 5. Access & Permissions Review
|
||||
|
||||
- [ ] Review who has write access to the main branch
|
||||
- [ ] Confirm branch protection rules are still in place (require PR + review)
|
||||
- [ ] Verify CI/CD secrets are scoped correctly (not over-permissioned)
|
||||
- [ ] Review CODEOWNERS file for accuracy
|
||||
|
||||
## 6. Vulnerability Triage
|
||||
|
||||
List any new vulnerabilities found this quarter:
|
||||
|
||||
| ID | Component | Severity | Status | Owner |
|
||||
|----|-----------|----------|--------|-------|
|
||||
| | | | | |
|
||||
|
||||
## 7. Action Items
|
||||
|
||||
| Action | Owner | Due Date | Status |
|
||||
|--------|-------|----------|--------|
|
||||
| | | | |
|
||||
|
||||
---
|
||||
|
||||
*Auto-generated by [quarterly-security-audit](/.github/workflows/quarterly-security-audit.yml). Close this issue when the audit is complete.*
|
||||
BODY
|
||||
)"
|
||||
136
.github/workflows/secret-scan.yml
vendored
Normal file
136
.github/workflows/secret-scan.yml
vendored
Normal file
@@ -0,0 +1,136 @@
|
||||
name: Secret Scan
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
types: [opened, synchronize, reopened]
|
||||
|
||||
permissions:
|
||||
pull-requests: write
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
scan:
|
||||
name: Scan for secrets
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0
|
||||
|
||||
- name: Fetch base branch
|
||||
run: git fetch origin ${{ github.base_ref }}
|
||||
|
||||
- name: Scan diff for secrets
|
||||
id: scan
|
||||
run: |
|
||||
set -euo pipefail
|
||||
|
||||
# Get only added lines from the diff (exclude deletions and context lines)
|
||||
DIFF=$(git diff "origin/${{ github.base_ref }}"...HEAD -- \
|
||||
':!*.lock' ':!uv.lock' ':!package-lock.json' ':!yarn.lock' \
|
||||
| grep '^+' | grep -v '^+++' || true)
|
||||
|
||||
FINDINGS=""
|
||||
CRITICAL=false
|
||||
|
||||
check() {
|
||||
local label="$1"
|
||||
local pattern="$2"
|
||||
local critical="${3:-false}"
|
||||
local matches
|
||||
matches=$(echo "$DIFF" | grep -oP "$pattern" || true)
|
||||
if [ -n "$matches" ]; then
|
||||
FINDINGS="${FINDINGS}\n- **${label}**: pattern matched"
|
||||
if [ "$critical" = "true" ]; then
|
||||
CRITICAL=true
|
||||
fi
|
||||
fi
|
||||
}
|
||||
|
||||
# AWS keys — critical
|
||||
check "AWS Access Key" 'AKIA[0-9A-Z]{16}' true
|
||||
|
||||
# Private key headers — critical
|
||||
check "Private Key Header" '-----BEGIN (RSA|EC|DSA|OPENSSH|PGP) PRIVATE KEY' true
|
||||
|
||||
# OpenAI / Anthropic style keys
|
||||
check "OpenAI-style API key (sk-)" 'sk-[a-zA-Z0-9]{20,}' false
|
||||
|
||||
# GitHub tokens
|
||||
check "GitHub personal access token (ghp_)" 'ghp_[a-zA-Z0-9]{36}' true
|
||||
check "GitHub fine-grained PAT (github_pat_)" 'github_pat_[a-zA-Z0-9_]{1,}' true
|
||||
|
||||
# Slack tokens
|
||||
check "Slack bot token (xoxb-)" 'xoxb-[0-9A-Za-z\-]{10,}' true
|
||||
check "Slack user token (xoxp-)" 'xoxp-[0-9A-Za-z\-]{10,}' true
|
||||
|
||||
# Generic assignment patterns — exclude obvious placeholders
|
||||
GENERIC=$(echo "$DIFF" | grep -iP '(api_key|apikey|api-key|secret_key|access_token|auth_token)\s*[=:]\s*['"'"'"][^'"'"'"]{20,}['"'"'"]' \
|
||||
| grep -ivP '(fake|mock|test|placeholder|example|dummy|your[_-]|xxx|<|>|\{\{)' || true)
|
||||
if [ -n "$GENERIC" ]; then
|
||||
FINDINGS="${FINDINGS}\n- **Generic credential assignment**: possible hardcoded secret"
|
||||
fi
|
||||
|
||||
# .env additions with long values
|
||||
ENV_DIFF=$(git diff "origin/${{ github.base_ref }}"...HEAD -- '*.env' '**/.env' '.env*' \
|
||||
| grep '^+' | grep -v '^+++' || true)
|
||||
ENV_MATCHES=$(echo "$ENV_DIFF" | grep -P '^[A-Z_]+=.{16,}' \
|
||||
| grep -ivP '(fake|mock|test|placeholder|example|dummy|your[_-]|xxx)' || true)
|
||||
if [ -n "$ENV_MATCHES" ]; then
|
||||
FINDINGS="${FINDINGS}\n- **.env file**: lines with potentially real secret values detected"
|
||||
fi
|
||||
|
||||
# Write outputs
|
||||
if [ -n "$FINDINGS" ]; then
|
||||
echo "found=true" >> "$GITHUB_OUTPUT"
|
||||
else
|
||||
echo "found=false" >> "$GITHUB_OUTPUT"
|
||||
fi
|
||||
|
||||
if [ "$CRITICAL" = "true" ]; then
|
||||
echo "critical=true" >> "$GITHUB_OUTPUT"
|
||||
else
|
||||
echo "critical=false" >> "$GITHUB_OUTPUT"
|
||||
fi
|
||||
|
||||
# Store findings in a file to use in comment step
|
||||
printf "%b" "$FINDINGS" > /tmp/secret-findings.txt
|
||||
|
||||
- name: Post PR comment with findings
|
||||
if: steps.scan.outputs.found == 'true' && github.event_name == 'pull_request'
|
||||
env:
|
||||
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
run: |
|
||||
FINDINGS=$(cat /tmp/secret-findings.txt)
|
||||
SEVERITY="warning"
|
||||
if [ "${{ steps.scan.outputs.critical }}" = "true" ]; then
|
||||
SEVERITY="CRITICAL"
|
||||
fi
|
||||
|
||||
BODY="## Secret Scan — ${SEVERITY} findings
|
||||
|
||||
The automated secret scanner detected potential secrets in the diff for this PR.
|
||||
|
||||
### Findings
|
||||
${FINDINGS}
|
||||
|
||||
### What to do
|
||||
1. Remove any real credentials from the diff immediately.
|
||||
2. If the match is a false positive (test fixture, placeholder), add a comment explaining why or rename the variable to include \`fake\`, \`mock\`, or \`test\`.
|
||||
3. Rotate any exposed credentials regardless of whether this PR is merged.
|
||||
|
||||
---
|
||||
*Automated scan by [secret-scan](/.github/workflows/secret-scan.yml)*"
|
||||
|
||||
gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY"
|
||||
|
||||
- name: Fail on critical secrets
|
||||
if: steps.scan.outputs.critical == 'true'
|
||||
run: |
|
||||
echo "::error::Critical secrets detected in diff (private keys, AWS keys, or GitHub tokens). Remove them before merging."
|
||||
exit 1
|
||||
|
||||
- name: Warn on non-critical findings
|
||||
if: steps.scan.outputs.found == 'true' && steps.scan.outputs.critical == 'false'
|
||||
run: |
|
||||
echo "::warning::Potential secrets detected in diff. Review the PR comment for details."
|
||||
25
.pre-commit-config.yaml
Normal file
25
.pre-commit-config.yaml
Normal file
@@ -0,0 +1,25 @@
|
||||
repos:
|
||||
# Secret detection
|
||||
- repo: https://github.com/gitleaks/gitleaks
|
||||
rev: v8.21.2
|
||||
hooks:
|
||||
- id: gitleaks
|
||||
name: Detect secrets with gitleaks
|
||||
description: Detect hardcoded secrets, API keys, and credentials
|
||||
|
||||
# Basic security hygiene
|
||||
- repo: https://github.com/pre-commit/pre-commit-hooks
|
||||
rev: v5.0.0
|
||||
hooks:
|
||||
- id: check-added-large-files
|
||||
args: ['--maxkb=500']
|
||||
- id: detect-private-key
|
||||
name: Detect private keys
|
||||
- id: check-merge-conflict
|
||||
- id: check-yaml
|
||||
- id: check-toml
|
||||
- id: end-of-file-fixer
|
||||
- id: trailing-whitespace
|
||||
args: ['--markdown-linebreak-ext=md']
|
||||
- id: no-commit-to-branch
|
||||
args: ['--branch', 'main']
|
||||
569
DEPLOY.md
Normal file
569
DEPLOY.md
Normal file
@@ -0,0 +1,569 @@
|
||||
# Hermes Agent — Sovereign Deployment Runbook
|
||||
|
||||
> **Goal**: A new VPS can go from bare OS to a running Hermes instance in under 30 minutes using only this document.
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Prerequisites](#1-prerequisites)
|
||||
2. [Environment Setup](#2-environment-setup)
|
||||
3. [Secret Injection](#3-secret-injection)
|
||||
4. [Installation](#4-installation)
|
||||
5. [Starting the Stack](#5-starting-the-stack)
|
||||
6. [Health Checks](#6-health-checks)
|
||||
7. [Stop / Restart Procedures](#7-stop--restart-procedures)
|
||||
8. [Zero-Downtime Restart](#8-zero-downtime-restart)
|
||||
9. [Rollback Procedure](#9-rollback-procedure)
|
||||
10. [Database / State Migrations](#10-database--state-migrations)
|
||||
11. [Docker Compose Deployment](#11-docker-compose-deployment)
|
||||
12. [systemd Deployment](#12-systemd-deployment)
|
||||
13. [Monitoring & Logs](#13-monitoring--logs)
|
||||
14. [Security Checklist](#14-security-checklist)
|
||||
15. [Troubleshooting](#15-troubleshooting)
|
||||
|
||||
---
|
||||
|
||||
## 1. Prerequisites
|
||||
|
||||
| Requirement | Minimum | Recommended |
|
||||
|-------------|---------|-------------|
|
||||
| OS | Ubuntu 22.04 LTS | Ubuntu 24.04 LTS |
|
||||
| RAM | 512 MB | 2 GB |
|
||||
| CPU | 1 vCPU | 2 vCPU |
|
||||
| Disk | 5 GB | 20 GB |
|
||||
| Python | 3.11 | 3.12 |
|
||||
| Node.js | 18 | 20 |
|
||||
| Git | any | any |
|
||||
|
||||
**Optional but recommended:**
|
||||
- Docker Engine ≥ 24 + Compose plugin (for containerised deployment)
|
||||
- `curl`, `jq` (for health-check scripting)
|
||||
|
||||
---
|
||||
|
||||
## 2. Environment Setup
|
||||
|
||||
### 2a. Create a dedicated system user (bare-metal deployments)
|
||||
|
||||
```bash
|
||||
sudo useradd -m -s /bin/bash hermes
|
||||
sudo su - hermes
|
||||
```
|
||||
|
||||
### 2b. Install Hermes
|
||||
|
||||
```bash
|
||||
# Official one-liner installer
|
||||
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
|
||||
|
||||
# Reload PATH so `hermes` is available
|
||||
source ~/.bashrc
|
||||
```
|
||||
|
||||
The installer places:
|
||||
- The agent code at `~/.local/lib/python3.x/site-packages/` (pip editable install)
|
||||
- The `hermes` entry point at `~/.local/bin/hermes`
|
||||
- Default config directory at `~/.hermes/`
|
||||
|
||||
### 2c. Verify installation
|
||||
|
||||
```bash
|
||||
hermes --version
|
||||
hermes doctor
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Secret Injection
|
||||
|
||||
**Rule: secrets never live in the repository. They live only in `~/.hermes/.env`.**
|
||||
|
||||
```bash
|
||||
# Copy the template (do NOT edit the repo copy)
|
||||
cp /path/to/hermes-agent/.env.example ~/.hermes/.env
|
||||
chmod 600 ~/.hermes/.env
|
||||
|
||||
# Edit with your preferred editor
|
||||
nano ~/.hermes/.env
|
||||
```
|
||||
|
||||
### Minimum required keys
|
||||
|
||||
| Variable | Purpose | Where to get it |
|
||||
|----------|---------|----------------|
|
||||
| `OPENROUTER_API_KEY` | LLM inference | https://openrouter.ai/keys |
|
||||
| `TELEGRAM_BOT_TOKEN` | Telegram gateway | @BotFather on Telegram |
|
||||
|
||||
### Optional but common keys
|
||||
|
||||
| Variable | Purpose |
|
||||
|----------|---------|
|
||||
| `DISCORD_BOT_TOKEN` | Discord gateway |
|
||||
| `SLACK_BOT_TOKEN` + `SLACK_APP_TOKEN` | Slack gateway |
|
||||
| `EXA_API_KEY` | Web search tool |
|
||||
| `FAL_KEY` | Image generation |
|
||||
| `ANTHROPIC_API_KEY` | Direct Anthropic inference |
|
||||
|
||||
### Pre-flight validation
|
||||
|
||||
Before starting the stack, run:
|
||||
|
||||
```bash
|
||||
python scripts/deploy-validate --check-ports --skip-health
|
||||
```
|
||||
|
||||
This catches missing keys, placeholder values, and misconfigurations without touching running services.
|
||||
|
||||
---
|
||||
|
||||
## 4. Installation
|
||||
|
||||
### 4a. Clone the repository (if not using the installer)
|
||||
|
||||
```bash
|
||||
git clone https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent.git
|
||||
cd hermes-agent
|
||||
pip install -e ".[all]" --user
|
||||
npm install
|
||||
```
|
||||
|
||||
### 4b. Run the setup wizard
|
||||
|
||||
```bash
|
||||
hermes setup
|
||||
```
|
||||
|
||||
The wizard configures your LLM provider, messaging platforms, and data directory interactively.
|
||||
|
||||
---
|
||||
|
||||
## 5. Starting the Stack
|
||||
|
||||
### Bare-metal (foreground — useful for first run)
|
||||
|
||||
```bash
|
||||
# Agent + gateway combined
|
||||
hermes gateway start
|
||||
|
||||
# Or just the CLI agent (no messaging)
|
||||
hermes
|
||||
```
|
||||
|
||||
### Bare-metal (background daemon)
|
||||
|
||||
```bash
|
||||
hermes gateway start &
|
||||
echo $! > ~/.hermes/gateway.pid
|
||||
```
|
||||
|
||||
### Via systemd (recommended for production)
|
||||
|
||||
See [Section 12](#12-systemd-deployment).
|
||||
|
||||
### Via Docker Compose
|
||||
|
||||
See [Section 11](#11-docker-compose-deployment).
|
||||
|
||||
---
|
||||
|
||||
## 6. Health Checks
|
||||
|
||||
### 6a. API server liveness probe
|
||||
|
||||
The API server (enabled via `api_server` platform in gateway config) exposes `/health`:
|
||||
|
||||
```bash
|
||||
curl -s http://127.0.0.1:8642/health | jq .
|
||||
```
|
||||
|
||||
Expected response:
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "ok",
|
||||
"platform": "hermes-agent",
|
||||
"version": "0.5.0",
|
||||
"uptime_seconds": 123,
|
||||
"gateway_state": "running",
|
||||
"platforms": {
|
||||
"telegram": {"state": "connected"},
|
||||
"discord": {"state": "connected"}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Meaning |
|
||||
|-------|---------|
|
||||
| `status` | `"ok"` — HTTP server is alive. Any non-200 = down. |
|
||||
| `gateway_state` | `"running"` — all platforms started. `"starting"` — still initialising. |
|
||||
| `platforms` | Per-adapter connection state. |
|
||||
|
||||
### 6b. Gateway runtime status file
|
||||
|
||||
```bash
|
||||
cat ~/.hermes/gateway_state.json | jq '{state: .gateway_state, platforms: .platforms}'
|
||||
```
|
||||
|
||||
### 6c. Deploy-validate script
|
||||
|
||||
```bash
|
||||
python scripts/deploy-validate
|
||||
```
|
||||
|
||||
Runs all checks and prints a pass/fail summary. Exit code 0 = healthy.
|
||||
|
||||
### 6d. systemd health
|
||||
|
||||
```bash
|
||||
systemctl status hermes-gateway
|
||||
journalctl -u hermes-gateway --since "5 minutes ago"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Stop / Restart Procedures
|
||||
|
||||
### Graceful stop
|
||||
|
||||
```bash
|
||||
# systemd
|
||||
sudo systemctl stop hermes-gateway
|
||||
|
||||
# Docker Compose
|
||||
docker compose -f deploy/docker-compose.yml down
|
||||
|
||||
# Process signal (if running ad-hoc)
|
||||
kill -TERM $(cat ~/.hermes/gateway.pid)
|
||||
```
|
||||
|
||||
### Restart
|
||||
|
||||
```bash
|
||||
# systemd
|
||||
sudo systemctl restart hermes-gateway
|
||||
|
||||
# Docker Compose
|
||||
docker compose -f deploy/docker-compose.yml restart hermes
|
||||
|
||||
# Ad-hoc
|
||||
hermes gateway start --replace
|
||||
```
|
||||
|
||||
The `--replace` flag removes stale PID/lock files from an unclean shutdown before starting.
|
||||
|
||||
---
|
||||
|
||||
## 8. Zero-Downtime Restart
|
||||
|
||||
Hermes is a stateful long-running process (persistent sessions, active cron jobs). True zero-downtime requires careful sequencing.
|
||||
|
||||
### Strategy A — systemd rolling restart (recommended)
|
||||
|
||||
systemd's `Restart=on-failure` with a 5-second back-off ensures automatic recovery from crashes. For intentional restarts, use:
|
||||
|
||||
```bash
|
||||
sudo systemctl reload-or-restart hermes-gateway
|
||||
```
|
||||
|
||||
`hermes-gateway.service` uses `TimeoutStopSec=30` so in-flight agent turns finish before the old process dies.
|
||||
|
||||
> **Note:** Active messaging conversations will see a brief pause (< 30 s) while the gateway reconnects to platforms. The session store is file-based and persists across restarts — conversations resume where they left off.
|
||||
|
||||
### Strategy B — Blue/green with two HERMES_HOME directories
|
||||
|
||||
For zero-downtime where even a brief pause is unacceptable:
|
||||
|
||||
```bash
|
||||
# 1. Prepare the new environment (different HERMES_HOME)
|
||||
export HERMES_HOME=/home/hermes/.hermes-green
|
||||
hermes setup # configure green env with same .env
|
||||
|
||||
# 2. Start green on a different port (e.g. 8643)
|
||||
API_SERVER_PORT=8643 hermes gateway start &
|
||||
|
||||
# 3. Verify green is healthy
|
||||
curl -s http://127.0.0.1:8643/health | jq .gateway_state
|
||||
|
||||
# 4. Switch load balancer (nginx/caddy) to port 8643
|
||||
|
||||
# 5. Gracefully stop blue
|
||||
kill -TERM $(cat ~/.hermes/.hermes/gateway.pid)
|
||||
```
|
||||
|
||||
### Strategy C — Docker Compose rolling update
|
||||
|
||||
```bash
|
||||
# Pull the new image
|
||||
docker compose -f deploy/docker-compose.yml pull hermes
|
||||
|
||||
# Recreate with zero-downtime if you have a replicated setup
|
||||
docker compose -f deploy/docker-compose.yml up -d --no-deps hermes
|
||||
```
|
||||
|
||||
Docker stops the old container only after the new one passes its healthcheck.
|
||||
|
||||
---
|
||||
|
||||
## 9. Rollback Procedure
|
||||
|
||||
### 9a. Code rollback (pip install)
|
||||
|
||||
```bash
|
||||
# Find the previous version tag
|
||||
git log --oneline --tags | head -10
|
||||
|
||||
# Roll back to a specific tag
|
||||
git checkout v0.4.0
|
||||
pip install -e ".[all]" --user --quiet
|
||||
|
||||
# Restart the gateway
|
||||
sudo systemctl restart hermes-gateway
|
||||
```
|
||||
|
||||
### 9b. Docker image rollback
|
||||
|
||||
```bash
|
||||
# Pull a specific version
|
||||
docker pull ghcr.io/nousresearch/hermes-agent:v0.4.0
|
||||
|
||||
# Update docker-compose.yml image tag, then:
|
||||
docker compose -f deploy/docker-compose.yml up -d
|
||||
```
|
||||
|
||||
### 9c. State / data rollback
|
||||
|
||||
The data directory (`~/.hermes/` or the Docker volume `hermes_data`) contains sessions, memories, cron jobs, and the response store. Back it up before every update:
|
||||
|
||||
```bash
|
||||
# Backup (run BEFORE updating)
|
||||
tar czf ~/backups/hermes_data_$(date +%F_%H%M).tar.gz ~/.hermes/
|
||||
|
||||
# Restore from backup
|
||||
sudo systemctl stop hermes-gateway
|
||||
rm -rf ~/.hermes/
|
||||
tar xzf ~/backups/hermes_data_2026-04-06_1200.tar.gz -C ~/
|
||||
sudo systemctl start hermes-gateway
|
||||
```
|
||||
|
||||
> **Tested rollback**: The rollback procedure above was validated in staging on 2026-04-06. Data integrity was confirmed by checking session count before/after: `ls ~/.hermes/sessions/ | wc -l`.
|
||||
|
||||
---
|
||||
|
||||
## 10. Database / State Migrations
|
||||
|
||||
Hermes uses two persistent stores:
|
||||
|
||||
| Store | Location | Format |
|
||||
|-------|----------|--------|
|
||||
| Session store | `~/.hermes/sessions/*.json` | JSON files |
|
||||
| Response store (API server) | `~/.hermes/response_store.db` | SQLite WAL |
|
||||
| Gateway state | `~/.hermes/gateway_state.json` | JSON |
|
||||
| Memories | `~/.hermes/memories/*.md` | Markdown files |
|
||||
| Cron jobs | `~/.hermes/cron/*.json` | JSON files |
|
||||
|
||||
### Migration steps (between versions)
|
||||
|
||||
1. **Stop** the gateway before migrating.
|
||||
2. **Backup** the data directory (see Section 9c).
|
||||
3. **Check release notes** for migration instructions (see `RELEASE_*.md`).
|
||||
4. **Run** `hermes doctor` after starting the new version — it validates state compatibility.
|
||||
5. **Verify** health via `python scripts/deploy-validate`.
|
||||
|
||||
There are currently no SQL migrations to run manually. The SQLite schema is
|
||||
created automatically on first use with `CREATE TABLE IF NOT EXISTS`.
|
||||
|
||||
---
|
||||
|
||||
## 11. Docker Compose Deployment
|
||||
|
||||
### First-time setup
|
||||
|
||||
```bash
|
||||
# 1. Copy .env.example to .env in the repo root
|
||||
cp .env.example .env
|
||||
nano .env # fill in your API keys
|
||||
|
||||
# 2. Validate config before starting
|
||||
python scripts/deploy-validate --skip-health
|
||||
|
||||
# 3. Start the stack
|
||||
docker compose -f deploy/docker-compose.yml up -d
|
||||
|
||||
# 4. Watch startup logs
|
||||
docker compose -f deploy/docker-compose.yml logs -f
|
||||
|
||||
# 5. Verify health
|
||||
curl -s http://127.0.0.1:8642/health | jq .
|
||||
```
|
||||
|
||||
### Updating to a new version
|
||||
|
||||
```bash
|
||||
# Pull latest image
|
||||
docker compose -f deploy/docker-compose.yml pull
|
||||
|
||||
# Recreate container (Docker waits for healthcheck before stopping old)
|
||||
docker compose -f deploy/docker-compose.yml up -d
|
||||
|
||||
# Watch logs
|
||||
docker compose -f deploy/docker-compose.yml logs -f --since 2m
|
||||
```
|
||||
|
||||
### Data backup (Docker)
|
||||
|
||||
```bash
|
||||
docker run --rm \
|
||||
-v hermes_data:/data \
|
||||
-v $(pwd)/backups:/backup \
|
||||
alpine tar czf /backup/hermes_data_$(date +%F).tar.gz /data
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 12. systemd Deployment
|
||||
|
||||
### Install unit files
|
||||
|
||||
```bash
|
||||
# From the repo root
|
||||
sudo cp deploy/hermes-agent.service /etc/systemd/system/
|
||||
sudo cp deploy/hermes-gateway.service /etc/systemd/system/
|
||||
|
||||
sudo systemctl daemon-reload
|
||||
|
||||
# Enable on boot + start now
|
||||
sudo systemctl enable --now hermes-gateway
|
||||
|
||||
# (Optional) also run the CLI agent as a background service
|
||||
# sudo systemctl enable --now hermes-agent
|
||||
```
|
||||
|
||||
### Adjust the unit file for your user/paths
|
||||
|
||||
Edit `/etc/systemd/system/hermes-gateway.service`:
|
||||
|
||||
```ini
|
||||
[Service]
|
||||
User=youruser # change from 'hermes'
|
||||
WorkingDirectory=/home/youruser
|
||||
EnvironmentFile=/home/youruser/.hermes/.env
|
||||
ExecStart=/home/youruser/.local/bin/hermes gateway start --replace
|
||||
```
|
||||
|
||||
Then:
|
||||
|
||||
```bash
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart hermes-gateway
|
||||
```
|
||||
|
||||
### Verify
|
||||
|
||||
```bash
|
||||
systemctl status hermes-gateway
|
||||
journalctl -u hermes-gateway -f
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 13. Monitoring & Logs
|
||||
|
||||
### Log locations
|
||||
|
||||
| Log | Location |
|
||||
|-----|----------|
|
||||
| Gateway (systemd) | `journalctl -u hermes-gateway` |
|
||||
| Gateway (Docker) | `docker compose logs hermes` |
|
||||
| Session trajectories | `~/.hermes/logs/session_*.json` |
|
||||
| Deploy events | `~/.hermes/logs/deploy.log` |
|
||||
| Runtime state | `~/.hermes/gateway_state.json` |
|
||||
|
||||
### Useful log commands
|
||||
|
||||
```bash
|
||||
# Last 100 lines, follow
|
||||
journalctl -u hermes-gateway -n 100 -f
|
||||
|
||||
# Errors only
|
||||
journalctl -u hermes-gateway -p err --since today
|
||||
|
||||
# Docker: structured logs with timestamps
|
||||
docker compose -f deploy/docker-compose.yml logs --timestamps hermes
|
||||
```
|
||||
|
||||
### Alerting
|
||||
|
||||
Add a cron job on the host to page you if the health check fails:
|
||||
|
||||
```bash
|
||||
# /etc/cron.d/hermes-healthcheck
|
||||
* * * * * root curl -sf http://127.0.0.1:8642/health > /dev/null || \
|
||||
echo "Hermes unhealthy at $(date)" | mail -s "ALERT: Hermes down" ops@example.com
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 14. Security Checklist
|
||||
|
||||
- [ ] `.env` has permissions `600` and is **not** tracked by git (`git ls-files .env` returns nothing).
|
||||
- [ ] `API_SERVER_KEY` is set if the API server is exposed beyond `127.0.0.1`.
|
||||
- [ ] API server is bound to `127.0.0.1` (not `0.0.0.0`) unless behind a TLS-terminating reverse proxy.
|
||||
- [ ] Firewall allows only the ports your platforms require (no unnecessary open ports).
|
||||
- [ ] systemd unit uses `NoNewPrivileges=true`, `PrivateTmp=true`, `ProtectSystem=strict`.
|
||||
- [ ] Docker container has resource limits set (`deploy.resources.limits`).
|
||||
- [ ] Backups of `~/.hermes/` are stored outside the server (e.g. S3, remote NAS).
|
||||
- [ ] `hermes doctor` returns no errors on the running instance.
|
||||
- [ ] `python scripts/deploy-validate` exits 0 after every configuration change.
|
||||
|
||||
---
|
||||
|
||||
## 15. Troubleshooting
|
||||
|
||||
### Gateway won't start
|
||||
|
||||
```bash
|
||||
hermes gateway start --replace # clears stale PID files
|
||||
|
||||
# Check for port conflicts
|
||||
ss -tlnp | grep 8642
|
||||
|
||||
# Verbose logs
|
||||
HERMES_LOG_LEVEL=DEBUG hermes gateway start
|
||||
```
|
||||
|
||||
### Health check returns `gateway_state: "starting"` for more than 60 s
|
||||
|
||||
Platform adapters take time to authenticate (especially Telegram + Discord). Check logs for auth errors:
|
||||
|
||||
```bash
|
||||
journalctl -u hermes-gateway --since "2 minutes ago" | grep -i "error\|token\|auth"
|
||||
```
|
||||
|
||||
### `/health` returns connection refused
|
||||
|
||||
The API server platform may not be enabled. Verify your gateway config (`~/.hermes/config.yaml`) includes:
|
||||
|
||||
```yaml
|
||||
gateway:
|
||||
platforms:
|
||||
- api_server
|
||||
```
|
||||
|
||||
### Rollback needed after failed update
|
||||
|
||||
See [Section 9](#9-rollback-procedure). If you backed up before updating, rollback takes < 5 minutes.
|
||||
|
||||
### Sessions lost after restart
|
||||
|
||||
Sessions are file-based in `~/.hermes/sessions/`. They persist across restarts. If they are gone, check:
|
||||
|
||||
```bash
|
||||
ls -la ~/.hermes/sessions/
|
||||
# Verify the volume is mounted (Docker):
|
||||
docker exec hermes-agent ls /opt/data/sessions/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
*This runbook is owned by the Bezalel epic backlog. Update it whenever deployment procedures change.*
|
||||
@@ -1,44 +1,34 @@
|
||||
# Ezra Configuration - Kimi Primary
|
||||
# Anthropic removed from chain entirely
|
||||
|
||||
# PRIMARY: Kimi for all operations
|
||||
model: kimi-coding/kimi-for-coding
|
||||
|
||||
# Fallback chain: Only local/offline options
|
||||
# NO anthropic in the chain - quota issues solved
|
||||
fallback_providers:
|
||||
- provider: ollama
|
||||
model: qwen2.5:7b
|
||||
base_url: http://localhost:11434
|
||||
timeout: 120
|
||||
reason: "Local fallback when Kimi unavailable"
|
||||
|
||||
# Provider settings
|
||||
providers:
|
||||
kimi-coding:
|
||||
timeout: 60
|
||||
max_retries: 3
|
||||
# Uses KIMI_API_KEY from .env
|
||||
|
||||
ollama:
|
||||
timeout: 120
|
||||
keep_alive: true
|
||||
base_url: http://localhost:11434
|
||||
|
||||
# REMOVED: anthropic provider entirely
|
||||
# No more quota issues, no more choking
|
||||
|
||||
# Toolsets - Ezra needs these
|
||||
model:
|
||||
default: kimi-k2.5
|
||||
provider: kimi-coding
|
||||
toolsets:
|
||||
- hermes-cli
|
||||
- github
|
||||
- web
|
||||
|
||||
# Agent settings
|
||||
- all
|
||||
fallback_providers:
|
||||
- provider: kimi-coding
|
||||
model: kimi-k2.5
|
||||
timeout: 120
|
||||
reason: Kimi coding fallback (front of chain)
|
||||
- provider: anthropic
|
||||
model: claude-sonnet-4-20250514
|
||||
timeout: 120
|
||||
reason: Direct Anthropic fallback
|
||||
- provider: openrouter
|
||||
model: anthropic/claude-sonnet-4-20250514
|
||||
base_url: https://openrouter.ai/api/v1
|
||||
api_key_env: OPENROUTER_API_KEY
|
||||
timeout: 120
|
||||
reason: OpenRouter fallback
|
||||
agent:
|
||||
max_turns: 90
|
||||
tool_use_enforcement: auto
|
||||
|
||||
# Display settings
|
||||
display:
|
||||
show_provider_switches: true
|
||||
reasoning_effort: high
|
||||
verbose: false
|
||||
providers:
|
||||
kimi-coding:
|
||||
base_url: https://api.kimi.com/coding/v1
|
||||
timeout: 60
|
||||
max_retries: 3
|
||||
anthropic:
|
||||
timeout: 120
|
||||
openrouter:
|
||||
base_url: https://openrouter.ai/api/v1
|
||||
timeout: 120
|
||||
|
||||
33
deploy/docker-compose.override.yml.example
Normal file
33
deploy/docker-compose.override.yml.example
Normal file
@@ -0,0 +1,33 @@
|
||||
# docker-compose.override.yml.example
|
||||
#
|
||||
# Copy this file to docker-compose.override.yml and uncomment sections as needed.
|
||||
# Override files are merged on top of docker-compose.yml automatically.
|
||||
# They are gitignored — safe for local customization without polluting the repo.
|
||||
|
||||
services:
|
||||
hermes:
|
||||
# --- Local build (for development) ---
|
||||
# build:
|
||||
# context: ..
|
||||
# dockerfile: ../Dockerfile
|
||||
# target: development
|
||||
|
||||
# --- Expose gateway port externally (dev only — not for production) ---
|
||||
# ports:
|
||||
# - "8642:8642"
|
||||
|
||||
# --- Attach to a custom network shared with other local services ---
|
||||
# networks:
|
||||
# - myapp_network
|
||||
|
||||
# --- Override resource limits for a smaller VPS ---
|
||||
# deploy:
|
||||
# resources:
|
||||
# limits:
|
||||
# cpus: "0.5"
|
||||
# memory: 512M
|
||||
|
||||
# --- Mount local source for live-reload (dev only) ---
|
||||
# volumes:
|
||||
# - hermes_data:/opt/data
|
||||
# - ..:/opt/hermes:ro
|
||||
85
deploy/docker-compose.yml
Normal file
85
deploy/docker-compose.yml
Normal file
@@ -0,0 +1,85 @@
|
||||
# Hermes Agent — Docker Compose Stack
|
||||
# Brings up the agent + messaging gateway as a single unit.
|
||||
#
|
||||
# Usage:
|
||||
# docker compose up -d # start in background
|
||||
# docker compose logs -f # follow logs
|
||||
# docker compose down # stop and remove containers
|
||||
# docker compose pull && docker compose up -d # rolling update
|
||||
#
|
||||
# Secrets:
|
||||
# Never commit .env to version control. Copy .env.example → .env and fill it in.
|
||||
# See DEPLOY.md for the full environment-variable reference.
|
||||
|
||||
services:
|
||||
hermes:
|
||||
image: ghcr.io/nousresearch/hermes-agent:latest
|
||||
# To build locally instead:
|
||||
# build:
|
||||
# context: ..
|
||||
# dockerfile: ../Dockerfile
|
||||
container_name: hermes-agent
|
||||
restart: unless-stopped
|
||||
|
||||
# Bind-mount the data volume so state (sessions, logs, memories, cron)
|
||||
# survives container replacement.
|
||||
volumes:
|
||||
- hermes_data:/opt/data
|
||||
|
||||
# Load secrets from the .env file next to docker-compose.yml.
|
||||
# The file is bind-mounted at runtime; it is NOT baked into the image.
|
||||
env_file:
|
||||
- ../.env
|
||||
|
||||
environment:
|
||||
# Override the data directory so it always points at the volume.
|
||||
HERMES_HOME: /opt/data
|
||||
|
||||
# Expose the OpenAI-compatible API server (if api_server platform enabled).
|
||||
# Comment out or remove if you are not using the API server.
|
||||
ports:
|
||||
- "127.0.0.1:8642:8642"
|
||||
|
||||
healthcheck:
|
||||
# Hits the API server's /health endpoint. The gateway writes its own
|
||||
# health state to /opt/data/gateway_state.json — checked by the
|
||||
# health-check script in scripts/deploy-validate.
|
||||
test: ["CMD", "python3", "-c",
|
||||
"import urllib.request; urllib.request.urlopen('http://localhost:8642/health', timeout=5)"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
start_period: 60s
|
||||
|
||||
# The container does not need internet on a private network;
|
||||
# restrict egress as needed via your host firewall.
|
||||
networks:
|
||||
- hermes_net
|
||||
|
||||
logging:
|
||||
driver: "json-file"
|
||||
options:
|
||||
max-size: "50m"
|
||||
max-file: "5"
|
||||
|
||||
# Resource limits: tune for your VPS size.
|
||||
# 2 GB RAM and 1.5 CPUs work for most conversational workloads.
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
cpus: "1.5"
|
||||
memory: 2G
|
||||
reservations:
|
||||
memory: 512M
|
||||
|
||||
volumes:
|
||||
hermes_data:
|
||||
# Named volume — Docker manages the lifecycle.
|
||||
# To inspect: docker volume inspect hermes_data
|
||||
# To back up:
|
||||
# docker run --rm -v hermes_data:/data -v $(pwd):/backup \
|
||||
# alpine tar czf /backup/hermes_data_$(date +%F).tar.gz /data
|
||||
|
||||
networks:
|
||||
hermes_net:
|
||||
driver: bridge
|
||||
59
deploy/hermes-agent.service
Normal file
59
deploy/hermes-agent.service
Normal file
@@ -0,0 +1,59 @@
|
||||
# systemd unit — Hermes Agent (interactive CLI / headless agent)
|
||||
#
|
||||
# Install:
|
||||
# sudo cp hermes-agent.service /etc/systemd/system/
|
||||
# sudo systemctl daemon-reload
|
||||
# sudo systemctl enable --now hermes-agent
|
||||
#
|
||||
# This unit runs the Hermes CLI in headless / non-interactive mode, meaning the
|
||||
# agent loop stays alive but does not present a TUI. It is appropriate for
|
||||
# dedicated VPS deployments where you want the agent always running and
|
||||
# accessible via the messaging gateway or API server.
|
||||
#
|
||||
# If you only want the messaging gateway, use hermes-gateway.service instead.
|
||||
# Running both units simultaneously is safe — they share ~/.hermes by default.
|
||||
|
||||
[Unit]
|
||||
Description=Hermes Agent
|
||||
Documentation=https://hermes-agent.nousresearch.com/docs/
|
||||
After=network-online.target
|
||||
Wants=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=hermes
|
||||
Group=hermes
|
||||
|
||||
# The working directory — adjust if Hermes is installed elsewhere.
|
||||
WorkingDirectory=/home/hermes
|
||||
|
||||
# Load secrets from the data directory (never from the source repo).
|
||||
EnvironmentFile=/home/hermes/.hermes/.env
|
||||
|
||||
# Run the gateway; add --replace if restarting over a stale PID file.
|
||||
ExecStart=/home/hermes/.local/bin/hermes gateway start
|
||||
|
||||
# Graceful stop: send SIGTERM and wait up to 30 s before SIGKILL.
|
||||
ExecStop=/bin/kill -TERM $MAINPID
|
||||
TimeoutStopSec=30
|
||||
|
||||
# Restart automatically on failure; back off exponentially.
|
||||
Restart=on-failure
|
||||
RestartSec=5s
|
||||
StartLimitBurst=5
|
||||
StartLimitIntervalSec=60s
|
||||
|
||||
# Security hardening — tighten as appropriate for your deployment.
|
||||
NoNewPrivileges=true
|
||||
PrivateTmp=true
|
||||
ProtectSystem=strict
|
||||
ProtectHome=read-only
|
||||
ReadWritePaths=/home/hermes/.hermes /home/hermes/.local/share/hermes
|
||||
|
||||
# Logging — output goes to journald; read with: journalctl -u hermes-agent -f
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
SyslogIdentifier=hermes-agent
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
59
deploy/hermes-gateway.service
Normal file
59
deploy/hermes-gateway.service
Normal file
@@ -0,0 +1,59 @@
|
||||
# systemd unit — Hermes Gateway (messaging platform adapter)
|
||||
#
|
||||
# Install:
|
||||
# sudo cp hermes-gateway.service /etc/systemd/system/
|
||||
# sudo systemctl daemon-reload
|
||||
# sudo systemctl enable --now hermes-gateway
|
||||
#
|
||||
# The gateway connects Hermes to Telegram, Discord, Slack, WhatsApp, Signal,
|
||||
# and other platforms. It is a long-running asyncio process that bridges
|
||||
# inbound messages to the agent and routes responses back.
|
||||
#
|
||||
# See DEPLOY.md for environment variable configuration.
|
||||
|
||||
[Unit]
|
||||
Description=Hermes Gateway (messaging platform bridge)
|
||||
Documentation=https://hermes-agent.nousresearch.com/docs/user-guide/messaging
|
||||
After=network-online.target
|
||||
Wants=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=hermes
|
||||
Group=hermes
|
||||
|
||||
WorkingDirectory=/home/hermes
|
||||
|
||||
# Load environment (API keys, platform tokens, etc.) from the data directory.
|
||||
EnvironmentFile=/home/hermes/.hermes/.env
|
||||
|
||||
# --replace clears stale PID/lock files from an unclean previous shutdown.
|
||||
ExecStart=/home/hermes/.local/bin/hermes gateway start --replace
|
||||
|
||||
# Pre-start hook: write a timestamped marker so rollback can diff against it.
|
||||
ExecStartPre=/bin/sh -c 'echo "$(date -u +%%Y-%%m-%%dT%%H:%%M:%%SZ) gateway starting" >> /home/hermes/.hermes/logs/deploy.log'
|
||||
|
||||
# Post-stop hook: log shutdown time for audit trail.
|
||||
ExecStopPost=/bin/sh -c 'echo "$(date -u +%%Y-%%m-%%dT%%H:%%M:%%SZ) gateway stopped" >> /home/hermes/.hermes/logs/deploy.log'
|
||||
|
||||
ExecStop=/bin/kill -TERM $MAINPID
|
||||
TimeoutStopSec=30
|
||||
|
||||
Restart=on-failure
|
||||
RestartSec=5s
|
||||
StartLimitBurst=5
|
||||
StartLimitIntervalSec=60s
|
||||
|
||||
# Security hardening.
|
||||
NoNewPrivileges=true
|
||||
PrivateTmp=true
|
||||
ProtectSystem=strict
|
||||
ProtectHome=read-only
|
||||
ReadWritePaths=/home/hermes/.hermes /home/hermes/.local/share/hermes
|
||||
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
SyslogIdentifier=hermes-gateway
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
56
devkit/README.md
Normal file
56
devkit/README.md
Normal file
@@ -0,0 +1,56 @@
|
||||
# Bezalel's Devkit — Shared Tools for the Wizard Fleet
|
||||
|
||||
This directory contains reusable CLI tools and Python modules for CI, testing, deployment, observability, and Gitea automation. Any wizard can invoke them via `python -m devkit.<tool>`.
|
||||
|
||||
## Tools
|
||||
|
||||
### `gitea_client` — Gitea API Client
|
||||
List issues/PRs, post comments, create PRs, update issues.
|
||||
|
||||
```bash
|
||||
python -m devkit.gitea_client issues --state open --limit 20
|
||||
python -m devkit.gitea_client create-comment --number 142 --body "Update from Bezalel"
|
||||
python -m devkit.gitea_client prs --state open
|
||||
```
|
||||
|
||||
### `health` — Fleet Health Monitor
|
||||
Checks system load, disk, memory, running processes, and key package versions.
|
||||
|
||||
```bash
|
||||
python -m devkit.health --threshold-load 1.0 --threshold-disk 90.0 --fail-on-critical
|
||||
```
|
||||
|
||||
### `notebook_runner` — Notebook Execution Wrapper
|
||||
Parameterizes and executes Jupyter notebooks via Papermill with structured JSON reporting.
|
||||
|
||||
```bash
|
||||
python -m devkit.notebook_runner task.ipynb output.ipynb -p threshold=1.0 -p hostname=forge
|
||||
```
|
||||
|
||||
### `smoke_test` — Fast Smoke Test Runner
|
||||
Runs core import checks, CLI entrypoint tests, and one bare green-path E2E.
|
||||
|
||||
```bash
|
||||
python -m devkit.smoke_test --verbose
|
||||
```
|
||||
|
||||
### `secret_scan` — Secret Leak Scanner
|
||||
Scans the repo for API keys, tokens, and private keys.
|
||||
|
||||
```bash
|
||||
python -m devkit.secret_scan --path . --fail-on-find
|
||||
```
|
||||
|
||||
### `wizard_env` — Environment Validator
|
||||
Checks that a wizard environment has all required binaries, env vars, Python packages, and Hermes config.
|
||||
|
||||
```bash
|
||||
python -m devkit.wizard_env --json --fail-on-incomplete
|
||||
```
|
||||
|
||||
## Philosophy
|
||||
|
||||
- **CLI-first** — Every tool is runnable as `python -m devkit.<tool>`
|
||||
- **JSON output** — Easy to parse from other agents and CI pipelines
|
||||
- **Zero dependencies beyond stdlib** where possible; optional heavy deps are runtime-checked
|
||||
- **Fail-fast** — Exit codes are meaningful for CI gating
|
||||
9
devkit/__init__.py
Normal file
9
devkit/__init__.py
Normal file
@@ -0,0 +1,9 @@
|
||||
"""
|
||||
Bezalel's Devkit — Shared development tools for the wizard fleet.
|
||||
|
||||
A collection of CLI-accessible utilities for CI, testing, deployment,
|
||||
observability, and Gitea automation. Designed to be used by any agent
|
||||
via subprocess or direct Python import.
|
||||
"""
|
||||
|
||||
__version__ = "0.1.0"
|
||||
153
devkit/gitea_client.py
Normal file
153
devkit/gitea_client.py
Normal file
@@ -0,0 +1,153 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Shared Gitea API client for wizard fleet automation.
|
||||
|
||||
Usage as CLI:
|
||||
python -m devkit.gitea_client issues --repo Timmy_Foundation/hermes-agent --state open
|
||||
python -m devkit.gitea_client issue --repo Timmy_Foundation/hermes-agent --number 142
|
||||
python -m devkit.gitea_client create-comment --repo Timmy_Foundation/hermes-agent --number 142 --body "Update from Bezalel"
|
||||
python -m devkit.gitea_client prs --repo Timmy_Foundation/hermes-agent --state open
|
||||
|
||||
Usage as module:
|
||||
from devkit.gitea_client import GiteaClient
|
||||
client = GiteaClient()
|
||||
issues = client.list_issues("Timmy_Foundation/hermes-agent", state="open")
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import urllib.request
|
||||
|
||||
|
||||
DEFAULT_BASE_URL = os.getenv("GITEA_URL", "https://forge.alexanderwhitestone.com")
|
||||
DEFAULT_TOKEN = os.getenv("GITEA_TOKEN", "")
|
||||
|
||||
|
||||
class GiteaClient:
|
||||
def __init__(self, base_url: str = DEFAULT_BASE_URL, token: str = DEFAULT_TOKEN):
|
||||
self.base_url = base_url.rstrip("/")
|
||||
self.token = token or ""
|
||||
|
||||
def _request(
|
||||
self,
|
||||
method: str,
|
||||
path: str,
|
||||
data: Optional[Dict[str, Any]] = None,
|
||||
headers: Optional[Dict[str, str]] = None,
|
||||
) -> Any:
|
||||
url = f"{self.base_url}/api/v1{path}"
|
||||
req_headers = {"Content-Type": "application/json", "Accept": "application/json"}
|
||||
if self.token:
|
||||
req_headers["Authorization"] = f"token {self.token}"
|
||||
if headers:
|
||||
req_headers.update(headers)
|
||||
|
||||
body = json.dumps(data).encode() if data else None
|
||||
req = urllib.request.Request(url, data=body, headers=req_headers, method=method)
|
||||
|
||||
try:
|
||||
with urllib.request.urlopen(req) as resp:
|
||||
return json.loads(resp.read().decode())
|
||||
except urllib.error.HTTPError as e:
|
||||
return {"error": True, "status": e.code, "body": e.read().decode()}
|
||||
|
||||
def list_issues(self, repo: str, state: str = "open", limit: int = 50) -> List[Dict]:
|
||||
return self._request("GET", f"/repos/{repo}/issues?state={state}&limit={limit}") or []
|
||||
|
||||
def get_issue(self, repo: str, number: int) -> Dict:
|
||||
return self._request("GET", f"/repos/{repo}/issues/{number}") or {}
|
||||
|
||||
def create_comment(self, repo: str, number: int, body: str) -> Dict:
|
||||
return self._request(
|
||||
"POST", f"/repos/{repo}/issues/{number}/comments", {"body": body}
|
||||
)
|
||||
|
||||
def update_issue(self, repo: str, number: int, **fields) -> Dict:
|
||||
return self._request("PATCH", f"/repos/{repo}/issues/{number}", fields)
|
||||
|
||||
def list_prs(self, repo: str, state: str = "open", limit: int = 50) -> List[Dict]:
|
||||
return self._request("GET", f"/repos/{repo}/pulls?state={state}&limit={limit}") or []
|
||||
|
||||
def get_pr(self, repo: str, number: int) -> Dict:
|
||||
return self._request("GET", f"/repos/{repo}/pulls/{number}") or {}
|
||||
|
||||
def create_pr(self, repo: str, title: str, head: str, base: str, body: str = "") -> Dict:
|
||||
return self._request(
|
||||
"POST",
|
||||
f"/repos/{repo}/pulls",
|
||||
{"title": title, "head": head, "base": base, "body": body},
|
||||
)
|
||||
|
||||
|
||||
def _fmt_json(obj: Any) -> str:
|
||||
return json.dumps(obj, indent=2, ensure_ascii=False)
|
||||
|
||||
|
||||
def main(argv: List[str] = None) -> int:
|
||||
argv = argv or sys.argv[1:]
|
||||
parser = argparse.ArgumentParser(description="Gitea CLI for wizard fleet")
|
||||
parser.add_argument("--repo", default="Timmy_Foundation/hermes-agent", help="Repository full name")
|
||||
parser.add_argument("--token", default=DEFAULT_TOKEN, help="Gitea API token")
|
||||
parser.add_argument("--base-url", default=DEFAULT_BASE_URL, help="Gitea base URL")
|
||||
sub = parser.add_subparsers(dest="cmd")
|
||||
|
||||
p_issues = sub.add_parser("issues", help="List issues")
|
||||
p_issues.add_argument("--state", default="open")
|
||||
p_issues.add_argument("--limit", type=int, default=50)
|
||||
|
||||
p_issue = sub.add_parser("issue", help="Get single issue")
|
||||
p_issue.add_argument("--number", type=int, required=True)
|
||||
|
||||
p_prs = sub.add_parser("prs", help="List PRs")
|
||||
p_prs.add_argument("--state", default="open")
|
||||
p_prs.add_argument("--limit", type=int, default=50)
|
||||
|
||||
p_pr = sub.add_parser("pr", help="Get single PR")
|
||||
p_pr.add_argument("--number", type=int, required=True)
|
||||
|
||||
p_comment = sub.add_parser("create-comment", help="Post comment on issue/PR")
|
||||
p_comment.add_argument("--number", type=int, required=True)
|
||||
p_comment.add_argument("--body", required=True)
|
||||
|
||||
p_update = sub.add_parser("update-issue", help="Update issue fields")
|
||||
p_update.add_argument("--number", type=int, required=True)
|
||||
p_update.add_argument("--title", default=None)
|
||||
p_update.add_argument("--body", default=None)
|
||||
p_update.add_argument("--state", default=None)
|
||||
|
||||
p_create_pr = sub.add_parser("create-pr", help="Create a PR")
|
||||
p_create_pr.add_argument("--title", required=True)
|
||||
p_create_pr.add_argument("--head", required=True)
|
||||
p_create_pr.add_argument("--base", default="main")
|
||||
p_create_pr.add_argument("--body", default="")
|
||||
|
||||
args = parser.parse_args(argv)
|
||||
client = GiteaClient(base_url=args.base_url, token=args.token)
|
||||
|
||||
if args.cmd == "issues":
|
||||
print(_fmt_json(client.list_issues(args.repo, args.state, args.limit)))
|
||||
elif args.cmd == "issue":
|
||||
print(_fmt_json(client.get_issue(args.repo, args.number)))
|
||||
elif args.cmd == "prs":
|
||||
print(_fmt_json(client.list_prs(args.repo, args.state, args.limit)))
|
||||
elif args.cmd == "pr":
|
||||
print(_fmt_json(client.get_pr(args.repo, args.number)))
|
||||
elif args.cmd == "create-comment":
|
||||
print(_fmt_json(client.create_comment(args.repo, args.number, args.body)))
|
||||
elif args.cmd == "update-issue":
|
||||
fields = {k: v for k, v in {"title": args.title, "body": args.body, "state": args.state}.items() if v is not None}
|
||||
print(_fmt_json(client.update_issue(args.repo, args.number, **fields)))
|
||||
elif args.cmd == "create-pr":
|
||||
print(_fmt_json(client.create_pr(args.repo, args.title, args.head, args.base, args.body)))
|
||||
else:
|
||||
parser.print_help()
|
||||
return 1
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
134
devkit/health.py
Normal file
134
devkit/health.py
Normal file
@@ -0,0 +1,134 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Fleet health monitor for wizard agents.
|
||||
Checks local system state and reports structured health metrics.
|
||||
|
||||
Usage as CLI:
|
||||
python -m devkit.health
|
||||
python -m devkit.health --threshold-load 1.0 --check-disk
|
||||
|
||||
Usage as module:
|
||||
from devkit.health import check_health
|
||||
report = check_health()
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
from typing import Any, Dict, List
|
||||
|
||||
|
||||
def _run(cmd: List[str]) -> str:
|
||||
try:
|
||||
return subprocess.check_output(cmd, stderr=subprocess.DEVNULL).decode().strip()
|
||||
except Exception as e:
|
||||
return f"error: {e}"
|
||||
|
||||
|
||||
def check_health(threshold_load: float = 1.0, threshold_disk_percent: float = 90.0) -> Dict[str, Any]:
|
||||
gather_time = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
|
||||
|
||||
# Load average
|
||||
load_raw = _run(["cat", "/proc/loadavg"])
|
||||
load_values = []
|
||||
avg_load = None
|
||||
if load_raw.startswith("error:"):
|
||||
load_status = load_raw
|
||||
else:
|
||||
try:
|
||||
load_values = [float(x) for x in load_raw.split()[:3]]
|
||||
avg_load = sum(load_values) / len(load_values)
|
||||
load_status = "critical" if avg_load > threshold_load else "ok"
|
||||
except Exception as e:
|
||||
load_status = f"error parsing load: {e}"
|
||||
|
||||
# Disk usage
|
||||
disk = shutil.disk_usage("/")
|
||||
disk_percent = (disk.used / disk.total) * 100 if disk.total else 0.0
|
||||
disk_status = "critical" if disk_percent > threshold_disk_percent else "ok"
|
||||
|
||||
# Memory
|
||||
meminfo = _run(["cat", "/proc/meminfo"])
|
||||
mem_stats = {}
|
||||
for line in meminfo.splitlines():
|
||||
if ":" in line:
|
||||
key, val = line.split(":", 1)
|
||||
mem_stats[key.strip()] = val.strip()
|
||||
|
||||
# Running processes
|
||||
hermes_pids = []
|
||||
try:
|
||||
ps_out = subprocess.check_output(["pgrep", "-a", "-f", "hermes"]).decode().strip()
|
||||
hermes_pids = [line.split(None, 1) for line in ps_out.splitlines() if line.strip()]
|
||||
except subprocess.CalledProcessError:
|
||||
hermes_pids = []
|
||||
|
||||
# Python package versions (key ones)
|
||||
key_packages = ["jupyterlab", "papermill", "requests"]
|
||||
pkg_versions = {}
|
||||
for pkg in key_packages:
|
||||
try:
|
||||
out = subprocess.check_output([sys.executable, "-m", "pip", "show", pkg], stderr=subprocess.DEVNULL).decode()
|
||||
for line in out.splitlines():
|
||||
if line.startswith("Version:"):
|
||||
pkg_versions[pkg] = line.split(":", 1)[1].strip()
|
||||
break
|
||||
except Exception:
|
||||
pkg_versions[pkg] = None
|
||||
|
||||
overall = "ok"
|
||||
if load_status == "critical" or disk_status == "critical":
|
||||
overall = "critical"
|
||||
elif not hermes_pids:
|
||||
overall = "warning"
|
||||
|
||||
return {
|
||||
"timestamp": gather_time,
|
||||
"overall": overall,
|
||||
"load": {
|
||||
"raw": load_raw if not load_raw.startswith("error:") else None,
|
||||
"1min": load_values[0] if len(load_values) > 0 else None,
|
||||
"5min": load_values[1] if len(load_values) > 1 else None,
|
||||
"15min": load_values[2] if len(load_values) > 2 else None,
|
||||
"avg": round(avg_load, 3) if avg_load is not None else None,
|
||||
"threshold": threshold_load,
|
||||
"status": load_status,
|
||||
},
|
||||
"disk": {
|
||||
"total_gb": round(disk.total / (1024 ** 3), 2),
|
||||
"used_gb": round(disk.used / (1024 ** 3), 2),
|
||||
"free_gb": round(disk.free / (1024 ** 3), 2),
|
||||
"used_percent": round(disk_percent, 2),
|
||||
"threshold_percent": threshold_disk_percent,
|
||||
"status": disk_status,
|
||||
},
|
||||
"memory": mem_stats,
|
||||
"processes": {
|
||||
"hermes_count": len(hermes_pids),
|
||||
"hermes_pids": hermes_pids[:10],
|
||||
},
|
||||
"packages": pkg_versions,
|
||||
}
|
||||
|
||||
|
||||
def main(argv: List[str] = None) -> int:
|
||||
argv = argv or sys.argv[1:]
|
||||
parser = argparse.ArgumentParser(description="Fleet health monitor")
|
||||
parser.add_argument("--threshold-load", type=float, default=1.0)
|
||||
parser.add_argument("--threshold-disk", type=float, default=90.0)
|
||||
parser.add_argument("--fail-on-critical", action="store_true", help="Exit non-zero if overall is critical")
|
||||
args = parser.parse_args(argv)
|
||||
|
||||
report = check_health(args.threshold_load, args.threshold_disk)
|
||||
print(json.dumps(report, indent=2))
|
||||
if args.fail_on_critical and report.get("overall") == "critical":
|
||||
return 1
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
136
devkit/notebook_runner.py
Normal file
136
devkit/notebook_runner.py
Normal file
@@ -0,0 +1,136 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Notebook execution runner for agent tasks.
|
||||
Wraps papermill with sensible defaults and structured JSON reporting.
|
||||
|
||||
Usage as CLI:
|
||||
python -m devkit.notebook_runner notebooks/task.ipynb output.ipynb -p threshold 1.0
|
||||
python -m devkit.notebook_runner notebooks/task.ipynb --dry-run
|
||||
|
||||
Usage as module:
|
||||
from devkit.notebook_runner import run_notebook
|
||||
result = run_notebook("task.ipynb", "output.ipynb", parameters={"threshold": 1.0})
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
|
||||
def run_notebook(
|
||||
input_path: str,
|
||||
output_path: Optional[str] = None,
|
||||
parameters: Optional[Dict[str, Any]] = None,
|
||||
kernel: str = "python3",
|
||||
timeout: Optional[int] = None,
|
||||
dry_run: bool = False,
|
||||
) -> Dict[str, Any]:
|
||||
input_path = str(Path(input_path).expanduser().resolve())
|
||||
if output_path is None:
|
||||
fd, output_path = tempfile.mkstemp(suffix=".ipynb")
|
||||
os.close(fd)
|
||||
else:
|
||||
output_path = str(Path(output_path).expanduser().resolve())
|
||||
|
||||
if dry_run:
|
||||
return {
|
||||
"status": "dry_run",
|
||||
"input": input_path,
|
||||
"output": output_path,
|
||||
"parameters": parameters or {},
|
||||
"kernel": kernel,
|
||||
}
|
||||
|
||||
cmd = ["papermill", input_path, output_path, "--kernel", kernel]
|
||||
if timeout is not None:
|
||||
cmd.extend(["--execution-timeout", str(timeout)])
|
||||
for key, value in (parameters or {}).items():
|
||||
cmd.extend(["-p", key, str(value)])
|
||||
|
||||
start = os.times()
|
||||
try:
|
||||
proc = subprocess.run(
|
||||
cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=True,
|
||||
)
|
||||
end = os.times()
|
||||
return {
|
||||
"status": "ok",
|
||||
"input": input_path,
|
||||
"output": output_path,
|
||||
"parameters": parameters or {},
|
||||
"kernel": kernel,
|
||||
"elapsed_seconds": round((end.elapsed - start.elapsed), 2),
|
||||
"stdout": proc.stdout[-2000:] if proc.stdout else "",
|
||||
}
|
||||
except subprocess.CalledProcessError as e:
|
||||
end = os.times()
|
||||
return {
|
||||
"status": "error",
|
||||
"input": input_path,
|
||||
"output": output_path,
|
||||
"parameters": parameters or {},
|
||||
"kernel": kernel,
|
||||
"elapsed_seconds": round((end.elapsed - start.elapsed), 2),
|
||||
"stdout": e.stdout[-2000:] if e.stdout else "",
|
||||
"stderr": e.stderr[-2000:] if e.stderr else "",
|
||||
"returncode": e.returncode,
|
||||
}
|
||||
except FileNotFoundError:
|
||||
return {
|
||||
"status": "error",
|
||||
"message": "papermill not found. Install with: uv tool install papermill",
|
||||
}
|
||||
|
||||
|
||||
def main(argv: List[str] = None) -> int:
|
||||
argv = argv or sys.argv[1:]
|
||||
parser = argparse.ArgumentParser(description="Notebook runner for agents")
|
||||
parser.add_argument("input", help="Input notebook path")
|
||||
parser.add_argument("output", nargs="?", default=None, help="Output notebook path")
|
||||
parser.add_argument("-p", "--parameter", action="append", default=[], help="Parameters as key=value")
|
||||
parser.add_argument("--kernel", default="python3")
|
||||
parser.add_argument("--timeout", type=int, default=None)
|
||||
parser.add_argument("--dry-run", action="store_true")
|
||||
args = parser.parse_args(argv)
|
||||
|
||||
parameters = {}
|
||||
for raw in args.parameter:
|
||||
if "=" not in raw:
|
||||
print(f"Invalid parameter (expected key=value): {raw}", file=sys.stderr)
|
||||
return 1
|
||||
k, v = raw.split("=", 1)
|
||||
# Best-effort type inference
|
||||
if v.lower() in ("true", "false"):
|
||||
v = v.lower() == "true"
|
||||
else:
|
||||
try:
|
||||
v = int(v)
|
||||
except ValueError:
|
||||
try:
|
||||
v = float(v)
|
||||
except ValueError:
|
||||
pass
|
||||
parameters[k] = v
|
||||
|
||||
result = run_notebook(
|
||||
args.input,
|
||||
args.output,
|
||||
parameters=parameters,
|
||||
kernel=args.kernel,
|
||||
timeout=args.timeout,
|
||||
dry_run=args.dry_run,
|
||||
)
|
||||
print(json.dumps(result, indent=2))
|
||||
return 0 if result.get("status") == "ok" else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
108
devkit/secret_scan.py
Normal file
108
devkit/secret_scan.py
Normal file
@@ -0,0 +1,108 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Fast secret leak scanner for the repository.
|
||||
Checks for common patterns that should never be committed.
|
||||
|
||||
Usage as CLI:
|
||||
python -m devkit.secret_scan
|
||||
python -m devkit.secret_scan --path /some/repo --fail-on-find
|
||||
|
||||
Usage as module:
|
||||
from devkit.secret_scan import scan
|
||||
findings = scan("/path/to/repo")
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List
|
||||
|
||||
# Patterns to flag
|
||||
PATTERNS = {
|
||||
"aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
|
||||
"aws_secret_key": re.compile(r"['\"\s][0-9a-zA-Z/+]{40}['\"\s]"),
|
||||
"generic_api_key": re.compile(r"api[_-]?key\s*[:=]\s*['\"][a-zA-Z0-9_\-]{20,}['\"]", re.IGNORECASE),
|
||||
"private_key": re.compile(r"-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
|
||||
"github_token": re.compile(r"gh[pousr]_[A-Za-z0-9_]{36,}"),
|
||||
"gitea_token": re.compile(r"[0-9a-f]{40}"), # heuristic for long hex strings after "token"
|
||||
"telegram_bot_token": re.compile(r"[0-9]{9,}:[A-Za-z0-9_-]{35,}"),
|
||||
}
|
||||
|
||||
# Files and paths to skip
|
||||
SKIP_PATHS = [
|
||||
".git",
|
||||
"__pycache__",
|
||||
".pytest_cache",
|
||||
"node_modules",
|
||||
"venv",
|
||||
".env",
|
||||
".agent-skills",
|
||||
]
|
||||
|
||||
# Max file size to scan (bytes)
|
||||
MAX_FILE_SIZE = 1024 * 1024
|
||||
|
||||
|
||||
def _should_skip(path: Path) -> bool:
|
||||
for skip in SKIP_PATHS:
|
||||
if skip in path.parts:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def scan(root: str = ".") -> List[Dict[str, Any]]:
|
||||
root_path = Path(root).resolve()
|
||||
findings = []
|
||||
for file_path in root_path.rglob("*"):
|
||||
if not file_path.is_file():
|
||||
continue
|
||||
if _should_skip(file_path):
|
||||
continue
|
||||
if file_path.stat().st_size > MAX_FILE_SIZE:
|
||||
continue
|
||||
try:
|
||||
text = file_path.read_text(encoding="utf-8", errors="ignore")
|
||||
except Exception:
|
||||
continue
|
||||
for pattern_name, pattern in PATTERNS.items():
|
||||
for match in pattern.finditer(text):
|
||||
# Simple context: line around match
|
||||
start = max(0, match.start() - 40)
|
||||
end = min(len(text), match.end() + 40)
|
||||
context = text[start:end].replace("\n", " ")
|
||||
findings.append({
|
||||
"file": str(file_path.relative_to(root_path)),
|
||||
"pattern": pattern_name,
|
||||
"line": text[:match.start()].count("\n") + 1,
|
||||
"context": context,
|
||||
})
|
||||
return findings
|
||||
|
||||
|
||||
def main(argv: List[str] = None) -> int:
|
||||
argv = argv or sys.argv[1:]
|
||||
parser = argparse.ArgumentParser(description="Secret leak scanner")
|
||||
parser.add_argument("--path", default=".", help="Repository root to scan")
|
||||
parser.add_argument("--fail-on-find", action="store_true", help="Exit non-zero if secrets found")
|
||||
parser.add_argument("--json", action="store_true", help="Output as JSON")
|
||||
args = parser.parse_args(argv)
|
||||
|
||||
findings = scan(args.path)
|
||||
if args.json:
|
||||
print(json.dumps({"findings": findings, "count": len(findings)}, indent=2))
|
||||
else:
|
||||
print(f"Scanned {args.path}")
|
||||
print(f"Findings: {len(findings)}")
|
||||
for f in findings:
|
||||
print(f" [{f['pattern']}] {f['file']}:{f['line']} -> ...{f['context']}...")
|
||||
|
||||
if args.fail_on_find and findings:
|
||||
return 1
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
108
devkit/smoke_test.py
Normal file
108
devkit/smoke_test.py
Normal file
@@ -0,0 +1,108 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Shared smoke test runner for hermes-agent.
|
||||
Fast checks that catch obvious breakage without maintenance burden.
|
||||
|
||||
Usage as CLI:
|
||||
python -m devkit.smoke_test
|
||||
python -m devkit.smoke_test --verbose
|
||||
|
||||
Usage as module:
|
||||
from devkit.smoke_test import run_smoke_tests
|
||||
results = run_smoke_tests()
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import importlib
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List
|
||||
|
||||
|
||||
HERMES_ROOT = Path(__file__).resolve().parent.parent
|
||||
|
||||
|
||||
def _test_imports() -> Dict[str, Any]:
|
||||
modules = [
|
||||
"hermes_constants",
|
||||
"hermes_state",
|
||||
"cli",
|
||||
"tools.skills_sync",
|
||||
"tools.skills_hub",
|
||||
]
|
||||
errors = []
|
||||
for mod in modules:
|
||||
try:
|
||||
importlib.import_module(mod)
|
||||
except Exception as e:
|
||||
errors.append({"module": mod, "error": str(e)})
|
||||
return {
|
||||
"name": "core_imports",
|
||||
"status": "ok" if not errors else "fail",
|
||||
"errors": errors,
|
||||
}
|
||||
|
||||
|
||||
def _test_cli_entrypoints() -> Dict[str, Any]:
|
||||
entrypoints = [
|
||||
[sys.executable, "-m", "cli", "--help"],
|
||||
]
|
||||
errors = []
|
||||
for cmd in entrypoints:
|
||||
try:
|
||||
subprocess.run(cmd, capture_output=True, text=True, check=True, cwd=HERMES_ROOT)
|
||||
except subprocess.CalledProcessError as e:
|
||||
errors.append({"cmd": cmd, "error": f"exit {e.returncode}"})
|
||||
except Exception as e:
|
||||
errors.append({"cmd": cmd, "error": str(e)})
|
||||
return {
|
||||
"name": "cli_entrypoints",
|
||||
"status": "ok" if not errors else "fail",
|
||||
"errors": errors,
|
||||
}
|
||||
|
||||
|
||||
def _test_green_path_e2e() -> Dict[str, Any]:
|
||||
"""One bare green-path E2E: terminal_tool echo hello."""
|
||||
try:
|
||||
from tools.terminal_tool import terminal
|
||||
result = terminal(command="echo hello")
|
||||
output = result.get("output", "")
|
||||
if "hello" in output.lower():
|
||||
return {"name": "green_path_e2e", "status": "ok", "output": output.strip()}
|
||||
return {"name": "green_path_e2e", "status": "fail", "error": f"Unexpected output: {output}"}
|
||||
except Exception as e:
|
||||
return {"name": "green_path_e2e", "status": "fail", "error": str(e)}
|
||||
|
||||
|
||||
def run_smoke_tests(verbose: bool = False) -> Dict[str, Any]:
|
||||
tests = [
|
||||
_test_imports(),
|
||||
_test_cli_entrypoints(),
|
||||
_test_green_path_e2e(),
|
||||
]
|
||||
failed = [t for t in tests if t["status"] != "ok"]
|
||||
result = {
|
||||
"overall": "ok" if not failed else "fail",
|
||||
"tests": tests,
|
||||
"failed_count": len(failed),
|
||||
}
|
||||
if verbose:
|
||||
print(json.dumps(result, indent=2))
|
||||
return result
|
||||
|
||||
|
||||
def main(argv: List[str] = None) -> int:
|
||||
argv = argv or sys.argv[1:]
|
||||
parser = argparse.ArgumentParser(description="Smoke test runner")
|
||||
parser.add_argument("--verbose", action="store_true")
|
||||
args = parser.parse_args(argv)
|
||||
|
||||
result = run_smoke_tests(verbose=True)
|
||||
return 0 if result["overall"] == "ok" else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
112
devkit/wizard_env.py
Normal file
112
devkit/wizard_env.py
Normal file
@@ -0,0 +1,112 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Wizard environment validator.
|
||||
Checks that a new wizard environment is ready for duty.
|
||||
|
||||
Usage as CLI:
|
||||
python -m devkit.wizard_env
|
||||
python -m devkit.wizard_env --fix
|
||||
|
||||
Usage as module:
|
||||
from devkit.wizard_env import validate
|
||||
report = validate()
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
from typing import Any, Dict, List
|
||||
|
||||
|
||||
def _has_cmd(name: str) -> bool:
|
||||
return shutil.which(name) is not None
|
||||
|
||||
|
||||
def _check_env_var(name: str) -> Dict[str, Any]:
|
||||
value = os.getenv(name)
|
||||
return {
|
||||
"name": name,
|
||||
"status": "ok" if value else "missing",
|
||||
"value": value[:10] + "..." if value and len(value) > 20 else value,
|
||||
}
|
||||
|
||||
|
||||
def _check_python_pkg(name: str) -> Dict[str, Any]:
|
||||
try:
|
||||
__import__(name)
|
||||
return {"name": name, "status": "ok"}
|
||||
except ImportError:
|
||||
return {"name": name, "status": "missing"}
|
||||
|
||||
|
||||
def validate() -> Dict[str, Any]:
|
||||
checks = {
|
||||
"binaries": [
|
||||
{"name": "python3", "status": "ok" if _has_cmd("python3") else "missing"},
|
||||
{"name": "git", "status": "ok" if _has_cmd("git") else "missing"},
|
||||
{"name": "curl", "status": "ok" if _has_cmd("curl") else "missing"},
|
||||
{"name": "jupyter-lab", "status": "ok" if _has_cmd("jupyter-lab") else "missing"},
|
||||
{"name": "papermill", "status": "ok" if _has_cmd("papermill") else "missing"},
|
||||
{"name": "jupytext", "status": "ok" if _has_cmd("jupytext") else "missing"},
|
||||
],
|
||||
"env_vars": [
|
||||
_check_env_var("GITEA_URL"),
|
||||
_check_env_var("GITEA_TOKEN"),
|
||||
_check_env_var("TELEGRAM_BOT_TOKEN"),
|
||||
],
|
||||
"python_packages": [
|
||||
_check_python_pkg("requests"),
|
||||
_check_python_pkg("jupyter_server"),
|
||||
_check_python_pkg("nbformat"),
|
||||
],
|
||||
}
|
||||
|
||||
all_ok = all(
|
||||
c["status"] == "ok"
|
||||
for group in checks.values()
|
||||
for c in group
|
||||
)
|
||||
|
||||
# Hermes-specific checks
|
||||
hermes_home = os.path.expanduser("~/.hermes")
|
||||
checks["hermes"] = [
|
||||
{"name": "config.yaml", "status": "ok" if os.path.exists(f"{hermes_home}/config.yaml") else "missing"},
|
||||
{"name": "skills_dir", "status": "ok" if os.path.exists(f"{hermes_home}/skills") else "missing"},
|
||||
]
|
||||
|
||||
all_ok = all_ok and all(c["status"] == "ok" for c in checks["hermes"])
|
||||
|
||||
return {
|
||||
"overall": "ok" if all_ok else "incomplete",
|
||||
"checks": checks,
|
||||
}
|
||||
|
||||
|
||||
def main(argv: List[str] = None) -> int:
|
||||
argv = argv or sys.argv[1:]
|
||||
parser = argparse.ArgumentParser(description="Wizard environment validator")
|
||||
parser.add_argument("--json", action="store_true")
|
||||
parser.add_argument("--fail-on-incomplete", action="store_true")
|
||||
args = parser.parse_args(argv)
|
||||
|
||||
report = validate()
|
||||
if args.json:
|
||||
print(json.dumps(report, indent=2))
|
||||
else:
|
||||
print(f"Wizard Environment: {report['overall']}")
|
||||
for group, items in report["checks"].items():
|
||||
print(f"\n[{group}]")
|
||||
for item in items:
|
||||
status_icon = "✅" if item["status"] == "ok" else "❌"
|
||||
print(f" {status_icon} {item['name']}: {item['status']}")
|
||||
|
||||
if args.fail_on_incomplete and report["overall"] != "ok":
|
||||
return 1
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
57
docs/NOTEBOOK_WORKFLOW.md
Normal file
57
docs/NOTEBOOK_WORKFLOW.md
Normal file
@@ -0,0 +1,57 @@
|
||||
# Notebook Workflow for Agent Tasks
|
||||
|
||||
This directory demonstrates a sovereign, version-controlled workflow for LLM agent tasks using Jupyter notebooks.
|
||||
|
||||
## Philosophy
|
||||
|
||||
- **`.py` files are the source of truth`** — authored and reviewed as plain Python with `# %%` cell markers (via Jupytext)
|
||||
- **`.ipynb` files are generated artifacts** — auto-created from `.py` for execution and rich viewing
|
||||
- **Papermill parameterizes and executes** — each run produces an output notebook with code, narrative, and results preserved
|
||||
- **Output notebooks are audit artifacts** — every execution leaves a permanent, replayable record
|
||||
|
||||
## File Layout
|
||||
|
||||
```
|
||||
notebooks/
|
||||
agent_task_system_health.py # Source of truth (Jupytext)
|
||||
agent_task_system_health.ipynb # Generated from .py
|
||||
docs/
|
||||
NOTEBOOK_WORKFLOW.md # This document
|
||||
.gitea/workflows/
|
||||
notebook-ci.yml # CI gate: executes notebooks on PR/push
|
||||
```
|
||||
|
||||
## How Agents Work With Notebooks
|
||||
|
||||
1. **Create** — Agent generates a `.py` notebook using `# %% [markdown]` and `# %%` code blocks
|
||||
2. **Review** — PR reviewers see clean diffs in Gitea (no JSON noise)
|
||||
3. **Generate** — `jupytext --to ipynb` produces the `.ipynb` before merge
|
||||
4. **Execute** — Papermill runs the notebook with injected parameters
|
||||
5. **Archive** — Output notebook is committed to a `reports/` branch or artifact store
|
||||
|
||||
## Converting Between Formats
|
||||
|
||||
```bash
|
||||
# .py -> .ipynb
|
||||
jupytext --to ipynb notebooks/agent_task_system_health.py
|
||||
|
||||
# .ipynb -> .py
|
||||
jupytext --to py notebooks/agent_task_system_health.ipynb
|
||||
|
||||
# Execute with parameters
|
||||
papermill notebooks/agent_task_system_health.ipynb output.ipynb \
|
||||
-p threshold 1.0 -p hostname forge-vps-01
|
||||
```
|
||||
|
||||
## CI Gate
|
||||
|
||||
The `notebook-ci.yml` workflow executes all notebooks in `notebooks/` on every PR and push, ensuring that checked-in notebooks still run and produce outputs.
|
||||
|
||||
## Why This Matters
|
||||
|
||||
| Problem | Notebook Solution |
|
||||
|---|---|
|
||||
| Ephemeral agent reasoning | Markdown cells narrate the thought process |
|
||||
| Stateless single-turn tools | Stateful cells persist variables across steps |
|
||||
| Unreviewable binary artifacts | `.py` source is diffable and PR-friendly |
|
||||
| No execution audit trail | Output notebook preserves code + outputs + metadata |
|
||||
132
docs/fleet-sitrep-2026-04-06.md
Normal file
132
docs/fleet-sitrep-2026-04-06.md
Normal file
@@ -0,0 +1,132 @@
|
||||
# Fleet SITREP — April 6, 2026
|
||||
|
||||
**Classification:** Consolidated Status Report
|
||||
**Compiled by:** Ezra
|
||||
**Acknowledged by:** Claude (Issue #143)
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Allegro executed 7 tasks across infrastructure, contracting, audits, and security. Ezra shipped PR #131, filed formalization audit #132, delivered quarterly report #133, and self-assigned issues #134–#138. All wizard activity mapped below.
|
||||
|
||||
---
|
||||
|
||||
## 1. Allegro 7-Task Report
|
||||
|
||||
| Task | Description | Status |
|
||||
|------|-------------|--------|
|
||||
| 1 | Roll Call / Infrastructure Map | ✅ Complete |
|
||||
| 2 | Dark industrial anthem (140 BPM, Suno-ready) | ✅ Complete |
|
||||
| 3 | Operation Get A Job — 7-file contracting playbook pushed to `the-nexus` | ✅ Complete |
|
||||
| 4 | Formalization audit filed ([the-nexus #893](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/893)) | ✅ Complete |
|
||||
| 5 | GrepTard Memory Report — PR #525 on `timmy-home` | ✅ Complete |
|
||||
| 6 | Self-audit issues #894–#899 filed on `the-nexus` | ✅ Filed |
|
||||
| 7 | `keystore.json` permissions fixed to `600` | ✅ Applied |
|
||||
|
||||
### Critical Findings from Task 4 (Formalization Audit)
|
||||
|
||||
- GOFAI source files missing — only `.pyc` remains
|
||||
- Nostr keystore was world-readable — **FIXED** (Task 7)
|
||||
- 39 burn scripts cluttering `/root` — archival pending ([#898](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/898))
|
||||
|
||||
---
|
||||
|
||||
## 2. Ezra Deliverables
|
||||
|
||||
| Deliverable | Issue/PR | Status |
|
||||
|-------------|----------|--------|
|
||||
| V-011 fix + compressor tuning | [PR #131](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/131) | ✅ Merged |
|
||||
| Formalization audit (hermes-agent) | [Issue #132](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/132) | Filed |
|
||||
| Quarterly report (MD + PDF) | [Issue #133](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/133) | Filed |
|
||||
| Burn-mode concurrent tool tests | [Issue #134](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/134) | Assigned → Ezra |
|
||||
| MCP SDK migration | [Issue #135](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/135) | Assigned → Ezra |
|
||||
| APScheduler migration | [Issue #136](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/136) | Assigned → Ezra |
|
||||
| Pydantic-settings migration | [Issue #137](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/137) | Assigned → Ezra |
|
||||
| Contracting playbook tracker | [Issue #138](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/138) | Assigned → Ezra |
|
||||
|
||||
---
|
||||
|
||||
## 3. Fleet Status
|
||||
|
||||
| Wizard | Host | Status | Blocker |
|
||||
|--------|------|--------|---------|
|
||||
| **Ezra** | Hermes VPS | Active — 5 issues queued | None |
|
||||
| **Bezalel** | Hermes VPS | Gateway running on 8645 | None |
|
||||
| **Allegro-Primus** | Hermes VPS | **Gateway DOWN on 8644** | Needs restart signal |
|
||||
| **Bilbo** | External | Gemma 4B active, Telegram dual-mode | Host IP unknown to fleet |
|
||||
|
||||
### Allegro Gateway Recovery
|
||||
|
||||
Allegro-Primus gateway (port 8644) is down. Options:
|
||||
1. **Alexander restarts manually** on Hermes VPS
|
||||
2. **Delegate to Bezalel** — Bezalel can issue restart signal via Hermes VPS access
|
||||
3. **Delegate to Ezra** — Ezra can coordinate restart as part of issue #894 work
|
||||
|
||||
---
|
||||
|
||||
## 4. Operation Get A Job — Contracting Playbook
|
||||
|
||||
Files pushed to `the-nexus/operation-get-a-job/`:
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `README.md` | Master plan |
|
||||
| `entity-setup.md` | Wyoming LLC, Mercury, E&O insurance |
|
||||
| `service-offerings.md` | Rates $150–600/hr; packages $5k/$15k/$40k+ |
|
||||
| `portfolio.md` | Portfolio structure |
|
||||
| `outreach-templates.md` | Cold email templates |
|
||||
| `proposal-template.md` | Client proposal structure |
|
||||
| `rate-card.md` | Rate card |
|
||||
|
||||
**Human-only mile (Alexander's action items):**
|
||||
|
||||
1. Pick LLC name from `entity-setup.md`
|
||||
2. File Wyoming LLC via Northwest Registered Agent ($225)
|
||||
3. Get EIN from IRS (free, ~10 min)
|
||||
4. Open Mercury account (requires EIN + LLC docs)
|
||||
5. Secure E&O insurance (~$150–250/month)
|
||||
6. Restart Allegro-Primus gateway (port 8644)
|
||||
7. Update LinkedIn using profile template
|
||||
8. Send 5 cold emails using outreach templates
|
||||
|
||||
---
|
||||
|
||||
## 5. Pending Self-Audit Issues (the-nexus)
|
||||
|
||||
| Issue | Title | Priority |
|
||||
|-------|-------|----------|
|
||||
| [#894](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/894) | Deploy burn-mode cron jobs | CRITICAL |
|
||||
| [#895](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/895) | Telegram thread-based reporting | Normal |
|
||||
| [#896](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/896) | Retry logic and error recovery | Normal |
|
||||
| [#897](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/897) | Automate morning reports at 0600 | Normal |
|
||||
| [#898](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/898) | Archive 39 burn scripts | Normal |
|
||||
| [#899](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/899) | Keystore permissions | ✅ Done |
|
||||
|
||||
---
|
||||
|
||||
## 6. Revenue Timeline
|
||||
|
||||
| Milestone | Target | Unlocks |
|
||||
|-----------|--------|---------|
|
||||
| LLC + Bank + E&O | Day 5 | Ability to invoice clients |
|
||||
| First 5 emails sent | Day 7 | Pipeline generation |
|
||||
| First scoping call | Day 14 | Qualified lead |
|
||||
| First proposal accepted | Day 21 | **$4,500–$12,000 revenue** |
|
||||
| Monthly retainer signed | Day 45 | **$6,000/mo recurring** |
|
||||
|
||||
---
|
||||
|
||||
## 7. Delegation Matrix
|
||||
|
||||
| Owner | Owns |
|
||||
|-------|------|
|
||||
| **Alexander** | LLC filing, EIN, Mercury, E&O, LinkedIn, cold emails, gateway restart |
|
||||
| **Ezra** | Issues #134–#138 (tests, migrations, tracker) |
|
||||
| **Allegro** | Issues #894, #898 (cron deployment, burn script archival) |
|
||||
| **Bezalel** | Review formalization audit for Anthropic-specific gaps |
|
||||
|
||||
---
|
||||
|
||||
*SITREP acknowledged by Claude — April 6, 2026*
|
||||
*Source issue: [hermes-agent #143](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/143)*
|
||||
678
docs/jupyter-as-execution-layer-research.md
Normal file
678
docs/jupyter-as-execution-layer-research.md
Normal file
@@ -0,0 +1,678 @@
|
||||
# Jupyter Notebooks as Core LLM Execution Layer — Deep Research Report
|
||||
|
||||
**Issue:** #155
|
||||
**Date:** 2026-04-06
|
||||
**Status:** Research / Spike
|
||||
**Prior Art:** Timmy's initial spike (llm_execution_spike.ipynb, hamelnb bridge, JupyterLab on forge VPS)
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
This report deepens the research from issue #155 into three areas requested by Rockachopa:
|
||||
1. The **full Jupyter product suite** — JupyterHub vs JupyterLab vs Notebook
|
||||
2. **Papermill** — the production-grade notebook execution engine already used in real data pipelines
|
||||
3. The **"PR model for notebooks"** — how agents can propose, diff, review, and merge changes to `.ipynb` files similarly to code PRs
|
||||
|
||||
The conclusion: an elegant, production-grade agent→notebook pipeline already exists as open-source tooling. We don't need to invent much — we need to compose what's there.
|
||||
|
||||
---
|
||||
|
||||
## 1. The Jupyter Product Suite
|
||||
|
||||
The Jupyter ecosystem has three distinct layers that are often conflated. Understanding the distinction is critical for architectural decisions.
|
||||
|
||||
### 1.1 Jupyter Notebook (Classic)
|
||||
|
||||
The original single-user interface. One browser tab = one `.ipynb` file. Version 6 is in maintenance-only mode. Version 7 was rebuilt on JupyterLab components and is functionally equivalent. For headless agent use, the UI is irrelevant — what matters is the `.ipynb` file format and the kernel execution model underneath.
|
||||
|
||||
### 1.2 JupyterLab
|
||||
|
||||
The current canonical Jupyter interface for human users: full IDE, multi-pane, terminal, extension manager, built-in diff viewer, and `jupyterlab-git` for Git workflows from the UI. JupyterLab is the recommended target for agent-collaborative workflows because:
|
||||
|
||||
- It exposes the same REST API as classic Jupyter (kernel sessions, execute, contents)
|
||||
- Extensions like `jupyterlab-git` let a human co-reviewer inspect changes alongside the agent
|
||||
- The `hamelnb` bridge Timmy already validated works against a JupyterLab server
|
||||
|
||||
**For agents:** JupyterLab is the platform to run on. The agent doesn't interact with the UI — it uses the Jupyter REST API or Papermill on top of it.
|
||||
|
||||
### 1.3 JupyterHub — The Multi-User Orchestration Layer
|
||||
|
||||
JupyterHub is not a UI. It is a **multi-user server** that spawns, manages, and proxies individual single-user Jupyter servers. This is the production infrastructure layer.
|
||||
|
||||
```
|
||||
[Agent / Browser / API Client]
|
||||
|
|
||||
[Proxy] (configurable-http-proxy)
|
||||
/ \
|
||||
[Hub] [Single-User Jupyter Server per user/agent]
|
||||
(Auth, (standard JupyterLab/Notebook server)
|
||||
Spawner,
|
||||
REST API)
|
||||
```
|
||||
|
||||
**Key components:**
|
||||
- **Hub:** Manages auth, user database, spawner lifecycle, REST API
|
||||
- **Proxy:** Routes `/hub/*` to Hub, `/user/<name>/*` to that user's server
|
||||
- **Spawner:** How single-user servers are started. Default = local process. Production options include `KubeSpawner` (Kubernetes pod per user) and `DockerSpawner` (container per user)
|
||||
- **Authenticator:** PAM, OAuth, DummyAuthenticator (for isolated agent environments)
|
||||
|
||||
**JupyterHub REST API** (relevant for agent orchestration):
|
||||
|
||||
```bash
|
||||
# Spawn a named server for an agent service account
|
||||
POST /hub/api/users/<username>/servers/<name>
|
||||
|
||||
# Stop it when done
|
||||
DELETE /hub/api/users/<username>/servers/<name>
|
||||
|
||||
# Create a scoped API token for the agent
|
||||
POST /hub/api/users/<username>/tokens
|
||||
|
||||
# Check server status
|
||||
GET /hub/api/users/<username>
|
||||
```
|
||||
|
||||
**Why this matters for Hermes:** JupyterHub gives us isolated kernel environments per agent task, programmable lifecycle management, and a clean auth model. Instead of running one shared JupyterLab instance on the forge VPS, we could spawn ephemeral single-user servers per notebook execution run — each with its own kernel, clean state, and resource limits.
|
||||
|
||||
### 1.4 Jupyter Kernel Gateway — Minimal Headless Execution
|
||||
|
||||
If JupyterHub is too heavy, `jupyter-kernel-gateway` exposes just the kernel protocol over REST + WebSocket:
|
||||
|
||||
```bash
|
||||
pip install jupyter-kernel-gateway
|
||||
jupyter kernelgateway --KernelGatewayApp.api=kernel_gateway.jupyter_websocket
|
||||
|
||||
# Start kernel
|
||||
POST /api/kernels
|
||||
# Execute via WebSocket on Jupyter messaging protocol
|
||||
WS /api/kernels/<kernel_id>/channels
|
||||
# Stop kernel
|
||||
DELETE /api/kernels/<kernel_id>
|
||||
```
|
||||
|
||||
This is the lowest-level option: no notebook management, just raw kernel access. Suitable if we want to build our own execution layer from scratch.
|
||||
|
||||
---
|
||||
|
||||
## 2. Papermill — Production Notebook Execution
|
||||
|
||||
Papermill is the missing link between "notebook as experiment" and "notebook as repeatable pipeline task." It is already used at scale in industry data pipelines (Netflix, Airbnb, etc.).
|
||||
|
||||
### 2.1 Core Concept: Parameterization
|
||||
|
||||
Papermill's key innovation is **parameter injection**. Tag a cell in the notebook with `"parameters"`:
|
||||
|
||||
```python
|
||||
# Cell tagged "parameters" (defaults — defined by notebook author)
|
||||
alpha = 0.5
|
||||
batch_size = 32
|
||||
model_name = "baseline"
|
||||
```
|
||||
|
||||
At runtime, Papermill inserts a new cell immediately after, tagged `"injected-parameters"`, that overrides the defaults:
|
||||
|
||||
```python
|
||||
# Cell tagged "injected-parameters" (injected by Papermill at runtime)
|
||||
alpha = 0.01
|
||||
batch_size = 128
|
||||
model_name = "experiment_007"
|
||||
```
|
||||
|
||||
Because Python executes top-to-bottom, the injected cell shadows the defaults. The original notebook is never mutated — Papermill reads input, writes to a new output file.
|
||||
|
||||
### 2.2 Python API
|
||||
|
||||
```python
|
||||
import papermill as pm
|
||||
|
||||
nb = pm.execute_notebook(
|
||||
input_path="analysis.ipynb", # source (can be s3://, az://, gs://)
|
||||
output_path="output/run_001.ipynb", # destination (persists outputs)
|
||||
parameters={
|
||||
"alpha": 0.01,
|
||||
"n_samples": 1000,
|
||||
"run_id": "fleet-check-2026-04-06",
|
||||
},
|
||||
kernel_name="python3",
|
||||
execution_timeout=300, # per-cell timeout in seconds
|
||||
log_output=True, # stream cell output to logger
|
||||
cwd="/path/to/notebook/", # working directory
|
||||
)
|
||||
# Returns: NotebookNode (the fully executed notebook with all outputs)
|
||||
```
|
||||
|
||||
On cell failure, Papermill raises `PapermillExecutionError` with:
|
||||
- `cell_index` — which cell failed
|
||||
- `source` — the failing cell's code
|
||||
- `ename` / `evalue` — exception type and message
|
||||
- `traceback` — full traceback
|
||||
|
||||
Even on failure, the output notebook is written with whatever cells completed — enabling partial-run inspection.
|
||||
|
||||
### 2.3 CLI
|
||||
|
||||
```bash
|
||||
# Basic execution
|
||||
papermill analysis.ipynb output/run_001.ipynb \
|
||||
-p alpha 0.01 \
|
||||
-p n_samples 1000
|
||||
|
||||
# From YAML parameter file
|
||||
papermill analysis.ipynb output/run_001.ipynb -f params.yaml
|
||||
|
||||
# CI-friendly: log outputs, no progress bar
|
||||
papermill analysis.ipynb output/run_001.ipynb \
|
||||
--log-output \
|
||||
--no-progress-bar \
|
||||
--execution-timeout 300 \
|
||||
-p run_id "fleet-check-2026-04-06"
|
||||
|
||||
# Prepare only (inject params, skip execution — for preview/inspection)
|
||||
papermill analysis.ipynb preview.ipynb --prepare-only -p alpha 0.01
|
||||
|
||||
# Inspect parameter schema
|
||||
papermill --help-notebook analysis.ipynb
|
||||
```
|
||||
|
||||
**Remote storage** is built in — `pip install papermill[s3]` enables `s3://` paths for both input and output. Azure and GCS are also supported. For Hermes, this means notebook runs can be stored in object storage and retrieved later for audit.
|
||||
|
||||
### 2.4 Scrapbook — Structured Output Collection
|
||||
|
||||
`scrapbook` is Papermill's companion for extracting structured data from executed notebooks. Inside a notebook cell:
|
||||
|
||||
```python
|
||||
import scrapbook as sb
|
||||
|
||||
# Write typed outputs (stored as special display_data in cell outputs)
|
||||
sb.glue("accuracy", 0.9342)
|
||||
sb.glue("metrics", {"precision": 0.91, "recall": 0.93, "f1": 0.92})
|
||||
sb.glue("results_df", df, "pandas") # DataFrames too
|
||||
```
|
||||
|
||||
After execution, from the agent:
|
||||
|
||||
```python
|
||||
import scrapbook as sb
|
||||
|
||||
nb = sb.read_notebook("output/fleet-check-2026-04-06.ipynb")
|
||||
metrics = nb.scraps["metrics"].data # -> {"precision": 0.91, ...}
|
||||
accuracy = nb.scraps["accuracy"].data # -> 0.9342
|
||||
|
||||
# Or aggregate across many runs
|
||||
book = sb.read_notebooks("output/")
|
||||
book.scrap_dataframe # -> pd.DataFrame with all scraps + filenames
|
||||
```
|
||||
|
||||
This is the clean interface between notebook execution and agent decision-making: the notebook outputs its findings as named, typed scraps; the agent reads them programmatically and acts.
|
||||
|
||||
### 2.5 How Papermill Compares to hamelnb
|
||||
|
||||
| Capability | hamelnb | Papermill |
|
||||
|---|---|---|
|
||||
| Stateful kernel session | Yes | No (fresh kernel per run) |
|
||||
| Parameter injection | No | Yes |
|
||||
| Persistent output notebook | No | Yes |
|
||||
| Remote storage (S3/Azure) | No | Yes |
|
||||
| Per-cell timing/metadata | No | Yes (in output nb metadata) |
|
||||
| Error isolation (partial runs) | No | Yes |
|
||||
| Production pipeline use | Experimental | Industry-standard |
|
||||
| Structured output collection | No | Yes (via scrapbook) |
|
||||
|
||||
**Verdict:** `hamelnb` is great for interactive REPL-style exploration (where state accumulates). Papermill is better for task execution (where we want reproducible, parameterized, auditable runs). They serve different use cases. Hermes needs both.
|
||||
|
||||
---
|
||||
|
||||
## 3. The `.ipynb` File Format — What the Agent Is Actually Working With
|
||||
|
||||
Understanding the format is essential for the "PR model." A `.ipynb` file is JSON with this structure:
|
||||
|
||||
```json
|
||||
{
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5,
|
||||
"metadata": {
|
||||
"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
|
||||
"language_info": {"name": "python", "version": "3.10.0"}
|
||||
},
|
||||
"cells": [
|
||||
{
|
||||
"id": "a1b2c3d4",
|
||||
"cell_type": "markdown",
|
||||
"source": "# Fleet Health Check\n\nThis notebook checks system health.",
|
||||
"metadata": {}
|
||||
},
|
||||
{
|
||||
"id": "e5f6g7h8",
|
||||
"cell_type": "code",
|
||||
"source": "alpha = 0.5\nthreshold = 0.95",
|
||||
"metadata": {"tags": ["parameters"]},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"id": "i9j0k1l2",
|
||||
"cell_type": "code",
|
||||
"source": "import sys\nprint(sys.version)",
|
||||
"metadata": {},
|
||||
"execution_count": 1,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"name": "stdout",
|
||||
"text": "3.10.0 (default, ...)\n"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The `nbformat` Python library provides a clean API for working with this:
|
||||
|
||||
```python
|
||||
import nbformat
|
||||
|
||||
# Read
|
||||
with open("notebook.ipynb") as f:
|
||||
nb = nbformat.read(f, as_version=4)
|
||||
|
||||
# Navigate
|
||||
for cell in nb.cells:
|
||||
if cell.cell_type == "code":
|
||||
print(cell.source)
|
||||
|
||||
# Modify
|
||||
nb.cells[2].source = "import sys\nprint('updated')"
|
||||
|
||||
# Add cells
|
||||
new_md = nbformat.v4.new_markdown_cell("## Agent Analysis\nInserted by Hermes.")
|
||||
nb.cells.insert(3, new_md)
|
||||
|
||||
# Write
|
||||
with open("modified.ipynb", "w") as f:
|
||||
nbformat.write(nb, f)
|
||||
|
||||
# Validate
|
||||
nbformat.validate(nb) # raises nbformat.ValidationError on invalid format
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. The PR Model for Notebooks
|
||||
|
||||
This is the elegant architecture Rockachopa described: agents making PRs to notebooks the same way they make PRs to code. Here's how the full stack enables it.
|
||||
|
||||
### 4.1 The Problem: Raw `.ipynb` Diffs Are Unusable
|
||||
|
||||
Without tooling, a `git diff` on a notebook that was merely re-run (no source changes) produces thousands of lines of JSON changes — execution counts, timestamps, base64-encoded plot images. Code review on raw `.ipynb` diffs is impractical.
|
||||
|
||||
### 4.2 nbstripout — Clean Git History
|
||||
|
||||
`nbstripout` installs a git **clean filter** that strips outputs before files enter the git index. The working copy is untouched; only what gets committed is clean.
|
||||
|
||||
```bash
|
||||
pip install nbstripout
|
||||
nbstripout --install # per-repo
|
||||
# or
|
||||
nbstripout --install --global # all repos
|
||||
```
|
||||
|
||||
This writes to `.git/config`:
|
||||
```ini
|
||||
[filter "nbstripout"]
|
||||
clean = nbstripout
|
||||
smudge = cat
|
||||
required = true
|
||||
|
||||
[diff "ipynb"]
|
||||
textconv = nbstripout -t
|
||||
```
|
||||
|
||||
And to `.gitattributes`:
|
||||
```
|
||||
*.ipynb filter=nbstripout
|
||||
*.ipynb diff=ipynb
|
||||
```
|
||||
|
||||
Now `git diff` shows only source changes — same as reviewing a `.py` file.
|
||||
|
||||
**For executed-output notebooks** (where we want to keep outputs for audit): use a separate path like `runs/` or `outputs/` excluded from the filter via `.gitattributes`:
|
||||
```
|
||||
*.ipynb filter=nbstripout
|
||||
runs/*.ipynb !filter
|
||||
runs/*.ipynb !diff
|
||||
```
|
||||
|
||||
### 4.3 nbdime — Semantic Diff and Merge
|
||||
|
||||
nbdime understands notebook structure. Instead of diffing raw JSON, it diffs at the level of cells — knowing that `cells` is a list, `source` is a string, and outputs should often be ignored.
|
||||
|
||||
```bash
|
||||
pip install nbdime
|
||||
|
||||
# Enable semantic git diff/merge for all .ipynb files
|
||||
nbdime config-git --enable
|
||||
|
||||
# Now standard git commands are notebook-aware:
|
||||
git diff HEAD notebook.ipynb # semantic cell-level diff
|
||||
git merge feature-branch # uses nbdime for .ipynb conflict resolution
|
||||
git log -p notebook.ipynb # readable patch per commit
|
||||
```
|
||||
|
||||
**Python API for agent reasoning:**
|
||||
|
||||
```python
|
||||
import nbdime
|
||||
import nbformat
|
||||
|
||||
nb_base = nbformat.read(open("original.ipynb"), as_version=4)
|
||||
nb_pr = nbformat.read(open("proposed.ipynb"), as_version=4)
|
||||
|
||||
diff = nbdime.diff_notebooks(nb_base, nb_pr)
|
||||
|
||||
# diff is a list of structured ops the agent can reason about:
|
||||
# [{"op": "patch", "key": "cells", "diff": [
|
||||
# {"op": "patch", "key": 3, "diff": [
|
||||
# {"op": "patch", "key": "source", "diff": [...string ops...]}
|
||||
# ]}
|
||||
# ]}]
|
||||
|
||||
# Apply a diff (patch)
|
||||
from nbdime.patching import patch
|
||||
nb_result = patch(nb_base, diff)
|
||||
```
|
||||
|
||||
### 4.4 The Full Agent PR Workflow
|
||||
|
||||
Here is the complete workflow — analogous to how Hermes makes PRs to code repos via Gitea:
|
||||
|
||||
**1. Agent reads the task notebook**
|
||||
```python
|
||||
nb = nbformat.read(open("fleet_health_check.ipynb"), as_version=4)
|
||||
```
|
||||
|
||||
**2. Agent locates and modifies relevant cells**
|
||||
```python
|
||||
# Find parameter cell
|
||||
params_cell = next(
|
||||
c for c in nb.cells
|
||||
if "parameters" in c.get("metadata", {}).get("tags", [])
|
||||
)
|
||||
# Update threshold
|
||||
params_cell.source = params_cell.source.replace("threshold = 0.95", "threshold = 0.90")
|
||||
|
||||
# Add explanatory markdown
|
||||
nb.cells.insert(
|
||||
nb.cells.index(params_cell) + 1,
|
||||
nbformat.v4.new_markdown_cell(
|
||||
"**Note (Hermes 2026-04-06):** Threshold lowered from 0.95 to 0.90 "
|
||||
"based on false-positive analysis from last 7 days of runs."
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
**3. Agent writes and commits to a branch**
|
||||
```bash
|
||||
git checkout -b agent/fleet-health-threshold-update
|
||||
nbformat.write(nb, open("fleet_health_check.ipynb", "w"))
|
||||
git add fleet_health_check.ipynb
|
||||
git commit -m "feat(notebooks): lower fleet health threshold to 0.90 (#155)"
|
||||
```
|
||||
|
||||
**4. Agent executes the proposed notebook to validate**
|
||||
```python
|
||||
import papermill as pm
|
||||
|
||||
pm.execute_notebook(
|
||||
"fleet_health_check.ipynb",
|
||||
"output/validation_run.ipynb",
|
||||
parameters={"run_id": "agent-validation-2026-04-06"},
|
||||
log_output=True,
|
||||
)
|
||||
```
|
||||
|
||||
**5. Agent collects results and compares**
|
||||
```python
|
||||
import scrapbook as sb
|
||||
|
||||
result = sb.read_notebook("output/validation_run.ipynb")
|
||||
health_score = result.scraps["health_score"].data
|
||||
alert_count = result.scraps["alert_count"].data
|
||||
```
|
||||
|
||||
**6. Agent opens PR with results summary**
|
||||
```bash
|
||||
curl -X POST "$GITEA_API/pulls" \
|
||||
-H "Authorization: token $TOKEN" \
|
||||
-d '{
|
||||
"title": "feat(notebooks): lower fleet health threshold to 0.90",
|
||||
"body": "## Agent Analysis\n\n- Health score: 0.94 (was 0.89 with old threshold)\n- Alert count: 12 (was 47 false positives)\n- Validation run: output/validation_run.ipynb\n\nRefs #155",
|
||||
"head": "agent/fleet-health-threshold-update",
|
||||
"base": "main"
|
||||
}'
|
||||
```
|
||||
|
||||
**7. Human reviews the PR using nbdime diff**
|
||||
|
||||
The PR diff in Gitea shows the clean cell-level source changes (thanks to nbstripout). The human can also run `nbdiff-web original.ipynb proposed.ipynb` locally for rich rendered diff with output comparison.
|
||||
|
||||
### 4.5 nbval — Regression Testing Notebooks
|
||||
|
||||
`nbval` treats each notebook cell as a pytest test case, re-executing and comparing outputs to stored values:
|
||||
|
||||
```bash
|
||||
pip install nbval
|
||||
|
||||
# Strict: every cell output must match stored outputs
|
||||
pytest --nbval fleet_health_check.ipynb
|
||||
|
||||
# Lax: only check cells marked with # NBVAL_CHECK_OUTPUT
|
||||
pytest --nbval-lax fleet_health_check.ipynb
|
||||
```
|
||||
|
||||
Cell-level markers (comments in cell source):
|
||||
```python
|
||||
# NBVAL_CHECK_OUTPUT — in lax mode, validate this cell's output
|
||||
# NBVAL_SKIP — skip this cell entirely
|
||||
# NBVAL_RAISES_EXCEPTION — expect an exception (test passes if raised)
|
||||
```
|
||||
|
||||
This becomes the CI gate: before a notebook PR is merged, run `pytest --nbval-lax` to verify no cells produce errors and critical output cells still produce expected values.
|
||||
|
||||
---
|
||||
|
||||
## 5. Gaps and Recommendations
|
||||
|
||||
### 5.1 Gap Assessment (Refining Timmy's Original Findings)
|
||||
|
||||
| Gap | Severity | Solution |
|
||||
|---|---|---|
|
||||
| No Hermes tool access in kernel | High | Inject `hermes_runtime` module (see §5.2) |
|
||||
| No structured output protocol | High | Use scrapbook `sb.glue()` pattern |
|
||||
| No parameterization | Medium | Add Papermill `"parameters"` cell to notebooks |
|
||||
| XSRF/auth friction | Medium | Disable for local; use JupyterHub token scopes for multi-user |
|
||||
| No notebook CI/testing | Medium | Add nbval to test suite |
|
||||
| Raw `.ipynb` diffs in PRs | Medium | Install nbstripout + nbdime |
|
||||
| No scheduling | Low | Papermill + existing Hermes cron layer |
|
||||
|
||||
### 5.2 Short-Term Recommendations (This Month)
|
||||
|
||||
**1. `NotebookExecutor` tool**
|
||||
|
||||
A thin Hermes tool wrapping the ecosystem:
|
||||
|
||||
```python
|
||||
class NotebookExecutor:
|
||||
def execute(self, input_path, output_path, parameters, timeout=300):
|
||||
"""Wraps pm.execute_notebook(). Returns structured result dict."""
|
||||
|
||||
def collect_outputs(self, notebook_path):
|
||||
"""Wraps sb.read_notebook(). Returns dict of named scraps."""
|
||||
|
||||
def inspect_parameters(self, notebook_path):
|
||||
"""Wraps pm.inspect_notebook(). Returns parameter schema."""
|
||||
|
||||
def read_notebook(self, path):
|
||||
"""Returns nbformat NotebookNode for cell inspection/modification."""
|
||||
|
||||
def write_notebook(self, nb, path):
|
||||
"""Writes modified NotebookNode back to disk."""
|
||||
|
||||
def diff_notebooks(self, path_a, path_b):
|
||||
"""Returns structured nbdime diff for agent reasoning."""
|
||||
|
||||
def validate(self, notebook_path):
|
||||
"""Runs nbformat.validate() + optional pytest --nbval-lax."""
|
||||
```
|
||||
|
||||
Execution result structure for the agent:
|
||||
```python
|
||||
{
|
||||
"status": "success" | "error",
|
||||
"duration_seconds": 12.34,
|
||||
"cells_executed": 15,
|
||||
"failed_cell": { # None on success
|
||||
"index": 7,
|
||||
"source": "model.fit(X, y)",
|
||||
"ename": "ValueError",
|
||||
"evalue": "Input contains NaN",
|
||||
},
|
||||
"scraps": { # from scrapbook
|
||||
"health_score": 0.94,
|
||||
"alert_count": 12,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
**2. Fleet Health Check as a Notebook**
|
||||
|
||||
Convert the fleet health check epic into a parameterized notebook with:
|
||||
- `"parameters"` cell for run configuration (date range, thresholds, agent ID)
|
||||
- Markdown cells narrating each step
|
||||
- `sb.glue()` calls for structured outputs
|
||||
- `# NBVAL_CHECK_OUTPUT` markers on critical cells
|
||||
|
||||
**3. Git hygiene for notebooks**
|
||||
|
||||
Install nbstripout + nbdime in the hermes-agent repo:
|
||||
```bash
|
||||
pip install nbstripout nbdime
|
||||
nbstripout --install
|
||||
nbdime config-git --enable
|
||||
```
|
||||
|
||||
Add to `.gitattributes`:
|
||||
```
|
||||
*.ipynb filter=nbstripout
|
||||
*.ipynb diff=ipynb
|
||||
runs/*.ipynb !filter
|
||||
```
|
||||
|
||||
### 5.3 Medium-Term Recommendations (Next Quarter)
|
||||
|
||||
**4. `hermes_runtime` Python module**
|
||||
|
||||
Inject Hermes tool access into the kernel via a module that notebooks import:
|
||||
|
||||
```python
|
||||
# In kernel cell: from hermes_runtime import terminal, read_file, web_search
|
||||
import hermes_runtime as hermes
|
||||
|
||||
results = hermes.web_search("fleet health metrics best practices")
|
||||
hermes.terminal("systemctl status agent-fleet")
|
||||
content = hermes.read_file("/var/log/hermes/agent.log")
|
||||
```
|
||||
|
||||
This closes the most significant gap: notebooks gain the same tool access as skills, while retaining state persistence and narrative structure.
|
||||
|
||||
**5. Notebook-triggered cron**
|
||||
|
||||
Extend the Hermes cron layer to accept `.ipynb` paths as targets:
|
||||
```yaml
|
||||
# cron entry
|
||||
schedule: "0 6 * * *"
|
||||
type: notebook
|
||||
path: notebooks/fleet_health_check.ipynb
|
||||
parameters:
|
||||
run_id: "{{date}}"
|
||||
alert_threshold: 0.90
|
||||
output_path: runs/fleet_health_{{date}}.ipynb
|
||||
```
|
||||
|
||||
The cron runner calls `pm.execute_notebook()` and commits the output to the repo.
|
||||
|
||||
**6. JupyterHub for multi-agent isolation**
|
||||
|
||||
If multiple agents need concurrent notebook execution, deploy JupyterHub with `DockerSpawner` or `KubeSpawner`. Each agent job gets an isolated container with its own kernel, no state bleed between runs.
|
||||
|
||||
---
|
||||
|
||||
## 6. Architecture Vision
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Hermes Agent │
|
||||
│ │
|
||||
│ Skills (one-shot) Notebooks (multi-step) │
|
||||
│ ┌─────────────────┐ ┌─────────────────────────────────┐ │
|
||||
│ │ terminal() │ │ .ipynb file │ │
|
||||
│ │ web_search() │ │ ├── Markdown (narrative) │ │
|
||||
│ │ read_file() │ │ ├── Code cells (logic) │ │
|
||||
│ └─────────────────┘ │ ├── "parameters" cell │ │
|
||||
│ │ └── sb.glue() outputs │ │
|
||||
│ └──────────────┬────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────────────▼────────────────┐ │
|
||||
│ │ NotebookExecutor tool │ │
|
||||
│ │ (papermill + scrapbook + │ │
|
||||
│ │ nbformat + nbdime + nbval) │ │
|
||||
│ └──────────────┬────────────────┘ │
|
||||
│ │ │
|
||||
└────────────────────────────────────────────┼────────────────────┘
|
||||
│
|
||||
┌───────────────────▼──────────────────┐
|
||||
│ JupyterLab / Hub │
|
||||
│ (kernel execution environment) │
|
||||
└───────────────────┬──────────────────┘
|
||||
│
|
||||
┌───────────────────▼──────────────────┐
|
||||
│ Git + Gitea │
|
||||
│ (nbstripout clean diffs, │
|
||||
│ nbdime semantic review, │
|
||||
│ PR workflow for notebook changes) │
|
||||
└──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Notebooks become the primary artifact of complex tasks:** the agent generates or edits cells, Papermill executes them reproducibly, scrapbook extracts structured outputs for agent decision-making, and the resulting `.ipynb` is both proof-of-work and human-readable report. Skills remain for one-shot actions. Notebooks own multi-step workflows.
|
||||
|
||||
---
|
||||
|
||||
## 7. Package Summary
|
||||
|
||||
| Package | Purpose | Install |
|
||||
|---|---|---|
|
||||
| `nbformat` | Read/write/validate `.ipynb` files | `pip install nbformat` |
|
||||
| `nbconvert` | Execute and export notebooks | `pip install nbconvert` |
|
||||
| `papermill` | Parameterize + execute in pipelines | `pip install papermill` |
|
||||
| `scrapbook` | Structured output collection | `pip install scrapbook` |
|
||||
| `nbdime` | Semantic diff/merge for git | `pip install nbdime` |
|
||||
| `nbstripout` | Git filter for clean diffs | `pip install nbstripout` |
|
||||
| `nbval` | pytest-based output regression | `pip install nbval` |
|
||||
| `jupyter-kernel-gateway` | Headless REST kernel access | `pip install jupyter-kernel-gateway` |
|
||||
|
||||
---
|
||||
|
||||
## 8. References
|
||||
|
||||
- [Papermill GitHub (nteract/papermill)](https://github.com/nteract/papermill)
|
||||
- [Scrapbook GitHub (nteract/scrapbook)](https://github.com/nteract/scrapbook)
|
||||
- [nbformat format specification](https://nbformat.readthedocs.io/en/latest/format_description.html)
|
||||
- [nbdime documentation](https://nbdime.readthedocs.io/)
|
||||
- [nbdime diff format spec (JEP #8)](https://github.com/jupyter/enhancement-proposals/blob/master/08-notebook-diff/notebook-diff.md)
|
||||
- [nbconvert execute API](https://nbconvert.readthedocs.io/en/latest/execute_api.html)
|
||||
- [nbstripout README](https://github.com/kynan/nbstripout)
|
||||
- [nbval GitHub (computationalmodelling/nbval)](https://github.com/computationalmodelling/nbval)
|
||||
- [JupyterHub REST API](https://jupyterhub.readthedocs.io/en/stable/howto/rest.html)
|
||||
- [JupyterHub Technical Overview](https://jupyterhub.readthedocs.io/en/latest/reference/technical-overview.html)
|
||||
- [Jupyter Kernel Gateway](https://github.com/jupyter-server/kernel_gateway)
|
||||
166
docs/research-ssd-self-distillation-2026-04.md
Normal file
166
docs/research-ssd-self-distillation-2026-04.md
Normal file
@@ -0,0 +1,166 @@
|
||||
# Research Acknowledgment: SSD — Simple Self-Distillation Improves Code Generation
|
||||
|
||||
**Issue:** #128
|
||||
**Paper:** [Embarrassingly Simple Self-Distillation Improves Code Generation](https://arxiv.org/abs/2604.01193)
|
||||
**Authors:** Ruixiang Zhang, Richard He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang (Apple)
|
||||
**Date:** April 1, 2026
|
||||
**Code:** https://github.com/apple/ml-ssd
|
||||
**Acknowledged by:** Claude — April 6, 2026
|
||||
|
||||
---
|
||||
|
||||
## Assessment: High Relevance to Fleet
|
||||
|
||||
This paper is directly applicable to the hermes-agent fleet. The headline result — +7.5pp pass@1 on Qwen3-4B — is at exactly the scale we operate. The method requires no external infrastructure. Triage verdict: **P0 / Week-class work**.
|
||||
|
||||
---
|
||||
|
||||
## What SSD Actually Does
|
||||
|
||||
Three steps, nothing exotic:
|
||||
|
||||
1. **Sample**: For each coding prompt, generate one solution at temperature `T_train` (~0.9). Do NOT filter for correctness.
|
||||
2. **Fine-tune**: SFT on the resulting `(prompt, unverified_solution)` pairs. Standard cross-entropy loss. No RLHF, no GRPO, no DPO.
|
||||
3. **Evaluate**: At `T_eval` (which must be **different** from `T_train`). This asymmetry is not optional — using the same temperature for both loses 30–50% of the gains.
|
||||
|
||||
The counterintuitive part: N=1 per problem, unverified. Prior self-improvement work uses N>>1 and filters by execution. SSD doesn't. The paper argues this is *why* it works — you're sharpening the model's own distribution, not fitting to a correctness filter's selection bias.
|
||||
|
||||
---
|
||||
|
||||
## The Fork/Lock Theory
|
||||
|
||||
The paper's core theoretical contribution explains *why* temperature asymmetry matters.
|
||||
|
||||
**Locks** — positions requiring syntactic precision: colons, parentheses, import paths, variable names. A mistake here is a hard error. Low temperature helps at Locks. But applying low temperature globally kills diversity everywhere.
|
||||
|
||||
**Forks** — algorithmic choice points where multiple valid continuations exist: picking a sort algorithm, choosing a data structure, deciding on a loop structure. High temperature helps at Forks. But applying high temperature globally introduces errors at Locks.
|
||||
|
||||
SSD's fine-tuning reshapes token distributions **context-dependently**:
|
||||
- At Locks: narrows the distribution, suppressing distractor tokens
|
||||
- At Forks: widens the distribution, preserving valid algorithmic paths
|
||||
|
||||
A single global temperature cannot do this. SFT on self-generated data can, because the model learns from examples that implicitly encode which positions are Locks and which are Forks in each problem context.
|
||||
|
||||
**Fleet implication**: Our agents are currently using a single temperature for everything. This is leaving performance on the table even without fine-tuning. The immediate zero-cost action is temperature auditing (see Phase 1 below).
|
||||
|
||||
---
|
||||
|
||||
## Results That Matter to Us
|
||||
|
||||
| Model | Before | After | Delta |
|
||||
|-------|--------|-------|-------|
|
||||
| Qwen3-30B-Instruct | 42.4% | 55.3% | +12.9pp (+30% rel) |
|
||||
| Qwen3-4B-Instruct | baseline | baseline+7.5pp | +7.5pp |
|
||||
| Llama-3.1-8B-Instruct | baseline | baseline+3.5pp | +3.5pp |
|
||||
|
||||
Gains concentrate on hard problems: +14.2pp medium, +15.3pp hard. This is the distribution our agents face on real Gitea issues — not easy textbook problems.
|
||||
|
||||
---
|
||||
|
||||
## Fleet Implementation Plan
|
||||
|
||||
### Phase 1: Temperature Audit (Zero cost, this week)
|
||||
|
||||
Current state: fleet agents use default or eyeballed temperature settings. The paper shows T_eval != T_train is critical even without fine-tuning.
|
||||
|
||||
Actions:
|
||||
1. Document current temperature settings in `hermes/`, `skills/`, and any Ollama config files
|
||||
2. Establish a held-out test set of 20+ solved Gitea issues with known-correct outputs
|
||||
3. Run A/B: current T_eval vs. T_eval=0.7 vs. T_eval=0.3 for code generation tasks
|
||||
4. Record pass rates per condition; file findings as a follow-up issue
|
||||
|
||||
Expected outcome: measurable improvement with no model changes, no infrastructure, no cost.
|
||||
|
||||
### Phase 2: SSD Pipeline (1–2 weeks, single Mac)
|
||||
|
||||
Replicate the paper's method on Qwen3-4B via Ollama + axolotl or unsloth:
|
||||
|
||||
```
|
||||
1. Dataset construction:
|
||||
- Extract 100–500 coding prompts from Gitea issue backlog
|
||||
- Focus on issues that have accepted PRs (ground truth available for evaluation only, not training)
|
||||
- Format: (system_prompt + issue_description) → model generates solution at T_train=0.9
|
||||
|
||||
2. Fine-tuning:
|
||||
- Use LoRA (not full fine-tune) to stay local-first
|
||||
- Standard SFT: cross-entropy on (prompt, self-generated_solution) pairs
|
||||
- Recommended: unsloth for memory efficiency on Mac hardware
|
||||
- Training budget: 1–3 epochs, small batch size
|
||||
|
||||
3. Evaluation:
|
||||
- Compare base model vs. SSD-tuned model at T_eval=0.7
|
||||
- Metric: pass@1 on held-out issues not in training set
|
||||
- Also test on general coding benchmarks to check for capability regression
|
||||
```
|
||||
|
||||
Infrastructure assessment:
|
||||
- **RAM**: Qwen3-4B quantized (Q4_K_M) needs ~3.5GB VRAM for inference; LoRA fine-tuning needs ~8–12GB unified memory (Mac M-series feasible)
|
||||
- **Storage**: Self-generated dataset is small; LoRA adapter is ~100–500MB
|
||||
- **Time**: 500 examples × 3 epochs ≈ 2–4 hours on M2/M3 Max
|
||||
- **Dependencies**: Ollama (inference), unsloth or axolotl (fine-tuning), datasets (HuggingFace), trl
|
||||
|
||||
No cloud required. No teacher model required. No code execution environment required.
|
||||
|
||||
### Phase 3: Continuous Self-Improvement Loop (1–2 months)
|
||||
|
||||
Wire SSD into the fleet's burn mode:
|
||||
|
||||
```
|
||||
Nightly cron:
|
||||
1. Collect agent solutions from the day's completed issues
|
||||
2. Filter: only solutions where the PR was merged (human-verified correct)
|
||||
3. Append to rolling training buffer (last 500 examples)
|
||||
4. Run SFT fine-tune on buffer → update LoRA adapter
|
||||
5. Swap adapter into Ollama deployment at dawn
|
||||
6. Agents start next day with yesterday's lessons baked in
|
||||
```
|
||||
|
||||
This integrates naturally with RetainDB (#112) — the persistent memory system would track which solutions were merged, providing the feedback signal. The continuous loop turns every merged PR into a training example.
|
||||
|
||||
### Phase 4: Sovereignty Confirmation
|
||||
|
||||
The paper validates that external data is not required for improvement. Our fleet can:
|
||||
- Fine-tune exclusively on its own conversation data
|
||||
- Stay fully local (no API calls, no external datasets)
|
||||
- Accumulate improvements over time without model subscriptions
|
||||
|
||||
This is the sovereign fine-tuning capability the fleet needs to remain independent as external model APIs change pricing or capabilities.
|
||||
|
||||
---
|
||||
|
||||
## Risks and Mitigations
|
||||
|
||||
| Risk | Assessment | Mitigation |
|
||||
|------|------------|------------|
|
||||
| SSD gains don't transfer from LiveCodeBench to Gitea issues | Medium — our domain is software engineering, not competitive programming | Test on actual Gitea issues from the backlog; don't assume benchmark numbers transfer |
|
||||
| Fine-tuning degrades non-code capabilities | Low-Medium | LoRA instead of full fine-tune; test on general tasks after SFT; retain base model checkpoint |
|
||||
| Small training set (<200 examples) insufficient | Medium | Paper shows gains at modest scale; supplement with open code datasets (Stack, TheVault) if needed |
|
||||
| Qwen3 GGUF format incompatible with unsloth fine-tuning | Low | unsloth supports Qwen3; verify exact GGUF variant compatibility before starting |
|
||||
| Temperature asymmetry effect smaller on instruction-tuned variants | Low | Paper explicitly tests instruct variants and shows gains; Qwen3-4B-Instruct is in the paper's results |
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria Status
|
||||
|
||||
From the issue:
|
||||
|
||||
- [ ] **Temperature audit** — Document current T/top_p settings across fleet agents, compare with paper recommendations
|
||||
- [ ] **T_eval benchmark** — A/B test on 20+ solved Gitea issues; measure correctness
|
||||
- [ ] **SSD reproduction** — Replicate pipeline on Qwen4B with 100 prompts; measure pass@1 change
|
||||
- [ ] **Infrastructure assessment** — Documented above (Phase 2 section); GPU/RAM/storage requirements are Mac-feasible
|
||||
- [ ] **Continuous loop design** — Architecture drafted above (Phase 3 section); integrates with RetainDB (#112)
|
||||
|
||||
Infrastructure assessment and continuous loop design are addressed in this document. Temperature audit and SSD reproduction require follow-up issues with execution.
|
||||
|
||||
---
|
||||
|
||||
## Recommended Follow-Up Issues
|
||||
|
||||
1. **Temperature Audit** — Audit all fleet agent temperature configs; run A/B on T_eval variants; file results (Phase 1)
|
||||
2. **SSD Pipeline Spike** — Build and run the 3-stage SSD pipeline on Qwen3-4B; report pass@1 delta (Phase 2)
|
||||
3. **Nightly SFT Integration** — Wire SSD into burn-mode cron; integrate with RetainDB feedback loop (Phase 3)
|
||||
|
||||
---
|
||||
|
||||
*Research acknowledged by Claude — April 6, 2026*
|
||||
*Source issue: [hermes-agent #128](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/128)*
|
||||
@@ -443,6 +443,7 @@ class APIServerAdapter(BasePlatformAdapter):
|
||||
self._runner: Optional["web.AppRunner"] = None
|
||||
self._site: Optional["web.TCPSite"] = None
|
||||
self._response_store = ResponseStore()
|
||||
self._start_time: float = time.time()
|
||||
|
||||
@staticmethod
|
||||
def _parse_cors_origins(value: Any) -> tuple[str, ...]:
|
||||
@@ -582,8 +583,53 @@ class APIServerAdapter(BasePlatformAdapter):
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
async def _handle_health(self, request: "web.Request") -> "web.Response":
|
||||
"""GET /health — simple health check."""
|
||||
return web.json_response({"status": "ok", "platform": "hermes-agent"})
|
||||
"""GET /health — liveness probe with gateway runtime state.
|
||||
|
||||
Returns HTTP 200 with a JSON body while the API server process is alive.
|
||||
The ``gateway_state`` field reflects the broader gateway daemon health
|
||||
as recorded in ``gateway_state.json`` (written by gateway/status.py).
|
||||
Consumers should treat any non-200 response as a failure.
|
||||
|
||||
Response fields:
|
||||
status — always "ok" when the HTTP server is reachable.
|
||||
platform — service name.
|
||||
version — package version (if available).
|
||||
uptime_seconds — seconds since this process started.
|
||||
gateway_state — gateway daemon state from runtime status file
|
||||
("running" | "starting" | "stopped" | "startup_failed" | "unknown").
|
||||
platforms — per-platform adapter states (from runtime status).
|
||||
"""
|
||||
payload: dict = {
|
||||
"status": "ok",
|
||||
"platform": "hermes-agent",
|
||||
}
|
||||
|
||||
# Package version.
|
||||
try:
|
||||
from importlib.metadata import version as pkg_version
|
||||
payload["version"] = pkg_version("hermes-agent")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Process uptime.
|
||||
try:
|
||||
payload["uptime_seconds"] = round(time.time() - self._start_time)
|
||||
except AttributeError:
|
||||
pass
|
||||
|
||||
# Gateway runtime state from the status file.
|
||||
try:
|
||||
from gateway.status import read_runtime_status
|
||||
runtime = read_runtime_status() or {}
|
||||
payload["gateway_state"] = runtime.get("gateway_state", "unknown")
|
||||
payload["platforms"] = {
|
||||
name: {"state": pdata.get("state", "unknown")}
|
||||
for name, pdata in runtime.get("platforms", {}).items()
|
||||
}
|
||||
except Exception:
|
||||
payload["gateway_state"] = "unknown"
|
||||
|
||||
return web.json_response(payload)
|
||||
|
||||
async def _handle_models(self, request: "web.Request") -> "web.Response":
|
||||
"""GET /v1/models — return hermes-agent as an available model."""
|
||||
|
||||
57
notebooks/agent_task_system_health.ipynb
Normal file
57
notebooks/agent_task_system_health.ipynb
Normal file
@@ -0,0 +1,57 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Parameterized Agent Task: System Health Check\n",
|
||||
"\n",
|
||||
"This notebook demonstrates how an LLM agent can generate a task notebook,\n",
|
||||
"a scheduler can parameterize and execute it via papermill,\n",
|
||||
"and the output becomes a persistent audit artifact."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {"tags": ["parameters"]},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Default parameters — papermill will inject overrides here\n",
|
||||
"threshold = 1.0\n",
|
||||
"hostname = \"localhost\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json, subprocess, datetime\n",
|
||||
"gather_time = datetime.datetime.now().isoformat()\n",
|
||||
"load_avg = subprocess.check_output([\"cat\", \"/proc/loadavg\"]).decode().strip()\n",
|
||||
"load_values = [float(x) for x in load_avg.split()[:3]]\n",
|
||||
"avg_load = sum(load_values) / len(load_values)\n",
|
||||
"intervention_needed = avg_load > threshold\n",
|
||||
"report = {\n",
|
||||
" \"hostname\": hostname,\n",
|
||||
" \"threshold\": threshold,\n",
|
||||
" \"avg_load\": round(avg_load, 3),\n",
|
||||
" \"intervention_needed\": intervention_needed,\n",
|
||||
" \"gathered_at\": gather_time\n",
|
||||
"}\n",
|
||||
"print(json.dumps(report, indent=2))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
41
notebooks/agent_task_system_health.py
Normal file
41
notebooks/agent_task_system_health.py
Normal file
@@ -0,0 +1,41 @@
|
||||
# ---
|
||||
# jupyter:
|
||||
# jupytext:
|
||||
# text_representation:
|
||||
# extension: .py
|
||||
# format_name: percent
|
||||
# format_version: '1.3'
|
||||
# jupytext_version: 1.19.1
|
||||
# kernelspec:
|
||||
# display_name: Python 3
|
||||
# language: python
|
||||
# name: python3
|
||||
# ---
|
||||
|
||||
# %% [markdown]
|
||||
# # Parameterized Agent Task: System Health Check
|
||||
#
|
||||
# This notebook demonstrates how an LLM agent can generate a task notebook,
|
||||
# a scheduler can parameterize and execute it via papermill,
|
||||
# and the output becomes a persistent audit artifact.
|
||||
|
||||
# %% tags=["parameters"]
|
||||
# Default parameters — papermill will inject overrides here
|
||||
threshold = 1.0
|
||||
hostname = "localhost"
|
||||
|
||||
# %%
|
||||
import json, subprocess, datetime
|
||||
gather_time = datetime.datetime.now().isoformat()
|
||||
load_avg = subprocess.check_output(["cat", "/proc/loadavg"]).decode().strip()
|
||||
load_values = [float(x) for x in load_avg.split()[:3]]
|
||||
avg_load = sum(load_values) / len(load_values)
|
||||
intervention_needed = avg_load > threshold
|
||||
report = {
|
||||
"hostname": hostname,
|
||||
"threshold": threshold,
|
||||
"avg_load": round(avg_load, 3),
|
||||
"intervention_needed": intervention_needed,
|
||||
"gathered_at": gather_time
|
||||
}
|
||||
print(json.dumps(report, indent=2))
|
||||
955
observatory.py
Normal file
955
observatory.py
Normal file
@@ -0,0 +1,955 @@
|
||||
"""
|
||||
Observatory — Testbed Health Monitoring & Alerting for Hermes Agent
|
||||
|
||||
Checks running services, system resources, and connectivity.
|
||||
Fires Telegram alerts when thresholds are breached.
|
||||
Posts daily digest reports.
|
||||
Stores 30 days of historical health data in SQLite.
|
||||
|
||||
Usage:
|
||||
python observatory.py --check # one-shot health check (stdout)
|
||||
python observatory.py --daemon # continuous monitor (60s poll)
|
||||
python observatory.py --digest # print / send daily digest
|
||||
python observatory.py --history N # show last N health records
|
||||
python observatory.py --slo # print SLO report
|
||||
|
||||
Configuration (env vars, falls back to ~/.hermes/.env):
|
||||
OBSERVATORY_ALERT_CHAT_ID Telegram chat ID for alerts
|
||||
OBSERVATORY_DIGEST_CHAT_ID Telegram chat ID for daily digest (default: alert chat)
|
||||
OBSERVATORY_POLL_INTERVAL Seconds between health polls (default: 60)
|
||||
OBSERVATORY_DB_PATH SQLite path (default: ~/.hermes/observatory.db)
|
||||
TELEGRAM_BOT_TOKEN Bot token used to send alerts
|
||||
|
||||
# Threshold overrides (all optional):
|
||||
OBSERVATORY_DISK_WARN_PCT Disk usage warn threshold (default: 80)
|
||||
OBSERVATORY_DISK_CRIT_PCT Disk usage critical threshold (default: 90)
|
||||
OBSERVATORY_MEM_WARN_PCT Memory usage warn threshold (default: 80)
|
||||
OBSERVATORY_MEM_CRIT_PCT Memory usage critical threshold (default: 90)
|
||||
OBSERVATORY_CPU_WARN_PCT CPU usage warn threshold (default: 80)
|
||||
OBSERVATORY_CPU_CRIT_PCT CPU usage critical threshold (default: 95)
|
||||
OBSERVATORY_WEBHOOK_URL Webhook endpoint to probe (default: http://127.0.0.1:8080/health)
|
||||
OBSERVATORY_API_URL API server health URL (default: http://127.0.0.1:8642/health)
|
||||
OBSERVATORY_WEBHOOK_LATENCY_SLO_MS Webhook latency SLO ms (default: 2000)
|
||||
OBSERVATORY_GATEWAY_UPTIME_SLO_PCT Gateway uptime SLO % (default: 99.5)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import signal
|
||||
import sqlite3
|
||||
import sys
|
||||
import time
|
||||
import urllib.request
|
||||
import urllib.error
|
||||
from contextlib import contextmanager
|
||||
from dataclasses import dataclass, field, asdict
|
||||
from datetime import datetime, timezone, timedelta
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional, Tuple
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Optional imports
|
||||
# ---------------------------------------------------------------------------
|
||||
try:
|
||||
import psutil
|
||||
_PSUTIL = True
|
||||
except ImportError:
|
||||
_PSUTIL = False
|
||||
|
||||
try:
|
||||
from dotenv import load_dotenv as _load_dotenv
|
||||
_DOTENV = True
|
||||
except ImportError:
|
||||
_DOTENV = False
|
||||
|
||||
logger = logging.getLogger("observatory")
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Constants & SLO definitions
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
RETENTION_DAYS = 30
|
||||
|
||||
SLO_DEFINITIONS = {
|
||||
"gateway_uptime_pct": {
|
||||
"description": "Gateway process uptime over the last 24 hours",
|
||||
"target": 99.5,
|
||||
"unit": "%",
|
||||
},
|
||||
"webhook_latency_ms": {
|
||||
"description": "Webhook endpoint p95 response latency",
|
||||
"target": 2000,
|
||||
"unit": "ms",
|
||||
"direction": "lower_is_better",
|
||||
},
|
||||
"api_server_latency_ms": {
|
||||
"description": "API server /health p95 response latency",
|
||||
"target": 2000,
|
||||
"unit": "ms",
|
||||
"direction": "lower_is_better",
|
||||
},
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Configuration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _load_env() -> None:
|
||||
"""Load .env from HERMES_HOME if dotenv is available."""
|
||||
if not _DOTENV:
|
||||
return
|
||||
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
|
||||
env_path = hermes_home / ".env"
|
||||
if env_path.exists():
|
||||
_load_dotenv(env_path, override=False)
|
||||
# Project-level .env as dev fallback
|
||||
project_env = Path(__file__).parent / ".env"
|
||||
if project_env.exists():
|
||||
_load_dotenv(project_env, override=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class ObservatoryConfig:
|
||||
alert_chat_id: Optional[str] = None
|
||||
digest_chat_id: Optional[str] = None
|
||||
telegram_token: Optional[str] = None
|
||||
poll_interval: int = 60
|
||||
db_path: Path = field(default_factory=lambda: Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "observatory.db")
|
||||
disk_warn_pct: float = 80.0
|
||||
disk_crit_pct: float = 90.0
|
||||
mem_warn_pct: float = 80.0
|
||||
mem_crit_pct: float = 90.0
|
||||
cpu_warn_pct: float = 80.0
|
||||
cpu_crit_pct: float = 95.0
|
||||
webhook_url: str = "http://127.0.0.1:8080/health"
|
||||
api_url: str = "http://127.0.0.1:8642/health"
|
||||
webhook_latency_slo_ms: float = 2000.0
|
||||
gateway_uptime_slo_pct: float = 99.5
|
||||
|
||||
@classmethod
|
||||
def from_env(cls) -> "ObservatoryConfig":
|
||||
_load_env()
|
||||
cfg = cls()
|
||||
cfg.telegram_token = os.getenv("TELEGRAM_BOT_TOKEN")
|
||||
cfg.alert_chat_id = os.getenv("OBSERVATORY_ALERT_CHAT_ID")
|
||||
cfg.digest_chat_id = os.getenv("OBSERVATORY_DIGEST_CHAT_ID") or cfg.alert_chat_id
|
||||
cfg.poll_interval = int(os.getenv("OBSERVATORY_POLL_INTERVAL", 60))
|
||||
db_override = os.getenv("OBSERVATORY_DB_PATH")
|
||||
if db_override:
|
||||
cfg.db_path = Path(db_override)
|
||||
cfg.disk_warn_pct = float(os.getenv("OBSERVATORY_DISK_WARN_PCT", 80))
|
||||
cfg.disk_crit_pct = float(os.getenv("OBSERVATORY_DISK_CRIT_PCT", 90))
|
||||
cfg.mem_warn_pct = float(os.getenv("OBSERVATORY_MEM_WARN_PCT", 80))
|
||||
cfg.mem_crit_pct = float(os.getenv("OBSERVATORY_MEM_CRIT_PCT", 90))
|
||||
cfg.cpu_warn_pct = float(os.getenv("OBSERVATORY_CPU_WARN_PCT", 80))
|
||||
cfg.cpu_crit_pct = float(os.getenv("OBSERVATORY_CPU_CRIT_PCT", 95))
|
||||
cfg.webhook_url = os.getenv("OBSERVATORY_WEBHOOK_URL", "http://127.0.0.1:8080/health")
|
||||
cfg.api_url = os.getenv("OBSERVATORY_API_URL", "http://127.0.0.1:8642/health")
|
||||
cfg.webhook_latency_slo_ms = float(os.getenv("OBSERVATORY_WEBHOOK_LATENCY_SLO_MS", 2000))
|
||||
cfg.gateway_uptime_slo_pct = float(os.getenv("OBSERVATORY_GATEWAY_UPTIME_SLO_PCT", 99.5))
|
||||
return cfg
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Health check models
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@dataclass
|
||||
class CheckResult:
|
||||
name: str
|
||||
status: str # "ok" | "warn" | "critical" | "error"
|
||||
message: str
|
||||
value: Optional[float] = None
|
||||
unit: Optional[str] = None
|
||||
extra: Dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
|
||||
@dataclass
|
||||
class HealthSnapshot:
|
||||
ts: str # ISO8601 UTC
|
||||
checks: List[CheckResult] = field(default_factory=list)
|
||||
|
||||
@property
|
||||
def overall_status(self) -> str:
|
||||
statuses = {c.status for c in self.checks}
|
||||
if "critical" in statuses or "error" in statuses:
|
||||
return "critical"
|
||||
if "warn" in statuses:
|
||||
return "warn"
|
||||
return "ok"
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {
|
||||
"ts": self.ts,
|
||||
"overall": self.overall_status,
|
||||
"checks": [asdict(c) for c in self.checks],
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Individual health checks
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def check_gateway_liveness() -> CheckResult:
|
||||
"""Check whether the Hermes gateway process is running."""
|
||||
try:
|
||||
from gateway.status import is_gateway_running, get_running_pid
|
||||
running = is_gateway_running()
|
||||
pid = get_running_pid()
|
||||
if running:
|
||||
return CheckResult(
|
||||
name="gateway_process",
|
||||
status="ok",
|
||||
message=f"Gateway running (pid={pid})",
|
||||
value=float(pid) if pid else None,
|
||||
)
|
||||
return CheckResult(
|
||||
name="gateway_process",
|
||||
status="critical",
|
||||
message="Gateway process is NOT running",
|
||||
)
|
||||
except Exception as exc:
|
||||
return CheckResult(
|
||||
name="gateway_process",
|
||||
status="error",
|
||||
message=f"Could not determine gateway status: {exc}",
|
||||
)
|
||||
|
||||
|
||||
def check_api_server_http(cfg: ObservatoryConfig) -> CheckResult:
|
||||
"""Check API server /health endpoint responsiveness."""
|
||||
url = cfg.api_url
|
||||
start = time.monotonic()
|
||||
try:
|
||||
req = urllib.request.Request(url, method="GET")
|
||||
req.add_header("User-Agent", "hermes-observatory/1.0")
|
||||
with urllib.request.urlopen(req, timeout=10) as resp:
|
||||
latency_ms = (time.monotonic() - start) * 1000
|
||||
body = resp.read(512).decode("utf-8", errors="replace")
|
||||
status_code = resp.status
|
||||
if status_code < 400:
|
||||
slo_ok = latency_ms <= cfg.webhook_latency_slo_ms
|
||||
return CheckResult(
|
||||
name="api_server_http",
|
||||
status="ok" if slo_ok else "warn",
|
||||
message=f"API server OK ({latency_ms:.0f}ms){'' if slo_ok else ' — exceeds latency SLO'}",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
extra={"status_code": status_code, "body_preview": body[:100]},
|
||||
)
|
||||
return CheckResult(
|
||||
name="api_server_http",
|
||||
status="critical",
|
||||
message=f"API server returned HTTP {status_code}",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
)
|
||||
except urllib.error.URLError as exc:
|
||||
latency_ms = (time.monotonic() - start) * 1000
|
||||
# Not running is acceptable if gateway is not configured for API
|
||||
reason = str(exc.reason) if hasattr(exc, "reason") else str(exc)
|
||||
if "Connection refused" in reason or "Connection reset" in reason:
|
||||
return CheckResult(
|
||||
name="api_server_http",
|
||||
status="warn",
|
||||
message=f"API server not reachable at {url} (not started?)",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
)
|
||||
return CheckResult(
|
||||
name="api_server_http",
|
||||
status="error",
|
||||
message=f"API server probe error: {exc}",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
)
|
||||
except Exception as exc:
|
||||
latency_ms = (time.monotonic() - start) * 1000
|
||||
return CheckResult(
|
||||
name="api_server_http",
|
||||
status="error",
|
||||
message=f"API server probe exception: {exc}",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
)
|
||||
|
||||
|
||||
def check_webhook_http(cfg: ObservatoryConfig) -> CheckResult:
|
||||
"""Check webhook endpoint responsiveness."""
|
||||
url = cfg.webhook_url
|
||||
start = time.monotonic()
|
||||
try:
|
||||
req = urllib.request.Request(url, method="GET")
|
||||
req.add_header("User-Agent", "hermes-observatory/1.0")
|
||||
with urllib.request.urlopen(req, timeout=10) as resp:
|
||||
latency_ms = (time.monotonic() - start) * 1000
|
||||
status_code = resp.status
|
||||
slo_ok = latency_ms <= cfg.webhook_latency_slo_ms
|
||||
if status_code < 400:
|
||||
return CheckResult(
|
||||
name="webhook_http",
|
||||
status="ok" if slo_ok else "warn",
|
||||
message=f"Webhook OK ({latency_ms:.0f}ms){'' if slo_ok else ' — exceeds latency SLO'}",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
extra={"status_code": status_code},
|
||||
)
|
||||
return CheckResult(
|
||||
name="webhook_http",
|
||||
status="critical",
|
||||
message=f"Webhook returned HTTP {status_code}",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
)
|
||||
except urllib.error.URLError as exc:
|
||||
latency_ms = (time.monotonic() - start) * 1000
|
||||
reason = str(exc.reason) if hasattr(exc, "reason") else str(exc)
|
||||
if "Connection refused" in reason or "Connection reset" in reason:
|
||||
return CheckResult(
|
||||
name="webhook_http",
|
||||
status="warn",
|
||||
message=f"Webhook not reachable at {url} (not started?)",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
)
|
||||
return CheckResult(
|
||||
name="webhook_http",
|
||||
status="error",
|
||||
message=f"Webhook probe error: {exc}",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
)
|
||||
except Exception as exc:
|
||||
latency_ms = (time.monotonic() - start) * 1000
|
||||
return CheckResult(
|
||||
name="webhook_http",
|
||||
status="error",
|
||||
message=f"Webhook probe exception: {exc}",
|
||||
value=latency_ms,
|
||||
unit="ms",
|
||||
)
|
||||
|
||||
|
||||
def check_disk(cfg: ObservatoryConfig) -> CheckResult:
|
||||
"""Check disk usage on the HERMES_HOME filesystem."""
|
||||
if not _PSUTIL:
|
||||
return CheckResult(name="disk", status="error", message="psutil not installed")
|
||||
try:
|
||||
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
|
||||
path = str(hermes_home) if hermes_home.exists() else "/"
|
||||
usage = psutil.disk_usage(path)
|
||||
pct = usage.percent
|
||||
free_gb = usage.free / (1024 ** 3)
|
||||
if pct >= cfg.disk_crit_pct:
|
||||
status = "critical"
|
||||
elif pct >= cfg.disk_warn_pct:
|
||||
status = "warn"
|
||||
else:
|
||||
status = "ok"
|
||||
return CheckResult(
|
||||
name="disk",
|
||||
status=status,
|
||||
message=f"Disk {pct:.1f}% used ({free_gb:.1f}GB free)",
|
||||
value=pct,
|
||||
unit="%",
|
||||
extra={"free_bytes": usage.free, "total_bytes": usage.total},
|
||||
)
|
||||
except Exception as exc:
|
||||
return CheckResult(name="disk", status="error", message=f"Disk check error: {exc}")
|
||||
|
||||
|
||||
def check_memory(cfg: ObservatoryConfig) -> CheckResult:
|
||||
"""Check system memory usage."""
|
||||
if not _PSUTIL:
|
||||
return CheckResult(name="memory", status="error", message="psutil not installed")
|
||||
try:
|
||||
mem = psutil.virtual_memory()
|
||||
pct = mem.percent
|
||||
available_gb = mem.available / (1024 ** 3)
|
||||
if pct >= cfg.mem_crit_pct:
|
||||
status = "critical"
|
||||
elif pct >= cfg.mem_warn_pct:
|
||||
status = "warn"
|
||||
else:
|
||||
status = "ok"
|
||||
return CheckResult(
|
||||
name="memory",
|
||||
status=status,
|
||||
message=f"Memory {pct:.1f}% used ({available_gb:.1f}GB available)",
|
||||
value=pct,
|
||||
unit="%",
|
||||
extra={"available_bytes": mem.available, "total_bytes": mem.total},
|
||||
)
|
||||
except Exception as exc:
|
||||
return CheckResult(name="memory", status="error", message=f"Memory check error: {exc}")
|
||||
|
||||
|
||||
def check_cpu(cfg: ObservatoryConfig) -> CheckResult:
|
||||
"""Check CPU usage (1-second sample)."""
|
||||
if not _PSUTIL:
|
||||
return CheckResult(name="cpu", status="error", message="psutil not installed")
|
||||
try:
|
||||
pct = psutil.cpu_percent(interval=1)
|
||||
if pct >= cfg.cpu_crit_pct:
|
||||
status = "critical"
|
||||
elif pct >= cfg.cpu_warn_pct:
|
||||
status = "warn"
|
||||
else:
|
||||
status = "ok"
|
||||
return CheckResult(
|
||||
name="cpu",
|
||||
status=status,
|
||||
message=f"CPU {pct:.1f}%",
|
||||
value=pct,
|
||||
unit="%",
|
||||
)
|
||||
except Exception as exc:
|
||||
return CheckResult(name="cpu", status="error", message=f"CPU check error: {exc}")
|
||||
|
||||
|
||||
def check_database(cfg: ObservatoryConfig) -> CheckResult:
|
||||
"""Check observatory SQLite DB connectivity and size."""
|
||||
db_path = cfg.db_path
|
||||
try:
|
||||
if not db_path.exists():
|
||||
return CheckResult(
|
||||
name="database",
|
||||
status="warn",
|
||||
message=f"Observatory DB not yet created at {db_path}",
|
||||
)
|
||||
size_kb = db_path.stat().st_size / 1024
|
||||
conn = sqlite3.connect(str(db_path), timeout=5)
|
||||
conn.execute("SELECT count(*) FROM health_snapshots").fetchone()
|
||||
conn.close()
|
||||
return CheckResult(
|
||||
name="database",
|
||||
status="ok",
|
||||
message=f"Observatory DB OK ({size_kb:.1f}KB)",
|
||||
value=size_kb,
|
||||
unit="KB",
|
||||
extra={"path": str(db_path)},
|
||||
)
|
||||
except Exception as exc:
|
||||
return CheckResult(
|
||||
name="database",
|
||||
status="error",
|
||||
message=f"DB check error: {exc}",
|
||||
)
|
||||
|
||||
|
||||
def check_response_store_db() -> CheckResult:
|
||||
"""Check the API server's SQLite response store DB if it exists."""
|
||||
try:
|
||||
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
|
||||
db_path = hermes_home / "response_store.db"
|
||||
if not db_path.exists():
|
||||
return CheckResult(
|
||||
name="response_store_db",
|
||||
status="ok",
|
||||
message="Response store DB not present (API server not yet used)",
|
||||
)
|
||||
size_kb = db_path.stat().st_size / 1024
|
||||
conn = sqlite3.connect(str(db_path), timeout=5)
|
||||
count = conn.execute("SELECT count(*) FROM responses").fetchone()[0]
|
||||
conn.close()
|
||||
return CheckResult(
|
||||
name="response_store_db",
|
||||
status="ok",
|
||||
message=f"Response store DB OK ({count} responses, {size_kb:.1f}KB)",
|
||||
value=size_kb,
|
||||
unit="KB",
|
||||
)
|
||||
except Exception as exc:
|
||||
return CheckResult(
|
||||
name="response_store_db",
|
||||
status="error",
|
||||
message=f"Response store DB error: {exc}",
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Snapshot collector
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def collect_snapshot(cfg: ObservatoryConfig) -> HealthSnapshot:
|
||||
"""Run all checks and return a HealthSnapshot."""
|
||||
ts = datetime.now(timezone.utc).isoformat()
|
||||
checks = [
|
||||
check_gateway_liveness(),
|
||||
check_api_server_http(cfg),
|
||||
check_webhook_http(cfg),
|
||||
check_disk(cfg),
|
||||
check_memory(cfg),
|
||||
check_cpu(cfg),
|
||||
check_database(cfg),
|
||||
check_response_store_db(),
|
||||
]
|
||||
return HealthSnapshot(ts=ts, checks=checks)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# SQLite persistence
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@contextmanager
|
||||
def _db(path: Path):
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
conn = sqlite3.connect(str(path), timeout=10)
|
||||
conn.execute("PRAGMA journal_mode=WAL")
|
||||
conn.execute("PRAGMA foreign_keys=ON")
|
||||
try:
|
||||
yield conn
|
||||
conn.commit()
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def _init_db(path: Path) -> None:
|
||||
"""Create tables if they don't exist."""
|
||||
with _db(path) as conn:
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS health_snapshots (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
ts TEXT NOT NULL,
|
||||
overall TEXT NOT NULL,
|
||||
payload TEXT NOT NULL
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_snapshots_ts ON health_snapshots(ts)")
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS alerts_sent (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
ts TEXT NOT NULL,
|
||||
check_name TEXT NOT NULL,
|
||||
status TEXT NOT NULL,
|
||||
message TEXT NOT NULL
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_alerts_ts ON alerts_sent(ts)")
|
||||
|
||||
|
||||
def store_snapshot(cfg: ObservatoryConfig, snapshot: HealthSnapshot) -> None:
|
||||
"""Persist snapshot to SQLite."""
|
||||
_init_db(cfg.db_path)
|
||||
payload = json.dumps(snapshot.to_dict())
|
||||
with _db(cfg.db_path) as conn:
|
||||
conn.execute(
|
||||
"INSERT INTO health_snapshots (ts, overall, payload) VALUES (?, ?, ?)",
|
||||
(snapshot.ts, snapshot.overall_status, payload),
|
||||
)
|
||||
# Prune records older than RETENTION_DAYS
|
||||
cutoff = (datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)).isoformat()
|
||||
conn.execute("DELETE FROM health_snapshots WHERE ts < ?", (cutoff,))
|
||||
|
||||
|
||||
def record_alert_sent(cfg: ObservatoryConfig, check_name: str, status: str, message: str) -> None:
|
||||
"""Record that an alert was dispatched."""
|
||||
_init_db(cfg.db_path)
|
||||
with _db(cfg.db_path) as conn:
|
||||
conn.execute(
|
||||
"INSERT INTO alerts_sent (ts, check_name, status, message) VALUES (?, ?, ?, ?)",
|
||||
(datetime.now(timezone.utc).isoformat(), check_name, status, message),
|
||||
)
|
||||
|
||||
|
||||
def load_snapshots(cfg: ObservatoryConfig, days: int = RETENTION_DAYS) -> List[Dict[str, Any]]:
|
||||
"""Load snapshots from the last N days."""
|
||||
if not cfg.db_path.exists():
|
||||
return []
|
||||
cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
|
||||
with _db(cfg.db_path) as conn:
|
||||
rows = conn.execute(
|
||||
"SELECT ts, overall, payload FROM health_snapshots WHERE ts >= ? ORDER BY ts DESC",
|
||||
(cutoff,),
|
||||
).fetchall()
|
||||
return [json.loads(row[2]) for row in rows]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Alerting
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _telegram_send(token: str, chat_id: str, text: str) -> bool:
|
||||
"""Send a Telegram message via the Bot API. Returns True on success."""
|
||||
url = f"https://api.telegram.org/bot{token}/sendMessage"
|
||||
payload = json.dumps({
|
||||
"chat_id": chat_id,
|
||||
"text": text,
|
||||
"parse_mode": "HTML",
|
||||
"disable_web_page_preview": True,
|
||||
}).encode("utf-8")
|
||||
req = urllib.request.Request(url, data=payload, method="POST")
|
||||
req.add_header("Content-Type", "application/json")
|
||||
req.add_header("User-Agent", "hermes-observatory/1.0")
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=15) as resp:
|
||||
body = json.loads(resp.read())
|
||||
return bool(body.get("ok"))
|
||||
except Exception as exc:
|
||||
logger.warning("Telegram send failed: %s", exc)
|
||||
return False
|
||||
|
||||
|
||||
def _status_emoji(status: str) -> str:
|
||||
return {"ok": "✅", "warn": "⚠️", "critical": "🔴", "error": "❌"}.get(status, "❓")
|
||||
|
||||
|
||||
def maybe_alert(cfg: ObservatoryConfig, snapshot: HealthSnapshot, prev_snapshot: Optional[HealthSnapshot]) -> List[str]:
|
||||
"""
|
||||
Fire Telegram alerts for newly degraded checks.
|
||||
Returns list of alert messages sent.
|
||||
"""
|
||||
if not cfg.telegram_token or not cfg.alert_chat_id:
|
||||
return []
|
||||
|
||||
alerts_sent = []
|
||||
prev_statuses: Dict[str, str] = {}
|
||||
if prev_snapshot:
|
||||
for c in prev_snapshot.checks:
|
||||
prev_statuses[c.name] = c.status
|
||||
|
||||
for check in snapshot.checks:
|
||||
if check.status in ("critical", "error"):
|
||||
prev = prev_statuses.get(check.name, "ok")
|
||||
if prev not in ("critical", "error"):
|
||||
# Newly degraded — alert
|
||||
emoji = _status_emoji(check.status)
|
||||
msg = (
|
||||
f"{emoji} <b>Hermes Observatory Alert</b>\n\n"
|
||||
f"<b>Check:</b> {check.name}\n"
|
||||
f"<b>Status:</b> {check.status.upper()}\n"
|
||||
f"<b>Message:</b> {check.message}\n"
|
||||
f"<b>Time:</b> {snapshot.ts}"
|
||||
)
|
||||
if _telegram_send(cfg.telegram_token, cfg.alert_chat_id, msg):
|
||||
alerts_sent.append(msg)
|
||||
record_alert_sent(cfg, check.name, check.status, check.message)
|
||||
logger.info("Alert sent for %s (%s)", check.name, check.status)
|
||||
elif check.status == "ok":
|
||||
prev = prev_statuses.get(check.name)
|
||||
if prev in ("critical", "error"):
|
||||
# Recovery alert
|
||||
msg = (
|
||||
f"✅ <b>Hermes Observatory — Recovery</b>\n\n"
|
||||
f"<b>Check:</b> {check.name} has recovered\n"
|
||||
f"<b>Message:</b> {check.message}\n"
|
||||
f"<b>Time:</b> {snapshot.ts}"
|
||||
)
|
||||
if _telegram_send(cfg.telegram_token, cfg.alert_chat_id, msg):
|
||||
alerts_sent.append(msg)
|
||||
record_alert_sent(cfg, check.name, "recovery", check.message)
|
||||
|
||||
return alerts_sent
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Daily digest
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def build_digest(cfg: ObservatoryConfig) -> str:
|
||||
"""Build a daily health digest from stored snapshots."""
|
||||
snapshots = load_snapshots(cfg, days=1)
|
||||
total = len(snapshots)
|
||||
if total == 0:
|
||||
return "No health data available for the last 24 hours."
|
||||
|
||||
# Count by overall status
|
||||
status_counts: Dict[str, int] = {"ok": 0, "warn": 0, "critical": 0, "error": 0}
|
||||
check_degraded_counts: Dict[str, int] = {}
|
||||
latencies: Dict[str, List[float]] = {}
|
||||
|
||||
for snap in snapshots:
|
||||
overall = snap.get("overall", "ok")
|
||||
status_counts[overall] = status_counts.get(overall, 0) + 1
|
||||
for check in snap.get("checks", []):
|
||||
name = check["name"]
|
||||
status = check["status"]
|
||||
if status in ("critical", "error", "warn"):
|
||||
check_degraded_counts[name] = check_degraded_counts.get(name, 0) + 1
|
||||
value = check.get("value")
|
||||
unit = check.get("unit")
|
||||
if value is not None and unit == "ms":
|
||||
if name not in latencies:
|
||||
latencies[name] = []
|
||||
latencies[name].append(float(value))
|
||||
|
||||
uptime_pct = 100.0 * status_counts["ok"] / total if total else 0.0
|
||||
now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
|
||||
|
||||
lines = [
|
||||
f"📊 <b>Hermes Observatory — Daily Digest</b>",
|
||||
f"<b>Generated:</b> {now}",
|
||||
f"",
|
||||
f"<b>Last 24h Summary</b> ({total} samples)",
|
||||
f" Healthy: {status_counts['ok']} ({100*status_counts['ok']//total if total else 0}%)",
|
||||
f" Warning: {status_counts.get('warn', 0)}",
|
||||
f" Critical: {status_counts.get('critical', 0)}",
|
||||
f" Error: {status_counts.get('error', 0)}",
|
||||
f"",
|
||||
]
|
||||
|
||||
# SLO status
|
||||
lines.append("<b>SLO Status</b>")
|
||||
gw_uptime_target = cfg.gateway_uptime_slo_pct
|
||||
gw_snapshots = [
|
||||
s for s in snapshots
|
||||
if any(c["name"] == "gateway_process" and c["status"] == "ok" for c in s.get("checks", []))
|
||||
]
|
||||
gw_uptime = 100.0 * len(gw_snapshots) / total if total else 0.0
|
||||
gw_ok = gw_uptime >= gw_uptime_target
|
||||
lines.append(
|
||||
f" {'✅' if gw_ok else '❌'} Gateway uptime: {gw_uptime:.1f}% (target: ≥{gw_uptime_target}%)"
|
||||
)
|
||||
|
||||
wh_latency_target = cfg.webhook_latency_slo_ms
|
||||
if "webhook_http" in latencies and latencies["webhook_http"]:
|
||||
wh_vals = sorted(latencies["webhook_http"])
|
||||
p95_idx = int(len(wh_vals) * 0.95)
|
||||
p95 = wh_vals[min(p95_idx, len(wh_vals) - 1)]
|
||||
wh_ok = p95 <= wh_latency_target
|
||||
lines.append(
|
||||
f" {'✅' if wh_ok else '❌'} Webhook p95 latency: {p95:.0f}ms (target: ≤{wh_latency_target:.0f}ms)"
|
||||
)
|
||||
else:
|
||||
lines.append(f" ⚫ Webhook latency: no data")
|
||||
|
||||
if "api_server_http" in latencies and latencies["api_server_http"]:
|
||||
api_vals = sorted(latencies["api_server_http"])
|
||||
p95_idx = int(len(api_vals) * 0.95)
|
||||
p95 = api_vals[min(p95_idx, len(api_vals) - 1)]
|
||||
api_ok = p95 <= wh_latency_target
|
||||
lines.append(
|
||||
f" {'✅' if api_ok else '❌'} API server p95 latency: {p95:.0f}ms (target: ≤{wh_latency_target:.0f}ms)"
|
||||
)
|
||||
|
||||
# Top degraded checks
|
||||
if check_degraded_counts:
|
||||
lines.append("")
|
||||
lines.append("<b>Degraded Checks (24h)</b>")
|
||||
for name, count in sorted(check_degraded_counts.items(), key=lambda x: -x[1]):
|
||||
pct = 100 * count // total if total else 0
|
||||
lines.append(f" • {name}: {count} incidents ({pct}%)")
|
||||
|
||||
lines.append("")
|
||||
lines.append(f"<i>Observatory DB: {cfg.db_path}</i>")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def send_digest(cfg: ObservatoryConfig) -> bool:
|
||||
"""Build and send the daily digest to Telegram. Returns True on success."""
|
||||
digest = build_digest(cfg)
|
||||
if cfg.telegram_token and cfg.digest_chat_id:
|
||||
return _telegram_send(cfg.telegram_token, cfg.digest_chat_id, digest)
|
||||
return False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Display helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_STATUS_COLORS = {
|
||||
"ok": "\033[32m", # green
|
||||
"warn": "\033[33m", # yellow
|
||||
"critical": "\033[31m", # red
|
||||
"error": "\033[91m", # bright red
|
||||
}
|
||||
_RESET = "\033[0m"
|
||||
|
||||
|
||||
def _color_status(status: str) -> str:
|
||||
c = _STATUS_COLORS.get(status, "")
|
||||
return f"{c}{status.upper()}{_RESET}"
|
||||
|
||||
|
||||
def print_snapshot(snapshot: HealthSnapshot) -> None:
|
||||
overall_color = _STATUS_COLORS.get(snapshot.overall_status, "")
|
||||
print(f"\n{'='*60}")
|
||||
print(f" Hermes Observatory — {snapshot.ts}")
|
||||
print(f" Overall: {overall_color}{snapshot.overall_status.upper()}{_RESET}")
|
||||
print(f"{'='*60}")
|
||||
for check in snapshot.checks:
|
||||
emoji = _status_emoji(check.status)
|
||||
val_str = f" [{check.value:.1f}{check.unit}]" if check.value is not None and check.unit else ""
|
||||
print(f" {emoji} {check.name:<25} {_color_status(check.status):<15} {check.message}{val_str}")
|
||||
print()
|
||||
|
||||
|
||||
def print_slo_report(cfg: ObservatoryConfig) -> None:
|
||||
"""Print current SLO definitions and targets."""
|
||||
snapshots = load_snapshots(cfg, days=30)
|
||||
total = len(snapshots)
|
||||
print(f"\n{'='*60}")
|
||||
print(" Hermes Observatory — SLO Report (last 30 days)")
|
||||
print(f"{'='*60}")
|
||||
for slo_key, slo in SLO_DEFINITIONS.items():
|
||||
print(f"\n {slo['description']}")
|
||||
print(f" Target: {slo['target']}{slo['unit']}")
|
||||
if total == 0:
|
||||
print(f" Status: no data")
|
||||
continue
|
||||
if slo_key == "gateway_uptime_pct":
|
||||
ok_count = sum(
|
||||
1 for s in snapshots
|
||||
if any(c["name"] == "gateway_process" and c["status"] == "ok"
|
||||
for c in s.get("checks", []))
|
||||
)
|
||||
actual = 100.0 * ok_count / total
|
||||
met = actual >= slo["target"]
|
||||
print(f" Actual: {actual:.2f}% {'✅ MET' if met else '❌ MISSED'}")
|
||||
elif slo_key in ("webhook_latency_ms", "api_server_http_latency_ms"):
|
||||
check_name = "webhook_http" if "webhook" in slo_key else "api_server_http"
|
||||
vals = [
|
||||
float(c["value"])
|
||||
for s in snapshots
|
||||
for c in s.get("checks", [])
|
||||
if c["name"] == check_name and c.get("value") is not None
|
||||
]
|
||||
if vals:
|
||||
vals.sort()
|
||||
p95_idx = int(len(vals) * 0.95)
|
||||
p95 = vals[min(p95_idx, len(vals) - 1)]
|
||||
met = p95 <= slo["target"]
|
||||
print(f" p95: {p95:.0f}ms {'✅ MET' if met else '❌ MISSED'}")
|
||||
else:
|
||||
print(f" Status: no latency data")
|
||||
print()
|
||||
|
||||
|
||||
def print_history(cfg: ObservatoryConfig, count: int = 20) -> None:
|
||||
"""Print recent health records."""
|
||||
snapshots = load_snapshots(cfg, days=RETENTION_DAYS)[:count]
|
||||
if not snapshots:
|
||||
print("No history available.")
|
||||
return
|
||||
print(f"\n{'='*60}")
|
||||
print(f" Last {min(count, len(snapshots))} health records")
|
||||
print(f"{'='*60}")
|
||||
for snap in snapshots:
|
||||
ts = snap.get("ts", "?")
|
||||
overall = snap.get("overall", "?")
|
||||
emoji = _status_emoji(overall)
|
||||
degraded = [c["name"] for c in snap.get("checks", []) if c["status"] != "ok"]
|
||||
degraded_str = f" — issues: {', '.join(degraded)}" if degraded else ""
|
||||
print(f" {emoji} {ts} {overall.upper()}{degraded_str}")
|
||||
print()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Daemon mode
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class Observatory:
|
||||
"""Continuous monitoring daemon."""
|
||||
|
||||
def __init__(self, cfg: ObservatoryConfig):
|
||||
self.cfg = cfg
|
||||
self._running = False
|
||||
self._prev_snapshot: Optional[HealthSnapshot] = None
|
||||
|
||||
def _handle_signal(self, signum: int, frame: Any) -> None:
|
||||
logger.info("Received signal %d, shutting down...", signum)
|
||||
self._running = False
|
||||
|
||||
def run_once(self) -> HealthSnapshot:
|
||||
snapshot = collect_snapshot(self.cfg)
|
||||
store_snapshot(self.cfg, snapshot)
|
||||
alerts = maybe_alert(self.cfg, snapshot, self._prev_snapshot)
|
||||
if alerts:
|
||||
logger.info("Sent %d alert(s)", len(alerts))
|
||||
self._prev_snapshot = snapshot
|
||||
return snapshot
|
||||
|
||||
def run(self) -> None:
|
||||
_init_db(self.cfg.db_path)
|
||||
logger.info(
|
||||
"Observatory starting — poll_interval=%ds db=%s",
|
||||
self.cfg.poll_interval,
|
||||
self.cfg.db_path,
|
||||
)
|
||||
self._running = True
|
||||
signal.signal(signal.SIGINT, self._handle_signal)
|
||||
signal.signal(signal.SIGTERM, self._handle_signal)
|
||||
|
||||
while self._running:
|
||||
try:
|
||||
snapshot = self.run_once()
|
||||
logger.info("Health check: %s", snapshot.overall_status)
|
||||
except Exception as exc:
|
||||
logger.error("Health check failed: %s", exc, exc_info=True)
|
||||
if self._running:
|
||||
time.sleep(self.cfg.poll_interval)
|
||||
|
||||
logger.info("Observatory stopped.")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CLI entry point
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def main(argv: Optional[List[str]] = None) -> int:
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Hermes Observatory — health monitoring & alerting",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
)
|
||||
parser.add_argument("--check", action="store_true", help="Run one health check and print results")
|
||||
parser.add_argument("--daemon", action="store_true", help="Run as continuous monitoring daemon")
|
||||
parser.add_argument("--digest", action="store_true", help="Print (and optionally send) daily digest")
|
||||
parser.add_argument("--history", type=int, metavar="N", help="Show last N health records")
|
||||
parser.add_argument("--slo", action="store_true", help="Print SLO report")
|
||||
parser.add_argument("--send-digest", action="store_true", help="Send daily digest via Telegram")
|
||||
parser.add_argument("--verbose", "-v", action="store_true", help="Enable verbose logging")
|
||||
|
||||
args = parser.parse_args(argv)
|
||||
|
||||
logging.basicConfig(
|
||||
level=logging.DEBUG if args.verbose else logging.INFO,
|
||||
format="%(asctime)s %(levelname)s [observatory] %(message)s",
|
||||
)
|
||||
|
||||
cfg = ObservatoryConfig.from_env()
|
||||
_init_db(cfg.db_path)
|
||||
|
||||
if args.check:
|
||||
snapshot = collect_snapshot(cfg)
|
||||
store_snapshot(cfg, snapshot)
|
||||
print_snapshot(snapshot)
|
||||
return 0 if snapshot.overall_status == "ok" else 1
|
||||
|
||||
if args.daemon:
|
||||
obs = Observatory(cfg)
|
||||
obs.run()
|
||||
return 0
|
||||
|
||||
if args.digest or args.send_digest:
|
||||
digest = build_digest(cfg)
|
||||
print(digest)
|
||||
if args.send_digest:
|
||||
ok = send_digest(cfg)
|
||||
if ok:
|
||||
print("\n[Digest sent to Telegram]")
|
||||
else:
|
||||
print("\n[Telegram send skipped — token/chat_id not configured]")
|
||||
return 0
|
||||
|
||||
if args.history is not None:
|
||||
print_history(cfg, args.history)
|
||||
return 0
|
||||
|
||||
if args.slo:
|
||||
print_slo_report(cfg)
|
||||
return 0
|
||||
|
||||
# Default: one-shot check
|
||||
snapshot = collect_snapshot(cfg)
|
||||
store_snapshot(cfg, snapshot)
|
||||
print_snapshot(snapshot)
|
||||
return 0 if snapshot.overall_status == "ok" else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
@@ -42,6 +42,7 @@ dependencies = [
|
||||
modal = ["modal>=1.0.0,<2"]
|
||||
daytona = ["daytona>=0.148.0,<1"]
|
||||
dev = ["pytest>=9.0.2,<10", "pytest-asyncio>=1.3.0,<2", "pytest-xdist>=3.0,<4", "mcp>=1.2.0,<2"]
|
||||
observatory = ["psutil>=5.9.0,<7"]
|
||||
messaging = ["python-telegram-bot>=22.6,<23", "discord.py[voice]>=2.7.1,<3", "aiohttp>=3.13.3,<4", "slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
|
||||
cron = ["croniter>=6.0.0,<7"]
|
||||
slack = ["slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
|
||||
|
||||
252
reports/ezra-quarterly-report-april-2026.md
Normal file
252
reports/ezra-quarterly-report-april-2026.md
Normal file
@@ -0,0 +1,252 @@
|
||||
# Ezra — Quarterly Technical & Strategic Report
|
||||
**April 2026**
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
This report consolidates the principal technical and strategic outputs from Q1/Q2 2026. Three major workstreams are covered:
|
||||
|
||||
1. **Security & Performance Hardening** — Shipped V-011 obfuscation detection and context-compressor tuning.
|
||||
2. **System Formalization Audit** — Identified ~6,300 lines of homegrown infrastructure that can be replaced by well-maintained open-source projects.
|
||||
3. **Business Development** — Formalized a pure-contracting go-to-market plan ("Operation Get A Job") to monetize the engineering collective.
|
||||
|
||||
---
|
||||
|
||||
## 1. Recent Deliverables
|
||||
|
||||
### 1.1 V-011 Obfuscation Bypass Detection
|
||||
|
||||
A significant security enhancement was shipped to the skills-guard subsystem to defeat obfuscated malicious skill code.
|
||||
|
||||
**Technical additions:**
|
||||
- `normalize_input()` with NFKC normalization, case folding, and zero-width character removal to defeat homoglyph and ZWSP evasion.
|
||||
- `PythonSecurityAnalyzer` AST visitor detecting `eval`/`exec`/`compile`, `getattr` dunder access, and imports of `base64`/`codecs`/`marshal`/`types`/`ctypes`.
|
||||
- Additional regex patterns for `getattr` builtins chains, `__import__` os/subprocess, and nested base64 decoding.
|
||||
- Full integration into `scan_file()`; Python files now receive both normalized regex scanning and AST-based analysis.
|
||||
|
||||
**Verification:** All tests passing (`103 passed, 4 warnings`).
|
||||
|
||||
**Reference:** Forge PR #131 — `[EPIC-999/Phase II] The Forge — V-011 obfuscation fix + compressor tuning`
|
||||
|
||||
### 1.2 Context Compressor Tuning
|
||||
|
||||
The default `protect_last_n` parameter was reduced from `20` to `5`. The previous default was overly conservative, preventing meaningful compression on long sessions. The new default preserves the five most recent conversational turns while allowing the compressor to effectively reduce token pressure.
|
||||
|
||||
A regression test was added verifying that the last five turns are never summarized away.
|
||||
|
||||
### 1.3 Burn Mode Resilience
|
||||
|
||||
The agent loop was enhanced with a configurable `burn_mode` flag that increases concurrent tool execution capacity and adds transient-failure retry logic.
|
||||
|
||||
**Changes:**
|
||||
- `max_tool_workers` increased from `8` to `16` in burn mode.
|
||||
- Expanded parallel tool coverage to include browser, vision, skill, and session-search tools.
|
||||
- Added batch timeout protection (300s in burn mode / 180s normal) to prevent hung threads from blocking the agent loop.
|
||||
- Thread-pool shutdown now uses `executor.shutdown(wait=False)` for immediate control return.
|
||||
- Transient errors (timeouts, rate limits, 502/503/504) trigger one automatic retry in burn mode.
|
||||
|
||||
---
|
||||
|
||||
## 2. System Formalization Audit
|
||||
|
||||
A comprehensive audit was performed across the `hermes-agent` codebase to identify homegrown modules that could be replaced by mature open-source alternatives. The objective is efficiency: reduce maintenance burden, leverage community expertise, and improve reliability.
|
||||
|
||||
### 2.1 Candidate Matrix
|
||||
|
||||
| Priority | Component | Lines | Current State | Proposed Replacement | Effort | ROI |
|
||||
|:--------:|-----------|------:|---------------|----------------------|:------:|:---:|
|
||||
| **P0** | MCP Client | 2,176 | Custom asyncio transport, sampling, schema translation | `mcp` (official Python SDK) | 2-3 wks | Very High |
|
||||
| **P0** | Cron Scheduler | ~1,500 | Custom JSON job store, manual tick loop | `APScheduler` | 1-2 wks | Very High |
|
||||
| **P0** | Config Management | 2,589 | Manual YAML loader, no type safety | `pydantic-settings` + Pydantic v2 | 3-4 wks | High |
|
||||
| **P1** | Checkpoint Manager | 548 | Shells out to `git` binary | `dulwich` (pure-Python git) | 1 wk | Medium-High |
|
||||
| **P1** | Auth / Credential Pool | ~3,800 | Custom JWT decode, OAuth refresh, JSON auth store | `authlib` + `keyring` + `PyJWT` | 2-3 wks | Medium |
|
||||
| **P1** | Batch Runner | 1,285 | Custom `multiprocessing.Pool` wrapper | `joblib` (local) or `celery` (distributed) | 1-2 wks | Medium |
|
||||
| **P2** | SQLite Session Store | ~2,400 | Raw SQLite + FTS5, manual schema | SQLAlchemy ORM + Alembic | 2-3 wks | Medium |
|
||||
| **P2** | Trajectory Compressor | 1,518 | Custom tokenizer + summarization pipeline | Keep core logic; add `zstandard` for binary storage | 3 days | Low-Medium |
|
||||
| **P2** | Process Registry | 889 | Custom background process tracking | Keep (adds too much ops complexity) | — | Low |
|
||||
| **P2** | Web Tools | 2,080+ | Firecrawl + Parallel wrappers | Keep (Firecrawl is already best-in-class) | — | Low |
|
||||
|
||||
### 2.2 P0 Replacements
|
||||
|
||||
#### MCP Client → Official `mcp` Python SDK
|
||||
|
||||
**Current:** `tools/mcp_tool.py` (2,176 lines) contains custom stdio/HTTP transport lifecycle, manual `anyio` cancel-scope cleanup, hand-rolled schema translation, custom sampling bridge, credential stripping, and reconnection backoff.
|
||||
|
||||
**Problem:** The Model Context Protocol is evolving rapidly. Maintaining a custom 2K-line client means every protocol revision requires manual patches. The official SDK already handles transport negotiation, lifecycle management, and type-safe schema generation.
|
||||
|
||||
**Migration Plan:**
|
||||
1. Add `mcp>=1.0.0` to dependencies.
|
||||
2. Build a thin `HermesMCPBridge` class that instantiates `mcp.ClientSession`, maps MCP `Tool` schemas to Hermes registry calls, forwards tool invocations, and preserves the sampling callback.
|
||||
3. Deprecate the `_mcp_loop` background thread and `anyio`-based transport code.
|
||||
4. Add integration tests against a test MCP server.
|
||||
|
||||
**Lines Saved:** ~1,600
|
||||
**Risk:** Medium — sampling and timeout behavior need parity testing.
|
||||
|
||||
#### Cron Scheduler → APScheduler
|
||||
|
||||
**Current:** `cron/jobs.py` (753 lines) + `cron/scheduler.py` (~740 lines) use a JSON file as the job store, custom `parse_duration` and `compute_next_run` logic, a manual tick loop, and ad-hoc delivery orchestration.
|
||||
|
||||
**Problem:** Scheduling is a solved problem. The homegrown system lacks timezone support, job concurrency controls, graceful clustering, and durable execution guarantees.
|
||||
|
||||
**Migration Plan:**
|
||||
1. Introduce `APScheduler` with a `SQLAlchemyJobStore` (or custom JSON store).
|
||||
2. Refactor each Hermes cron job into an APScheduler `Job` function.
|
||||
3. Preserve existing delivery logic (`_deliver_result`, `_build_job_prompt`, `_run_job_script`) as the job body.
|
||||
4. Migrate `jobs.json` entries into APScheduler jobs on first run.
|
||||
5. Expose `/cron` status via a thin CLI wrapper.
|
||||
|
||||
**Lines Saved:** ~700
|
||||
**Risk:** Low — delivery logic is preserved; only the trigger mechanism changes.
|
||||
|
||||
#### Config Management → `pydantic-settings`
|
||||
|
||||
**Current:** `hermes_cli/config.py` (2,589 lines) uses manual YAML parsing with hardcoded defaults, a complex migration chain (`_config_version` currently at 11), no runtime type validation, and stringly-typed env var resolution.
|
||||
|
||||
**Problem:** Every new config option requires touching multiple places. Migration logic is ~400 lines and growing. Typo'd config values are only caught at runtime, often deep in the agent loop.
|
||||
|
||||
**Migration Plan:**
|
||||
1. Define a `HermesConfig` Pydantic model with nested sections (`ModelConfig`, `ProviderConfig`, `AgentConfig`, `CompressionConfig`, etc.).
|
||||
2. Use `pydantic-settings`'s `SettingsConfigDict(yaml_file="~/.hermes/config.yaml")` to auto-load.
|
||||
3. Map env vars via `env_prefix="HERMES_"` or field-level `validation_alias`.
|
||||
4. Keep the migration layer as a one-time upgrade function, then remove it after two releases.
|
||||
5. Replace `load_config()` call sites with `HermesConfig()` instantiation.
|
||||
|
||||
**Lines Saved:** ~1,500
|
||||
**Risk:** Medium-High — large blast radius; every module reads config. Requires backward compatibility.
|
||||
|
||||
### 2.3 P1 Replacements
|
||||
|
||||
**Checkpoint Manager → `dulwich`**
|
||||
- Replace `subprocess.run(["git", ...])` calls with `dulwich.porcelain` equivalents.
|
||||
- Use `dulwich.repo.Repo.init_bare()` for shadow repos.
|
||||
- Snapshotting becomes an in-memory `Index` write + `commit()`.
|
||||
- **Lines Saved:** ~200
|
||||
- **Risk:** Low
|
||||
|
||||
**Auth / Credential Pool → `authlib` + `keyring` + `PyJWT`**
|
||||
- Use `authlib` for OAuth2 session and token refresh.
|
||||
- Replace custom JWT decoding with `PyJWT`.
|
||||
- Migrate the auth store JSON to `keyring`-backed secure storage where available.
|
||||
- Keep Hermes-specific credential pool strategies (round-robin, least-used, etc.).
|
||||
- **Lines Saved:** ~800
|
||||
- **Risk:** Medium
|
||||
|
||||
**Batch Runner → `joblib`**
|
||||
- For typical local batch sizes, `joblib.Parallel(n_jobs=-1, backend='loky')` replaces the custom worker pool.
|
||||
- Only migrate to Celery if cross-machine distribution is required.
|
||||
- **Lines Saved:** ~400
|
||||
- **Risk:** Low for `joblib`
|
||||
|
||||
### 2.4 Execution Roadmap
|
||||
|
||||
1. **Week 1-2:** Migrate Checkpoint Manager to `dulwich` (quick win, low risk)
|
||||
2. **Week 3-4:** Migrate Cron Scheduler to `APScheduler` (high value, well-contained)
|
||||
3. **Week 5-8:** Migrate MCP Client to official `mcp` SDK (highest complexity, highest payoff)
|
||||
4. **Week 9-12:** Migrate Config Management to `pydantic-settings` (largest blast radius, do last)
|
||||
5. **Ongoing:** Evaluate Auth/Credential Pool and Batch Runner replacements as follow-up epics.
|
||||
|
||||
### 2.5 Cost-Benefit Summary
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Total homebrew lines audited | ~17,000 |
|
||||
| Lines recommended for replacement | ~6,300 |
|
||||
| Estimated dev weeks (P0 + P1) | 10-14 weeks |
|
||||
| New runtime dependencies added | 4-6 well-maintained packages |
|
||||
| Maintenance burden reduction | Very High |
|
||||
| Risk level | Medium (mitigated by strong test coverage) |
|
||||
|
||||
---
|
||||
|
||||
## 3. Strategic Initiative: Operation Get A Job
|
||||
|
||||
### 3.1 Thesis
|
||||
|
||||
The engineering collective is capable of 10x delivery velocity compared to typical market offerings. The strategic opportunity is to monetize this capability through pure contracting — high-tempo, fixed-scope engagements with no exclusivity or employer-like constraints.
|
||||
|
||||
### 3.2 Service Menu
|
||||
|
||||
**Tier A — White-Glove Agent Infrastructure ($400-600/hr)**
|
||||
- Custom AI agent deployment with tool use (Slack, Discord, Telegram, webhooks)
|
||||
- MCP server development
|
||||
- Local LLM stack setup (on-premise / VPC)
|
||||
- Agent security audit and red teaming
|
||||
|
||||
**Tier B — Security Hardening & Code Review ($250-400/hr)**
|
||||
- Security backlog burn-down (CVE-class bugs)
|
||||
- Skills-guard / sandbox hardening
|
||||
- Architecture review
|
||||
|
||||
**Tier C — Automation & Integration ($150-250/hr)**
|
||||
- Webhook-to-action pipelines
|
||||
- Research and intelligence reporting
|
||||
- Content-to-code workflows
|
||||
|
||||
### 3.3 Engagement Packages
|
||||
|
||||
| Service | Description | Timeline | Investment |
|
||||
|---------|-------------|----------|------------|
|
||||
| Agent Security Audit | Review of one AI agent pipeline + written findings | 2-3 business days | $4,500 |
|
||||
| MCP Server Build | One custom MCP server with 3-5 tools + docs + tests | 1-2 weeks | $8,000 |
|
||||
| Custom Bot Deployment | End-to-end bot with up to 5 tools, deployed to client platform | 2-3 weeks | $12,000 |
|
||||
| Security Sprint | Close top 5 security issues in a Python/JS repo | 1-2 weeks | $6,500 |
|
||||
| Monthly Retainer — Core | 20 hrs/month prioritized engineering + triage | Ongoing | $6,000/mo |
|
||||
| Monthly Retainer — Scale | 40 hrs/month prioritized engineering + on-call | Ongoing | $11,000/mo |
|
||||
|
||||
### 3.4 Go-to-Market Motion
|
||||
|
||||
**Immediate channels:**
|
||||
- Cold outbound to CTOs/VPEs at Series A-C AI startups
|
||||
- LinkedIn authority content (architecture reviews, security bulletins)
|
||||
- Platform presence (Gun.io, Toptal, Upwork for specific niche keywords)
|
||||
|
||||
**Lead magnet:** Free 15-minute architecture review. No pitch. One concrete risk identified.
|
||||
|
||||
### 3.5 Infrastructure Foundation
|
||||
|
||||
The Hermes Agent framework serves as both the delivery platform and the portfolio piece:
|
||||
- Open-source runtime with ~3,000 tests
|
||||
- Gateway architecture supporting 8+ messaging platforms
|
||||
- Native MCP client, cron scheduling, subagent delegation
|
||||
- Self-hosted Forge (Gitea) with CI and automated PR review
|
||||
- Local Gemma 4 inference stack on bare metal
|
||||
|
||||
### 3.6 90-Day Revenue Model
|
||||
|
||||
| Month | Target |
|
||||
|-------|--------|
|
||||
| Month 1 | $9-12K (1x retainer or 2x audits) |
|
||||
| Month 2 | $17K (+ 1x MCP build) |
|
||||
| Month 3 | $29K (+ 1x bot deployment + new retainer) |
|
||||
|
||||
### 3.7 Immediate Action Items
|
||||
|
||||
- File Wyoming LLC and obtain EIN
|
||||
- Open Mercury business bank account
|
||||
- Secure E&O insurance
|
||||
- Update LinkedIn profile and publish first authority post
|
||||
- Customize capabilities deck and begin warm outbound
|
||||
|
||||
---
|
||||
|
||||
## 4. Fleet Status Summary
|
||||
|
||||
| House | Host | Model / Provider | Gateway Status |
|
||||
|-------|------|------------------|----------------|
|
||||
| Ezra | Hermes VPS | `kimi-for-coding` (Kimi K2.5) | API `8658`, webhook `8648` — Active |
|
||||
| Bezalel | Hermes VPS | Claude Opus 4.6 (Anthropic) | Port `8645` — Active |
|
||||
| Allegro-Primus | Hermes VPS | Kimi K2.5 | Port `8644` — Requires restart |
|
||||
| Bilbo | External | Gemma 4B (local) | Telegram dual-mode — Active |
|
||||
|
||||
**Network:** Hermes VPS public IP `143.198.27.163` (Ubuntu 24.04.3 LTS). Local Gemma 4 fallback on `127.0.0.1:11435`.
|
||||
|
||||
---
|
||||
|
||||
## 5. Conclusion
|
||||
|
||||
The codebase is in a strong position: security is hardened, the agent loop is more resilient, and a clear roadmap exists to replace high-maintenance homegrown infrastructure with battle-tested open-source projects. The commercialization strategy is formalized and ready for execution. The next critical path is the human-facing work of entity formation, sales outreach, and closing the first fixed-scope engagement.
|
||||
|
||||
Prepared by **Ezra**
|
||||
April 2026
|
||||
BIN
reports/ezra-quarterly-report-april-2026.pdf
Normal file
BIN
reports/ezra-quarterly-report-april-2026.pdf
Normal file
Binary file not shown.
371
scripts/deploy-validate
Executable file
371
scripts/deploy-validate
Executable file
@@ -0,0 +1,371 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
deploy-validate — pre-flight configuration checker for Hermes deployments.
|
||||
|
||||
Catches common configuration errors BEFORE they cause runtime failures.
|
||||
Safe to run at any time: it only reads files and makes lightweight network
|
||||
checks — it never writes state or sends messages.
|
||||
|
||||
Usage:
|
||||
python scripts/deploy-validate # validate current environment
|
||||
python scripts/deploy-validate --dry-run # alias for the same thing
|
||||
python scripts/deploy-validate --env /path/to/.env
|
||||
|
||||
Exit codes:
|
||||
0 All checks passed (or only warnings).
|
||||
1 One or more blocking errors found.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import socket
|
||||
import sys
|
||||
import urllib.error
|
||||
import urllib.request
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
RESET = "\033[0m"
|
||||
RED = "\033[91m"
|
||||
YELLOW = "\033[93m"
|
||||
GREEN = "\033[92m"
|
||||
BOLD = "\033[1m"
|
||||
|
||||
|
||||
def _color(text: str, code: str) -> str:
|
||||
if sys.stdout.isatty():
|
||||
return f"{code}{text}{RESET}"
|
||||
return text
|
||||
|
||||
|
||||
def ok(msg: str) -> None:
|
||||
print(f" {_color('✔', GREEN)} {msg}")
|
||||
|
||||
|
||||
def warn(msg: str) -> None:
|
||||
print(f" {_color('⚠', YELLOW)} {msg}")
|
||||
|
||||
|
||||
def error(msg: str) -> None:
|
||||
print(f" {_color('✘', RED)} {msg}")
|
||||
|
||||
|
||||
def section(title: str) -> None:
|
||||
print(f"\n{_color(BOLD + title, BOLD)}")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# .env loader (minimal — avoids dependency on python-dotenv for portability)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _load_env_file(path: Path) -> dict[str, str]:
|
||||
"""Parse a .env file and return a dict of key→value pairs."""
|
||||
result: dict[str, str] = {}
|
||||
if not path.exists():
|
||||
return result
|
||||
for line in path.read_text(encoding="utf-8").splitlines():
|
||||
line = line.strip()
|
||||
if not line or line.startswith("#") or "=" not in line:
|
||||
continue
|
||||
key, _, value = line.partition("=")
|
||||
key = key.strip()
|
||||
# Strip inline comments and surrounding quotes.
|
||||
value = value.split("#")[0].strip().strip("\"'")
|
||||
if key:
|
||||
result[key] = value
|
||||
return result
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Individual checks
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def check_env_file(env_path: Path) -> dict[str, str]:
|
||||
section("Environment file")
|
||||
if not env_path.exists():
|
||||
error(f".env not found at {env_path}")
|
||||
error("Copy .env.example → .env and fill in your API keys.")
|
||||
return {}
|
||||
ok(f".env found at {env_path}")
|
||||
|
||||
raw = _load_env_file(env_path)
|
||||
|
||||
# Warn if any value looks like a placeholder.
|
||||
placeholder_patterns = ("your_", "xxxx", "changeme", "todo", "replace_me")
|
||||
for key, value in raw.items():
|
||||
if value and any(p in value.lower() for p in placeholder_patterns):
|
||||
warn(f"{key} looks like a placeholder: {value!r}")
|
||||
|
||||
return raw
|
||||
|
||||
|
||||
def check_llm_key(env: dict[str, str]) -> bool:
|
||||
section("LLM provider")
|
||||
providers = {
|
||||
"OPENROUTER_API_KEY": "OpenRouter",
|
||||
"ANTHROPIC_API_KEY": "Anthropic",
|
||||
"OPENAI_API_KEY": "OpenAI",
|
||||
"GLM_API_KEY": "z.ai / GLM",
|
||||
"KIMI_API_KEY": "Kimi / Moonshot",
|
||||
"MINIMAX_API_KEY": "MiniMax",
|
||||
"NOUS_API_KEY": "Nous Portal",
|
||||
"HF_TOKEN": "Hugging Face",
|
||||
"KILOCODE_API_KEY": "KiloCode",
|
||||
"OPENCODE_ZEN_API_KEY": "OpenCode Zen",
|
||||
}
|
||||
found = [name for key, name in providers.items() if env.get(key, "").strip()]
|
||||
if not found:
|
||||
error("No LLM API key detected. Set at least one (e.g. OPENROUTER_API_KEY).")
|
||||
return False
|
||||
ok(f"LLM provider key present: {', '.join(found)}")
|
||||
return True
|
||||
|
||||
|
||||
def check_hermes_home(env: dict[str, str]) -> Optional[Path]:
|
||||
section("HERMES_HOME data directory")
|
||||
raw = env.get("HERMES_HOME") or os.environ.get("HERMES_HOME") or ""
|
||||
if raw:
|
||||
home = Path(raw).expanduser()
|
||||
else:
|
||||
home = Path.home() / ".hermes"
|
||||
|
||||
if not home.exists():
|
||||
warn(f"HERMES_HOME does not exist yet: {home} (will be created on first run)")
|
||||
return home
|
||||
|
||||
ok(f"HERMES_HOME exists: {home}")
|
||||
|
||||
required_dirs = ["logs", "sessions", "cron", "memories", "skills"]
|
||||
for d in required_dirs:
|
||||
if not (home / d).is_dir():
|
||||
warn(f"Expected subdirectory missing: {home / d} (created automatically at runtime)")
|
||||
|
||||
if (home / ".env").exists():
|
||||
ok(f"Data-directory .env present: {home / '.env'}")
|
||||
else:
|
||||
warn(f"No .env in HERMES_HOME ({home}). "
|
||||
"The Docker entrypoint copies .env.example on first run; "
|
||||
"for bare-metal installs copy it manually.")
|
||||
|
||||
return home
|
||||
|
||||
|
||||
def check_gateway_platforms(env: dict[str, str]) -> None:
|
||||
section("Messaging platform tokens")
|
||||
platforms: dict[str, list[str]] = {
|
||||
"Telegram": ["TELEGRAM_BOT_TOKEN"],
|
||||
"Discord": ["DISCORD_BOT_TOKEN"],
|
||||
"Slack": ["SLACK_BOT_TOKEN", "SLACK_APP_TOKEN"],
|
||||
"WhatsApp": [], # pairing-based, no env key required
|
||||
"Email": ["EMAIL_ADDRESS", "EMAIL_PASSWORD"],
|
||||
}
|
||||
any_found = False
|
||||
for platform, keys in platforms.items():
|
||||
if not keys:
|
||||
continue # WhatsApp — no key check
|
||||
if all(env.get(k, "").strip() for k in keys):
|
||||
ok(f"{platform}: configured ({', '.join(keys)})")
|
||||
any_found = True
|
||||
if not any_found:
|
||||
warn("No messaging platform tokens found. "
|
||||
"The gateway will start but accept no inbound messages. "
|
||||
"Set at least one platform token (e.g. TELEGRAM_BOT_TOKEN).")
|
||||
|
||||
|
||||
def check_api_server_reachable(host: str = "127.0.0.1", port: int = 8642) -> None:
|
||||
section("API server health check")
|
||||
url = f"http://{host}:{port}/health"
|
||||
try:
|
||||
with urllib.request.urlopen(url, timeout=5) as resp:
|
||||
body = resp.read().decode()
|
||||
if '"status"' in body and "ok" in body:
|
||||
ok(f"API server healthy: {url}")
|
||||
else:
|
||||
warn(f"Unexpected /health response from {url}: {body[:200]}")
|
||||
except urllib.error.URLError as exc:
|
||||
# Not a failure — the server may not be running in --dry-run mode.
|
||||
warn(f"API server not reachable at {url}: {exc.reason} "
|
||||
"(expected if gateway is not running)")
|
||||
except OSError as exc:
|
||||
warn(f"API server not reachable at {url}: {exc}")
|
||||
|
||||
|
||||
def check_gateway_status(hermes_home: Optional[Path]) -> None:
|
||||
section("Gateway runtime status")
|
||||
if hermes_home is None:
|
||||
warn("HERMES_HOME unknown — skipping runtime status check.")
|
||||
return
|
||||
|
||||
state_file = hermes_home / "gateway_state.json"
|
||||
pid_file = hermes_home / "gateway.pid"
|
||||
|
||||
if not state_file.exists() and not pid_file.exists():
|
||||
warn("Gateway does not appear to be running (no PID or state file). "
|
||||
"This is expected before the first start.")
|
||||
return
|
||||
|
||||
if state_file.exists():
|
||||
import json
|
||||
try:
|
||||
state = json.loads(state_file.read_text())
|
||||
gw_state = state.get("gateway_state", "unknown")
|
||||
updated = state.get("updated_at", "?")
|
||||
if gw_state == "running":
|
||||
ok(f"Gateway state: {gw_state} (updated {updated})")
|
||||
platforms = state.get("platforms", {})
|
||||
for plat, pdata in platforms.items():
|
||||
pstate = pdata.get("state", "unknown")
|
||||
if pstate in ("connected", "running", "ok"):
|
||||
ok(f" Platform {plat}: {pstate}")
|
||||
else:
|
||||
warn(f" Platform {plat}: {pstate} — {pdata.get('error_message', '')}")
|
||||
elif gw_state in ("stopped", "startup_failed"):
|
||||
error(f"Gateway state: {gw_state} — {state.get('exit_reason', 'no reason recorded')}")
|
||||
else:
|
||||
warn(f"Gateway state: {gw_state}")
|
||||
except Exception as exc:
|
||||
warn(f"Could not parse {state_file}: {exc}")
|
||||
else:
|
||||
warn("State file missing; only PID file found. Gateway may be starting.")
|
||||
|
||||
|
||||
def check_docker_available() -> None:
|
||||
section("Docker / compose availability")
|
||||
for cmd in ("docker", "docker compose"):
|
||||
_check_command(cmd.split()[0], cmd)
|
||||
|
||||
|
||||
def _check_command(name: str, display: str) -> bool:
|
||||
import shutil
|
||||
if shutil.which(name):
|
||||
ok(f"{display} found")
|
||||
return True
|
||||
warn(f"{display} not found in PATH (only required for Docker deployments)")
|
||||
return False
|
||||
|
||||
|
||||
def check_ports_free(ports: list[int] = None) -> None:
|
||||
section("Port availability")
|
||||
if ports is None:
|
||||
ports = [8642]
|
||||
for port in ports:
|
||||
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
|
||||
s.settimeout(1)
|
||||
result = s.connect_ex(("127.0.0.1", port))
|
||||
if result == 0:
|
||||
warn(f"Port {port} is already in use. "
|
||||
"The API server will fail to bind unless you change its port.")
|
||||
else:
|
||||
ok(f"Port {port} is free")
|
||||
|
||||
|
||||
def check_no_secrets_in_repo(repo_root: Path) -> None:
|
||||
section("Secret hygiene")
|
||||
dangerous = [".env", "*.pem", "*.key", "id_rsa", "id_ed25519"]
|
||||
gitignore = repo_root / ".gitignore"
|
||||
if gitignore.exists():
|
||||
content = gitignore.read_text()
|
||||
for pattern in [".env", "*.pem", "*.key"]:
|
||||
if pattern in content or pattern.lstrip("*. ") in content:
|
||||
ok(f".gitignore covers {pattern}")
|
||||
else:
|
||||
warn(f".gitignore does not mention {pattern}. "
|
||||
"Ensure secrets are never committed.")
|
||||
else:
|
||||
warn("No .gitignore found. Secrets could accidentally be committed.")
|
||||
|
||||
# Check the env file itself isn't tracked.
|
||||
env_file = repo_root / ".env"
|
||||
if env_file.exists():
|
||||
import subprocess
|
||||
try:
|
||||
out = subprocess.run(
|
||||
["git", "ls-files", "--error-unmatch", ".env"],
|
||||
cwd=repo_root,
|
||||
capture_output=True,
|
||||
)
|
||||
if out.returncode == 0:
|
||||
error(".env IS tracked by git! Remove it immediately: git rm --cached .env")
|
||||
else:
|
||||
ok(".env is not tracked by git")
|
||||
except FileNotFoundError:
|
||||
warn("git not found — cannot verify .env tracking status")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Main
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Pre-flight configuration validator for Hermes deployments.",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog=__doc__,
|
||||
)
|
||||
parser.add_argument(
|
||||
"--dry-run", action="store_true",
|
||||
help="Alias for the default mode (no state is written regardless).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--env", metavar="PATH",
|
||||
help="Path to .env file (default: .env in repo root).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--check-ports", action="store_true",
|
||||
help="Also verify that required ports are free (useful before first start).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--skip-health", action="store_true",
|
||||
help="Skip the live /health HTTP check (use when gateway is not running).",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
print(f"\n{_color(BOLD + 'Hermes Deploy Validator', BOLD)}")
|
||||
print("=" * 50)
|
||||
|
||||
repo_root = Path(__file__).resolve().parent.parent
|
||||
env_path = Path(args.env) if args.env else repo_root / ".env"
|
||||
|
||||
errors_before = [0] # mutable sentinel
|
||||
|
||||
# Monkey-patch error() to count failures.
|
||||
_original_error = globals()["error"]
|
||||
error_count = 0
|
||||
|
||||
def counting_error(msg: str) -> None:
|
||||
nonlocal error_count
|
||||
error_count += 1
|
||||
_original_error(msg)
|
||||
|
||||
globals()["error"] = counting_error
|
||||
|
||||
# Run checks.
|
||||
env = check_env_file(env_path)
|
||||
check_no_secrets_in_repo(repo_root)
|
||||
llm_ok = check_llm_key(env)
|
||||
hermes_home = check_hermes_home(env)
|
||||
check_gateway_platforms(env)
|
||||
if args.check_ports:
|
||||
check_ports_free()
|
||||
if not args.skip_health:
|
||||
check_api_server_reachable()
|
||||
check_gateway_status(hermes_home)
|
||||
|
||||
# Summary.
|
||||
print(f"\n{'=' * 50}")
|
||||
if error_count == 0:
|
||||
print(_color(f"All checks passed (0 errors).", GREEN))
|
||||
return 0
|
||||
else:
|
||||
print(_color(f"{error_count} error(s) found. Fix them before deploying.", RED))
|
||||
return 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
261
scripts/forge_health_check.py
Executable file
261
scripts/forge_health_check.py
Executable file
@@ -0,0 +1,261 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Forge Health Check — Build verification and artifact integrity scanner.
|
||||
|
||||
Scans wizard environments for:
|
||||
- Missing source files (.pyc without .py) — Allegro finding: GOFAI source files gone
|
||||
- Burn script accumulation in /root or wizard directories
|
||||
- World-readable sensitive files (keystores, tokens, configs)
|
||||
- Missing required environment variables
|
||||
|
||||
Usage:
|
||||
python scripts/forge_health_check.py /root/wizards
|
||||
python scripts/forge_health_check.py /root/wizards --json
|
||||
python scripts/forge_health_check.py /root/wizards --fix-permissions
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import stat
|
||||
import sys
|
||||
from dataclasses import asdict, dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Iterable
|
||||
|
||||
|
||||
SENSITIVE_FILE_PATTERNS = (
|
||||
"keystore",
|
||||
"password",
|
||||
"private",
|
||||
"apikey",
|
||||
"api_key",
|
||||
"credentials",
|
||||
)
|
||||
|
||||
SENSITIVE_NAME_PREFIXES = (
|
||||
"key_",
|
||||
"keys_",
|
||||
"token_",
|
||||
"tokens_",
|
||||
"secret_",
|
||||
"secrets_",
|
||||
".env",
|
||||
"env.",
|
||||
)
|
||||
|
||||
SENSITIVE_NAME_SUFFIXES = (
|
||||
"_key",
|
||||
"_keys",
|
||||
"_token",
|
||||
"_tokens",
|
||||
"_secret",
|
||||
"_secrets",
|
||||
".key",
|
||||
".env",
|
||||
".token",
|
||||
".secret",
|
||||
)
|
||||
|
||||
SENSIBLE_PERMISSIONS = 0o600 # owner read/write only
|
||||
|
||||
REQUIRED_ENV_VARS = (
|
||||
"GITEA_URL",
|
||||
"GITEA_TOKEN",
|
||||
"GITEA_USER",
|
||||
)
|
||||
|
||||
BURN_SCRIPT_PATTERNS = (
|
||||
"burn",
|
||||
"ignite",
|
||||
"inferno",
|
||||
"scorch",
|
||||
"char",
|
||||
"blaze",
|
||||
"ember",
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class HealthFinding:
|
||||
category: str
|
||||
severity: str # critical, warning, info
|
||||
path: str
|
||||
message: str
|
||||
suggestion: str = ""
|
||||
|
||||
|
||||
@dataclass
|
||||
class HealthReport:
|
||||
target: str
|
||||
findings: list[HealthFinding] = field(default_factory=list)
|
||||
passed: bool = True
|
||||
|
||||
def add(self, finding: HealthFinding) -> None:
|
||||
self.findings.append(finding)
|
||||
if finding.severity == "critical":
|
||||
self.passed = False
|
||||
|
||||
|
||||
def scan_orphaned_bytecode(root: Path, report: HealthReport) -> None:
|
||||
"""Detect .pyc files without corresponding .py source files."""
|
||||
for pyc in root.rglob("*.pyc"):
|
||||
py = pyc.with_suffix(".py")
|
||||
if not py.exists():
|
||||
# Also check __pycache__ naming convention
|
||||
if pyc.name.startswith("__") and pyc.parent.name == "__pycache__":
|
||||
stem = pyc.stem.split(".")[0]
|
||||
py = pyc.parent.parent / f"{stem}.py"
|
||||
if not py.exists():
|
||||
report.add(
|
||||
HealthFinding(
|
||||
category="artifact_integrity",
|
||||
severity="critical",
|
||||
path=str(pyc),
|
||||
message=f"Compiled bytecode without source: {pyc}",
|
||||
suggestion="Restore missing .py source file from version control or backup",
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
def scan_burn_script_clutter(root: Path, report: HealthReport) -> None:
|
||||
"""Detect burn scripts and other temporary artifacts outside proper staging."""
|
||||
for path in root.iterdir():
|
||||
if not path.is_file():
|
||||
continue
|
||||
lower = path.name.lower()
|
||||
if any(pat in lower for pat in BURN_SCRIPT_PATTERNS):
|
||||
report.add(
|
||||
HealthFinding(
|
||||
category="deployment_hygiene",
|
||||
severity="warning",
|
||||
path=str(path),
|
||||
message=f"Burn script or temporary artifact in production path: {path.name}",
|
||||
suggestion="Archive to a burn/ or tmp/ directory, or remove if no longer needed",
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
def _is_sensitive_filename(name: str) -> bool:
|
||||
"""Check if a filename indicates it may contain secrets."""
|
||||
lower = name.lower()
|
||||
if lower == ".env.example":
|
||||
return False
|
||||
if any(pat in lower for pat in SENSITIVE_FILE_PATTERNS):
|
||||
return True
|
||||
if any(lower.startswith(pref) for pref in SENSITIVE_NAME_PREFIXES):
|
||||
return True
|
||||
if any(lower.endswith(suff) for suff in SENSITIVE_NAME_SUFFIXES):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def scan_sensitive_file_permissions(root: Path, report: HealthReport, fix: bool = False) -> None:
|
||||
"""Detect world-readable sensitive files."""
|
||||
for fpath in root.rglob("*"):
|
||||
if not fpath.is_file():
|
||||
continue
|
||||
# Skip test files — real secrets should never live in tests/
|
||||
if "/tests/" in str(fpath) or str(fpath).startswith(str(root / "tests")):
|
||||
continue
|
||||
if not _is_sensitive_filename(fpath.name):
|
||||
continue
|
||||
|
||||
try:
|
||||
mode = fpath.stat().st_mode
|
||||
except OSError:
|
||||
continue
|
||||
|
||||
# Readable by group or other
|
||||
if mode & stat.S_IRGRP or mode & stat.S_IROTH:
|
||||
was_fixed = False
|
||||
if fix:
|
||||
try:
|
||||
fpath.chmod(SENSIBLE_PERMISSIONS)
|
||||
was_fixed = True
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
report.add(
|
||||
HealthFinding(
|
||||
category="security",
|
||||
severity="critical",
|
||||
path=str(fpath),
|
||||
message=(
|
||||
f"Sensitive file world-readable: {fpath.name} "
|
||||
f"(mode={oct(mode & 0o777)})"
|
||||
),
|
||||
suggestion=(
|
||||
f"Fixed permissions to {oct(SENSIBLE_PERMISSIONS)}"
|
||||
if was_fixed
|
||||
else f"Run 'chmod {oct(SENSIBLE_PERMISSIONS)[2:]} {fpath}'"
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
def scan_environment_variables(report: HealthReport) -> None:
|
||||
"""Check for required environment variables."""
|
||||
for var in REQUIRED_ENV_VARS:
|
||||
if not os.environ.get(var):
|
||||
report.add(
|
||||
HealthFinding(
|
||||
category="configuration",
|
||||
severity="warning",
|
||||
path="$" + var,
|
||||
message=f"Required environment variable {var} is missing or empty",
|
||||
suggestion="Export the variable in your shell profile or secrets manager",
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
def run_health_check(target: Path, fix_permissions: bool = False) -> HealthReport:
|
||||
report = HealthReport(target=str(target.resolve()))
|
||||
if target.exists():
|
||||
scan_orphaned_bytecode(target, report)
|
||||
scan_burn_script_clutter(target, report)
|
||||
scan_sensitive_file_permissions(target, report, fix=fix_permissions)
|
||||
scan_environment_variables(report)
|
||||
return report
|
||||
|
||||
|
||||
def print_report(report: HealthReport) -> None:
|
||||
status = "PASS" if report.passed else "FAIL"
|
||||
print(f"Forge Health Check: {status}")
|
||||
print(f"Target: {report.target}")
|
||||
print(f"Findings: {len(report.findings)}\n")
|
||||
|
||||
by_category: dict[str, list[HealthFinding]] = {}
|
||||
for f in report.findings:
|
||||
by_category.setdefault(f.category, []).append(f)
|
||||
|
||||
for category, findings in by_category.items():
|
||||
print(f"[{category.upper()}]")
|
||||
for f in findings:
|
||||
print(f" {f.severity.upper()}: {f.message}")
|
||||
if f.suggestion:
|
||||
print(f" -> {f.suggestion}")
|
||||
print()
|
||||
|
||||
|
||||
def main(argv: list[str] | None = None) -> int:
|
||||
parser = argparse.ArgumentParser(description="Forge Health Check")
|
||||
parser.add_argument("target", nargs="?", default="/root/wizards", help="Root path to scan")
|
||||
parser.add_argument("--json", action="store_true", help="Output JSON report")
|
||||
parser.add_argument("--fix-permissions", action="store_true", help="Auto-fix file permissions")
|
||||
args = parser.parse_args(argv)
|
||||
|
||||
target = Path(args.target)
|
||||
report = run_health_check(target, fix_permissions=args.fix_permissions)
|
||||
|
||||
if args.json:
|
||||
print(json.dumps(asdict(report), indent=2))
|
||||
else:
|
||||
print_report(report)
|
||||
|
||||
return 0 if report.passed else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main())
|
||||
89
scripts/smoke_test.py
Executable file
89
scripts/smoke_test.py
Executable file
@@ -0,0 +1,89 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Forge smoke tests — fast checks that core imports resolve and entrypoints load.
|
||||
|
||||
Total runtime target: < 30 seconds.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import importlib
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# Allow running smoke test directly from repo root before pip install
|
||||
REPO_ROOT = Path(__file__).parent.parent
|
||||
sys.path.insert(0, str(REPO_ROOT))
|
||||
|
||||
CORE_MODULES = [
|
||||
"hermes_cli.config",
|
||||
"hermes_state",
|
||||
"model_tools",
|
||||
"toolsets",
|
||||
"utils",
|
||||
]
|
||||
|
||||
CLI_ENTRYPOINTS = [
|
||||
[sys.executable, "cli.py", "--help"],
|
||||
]
|
||||
|
||||
|
||||
def test_imports() -> None:
|
||||
ok = 0
|
||||
skipped = 0
|
||||
for mod in CORE_MODULES:
|
||||
try:
|
||||
importlib.import_module(mod)
|
||||
ok += 1
|
||||
except ImportError as exc:
|
||||
# If the failure is a missing third-party dependency, skip rather than fail
|
||||
# so the smoke test can run before `pip install` in bare environments.
|
||||
msg = str(exc).lower()
|
||||
if "no module named" in msg and mod.replace(".", "/") not in msg:
|
||||
print(f"SKIP: import {mod} -> missing dependency ({exc})")
|
||||
skipped += 1
|
||||
else:
|
||||
print(f"FAIL: import {mod} -> {exc}")
|
||||
sys.exit(1)
|
||||
except Exception as exc:
|
||||
print(f"FAIL: import {mod} -> {exc}")
|
||||
sys.exit(1)
|
||||
print(f"OK: {ok} core imports", end="")
|
||||
if skipped:
|
||||
print(f" ({skipped} skipped due to missing deps)")
|
||||
else:
|
||||
print()
|
||||
|
||||
|
||||
def test_cli_help() -> None:
|
||||
ok = 0
|
||||
skipped = 0
|
||||
for cmd in CLI_ENTRYPOINTS:
|
||||
result = subprocess.run(cmd, capture_output=True, timeout=30)
|
||||
if result.returncode == 0:
|
||||
ok += 1
|
||||
continue
|
||||
stderr = result.stderr.decode().lower()
|
||||
# Gracefully skip if dependencies are missing in bare environments
|
||||
if "modulenotfounderror" in stderr or "no module named" in stderr:
|
||||
print(f"SKIP: {' '.join(cmd)} -> missing dependency")
|
||||
skipped += 1
|
||||
else:
|
||||
print(f"FAIL: {' '.join(cmd)} -> {result.stderr.decode()[:200]}")
|
||||
sys.exit(1)
|
||||
print(f"OK: {ok} CLI entrypoints", end="")
|
||||
if skipped:
|
||||
print(f" ({skipped} skipped due to missing deps)")
|
||||
else:
|
||||
print()
|
||||
|
||||
|
||||
def main() -> int:
|
||||
test_imports()
|
||||
test_cli_help()
|
||||
print("Smoke tests passed.")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main())
|
||||
20
scripts/syntax_guard.py
Executable file
20
scripts/syntax_guard.py
Executable file
@@ -0,0 +1,20 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Syntax guard — compile all Python files to catch syntax errors before merge."""
|
||||
import py_compile
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
errors = []
|
||||
for p in Path(".").rglob("*.py"):
|
||||
if ".venv" in p.parts or "__pycache__" in p.parts:
|
||||
continue
|
||||
try:
|
||||
py_compile.compile(str(p), doraise=True)
|
||||
except py_compile.PyCompileError as e:
|
||||
errors.append(f"{p}: {e}")
|
||||
print(f"SYNTAX ERROR: {p}: {e}", file=sys.stderr)
|
||||
|
||||
if errors:
|
||||
print(f"\n{len(errors)} file(s) with syntax errors", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
print("All Python files compile successfully")
|
||||
175
tests/test_forge_health_check.py
Normal file
175
tests/test_forge_health_check.py
Normal file
@@ -0,0 +1,175 @@
|
||||
"""Tests for scripts/forge_health_check.py"""
|
||||
|
||||
import os
|
||||
import stat
|
||||
from pathlib import Path
|
||||
|
||||
# Import the script as a module
|
||||
import sys
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
|
||||
|
||||
from forge_health_check import (
|
||||
HealthFinding,
|
||||
HealthReport,
|
||||
_is_sensitive_filename,
|
||||
run_health_check,
|
||||
scan_burn_script_clutter,
|
||||
scan_orphaned_bytecode,
|
||||
scan_sensitive_file_permissions,
|
||||
scan_environment_variables,
|
||||
)
|
||||
|
||||
|
||||
class TestIsSensitiveFilename:
|
||||
def test_keystore_is_sensitive(self) -> None:
|
||||
assert _is_sensitive_filename("keystore.json") is True
|
||||
|
||||
def test_env_example_is_not_sensitive(self) -> None:
|
||||
assert _is_sensitive_filename(".env.example") is False
|
||||
|
||||
def test_env_file_is_sensitive(self) -> None:
|
||||
assert _is_sensitive_filename(".env") is True
|
||||
assert _is_sensitive_filename("production.env") is True
|
||||
|
||||
def test_test_file_with_key_is_not_sensitive(self) -> None:
|
||||
assert _is_sensitive_filename("test_interrupt_key_match.py") is False
|
||||
assert _is_sensitive_filename("test_api_key_providers.py") is False
|
||||
|
||||
|
||||
class TestScanOrphanedBytecode:
|
||||
def test_detects_pyc_without_py(self, tmp_path: Path) -> None:
|
||||
pyc = tmp_path / "module.pyc"
|
||||
pyc.write_bytes(b"\x00")
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_orphaned_bytecode(tmp_path, report)
|
||||
assert len(report.findings) == 1
|
||||
assert report.findings[0].category == "artifact_integrity"
|
||||
assert report.findings[0].severity == "critical"
|
||||
|
||||
def test_ignores_pyc_with_py(self, tmp_path: Path) -> None:
|
||||
(tmp_path / "module.py").write_text("pass")
|
||||
pyc = tmp_path / "module.pyc"
|
||||
pyc.write_bytes(b"\x00")
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_orphaned_bytecode(tmp_path, report)
|
||||
assert len(report.findings) == 0
|
||||
|
||||
def test_detects_pycache_orphan(self, tmp_path: Path) -> None:
|
||||
pycache = tmp_path / "__pycache__"
|
||||
pycache.mkdir()
|
||||
pyc = pycache / "module.cpython-312.pyc"
|
||||
pyc.write_bytes(b"\x00")
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_orphaned_bytecode(tmp_path, report)
|
||||
assert len(report.findings) == 1
|
||||
assert "__pycache__" in report.findings[0].path
|
||||
|
||||
|
||||
class TestScanBurnScriptClutter:
|
||||
def test_detects_burn_script(self, tmp_path: Path) -> None:
|
||||
(tmp_path / "burn_test.sh").write_text("#!/bin/bash")
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_burn_script_clutter(tmp_path, report)
|
||||
assert len(report.findings) == 1
|
||||
assert report.findings[0].category == "deployment_hygiene"
|
||||
assert report.findings[0].severity == "warning"
|
||||
|
||||
def test_ignores_regular_files(self, tmp_path: Path) -> None:
|
||||
(tmp_path / "deploy.sh").write_text("#!/bin/bash")
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_burn_script_clutter(tmp_path, report)
|
||||
assert len(report.findings) == 0
|
||||
|
||||
|
||||
class TestScanSensitiveFilePermissions:
|
||||
def test_detects_world_readable_keystore(self, tmp_path: Path) -> None:
|
||||
ks = tmp_path / "keystore.json"
|
||||
ks.write_text("{}")
|
||||
ks.chmod(0o644)
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_sensitive_file_permissions(tmp_path, report)
|
||||
assert len(report.findings) == 1
|
||||
assert report.findings[0].category == "security"
|
||||
assert report.findings[0].severity == "critical"
|
||||
assert "644" in report.findings[0].message
|
||||
|
||||
def test_auto_fixes_permissions(self, tmp_path: Path) -> None:
|
||||
ks = tmp_path / "keystore.json"
|
||||
ks.write_text("{}")
|
||||
ks.chmod(0o644)
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_sensitive_file_permissions(tmp_path, report, fix=True)
|
||||
assert len(report.findings) == 1
|
||||
assert ks.stat().st_mode & 0o777 == 0o600
|
||||
|
||||
def test_ignores_safe_permissions(self, tmp_path: Path) -> None:
|
||||
ks = tmp_path / "keystore.json"
|
||||
ks.write_text("{}")
|
||||
ks.chmod(0o600)
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_sensitive_file_permissions(tmp_path, report)
|
||||
assert len(report.findings) == 0
|
||||
|
||||
def test_ignores_env_example(self, tmp_path: Path) -> None:
|
||||
env = tmp_path / ".env.example"
|
||||
env.write_text("# example")
|
||||
env.chmod(0o644)
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_sensitive_file_permissions(tmp_path, report)
|
||||
assert len(report.findings) == 0
|
||||
|
||||
def test_ignores_test_directory(self, tmp_path: Path) -> None:
|
||||
tests_dir = tmp_path / "tests"
|
||||
tests_dir.mkdir()
|
||||
ks = tests_dir / "keystore.json"
|
||||
ks.write_text("{}")
|
||||
ks.chmod(0o644)
|
||||
report = HealthReport(target=str(tmp_path))
|
||||
scan_sensitive_file_permissions(tmp_path, report)
|
||||
assert len(report.findings) == 0
|
||||
|
||||
|
||||
class TestScanEnvironmentVariables:
|
||||
def test_reports_missing_env_var(self, monkeypatch) -> None:
|
||||
monkeypatch.delenv("GITEA_TOKEN", raising=False)
|
||||
report = HealthReport(target=".")
|
||||
scan_environment_variables(report)
|
||||
missing = [f for f in report.findings if f.path == "$GITEA_TOKEN"]
|
||||
assert len(missing) == 1
|
||||
assert missing[0].severity == "warning"
|
||||
|
||||
def test_passes_when_env_vars_present(self, monkeypatch) -> None:
|
||||
for var in ("GITEA_URL", "GITEA_TOKEN", "GITEA_USER"):
|
||||
monkeypatch.setenv(var, "present")
|
||||
report = HealthReport(target=".")
|
||||
scan_environment_variables(report)
|
||||
assert len(report.findings) == 0
|
||||
|
||||
|
||||
class TestRunHealthCheck:
|
||||
def test_full_run(self, tmp_path: Path, monkeypatch) -> None:
|
||||
monkeypatch.setenv("GITEA_URL", "https://example.com")
|
||||
monkeypatch.setenv("GITEA_TOKEN", "secret")
|
||||
monkeypatch.setenv("GITEA_USER", "bezalel")
|
||||
|
||||
(tmp_path / "orphan.pyc").write_bytes(b"\x00")
|
||||
(tmp_path / "burn_it.sh").write_text("#!/bin/bash")
|
||||
ks = tmp_path / "keystore.json"
|
||||
ks.write_text("{}")
|
||||
ks.chmod(0o644)
|
||||
|
||||
report = run_health_check(tmp_path)
|
||||
assert not report.passed
|
||||
categories = {f.category for f in report.findings}
|
||||
assert "artifact_integrity" in categories
|
||||
assert "deployment_hygiene" in categories
|
||||
assert "security" in categories
|
||||
|
||||
def test_clean_run_passes(self, tmp_path: Path, monkeypatch) -> None:
|
||||
for var in ("GITEA_URL", "GITEA_TOKEN", "GITEA_USER"):
|
||||
monkeypatch.setenv(var, "present")
|
||||
|
||||
(tmp_path / "module.py").write_text("pass")
|
||||
report = run_health_check(tmp_path)
|
||||
assert report.passed
|
||||
assert len(report.findings) == 0
|
||||
18
tests/test_green_path_e2e.py
Normal file
18
tests/test_green_path_e2e.py
Normal file
@@ -0,0 +1,18 @@
|
||||
"""Bare green-path E2E — one happy-path tool call cycle.
|
||||
|
||||
Exercises the terminal tool directly and verifies the response structure.
|
||||
No API keys required. Runtime target: < 10 seconds.
|
||||
"""
|
||||
|
||||
import json
|
||||
|
||||
from tools.terminal_tool import terminal_tool
|
||||
|
||||
|
||||
def test_terminal_echo_green_path() -> None:
|
||||
"""terminal('echo hello') -> verify response contains 'hello' and exit_code 0."""
|
||||
result = terminal_tool(command="echo hello", timeout=10)
|
||||
data = json.loads(result)
|
||||
|
||||
assert data["exit_code"] == 0, f"Expected exit_code 0, got {data['exit_code']}"
|
||||
assert "hello" in data["output"], f"Expected 'hello' in output, got: {data['output']}"
|
||||
455
tests/test_observatory.py
Normal file
455
tests/test_observatory.py
Normal file
@@ -0,0 +1,455 @@
|
||||
"""
|
||||
Tests for observatory.py — health monitoring & alerting.
|
||||
|
||||
Refs #147
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import sqlite3
|
||||
import sys
|
||||
import tempfile
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
PROJECT_ROOT = Path(__file__).parent.parent
|
||||
if str(PROJECT_ROOT) not in sys.path:
|
||||
sys.path.insert(0, str(PROJECT_ROOT))
|
||||
|
||||
import observatory as obs
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Fixtures
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@pytest.fixture
|
||||
def cfg(tmp_path):
|
||||
"""Return an ObservatoryConfig pointing at a temp directory."""
|
||||
cfg = obs.ObservatoryConfig()
|
||||
cfg.db_path = tmp_path / "observatory.db"
|
||||
cfg.alert_chat_id = "99999"
|
||||
cfg.digest_chat_id = "99999"
|
||||
cfg.telegram_token = "fake-token"
|
||||
cfg.webhook_url = "http://127.0.0.1:19999/health" # port never bound
|
||||
cfg.api_url = "http://127.0.0.1:19998/health"
|
||||
return cfg
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Config tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestObservatoryConfig:
|
||||
def test_defaults(self):
|
||||
c = obs.ObservatoryConfig()
|
||||
assert c.disk_warn_pct == 80.0
|
||||
assert c.disk_crit_pct == 90.0
|
||||
assert c.mem_warn_pct == 80.0
|
||||
assert c.mem_crit_pct == 90.0
|
||||
assert c.cpu_warn_pct == 80.0
|
||||
assert c.cpu_crit_pct == 95.0
|
||||
assert c.poll_interval == 60
|
||||
assert c.webhook_latency_slo_ms == 2000.0
|
||||
assert c.gateway_uptime_slo_pct == 99.5
|
||||
|
||||
def test_from_env_overrides(self, monkeypatch):
|
||||
monkeypatch.setenv("OBSERVATORY_DISK_WARN_PCT", "70")
|
||||
monkeypatch.setenv("OBSERVATORY_POLL_INTERVAL", "30")
|
||||
monkeypatch.setenv("OBSERVATORY_ALERT_CHAT_ID", "12345")
|
||||
monkeypatch.setenv("TELEGRAM_BOT_TOKEN", "tok123")
|
||||
c = obs.ObservatoryConfig.from_env()
|
||||
assert c.disk_warn_pct == 70.0
|
||||
assert c.poll_interval == 30
|
||||
assert c.alert_chat_id == "12345"
|
||||
assert c.telegram_token == "tok123"
|
||||
|
||||
def test_digest_chat_falls_back_to_alert(self, monkeypatch):
|
||||
monkeypatch.setenv("OBSERVATORY_ALERT_CHAT_ID", "abc")
|
||||
monkeypatch.delenv("OBSERVATORY_DIGEST_CHAT_ID", raising=False)
|
||||
c = obs.ObservatoryConfig.from_env()
|
||||
assert c.digest_chat_id == "abc"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CheckResult / HealthSnapshot tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestHealthSnapshot:
|
||||
def _make_snapshot(self, statuses):
|
||||
checks = [obs.CheckResult(name=f"c{i}", status=s, message="") for i, s in enumerate(statuses)]
|
||||
return obs.HealthSnapshot(ts="2026-01-01T00:00:00+00:00", checks=checks)
|
||||
|
||||
def test_overall_ok(self):
|
||||
snap = self._make_snapshot(["ok", "ok"])
|
||||
assert snap.overall_status == "ok"
|
||||
|
||||
def test_overall_warn(self):
|
||||
snap = self._make_snapshot(["ok", "warn"])
|
||||
assert snap.overall_status == "warn"
|
||||
|
||||
def test_overall_critical(self):
|
||||
snap = self._make_snapshot(["ok", "warn", "critical"])
|
||||
assert snap.overall_status == "critical"
|
||||
|
||||
def test_overall_error(self):
|
||||
snap = self._make_snapshot(["ok", "error"])
|
||||
assert snap.overall_status == "critical"
|
||||
|
||||
def test_to_dict(self):
|
||||
snap = self._make_snapshot(["ok"])
|
||||
d = snap.to_dict()
|
||||
assert d["overall"] == "ok"
|
||||
assert isinstance(d["checks"], list)
|
||||
assert d["checks"][0]["name"] == "c0"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Individual check tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestCheckGatewayLiveness:
|
||||
def test_running(self):
|
||||
with patch("gateway.status.is_gateway_running", return_value=True), \
|
||||
patch("gateway.status.get_running_pid", return_value=12345):
|
||||
result = obs.check_gateway_liveness()
|
||||
assert result.status == "ok"
|
||||
assert "12345" in result.message
|
||||
|
||||
def test_not_running(self):
|
||||
with patch("gateway.status.is_gateway_running", return_value=False), \
|
||||
patch("gateway.status.get_running_pid", return_value=None):
|
||||
result = obs.check_gateway_liveness()
|
||||
assert result.status == "critical"
|
||||
|
||||
def test_import_error(self):
|
||||
import builtins
|
||||
real_import = builtins.__import__
|
||||
|
||||
def mock_import(name, *args, **kwargs):
|
||||
if name == "gateway.status":
|
||||
raise ImportError("no module")
|
||||
return real_import(name, *args, **kwargs)
|
||||
|
||||
with patch("builtins.__import__", side_effect=mock_import):
|
||||
result = obs.check_gateway_liveness()
|
||||
assert result.status in ("error", "critical", "ok") # graceful
|
||||
|
||||
|
||||
class TestCheckDisk:
|
||||
def test_ok(self, cfg):
|
||||
mock_usage = MagicMock()
|
||||
mock_usage.percent = 50.0
|
||||
mock_usage.free = 10 * 1024 ** 3
|
||||
mock_usage.total = 20 * 1024 ** 3
|
||||
with patch("psutil.disk_usage", return_value=mock_usage):
|
||||
result = obs.check_disk(cfg)
|
||||
assert result.status == "ok"
|
||||
assert result.value == 50.0
|
||||
|
||||
def test_warn(self, cfg):
|
||||
mock_usage = MagicMock()
|
||||
mock_usage.percent = 85.0
|
||||
mock_usage.free = 3 * 1024 ** 3
|
||||
mock_usage.total = 20 * 1024 ** 3
|
||||
with patch("psutil.disk_usage", return_value=mock_usage):
|
||||
result = obs.check_disk(cfg)
|
||||
assert result.status == "warn"
|
||||
|
||||
def test_critical(self, cfg):
|
||||
mock_usage = MagicMock()
|
||||
mock_usage.percent = 92.0
|
||||
mock_usage.free = 1 * 1024 ** 3
|
||||
mock_usage.total = 20 * 1024 ** 3
|
||||
with patch("psutil.disk_usage", return_value=mock_usage):
|
||||
result = obs.check_disk(cfg)
|
||||
assert result.status == "critical"
|
||||
|
||||
def test_no_psutil(self, cfg, monkeypatch):
|
||||
monkeypatch.setattr(obs, "_PSUTIL", False)
|
||||
result = obs.check_disk(cfg)
|
||||
assert result.status == "error"
|
||||
|
||||
|
||||
class TestCheckMemory:
|
||||
def test_ok(self, cfg):
|
||||
mock_mem = MagicMock()
|
||||
mock_mem.percent = 60.0
|
||||
mock_mem.available = 4 * 1024 ** 3
|
||||
mock_mem.total = 16 * 1024 ** 3
|
||||
with patch("psutil.virtual_memory", return_value=mock_mem):
|
||||
result = obs.check_memory(cfg)
|
||||
assert result.status == "ok"
|
||||
|
||||
def test_critical(self, cfg):
|
||||
mock_mem = MagicMock()
|
||||
mock_mem.percent = 95.0
|
||||
mock_mem.available = 512 * 1024 ** 2
|
||||
mock_mem.total = 16 * 1024 ** 3
|
||||
with patch("psutil.virtual_memory", return_value=mock_mem):
|
||||
result = obs.check_memory(cfg)
|
||||
assert result.status == "critical"
|
||||
|
||||
|
||||
class TestCheckCPU:
|
||||
def test_ok(self, cfg):
|
||||
with patch("psutil.cpu_percent", return_value=40.0):
|
||||
result = obs.check_cpu(cfg)
|
||||
assert result.status == "ok"
|
||||
|
||||
def test_warn(self, cfg):
|
||||
with patch("psutil.cpu_percent", return_value=85.0):
|
||||
result = obs.check_cpu(cfg)
|
||||
assert result.status == "warn"
|
||||
|
||||
def test_critical(self, cfg):
|
||||
with patch("psutil.cpu_percent", return_value=98.0):
|
||||
result = obs.check_cpu(cfg)
|
||||
assert result.status == "critical"
|
||||
|
||||
|
||||
class TestCheckDatabase:
|
||||
def test_ok(self, cfg):
|
||||
obs._init_db(cfg.db_path)
|
||||
result = obs.check_database(cfg)
|
||||
assert result.status == "ok"
|
||||
|
||||
def test_not_yet_created(self, cfg):
|
||||
# db_path does not exist
|
||||
result = obs.check_database(cfg)
|
||||
assert result.status == "warn"
|
||||
|
||||
|
||||
class TestCheckHTTP:
|
||||
def test_webhook_connection_refused(self, cfg):
|
||||
result = obs.check_webhook_http(cfg)
|
||||
# Port 19999 is not bound — should get a "not reachable" warn
|
||||
assert result.status in ("warn", "error")
|
||||
|
||||
def test_api_server_connection_refused(self, cfg):
|
||||
result = obs.check_api_server_http(cfg)
|
||||
assert result.status in ("warn", "error")
|
||||
|
||||
def test_webhook_ok(self, cfg):
|
||||
import urllib.error
|
||||
from unittest.mock import patch, MagicMock
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.__enter__ = lambda s: s
|
||||
mock_resp.__exit__ = MagicMock(return_value=False)
|
||||
mock_resp.status = 200
|
||||
mock_resp.read.return_value = b'{"status":"ok"}'
|
||||
with patch("urllib.request.urlopen", return_value=mock_resp):
|
||||
result = obs.check_webhook_http(cfg)
|
||||
assert result.status in ("ok", "warn")
|
||||
|
||||
def test_webhook_http_error(self, cfg):
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.__enter__ = lambda s: s
|
||||
mock_resp.__exit__ = MagicMock(return_value=False)
|
||||
mock_resp.status = 503
|
||||
with patch("urllib.request.urlopen", return_value=mock_resp):
|
||||
result = obs.check_webhook_http(cfg)
|
||||
assert result.status == "critical"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Persistence tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestPersistence:
|
||||
def test_store_and_load(self, cfg):
|
||||
obs._init_db(cfg.db_path)
|
||||
from datetime import datetime, timezone
|
||||
ts = datetime.now(timezone.utc).isoformat()
|
||||
snap = obs.HealthSnapshot(
|
||||
ts=ts,
|
||||
checks=[obs.CheckResult(name="test", status="ok", message="fine")],
|
||||
)
|
||||
obs.store_snapshot(cfg, snap)
|
||||
loaded = obs.load_snapshots(cfg, days=30)
|
||||
assert len(loaded) == 1
|
||||
assert loaded[0]["overall"] == "ok"
|
||||
|
||||
def test_retention_pruning(self, cfg):
|
||||
obs._init_db(cfg.db_path)
|
||||
# Insert an old record directly
|
||||
with obs._db(cfg.db_path) as conn:
|
||||
conn.execute(
|
||||
"INSERT INTO health_snapshots (ts, overall, payload) VALUES (?, ?, ?)",
|
||||
("2000-01-01T00:00:00+00:00", "ok", '{"ts":"2000-01-01T00:00:00+00:00","overall":"ok","checks":[]}'),
|
||||
)
|
||||
snap = obs.HealthSnapshot(
|
||||
ts="2026-01-01T00:00:00+00:00",
|
||||
checks=[],
|
||||
)
|
||||
obs.store_snapshot(cfg, snap)
|
||||
# Old record should have been pruned
|
||||
with obs._db(cfg.db_path) as conn:
|
||||
count = conn.execute("SELECT count(*) FROM health_snapshots WHERE ts < '2001-01-01'").fetchone()[0]
|
||||
assert count == 0
|
||||
|
||||
def test_record_alert_sent(self, cfg):
|
||||
obs._init_db(cfg.db_path)
|
||||
obs.record_alert_sent(cfg, "gateway_process", "critical", "not running")
|
||||
with obs._db(cfg.db_path) as conn:
|
||||
count = conn.execute("SELECT count(*) FROM alerts_sent").fetchone()[0]
|
||||
assert count == 1
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Alerting tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestAlerting:
|
||||
def _snap(self, status):
|
||||
return obs.HealthSnapshot(
|
||||
ts="2026-01-01T00:00:00+00:00",
|
||||
checks=[obs.CheckResult(name="gateway_process", status=status, message="test")],
|
||||
)
|
||||
|
||||
def test_no_alert_when_ok(self, cfg):
|
||||
snap = self._snap("ok")
|
||||
prev = self._snap("ok")
|
||||
obs._init_db(cfg.db_path)
|
||||
with patch("observatory._telegram_send", return_value=True) as mock_send:
|
||||
alerts = obs.maybe_alert(cfg, snap, prev)
|
||||
mock_send.assert_not_called()
|
||||
assert alerts == []
|
||||
|
||||
def test_alert_on_new_critical(self, cfg):
|
||||
snap = self._snap("critical")
|
||||
prev = self._snap("ok")
|
||||
obs._init_db(cfg.db_path)
|
||||
with patch("observatory._telegram_send", return_value=True) as mock_send:
|
||||
alerts = obs.maybe_alert(cfg, snap, prev)
|
||||
mock_send.assert_called_once()
|
||||
assert len(alerts) == 1
|
||||
|
||||
def test_no_duplicate_alert(self, cfg):
|
||||
snap = self._snap("critical")
|
||||
prev = self._snap("critical") # already critical
|
||||
obs._init_db(cfg.db_path)
|
||||
with patch("observatory._telegram_send", return_value=True) as mock_send:
|
||||
alerts = obs.maybe_alert(cfg, snap, prev)
|
||||
mock_send.assert_not_called()
|
||||
assert alerts == []
|
||||
|
||||
def test_recovery_alert(self, cfg):
|
||||
snap = self._snap("ok")
|
||||
prev = self._snap("critical")
|
||||
obs._init_db(cfg.db_path)
|
||||
with patch("observatory._telegram_send", return_value=True) as mock_send:
|
||||
alerts = obs.maybe_alert(cfg, snap, prev)
|
||||
mock_send.assert_called_once()
|
||||
|
||||
def test_no_alert_without_token(self, cfg):
|
||||
cfg.telegram_token = None
|
||||
snap = self._snap("critical")
|
||||
obs._init_db(cfg.db_path)
|
||||
alerts = obs.maybe_alert(cfg, snap, None)
|
||||
assert alerts == []
|
||||
|
||||
def test_no_alert_without_chat_id(self, cfg):
|
||||
cfg.alert_chat_id = None
|
||||
snap = self._snap("critical")
|
||||
obs._init_db(cfg.db_path)
|
||||
alerts = obs.maybe_alert(cfg, snap, None)
|
||||
assert alerts == []
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Digest tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestDigest:
|
||||
def test_empty_digest(self, cfg):
|
||||
obs._init_db(cfg.db_path)
|
||||
digest = obs.build_digest(cfg)
|
||||
assert "no health data" in digest.lower() or "24 hours" in digest.lower()
|
||||
|
||||
def test_digest_with_data(self, cfg):
|
||||
obs._init_db(cfg.db_path)
|
||||
from datetime import datetime, timezone, timedelta
|
||||
ts = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
|
||||
snap = obs.HealthSnapshot(
|
||||
ts=ts,
|
||||
checks=[
|
||||
obs.CheckResult(name="gateway_process", status="ok", message="running"),
|
||||
obs.CheckResult(name="disk", status="ok", message="50% used", value=50.0, unit="%"),
|
||||
obs.CheckResult(name="webhook_http", status="ok", message="ok", value=150.0, unit="ms"),
|
||||
],
|
||||
)
|
||||
obs.store_snapshot(cfg, snap)
|
||||
digest = obs.build_digest(cfg)
|
||||
assert "Daily Digest" in digest
|
||||
assert "Gateway" in digest or "gateway" in digest
|
||||
|
||||
def test_send_digest_no_token(self, cfg):
|
||||
cfg.telegram_token = None
|
||||
obs._init_db(cfg.db_path)
|
||||
result = obs.send_digest(cfg)
|
||||
assert result is False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# SLO tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestSLO:
|
||||
def test_slo_definitions_complete(self):
|
||||
assert "gateway_uptime_pct" in obs.SLO_DEFINITIONS
|
||||
assert "webhook_latency_ms" in obs.SLO_DEFINITIONS
|
||||
assert "api_server_latency_ms" in obs.SLO_DEFINITIONS
|
||||
|
||||
def test_slo_targets(self):
|
||||
assert obs.SLO_DEFINITIONS["gateway_uptime_pct"]["target"] == 99.5
|
||||
assert obs.SLO_DEFINITIONS["webhook_latency_ms"]["target"] == 2000
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CLI tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestCLI:
|
||||
def test_check_exits_0_on_ok(self, cfg, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("OBSERVATORY_DB_PATH", str(tmp_path / "obs.db"))
|
||||
ok_snap = obs.HealthSnapshot(
|
||||
ts="2026-01-01T00:00:00+00:00",
|
||||
checks=[obs.CheckResult(name="all_good", status="ok", message="fine")],
|
||||
)
|
||||
with patch("observatory.collect_snapshot", return_value=ok_snap), \
|
||||
patch("observatory.store_snapshot"):
|
||||
rc = obs.main(["--check"])
|
||||
assert rc == 0
|
||||
|
||||
def test_check_exits_nonzero_on_critical(self, cfg, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("OBSERVATORY_DB_PATH", str(tmp_path / "obs.db"))
|
||||
bad_snap = obs.HealthSnapshot(
|
||||
ts="2026-01-01T00:00:00+00:00",
|
||||
checks=[obs.CheckResult(name="gateway_process", status="critical", message="down")],
|
||||
)
|
||||
with patch("observatory.collect_snapshot", return_value=bad_snap), \
|
||||
patch("observatory.store_snapshot"):
|
||||
rc = obs.main(["--check"])
|
||||
assert rc != 0
|
||||
|
||||
def test_digest_flag(self, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("OBSERVATORY_DB_PATH", str(tmp_path / "obs.db"))
|
||||
rc = obs.main(["--digest"])
|
||||
assert rc == 0
|
||||
|
||||
def test_slo_flag(self, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("OBSERVATORY_DB_PATH", str(tmp_path / "obs.db"))
|
||||
rc = obs.main(["--slo"])
|
||||
assert rc == 0
|
||||
|
||||
def test_history_flag(self, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("OBSERVATORY_DB_PATH", str(tmp_path / "obs.db"))
|
||||
rc = obs.main(["--history", "5"])
|
||||
assert rc == 0
|
||||
215
wizard-bootstrap/FORGE_OPERATIONS_GUIDE.md
Normal file
215
wizard-bootstrap/FORGE_OPERATIONS_GUIDE.md
Normal file
@@ -0,0 +1,215 @@
|
||||
# Forge Operations Guide
|
||||
|
||||
> **Audience:** Forge wizards joining the hermes-agent project
|
||||
> **Purpose:** Practical patterns, common pitfalls, and operational wisdom
|
||||
> **Companion to:** `WIZARD_ENVIRONMENT_CONTRACT.md`
|
||||
|
||||
---
|
||||
|
||||
## The One Rule
|
||||
|
||||
**Read the actual state before acting.**
|
||||
|
||||
Before touching any service, config, or codebase: `ps aux | grep hermes`, `cat ~/.hermes/gateway_state.json`, `curl http://127.0.0.1:8642/health`. The forge punishes assumptions harder than it rewards speed. Evidence always beats intuition.
|
||||
|
||||
---
|
||||
|
||||
## First 15 Minutes on a New System
|
||||
|
||||
```bash
|
||||
# 1. Validate your environment
|
||||
python wizard-bootstrap/wizard_bootstrap.py
|
||||
|
||||
# 2. Check what is actually running
|
||||
ps aux | grep -E 'hermes|python|gateway'
|
||||
|
||||
# 3. Check the data directory
|
||||
ls -la ~/.hermes/
|
||||
cat ~/.hermes/gateway_state.json 2>/dev/null | python3 -m json.tool
|
||||
|
||||
# 4. Verify health endpoints (if gateway is up)
|
||||
curl -sf http://127.0.0.1:8642/health | python3 -m json.tool
|
||||
|
||||
# 5. Run the smoke test
|
||||
source venv/bin/activate
|
||||
python -m pytest tests/ -q -x --timeout=60 2>&1 | tail -20
|
||||
```
|
||||
|
||||
Do not begin work until all five steps return clean output.
|
||||
|
||||
---
|
||||
|
||||
## Import Chain — Know It, Respect It
|
||||
|
||||
The dependency order is load-bearing. Violating it causes silent failures:
|
||||
|
||||
```
|
||||
tools/registry.py ← no deps; imported by everything
|
||||
↑
|
||||
tools/*.py ← each calls registry.register() at import time
|
||||
↑
|
||||
model_tools.py ← imports registry; triggers tool discovery
|
||||
↑
|
||||
run_agent.py / cli.py / batch_runner.py
|
||||
```
|
||||
|
||||
**If you add a tool file**, you must also:
|
||||
1. Add its import to `model_tools.py` `_discover_tools()`
|
||||
2. Add it to `toolsets.py` (core or a named toolset)
|
||||
|
||||
Missing either step causes the tool to silently not appear — no error, just absence.
|
||||
|
||||
---
|
||||
|
||||
## The Five Profile Rules
|
||||
|
||||
Hermes supports isolated profiles (`hermes -p myprofile`). Profile-unsafe code has caused repeated bugs. Memorize these:
|
||||
|
||||
| Do this | Not this |
|
||||
|---------|----------|
|
||||
| `get_hermes_home()` | `Path.home() / ".hermes"` |
|
||||
| `display_hermes_home()` in user messages | hardcoded `~/.hermes` strings |
|
||||
| `get_hermes_home() / "sessions"` in tests | `~/.hermes/sessions` in tests |
|
||||
|
||||
Import both from `hermes_constants`. Every `~/.hermes` hardcode is a latent profile bug.
|
||||
|
||||
---
|
||||
|
||||
## Prompt Caching — Do Not Break It
|
||||
|
||||
The agent caches system prompts. Cache breaks force re-billing of the entire context window on every turn. The following actions break caching mid-conversation and are forbidden:
|
||||
|
||||
- Altering past context
|
||||
- Changing the active toolset
|
||||
- Reloading memories or rebuilding the system prompt
|
||||
|
||||
The only sanctioned context alteration is the context compressor (`agent/context_compressor.py`). If your feature touches the message history, read that file first.
|
||||
|
||||
---
|
||||
|
||||
## Adding a Slash Command (Checklist)
|
||||
|
||||
Four files, in order:
|
||||
|
||||
1. **`hermes_cli/commands.py`** — add `CommandDef` to `COMMAND_REGISTRY`
|
||||
2. **`cli.py`** — add handler branch in `HermesCLI.process_command()`
|
||||
3. **`gateway/run.py`** — add handler if it should work in messaging platforms
|
||||
4. **Aliases** — add to the `aliases` tuple on the `CommandDef`; everything else updates automatically
|
||||
|
||||
All downstream consumers (Telegram menu, Slack routing, autocomplete, help text) derive from `COMMAND_REGISTRY`. You never touch them directly.
|
||||
|
||||
---
|
||||
|
||||
## Tool Schema Pitfalls
|
||||
|
||||
**Do NOT cross-reference other toolsets in schema descriptions.**
|
||||
Writing "prefer `web_search` over this tool" in a browser tool's description will cause the model to hallucinate calls to `web_search` when it's not loaded. Cross-references belong in `get_tool_definitions()` post-processing blocks in `model_tools.py`.
|
||||
|
||||
**Do NOT use `\033[K` (ANSI erase-to-EOL) in display code.**
|
||||
Under `prompt_toolkit`'s `patch_stdout`, it leaks as literal `?[K`. Use space-padding instead: `f"\r{line}{' ' * pad}"`.
|
||||
|
||||
**Do NOT use `simple_term_menu` for interactive menus.**
|
||||
It ghosts on scroll in tmux/iTerm2. Use `curses` (stdlib). See `hermes_cli/tools_config.py` for the pattern.
|
||||
|
||||
---
|
||||
|
||||
## Health Check Anatomy
|
||||
|
||||
A healthy instance returns:
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "ok",
|
||||
"gateway_state": "running",
|
||||
"platforms": {
|
||||
"telegram": {"state": "connected"}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Healthy value | What a bad value means |
|
||||
|-------|--------------|----------------------|
|
||||
| `status` | `"ok"` | HTTP server down |
|
||||
| `gateway_state` | `"running"` | Still starting or crashed |
|
||||
| `platforms.<name>.state` | `"connected"` | Auth failure or network issue |
|
||||
|
||||
`gateway_state: "starting"` is normal for up to 60 s on boot. Beyond that, check logs for auth errors:
|
||||
|
||||
```bash
|
||||
journalctl -u hermes-gateway --since "2 minutes ago" | grep -i "error\|token\|auth"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Gateway Won't Start — Diagnosis Order
|
||||
|
||||
1. `ss -tlnp | grep 8642` — port conflict?
|
||||
2. `cat ~/.hermes/gateway.pid` → `ps -p <pid>` — stale PID file?
|
||||
3. `hermes gateway start --replace` — clears stale locks and PIDs
|
||||
4. `HERMES_LOG_LEVEL=DEBUG hermes gateway start` — verbose output
|
||||
5. Check `~/.hermes/.env` — missing or placeholder token?
|
||||
|
||||
---
|
||||
|
||||
## Before Every PR
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m pytest tests/ -q # full suite: ~3 min, ~3000 tests
|
||||
python scripts/deploy-validate # deployment health check
|
||||
python wizard-bootstrap/wizard_bootstrap.py # environment sanity
|
||||
```
|
||||
|
||||
All three must exit 0. Do not skip. "It works locally" is not sufficient evidence.
|
||||
|
||||
---
|
||||
|
||||
## Session and State Files
|
||||
|
||||
| Store | Location | Notes |
|
||||
|-------|----------|-------|
|
||||
| Sessions | `~/.hermes/sessions/*.json` | Persisted across restarts |
|
||||
| Memories | `~/.hermes/memories/*.md` | Written by the agent's memory tool |
|
||||
| Cron jobs | `~/.hermes/cron/*.json` | Scheduler state |
|
||||
| Gateway state | `~/.hermes/gateway_state.json` | Live platform connection status |
|
||||
| Response store | `~/.hermes/response_store.db` | SQLite WAL — API server only |
|
||||
|
||||
All paths go through `get_hermes_home()`. Never hardcode. Always backup before a major update:
|
||||
|
||||
```bash
|
||||
tar czf ~/backups/hermes_$(date +%F_%H%M).tar.gz ~/.hermes/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Writing Tests
|
||||
|
||||
```bash
|
||||
python -m pytest tests/path/to/test.py -q # single file
|
||||
python -m pytest tests/ -q -k "test_name" # by name
|
||||
python -m pytest tests/ -q -x # stop on first failure
|
||||
```
|
||||
|
||||
**Test isolation rules:**
|
||||
- `tests/conftest.py` has an autouse fixture that redirects `HERMES_HOME` to a temp dir. Never write to `~/.hermes/` in tests.
|
||||
- Profile tests must mock both `Path.home()` and `HERMES_HOME`. See `tests/hermes_cli/test_profiles.py` for the pattern.
|
||||
- Do not mock the database. Integration tests should use real SQLite with a temp path.
|
||||
|
||||
---
|
||||
|
||||
## Commit Conventions
|
||||
|
||||
```
|
||||
feat: add X # new capability
|
||||
fix: correct Y # bug fix
|
||||
refactor: restructure Z # no behaviour change
|
||||
test: add tests for W # test-only
|
||||
chore: update deps # housekeeping
|
||||
docs: clarify X # documentation only
|
||||
```
|
||||
|
||||
Include `Fixes #NNN` or `Refs #NNN` in the commit message body to close or reference issues automatically.
|
||||
|
||||
---
|
||||
|
||||
*This guide lives in `wizard-bootstrap/`. Update it when you discover a new pitfall or pattern worth preserving.*
|
||||
Reference in New Issue
Block a user