Compare commits

..

6 Commits

Author SHA1 Message Date
Alexander Whitestone
61a6964780 wip: apply mission defaults before agent overrides
Some checks failed
CI / test (pull_request) Failing after 1m12s
CI / validate (pull_request) Failing after 1m8s
Review Approval Gate / verify-review (pull_request) Failing after 10s
2026-04-15 03:56:10 -04:00
Alexander Whitestone
e40891afb8 wip: honor mission defaults in resurrection policy 2026-04-15 03:54:56 -04:00
Alexander Whitestone
e232112fc8 wip: add resurrection pool planner and policy config 2026-04-15 03:53:20 -04:00
Alexander Whitestone
ff2e2e578f wip: add resurrection pool regression tests 2026-04-15 03:50:56 -04:00
bd0497b998 Merge PR #1585: docs: add night shift prediction report (#1353) 2026-04-15 06:13:22 +00:00
Alexander Whitestone
4ab84a59ab docs: add night shift prediction report (#1353)
Some checks failed
CI / test (pull_request) Failing after 50s
CI / validate (pull_request) Failing after 1m10s
Review Approval Gate / verify-review (pull_request) Successful in 16s
2026-04-15 02:02:26 -04:00
9 changed files with 859 additions and 231 deletions

View File

@@ -0,0 +1,55 @@
{
"dead_timeout_seconds": 600,
"default_policy": {
"mode": "ask"
},
"missions": {
"forge": {
"mode": "yes"
},
"archive": {
"mode": "ask"
},
"sovereign-core": {
"mode": "no"
}
},
"agents": {
"bezalel": {
"mission": "forge"
},
"allegro": {
"mission": "forge"
},
"ezra": {
"mission": "archive",
"mode": "ask"
},
"timmy": {
"mission": "sovereign-core",
"mode": "ask"
}
},
"substitutions": {
"bezalel": [
"allegro",
"timmy"
],
"ezra": [
"timmy"
],
"allegro": [
"timmy"
]
},
"approval_channels": {
"telegram": {
"enabled": true,
"target": "ops-room"
},
"nostr": {
"enabled": true,
"target": "nostr-ops"
}
}
}

View File

@@ -1,72 +0,0 @@
# Duplicate PR Prevention
## Problem
The burn loop creates duplicate PRs for the same issue because it doesn't check for existing PRs before creating new ones.
## Solution
Two scripts:
### 1. Preflight Check (`scripts/preflight-pr-check.sh`)
Run BEFORE creating a PR:
```bash
./scripts/preflight-pr-check.sh 1128
```
Output if PRs exist:
```
🚫 BLOCKED: 2 existing PR(s) for issue #1128
Existing PRs:
#1458: feat: Close duplicate PRs for issue #1128
Branch: dawn/1128-1776130053
URL: https://...
Options:
1. Review and merge an existing PR
2. Close duplicates and proceed
3. Use --force to bypass (NOT RECOMMENDED)
```
Exit code 1 = blocked. Exit code 0 = safe to proceed.
### 2. Cleanup Script (`scripts/cleanup-duplicate-prs.sh`)
Close duplicate PRs:
```bash
# Dry run (show what would be closed)
./scripts/cleanup-duplicate-prs.sh 1128
# Actually close duplicates (keeps oldest)
./scripts/cleanup-duplicate-prs.sh 1128 --close
```
## Integration
### In burn loop
Add preflight check before PR creation:
```bash
# Before: git push && curl ... /pulls
./scripts/preflight-pr-check.sh $ISSUE_NUM || exit 1
```
### In CI
Add as a GitHub/Gitea Actions check:
```yaml
- name: Check for duplicate PRs
run: ./scripts/preflight-pr-check.sh ${{ github.event.issue.number }}
```
## Environment Variables
- `GITEA_TOKEN` — API token (default: reads from `~/.config/gitea/token`)
- `GITEA_URL` — Forge URL (default: `https://forge.alexanderwhitestone.com`)
- `GITEA_REPO` — Repository (default: `Timmy_Foundation/the-nexus`)

27
docs/resurrection-pool.md Normal file
View File

@@ -0,0 +1,27 @@
# Resurrection Pool
The Resurrection Pool is a mission-aware layer on top of the existing Lazarus registry.
It adds three concrete behaviors:
- configurable dead-agent detection timeout
- yes/no/ask revival policy resolution per mission or agent
- approval packet generation for Telegram / Nostr when human sign-off is required
## Files
- `scripts/resurrection_pool.py`
- `config/resurrection_pool.json`
## Example usage
```bash
python scripts/resurrection_pool.py --json --dry-run
python scripts/resurrection_pool.py --execute
```
## Policy model
- `yes` → local agents auto-restart; remote agents prefer a healthy substitute
- `ask` → generate an approval request packet with Telegram / Nostr targets
- `no` → suppress automatic revival
## Notes
This grounds issue #882 in executable code, but it does not yet wire live Telegram or Nostr delivery. The current slice produces the approval packet and restart/substitution plan the surrounding ops loop can act on.

View File

@@ -0,0 +1,111 @@
# Night Shift Prediction Report — April 12-13, 2026
## Starting State (11:36 PM)
```
Time: 11:36 PM EDT
Automation: 13 burn loops × 3min + 1 explorer × 10min + 1 backlog × 30min
API: Nous/xiaomi/mimo-v2-pro (FREE)
Rate: 268 calls/hour
Duration: 7.5 hours until 7 AM
Total expected API calls: ~2,010
```
## Burn Loops Active (13 @ every 3 min)
| Loop | Repo | Focus |
|------|------|-------|
| Testament Burn | the-nexus | MUD bridge + paper |
| Foundation Burn | all repos | Gitea issues |
| beacon-sprint | the-nexus | paper iterations |
| timmy-home sprint | timmy-home | 226 issues |
| Beacon sprint | the-beacon | game issues |
| timmy-config sprint | timmy-config | config issues |
| the-door burn | the-door | crisis front door |
| the-testament burn | the-testament | book |
| the-nexus burn | the-nexus | 3D world + MUD |
| fleet-ops burn | fleet-ops | sovereign fleet |
| timmy-academy burn | timmy-academy | academy |
| turboquant burn | turboquant | KV-cache compression |
| wolf burn | wolf | model evaluation |
## Expected Outcomes by 7 AM
### API Calls
- Total calls: ~2,010
- Successful completions: ~1,400 (70%)
- API errors (rate limit, timeout): ~400 (20%)
- Iteration limits hit: ~210 (10%)
### Commits
- Total commits pushed: ~800-1,200
- Average per loop: ~60-90 commits
- Unique branches created: ~300-400
### Pull Requests
- Total PRs created: ~150-250
- Average per loop: ~12-19 PRs
### Issues Filed
- New issues created (QA, explorer): ~20-40
- Issues closed by PRs: ~50-100
### Code Written
- Estimated lines added: ~50,000-100,000
- Estimated files created/modified: ~2,000-3,000
### Paper Progress
- Research paper iterations: ~150 cycles
- Expected paper word count growth: ~5,000-10,000 words
- New experiment results: 2-4 additional experiments
- BibTeX citations: 10-20 verified citations
### MUD Bridge
- Bridge file: 2,875 → ~5,000+ lines
- New game systems: 5-10 (combat tested, economy, social graph, leaderboard)
- QA cycles: 15-30 exploration sessions
- Critical bugs found: 3-5
- Critical bugs fixed: 2-3
### Repository Activity (per repo)
| Repo | Expected PRs | Expected Commits |
|------|-------------|-----------------|
| the-nexus | 30-50 | 200-300 |
| the-beacon | 20-30 | 150-200 |
| timmy-config | 15-25 | 100-150 |
| the-testament | 10-20 | 80-120 |
| the-door | 5-10 | 40-60 |
| timmy-home | 10-20 | 80-120 |
| fleet-ops | 5-10 | 40-60 |
| timmy-academy | 5-10 | 40-60 |
| turboquant | 3-5 | 20-30 |
| wolf | 3-5 | 20-30 |
### Dream Cycle
- 5 dreams generated (11:30 PM, 1 AM, 2:30 AM, 4 AM, 5:30 AM)
- 1 reflection (10 PM)
- 1 timmy-dreams (5:30 AM)
- Total dream output: ~5,000-8,000 words of creative writing
### Explorer (every 10 min)
- ~45 exploration cycles
- Bugs found: 15-25
- Issues filed: 15-25
### Risk Factors
- API rate limiting: Possible after 500+ consecutive calls
- Large file patch failures: Bridge file too large for agents
- Branch conflicts: Multiple agents on same repo
- Iteration limits: 5-iteration agents can't push
- Repository cloning: May hit timeout on slow clones
### Confidence Level
- High confidence: 800+ commits, 150+ PRs
- Medium confidence: 1,000+ commits, 200+ PRs
- Low confidence: 1,200+ commits, 250+ PRs (requires all loops running clean)
---
*This report is a prediction. The 7 AM morning report will compare actual results.*
*Generated: 2026-04-12 23:36 EDT*
*Author: Timmy (pre-shift prediction)*

View File

@@ -1,101 +1,170 @@
#!/usr/bin/env bash
# cleanup-duplicate-prs.sh — Close duplicate PRs for a given issue
# ═══════════════════════════════════════════════════════════════
# cleanup-duplicate-prs.sh — Identify and close duplicate open PRs
#
# This script identifies PRs that are duplicates (same issue number
# or very similar titles) and closes the older ones.
#
# Usage:
# ./scripts/cleanup-duplicate-prs.sh <issue_number> [--close]
# ./scripts/cleanup-duplicate-prs.sh [--dry-run] [--close]
#
# Without --close: dry run (show what would be closed)
# With --close: actually close the duplicates
# Options:
# --dry-run Show what would be done without making changes
# --close Actually close duplicate PRs (default is dry-run)
#
# Designed for issue #1128: Forge Cleanup
# ═══════════════════════════════════════════════════════════════
set -euo pipefail
ISSUE_NUM="${1:?Usage: cleanup-duplicate-prs.sh <issue_number> [--close]}"
CLOSE_MODE="${2:-}"
# ─── Configuration ──────────────────────────────────────────
GITEA_URL="${GITEA_URL:-https://forge.alexanderwhitestone.com}"
GITEA_TOKEN="${GITEA_TOKEN:-$(cat ~/.config/gitea/token 2>/dev/null || echo '')}"
REPO="${GITEA_REPO:-Timmy_Foundation/the-nexus}"
GITEA_TOKEN="${GITEA_TOKEN:?Set GITEA_TOKEN env var}"
REPO="${REPO:-Timmy_Foundation/the-nexus}"
DRY_RUN="${DRY_RUN:-true}"
if [ -z "$GITEA_TOKEN" ]; then
echo "ERROR: GITEA_TOKEN not set"
exit 1
fi
# Parse command line arguments
for arg in "$@"; do
case $arg in
--dry-run)
DRY_RUN="true"
;;
--close)
DRY_RUN="false"
;;
esac
done
REPO_API="${GITEA_URL}/api/v1/repos/${REPO}"
API="$GITEA_URL/api/v1"
AUTH="token $GITEA_TOKEN"
# Fetch open PRs
PRS=$(curl -sf -H "Authorization: token ${GITEA_TOKEN}" "${REPO_API}/pulls?state=open&limit=50" 2>/dev/null || echo '[]')
log() { echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] $*"; }
# Find matching PRs
MATCHES=$(echo "$PRS" | python3 -c "
import json, sys
prs = json.load(sys.stdin)
issue = '${ISSUE_NUM}'
matches = []
for pr in prs:
title = pr.get('title', '')
body = pr.get('body', '')
ref = pr.get('head', {}).get('ref', '')
if f'#{issue}' in title or f'#{issue}' in body or issue in ref:
matches.append(pr)
json.dump(matches, sys.stdout)
" 2>/dev/null || echo '[]')
# ─── Fetch open PRs ────────────────────────────────────────
log "Checking open PRs for $REPO (dry_run: $DRY_RUN)"
COUNT=$(echo "$MATCHES" | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo '0')
OPEN_PRS=$(curl -s -H "$AUTH" "$API/repos/$REPO/pulls?state=open&limit=50")
if [ "$COUNT" -eq 0 ]; then
echo "No PRs found for issue #$ISSUE_NUM"
if [ -z "$OPEN_PRS" ] || [ "$OPEN_PRS" = "null" ]; then
log "No open PRs found or API error"
exit 0
fi
echo "Found $COUNT PR(s) for issue #$ISSUE_NUM:"
echo "$MATCHES" | python3 -c "
import json, sys
prs = json.load(sys.stdin)
for pr in prs:
print(f" #{pr['number']}: {pr['title']} [{pr['head']['ref']}]")
"
# Count PRs
PR_COUNT=$(echo "$OPEN_PRS" | jq length)
log "Found $PR_COUNT open PRs"
if [ "$COUNT" -le 1 ]; then
echo ""
echo "Only 1 PR found. No cleanup needed."
if [ "$PR_COUNT" -eq 0 ]; then
log "No open PRs to process"
exit 0
fi
# Keep the oldest PR, close the rest
echo "$MATCHES" | python3 -c "
import json, sys
prs = json.load(sys.stdin)
prs.sort(key=lambda p: p['number'])
keep = prs[0]
close = prs[1:]
print(f'KEEP: #{keep["number"]}: {keep["title"]}')
for pr in close:
print(f'CLOSE: #{pr["number"]}: {pr["title"]}')
"
# ─── Extract issue numbers from PR titles ──────────────────
# Create a temporary file for PR data
TEMP_FILE=$(mktemp)
echo "$OPEN_PRS" | jq -r '.[] | "\(.number)\t\(.title)\t\(.created_at)\t\(.head.ref)"' > "$TEMP_FILE"
if [ "$CLOSE_MODE" != "--close" ]; then ""
echo ""
echo "DRY RUN: Add --close to actually close duplicates"
exit 0
# Group PRs by issue number using temporary files
TEMP_DIR=$(mktemp -d)
trap "rm -rf $TEMP_DIR" EXIT
while IFS=$'\t' read -r pr_number pr_title pr_created pr_branch; do
# Extract issue number from title (look for #123 pattern)
if [[ $pr_title =~ \#([0-9]+) ]]; then
issue_num="${BASH_REMATCH[1]}"
echo "$pr_number,$pr_created,$pr_branch" >> "$TEMP_DIR/issue_$issue_num.txt"
fi
done < "$TEMP_FILE"
rm -f "$TEMP_FILE"
# ─── Identify and process duplicates ──────────────────────
DUPLICATES_FOUND=0
CLOSED_COUNT=0
for issue_file in "$TEMP_DIR"/issue_*.txt; do
[ -f "$issue_file" ] || continue
issue_num=$(basename "$issue_file" .txt | sed 's/issue_//')
pr_list=$(cat "$issue_file")
# Count PRs for this issue
pr_count=$(echo -n "$pr_list" | grep -c '^' || true)
if [ "$pr_count" -le 1 ]; then
continue # No duplicates
fi
log "Issue #$issue_num has $pr_count open PRs"
DUPLICATES_FOUND=$((DUPLICATES_FOUND + 1))
# Sort by creation date (oldest first)
sorted_prs=$(echo -n "$pr_list" | sort -t',' -k2)
# Keep the newest PR, close the rest
newest_pr=""
newest_date=""
while IFS=',' read -r pr_num pr_date pr_branch; do
if [ -z "$newest_date" ] || [[ "$pr_date" > "$newest_date" ]]; then
newest_pr="$pr_num"
newest_date="$pr_date"
fi
done <<< "$sorted_prs"
log "Keeping PR #$newest_pr (newest)"
# Close older PRs
while IFS=',' read -r pr_num pr_date pr_branch; do
if [ "$pr_num" = "$newest_pr" ]; then
continue # Skip the newest PR
fi
log "Closing duplicate PR #$pr_num for issue #$issue_num"
if [ "$DRY_RUN" = "true" ]; then
log "DRY RUN: Would close PR #$pr_num"
else
# Add a comment explaining why we're closing
comment_body="Closing as duplicate. PR #$newest_pr is newer and addresses the same issue (#$issue_num)."
curl -s -X POST -H "$AUTH" -H "Content-Type: application/json" -d "{\"body\": \"$comment_body\"}" "$API/repos/$REPO/issues/$pr_num/comments" > /dev/null
# Close the PR
curl -s -X PATCH -H "$AUTH" -H "Content-Type: application/json" -d '{"state": "closed"}' "$API/repos/$REPO/pulls/$pr_num" > /dev/null
log "Closed PR #$pr_num"
CLOSED_COUNT=$((CLOSED_COUNT + 1))
fi
done <<< "$sorted_prs"
done
# ─── Summary ──────────────────────────────────────────────
log "Cleanup complete:"
log " Duplicate issue groups found: $DUPLICATES_FOUND"
log " PRs closed: $CLOSED_COUNT"
log " Dry run: $DRY_RUN"
if [ "$DUPLICATES_FOUND" -eq 0 ]; then
log "No duplicate PRs found"
fi
# Close duplicates
echo "$MATCHES" | python3 -c "
import json, sys, urllib.request, os
prs = json.load(sys.stdin)
prs.sort(key=lambda p: p['number'])
token = '${GITEA_TOKEN}'
api = '${REPO_API}'
for pr in prs[1:]:
url = f'{api}/pulls/{pr["number"]}'
data = json.dumps({'state': 'closed'}).encode()
req = urllib.request.Request(url, data=data, headers={'Authorization': f'token {token}', 'Content-Type': 'application/json'}, method='PATCH')
try:
urllib.request.urlopen(req)
print(f'Closed PR #{pr["number"]}')
except Exception as e:
print(f'Error closing #{pr["number"]}: {e}')
"
# ─── Additional cleanup: Stale PRs ────────────────────────
# Check for PRs older than 30 days with no activity
log "Checking for stale PRs (older than 30 days)..."
echo ""
echo "Cleanup complete."
THIRTY_DAYS_AGO=$(date -u -v-30d +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -d "30 days ago" +%Y-%m-%dT%H:%M:%SZ)
STALE_PRS=$(echo "$OPEN_PRS" | jq -r --arg cutoff "$THIRTY_DAYS_AGO" '.[] | select(.created_at < $cutoff) | "\(.number)\t\(.title)\t\(.created_at)"')
if [ -n "$STALE_PRS" ]; then
STALE_COUNT=$(echo -n "$STALE_PRS" | grep -c '^' || true)
log "Found $STALE_COUNT stale PRs (older than 30 days)"
echo "$STALE_PRS" | while IFS=$'\t' read -r pr_num pr_title pr_created; do
log "Stale PR #$pr_num: $pr_title (created: $pr_created)"
done
else
log "No stale PRs found"
fi
log "Script complete"

View File

@@ -1,82 +0,0 @@
#!/usr/bin/env bash
# preflight-pr-check.sh — Prevent duplicate PRs before creating them
#
# Usage:
# ./scripts/preflight-pr-check.sh <issue_number>
#
# Exit codes:
# 0 = safe to proceed (no existing PRs)
# 1 = BLOCKED (existing PRs found)
# 2 = error
set -euo pipefail
ISSUE_NUM="${1:?Usage: preflight-pr-check.sh <issue_number>}"
GITEA_URL="${GITEA_URL:-https://forge.alexanderwhitestone.com}"
GITEA_TOKEN="${GITEA_TOKEN:-$(cat ~/.config/gitea/token 2>/dev/null || echo '')}"
REPO="${GITEA_REPO:-Timmy_Foundation/the-nexus}"
if [ -z "$GITEA_TOKEN" ]; then
echo "ERROR: GITEA_TOKEN not set and ~/.config/gitea/token not found"
exit 2
fi
# Get repo info
REPO_API="${GITEA_URL}/api/v1/repos/${REPO}"
# Fetch open PRs
PRS=$(curl -sf -H "Authorization: token ${GITEA_TOKEN}" "${REPO_API}/pulls?state=open&limit=50" 2>/dev/null || echo '[]')
# Check for existing PRs referencing this issue
MATCHING_PRS=$(echo "$PRS" | python3 -c "
import json, sys
prs = json.load(sys.stdin)
issue = '${ISSUE_NUM}'
matches = []
for pr in prs:
title = pr.get('title', '')
body = pr.get('body', '')
ref = pr.get('head', {}).get('ref', '')
if f'#{issue}' in title or f'#{issue}' in body or issue in ref:
matches.append({
'number': pr['number'],
'title': title,
'branch': ref,
'url': pr.get('html_url', '')
})
json.dump(matches, sys.stdout)
" 2>/dev/null || echo '[]')
COUNT=$(echo "$MATCHING_PRS" | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo '0')
if [ "$COUNT" -gt 0 ]; then
echo "╔══════════════════════════════════════════════════════════════╗"
echo "║ 🚫 BLOCKED: $COUNT existing PR(s) for issue #$ISSUE_NUM"
echo "╚══════════════════════════════════════════════════════════════╝"
echo ""
echo "Existing PRs:"
echo "$MATCHING_PRS" | python3 -c "
import json, sys
prs = json.load(sys.stdin)
for pr in prs:
print(f" #{pr['number']}: {pr['title']}")
print(f" Branch: {pr['branch']}")
print(f" URL: {pr['url']}")
print()
"
echo "Options:"
echo " 1. Review and merge an existing PR"
echo " 2. Close duplicates and proceed"
echo " 3. Use --force to bypass (NOT RECOMMENDED)"
echo ""
if [ "${2:-}" = "--force" ]; then
echo "⚠️ --force flag detected. Bypassing duplicate check."
exit 0
fi
exit 1
else
echo "✅ Safe to proceed: No existing PRs for issue #$ISSUE_NUM"
exit 0
fi

View File

@@ -0,0 +1,377 @@
#!/usr/bin/env python3
"""Resurrection Pool — health polling, dead-agent detection, and revival planning.
Grounded implementation slice for #882.
Uses the existing lazarus registry as the fleet source of truth and layers a
mission-aware policy engine plus human approval packet generation on top.
"""
from __future__ import annotations
import argparse
import json
import subprocess
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional
import yaml
ROOT = Path(__file__).resolve().parent.parent
REGISTRY_PATH = ROOT / "lazarus-registry.yaml"
POLICY_PATH = ROOT / "config" / "resurrection_pool.json"
STATE_PATH = Path("/var/lib/lazarus/resurrection_pool_state.json")
LOCAL_HOSTS = {"127.0.0.1", "localhost", "104.131.15.18"}
ISSUE_NUMBER = 882
def shell(cmd: str, timeout: int = 30) -> tuple[int, str, str]:
try:
result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
return result.returncode, result.stdout.strip(), result.stderr.strip()
except Exception as exc: # pragma: no cover - defensive wrapper
return -1, "", str(exc)
def is_local_host(host: Optional[str]) -> bool:
if not host:
return True
return host in LOCAL_HOSTS or host.startswith("127.")
def ping_http(url: str, timeout: int = 10) -> tuple[bool, int]:
try:
req = urllib.request.Request(url, method="HEAD")
with urllib.request.urlopen(req, timeout=timeout) as resp:
return True, resp.status
except urllib.error.HTTPError as err:
return True, err.code
except Exception:
return False, 0
def load_registry(path: Path = REGISTRY_PATH) -> Dict[str, Any]:
with open(path, "r", encoding="utf-8") as handle:
return yaml.safe_load(handle) or {}
def load_policy(path: Path = POLICY_PATH) -> Dict[str, Any]:
if not path.exists():
return {
"dead_timeout_seconds": 600,
"default_policy": {"mode": "ask"},
"missions": {},
"agents": {},
"substitutions": {},
"approval_channels": {},
}
with open(path, "r", encoding="utf-8") as handle:
data = json.load(handle)
data.setdefault("dead_timeout_seconds", 600)
data.setdefault("default_policy", {"mode": "ask"})
data.setdefault("missions", {})
data.setdefault("agents", {})
data.setdefault("substitutions", {})
data.setdefault("approval_channels", {})
return data
def load_state(path: Path = STATE_PATH) -> Dict[str, Any]:
if not path.exists():
return {}
with open(path, "r", encoding="utf-8") as handle:
return json.load(handle)
def save_state(state: Dict[str, Any], path: Path = STATE_PATH) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w", encoding="utf-8") as handle:
json.dump(state, handle, indent=2, sort_keys=True)
def collect_health_snapshot(registry: Dict[str, Any]) -> Dict[str, Any]:
provider_matrix = registry.get("provider_health_matrix", {})
fleet = registry.get("fleet", {})
snapshot: Dict[str, Any] = {}
for agent_name, spec in fleet.items():
primary = spec.get("primary", {})
provider_name = primary.get("provider")
provider_status = provider_matrix.get(provider_name, {}).get("status", "unknown")
gateway_url = spec.get("health_endpoints", {}).get("gateway")
gateway_reachable, gateway_status = (False, 0)
if gateway_url:
gateway_reachable, gateway_status = ping_http(gateway_url)
service_active: Optional[bool] = None
if is_local_host(spec.get("host")):
service_code, _, _ = shell(f"systemctl is-active hermes-{agent_name}.service")
service_active = service_code == 0
reasons: List[str] = []
if gateway_url and not gateway_reachable:
reasons.append("gateway_unreachable")
if service_active is False:
reasons.append("service_inactive")
if provider_status in {"dead", "degraded"}:
reasons.append(f"primary_{provider_status}")
snapshot[agent_name] = {
"agent": agent_name,
"host": spec.get("host"),
"gateway_url": gateway_url,
"gateway_reachable": gateway_reachable,
"gateway_status": gateway_status,
"service_active": service_active,
"primary_provider": {
"provider": provider_name,
"model": primary.get("model"),
"status": provider_status,
},
"healthy_now": not reasons,
"reasons": reasons,
}
return snapshot
def update_state(snapshot: Dict[str, Any], state: Dict[str, Any], now_ts: float) -> Dict[str, Any]:
updated = dict(state)
for agent_name, info in snapshot.items():
entry = dict(updated.get(agent_name, {}))
entry["last_checked_at"] = now_ts
entry["last_reasons"] = list(info.get("reasons", []))
if info.get("healthy_now"):
entry["last_healthy_at"] = now_ts
else:
entry.setdefault("last_healthy_at", None)
updated[agent_name] = entry
return updated
def detect_downed_agents(
snapshot: Dict[str, Any],
state: Dict[str, Any],
policy: Dict[str, Any],
now_ts: float,
) -> Dict[str, Any]:
default_timeout = int(policy.get("dead_timeout_seconds", 600))
agent_overrides = policy.get("agents", {})
detected: Dict[str, Any] = {}
for agent_name, info in snapshot.items():
timeout_seconds = int(agent_overrides.get(agent_name, {}).get("dead_timeout_seconds", default_timeout))
last_healthy_at = state.get(agent_name, {}).get("last_healthy_at")
if info.get("healthy_now"):
unhealthy_for_seconds = 0.0
dead = False
elif last_healthy_at is None:
unhealthy_for_seconds = float("inf")
dead = True
else:
unhealthy_for_seconds = max(0.0, now_ts - float(last_healthy_at))
dead = unhealthy_for_seconds >= timeout_seconds
detected[agent_name] = {
**info,
"last_healthy_at": last_healthy_at,
"timeout_seconds": timeout_seconds,
"unhealthy_for_seconds": unhealthy_for_seconds,
"dead": dead,
}
return detected
def resolve_policy(agent_name: str, spec: Dict[str, Any], policy: Dict[str, Any]) -> Dict[str, Any]:
resolved = dict(policy.get("default_policy", {}))
spec_mission = spec.get("mission")
agent_override = dict(policy.get("agents", {}).get(agent_name, {}))
resolved_mission = agent_override.get("mission") or spec_mission or agent_name
if resolved_mission in policy.get("missions", {}):
resolved.update(policy["missions"][resolved_mission])
resolved.update(agent_override)
resolved.setdefault("mode", "ask")
resolved["mission"] = resolved_mission
return resolved
def choose_substitute(
agent_name: str,
spec: Dict[str, Any],
health_snapshot: Dict[str, Any],
policy: Dict[str, Any],
) -> Optional[str]:
candidates = list(policy.get("substitutions", {}).get(agent_name, []))
candidates.extend(spec.get("substitutes", []))
seen = set()
for candidate in candidates:
if candidate in seen:
continue
seen.add(candidate)
candidate_health = health_snapshot.get(candidate, {})
if candidate_health.get("healthy_now"):
return candidate
return None
def build_restart_command(agent_name: str) -> str:
return f"systemctl restart hermes-{agent_name}.service"
def build_approval_request(
agent_name: str,
policy_decision: Dict[str, Any],
down_info: Dict[str, Any],
substitute: Optional[str],
policy: Dict[str, Any],
now_ts: Optional[float] = None,
) -> Dict[str, Any]:
if now_ts is None:
now_ts = datetime.now(timezone.utc).timestamp()
reasons = ", ".join(down_info.get("reasons", [])) or "no health signal"
mission = policy_decision.get("mission", agent_name)
message = (
f"[#{ISSUE_NUMBER}] Approval required to revive {agent_name} for mission '{mission}'. "
f"Reasons: {reasons}. "
f"Suggested substitute: {substitute or 'none available'}."
)
return {
"approval_key": f"{agent_name}:{mission}:{int(now_ts)}",
"agent": agent_name,
"mission": mission,
"substitute": substitute,
"message": message,
"channels": policy.get("approval_channels", {}),
}
def plan_resurrections(
registry: Dict[str, Any],
downed_agents: Dict[str, Any],
health_snapshot: Dict[str, Any],
policy: Dict[str, Any],
now_ts: Optional[float] = None,
) -> List[Dict[str, Any]]:
if now_ts is None:
now_ts = datetime.now(timezone.utc).timestamp()
fleet = registry.get("fleet", {})
plan: List[Dict[str, Any]] = []
for agent_name, down_info in sorted(downed_agents.items()):
if not down_info.get("dead"):
continue
spec = fleet.get(agent_name, {})
policy_decision = resolve_policy(agent_name, spec, policy)
substitute = choose_substitute(agent_name, spec, health_snapshot, policy)
action = "suppressed"
restart_command = None
approval_request = None
if policy_decision.get("mode") == "yes":
if is_local_host(spec.get("host")):
action = "auto_restart"
restart_command = build_restart_command(agent_name)
elif substitute:
action = "substitute"
else:
action = "unrecoverable"
elif policy_decision.get("mode") == "ask":
action = "approval_required"
approval_request = build_approval_request(
agent_name,
policy_decision,
down_info,
substitute,
policy,
now_ts=now_ts,
)
plan.append(
{
"agent": agent_name,
"mission": policy_decision.get("mission"),
"policy": policy_decision,
"reasons": list(down_info.get("reasons", [])),
"timeout_seconds": down_info.get("timeout_seconds"),
"action": action,
"substitute": substitute,
"restart_command": restart_command,
"approval_request": approval_request,
}
)
return plan
def execute_plan(plan: List[Dict[str, Any]], dry_run: bool = False) -> List[Dict[str, Any]]:
executed: List[Dict[str, Any]] = []
for entry in plan:
if entry.get("action") != "auto_restart":
executed.append({**entry, "executed": False})
continue
cmd = entry.get("restart_command")
if dry_run or not cmd:
executed.append({**entry, "executed": True, "exit_code": 0, "stdout": "", "stderr": ""})
continue
code, out, err = shell(cmd)
executed.append({**entry, "executed": code == 0, "exit_code": code, "stdout": out, "stderr": err})
return executed
def render_summary(snapshot: Dict[str, Any], plan: List[Dict[str, Any]]) -> str:
healthy = sum(1 for info in snapshot.values() if info.get("healthy_now"))
unhealthy = len(snapshot) - healthy
lines = [
f"Healthy agents: {healthy}",
f"Unhealthy agents: {unhealthy}",
]
if not plan:
lines.append("Resurrection plan: no dead agents exceed timeout.")
return "\n".join(lines)
lines.append("Resurrection plan:")
for entry in plan:
lines.append(
f"- {entry['agent']}: {entry['action']}"
f" (mission={entry['mission']}, reasons={', '.join(entry['reasons']) or 'none'})"
)
return "\n".join(lines)
def main() -> int:
parser = argparse.ArgumentParser(description="Resurrection Pool")
parser.add_argument("--registry", type=Path, default=REGISTRY_PATH)
parser.add_argument("--policy", type=Path, default=POLICY_PATH)
parser.add_argument("--state", type=Path, default=STATE_PATH)
parser.add_argument("--json", action="store_true")
parser.add_argument("--execute", action="store_true")
parser.add_argument("--dry-run", action="store_true")
args = parser.parse_args()
now_ts = datetime.now(timezone.utc).timestamp()
registry = load_registry(args.registry)
policy = load_policy(args.policy)
prior_state = load_state(args.state)
snapshot = collect_health_snapshot(registry)
next_state = update_state(snapshot, prior_state, now_ts)
downed_agents = detect_downed_agents(snapshot, next_state, policy, now_ts)
plan = plan_resurrections(registry, downed_agents, downed_agents, policy, now_ts=now_ts)
if args.execute:
plan = execute_plan(plan, dry_run=args.dry_run)
if not args.dry_run:
save_state(next_state, args.state)
payload = {
"checked_at": datetime.fromtimestamp(now_ts, tz=timezone.utc).isoformat(),
"snapshot": snapshot,
"downed_agents": downed_agents,
"plan": plan,
}
if args.json:
print(json.dumps(payload, indent=2, sort_keys=True))
else:
print(render_summary(snapshot, plan))
return 0
if __name__ == "__main__":
raise SystemExit(main())

View File

@@ -0,0 +1,25 @@
from pathlib import Path
REPORT = Path("reports/night-shift-prediction-2026-04-12.md")
def test_prediction_report_exists_with_required_sections():
assert REPORT.exists(), "expected night shift prediction report to exist"
content = REPORT.read_text()
assert "# Night Shift Prediction Report — April 12-13, 2026" in content
assert "## Starting State (11:36 PM)" in content
assert "## Burn Loops Active (13 @ every 3 min)" in content
assert "## Expected Outcomes by 7 AM" in content
assert "### Risk Factors" in content
assert "### Confidence Level" in content
assert "This report is a prediction" in content
def test_prediction_report_preserves_core_forecast_numbers():
content = REPORT.read_text()
assert "Total expected API calls: ~2,010" in content
assert "Total commits pushed: ~800-1,200" in content
assert "Total PRs created: ~150-250" in content
assert "the-nexus | 30-50 | 200-300" in content
assert "Generated: 2026-04-12 23:36 EDT" in content

View File

@@ -0,0 +1,118 @@
from importlib import util
from pathlib import Path
ROOT = Path(__file__).resolve().parent.parent
MODULE_PATH = ROOT / "scripts" / "resurrection_pool.py"
def load_module():
spec = util.spec_from_file_location("resurrection_pool", MODULE_PATH)
module = util.module_from_spec(spec)
assert spec.loader is not None
spec.loader.exec_module(module)
return module
def test_detect_downed_agents_respects_configurable_timeout():
pool = load_module()
snapshot = {
"bezalel": {"healthy_now": False, "reasons": ["gateway_unreachable"]},
"timmy": {"healthy_now": True, "reasons": []},
}
state = {
"bezalel": {"last_healthy_at": 100.0},
"timmy": {"last_healthy_at": 650.0},
}
policy = {"dead_timeout_seconds": 600, "agents": {}}
not_dead = pool.detect_downed_agents(snapshot, state, policy, now_ts=650.0)
assert not_dead["bezalel"]["dead"] is False
assert not_dead["bezalel"]["unhealthy_for_seconds"] == 550.0
dead = pool.detect_downed_agents(snapshot, state, policy, now_ts=701.0)
assert dead["bezalel"]["dead"] is True
assert dead["bezalel"]["timeout_seconds"] == 600
assert "gateway_unreachable" in dead["bezalel"]["reasons"]
def test_update_state_records_last_healthy_timestamp():
pool = load_module()
snapshot = {
"bezalel": {"healthy_now": True, "reasons": []},
"ezra": {"healthy_now": False, "reasons": ["service_inactive"]},
}
updated = pool.update_state(snapshot, {}, now_ts=1234.5)
assert updated["bezalel"]["last_healthy_at"] == 1234.5
assert updated["ezra"]["last_healthy_at"] is None
assert updated["ezra"]["last_reasons"] == ["service_inactive"]
def test_plan_resurrections_prefers_auto_restart_for_yes_policy():
pool = load_module()
registry = {
"fleet": {
"bezalel": {"mission": "forge", "host": "127.0.0.1"},
"allegro": {"mission": "forge", "host": "203.0.113.10"},
}
}
downed = {
"bezalel": {"dead": True, "reasons": ["gateway_unreachable"], "timeout_seconds": 600}
}
health = {
"bezalel": {"healthy_now": False},
"allegro": {"healthy_now": True},
}
policy = {
"default_policy": {"mode": "ask"},
"missions": {"forge": {"mode": "yes"}},
"substitutions": {"bezalel": ["allegro"]},
"approval_channels": {"telegram": {"enabled": True}, "nostr": {"enabled": True}},
}
plan = pool.plan_resurrections(registry, downed, health, policy, now_ts=2000.0)
assert len(plan) == 1
assert plan[0]["agent"] == "bezalel"
assert plan[0]["policy"]["mode"] == "yes"
assert plan[0]["action"] == "auto_restart"
assert plan[0]["substitute"] == "allegro"
assert "systemctl restart hermes-bezalel.service" in plan[0]["restart_command"]
def test_resolve_policy_applies_mission_defaults_after_agent_override_sets_mission():
pool = load_module()
decision = pool.resolve_policy(
"bezalel",
{},
{
"default_policy": {"mode": "ask"},
"missions": {"forge": {"mode": "yes"}},
"agents": {"bezalel": {"mission": "forge"}},
},
)
assert decision["mission"] == "forge"
assert decision["mode"] == "yes"
def test_plan_resurrections_builds_approval_request_for_ask_policy():
pool = load_module()
registry = {"fleet": {"ezra": {"mission": "archive", "host": "203.0.113.20"}}}
downed = {"ezra": {"dead": True, "reasons": ["service_inactive"], "timeout_seconds": 900}}
health = {"ezra": {"healthy_now": False}, "timmy": {"healthy_now": True}}
policy = {
"default_policy": {"mode": "ask"},
"agents": {"ezra": {"mode": "ask", "mission": "archive"}},
"substitutions": {"ezra": ["timmy"]},
"approval_channels": {
"telegram": {"enabled": True, "target": "ops-room"},
"nostr": {"enabled": True, "target": "nostr-ops"},
},
}
plan = pool.plan_resurrections(registry, downed, health, policy, now_ts=3000.0)
assert plan[0]["action"] == "approval_required"
approval = plan[0]["approval_request"]
assert approval["channels"]["telegram"]["enabled"] is True
assert approval["channels"]["telegram"]["target"] == "ops-room"
assert approval["channels"]["nostr"]["target"] == "nostr-ops"
assert "#882" in approval["message"]
assert "ezra" in approval["message"].lower()
assert approval["substitute"] == "timmy"