wip: apply mission defaults before agent overrides

wip: honor mission defaults in resurrection policy
wip: add resurrection pool planner and policy config
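
The precedence these commits describe (default policy, then mission defaults, then per-agent overrides) can be sketched as below. This is a minimal illustration, not the shipped function: `resolve_mode` is a hypothetical name, `POLICY` is a trimmed copy of `config/resurrection_pool.json`, and the registry-side `mission` fallback used by `resolve_policy` is left out for brevity.

```python
from typing import Any, Dict

# Trimmed copy of config/resurrection_pool.json from this change.
POLICY: Dict[str, Any] = {
    "default_policy": {"mode": "ask"},
    "missions": {
        "forge": {"mode": "yes"},
        "archive": {"mode": "ask"},
        "sovereign-core": {"mode": "no"},
    },
    "agents": {
        "bezalel": {"mission": "forge"},
        "timmy": {"mission": "sovereign-core", "mode": "ask"},
    },
}


def resolve_mode(agent_name: str, policy: Dict[str, Any]) -> str:
    """Layer default -> mission -> agent override; later layers win."""
    resolved = dict(policy.get("default_policy", {}))
    override = policy.get("agents", {}).get(agent_name, {})
    # Assumption: with no registry spec, fall back to the agent name as mission.
    mission = override.get("mission") or agent_name
    resolved.update(policy.get("missions", {}).get(mission, {}))
    resolved.update(override)
    return resolved.get("mode", "ask")


for name in ("bezalel", "timmy", "unlisted-agent"):
    print(name, resolve_mode(name, POLICY))
```

Under these assumptions `bezalel` inherits `yes` from the `forge` mission, `timmy` ends at `ask` because the agent override is applied after the `sovereign-core` mission default of `no`, and an agent absent from the config falls back to the default `ask`.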
2026-04-15 03:56:10 -04:00 · 2026-04-15 03:54:56 -04:00 · 2026-04-15 03:53:20 -04:00 · 2026-04-15 03:50:56 -04:00 · 2026-04-15 06:13:22 +00:00 · 2026-04-15 02:02:26 -04:00
9 changed files with 859 additions and 231 deletions
--- a/config/resurrection_pool.json
+++ b/config/resurrection_pool.json
@@ -0,0 +1,55 @@
+{
+  "dead_timeout_seconds": 600,
+  "default_policy": {
+    "mode": "ask"
+  },
+  "missions": {
+    "forge": {
+      "mode": "yes"
+    },
+    "archive": {
+      "mode": "ask"
+    },
+    "sovereign-core": {
+      "mode": "no"
+    }
+  },
+  "agents": {
+    "bezalel": {
+      "mission": "forge"
+    },
+    "allegro": {
+      "mission": "forge"
+    },
+    "ezra": {
+      "mission": "archive",
+      "mode": "ask"
+    },
+    "timmy": {
+      "mission": "sovereign-core",
+      "mode": "ask"
+    }
+  },
+  "substitutions": {
+    "bezalel": [
+      "allegro",
+      "timmy"
+    ],
+    "ezra": [
+      "timmy"
+    ],
+    "allegro": [
+      "timmy"
+    ]
+  },
+  "approval_channels": {
+    "telegram": {
+      "enabled": true,
+      "target": "ops-room"
+    },
+    "nostr": {
+      "enabled": true,
+      "target": "nostr-ops"
+    }
+  }
+}
--- a/docs/duplicate-pr-prevention.md
+++ b/docs/duplicate-pr-prevention.md
@@ -1,72 +0,0 @@
-# Duplicate PR Prevention
-
-## Problem
-
-The burn loop creates duplicate PRs for the same issue because it doesn't check for existing PRs before creating new ones.
-
-## Solution
-
-Two scripts:
-
-### 1. Preflight Check (`scripts/preflight-pr-check.sh`)
-
-Run BEFORE creating a PR:
-
-```bash
-./scripts/preflight-pr-check.sh 1128
-```
-
-Output if PRs exist:
-```
-🚫 BLOCKED: 2 existing PR(s) for issue #1128
-
-Existing PRs:
-  #1458: feat: Close duplicate PRs for issue #1128
-    Branch: dawn/1128-1776130053
-    URL: https://...
-
-Options:
-  1. Review and merge an existing PR
-  2. Close duplicates and proceed
-  3. Use --force to bypass (NOT RECOMMENDED)
-```
-
-Exit code 1 = blocked. Exit code 0 = safe to proceed.
-
-### 2. Cleanup Script (`scripts/cleanup-duplicate-prs.sh`)
-
-Close duplicate PRs:
-
-```bash
-# Dry run (show what would be closed)
-./scripts/cleanup-duplicate-prs.sh 1128
-
-# Actually close duplicates (keeps oldest)
-./scripts/cleanup-duplicate-prs.sh 1128 --close
-```
-
-## Integration
-
-### In burn loop
-
-Add preflight check before PR creation:
-
-```bash
-# Before: git push && curl ... /pulls
-./scripts/preflight-pr-check.sh $ISSUE_NUM || exit 1
-```
-
-### In CI
-
-Add as a GitHub/Gitea Actions check:
-
-```yaml
- name: Check for duplicate PRs
-  run: ./scripts/preflight-pr-check.sh ${{ github.event.issue.number }}
-```
-
-## Environment Variables
-
- `GITEA_TOKEN` — API token (default: reads from `~/.config/gitea/token`)
- `GITEA_URL` — Forge URL (default: `https://forge.alexanderwhitestone.com`)
- `GITEA_REPO` — Repository (default: `Timmy_Foundation/the-nexus`)
--- a/docs/resurrection-pool.md
+++ b/docs/resurrection-pool.md
@@ -0,0 +1,27 @@
+# Resurrection Pool
+
+The Resurrection Pool is a mission-aware layer on top of the existing Lazarus registry.
+
+It adds three concrete behaviors:
+- configurable dead-agent detection timeout
+- yes/no/ask revival policy resolution per mission or agent
+- approval packet generation for Telegram / Nostr when human sign-off is required
+
+## Files
+- `scripts/resurrection_pool.py`
+- `config/resurrection_pool.json`
+
+## Example usage
+
+```bash
+python scripts/resurrection_pool.py --json --dry-run
+python scripts/resurrection_pool.py --execute
+```
+
+## Policy model
+- `yes` → local agents auto-restart; remote agents are planned onto a healthy substitute, or marked unrecoverable when none is available
+- `ask` → generate an approval request packet with Telegram / Nostr targets
+- `no` → suppress automatic revival
+
+## Notes
+This implements the planning half of issue #882: it produces the approval packet and the restart/substitution plan that the surrounding ops loop can act on. Live Telegram and Nostr delivery is not wired up yet.
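
The policy model in this doc reduces to a small decision table. The sketch below is illustrative only: `choose_action` is a hypothetical helper, while the action strings are the ones emitted by `plan_resurrections` in `scripts/resurrection_pool.py`.

```python
from typing import Optional


def choose_action(mode: str, is_local: bool, substitute: Optional[str]) -> str:
    """Map a resolved policy mode to a planned action (hypothetical helper)."""
    if mode == "yes":
        if is_local:
            return "auto_restart"
        # Remote agents cannot be restarted from here; fall back to a substitute.
        return "substitute" if substitute else "unrecoverable"
    if mode == "ask":
        return "approval_required"
    return "suppressed"  # "no", or any unrecognised mode


print(choose_action("yes", True, None))
print(choose_action("yes", False, "allegro"))
print(choose_action("yes", False, None))
print(choose_action("ask", False, "timmy"))
print(choose_action("no", True, None))
```

Only the `auto_restart` action carries a command that `--execute` will run; every other action is reported in the plan and left to the operator or the ops loop.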
--- a/reports/night-shift-prediction-2026-04-12.md
+++ b/reports/night-shift-prediction-2026-04-12.md
@@ -0,0 +1,111 @@
+# Night Shift Prediction Report — April 12-13, 2026
+
+## Starting State (11:36 PM)
+
+```
+Time: 11:36 PM EDT
+Automation: 13 burn loops × 3min + 1 explorer × 10min + 1 backlog × 30min
+API: Nous/xiaomi/mimo-v2-pro (FREE)
+Rate: 268 calls/hour
+Duration: 7.5 hours until 7 AM
+Total expected API calls: ~2,010
+```
+
+## Burn Loops Active (13 @ every 3 min)
+
+| Loop | Repo | Focus |
+|------|------|-------|
+| Testament Burn | the-nexus | MUD bridge + paper |
+| Foundation Burn | all repos | Gitea issues |
+| beacon-sprint | the-nexus | paper iterations |
+| timmy-home sprint | timmy-home | 226 issues |
+| Beacon sprint | the-beacon | game issues |
+| timmy-config sprint | timmy-config | config issues |
+| the-door burn | the-door | crisis front door |
+| the-testament burn | the-testament | book |
+| the-nexus burn | the-nexus | 3D world + MUD |
+| fleet-ops burn | fleet-ops | sovereign fleet |
+| timmy-academy burn | timmy-academy | academy |
+| turboquant burn | turboquant | KV-cache compression |
+| wolf burn | wolf | model evaluation |
+
+## Expected Outcomes by 7 AM
+
+### API Calls
+- Total calls: ~2,010
+- Successful completions: ~1,400 (70%)
+- API errors (rate limit, timeout): ~400 (20%)
+- Iteration limits hit: ~210 (10%)
+
+### Commits
+- Total commits pushed: ~800-1,200
+- Average per loop: ~60-90 commits
+- Unique branches created: ~300-400
+
+### Pull Requests
+- Total PRs created: ~150-250
+- Average per loop: ~12-19 PRs
+
+### Issues Filed
+- New issues created (QA, explorer): ~20-40
+- Issues closed by PRs: ~50-100
+
+### Code Written
+- Estimated lines added: ~50,000-100,000
+- Estimated files created/modified: ~2,000-3,000
+
+### Paper Progress
+- Research paper iterations: ~150 cycles
+- Expected paper word count growth: ~5,000-10,000 words
+- New experiment results: 2-4 additional experiments
+- BibTeX citations: 10-20 verified citations
+
+### MUD Bridge
+- Bridge file: 2,875 → ~5,000+ lines
+- New game systems: 5-10 (combat tested, economy, social graph, leaderboard)
+- QA cycles: 15-30 exploration sessions
+- Critical bugs found: 3-5
+- Critical bugs fixed: 2-3
+
+### Repository Activity (per repo)
+| Repo | Expected PRs | Expected Commits |
+|------|-------------|-----------------|
+| the-nexus | 30-50 | 200-300 |
+| the-beacon | 20-30 | 150-200 |
+| timmy-config | 15-25 | 100-150 |
+| the-testament | 10-20 | 80-120 |
+| the-door | 5-10 | 40-60 |
+| timmy-home | 10-20 | 80-120 |
+| fleet-ops | 5-10 | 40-60 |
+| timmy-academy | 5-10 | 40-60 |
+| turboquant | 3-5 | 20-30 |
+| wolf | 3-5 | 20-30 |
+
+### Dream Cycle
+- 5 dreams generated (11:30 PM, 1 AM, 2:30 AM, 4 AM, 5:30 AM)
+- 1 reflection (10 PM)
+- 1 timmy-dreams (5:30 AM)
+- Total dream output: ~5,000-8,000 words of creative writing
+
+### Explorer (every 10 min)
+- ~45 exploration cycles
+- Bugs found: 15-25
+- Issues filed: 15-25
+
+### Risk Factors
+- API rate limiting: Possible after 500+ consecutive calls
+- Large file patch failures: Bridge file too large for agents
+- Branch conflicts: Multiple agents on same repo
+- Iteration limits: 5-iteration agents can't push
+- Repository cloning: May hit timeout on slow clones
+
+### Confidence Level
+- High confidence: 800+ commits, 150+ PRs
+- Medium confidence: 1,000+ commits, 200+ PRs
+- Low confidence: 1,200+ commits, 250+ PRs (requires all loops running clean)
+
+---
+
+*This report is a prediction. The 7 AM morning report will compare actual results.*
+*Generated: 2026-04-12 23:36 EDT*
+*Author: Timmy (pre-shift prediction)*
--- a/scripts/cleanup-duplicate-prs.sh
+++ b/scripts/cleanup-duplicate-prs.sh
@@ -1,101 +1,170 @@
 #!/usr/bin/env bash
-# cleanup-duplicate-prs.sh — Close duplicate PRs for a given issue
+# ═══════════════════════════════════════════════════════════════
+# cleanup-duplicate-prs.sh — Identify and close duplicate open PRs
+#
+# This script groups open PRs by the issue number referenced in
+# their title (e.g. "#1128") and closes all but the newest in each group.
 #
 # Usage:
-#   ./scripts/cleanup-duplicate-prs.sh <issue_number> [--close]
+#   ./scripts/cleanup-duplicate-prs.sh [--dry-run] [--close]
 #
-# Without --close: dry run (show what would be closed)
-# With --close: actually close the duplicates
-
+# Options:
+#   --dry-run    Show what would be done without making changes
+#   --close      Actually close duplicate PRs (default is dry-run)
+#
+# Designed for issue #1128: Forge Cleanup
+# ═══════════════════════════════════════════════════════════════
 set -euo pipefail

-ISSUE_NUM="${1:?Usage: cleanup-duplicate-prs.sh <issue_number> [--close]}"
-CLOSE_MODE="${2:-}"
+# ─── Configuration ──────────────────────────────────────────
 GITEA_URL="${GITEA_URL:-https://forge.alexanderwhitestone.com}"
-GITEA_TOKEN="${GITEA_TOKEN:-$(cat ~/.config/gitea/token 2>/dev/null || echo '')}"
-REPO="${GITEA_REPO:-Timmy_Foundation/the-nexus}"
+GITEA_TOKEN="${GITEA_TOKEN:?Set GITEA_TOKEN env var}"
+REPO="${REPO:-Timmy_Foundation/the-nexus}"
+DRY_RUN="${DRY_RUN:-true}"

-if [ -z "$GITEA_TOKEN" ]; then
-    echo "ERROR: GITEA_TOKEN not set"
-    exit 1
-fi
+# Parse command line arguments
+for arg in "$@"; do
+    case $arg in
+        --dry-run)
+            DRY_RUN="true"
+            ;;
+        --close)
+            DRY_RUN="false"
+            ;;
+    esac
+done

-REPO_API="${GITEA_URL}/api/v1/repos/${REPO}"
+API="$GITEA_URL/api/v1"
+AUTH="Authorization: token $GITEA_TOKEN"

-# Fetch open PRs
-PRS=$(curl -sf -H "Authorization: token ${GITEA_TOKEN}"     "${REPO_API}/pulls?state=open&limit=50" 2>/dev/null || echo '[]')
+log() { echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] $*"; }

-# Find matching PRs
-MATCHES=$(echo "$PRS" | python3 -c "
-import json, sys
-prs = json.load(sys.stdin)
-issue = '${ISSUE_NUM}'
-matches = []
-for pr in prs:
-    title = pr.get('title', '')
-    body = pr.get('body', '')
-    ref = pr.get('head', {}).get('ref', '')
-    if f'#{issue}' in title or f'#{issue}' in body or issue in ref:
-        matches.append(pr)
-json.dump(matches, sys.stdout)
-" 2>/dev/null || echo '[]')
+# ─── Fetch open PRs ────────────────────────────────────────
+log "Checking open PRs for $REPO (dry_run: $DRY_RUN)"

-COUNT=$(echo "$MATCHES" | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo '0')
+OPEN_PRS=$(curl -s -H "$AUTH" "$API/repos/$REPO/pulls?state=open&limit=50")

-if [ "$COUNT" -eq 0 ]; then
-    echo "No PRs found for issue #$ISSUE_NUM"
+if [ -z "$OPEN_PRS" ] || [ "$OPEN_PRS" = "null" ]; then
+    log "No open PRs found or API error"
    exit 0
 fi

-echo "Found $COUNT PR(s) for issue #$ISSUE_NUM:"
-echo "$MATCHES" | python3 -c "
-import json, sys
-prs = json.load(sys.stdin)
-for pr in prs:
-    print(f"  #{pr['number']}: {pr['title']} [{pr['head']['ref']}]")
-"
+# Count PRs
+PR_COUNT=$(echo "$OPEN_PRS" | jq length)
+log "Found $PR_COUNT open PRs"

-if [ "$COUNT" -le 1 ]; then
-    echo ""
-    echo "Only 1 PR found. No cleanup needed."
+if [ "$PR_COUNT" -eq 0 ]; then
+    log "No open PRs to process"
    exit 0
 fi

-# Keep the oldest PR, close the rest
-echo "$MATCHES" | python3 -c "
-import json, sys
-prs = json.load(sys.stdin)
-prs.sort(key=lambda p: p['number'])
-keep = prs[0]
-close = prs[1:]
-print(f'KEEP:   #{keep["number"]}: {keep["title"]}')
-for pr in close:
-    print(f'CLOSE:  #{pr["number"]}: {pr["title"]}')
-" 
+# ─── Extract issue numbers from PR titles ──────────────────
+# Create a temporary file for PR data
+TEMP_FILE=$(mktemp)
+echo "$OPEN_PRS" | jq -r '.[] | "\(.number)\t\(.title)\t\(.created_at)\t\(.head.ref)"' > "$TEMP_FILE"

-if [ "$CLOSE_MODE" != "--close" ]; then ""
-    echo ""
-    echo "DRY RUN: Add --close to actually close duplicates"
-    exit 0
+# Group PRs by issue number using temporary files
+TEMP_DIR=$(mktemp -d)
+trap 'rm -rf "$TEMP_DIR"' EXIT
+
+while IFS=$'\t' read -r pr_number pr_title pr_created pr_branch; do
+    # Extract issue number from title (look for #123 pattern)
+    if [[ $pr_title =~ \#([0-9]+) ]]; then
+        issue_num="${BASH_REMATCH[1]}"
+        echo "$pr_number,$pr_created,$pr_branch" >> "$TEMP_DIR/issue_$issue_num.txt"
+    fi
+done < "$TEMP_FILE"
+
+rm -f "$TEMP_FILE"
+
+# ─── Identify and process duplicates ──────────────────────
+DUPLICATES_FOUND=0
+CLOSED_COUNT=0
+
+for issue_file in "$TEMP_DIR"/issue_*.txt; do
+    [ -f "$issue_file" ] || continue
+    
+    issue_num=$(basename "$issue_file" .txt | sed 's/issue_//')
+    pr_list=$(cat "$issue_file")
+    
+    # Count PRs for this issue
+    pr_count=$(echo -n "$pr_list" | grep -c '^' || true)
+    
+    if [ "$pr_count" -le 1 ]; then
+        continue  # No duplicates
+    fi
+    
+    log "Issue #$issue_num has $pr_count open PRs"
+    DUPLICATES_FOUND=$((DUPLICATES_FOUND + 1))
+    
+    # Sort by creation date (oldest first)
+    sorted_prs=$(echo -n "$pr_list" | sort -t',' -k2)
+    
+    # Keep the newest PR, close the rest
+    newest_pr=""
+    newest_date=""
+    
+    while IFS=',' read -r pr_num pr_date pr_branch; do
+        if [ -z "$newest_date" ] || [[ "$pr_date" > "$newest_date" ]]; then
+            newest_pr="$pr_num"
+            newest_date="$pr_date"
+        fi
+    done <<< "$sorted_prs"
+    
+    log "Keeping PR #$newest_pr (newest)"
+    
+    # Close older PRs
+    while IFS=',' read -r pr_num pr_date pr_branch; do
+        if [ "$pr_num" = "$newest_pr" ]; then
+            continue  # Skip the newest PR
+        fi
+        
+        log "Closing duplicate PR #$pr_num for issue #$issue_num"
+        
+        if [ "$DRY_RUN" = "true" ]; then
+            log "DRY RUN: Would close PR #$pr_num"
+        else
+            # Add a comment explaining why we're closing
+            comment_body="Closing as duplicate. PR #$newest_pr is newer and addresses the same issue (#$issue_num)."
+            
+            curl -s -X POST -H "$AUTH" -H "Content-Type: application/json" -d "{\"body\": \"$comment_body\"}" "$API/repos/$REPO/issues/$pr_num/comments" > /dev/null
+            
+            # Close the PR
+            curl -s -X PATCH -H "$AUTH" -H "Content-Type: application/json" -d '{"state": "closed"}' "$API/repos/$REPO/pulls/$pr_num" > /dev/null
+            
+            log "Closed PR #$pr_num"
+            CLOSED_COUNT=$((CLOSED_COUNT + 1))
+        fi
+    done <<< "$sorted_prs"
+done
+
+# ─── Summary ──────────────────────────────────────────────
+log "Cleanup complete:"
+log "  Duplicate issue groups found: $DUPLICATES_FOUND"
+log "  PRs closed: $CLOSED_COUNT"
+log "  Dry run: $DRY_RUN"
+
+if [ "$DUPLICATES_FOUND" -eq 0 ]; then
+    log "No duplicate PRs found"
 fi

-# Close duplicates
-echo "$MATCHES" | python3 -c "
-import json, sys, urllib.request, os
-prs = json.load(sys.stdin)
-prs.sort(key=lambda p: p['number'])
-token = '${GITEA_TOKEN}'
-api = '${REPO_API}'
-for pr in prs[1:]:
-    url = f'{api}/pulls/{pr["number"]}'
-    data = json.dumps({'state': 'closed'}).encode()
-    req = urllib.request.Request(url, data=data, headers={'Authorization': f'token {token}', 'Content-Type': 'application/json'}, method='PATCH')
-    try:
-        urllib.request.urlopen(req)
-        print(f'Closed PR #{pr["number"]}')
-    except Exception as e:
-        print(f'Error closing #{pr["number"]}: {e}')
-"
+# ─── Additional cleanup: Stale PRs ────────────────────────
+# Report PRs created more than 30 days ago (creation date only; activity is not checked)
+log "Checking for stale PRs (older than 30 days)..."

-echo ""
-echo "Cleanup complete."
+THIRTY_DAYS_AGO=$(date -u -v-30d +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -d "30 days ago" +%Y-%m-%dT%H:%M:%SZ)
+
+STALE_PRS=$(echo "$OPEN_PRS" | jq -r --arg cutoff "$THIRTY_DAYS_AGO" '.[] | select(.created_at < $cutoff) | "\(.number)\t\(.title)\t\(.created_at)"')
+
+if [ -n "$STALE_PRS" ]; then
+    STALE_COUNT=$(echo -n "$STALE_PRS" | grep -c '^' || true)
+    log "Found $STALE_COUNT stale PRs (older than 30 days)"
+    
+    echo "$STALE_PRS" | while IFS=$'\t' read -r pr_num pr_title pr_created; do
+        log "Stale PR #$pr_num: $pr_title (created: $pr_created)"
+    done
+else
+    log "No stale PRs found"
+fi
+
+log "Script complete"
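
The grouping rule in the script above (bucket open PRs by the first `#<number>` in the title, keep the newest, close the rest) can be restated compactly. This is a sketch for illustration rather than a replacement for the script: `plan_closures` is a hypothetical name, the sample PRs are invented, and timestamps are assumed to be ISO 8601 strings with the same UTC offset so that string order matches time order.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

ISSUE_REF = re.compile(r"#(\d+)")


def plan_closures(prs: List[dict]) -> Dict[str, Tuple[int, List[int]]]:
    """Return {issue: (pr_to_keep, prs_to_close)} for issues with 2+ open PRs."""
    groups = defaultdict(list)
    for pr in prs:
        match = ISSUE_REF.search(pr["title"])  # first "#123" in the title
        if match:
            groups[match.group(1)].append(pr)

    plan: Dict[str, Tuple[int, List[int]]] = {}
    for issue, members in groups.items():
        if len(members) < 2:
            continue  # a single PR is not a duplicate
        members.sort(key=lambda pr: pr["created_at"])  # oldest first
        keep = members[-1]["number"]
        plan[issue] = (keep, [pr["number"] for pr in members[:-1]])
    return plan


# Invented sample data, shaped like the fields the script reads via jq.
sample = [
    {"number": 101, "title": "fix: dedupe (#1128)", "created_at": "2026-04-10T01:00:00Z"},
    {"number": 105, "title": "feat: close duplicates for #1128", "created_at": "2026-04-12T03:00:00Z"},
    {"number": 103, "title": "docs: notes for #1128", "created_at": "2026-04-11T02:00:00Z"},
    {"number": 110, "title": "chore: unrelated #900", "created_at": "2026-04-12T04:00:00Z"},
    {"number": 111, "title": "no issue reference", "created_at": "2026-04-12T05:00:00Z"},
]
print(plan_closures(sample))
```

PRs whose titles carry no issue reference are never grouped, so they are never closed by this rule. The same holds for the shell script.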
--- a/scripts/preflight-pr-check.sh
+++ b/scripts/preflight-pr-check.sh
@@ -1,82 +0,0 @@
-#!/usr/bin/env bash
-# preflight-pr-check.sh — Prevent duplicate PRs before creating them
-#
-# Usage:
-#   ./scripts/preflight-pr-check.sh <issue_number>
-#
-# Exit codes:
-#   0 = safe to proceed (no existing PRs)
-#   1 = BLOCKED (existing PRs found)
-#   2 = error
-
-set -euo pipefail
-
-ISSUE_NUM="${1:?Usage: preflight-pr-check.sh <issue_number>}"
-GITEA_URL="${GITEA_URL:-https://forge.alexanderwhitestone.com}"
-GITEA_TOKEN="${GITEA_TOKEN:-$(cat ~/.config/gitea/token 2>/dev/null || echo '')}"
-REPO="${GITEA_REPO:-Timmy_Foundation/the-nexus}"
-
-if [ -z "$GITEA_TOKEN" ]; then
-    echo "ERROR: GITEA_TOKEN not set and ~/.config/gitea/token not found"
-    exit 2
-fi
-
-# Get repo info
-REPO_API="${GITEA_URL}/api/v1/repos/${REPO}"
-
-# Fetch open PRs
-PRS=$(curl -sf -H "Authorization: token ${GITEA_TOKEN}"     "${REPO_API}/pulls?state=open&limit=50" 2>/dev/null || echo '[]')
-
-# Check for existing PRs referencing this issue
-MATCHING_PRS=$(echo "$PRS" | python3 -c "
-import json, sys
-prs = json.load(sys.stdin)
-issue = '${ISSUE_NUM}'
-matches = []
-for pr in prs:
-    title = pr.get('title', '')
-    body = pr.get('body', '')
-    ref = pr.get('head', {}).get('ref', '')
-    if f'#{issue}' in title or f'#{issue}' in body or issue in ref:
-        matches.append({
-            'number': pr['number'],
-            'title': title,
-            'branch': ref,
-            'url': pr.get('html_url', '')
-        })
-json.dump(matches, sys.stdout)
-" 2>/dev/null || echo '[]')
-
-COUNT=$(echo "$MATCHING_PRS" | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo '0')
-
-if [ "$COUNT" -gt 0 ]; then
-    echo "╔══════════════════════════════════════════════════════════════╗"
-    echo "║  🚫 BLOCKED: $COUNT existing PR(s) for issue #$ISSUE_NUM"
-    echo "╚══════════════════════════════════════════════════════════════╝"
-    echo ""
-    echo "Existing PRs:"
-    echo "$MATCHING_PRS" | python3 -c "
-import json, sys
-prs = json.load(sys.stdin)
-for pr in prs:
-    print(f"  #{pr['number']}: {pr['title']}")
-    print(f"    Branch: {pr['branch']}")
-    print(f"    URL: {pr['url']}")
-    print()
-"
-    echo "Options:"
-    echo "  1. Review and merge an existing PR"
-    echo "  2. Close duplicates and proceed"
-    echo "  3. Use --force to bypass (NOT RECOMMENDED)"
-    echo ""
-    
-    if [ "${2:-}" = "--force" ]; then
-        echo "⚠️  --force flag detected. Bypassing duplicate check."
-        exit 0
-    fi
-    
-    exit 1
-else
-    echo "✅ Safe to proceed: No existing PRs for issue #$ISSUE_NUM"
-    exit 0
-fi
--- a/scripts/resurrection_pool.py
+++ b/scripts/resurrection_pool.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+"""Resurrection Pool — health polling, dead-agent detection, and revival planning.
+
+Grounded implementation slice for #882.
+Uses the existing Lazarus registry as the fleet source of truth and layers a
+mission-aware policy engine plus human approval packet generation on top.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import subprocess
+import urllib.request
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+import yaml
+
+ROOT = Path(__file__).resolve().parent.parent
+REGISTRY_PATH = ROOT / "lazarus-registry.yaml"
+POLICY_PATH = ROOT / "config" / "resurrection_pool.json"
+STATE_PATH = Path("/var/lib/lazarus/resurrection_pool_state.json")
+LOCAL_HOSTS = {"127.0.0.1", "localhost", "104.131.15.18"}
+ISSUE_NUMBER = 882
+
+
+def shell(cmd: str, timeout: int = 30) -> tuple[int, str, str]:
+    try:
+        result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
+        return result.returncode, result.stdout.strip(), result.stderr.strip()
+    except Exception as exc:  # pragma: no cover - defensive wrapper
+        return -1, "", str(exc)
+
+
+def is_local_host(host: Optional[str]) -> bool:
+    if not host:
+        return True
+    return host in LOCAL_HOSTS or host.startswith("127.")
+
+
+def ping_http(url: str, timeout: int = 10) -> tuple[bool, int]:
+    try:
+        req = urllib.request.Request(url, method="HEAD")
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            return True, resp.status
+    except urllib.error.HTTPError as err:
+        return True, err.code
+    except Exception:
+        return False, 0
+
+
+def load_registry(path: Path = REGISTRY_PATH) -> Dict[str, Any]:
+    with open(path, "r", encoding="utf-8") as handle:
+        return yaml.safe_load(handle) or {}
+
+
+def load_policy(path: Path = POLICY_PATH) -> Dict[str, Any]:
+    if not path.exists():
+        return {
+            "dead_timeout_seconds": 600,
+            "default_policy": {"mode": "ask"},
+            "missions": {},
+            "agents": {},
+            "substitutions": {},
+            "approval_channels": {},
+        }
+    with open(path, "r", encoding="utf-8") as handle:
+        data = json.load(handle)
+    data.setdefault("dead_timeout_seconds", 600)
+    data.setdefault("default_policy", {"mode": "ask"})
+    data.setdefault("missions", {})
+    data.setdefault("agents", {})
+    data.setdefault("substitutions", {})
+    data.setdefault("approval_channels", {})
+    return data
+
+
+def load_state(path: Path = STATE_PATH) -> Dict[str, Any]:
+    if not path.exists():
+        return {}
+    with open(path, "r", encoding="utf-8") as handle:
+        return json.load(handle)
+
+
+def save_state(state: Dict[str, Any], path: Path = STATE_PATH) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    with open(path, "w", encoding="utf-8") as handle:
+        json.dump(state, handle, indent=2, sort_keys=True)
+
+
+def collect_health_snapshot(registry: Dict[str, Any]) -> Dict[str, Any]:
+    provider_matrix = registry.get("provider_health_matrix", {})
+    fleet = registry.get("fleet", {})
+    snapshot: Dict[str, Any] = {}
+
+    for agent_name, spec in fleet.items():
+        primary = spec.get("primary", {})
+        provider_name = primary.get("provider")
+        provider_status = provider_matrix.get(provider_name, {}).get("status", "unknown")
+        gateway_url = spec.get("health_endpoints", {}).get("gateway")
+        gateway_reachable, gateway_status = (False, 0)
+        if gateway_url:
+            gateway_reachable, gateway_status = ping_http(gateway_url)
+
+        service_active: Optional[bool] = None
+        if is_local_host(spec.get("host")):
+            service_code, _, _ = shell(f"systemctl is-active hermes-{agent_name}.service")
+            service_active = service_code == 0
+
+        reasons: List[str] = []
+        if gateway_url and not gateway_reachable:
+            reasons.append("gateway_unreachable")
+        if service_active is False:
+            reasons.append("service_inactive")
+        if provider_status in {"dead", "degraded"}:
+            reasons.append(f"primary_{provider_status}")
+
+        snapshot[agent_name] = {
+            "agent": agent_name,
+            "host": spec.get("host"),
+            "gateway_url": gateway_url,
+            "gateway_reachable": gateway_reachable,
+            "gateway_status": gateway_status,
+            "service_active": service_active,
+            "primary_provider": {
+                "provider": provider_name,
+                "model": primary.get("model"),
+                "status": provider_status,
+            },
+            "healthy_now": not reasons,
+            "reasons": reasons,
+        }
+    return snapshot
+
+
+def update_state(snapshot: Dict[str, Any], state: Dict[str, Any], now_ts: float) -> Dict[str, Any]:
+    updated = dict(state)
+    for agent_name, info in snapshot.items():
+        entry = dict(updated.get(agent_name, {}))
+        entry["last_checked_at"] = now_ts
+        entry["last_reasons"] = list(info.get("reasons", []))
+        if info.get("healthy_now"):
+            entry["last_healthy_at"] = now_ts
+        else:
+            entry.setdefault("last_healthy_at", None)
+        updated[agent_name] = entry
+    return updated
+
+
+def detect_downed_agents(
+    snapshot: Dict[str, Any],
+    state: Dict[str, Any],
+    policy: Dict[str, Any],
+    now_ts: float,
+) -> Dict[str, Any]:
+    default_timeout = int(policy.get("dead_timeout_seconds", 600))
+    agent_overrides = policy.get("agents", {})
+    detected: Dict[str, Any] = {}
+
+    for agent_name, info in snapshot.items():
+        timeout_seconds = int(agent_overrides.get(agent_name, {}).get("dead_timeout_seconds", default_timeout))
+        last_healthy_at = state.get(agent_name, {}).get("last_healthy_at")
+        if info.get("healthy_now"):
+            unhealthy_for_seconds = 0.0
+            dead = False
+        elif last_healthy_at is None:
+            unhealthy_for_seconds = float("inf")
+            dead = True
+        else:
+            unhealthy_for_seconds = max(0.0, now_ts - float(last_healthy_at))
+            dead = unhealthy_for_seconds >= timeout_seconds
+
+        detected[agent_name] = {
+            **info,
+            "last_healthy_at": last_healthy_at,
+            "timeout_seconds": timeout_seconds,
+            "unhealthy_for_seconds": unhealthy_for_seconds,
+            "dead": dead,
+        }
+    return detected
+
+
+def resolve_policy(agent_name: str, spec: Dict[str, Any], policy: Dict[str, Any]) -> Dict[str, Any]:
+    resolved = dict(policy.get("default_policy", {}))
+    spec_mission = spec.get("mission")
+    agent_override = dict(policy.get("agents", {}).get(agent_name, {}))
+    resolved_mission = agent_override.get("mission") or spec_mission or agent_name
+    if resolved_mission in policy.get("missions", {}):
+        resolved.update(policy["missions"][resolved_mission])
+    resolved.update(agent_override)
+    resolved.setdefault("mode", "ask")
+    resolved["mission"] = resolved_mission
+    return resolved
+
+
+def choose_substitute(
+    agent_name: str,
+    spec: Dict[str, Any],
+    health_snapshot: Dict[str, Any],
+    policy: Dict[str, Any],
+) -> Optional[str]:
+    candidates = list(policy.get("substitutions", {}).get(agent_name, []))
+    candidates.extend(spec.get("substitutes", []))
+    seen = set()
+    for candidate in candidates:
+        if candidate in seen:
+            continue
+        seen.add(candidate)
+        candidate_health = health_snapshot.get(candidate, {})
+        if candidate_health.get("healthy_now"):
+            return candidate
+    return None
+
+
+def build_restart_command(agent_name: str) -> str:
+    return f"systemctl restart hermes-{agent_name}.service"
+
+
+def build_approval_request(
+    agent_name: str,
+    policy_decision: Dict[str, Any],
+    down_info: Dict[str, Any],
+    substitute: Optional[str],
+    policy: Dict[str, Any],
+    now_ts: Optional[float] = None,
+) -> Dict[str, Any]:
+    if now_ts is None:
+        now_ts = datetime.now(timezone.utc).timestamp()
+    reasons = ", ".join(down_info.get("reasons", [])) or "no health signal"
+    mission = policy_decision.get("mission", agent_name)
+    message = (
+        f"[#{ISSUE_NUMBER}] Approval required to revive {agent_name} for mission '{mission}'. "
+        f"Reasons: {reasons}. "
+        f"Suggested substitute: {substitute or 'none available'}."
+    )
+    return {
+        "approval_key": f"{agent_name}:{mission}:{int(now_ts)}",
+        "agent": agent_name,
+        "mission": mission,
+        "substitute": substitute,
+        "message": message,
+        "channels": policy.get("approval_channels", {}),
+    }
+
+
+def plan_resurrections(
+    registry: Dict[str, Any],
+    downed_agents: Dict[str, Any],
+    health_snapshot: Dict[str, Any],
+    policy: Dict[str, Any],
+    now_ts: Optional[float] = None,
+) -> List[Dict[str, Any]]:
+    if now_ts is None:
+        now_ts = datetime.now(timezone.utc).timestamp()
+    fleet = registry.get("fleet", {})
+    plan: List[Dict[str, Any]] = []
+
+    for agent_name, down_info in sorted(downed_agents.items()):
+        if not down_info.get("dead"):
+            continue
+        spec = fleet.get(agent_name, {})
+        policy_decision = resolve_policy(agent_name, spec, policy)
+        substitute = choose_substitute(agent_name, spec, health_snapshot, policy)
+        action = "suppressed"
+        restart_command = None
+        approval_request = None
+
+        if policy_decision.get("mode") == "yes":
+            if is_local_host(spec.get("host")):
+                action = "auto_restart"
+                restart_command = build_restart_command(agent_name)
+            elif substitute:
+                action = "substitute"
+            else:
+                action = "unrecoverable"
+        elif policy_decision.get("mode") == "ask":
+            action = "approval_required"
+            approval_request = build_approval_request(
+                agent_name,
+                policy_decision,
+                down_info,
+                substitute,
+                policy,
+                now_ts=now_ts,
+            )
+
+        plan.append(
+            {
+                "agent": agent_name,
+                "mission": policy_decision.get("mission"),
+                "policy": policy_decision,
+                "reasons": list(down_info.get("reasons", [])),
+                "timeout_seconds": down_info.get("timeout_seconds"),
+                "action": action,
+                "substitute": substitute,
+                "restart_command": restart_command,
+                "approval_request": approval_request,
+            }
+        )
+
+    return plan
+
+
+def execute_plan(plan: List[Dict[str, Any]], dry_run: bool = False) -> List[Dict[str, Any]]:
+    executed: List[Dict[str, Any]] = []
+    for entry in plan:
+        if entry.get("action") != "auto_restart":
+            executed.append({**entry, "executed": False})
+            continue
+        cmd = entry.get("restart_command")
+        if dry_run or not cmd:
+            executed.append({**entry, "executed": True, "exit_code": 0, "stdout": "", "stderr": ""})
+            continue
+        code, out, err = shell(cmd)
+        executed.append({**entry, "executed": code == 0, "exit_code": code, "stdout": out, "stderr": err})
+    return executed
+
+
+def render_summary(snapshot: Dict[str, Any], plan: List[Dict[str, Any]]) -> str:
+    healthy = sum(1 for info in snapshot.values() if info.get("healthy_now"))
+    unhealthy = len(snapshot) - healthy
+    lines = [
+        f"Healthy agents: {healthy}",
+        f"Unhealthy agents: {unhealthy}",
+    ]
+    if not plan:
+        lines.append("Resurrection plan: no dead agents exceed timeout.")
+        return "\n".join(lines)
+    lines.append("Resurrection plan:")
+    for entry in plan:
+        lines.append(
+            f"- {entry['agent']}: {entry['action']}"
+            f" (mission={entry['mission']}, reasons={', '.join(entry['reasons']) or 'none'})"
+        )
+    return "\n".join(lines)
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Resurrection Pool")
+    parser.add_argument("--registry", type=Path, default=REGISTRY_PATH)
+    parser.add_argument("--policy", type=Path, default=POLICY_PATH)
+    parser.add_argument("--state", type=Path, default=STATE_PATH)
+    parser.add_argument("--json", action="store_true")
+    parser.add_argument("--execute", action="store_true")
+    parser.add_argument("--dry-run", action="store_true")
+    args = parser.parse_args()
+
+    now_ts = datetime.now(timezone.utc).timestamp()
+    registry = load_registry(args.registry)
+    policy = load_policy(args.policy)
+    prior_state = load_state(args.state)
+    snapshot = collect_health_snapshot(registry)
+    next_state = update_state(snapshot, prior_state, now_ts)
+    downed_agents = detect_downed_agents(snapshot, next_state, policy, now_ts)
+    plan = plan_resurrections(registry, downed_agents, snapshot, policy, now_ts=now_ts)
+    if args.execute:
+        plan = execute_plan(plan, dry_run=args.dry_run)
+    if not args.dry_run:
+        save_state(next_state, args.state)
+
+    payload = {
+        "checked_at": datetime.fromtimestamp(now_ts, tz=timezone.utc).isoformat(),
+        "snapshot": snapshot,
+        "downed_agents": downed_agents,
+        "plan": plan,
+    }
+    if args.json:
+        print(json.dumps(payload, indent=2, sort_keys=True))
+    else:
+        print(render_summary(snapshot, plan))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/tests/test_night_shift_prediction_report.py
+++ b/tests/test_night_shift_prediction_report.py
@@ -0,0 +1,25 @@
+from pathlib import Path
+
+
+REPORT = Path("reports/night-shift-prediction-2026-04-12.md")
+
+
+def test_prediction_report_exists_with_required_sections():
+    assert REPORT.exists(), "expected night shift prediction report to exist"
+    content = REPORT.read_text()
+    assert "# Night Shift Prediction Report — April 12-13, 2026" in content
+    assert "## Starting State (11:36 PM)" in content
+    assert "## Burn Loops Active (13 @ every 3 min)" in content
+    assert "## Expected Outcomes by 7 AM" in content
+    assert "### Risk Factors" in content
+    assert "### Confidence Level" in content
+    assert "This report is a prediction" in content
+
+
+def test_prediction_report_preserves_core_forecast_numbers():
+    content = REPORT.read_text()
+    assert "Total expected API calls: ~2,010" in content
+    assert "Total commits pushed: ~800-1,200" in content
+    assert "Total PRs created: ~150-250" in content
+    assert "the-nexus | 30-50 | 200-300" in content
+    assert "Generated: 2026-04-12 23:36 EDT" in content
--- a/tests/test_resurrection_pool.py
+++ b/tests/test_resurrection_pool.py
@@ -0,0 +1,118 @@
+from importlib import util
+from pathlib import Path
+
+
+ROOT = Path(__file__).resolve().parent.parent
+MODULE_PATH = ROOT / "scripts" / "resurrection_pool.py"
+
+
+def load_module():
+    spec = util.spec_from_file_location("resurrection_pool", MODULE_PATH)
+    assert spec is not None and spec.loader is not None
+    module = util.module_from_spec(spec)
+    spec.loader.exec_module(module)
+    return module
+
+
+def test_detect_downed_agents_respects_configurable_timeout():
+    pool = load_module()
+    snapshot = {
+        "bezalel": {"healthy_now": False, "reasons": ["gateway_unreachable"]},
+        "timmy": {"healthy_now": True, "reasons": []},
+    }
+    state = {
+        "bezalel": {"last_healthy_at": 100.0},
+        "timmy": {"last_healthy_at": 650.0},
+    }
+    policy = {"dead_timeout_seconds": 600, "agents": {}}
+
+    not_dead = pool.detect_downed_agents(snapshot, state, policy, now_ts=650.0)
+    assert not_dead["bezalel"]["dead"] is False
+    assert not_dead["bezalel"]["unhealthy_for_seconds"] == 550.0
+
+    dead = pool.detect_downed_agents(snapshot, state, policy, now_ts=701.0)
+    assert dead["bezalel"]["dead"] is True
+    assert dead["bezalel"]["timeout_seconds"] == 600
+    assert "gateway_unreachable" in dead["bezalel"]["reasons"]
+
+
+def test_update_state_records_last_healthy_timestamp():
+    pool = load_module()
+    snapshot = {
+        "bezalel": {"healthy_now": True, "reasons": []},
+        "ezra": {"healthy_now": False, "reasons": ["service_inactive"]},
+    }
+    updated = pool.update_state(snapshot, {}, now_ts=1234.5)
+    assert updated["bezalel"]["last_healthy_at"] == 1234.5
+    assert updated["ezra"]["last_healthy_at"] is None
+    assert updated["ezra"]["last_reasons"] == ["service_inactive"]
+
+
+def test_plan_resurrections_prefers_auto_restart_for_yes_policy():
+    pool = load_module()
+    registry = {
+        "fleet": {
+            "bezalel": {"mission": "forge", "host": "127.0.0.1"},
+            "allegro": {"mission": "forge", "host": "203.0.113.10"},
+        }
+    }
+    downed = {
+        "bezalel": {"dead": True, "reasons": ["gateway_unreachable"], "timeout_seconds": 600}
+    }
+    health = {
+        "bezalel": {"healthy_now": False},
+        "allegro": {"healthy_now": True},
+    }
+    policy = {
+        "default_policy": {"mode": "ask"},
+        "missions": {"forge": {"mode": "yes"}},
+        "substitutions": {"bezalel": ["allegro"]},
+        "approval_channels": {"telegram": {"enabled": True}, "nostr": {"enabled": True}},
+    }
+    plan = pool.plan_resurrections(registry, downed, health, policy, now_ts=2000.0)
+    assert len(plan) == 1
+    assert plan[0]["agent"] == "bezalel"
+    assert plan[0]["policy"]["mode"] == "yes"
+    assert plan[0]["action"] == "auto_restart"
+    assert plan[0]["substitute"] == "allegro"
+    assert "systemctl restart hermes-bezalel.service" in plan[0]["restart_command"]
+
+
+def test_resolve_policy_applies_mission_defaults_after_agent_override_sets_mission():
+    pool = load_module()
+    decision = pool.resolve_policy(
+        "bezalel",
+        {},
+        {
+            "default_policy": {"mode": "ask"},
+            "missions": {"forge": {"mode": "yes"}},
+            "agents": {"bezalel": {"mission": "forge"}},
+        },
+    )
+    assert decision["mission"] == "forge"
+    assert decision["mode"] == "yes"
+
+
+def test_plan_resurrections_builds_approval_request_for_ask_policy():
+    pool = load_module()
+    registry = {"fleet": {"ezra": {"mission": "archive", "host": "203.0.113.20"}}}
+    downed = {"ezra": {"dead": True, "reasons": ["service_inactive"], "timeout_seconds": 900}}
+    health = {"ezra": {"healthy_now": False}, "timmy": {"healthy_now": True}}
+    policy = {
+        "default_policy": {"mode": "ask"},
+        "agents": {"ezra": {"mode": "ask", "mission": "archive"}},
+        "substitutions": {"ezra": ["timmy"]},
+        "approval_channels": {
+            "telegram": {"enabled": True, "target": "ops-room"},
+            "nostr": {"enabled": True, "target": "nostr-ops"},
+        },
+    }
+    plan = pool.plan_resurrections(registry, downed, health, policy, now_ts=3000.0)
+    assert plan[0]["action"] == "approval_required"
+    approval = plan[0]["approval_request"]
+    assert approval["channels"]["telegram"]["enabled"] is True
+    assert approval["channels"]["telegram"]["target"] == "ops-room"
+    assert approval["channels"]["nostr"]["target"] == "nostr-ops"
+    assert "#882" in approval["message"]
+    assert "ezra" in approval["message"].lower()
+    assert approval["substitute"] == "timmy"
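The precedence checked by `test_resolve_policy_applies_mission_defaults_after_agent_override_sets_mission` can be sketched as below. This is a minimal illustration, not the code in `scripts/resurrection_pool.py`, since the body of `resolve_policy` is not shown in this part of the diff. The name `resolve_policy_sketch` is hypothetical, and the merge order (default policy, then mission defaults, then agent overrides) is an assumption inferred from the tests and from the commit message "apply mission defaults before agent overrides".

```python
from typing import Any, Dict


def resolve_policy_sketch(
    agent: str,
    registry_entry: Dict[str, Any],
    policy: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical merge order: default_policy < mission defaults < agent overrides."""
    agent_override = policy.get("agents", {}).get(agent, {})
    # Assumption: the agent override may assign the mission; otherwise use the registry entry.
    mission = agent_override.get("mission") or registry_entry.get("mission")

    decision: Dict[str, Any] = dict(policy.get("default_policy", {}))
    if mission:
        # Mission defaults are layered on before the agent's own settings.
        decision.update(policy.get("missions", {}).get(mission, {}))
    decision.update(agent_override)
    decision["mission"] = mission
    return decision


# Sample policy modeled on config/resurrection_pool.json from this PR.
policy = {
    "default_policy": {"mode": "ask"},
    "missions": {"forge": {"mode": "yes"}, "sovereign-core": {"mode": "no"}},
    "agents": {
        "bezalel": {"mission": "forge"},
        "timmy": {"mission": "sovereign-core", "mode": "ask"},
    },
}

forge_decision = resolve_policy_sketch("bezalel", {}, policy)
core_decision = resolve_policy_sketch("timmy", {}, policy)
unknown_decision = resolve_policy_sketch("nobody", {}, policy)
```

Under this ordering, `bezalel` inherits `mode: yes` from the `forge` mission, while `timmy` keeps its explicit `mode: ask` even though `sovereign-core` defaults to `no`. If the agent overrides were applied first and mission defaults last, `timmy` would resolve to `no` instead.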
Author	SHA1	Message	Date
Alexander Whitestone	61a6964780	wip: apply mission defaults before agent overrides [Some checks failed: CI / test (pull_request) failing after 1m12s; CI / validate (pull_request) failing after 1m8s; Review Approval Gate / verify-review (pull_request) failing after 10s]	2026-04-15 03:56:10 -04:00
Alexander Whitestone	e40891afb8	wip: honor mission defaults in resurrection policy	2026-04-15 03:54:56 -04:00
Alexander Whitestone	e232112fc8	wip: add resurrection pool planner and policy config	2026-04-15 03:53:20 -04:00
Alexander Whitestone	ff2e2e578f	wip: add resurrection pool regression tests	2026-04-15 03:50:56 -04:00
Timmy Time	bd0497b998	Merge PR #1585: docs: add night shift prediction report (#1353)	2026-04-15 06:13:22 +00:00
Alexander Whitestone	4ab84a59ab	docs: add night shift prediction report (#1353) [Some checks failed: CI / test (pull_request) failing after 50s; CI / validate (pull_request) failing after 1m10s; Review Approval Gate / verify-review (pull_request) successful in 16s]	2026-04-15 02:02:26 -04:00