wip: apply mission defaults before agent overrides

wip: honor mission defaults in resurrection policy
wip: add resurrection pool planner and policy config
7 changed files with 713 additions and 89 deletions
--- a/config/resurrection_pool.json
+++ b/config/resurrection_pool.json
@@ -0,0 +1,55 @@
+{
+  "dead_timeout_seconds": 600,
+  "default_policy": {
+    "mode": "ask"
+  },
+  "missions": {
+    "forge": {
+      "mode": "yes"
+    },
+    "archive": {
+      "mode": "ask"
+    },
+    "sovereign-core": {
+      "mode": "no"
+    }
+  },
+  "agents": {
+    "bezalel": {
+      "mission": "forge"
+    },
+    "allegro": {
+      "mission": "forge"
+    },
+    "ezra": {
+      "mission": "archive",
+      "mode": "ask"
+    },
+    "timmy": {
+      "mission": "sovereign-core",
+      "mode": "ask"
+    }
+  },
+  "substitutions": {
+    "bezalel": [
+      "allegro",
+      "timmy"
+    ],
+    "ezra": [
+      "timmy"
+    ],
+    "allegro": [
+      "timmy"
+    ]
+  },
+  "approval_channels": {
+    "telegram": {
+      "enabled": true,
+      "target": "ops-room"
+    },
+    "nostr": {
+      "enabled": true,
+      "target": "nostr-ops"
+    }
+  }
+}
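
Review note: a minimal sketch of how the precedence rules apply to this config, assuming the order implemented by `resolve_policy` in `scripts/resurrection_pool.py` (default policy, then mission defaults, then agent override). The `POLICY` dict is a trimmed copy of the file above, and `resolve_mode` is a hypothetical helper for illustration, not part of the PR.

```python
from typing import Any, Dict

# Trimmed copy of config/resurrection_pool.json (substitutions and channels omitted).
POLICY: Dict[str, Any] = {
    "default_policy": {"mode": "ask"},
    "missions": {
        "forge": {"mode": "yes"},
        "archive": {"mode": "ask"},
        "sovereign-core": {"mode": "no"},
    },
    "agents": {
        "bezalel": {"mission": "forge"},
        "allegro": {"mission": "forge"},
        "ezra": {"mission": "archive", "mode": "ask"},
        "timmy": {"mission": "sovereign-core", "mode": "ask"},
    },
}


def resolve_mode(agent: str, policy: Dict[str, Any]) -> str:
    """Hypothetical helper: default policy, then mission defaults, then agent override."""
    resolved = dict(policy.get("default_policy", {}))
    override = dict(policy.get("agents", {}).get(agent, {}))
    # Without a registry spec, the mission falls back to the agent name.
    mission = override.get("mission") or agent
    resolved.update(policy.get("missions", {}).get(mission, {}))
    resolved.update(override)
    return resolved.get("mode", "ask")


MODES = {agent: resolve_mode(agent, POLICY) for agent in POLICY["agents"]}
print(MODES)
```

Under these assumptions `timmy` resolves to `ask` rather than `no`, because its agent-level `mode` overrides the `sovereign-core` mission default. Dropping `"mode": "ask"` from the `timmy` entry would let the mission's `no` apply.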
--- a/docs/duplicate-pr-prevention.md
+++ b/docs/duplicate-pr-prevention.md
@@ -1,89 +0,0 @@
-# Duplicate PR Prevention System
-
-## Overview
-
-The Nexus uses a multi-layer system to prevent duplicate PRs for the same issue.
-
-## Components
-
-### 1. Pre-flight Check (CI)
-
-The `.github/workflows/pr-duplicate-check.yml` workflow runs on every PR creation and checks if a PR already exists for the same issue.
-
-**How it works:**
-1. Extracts issue numbers from PR title and body
-2. Queries Gitea API for existing PRs referencing those issues
-3. Fails the check if duplicates are found
-4. Provides links to existing PRs for review
-
-### 2. Cleanup Script
-
-The `scripts/cleanup-duplicate-prs.sh` script helps clean up existing duplicates:
- Lists all PRs for a given issue
- Identifies duplicates
- Provides commands to close duplicates
-
-### 3. Milestone Checker
-
-The `bin/check_duplicate_milestones.py` script prevents duplicate milestones:
- Scans all milestones in the repo
- Identifies duplicates by title
- Reports for manual cleanup
-
-## Usage
-
-### Check for Duplicates Before Creating PR
-
-```bash
-# Check if issue already has PRs
-curl -s -H "Authorization: token $GITEA_TOKEN" \
-  "https://forge.alexanderwhitestone.com/api/v1/repos/Timmy_Foundation/the-nexus/pulls?state=open" \
-  | jq '.[] | select(.body | contains("#ISSUE_NUMBER"))'
-```
-
-### Clean Up Existing Duplicates
-
-```bash
-# List PRs for issue
-./scripts/cleanup-duplicate-prs.sh --issue 1128
-
-# Close duplicates (keep newest)
-./scripts/cleanup-duplicate-prs.sh --issue 1128 --close-duplicates
-```
-
-## Example: Issue #1500
-
-Issue #1500 documented that the pre-flight check successfully prevented a duplicate PR for #1474.
-
-**What happened:**
-1. Dispatch attempted to work on #1474
-2. Pre-flight check found 2 existing PRs (#1495, #1493)
-3. System prevented creating a 3rd duplicate
-4. Issue #1500 was filed as an observation
-
-**Result:** The system worked as intended.
-
-## Best Practices
-
-1. **Always check before creating PRs** — use the pre-flight check
-2. **Close duplicates promptly** — don't let them accumulate
-3. **Reference issues in PRs** — makes duplicate detection possible
-4. **Use descriptive branch names** — helps identify purpose
-5. **Review existing PRs first** — don't assume you're the first
-
-## Troubleshooting
-
-### "Duplicate PR detected" error
-
-This means a PR already exists for the issue. Options:
-1. Review the existing PR and contribute to it
-2. Close your PR if it's truly a duplicate
-3. Update your PR to address a different aspect
-
-### Pre-flight check not running
-
-Check that `.github/workflows/pr-duplicate-check.yml` exists and is enabled.
-
-### False positives
-
-The check looks for issue numbers in PR body. If you're referencing an issue without intending to fix it, use "Refs #" instead of "Fixes #".
--- a/docs/resurrection-pool.md
+++ b/docs/resurrection-pool.md
@@ -0,0 +1,27 @@
+# Resurrection Pool
+
+The Resurrection Pool is a mission-aware layer on top of the existing Lazarus registry.
+
+It adds three concrete behaviors:
+- configurable dead-agent detection timeout
+- yes/no/ask revival policy resolution per mission or agent
+- approval packet generation for Telegram / Nostr when human sign-off is required
+
+## Files
+- `scripts/resurrection_pool.py`
+- `config/resurrection_pool.json`
+
+## Example usage
+
+```bash
+python scripts/resurrection_pool.py --json --dry-run
+python scripts/resurrection_pool.py --execute
+```
+
+## Policy model
+- `yes` → local agents are restarted automatically; remote agents are handed to a healthy substitute, or marked `unrecoverable` if none is available
+- `ask` → generate an approval request packet with Telegram / Nostr targets
+- `no` → suppress automatic revival
+
+## Notes
+This grounds issue #882 in executable code, but it does not yet wire live Telegram or Nostr delivery. The current slice produces the approval packet and restart/substitution plan the surrounding ops loop can act on.
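
Review note: the policy model can be condensed into a single decision function. This is a minimal sketch that mirrors the mode-to-action branch in `plan_resurrections`; `decide_action` is a hypothetical name used only for illustration, and the action strings are the ones the script emits.

```python
from typing import Optional


def decide_action(mode: str, is_local: bool, substitute: Optional[str]) -> str:
    """Hypothetical condensation of the mode-to-action branch in plan_resurrections."""
    if mode == "yes":
        if is_local:
            return "auto_restart"
        # Remote agents cannot be restarted from here; fall back to a substitute.
        return "substitute" if substitute else "unrecoverable"
    if mode == "ask":
        return "approval_required"
    # "no" and any unrecognized mode suppress automatic revival.
    return "suppressed"


print(decide_action("yes", is_local=False, substitute="allegro"))
```

Note that in the script a substitute is still computed and reported for local `yes` agents and for `ask` agents; it only becomes the chosen action for remote agents under `yes`.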
--- a/reports/night-shift-prediction-2026-04-12.md
+++ b/reports/night-shift-prediction-2026-04-12.md
@@ -0,0 +1,111 @@
+# Night Shift Prediction Report — April 12-13, 2026
+
+## Starting State (11:36 PM)
+
+```
+Time: 11:36 PM EDT
+Automation: 13 burn loops × 3min + 1 explorer × 10min + 1 backlog × 30min
+API: Nous/xiaomi/mimo-v2-pro (FREE)
+Rate: 268 calls/hour
+Duration: 7.5 hours until 7 AM
+Total expected API calls: ~2,010
+```
+
+## Burn Loops Active (13 @ every 3 min)
+
+| Loop | Repo | Focus |
+|------|------|-------|
+| Testament Burn | the-nexus | MUD bridge + paper |
+| Foundation Burn | all repos | Gitea issues |
+| beacon-sprint | the-nexus | paper iterations |
+| timmy-home sprint | timmy-home | 226 issues |
+| Beacon sprint | the-beacon | game issues |
+| timmy-config sprint | timmy-config | config issues |
+| the-door burn | the-door | crisis front door |
+| the-testament burn | the-testament | book |
+| the-nexus burn | the-nexus | 3D world + MUD |
+| fleet-ops burn | fleet-ops | sovereign fleet |
+| timmy-academy burn | timmy-academy | academy |
+| turboquant burn | turboquant | KV-cache compression |
+| wolf burn | wolf | model evaluation |
+
+## Expected Outcomes by 7 AM
+
+### API Calls
+- Total calls: ~2,010
+- Successful completions: ~1,400 (70%)
+- API errors (rate limit, timeout): ~400 (20%)
+- Iteration limits hit: ~210 (10%)
+
+### Commits
+- Total commits pushed: ~800-1,200
+- Average per loop: ~60-90 commits
+- Unique branches created: ~300-400
+
+### Pull Requests
+- Total PRs created: ~150-250
+- Average per loop: ~12-19 PRs
+
+### Issues Filed
+- New issues created (QA, explorer): ~20-40
+- Issues closed by PRs: ~50-100
+
+### Code Written
+- Estimated lines added: ~50,000-100,000
+- Estimated files created/modified: ~2,000-3,000
+
+### Paper Progress
+- Research paper iterations: ~150 cycles
+- Expected paper word count growth: ~5,000-10,000 words
+- New experiment results: 2-4 additional experiments
+- BibTeX citations: 10-20 verified citations
+
+### MUD Bridge
+- Bridge file: 2,875 → ~5,000+ lines
+- New game systems: 5-10 (combat tested, economy, social graph, leaderboard)
+- QA cycles: 15-30 exploration sessions
+- Critical bugs found: 3-5
+- Critical bugs fixed: 2-3
+
+### Repository Activity (per repo)
+| Repo | Expected PRs | Expected Commits |
+|------|-------------|-----------------|
+| the-nexus | 30-50 | 200-300 |
+| the-beacon | 20-30 | 150-200 |
+| timmy-config | 15-25 | 100-150 |
+| the-testament | 10-20 | 80-120 |
+| the-door | 5-10 | 40-60 |
+| timmy-home | 10-20 | 80-120 |
+| fleet-ops | 5-10 | 40-60 |
+| timmy-academy | 5-10 | 40-60 |
+| turboquant | 3-5 | 20-30 |
+| wolf | 3-5 | 20-30 |
+
+### Dream Cycle
+- 5 dreams generated (11:30 PM, 1 AM, 2:30 AM, 4 AM, 5:30 AM)
+- 1 reflection (10 PM)
+- 1 timmy-dreams (5:30 AM)
+- Total dream output: ~5,000-8,000 words of creative writing
+
+### Explorer (every 10 min)
+- ~45 exploration cycles
+- Bugs found: 15-25
+- Issues filed: 15-25
+
+### Risk Factors
+- API rate limiting: Possible after 500+ consecutive calls
+- Large file patch failures: Bridge file too large for agents
+- Branch conflicts: Multiple agents on same repo
+- Iteration limits: 5-iteration agents can't push
+- Repository cloning: May hit timeout on slow clones
+
+### Confidence Level
+- High confidence: 800+ commits, 150+ PRs
+- Medium confidence: 1,000+ commits, 200+ PRs
+- Low confidence: 1,200+ commits, 250+ PRs (requires all loops running clean)
+
+---
+
+*This report is a prediction. The 7 AM morning report will compare actual results.*
+*Generated: 2026-04-12 23:36 EDT*
+*Author: Timmy (pre-shift prediction)*
--- a/scripts/resurrection_pool.py
+++ b/scripts/resurrection_pool.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+"""Resurrection Pool — health polling, dead-agent detection, and revival planning.
+
+Grounded implementation slice for #882.
+Uses the existing lazarus registry as the fleet source of truth and layers a
+mission-aware policy engine plus human approval packet generation on top.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import subprocess
+import urllib.request
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+import yaml
+
+ROOT = Path(__file__).resolve().parent.parent
+REGISTRY_PATH = ROOT / "lazarus-registry.yaml"
+POLICY_PATH = ROOT / "config" / "resurrection_pool.json"
+STATE_PATH = Path("/var/lib/lazarus/resurrection_pool_state.json")
+LOCAL_HOSTS = {"127.0.0.1", "localhost", "104.131.15.18"}
+ISSUE_NUMBER = 882
+
+
+def shell(cmd: str, timeout: int = 30) -> tuple[int, str, str]:
+    try:
+        result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
+        return result.returncode, result.stdout.strip(), result.stderr.strip()
+    except Exception as exc:  # pragma: no cover - defensive wrapper
+        return -1, "", str(exc)
+
+
+def is_local_host(host: Optional[str]) -> bool:
+    if not host:
+        return True
+    return host in LOCAL_HOSTS or host.startswith("127.")
+
+
+def ping_http(url: str, timeout: int = 10) -> tuple[bool, int]:
+    try:
+        req = urllib.request.Request(url, method="HEAD")
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            return True, resp.status
+    except urllib.error.HTTPError as err:
+        return True, err.code
+    except Exception:
+        return False, 0
+
+
+def load_registry(path: Path = REGISTRY_PATH) -> Dict[str, Any]:
+    with open(path, "r", encoding="utf-8") as handle:
+        return yaml.safe_load(handle) or {}
+
+
+def load_policy(path: Path = POLICY_PATH) -> Dict[str, Any]:
+    if not path.exists():
+        return {
+            "dead_timeout_seconds": 600,
+            "default_policy": {"mode": "ask"},
+            "missions": {},
+            "agents": {},
+            "substitutions": {},
+            "approval_channels": {},
+        }
+    with open(path, "r", encoding="utf-8") as handle:
+        data = json.load(handle)
+    data.setdefault("dead_timeout_seconds", 600)
+    data.setdefault("default_policy", {"mode": "ask"})
+    data.setdefault("missions", {})
+    data.setdefault("agents", {})
+    data.setdefault("substitutions", {})
+    data.setdefault("approval_channels", {})
+    return data
+
+
+def load_state(path: Path = STATE_PATH) -> Dict[str, Any]:
+    if not path.exists():
+        return {}
+    with open(path, "r", encoding="utf-8") as handle:
+        return json.load(handle)
+
+
+def save_state(state: Dict[str, Any], path: Path = STATE_PATH) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    with open(path, "w", encoding="utf-8") as handle:
+        json.dump(state, handle, indent=2, sort_keys=True)
+
+
+def collect_health_snapshot(registry: Dict[str, Any]) -> Dict[str, Any]:
+    provider_matrix = registry.get("provider_health_matrix", {})
+    fleet = registry.get("fleet", {})
+    snapshot: Dict[str, Any] = {}
+
+    for agent_name, spec in fleet.items():
+        primary = spec.get("primary", {})
+        provider_name = primary.get("provider")
+        provider_status = provider_matrix.get(provider_name, {}).get("status", "unknown")
+        gateway_url = spec.get("health_endpoints", {}).get("gateway")
+        gateway_reachable, gateway_status = (False, 0)
+        if gateway_url:
+            gateway_reachable, gateway_status = ping_http(gateway_url)
+
+        service_active: Optional[bool] = None
+        if is_local_host(spec.get("host")):
+            service_code, _, _ = shell(f"systemctl is-active hermes-{agent_name}.service")
+            service_active = service_code == 0
+
+        reasons: List[str] = []
+        if gateway_url and not gateway_reachable:
+            reasons.append("gateway_unreachable")
+        if service_active is False:
+            reasons.append("service_inactive")
+        if provider_status in {"dead", "degraded"}:
+            reasons.append(f"primary_{provider_status}")
+
+        snapshot[agent_name] = {
+            "agent": agent_name,
+            "host": spec.get("host"),
+            "gateway_url": gateway_url,
+            "gateway_reachable": gateway_reachable,
+            "gateway_status": gateway_status,
+            "service_active": service_active,
+            "primary_provider": {
+                "provider": provider_name,
+                "model": primary.get("model"),
+                "status": provider_status,
+            },
+            "healthy_now": not reasons,
+            "reasons": reasons,
+        }
+    return snapshot
+
+
+def update_state(snapshot: Dict[str, Any], state: Dict[str, Any], now_ts: float) -> Dict[str, Any]:
+    updated = dict(state)
+    for agent_name, info in snapshot.items():
+        entry = dict(updated.get(agent_name, {}))
+        entry["last_checked_at"] = now_ts
+        entry["last_reasons"] = list(info.get("reasons", []))
+        if info.get("healthy_now"):
+            entry["last_healthy_at"] = now_ts
+        else:
+            entry.setdefault("last_healthy_at", None)
+        updated[agent_name] = entry
+    return updated
+
+
+def detect_downed_agents(
+    snapshot: Dict[str, Any],
+    state: Dict[str, Any],
+    policy: Dict[str, Any],
+    now_ts: float,
+) -> Dict[str, Any]:
+    default_timeout = int(policy.get("dead_timeout_seconds", 600))
+    agent_overrides = policy.get("agents", {})
+    detected: Dict[str, Any] = {}
+
+    for agent_name, info in snapshot.items():
+        timeout_seconds = int(agent_overrides.get(agent_name, {}).get("dead_timeout_seconds", default_timeout))
+        last_healthy_at = state.get(agent_name, {}).get("last_healthy_at")
+        if info.get("healthy_now"):
+            unhealthy_for_seconds = 0.0
+            dead = False
+        elif last_healthy_at is None:
+            unhealthy_for_seconds = float("inf")
+            dead = True
+        else:
+            unhealthy_for_seconds = max(0.0, now_ts - float(last_healthy_at))
+            dead = unhealthy_for_seconds >= timeout_seconds
+
+        detected[agent_name] = {
+            **info,
+            "last_healthy_at": last_healthy_at,
+            "timeout_seconds": timeout_seconds,
+            "unhealthy_for_seconds": unhealthy_for_seconds,
+            "dead": dead,
+        }
+    return detected
+
+
+def resolve_policy(agent_name: str, spec: Dict[str, Any], policy: Dict[str, Any]) -> Dict[str, Any]:
+    resolved = dict(policy.get("default_policy", {}))
+    spec_mission = spec.get("mission")
+    agent_override = dict(policy.get("agents", {}).get(agent_name, {}))
+    resolved_mission = agent_override.get("mission") or spec_mission or agent_name
+    if resolved_mission in policy.get("missions", {}):
+        resolved.update(policy["missions"][resolved_mission])
+    resolved.update(agent_override)
+    resolved.setdefault("mode", "ask")
+    resolved["mission"] = resolved_mission
+    return resolved
+
+
+def choose_substitute(
+    agent_name: str,
+    spec: Dict[str, Any],
+    health_snapshot: Dict[str, Any],
+    policy: Dict[str, Any],
+) -> Optional[str]:
+    candidates = list(policy.get("substitutions", {}).get(agent_name, []))
+    candidates.extend(spec.get("substitutes", []))
+    seen = set()
+    for candidate in candidates:
+        if candidate in seen:
+            continue
+        seen.add(candidate)
+        candidate_health = health_snapshot.get(candidate, {})
+        if candidate_health.get("healthy_now"):
+            return candidate
+    return None
+
+
+def build_restart_command(agent_name: str) -> str:
+    return f"systemctl restart hermes-{agent_name}.service"
+
+
+def build_approval_request(
+    agent_name: str,
+    policy_decision: Dict[str, Any],
+    down_info: Dict[str, Any],
+    substitute: Optional[str],
+    policy: Dict[str, Any],
+    now_ts: Optional[float] = None,
+) -> Dict[str, Any]:
+    if now_ts is None:
+        now_ts = datetime.now(timezone.utc).timestamp()
+    reasons = ", ".join(down_info.get("reasons", [])) or "no health signal"
+    mission = policy_decision.get("mission", agent_name)
+    message = (
+        f"[#{ISSUE_NUMBER}] Approval required to revive {agent_name} for mission '{mission}'. "
+        f"Reasons: {reasons}. "
+        f"Suggested substitute: {substitute or 'none available'}."
+    )
+    return {
+        "approval_key": f"{agent_name}:{mission}:{int(now_ts)}",
+        "agent": agent_name,
+        "mission": mission,
+        "substitute": substitute,
+        "message": message,
+        "channels": policy.get("approval_channels", {}),
+    }
+
+
+def plan_resurrections(
+    registry: Dict[str, Any],
+    downed_agents: Dict[str, Any],
+    health_snapshot: Dict[str, Any],
+    policy: Dict[str, Any],
+    now_ts: Optional[float] = None,
+) -> List[Dict[str, Any]]:
+    if now_ts is None:
+        now_ts = datetime.now(timezone.utc).timestamp()
+    fleet = registry.get("fleet", {})
+    plan: List[Dict[str, Any]] = []
+
+    for agent_name, down_info in sorted(downed_agents.items()):
+        if not down_info.get("dead"):
+            continue
+        spec = fleet.get(agent_name, {})
+        policy_decision = resolve_policy(agent_name, spec, policy)
+        substitute = choose_substitute(agent_name, spec, health_snapshot, policy)
+        action = "suppressed"
+        restart_command = None
+        approval_request = None
+
+        if policy_decision.get("mode") == "yes":
+            if is_local_host(spec.get("host")):
+                action = "auto_restart"
+                restart_command = build_restart_command(agent_name)
+            elif substitute:
+                action = "substitute"
+            else:
+                action = "unrecoverable"
+        elif policy_decision.get("mode") == "ask":
+            action = "approval_required"
+            approval_request = build_approval_request(
+                agent_name,
+                policy_decision,
+                down_info,
+                substitute,
+                policy,
+                now_ts=now_ts,
+            )
+
+        plan.append(
+            {
+                "agent": agent_name,
+                "mission": policy_decision.get("mission"),
+                "policy": policy_decision,
+                "reasons": list(down_info.get("reasons", [])),
+                "timeout_seconds": down_info.get("timeout_seconds"),
+                "action": action,
+                "substitute": substitute,
+                "restart_command": restart_command,
+                "approval_request": approval_request,
+            }
+        )
+
+    return plan
+
+
+def execute_plan(plan: List[Dict[str, Any]], dry_run: bool = False) -> List[Dict[str, Any]]:
+    executed: List[Dict[str, Any]] = []
+    for entry in plan:
+        if entry.get("action") != "auto_restart":
+            executed.append({**entry, "executed": False})
+            continue
+        cmd = entry.get("restart_command")
+        if dry_run or not cmd:
+            executed.append({**entry, "executed": False, "dry_run": dry_run, "exit_code": None, "stdout": "", "stderr": ""})
+            continue
+        code, out, err = shell(cmd)
+        executed.append({**entry, "executed": code == 0, "exit_code": code, "stdout": out, "stderr": err})
+    return executed
+
+
+def render_summary(snapshot: Dict[str, Any], plan: List[Dict[str, Any]]) -> str:
+    healthy = sum(1 for info in snapshot.values() if info.get("healthy_now"))
+    unhealthy = len(snapshot) - healthy
+    lines = [
+        f"Healthy agents: {healthy}",
+        f"Unhealthy agents: {unhealthy}",
+    ]
+    if not plan:
+        lines.append("Resurrection plan: no dead agents exceed timeout.")
+        return "\n".join(lines)
+    lines.append("Resurrection plan:")
+    for entry in plan:
+        lines.append(
+            f"- {entry['agent']}: {entry['action']}"
+            f" (mission={entry['mission']}, reasons={', '.join(entry['reasons']) or 'none'})"
+        )
+    return "\n".join(lines)
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Resurrection Pool")
+    parser.add_argument("--registry", type=Path, default=REGISTRY_PATH)
+    parser.add_argument("--policy", type=Path, default=POLICY_PATH)
+    parser.add_argument("--state", type=Path, default=STATE_PATH)
+    parser.add_argument("--json", action="store_true")
+    parser.add_argument("--execute", action="store_true")
+    parser.add_argument("--dry-run", action="store_true")
+    args = parser.parse_args()
+
+    now_ts = datetime.now(timezone.utc).timestamp()
+    registry = load_registry(args.registry)
+    policy = load_policy(args.policy)
+    prior_state = load_state(args.state)
+    snapshot = collect_health_snapshot(registry)
+    next_state = update_state(snapshot, prior_state, now_ts)
+    downed_agents = detect_downed_agents(snapshot, next_state, policy, now_ts)
+    plan = plan_resurrections(registry, downed_agents, snapshot, policy, now_ts=now_ts)
+    if args.execute:
+        plan = execute_plan(plan, dry_run=args.dry_run)
+    if not args.dry_run:
+        save_state(next_state, args.state)
+
+    payload = {
+        "checked_at": datetime.fromtimestamp(now_ts, tz=timezone.utc).isoformat(),
+        "snapshot": snapshot,
+        "downed_agents": downed_agents,
+        "plan": plan,
+    }
+    if args.json:
+        print(json.dumps(payload, indent=2, sort_keys=True))
+    else:
+        print(render_summary(snapshot, plan))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
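
Review note on the `--json` path: `detect_downed_agents` sets `unhealthy_for_seconds` to `float("inf")` for an agent that has never been seen healthy, and Python's `json.dumps` writes that as the bare token `Infinity` by default. That token is not valid JSON under RFC 8259, so stricter consumers may reject the payload. A minimal sketch of one possible workaround follows, assuming it is acceptable to report the value as `null`; `json_safe` is a hypothetical helper, not part of this PR.

```python
import json
import math
from typing import Any


def json_safe(value: Any) -> Any:
    """Recursively replace non-finite floats (inf, -inf, nan) with None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [json_safe(item) for item in value]
    return value


payload = {"downed_agents": {"ezra": {"dead": True, "unhealthy_for_seconds": float("inf")}}}

# Default behavior: the output contains the non-standard token Infinity.
default_text = json.dumps(payload)

# Sanitized output; allow_nan=False makes json.dumps raise if anything non-finite slipped through.
strict_text = json.dumps(json_safe(payload), allow_nan=False)
print(strict_text)
```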
--- a/tests/test_night_shift_prediction_report.py
+++ b/tests/test_night_shift_prediction_report.py
@@ -0,0 +1,25 @@
+from pathlib import Path
+
+
+REPORT = Path("reports/night-shift-prediction-2026-04-12.md")
+
+
+def test_prediction_report_exists_with_required_sections():
+    assert REPORT.exists(), "expected night shift prediction report to exist"
+    content = REPORT.read_text()
+    assert "# Night Shift Prediction Report — April 12-13, 2026" in content
+    assert "## Starting State (11:36 PM)" in content
+    assert "## Burn Loops Active (13 @ every 3 min)" in content
+    assert "## Expected Outcomes by 7 AM" in content
+    assert "### Risk Factors" in content
+    assert "### Confidence Level" in content
+    assert "This report is a prediction" in content
+
+
+def test_prediction_report_preserves_core_forecast_numbers():
+    content = REPORT.read_text()
+    assert "Total expected API calls: ~2,010" in content
+    assert "Total commits pushed: ~800-1,200" in content
+    assert "Total PRs created: ~150-250" in content
+    assert "the-nexus | 30-50 | 200-300" in content
+    assert "Generated: 2026-04-12 23:36 EDT" in content
--- a/tests/test_resurrection_pool.py
+++ b/tests/test_resurrection_pool.py
@@ -0,0 +1,118 @@
+from importlib import util
+from pathlib import Path
+
+
+ROOT = Path(__file__).resolve().parent.parent
+MODULE_PATH = ROOT / "scripts" / "resurrection_pool.py"
+
+
+def load_module():
+    spec = util.spec_from_file_location("resurrection_pool", MODULE_PATH)
+    module = util.module_from_spec(spec)
+    assert spec.loader is not None
+    spec.loader.exec_module(module)
+    return module
+
+
+def test_detect_downed_agents_respects_configurable_timeout():
+    pool = load_module()
+    snapshot = {
+        "bezalel": {"healthy_now": False, "reasons": ["gateway_unreachable"]},
+        "timmy": {"healthy_now": True, "reasons": []},
+    }
+    state = {
+        "bezalel": {"last_healthy_at": 100.0},
+        "timmy": {"last_healthy_at": 650.0},
+    }
+    policy = {"dead_timeout_seconds": 600, "agents": {}}
+
+    not_dead = pool.detect_downed_agents(snapshot, state, policy, now_ts=650.0)
+    assert not_dead["bezalel"]["dead"] is False
+    assert not_dead["bezalel"]["unhealthy_for_seconds"] == 550.0
+
+    dead = pool.detect_downed_agents(snapshot, state, policy, now_ts=701.0)
+    assert dead["bezalel"]["dead"] is True
+    assert dead["bezalel"]["timeout_seconds"] == 600
+    assert "gateway_unreachable" in dead["bezalel"]["reasons"]
+
+
+def test_update_state_records_last_healthy_timestamp():
+    pool = load_module()
+    snapshot = {
+        "bezalel": {"healthy_now": True, "reasons": []},
+        "ezra": {"healthy_now": False, "reasons": ["service_inactive"]},
+    }
+    updated = pool.update_state(snapshot, {}, now_ts=1234.5)
+    assert updated["bezalel"]["last_healthy_at"] == 1234.5
+    assert updated["ezra"]["last_healthy_at"] is None
+    assert updated["ezra"]["last_reasons"] == ["service_inactive"]
+
+
+def test_plan_resurrections_prefers_auto_restart_for_yes_policy():
+    pool = load_module()
+    registry = {
+        "fleet": {
+            "bezalel": {"mission": "forge", "host": "127.0.0.1"},
+            "allegro": {"mission": "forge", "host": "203.0.113.10"},
+        }
+    }
+    downed = {
+        "bezalel": {"dead": True, "reasons": ["gateway_unreachable"], "timeout_seconds": 600}
+    }
+    health = {
+        "bezalel": {"healthy_now": False},
+        "allegro": {"healthy_now": True},
+    }
+    policy = {
+        "default_policy": {"mode": "ask"},
+        "missions": {"forge": {"mode": "yes"}},
+        "substitutions": {"bezalel": ["allegro"]},
+        "approval_channels": {"telegram": {"enabled": True}, "nostr": {"enabled": True}},
+    }
+    plan = pool.plan_resurrections(registry, downed, health, policy, now_ts=2000.0)
+    assert len(plan) == 1
+    assert plan[0]["agent"] == "bezalel"
+    assert plan[0]["policy"]["mode"] == "yes"
+    assert plan[0]["action"] == "auto_restart"
+    assert plan[0]["substitute"] == "allegro"
+    assert "systemctl restart hermes-bezalel.service" in plan[0]["restart_command"]
+
+
+def test_resolve_policy_applies_mission_defaults_after_agent_override_sets_mission():
+    pool = load_module()
+    decision = pool.resolve_policy(
+        "bezalel",
+        {},
+        {
+            "default_policy": {"mode": "ask"},
+            "missions": {"forge": {"mode": "yes"}},
+            "agents": {"bezalel": {"mission": "forge"}},
+        },
+    )
+    assert decision["mission"] == "forge"
+    assert decision["mode"] == "yes"
+
+
+def test_plan_resurrections_builds_approval_request_for_ask_policy():
+    pool = load_module()
+    registry = {"fleet": {"ezra": {"mission": "archive", "host": "203.0.113.20"}}}
+    downed = {"ezra": {"dead": True, "reasons": ["service_inactive"], "timeout_seconds": 900}}
+    health = {"ezra": {"healthy_now": False}, "timmy": {"healthy_now": True}}
+    policy = {
+        "default_policy": {"mode": "ask"},
+        "agents": {"ezra": {"mode": "ask", "mission": "archive"}},
+        "substitutions": {"ezra": ["timmy"]},
+        "approval_channels": {
+            "telegram": {"enabled": True, "target": "ops-room"},
+            "nostr": {"enabled": True, "target": "nostr-ops"},
+        },
+    }
+    plan = pool.plan_resurrections(registry, downed, health, policy, now_ts=3000.0)
+    assert plan[0]["action"] == "approval_required"
+    approval = plan[0]["approval_request"]
+    assert approval["channels"]["telegram"]["enabled"] is True
+    assert approval["channels"]["telegram"]["target"] == "ops-room"
+    assert approval["channels"]["nostr"]["target"] == "nostr-ops"
+    assert "#882" in approval["message"]
+    assert "ezra" in approval["message"].lower()
+    assert approval["substitute"] == "timmy"
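
Review note: the timeout test above can be read as a small timeline. This is a minimal sketch, assuming the same rule as `detect_downed_agents`: an agent that is healthy now is never dead, an agent that has never been seen healthy is dead immediately, and otherwise the agent is dead once the unhealthy interval reaches the timeout. `is_dead` is a hypothetical stand-alone helper, not a function in the module.

```python
from typing import Optional


def is_dead(
    healthy_now: bool,
    last_healthy_at: Optional[float],
    now_ts: float,
    timeout_seconds: int = 600,
) -> bool:
    """Hypothetical restatement of the dead-agent rule in detect_downed_agents."""
    if healthy_now:
        return False
    if last_healthy_at is None:
        # No healthy observation on record: treated as dead right away.
        return True
    return max(0.0, now_ts - last_healthy_at) >= timeout_seconds


# Same numbers as the regression test: last healthy at t=100, timeout 600 s.
for now_ts in (650.0, 700.0, 701.0):
    print(now_ts, is_dead(False, 100.0, now_ts))
```

The comparison is inclusive, so with these values the agent flips to dead at exactly t=700, one second before the `now_ts=701.0` case the test checks.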
Author	SHA1	Message	Date
Alexander Whitestone	61a6964780	wip: apply mission defaults before agent overrides (checks failed: CI / test failing after 1m12s; CI / validate failing after 1m8s; Review Approval Gate / verify-review failing after 10s)	2026-04-15 03:56:10 -04:00
Alexander Whitestone	e40891afb8	wip: honor mission defaults in resurrection policy	2026-04-15 03:54:56 -04:00
Alexander Whitestone	e232112fc8	wip: add resurrection pool planner and policy config	2026-04-15 03:53:20 -04:00
Alexander Whitestone	ff2e2e578f	wip: add resurrection pool regression tests	2026-04-15 03:50:56 -04:00
Timmy Time	bd0497b998	Merge PR #1585: docs: add night shift prediction report (#1353)	2026-04-15 06:13:22 +00:00
Alexander Whitestone	4ab84a59ab	docs: add night shift prediction report (#1353) (checks failed: CI / test failing after 50s; CI / validate failing after 1m10s; Review Approval Gate / verify-review successful in 16s)	2026-04-15 02:02:26 -04:00