wip: apply mission defaults before agent overrides

wip: honor mission defaults in resurrection policy
wip: add resurrection pool planner and policy config
2026-04-15 03:56:10 -04:00 · 2026-04-15 03:54:56 -04:00 · 2026-04-15 03:53:20 -04:00 · 2026-04-15 03:50:56 -04:00
12 changed files with 600 additions and 337 deletions
--- a/.gitea/branch-protection/the-nexus.yml
+++ b/.gitea/branch-protection/the-nexus.yml
@@ -6,4 +6,3 @@ rules:
  require_ci_to_merge: false # CI runner dead (issue #915)
  block_force_pushes: true
  block_deletions: true
-  block_on_outdated_branch: true
--- a/.github/BRANCH_PROTECTION.md
+++ b/.github/BRANCH_PROTECTION.md
@@ -12,7 +12,6 @@ All repositories must enforce these rules on the `main` branch:
 | Require CI to pass | ⚠ Conditional | Only where CI exists |
 | Block force push | ✅ Enabled | Protect commit history |
 | Block branch deletion | ✅ Enabled | Prevent accidental deletion |
-| Require branch up-to-date before merge | ✅ Enabled | Surface conflicts before merge and force contributors to rebase |

 ## Default Reviewer Assignments

--- a/app.js
+++ b/app.js
@@ -714,10 +714,6 @@ async function init() {
  camera = new THREE.PerspectiveCamera(65, window.innerWidth / window.innerHeight, 0.1, 1000);
  camera.position.copy(playerPos);

-  // Initialize avatar and LOD systems
-  if (window.AvatarCustomization) window.AvatarCustomization.init(scene, camera);
-  if (window.LODSystem) window.LODSystem.init(scene, camera);
-
  updateLoad(20);

  createSkybox();
@@ -3561,10 +3557,6 @@ function gameLoop() {

  if (composer) { composer.render(); } else { renderer.render(scene, camera); }

-  // Update avatar and LOD systems
-  if (window.AvatarCustomization && playerPos) window.AvatarCustomization.update(playerPos);
-  if (window.LODSystem && playerPos) window.LODSystem.update(playerPos);
-
  updateAshStorm(delta, elapsed);

  // Project Mnemosyne - Memory Orb Animation
--- a/config/resurrection_pool.json
+++ b/config/resurrection_pool.json
@@ -0,0 +1,55 @@
+{
+  "dead_timeout_seconds": 600,
+  "default_policy": {
+    "mode": "ask"
+  },
+  "missions": {
+    "forge": {
+      "mode": "yes"
+    },
+    "archive": {
+      "mode": "ask"
+    },
+    "sovereign-core": {
+      "mode": "no"
+    }
+  },
+  "agents": {
+    "bezalel": {
+      "mission": "forge"
+    },
+    "allegro": {
+      "mission": "forge"
+    },
+    "ezra": {
+      "mission": "archive",
+      "mode": "ask"
+    },
+    "timmy": {
+      "mission": "sovereign-core",
+      "mode": "ask"
+    }
+  },
+  "substitutions": {
+    "bezalel": [
+      "allegro",
+      "timmy"
+    ],
+    "ezra": [
+      "timmy"
+    ],
+    "allegro": [
+      "timmy"
+    ]
+  },
+  "approval_channels": {
+    "telegram": {
+      "enabled": true,
+      "target": "ops-room"
+    },
+    "nostr": {
+      "enabled": true,
+      "target": "nostr-ops"
+    }
+  }
+}
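The commit messages say mission defaults are applied before agent overrides. Against this config, that layering (default policy, then mission, then the agent's own entry, with later layers winning) can be sketched as below. This is a standalone illustration, not the script itself: `resolve_mode` is a hypothetical name, it uses a trimmed copy of the JSON above, and unlike `resolve_policy` it ignores the registry's own `mission` field. One consequence worth noticing is that `timmy`'s own `"mode": "ask"` overrides the `sovereign-core` mission's `"no"`.

```python
from __future__ import annotations

import json

# Trimmed copy of config/resurrection_pool.json (only the fields used here).
CONFIG = json.loads("""
{
  "default_policy": {"mode": "ask"},
  "missions": {
    "forge": {"mode": "yes"},
    "archive": {"mode": "ask"},
    "sovereign-core": {"mode": "no"}
  },
  "agents": {
    "bezalel": {"mission": "forge"},
    "allegro": {"mission": "forge"},
    "ezra": {"mission": "archive", "mode": "ask"},
    "timmy": {"mission": "sovereign-core", "mode": "ask"}
  }
}
""")


def resolve_mode(agent: str, config: dict) -> tuple[str, str]:
    """Layer default -> mission -> agent entry; later layers win."""
    resolved = dict(config.get("default_policy", {}))
    override = config.get("agents", {}).get(agent, {})
    # Without a registry, fall back to the agent name as the mission.
    mission = override.get("mission") or agent
    resolved.update(config.get("missions", {}).get(mission, {}))
    resolved.update(override)
    return mission, resolved.get("mode", "ask")


for name in sorted(CONFIG["agents"]):
    print(name, *resolve_mode(name, CONFIG))
```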
--- a/docs/hermes-capabilities.md
+++ b/docs/hermes-capabilities.md
@@ -1,55 +0,0 @@
-# Hermes Agent Capability Expansion — Status Tracker
-
-Epic #1120. Six workstreams transforming Hermes from chatbot to sovereign agent.
-
-## Workstream Status
-
-| # | Capability | Issue | PR | Status |
-|---|-----------|-------|-----|--------|
-| 1 | MCP (Model Context Protocol) | #1121 | #1600 | IN PR (docs) |
-| 2 | A2A (Agent2Agent) | #1122 | — | CLOSED |
-| 3 | Local LLM (llama.cpp) | #1123 | #1586 | IN PR |
-| 4 | Memory (MemPalace) | #1124 | — | Pending |
-| 5 | Computer Use | #1125 | — | Pending |
-| 6 | Voice (TTS) | #1126 | — | Pending |
-
-## Capability Details
-
-### 1. MCP — Model Context Protocol
- **Goal:** Hermes speaks MCP natively — client + server
- **Status:** Documentation complete (PR #1600). Client code exists in `tools/mcp_tool.py`
- **Next:** Server implementation, integration testing
-
-### 2. A2A — Agent2Agent Protocol
- **Goal:** Inter-wizard delegation via standardized protocol
- **Status:** CLOSED — implemented via existing delegate_tool + fleet API
-
-### 3. Local LLM — llama.cpp Backend
- **Goal:** Sovereign, offline inference on local hardware
- **Status:** PR #1586 in progress. Ollama integration exists for some models
- **Next:** Standardize llama.cpp backend, FP8 quantization
-
-### 4. Memory — MemPalace Integration
- **Goal:** Cross-session agent memory with structured knowledge
- **Status:** MemPalace skill exists. Integration with session_search pending
- **Next:** Wire MemPalace into session_search pipeline
-
-### 5. Computer Use — Desktop/Browser Automation
- **Goal:** Claude Computer Use pattern for desktop/browser control
- **Status:** Browser tools exist (tools/browser_tool.py). Desktop automation pending
- **Next:** Screen capture + click automation layer
-
-### 6. Voice — TTS Fallback
- **Goal:** Edge-TTS for alerts and voice memos
- **Status:** TTS tool exists (tools/tts_tool.py). Edge-TTS backend pending
- **Next:** Wire edge-tts as fallback for cloud TTS
-
-## Definition of Done
-
- [ ] All 6 sub-issues closed or descoped with reason
- [ ] At least 3 capabilities live in production (Beta/Alpha)
- [ ] Documentation updated
- [ ] Demo: cross-capability flow recorded
-
-## Target: 2026-05-31
-## Owner: Bezalel
--- a/docs/resurrection-pool.md
+++ b/docs/resurrection-pool.md
@@ -0,0 +1,27 @@
+# Resurrection Pool
+
+The Resurrection Pool is a mission-aware layer on top of the existing Lazarus registry.
+
+It adds three concrete behaviors:
+- configurable dead-agent detection timeout (`dead_timeout_seconds`, default 600 s, overridable per agent)
+- yes/no/ask revival policy resolution per mission or agent
+- approval packet generation for Telegram / Nostr when human sign-off is required
+
+## Files
+- `scripts/resurrection_pool.py`
+- `config/resurrection_pool.json`
+
+## Example usage
+
+```bash
+python scripts/resurrection_pool.py --json --dry-run
+python scripts/resurrection_pool.py --execute
+```
+
+## Policy model
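+- `yes` → local agents auto-restart; remote agents hand off to the first healthy substitute, or are marked unrecoverable if none is available
+- `ask` → generate an approval request packet with Telegram / Nostr targets
+- `no` → suppress automatic revival (any unrecognized mode is treated the same way)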
+
+## Notes
+This implements the planning half of issue #882 but does not yet deliver anything over Telegram or Nostr: the current slice only produces the approval packet and the restart/substitution plan for the surrounding ops loop to act on.
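The policy model above maps onto a small decision function. The sketch below restates the branches of `plan_resurrections` together with the substitute lookup from `choose_substitute`; the names `decide_action` and `first_healthy` are hypothetical, and health is reduced to a plain name-to-bool dict rather than the script's full snapshot.

```python
from __future__ import annotations

from typing import Optional


def decide_action(mode: str, local: bool, substitute: Optional[str]) -> str:
    """Map a resolved policy mode to a plan action.

    Only "yes" and "ask" trigger anything; every other mode,
    including "no", is suppressed.
    """
    if mode == "yes":
        if local:
            return "auto_restart"  # restart the systemd unit in place
        # Remote agents are not restarted from here; hand off if possible.
        return "substitute" if substitute else "unrecoverable"
    if mode == "ask":
        return "approval_required"  # caller emits a Telegram/Nostr packet
    return "suppressed"


def first_healthy(candidates: list[str], healthy: dict[str, bool]) -> Optional[str]:
    """Return the first candidate marked healthy, skipping duplicates."""
    seen: set[str] = set()
    for name in candidates:
        if name in seen:
            continue
        seen.add(name)
        if healthy.get(name):
            return name
    return None


sub = first_healthy(["allegro", "timmy", "allegro"], {"allegro": False, "timmy": True})
print(sub)  # timmy
print(decide_action("yes", local=False, substitute=sub))  # substitute
```

Note that a local agent under `yes` is restarted even when a healthy substitute exists; the substitute is only reported alongside the restart.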
--- a/index.html
+++ b/index.html
@@ -395,8 +395,6 @@
 <div id="memory-connections-panel" class="memory-connections-panel" style="display:none;" aria-label="Memory Connections Panel"></div>

 <script src="./boot.js"></script>
-<script src="./avatar-customization.js"></script>
-<script src="./lod-system.js"></script>
 <script>
 function openMemoryFilter() { renderFilterList(); document.getElementById('memory-filter').style.display = 'flex'; }
 function closeMemoryFilter() { document.getElementById('memory-filter').style.display = 'none'; }
--- a/lod-system.js
+++ b/lod-system.js
@@ -1,186 +0,0 @@
-/**
- * LOD (Level of Detail) System for The Nexus
- * 
- * Optimizes rendering when many avatars/users are visible:
- * - Distance-based LOD: far users become billboard sprites
- * - Occlusion: skip rendering users behind walls
- * - Budget: maintain 60 FPS target with 50+ avatars
- * 
- * Usage:
- *   LODSystem.init(scene, camera);
- *   LODSystem.registerAvatar(avatarMesh, userId);
- *   LODSystem.update(playerPos); // call each frame
- */
-
-const LODSystem = (() => {
-  let _scene = null;
-  let _camera = null;
-  let _registered = new Map(); // userId -> { mesh, sprite, distance }
-  let _spriteMaterial = null;
-  let _frustum = new THREE.Frustum();
-  let _projScreenMatrix = new THREE.Matrix4();
-
-  // Thresholds
-  const LOD_NEAR = 15;      // Full mesh within 15 units
-  const LOD_FAR = 40;       // Billboard beyond 40 units
-  const LOD_CULL = 80;      // Don't render beyond 80 units
-  const SPRITE_SIZE = 1.2;
-
-  function init(sceneRef, cameraRef) {
-    _scene = sceneRef;
-    _camera = cameraRef;
-
-    // Create shared sprite material
-    const canvas = document.createElement('canvas');
-    canvas.width = 64;
-    canvas.height = 64;
-    const ctx = canvas.getContext('2d');
-    // Simple avatar indicator: colored circle
-    ctx.fillStyle = '#00ffcc';
-    ctx.beginPath();
-    ctx.arc(32, 32, 20, 0, Math.PI * 2);
-    ctx.fill();
-    ctx.fillStyle = '#0a0f1a';
-    ctx.beginPath();
-    ctx.arc(32, 28, 8, 0, Math.PI * 2); // head
-    ctx.fill();
-
-    const texture = new THREE.CanvasTexture(canvas);
-    _spriteMaterial = new THREE.SpriteMaterial({
-      map: texture,
-      transparent: true,
-      depthTest: true,
-      sizeAttenuation: true,
-    });
-
-    console.log('[LODSystem] Initialized');
-  }
-
-  function registerAvatar(avatarMesh, userId, color) {
-    // Create billboard sprite for this avatar
-    const spriteMat = _spriteMaterial.clone();
-    if (color) {
-      // Tint sprite to match avatar color
-      const canvas = document.createElement('canvas');
-      canvas.width = 64;
-      canvas.height = 64;
-      const ctx = canvas.getContext('2d');
-      ctx.fillStyle = color;
-      ctx.beginPath();
-      ctx.arc(32, 32, 20, 0, Math.PI * 2);
-      ctx.fill();
-      ctx.fillStyle = '#0a0f1a';
-      ctx.beginPath();
-      ctx.arc(32, 28, 8, 0, Math.PI * 2);
-      ctx.fill();
-      spriteMat.map = new THREE.CanvasTexture(canvas);
-      spriteMat.map.needsUpdate = true;
-    }
-
-    const sprite = new THREE.Sprite(spriteMat);
-    sprite.scale.set(SPRITE_SIZE, SPRITE_SIZE, 1);
-    sprite.visible = false;
-    _scene.add(sprite);
-
-    _registered.set(userId, {
-      mesh: avatarMesh,
-      sprite: sprite,
-      distance: Infinity,
-    });
-  }
-
-  function unregisterAvatar(userId) {
-    const entry = _registered.get(userId);
-    if (entry) {
-      _scene.remove(entry.sprite);
-      entry.sprite.material.dispose();
-      _registered.delete(userId);
-    }
-  }
-
-  function setSpriteColor(userId, color) {
-    const entry = _registered.get(userId);
-    if (!entry) return;
-    const canvas = document.createElement('canvas');
-    canvas.width = 64;
-    canvas.height = 64;
-    const ctx = canvas.getContext('2d');
-    ctx.fillStyle = color;
-    ctx.beginPath();
-    ctx.arc(32, 32, 20, 0, Math.PI * 2);
-    ctx.fill();
-    ctx.fillStyle = '#0a0f1a';
-    ctx.beginPath();
-    ctx.arc(32, 28, 8, 0, Math.PI * 2);
-    ctx.fill();
-    entry.sprite.material.map = new THREE.CanvasTexture(canvas);
-    entry.sprite.material.map.needsUpdate = true;
-  }
-
-  function update(playerPos) {
-    if (!_camera) return;
-
-    // Update frustum for culling
-    _projScreenMatrix.multiplyMatrices(
-      _camera.projectionMatrix,
-      _camera.matrixWorldInverse
-    );
-    _frustum.setFromProjectionMatrix(_projScreenMatrix);
-
-    _registered.forEach((entry, userId) => {
-      if (!entry.mesh) return;
-
-      const meshPos = entry.mesh.position;
-      const distance = playerPos.distanceTo(meshPos);
-      entry.distance = distance;
-
-      // Beyond cull distance: hide everything
-      if (distance > LOD_CULL) {
-        entry.mesh.visible = false;
-        entry.sprite.visible = false;
-        return;
-      }
-
-      // Check if in camera frustum
-      const inFrustum = _frustum.containsPoint(meshPos);
-      if (!inFrustum) {
-        entry.mesh.visible = false;
-        entry.sprite.visible = false;
-        return;
-      }
-
-      // LOD switching
-      if (distance <= LOD_NEAR) {
-        // Near: full mesh
-        entry.mesh.visible = true;
-        entry.sprite.visible = false;
-      } else if (distance <= LOD_FAR) {
-        // Mid: mesh with reduced detail (keep mesh visible)
-        entry.mesh.visible = true;
-        entry.sprite.visible = false;
-      } else {
-        // Far: billboard sprite
-        entry.mesh.visible = false;
-        entry.sprite.visible = true;
-        entry.sprite.position.copy(meshPos);
-        entry.sprite.position.y += 1.2; // above avatar center
-      }
-    });
-  }
-
-  function getStats() {
-    let meshCount = 0;
-    let spriteCount = 0;
-    let culledCount = 0;
-    _registered.forEach(entry => {
-      if (entry.mesh.visible) meshCount++;
-      else if (entry.sprite.visible) spriteCount++;
-      else culledCount++;
-    });
-    return { total: _registered.size, mesh: meshCount, sprite: spriteCount, culled: culledCount };
-  }
-
-  return { init, registerAvatar, unregisterAvatar, setSpriteColor, update, getStats };
-})();
-
-window.LODSystem = LODSystem;
--- a/scripts/resurrection_pool.py
+++ b/scripts/resurrection_pool.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+"""Resurrection Pool — health polling, dead-agent detection, and revival planning.
+
+Grounded implementation slice for #882.
+Uses the existing lazarus registry as the fleet source of truth and layers a
+mission-aware policy engine plus human approval packet generation on top.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import subprocess
+import urllib.request
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+import yaml
+
+ROOT = Path(__file__).resolve().parent.parent
+REGISTRY_PATH = ROOT / "lazarus-registry.yaml"
+POLICY_PATH = ROOT / "config" / "resurrection_pool.json"
+STATE_PATH = Path("/var/lib/lazarus/resurrection_pool_state.json")
+LOCAL_HOSTS = {"127.0.0.1", "localhost", "104.131.15.18"}
+ISSUE_NUMBER = 882
+
+
+def shell(cmd: str, timeout: int = 30) -> tuple[int, str, str]:
+    try:
+        result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
+        return result.returncode, result.stdout.strip(), result.stderr.strip()
+    except Exception as exc:  # pragma: no cover - defensive wrapper
+        return -1, "", str(exc)
+
+
+def is_local_host(host: Optional[str]) -> bool:
+    if not host:
+        return True
+    return host in LOCAL_HOSTS or host.startswith("127.")
+
+
+def ping_http(url: str, timeout: int = 10) -> tuple[bool, int]:
+    try:
+        req = urllib.request.Request(url, method="HEAD")
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            return True, resp.status
+    except urllib.error.HTTPError as err:
+        return True, err.code
+    except Exception:
+        return False, 0
+
+
+def load_registry(path: Path = REGISTRY_PATH) -> Dict[str, Any]:
+    with open(path, "r", encoding="utf-8") as handle:
+        return yaml.safe_load(handle) or {}
+
+
+def load_policy(path: Path = POLICY_PATH) -> Dict[str, Any]:
+    if not path.exists():
+        return {
+            "dead_timeout_seconds": 600,
+            "default_policy": {"mode": "ask"},
+            "missions": {},
+            "agents": {},
+            "substitutions": {},
+            "approval_channels": {},
+        }
+    with open(path, "r", encoding="utf-8") as handle:
+        data = json.load(handle)
+    data.setdefault("dead_timeout_seconds", 600)
+    data.setdefault("default_policy", {"mode": "ask"})
+    data.setdefault("missions", {})
+    data.setdefault("agents", {})
+    data.setdefault("substitutions", {})
+    data.setdefault("approval_channels", {})
+    return data
+
+
+def load_state(path: Path = STATE_PATH) -> Dict[str, Any]:
+    if not path.exists():
+        return {}
+    with open(path, "r", encoding="utf-8") as handle:
+        return json.load(handle)
+
+
+def save_state(state: Dict[str, Any], path: Path = STATE_PATH) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    with open(path, "w", encoding="utf-8") as handle:
+        json.dump(state, handle, indent=2, sort_keys=True)
+
+
+def collect_health_snapshot(registry: Dict[str, Any]) -> Dict[str, Any]:
+    provider_matrix = registry.get("provider_health_matrix", {})
+    fleet = registry.get("fleet", {})
+    snapshot: Dict[str, Any] = {}
+
+    for agent_name, spec in fleet.items():
+        primary = spec.get("primary", {})
+        provider_name = primary.get("provider")
+        provider_status = provider_matrix.get(provider_name, {}).get("status", "unknown")
+        gateway_url = spec.get("health_endpoints", {}).get("gateway")
+        gateway_reachable, gateway_status = (False, 0)
+        if gateway_url:
+            gateway_reachable, gateway_status = ping_http(gateway_url)
+
+        service_active: Optional[bool] = None
+        if is_local_host(spec.get("host")):
+            service_code, _, _ = shell(f"systemctl is-active hermes-{agent_name}.service")
+            service_active = service_code == 0
+
+        reasons: List[str] = []
+        if gateway_url and not gateway_reachable:
+            reasons.append("gateway_unreachable")
+        if service_active is False:
+            reasons.append("service_inactive")
+        if provider_status in {"dead", "degraded"}:
+            reasons.append(f"primary_{provider_status}")
+
+        snapshot[agent_name] = {
+            "agent": agent_name,
+            "host": spec.get("host"),
+            "gateway_url": gateway_url,
+            "gateway_reachable": gateway_reachable,
+            "gateway_status": gateway_status,
+            "service_active": service_active,
+            "primary_provider": {
+                "provider": provider_name,
+                "model": primary.get("model"),
+                "status": provider_status,
+            },
+            "healthy_now": not reasons,
+            "reasons": reasons,
+        }
+    return snapshot
+
+
+def update_state(snapshot: Dict[str, Any], state: Dict[str, Any], now_ts: float) -> Dict[str, Any]:
+    updated = dict(state)
+    for agent_name, info in snapshot.items():
+        entry = dict(updated.get(agent_name, {}))
+        entry["last_checked_at"] = now_ts
+        entry["last_reasons"] = list(info.get("reasons", []))
+        if info.get("healthy_now"):
+            entry["last_healthy_at"] = now_ts
+        else:
+            entry.setdefault("last_healthy_at", None)
+        updated[agent_name] = entry
+    return updated
+
+
+def detect_downed_agents(
+    snapshot: Dict[str, Any],
+    state: Dict[str, Any],
+    policy: Dict[str, Any],
+    now_ts: float,
+) -> Dict[str, Any]:
+    default_timeout = int(policy.get("dead_timeout_seconds", 600))
+    agent_overrides = policy.get("agents", {})
+    detected: Dict[str, Any] = {}
+
+    for agent_name, info in snapshot.items():
+        timeout_seconds = int(agent_overrides.get(agent_name, {}).get("dead_timeout_seconds", default_timeout))
+        last_healthy_at = state.get(agent_name, {}).get("last_healthy_at")
+        if info.get("healthy_now"):
+            unhealthy_for_seconds = 0.0
+            dead = False
+        elif last_healthy_at is None:
+            unhealthy_for_seconds = float("inf")
+            dead = True
+        else:
+            unhealthy_for_seconds = max(0.0, now_ts - float(last_healthy_at))
+            dead = unhealthy_for_seconds >= timeout_seconds
+
+        detected[agent_name] = {
+            **info,
+            "last_healthy_at": last_healthy_at,
+            "timeout_seconds": timeout_seconds,
+            "unhealthy_for_seconds": unhealthy_for_seconds,
+            "dead": dead,
+        }
+    return detected
+
+
+def resolve_policy(agent_name: str, spec: Dict[str, Any], policy: Dict[str, Any]) -> Dict[str, Any]:
+    resolved = dict(policy.get("default_policy", {}))
+    spec_mission = spec.get("mission")
+    agent_override = dict(policy.get("agents", {}).get(agent_name, {}))
+    resolved_mission = agent_override.get("mission") or spec_mission or agent_name
+    if resolved_mission in policy.get("missions", {}):
+        resolved.update(policy["missions"][resolved_mission])
+    resolved.update(agent_override)
+    resolved.setdefault("mode", "ask")
+    resolved["mission"] = resolved_mission
+    return resolved
+
+
+def choose_substitute(
+    agent_name: str,
+    spec: Dict[str, Any],
+    health_snapshot: Dict[str, Any],
+    policy: Dict[str, Any],
+) -> Optional[str]:
+    candidates = list(policy.get("substitutions", {}).get(agent_name, []))
+    candidates.extend(spec.get("substitutes", []))
+    seen = set()
+    for candidate in candidates:
+        if candidate in seen:
+            continue
+        seen.add(candidate)
+        candidate_health = health_snapshot.get(candidate, {})
+        if candidate_health.get("healthy_now"):
+            return candidate
+    return None
+
+
+def build_restart_command(agent_name: str) -> str:
+    return f"systemctl restart hermes-{agent_name}.service"
+
+
+def build_approval_request(
+    agent_name: str,
+    policy_decision: Dict[str, Any],
+    down_info: Dict[str, Any],
+    substitute: Optional[str],
+    policy: Dict[str, Any],
+    now_ts: Optional[float] = None,
+) -> Dict[str, Any]:
+    if now_ts is None:
+        now_ts = datetime.now(timezone.utc).timestamp()
+    reasons = ", ".join(down_info.get("reasons", [])) or "no health signal"
+    mission = policy_decision.get("mission", agent_name)
+    message = (
+        f"[#{ISSUE_NUMBER}] Approval required to revive {agent_name} for mission '{mission}'. "
+        f"Reasons: {reasons}. "
+        f"Suggested substitute: {substitute or 'none available'}."
+    )
+    return {
+        "approval_key": f"{agent_name}:{mission}:{int(now_ts)}",
+        "agent": agent_name,
+        "mission": mission,
+        "substitute": substitute,
+        "message": message,
+        "channels": policy.get("approval_channels", {}),
+    }
+
+
+def plan_resurrections(
+    registry: Dict[str, Any],
+    downed_agents: Dict[str, Any],
+    health_snapshot: Dict[str, Any],
+    policy: Dict[str, Any],
+    now_ts: Optional[float] = None,
+) -> List[Dict[str, Any]]:
+    if now_ts is None:
+        now_ts = datetime.now(timezone.utc).timestamp()
+    fleet = registry.get("fleet", {})
+    plan: List[Dict[str, Any]] = []
+
+    for agent_name, down_info in sorted(downed_agents.items()):
+        if not down_info.get("dead"):
+            continue
+        spec = fleet.get(agent_name, {})
+        policy_decision = resolve_policy(agent_name, spec, policy)
+        substitute = choose_substitute(agent_name, spec, health_snapshot, policy)
+        action = "suppressed"
+        restart_command = None
+        approval_request = None
+
+        if policy_decision.get("mode") == "yes":
+            if is_local_host(spec.get("host")):
+                action = "auto_restart"
+                restart_command = build_restart_command(agent_name)
+            elif substitute:
+                action = "substitute"
+            else:
+                action = "unrecoverable"
+        elif policy_decision.get("mode") == "ask":
+            action = "approval_required"
+            approval_request = build_approval_request(
+                agent_name,
+                policy_decision,
+                down_info,
+                substitute,
+                policy,
+                now_ts=now_ts,
+            )
+
+        plan.append(
+            {
+                "agent": agent_name,
+                "mission": policy_decision.get("mission"),
+                "policy": policy_decision,
+                "reasons": list(down_info.get("reasons", [])),
+                "timeout_seconds": down_info.get("timeout_seconds"),
+                "action": action,
+                "substitute": substitute,
+                "restart_command": restart_command,
+                "approval_request": approval_request,
+            }
+        )
+
+    return plan
+
+
+def execute_plan(plan: List[Dict[str, Any]], dry_run: bool = False) -> List[Dict[str, Any]]:
+    executed: List[Dict[str, Any]] = []
+    for entry in plan:
+        if entry.get("action") != "auto_restart":
+            executed.append({**entry, "executed": False})
+            continue
+        cmd = entry.get("restart_command")
+        if dry_run or not cmd:
+            executed.append({**entry, "executed": False, "dry_run": dry_run, "exit_code": None, "stdout": "", "stderr": ""})
+            continue
+        code, out, err = shell(cmd)
+        executed.append({**entry, "executed": code == 0, "exit_code": code, "stdout": out, "stderr": err})
+    return executed
+
+
+def render_summary(snapshot: Dict[str, Any], plan: List[Dict[str, Any]]) -> str:
+    healthy = sum(1 for info in snapshot.values() if info.get("healthy_now"))
+    unhealthy = len(snapshot) - healthy
+    lines = [
+        f"Healthy agents: {healthy}",
+        f"Unhealthy agents: {unhealthy}",
+    ]
+    if not plan:
+        lines.append("Resurrection plan: no dead agents exceed timeout.")
+        return "\n".join(lines)
+    lines.append("Resurrection plan:")
+    for entry in plan:
+        lines.append(
+            f"- {entry['agent']}: {entry['action']}"
+            f" (mission={entry['mission']}, reasons={', '.join(entry['reasons']) or 'none'})"
+        )
+    return "\n".join(lines)
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Resurrection Pool")
+    parser.add_argument("--registry", type=Path, default=REGISTRY_PATH)
+    parser.add_argument("--policy", type=Path, default=POLICY_PATH)
+    parser.add_argument("--state", type=Path, default=STATE_PATH)
+    parser.add_argument("--json", action="store_true")
+    parser.add_argument("--execute", action="store_true")
+    parser.add_argument("--dry-run", action="store_true")
+    args = parser.parse_args()
+
+    now_ts = datetime.now(timezone.utc).timestamp()
+    registry = load_registry(args.registry)
+    policy = load_policy(args.policy)
+    prior_state = load_state(args.state)
+    snapshot = collect_health_snapshot(registry)
+    next_state = update_state(snapshot, prior_state, now_ts)
+    downed_agents = detect_downed_agents(snapshot, next_state, policy, now_ts)
+    plan = plan_resurrections(registry, downed_agents, snapshot, policy, now_ts=now_ts)
+    if args.execute:
+        plan = execute_plan(plan, dry_run=args.dry_run)
+    if not args.dry_run:
+        save_state(next_state, args.state)
+
+    payload = {
+        "checked_at": datetime.fromtimestamp(now_ts, tz=timezone.utc).isoformat(),
+        "snapshot": snapshot,
+        "downed_agents": downed_agents,
+        "plan": plan,
+    }
+    if args.json:
+        print(json.dumps(payload, indent=2, sort_keys=True))
+    else:
+        print(render_summary(snapshot, plan))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
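The dead-agent rule in `detect_downed_agents` reduces to a few lines of arithmetic. The sketch below (with a hypothetical name, `dead_check`) restates it so the boundary behavior is easy to check: with `last_healthy_at = 100` and the default 600 s timeout, the agent is not dead at `now = 650` (550 s unhealthy) but is dead at `now = 701` (601 s), matching the numbers in the test file. Because the comparison is `>=`, exactly 600 s already counts as dead. An agent that has never been recorded healthy (for example on the first run with an empty state file) is flagged dead on its first unhealthy poll.

```python
from __future__ import annotations

import math
from typing import Optional


def dead_check(healthy_now: bool, last_healthy_at: Optional[float],
               now_ts: float, timeout_seconds: int = 600) -> tuple[float, bool]:
    """Return (unhealthy_for_seconds, dead) under the detection rule."""
    if healthy_now:
        return 0.0, False
    if last_healthy_at is None:
        # Never seen healthy: treated as dead immediately.
        return math.inf, True
    unhealthy_for = max(0.0, now_ts - last_healthy_at)
    return unhealthy_for, unhealthy_for >= timeout_seconds


print(dead_check(False, 100.0, 650.0))  # (550.0, False)
print(dead_check(False, 100.0, 701.0))  # (601.0, True)
```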
--- a/scripts/sync_branch_protection.py
+++ b/scripts/sync_branch_protection.py
@@ -4,61 +4,48 @@ Sync branch protection rules from .gitea/branch-protection/*.yml to Gitea.
 Correctly uses the Gitea 1.25+ API (not GitHub-style).
 """

-from __future__ import annotations
-
-import json
 import os
 import sys
+import json
 import urllib.request
-from pathlib import Path
-
 import yaml

 GITEA_URL = os.getenv("GITEA_URL", "https://forge.alexanderwhitestone.com")
 GITEA_TOKEN = os.getenv("GITEA_TOKEN", "")
 ORG = "Timmy_Foundation"
-PROJECT_ROOT = Path(__file__).resolve().parent.parent
-CONFIG_DIR = PROJECT_ROOT / ".gitea" / "branch-protection"
+CONFIG_DIR = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), ".gitea", "branch-protection")


 def api_request(method: str, path: str, payload: dict | None = None) -> dict:
    url = f"{GITEA_URL}/api/v1{path}"
    data = json.dumps(payload).encode() if payload else None
-    req = urllib.request.Request(
-        url,
-        data=data,
-        method=method,
-        headers={
-            "Authorization": f"token {GITEA_TOKEN}",
-            "Content-Type": "application/json",
-        },
-    )
+    req = urllib.request.Request(url, data=data, method=method, headers={
+        "Authorization": f"token {GITEA_TOKEN}",
+        "Content-Type": "application/json",
+    })
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode())


-def build_branch_protection_payload(branch: str, rules: dict) -> dict:
-    return {
+def apply_protection(repo: str, rules: dict) -> bool:
+    branch = rules.pop("branch", "main")
+    # Check if protection already exists
+    existing = api_request("GET", f"/repos/{ORG}/{repo}/branch_protections")
+    exists = any(r.get("branch_name") == branch for r in existing)
+
+    payload = {
        "branch_name": branch,
        "rule_name": branch,
        "required_approvals": rules.get("required_approvals", 1),
        "block_on_rejected_reviews": rules.get("block_on_rejected_reviews", True),
        "dismiss_stale_approvals": rules.get("dismiss_stale_approvals", True),
        "block_deletions": rules.get("block_deletions", True),
-        "block_force_push": rules.get("block_force_push", rules.get("block_force_pushes", True)),
+        "block_force_push": rules.get("block_force_push", rules.get("block_force_pushes", True)),
        "block_admin_merge_override": rules.get("block_admin_merge_override", True),
        "enable_status_check": rules.get("require_ci_to_merge", False),
        "status_check_contexts": rules.get("status_check_contexts", []),
-        "block_on_outdated_branch": rules.get("block_on_outdated_branch", False),
    }

-
-def apply_protection(repo: str, rules: dict) -> bool:
-    branch = rules.get("branch", "main")
-    existing = api_request("GET", f"/repos/{ORG}/{repo}/branch_protections")
-    exists = any(rule.get("branch_name") == branch for rule in existing)
-    payload = build_branch_protection_payload(branch, rules)
-
    try:
        if exists:
            api_request("PATCH", f"/repos/{ORG}/{repo}/branch_protections/{branch}", payload)
@@ -66,8 +53,8 @@ def apply_protection(repo: str, rules: dict) -> bool:
            api_request("POST", f"/repos/{ORG}/{repo}/branch_protections", payload)
        print(f"✅ {repo}:{branch} synced")
        return True
-    except Exception as exc:
-        print(f"❌ {repo}:{branch} failed: {exc}")
+    except Exception as e:
+        print(f"❌ {repo}:{branch} failed: {e}")
        return False


@@ -75,18 +62,15 @@ def main() -> int:
    if not GITEA_TOKEN:
        print("ERROR: GITEA_TOKEN not set")
        return 1
-    if not CONFIG_DIR.exists():
-        print(f"ERROR: config directory not found: {CONFIG_DIR}")
-        return 1

    ok = 0
-    for cfg_path in sorted(CONFIG_DIR.glob("*.yml")):
-        repo = cfg_path.stem
-        with cfg_path.open() as fh:
-            cfg = yaml.safe_load(fh) or {}
-        rules = cfg.get("rules", {})
-        rules.setdefault("branch", cfg.get("branch", "main"))
-        if apply_protection(repo, rules):
+    for fname in sorted(os.listdir(CONFIG_DIR)):
+        if not fname.endswith(".yml"):
+            continue
+        repo = fname[:-4]
+        with open(os.path.join(CONFIG_DIR, fname)) as f:
+            cfg = yaml.safe_load(f) or {}
+        if apply_protection(repo, cfg.get("rules", {})):
            ok += 1

    print(f"\nSynced {ok} repo(s)")
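The YAML config spells the force-push rule `block_force_pushes` (plural), while the Gitea payload field is `block_force_push`. Keeping the rules-to-payload mapping as a pure function, as the removed `build_branch_protection_payload` did, makes that kind of mismatch testable offline without `GITEA_TOKEN` or network access. A minimal sketch that accepts both spellings; `build_payload` is a hypothetical name, and the payload keys are the ones this script already sends, not independently checked against the Gitea API reference.

```python
def build_payload(branch: str, rules: dict) -> dict:
    """Map a .gitea/branch-protection/*.yml `rules` block to the payload
    fields that sync_branch_protection.py sends to Gitea."""
    # Prefer the payload spelling, fall back to the YAML's plural form.
    force_push = rules.get("block_force_push", rules.get("block_force_pushes", True))
    return {
        "branch_name": branch,
        "rule_name": branch,
        "required_approvals": rules.get("required_approvals", 1),
        "block_on_rejected_reviews": rules.get("block_on_rejected_reviews", True),
        "dismiss_stale_approvals": rules.get("dismiss_stale_approvals", True),
        "block_deletions": rules.get("block_deletions", True),
        "block_force_push": force_push,
        "block_admin_merge_override": rules.get("block_admin_merge_override", True),
        "enable_status_check": rules.get("require_ci_to_merge", False),
        "status_check_contexts": rules.get("status_check_contexts", []),
    }


print(build_payload("main", {"block_force_pushes": False})["block_force_push"])  # False
```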
--- a/tests/test_resurrection_pool.py
+++ b/tests/test_resurrection_pool.py
@@ -0,0 +1,118 @@
+from importlib import util
+from pathlib import Path
+
+
+ROOT = Path(__file__).resolve().parent.parent
+MODULE_PATH = ROOT / "scripts" / "resurrection_pool.py"
+
+
+def load_module():
+    spec = util.spec_from_file_location("resurrection_pool", MODULE_PATH)
+    module = util.module_from_spec(spec)
+    assert spec.loader is not None
+    spec.loader.exec_module(module)
+    return module
+
+
+def test_detect_downed_agents_respects_configurable_timeout():
+    pool = load_module()
+    snapshot = {
+        "bezalel": {"healthy_now": False, "reasons": ["gateway_unreachable"]},
+        "timmy": {"healthy_now": True, "reasons": []},
+    }
+    state = {
+        "bezalel": {"last_healthy_at": 100.0},
+        "timmy": {"last_healthy_at": 650.0},
+    }
+    policy = {"dead_timeout_seconds": 600, "agents": {}}
+
+    not_dead = pool.detect_downed_agents(snapshot, state, policy, now_ts=650.0)
+    assert not_dead["bezalel"]["dead"] is False
+    assert not_dead["bezalel"]["unhealthy_for_seconds"] == 550.0
+
+    dead = pool.detect_downed_agents(snapshot, state, policy, now_ts=701.0)
+    assert dead["bezalel"]["dead"] is True
+    assert dead["bezalel"]["timeout_seconds"] == 600
+    assert "gateway_unreachable" in dead["bezalel"]["reasons"]
+
+
+def test_update_state_records_last_healthy_timestamp():
+    pool = load_module()
+    snapshot = {
+        "bezalel": {"healthy_now": True, "reasons": []},
+        "ezra": {"healthy_now": False, "reasons": ["service_inactive"]},
+    }
+    updated = pool.update_state(snapshot, {}, now_ts=1234.5)
+    assert updated["bezalel"]["last_healthy_at"] == 1234.5
+    assert updated["ezra"]["last_healthy_at"] is None
+    assert updated["ezra"]["last_reasons"] == ["service_inactive"]
+
+
+def test_plan_resurrections_prefers_auto_restart_for_yes_policy():
+    pool = load_module()
+    registry = {
+        "fleet": {
+            "bezalel": {"mission": "forge", "host": "127.0.0.1"},
+            "allegro": {"mission": "forge", "host": "203.0.113.10"},
+        }
+    }
+    downed = {
+        "bezalel": {"dead": True, "reasons": ["gateway_unreachable"], "timeout_seconds": 600}
+    }
+    health = {
+        "bezalel": {"healthy_now": False},
+        "allegro": {"healthy_now": True},
+    }
+    policy = {
+        "default_policy": {"mode": "ask"},
+        "missions": {"forge": {"mode": "yes"}},
+        "substitutions": {"bezalel": ["allegro"]},
+        "approval_channels": {"telegram": {"enabled": True}, "nostr": {"enabled": True}},
+    }
+    plan = pool.plan_resurrections(registry, downed, health, policy, now_ts=2000.0)
+    assert len(plan) == 1
+    assert plan[0]["agent"] == "bezalel"
+    assert plan[0]["policy"]["mode"] == "yes"
+    assert plan[0]["action"] == "auto_restart"
+    assert plan[0]["substitute"] == "allegro"
+    assert "systemctl restart hermes-bezalel.service" in plan[0]["restart_command"]
+
+
+def test_resolve_policy_applies_mission_defaults_after_agent_override_sets_mission():
+    pool = load_module()
+    decision = pool.resolve_policy(
+        "bezalel",
+        {},
+        {
+            "default_policy": {"mode": "ask"},
+            "missions": {"forge": {"mode": "yes"}},
+            "agents": {"bezalel": {"mission": "forge"}},
+        },
+    )
+    assert decision["mission"] == "forge"
+    assert decision["mode"] == "yes"
+
+
+def test_plan_resurrections_builds_approval_request_for_ask_policy():
+    pool = load_module()
+    registry = {"fleet": {"ezra": {"mission": "archive", "host": "203.0.113.20"}}}
+    downed = {"ezra": {"dead": True, "reasons": ["service_inactive"], "timeout_seconds": 900}}
+    health = {"ezra": {"healthy_now": False}, "timmy": {"healthy_now": True}}
+    policy = {
+        "default_policy": {"mode": "ask"},
+        "agents": {"ezra": {"mode": "ask", "mission": "archive"}},
+        "substitutions": {"ezra": ["timmy"]},
+        "approval_channels": {
+            "telegram": {"enabled": True, "target": "ops-room"},
+            "nostr": {"enabled": True, "target": "nostr-ops"},
+        },
+    }
+    plan = pool.plan_resurrections(registry, downed, health, policy, now_ts=3000.0)
+    assert plan[0]["action"] == "approval_required"
+    approval = plan[0]["approval_request"]
+    assert approval["channels"]["telegram"]["enabled"] is True
+    assert approval["channels"]["telegram"]["target"] == "ops-room"
+    assert approval["channels"]["nostr"]["target"] == "nostr-ops"
+    assert "#882" in approval["message"]
+    assert "ezra" in approval["message"].lower()
+    assert approval["substitute"] == "timmy"
--- a/tests/test_sync_branch_protection.py
+++ b/tests/test_sync_branch_protection.py
@@ -1,45 +0,0 @@
-from __future__ import annotations
-
-import importlib.util
-import sys
-from pathlib import Path
-
-import yaml
-
-PROJECT_ROOT = Path(__file__).parent.parent
-
-_spec = importlib.util.spec_from_file_location(
-    "sync_branch_protection_test",
-    PROJECT_ROOT / "scripts" / "sync_branch_protection.py",
-)
-_mod = importlib.util.module_from_spec(_spec)
-sys.modules["sync_branch_protection_test"] = _mod
-_spec.loader.exec_module(_mod)
-
-build_branch_protection_payload = _mod.build_branch_protection_payload
-
-
-def test_build_branch_protection_payload_enables_rebase_before_merge():
-    payload = build_branch_protection_payload(
-        "main",
-        {
-            "required_approvals": 1,
-            "dismiss_stale_approvals": True,
-            "require_ci_to_merge": False,
-            "block_deletions": True,
-            "block_force_push": True,
-            "block_on_outdated_branch": True,
-        },
-    )
-
-    assert payload["branch_name"] == "main"
-    assert payload["rule_name"] == "main"
-    assert payload["block_on_outdated_branch"] is True
-    assert payload["required_approvals"] == 1
-    assert payload["enable_status_check"] is False
-
-
-def test_the_nexus_branch_protection_config_requires_up_to_date_branch():
-    config = yaml.safe_load((PROJECT_ROOT / ".gitea" / "branch-protection" / "the-nexus.yml").read_text())
-    rules = config["rules"]
-    assert rules["block_on_outdated_branch"] is True
Author	SHA1	Message	Date
Alexander Whitestone	61a6964780	wip: apply mission defaults before agent overrides	2026-04-15 03:56:10 -04:00
Alexander Whitestone	e40891afb8	wip: honor mission defaults in resurrection policy	2026-04-15 03:54:56 -04:00
Alexander Whitestone	e232112fc8	wip: add resurrection pool planner and policy config	2026-04-15 03:53:20 -04:00
Alexander Whitestone	ff2e2e578f	wip: add resurrection pool regression tests	2026-04-15 03:50:56 -04:00
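The new tests in `tests/test_resurrection_pool.py` pin down three behaviours of `scripts/resurrection_pool.py`, whose source is not part of this excerpt. First, an agent counts as dead only once it has been unhealthy for longer than `dead_timeout_seconds`. Second, `update_state` records `last_healthy_at` and the latest reasons. Third, `resolve_policy` layers `default_policy`, then the mission's defaults, then the agent's own override, where the override may itself set the mission. The block below is a minimal sketch that would satisfy those assertions, assuming the dict shapes the tests use. It is not the PR's implementation. The 600-second fallback timeout, the per-agent `dead_timeout_seconds` key, and treating a never-healthy agent as "not yet dead" are all assumptions.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Hypothetical sketch: names and dict shapes are inferred from the tests only.


def resolve_policy(agent: str, registry_entry: Dict[str, Any],
                   policy: Dict[str, Any]) -> Dict[str, Any]:
    """Layer default -> mission defaults -> agent override."""
    agent_override = policy.get("agents", {}).get(agent, {})
    # Pick the mission first: the agent override may reassign it.
    mission = agent_override.get("mission") or registry_entry.get("mission")
    decision = deepcopy(policy.get("default_policy", {}))
    if mission:
        # Mission defaults are applied before agent overrides...
        decision.update(policy.get("missions", {}).get(mission, {}))
    # ...so anything the agent sets explicitly wins.
    decision.update(agent_override)
    decision["mission"] = mission
    return decision


def update_state(snapshot: Dict[str, Dict[str, Any]],
                 state: Dict[str, Dict[str, Any]],
                 now_ts: float) -> Dict[str, Dict[str, Any]]:
    """Return a copy of state with last-healthy timestamps refreshed."""
    updated = deepcopy(state)
    for agent, health in snapshot.items():
        entry = updated.setdefault(agent, {})
        entry.setdefault("last_healthy_at", None)
        if health.get("healthy_now"):
            entry["last_healthy_at"] = now_ts
        entry["last_reasons"] = list(health.get("reasons", []))
    return updated


def detect_downed_agents(snapshot: Dict[str, Dict[str, Any]],
                         state: Dict[str, Dict[str, Any]],
                         policy: Dict[str, Any],
                         now_ts: float) -> Dict[str, Dict[str, Any]]:
    """Flag unhealthy agents whose outage exceeds the configured timeout."""
    default_timeout = policy.get("dead_timeout_seconds", 600)  # assumed fallback
    result: Dict[str, Dict[str, Any]] = {}
    for agent, health in snapshot.items():
        if health.get("healthy_now"):
            continue
        last: Optional[float] = state.get(agent, {}).get("last_healthy_at")
        # Assumed per-agent override key; not confirmed by the source.
        timeout = (policy.get("agents", {}).get(agent, {})
                   .get("dead_timeout_seconds", default_timeout))
        # Assumption: an agent never seen healthy is not declared dead yet.
        unhealthy_for = None if last is None else now_ts - last
        result[agent] = {
            "dead": unhealthy_for is not None and unhealthy_for > timeout,
            "unhealthy_for_seconds": unhealthy_for,
            "timeout_seconds": timeout,
            "reasons": list(health.get("reasons", [])),
        }
    return result
```

Resolving the mission before layering is what lets `agents.bezalel.mission = "forge"` pick up `missions.forge.mode = "yes"`, which is the case `test_resolve_policy_applies_mission_defaults_after_agent_override_sets_mission` checks.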