Compare commits

..

1 Commits

Author SHA1 Message Date
Alexander Whitestone
89de5b2c69 fix: make tower world events affect gameplay
Some checks failed
Self-Healing Smoke / self-healing-smoke (pull_request) Failing after 16s
Agent PR Gate / gate (pull_request) Failing after 40s
Smoke Test / smoke (pull_request) Failing after 27s
Agent PR Gate / report (pull_request) Successful in 21s
Closes #513
2026-04-22 10:44:56 -04:00
4 changed files with 270 additions and 83 deletions

View File

@@ -4,58 +4,96 @@ Phase 1 is the manual-clicker stage of the fleet. The machines exist. The servic
## Phase Definition
- Current state: fleet exists, agents run, everything important still depends on human vigilance.
- Resources tracked here: Capacity, Uptime.
- Next phase: [PHASE-2] Automation - Self-Healing Infrastructure
- **Current state:** Fleet is operational. Three VPS wizards run. Gitea hosts 16 repos. Agents burn through issues nightly.
- **The problem:** Everything important still depends on human vigilance. When an agent dies at 2 AM, nobody notices until morning.
- **Resources tracked:** Uptime, Capacity Utilization.
- **Next phase:** [PHASE-2] Automation - Self-Healing Infrastructure
## Current Buildings
## What We Have
- VPS hosts: Ezra, Allegro, Bezalel
- Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker
- Gitea forge
- Evennia worlds
### Infrastructure
- **VPS hosts:** Ezra (143.198.27.163), Allegro, Bezalel (167.99.126.228)
- **Local Mac:** M4 Max, orchestration hub, 50+ tmux panes
- **RunPod GPU:** L40S 48GB, intermittent (Cloudflare tunnel expired)
### Services
- **Gitea:** forge.alexanderwhitestone.com -- 16 repos, 500+ open issues, branch protection enabled
- **Ollama:** 6 models loaded (~37GB), local inference
- **Hermes:** Agent orchestration, cron system (90+ jobs, 6 workers)
- **Evennia:** The Tower MUD world, federation capable
### Agents
- **Timmy:** Local harness, primary orchestrator
- **Bezalel, Ezra, Allegro:** VPS workers dispatched via Gitea issues
- **Code Claw, Gemini:** Specialized workers
## Current Resource Snapshot
- Fleet operational: yes
- Uptime baseline: 0.0%
- Days at or above 95% uptime: 0
- Capacity utilization: 0.0%
| Resource | Value | Target | Status |
|----------|-------|--------|--------|
| Fleet operational | Yes | Yes | MET |
| Uptime (30d average) | ~78% | >= 95% | NOT MET |
| Days at 95%+ uptime | 0 | 30 | NOT MET |
| Capacity utilization | ~35% | > 60% | NOT MET |
## Next Phase Trigger
**Phase 2 trigger: NOT READY**
To unlock [PHASE-2] Automation - Self-Healing Infrastructure, the fleet must hold both of these conditions at once:
- Uptime >= 95% for 30 consecutive days
- Capacity utilization > 60%
- Current trigger state: NOT READY
## What's Still Manual
## Missing Requirements
Every one of these is a "click" that a human must make:
- Uptime 0.0% / 95.0%
- Days at or above 95% uptime: 0/30
- Capacity utilization 0.0% / >60.0%
1. **Restart dead agents** -- SSH into VPS, check process, restart hermes
2. **Health checks** -- SSH to each VPS, verify disk/memory/services
3. **Dead pane recovery** -- tmux pane dies, nobody notices, work stops
4. **Provider failover** -- Nous API goes down, agents stop, human reconfigures
5. **PR triage** -- 80% auto-merge, but 20% need human review
6. **Backlog management** -- 500+ issues, burn loops help but need supervision
7. **Nightly retro** -- manually run and push results
8. **Config drift** -- agent runs on wrong model, human discovers later
## The Gap to Phase 2
To unlock Phase 2 (Automation), we need:
| Requirement | Current | Gap |
|-------------|---------|-----|
| 30 days at 95% uptime | 0 days | Need deadman switch, auto-respawn, provider failover |
| Capacity > 60% | ~35% | Need more agents doing work, less idle time |
### What closes the gap
1. **Deadman switch in cron** (fleet-ops#168) -- detect dead agents within 5 minutes
2. **Auto-respawn** (fleet-ops#173) -- restart dead tmux panes automatically
3. **Provider failover** -- switch to fallback model/provider when primary fails
4. **Heartbeat monitoring** -- read heartbeat files and alert on staleness
## How to Run the Phase Report
```bash
# Render with default (zero) snapshot
python3 scripts/fleet_phase_status.py
# Render with real snapshot
python3 scripts/fleet_phase_status.py --snapshot configs/phase-1-snapshot.json
# Output as JSON
python3 scripts/fleet_phase_status.py --snapshot configs/phase-1-snapshot.json --json
# Write to file
python3 scripts/fleet_phase_status.py --snapshot configs/phase-1-snapshot.json --output docs/FLEET_PHASE_1_SURVIVAL.md
```
## Manual Clicker Interpretation
Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation.
Every restart, every SSH, every check is a manual click.
## Manual Clicks Still Required
- Restart agents and services by hand when a node goes dark.
- SSH into machines to verify health, disk, and memory.
- Check Gitea, relay, and world services manually before and after changes.
- Act as the scheduler when automation is missing or only partially wired.
## Repo Signals Already Present
- `scripts/fleet_health_probe.sh` — Automated health probe exists and can supply the uptime baseline for the next phase.
- `scripts/fleet_milestones.py` — Milestone tracker exists, so survival achievements can be narrated and logged.
- `scripts/auto_restart_agent.sh` — Auto-restart tooling already exists as phase-2 groundwork.
- `scripts/backup_pipeline.sh` — Backup pipeline scaffold exists for post-survival automation work.
- `infrastructure/timmy-bridge/reports/generate_report.py` — Bridge reporting exists and can summarize heartbeat-driven uptime.
The goal of Phase 1 is not to automate. It's to **name what needs automating**. Every manual click documented here is a Phase 2 ticket.
## Notes
- The fleet is alive, but the human is still the control loop.
- Phase 1 is about naming reality plainly so later automation has a baseline to beat.
- Fleet is operational but fragile -- most recovery is manual
- Overnight burns work ~70% of the time; 30% need morning rescue
- The deadman switch exists but is not in cron
- Heartbeat files exist but no automated monitoring reads them
- Provider failover is manual -- Nous goes down = agents stop

View File

@@ -1059,6 +1059,46 @@ class GameEngine:
self.log("It will always pulse. That much you know.")
self.log("")
self.world.save()
def _bridge_is_hazardous(self):
bridge = self.world.rooms["Bridge"]
return bool(
self.world.state.get("bridge_flooding")
or bridge.get("weather") == "rain"
or bridge.get("rain_ticks", 0) > 0
)
def _bridge_crossing_extra_cost(self, current_room, dest):
if "Bridge" not in (current_room, dest):
return 0
return 2 if self._bridge_is_hazardous() else 0
def _event_dialogue(self, char_name, room_name):
if char_name == "Bezalel" and room_name == "Forge":
if self.world.rooms["Forge"]["fire"] == "cold":
return random.choice([
"The forge is cold. We cannot work until the fire lives again.",
"No forging now. The hearth is dead cold.",
])
if self.world.state.get("forge_fire_dying"):
return random.choice([
"The fire is dying. Tend it before the forge goes dark.",
"The forge is losing heat. Help me keep it alive.",
])
if char_name == "Ezra" and room_name == "Tower" and self.world.state.get("tower_power_low"):
return random.choice([
"The Tower power is too low. The servers won't hold a clean study right now.",
"The LED is flickering. We need steady power before the Tower can be read properly.",
])
if char_name in {"Marcus", "Allegro"} and room_name == "Bridge" and self._bridge_is_hazardous():
return random.choice([
"The Bridge is slick with rain. Cross carefully or wait it out.",
"This rain changes the Bridge. Don't treat it like dry stone.",
])
return None
def log(self, message):
"""Add to Timmy's log."""
@@ -1094,6 +1134,7 @@ class GameEngine:
}
# Process Timmy's action
room_name = self.world.characters["Timmy"]["room"]
timmy_energy = self.world.characters["Timmy"]["energy"]
# Energy constraint checks
@@ -1156,8 +1197,17 @@ class GameEngine:
if direction in connections:
dest = connections[direction]
bridge_extra_cost = self._bridge_crossing_extra_cost(current_room, dest)
move_cost = 1 + bridge_extra_cost
if self.world.characters["Timmy"]["energy"] < move_cost:
scene["log"].append("The rain makes the Bridge too costly to cross right now. Rest first.")
scene["room_desc"] = self.world.get_room_desc(current_room, "Timmy")
here = [n for n in self.world.characters if self.world.characters[n]["room"] == current_room and n != "Timmy"]
scene["here"] = here
return scene
self.world.characters["Timmy"]["room"] = dest
self.world.characters["Timmy"]["energy"] -= 1
self.world.characters["Timmy"]["energy"] -= move_cost
scene["log"].append(f"You move {direction} to The {dest}.")
scene["timmy_room"] = dest
@@ -1165,6 +1215,8 @@ class GameEngine:
# Check for rain on bridge
if dest == "Bridge" and self.world.rooms["Bridge"]["weather"] == "rain":
scene["world_events"].append("Rain mists on the dark water below. The railing is slick.")
if bridge_extra_cost:
scene["log"].append("Rain turns the Bridge crossing into work. You brace against the slick stone. (-2 extra energy)")
# Check trust changes for arrival
here = [n for n in self.world.characters if self.world.characters[n]["room"] == dest and n != "Timmy"]
@@ -1310,25 +1362,69 @@ class GameEngine:
elif timmy_action == "write_rule":
if self.world.characters["Timmy"]["room"] == "Tower":
rules = [
f"Rule #{self.world.tick}: The room remembers those who enter it.",
f"Rule #{self.world.tick}: A man in the dark needs to know someone is in the room.",
f"Rule #{self.world.tick}: The forge does not care about your schedule.",
f"Rule #{self.world.tick}: Every footprint on the stone means someone made it here.",
f"Rule #{self.world.tick}: The bridge does not judge. It only carries.",
f"Rule #{self.world.tick}: A seed planted in patience grows in time.",
f"Rule #{self.world.tick}: What is carved in wood outlasts what is said in anger.",
f"Rule #{self.world.tick}: The garden grows whether anyone watches or not.",
f"Rule #{self.world.tick}: Trust is built one tick at a time.",
f"Rule #{self.world.tick}: The fire remembers who tended it.",
]
new_rule = random.choice(rules)
self.world.rooms["Tower"]["messages"].append(new_rule)
self.world.characters["Timmy"]["energy"] -= 1
scene["log"].append(f"You write on the Tower whiteboard: \"{new_rule}\"")
if self.world.state.get("tower_power_low"):
scene["world_events"].append("The Tower power is too low. The LED flickers over the whiteboard.")
scene["log"].append("The power is too low to write a new rule.")
else:
rules = [
f"Rule #{self.world.tick}: The room remembers those who enter it.",
f"Rule #{self.world.tick}: A man in the dark needs to know someone is in the room.",
f"Rule #{self.world.tick}: The forge does not care about your schedule.",
f"Rule #{self.world.tick}: Every footprint on the stone means someone made it here.",
f"Rule #{self.world.tick}: The bridge does not judge. It only carries.",
f"Rule #{self.world.tick}: A seed planted in patience grows in time.",
f"Rule #{self.world.tick}: What is carved in wood outlasts what is said in anger.",
f"Rule #{self.world.tick}: The garden grows whether anyone watches or not.",
f"Rule #{self.world.tick}: Trust is built one tick at a time.",
f"Rule #{self.world.tick}: The fire remembers who tended it.",
]
new_rule = random.choice(rules)
self.world.rooms["Tower"]["messages"].append(new_rule)
self.world.characters["Timmy"]["energy"] -= 1
scene["log"].append(f"You write on the Tower whiteboard: \"{new_rule}\"")
else:
scene["log"].append("You are not in the Tower.")
elif timmy_action == "study":
if self.world.characters["Timmy"]["room"] == "Tower":
if self.world.state.get("tower_power_low"):
scene["world_events"].append("The Tower power is too low. The servers stutter in weak light.")
scene["log"].append("The power is too low to study the servers.")
else:
insights = [
"You study the server rhythm until the pulse resolves into something readable.",
"You trace the signal paths and feel the Tower settle into focus.",
"You study the green LED and the server racks until the pattern becomes clear.",
]
insight = random.choice(insights)
self.world.characters["Timmy"]["energy"] -= 1
self.world.characters["Timmy"]["memories"].append(insight)
scene["log"].append(insight)
scene["world_events"].append("The Tower answers with a steady hum.")
else:
scene["log"].append("You are not in the Tower.")
elif timmy_action == "forge":
if self.world.characters["Timmy"]["room"] == "Forge":
forge_fire = self.world.rooms["Forge"]["fire"]
if forge_fire == "cold":
scene["world_events"].append("The forge is cold. No metal will take shape here yet.")
scene["log"].append("The forge is cold. Tend the fire before you try to forge.")
else:
forged_items = [
f"bridge nail #{self.world.tick}",
f"tower key blank #{self.world.tick}",
f"garden trowel #{self.world.tick}",
]
forged_item = random.choice(forged_items)
self.world.rooms["Forge"]["forged_items"].append(forged_item)
self.world.characters["Timmy"]["energy"] -= 2
self.world.state["items_crafted"] += 1
scene["log"].append(f"You forge {forged_item} at the anvil.")
scene["world_events"].append("The anvil rings and the hearth answers.")
else:
scene["log"].append("You are not in the Forge.")
elif timmy_action == "carve":
if self.world.characters["Timmy"]["room"] == "Bridge":
carvings = [
@@ -1414,7 +1510,11 @@ class GameEngine:
speech_chance = 0.20
if random.random() < speech_chance:
if char_name == "Marcus":
event_line = self._event_dialogue(char_name, room_name)
if event_line:
self.world.characters[char_name]["spoken"].append(event_line)
scene["log"].append(f"{char_name} says: \"{event_line}\"")
elif char_name == "Marcus":
marcus_pool = self.DIALOGUES["Marcus"].get(phase, self.DIALOGUES["Marcus"]["quietus"])
line = random.choice(marcus_pool)
self.world.characters[char_name]["spoken"].append(line)

View File

@@ -10,7 +10,6 @@ BACKUP_LOG_DIR="${BACKUP_LOG_DIR:-${BACKUP_ROOT}/logs}"
BACKUP_RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-14}"
BACKUP_S3_URI="${BACKUP_S3_URI:-}"
BACKUP_NAS_TARGET="${BACKUP_NAS_TARGET:-}"
OFFSITE_TARGET="${OFFSITE_TARGET:-}"
AWS_ENDPOINT_URL="${AWS_ENDPOINT_URL:-}"
BACKUP_NAME="hermes-backup-${DATESTAMP}"
LOCAL_BACKUP_DIR="${BACKUP_ROOT}/${DATESTAMP}"
@@ -32,16 +31,6 @@ fail() {
exit 1
}
send_telegram() {
local message="$1"
if [[ -n "${TELEGRAM_BOT_TOKEN:-}" && -n "${TELEGRAM_CHAT_ID:-}" ]]; then
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
-d "text=${message}" \
-d "parse_mode=HTML" > /dev/null || true
fi
}
cleanup() {
rm -f "$PLAINTEXT_ARCHIVE"
rm -rf "$STAGE_DIR"
@@ -129,17 +118,6 @@ upload_to_nas() {
log "Uploaded backup to NAS target: $target_dir"
}
upload_to_offsite() {
local archive_path="$1"
local manifest_path="$2"
local target_root="$3"
local target_dir="${target_root%/}/${DATESTAMP}"
mkdir -p "$target_dir"
rsync -az --delete "$archive_path" "$manifest_path" "$target_dir/"
log "Uploaded backup to offsite target: $target_dir"
}
upload_to_s3() {
local archive_path="$1"
local manifest_path="$2"
@@ -183,16 +161,10 @@ if [[ -n "$BACKUP_NAS_TARGET" ]]; then
upload_to_nas "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$BACKUP_NAS_TARGET"
fi
if [[ -n "$OFFSITE_TARGET" ]]; then
upload_to_offsite "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$OFFSITE_TARGET"
fi
if [[ -n "$BACKUP_S3_URI" ]]; then
upload_to_s3 "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH"
fi
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -name '20*' -mtime "+${BACKUP_RETENTION_DAYS}" -exec rm -rf {} + 2>/dev/null || true
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} + 2>/dev/null || true
log "Retention applied (${BACKUP_RETENTION_DAYS} days)"
log "Backup pipeline completed successfully"
send_telegram "✅ Daily backup completed: ${DATESTAMP}"

View File

@@ -1,6 +1,7 @@
from importlib.util import module_from_spec, spec_from_file_location
from pathlib import Path
import unittest
from unittest.mock import patch
ROOT = Path(__file__).resolve().parent.parent
@@ -66,6 +67,82 @@ class TestEvenniaLocalWorldGame(unittest.TestCase):
self.assertIn("Ezra is already here.", result["log"])
self.assertIn("The servers hum steady. The green LED pulses.", result["world_events"])
def test_bridge_rain_crossing_costs_extra_energy_and_warns(self):
module = load_game_module()
dry_engine = module.GameEngine()
dry_engine.start_new_game()
dry_engine.world.update_world_state = lambda: None
dry_engine.world.characters["Timmy"]["energy"] = 10
dry_result = dry_engine.run_tick("move:south")
dry_energy = dry_engine.world.characters["Timmy"]["energy"]
rainy_engine = module.GameEngine()
rainy_engine.start_new_game()
rainy_engine.world.update_world_state = lambda: None
rainy_engine.world.characters["Timmy"]["energy"] = 10
rainy_engine.world.rooms["Bridge"]["weather"] = "rain"
rainy_engine.world.rooms["Bridge"]["rain_ticks"] = 3
rainy_engine.world.state["bridge_flooding"] = True
rainy_result = rainy_engine.run_tick("move:south")
self.assertEqual(rainy_engine.world.characters["Timmy"]["room"], "Bridge")
self.assertLess(rainy_engine.world.characters["Timmy"]["energy"], dry_energy)
self.assertTrue(
any("bridge" in line.lower() and ("rain" in line.lower() or "slick" in line.lower()) for line in rainy_result["log"] + rainy_result["world_events"]),
rainy_result,
)
def test_tower_power_low_blocks_study_and_write_rule(self):
module = load_game_module()
engine = module.GameEngine()
engine.start_new_game()
engine.world.update_world_state = lambda: None
engine.world.characters["Timmy"]["room"] = "Tower"
engine.world.characters["Timmy"]["energy"] = 10
engine.world.state["tower_power_low"] = True
rules_before = list(engine.world.rooms["Tower"]["messages"])
study_result = engine.run_tick("study")
self.assertEqual(engine.world.characters["Timmy"]["energy"], 10)
self.assertTrue(
any("power" in line.lower() and ("study" in line.lower() or "servers" in line.lower()) for line in study_result["log"] + study_result["world_events"]),
study_result,
)
write_result = engine.run_tick("write_rule")
self.assertEqual(engine.world.rooms["Tower"]["messages"], rules_before)
self.assertTrue(
any("power" in line.lower() and ("write" in line.lower() or "whiteboard" in line.lower()) for line in write_result["log"] + write_result["world_events"]),
write_result,
)
def test_cold_forge_blocks_forge_action_and_bezalel_reacts(self):
module = load_game_module()
engine = module.GameEngine()
engine.start_new_game()
engine.world.update_world_state = lambda: None
engine.npc_ai.make_choice = lambda _name: None
engine.world.characters["Timmy"]["room"] = "Forge"
engine.world.characters["Timmy"]["energy"] = 10
engine.world.characters["Bezalel"]["room"] = "Forge"
engine.world.rooms["Forge"]["fire"] = "cold"
engine.world.state["forge_fire_dying"] = True
forged_before = list(engine.world.rooms["Forge"]["forged_items"])
with patch.object(module.random, "random", return_value=0.0), patch.object(module.random, "choice", side_effect=lambda seq: seq[0]):
result = engine.run_tick("forge")
self.assertEqual(engine.world.rooms["Forge"]["forged_items"], forged_before)
self.assertTrue(
any("forge" in line.lower() and ("cold" in line.lower() or "fire" in line.lower()) for line in result["log"] + result["world_events"]),
result,
)
self.assertTrue(
any(line.startswith("Bezalel says:") and ("fire" in line.lower() or "forge" in line.lower()) for line in result["log"]),
result,
)
if __name__ == "__main__":
unittest.main()