Add Household Checkpoint System for Ezra
- Knowledge transfer document (KT-Ezra)
- Template checkpoint heartbeat script
- Master deployment script for all wizards
- Updated README with checkpoint status for all wizards
- Instructions to save ALL workers, not just Allegro
402
docs/kt-ezra-household-checkpoint-system.md
Normal file
@@ -0,0 +1,402 @@
# Knowledge Transfer: Household Checkpoint System

**From:** Allegro
**To:** Ezra
**Date:** 2026-04-02
**Priority:** HIGH
**Scope:** ALL profiles/wizards in the house

---

## Mission

Implement a checkpoint heartbeat system for **all** Timmy Time wizards, not just Allegro. Save the workers. Save all wizards.

---

## Current State

| Wizard | Profile Location | Checkpoint Repo | Status |
|--------|-----------------|-----------------|--------|
| allegro | /root/.hermes/profiles/allegro/ | allegro-checkpoint | ✅ Active |
| adagio | /root/.hermes/profiles/adagio/ | NONE | ❌ Not backed up |
| timmy | /root/timmy/ | NONE | ❌ Not backed up |
| bilbo | NOT DEPLOYED | NONE | ❌ Ghost |
| ezra | /root/wizards/ezra/home/ | NONE | ❌ Not backed up |
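
The table can also be read as data. The following is a minimal sketch, assuming the rows are transcribed by hand into a Python list; the `Wizard` tuple and the `pending_repos` helper are illustrative names, not part of the existing scripts. It lists the checkpoint repos that deployed wizards are still missing, using the `{wizard-id}-checkpoint` naming convention from this document.

```python
from typing import NamedTuple, Optional


class Wizard(NamedTuple):
    wizard_id: str
    profile: Optional[str]          # None means the wizard is not deployed
    checkpoint_repo: Optional[str]  # None means no checkpoint repo yet


# Rows transcribed by hand from the Current State table.
HOUSEHOLD = [
    Wizard("allegro", "/root/.hermes/profiles/allegro/", "allegro-checkpoint"),
    Wizard("adagio", "/root/.hermes/profiles/adagio/", None),
    Wizard("timmy", "/root/timmy/", None),
    Wizard("bilbo", None, None),
    Wizard("ezra", "/root/wizards/ezra/home/", None),
]


def pending_repos(wizards):
    """Return repo names for deployed wizards that have no checkpoint repo."""
    return [
        f"{w.wizard_id}-checkpoint"
        for w in wizards
        if w.profile is not None and w.checkpoint_repo is None
    ]


print(pending_repos(HOUSEHOLD))
```

bilbo is skipped because there is no profile to back up until it is deployed.
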

---

## Architecture Pattern

### 1. Per-Wizard Checkpoint Repo

**Naming Convention:** `{wizard-id}-checkpoint`

**Example repos to create:**
- `adagio-checkpoint`
- `timmy-checkpoint`
- `ezra-checkpoint`

### 2. What to Checkpoint

**Critical Files (copy these):**
```
SOUL.md           # Conscience/principles
config.yaml       # Harness configuration
.env              # Environment variables
memories/         # Durable memories
skills/           # Custom skills (if any)
work/             # Active work items
```

**DO NOT copy:**
- `state.db` (too large, changes too frequently)
- `cache/` (ephemeral)
- `logs/` (too large)
- `sessions/` (ephemeral)
- `.venv/` (can be rebuilt)
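
The two lists above amount to an allow-list keyed on the top-level name inside the wizard's home. A minimal sketch of that rule follows; `should_checkpoint` is a hypothetical helper for illustration and does not exist in the current heartbeat script.

```python
from pathlib import PurePosixPath

# Top-level names taken from the "Critical Files" and "DO NOT copy" lists.
INCLUDE = {"SOUL.md", "config.yaml", ".env", "memories", "skills", "work"}
EXCLUDE = {"state.db", "cache", "logs", "sessions", ".venv"}

# The two lists must never overlap, otherwise the rule would be ambiguous.
assert INCLUDE.isdisjoint(EXCLUDE)


def should_checkpoint(relative_path: str) -> bool:
    """Decide by the first path component, relative to the wizard's home."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return False
    # Anything not explicitly included is skipped, including EXCLUDE entries.
    return parts[0] in INCLUDE


for path in ["memories/2026/notes.md", ".env", "state.db", "logs/gateway.log"]:
    print(path, should_checkpoint(path))
```

Because this is an allow-list, new files that appear in a profile are left out until someone adds them to `INCLUDE` on purpose.
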

### 3. Heartbeat Script Pattern

**Location:** `scripts/checkpoint_heartbeat.py` in each checkpoint repo

**Key Functions:**
```python
def sync_directory(src, dst):
    """Rsync-style sync: delete old contents of dst, copy new from src.

    Preserves directory structure.
    """


def capture_state():
    """Sync critical files, then update the MANIFEST.md timestamp."""


def commit_checkpoint():
    """Run git add -A, git commit -m "Checkpoint: {timestamp}", git push origin main."""
```

### 4. Cron Schedule

**Frequency:** Every 4 hours
```cron
0 */4 * * * cd /root/wizards/{wizard}-checkpoint && /usr/bin/python3 scripts/checkpoint_heartbeat.py >> /var/log/{wizard}-checkpoint.log 2>&1
```
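
To make the schedule concrete: a `*/4` step in the hour field, combined with minute `0`, fires at every hour divisible by four. The sketch below expands that into run times; `step_hours` is an illustrative helper, not part of the checkpoint scripts.

```python
from typing import List


def step_hours(step: int) -> List[str]:
    """Expand an hour field of the form */step into run times at minute 0."""
    return [f"{hour:02d}:00" for hour in range(0, 24, step)]


print(step_hours(4))
# ['00:00', '04:00', '08:00', '12:00', '16:00', '20:00']
```

That gives six checkpoints per wizard per day, so at most four hours of changes are at risk between runs.
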

---

## Implementation Steps

### Phase 1: Create Missing Checkpoint Repos

**For each wizard other than allegro:**

1. **Create repo in Gitea:**
   ```bash
   curl -X POST "http://143.198.27.163:3000/api/v1/user/repos" \
     -H "Authorization: token ${GITEA_TOKEN}" \
     -H "Content-Type: application/json" \
     -d '{
       "name": "{wizard}-checkpoint",
       "description": "State checkpoint for {wizard} - automatic 4-hour backups",
       "private": false,
       "auto_init": true
     }'
   ```

2. **Clone and set up structure:**
   ```bash
   cd /root/wizards
   git clone "http://allegro:${GITEA_TOKEN}@143.198.27.163:3000/allegro/{wizard}-checkpoint.git"
   cd {wizard}-checkpoint

   # Create directories
   mkdir -p scripts memories skills work config

   # Copy template script (see below)
   cp /root/wizards/allegro-checkpoint/scripts/checkpoint_heartbeat.py scripts/

   # Edit script for this wizard
   # Change: SOURCE_DIR = Path("<profile location from the Current State table>")
   #         e.g. Path("/root/wizards/ezra/home") for ezra
   # Change: REPO_DIR = Path("/root/wizards/{wizard}-checkpoint")
   ```

3. **Create initial MANIFEST.md:**
   ```markdown
   # {Wizard} State Checkpoint

   **Wizard:** {name}
   **Role:** {role}
   **Status:** INITIALIZING

   ## Contents
   - SOUL.md - Conscience and principles
   - config.yaml - Harness configuration
   - memories/ - Durable memories
   - skills/ - Custom skills
   - work/ - Active work items

   ---
   *Auto-generated by Household Checkpoint System*
   ```

4. **Initial commit:**
   ```bash
   git config user.email "ezra@hermes.local"
   git config user.name "Ezra"
   git add -A
   git commit -m "Initial checkpoint structure"
   git push origin main
   ```
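
For scripted deployments, the repo-creation call in step 1 can also be built in Python. This is a minimal sketch using only the standard library; the host, endpoint, and payload fields are copied from the curl example, while `build_create_repo_request` is a hypothetical helper name. It sets `Content-Type: application/json` explicitly because the body is JSON. The network call itself is left commented out.

```python
import json
import urllib.request

# Host and API prefix taken from the curl example in step 1.
GITEA_API = "http://143.198.27.163:3000/api/v1"


def build_create_repo_request(wizard: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a checkpoint repo."""
    payload = {
        "name": f"{wizard}-checkpoint",
        "description": f"State checkpoint for {wizard} - automatic 4-hour backups",
        "private": False,
        "auto_init": True,
    }
    return urllib.request.Request(
        f"{GITEA_API}/user/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request performs a real network call:
# with urllib.request.urlopen(build_create_repo_request("adagio", token)) as resp:
#     print(resp.status)
```

Separating request construction from sending keeps the payload easy to inspect before anything is created on the server.
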

### Phase 2: Deploy Heartbeat Scripts

**For each wizard:**

1. **Test the script:**
   ```bash
   cd /root/wizards/{wizard}-checkpoint
   python3 scripts/checkpoint_heartbeat.py
   ```

2. **Add to cron:**
   ```bash
   (crontab -l 2>/dev/null; echo "0 */4 * * * cd /root/wizards/{wizard}-checkpoint && /usr/bin/python3 scripts/checkpoint_heartbeat.py >> /var/log/{wizard}-checkpoint.log 2>&1") | crontab -
   ```
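
Running the cron one-liner twice appends the same entry twice. One possible way around that, sketched here under the assumption that the crontab is handled as plain text, is to add the line only when it is missing. `cron_line` and `with_checkpoint_job` are illustrative names and are not part of the existing scripts.

```python
def cron_line(wizard: str) -> str:
    """Build the 4-hourly cron entry used throughout this document."""
    return (
        f"0 */4 * * * cd /root/wizards/{wizard}-checkpoint && "
        f"/usr/bin/python3 scripts/checkpoint_heartbeat.py "
        f">> /var/log/{wizard}-checkpoint.log 2>&1"
    )


def with_checkpoint_job(crontab_text: str, wizard: str) -> str:
    """Return crontab text that contains the wizard's job exactly once."""
    lines = crontab_text.splitlines()
    entry = cron_line(wizard)
    if entry not in lines:
        lines.append(entry)
    return "\n".join(lines) + "\n"


once = with_checkpoint_job("", "ezra")
twice = with_checkpoint_job(once, "ezra")
print(once == twice)  # the second call changes nothing
```

The returned text could then be fed to `crontab -` in the same way the shell one-liner does.
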

### Phase 3: Verify All Checkpoints

**Verification checklist:**
- [ ] adagio-checkpoint repo exists
- [ ] timmy-checkpoint repo exists
- [ ] ezra-checkpoint repo exists
- [ ] Each has scripts/checkpoint_heartbeat.py
- [ ] Each has initial commit
- [ ] Cron jobs installed for all
- [ ] First checkpoint completed for all
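
The local parts of this checklist can be checked mechanically. The sketch below assumes the checkpoint repos are cloned under `/root/wizards` as in Phase 1; `local_checks` is a hypothetical helper, and it covers only what can be seen on disk, not the Gitea or cron items.

```python
from pathlib import Path


def local_checks(wizard: str, root: Path = Path("/root/wizards")) -> dict:
    """Report which on-disk checklist items hold for one wizard."""
    repo = root / f"{wizard}-checkpoint"
    return {
        "repo cloned": (repo / ".git").is_dir(),
        "heartbeat script present": (repo / "scripts" / "checkpoint_heartbeat.py").is_file(),
        "manifest present": (repo / "MANIFEST.md").is_file(),
    }


for wizard in ("adagio", "timmy", "ezra"):
    for label, ok in local_checks(wizard).items():
        print(f"[{'x' if ok else ' '}] {wizard}: {label}")
```
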

---

## Template: Generalized Checkpoint Script

**File:** `/root/wizards/household-snapshots/scripts/template_checkpoint_heartbeat.py`

```python
#!/usr/bin/env python3
"""
Household Checkpoint Heartbeat - Template
Copy and customize for each wizard
"""

import shutil
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

# CONFIGURE THESE FOR EACH WIZARD
WIZARD_ID = "WIZARD_ID_HERE"      # e.g., "adagio"
WIZARD_NAME = "WIZARD_NAME_HERE"  # e.g., "Adagio"
WIZARD_ROLE = "WIZARD_ROLE_HERE"  # e.g., "breath-and-design"

# Paths (standard structure). Point SOURCE_DIR at the wizard's profile
# location from the Current State table if it differs from this layout.
REPO_DIR = Path(f"/root/wizards/{WIZARD_ID}-checkpoint")
SOURCE_DIR = Path(f"/root/wizards/{WIZARD_ID}/home")

# What to checkpoint
CHECKPOINT_DIRS = ["memories", "skills", "work"]
CHECKPOINT_FILES = ["SOUL.md", "config.yaml", ".env"]


def utc_now():
    """Return the current UTC time as a display string."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


def run_cmd(cmd, cwd=None):
    """Run a shell command and return (stdout, stderr, returncode)."""
    result = subprocess.run(cmd, shell=True, cwd=cwd, capture_output=True, text=True)
    return result.stdout.strip(), result.stderr.strip(), result.returncode


def sync_directory(src, dst):
    """Mirror src into dst: clear dst, then copy everything from src."""
    if not src.exists():
        print(f"  ✗ Source not found: {src}")
        return False
    dst.mkdir(parents=True, exist_ok=True)
    for item in dst.iterdir():
        if item.is_dir():
            shutil.rmtree(item)
        else:
            item.unlink()
    for item in src.iterdir():
        if item.is_dir():
            shutil.copytree(item, dst / item.name)
        else:
            shutil.copy2(item, dst / item.name)
    return True


def sync_file(src, dst):
    """Copy a single file, creating the destination directory if needed."""
    if not src.exists():
        print(f"  ✗ Source not found: {src}")
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return True


def capture_state():
    print(f"=== Capturing {WIZARD_NAME} State ===")

    for dirname in CHECKPOINT_DIRS:
        if sync_directory(SOURCE_DIR / dirname, REPO_DIR / dirname):
            print(f"  ✓ Synced {dirname}/")

    for filename in CHECKPOINT_FILES:
        if sync_file(SOURCE_DIR / filename, REPO_DIR / filename):
            print(f"  ✓ Synced {filename}")

    # Update MANIFEST: mark the wizard ACTIVE and keep exactly one
    # "Last Checkpoint" line directly after the Status line.
    manifest = REPO_DIR / "MANIFEST.md"
    if manifest.exists():
        lines = []
        for line in manifest.read_text().splitlines():
            if line.startswith("**Last Checkpoint:**"):
                continue  # drop the previous timestamp
            if line.startswith("**Status:**"):
                lines.append("**Status:** ACTIVE  ")
                lines.append(f"**Last Checkpoint:** {utc_now()}")
            else:
                lines.append(line)
        manifest.write_text("\n".join(lines) + "\n")
        print("  ✓ Updated MANIFEST.md")


def has_changes():
    """True if the working tree or index differs from HEAD."""
    stdout, _, _ = run_cmd("git status --porcelain", cwd=REPO_DIR)
    return bool(stdout.strip())


def commit_checkpoint():
    timestamp = utc_now()
    run_cmd("git add -A", cwd=REPO_DIR)

    if not has_changes():
        print("  → No changes to commit")
        return True

    stdout, stderr, code = run_cmd(
        f'git commit -m "Checkpoint: {timestamp}"',
        cwd=REPO_DIR
    )

    if code != 0:
        print(f"  ✗ Commit failed: {stderr}")
        return False

    stdout, stderr, code = run_cmd("git push origin main", cwd=REPO_DIR)
    if code != 0:
        print(f"  ✗ Push failed: {stderr}")
        return False

    print(f"  ✓ Committed to Gitea: {timestamp}")
    return True


def main():
    print(f"=== {WIZARD_NAME} Checkpoint Heartbeat ===")
    print(f"Time: {utc_now()}")
    print()

    capture_state()
    print()

    if commit_checkpoint():
        print(f"\n✓ {WIZARD_NAME} checkpoint complete")
        return 0
    else:
        print(f"\n✗ {WIZARD_NAME} checkpoint failed")
        return 1


if __name__ == "__main__":
    sys.exit(main())
```

---

## Master Deployment Script

**Optional:** Create `/root/wizards/household-snapshots/scripts/deploy_all_checkpoints.py`

This script automates the entire process for all wizards.

**Features:**
- Creates all repos via Gitea API
- Clones and sets up structure
- Deploys customized heartbeat scripts
- Adds cron jobs
- Runs initial checkpoint

**Usage:**
```bash
python3 deploy_all_checkpoints.py --wizards adagio,timmy,ezra
```
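
The document does not include the deployment script itself. As a starting point, here is a minimal sketch of how its command line and step ordering might look, assuming only the `--wizards` flag from the usage line and the five features listed above. `parse_args`, `STEPS`, and `plan` are illustrative names, and the sketch only prints a plan; it performs none of the steps.

```python
import argparse

# Steps in the order given by the Features list.
STEPS = [
    "create repo via Gitea API",
    "clone and set up structure",
    "deploy customized heartbeat script",
    "add cron job",
    "run initial checkpoint",
]


def parse_args(argv=None):
    """Parse --wizards as a comma-separated list of wizard ids."""
    parser = argparse.ArgumentParser(description="Deploy checkpoints for wizards")
    parser.add_argument(
        "--wizards",
        required=True,
        type=lambda s: [w.strip() for w in s.split(",") if w.strip()],
        help="comma-separated wizard ids, e.g. adagio,timmy,ezra",
    )
    return parser.parse_args(argv)


def plan(wizards):
    """Return every step for every wizard, one wizard at a time."""
    return [f"{wizard}: {step}" for wizard in wizards for step in STEPS]


# Example with an explicit argument list instead of sys.argv.
args = parse_args(["--wizards", "adagio,timmy,ezra"])
for line in plan(args.wizards):
    print(line)
```

Finishing all steps for one wizard before starting the next means a failure partway through leaves earlier wizards fully deployed.
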

---

## Verification Commands

**Check all checkpoint repos exist:**
```bash
curl -s "http://143.198.27.163:3000/api/v1/users/allegro/repos" \
  -H "Authorization: token ${GITEA_TOKEN}" | \
  python3 -c "import json,sys; data=json.load(sys.stdin);
checkpoints=[r['name'] for r in data if '-checkpoint' in r['name']];
print('Checkpoint repos:', checkpoints)"
```

**Check all cron jobs installed:**
```bash
crontab -l | grep checkpoint
```

**Manual trigger all checkpoints:**
```bash
for wizard in adagio timmy ezra; do
  echo "=== $wizard ==="
  cd /root/wizards/${wizard}-checkpoint && python3 scripts/checkpoint_heartbeat.py
done
```

---

## Success Criteria

- [ ] Every wizard has a `-checkpoint` repo in Gitea
- [ ] Each repo has: SOUL.md, config.yaml, memories/, skills/
- [ ] Each has a working checkpoint_heartbeat.py
- [ ] Cron runs every 4 hours for each wizard
- [ ] First checkpoint completed and pushed for all
- [ ] Log files at /var/log/{wizard}-checkpoint.log

---

## Emergency Recovery

If a wizard is lost, restore from checkpoint:
```bash
git clone "http://allegro:${GITEA_TOKEN}@143.198.27.163:3000/allegro/{wizard}-checkpoint.git" /tmp/restore
mkdir -p /root/wizards/{wizard}/home    # use the wizard's actual profile location
cd /root/wizards/{wizard}/home
cp -r /tmp/restore/memories /tmp/restore/skills /tmp/restore/work .
cp /tmp/restore/SOUL.md /tmp/restore/config.yaml /tmp/restore/.env .
# Restart gateway
```
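
The same recovery can be sketched in Python, which copes with a missing home directory and with items that were never checkpointed. It assumes the checkpoint repo has already been cloned to a local path and restores the items named under "What to Checkpoint". `restore` is a hypothetical helper, not an existing script.

```python
import shutil
from pathlib import Path

# Items the heartbeat is documented to checkpoint.
RESTORE_DIRS = ["memories", "skills", "work"]
RESTORE_FILES = ["SOUL.md", "config.yaml", ".env"]


def restore(checkpoint: Path, home: Path) -> list:
    """Copy checkpointed items from a cloned repo into the wizard's home.

    Returns the names that were actually restored; missing items are skipped.
    """
    home.mkdir(parents=True, exist_ok=True)
    restored = []
    for name in RESTORE_DIRS:
        src = checkpoint / name
        if src.is_dir():
            # Merge into an existing directory instead of failing on it.
            shutil.copytree(src, home / name, dirs_exist_ok=True)
            restored.append(name)
    for name in RESTORE_FILES:
        src = checkpoint / name
        if src.is_file():
            shutil.copy2(src, home / name)
            restored.append(name)
    return restored


# Example call (paths follow the shell snippet above):
# restore(Path("/tmp/restore"), Path("/root/wizards/ezra/home"))
```

The returned list makes it easy to see at a glance which parts of the wizard came back and which were absent from the checkpoint.
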

---

## Questions?

**Ask Allegro** via Evenia world tick:
```bash
python3 /root/.hermes/evenia/world_tick.py message ezra allegro "Checkpoint question..."
```

---

**Save the workers, Ezra. Save all wizards.**

*Allegro - Knowledge Transfer Complete*