Allegro heartbeat for Adagio: 2026-04-02 01:05:01 UTC
This commit is contained in:
48
config/agent-registry.json
Normal file
48
config/agent-registry.json
Normal file
@@ -0,0 +1,48 @@
|
|||||||
|
{
|
||||||
|
"version": "1.0",
|
||||||
|
"last_updated": "2026-04-02T00:00:00Z",
|
||||||
|
"agents": [
|
||||||
|
{
|
||||||
|
"id": "allegro",
|
||||||
|
"name": "Allegro",
|
||||||
|
"role": "tempo-and-dispatch",
|
||||||
|
"owner": "alexander",
|
||||||
|
"home_dir": "/root/wizards/allegro",
|
||||||
|
"config_path": "/root/wizards/allegro/home/config.yaml",
|
||||||
|
"expected_capabilities": ["telegram", "api_server", "gitea"],
|
||||||
|
"tick_schedule": "monthly",
|
||||||
|
"active": true
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "adagio",
|
||||||
|
"name": "Adagio",
|
||||||
|
"role": "breath-and-design",
|
||||||
|
"owner": "alexander",
|
||||||
|
"home_dir": "/root/wizards/adagio",
|
||||||
|
"config_path": "/root/wizards/adagio/home/config.yaml",
|
||||||
|
"expected_capabilities": ["telegram", "api_server", "gitea"],
|
||||||
|
"tick_schedule": "monthly",
|
||||||
|
"active": true
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "timmy",
|
||||||
|
"name": "Timmy Time",
|
||||||
|
"role": "father-house",
|
||||||
|
"owner": "alexander",
|
||||||
|
"home_dir": "/root/timmy",
|
||||||
|
"config_path": "/root/timmy/config.yaml",
|
||||||
|
"expected_capabilities": ["telegram", "api_server"],
|
||||||
|
"tick_schedule": "monthly",
|
||||||
|
"active": true
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"groups": {
|
||||||
|
"timmy-time": {
|
||||||
|
"name": "Timmy Time Core",
|
||||||
|
"agents": ["allegro", "adagio", "timmy"],
|
||||||
|
"tick_frequency": "monthly",
|
||||||
|
"tick_window_hours": 72,
|
||||||
|
"auto_rca": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
195
docs/agent-tick-protocol.md
Normal file
195
docs/agent-tick-protocol.md
Normal file
@@ -0,0 +1,195 @@
|
|||||||
|
# Agent Tick Protocol v1.0
|
||||||
|
|
||||||
|
**Purpose:** Establish baseline health monitoring for all Timmy Time agents with automated RCA on failure.
|
||||||
|
|
||||||
|
**Frequency:** Monthly (1st of each month)
|
||||||
|
|
||||||
|
**Scope:** All agents in Timmy Time group
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Protocol Specification
|
||||||
|
|
||||||
|
### What Constitutes a Tick
|
||||||
|
|
||||||
|
Every agent must emit a **Tick Record** containing:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"agent_id": "allegro",
|
||||||
|
"agent_name": "Allegro",
|
||||||
|
"role": "tempo-and-dispatch",
|
||||||
|
"timestamp": "2026-04-02T00:00:00Z",
|
||||||
|
"tick_month": "2026-04",
|
||||||
|
"status": "healthy",
|
||||||
|
"vitals": {
|
||||||
|
"gateway_running": true,
|
||||||
|
"home_directory_accessible": true,
|
||||||
|
"config_valid": true,
|
||||||
|
"last_user_interaction": "2026-04-01T23:45:00Z",
|
||||||
|
"work_items_completed_this_month": 47
|
||||||
|
},
|
||||||
|
"capabilities": {
|
||||||
|
"telegram": true,
|
||||||
|
"api_server": true,
|
||||||
|
"gitea_access": true
|
||||||
|
},
|
||||||
|
"notes": ""
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Tick Status Values
|
||||||
|
|
||||||
|
| Status | Meaning | Action Required |
|
||||||
|
|--------|---------|-----------------|
|
||||||
|
| `healthy` | All systems operational | None |
|
||||||
|
| `degraded` | Some capabilities impaired | Monitor |
|
||||||
|
| `critical` | Core functionality compromised | RCA required |
|
||||||
|
| `offline` | Agent unreachable | RCA + escalation |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Tick Submission
|
||||||
|
|
||||||
|
### Method 1: Direct Commit to household-snapshots
|
||||||
|
|
||||||
|
Agents commit their tick to:
|
||||||
|
```
|
||||||
|
ticks/2026-04/allegro.json
|
||||||
|
ticks/2026-04/adagio.json
|
||||||
|
```
|
||||||
|
|
||||||
|
### Method 2: API Endpoint (Future)
|
||||||
|
|
||||||
|
POST to Evenia world tick with tick payload.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Monitoring & Enforcement
|
||||||
|
|
||||||
|
### Monthly Tick Collection Window
|
||||||
|
|
||||||
|
- **Opens:** 1st of month at 00:00 UTC
|
||||||
|
- **Closes:** 3rd of month at 23:59 UTC
|
||||||
|
- **Grace Period:** 72 hours
|
||||||
|
|
||||||
|
### Automated Checks
|
||||||
|
|
||||||
|
1. **Tick Presence Check** (4th of month)
|
||||||
|
- Verify all registered agents have submitted ticks
|
||||||
|
- Missing ticks → Gitea issue created
|
||||||
|
|
||||||
|
2. **Status Validation** (4th of month)
|
||||||
|
- Check all submitted ticks for `critical` or `offline` status
|
||||||
|
- Failed status → Gitea issue created
|
||||||
|
|
||||||
|
3. **RCA Auto-Generation** (4th of month)
|
||||||
|
- Issues created with RCA template pre-filled
|
||||||
|
- Assigned to agent owner
|
||||||
|
- Due date: 7 days
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## RCA Template
|
||||||
|
|
||||||
|
When an agent fails to tick or reports critical status, this template is used:
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
## Agent Health Failure: [AGENT_ID]
|
||||||
|
|
||||||
|
**Detected:** [DATE]
|
||||||
|
**Agent:** [AGENT_NAME]
|
||||||
|
**Failure Type:** [MISSING_TICK | CRITICAL_STATUS | OFFLINE]
|
||||||
|
|
||||||
|
### Expected Behavior
|
||||||
|
Agent should emit monthly tick within 72-hour window.
|
||||||
|
|
||||||
|
### Actual Behavior
|
||||||
|
- Tick Status: [STATUS]
|
||||||
|
- Last Known Good: [DATE]
|
||||||
|
- Capabilities Lost: [LIST]
|
||||||
|
|
||||||
|
### Root Cause Analysis Required
|
||||||
|
|
||||||
|
Please investigate and document:
|
||||||
|
|
||||||
|
1. **What happened?**
|
||||||
|
- Last successful operation
|
||||||
|
- Error logs (if any)
|
||||||
|
- System state at failure
|
||||||
|
|
||||||
|
2. **Why did it happen?**
|
||||||
|
- Configuration drift
|
||||||
|
- Resource exhaustion
|
||||||
|
- External dependency failure
|
||||||
|
- Code regression
|
||||||
|
|
||||||
|
3. **How do we prevent recurrence?**
|
||||||
|
- Monitoring improvements
|
||||||
|
- Automated recovery
|
||||||
|
- Alert tuning
|
||||||
|
|
||||||
|
4. **Recovery steps taken**
|
||||||
|
- Actions performed
|
||||||
|
- Current status
|
||||||
|
- Validation performed
|
||||||
|
|
||||||
|
### Timeline
|
||||||
|
|
||||||
|
- [ ] T+0: Issue created (auto)
|
||||||
|
- [ ] T+1h: Initial response
|
||||||
|
- [ ] T+24h: RCA submitted
|
||||||
|
- [ ] T+7d: Resolution verified
|
||||||
|
|
||||||
|
### Related
|
||||||
|
- Previous tick: [LINK]
|
||||||
|
- Agent config: [LINK]
|
||||||
|
- Logs: [LINK]
|
||||||
|
|
||||||
|
---
|
||||||
|
*Auto-generated by Agent Tick Monitor*
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Agent Registry
|
||||||
|
|
||||||
|
Current agents in scope:
|
||||||
|
|
||||||
|
| Agent ID | Name | Role | Owner | Status |
|
||||||
|
|----------|------|------|-------|--------|
|
||||||
|
| allegro | Allegro | tempo-and-dispatch | Alexander | active |
|
||||||
|
| adagio | Adagio | breath-and-design | Alexander | active |
|
||||||
|
| timmy | Timmy Time | father-house | Alexander | active |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation
|
||||||
|
|
||||||
|
### Files
|
||||||
|
|
||||||
|
- `scripts/agent_tick_monitor.py` - Monthly monitoring
|
||||||
|
- `scripts/agent_tick_submitter.py` - Agent self-reporting
|
||||||
|
- `templates/rca-template.md` - RCA issue template
|
||||||
|
- `config/agent-registry.json` - Agent definitions
|
||||||
|
|
||||||
|
### Cron Schedule
|
||||||
|
|
||||||
|
```cron
|
||||||
|
# Monthly tick collection - 1st of month at 00:00
|
||||||
|
0 0 1 * * /usr/bin/python3 /root/wizards/household-snapshots/scripts/agent_tick_submitter.py --all
|
||||||
|
|
||||||
|
# Tick validation and RCA trigger - 4th of month at 00:00
|
||||||
|
0 0 4 * * /usr/bin/python3 /root/wizards/household-snapshots/scripts/agent_tick_monitor.py --check-and-report
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Version History
|
||||||
|
|
||||||
|
| Version | Date | Changes |
|
||||||
|
|---------|------|---------|
|
||||||
|
| 1.0 | 2026-04-02 | Initial protocol |
|
||||||
|
|
||||||
|
---
|
||||||
|
*Evenia binds us. Health is monitored. Failures are learned from.*
|
||||||
12
heartbeats/allegro/20260402_010501.json
Normal file
12
heartbeats/allegro/20260402_010501.json
Normal file
@@ -0,0 +1,12 @@
|
|||||||
|
{
|
||||||
|
"timestamp": "2026-04-02T01:05:01.485463Z",
|
||||||
|
"wizard": "adagio",
|
||||||
|
"checked_by": "allegro",
|
||||||
|
"checks": {
|
||||||
|
"home_exists": true,
|
||||||
|
"soul_present": true,
|
||||||
|
"config_present": true,
|
||||||
|
"gateway_running": true
|
||||||
|
},
|
||||||
|
"status": "healthy"
|
||||||
|
}
|
||||||
@@ -1 +1 @@
|
|||||||
20260402_005001.json
|
20260402_010501.json
|
||||||
239
scripts/agent_tick_submitter.py
Normal file
239
scripts/agent_tick_submitter.py
Normal file
@@ -0,0 +1,239 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Agent Tick Submitter
|
||||||
|
|
||||||
|
Each agent runs this monthly to submit their health tick.
|
||||||
|
Can also be run centrally for all agents.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import json
|
||||||
|
import subprocess
|
||||||
|
import argparse
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
REPO_DIR = Path("/root/wizards/household-snapshots")
|
||||||
|
TICKS_DIR = REPO_DIR / "ticks"
|
||||||
|
REGISTRY_FILE = REPO_DIR / "config" / "agent-registry.json"
|
||||||
|
GITEA_URL = os.environ.get("CLAW_CODE_GITEA_URL", "http://143.198.27.163:3000")
|
||||||
|
GITEA_TOKEN = os.environ.get("GITEA_TOKEN", "")
|
||||||
|
|
||||||
|
def run_cmd(cmd, cwd=None):
|
||||||
|
"""Run shell command and return output."""
|
||||||
|
result = subprocess.run(
|
||||||
|
cmd, shell=True, cwd=cwd, capture_output=True, text=True
|
||||||
|
)
|
||||||
|
return result.stdout.strip(), result.stderr.strip(), result.returncode
|
||||||
|
|
||||||
|
def load_registry():
|
||||||
|
"""Load agent registry."""
|
||||||
|
with open(REGISTRY_FILE) as f:
|
||||||
|
return json.load(f)
|
||||||
|
|
||||||
|
def check_gateway_running(agent_id):
|
||||||
|
"""Check if agent's gateway is running."""
|
||||||
|
stdout, _, _ = run_cmd(f"pgrep -f 'hermes gateway.*{agent_id}' || true")
|
||||||
|
return bool(stdout)
|
||||||
|
|
||||||
|
def check_home_accessible(home_dir):
|
||||||
|
"""Check if home directory is accessible."""
|
||||||
|
return Path(home_dir).exists()
|
||||||
|
|
||||||
|
def check_config_valid(config_path):
|
||||||
|
"""Basic config validation."""
|
||||||
|
if not Path(config_path).exists():
|
||||||
|
return False
|
||||||
|
try:
|
||||||
|
with open(config_path) as f:
|
||||||
|
content = f.read()
|
||||||
|
return 'model:' in content and 'platforms:' in content
|
||||||
|
except:
|
||||||
|
return False
|
||||||
|
|
||||||
|
def get_last_interaction(agent_id):
|
||||||
|
"""Get last user interaction from logs (if available)."""
|
||||||
|
log_dir = Path(f"/root/wizards/{agent_id}/logs")
|
||||||
|
if not log_dir.exists():
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
log_files = sorted(log_dir.glob("*.log"), key=lambda x: x.stat().st_mtime, reverse=True)
|
||||||
|
if log_files:
|
||||||
|
mtime = datetime.fromtimestamp(log_files[0].stat().st_mtime)
|
||||||
|
return mtime.isoformat() + "Z"
|
||||||
|
except:
|
||||||
|
pass
|
||||||
|
return None
|
||||||
|
|
||||||
|
def count_work_items(agent_id):
|
||||||
|
"""Count work items completed this month."""
|
||||||
|
work_dir = Path(f"/root/wizards/{agent_id}/work")
|
||||||
|
if not work_dir.exists():
|
||||||
|
return 0
|
||||||
|
|
||||||
|
count = 0
|
||||||
|
current_month = datetime.utcnow().strftime("%Y-%m")
|
||||||
|
|
||||||
|
for item in work_dir.rglob("*"):
|
||||||
|
if item.is_file():
|
||||||
|
try:
|
||||||
|
mtime = datetime.fromtimestamp(item.stat().st_mtime)
|
||||||
|
if mtime.strftime("%Y-%m") == current_month:
|
||||||
|
count += 1
|
||||||
|
except:
|
||||||
|
pass
|
||||||
|
|
||||||
|
return count
|
||||||
|
|
||||||
|
def generate_tick(agent_id, registry_data):
|
||||||
|
"""Generate tick record for an agent."""
|
||||||
|
agent = None
|
||||||
|
for a in registry_data["agents"]:
|
||||||
|
if a["id"] == agent_id:
|
||||||
|
agent = a
|
||||||
|
break
|
||||||
|
|
||||||
|
if not agent:
|
||||||
|
raise ValueError(f"Agent {agent_id} not found in registry")
|
||||||
|
|
||||||
|
now = datetime.utcnow()
|
||||||
|
|
||||||
|
# Check vitals
|
||||||
|
gateway_running = check_gateway_running(agent_id)
|
||||||
|
home_accessible = check_home_accessible(agent["home_dir"])
|
||||||
|
config_valid = check_config_valid(agent["config_path"])
|
||||||
|
|
||||||
|
# Determine status
|
||||||
|
if not gateway_running or not home_accessible:
|
||||||
|
status = "critical"
|
||||||
|
elif not config_valid:
|
||||||
|
status = "degraded"
|
||||||
|
else:
|
||||||
|
status = "healthy"
|
||||||
|
|
||||||
|
# Check capabilities
|
||||||
|
capabilities = {}
|
||||||
|
for cap in agent.get("expected_capabilities", []):
|
||||||
|
if cap == "telegram":
|
||||||
|
# Check if telegram is enabled in config
|
||||||
|
try:
|
||||||
|
with open(agent["config_path"]) as f:
|
||||||
|
capabilities[cap] = "enabled: true" in f.read()
|
||||||
|
except:
|
||||||
|
capabilities[cap] = False
|
||||||
|
elif cap == "api_server":
|
||||||
|
capabilities[cap] = gateway_running
|
||||||
|
elif cap == "gitea":
|
||||||
|
capabilities[cap] = bool(GITEA_TOKEN)
|
||||||
|
else:
|
||||||
|
capabilities[cap] = True
|
||||||
|
|
||||||
|
tick = {
|
||||||
|
"agent_id": agent_id,
|
||||||
|
"agent_name": agent["name"],
|
||||||
|
"role": agent["role"],
|
||||||
|
"timestamp": now.isoformat() + "Z",
|
||||||
|
"tick_month": now.strftime("%Y-%m"),
|
||||||
|
"status": status,
|
||||||
|
"vitals": {
|
||||||
|
"gateway_running": gateway_running,
|
||||||
|
"home_directory_accessible": home_accessible,
|
||||||
|
"config_valid": config_valid,
|
||||||
|
"last_user_interaction": get_last_interaction(agent_id),
|
||||||
|
"work_items_completed_this_month": count_work_items(agent_id)
|
||||||
|
},
|
||||||
|
"capabilities": capabilities,
|
||||||
|
"notes": ""
|
||||||
|
}
|
||||||
|
|
||||||
|
return tick
|
||||||
|
|
||||||
|
def save_tick(tick):
|
||||||
|
"""Save tick to repository."""
|
||||||
|
month_dir = TICKS_DIR / tick["tick_month"]
|
||||||
|
month_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
tick_file = month_dir / f"{tick['agent_id']}.json"
|
||||||
|
|
||||||
|
with open(tick_file, 'w') as f:
|
||||||
|
json.dump(tick, f, indent=2)
|
||||||
|
|
||||||
|
return tick_file
|
||||||
|
|
||||||
|
def commit_tick(agent_id, tick_month):
|
||||||
|
"""Commit tick to Gitea."""
|
||||||
|
tick_file = TICKS_DIR / tick_month / f"{agent_id}.json"
|
||||||
|
|
||||||
|
if not tick_file.exists():
|
||||||
|
return False, "Tick file not found"
|
||||||
|
|
||||||
|
# Git operations
|
||||||
|
run_cmd("git add -A", cwd=REPO_DIR)
|
||||||
|
stdout, stderr, code = run_cmd(
|
||||||
|
f'git commit -m "Monthly tick: {agent_id} for {tick_month}"',
|
||||||
|
cwd=REPO_DIR
|
||||||
|
)
|
||||||
|
|
||||||
|
if code != 0 and "nothing to commit" not in stderr.lower():
|
||||||
|
return False, f"Commit failed: {stderr}"
|
||||||
|
|
||||||
|
# Push
|
||||||
|
stdout, stderr, code = run_cmd("git push origin main", cwd=REPO_DIR)
|
||||||
|
if code != 0:
|
||||||
|
return False, f"Push failed: {stderr}"
|
||||||
|
|
||||||
|
return True, "Tick committed successfully"
|
||||||
|
|
||||||
|
def submit_agent_tick(agent_id, registry_data, commit=True):
|
||||||
|
"""Submit tick for a single agent."""
|
||||||
|
print(f"Generating tick for {agent_id}...")
|
||||||
|
|
||||||
|
tick = generate_tick(agent_id, registry_data)
|
||||||
|
tick_file = save_tick(tick)
|
||||||
|
print(f" ✓ Tick saved: {tick_file}")
|
||||||
|
print(f" Status: {tick['status']}")
|
||||||
|
|
||||||
|
if commit:
|
||||||
|
success, msg = commit_tick(agent_id, tick["tick_month"])
|
||||||
|
if success:
|
||||||
|
print(f" ✓ Committed to Gitea")
|
||||||
|
else:
|
||||||
|
print(f" ✗ Commit failed: {msg}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
return tick
|
||||||
|
|
||||||
|
def main():
|
||||||
|
parser = argparse.ArgumentParser(description="Agent Tick Submitter")
|
||||||
|
parser.add_argument("--agent", help="Submit tick for specific agent")
|
||||||
|
parser.add_argument("--all", action="store_true", help="Submit ticks for all active agents")
|
||||||
|
parser.add_argument("--no-commit", action="store_true", help="Save locally but don't commit")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
registry = load_registry()
|
||||||
|
|
||||||
|
if args.all:
|
||||||
|
print(f"=== Submitting ticks for all agents ===")
|
||||||
|
for agent in registry["agents"]:
|
||||||
|
if agent["active"]:
|
||||||
|
submit_agent_tick(agent["id"], registry, commit=not args.no_commit)
|
||||||
|
print()
|
||||||
|
elif args.agent:
|
||||||
|
submit_agent_tick(args.agent, registry, commit=not args.no_commit)
|
||||||
|
else:
|
||||||
|
# Auto-detect current agent from hostname or env
|
||||||
|
hostname = os.uname().nodename
|
||||||
|
agent_id = hostname if any(a["id"] == hostname for a in registry["agents"]) else None
|
||||||
|
|
||||||
|
if agent_id:
|
||||||
|
submit_agent_tick(agent_id, registry, commit=not args.no_commit)
|
||||||
|
else:
|
||||||
|
print("Error: Could not auto-detect agent. Use --agent or --all")
|
||||||
|
return 1
|
||||||
|
|
||||||
|
return 0
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main())
|
||||||
Reference in New Issue
Block a user