# Agent Tick Protocol v1.0
**Purpose:** Establish baseline health monitoring for all Timmy Time agents, with root cause analysis (RCA) issues opened automatically on failure.
**Frequency:** Monthly (1st of each month)
**Scope:** All agents in Timmy Time group
---
## Protocol Specification
### What Constitutes a Tick
Every agent must emit a **Tick Record** containing:
```json
{
  "agent_id": "allegro",
  "agent_name": "Allegro",
  "role": "tempo-and-dispatch",
  "timestamp": "2026-04-02T00:00:00Z",
  "tick_month": "2026-04",
  "status": "healthy",
  "vitals": {
    "gateway_running": true,
    "home_directory_accessible": true,
    "config_valid": true,
    "last_user_interaction": "2026-04-01T23:45:00Z",
    "work_items_completed_this_month": 47
  },
  "capabilities": {
    "telegram": true,
    "api_server": true,
    "gitea_access": true
  },
  "notes": ""
}
```
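A minimal sketch of how a Tick Record could be sanity-checked before submission. The function name `validate_tick` is hypothetical, and treating every top-level field in the example as required is an assumption; the protocol does not say which fields are optional.

```python
from datetime import datetime

# Assumption: every top-level field shown in the example record is required.
REQUIRED_FIELDS = {
    "agent_id", "agent_name", "role", "timestamp", "tick_month",
    "status", "vitals", "capabilities", "notes",
}
VALID_STATUSES = {"healthy", "degraded", "critical", "offline"}


def validate_tick(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if record.get("status") not in VALID_STATUSES:
        problems.append(f"unknown status: {record.get('status')!r}")
    ts = record.get("timestamp", "")
    try:
        parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    except (TypeError, ValueError):
        problems.append(f"timestamp is not in YYYY-MM-DDTHH:MM:SSZ form: {ts!r}")
    else:
        # tick_month should be the YYYY-MM prefix of the timestamp.
        if record.get("tick_month") != parsed.strftime("%Y-%m"):
            problems.append("tick_month does not match timestamp")
    return problems
```

Returning a list of problems rather than raising on the first one lets a submitter report everything wrong with a record in a single pass.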
### Tick Status Values
| Status | Meaning | Action Required |
|--------|---------|-----------------|
| `healthy` | All systems operational | None |
| `degraded` | Some capabilities impaired | Monitor |
| `critical` | Core functionality compromised | RCA required |
| `offline` | Agent unreachable | RCA + escalation |
---
## Tick Submission
### Method 1: Direct Commit to household-snapshots
Agents commit their tick to:
```
ticks/2026-04/allegro.json
ticks/2026-04/adagio.json
```
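The path layout above can be sketched as a small helper. The names `tick_path` and `write_tick` are hypothetical and not taken from `agent_tick_submitter.py`; the sketch only writes the file, and the commit itself is left to whatever git workflow the agent uses.

```python
import json
from pathlib import Path


def tick_path(repo_root: Path, tick_month: str, agent_id: str) -> Path:
    """Build ticks/<YYYY-MM>/<agent_id>.json under the repository root."""
    return repo_root / "ticks" / tick_month / f"{agent_id}.json"


def write_tick(repo_root: Path, record: dict) -> Path:
    """Write a tick record to its expected location and return the path."""
    path = tick_path(repo_root, record["tick_month"], record["agent_id"])
    path.parent.mkdir(parents=True, exist_ok=True)  # create ticks/<YYYY-MM>/ if needed
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path
```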
### Method 2: API Endpoint (Future)
POST the tick payload to the Evenia world tick endpoint (not yet available).
---
## Monitoring & Enforcement
### Monthly Tick Collection Window
- **Opens:** 1st of month at 00:00 UTC
- **Closes:** 3rd of month at 23:59 UTC
- **Grace Period:** 72 hours (the full window; checks run on the 4th)
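As a sketch, the window can be computed from a `tick_month` string. The helper names are hypothetical. One assumption is labeled in the code: the 23:59 close is treated as covering that whole minute, so the upper bound is modeled as exclusive at 00:00 UTC on the 4th, which gives exactly 72 hours.

```python
from datetime import datetime, timezone


def collection_window(tick_month: str) -> tuple[datetime, datetime]:
    """Return (opens, closes) in UTC for a month given as 'YYYY-MM'."""
    year, month = map(int, tick_month.split("-"))
    opens = datetime(year, month, 1, tzinfo=timezone.utc)
    # Assumption: "3rd at 23:59" includes the full minute, so the exclusive
    # upper bound is midnight at the start of the 4th.
    closes = datetime(year, month, 4, tzinfo=timezone.utc)
    return opens, closes


def in_window(timestamp: str, tick_month: str) -> bool:
    """Check whether a tick timestamp (YYYY-MM-DDTHH:MM:SSZ) falls inside the window."""
    ts = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    opens, closes = collection_window(tick_month)
    return opens <= ts < closes
```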
### Automated Checks
1. **Tick Presence Check** (4th of month)
   - Verify all registered agents have submitted ticks
   - Missing ticks → Gitea issue created
2. **Status Validation** (4th of month)
   - Check all submitted ticks for `critical` or `offline` status
   - Failed status → Gitea issue created
3. **RCA Auto-Generation** (4th of month)
   - Issues created with RCA template pre-filled
   - Assigned to agent owner
   - Due date: 7 days
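The presence and status checks above could be sketched as a single pass over the registry. This is an illustration, not the contents of `agent_tick_monitor.py`: the function name `find_failures` and the in-memory `ticks` mapping are assumptions, and Gitea issue creation is deliberately left out. The failure-type labels are the ones used in the RCA template.

```python
def find_failures(registered_ids: list[str], ticks: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (agent_id, failure_type) pairs for agents that need an RCA issue.

    `ticks` maps agent_id to that agent's submitted tick record for the month.
    """
    failures = []
    for agent_id in registered_ids:
        tick = ticks.get(agent_id)
        if tick is None:
            # Tick Presence Check: no record was submitted in the window.
            failures.append((agent_id, "MISSING_TICK"))
        elif tick.get("status") == "critical":
            # Status Validation: core functionality compromised.
            failures.append((agent_id, "CRITICAL_STATUS"))
        elif tick.get("status") == "offline":
            # Status Validation: agent unreachable.
            failures.append((agent_id, "OFFLINE"))
    return failures
```

For example, with `allegro` healthy, `adagio` critical, and no tick from `timmy`, the result lists `adagio` and `timmy` only; `degraded` ticks are not flagged, matching the "Monitor" action in the status table.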
---
## RCA Template
When an agent fails to tick or reports `critical` or `offline` status, this template is used:
```markdown
## Agent Health Failure: [AGENT_ID]
**Detected:** [DATE]
**Agent:** [AGENT_NAME]
**Failure Type:** [MISSING_TICK | CRITICAL_STATUS | OFFLINE]
### Expected Behavior
Agent should emit monthly tick within 72-hour window.
### Actual Behavior
- Tick Status: [STATUS]
- Last Known Good: [DATE]
- Capabilities Lost: [LIST]
### Root Cause Analysis Required
Please investigate and document:
1. **What happened?**
   - Last successful operation
   - Error logs (if any)
   - System state at failure
2. **Why did it happen?**
   - Configuration drift
   - Resource exhaustion
   - External dependency failure
   - Code regression
3. **How do we prevent recurrence?**
   - Monitoring improvements
   - Automated recovery
   - Alert tuning
4. **Recovery steps taken**
   - Actions performed
   - Current status
   - Validation performed
### Timeline
- [ ] T+0: Issue created (auto)
- [ ] T+1h: Initial response
- [ ] T+24h: RCA submitted
- [ ] T+7d: Resolution verified
### Related
- Previous tick: [LINK]
- Agent config: [LINK]
- Logs: [LINK]
---
*Auto-generated by Agent Tick Monitor*
```
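A hedged sketch of how the template's placeholders might be pre-filled. The function `fill_rca_template` is hypothetical. It substitutes only the placeholders that have a single meaning in the template; `[DATE]`, `[LINK]`, and `[LIST]` are left untouched because `[DATE]` and `[LINK]` each appear in more than one role, and a plain string replacement would give every occurrence the same value.

```python
FAILURE_TYPE_PLACEHOLDER = "[MISSING_TICK | CRITICAL_STATUS | OFFLINE]"


def fill_rca_template(template: str, agent_id: str, agent_name: str,
                      failure_type: str, status: str) -> str:
    """Substitute the unambiguous placeholders in the RCA template text."""
    replacements = {
        "[AGENT_ID]": agent_id,
        "[AGENT_NAME]": agent_name,
        FAILURE_TYPE_PLACEHOLDER: failure_type,
        "[STATUS]": status,
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template
```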
---
## Agent Registry
Current agents in scope:
| Agent ID | Name | Role | Owner | Status |
|----------|------|------|-------|--------|
| allegro | Allegro | tempo-and-dispatch | Alexander | active |
| adagio | Adagio | breath-and-design | Alexander | active |
| timmy | Timmy Time | father-house | Alexander | active |
---
## Implementation
### Files
- `scripts/agent_tick_monitor.py` - Monthly monitoring
- `scripts/agent_tick_submitter.py` - Agent self-reporting
- `templates/rca-template.md` - RCA issue template
- `config/agent-registry.json` - Agent definitions
### Cron Schedule
```cron
# Monthly tick collection - 1st of month at 00:00 (cron uses system local time; the protocol window is defined in UTC)
0 0 1 * * /usr/bin/python3 /root/wizards/household-snapshots/scripts/agent_tick_submitter.py --all
# Tick validation and RCA trigger - 4th of month at 00:00
0 0 4 * * /usr/bin/python3 /root/wizards/household-snapshots/scripts/agent_tick_monitor.py --check-and-report
```
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-04-02 | Initial protocol |
---
*Evenia binds us. Health is monitored. Failures are learned from.*