# Agent Tick Protocol v1.0

**Purpose:** Establish baseline health monitoring for all Timmy Time agents, with automated root cause analysis (RCA) on failure.

**Frequency:** Monthly (1st of each month)

**Scope:** All agents in the Timmy Time group

---

## Protocol Specification

### What Constitutes a Tick

Every agent must emit a **Tick Record** containing:

```json
{
  "agent_id": "allegro",
  "agent_name": "Allegro",
  "role": "tempo-and-dispatch",
  "timestamp": "2026-04-02T00:00:00Z",
  "tick_month": "2026-04",
  "status": "healthy",
  "vitals": {
    "gateway_running": true,
    "home_directory_accessible": true,
    "config_valid": true,
    "last_user_interaction": "2026-04-01T23:45:00Z",
    "work_items_completed_this_month": 47
  },
  "capabilities": {
    "telegram": true,
    "api_server": true,
    "gitea_access": true
  },
  "notes": ""
}
```
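
A record like the one above can be checked before it is committed. The following is a minimal sketch in Python, assuming every top-level field in the example is required and that `tick_month` must name the month of `timestamp`; `validate_tick` is a hypothetical helper, not an existing function in `agent_tick_submitter.py`.

```python
from datetime import datetime

# Assumption: every top-level field in the example Tick Record is required.
REQUIRED_FIELDS = {
    "agent_id", "agent_name", "role", "timestamp", "tick_month",
    "status", "vitals", "capabilities", "notes",
}
VALID_STATUSES = {"healthy", "degraded", "critical", "offline"}


def validate_tick(record: dict) -> list:
    """Return a list of problems with a tick record; empty means valid."""
    problems = []
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if "status" in record and record["status"] not in VALID_STATUSES:
        problems.append(f"unknown status: {record['status']!r}")
    if "timestamp" in record:
        try:
            # datetime.fromisoformat() accepts a trailing "Z" only on Python
            # 3.11+, so rewrite it as an explicit UTC offset first.
            ts = datetime.fromisoformat(str(record["timestamp"]).replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"unparseable timestamp: {record['timestamp']!r}")
        else:
            # Assumption: tick_month names the month the tick was emitted in.
            if record.get("tick_month") != ts.strftime("%Y-%m"):
                problems.append("tick_month does not match timestamp")
    return problems
```

Returning a list of problems rather than raising on the first one lets a submitter report everything wrong with a record in a single pass.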

### Tick Status Values

| Status | Meaning | Action Required |
|--------|---------|-----------------|
| `healthy` | All systems operational | None |
| `degraded` | Some capabilities impaired | Monitor |
| `critical` | Core functionality compromised | RCA required |
| `offline` | Agent unreachable | RCA + escalation |
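
The status table can be encoded once so that the submitter and the monitor agree on which statuses trigger an RCA. This is a sketch assuming the four status strings are the only valid values; `TickStatus`, `ACTIONS`, `requires_rca`, and `requires_escalation` are hypothetical names.

```python
from enum import Enum


class TickStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    CRITICAL = "critical"
    OFFLINE = "offline"


# "Action Required" column of the status table, keyed by status.
ACTIONS = {
    TickStatus.HEALTHY: "None",
    TickStatus.DEGRADED: "Monitor",
    TickStatus.CRITICAL: "RCA required",
    TickStatus.OFFLINE: "RCA + escalation",
}


def requires_rca(status: str) -> bool:
    """True for statuses that trigger an RCA issue (critical, offline).

    Raises ValueError for a string that is not one of the four statuses.
    """
    return TickStatus(status) in (TickStatus.CRITICAL, TickStatus.OFFLINE)


def requires_escalation(status: str) -> bool:
    """True only for offline agents, which also need escalation."""
    return TickStatus(status) is TickStatus.OFFLINE
```

Converting through the enum makes an unknown status fail loudly instead of silently being treated as healthy.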

---

## Tick Submission

### Method 1: Direct Commit to household-snapshots

Agents commit their tick to the household-snapshots repository at `ticks/<YYYY-MM>/<agent_id>.json`, for example:

```
ticks/2026-04/allegro.json
ticks/2026-04/adagio.json
```
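
A submitter only needs to derive this path from the record and write the file before committing. The sketch below assumes the record carries `tick_month` and `agent_id` as in the Tick Record example; `tick_path` and `write_tick` are hypothetical helpers, and the git commit and push are deliberately not shown.

```python
import json
from pathlib import Path


def tick_path(repo_root: Path, tick_month: str, agent_id: str) -> Path:
    """Build ticks/<YYYY-MM>/<agent_id>.json under the repository root."""
    return repo_root / "ticks" / tick_month / f"{agent_id}.json"


def write_tick(repo_root: Path, record: dict) -> Path:
    """Write a tick record to its conventional path and return that path.

    Committing and pushing the file is left to git and is not done here.
    """
    path = tick_path(repo_root, record["tick_month"], record["agent_id"])
    path.parent.mkdir(parents=True, exist_ok=True)  # create ticks/<month>/ if needed
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path
```

Deriving the path from the record itself keeps the file location and the `tick_month` field from drifting apart.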

### Method 2: API Endpoint (Future)

Agents will POST their Tick Record as the request payload to the Evenia world tick endpoint.

---

## Monitoring & Enforcement

### Monthly Tick Collection Window

- **Opens:** 1st of month at 00:00 UTC
- **Closes:** 3rd of month at 23:59 UTC
- **Grace Period:** 72 hours, i.e. the length of the window itself; automated checks run on the 4th
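
The window can be modelled as a half-open UTC interval. This sketch assumes a tick counts if its timestamp is at or after 00:00 UTC on the 1st and strictly before 00:00 UTC on the 4th, which folds the 23:59 close into a clean 72-hour span; `collection_window` and `in_window` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def collection_window(year: int, month: int) -> tuple[datetime, datetime]:
    """Return (opens, closes) for a month's tick window in UTC.

    Modelled as [1st 00:00, 4th 00:00), i.e. 72 hours, so anything up to
    and including 23:59:59 on the 3rd is inside the window.
    """
    opens = datetime(year, month, 1, tzinfo=timezone.utc)
    closes = opens + timedelta(hours=72)
    return opens, closes


def in_window(ts: datetime, year: int, month: int) -> bool:
    """True if a timezone-aware timestamp falls inside that month's window."""
    opens, closes = collection_window(year, month)
    return opens <= ts < closes
```

A half-open interval avoids arguing about whether 23:59:30 on the 3rd is late, and it ends exactly when the 4th-of-month checks start.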

### Automated Checks

1. **Tick Presence Check** (4th of month)
   - Verify all registered agents have submitted ticks
   - Missing ticks → Gitea issue created

2. **Status Validation** (4th of month)
   - Check all submitted ticks for `critical` or `offline` status
   - Failed status → Gitea issue created

3. **RCA Auto-Generation** (4th of month)
   - Issues created with RCA template pre-filled
   - Assigned to agent owner
   - Due date: 7 days
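
The presence check and status validation reduce to one pass over the registry. This sketch assumes the registry has already been loaded into a mapping of active `agent_id` to owner and the month's ticks into a mapping of `agent_id` to record; `IssueDraft` and `run_checks` are hypothetical, and actually creating the Gitea issue is out of scope.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class IssueDraft:
    """Stand-in for a Gitea issue; nothing is sent to Gitea here."""
    agent_id: str
    failure_type: str  # one of MISSING_TICK, CRITICAL_STATUS, OFFLINE
    assignee: str
    due: date


def run_checks(registry: dict[str, str], ticks: dict[str, dict],
               run_date: date) -> list[IssueDraft]:
    """Presence check plus status validation for one month.

    registry: active agent_id -> owner; ticks: agent_id -> submitted record.
    """
    due = run_date + timedelta(days=7)  # "Due date: 7 days"
    issues = []
    for agent_id, owner in sorted(registry.items()):
        tick = ticks.get(agent_id)
        if tick is None:
            failure = "MISSING_TICK"
        elif tick.get("status") == "critical":
            failure = "CRITICAL_STATUS"
        elif tick.get("status") == "offline":
            failure = "OFFLINE"
        else:
            continue  # healthy or degraded: no issue
        issues.append(IssueDraft(agent_id, failure, owner, due))
    return issues
```

The failure types reuse the values of the RCA template's **Failure Type** field, so each draft maps onto exactly one RCA issue.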

---

## RCA Template

When an agent fails to submit a tick or reports `critical` or `offline` status, the monitor opens an issue from this template:

```markdown
## Agent Health Failure: [AGENT_ID]

**Detected:** [DATE]
**Agent:** [AGENT_NAME]
**Failure Type:** [MISSING_TICK | CRITICAL_STATUS | OFFLINE]

### Expected Behavior
The agent should emit its monthly tick within the 72-hour collection window.

### Actual Behavior
- Tick Status: [STATUS]
- Last Known Good: [DATE]
- Capabilities Lost: [LIST]

### Root Cause Analysis Required

Please investigate and document:

1. **What happened?**
   - Last successful operation
   - Error logs (if any)
   - System state at failure

2. **Why did it happen?**
   - Configuration drift
   - Resource exhaustion
   - External dependency failure
   - Code regression

3. **How do we prevent recurrence?**
   - Monitoring improvements
   - Automated recovery
   - Alert tuning

4. **Recovery steps taken**
   - Actions performed
   - Current status
   - Validation performed

### Timeline

- [ ] T+0: Issue created (auto)
- [ ] T+1h: Initial response
- [ ] T+24h: RCA submitted
- [ ] T+7d: Resolution verified

### Related
- Previous tick: [LINK]
- Agent config: [LINK]
- Logs: [LINK]

---
*Auto-generated by Agent Tick Monitor*
```
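
Pre-filling the header of this template is plain string substitution. A minimal sketch, assuming the placeholders appear exactly as written in the template; `fill_rca_header` is a hypothetical helper. Only the first `[DATE]` (the **Detected** field) is filled, while `[STATUS]`, `[LIST]`, `[LINK]`, and the **Last Known Good** date are left for the owner.

```python
from datetime import date

FAILURE_TYPES = ("MISSING_TICK", "CRITICAL_STATUS", "OFFLINE")


def fill_rca_header(template: str, agent_id: str, agent_name: str,
                    failure_type: str, detected: date) -> str:
    """Fill the identifying fields of the RCA template (hypothetical helper)."""
    if failure_type not in FAILURE_TYPES:
        raise ValueError(f"unknown failure type: {failure_type!r}")
    return (
        template
        .replace("[AGENT_ID]", agent_id)
        .replace("[AGENT_NAME]", agent_name)
        # Only the first [DATE] (Detected) is filled; the second one,
        # "Last Known Good", is left for the agent owner.
        .replace("[DATE]", detected.isoformat(), 1)
        .replace("[MISSING_TICK | CRITICAL_STATUS | OFFLINE]", failure_type)
    )
```

In practice the template string would be read from `templates/rca-template.md`.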

---

## Agent Registry

Current agents in scope:

| Agent ID | Name | Role | Owner | Status |
|----------|------|------|-------|--------|
| allegro | Allegro | tempo-and-dispatch | Alexander | active |
| adagio | Adagio | breath-and-design | Alexander | active |
| timmy | Timmy Time | father-house | Alexander | active |

---

## Implementation

### Files

- `scripts/agent_tick_monitor.py` - Monthly monitoring
- `scripts/agent_tick_submitter.py` - Agent self-reporting
- `templates/rca-template.md` - RCA issue template
- `config/agent-registry.json` - Agent definitions

### Cron Schedule

```cron
# Cron interprets these times in the host's local timezone; the host is assumed
# to run in UTC so the jobs line up with the 00:00 UTC collection window.

# Monthly tick collection - 1st of month at 00:00
0 0 1 * * /usr/bin/python3 /root/wizards/household-snapshots/scripts/agent_tick_submitter.py --all

# Tick validation and RCA trigger - 4th of month at 00:00
0 0 4 * * /usr/bin/python3 /root/wizards/household-snapshots/scripts/agent_tick_monitor.py --check-and-report
```

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-04-02 | Initial protocol |

---

*Evenia binds us. Health is monitored. Failures are learned from.*