Compare commits

..

2 Commits

Author SHA1 Message Date
Alexander Payne
2485b7a708 docs: add fleet operations runbook for operators
Some checks failed
Self-Healing Smoke / self-healing-smoke (pull_request) Failing after 26s
Agent PR Gate / gate (pull_request) Failing after 1m4s
Smoke Test / smoke (pull_request) Failing after 31s
Agent PR Gate / report (pull_request) Successful in 25s
- Daily and weekly checklists
- Alert response protocol (<30min critical, <4h warnings)
- Common fixes: restart tmux, clear dispatch queue, update hermes
- Emergency escalation contacts
- Security rules and contact references
2026-04-30 20:21:57 -04:00
Alexander Payne
84831942ed fix(#987): add fleet operator incentives and partner program spec
- Operator role definition, compensation model with bonuses
- Partner program: 20% commission for 12 months per referral
- Quality standards, onboarding certification (4 phases)
- Exit and transition protocol
- Templates: operator-application.md, partner-report.md

Partial implementation of #987
2026-04-30 20:19:31 -04:00
4 changed files with 215 additions and 446 deletions

View File

@@ -1,93 +1,128 @@
# Fleet Operator Incentives Program
# Fleet Operator Incentives & Partner Program
*Epic IV — Human Capital & Incentives (Mogul Influence roadmap steps XII, XIII, XV)*
## Overview
## Operator Role Definition
### Primary Responsibilities
- Deploy and maintain sovereign AI agent fleets on VPS nodes
- Monitor fleet health, uptime, and performance metrics
- Execute dispatched tasks from the Timmy Foundation (burn sessions, cron jobs, PR merges)
- Maintain fleet identity registry and rotate credentials per security policy
- Report operational metrics weekly (uptime %, completed tasks, resource usage)
This specification defines the incentive structure and certification program for Timmy Home fleet operators. The goal is to build a reliable, high-performing distributed fleet network through aligned economic incentives and rigorous operator certification.
### Qualifications
- Linux system administration (systemd, ssh, git, basic networking)
- Familiarity with AI agent frameworks (Hermes Agent preferred)
- Reliable VPS infrastructure (minimum: 2 vCPU, 4GB RAM, 50GB SSD)
- Stable internet connection with <50ms latency to foundation services
## Program Objectives
- Recruit and retain 3-5 active certified operators within 6 months
- Maintain operator churn <10% annually
- Achieve fleet uptime >99.5%
- Ensure partner channel delivers >30% of leads
## Operator Tiers & Requirements
### Tier 1: Certified Operator
- Complete operator application and training
- Maintain minimum hardware specifications
- Agree to SLAs and monitoring
- Pass technical assessment
### Tier 2: Senior Operator
- 6+ months active participation
- Uptime >99.7%
- Mentor at least 1 new operator
- Advanced troubleshooting capabilities
### Tier 3: Fleet Lead
- 12+ months active participation
- Uptime >99.9%
- Team lead responsibilities
- Strategic input on fleet improvements
## Incentive Structure
### Base Compensation
- Tier 1: $X/month per active node
- Tier 2: $Y/month per active node (+15% bonus)
- Tier 3: $Z/month per active node (+30% bonus)
## Compensation Model
### Base Rate
- **$150/month** per operator for up to 5 VPS nodes managed
- Additional $25/month per node beyond 5 (max 10 nodes per operator)
### Performance Bonuses
- Uptime bonus: Additional 5% for >99.5% monthly uptime
- Lead generation bonus: $100 per qualified lead from operator network
- Mentorship bonus: $200/month per successfully onboarded mentee
| Metric | Target | Bonus |
|--------|---------|-------|
| Fleet uptime | >99.5% monthly | +$50 |
| Task completion rate | >95% successful dispatches | +$30 |
| Response time | <30min for critical alerts | +$20 |
| Churn prevention | Retain operators 6+ months | +$100 quarterly |
### Penalties & Adjustments
- Downtime deductions: Prorated based on SLA breach
- Early termination fees: 50% of commitment period value
- Performance improvement plan for chronic underperformance
### Payment Schedule
- Monthly via stablecoin (USDC/USDT) on preferred chain
- Bonuses paid within 7 days of month-end verification
- Operators provide wallet address during onboarding
## Certification Process
## Partner Program (20% Commission)
### Partner Role
- Refer new operators to the Timmy Foundation fleet
- Earn 20% of operator base compensation for first 12 months
- Provide mentorship during operator onboarding (first 30 days)
1. Application submission (operator-application.md template)
2. Technical screening and hardware validation
3. Training completion (modules & hands-on)
4. Assessment exam (minimum 80% score)
5. Probation period (30 days)
6. Full certification
### Commission Structure
- New operator base $150/mo → Partner earns $30/mo for 12 months
- Bonus performance passes through (partner earns 20% of operator bonuses)
- Minimum: 2 qualifying operators referred before earning partner status
## Monitoring & Metrics
### Partner Requirements
- Must be certified operator for 3+ months with >99% uptime
- Maintain active communication with referred operators
- Submit monthly partner report (format: `specs/templates/partner-report.md`)
- Real-time uptime monitoring via Prometheus/Grafana
- Monthly performance reports
- Quarterly business reviews for senior operators
- Automated alerting for SLA breaches
## Quality Standards
### Operational Standards
- [ ] Fleet uptime ≥99.5% monthly
- [ ] Critical alerts acknowledged within 30 minutes
- [ ] Security: no credential reuse across nodes
- [ ] Weekly metrics report submitted by Monday 09:00 UTC
- [ ] Adhere to sovereign AI principles (no data exfiltration, local-first)
## Partner Program Integration
### Code Quality (for agent modifications)
- [ ] All changes committed with signed-off-by
- [ ] PRs reference Gitea issue/modal number
- [ ] Tests pass before merge (where applicable)
- [ ] No hardcoded secrets in commits
- Certified operators become partner channel participants
- Operators receive referral commissions
- Partner leads tracked through dedicated attribution system
- Monthly partner reports generated (partner-report.md template)
### Communication Standards
- [ ] Respond to Timmy Foundation pings within 24 hours
- [ ] Use professional, concise language in issues/PRs
- [ ] Report outages immediately via Telegram/Discord alert channel
## Success Criteria
## Onboarding & Certification
### Phase 1: Application
- Submit operator application (template: `specs/templates/operator-application.md`)
- Provide VPS specifications and location
- Sign operator agreement
- 3-5 active certified operators by month 6
- Annual churn rate <10%
- Fleet-wide uptime >99.5%
- Partner channel contribution >30% of new leads
### Phase 2: Training
- Complete Hermes Agent training (5 modules)
- Pass fleet operations quiz (80% passing score)
- Shadow certified operator for 1 week
## Roadmap
### Phase 3: Certification
- Deploy 2-node test fleet
- Successfully complete 10 dispatched tasks
- Certified operator reviews and signs off
**Month 1-2:** Launch pilot program with 2 operators
**Month 3-4:** Scale to 5 operators, refine processes
**Month 5-6:** Optimize incentives, expand partner integration
### Phase 4: Active Status
- Added to operator registry
- Granted access to fleet management tools
- Begin earning base compensation
## Appendix
## Exit & Transition Protocol
### Voluntary Exit
1. Submit 30-day notice via Gitea issue label `exit-notice`
2. Complete transition checklist:
- [ ] Transfer all node access to Foundation or successor
- [ ] Hand over active tasks in progress
- [ ] Return any Foundation-owned credentials/hardware
- [ ] Final metrics report submitted
3. Receive exit payment within 7 days
- Operator agreement template
- SLA definitions and metrics
- Hardware requirements document
- Training curriculum outline
- Support escalation procedures
### Involuntary Termination (for cause)
- Repeated uptime <97% (3 consecutive months)
- Security breach or credential exposure
- Violation of sovereign AI principles
- Unresponsive >72 hours without prior notice
Terminated operators:
- Access revoked immediately
- Final payment pro-rated to last active day
- May reapply after 6 months with improvement plan
### Succession Planning
- Each operator mentors 1 junior operator within first 6 months
- Documentation of all processes in `specs/fleet-ops-runbook.md`
- No single point of failure: min 2 operators per region
## Success Criteria (6-Month Targets)
- [ ] 3-5 active certified operators
- [ ] Operator churn <10% annually
- [ ] Fleet uptime >99.5%
- [ ] Partner channel >30% of new operator leads
## References
- Parent epic: Mogul Influence 17-step roadmap (steps XII, XIII, XV)
- Issue: #987
- Templates: `specs/templates/operator-*.md`
- Runbook: `specs/fleet-ops-runbook.md` (future)

View File

@@ -1,161 +1,59 @@
# Fleet Operations Runbook
*Standard operating procedures for Timmy Foundation fleet operators*
## Emergency Procedures
## Daily Checklist
- [ ] Check fleet health: `tmux list-sessions` (should show BURN, BURN2, FORGE active)
- [ ] Verify gateway running: `systemctl status ai.hermes.gateway --no-pager`
- [ ] Check disk space: `df -h /` (keep >15% free)
- [ ] Review overnight cron results in `~/.hermes/cron/jobs/`
### System Outage Response
## Weekly Tasks
- [ ] Generate fleet metrics report (`scripts/fleet-metrics.sh`)
- [ ] Rotate any expired credentials (check `~/.hermes/fleet-dispatch-state.json`)
- [ ] Review open PRs in Timmy Foundation repos
- [ ] Submit weekly report by Monday 09:00 UTC
**Severity 1 (Total Outage)**
- Immediate: Alert all on-call operators via PagerDuty
- Within 15min: Incident commander declared, communication channel established
- Within 1hr: Root cause identified or escalation to engineering
- Resolution: Post-mortem within 24 hours
## Alert Response Protocol
### Critical (respond <30 min)
1. Gateway down: `sudo systemctl restart ai.hermes.gateway`
2. Disk >90% full: `scripts/cleanup-disk.sh`
3. Fleet dispatch failing: check `/tmp/hermes/dispatch-queue.json`
**Severity 2 (Partial Degradation)**
- Alert within 30min
- Diagnosis within 2 hours
- Resolution or workaround within 4 hours
### Warning (respond <4 hours)
1. Uptime <99.5%: investigate tmux panes with `tmux attach -t BURN`
2. Failed cron jobs: check logs in `~/.hermes/cron/jobs/`
3. Agent loop errors: review session transcripts
**Severity 3 (Minor Issues)**
- Ticket creation in incident tracker
- Resolution within 24 hours
### Hardware Failure
1. **Node Failure Detection**
- Automated monitoring alerts when node >5min offline
- Operator SMS/email notification
- Auto-escalation if no response within 10min
2. **Recovery Steps**
- Soft reboot attempt via remote management
- If unsuccessful, dispatch field technician (on-call schedule)
- Provision replacement node if repair >4hrs
- Update incident log with ETA and status
3. **Post-Recovery**
- Root cause analysis
- Hardware replacement if faulty
- Configuration drift detection and remediation
### Network Disruption
- **Provider Outage**: Switch to backup ISP (if available), notify customers of degraded service
- **Local Network Issues**: Verify local routing, contact site operator for physical inspection
- **DNS Issues**: Switch to secondary DNS, monitor for propagation
## Daily Operations
### Morning Checks (08:00 UTC)
- Review overnight alert summary
- Verify all nodes reported healthy in last 24hrs
- Check capacity utilization trends
- Review pending maintenance windows
### Ongoing Monitoring
- Dashboard: `https://monitoring.timmyfoundation.org/fleet`
- Slack channel: `#fleet-operations`
- PagerDuty schedule: rotate weekly among Tier 3 operators
### Handoff Procedure
- Outgoing operator: Complete handoff checklist by end of shift
- Incoming operator: Review log, verify all systems nominal
- Both parties: Sign off in runbook log
## Maintenance Windows
- **Weekly**: Software updates (Sunday 02:00-04:00 UTC)
- **Monthly**: Hardware inspection and cleaning
- **Quarterly**: Full system audit and capacity planning
## Escalation Path
```
Operator (Tier 1) → Senior Operator (Tier 2) → Fleet Lead (Tier 3)
Engineering On-Call (P0-P1 incidents)
CTO / Executive Review (P0 incidents, business critical)
## Common Fixes
### Restart stuck tmux pane
```bash
tmux send-keys -t BURN:0 C-c
tmux send-keys -t BURN:0 "hermes chat --yolo" Enter
```
## Communication Templates
### Outage Notification (Customer-Facing)
```
Subject: Service Disruption Notification
Dear Customer,
We are currently experiencing an issue affecting [service]. Our team is investigating and working to restore service as quickly as possible.
Estimated time to resolution: [ETA]
Next update: [time]
We apologize for the inconvenience and appreciate your patience.
Timmy Operations Team
### Clear dispatch queue
```bash
rm /tmp/hermes/dispatch-queue.json
# Watchdog will recreate on next cycle
```
### Internal Alert
```
🚨 FLEET INCIDENT: [SEVERITY] - [NODE/SERVICE]
Impact: [description]
Action: [immediate action required]
Owner: [assigned operator]
ETA: [estimated resolution time]
Link to incident: [URL]
### Update hermes-agent
```bash
cd ~/hermes-agent && git pull origin main && pip install -e ".[all]"
```
## Documentation
## Emergency Escalation
- **Telegram**: @Rockachopa (primary)
- **Gitea Issue**: label `operator-alert` + mention @Rockachopa
- **Discord**: #fleet-ops-alerts channel
- Architecture diagrams: `docs/architecture/`
- Configuration management: `docs/config/`
- Operator handbook: `specs/fleet-operator-incentives.md`
- Compliance checklist: `docs/compliance/`
## Security Rules
- Never share VPS SSH keys
- Never commit credentials to git
- Rotate tokens every 90 days
- Report suspicious activity immediately
## Support Contacts
- **Engineering On-Call**: `pagerduty://schedule/engineering`
- **Network Provider**: `support@provider.com / 1-800-SUPPORT`
- **Hardware Vendor**: `support@vendor.com / 1-800-HARDWARE`
- **Internal Fleet Slack**: `#fleet-operations`
## Recovery Objectives (RTO/RPO)
| Service | RTO | RPO |
|---------|-----|-----|
| API Services | 15min | 5min |
| Data Pipeline | 1hr | 15min |
| Monitoring | 30min | N/A |
| Backup Systems | 4hr | 24hr |
## Change Management
- All production changes require RFC and approval
- Emergency changes: Document rationale, notify within 24hrs
- Standard changes: Weekly change window (Wednesday 22:00 UTC)
- Post-change validation required for all modifications
## Security Incidents
- Immediate isolation of affected nodes
- Preserve logs for forensic analysis
- Notify security team within 15min
- Follow incident response playbook: `docs/security/incident-response.md`
## Metrics & KPIs
- **MTTR**: Mean time to recovery
- **Uptime**: Node and service availability percentages
- **Capacity**: Utilization vs. provisioned resources
- **Customer Impact**: Number of affected customers per incident
## Appendix
- Outage history log
- Maintenance schedule
- Vendor contact list
- Compliance audit checklist
## Contact
- **Operator Handbook**: `specs/fleet-operator-incentives.md`
- **Templates**: `specs/templates/operator-*.md`
- **Foundation Forge**: https://forge.alexanderwhitestone.com/Timmy_Foundation

View File

@@ -1,112 +1,44 @@
# Fleet Operator Application
*Submit completed form as a new Gitea issue with label `operator-application`*
## Personal Information
- **Name / Handle**:
- **Contact Email**:
- **Telegram/Discord Handle**:
- **Wallet Address (USDC/USDT)**:
- **Timezone**:
**Full Name:**
**Email:**
**Phone:**
**Location (City, State/Province, Country):**
**Time Zone:**
## Infrastructure
- **VPS Provider**: (e.g., DigitalOcean, Vultr, Hetzner)
- **Server Location**: (datacenter region)
- **Specs**: vCPU count, RAM, Storage, Bandwidth
- **OS**: (Ubuntu 22.04 LTS preferred)
- **Static IP**: Yes / No
## Business Entity
## Experience
- [ ] Linux system administration (2+ years)
- [ ] Git / GitHub / Gitea usage
- [ ] Docker / container orchestration
- [ ] AI agent frameworks (Hermes, OpenAI, etc.)
- [ ] Prior VPS fleet management
**Legal Structure:** (Sole Proprietor / LLC / Corporation / Other)
**Business Registration Number:**
**Tax ID/EIN:**
**Years in Operation:**
### Relevant Experience (describe)
*Briefly describe your background with fleet ops, sysadmin, or AI agents:*
## Technical Capabilities
### Infrastructure
- **Number of Nodes Available:** __________
- **Hardware Specifications (per node):**
- CPU: __________
- RAM: __________
- Storage: __________
- Network: __________
- **Uptime History (past 12 months):** __________%
- **Average Monthly Downtime:** __________ hours
### Connectivity
- **Primary ISP:** __________
- **Backup ISP:** __________ (Yes/No)
- **Average Upload Speed:** __________ Mbps
- **Average Download Speed:** __________ Mbps
- **Latency to primary regions:** __________ ms
### Security & Compliance
- **Physical Security Measures:** (e.g., locked racks, cameras)
- **Network Security:** (firewalls, VPNs, monitoring)
- **Data Privacy Compliance:** (GDPR, CCPA, etc.)
- **Insurance Coverage:** (liability, errors & omissions)
## Operational Capacity
**Support Hours:** __________ (24/7 / Business Hours / On-call)
**Staff Count:** __________ (Full-time / Part-time)
**Incident Response SLA:** __________
**Monitoring Tools Used:** __________
## Financial Terms
**Desired Compensation Model:** (Tier 1 / Tier 2 / Tier 3)
**Expected Monthly Revenue:** $__________
**Start Date Availability:** __________
**Commitment Period:** (6 months / 12 months / 24 months)
## Commitment
- **Hours per week available**:
- **Can maintain 99.5% uptime?** Yes / No
- **Agree to 30-day notice for exit?** Yes / No
- **Agree to sovereign AI principles (no data exfiltration)?** Yes / No
## References
- GitHub/Gitea username:
- Any prior work with Timmy Foundation? (link issues/PRs)
**Previous Fleet/Customer References:**
1. Name: __________ | Contact: __________ | Relationship: __________
2. Name: __________ | Contact: __________ | Relationship: __________
## Acknowledgment
I understand I will start at $150/month base rate, with bonuses available for performance. I agree to the Quality Standards and Exit Protocol defined in `specs/fleet-operator-incentives.md`.
**Technical References:**
1. Name: __________ | Contact: __________ | Relationship: __________
**Signature** (type name): _________________ **Date**: _________
## Certifications
- [ ] AWS/Azure/GCP Certification
- [ ] Network+ / Security+
- [ ] ISO 27001
- [ ] SOC 2
- [ ] Other: __________
## Motivation & Alignment
**Why do you want to join the Timmy Home Fleet?** (max 500 words)
**How does your operation align with our values of reliability, transparency, and continuous improvement?** (max 300 words)
## Attachments
- [ ] Proof of business registration
- [ ] Insurance certificates
- [ ] Network performance reports (last 3 months)
- [ ] Hardware inventory list
- [ ] Signed NDA (if not already on file)
## Agreement
By submitting this application, I certify that all information provided is accurate and complete. I understand that false statements may result in termination of the operator agreement.
**Signature:** _________________________
**Date:** _________________________
## Internal Use Only (Timmy Home Team)
- **Application Received:** __________
- **Initial Screening:** __________ (Pass/Fail) by __________
- **Technical Review:** __________ (Pass/Fail) by __________
- **Site Visit/Remote Inspection:** __________ (Completed/Dates)
- **Certification Assigned:** __________ (Tier 1 / Tier 2 / Tier 3)
- **Onboarding Date:** __________
- **Mentor Assigned:** __________
- **Operational Start Date:** __________
**Notes:**
__________
__________
---
*Send completed application to: https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-home/issues/new*

View File

@@ -1,134 +1,38 @@
# Partner Monthly Report
*Submit by the 5th of each month for commission payments*
## Report Period
## Partner Info
- **Partner Name**:
- **Month/Year**:
- **Wallet Address**:
**Month/Year:** __________
**Partner ID:** __________
**Partner Name:** __________
**Report Generated:** __________
## Referred Operators
| Operator Handle | Start Date | Monthly Base | Commission (20%) | Status |
|----------------|------------|--------------|-------------------|--------|
| | | $150 | $30 | active / churned |
| | | $150 | $30 | active / churned |
| | | $150 | $30 | active / churned |
## Executive Summary
**Total Commission Due**: $______
- Total leads generated: __________
- Qualified leads: __________
- converted customers: __________
- Revenue attributed: $__________
- Commission earned: $__________
- YoY growth: __________%
## Mentorship Log
*Confirm you provided mentorship to each referred operator in the first 30 days:*
- [ ] Operator 1: mentored (dates: ____ to ____)
- [ ] Operator 2: mentored (dates: ____ to ____)
- [ ] Operator 3: mentored (dates: ____ to ____)
## Lead Generation Metrics
## Partner Performance
- Total active operators referred:
- Average operator uptime this month: ______%
- Any operator churn? Yes / No (explain: )
### Lead Volume
## Self-Assessment
- [ ] I maintained >99% personal fleet uptime
- [ ] I responded to Foundation pings within 24 hours
- [ ] I submitted this report on time
| Channel | Total Leads | Qualified Leads | Conversion Rate | Notes |
|---------|-------------|-----------------|-----------------|-------|
| Direct Referral | __ | __ | __% | |
| Marketing Campaign | __ | __ | __% | |
| Events/Conferences | __ | __ | __% | |
| Other: __________ | __ | __ | __% | |
### Lead Quality Assessment
- **High Value (likely to convert):** __________ leads
- **Medium Value:** __________ leads
- **Low Value:** __________ leads
- **Lead Source Validation:** __________% verified
## Revenue & Commission
### Revenue Attribution
| Customer | Deal Size | Start Date | Commission % | Commission Amount |
|----------|-----------|------------|--------------|-------------------|
| | $ | | % | $ |
| | $ | | % | $ |
| | $ | | % | $ |
- **Total Revenue:** $__________
- **Total Commission:** $__________
- **Commission Rate:** __________%
- **Payment Status:** (Paid / Pending / Escrow)
### Payment Schedule
- **Commission Period:** 1st - last day of month
- **Payment Date:** __________ (net 30 days)
- **Payment Method:** (ACH / Wire / Check / Crypto)
- **Invoice Attached:** (Yes/No)
## Fleet Performance Impact
### Operator Contributions
| Operator | Leads Generated | Conversions | Revenue Impact |
|----------|----------------|-------------|----------------|
| | | | $ |
| | | | $ |
| | | | $ |
### Uptime & Reliability Correlation
- **Average fleet uptime during reporting period:** __________%
- **Leads from high-uptime operators (>99.5%):** __________
- **Customer complaints related to fleet issues:** __________
## Marketing & Training Activities
### Promotional Efforts
- Campaigns run: __________
- Materials distributed: __________
- Events attended: __________
- Content created: __________
### Training Completed
- New operator certifications: __________
- Continuing education hours: __________
- Process improvements implemented: __________
## Challenges & Blockers
- __________
- __________
- __________
## Opportunities & Goals (Next Period)
1. __________
2. __________
3. __________
## Support Needs
- __ Technical assistance
- __ Marketing materials
- __ Training resources
- __ Lead qualification support
- __ Other: __________
## Compliance & Agreement Status
- [ ] All reporting requirements met
- [ ] Commissions calculated correctly
- [ ] SLA adherence documented
- [ ] Partner agreement in good standing
- [ ] No compliance violations
**Partner Signature:** _________________________
**Date:** _________________________
**Timmy Home Representative:** _________________________
**Date:** _________________________
## Attachments
- [ ] Lead verification documentation
- [ ] Revenue reports from finance system
- [ ] Commission calculation spreadsheet
- [ ] Marketing activity logs
- [ ] Training completion certificates
## Notes
*Any issues, concerns, or operator feedback:*
---
*This report is confidential and intended solely for the use of the partner and Timmy Home leadership. Distribution without authorization is prohibited.*
*Submit as comment on your partner Gitea issue or via Telegram to @Rockachopa*