# Operational Runbook Index

Last updated: 2026-04-13

Quick-reference index for common operational tasks across the Timmy Foundation infrastructure.

## Fleet Operations

| Task | Location | Command/Procedure |
|------|----------|-------------------|
| Deploy fleet update | fleet-ops | `ansible-playbook playbooks/provision_and_deploy.yml --ask-vault-pass` |
| Check fleet health | fleet-ops | `python3 scripts/fleet_readiness.py` |
| Agent scorecard | fleet-ops | `python3 scripts/agent_scorecard.py` |
| View fleet manifest | fleet-ops | `cat manifest.yaml` |

## the-nexus (Frontend + Brain)

| Task | Location | Command/Procedure |
|------|----------|-------------------|
| Run tests | the-nexus | `pytest tests/` |
| Validate repo integrity | the-nexus | `python3 scripts/repo_truth_guard.py` |
| Check swarm governor | the-nexus | `python3 bin/swarm_governor.py --status` |
| Start dev server | the-nexus | `python3 server.py` |
| Run deep dive pipeline | the-nexus | `cd intelligence/deepdive && python3 pipeline.py` |

## timmy-config (Control Plane)

| Task | Location | Command/Procedure |
|------|----------|-------------------|
| Run Ansible deploy | timmy-config | `cd ansible && ansible-playbook playbooks/site.yml` |
| Scan for banned providers | timmy-config | `python3 bin/banned_provider_scan.py` |
| Check merge conflicts | timmy-config | `python3 bin/conflict_detector.py` |
| Muda audit | timmy-config | `bash fleet/muda-audit.sh` |

## hermes-agent (Agent Framework)

| Task | Location | Command/Procedure |
|------|----------|-------------------|
| Start agent | hermes-agent | `python3 run_agent.py` |
| Check provider allowlist | hermes-agent | `python3 tools/provider_allowlist.py --check` |
| Run test suite | hermes-agent | `pytest` |

## Incident Response

### Agent Down

1. Check health endpoint: `curl http://<host>:<port>/health`
2. Check systemd: `systemctl status hermes-<agent>`
3. Check logs: `journalctl -u hermes-<agent> --since "1 hour ago"`
4. Restart: `systemctl restart hermes-<agent>`

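
The Agent Down steps can be collected into a small helper so the same commands are used every time. The sketch below only builds and, optionally, runs the commands listed above. The function names, the `dry_run` switch, the example agent name `alpha`, the host, and the port are all hypothetical, and the extra `curl` flags (`--fail`, `--silent`, `--max-time`) are an assumption added so a dead endpoint returns quickly with a non-zero exit code.

```python
from __future__ import annotations

import shlex
import subprocess


def triage_commands(agent: str, host: str, port: int) -> list[list[str]]:
    """Build the read-only diagnostic commands (steps 1-3 of Agent Down)."""
    unit = f"hermes-{agent}"  # follows the runbook's hermes-<agent> unit pattern
    return [
        # Step 1: health endpoint. The curl flags are an assumption, not from the runbook.
        ["curl", "--fail", "--silent", "--max-time", "5", f"http://{host}:{port}/health"],
        # Step 2: systemd unit status.
        ["systemctl", "status", unit],
        # Step 3: the last hour of logs for the unit.
        ["journalctl", "-u", unit, "--since", "1 hour ago"],
    ]


def restart_command(agent: str) -> list[str]:
    """Build the restart command (step 4). Kept separate because it changes state."""
    return ["systemctl", "restart", f"hermes-{agent}"]


def run_triage(agent: str, host: str, port: int, dry_run: bool = True) -> None:
    """Print each diagnostic command and run it unless dry_run is set."""
    for cmd in triage_commands(agent, host, port):
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=False: a failing step should not stop the remaining diagnostics.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Hypothetical agent, host, and port, shown as a dry run only.
    run_triage("alpha", "127.0.0.1", 8080, dry_run=True)
```

Keeping the restart out of `run_triage` is a deliberate choice in this sketch: the first three steps are safe to repeat, while the restart should stay an explicit decision by the operator.
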
### Banned Provider Detected

1. Run scanner: `python3 bin/banned_provider_scan.py`
2. Check golden state: `cat ansible/inventory/group_vars/wizards.yml`
3. Verify that `ansible/BANNED_PROVIDERS.yml` is current
4. Fix the offending config and redeploy

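
To illustrate the kind of check the scan step performs, here is a minimal sketch that reports which lines of a config mention a banned provider. It does not reproduce `bin/banned_provider_scan.py`, whose behavior is not described here. The function name `find_banned`, the case-insensitive substring match, and the provider names are all hypothetical. The layout of `BANNED_PROVIDERS.yml` is not shown in this index, so the banned names are passed in as a plain list rather than parsed from that file.

```python
from __future__ import annotations


def find_banned(config_text: str, banned: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, provider) for every banned name found in the text.

    Matching is a case-insensitive substring test, which is an assumption of
    this sketch and may over-report compared with a real parser.
    """
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        lowered = line.lower()
        for name in banned:
            if name.lower() in lowered:
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    # Hypothetical config content and provider names, for illustration only.
    sample = "provider: example-good\nfallback: Example-Bad\n"
    for lineno, name in find_banned(sample, ["example-bad"]):
        print(f"line {lineno}: banned provider '{name}'")
```

Reporting line numbers rather than a bare pass/fail makes step 4 easier, since the offending entries can be located directly.
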
### Merge Conflict Cascade

1. Run conflict detector: `python3 bin/conflict_detector.py`
2. Rebase oldest conflicting PR first
3. Merge, then repeat with the next-oldest conflicting PR; the cascade resolves naturally

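
The "oldest first" rule can be sketched as a sort over the conflicting PRs. The `PullRequest` record, its fields, and the sample data are hypothetical. The sketch also assumes that "oldest" means earliest creation time and that the conflict flags come from the conflict detector's output, whose format is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical minimal PR record used only for this sketch."""
    number: int
    created_at: datetime
    has_conflict: bool


def rebase_order(prs: list[PullRequest]) -> list[int]:
    """Return the numbers of conflicting PRs, oldest first."""
    conflicting = [pr for pr in prs if pr.has_conflict]
    return [pr.number for pr in sorted(conflicting, key=lambda pr: pr.created_at)]


if __name__ == "__main__":
    # Illustrative data only.
    prs = [
        PullRequest(12, datetime(2026, 4, 3), True),
        PullRequest(9, datetime(2026, 4, 1), True),
        PullRequest(10, datetime(2026, 4, 2), False),
    ]
    print(rebase_order(prs))
```

After each merge, the conflict flags should be recomputed before taking the next PR from the list, since merging one PR can clear or create conflicts in the others.
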
## Key Files

| File | Repo | Purpose |
|------|------|---------|
| `manifest.yaml` | fleet-ops | Fleet service definitions |
| `config.yaml` | timmy-config | Agent runtime config |
| `ansible/BANNED_PROVIDERS.yml` | timmy-config | Provider ban enforcement |
| `portals.json` | the-nexus | Portal registry |
| `vision.json` | the-nexus | Vision system config |