Files
timmy-home/docs/FLEET_SECRET_ROTATION.md
Alexander Whitestone b334139fb5
Some checks failed
Smoke Test / smoke (pull_request) Failing after 15s
feat: add fleet secret rotation playbook (#694)
2026-04-14 23:59:54 -04:00

69 lines
2.7 KiB
Markdown

# Fleet Secret Rotation
Issue: `timmy-home#694`
This runbook adds a single place to rotate fleet API keys, service tokens, and SSH authorized keys without hand-editing remote hosts.
## Files
- `ansible/inventory/hosts.ini` — fleet hosts (`ezra`, `bezalel`)
- `ansible/inventory/group_vars/fleet.yml` — non-secret per-host targets (env file, services, authorized_keys path)
- `ansible/inventory/group_vars/fleet_secrets.vault.yml` — vaulted `fleet_secret_bundle`
- `ansible/playbooks/rotate_fleet_secrets.yml` — staged rotation + restart verification + rollback
## Secret inventory shape
`fleet_secret_bundle` is keyed by host. Each host carries the env secrets to rewrite plus the full `authorized_keys` payload to distribute.
```yaml
fleet_secret_bundle:
ezra:
env:
GITEA_TOKEN: !vault |
...
TELEGRAM_BOT_TOKEN: !vault |
...
PRIMARY_MODEL_API_KEY: !vault |
...
ssh_authorized_keys: !vault |
...
```
The committed vault file contains placeholder encrypted values only. Replace them with real rotated material before production use.
## Rotate a new bundle
From repo root:
```bash
cd ansible
ansible-vault edit inventory/group_vars/fleet_secrets.vault.yml
ansible-playbook -i inventory/hosts.ini playbooks/rotate_fleet_secrets.yml --ask-vault-pass
```
Or update one value at a time with `ansible-vault encrypt_string` and paste it into `fleet_secret_bundle`.
## What the playbook does
1. Validates that each host has a secret bundle and target metadata.
2. Writes rollback snapshots under `/var/lib/timmy/secret-rotations/<rotation_id>/<host>/`.
3. Stages a candidate `.env` file and candidate `authorized_keys` file before promotion.
4. Promotes staged files into place.
5. Restarts every declared dependent service.
6. Verifies each service with `systemctl is-active`.
7. If anything fails, restores the previous `.env` and `authorized_keys`, restarts services again, and aborts the run.
## Rollback semantics
Rollback is host-safe and automatic inside the playbook `rescue:` block.
- Existing `.env` and `authorized_keys` files are restored from backup when they existed before rotation.
- Newly created files are removed if the host had no prior version.
- Service restart is retried after rollback so the node returns to the last-known-good bundle.
## Operational notes
- Keep `required_env_keys` in `ansible/inventory/group_vars/fleet.yml` aligned with each house's real runtime contract.
- `ssh_authorized_keys` distributes public keys only. Rotate corresponding private keys out-of-band, then publish the new authorized key list through the vault.
- Use one vault edit per rotation window so API keys, bot tokens, and SSH access move together.