Files
timmy-config/docs/backup-recovery-runbook.md

156 lines
5.5 KiB
Markdown
Raw Normal View History

# Gitea Backup & Recovery Runbook
**Last updated:** 2026-04-30
**Scope:** Single-node VPS (Ezra, 143.198.27.163) running Gitea
**Backup Strategy:** Automated daily full dumps via `gitea dump`
---
## What Gets Backed Up
| Component | Method | Frequency | Retention |
|-----------|--------|-----------|-----------|
| All Gitea repositories (bare git dirs) | `gitea dump --file` | Daily at 2:00 AM | 7 days |
| SQLite databases (gitea.db, indexer.db, etc.) | Included in dump | Daily | 7 days |
| Attachments, avatars, hooks | Included in dump | Daily | 7 days |
**Backup location:** `/var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz`
**Log file:** `/var/log/gitea-backup.log`
---
## Backup Architecture
The backup script `bin/gitea-backup.sh` runs daily via Hermes cron (`cron/vps/gitea-daily-backup.yml`). It:
1. Locates the Gitea `WORK_PATH` by reading `/etc/gitea/app.ini` or falling back to common locations (`/var/lib/gitea`, `/home/git/gitea`)
2. Invokes `gitea dump --work-path <path> --file <backup-tar.gz>` — Gitea's native, consistent snapshot mechanism
3. Prunes archives older than 7 days
4. Logs all operations to `/var/log/gitea-backup.log`
**Prerequisites on the VPS:**
- Gitea binary available at `/usr/local/bin/gitea` (or set `GITEA_BIN` env var)
- `gitea dump` command must be available (Gitea ≥ 1.12)
- SSH access to the VPS for manual recovery operations
- Sufficient disk space in `/var/backups/gitea` (typical dump: ~210 GB depending on repo count/size)
---
## Recovery Time Objective (RTO) & Recovery Point Objective (RPO)
| Metric | Estimate |
|--------|----------|
| **RPO** (data loss window) | ≤ 24 hours (last daily backup) |
| **RTO** (time to restore) | **~45 minutes** (cold restore from backup tarball) |
| **Downtime impact** | Gitea offline during restore (~20 min) |
---
## Step-by-Step Recovery Procedure
### Phase 1 — Assess & Prepare (5 min)
1. SSH into Ezra VPS: `ssh root@143.198.27.163`
2. Stop Gitea so files are quiescent:
```bash
systemctl stop gitea
```
3. Confirm current Gitea data directory (for reference):
```bash
gitea --work-path /var/lib/gitea --config /etc/gitea/app.ini dump --help 2>&1
# Or check app.ini for WORK_PATH
cat /etc/gitea/app.ini | grep '^WORK_PATH'
```
### Phase 2 — Restore from Backup (20 min)
4. Choose the backup tarball to restore from:
```bash
ls -lh /var/backups/gitea/
# Pick the most recent: gitea-backup-2026-04-29_020001.tar.gz
```
5. **Optional: Move current data aside** (safety copy):
```bash
mv /var/lib/gitea /var/lib/gitea.bak-$(date +%s)
```
6. Extract the backup in place:
```bash
mkdir -p /var/lib/gitea
tar -xzf /var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz -C /var/lib/gitea --strip-components=1
```
*Note:* `gitea dump` archives contain a single top-level directory `gitea-dump-<timestamp>`. The `--strip-components=1` puts its contents directly into `/var/lib/gitea`.
7. Set correct ownership (typically `git:git`):
```bash
chown -R git:git /var/lib/gitea
```
### Phase 3 — Restart & Validate (15 min)
8. Start Gitea:
```bash
systemctl start gitea
```
9. Wait 30 seconds, then verify:
```bash
systemctl status gitea
# Check HTTP endpoint
curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/ # Should be 200
```
10. Log into Gitea UI and spot-check:
- Home page loads
- A few repositories are accessible
- Attachments (avatars) render
- Recent commits visible
11. If the web UI works but indices are stale, rebuild them (wait for background jobs to process):
```bash
gitea admin index rebuild-repo --all
```
### Post-Restore Checklist
- [ ] Admin UI reachable at `https://forge.alexanderwhitestone.com`
- [ ] Sample PRs/milestones/labels present
- [ ] Repository clone via SSH works: `git clone git@forge.alexanderwhitestone.com:Timmy_Foundation/timmy-config.git`
- [ ] Check backup script health: `cat /var/log/gitea-backup.log | tail -20`
- [ ] Re-enable any disabled integrations (webhooks, CI/CD runners)
- [ ] Notify the fleet: post to relevant channels confirming operational status
---
## Known Issues & Workarounds
| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gitea: command not found` | Binary at non-standard path | Set `GITEA_BIN=/path/to/gitea` in cron env |
| `Permission denied` on backup dir | Cron user lacks write access to `/var/backups` | `mkdir /var/backups/gitea && chown root:root /var/backups/gitea` |
| Restore fails: `"database or disk is full"` | Insufficient space on `/var/lib/gitea` | Expand disk or clean up old data first; backups require ~1.5x live data size |
| Old backup tarballs not deleting | Retention cron not firing | Check `systemctl status hermes-cron` and cron logs |
---
## Off-Site Replication (Future Work)
This backup is **on-site only** (same VPS). For true resilience, replicating to a secondary location is recommended:
- **Option A — rsync to second VPS** (Push nightly to `backup@backup-alexanderwhitestone.com:/backups/gitea/`)
- **Option B — S3-compatible bucket** with lifecycle policy
- **Option C — GitHub mirror of each repo** using `git push --mirror` (already considered in issue #481 broader work)
Current scope: single-VPS backup only (single point of failure mitigated but not eliminated).
---
## Related Documentation
- `bin/gitea-backup.sh` — backup script source
- `cron/vps/gitea-daily-backup.yml` — Hermes cron definition
- Gitea official docs: <https://docs.gitea.com/administration/backup-and-restore>
- Hermes cron: <https://hermes-agent.nousresearch.com/docs>