feat(backup): add automated Gitea daily backup and recovery runbook
- Add bin/gitea-backup.sh: daily backup script using gitea dump
- Add cron/vps/gitea-daily-backup.yml: Hermes cron job (2 AM daily)
- Add docs/backup-recovery-runbook.md: complete recovery procedures

Addresses [AUDIT][RISK] Single-node VPS is a single point of failure.
Closes #481
Committed by: Timmy (STEP35 FREE BURN)
Parent: 514583ea03
Commit: fb36197e8c
155
docs/backup-recovery-runbook.md
Normal file
@@ -0,0 +1,155 @@
# Gitea Backup & Recovery Runbook

**Last updated:** 2026-04-30
**Scope:** Single-node VPS (Ezra, 143.198.27.163) running Gitea
**Backup Strategy:** Automated daily full dumps via `gitea dump`

---

## What Gets Backed Up

| Component | Method | Frequency | Retention |
|-----------|--------|-----------|-----------|
| All Gitea repositories (bare git dirs) | `gitea dump --file` | Daily at 2:00 AM | 7 days |
| SQLite databases (gitea.db, indexer.db, etc.) | Included in dump | Daily | 7 days |
| Attachments, avatars, hooks | Included in dump | Daily | 7 days |

**Backup location:** `/var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz`

**Log file:** `/var/log/gitea-backup.log`

---

## Backup Architecture

The backup script `bin/gitea-backup.sh` runs daily via Hermes cron (`cron/vps/gitea-daily-backup.yml`). It:

1. Locates the Gitea `WORK_PATH` by reading `/etc/gitea/app.ini` or falling back to common locations (`/var/lib/gitea`, `/home/git/gitea`)
2. Invokes `gitea dump --work-path <path> --file <backup-tar.gz>`, Gitea's native, consistent snapshot mechanism
3. Prunes archives older than 7 days
4. Logs all operations to `/var/log/gitea-backup.log`
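
The four steps above could be sketched roughly as follows. This is an illustrative outline only, not the contents of `bin/gitea-backup.sh`: the function names (`resolve_work_path`, `prune_old_backups`, `run_backup`), the `APP_INI`, `BACKUP_DIR`, `LOG_FILE`, and `RETENTION_DAYS` variables, and passing `--config` to `gitea` are assumptions. Only `GITEA_BIN` and the paths come from this runbook.

```bash
# Hypothetical sketch of the backup flow; NOT the real bin/gitea-backup.sh.
set -euo pipefail

GITEA_BIN="${GITEA_BIN:-/usr/local/bin/gitea}"
APP_INI="${APP_INI:-/etc/gitea/app.ini}"            # assumed variable name
BACKUP_DIR="${BACKUP_DIR:-/var/backups/gitea}"      # assumed variable name
LOG_FILE="${LOG_FILE:-/var/log/gitea-backup.log}"   # assumed variable name
RETENTION_DAYS="${RETENTION_DAYS:-7}"               # assumed variable name

# Step 4: append a timestamped line to the log file.
log() { printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"; }

# Step 1: print WORK_PATH from app.ini if it names an existing directory,
# otherwise the first existing fallback directory passed as extra arguments.
resolve_work_path() {
  local ini="$1" path="" candidate
  shift
  if [[ -r "$ini" ]]; then
    path="$(sed -n 's/^[[:space:]]*WORK_PATH[[:space:]]*=[[:space:]]*//p' "$ini" | head -n 1)"
  fi
  if [[ -n "$path" && -d "$path" ]]; then
    printf '%s\n' "$path"
    return 0
  fi
  for candidate in "$@"; do
    if [[ -d "$candidate" ]]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Step 3: delete archives whose age exceeds the retention window
# (find counts age in whole days, rounding down).
prune_old_backups() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name 'gitea-backup-*.tar.gz' -mtime "+$days" -delete
}

# Steps 1-4 in order. Call run_backup explicitly; nothing runs on load.
run_backup() {
  local work_path archive
  if ! work_path="$(resolve_work_path "$APP_INI" /var/lib/gitea /home/git/gitea)"; then
    log "ERROR: could not locate Gitea WORK_PATH"
    return 1
  fi
  archive="$BACKUP_DIR/gitea-backup-$(date '+%Y-%m-%d_%H%M%S').tar.gz"
  mkdir -p "$BACKUP_DIR"
  log "dump start: $archive"
  "$GITEA_BIN" --work-path "$work_path" --config "$APP_INI" dump --file "$archive"   # step 2
  prune_old_backups "$BACKUP_DIR" "$RETENTION_DAYS"
  log "dump finished; pruned archives older than $RETENTION_DAYS days"
}
```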

**Prerequisites on the VPS:**
- Gitea binary available at `/usr/local/bin/gitea` (or set `GITEA_BIN` env var)
- `gitea dump` command must be available (Gitea ≥ 1.12)
- SSH access to the VPS for manual recovery operations
- Sufficient disk space in `/var/backups/gitea` (typical dump: ~2–10 GB depending on repo count/size)
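
A preflight check for the binary and disk-space prerequisites might look like the sketch below. The helper names (`free_kb`, `preflight`) and the `MIN_FREE_KB` threshold are hypothetical. The 10 GiB default simply mirrors the upper end of the dump-size estimate above and should be tuned to the actual repository footprint.

```bash
# Hypothetical preflight helper; names and threshold are illustrative assumptions.
GITEA_BIN="${GITEA_BIN:-/usr/local/bin/gitea}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/gitea}"
MIN_FREE_KB="${MIN_FREE_KB:-10485760}"   # 10 GiB expressed in KiB (assumed threshold)

# Print the available space, in KiB, on the filesystem that holds directory $1.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if binary $1 is executable and directory $2 has at least $3 KiB free.
preflight() {
  local bin="$1" dir="$2" min_kb="$3" avail
  if [[ ! -x "$bin" ]]; then
    echo "missing or non-executable Gitea binary: $bin" >&2
    return 1
  fi
  avail="$(free_kb "$dir")"
  if (( avail < min_kb )); then
    echo "only ${avail} KiB free in $dir, need ${min_kb} KiB" >&2
    return 1
  fi
  echo "preflight ok: ${avail} KiB free in $dir"
}

# Example call (run manually or at the top of a backup job):
#   preflight "$GITEA_BIN" "$BACKUP_DIR" "$MIN_FREE_KB"
```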

---

## Recovery Time Objective (RTO) & Recovery Point Objective (RPO)

| Metric | Estimate |
|--------|----------|
| **RPO** (data loss window) | ≤ 24 hours (last daily backup) |
| **RTO** (time to restore) | **~45 minutes** (cold restore from backup tarball) |
| **Downtime impact** | Gitea offline during restore (~20 min) |
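
The 24-hour RPO only holds while the daily job keeps succeeding. One way to monitor that, sketched under the assumption that archives follow the `gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz` naming above, is to check the age of the newest archive. The function names and `MAX_AGE_HOURS` are hypothetical, and `stat -c` assumes GNU coreutils.

```bash
# Hypothetical RPO freshness check; not part of the committed script.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/gitea}"
MAX_AGE_HOURS="${MAX_AGE_HOURS:-24}"   # assumed variable name, matches the RPO above

# Print the newest archive in directory $1. The glob expands in sorted order and
# the timestamped names sort chronologically, so the last match is the newest.
newest_backup() {
  local newest="" f
  for f in "$1"/gitea-backup-*.tar.gz; do
    if [[ -e "$f" ]]; then newest="$f"; fi
  done
  if [[ -z "$newest" ]]; then return 1; fi
  printf '%s\n' "$newest"
}

# Print the age of file $1 in whole hours, based on its modification time.
backup_age_hours() {
  local now mtime
  now="$(date +%s)"
  mtime="$(stat -c %Y "$1")"   # GNU stat; BSD/macOS would need `stat -f %m`
  echo $(( (now - mtime) / 3600 ))
}

# Succeed if the newest archive in $1 is younger than $2 hours.
rpo_ok() {
  local dir="$1" max_hours="$2" newest age
  if ! newest="$(newest_backup "$dir")"; then
    echo "no backups found in $dir" >&2
    return 1
  fi
  age="$(backup_age_hours "$newest")"
  if (( age < max_hours )); then
    echo "RPO ok: $newest is ${age}h old"
  else
    echo "RPO violated: $newest is ${age}h old" >&2
    return 1
  fi
}

# Example call:
#   rpo_ok "$BACKUP_DIR" "$MAX_AGE_HOURS"
```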

---

## Step-by-Step Recovery Procedure

### Phase 1: Assess & Prepare (5 min)

1. SSH into Ezra VPS: `ssh root@143.198.27.163`
2. Stop Gitea so files are quiescent:
```bash
systemctl stop gitea
```
3. Confirm current Gitea data directory (for reference):
```bash
# Confirm the binary runs against the expected work path and config
gitea --work-path /var/lib/gitea --config /etc/gitea/app.ini dump --help 2>&1
# Or check app.ini for WORK_PATH
grep '^WORK_PATH' /etc/gitea/app.ini
```

### Phase 2: Restore from Backup (20 min)

4. Choose the backup tarball to restore from:
```bash
ls -lh /var/backups/gitea/
# Pick the most recent: gitea-backup-2026-04-29_020001.tar.gz
```

5. **Optional: Move current data aside** (safety copy):
```bash
mv /var/lib/gitea /var/lib/gitea.bak-$(date +%s)
```

6. Extract the backup in place:
```bash
mkdir -p /var/lib/gitea
tar -xzf /var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz -C /var/lib/gitea --strip-components=1
```
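*Note:* `--strip-components=1` assumes the archive contains a single top-level directory (`gitea-dump-<timestamp>`), whose contents then land directly in `/var/lib/gitea`. Confirm the layout first with `tar -tzf <archive> | head`, since it can differ between Gitea versions. The dump also contains a SQL export of the database (`gitea-db.sql`); the official backup-and-restore docs describe importing it and moving `app.ini`, `data/`, and `repos/` to their configured locations, which may not match a plain extraction into `/var/lib/gitea`.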

7. Set correct ownership (typically `git:git`):
```bash
chown -R git:git /var/lib/gitea
```
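
Before running step 6, it may be worth verifying the chosen archive and checking whether `--strip-components=1` actually fits its layout. The helpers below are a hypothetical sketch (none of these function names exist in the repository): they test gzip and tar integrity, list the top-level entries, and suggest a strip value.

```bash
# Hypothetical pre-extraction checks for a backup tarball.

# Succeed if archive $1 passes the gzip integrity test and tar can list it.
verify_archive() {
  gzip -t "$1" && tar -tzf "$1" > /dev/null
}

# Print the unique top-level entries of archive $1, one per line.
archive_top_level() {
  tar -tzf "$1" | sed -e 's#^\./##' -e 's#/.*##' | sed '/^$/d' | sort -u
}

# Print 1 if everything sits under a single top-level directory, else 0.
strip_components_for() {
  local tops nested
  tops="$(archive_top_level "$1" | wc -l)"
  # Count entries that live below some directory (a slash followed by a name).
  nested="$(tar -tzf "$1" | sed 's#^\./##' | grep -c '/.' || true)"
  if (( tops == 1 && nested > 0 )); then echo 1; else echo 0; fi
}
```

A possible usage, assuming `$archive` holds the path chosen in step 4: run `verify_archive "$archive"` first, then pass `--strip-components="$(strip_components_for "$archive")"` to `tar` instead of hard-coding `1`.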

### Phase 3: Restart & Validate (15 min)

8. Start Gitea:
```bash
systemctl start gitea
```

9. Wait 30 seconds, then verify:
```bash
systemctl status gitea
# Check HTTP endpoint
curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/ # Should be 200
```

10. Log into Gitea UI and spot-check:
- Home page loads
- A few repositories are accessible
- Attachments (avatars) render
- Recent commits visible

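11. If the web UI works but indices or hooks look stale, run Gitea's built-in consistency checks and regenerate hooks as the Gitea user (subcommand names vary by Gitea version; confirm with `gitea --help` before running):
```bash
sudo -u git gitea --config /etc/gitea/app.ini doctor check --all
sudo -u git gitea --config /etc/gitea/app.ini admin regenerate hooks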
```
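
The fixed 30-second wait in step 9 could be replaced by polling until Gitea answers. The function below is a hypothetical sketch (the name `wait_for_http` and its defaults are assumptions) that reuses the same `curl` invocation as step 9.

```bash
# Hypothetical readiness poll: retry until the URL returns HTTP 200.
# Arguments: URL, max attempts (default 30), delay in seconds (default 2).
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" code="" i
  for (( i = 1; i <= attempts; i++ )); do
    # curl prints 000 and exits non-zero while the port is still closed.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [[ "$code" == "200" ]]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "gave up after $attempts attempts (last status: ${code:-none})" >&2
  return 1
}
```

For example, `wait_for_http http://localhost:3000/ 30 2` waits up to about a minute and returns non-zero if Gitea never responds with 200, which makes it usable in scripted restores.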

### Post-Restore Checklist

- [ ] Admin UI reachable at `https://forge.alexanderwhitestone.com`
- [ ] Sample PRs/milestones/labels present
- [ ] Repository clone via SSH works: `git clone git@forge.alexanderwhitestone.com:Timmy_Foundation/timmy-config.git`
- [ ] Check backup script health: `tail -n 20 /var/log/gitea-backup.log`
- [ ] Re-enable any disabled integrations (webhooks, CI/CD runners)
- [ ] Notify the fleet: post to relevant channels confirming operational status

---

## Known Issues & Workarounds

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gitea: command not found` | Binary at non-standard path | Set `GITEA_BIN=/path/to/gitea` in cron env |
| `Permission denied` on backup dir | Cron user lacks write access to `/var/backups` | `mkdir -p /var/backups/gitea && chown root:root /var/backups/gitea` (substitute the user the cron job runs as if it is not root) |
| Restore fails: `"database or disk is full"` | Insufficient space on `/var/lib/gitea` | Expand disk or clean up old data first; backups require ~1.5x live data size |
| Old backup tarballs not deleting | Retention cron not firing | Check `systemctl status hermes-cron` and cron logs |

---

## Off-Site Replication (Future Work)

This backup is **on-site only** (same VPS). For true resilience, replicating to a secondary location is recommended:

- **Option A: rsync to second VPS** (push nightly to `backup@backup-alexanderwhitestone.com:/backups/gitea/`)
- **Option B: S3-compatible bucket** with lifecycle policy
- **Option C: GitHub mirror of each repo** using `git push --mirror` (already considered in issue #481 broader work)
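
As a rough illustration of Option A only, a nightly push could write a checksum next to each archive and then copy the directory with `rsync`. Nothing like this exists in the commit: the function names and the `REMOTE` variable are hypothetical, the remote target is the example from the list above, and SSH key access for that user is assumed.

```bash
# Hypothetical off-site push for Option A; not implemented in this commit.
REMOTE="${REMOTE:-backup@backup-alexanderwhitestone.com:/backups/gitea/}"   # assumed variable
BACKUP_DIR="${BACKUP_DIR:-/var/backups/gitea}"

# Write "<archive>.sha256" next to archive $1 so the receiving side can verify it.
write_checksum() {
  local archive="$1" name
  name="$(basename "$archive")"
  ( cd "$(dirname "$archive")" && sha256sum "$name" > "$name.sha256" )
}

# Ensure every archive in $1 has a checksum file, then copy the directory to $2.
# --partial keeps partially transferred files so an interrupted run can resume.
push_offsite() {
  local dir="$1" remote="$2" archive
  for archive in "$dir"/gitea-backup-*.tar.gz; do
    if [[ -e "$archive" && ! -e "$archive.sha256" ]]; then
      write_checksum "$archive"
    fi
  done
  rsync -a --partial "$dir"/ "$remote"
}

# Example call:
#   push_offsite "$BACKUP_DIR" "$REMOTE"
```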

Current scope: single-VPS backup only. This protects against data corruption or accidental deletion, but not against loss of the VPS itself, so the single point of failure is mitigated rather than eliminated.

---

## Related Documentation

- `bin/gitea-backup.sh`: backup script source
- `cron/vps/gitea-daily-backup.yml`: Hermes cron definition
- Gitea official docs: <https://docs.gitea.com/administration/backup-and-restore>
- Hermes cron: <https://hermes-agent.nousresearch.com/docs>