- Add bin/gitea-backup.sh: daily backup script using gitea dump
- Add cron/vps/gitea-daily-backup.yml: Hermes cron job (2 AM daily)
- Add docs/backup-recovery-runbook.md: complete recovery procedures

Addresses [AUDIT][RISK] Single-node VPS is a single point of failure.
Closes #481

# Gitea Backup & Recovery Runbook

**Last updated:** 2026-04-30
**Scope:** Single-node VPS (Ezra, 143.198.27.163) running Gitea
**Backup Strategy:** Automated daily full dumps via `gitea dump`

---

## What Gets Backed Up

| Component | Method | Frequency | Retention |
|-----------|--------|-----------|-----------|
| All Gitea repositories (bare git dirs) | `gitea dump --file` | Daily at 2:00 AM | 7 days |
| SQLite databases (gitea.db, indexer.db, etc.) | Included in dump | Daily | 7 days |
| Attachments, avatars, hooks | Included in dump | Daily | 7 days |

**Backup location:** `/var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz`

**Log file:** `/var/log/gitea-backup.log`

---

## Backup Architecture

The backup script `bin/gitea-backup.sh` runs daily via Hermes cron (`cron/vps/gitea-daily-backup.yml`). It:

1. Locates the Gitea `WORK_PATH` by reading `/etc/gitea/app.ini`, falling back to common locations (`/var/lib/gitea`, `/home/git/gitea`)
2. Invokes `gitea dump --work-path <path> --file <backup-tar.gz>`, Gitea's native, consistent snapshot mechanism
3. Prunes archives older than 7 days
4. Logs all operations to `/var/log/gitea-backup.log`

**Prerequisites on the VPS:**
- Gitea binary available at `/usr/local/bin/gitea` (or set `GITEA_BIN` env var)
- `gitea dump` command must be available (Gitea ≥ 1.12)
- SSH access to the VPS for manual recovery operations
- Sufficient disk space in `/var/backups/gitea` (typical dump: ~2–10 GB depending on repo count/size)
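
The binary and disk-space prerequisites lend themselves to a quick pre-flight check. The sketch below is an assumption about how such a check could look; `check_prereqs` and its return codes are hypothetical and not part of the backup script.

```bash
# Sketch only: hypothetical pre-flight check, not taken from bin/gitea-backup.sh.

# check_prereqs BIN DIR MIN_FREE_KIB
# Returns 0 if BIN is executable, DIR is a writable directory, and DIR's
# filesystem has at least MIN_FREE_KIB KiB available.
# Returns 1, 2, or 3 respectively for each failed check.
check_prereqs() {
    local bin="$1" dir="$2" min_kib="$3" free_kib
    if [ ! -x "$bin" ]; then
        echo "missing or non-executable binary: $bin" >&2
        return 1
    fi
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "backup directory missing or not writable: $dir" >&2
        return 2
    fi
    # df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
    # the fourth column is the available space.
    free_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$free_kib" -lt "$min_kib" ]; then
        echo "only ${free_kib} KiB free in $dir, need $min_kib" >&2
        return 3
    fi
    return 0
}
```

For example, `check_prereqs "${GITEA_BIN:-/usr/local/bin/gitea}" /var/backups/gitea $((10 * 1024 * 1024))` would require 10 GiB free, the upper end of the typical dump size quoted above.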

---

## Recovery Time Objective (RTO) & Recovery Point Objective (RPO)

| Metric | Estimate |
|--------|----------|
| **RPO** (data loss window) | ≤ 24 hours (last daily backup) |
| **RTO** (time to restore) | **~45 minutes** (cold restore from backup tarball) |
| **Downtime impact** | Gitea offline during restore (~20 min) |
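
The RPO estimate only holds while the daily job keeps producing archives. A monitoring check along the following lines could flag a stale backup directory. It is a sketch under assumptions: `backup_is_fresh` is a hypothetical name, and `-maxdepth` and `-mmin` are `find` extensions available in GNU and BSD find but not in strict POSIX.

```bash
# Sketch only: hypothetical freshness check for the backup directory.

# backup_is_fresh DIR MAX_AGE_HOURS
# Succeeds if DIR holds at least one gitea-backup-*.tar.gz modified
# less than MAX_AGE_HOURS hours ago.
backup_is_fresh() {
    local dir="$1" max_hours="$2" recent
    recent=$(find "$dir" -maxdepth 1 -type f -name 'gitea-backup-*.tar.gz' \
        -mmin "-$(( max_hours * 60 ))" | head -n 1)
    [ -n "$recent" ]
}
```

Calling `backup_is_fresh /var/backups/gitea 25` allows an hour of slack, since a dump started at 2:00 AM finishes at a slightly different time each day.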

---

## Step-by-Step Recovery Procedure

### Phase 1: Assess & Prepare (5 min)

1. SSH into Ezra VPS: `ssh root@143.198.27.163`
2. Stop Gitea so files are quiescent:
   ```bash
   systemctl stop gitea
   ```
3. Confirm the current Gitea data directory (for reference):
   ```bash
   # Sanity check: the binary runs with this work path and config (prints dump usage)
   gitea --work-path /var/lib/gitea --config /etc/gitea/app.ini dump --help 2>&1
   # Or check app.ini for WORK_PATH
   grep '^WORK_PATH' /etc/gitea/app.ini
   ```

### Phase 2: Restore from Backup (20 min)

4. Choose the backup tarball to restore from:
   ```bash
   ls -lh /var/backups/gitea/
   # Pick the most recent, e.g. gitea-backup-2026-04-29_020001.tar.gz
   ```

5. **Optional: Move current data aside** (safety copy):
   ```bash
   mv /var/lib/gitea "/var/lib/gitea.bak-$(date +%s)"
   ```

6. Extract the backup in place:
   ```bash
   mkdir -p /var/lib/gitea
   tar -xzf /var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz -C /var/lib/gitea --strip-components=1
   ```
   *Note:* This command assumes the archive contains a single top-level directory `gitea-dump-<timestamp>`, whose contents `--strip-components=1` places directly into `/var/lib/gitea`. Confirm the layout with `tar -tzf <archive> | head` before extracting. Gitea's official restore procedure describes a dump containing `app.ini`, `gitea-db.sql`, `data/`, and `repos/`, which are moved into place individually, with the database re-imported from `gitea-db.sql`. If the archive looks like that, follow the official procedure (see Related Documentation) instead of extracting in place.

7. Set correct ownership (typically `git:git`):
   ```bash
   chown -R git:git /var/lib/gitea
   ```
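
Before running the extraction in step 6, it can help to check whether the archive really has a single top-level directory, since `--strip-components=1` depends on that. The helpers below are a hypothetical sketch (`archive_top_level` and `has_single_root` are illustrative names) and only read the archive.

```bash
# Sketch only: hypothetical helpers for inspecting a backup archive's layout.

# archive_top_level ARCHIVE
# Prints the distinct top-level entries of a .tar.gz archive, one per line.
archive_top_level() {
    # Drop any leading "./", keep only the first path component, de-duplicate.
    tar -tzf "$1" | sed -e 's|^\./||' -e 's|/.*||' | grep -v '^$' | sort -u
}

# has_single_root ARCHIVE
# Succeeds only if everything in ARCHIVE sits under one top-level entry.
has_single_root() {
    [ "$(archive_top_level "$1" | wc -l | tr -d ' ')" -eq 1 ]
}
```

If `has_single_root` fails for the chosen tarball, extracting with `--strip-components=1` would flatten directories such as `data/` and `repos/` into the work path, so the archive should be unpacked to a scratch directory and restored piece by piece instead.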

### Phase 3: Restart & Validate (15 min)

8. Start Gitea:
   ```bash
   systemctl start gitea
   ```

9. Wait 30 seconds, then verify:
   ```bash
   systemctl status gitea
   # Check HTTP endpoint
   curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/  # Should be 200
   ```

10. Log into Gitea UI and spot-check:
    - Home page loads
    - A few repositories are accessible
    - Attachments and avatars render
    - Recent commits visible

11. If the web UI works but indices are stale, rebuild them, then wait for the background jobs to finish processing:
    ```bash
    # Subcommands vary by Gitea version; confirm this one exists with `gitea admin --help`
    gitea admin index rebuild-repo --all
    ```
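
As an alternative to the fixed 30-second wait in step 9, the HTTP check can be polled until it succeeds. The following is a sketch only: `wait_until` and `gitea_http_ok` are hypothetical helpers, and the URL is the `localhost:3000` endpoint used above.

```bash
# Sketch only: hypothetical polling helpers for post-restart validation.

# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Succeeds as soon as COMMAND succeeds; fails if every attempt fails.
wait_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$(( i + 1 ))
        # Only sleep when another attempt is still to come.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# gitea_http_ok URL
# Succeeds when URL answers with HTTP status 200.
gitea_http_ok() {
    [ "$(curl -s -o /dev/null -w '%{http_code}' "$1")" = "200" ]
}
```

For instance, `wait_until 30 2 gitea_http_ok http://localhost:3000/` polls for up to about a minute and returns as soon as Gitea responds, which keeps the validation phase short when startup is fast.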

### Post-Restore Checklist

- [ ] Admin UI reachable at `https://forge.alexanderwhitestone.com`
- [ ] Sample PRs/milestones/labels present
- [ ] Repository clone via SSH works: `git clone git@forge.alexanderwhitestone.com:Timmy_Foundation/timmy-config.git`
- [ ] Check backup script health: `tail -n 20 /var/log/gitea-backup.log`
- [ ] Re-enable any disabled integrations (webhooks, CI/CD runners)
- [ ] Notify the fleet: post to relevant channels confirming operational status

---

## Known Issues & Workarounds

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gitea: command not found` | Binary at non-standard path | Set `GITEA_BIN=/path/to/gitea` in cron env |
| `Permission denied` on backup dir | Cron user lacks write access to `/var/backups` | `mkdir -p /var/backups/gitea && chown root:root /var/backups/gitea` (substitute the cron user if it is not root) |
| Restore fails: `"database or disk is full"` | Insufficient space on `/var/lib/gitea` | Expand disk or clean up old data first; backups require ~1.5x live data size |
| Old backup tarballs not deleting | Retention cron not firing | Check `systemctl status hermes-cron` and cron logs |
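
If old tarballs pile up while the cron job is being debugged, the 7-day retention can be applied by hand. This is a minimal sketch that may differ from what `bin/gitea-backup.sh` actually does; `prune_old_backups` is a hypothetical name, and `-maxdepth` and `-delete` assume GNU or BSD `find`.

```bash
# Sketch only: hypothetical manual retention, not taken from bin/gitea-backup.sh.

# prune_old_backups DIR DAYS
# Deletes gitea-backup-*.tar.gz files in DIR older than DAYS days and prints
# each deleted path. find counts age in whole 24-hour periods, so "+7"
# matches files last modified at least 8 days ago.
prune_old_backups() {
    local dir="$1" days="$2"
    find "$dir" -maxdepth 1 -type f -name 'gitea-backup-*.tar.gz' \
        -mtime "+$days" -print -delete
}
```

Running `prune_old_backups /var/backups/gitea 7` touches only files matching the backup naming pattern, so unrelated files in the directory are left alone.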

---

## Off-Site Replication (Future Work)

This backup is **on-site only** (same VPS). For true resilience, replicate to a secondary location:

- **Option A: rsync to second VPS** (push nightly to `backup@backup-alexanderwhitestone.com:/backups/gitea/`)
- **Option B: S3-compatible bucket** with lifecycle policy
- **Option C: GitHub mirror of each repo** using `git push --mirror` (already considered in the broader work on issue #481)

Current scope: single-VPS backup only. This guards against data corruption or accidental deletion, but the VPS itself remains a single point of failure until one of the options above is in place.
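
Option A could be as small as one `rsync` call after each dump. The sketch below is an assumption about future work, not an existing script: `push_offsite` is a hypothetical name, and the `RSYNC` override exists only so the command can be swapped for a stub during a dry run.

```bash
# Sketch only: hypothetical off-site push for Option A.

# push_offsite SRC_DIR DEST
# Mirrors the contents of SRC_DIR to DEST (a local path or user@host:/path).
# Set RSYNC to another command to substitute rsync, e.g. for testing.
push_offsite() {
    local src="$1" dest="$2"
    # The trailing slash on the source copies its contents rather than the
    # directory itself. No --delete, so archives pruned locally stay off-site.
    "${RSYNC:-rsync}" -a --partial "$src/" "$dest"
}
```

A nightly call would look like `push_offsite /var/backups/gitea backup@backup-alexanderwhitestone.com:/backups/gitea/`. Leaving out `--delete` means off-site retention has to be managed separately on the receiving host.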

---

## Related Documentation

- `bin/gitea-backup.sh`: backup script source
- `cron/vps/gitea-daily-backup.yml`: Hermes cron definition
- Gitea official docs: <https://docs.gitea.com/administration/backup-and-restore>
- Hermes cron: <https://hermes-agent.nousresearch.com/docs>