timmy-config/docs/backup-recovery-runbook.md
Rockachopa 874ce137b0
feat(backup): add automated Gitea daily backup and recovery runbook
- Add bin/gitea-backup.sh: daily backup script using gitea dump
- Add cron/vps/gitea-daily-backup.yml: Hermes cron job (2 AM daily)
- Add docs/backup-recovery-runbook.md: complete recovery procedures

Addresses [AUDIT][RISK] Single-node VPS is a single point of failure.
Closes #481
2026-04-30 01:44:05 -04:00


Gitea Backup & Recovery Runbook

Last updated: 2026-04-30
Scope: Single-node VPS (Ezra, 143.198.27.163) running Gitea
Backup Strategy: Automated daily full dumps via gitea dump


What Gets Backed Up

| Component | Method | Frequency | Retention |
|---|---|---|---|
| All Gitea repositories (bare git dirs) | gitea dump --file | Daily at 2:00 AM | 7 days |
| SQLite databases (gitea.db, indexer.db, etc.) | Included in dump | Daily | 7 days |
| Attachments, avatars, hooks | Included in dump | Daily | 7 days |

Backup location: /var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz

Log file: /var/log/gitea-backup.log


Backup Architecture

The backup script bin/gitea-backup.sh runs daily via Hermes cron (cron/vps/gitea-daily-backup.yml). It:

  1. Locates the Gitea WORK_PATH by reading /etc/gitea/app.ini or falling back to common locations (/var/lib/gitea, /home/git/gitea)
  2. Invokes gitea dump --work-path <path> --file <backup-tar.gz> — Gitea's native, consistent snapshot mechanism
  3. Prunes archives older than 7 days
  4. Logs all operations to /var/log/gitea-backup.log
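
The four steps above can be sketched roughly as follows. This is an illustrative outline, not the actual bin/gitea-backup.sh: the function names (find_work_path, prune_old, run_backup), the environment-variable overrides other than GITEA_BIN, and the --run guard are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the four backup steps; not the real bin/gitea-backup.sh.
set -euo pipefail

GITEA_BIN="${GITEA_BIN:-/usr/local/bin/gitea}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/gitea}"
LOG_FILE="${LOG_FILE:-/var/log/gitea-backup.log}"
APP_INI="${APP_INI:-/etc/gitea/app.ini}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"

# Step 4: append a timestamped line to the log file.
log() { printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >>"$LOG_FILE"; }

# Step 1: read WORK_PATH from app.ini, else fall back to common locations.
find_work_path() {
  local p=""
  if [ -r "$APP_INI" ]; then
    p="$(sed -n 's/^[[:space:]]*WORK_PATH[[:space:]]*=[[:space:]]*//p' "$APP_INI" | head -n 1)"
  fi
  if [ -n "$p" ]; then printf '%s\n' "$p"; return 0; fi
  for p in /var/lib/gitea /home/git/gitea; do
    if [ -d "$p" ]; then printf '%s\n' "$p"; return 0; fi
  done
  return 1
}

# Step 3: delete archives whose age exceeds the retention window
# (find counts age in whole days).
prune_old() {
  find "$BACKUP_DIR" -maxdepth 1 -type f -name 'gitea-backup-*.tar.gz' \
    -mtime "+$RETENTION_DAYS" -delete
}

# Step 2 plus orchestration: dump, then prune, logging each stage.
run_backup() {
  local work_path archive
  work_path="$(find_work_path)" || { log "ERROR: WORK_PATH not found"; return 1; }
  archive="$BACKUP_DIR/gitea-backup-$(date '+%Y-%m-%d_%H%M%S').tar.gz"
  mkdir -p "$BACKUP_DIR"
  log "dump start: $archive"
  "$GITEA_BIN" dump --work-path "$work_path" --file "$archive" >>"$LOG_FILE" 2>&1 \
    || { log "ERROR: dump failed"; return 1; }
  prune_old
  log "dump done, pruned archives older than $RETENTION_DAYS days"
}

# Run only when asked, so the functions can be sourced and tested in isolation.
if [ "${1:-}" = "--run" ]; then run_backup; fi
```

Keeping the lookup and pruning in separate functions makes it possible to exercise them against a scratch directory without a Gitea binary installed.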

Prerequisites on the VPS:

  • Gitea binary available at /usr/local/bin/gitea (or set GITEA_BIN env var)
  • gitea dump command must be available (Gitea ≥ 1.12)
  • SSH access to the VPS for manual recovery operations
  • Sufficient disk space in /var/backups/gitea (typical dump: ~2-10 GB depending on repo count/size)

Recovery Time Objective (RTO) & Recovery Point Objective (RPO)

| Metric | Estimate |
|---|---|
| RPO (data loss window) | ≤ 24 hours (last daily backup) |
| RTO (time to restore) | ~45 minutes (cold restore from backup tarball) |
| Downtime impact | Gitea offline during restore (~20 min) |
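
The 24-hour RPO only holds while the daily job keeps producing archives. A minimal freshness check might look like the sketch below. The function names (newest_backup, backup_is_fresh) and the MAX_AGE_HOURS variable are hypothetical, and stat -c assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical RPO check: succeed only if the newest archive is recent enough.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/gitea}"
MAX_AGE_HOURS="${MAX_AGE_HOURS:-24}"

# Print the most recently modified archive, or nothing if none exist.
newest_backup() {
  ls -1t "$BACKUP_DIR"/gitea-backup-*.tar.gz 2>/dev/null | head -n 1
}

# Return 0 when the newest archive is at most MAX_AGE_HOURS old.
backup_is_fresh() {
  local newest age_s
  newest="$(newest_backup)"
  [ -n "$newest" ] || return 1
  age_s=$(( $(date +%s) - $(stat -c %Y "$newest") ))
  [ "$age_s" -le $(( MAX_AGE_HOURS * 3600 )) ]
}
```

A monitoring job could run backup_is_fresh shortly after the 2:00 AM window and alert on a non-zero exit; setting MAX_AGE_HOURS slightly above 24 leaves slack for dumps that take longer than usual.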

Step-by-Step Recovery Procedure

Phase 1 — Assess & Prepare (5 min)

  1. SSH into Ezra VPS: ssh root@143.198.27.163
  2. Stop Gitea so files are quiescent:
    systemctl stop gitea
    
  3. Confirm current Gitea data directory (for reference):
    gitea --work-path /var/lib/gitea --config /etc/gitea/app.ini dump --help 2>&1
    # Or check app.ini for WORK_PATH
    grep '^WORK_PATH' /etc/gitea/app.ini
    

Phase 2 — Restore from Backup (20 min)

  1. Choose the backup tarball to restore from:

    ls -lh /var/backups/gitea/
    # Pick the most recent: gitea-backup-2026-04-29_020001.tar.gz
    
  2. Optional: Move current data aside (safety copy):

    mv /var/lib/gitea /var/lib/gitea.bak-$(date +%s)
    
  3. Extract the backup in place:

    mkdir -p /var/lib/gitea
    tar -xzf /var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz -C /var/lib/gitea --strip-components=1
    

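    Note: --strip-components=1 assumes the archive contains a single top-level directory gitea-dump-<timestamp>, whose contents then land directly in /var/lib/gitea. Confirm the layout first with tar -tzf <archive> | head; if the files sit at the archive root, omit the flag. gitea dump also exports the database as an SQL file (gitea-db.sql) rather than copying the live SQLite file, so the database may need to be re-imported (for example with sqlite3) before Gitea is started.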

  4. Set correct ownership (typically git:git):

    chown -R git:git /var/lib/gitea
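
Before extracting, it can help to confirm that the chosen tarball is readable and to see how its entries are laid out, since that determines whether --strip-components=1 applies. The helper below is a sketch; archive_layout is a hypothetical name, not part of Gitea or of the backup script.

```shell
#!/usr/bin/env bash
# Hypothetical pre-restore check. Prints "single-dir" when every entry sits under
# one top-level directory (the case where --strip-components=1 applies), otherwise
# "flat". Returns non-zero if tar cannot read the archive.
archive_layout() {
  local archive="$1" listing entries roots
  # tar -tzf lists entries without extracting and fails on a corrupt archive.
  listing="$(tar -tzf "$archive")" || return 1
  # Normalise a leading "./" so both tar naming styles are handled alike.
  entries="$(printf '%s\n' "$listing" | sed 's#^\./##')"
  # Collect the distinct first path components.
  roots="$(printf '%s\n' "$entries" | awk -F/ '$1 != "" {print $1}' | sort -u)"
  if [ "$(printf '%s\n' "$roots" | wc -l)" -eq 1 ] \
     && printf '%s\n' "$entries" | grep -q '/.'; then
    echo "single-dir"
  else
    echo "flat"
  fi
}
```

For example, archive_layout /var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz printing "flat" would suggest extracting without --strip-components=1.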
    

Phase 3 — Restart & Validate (15 min)

  1. Start Gitea:

    systemctl start gitea
    
  2. Wait 30 seconds, then verify:

    systemctl status gitea
    # Check HTTP endpoint
    curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/  # Should be 200
    
  3. Log into Gitea UI and spot-check:

    • Home page loads
    • A few repositories are accessible
    • Attachments (avatars) render
    • Recent commits visible
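  4. If the web UI works but indices are stale, rebuild them and wait for the background jobs to finish. The subcommand below may not be available in your Gitea release; confirm it with gitea admin --help before relying on it:

    gitea admin index rebuild-repo --all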
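
Instead of a fixed 30-second wait, the HTTP check from step 2 can be polled until Gitea answers. The sketch below assumes the same localhost:3000 endpoint; retry, gitea_http_ok, and the GITEA_URL variable are hypothetical names introduced for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between tries. Returns 0 on the first success, 1 if all tries fail.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then return 0; fi
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
  done
  return 1
}

# True when the local Gitea endpoint answers with HTTP 200.
gitea_http_ok() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' "${GITEA_URL:-http://localhost:3000/}")" = "200" ]
}
```

A call such as retry 12 5 gitea_http_ok polls for up to about a minute and returns non-zero if Gitea never comes up, which is a cue to inspect systemctl status gitea.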
    

Post-Restore Checklist

  • Admin UI reachable at https://forge.alexanderwhitestone.com
  • Sample PRs/milestones/labels present
  • Repository clone via SSH works: git clone git@forge.alexanderwhitestone.com:Timmy_Foundation/timmy-config.git
  • Check backup script health: tail -n 20 /var/log/gitea-backup.log
  • Re-enable any disabled integrations (webhooks, CI/CD runners)
  • Notify the fleet: post to relevant channels confirming operational status

Known Issues & Workarounds

| Symptom | Likely cause | Fix |
|---|---|---|
| gitea: command not found | Binary at non-standard path | Set GITEA_BIN=/path/to/gitea in cron env |
| Permission denied on backup dir | Cron user lacks write access to /var/backups | mkdir -p /var/backups/gitea, then chown it to the user the cron job runs as (root:root if the job runs as root) |
| Restore fails: "database or disk is full" | Insufficient space on /var/lib/gitea | Expand disk or clean up old data first; backups require ~1.5x live data size |
| Old backup tarballs not deleting | Retention cron not firing | Check systemctl status hermes-cron and cron logs |

Off-Site Replication (Future Work)

Backups are currently stored on the same VPS they protect, so a disk or host failure would lose both the live data and the archives. For real resilience, replicate the archives to a secondary location:

  • Option A — rsync to second VPS (Push nightly to backup@backup-alexanderwhitestone.com:/backups/gitea/)
  • Option B — S3-compatible bucket with lifecycle policy
  • Option C — GitHub mirror of each repo using git push --mirror (already considered in issue #481 broader work)
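
As an illustration of Option A, a nightly push could be built around rsync as sketched below. This is future work and not something the repository implements; push_offsite, the REMOTE and DRY_RUN variables, and the choice of rsync flags are assumptions, while the remote target is the one named in Option A.

```shell
#!/usr/bin/env bash
# Hypothetical Option A sketch: copy local archives to the second VPS.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/gitea}"
REMOTE="${REMOTE:-backup@backup-alexanderwhitestone.com:/backups/gitea/}"

# Prints the rsync command by default; set DRY_RUN=0 to actually run it.
push_offsite() {
  # -a preserves attributes, --partial keeps interrupted transfers for resuming,
  # and the include/exclude pair limits the copy to backup archives.
  local cmd=(rsync -a --partial --include='gitea-backup-*.tar.gz' --exclude='*'
             "$BACKUP_DIR/" "$REMOTE")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Printing the command first makes it easy to review the source and destination before any data leaves the VPS. Remote retention would still need its own policy, since this sketch never deletes anything on the destination.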

Current scope: single-VPS backup only (single point of failure mitigated but not eliminated).