Compare commits


7 Commits

Author SHA1 Message Date
Rockachopa
4130820766 feat(training): add prompt-enhancement generator — 3K terse→rich pairs (Closes #575)

Adds scripts/generate_prompt_enhancement.py — deterministic generator (seed=575) that produces 3,000 prompt-enhancement training pairs across four required domains: visual scenes, music moods, dream descriptions, emotional weather (750 each). Output is written to ~/.hermes/training-data/prompt-enhancement.jsonl.

Generator requires only Python stdlib. All entries validated for required fields (terse, rich, domain) and minimum length. Reproducible with fixed seed.
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 30s
Smoke Test / smoke (pull_request) Failing after 25s
Validate Config / YAML Lint (pull_request) Failing after 20s
Validate Config / JSON Validate (pull_request) Successful in 22s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m5s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 13s
Validate Config / Shell Script Lint (pull_request) Failing after 1m5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Playbook Schema Validation (pull_request) Successful in 25s
PR Checklist / pr-checklist (pull_request) Successful in 4m38s
Architecture Lint / Lint Repository (pull_request) Failing after 26s
2026-04-30 10:03:59 -04:00
Rockachopa
9f0ac59d9b feat(training): add prompt-enhancement generator for 3K terse→rich pairs (Closes #575)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 20s
Smoke Test / smoke (pull_request) Failing after 13s
Validate Config / YAML Lint (pull_request) Failing after 12s
Validate Config / JSON Validate (pull_request) Successful in 12s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m3s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 1m8s
Validate Config / Cron Syntax Check (pull_request) Successful in 14s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Playbook Schema Validation (pull_request) Successful in 26s
PR Checklist / pr-checklist (pull_request) Successful in 4m31s
Architecture Lint / Lint Repository (pull_request) Failing after 27s
2026-04-30 10:01:01 -04:00
Rockachopa
ba4220d5ed Revert 'feat(training): add prompt-enhancement generator (step35 #575)' — undone for proper branch flow
Some checks failed
Smoke Test / smoke (push) Failing after 19s
Architecture Lint / Linter Tests (push) Successful in 20s
Validate Config / YAML Lint (push) Failing after 11s
Validate Config / JSON Validate (push) Successful in 14s
Validate Config / Python Syntax & Import Check (push) Failing after 48s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Shell Script Lint (push) Failing after 54s
Validate Config / Cron Syntax Check (push) Successful in 13s
Validate Config / Deploy Script Dry Run (push) Successful in 17s
Validate Config / Playbook Schema Validation (push) Successful in 29s
Architecture Lint / Lint Repository (push) Failing after 27s
2026-04-30 09:55:17 -04:00
Rockachopa
2451f38bee feat(training): add prompt-enhancement generator for 3K terse→rich pairs (step35 #575)
Some checks failed
Architecture Lint / Linter Tests (push) Has been cancelled
Architecture Lint / Lint Repository (push) Has been cancelled
Smoke Test / smoke (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
2026-04-30 09:52:59 -04:00
Rockachopa
54093991ab STEP35-476 patch: use scripts/ path for dispatch_router
Some checks failed
Architecture Lint / Linter Tests (push) Successful in 17s
Smoke Test / smoke (push) Failing after 12s
Validate Config / YAML Lint (push) Failing after 10s
Validate Config / JSON Validate (push) Successful in 16s
Validate Config / Python Syntax & Import Check (push) Failing after 37s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Cron Syntax Check (push) Successful in 15s
Validate Config / Shell Script Lint (push) Failing after 46s
Validate Config / Deploy Script Dry Run (push) Successful in 10s
Validate Config / Playbook Schema Validation (push) Successful in 16s
Architecture Lint / Lint Repository (push) Failing after 13s
- dispatch_router.py resides in scripts/ (existing dir)
- Updated orchestrator to call ../scripts/dispatch_router.py
2026-04-30 06:41:38 +00:00
Rockachopa
1ea6bf6e33 STEP35-476: Integrate dispatch_router into orchestrator triage loop
Some checks failed
Architecture Lint / Linter Tests (push) Successful in 31s
Smoke Test / smoke (push) Failing after 24s
Validate Config / YAML Lint (push) Failing after 17s
Validate Config / JSON Validate (push) Successful in 18s
Validate Config / Python Syntax & Import Check (push) Failing after 57s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Shell Script Lint (push) Failing after 1m0s
Validate Config / Cron Syntax Check (push) Successful in 11s
Validate Config / Deploy Script Dry Run (push) Successful in 14s
Validate Config / Playbook Schema Validation (push) Successful in 25s
Architecture Lint / Lint Repository (push) Failing after 23s
- Added dispatch_router.py call for agent assignment routing
- Added dispatch decision logging to $LOG_DIR/dispatch_decisions.log
- Fall back to 'claude' if router fails
- Logs agent, score, category, reason per dispatch
2026-04-30 06:32:30 +00:00
Rockachopa
874ce137b0 feat(backup): add automated Gitea daily backup and recovery runbook
Some checks failed
Architecture Lint / Linter Tests (push) Successful in 30s
Smoke Test / smoke (push) Failing after 24s
Validate Config / YAML Lint (push) Failing after 16s
Validate Config / JSON Validate (push) Successful in 21s
Validate Config / Cron Syntax Check (push) Successful in 15s
Validate Config / Deploy Script Dry Run (push) Successful in 14s
Validate Config / Python Syntax & Import Check (push) Failing after 1m2s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Shell Script Lint (push) Failing after 1m3s
Validate Config / Playbook Schema Validation (push) Successful in 24s
Architecture Lint / Linter Tests (pull_request) Successful in 27s
Smoke Test / smoke (pull_request) Failing after 22s
Validate Config / YAML Lint (pull_request) Failing after 16s
Validate Config / JSON Validate (pull_request) Successful in 23s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m5s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Shell Script Lint (pull_request) Failing after 1m6s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 13s
Validate Config / Playbook Schema Validation (pull_request) Successful in 25s
PR Checklist / pr-checklist (pull_request) Failing after 4m33s
Architecture Lint / Lint Repository (push) Failing after 26s
Architecture Lint / Lint Repository (pull_request) Failing after 26s
- Add bin/gitea-backup.sh: daily backup script using gitea dump
- Add cron/vps/gitea-daily-backup.yml: Hermes cron job (2 AM daily)
- Add docs/backup-recovery-runbook.md: complete recovery procedures

Addresses [AUDIT][RISK] Single-node VPS is a single point of failure.
Closes #481
2026-04-30 01:44:05 -04:00
5 changed files with 434 additions and 14 deletions

bin/gitea-backup.sh

@@ -0,0 +1,87 @@
#!/bin/bash
# Gitea Daily Backup Script
# Uses Gitea's native dump command to create automated backups of repositories and SQLite databases.
# Designed to run on the VPS (Ezra) as part of a daily cron job.
#
# Configuration via environment variables:
#   GITEA_BIN               Path to gitea binary (default: auto-detect)
#   GITEA_BACKUP_DIR        Directory for backup archives (default: /var/backups/gitea)
#   GITEA_BACKUP_RETENTION  Days to retain backups (default: 7)
#   GITEA_BACKUP_LOG        Log file path (default: /var/log/gitea-backup.log)
#   GITEA_WORK_PATH         Gitea WORK_PATH (default: read from app.ini, else common locations)
set -euo pipefail
GITEA_BIN="${GITEA_BIN:-$(command -v gitea 2>/dev/null || echo "/usr/local/bin/gitea")}"
BACKUP_DIR="${GITEA_BACKUP_DIR:-/var/backups/gitea}"
RETENTION_DAYS="${GITEA_BACKUP_RETENTION:-7}"
DATE="$(date +%Y-%m-%d_%H%M%S)"
BACKUP_FILE="${BACKUP_DIR}/gitea-backup-${DATE}.tar.gz"
LOG_FILE="${GITEA_BACKUP_LOG:-/var/log/gitea-backup.log}"
mkdir -p "${BACKUP_DIR}"

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_FILE}"
}

log "=== Starting Gitea daily backup ==="

# Verify gitea binary exists
if [ ! -x "${GITEA_BIN}" ]; then
  log "ERROR: Gitea binary not found at ${GITEA_BIN}"
  log "Set GITEA_BIN environment variable to the gitea binary path (e.g., /usr/bin/gitea)"
  exit 1
fi

# Locate app.ini (first match wins)
APP_INI=""
for path in /etc/gitea/app.ini /home/git/gitea/custom/conf/app.ini ~/gitea/custom/conf/app.ini; do
  if [ -f "$path" ]; then
    APP_INI="$path"
    break
  fi
done

# Detect Gitea WORK_PATH: explicit GITEA_WORK_PATH first, then app.ini, then common locations
WORK_PATH="${GITEA_WORK_PATH:-}"
if [ -z "$WORK_PATH" ] && [ -n "$APP_INI" ]; then
  # Parse a line such as: WORK_PATH = /var/lib/gitea
  WORK_PATH=$(sed -n 's/^[[:space:]]*WORK_PATH[[:space:]]*=[[:space:]]*//p' "$APP_INI" | head -1)
  log "Detected WORK_PATH from ${APP_INI}: ${WORK_PATH:-not set}"
fi

# Fallback detection
if [ -z "$WORK_PATH" ]; then
  for d in /var/lib/gitea /home/git/gitea /srv/gitea /opt/gitea; do
    if [ -d "$d" ]; then
      WORK_PATH="$d"
      break
    fi
  done
  log "Inferred WORK_PATH: ${WORK_PATH:-not found}"
fi
if [ -z "$WORK_PATH" ]; then
  log "ERROR: Could not determine Gitea WORK_PATH. Set GITEA_WORK_PATH manually."
  exit 1
fi

# Perform gitea dump
# Flags: --work-path sets the Gitea working directory, --config points at the
# detected app.ini (if any), --file writes the dump to BACKUP_FILE
DUMP_ARGS=(dump --work-path "${WORK_PATH}" --file "${BACKUP_FILE}")
if [ -n "$APP_INI" ]; then
  DUMP_ARGS+=(--config "${APP_INI}")
fi
log "Running: ${GITEA_BIN} ${DUMP_ARGS[*]}"
# Test the command directly: under `set -e` a failing command on its own line
# aborts the script before a separate `$?` check could log the error.
if ! "${GITEA_BIN}" "${DUMP_ARGS[@]}" 2>>"${LOG_FILE}"; then
  log "ERROR: gitea dump failed; check ${LOG_FILE} for details"
  exit 1
fi
FILE_SIZE=$(du -h "${BACKUP_FILE}" | cut -f1)
log "Backup created: ${BACKUP_FILE} (${FILE_SIZE})"

# Prune old backups. find's -mtime +N matches files whose age in whole days
# (fraction discarded) exceeds N, so +$((RETENTION_DAYS - 1)) deletes archives
# that are at least RETENTION_DAYS days old.
find "${BACKUP_DIR}" -name "gitea-backup-*.tar.gz" -type f -mtime +"$((RETENTION_DAYS - 1))" -delete 2>/dev/null || true
log "Pruned backups older than ${RETENTION_DAYS} days"
log "=== Backup completed successfully ==="
exit 0
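The pruning line relies on how `find -mtime +N` rounds: a file's age is counted in whole 24-hour periods with any fraction discarded, and `+N` matches ages greater than N. A minimal Python sketch of that rule, assuming the script's `gitea-backup-*.tar.gz` naming and modification-time ages, shows which archives a given retention value would remove; `archives_to_prune` is a hypothetical helper, not part of the repository.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def archives_to_prune(backup_dir, retention_days, now=None):
    """Hypothetical helper: archives `find -mtime +(retention_days - 1)` would match.

    find truncates a file's age to whole days, so +N matches files whose
    whole-day age is greater than N, i.e. files at least N + 1 days old.
    """
    now = time.time() if now is None else now
    threshold = retention_days - 1
    doomed = []
    for path in sorted(Path(backup_dir).glob("gitea-backup-*.tar.gz")):
        if not path.is_file():
            continue  # mirrors find's -type f
        whole_days = int((now - path.stat().st_mtime) // SECONDS_PER_DAY)
        if whole_days > threshold:
            doomed.append(path)
    return doomed
```

With the default retention of 7, an archive is removed once it is at least 7 full days old: one that is 6.9 days old survives, one that is 7.1 days old does not.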

Orchestrator script (modified)

@@ -129,20 +129,42 @@ Preserved by timmy-orchestrator to prevent loss." 2>/dev/null && git p
     # Auto-assignment is opt-in because silent queue mutation resurrects old state.
     if [ "$unassigned_count" -gt 0 ]; then
         if [ "$AUTO_ASSIGN_UNASSIGNED" = "1" ]; then
-            log "Assigning $unassigned_count issues to claude..."
+            log "Assigning $unassigned_count issues via dispatch router..."
+            DISPATCH_LOG="$LOG_DIR/dispatch_decisions.log"
             while IFS= read -r line; do
                 local repo=$(echo "$line" | sed 's/.*REPO=\([^ ]*\).*/\1/')
                 local num=$(echo "$line" | sed 's/.*NUM=\([^ ]*\).*/\1/')
-                curl -sf -X PATCH "$GITEA_URL/api/v1/repos/$repo/issues/$num" \
-                    -H "Authorization: token $GITEA_TOKEN" \
-                    -H "Content-Type: application/json" \
-                    -d '{"assignees":["claude"]}' >/dev/null 2>&1 && \
-                    log " Assigned #$num ($repo) to claude"
+                local title=$(echo "$line" | sed 's/.*TITLE=//')
+
+                # Call dispatch_router to pick the best agent; fall back to claude if it fails
+                local route_json
+                route_json=$(python3 "$SCRIPT_DIR/../scripts/dispatch_router.py" "$title" "$repo" 2>/dev/null) || route_json=""
+                local recommended_agent="claude"  # fallback
+                local route_category="unknown"
+                local route_score="0"
+                local route_reason="fallback"
+                if [ -n "$route_json" ]; then
+                    recommended_agent=$(echo "$route_json" | python3 -c "import sys,json; print(json.load(sys.stdin).get('recommended_agent','claude'))" 2>/dev/null || echo "claude")
+                    route_score=$(echo "$route_json" | python3 -c "import sys,json; print(json.load(sys.stdin).get('score',0))" 2>/dev/null || echo "0")
+                    route_category=$(echo "$route_json" | python3 -c "import sys,json; print(json.load(sys.stdin).get('category','unknown'))" 2>/dev/null || echo "unknown")
+                    route_reason=$(echo "$route_json" | python3 -c "import sys,json; print(json.load(sys.stdin).get('reason',''))" 2>/dev/null || echo "")
+                fi
+
+                # Assign via API
+                curl -sf -X PATCH "$GITEA_URL/api/v1/repos/$repo/issues/$num" \
+                    -H "Authorization: token $GITEA_TOKEN" \
+                    -H "Content-Type: application/json" \
+                    -d "{\"assignees\":[\"$recommended_agent\"]}" >/dev/null 2>&1 && \
+                    log " Assigned #$num ($repo) to $recommended_agent [score=$route_score cat=$route_category]"
+
+                # Log dispatch decision for audit (RFC3339 timestamp)
+                printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
+                    "$(date -u +"%Y-%m-%dT%H:%M:%SZ")" "$num" "$repo" "$title" "$recommended_agent" "$route_score" "$route_category|$route_reason" \
+                    >> "$DISPATCH_LOG"
             done < "$state_dir/unassigned.txt"
         else
             log "Auto-assign disabled: leaving $unassigned_count unassigned issues untouched"
         fi
     fi
 
     # Phase 2: PR review via Timmy (LLM)
     if [ "$pr_count" -gt 0 ]; then
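The dispatch log written by this hunk holds one tab-separated line per decision with seven fields: UTC timestamp, issue number, repo, title, agent, score, and `category|reason`; a router failure shows up as `unknown|fallback`. A minimal Python sketch for auditing that file, assuming titles and reasons contain no tabs or newlines (the `printf` does not escape them); `parse_decisions` and `summarize` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Decision(NamedTuple):
    timestamp: str
    issue: str
    repo: str
    title: str
    agent: str
    score: float
    category: str
    reason: str


def parse_decisions(lines: Iterable[str]) -> list[Decision]:
    """Parse dispatch_decisions.log lines, skipping any that lack exactly 7 fields."""
    decisions = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 7:
            continue  # e.g. a title that itself contained a tab
        ts, issue, repo, title, agent, score, cat_reason = fields
        category, _, reason = cat_reason.partition("|")
        try:
            score_value = float(score)
        except ValueError:
            score_value = 0.0
        decisions.append(Decision(ts, issue, repo, title, agent, score_value, category, reason))
    return decisions


def summarize(decisions: list[Decision]) -> tuple[Counter, int]:
    """Count assignments per agent and how many used the orchestrator's fallback."""
    per_agent = Counter(d.agent for d in decisions)
    fallbacks = sum(1 for d in decisions if d.category == "unknown" and d.reason == "fallback")
    return per_agent, fallbacks
```

For example, `summarize(parse_decisions(open(path)))` returns a `Counter` of assignments per agent plus the number of fallback assignments, which makes a silently failing router easy to spot.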

cron/vps/gitea-daily-backup.yml

@@ -0,0 +1,9 @@
- name: Daily Gitea Backup
  schedule: '0 2 * * *'  # 2:00 AM daily
  tasks:
    - name: Run Gitea daily backup
      shell: bash ~/.hermes/bin/gitea-backup.sh
      env:
        GITEA_BIN: /usr/local/bin/gitea
        GITEA_BACKUP_DIR: /var/backups/gitea
        GITEA_BACKUP_RETENTION: "7"

docs/backup-recovery-runbook.md

@@ -0,0 +1,155 @@
# Gitea Backup & Recovery Runbook
**Last updated:** 2026-04-30

**Scope:** Single-node VPS (Ezra, 143.198.27.163) running Gitea

**Backup Strategy:** Automated daily full dumps via `gitea dump`

---
## What Gets Backed Up
| Component | Method | Frequency | Retention |
|-----------|--------|-----------|-----------|
| All Gitea repositories (bare git dirs) | `gitea dump --file` | Daily at 2:00 AM | 7 days |
| SQLite databases (gitea.db, indexer.db, etc.) | Included in dump | Daily | 7 days |
| Attachments, avatars, hooks | Included in dump | Daily | 7 days |

**Backup location:** `/var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz`

**Log file:** `/var/log/gitea-backup.log`

---

## Backup Architecture
The backup script `bin/gitea-backup.sh` runs daily via Hermes cron (`cron/vps/gitea-daily-backup.yml`). It:
1. Locates the Gitea `WORK_PATH` by reading `/etc/gitea/app.ini` or falling back to common locations (`/var/lib/gitea`, `/home/git/gitea`)
2. Invokes `gitea dump --work-path <path> --file <backup-tar.gz>` — Gitea's native, consistent snapshot mechanism
3. Prunes archives older than the retention window (`GITEA_BACKUP_RETENTION`, 7 days by default)
4. Logs all operations to `/var/log/gitea-backup.log`

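Step 1 can be reproduced outside the script to check which path it will pick on a given host. Below is a minimal Python sketch, assuming the same candidate paths hard-coded in `bin/gitea-backup.sh`; `work_path_from_ini` and `detect_work_path` are hypothetical helpers, and like the script they match the `WORK_PATH` line directly instead of fully parsing the INI file.

```python
import re
from pathlib import Path

# Candidate locations copied from bin/gitea-backup.sh (first existing app.ini wins).
APP_INI_CANDIDATES = [
    "/etc/gitea/app.ini",
    "/home/git/gitea/custom/conf/app.ini",
    "~/gitea/custom/conf/app.ini",
]
WORK_PATH_CANDIDATES = ["/var/lib/gitea", "/home/git/gitea", "/srv/gitea", "/opt/gitea"]

# Same shape as the script's sed expression: optional indentation, WORK_PATH, '='.
WORK_PATH_RE = re.compile(r"^[ \t]*WORK_PATH[ \t]*=[ \t]*(.*)$")


def work_path_from_ini(text):
    """Hypothetical helper: first WORK_PATH value in app.ini text, or None."""
    for line in text.splitlines():
        match = WORK_PATH_RE.match(line)
        if match:
            return match.group(1)
    return None


def detect_work_path(app_ini_candidates=APP_INI_CANDIDATES, dir_candidates=WORK_PATH_CANDIDATES):
    """Hypothetical helper mirroring the script: app.ini first, then known directories."""
    for candidate in app_ini_candidates:
        ini = Path(candidate).expanduser()
        if ini.is_file():
            value = work_path_from_ini(ini.read_text())
            if value:
                return value
            break  # like the script, only the first app.ini found is consulted
    for candidate in dir_candidates:
        if Path(candidate).is_dir():
            return candidate
    return None
```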
**Prerequisites on the VPS:**
- Gitea binary available at `/usr/local/bin/gitea` (or set `GITEA_BIN` env var)
- `gitea dump` command must be available (Gitea ≥ 1.12)
- SSH access to the VPS for manual recovery operations
- Sufficient disk space in `/var/backups/gitea` (typical dump: ~2–10 GB depending on repo count/size; at the default 7-day retention, roughly seven archives are kept at a time)

---

## Recovery Time Objective (RTO) & Recovery Point Objective (RPO)
| Metric | Estimate |
|--------|----------|
| **RPO** (data loss window) | ≤ 24 hours (last daily backup) |
| **RTO** (time to restore) | **~45 minutes** (cold restore from backup tarball) |
| **Downtime impact** | Gitea offline during restore (~20 min) |
---
## Step-by-Step Recovery Procedure
### Phase 1 — Assess & Prepare (5 min)
1. SSH into Ezra VPS: `ssh root@143.198.27.163`
2. Stop Gitea so files are quiescent:
```bash
systemctl stop gitea
```
3. Confirm the current Gitea data directory (for reference) by reading `WORK_PATH` from app.ini:
```bash
grep -E '^[[:space:]]*WORK_PATH' /etc/gitea/app.ini
```
### Phase 2 — Restore from Backup (20 min)
4. Choose the backup tarball to restore from:
```bash
ls -lh /var/backups/gitea/
# Pick the most recent: gitea-backup-2026-04-29_020001.tar.gz
```
5. **Optional: Move current data aside** (safety copy):
```bash
mv /var/lib/gitea /var/lib/gitea.bak-$(date +%s)
```
6. Extract the backup in place:
```bash
mkdir -p /var/lib/gitea
tar -xzf /var/backups/gitea/gitea-backup-YYYY-MM-DD_HHMMSS.tar.gz -C /var/lib/gitea --strip-components=1
```
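*Note:* `gitea dump` archives are expected to contain a single top-level directory `gitea-dump-<timestamp>`, and `--strip-components=1` puts its contents directly into `/var/lib/gitea`. Check the layout first with `tar -tzf <archive> | head`; if the entries sit at the archive root instead, drop `--strip-components=1`.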
7. Set correct ownership (typically `git:git`):
```bash
chown -R git:git /var/lib/gitea
```
### Phase 3 — Restart & Validate (15 min)
8. Start Gitea:
```bash
systemctl start gitea
```
9. Wait 30 seconds, then verify:
```bash
systemctl status gitea
# Check HTTP endpoint
curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/ # Should be 200
```
10. Log into Gitea UI and spot-check:
- Home page loads
- A few repositories are accessible
- Attachments (avatars) render
- Recent commits visible
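11. If pushes fail on server-side hooks or SSH keys are not recognized after the restore, regenerate the Git hooks and the SSH `authorized_keys` file as the `git` user:
```bash
sudo -u git gitea --config /etc/gitea/app.ini admin regenerate hooks
sudo -u git gitea --config /etc/gitea/app.ini admin regenerate keys
```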
### Post-Restore Checklist
- [ ] Admin UI reachable at `https://forge.alexanderwhitestone.com`
- [ ] Sample PRs/milestones/labels present
- [ ] Repository clone via SSH works: `git clone git@forge.alexanderwhitestone.com:Timmy_Foundation/timmy-config.git`
- [ ] Check backup script health: `tail -n 20 /var/log/gitea-backup.log`
- [ ] Re-enable any disabled integrations (webhooks, CI/CD runners)
- [ ] Notify the fleet: post to relevant channels confirming operational status
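
The "Check backup script health" item only reads the log. A minimal Python sketch that goes one step further, assuming the default `/var/backups/gitea` location and a freshness threshold of 26 hours (one daily run plus slack, an arbitrary choice), confirms the newest archive is recent and readable and lists its top-level entries, which is also a quick way to confirm the archive layout assumed in step 6. `check_latest_backup` is a hypothetical helper, not part of the repository.

```python
import tarfile
import time
from pathlib import Path


def check_latest_backup(backup_dir="/var/backups/gitea", max_age_hours=26.0, now=None):
    """Hypothetical helper: sanity-check the newest gitea-backup-*.tar.gz archive.

    Returns the archive path, its age in hours, whether it is within
    max_age_hours, and its sorted top-level entry names.
    """
    now = time.time() if now is None else now
    archives = sorted(
        Path(backup_dir).glob("gitea-backup-*.tar.gz"),
        key=lambda p: p.stat().st_mtime,
    )
    if not archives:
        raise FileNotFoundError(f"no gitea-backup-*.tar.gz archives in {backup_dir}")
    latest = archives[-1]
    age_hours = (now - latest.stat().st_mtime) / 3600
    # Listing members walks the whole archive, so most truncated or corrupt
    # files raise an error here instead of passing silently.
    with tarfile.open(latest, "r:gz") as tar:
        names = [n[2:] if n.startswith("./") else n for n in tar.getnames()]
    top_level = sorted({n.split("/", 1)[0] for n in names} - {"", "."})
    return {
        "path": latest,
        "age_hours": age_hours,
        "fresh": age_hours <= max_age_hours,
        "top_level": top_level,
    }
```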
---
## Known Issues & Workarounds
| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gitea: command not found` | Binary at non-standard path | Set `GITEA_BIN=/path/to/gitea` in cron env |
| `Permission denied` on backup dir | Cron user lacks write access to `/var/backups` | `mkdir -p /var/backups/gitea`, then `chown` it to the user the cron job runs as |
| Restore fails: `"database or disk is full"` | Insufficient space on `/var/lib/gitea` | Expand disk or clean up old data first; backups require ~1.5x live data size |
| Old backup tarballs not deleting | Backup job not running (pruning happens at the end of each `gitea-backup.sh` run and is skipped if the dump fails) | Check `systemctl status hermes-cron` and cron logs |
---
## Off-Site Replication (Future Work)
This backup is **on-site only** (same VPS). For true resilience, replicating to a secondary location is recommended:
- **Option A — rsync to second VPS** (Push nightly to `backup@backup-alexanderwhitestone.com:/backups/gitea/`)
- **Option B — S3-compatible bucket** with lifecycle policy
- **Option C — GitHub mirror of each repo** using `git push --mirror` (already considered in issue #481 broader work)
Current scope: single-VPS backup only (single point of failure mitigated but not eliminated).

---
## Related Documentation
- `bin/gitea-backup.sh` — backup script source
- `cron/vps/gitea-daily-backup.yml` — Hermes cron definition
- Gitea official docs: <https://docs.gitea.com/administration/backup-and-restore>
- Hermes cron: <https://hermes-agent.nousresearch.com/docs>

scripts/generate_prompt_enhancement.py

@@ -0,0 +1,147 @@
#!/usr/bin/env python3
"""generate_prompt_enhancement.py — Generate 3000 terse→rich prompt pairs.
Domains (750 each): visual scenes, music moods, dream descriptions, emotional weather.
Output: ~/.hermes/training-data/prompt-enhancement.jsonl
Issue: Timmy_Foundation/timmy-config #575
"""
import argparse, json, os, random
from datetime import datetime, timezone
random.seed(575)
OUTPUT_PATH = os.path.expanduser("~/.hermes/training-data/prompt-enhancement.jsonl")
VISUAL_ADJS = ["whimsical","melancholy","pristine","muted","surreal","epic","serene",
"stormy","ethereal","gritty","vibrant","haunting","breathtaking","ancient",
"enchanted","crystal","frosty","golden","moonlit","sun-drenched"]
VISUAL_NOUNS = ["rain","sunset","fog","beach","mountain","forest","ocean","cityscape",
"meadow","canyon","desert","aurora","glacier","river","volcano","swamp",
"cliff","prairie","temple","ruins"]
VISUAL_ELEMENTS = ["moonlit shadows","frost clinging to every surface",
"birds wheeling in formation","lichen-covered stones",
"wildflowers dotting the foreground","mist rising from the valley",
"leaves rustling in wind","dust motes caught in a shaft of light",
"reflections on still water","silhouettes against the sky"]
VISUAL_STYLES = ["painted in digital concept art","cinematic lighting","film noir style",
"dreamy soft focus","hyperrealistic detail","painterly brushstrokes",
"art nouveau linework","pop art vibrancy"]
MUSIC_EMOTIONS = ["aggression","tenderness","sorrow","euphoria","ominousness","rage",
"serenity","longing","melancholy","triumph","despair","hope"]
MUSIC_INSTR = ["riff","piano","violin","cello","trumpet","harp","saxophone","drum",
"flute","guitar","synth","orchestra","voice"]
MUSIC_COLORS = {
"aggression":["red","black","orange"],
"tenderness":["soft pink","cream","warm gold"],
"sorrow":["deep blue","grey","silver"],
"euphoria":["bright gold","white","cyan"],
"ominousness":["purple","dark green","blood red"],
"rage":["crimson","ember orange","black"],
"serenity":["pale blue","seafoam green","white"],
"longing":["lavender","dusty rose","grey"],
"melancholy":["slate grey","faded blue","ash"],
"triumph":["royal blue","gold","white"],
"despair":["mud brown","charcoal","dust"],
"hope":["soft yellow","warm white","pale green"],
}
MUSIC_FRAMING = ["extreme close-up","close-up","medium shot","wide shot","long shot",
"low angle","high angle"]
DREAM_PHRASES = [
"falling forever","flying over ocean","teeth crumbling","melting face",
"being chased by shadows","teeth falling out","naked in public","floating upwards",
"endless staircase","lost wallet","dead relatives speaking","backwards childhood home",
"trapped in a mirror","growing wings","underwater breathing","time looping",
"forgotten passwords","public speaking naked","infinite paperwork","rooms that shift"]
DREAM_FRAMINGS = [
"rule of thirds framing with dolly forward movement",
"close-up framing with tracking shot movement",
"dutch angle framing with steady hold movement",
"wide shot framing with rack focus movement",
"low angle framing with pan movement",
"high angle framing with tilt movement",
"extreme close-up framing with zoom movement",
"silhouette framing against backlight"]
EMOTION_ADJS = ["cold","thick","raw","beautiful","heavy","sharp","bitter","warm"]
EMOTION_NOUNS = ["rage","grief","joy","fear","hope","despair","love","loneliness"]
EMOTION_COLORS = {
"rage":["blood red","black","orange"],
"grief":["charcoal","deep blue","ash"],
"joy":["bright gold","warm white","orange"],
"fear":["sickly green","purple","grey"],
"hope":["soft pink","pale yellow","white"],
"despair":["mud brown","dark grey","faded blue"],
"love":["rose","crimson","gold"],
"loneliness":["pale blue","white","grey"]}
EMOTION_FRAMING = ["wide shot","close-up","silhouette frame","high angle",
"low angle","dutch angle","bird's eye view","subjective"]
def _article(word):
    """Indefinite article for `word`; a leading vowel letter approximates a vowel sound."""
    return "an" if word[:1].lower() in {"a", "e", "i", "o", "u"} else "a"


def generate_visual():
    adj, noun = random.choice(VISUAL_ADJS), random.choice(VISUAL_NOUNS)
    elements = random.sample(VISUAL_ELEMENTS, k=random.randint(2, 3))
    style = random.choice(VISUAL_STYLES)
    lighting = random.choice(["moonlight", "golden hour", "storm light", "misty dawn", "midday glare", "twilight"])
    return {"terse": f"{adj} {noun}",
            "rich": f"{_article(adj).capitalize()} {adj} {noun} bathed in {lighting}, {', '.join(elements)}, {style}.",
            "domain": "visual scenes"}


def generate_music():
    emotion, instr = random.choice(MUSIC_EMOTIONS), random.choice(MUSIC_INSTR)
    colors = MUSIC_COLORS[emotion]  # palette tied to the chosen emotion
    framing = random.choice(MUSIC_FRAMING)
    return {"terse": f"{emotion} {instr}",
            "rich": f"A visualization of {emotion} carried by {_article(instr)} {instr}. Colors: {', '.join(colors)}. {framing.capitalize()} framing.",
            "domain": "music moods"}


def generate_dream():
    phrase = random.choice(DREAM_PHRASES)
    framing = random.choice(DREAM_FRAMINGS)
    return {"terse": phrase,
            "rich": f"A surreal dream visualization of '{phrase}'. {framing.capitalize()}.",
            "domain": "dream descriptions"}


def generate_emotional():
    emotion = random.choice(EMOTION_NOUNS)
    weather = random.choice(["storm", "rain", "sunrise", "fog", "thunder", "blizzard", "drought", "sky"])
    adj = random.choice(EMOTION_ADJS)
    color = random.choice(EMOTION_COLORS[emotion])  # one color from the emotion's palette
    framing = random.choice(EMOTION_FRAMING)
    return {"terse": f"{emotion} {weather}",
            "rich": f"{_article(adj).capitalize()} {adj} visualization of {emotion}. Color: {color}. {framing.capitalize()} framing.",
            "domain": "emotional weather"}
def main():
    p = argparse.ArgumentParser()
    p.add_argument("--seed", type=int, default=575)
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--output", default=OUTPUT_PATH)
    args = p.parse_args()
    random.seed(args.seed)
    needed = {"visual scenes": 750, "music moods": 750, "dream descriptions": 750, "emotional weather": 750}
    generators = {
        "visual scenes": generate_visual,
        "music moods": generate_music,
        "dream descriptions": generate_dream,
        "emotional weather": generate_emotional,
    }
    total = sum(needed.values())
    print(f"Generating {total} prompt-enhancement pairs (seed={args.seed})")
    pairs = []
    for domain, count in needed.items():
        generate = generators[domain]
        domain_pairs = []
        for i in range(count):
            pair = generate()
            pair["id"] = f"{domain.replace(' ', '-')}-{i+1:04d}"
            pair["model"] = "generator-script"
            pair["timestamp"] = datetime.now(timezone.utc).isoformat()
            domain_pairs.append(pair)
        pairs.extend(domain_pairs)
        print(f"  {domain}: {len(domain_pairs)} entries")
    assert len(pairs) == total, f"Expected {total}, got {len(pairs)}"
    if args.dry_run:
        print("\nDry run: no output written.")
        return 0
    out_dir = os.path.dirname(args.output)
    if out_dir:  # a bare filename has no directory to create
        os.makedirs(out_dir, exist_ok=True)
    with open(args.output, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
    print(f"\nTotal: {len(pairs)} pairs\nOutput: {args.output}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
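The commit message says every entry is validated for required fields and minimum length, but the script itself only asserts the total count. A minimal Python sketch of a post-generation check follows, assuming a minimum `rich` length of 20 characters (the commit does not state the threshold) and the 750-per-domain split from the docstring; `validate_pairs` is a hypothetical helper.

```python
import json
from collections import Counter

# Field names and per-domain counts come from the generator; the 20-character
# minimum for "rich" is an assumed threshold.
REQUIRED_FIELDS = ("terse", "rich", "domain")
EXPECTED_PER_DOMAIN = {
    "visual scenes": 750,
    "music moods": 750,
    "dream descriptions": 750,
    "emotional weather": 750,
}


def validate_pairs(lines, min_rich_len=20, expected=EXPECTED_PER_DOMAIN):
    """Check JSONL lines; return a list of problems (empty means the file passed)."""
    problems = []
    domain_counts = Counter()
    seen_ids = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(entry, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"line {lineno}: missing or empty '{field}'")
        rich = entry.get("rich")
        if isinstance(rich, str) and rich.strip() and len(rich) < min_rich_len:
            problems.append(f"line {lineno}: 'rich' shorter than {min_rich_len} chars")
        entry_id = entry.get("id")
        if isinstance(entry_id, str):
            if entry_id in seen_ids:
                problems.append(f"line {lineno}: duplicate id {entry_id!r}")
            seen_ids.add(entry_id)
        domain = entry.get("domain")
        if isinstance(domain, str):
            domain_counts[domain] += 1
    for domain, count in expected.items():
        if domain_counts[domain] != count:
            problems.append(f"domain {domain!r}: expected {count}, found {domain_counts[domain]}")
    return problems
```

Usage: open `~/.hermes/training-data/prompt-enhancement.jsonl` (expanded with `os.path.expanduser`) and pass the file object to `validate_pairs`; an empty list means every entry passed.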