Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 25s
Smoke Test / smoke (pull_request) Failing after 24s
Validate Config / YAML Lint (pull_request) Failing after 16s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 58s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Shell Script Lint (pull_request) Failing after 58s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 11s
Validate Config / Playbook Schema Validation (pull_request) Successful in 25s
Architecture Lint / Lint Repository (pull_request) Failing after 21s
PR Checklist / pr-checklist (pull_request) Successful in 3m20s
- Add docs/issue-605-verification.md documenting dataset presence
- Add tests/test_prompt_enhancement_video_scenes.py for regression
- 500 record video-scenes-500.jsonl dataset verified on main
- Closes #605
2.2 KiB
Issue #605 Verification
Status: already implemented on main.
Issue: Prompt Enhancement: Video Scenes — 500 Terse→Rich Pairs
What is already present on main
`training/data/prompt-enhancement/video-scenes-500.jsonl`
- 500 JSONL records
- every record includes `terse`, `rich`, and `domain`
- every `domain` value is `video scenes`
- 500/500 full records are unique
- every `rich` prompt includes video-scene structure markers for `lighting`, `composition`, and `transition`
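The per-record properties listed above can be expressed as a small check. This is a minimal sketch, not the project's own validator: the function name `record_problems` and the sample record are hypothetical and were invented for illustration.

```python
REQUIRED_KEYS = ("terse", "rich", "domain")
MARKERS = ("lighting", "composition", "transition")


def record_problems(record: dict) -> list:
    """Return a list of problems found in one record; an empty list means it passes."""
    problems = []
    # Every required key must be present and non-blank.
    for key in REQUIRED_KEYS:
        if not str(record.get(key, "")).strip():
            problems.append(f"missing or empty key: {key}")
    # The domain must be exactly the expected value.
    if record.get("domain") != "video scenes":
        problems.append("domain is not 'video scenes'")
    # Structure markers are matched case-insensitively inside the rich prompt.
    rich = str(record.get("rich", "")).lower()
    for marker in MARKERS:
        if marker not in rich:
            problems.append(f"rich prompt lacks marker: {marker}")
    return problems


# Hypothetical record, invented for illustration only (not taken from the dataset).
sample = {
    "terse": "sunset over a harbor",
    "rich": "Lighting: warm low sun. Composition: wide shot. Transition: slow fade.",
    "domain": "video scenes",
}
print(record_problems(sample))
```

Returning a list of problems rather than a boolean makes a failing record easier to diagnose when the check runs over all 500 lines.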
Evidence gathered from a fresh clone
- validation against `training/data/prompt-enhancement/video-scenes-500.jsonl` returned: `count = 500`, `unique_records = 500`, `unique_terse = 120`, `domains = ['video scenes']`, `missing_keys = 0`
- all 500 `rich` prompts contain `lighting`, `composition`, and `transition`
- closed PRs `#755` (`fix/605`) and `#648` (`feat/605-video-scenes-prompts`) show prior attempts on this lane
- SHA-256 of `training/data/prompt-enhancement/video-scenes-500.jsonl` on `origin/main` matches the same file on remote branch `fix/605`, which shows the requested dataset is already present on `main`
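The note does not say how the SHA-256 comparison was run. One way to reproduce it, assuming the two branches are checked out into separate working directories, is sketched here; the helper names `sha256_of` and `same_content` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(a: Path, b: Path) -> bool:
    """True when both files have identical SHA-256 digests."""
    return sha256_of(a) == sha256_of(b)
```

With `origin/main` and `fix/605` checked out side by side (the directory layout is an assumption), calling `same_content` on the two copies of `video-scenes-500.jsonl` should return `True` if the evidence above holds.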
Verification commands
python3 - <<'PY'
import json
from pathlib import Path
path = Path('training/data/prompt-enhancement/video-scenes-500.jsonl')
records = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
print('count', len(records))
print('unique_records', len({json.dumps(r, sort_keys=True) for r in records}))
print('unique_terse', len({r['terse'] for r in records}))
print('domains', sorted({r.get('domain') for r in records}))
print('missing_keys', sum(any(k not in r or not str(r[k]).strip() for k in ('terse', 'rich', 'domain')) for r in records))
print('lighting_count', sum('lighting' in r['rich'].lower() for r in records))
print('composition_count', sum('composition' in r['rich'].lower() for r in records))
print('transition_count', sum('transition' in r['rich'].lower() for r in records))
PY
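The same dataset-level checks could be turned into assertions for a regression test. The contents of `tests/test_prompt_enhancement_video_scenes.py` are not shown here, so the following is only a sketch under that assumption; `summarize` and `check_dataset` are hypothetical names.

```python
import json


def summarize(jsonl_text: str) -> dict:
    """Parse JSONL text and compute the dataset-level figures printed by the command above."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return {
        "count": len(records),
        # Serialize with sorted keys so key order does not affect uniqueness.
        "unique_records": len({json.dumps(r, sort_keys=True) for r in records}),
        "unique_terse": len({r.get("terse") for r in records}),
        "domains": sorted({str(r.get("domain")) for r in records}),
    }


def check_dataset(jsonl_text: str, expected_count: int = 500) -> None:
    """Raise AssertionError if the dataset drifts from the verified state."""
    summary = summarize(jsonl_text)
    assert summary["count"] == expected_count, summary
    assert summary["unique_records"] == summary["count"], summary
    assert summary["domains"] == ["video scenes"], summary
```

A test function would read the file with `Path(...).read_text(encoding="utf-8")` and pass the text to `check_dataset`. Keeping parsing separate from file access lets the checks be exercised on small in-memory samples as well.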
Recommendation
- Close issue #605 as already implemented on `main`.
- This branch only adds a durable verification note and regression test so the open issue can be closed cleanly without regenerating duplicate training data.