Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 19s
Smoke Test / smoke (pull_request) Failing after 24s
Validate Config / YAML Lint (pull_request) Failing after 20s
Validate Config / JSON Validate (pull_request) Successful in 19s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m0s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 24s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 4s
Validate Config / Playbook Schema Validation (pull_request) Successful in 8s
Architecture Lint / Lint Repository (pull_request) Failing after 7s
PR Checklist / pr-checklist (pull_request) Successful in 6m17s
Add a verification note and regression test showing that training/data/prompt-enhancement/video-scenes-500.jsonl already exists on main with 500 valid records.
47 lines
2.2 KiB
Markdown
# Issue #605 Verification

Status: already implemented on `main`.

Issue: Prompt Enhancement: Video Scenes — 500 Terse→Rich Pairs

What is already present on `main`
- `training/data/prompt-enhancement/video-scenes-500.jsonl`
- 500 JSONL records
- every record includes `terse`, `rich`, and `domain`
- every `domain` value is `video scenes`
- 500/500 full records are unique
- every `rich` prompt includes video-scene structure markers for `lighting`, `composition`, and `transition`

Evidence gathered from a fresh clone
- validation against `training/data/prompt-enhancement/video-scenes-500.jsonl` returned:
  - `count = 500`
  - `unique_records = 500`
  - `unique_terse = 120`
  - `domains = ['video scenes']`
  - `missing_keys = 0`
  - all 500 `rich` prompts contain `lighting`, `composition`, and `transition`
- closed PRs `#755` (`fix/605`) and `#648` (`feat/605-video-scenes-prompts`) show prior attempts on this lane
- SHA-256 of `training/data/prompt-enhancement/video-scenes-500.jsonl` on `origin/main` matches the same file on remote branch `fix/605`, which shows the requested dataset is already present on `main`

Verification commands
```bash
python3 - <<'PY'
import json
from pathlib import Path

# Load every non-blank line of the dataset as one JSON record.
path = Path('training/data/prompt-enhancement/video-scenes-500.jsonl')
records = [json.loads(line) for line in path.read_text(encoding='utf-8').splitlines() if line.strip()]
print('count', len(records))
# Serialize with sorted keys so key order cannot hide duplicate records.
print('unique_records', len({json.dumps(r, sort_keys=True) for r in records}))
print('unique_terse', len({r.get('terse') for r in records}))
print('domains', sorted({str(r.get('domain')) for r in records}))
# Count records where a required key is absent or blank.
print('missing_keys', sum(any(k not in r or not str(r[k]).strip() for k in ('terse', 'rich', 'domain')) for r in records))
# Case-insensitive substring checks for the three structure markers.
print('lighting_count', sum('lighting' in str(r.get('rich', '')).lower() for r in records))
print('composition_count', sum('composition' in str(r.get('rich', '')).lower() for r in records))
print('transition_count', sum('transition' in str(r.get('rich', '')).lower() for r in records))
PY
```
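The SHA-256 comparison cited in the evidence is not covered by the commands above. A minimal sketch of one way to reproduce it follows, assuming the remote is named `origin` and the branch is reachable as `origin/fix/605`; the helper names `sha256_hex`, `blob_sha256`, and `same_on_both` are hypothetical and not part of the repository.

```python
import hashlib
import subprocess

DATASET = 'training/data/prompt-enhancement/video-scenes-500.jsonl'


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def blob_sha256(ref: str, path: str = DATASET) -> str:
    """Hash the content of `path` as stored at git ref `ref`."""
    # `git show <ref>:<path>` writes the file content at that ref to stdout.
    result = subprocess.run(
        ['git', 'show', f'{ref}:{path}'],
        check=True,
        capture_output=True,
    )
    return sha256_hex(result.stdout)


def same_on_both(ref_a: str, ref_b: str, path: str = DATASET) -> bool:
    """True when the file is byte-identical at both refs."""
    return blob_sha256(ref_a, path) == blob_sha256(ref_b, path)


# Example call from inside a clone (ref names are assumptions):
# same_on_both('origin/main', 'origin/fix/605')
```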
Recommendation
- Close issue #605 as already implemented on `main`.
- This branch only adds a durable verification note and regression test so the open issue can be closed cleanly without regenerating duplicate training data.