timmy-config/docs/issue-600-verification.md
Alexander Whitestone c072012983
docs: verify issue #600 visual scenes dataset
Add a verification note and regression test showing that
training/data/prompt-enhancement/visual-scenes-500.jsonl already
exists on main with 500 valid records.
2026-04-22 10:25:06 -04:00


Issue #600 Verification

Status: already implemented on main.

Issue: Prompt Enhancement: Visual Scenes — 500 Terse→Rich Pairs

What is already present on main

  • training/data/prompt-enhancement/visual-scenes-500.jsonl
  • 500 JSONL records
  • every record includes terse, rich, and domain
  • every domain value is visual scenes
  • 500/500 full records are unique

Evidence gathered from a fresh clone

  • python3 validation against training/data/prompt-enhancement/visual-scenes-500.jsonl returned:
    • count = 500
    • unique_records = 500
    • unique_terse = 435
    • domains = ['visual scenes']
    • missing_keys = 0
  • remote branch fix/600 still exists from closed PR #731 (feat: 500 visual scene prompt enhancement pairs (#600))
  • the SHA-256 of training/data/prompt-enhancement/visual-scenes-500.jsonl on origin/main matches the SHA-256 of the same file on fix/600, so the dataset delivered by that branch is byte-identical to what main already contains
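
The hash comparison in the last bullet can be reproduced along the following lines. This is a minimal sketch, assuming it runs inside a clone where both refs have been fetched; the helper names sha256_hex, blob_sha256, and same_on_both_refs are illustrative and not part of the repository, and the ref names in the usage comment assume the remote is called origin.

```python
import hashlib
import subprocess

DATASET = "training/data/prompt-enhancement/visual-scenes-500.jsonl"


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def blob_sha256(ref: str, path: str = DATASET) -> str:
    """Hash the contents of path as stored at the given git ref."""
    # `git show <ref>:<path>` writes the file's bytes at that ref to stdout.
    blob = subprocess.run(
        ["git", "show", f"{ref}:{path}"],
        check=True,
        capture_output=True,
    ).stdout
    return sha256_hex(blob)


def same_on_both_refs(ref_a: str, ref_b: str, path: str = DATASET) -> bool:
    """Return True when path is byte-identical at both refs."""
    return blob_sha256(ref_a, path) == blob_sha256(ref_b, path)


# Usage (assumed ref names, run from the repository root):
#   same_on_both_refs("origin/main", "origin/fix/600")
```

Reading the blobs through git show avoids checking out either branch, so the working tree stays untouched while the comparison runs.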

Verification commands

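python3 - <<'PY'
import json
from pathlib import Path

# Run from the repository root so the relative path resolves.
path = Path('training/data/prompt-enhancement/visual-scenes-500.jsonl')
# One JSON object per non-blank line.
records = [json.loads(line) for line in path.read_text(encoding='utf-8').splitlines() if line.strip()]
required = ('terse', 'rich', 'domain')
print('count', len(records))
# Serialize with sorted keys so key order does not affect the uniqueness check.
print('unique_records', len({json.dumps(r, sort_keys=True) for r in records}))
print('unique_terse', len({r.get('terse') for r in records}))
# str() keeps sorted() from raising if a record has no domain.
print('domains', sorted({str(r.get('domain')) for r in records}))
# Count records where a required key is absent or blank.
print('missing_keys', sum(any(k not in r or not str(r[k]).strip() for k in required) for r in records))
PY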
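
The script reports unique_records = 500 but unique_terse = 435, so some terse prompts appear in more than one record while the records as a whole still differ. To see which prompts repeat, a small follow-up like the sketch below could be used. Both load_jsonl and repeated_terse are hypothetical helpers written for this note, not code from the repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Parse one JSON object per non-blank line of a JSONL file."""
    text = Path(path).read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def repeated_terse(records):
    """Map each terse prompt that occurs more than once to its count."""
    counts = Counter(r["terse"] for r in records)
    return {terse: n for terse, n in counts.items() if n > 1}


# Usage (run from the repository root):
#   records = load_jsonl("training/data/prompt-enhancement/visual-scenes-500.jsonl")
#   for terse, n in sorted(repeated_terse(records).items()):
#       print(n, terse)
```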

Recommendation

  • Close issue #600 as already implemented on main.
  • This branch only adds a durable verification note and regression test so the zombie issue can be closed cleanly without regenerating duplicate training data.
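
The regression test itself is not reproduced in this note. As a rough sketch of what such a test might assert, the block below checks the same invariants listed under the evidence section. The names summarize and check_dataset are assumptions made for illustration, and the test that ships in the branch may be organized differently.

```python
import json
from pathlib import Path

DATASET = Path("training/data/prompt-enhancement/visual-scenes-500.jsonl")
REQUIRED = ("terse", "rich", "domain")


def summarize(lines):
    """Compute the invariants from the verification note for JSONL lines."""
    records = [json.loads(line) for line in lines if line.strip()]
    return {
        "count": len(records),
        "unique_records": len({json.dumps(r, sort_keys=True) for r in records}),
        "domains": sorted({str(r.get("domain")) for r in records}),
        "missing_keys": sum(
            any(k not in r or not str(r[k]).strip() for k in REQUIRED)
            for r in records
        ),
    }


def check_dataset(path=DATASET):
    """Assert the values reported in this note; raises AssertionError on drift."""
    summary = summarize(Path(path).read_text(encoding="utf-8").splitlines())
    assert summary["count"] == 500
    assert summary["unique_records"] == 500
    assert summary["domains"] == ["visual scenes"]
    assert summary["missing_keys"] == 0
```

Keeping summarize separate from the file read makes it possible to exercise the logic on a few in-memory lines, while check_dataset can be called from whatever test runner the repository uses.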