Commit Graph

3 Commits

Author SHA1 Message Date
Alexander Whitestone
eb0ff150ea feat: JSON schema + validator for scene description training data (#647)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 26s
Smoke Test / smoke (pull_request) Failing after 18s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 17s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m25s
Validate Config / Shell Script Lint (pull_request) Failing after 1m4s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 13s
Validate Config / Playbook Schema Validation (pull_request) Successful in 20s
Validate Training Data / validate (pull_request) Successful in 22s
PR Checklist / pr-checklist (pull_request) Failing after 5m17s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- schema.json: JSON Schema for scene description entries (song, artist,
  beat, timestamp, duration_seconds, lyric_line, scene with
  mood/colors/composition/camera/description)
- validate.py: CLI validator for JSONL files (no jsonschema dependency,
  manual fallback). Validates all fields, types, ranges, empty values.
- .gitea/workflows/validate-training-data.yml: CI check on PR

Tested: 100/100 entries in scene-descriptions-r&b-soul.jsonl pass.

Closes #647
2026-04-14 23:11:51 -04:00
Alexander Whitestone
f525c0e094 feat(training): R&B/Soul scene descriptions — 100 lyrics->visual sets (#613)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 25s
Smoke Test / smoke (pull_request) Failing after 14s
Validate Config / YAML Lint (pull_request) Failing after 11s
Validate Config / JSON Validate (pull_request) Successful in 11s
PR Checklist / pr-checklist (pull_request) Failing after 1m14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 43s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 41s
Validate Config / Playbook Schema Validation (pull_request) Successful in 14s
Architecture Lint / Lint Repository (pull_request) Failing after 19s
Validate Config / Python Test Suite (pull_request) Has been cancelled
10 R&B/Soul songs, 10 visual beats each = 100 scene description sets.

Songs: Velvet Hours, Ghost Note, Kitchen Floor, After Midnight Caller,
Grandmother's Hands, Concrete Lullaby, Blue Note Serenade, Sunday Driver,
Pillow Talk Confessions, Crowns and Thorns.

Each beat includes: timestamp, duration, lyric_line, scene (mood, colors,
composition, camera movement, description).

Moods span: intimacy, grief, joy, defiance, nostalgia, passion, faith,
resilience, vulnerability, triumph.

Output: training/data/scene-descriptions/scene-descriptions-r&b-soul.jsonl
Also deployed to: ~/.hermes/training-data/

Closes #613
2026-04-14 18:35:51 -04:00
perplexity
6507cffc15 feat: migrate autolora pipeline into training/
Per direction shift (the-nexus#542).

Replaces the autolora repo (1,500 lines of custom pipeline code)
with config files for existing tools:

- axolotl.yaml: replaces train_modal.py (239 lines)
- mlx-lora.yaml: replaces MLX training scripts
- eval-tasks.yaml: replaces run_eval.py (300 lines)
- Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py

Data migrated as-is:
- curated_dataset.jsonl (26 gold-standard conversations)
- preference_pairs.jsonl (DPO pairs)
- prompts_vibes.yaml, prompts_nexus_vibes.yaml
- v0-baseline eval results (historical record)

Thin glue kept:
- build_curated.py (data authoring, not infrastructure)
- ingest_trajectories.py (domain-specific quality filter)

Dependencies: pip install axolotl mlx-lm lm-evaluation-harness
2026-03-25 23:05:50 +00:00