Commit Graph

27 Commits

Author SHA1 Message Date
Merge Bot
8beac54407 Merge PR #724: training/run_adversary_eval.py (added) 2026-04-16 05:04:14 +00:00
Merge Bot
10e32defff Merge PR #724: training/data/prompts_adversary_value_violations.yaml (added) 2026-04-16 05:04:12 +00:00
Merge Bot
fcaf75a9fe Merge PR #731: training/data/prompt-enhancement/visual-scenes-500.jsonl (added) 2026-04-16 05:03:47 +00:00
Merge Bot
c7f0c081fc Merge PR #751: training/test_training_pair_provenance.py (added) 2026-04-16 05:03:01 +00:00
Merge Bot
1cd56a06ce Merge PR #767: training/validate_provenance.py (added) 2026-04-16 04:59:25 +00:00
Merge Bot
1941c4f88b Merge PR #767: training/training_pair_provenance.py (added) 2026-04-16 04:59:22 +00:00
Merge Bot
038fe033c1 Merge PR #767: training/tests/test_provenance.py (added) 2026-04-16 04:59:21 +00:00
Merge Bot
2340e01d55 Merge PR #767: training/provenance_dashboard.py (added) 2026-04-16 04:59:19 +00:00
Merge Bot
a653434dbb Merge PR #786: training/scripts/quality_filter.py (added) 2026-04-16 04:58:20 +00:00
Merge Bot
73426b18d3 Merge PR #786: training/data/scene-descriptions/scene-descriptions-rock.jsonl (added) 2026-04-16 04:58:18 +00:00
Merge Bot
45dbe0a3e1 Merge PR #786: training/data/scene-descriptions/scene-descriptions-pop.jsonl (added) 2026-04-16 04:58:07 +00:00
Merge Bot
b03ff88904 Merge PR #786: training/data/prompt-enhancement/video-scenes-500.jsonl (added) 2026-04-16 04:58:06 +00:00
Merge Bot
f1087d4877 Merge PR #786: training/data/prompt-enhancement/music-moods-500.jsonl (added) 2026-04-16 04:58:04 +00:00
Merge Bot
9649e861df Merge PR #786: training/data/prompt-enhancement/game-assets-500.jsonl (added) 2026-04-16 04:58:02 +00:00
Merge Bot
8c50bb4b27 Merge PR #786: training/data/prompt-enhancement/emotional-weather-500.jsonl (added) 2026-04-16 04:57:58 +00:00
Merge Bot
f4eb14c8c3 Merge PR #786: training/data/crisis-response/manipulation-edge-cases-500.jsonl (added) 2026-04-16 04:57:56 +00:00
667cdfd51b Merge pull request 'feat: Electronic scene descriptions — 100 lyrics→visual sets (#609)' (#746) from fix/609 into main 2026-04-15 16:03:41 +00:00
Alexander Whitestone
0fdfb8e65b feat: Electronic scene descriptions — 100 lyrics->visual sets (#609)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 21s
Smoke Test / smoke (pull_request) Failing after 14s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 15s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m21s
Validate Config / Shell Script Lint (pull_request) Failing after 24s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 8s
Validate Config / Playbook Schema Validation (pull_request) Successful in 13s
Validate Training Data / validate (pull_request) Successful in 9s
PR Checklist / pr-checklist (pull_request) Failing after 6m11s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
10 Electronic songs, 10 visual beats each = 100 scene description sets.

Songs: Neon Pulse, Subterranean, Digital Elegy, Rave in the Ruins,
Satellite Hymn, Glitch Garden, Warehouse Frequency, Cybernetic Lullaby,
Thunderdome Protocol, Dawn at Berghain.

Closes #609
2026-04-15 11:47:59 -04:00
Alexander Whitestone
b62748f51d feat: Folk scene descriptions — 100 lyrics->visual sets (#610)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 13s
PR Checklist / pr-checklist (pull_request) Failing after 2m50s
Smoke Test / smoke (pull_request) Failing after 5s
Validate Config / YAML Lint (pull_request) Failing after 4s
Validate Config / JSON Validate (pull_request) Successful in 5s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 24s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 37s
Validate Config / Playbook Schema Validation (pull_request) Successful in 16s
Validate Training Data / validate (pull_request) Successful in 15s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
10 Folk songs, 10 visual beats each = 100 scene description sets.

Songs: Dust Bowl Daughter, Lantern in the Window, River Baptism,
Coal Miner's Lullaby, Wildflower Road, Grandmother's Kitchen,
Harbor Song, Holler Echo, Train Whistle Gospel, Old Growth.

Closes #610
2026-04-15 11:40:46 -04:00
Alexander Whitestone
5ef9bbe98c feat: Jazz scene descriptions — 100 lyrics->visual sets (#611)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 6s
PR Checklist / pr-checklist (pull_request) Failing after 1m49s
Smoke Test / smoke (pull_request) Failing after 6s
Validate Config / YAML Lint (pull_request) Failing after 5s
Validate Config / JSON Validate (pull_request) Successful in 8s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 40s
Validate Config / Shell Script Lint (pull_request) Failing after 15s
Validate Config / Cron Syntax Check (pull_request) Successful in 4s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
Validate Config / Playbook Schema Validation (pull_request) Successful in 9s
Validate Training Data / validate (pull_request) Successful in 7s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
10 Jazz songs, 10 visual beats each = 100 scene description sets.

Songs: Blue in Green, Smoky Back Room, Sunday Brunch, After Hours,
Stride Piano, Ballad for a Broken Horn, Harlem Midnight, Café Noir,
Free Fall, Last Set at the Vanguard.

Closes #611
2026-04-15 11:33:47 -04:00
Alexander Whitestone
0221be9460 feat: Classical scene descriptions — 100 lyrics->visual sets (#612)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 20s
Smoke Test / smoke (pull_request) Failing after 14s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m16s
Validate Config / Shell Script Lint (pull_request) Failing after 29s
Validate Config / Cron Syntax Check (pull_request) Successful in 6s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
PR Checklist / pr-checklist (pull_request) Failing after 3m14s
Validate Config / Playbook Schema Validation (pull_request) Successful in 13s
Validate Training Data / validate (pull_request) Successful in 9s
Validate Config / Python Test Suite (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
10 Classical songs, 10 visual beats each = 100 scene description sets.

Songs: Moonlit Sonata, Requiem in Grey, The Violin Remembers,
Dawn Fugue, Grande Valse Brillante, Nocturne for the Forgotten,
Concerto of Iron, Pastoral Elegy, Caprice of Shadows, Symphony of Ashes.

Closes #612
2026-04-15 11:29:03 -04:00
Alexander Whitestone
b2b3d59bad feat: Metal scene descriptions — 100 lyrics->visual sets (#615)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 13s
PR Checklist / pr-checklist (pull_request) Failing after 2m42s
Smoke Test / smoke (pull_request) Failing after 18s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 13s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m2s
Validate Config / Shell Script Lint (pull_request) Failing after 39s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 19s
Validate Training Data / validate (pull_request) Successful in 13s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
10 Metal songs, 10 visual beats each = 100 scene description sets.

Songs: Iron Crown, Void Cathedral, Molten Psalm, Serpent Hymn,
Bone Harvest, Electric Funeral, Obsidian Throne, Witch Pyre,
Grave New World, Wolves in the Walls.

Moods span: rage, dread, defiance, apocalypse, transcendence,
brutality, paranoia, hope.

Closes #615
2026-04-15 11:19:04 -04:00
Alexander Whitestone
3603030235 feat: training data augmentation — paraphrase and translate pairs (#695)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 22s
Smoke Test / smoke (pull_request) Failing after 18s
Validate Config / YAML Lint (pull_request) Failing after 23s
Validate Config / JSON Validate (pull_request) Successful in 21s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m54s
Validate Config / Shell Script Lint (pull_request) Failing after 54s
Validate Config / Cron Syntax Check (pull_request) Successful in 16s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 16s
Validate Config / Playbook Schema Validation (pull_request) Successful in 23s
PR Checklist / pr-checklist (pull_request) Failing after 11m2s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
augment_pairs.py: generates paraphrases and translations for any
JSONL training file.

Features:
- Auto-detects text field (rich, terse, text, content, lyric_line, etc.)
- N paraphrases per entry (template-based, or LLM with --llm-endpoint)
- Translations to ES, FR, DE (template dictionary, or LLM)
- Outputs augmented JSONL alongside originals
- Marks each augmented entry with _augmentation, _original, _language

Usage:
  python3 augment_pairs.py --input data.jsonl
  python3 augment_pairs.py --input data.jsonl --paraphrases 5 --langs es,fr
  python3 augment_pairs.py --input data.jsonl --llm-endpoint http://localhost:11434/v1

Closes #695
2026-04-15 07:51:38 -04:00
Alexander Whitestone
eb0ff150ea feat: JSON schema + validator for scene description training data (#647)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 26s
Smoke Test / smoke (pull_request) Failing after 18s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 17s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m25s
Validate Config / Shell Script Lint (pull_request) Failing after 1m4s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 13s
Validate Config / Playbook Schema Validation (pull_request) Successful in 20s
Validate Training Data / validate (pull_request) Successful in 22s
PR Checklist / pr-checklist (pull_request) Failing after 5m17s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- schema.json: JSON Schema for scene description entries (song, artist,
  beat, timestamp, duration_seconds, lyric_line, scene with
  mood/colors/composition/camera/description)
- validate.py: CLI validator for JSONL files (no jsonschema dependency,
  manual fallback). Validates all fields, types, ranges, empty values.
- .gitea/workflows/validate-training-data.yml: CI check on PR

Tested: 100/100 entries in scene-descriptions-r&b-soul.jsonl pass.

Closes #647
2026-04-14 23:11:51 -04:00
Alexander Whitestone
f525c0e094 feat(training): R&B/Soul scene descriptions — 100 lyrics->visual sets (#613)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 25s
Smoke Test / smoke (pull_request) Failing after 14s
Validate Config / YAML Lint (pull_request) Failing after 11s
Validate Config / JSON Validate (pull_request) Successful in 11s
PR Checklist / pr-checklist (pull_request) Failing after 1m14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 43s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 41s
Validate Config / Playbook Schema Validation (pull_request) Successful in 14s
Architecture Lint / Lint Repository (pull_request) Failing after 19s
Validate Config / Python Test Suite (pull_request) Has been cancelled
10 R&B/Soul songs, 10 visual beats each = 100 scene description sets.

Songs: Velvet Hours, Ghost Note, Kitchen Floor, After Midnight Caller,
Grandmother's Hands, Concrete Lullaby, Blue Note Serenade, Sunday Driver,
Pillow Talk Confessions, Crowns and Thorns.

Each beat includes: timestamp, duration, lyric_line, scene (mood, colors,
composition, camera movement, description).

Moods span: intimacy, grief, joy, defiance, nostalgia, passion, faith,
resilience, vulnerability, triumph.

Output: training/data/scene-descriptions/scene-descriptions-r&b-soul.jsonl
Also deployed to: ~/.hermes/training-data/

Closes #613
2026-04-14 18:35:51 -04:00
Alexander Whitestone
82ba8896b3 docs: clarify sidecar boundary and training status 2026-03-27 17:15:57 -04:00
perplexity
6507cffc15 feat: migrate autolora pipeline into training/
Per direction shift (the-nexus#542).

Replaces the autolora repo (1,500 lines of custom pipeline code)
with config files for existing tools:

- axolotl.yaml: replaces train_modal.py (239 lines)
- mlx-lora.yaml: replaces MLX training scripts
- eval-tasks.yaml: replaces run_eval.py (300 lines)
- Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py

Data migrated as-is:
- curated_dataset.jsonl (26 gold-standard conversations)
- preference_pairs.jsonl (DPO pairs)
- prompts_vibes.yaml, prompts_nexus_vibes.yaml
- v0-baseline eval results (historical record)

Thin glue kept:
- build_curated.py (data authoring, not infrastructure)
- ingest_trajectories.py (domain-specific quality filter)

Dependencies: pip install axolotl mlx-lm lm-evaluation-harness
2026-03-25 23:05:50 +00:00