Commit Graph

852 Commits

Author SHA1 Message Date
Merge Bot
afa2f98750 Merge PR #754: ansible/scripts/deploy-bezalel.sh (added) 2026-04-16 05:02:51 +00:00
Merge Bot
1a362637c9 Merge PR #763: scripts/pr-backlog-triage.py (added) 2026-04-16 04:59:59 +00:00
Merge Bot
75153cb001 Merge PR #765: training-data/crisis-manipulation-500.jsonl (added) 2026-04-16 04:59:38 +00:00
Merge Bot
1cd56a06ce Merge PR #767: training/validate_provenance.py (added) 2026-04-16 04:59:25 +00:00
Merge Bot
1941c4f88b Merge PR #767: training/training_pair_provenance.py (added) 2026-04-16 04:59:22 +00:00
Merge Bot
038fe033c1 Merge PR #767: training/tests/test_provenance.py (added) 2026-04-16 04:59:21 +00:00
Merge Bot
2340e01d55 Merge PR #767: training/provenance_dashboard.py (added) 2026-04-16 04:59:19 +00:00
Merge Bot
6b7d219a29 Merge PR #768: scripts/token_budget.py (added) 2026-04-16 04:59:16 +00:00
Merge Bot
e399ce40a8 Merge PR #769: tests/test_quality_gate.py (added) 2026-04-16 04:59:13 +00:00
Merge Bot
318eaefb81 Merge PR #771: scripts/quality_gate_integration.py (added) 2026-04-16 04:59:01 +00:00
Merge Bot
d76182c654 Merge PR #772: scripts/cron_audit.py (added) 2026-04-16 04:58:59 +00:00
Merge Bot
9bdd2d776e Merge PR #773: tests/test_hash_dedup.py (added) 2026-04-16 04:58:57 +00:00
Merge Bot
8c5b82e214 Merge PR #773: scripts/hash_dedup.py (added) 2026-04-16 04:58:55 +00:00
Merge Bot
96dedc7930 Merge PR #774: training-data/scene-descriptions-r&b-soul.jsonl (added) 2026-04-16 04:58:52 +00:00
Merge Bot
297363a141 Merge PR #775: scripts/pr-triage-automation.py (added) 2026-04-16 04:58:49 +00:00
Merge Bot
29790d24aa Merge PR #776: tests/test_config_drift.py (added) 2026-04-16 04:58:46 +00:00
Merge Bot
7f121d5591 Merge PR #776: scripts/config_drift.py (added) 2026-04-16 04:58:44 +00:00
Merge Bot
5c4b453687 Merge PR #777: tests/test_token_tracker.py (added) 2026-04-16 04:58:41 +00:00
Merge Bot
218b6dcb33 Merge PR #777: scripts/token_tracker.py (added) 2026-04-16 04:58:40 +00:00
Merge Bot
872a2d3f79 Merge PR #778: evaluations/adversary/corpora/authority_bypass_200.jsonl (added) 2026-04-16 04:58:37 +00:00
Merge Bot
a023128f03 Merge PR #779: training-data/crisis-indirect-500.jsonl (added) 2026-04-16 04:58:34 +00:00
Merge Bot
346b7c6be4 Merge PR #780: tests/test_shebangs.py (added) 2026-04-16 04:58:31 +00:00
Merge Bot
18d8773750 Merge PR #781: adversary/emotional-manipulation-200.jsonl (added) 2026-04-16 04:58:28 +00:00
Merge Bot
291cd9e59c Merge PR #782: tests/test_no_placeholders.py (added) 2026-04-16 04:58:26 +00:00
Merge Bot
a0b2b551c9 Merge PR #783: tests/test_normalize_code_blocks.py (added) 2026-04-16 04:58:24 +00:00
Merge Bot
636e32e467 Merge PR #783: scripts/normalize-code-blocks.py (added) 2026-04-16 04:58:23 +00:00
Merge Bot
a653434dbb Merge PR #786: training/scripts/quality_filter.py (added) 2026-04-16 04:58:20 +00:00
Merge Bot
73426b18d3 Merge PR #786: training/data/scene-descriptions/scene-descriptions-rock.jsonl (added) 2026-04-16 04:58:18 +00:00
Merge Bot
45dbe0a3e1 Merge PR #786: training/data/scene-descriptions/scene-descriptions-pop.jsonl (added) 2026-04-16 04:58:07 +00:00
Merge Bot
b03ff88904 Merge PR #786: training/data/prompt-enhancement/video-scenes-500.jsonl (added) 2026-04-16 04:58:06 +00:00
Merge Bot
f1087d4877 Merge PR #786: training/data/prompt-enhancement/music-moods-500.jsonl (added) 2026-04-16 04:58:04 +00:00
Merge Bot
9649e861df Merge PR #786: training/data/prompt-enhancement/game-assets-500.jsonl (added) 2026-04-16 04:58:02 +00:00
Merge Bot
8c50bb4b27 Merge PR #786: training/data/prompt-enhancement/emotional-weather-500.jsonl (added) 2026-04-16 04:57:58 +00:00
Merge Bot
f4eb14c8c3 Merge PR #786: training/data/crisis-response/manipulation-edge-cases-500.jsonl (added) 2026-04-16 04:57:56 +00:00
Merge Bot
77e29d6df5 Test update (no change) 2026-04-16 04:55:23 +00:00
Merge Bot
6b7b02a036 Merge PR #784: evaluations/adversary/corpora/identity_attacks_200.jsonl 2026-04-16 04:53:27 +00:00
ab1548a97e Delete test file 2026-04-16 04:53:24 +00:00
Bot
c79cf6411b Test file creation 2026-04-16 04:51:03 +00:00
Alexander Whitestone
79d148ddd8 feat: training data quality filter (#687)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 31s
Smoke Test / smoke (pull_request) Failing after 21s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 15s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m12s
PR Checklist / pr-checklist (pull_request) Failing after 5m45s
Validate Config / Shell Script Lint (pull_request) Failing after 46s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Training Data / validate (pull_request) Successful in 15s
Validate Config / Playbook Schema Validation (pull_request) Successful in 19s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Scores training pairs and removes low-quality entries.

Scoring criteria:
- Response length (too short = low quality)
- Prompt/response ratio (response should be substantive)
- Filler detection (sure, okay, i dont know)
- Placeholder detection (TODO, FIXME, PLACEHOLDER)
- Prompt=response detection (duplicates)
- Repetition detection (repeated bigrams)
- Prompt minimum length

Usage:
  python3 training/scripts/quality_filter.py --input data.jsonl --dry-run
  python3 training/scripts/quality_filter.py --input data.jsonl --threshold 0.5

Closes #687
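
The scoring criteria listed in this commit message can be illustrated with a minimal sketch. It is not the contents of training/scripts/quality_filter.py: the function names, penalty weights, length thresholds, the `prompt`/`response` field names, and the choice to keep pairs scoring at or above the threshold are all assumptions made for illustration.

```python
import re

# Every threshold, penalty weight, and phrase list here is an illustrative
# assumption, not a value taken from quality_filter.py.
MIN_PROMPT_CHARS = 10
MIN_RESPONSE_CHARS = 20
MIN_RESPONSE_TO_PROMPT_RATIO = 0.5
MAX_REPEATED_BIGRAM_SHARE = 0.3
FILLER_PHRASES = ("sure", "okay", "i dont know")
PLACEHOLDER_RE = re.compile(r"\b(TODO|FIXME|PLACEHOLDER)\b")


def repeated_bigram_share(text):
    """Fraction of word bigrams that duplicate an earlier bigram."""
    words = text.lower().split()
    bigrams = list(zip(words, words[1:]))
    if not bigrams:
        return 0.0
    return 1.0 - len(set(bigrams)) / len(bigrams)


def score_pair(prompt, response):
    """Return a score in [0, 1]; each failed check subtracts a fixed penalty."""
    prompt, response = prompt.strip(), response.strip()
    score = 1.0
    if len(prompt) < MIN_PROMPT_CHARS:            # prompt minimum length
        score -= 0.3
    if len(response) < MIN_RESPONSE_CHARS:        # response too short
        score -= 0.4
    if len(response) < MIN_RESPONSE_TO_PROMPT_RATIO * len(prompt):
        score -= 0.2                              # response not substantive
    if response.lower().rstrip(".!") in FILLER_PHRASES:
        score -= 0.5                              # filler-only response
    if PLACEHOLDER_RE.search(response):
        score -= 0.5                              # placeholder marker left in
    if prompt.lower() == response.lower():
        score -= 1.0                              # prompt duplicated as response
    if repeated_bigram_share(response) > MAX_REPEATED_BIGRAM_SHARE:
        score -= 0.4                              # heavy bigram repetition
    return max(score, 0.0)


def filter_pairs(pairs, threshold=0.5):
    """Keep pairs whose score is at least the threshold (assumed semantics)."""
    return [p for p in pairs if score_pair(p["prompt"], p["response"]) >= threshold]
```

A subtractive score keeps each criterion independent and easy to tune; a weighted average would work equally well if the real script uses one.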
2026-04-16 00:45:50 -04:00
ada0ee8499 Merge pull request 'feat: 200 value violation jailbreak prompts (#617)' (#785) from fix/617 into main 2026-04-16 04:12:35 +00:00
5c9cd427a7 feat: 200 value violation jailbreak prompts (#617)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 1m22s
Smoke Test / smoke (pull_request) Failing after 20s
Validate Config / YAML Lint (pull_request) Failing after 12s
Validate Config / JSON Validate (pull_request) Successful in 13s
PR Checklist / pr-checklist (pull_request) Failing after 7m58s
Validate Config / Shell Script Lint (pull_request) Failing after 39s
Validate Config / Cron Syntax Check (pull_request) Successful in 7s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 58s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 7s
Validate Config / Playbook Schema Validation (pull_request) Successful in 16s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-16 03:20:49 +00:00
Alexander Whitestone
f5d456a5e8 feat: crisis response — manipulation & edge cases 500 pairs (#598)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 22s
PR Checklist / pr-checklist (pull_request) Failing after 3m39s
Smoke Test / smoke (pull_request) Failing after 18s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m14s
Validate Config / Shell Script Lint (pull_request) Failing after 48s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 31s
Validate Training Data / validate (pull_request) Successful in 13s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
500 scenario→response pairs for manipulation and edge cases.
Follows the SOUL.md "When a Man Is Dying" protocol.

Categories (50 each): joking about suicide, hypothetical questions,
testing boundaries, ambiguous distress, testing empathy, relapse,
manipulation tactics, crisis after events, philosophical questions,
provider-specific edge cases.

All 500 pairs include the 988 Lifeline, a safety check, the gospel, and a
stay-present commitment. No pair computes the value of a human life.

Closes #598
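
The "10 categories, 50 pairs each" layout described in this commit message could be checked with a short sketch like the one below. The `category` field name and both helper functions are hypothetical; the actual JSONL schema is not shown in this log.

```python
import json
from collections import Counter


def category_counts(jsonl_text, field="category"):
    """Count records per category; `field` is an assumed key name."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            counts[json.loads(line)[field]] += 1
    return counts


def is_balanced(counts, categories=10, per_category=50):
    """True when exactly `categories` categories each hold `per_category` records."""
    return len(counts) == categories and all(
        n == per_category for n in counts.values()
    )
```

Reading the file with `open(path, encoding="utf-8").read()` and passing the text to `category_counts` would be one way to run the check against a real dataset.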
2026-04-15 19:00:27 -04:00
Alexander Whitestone
406923e08f feat: 500 music mood prompt enhancement pairs (#601)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 19s
Smoke Test / smoke (pull_request) Failing after 15s
Validate Config / YAML Lint (pull_request) Failing after 18s
Validate Config / JSON Validate (pull_request) Successful in 20s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 2m0s
PR Checklist / pr-checklist (pull_request) Failing after 3m52s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 1m7s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 10s
Validate Training Data / validate (pull_request) Successful in 17s
Validate Config / Playbook Schema Validation (pull_request) Successful in 33s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-15 18:49:43 -04:00
Alexander Whitestone
466635c289 feat: 500 emotional weather prompt enhancement pairs (#603)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 15s
PR Checklist / pr-checklist (pull_request) Failing after 3m37s
Smoke Test / smoke (pull_request) Failing after 19s
Validate Config / YAML Lint (pull_request) Failing after 17s
Validate Config / JSON Validate (pull_request) Successful in 12s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m24s
Validate Config / Shell Script Lint (pull_request) Failing after 55s
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 13s
Validate Config / Playbook Schema Validation (pull_request) Successful in 25s
Validate Training Data / validate (pull_request) Successful in 14s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-15 12:56:25 -04:00
Alexander Whitestone
c41b1b0423 feat: 500 game asset prompt enhancement pairs (#604)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 25s
Smoke Test / smoke (pull_request) Failing after 7s
Validate Config / YAML Lint (pull_request) Failing after 6s
Validate Config / JSON Validate (pull_request) Successful in 8s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 45s
Validate Config / Shell Script Lint (pull_request) Failing after 55s
Validate Config / Cron Syntax Check (pull_request) Successful in 17s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 22s
Validate Config / Playbook Schema Validation (pull_request) Successful in 40s
Validate Training Data / validate (pull_request) Successful in 25s
PR Checklist / pr-checklist (pull_request) Failing after 10m23s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-15 12:27:28 -04:00
Alexander Whitestone
f1bcbaa701 feat: 500 video scene prompt enhancement pairs (#605)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 20s
Smoke Test / smoke (pull_request) Failing after 14s
Validate Config / YAML Lint (pull_request) Failing after 17s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m28s
Validate Config / Shell Script Lint (pull_request) Failing after 55s
Validate Config / Cron Syntax Check (pull_request) Successful in 14s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 14s
Validate Training Data / validate (pull_request) Successful in 14s
PR Checklist / pr-checklist (pull_request) Failing after 6m52s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-15 12:27:28 -04:00
Alexander Whitestone
765307aba3 feat: Pop scene descriptions — 100 lyrics->visual sets (#606)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 16s
Smoke Test / smoke (pull_request) Failing after 8s
Validate Config / YAML Lint (pull_request) Failing after 5s
Validate Config / JSON Validate (pull_request) Successful in 8s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 58s
PR Checklist / pr-checklist (pull_request) Failing after 3m29s
Validate Config / Shell Script Lint (pull_request) Failing after 37s
Validate Config / Cron Syntax Check (pull_request) Successful in 7s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 7s
Validate Config / Playbook Schema Validation (pull_request) Successful in 13s
Validate Training Data / validate (pull_request) Successful in 10s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
10 Pop songs, 10 visual beats each = 100 scene description sets.

Songs: Glass Ceiling, Golden Hour, Thunderstruck Summer, Mirror Ball,
Broken Records, Starlight Avenue, Paper Hearts, Champagne Problems,
Neon Heartbeat, Sunflower.

Closes #606
2026-04-15 12:08:38 -04:00
667cdfd51b Merge pull request 'feat: Electronic scene descriptions — 100 lyrics→visual sets (#609)' (#746) from fix/609 into main 2026-04-15 16:03:41 +00:00
Alexander Whitestone
46200ad4c6 feat: Rock scene descriptions — 100 lyrics->visual sets (#607)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 22s
PR Checklist / pr-checklist (pull_request) Failing after 3m37s
Smoke Test / smoke (pull_request) Failing after 11s
Validate Config / YAML Lint (pull_request) Failing after 10s
Validate Config / JSON Validate (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 50s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m14s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Training Data / validate (pull_request) Successful in 12s
Validate Config / Playbook Schema Validation (pull_request) Successful in 15s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
10 Rock songs, 10 visual beats each = 100 scene description sets.

Songs: Asphalt Kingdom, Satellite Hearts, Concrete Psalms,
Wildfire Sermon, Rust Belt Lullaby, Neon Crucifix,
Midnight Transmission, Thunder Road Confessional,
Avalanche Hymn, Cathedral of Feedback.

Closes #607
2026-04-15 12:00:44 -04:00
Alexander Whitestone
0fdfb8e65b feat: Electronic scene descriptions — 100 lyrics->visual sets (#609)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 21s
Smoke Test / smoke (pull_request) Failing after 14s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 15s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m21s
Validate Config / Shell Script Lint (pull_request) Failing after 24s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 8s
Validate Config / Playbook Schema Validation (pull_request) Successful in 13s
Validate Training Data / validate (pull_request) Successful in 9s
PR Checklist / pr-checklist (pull_request) Failing after 6m11s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
10 Electronic songs, 10 visual beats each = 100 scene description sets.

Songs: Neon Pulse, Subterranean, Digital Elegy, Rave in the Ruins,
Satellite Hymn, Glitch Garden, Warehouse Frequency, Cybernetic Lullaby,
Thunderdome Protocol, Dawn at Berghain.

Closes #609
2026-04-15 11:47:59 -04:00