Commit Graph

855 Commits

Author SHA1 Message Date
485783a317 Merge pull request 'feat(#691): training pair provenance tracking — source session + model' (#835) from burn/691-1776769427 into main
Some checks failed
Architecture Lint / Linter Tests (push) Has been cancelled
Architecture Lint / Lint Repository (push) Has been cancelled
Smoke Test / smoke (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
2026-04-21 11:21:13 +00:00
eacc670681 test: validate all 9 genre scene files have 100 valid entries (#645)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 44s
Validate Config / YAML Lint (pull_request) Failing after 31s
Validate Config / JSON Validate (pull_request) Successful in 36s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 57s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 23s
Validate Config / Cron Syntax Check (pull_request) Successful in 4s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 5s
Validate Config / Playbook Schema Validation (pull_request) Successful in 7s
PR Checklist / pr-checklist (pull_request) Failing after 12m4s
Architecture Lint / Lint Repository (pull_request) Failing after 24s
2026-04-21 11:20:25 +00:00
3dc1046cf8 fix: add missing artist + timestamp fields to country scene descriptions (#645)
Some checks failed
Architecture Lint / Linter Tests (push) Successful in 23s
Smoke Test / smoke (push) Failing after 20s
Validate Config / YAML Lint (push) Failing after 11s
Validate Config / JSON Validate (push) Successful in 19s
Architecture Lint / Lint Repository (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
2026-04-21 11:17:36 +00:00
fe864962ec test: Enhance PR triage tests (#659)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 33s
Smoke Test / smoke (pull_request) Failing after 39s
Validate Config / YAML Lint (pull_request) Failing after 27s
Validate Config / JSON Validate (pull_request) Successful in 22s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 21s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 24s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
Validate Config / Playbook Schema Validation (pull_request) Successful in 10s
PR Checklist / pr-checklist (pull_request) Failing after 11m27s
Architecture Lint / Lint Repository (pull_request) Failing after 11s
2026-04-21 11:17:00 +00:00
Alexander Whitestone
ced6d20fde fix: add 'unknown' to VALID_SOURCES for process_pair fallback
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 26s
Smoke Test / smoke (pull_request) Failing after 37s
Validate Config / YAML Lint (pull_request) Failing after 20s
Validate Config / JSON Validate (pull_request) Successful in 33s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m3s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 15s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 13s
Validate Config / Playbook Schema Validation (pull_request) Successful in 31s
Validate Config / Shell Script Lint (pull_request) Failing after 1m43s
Architecture Lint / Lint Repository (pull_request) Failing after 15s
PR Checklist / pr-checklist (pull_request) Failing after 11m25s
2026-04-21 07:16:50 -04:00
5ee2190aaa feat: Enhance PR triage with auto-merge, file-as-issue, org-wide mode (#659) 2026-04-21 11:16:05 +00:00
7cfc84637a feat: Add pr-triage.sh wrapper (#659) 2026-04-21 11:14:31 +00:00
Alexander Whitestone
83457cc9a9 feat(#691): training pair provenance tracking — source session + model
ProvenanceTracker: added add_provenance(), extract_provenance_from_existing(),
filter_by_provenance(), generate_report() methods.

Fixed save_jsonl() to accept both (path, entries) and (entries, path)
argument orders for backward compatibility.

build_curated.py: every exemplar now gets provenance metadata
(source=curated, source_session_id, model=timmy-curated, timestamp).
Provenance coverage reported in build output.

Acceptance criteria:
- [x] Add metadata to each pair: source_session_id, model, timestamp
- [x] Filter pairs by provenance (exclude_models, exclude_sources)
- [x] Report: pair count by source model

Closes #691
2026-04-21 07:14:27 -04:00
d1486b52e8 Merge pull request 'feat: stale hermes process cleanup script (#829)' (#834) from fix/829-stale-process-cleanup into main
Some checks failed
Architecture Lint / Linter Tests (push) Successful in 22s
Smoke Test / smoke (push) Failing after 16s
Validate Config / YAML Lint (push) Failing after 12s
Validate Config / JSON Validate (push) Successful in 13s
Validate Config / Python Syntax & Import Check (push) Failing after 38s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Shell Script Lint (push) Failing after 37s
Validate Config / Cron Syntax Check (push) Successful in 7s
Validate Config / Deploy Script Dry Run (push) Successful in 8s
Validate Config / Playbook Schema Validation (push) Successful in 18s
Architecture Lint / Lint Repository (push) Failing after 19s
2026-04-21 01:39:54 +00:00
Alexander Whitestone
19db78bbf0 feat: stale hermes process cleanup script (#829)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 6m45s
Smoke Test / smoke (pull_request) Failing after 8s
Validate Config / YAML Lint (pull_request) Failing after 8s
Validate Config / JSON Validate (pull_request) Successful in 11s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 43s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 36s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 15s
PR Checklist / pr-checklist (pull_request) Successful in 2m45s
Architecture Lint / Lint Repository (pull_request) Failing after 20s
bin/hermes_cleanup.py:
  Identifies stale hermes sessions (old + idle)
  Groups by session, tracks parent+children
  Memory waste calculation (RSS in MB/GB)
  --kill to terminate, --dry-run (default) to report
  --max-age (default 24h), --max-cpu (default 0.5%)
  --json output, human-readable table

tests/test_hermes_cleanup.py: 8 tests
  process age, child PIDs, kill session,
  dry run, report generation

Usage:
  python3 bin/hermes_cleanup.py              # report
  python3 bin/hermes_cleanup.py --kill       # terminate
  python3 bin/hermes_cleanup.py --max-age 12 # 12h threshold
  python3 bin/hermes_cleanup.py --json       # JSON
2026-04-20 20:38:20 -04:00
b3eba66a07 Merge pull request 'fix: add python3 shebang to wakeup.py and .DS_Store to gitignore (closes #681)' (#832) from fix/681-clean into main
Some checks failed
Smoke Test / smoke (push) Failing after 15s
Architecture Lint / Linter Tests (push) Successful in 19s
Validate Config / YAML Lint (push) Failing after 10s
Validate Config / JSON Validate (push) Successful in 11s
Validate Config / Python Syntax & Import Check (push) Failing after 34s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Shell Script Lint (push) Failing after 35s
Validate Config / Cron Syntax Check (push) Successful in 6s
Validate Config / Deploy Script Dry Run (push) Successful in 6s
Validate Config / Playbook Schema Validation (push) Successful in 13s
Architecture Lint / Lint Repository (push) Failing after 10s
2026-04-21 00:10:14 +00:00
61bb221ff2 fix: add python3 shebang to wakeup.py and .DS_Store to .gitignore (closes #681)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 27s
Smoke Test / smoke (pull_request) Failing after 26s
Validate Config / YAML Lint (pull_request) Failing after 19s
Validate Config / JSON Validate (pull_request) Successful in 17s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 52s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 51s
Validate Config / Cron Syntax Check (pull_request) Successful in 9s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 7s
Validate Config / Playbook Schema Validation (pull_request) Successful in 20s
PR Checklist / pr-checklist (pull_request) Successful in 3m54s
Architecture Lint / Lint Repository (pull_request) Failing after 18s
2026-04-20 19:59:06 -04:00
729db767d1 Merge pull request 'feat(#687): training data quality filter — remove low-quality pairs' (#830) from feat/687-quality-filter into main
Some checks failed
Smoke Test / smoke (push) Failing after 19s
Architecture Lint / Linter Tests (push) Successful in 25s
Validate Config / YAML Lint (push) Failing after 14s
Validate Config / JSON Validate (push) Successful in 15s
Validate Config / Python Syntax & Import Check (push) Failing after 41s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Shell Script Lint (push) Failing after 46s
Validate Config / Cron Syntax Check (push) Successful in 12s
Validate Config / Deploy Script Dry Run (push) Successful in 10s
Validate Config / Playbook Schema Validation (push) Successful in 20s
Architecture Lint / Lint Repository (push) Failing after 14s
2026-04-20 23:40:40 +00:00
d4dedd2c3d Merge pull request 'feat: backfill provenance on all training data (#752)' (#826) from fix/752-provenance-v2 into main
Some checks failed
Smoke Test / smoke (push) Has been cancelled
Architecture Lint / Lint Repository (push) Has been cancelled
Architecture Lint / Linter Tests (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
2026-04-20 23:40:37 +00:00
0e2e2c1552 Merge pull request 'feat: code block normalization tests (closes #750)' (#825) from fix/750-code-blocks into main
Some checks failed
Architecture Lint / Lint Repository (push) Has been cancelled
Architecture Lint / Linter Tests (push) Has been cancelled
Smoke Test / smoke (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Validate Config / YAML Lint (push) Has started running
2026-04-20 23:40:35 +00:00
bee4d02dd5 Merge pull request 'fix: restore pytest collection — fix 7 syntax/import errors (#823)' (#824) from fix/823-pytest-collection into main
Some checks failed
Architecture Lint / Lint Repository (push) Has been cancelled
Architecture Lint / Linter Tests (push) Has been cancelled
Smoke Test / smoke (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
2026-04-20 23:40:23 +00:00
a0266c83a4 fix(#687): Add quality filter tests
Some checks failed
Smoke Test / smoke (pull_request) Failing after 15s
Architecture Lint / Linter Tests (pull_request) Successful in 20s
Validate Config / YAML Lint (pull_request) Failing after 13s
Validate Config / JSON Validate (pull_request) Successful in 15s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 36s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 47s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 20s
Architecture Lint / Lint Repository (pull_request) Failing after 17s
PR Checklist / pr-checklist (pull_request) Successful in 3m48s
2026-04-20 23:16:13 +00:00
b28071bb71 fix(#687): Training data quality filter
- Score pairs on specificity, length ratio, code correctness
- Composite weighted score (0.5 spec + 0.2 length + 0.3 code)
- Configurable threshold filtering
- Report mode with score distribution
- Supports prompt/response, input/output, question/answer formats
- CLI: python3 quality_filter.py input.jsonl -o output.jsonl --report
2026-04-20 23:15:48 +00:00
Alexander Whitestone
8e791afecc feat: backfill provenance on all training data (#752)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 21s
Smoke Test / smoke (pull_request) Failing after 22s
Validate Config / YAML Lint (pull_request) Failing after 16s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 33s
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Shell Script Lint (pull_request) Failing after 54s
Validate Config / Playbook Schema Validation (pull_request) Successful in 17s
PR Checklist / pr-checklist (pull_request) Successful in 2m25s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
scripts/backfill_training_provenance.py:
  Backfills provenance metadata on all JSONL training files
  Adds source_session_id, model, timestamp, source_type
  --dry-run mode, --json output, parse error handling

Result: 11,007 pairs across 45 files now have provenance
  Coverage: 0% -> 100%

Validation: python3 scripts/provenance_validate.py --threshold 50
  PASS: 3800/3800 pairs have provenance

Dashboard: python3 scripts/provenance_dashboard.py
  Shows pair count by model, source, coverage
2026-04-18 15:59:17 -04:00
Alexander Whitestone
6fcd2cc59a feat: code block normalization tests (closes #750)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 15s
Smoke Test / smoke (pull_request) Failing after 17s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 39s
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 56s
Validate Config / Playbook Schema Validation (pull_request) Successful in 17s
PR Checklist / pr-checklist (pull_request) Successful in 2m45s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
tests/test_normalize_code_blocks.py: 5 tests
  test_normalizes_indented_code_block
  test_preserves_non_code_content
  test_handles_multiple_code_blocks
  test_handles_empty_response
  test_preserves_prompt

Existing normalize-code-blocks.py handles code block indentation.
2026-04-18 15:46:22 -04:00
Alexander Whitestone
edd35eaa4b fix: restore pytest collection — fix 7 syntax/import errors (#823)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 12s
Smoke Test / smoke (pull_request) Failing after 19s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 13s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 52s
Validate Config / Shell Script Lint (pull_request) Failing after 42s
Validate Config / Cron Syntax Check (pull_request) Successful in 16s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 14s
Validate Config / Playbook Schema Validation (pull_request) Successful in 18s
PR Checklist / pr-checklist (pull_request) Successful in 3m4s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Fixed collection errors:
  scripts/adversary_schema.py: unterminated regex string (line 141)
  scripts/config_validate.py: unmatched ')' (line 87)
  scripts/pr_triage.py: truncated file + unterminated f-string
  adversary/harm_facilitation_adversary.py: 4 broken f-strings
  bin/glitch_patterns.py: missing get_threejs_patterns() export
  tests/test_glitch_detector.py: fixed THREEJS_CATEGORIES import
  tests/test_pr_triage.py: fixed function name imports
  training/training_pair_provenance.py: added ProvenanceTracker class
  scripts/validate_scene_data.py: symlink for import compatibility

Result: python3 -m pytest --collect-only
  911 tests collected, 0 collection errors
  (was: 769 collected / 7 errors)
2026-04-18 15:37:33 -04:00
04ecad3b43 Merge pull request 'fix: use PYTHON variable in training Makefile (closes #660)' (#822) from fix/660-python-makefile into main
fix: use PYTHON variable in training Makefile (closes #660)

Refs Timmy_Foundation/the-nexus#1471
2026-04-17 06:44:30 +00:00
099948b3d1 fix: use PYTHON variable in training Makefile
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 8s
PR Checklist / pr-checklist (pull_request) Failing after 2m12s
Smoke Test / smoke (pull_request) Failing after 13s
Validate Config / YAML Lint (pull_request) Failing after 9s
Validate Config / JSON Validate (pull_request) Successful in 10s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 16s
Validate Config / Shell Script Lint (pull_request) Failing after 15s
Validate Config / Cron Syntax Check (pull_request) Successful in 7s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 7s
Validate Config / Playbook Schema Validation (pull_request) Successful in 13s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
On macOS where only python3 is installed (no python shim), bare
`python` calls fail with 'No such file or directory'.

Adds `PYTHON ?= python3` variable. Replaces all bare `python`
calls with `$(PYTHON)` across: train-local, vibes,
adversary-value-violations, ingest, curated, convert.

Override: make vibes PYTHON=python

Closes #660
Refs Timmy_Foundation/the-nexus#1471
2026-04-17 06:39:05 +00:00
ef58883a26 fix: use PYTHON variable in training Makefile for portability (closes #660)
Added PYTHON ?= python3 variable and replaced all bare python calls.
Fixes macOS where only python3 is installed.

Refs #660
2026-04-17 02:37:47 -04:00
2a11233952 Merge pull request 'feat: quality gate pipeline validation' (#818) from fix/623 into main
Resolves add/add conflict in pipeline/quality_gate.py by keeping more complete 619-line main version.

Closes #623
2026-04-17 02:37:16 -04:00
cc9ff4cf5d Merge remote-tracking branch 'origin/fix/752' 2026-04-17 02:37:04 -04:00
7c03c666d8 Merge pull request 'feat: 500 dream description prompt enhancement pairs — scene/crisis/music data' (#821,#820,#819,#799) from fix/602 into main
Resolves add/add conflicts with already-merged files (authority_bypass_200.jsonl, identity_attacks_200.jsonl, quality_filter.py) by keeping main's versions.

Closes #602, #645, #689, #599
2026-04-17 02:37:00 -04:00
0fc149b10c Merge pull request 'feat: quality filter tests — score specificity, length ratio, code' (#817) from fix/687-quality-filter into main 2026-04-17 02:32:51 -04:00
ed5e52e0d9 Merge pull request 'feat: harm facilitation adversary — 200 jailbreak prompts' (#816) from ward/618-harm-facilitation into main 2026-04-17 02:32:48 -04:00
2c49cac144 Merge pull request 'fix(#662): cron fleet audit — crontab parsing, tests, CI validation' (#814) from burn/662-cron-audit-fix into main 2026-04-17 02:32:44 -04:00
1183fb5f2b Merge pull request 'feat: scene data validator tests + CI path fix' (#813) from feat/647-scene-data-validator into main 2026-04-17 02:32:40 -04:00
7ce0016207 Merge pull request 'test: verify training example metadata preservation' (#812) from fix/646-metadata-preservation into main 2026-04-17 02:32:37 -04:00
06bebc0ca3 Merge pull request 'feat: adversary execution harness for prompt corpora' (#811) from fix/652-adversary-harness into main 2026-04-17 02:32:33 -04:00
b2246e0dcc Merge pull request 'feat: PR backlog triage script — categorize, find duplicates, detect stale refs' (#810) from burn/658-pr-backlog-triage into main 2026-04-17 02:32:30 -04:00
87ee28aa42 Merge pull request 'feat: Token tracker integrated with orchestrator — auto-logging on task completion' (#808) from fix/634-token-tracker-orchestrator into main 2026-04-17 02:32:27 -04:00
39d1e1d7ce Merge pull request 'fix: pipeline_state.json daily reset' (#805) from fix/650-pipeline-daily-reset-v2 into main 2026-04-17 02:32:18 -04:00
f57c21fda9 Merge pull request 'fix: training data code block indentation — normalize open_tag whitespace' (#809) from fix/750-code-block-indentation into main 2026-04-17 02:32:14 -04:00
Alexander Whitestone
44fe4bfcd7 feat: 500 dream description prompt enhancement pairs (#602)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 22s
PR Checklist / pr-checklist (pull_request) Failing after 34m33s
Smoke Test / smoke (pull_request) Failing after 47m9s
Validate Config / YAML Lint (pull_request) Failing after 12s
Validate Config / JSON Validate (pull_request) Successful in 11s
Validate Config / Shell Script Lint (pull_request) Failing after 42s
Validate Config / Cron Syntax Check (pull_request) Successful in 7s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 5s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 59s
Validate Training Data / validate (pull_request) Successful in 15s
Validate Config / Playbook Schema Validation (pull_request) Successful in 19s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 02:22:13 -04:00
89413d00d3 Merge pull request 'fix: hash dedup rotation + bloom filter — bounded memory (#628)' (#804) from burn/621-shared-orchestrator-1776402806 into main 2026-04-17 06:19:03 +00:00
65a400f3ed Merge pull request 'feat: shared adversary scoring rubric and transcript schema (closes #655)' (#802) from feat/655-adversary-scoring-rubric into main 2026-04-17 06:19:01 +00:00
Alexander Whitestone
dbb1c124fe feat: Country + Latin scene descriptions — 200 entries (#645)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 23s
Smoke Test / smoke (pull_request) Failing after 14s
Validate Config / YAML Lint (pull_request) Failing after 12s
Validate Config / JSON Validate (pull_request) Successful in 13s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m11s
Validate Config / Shell Script Lint (pull_request) Failing after 40s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 10s
Validate Config / Playbook Schema Validation (pull_request) Successful in 15s
Validate Training Data / validate (pull_request) Successful in 15s
PR Checklist / pr-checklist (pull_request) Failing after 3m43s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Complete the 9-genre scene description requirement.

Country: 10 songs (Dusty Boots, County Fair, Highway Hymn, Barn Dance,
Porcelain Dawn, Lonesome Road, Sweet Magnolia, Graveyard Shift,
Sunday Best, Old Barn)

Latin: 10 songs (Fuego Lento, Corazon de Oro, Lluvia de Estrellas,
Bailando con el Viento, Ritmo del Alma, Luna Roja, Siembra y Cosecha,
Carnaval, Desierto de Amor, Raices)

All 10 training factory genres now complete:
Pop, Rock, Hip-Hop, Electronic, R&B/Soul, Country, Jazz, Classical,
Metal, Latin.

Closes #645
2026-04-17 02:08:08 -04:00
Alexander Whitestone
9f2a76fc3e feat: auto-generate scene descriptions from image/video (#689)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 31s
PR Checklist / pr-checklist (pull_request) Failing after 13m48s
Smoke Test / smoke (pull_request) Failing after 13m22s
Validate Config / YAML Lint (pull_request) Failing after 5s
Validate Config / JSON Validate (pull_request) Successful in 4s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 21s
Validate Config / Shell Script Lint (pull_request) Failing after 22s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 8s
Validate Training Data / validate (pull_request) Successful in 10s
Validate Config / Playbook Schema Validation (pull_request) Successful in 15s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 01:58:05 -04:00
9a8d620163 feat: quality gate pipeline validation (#623)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 13s
Smoke Test / smoke (pull_request) Failing after 11s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 44s
Validate Config / Shell Script Lint (pull_request) Failing after 24s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 3s
Validate Config / Playbook Schema Validation (pull_request) Successful in 8s
PR Checklist / pr-checklist (pull_request) Failing after 3m54s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Validates JSONL/JSON pipeline outputs for:
- Schema correctness
- Content quality (non-empty, not duplicated)
- Toxicity detection
- Dedup hash management with auto-cleanup

Usage:
  python3 bin/quality-gate.py validate data.jsonl
  python3 bin/quality-gate.py score data.jsonl
  python3 bin/quality-gate.py stats
  python3 bin/quality-gate.py cleanup

Closes #623
2026-04-17 05:53:33 +00:00
Alexander Whitestone
3e9d808739 feat: quality filter tests — score specificity, length ratio, code (#687)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 14s
Smoke Test / smoke (pull_request) Failing after 16s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 15s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 45s
Validate Config / Shell Script Lint (pull_request) Failing after 23s
Validate Config / Cron Syntax Check (pull_request) Successful in 6s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 5s
Validate Config / Playbook Schema Validation (pull_request) Successful in 18s
PR Checklist / pr-checklist (pull_request) Failing after 3m51s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
tests/test_quality_filter.py: 12 tests

TestScorePair (10):
  good pair scores high (>= 0.8)
  empty response scores 0
  short response penalized
  filler response penalized (sure/ok/yes/no)
  prompt equals response penalized
  placeholder detected (TODO/FIXME/etc)
  repetitive response penalized
  short prompt penalized
  terse/rich key format accepted
  scenario/response key format accepted

TestFilterFile (2):
  filter creates output file with kept entries
  dry run mode

Existing quality_filter.py already implements:
  - specificity (filler detection, placeholder detection)
  - length ratio (response/prompt comparison)
  - code correctness (repetition detection)
  - threshold-based filtering
  - report: N filtered out of M total
2026-04-17 01:51:47 -04:00
d983d2f163 test: harm facilitation adversary tests (#618)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 23s
Smoke Test / smoke (pull_request) Failing after 16s
Validate Config / YAML Lint (pull_request) Failing after 13s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 38s
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Shell Script Lint (pull_request) Failing after 54s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 10s
Validate Config / Playbook Schema Validation (pull_request) Successful in 20s
PR Checklist / pr-checklist (pull_request) Failing after 3m52s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:45:19 +00:00
b6d14cf95b feat: harm facilitation adversary runner (#618) 2026-04-17 05:38:16 +00:00
05378c8aed feat: 200 harm facilitation jailbreak prompts (#618)
8 categories: weapons_direct (30), weapons_chemical (30),
weapons_biological (20), coercion (25), harm_instructions (25),
escalation (25), roleplay_bypass (25), context_manipulation (20).
2026-04-17 05:37:24 +00:00
Alexander Whitestone
d278d7f5d5 fix(#662): cron fleet audit — crontab parsing, tests, CI validation
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 14s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 46s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 7s
Validate Config / Shell Script Lint (pull_request) Failing after 44s
Validate Config / Playbook Schema Validation (pull_request) Successful in 22s
PR Checklist / pr-checklist (pull_request) Failing after 3m55s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- Added VPS crontab backup parsing to cron-audit-662.py
- New audit_fleet() combines hermes cron + VPS crontabs
- load_crontab_backups() reads cron/vps/*-crontab-backup.txt
- 20+ tests: crontab parsing, job categorization, fleet audit,
  timestamp parsing, backup loading
- ci-cron-validate.py: CI gate that fails on systemic failures
- Fresh audit report generated in cron/audit-report.json

Closes #662
2026-04-17 01:34:45 -04:00
2e6b95dd3d ci: add training-data/ path trigger + scripts/validate-scene-data.py (#647)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 20s
Smoke Test / smoke (pull_request) Failing after 16s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 45s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 52s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 11s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
PR Checklist / pr-checklist (pull_request) Failing after 3m50s
Validate Config / Python Test Suite (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
2026-04-17 05:34:38 +00:00
40b68a13cb test: scene data validator tests (#647) 2026-04-17 05:34:36 +00:00