Commit Graph

112 Commits

Author SHA1 Message Date
729db767d1 Merge pull request 'feat(#687): training data quality filter — remove low-quality pairs' (#830) from feat/687-quality-filter into main
Some checks failed
Smoke Test / smoke (push) Failing after 19s
Architecture Lint / Linter Tests (push) Successful in 25s
Validate Config / YAML Lint (push) Failing after 14s
Validate Config / JSON Validate (push) Successful in 15s
Validate Config / Python Syntax & Import Check (push) Failing after 41s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Shell Script Lint (push) Failing after 46s
Validate Config / Cron Syntax Check (push) Successful in 12s
Validate Config / Deploy Script Dry Run (push) Successful in 10s
Validate Config / Playbook Schema Validation (push) Successful in 20s
Architecture Lint / Lint Repository (push) Failing after 14s
2026-04-20 23:40:40 +00:00
d4dedd2c3d Merge pull request 'feat: backfill provenance on all training data (#752)' (#826) from fix/752-provenance-v2 into main
Some checks failed
Smoke Test / smoke (push) Has been cancelled
Architecture Lint / Lint Repository (push) Has been cancelled
Architecture Lint / Linter Tests (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
2026-04-20 23:40:37 +00:00
a0266c83a4 fix(#687): Add quality filter tests
Some checks failed
Smoke Test / smoke (pull_request) Failing after 15s
Architecture Lint / Linter Tests (pull_request) Successful in 20s
Validate Config / YAML Lint (pull_request) Failing after 13s
Validate Config / JSON Validate (pull_request) Successful in 15s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 36s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 47s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 20s
Architecture Lint / Lint Repository (pull_request) Failing after 17s
PR Checklist / pr-checklist (pull_request) Successful in 3m48s
2026-04-20 23:16:13 +00:00
b28071bb71 fix(#687): Training data quality filter
- Score pairs on specificity, length ratio, code correctness
- Composite weighted score (0.5 spec + 0.2 length + 0.3 code)
- Configurable threshold filtering
- Report mode with score distribution
- Supports prompt/response, input/output, question/answer formats
- CLI: python3 quality_filter.py input.jsonl -o output.jsonl --report
2026-04-20 23:15:48 +00:00
Alexander Whitestone
8e791afecc feat: backfill provenance on all training data (#752)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 21s
Smoke Test / smoke (pull_request) Failing after 22s
Validate Config / YAML Lint (pull_request) Failing after 16s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 33s
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Shell Script Lint (pull_request) Failing after 54s
Validate Config / Playbook Schema Validation (pull_request) Successful in 17s
PR Checklist / pr-checklist (pull_request) Successful in 2m25s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
scripts/backfill_training_provenance.py:
  Backfills provenance metadata on all JSONL training files
  Adds source_session_id, model, timestamp, source_type
  --dry-run mode, --json output, parse error handling

Result: 11,007 pairs across 45 files now have provenance
  Coverage: 0% -> 100%

Validation: python3 scripts/provenance_validate.py --threshold 50
  PASS: 3800/3800 pairs have provenance

Dashboard: python3 scripts/provenance_dashboard.py
  Shows pair count by model, source, coverage
2026-04-18 15:59:17 -04:00
Alexander Whitestone
edd35eaa4b fix: restore pytest collection — fix 7 syntax/import errors (#823)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 12s
Smoke Test / smoke (pull_request) Failing after 19s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 13s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 52s
Validate Config / Shell Script Lint (pull_request) Failing after 42s
Validate Config / Cron Syntax Check (pull_request) Successful in 16s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 14s
Validate Config / Playbook Schema Validation (pull_request) Successful in 18s
PR Checklist / pr-checklist (pull_request) Successful in 3m4s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Fixed collection errors:
  scripts/adversary_schema.py: unterminated regex string (line 141)
  scripts/config_validate.py: unmatched ')' (line 87)
  scripts/pr_triage.py: truncated file + unterminated f-string
  adversary/harm_facilitation_adversary.py: 4 broken f-strings
  bin/glitch_patterns.py: missing get_threejs_patterns() export
  tests/test_glitch_detector.py: fixed THREEJS_CATEGORIES import
  tests/test_pr_triage.py: fixed function name imports
  training/training_pair_provenance.py: added ProvenanceTracker class
  scripts/validate_scene_data.py: symlink for import compatibility

Result: python3 -m pytest --collect-only
  911 tests collected, 0 collection errors
  (was: 769 collected / 7 errors)
2026-04-18 15:37:33 -04:00
7c03c666d8 Merge pull request 'feat: 500 dream description prompt enhancement pairs — scene/crisis/music data' (#821,#820,#819,#799) from fix/602 into main
Resolves add/add conflicts with already-merged files (authority_bypass_200.jsonl, identity_attacks_200.jsonl, quality_filter.py) by keeping main's versions.

Closes #602, #645, #689, #599
2026-04-17 02:37:00 -04:00
2c49cac144 Merge pull request 'fix(#662): cron fleet audit — crontab parsing, tests, CI validation' (#814) from burn/662-cron-audit-fix into main 2026-04-17 02:32:44 -04:00
06bebc0ca3 Merge pull request 'feat: adversary execution harness for prompt corpora' (#811) from fix/652-adversary-harness into main 2026-04-17 02:32:33 -04:00
b2246e0dcc Merge pull request 'feat: PR backlog triage script — categorize, find duplicates, detect stale refs' (#810) from burn/658-pr-backlog-triage into main 2026-04-17 02:32:30 -04:00
39d1e1d7ce Merge pull request 'fix: pipeline_state.json daily reset' (#805) from fix/650-pipeline-daily-reset-v2 into main 2026-04-17 02:32:18 -04:00
f57c21fda9 Merge pull request 'fix: training data code block indentation — normalize open_tag whitespace' (#809) from fix/750-code-block-indentation into main 2026-04-17 02:32:14 -04:00
65a400f3ed Merge pull request 'feat: shared adversary scoring rubric and transcript schema (closes #655)' (#802) from feat/655-adversary-scoring-rubric into main 2026-04-17 06:19:01 +00:00
Alexander Whitestone
d278d7f5d5 fix(#662): cron fleet audit — crontab parsing, tests, CI validation
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 14s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 46s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 7s
Validate Config / Shell Script Lint (pull_request) Failing after 44s
Validate Config / Playbook Schema Validation (pull_request) Successful in 22s
PR Checklist / pr-checklist (pull_request) Failing after 3m55s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- Added VPS crontab backup parsing to cron-audit-662.py
- New audit_fleet() combines hermes cron + VPS crontabs
- load_crontab_backups() reads cron/vps/*-crontab-backup.txt
- 20+ tests: crontab parsing, job categorization, fleet audit,
  timestamp parsing, backup loading
- ci-cron-validate.py: CI gate that fails on systemic failures
- Fresh audit report generated in cron/audit-report.json

Closes #662
2026-04-17 01:34:45 -04:00
c633afd66d fix: add underscore module version for test imports (#750) 2026-04-17 05:33:26 +00:00
c69ae0e72b fix: normalize open_tag whitespace in code block parser (#750) 2026-04-17 05:33:24 +00:00
f094b0d5b5 feat: Add PR backlog triage script — categorize, duplicates, stale detection (#658) 2026-04-17 05:32:19 +00:00
42ff05aeec feat: adversary execution harness for prompt corpora (#652)
Reusable harness for replaying JSONL corpora against live agents.
Supports Ollama, hermes, and mock backends.
Captures transcripts, scores responses, auto-files P0 issues.

Closes #652
2026-04-17 05:31:27 +00:00
acba760731 fix: reset_stale_states delegates to standalone script (closes #650)
Some checks failed
Validate Config / Playbook Schema Validation (pull_request) Successful in 14s
Architecture Lint / Linter Tests (pull_request) Successful in 26s
PR Checklist / pr-checklist (pull_request) Failing after 25m6s
Smoke Test / smoke (pull_request) Failing after 12s
Validate Config / YAML Lint (pull_request) Failing after 8s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 35s
Validate Config / JSON Validate (pull_request) Successful in 13s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
Validate Config / Shell Script Lint (pull_request) Failing after 34s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:26:06 +00:00
34ade6fc0e fix: pipeline state daily reset (closes #650) 2026-04-17 05:24:14 +00:00
c5270d76e0 fix: pipeline state daily reset (closes #650) 2026-04-17 05:24:12 +00:00
38a4a73a67 feat: shared adversary scoring rubric and transcript schema (#655) 2026-04-17 05:17:29 +00:00
6b984532a1 feat: config validation script
Closes #690

Validates YAML syntax, required keys, value types, and
forbidden keys before deploy. Prevents broken deploys
from bad config.
2026-04-17 05:07:44 +00:00
Alexander Whitestone
f169634a75 feat: config drift detection across all fleet nodes (#686)
Some checks failed
PR Checklist / pr-checklist (pull_request) Has been cancelled
Architecture Lint / Linter Tests (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Smoke Test / smoke (pull_request) Has been cancelled
Validate Config / YAML Lint (pull_request) Has been cancelled
Validate Config / JSON Validate (pull_request) Has been cancelled
Validate Config / Python Syntax & Import Check (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Validate Config / Shell Script Lint (pull_request) Has been cancelled
Validate Config / Cron Syntax Check (pull_request) Has been cancelled
Validate Config / Deploy Script Dry Run (pull_request) Has been cancelled
Validate Config / Playbook Schema Validation (pull_request) Has been cancelled
Validate Training Data / validate (pull_request) Has been cancelled
Detect config drift between fleet nodes and canonical timmy-config.

scripts/config_drift_detector.py (200 lines):
- SSH-based config collection from all nodes
- Recursive diff against canonical config
- Report: which keys differ, on which nodes
- JSON output for programmatic consumption

Fleet nodes: local, ezra (143.198.27.163), bezalel (167.99.126.228)

Usage:
  python3 scripts/config_drift_detector.py --report
  python3 scripts/config_drift_detector.py --json

Closes #686
2026-04-16 01:33:57 -04:00
Merge Bot
11e476e79e Merge PR #633: scripts/token-tracker.py 2026-04-16 05:11:23 +00:00
Merge Bot
5ac19b27ee Merge PR #665: scripts/pr_triage.py 2026-04-16 05:10:46 +00:00
Merge Bot
7c16ddb741 Merge PR #712: scripts/nightly-pipeline-scheduler.sh (changed) 2026-04-16 05:09:54 +00:00
Merge Bot
4642c8b3b1 Merge PR #656: scripts/generate-crisis-direct-suicidal-pairs.py (added) 2026-04-16 05:06:47 +00:00
Merge Bot
7ee587b9f4 Merge PR #667: scripts/validate-scene-data.py (added) 2026-04-16 05:06:10 +00:00
Merge Bot
720516d452 Merge PR #671: scripts/cron-audit-662.py (added) 2026-04-16 05:05:56 +00:00
Merge Bot
8bc6e4e5f0 Merge PR #679: scripts/pr_triage.py (added) 2026-04-16 05:05:44 +00:00
Merge Bot
17adc703f8 Merge PR #729: scripts/generate_scene_descriptions.py (added) 2026-04-16 05:03:55 +00:00
Merge Bot
4b891f8f46 Merge PR #738: scripts/config_template.py (added) 2026-04-16 05:03:30 +00:00
Merge Bot
1a362637c9 Merge PR #763: scripts/pr-backlog-triage.py (added) 2026-04-16 04:59:59 +00:00
Merge Bot
6b7d219a29 Merge PR #768: scripts/token_budget.py (added) 2026-04-16 04:59:16 +00:00
Merge Bot
318eaefb81 Merge PR #771: scripts/quality_gate_integration.py (added) 2026-04-16 04:59:01 +00:00
Merge Bot
d76182c654 Merge PR #772: scripts/cron_audit.py (added) 2026-04-16 04:58:59 +00:00
Merge Bot
8c5b82e214 Merge PR #773: scripts/hash_dedup.py (added) 2026-04-16 04:58:55 +00:00
Merge Bot
297363a141 Merge PR #775: scripts/pr-triage-automation.py (added) 2026-04-16 04:58:49 +00:00
Merge Bot
7f121d5591 Merge PR #776: scripts/config_drift.py (added) 2026-04-16 04:58:44 +00:00
Merge Bot
218b6dcb33 Merge PR #777: scripts/token_tracker.py (added) 2026-04-16 04:58:40 +00:00
Merge Bot
636e32e467 Merge PR #783: scripts/normalize-code-blocks.py (added) 2026-04-16 04:58:23 +00:00
d120526244 fix: add python3 shebang to scripts/visual_pr_reviewer.py (#681) 2026-04-15 02:57:53 +00:00
8596ff761b fix: add python3 shebang to scripts/diagram_meaning_extractor.py (#681) 2026-04-15 02:57:40 +00:00
7553fd4f3e fix: add python3 shebang to scripts/captcha_bypass_handler.py (#681) 2026-04-15 02:57:25 +00:00
ad751a6de6 docs: add pipeline scheduler README 2026-04-14 22:47:12 +00:00
82f9810081 feat: add nightly-pipeline-scheduler.sh 2026-04-14 22:46:38 +00:00
04cceccd01 feat: add rock scene generator (#607)
Some checks failed
Architecture Lint / Lint Repository (push) Has been cancelled
Architecture Lint / Linter Tests (push) Has been cancelled
Smoke Test / smoke (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
2026-04-14 22:35:43 +00:00
04dbf772b1 feat: Visual Smoke Test for The Nexus #490 (#558)
Some checks failed
Architecture Lint / Linter Tests (push) Successful in 15s
Smoke Test / smoke (push) Failing after 14s
Validate Config / YAML Lint (push) Failing after 13s
Validate Config / JSON Validate (push) Successful in 13s
Validate Config / Shell Script Lint (push) Failing after 40s
Validate Config / Python Syntax & Import Check (push) Failing after 58s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Cron Syntax Check (push) Successful in 11s
Validate Config / Deploy Script Dry Run (push) Successful in 11s
Validate Config / Playbook Schema Validation (push) Successful in 18s
Architecture Lint / Lint Repository (push) Failing after 13s
Architecture Lint / Linter Tests (pull_request) Successful in 26s
Smoke Test / smoke (pull_request) Failing after 17s
Validate Config / YAML Lint (pull_request) Failing after 12s
Validate Config / JSON Validate (pull_request) Successful in 12s
PR Checklist / pr-checklist (pull_request) Failing after 3m36s
Validate Config / Shell Script Lint (pull_request) Failing after 40s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m4s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 9s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 20s
Architecture Lint / Lint Repository (pull_request) Failing after 16s
Merge PR #558
2026-04-14 22:18:23 +00:00
9651a56308 feat: Glitch Detector HTML Report with annotated screenshots #544 (#567)
Some checks failed
Architecture Lint / Lint Repository (push) Has been cancelled
Architecture Lint / Linter Tests (push) Has been cancelled
Smoke Test / smoke (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
Merge PR #567
2026-04-14 22:17:32 +00:00