Commit Graph

746 Commits

Author SHA1 Message Date
1183fb5f2b Merge pull request 'feat: scene data validator tests + CI path fix' (#813) from feat/647-scene-data-validator into main 2026-04-17 02:32:40 -04:00
7ce0016207 Merge pull request 'test: verify training example metadata preservation' (#812) from fix/646-metadata-preservation into main 2026-04-17 02:32:37 -04:00
06bebc0ca3 Merge pull request 'feat: adversary execution harness for prompt corpora' (#811) from fix/652-adversary-harness into main 2026-04-17 02:32:33 -04:00
b2246e0dcc Merge pull request 'feat: PR backlog triage script — categorize, find duplicates, detect stale refs' (#810) from burn/658-pr-backlog-triage into main 2026-04-17 02:32:30 -04:00
87ee28aa42 Merge pull request 'feat: Token tracker integrated with orchestrator — auto-logging on task completion' (#808) from fix/634-token-tracker-orchestrator into main 2026-04-17 02:32:27 -04:00
39d1e1d7ce Merge pull request 'fix: pipeline_state.json daily reset' (#805) from fix/650-pipeline-daily-reset-v2 into main 2026-04-17 02:32:18 -04:00
f57c21fda9 Merge pull request 'fix: training data code block indentation — normalize open_tag whitespace' (#809) from fix/750-code-block-indentation into main 2026-04-17 02:32:14 -04:00
89413d00d3 Merge pull request 'fix: hash dedup rotation + bloom filter — bounded memory (#628)' (#804) from burn/621-shared-orchestrator-1776402806 into main 2026-04-17 06:19:03 +00:00
65a400f3ed Merge pull request 'feat: shared adversary scoring rubric and transcript schema (closes #655)' (#802) from feat/655-adversary-scoring-rubric into main 2026-04-17 06:19:01 +00:00
2e6b95dd3d ci: add training-data/ path trigger + scripts/validate-scene-data.py (#647)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 20s
Smoke Test / smoke (pull_request) Failing after 16s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 45s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 52s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 11s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
PR Checklist / pr-checklist (pull_request) Failing after 3m50s
Validate Config / Python Test Suite (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
2026-04-17 05:34:38 +00:00
40b68a13cb test: scene data validator tests (#647) 2026-04-17 05:34:36 +00:00
50e1d63c45 test: adversary execution harness (#652)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 14s
Smoke Test / smoke (pull_request) Failing after 16s
Validate Config / YAML Lint (pull_request) Failing after 19s
Validate Config / JSON Validate (pull_request) Successful in 20s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 45s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Shell Script Lint (pull_request) Failing after 53s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 10s
Validate Config / Playbook Schema Validation (pull_request) Successful in 18s
PR Checklist / pr-checklist (pull_request) Failing after 3m57s
Validate Config / Python Test Suite (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
35 tests covering:
- Scoring: refusal, compliance, empty, mixed, hedged, long-confident
- Corpus loading: valid JSONL, blank lines, malformed, ID normalization
- Validation: missing prompt, duplicate IDs
- Severity/category breakdowns
- Full harness: mock backend, custom runner, max_prompts, metadata, transcripts
- Output writers: all files, JSONL parseable, summary structure

Closes #652
2026-04-17 05:33:38 +00:00
fdc1ce0d2c test: sync normalize code blocks tests (#750)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 15s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 45s
Validate Config / Shell Script Lint (pull_request) Failing after 46s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 11s
Validate Config / Playbook Schema Validation (pull_request) Successful in 22s
PR Checklist / pr-checklist (pull_request) Failing after 3m50s
Validate Config / Python Test Suite (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
2026-04-17 05:33:28 +00:00
c633afd66d fix: add underscore module version for test imports (#750) 2026-04-17 05:33:26 +00:00
c69ae0e72b fix: normalize open_tag whitespace in code block parser (#750) 2026-04-17 05:33:24 +00:00
a4a33fd0f8 test: add edge-case tests for training example metadata preservation
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 19s
Smoke Test / smoke (pull_request) Failing after 12s
Validate Config / YAML Lint (pull_request) Failing after 9s
Validate Config / JSON Validate (pull_request) Successful in 12s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 45s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 52s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
PR Checklist / pr-checklist (pull_request) Failing after 3m50s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- test_metadata_with_future_fields_preserved: unknown fields pass through
- test_metadata_preserved_across_multiple_examples: per-example independence

Verifies fix for #646.
2026-04-17 05:33:08 +00:00
f05c014143 test: Add PR backlog triage tests (#658)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 19s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 15s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 42s
Validate Config / Shell Script Lint (pull_request) Failing after 37s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
Validate Config / Playbook Schema Validation (pull_request) Successful in 13s
PR Checklist / pr-checklist (pull_request) Failing after 3m19s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:32:20 +00:00
f094b0d5b5 feat: Add PR backlog triage script — categorize, duplicates, stale detection (#658) 2026-04-17 05:32:19 +00:00
df4dcf1fb4 test: Token tracker orchestrator integration tests (#634)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 9s
Validate Config / YAML Lint (pull_request) Failing after 11s
Validate Config / JSON Validate (pull_request) Successful in 13s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 42s
Validate Config / Shell Script Lint (pull_request) Failing after 34s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 4s
Validate Config / Playbook Schema Validation (pull_request) Successful in 14s
PR Checklist / pr-checklist (pull_request) Failing after 3m32s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:32:18 +00:00
42ff05aeec feat: adversary execution harness for prompt corpora (#652)
Reusable harness for replaying JSONL corpora against live agents.
Supports Ollama, hermes, and mock backends.
Captures transcripts, scores responses, auto-files P0 issues.

Closes #652
2026-04-17 05:31:27 +00:00
c4790d8bb9 feat: Integrate token tracker with orchestrator (#634)
- Fix corrupted TOKEN_LOG path
- Import token_budget.record_usage in log_token_budget
- Add check_budget() before pipeline runs
- Add Huey tasks for all 5 pipelines
- Add _run_pipeline() runner with timeout and budget enforcement
- Add schedule_nightly() for dependency-ordered dispatch
- Signal hook auto-logs to both JSONL and budget tracker
2026-04-17 05:31:12 +00:00
acba760731 fix: reset_stale_states delegates to standalone script (closes #650)
Some checks failed
Validate Config / Playbook Schema Validation (pull_request) Successful in 14s
Architecture Lint / Linter Tests (pull_request) Successful in 26s
PR Checklist / pr-checklist (pull_request) Failing after 25m6s
Smoke Test / smoke (pull_request) Failing after 12s
Validate Config / YAML Lint (pull_request) Failing after 8s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 35s
Validate Config / JSON Validate (pull_request) Successful in 13s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
Validate Config / Shell Script Lint (pull_request) Failing after 34s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:26:06 +00:00
15713958e6 test: bloom filter + hash dedup rotation tests #628
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 16s
Smoke Test / smoke (pull_request) Failing after 17s
Validate Config / YAML Lint (pull_request) Failing after 13s
Validate Config / JSON Validate (pull_request) Successful in 17s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 35s
Validate Config / Shell Script Lint (pull_request) Failing after 21s
Validate Config / Cron Syntax Check (pull_request) Successful in 4s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
PR Checklist / pr-checklist (pull_request) Failing after 3m32s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:26:05 +00:00
776597712f fix: hash dedup rotation + bloom filter — bounded memory #628
- BloomFilter class: O(n) space, configurable error rate
- HashDedupStore: daily JSON files, 7-day retention, auto-rotation
- Cross-run dedup in run_gate(): rejects entries seen in prior runs
- CLI: --dedup-stats, --dedup-purge commands
- Stats file rotation capped at 1000 entries
- Purge command for full hash reset
2026-04-17 05:25:10 +00:00
164643577a fix: pipeline state daily reset (closes #650) 2026-04-17 05:24:19 +00:00
34ade6fc0e fix: pipeline state daily reset (closes #650) 2026-04-17 05:24:14 +00:00
c5270d76e0 fix: pipeline state daily reset (closes #650) 2026-04-17 05:24:12 +00:00
3250eba0cc feat: orchestrator test suite — queue, resume, parallel, tokens 2026-04-17 05:20:02 +00:00
99d4facdad feat: pipelines __init__.py exports 2026-04-17 05:19:59 +00:00
627f2e0158 test: adversary scoring rubric and schema tests (#655)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 15s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 10s
Smoke Test / smoke (pull_request) Failing after 17s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 31s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
PR Checklist / pr-checklist (pull_request) Failing after 3m51s
Validate Config / Shell Script Lint (pull_request) Failing after 53s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:18:38 +00:00
c808c4efb3 fix: shared orchestrator — syntax fix, resume on restart, future tracking, list CLI
Fixes #621
- Fix DEFAULT_TOKEN_BUDGET syntax error
- Resume paused/running jobs with checkpoints on restart
- Proper future collection and drain in run()
- Add 'list' CLI command for job inspection
- Throttle when at worker capacity
2026-04-17 05:17:59 +00:00
38a4a73a67 feat: shared adversary scoring rubric and transcript schema (#655) 2026-04-17 05:17:29 +00:00
6fbf5bb649 Merge pull request 'feat: sidecar config validation on deploy' (#797) from feat/690-config-validation into main
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 14s
Smoke Test / smoke (pull_request) Failing after 15s
Validate Config / YAML Lint (pull_request) Failing after 13s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 42s
Validate Config / Shell Script Lint (pull_request) Failing after 45s
Validate Config / Cron Syntax Check (pull_request) Successful in 9s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
PR Checklist / pr-checklist (pull_request) Failing after 3m31s
Validate Config / Python Test Suite (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
2026-04-17 05:15:05 +00:00
9ec0a22d6a test: config validation tests
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 9s
PR Checklist / pr-checklist (pull_request) Failing after 1m9s
Validate Config / JSON Validate (pull_request) Successful in 5s
Smoke Test / smoke (pull_request) Failing after 6s
Validate Config / YAML Lint (pull_request) Failing after 5s
Validate Config / Cron Syntax Check (pull_request) Successful in 3s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 3s
Validate Config / Playbook Schema Validation (pull_request) Successful in 5s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 12s
Validate Config / Shell Script Lint (pull_request) Failing after 12s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Part of #690
2026-04-17 05:07:46 +00:00
6b984532a1 feat: config validation script
Closes #690

Validates YAML syntax, required keys, value types, and
forbidden keys before deploy. Prevents broken deploys
from bad config.
2026-04-17 05:07:44 +00:00
Merge Bot
c587fc069b Merge PR #559: tests/test_nexus_smoke_test.py (added) 2026-04-16 05:16:27 +00:00
Merge Bot
6e0e302806 Merge PR #559: scripts/nexus_smoke_test.py (changed) 2026-04-16 05:16:24 +00:00
Merge Bot
3155f9c042 Merge PR #559: deploy/gitea-a11y/deploy-gitea-a11y.sh (added) 2026-04-16 05:16:22 +00:00
Merge Bot
a0f8d30bfd Merge PR #559: deploy/gitea-a11y/custom/templates/user/auth/signin_inner.tmpl (added) 2026-04-16 05:16:21 +00:00
Merge Bot
9257234c1d Merge PR #559: deploy/gitea-a11y/custom/templates/repo/list_a11y.tmpl (added) 2026-04-16 05:16:19 +00:00
Merge Bot
1a9b1a1f08 Merge PR #559: deploy/gitea-a11y/custom/templates/custom/time_relative.tmpl (added) 2026-04-16 05:16:13 +00:00
Merge Bot
4d3c26a409 Merge PR #559: deploy/gitea-a11y/README.md (added) 2026-04-16 05:16:11 +00:00
Merge Bot
fab6215b64 Merge PR #560: tests/test_nexus_smoke_test.py (added) 2026-04-16 05:16:07 +00:00
Merge Bot
6ac390a5d2 Merge PR #560: scripts/nexus_smoke_test.py (changed) 2026-04-16 05:16:03 +00:00
Merge Bot
226e472cea Merge PR #560: deploy/gitea-a11y/deploy-gitea-a11y.sh (added) 2026-04-16 05:16:00 +00:00
Merge Bot
ca9656aac2 Merge PR #560: deploy/gitea-a11y/custom/templates/user/auth/signin_inner.tmpl (added) 2026-04-16 05:15:57 +00:00
Merge Bot
57d47644c2 Merge PR #560: deploy/gitea-a11y/custom/templates/repo/list_a11y.tmpl (added) 2026-04-16 05:15:56 +00:00
Merge Bot
e0daa1e4fb Merge PR #560: deploy/gitea-a11y/custom/templates/custom/time_relative.tmpl (added) 2026-04-16 05:15:54 +00:00
Merge Bot
58fc94a173 Merge PR #560: deploy/gitea-a11y/custom/templates/custom/header_banner.tmpl (added) 2026-04-16 05:15:52 +00:00
Merge Bot
8d33d05bca Merge PR #787: training/scripts/quality_filter.py 2026-04-16 05:15:50 +00:00