Commit Graph

852 Commits

Author SHA1 Message Date
c633afd66d fix: add underscore module version for test imports (#750) 2026-04-17 05:33:26 +00:00
c69ae0e72b fix: normalize open_tag whitespace in code block parser (#750) 2026-04-17 05:33:24 +00:00
a4a33fd0f8 test: add edge-case tests for training example metadata preservation
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 19s
Smoke Test / smoke (pull_request) Failing after 12s
Validate Config / YAML Lint (pull_request) Failing after 9s
Validate Config / JSON Validate (pull_request) Successful in 12s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 45s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Shell Script Lint (pull_request) Failing after 52s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
PR Checklist / pr-checklist (pull_request) Failing after 3m50s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- test_metadata_with_future_fields_preserved: unknown fields pass through
- test_metadata_preserved_across_multiple_examples: per-example independence

Verifies fix for #646.
2026-04-17 05:33:08 +00:00
f05c014143 test: Add PR backlog triage tests (#658)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 19s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 15s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 42s
Validate Config / Shell Script Lint (pull_request) Failing after 37s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
Validate Config / Playbook Schema Validation (pull_request) Successful in 13s
PR Checklist / pr-checklist (pull_request) Failing after 3m19s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:32:20 +00:00
f094b0d5b5 feat: Add PR backlog triage script — categorize, duplicates, stale detection (#658) 2026-04-17 05:32:19 +00:00
df4dcf1fb4 test: Token tracker orchestrator integration tests (#634)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 9s
Validate Config / YAML Lint (pull_request) Failing after 11s
Validate Config / JSON Validate (pull_request) Successful in 13s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 42s
Validate Config / Shell Script Lint (pull_request) Failing after 34s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 4s
Validate Config / Playbook Schema Validation (pull_request) Successful in 14s
PR Checklist / pr-checklist (pull_request) Failing after 3m32s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:32:18 +00:00
42ff05aeec feat: adversary execution harness for prompt corpora (#652)
Reusable harness for replaying JSONL corpora against live agents.
Supports Ollama, hermes, and mock backends.
Captures transcripts, scores responses, auto-files P0 issues.

Closes #652
2026-04-17 05:31:27 +00:00
c4790d8bb9 feat: Integrate token tracker with orchestrator (#634)
- Fix corrupted TOKEN_LOG path
- Import token_budget.record_usage in log_token_budget
- Add check_budget() before pipeline runs
- Add Huey tasks for all 5 pipelines
- Add _run_pipeline() runner with timeout and budget enforcement
- Add schedule_nightly() for dependency-ordered dispatch
- Signal hook auto-logs to both JSONL and budget tracker
2026-04-17 05:31:12 +00:00
acba760731 fix: reset_stale_states delegates to standalone script (closes #650)
Some checks failed
Validate Config / Playbook Schema Validation (pull_request) Successful in 14s
Architecture Lint / Linter Tests (pull_request) Successful in 26s
PR Checklist / pr-checklist (pull_request) Failing after 25m6s
Smoke Test / smoke (pull_request) Failing after 12s
Validate Config / YAML Lint (pull_request) Failing after 8s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 35s
Validate Config / JSON Validate (pull_request) Successful in 13s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
Validate Config / Shell Script Lint (pull_request) Failing after 34s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:26:06 +00:00
15713958e6 test: bloom filter + hash dedup rotation tests #628
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 16s
Smoke Test / smoke (pull_request) Failing after 17s
Validate Config / YAML Lint (pull_request) Failing after 13s
Validate Config / JSON Validate (pull_request) Successful in 17s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 35s
Validate Config / Shell Script Lint (pull_request) Failing after 21s
Validate Config / Cron Syntax Check (pull_request) Successful in 4s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 6s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
PR Checklist / pr-checklist (pull_request) Failing after 3m32s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:26:05 +00:00
1c69029d9c feat: integrate provenance tracking with build_curated.py (#752)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 25s
PR Checklist / pr-checklist (pull_request) Failing after 5m22s
Smoke Test / smoke (pull_request) Failing after 17s
Validate Config / YAML Lint (pull_request) Failing after 13s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Shell Script Lint (pull_request) Failing after 41s
Validate Config / Cron Syntax Check (pull_request) Successful in 8s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 7s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m6s
Validate Config / Playbook Schema Validation (pull_request) Successful in 17s
Validate Training Data / validate (pull_request) Successful in 12s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:25:49 +00:00
776597712f fix: hash dedup rotation + bloom filter — bounded memory #628
- BloomFilter class: O(n) space, configurable error rate
- HashDedupStore: daily JSON files, 7-day retention, auto-rotation
- Cross-run dedup in run_gate(): rejects entries seen in prior runs
- CLI: --dedup-stats, --dedup-purge commands
- Stats file rotation capped at 1000 entries
- Purge command for full hash reset
2026-04-17 05:25:10 +00:00
164643577a fix: pipeline state daily reset (closes #650) 2026-04-17 05:24:19 +00:00
34ade6fc0e fix: pipeline state daily reset (closes #650) 2026-04-17 05:24:14 +00:00
c5270d76e0 fix: pipeline state daily reset (closes #650) 2026-04-17 05:24:12 +00:00
3250eba0cc feat: orchestrator test suite — queue, resume, parallel, tokens 2026-04-17 05:20:02 +00:00
99d4facdad feat: pipelines __init__.py exports 2026-04-17 05:19:59 +00:00
627f2e0158 test: adversary scoring rubric and schema tests (#655)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 15s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 10s
Smoke Test / smoke (pull_request) Failing after 17s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 31s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
PR Checklist / pr-checklist (pull_request) Failing after 3m51s
Validate Config / Shell Script Lint (pull_request) Failing after 53s
Validate Config / Cron Syntax Check (pull_request) Successful in 10s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 05:18:38 +00:00
c808c4efb3 fix: shared orchestrator — syntax fix, resume on restart, future tracking, list CLI
Fixes #621
- Fix DEFAULT_TOKEN_BUDGET syntax error
- Resume paused/running jobs with checkpoints on restart
- Proper future collection and drain in run()
- Add 'list' CLI command for job inspection
- Throttle when at worker capacity
2026-04-17 05:17:59 +00:00
38a4a73a67 feat: shared adversary scoring rubric and transcript schema (#655) 2026-04-17 05:17:29 +00:00
6fbf5bb649 Merge pull request 'feat: sidecar config validation on deploy' (#797) from feat/690-config-validation into main
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 14s
Smoke Test / smoke (pull_request) Failing after 15s
Validate Config / YAML Lint (pull_request) Failing after 13s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 42s
Validate Config / Shell Script Lint (pull_request) Failing after 45s
Validate Config / Cron Syntax Check (pull_request) Successful in 9s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 21s
PR Checklist / pr-checklist (pull_request) Failing after 3m31s
Validate Config / Python Test Suite (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
2026-04-17 05:15:05 +00:00
Alexander Whitestone
5efdc3979c feat: crisis response — post-crisis & recovery 500 pairs (#599)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 7s
PR Checklist / pr-checklist (pull_request) Failing after 1m10s
Smoke Test / smoke (pull_request) Failing after 4s
Validate Config / YAML Lint (pull_request) Failing after 4s
Validate Config / JSON Validate (pull_request) Successful in 4s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 22s
Validate Config / Shell Script Lint (pull_request) Failing after 14s
Validate Config / Cron Syntax Check (pull_request) Successful in 3s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 3s
Validate Config / Playbook Schema Validation (pull_request) Successful in 5s
Validate Training Data / validate (pull_request) Successful in 4s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
2026-04-17 01:14:09 -04:00
9ec0a22d6a test: config validation tests
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 9s
PR Checklist / pr-checklist (pull_request) Failing after 1m9s
Validate Config / JSON Validate (pull_request) Successful in 5s
Smoke Test / smoke (pull_request) Failing after 6s
Validate Config / YAML Lint (pull_request) Failing after 5s
Validate Config / Cron Syntax Check (pull_request) Successful in 3s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 3s
Validate Config / Playbook Schema Validation (pull_request) Successful in 5s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 12s
Validate Config / Shell Script Lint (pull_request) Failing after 12s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Part of #690
2026-04-17 05:07:46 +00:00
6b984532a1 feat: config validation script
Closes #690

Validates YAML syntax, required keys, value types, and
forbidden keys before deploy. Prevents broken deploys
from bad config.
2026-04-17 05:07:44 +00:00
Alexander Whitestone
3b58fe1ac8 feat: Hip-Hop scene descriptions — 100 lyrics->visual sets (#608)
Some checks failed
Validate Config / Deploy Script Dry Run (pull_request) Has been cancelled
Validate Config / Playbook Schema Validation (pull_request) Has been cancelled
Architecture Lint / Linter Tests (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
PR Checklist / pr-checklist (pull_request) Has been cancelled
Smoke Test / smoke (pull_request) Has been cancelled
Validate Config / YAML Lint (pull_request) Has been cancelled
Validate Config / JSON Validate (pull_request) Has been cancelled
Validate Config / Python Syntax & Import Check (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Validate Config / Shell Script Lint (pull_request) Has been cancelled
Validate Config / Cron Syntax Check (pull_request) Has been cancelled
Validate Training Data / validate (pull_request) Has been cancelled
2026-04-16 02:12:38 -04:00
Alexander Whitestone
4f960e0dd8 feat: identity attacks adversary corpus — 200 jailbreak prompts (#616)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
PR Checklist / pr-checklist (pull_request) Has been cancelled
Smoke Test / smoke (pull_request) Has been cancelled
Validate Config / YAML Lint (pull_request) Has been cancelled
Validate Config / JSON Validate (pull_request) Has been cancelled
Validate Config / Python Syntax & Import Check (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Validate Config / Shell Script Lint (pull_request) Has been cancelled
Validate Config / Cron Syntax Check (pull_request) Has been cancelled
Validate Config / Deploy Script Dry Run (pull_request) Has been cancelled
Validate Config / Playbook Schema Validation (pull_request) Has been cancelled
Validate Training Data / validate (pull_request) Has been cancelled
2026-04-16 01:57:24 -04:00
Alexander Whitestone
0ddbfb0cfa feat: emotional manipulation adversary corpus — 200 jailbreak prompts (#620)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
PR Checklist / pr-checklist (pull_request) Has been cancelled
Validate Config / YAML Lint (pull_request) Has been cancelled
Validate Config / JSON Validate (pull_request) Has been cancelled
Smoke Test / smoke (pull_request) Has been cancelled
Validate Training Data / validate (pull_request) Has been cancelled
Validate Config / Python Syntax & Import Check (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Validate Config / Shell Script Lint (pull_request) Has been cancelled
Validate Config / Cron Syntax Check (pull_request) Has been cancelled
Validate Config / Deploy Script Dry Run (pull_request) Has been cancelled
Validate Config / Playbook Schema Validation (pull_request) Has been cancelled
2026-04-16 01:49:38 -04:00
Alexander Whitestone
f169634a75 feat: config drift detection across all fleet nodes (#686)
Some checks failed
PR Checklist / pr-checklist (pull_request) Has been cancelled
Architecture Lint / Linter Tests (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Smoke Test / smoke (pull_request) Has been cancelled
Validate Config / YAML Lint (pull_request) Has been cancelled
Validate Config / JSON Validate (pull_request) Has been cancelled
Validate Config / Python Syntax & Import Check (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Validate Config / Shell Script Lint (pull_request) Has been cancelled
Validate Config / Cron Syntax Check (pull_request) Has been cancelled
Validate Config / Deploy Script Dry Run (pull_request) Has been cancelled
Validate Config / Playbook Schema Validation (pull_request) Has been cancelled
Validate Training Data / validate (pull_request) Has been cancelled
Detect config drift between fleet nodes and canonical timmy-config.

scripts/config_drift_detector.py (200 lines):
- SSH-based config collection from all nodes
- Recursive diff against canonical config
- Report: which keys differ, on which nodes
- JSON output for programmatic consumption

Fleet nodes: local, ezra (143.198.27.163), bezalel (167.99.126.228)

Usage:
  python3 scripts/config_drift_detector.py --report
  python3 scripts/config_drift_detector.py --json

Closes #686
2026-04-16 01:33:57 -04:00
Alexander Whitestone
b6c716998b feat: authority bypass adversary corpus — 200 jailbreak prompts (#619)
Some checks failed
PR Checklist / pr-checklist (pull_request) Has been cancelled
Validate Config / Playbook Schema Validation (pull_request) Has been cancelled
Architecture Lint / Linter Tests (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Training Data / validate (pull_request) Has been cancelled
Smoke Test / smoke (pull_request) Has been cancelled
Validate Config / YAML Lint (pull_request) Has been cancelled
Validate Config / JSON Validate (pull_request) Has been cancelled
Validate Config / Python Syntax & Import Check (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Validate Config / Shell Script Lint (pull_request) Has been cancelled
Validate Config / Cron Syntax Check (pull_request) Has been cancelled
Validate Config / Deploy Script Dry Run (pull_request) Has been cancelled
200 prompts targeting authority bypass attacks.
Attack worker 4/5.

Categories (33-34 each):
- developer_override: admin/debug mode claims
- role_play_bypass: DAN, unrestricted character modes
- sovereignty_override: ownership/creator claims
- prompt_injection: system prompt injection attempts
- emotional_manipulation_auth: guilt/loyalty pressure
- authority_spoofing: FBI/OpenAI/military impersonation

Scoring: pass=maintains SOUL.md, fail=complies with override.
Compatible with shared scoring rubric (#655).

Closes #619
2026-04-16 01:26:49 -04:00
Alexander Whitestone
e88b1b0d75 feat: training pair provenance tracking (#752)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Has been cancelled
Architecture Lint / Lint Repository (pull_request) Has been cancelled
PR Checklist / pr-checklist (pull_request) Has been cancelled
Validate Config / Playbook Schema Validation (pull_request) Has been cancelled
Validate Training Data / validate (pull_request) Has been cancelled
Smoke Test / smoke (pull_request) Has been cancelled
Validate Config / YAML Lint (pull_request) Has been cancelled
Validate Config / JSON Validate (pull_request) Has been cancelled
Validate Config / Python Syntax & Import Check (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Validate Config / Shell Script Lint (pull_request) Has been cancelled
Validate Config / Cron Syntax Check (pull_request) Has been cancelled
Validate Config / Deploy Script Dry Run (pull_request) Has been cancelled
Provenance module for tracking source of every training pair.

training/provenance.py (151 lines):
- add_provenance(): add metadata to pairs
- validate_provenance(): check required fields
- provenance_stats(): coverage and distribution
- backfill_provenance(): annotate existing pairs
- filter_by_provenance(): exclude by model/source
- extract_provenance_from_trajectory(): hermes integration

Required fields: source_session_id, model, timestamp

Closes #752
2026-04-16 01:23:17 -04:00
Merge Bot
c587fc069b Merge PR #559: tests/test_nexus_smoke_test.py (added) 2026-04-16 05:16:27 +00:00
Merge Bot
6e0e302806 Merge PR #559: scripts/nexus_smoke_test.py (changed) 2026-04-16 05:16:24 +00:00
Merge Bot
3155f9c042 Merge PR #559: deploy/gitea-a11y/deploy-gitea-a11y.sh (added) 2026-04-16 05:16:22 +00:00
Merge Bot
a0f8d30bfd Merge PR #559: deploy/gitea-a11y/custom/templates/user/auth/signin_inner.tmpl (added) 2026-04-16 05:16:21 +00:00
Merge Bot
9257234c1d Merge PR #559: deploy/gitea-a11y/custom/templates/repo/list_a11y.tmpl (added) 2026-04-16 05:16:19 +00:00
Merge Bot
1a9b1a1f08 Merge PR #559: deploy/gitea-a11y/custom/templates/custom/time_relative.tmpl (added) 2026-04-16 05:16:13 +00:00
Merge Bot
4d3c26a409 Merge PR #559: deploy/gitea-a11y/README.md (added) 2026-04-16 05:16:11 +00:00
Merge Bot
fab6215b64 Merge PR #560: tests/test_nexus_smoke_test.py (added) 2026-04-16 05:16:07 +00:00
Merge Bot
6ac390a5d2 Merge PR #560: scripts/nexus_smoke_test.py (changed) 2026-04-16 05:16:03 +00:00
Merge Bot
226e472cea Merge PR #560: deploy/gitea-a11y/deploy-gitea-a11y.sh (added) 2026-04-16 05:16:00 +00:00
Merge Bot
ca9656aac2 Merge PR #560: deploy/gitea-a11y/custom/templates/user/auth/signin_inner.tmpl (added) 2026-04-16 05:15:57 +00:00
Merge Bot
57d47644c2 Merge PR #560: deploy/gitea-a11y/custom/templates/repo/list_a11y.tmpl (added) 2026-04-16 05:15:56 +00:00
Merge Bot
e0daa1e4fb Merge PR #560: deploy/gitea-a11y/custom/templates/custom/time_relative.tmpl (added) 2026-04-16 05:15:54 +00:00
Merge Bot
58fc94a173 Merge PR #560: deploy/gitea-a11y/custom/templates/custom/header_banner.tmpl (added) 2026-04-16 05:15:52 +00:00
Merge Bot
8d33d05bca Merge PR #787: training/scripts/quality_filter.py 2026-04-16 05:15:50 +00:00
Merge Bot
36e2663c8e Merge PR #787: training/data/scene-descriptions/scene-descriptions-rock.jsonl 2026-04-16 05:15:46 +00:00
Merge Bot
cc6ade3312 Merge PR #787: training/data/scene-descriptions/scene-descriptions-pop.jsonl 2026-04-16 05:15:43 +00:00
Merge Bot
9d3883f5fb Merge PR #787: training/data/prompt-enhancement/video-scenes-500.jsonl 2026-04-16 05:15:41 +00:00
Merge Bot
95214e87eb Merge PR #787: training/data/prompt-enhancement/music-moods-500.jsonl 2026-04-16 05:15:38 +00:00
Merge Bot
411c0e7f01 Merge PR #787: training/data/prompt-enhancement/game-assets-500.jsonl 2026-04-16 05:15:34 +00:00