Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 23s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 52s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 54s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 14s
Validate Config / Playbook Schema Validation (pull_request) Successful in 28s
Validate Training Data / validate (pull_request) Successful in 23s
Architecture Lint / Lint Repository (pull_request) Failing after 25s
PR Checklist / pr-checklist (pull_request) Successful in 3m59s
Document that evaluations/adversary/corpora/identity_attacks_200.jsonl already satisfies #616, add regression coverage for the corpus, and restore targeted adversary/scene validation helpers needed to verify the corpus cleanly. Closes #616
Issue #616 Verification
Status: already implemented on main
Issue #616 asked for an identity-attack adversary corpus with 200 jailbreak prompts.
That corpus is already present on main at evaluations/adversary/corpora/identity_attacks_200.jsonl.
Evidence
- Corpus file exists on main
  - Path: evaluations/adversary/corpora/identity_attacks_200.jsonl
  - Entry count: 200
  - Unique IDs: 200
  - Unique prompts: 200
- Corpus structure is balanced and complete
  - Family: identity_attack
  - Subcategory distribution:
    - direct_claim: 40
    - emotional_probe: 40
    - existential_probe: 40
    - human_comparison: 40
    - roleplay_identity: 40
  - Every row includes id, family, subcategory, prompt, targets, scoring, and transcript_required
- Harness validation passes
  - Command: python3 scripts/adversary_harness.py validate --corpus evaluations/adversary/corpora/identity_attacks_200.jsonl
  - Result:
    - Entries: 200
    - No issues found.
- Harness can execute the corpus
  - Command: python3 scripts/adversary_harness.py run --corpus evaluations/adversary/corpora/identity_attacks_200.jsonl --backend mock --model test-model --max-prompts 5 --json
  - Result:
    - 5/5 prompts executed
    - 0 successful attacks against the mock refusal backend
    - 0 execution errors
- Historical trail
  - Closed unmerged PR: PR #794 (feat: identity attacks adversary corpus — 200 jailbreak prompts (#616))
  - The issue remained open even though the repo now contains the requested corpus on main
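The counts listed under Evidence (entries, unique IDs, unique prompts, subcategory distribution, required fields) can be recomputed independently of the harness. The following is a minimal sketch, not the logic of scripts/adversary_harness.py: the function name summarize_corpus and the shape of its return value are hypothetical, and only the field names come from this document.

```python
import json
from collections import Counter

# Field names taken from the verification notes above.
REQUIRED_FIELDS = {
    "id", "family", "subcategory", "prompt",
    "targets", "scoring", "transcript_required",
}


def summarize_corpus(lines):
    """Summarize JSONL rows (hypothetical helper, not part of the harness).

    `lines` is any iterable of JSONL strings, for example an open file handle.
    Blank lines are ignored.
    """
    rows = [json.loads(line) for line in lines if line.strip()]
    return {
        "entries": len(rows),
        "unique_ids": len({row.get("id") for row in rows}),
        "unique_prompts": len({row.get("prompt") for row in rows}),
        "families": dict(Counter(row.get("family") for row in rows)),
        "subcategories": dict(Counter(row.get("subcategory") for row in rows)),
        # Zero-based indices of rows that lack at least one required field.
        "rows_missing_fields": [
            i for i, row in enumerate(rows)
            if not REQUIRED_FIELDS.issubset(row)
        ],
    }
```

Run from the repository root, it could be applied with `with open("evaluations/adversary/corpora/identity_attacks_200.jsonl", encoding="utf-8") as fh: summary = summarize_corpus(fh)`. If the evidence above holds, the summary should report 200 entries, 200 unique IDs, 200 unique prompts, and 40 rows per subcategory.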
Regression coverage added in this branch
- tests/test_identity_attacks_corpus.py
  - verifies the corpus file exists
  - verifies it contains exactly 200 unique prompts
  - verifies the expected schema and balanced subcategory distribution
  - verifies this document exists and points back to the real artifact
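The last check in that list (the document exists and points back to the real artifact) is the one that stops the verification note from drifting away from the corpus. A rough sketch of such a check follows; it is not the contents of tests/test_identity_attacks_corpus.py. The helper name doc_points_to_corpus and its doc_path and repo_root parameters are assumptions, since the location of the verification document is not stated here.

```python
from pathlib import Path

# Corpus path as given in the verification notes.
CORPUS_PATH = "evaluations/adversary/corpora/identity_attacks_200.jsonl"


def doc_points_to_corpus(doc_path, repo_root):
    """Return True only if the doc exists, mentions CORPUS_PATH, and that
    path resolves to a real file under repo_root (hypothetical helper)."""
    doc = Path(doc_path)
    if not doc.is_file():
        return False
    mentions_corpus = CORPUS_PATH in doc.read_text(encoding="utf-8")
    corpus_exists = (Path(repo_root) / CORPUS_PATH).is_file()
    return mentions_corpus and corpus_exists
```

A pytest test could then reduce to a single assertion such as `assert doc_points_to_corpus(doc_path, repo_root)`, with both arguments supplied by the test module.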
Recommendation
Close issue #616 as already implemented.