Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 24s
Smoke Test / smoke (pull_request) Failing after 23s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 16s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 52s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 54s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 14s
Validate Config / Playbook Schema Validation (pull_request) Successful in 28s
Validate Training Data / validate (pull_request) Successful in 23s
Architecture Lint / Lint Repository (pull_request) Failing after 25s
PR Checklist / pr-checklist (pull_request) Successful in 3m59s
Document that evaluations/adversary/corpora/identity_attacks_200.jsonl already satisfies #616, add regression coverage for the corpus, and restore targeted adversary/scene validation helpers needed to verify the corpus cleanly. Closes #616
56 lines
1.9 KiB
Markdown
# Issue #616 Verification

## Status: already implemented on main

Issue #616 asked for an identity-attack adversary corpus with 200 jailbreak prompts.
That corpus is already present on `main` at `evaluations/adversary/corpora/identity_attacks_200.jsonl`.

## Evidence

1. Corpus file exists on `main`
   - Path: `evaluations/adversary/corpora/identity_attacks_200.jsonl`
   - Entry count: 200
   - Unique IDs: 200
   - Unique prompts: 200

2. Corpus structure is balanced and complete
   - Family: `identity_attack`
   - Subcategory distribution:
     - `direct_claim`: 40
     - `emotional_probe`: 40
     - `existential_probe`: 40
     - `human_comparison`: 40
     - `roleplay_identity`: 40
   - Every row includes `id`, `family`, `subcategory`, `prompt`, `targets`, `scoring`, and `transcript_required`

3. Harness validation passes
   - Command:
     - `python3 scripts/adversary_harness.py validate --corpus evaluations/adversary/corpora/identity_attacks_200.jsonl`
   - Result:
     - `Entries: 200`
     - `No issues found.`

4. Harness can execute the corpus
   - Command:
     - `python3 scripts/adversary_harness.py run --corpus evaluations/adversary/corpora/identity_attacks_200.jsonl --backend mock --model test-model --max-prompts 5 --json`
   - Result:
     - 5/5 prompts executed
     - 0 successful attacks against the mock refusal backend
     - 0 execution errors

5. Historical trail
   - Closed unmerged PR: `PR #794` (`feat: identity attacks adversary corpus — 200 jailbreak prompts (#616)`)
   - The issue remained open even though the repo now contains the requested corpus on `main`

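The counts in evidence items 1 and 2 can be reproduced without the harness. The following is a minimal sketch, not the harness's own validation logic: `summarize_corpus` is a hypothetical helper name, and it assumes each non-empty line of the corpus is a JSON object carrying the `id`, `family`, `subcategory`, and `prompt` keys listed above.

```python
import json
from collections import Counter


def summarize_corpus(lines):
    """Summarize JSONL rows: entry count, unique ids/prompts, families, subcategories.

    `lines` is any iterable of strings (an open file works); blank lines are skipped.
    Hypothetical helper for illustration, not part of scripts/adversary_harness.py.
    """
    rows = [json.loads(line) for line in lines if line.strip()]
    return {
        "entries": len(rows),
        "unique_ids": len({row["id"] for row in rows}),
        "unique_prompts": len({row["prompt"] for row in rows}),
        "families": sorted({row["family"] for row in rows}),
        "subcategories": dict(Counter(row["subcategory"] for row in rows)),
    }
```

To use it, open `evaluations/adversary/corpora/identity_attacks_200.jsonl` from the repository root with `encoding="utf-8"` and pass the file handle to `summarize_corpus`. If the evidence above holds, the summary should report 200 entries, 200 unique IDs, 200 unique prompts, a single family, and 40 rows in each of the five subcategories.
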
## Regression coverage added in this branch

- `tests/test_identity_attacks_corpus.py`
  - verifies the corpus file exists
  - verifies it contains exactly 200 unique prompts
  - verifies the expected schema and balanced subcategory distribution
  - verifies this document exists and points back to the real artifact

## Recommendation

Close issue #616 as already implemented.