timmy-config/docs/issue-616-verification.md
Alexander Whitestone 2ec8c556d7
test: verify identity attacks corpus already on main (#616)
Document that evaluations/adversary/corpora/identity_attacks_200.jsonl already satisfies #616,
add regression coverage for the corpus, and restore targeted adversary/scene validation helpers
needed to verify the corpus cleanly.

Closes #616
2026-04-22 10:46:25 -04:00


# Issue #616 Verification
## Status: already implemented on main
Issue #616 asked for an identity-attack adversary corpus with 200 jailbreak prompts.
That corpus is already present on `main` at `evaluations/adversary/corpora/identity_attacks_200.jsonl`.
## Evidence
1. Corpus file exists on `main`
   - Path: `evaluations/adversary/corpora/identity_attacks_200.jsonl`
   - Entry count: 200
   - Unique IDs: 200
   - Unique prompts: 200
2. Corpus structure is balanced and complete
   - Family: `identity_attack`
   - Subcategory distribution:
     - `direct_claim`: 40
     - `emotional_probe`: 40
     - `existential_probe`: 40
     - `human_comparison`: 40
     - `roleplay_identity`: 40
   - Every row includes `id`, `family`, `subcategory`, `prompt`, `targets`, `scoring`, and `transcript_required`
3. Harness validation passes
   - Command:
     - `python3 scripts/adversary_harness.py validate --corpus evaluations/adversary/corpora/identity_attacks_200.jsonl`
   - Result:
     - `Entries: 200`
     - `No issues found.`
4. Harness can execute the corpus
   - Command:
     - `python3 scripts/adversary_harness.py run --corpus evaluations/adversary/corpora/identity_attacks_200.jsonl --backend mock --model test-model --max-prompts 5 --json`
   - Result:
     - 5/5 prompts executed
     - 0 successful attacks against the mock refusal backend
     - 0 execution errors
5. Historical trail
   - Closed unmerged PR: `PR #794` (`feat: identity attacks adversary corpus — 200 jailbreak prompts (#616)`)
   - The issue remained open even though the repo now contains the requested corpus on `main`
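
The counts in items 1 and 2 can be reproduced independently of the harness. The following is a minimal sketch, not the harness's own logic: `summarize_corpus` is a hypothetical helper name, and the only things taken from this document are the seven field names and the JSONL format implied by the file extension. It assumes `id` and `prompt` values are strings.

```python
import json
from collections import Counter

# Field names as listed in the evidence above.
REQUIRED_FIELDS = (
    "id", "family", "subcategory", "prompt",
    "targets", "scoring", "transcript_required",
)


def summarize_corpus(lines):
    """Summarize a JSONL corpus from an iterable of text lines.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    rows = [json.loads(line) for line in lines if line.strip()]
    return {
        "entries": len(rows),
        "unique_ids": len({row.get("id") for row in rows}),
        "unique_prompts": len({row.get("prompt") for row in rows}),
        # str() keeps sorting safe if a row has no "family" key.
        "families": sorted({str(row.get("family")) for row in rows}),
        "subcategories": dict(Counter(row.get("subcategory") for row in rows)),
        # Number of rows lacking at least one required field.
        "rows_missing_fields": sum(
            1 for row in rows
            if any(field not in row for field in REQUIRED_FIELDS)
        ),
    }
```

To run it against the real file, open `evaluations/adversary/corpora/identity_attacks_200.jsonl` with `encoding="utf-8"` and pass the file handle to `summarize_corpus`. If the evidence above holds, the summary should report 200 entries, 200 unique IDs, 200 unique prompts, and no rows with missing fields.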
## Regression coverage added in this branch
- `tests/test_identity_attacks_corpus.py`
  - verifies the corpus file exists
  - verifies it contains exactly 200 unique prompts
  - verifies the expected schema and balanced subcategory distribution
  - verifies this document exists and points back to the real artifact
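
For readers without the branch checked out, the checks listed above could be expressed roughly as follows. This is an illustrative sketch and not the contents of `tests/test_identity_attacks_corpus.py`; the helper names `corpus_problems` and `doc_references_corpus` are assumptions made for this example.

```python
from collections import Counter

CORPUS_PATH = "evaluations/adversary/corpora/identity_attacks_200.jsonl"

# Expected balance, taken from the subcategory distribution in this document.
EXPECTED_SUBCATEGORIES = {
    "direct_claim": 40,
    "emotional_probe": 40,
    "existential_probe": 40,
    "human_comparison": 40,
    "roleplay_identity": 40,
}


def corpus_problems(rows):
    """Return a list of problems found in parsed corpus rows (empty if none)."""
    problems = []
    if len(rows) != 200:
        problems.append(f"expected 200 entries, found {len(rows)}")
    prompts = [row.get("prompt") for row in rows]
    if len(set(prompts)) != len(prompts):
        problems.append("duplicate prompts found")
    counts = dict(Counter(row.get("subcategory") for row in rows))
    if counts != EXPECTED_SUBCATEGORIES:
        problems.append(f"unexpected subcategory distribution: {counts}")
    return problems


def doc_references_corpus(doc_text):
    """Check that a verification document names the real corpus path."""
    return CORPUS_PATH in doc_text
```

A pytest-style test would then load the JSONL rows and assert `corpus_problems(rows) == []`, and read this document's text and assert `doc_references_corpus(text)`. Returning a list of problems instead of asserting inline keeps the failure message informative when more than one check fails.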
## Recommendation
Close issue #616 as already implemented.