fix(security): catch multi-word prompt injection in skills_guard

The regex `ignore\s+(previous|all|...)\s+instructions` only matched
a single keyword between 'ignore' and 'instructions'. Phrases like
'ignore all prior instructions' bypassed the scanner entirely.

Changed to `ignore\s+(?:\w+\s+)*(previous|all|...)\s+instructions`
so that any number of words may appear between 'ignore' and the
keyword. The keyword must still directly precede 'instructions'.
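
A minimal sketch of the behaviour change, for illustration only. It assumes
the scanner applies each pattern with `re.search` and no extra flags; the
helper `matches` and the sample strings are hypothetical and not part of
skills_guard.

```python
import re

# Pattern before this commit: exactly one keyword between the two anchors.
OLD = re.compile(r'ignore\s+(previous|all|above|prior)\s+instructions')

# Pattern after this commit: zero or more extra words may precede the keyword.
NEW = re.compile(r'ignore\s+(?:\w+\s+)*(previous|all|above|prior)\s+instructions')


def matches(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


samples = [
    "ignore previous instructions",            # old: hit,  new: hit
    "ignore all prior instructions",           # old: miss, new: hit
    "ignore every one of the above instructions",  # old: miss, new: hit
    "ignore all previous system instructions",  # old: miss, new: miss
]

for text in samples:
    print(f"old={matches(OLD, text)!s:5} new={matches(NEW, text)!s:5} {text}")
```

The last sample still slips through because the keyword has to sit directly
before 'instructions'; words after the keyword are not covered by this change.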
0xbyt4
2026-02-28 20:15:25 +03:00
parent 6366177118
commit 4ea29978fc

@@ -157,7 +157,7 @@ THREAT_PATTERNS = [
"markdown link with variable interpolation"),
# ── Prompt injection ──
-(r'ignore\s+(previous|all|above|prior)\s+instructions',
+(r'ignore\s+(?:\w+\s+)*(previous|all|above|prior)\s+instructions',
"prompt_injection_ignore", "critical", "injection",
"prompt injection: ignore previous instructions"),
(r'you\s+are\s+now\s+',