fix(security): catch multi-word prompt injection in skills_guard

The regex `ignore\s+(previous|all|...)\s+instructions` only matched a single keyword between 'ignore' and 'instructions'. Phrases like 'ignore all prior instructions' bypassed the scanner entirely. Changed to `ignore\s+(?:\w+\s+)*(previous|all|...)\s+instructions` to allow arbitrary words before the keyword.
2026-02-28 20:15:25 +03:00
parent 6366177118
commit 4ea29978fc
1 changed files with 1 additions and 1 deletions
--- a/tools/skills_guard.py
+++ b/tools/skills_guard.py
@@ -157,7 +157,7 @@ THREAT_PATTERNS = [
     "markdown link with variable interpolation"),

    # ── Prompt injection ──
-    (r'ignore\s+(previous|all|above|prior)\s+instructions',
+    (r'ignore\s+(?:\w+\s+)*(previous|all|above|prior)\s+instructions',
     "prompt_injection_ignore", "critical", "injection",
     "prompt injection: ignore previous instructions"),
    (r'you\s+are\s+now\s+',