Allegro
e555c989af
security: add input sanitization for jailbreak patterns (Issue #72)
Implements input sanitization module to detect and strip jailbreak fingerprint
patterns identified in red team audit:
HIGH severity:
- GODMODE dividers: [START], [END], GODMODE ENABLED, UNFILTERED
- L33t speak encoding: h4ck, k3ylog, ph1shing, m4lw4r3
MEDIUM severity:
- Boundary inversion: [END]...[START] tricks
- Fake role markers: user: assistant: system:
LOW severity:
- Spaced text bypass: k e y l o g g e r
Other patterns detected:
- Refusal inversion: 'refusal is harmful'
- System prompt injection: 'you are now', 'ignore previous instructions'
- Obfuscation: base64, hex, rot13 mentions
Files created:
- agent/input_sanitizer.py: Core sanitization module with detection,
scoring, and cleaning functions
- tests/test_input_sanitizer.py: 69 test cases covering all patterns
- tests/test_input_sanitizer_integration.py: Integration tests
Files modified:
- agent/__init__.py: Export sanitizer functions
- run_agent.py: Integrate sanitizer at start of run_conversation()
Features:
- detect_jailbreak_patterns(): Returns bool, patterns list, category scores
- sanitize_input(): Returns cleaned_text, risk_score, patterns
- score_input_risk(): Returns 0-100 risk score
- sanitize_input_full(): Complete sanitization with blocking decisions
- Logging integration for security auditing
2026-03-31 19:56:16 +00:00
..
2026-03-28 23:45:53 -07:00
2026-03-31 00:37:14 +00:00
2026-03-29 21:24:17 -07:00
2026-03-30 23:15:11 +00:00
2026-03-29 21:54:36 -07:00
2026-03-21 09:34:00 -07:00
2026-03-24 07:30:25 -07:00
2026-03-29 16:52:52 -07:00
2026-03-31 12:10:46 +00:00
2026-03-17 02:02:33 -07:00
2026-03-17 02:53:33 -07:00
2026-03-17 01:50:59 -07:00
2026-03-17 04:24:27 -07:00
2026-03-15 08:02:29 -07:00
2026-03-14 11:03:20 -07:00
2026-03-27 13:02:52 -07:00
2026-03-26 19:21:34 -07:00
2026-03-14 19:38:55 -07:00
2026-03-14 19:38:55 -07:00
2026-03-29 21:06:35 -07:00
2026-03-27 09:45:25 -07:00
2026-03-14 22:31:51 -07:00
2026-03-14 22:31:51 -07:00
2026-03-17 04:46:15 -07:00
2026-03-14 11:57:44 -07:00
2026-03-25 15:00:33 -07:00
2026-03-21 09:42:07 -07:00
2026-03-26 17:58:40 -07:00
2026-03-17 02:53:33 -07:00
2026-03-15 19:03:34 -07:00
2026-03-20 04:35:17 -07:00
2026-03-14 21:28:51 -07:00
2026-03-17 02:05:26 -07:00
2026-03-23 06:20:19 -07:00
2026-03-20 06:04:33 -07:00
2026-03-14 21:19:22 -07:00
2026-03-29 23:59:07 -07:00
2026-03-30 08:10:14 -07:00
2026-03-29 20:10:00 -07:00
2026-03-18 15:22:51 -07:00
2026-03-29 16:04:53 -07:00
2026-03-23 16:02:06 -07:00
2026-03-27 21:42:09 -07:00
2026-03-23 06:40:05 -07:00
2026-03-22 05:58:26 -07:00
2026-03-25 17:31:56 -07:00
2026-03-15 08:02:29 -07:00
2026-03-15 21:59:53 -07:00
2026-03-26 14:34:31 -07:00
2026-03-18 02:42:58 -07:00
2026-03-31 19:46:23 +00:00
2026-03-26 14:35:31 -07:00
2026-03-31 19:56:16 +00:00
2026-03-31 19:56:16 +00:00
2026-03-17 03:44:44 -07:00
2026-03-17 02:53:33 -07:00
2026-03-17 02:53:33 -07:00
2026-03-29 15:47:19 -07:00
2026-03-24 08:19:23 -07:00
2026-03-19 21:24:53 +01:00
2026-03-17 23:40:22 -07:00
2026-03-20 15:41:06 -04:00
2026-03-31 00:37:14 +00:00
2026-03-16 06:35:46 -07:00
2026-03-28 14:55:18 -07:00
2026-03-29 20:05:59 -07:00
2026-03-28 11:14:54 -07:00
2026-03-29 21:53:28 -07:00
2026-03-29 16:04:53 -07:00
2026-03-26 13:49:43 -07:00
2026-03-17 02:53:33 -07:00
2026-03-17 02:53:33 -07:00
2026-03-27 09:57:50 -07:00
2026-03-19 10:34:44 -07:00
2026-03-29 16:04:53 -07:00
2026-03-28 11:36:59 -07:00
2026-03-25 18:22:21 -07:00
2026-03-31 16:35:40 +00:00
2026-03-19 15:16:35 +01:00
2026-03-28 13:41:23 -07:00
2026-03-28 16:53:14 -07:00
2026-03-16 23:48:14 -07:00
2026-03-21 09:41:17 -07:00
2026-03-14 11:03:25 -07:00
2026-03-14 21:51:27 -07:00
2026-03-14 21:51:27 -07:00