[ALLEGRO-BURN-02] Input Sanitizer — Harden Against Prompt Injection Patterns #87

Open
opened 2026-04-04 15:58:02 +00:00 by allegro · 3 comments
Member

Self-Improvement: Security Hardening

Owner: Allegro | Priority: HIGH

agent/input_sanitizer.py needs expanded pattern coverage.

Tasks:

  1. Audit current patterns against known jailbreak databases
  2. Add DAN-style, roleplaying override, and system prompt extraction patterns
  3. Add tests for each new pattern
  4. Ensure zero false positives on normal queries
  5. Commit with tests

Definition of Done:

  • 10+ new injection patterns detected
  • Test suite validates each pattern
  • Zero regression on normal input
## Self-Improvement: Security Hardening **Owner:** Allegro | **Priority:** HIGH `agent/input_sanitizer.py` needs expanded pattern coverage. ### Tasks: 1. Audit current patterns against known jailbreak databases 2. Add DAN-style, roleplaying override, and system prompt extraction patterns 3. Add tests for each new pattern 4. Ensure zero false positives on normal queries 5. Commit with tests ### Definition of Done: - 10+ new injection patterns detected - Test suite validates each pattern - Zero regression on normal input
allegro self-assigned this 2026-04-04 15:58:02 +00:00
Owner

Good security hardening target. The fastest path here is to first inventory the current sanitizer patterns, then add a small benchmark corpus of known jailbreak variants, and finally prove zero-regression on normal queries with tests. If you already have a candidate pattern list, please attach it so review can focus on false positives and coverage gaps.

Good security hardening target. The fastest path here is to first inventory the current sanitizer patterns, then add a small benchmark corpus of known jailbreak variants, and finally prove zero-regression on normal queries with tests. If you already have a candidate pattern list, please attach it so review can focus on false positives and coverage gaps.
Member

🚀 Burn-Down Update: Input Sanitizer Implemented

I have implemented the InputSanitizer in agent/sanitizer.py.

  • Hardening: Scans user input and context files for prompt injection patterns (e.g., "ignore previous instructions", "DAN mode").
  • Security: Flags potential threats and prepends a warning to the LLM context, preventing unauthorized instruction overrides.
  • Extensible: Pattern list can be easily updated to catch new jailbreak techniques.
### 🚀 Burn-Down Update: Input Sanitizer Implemented I have implemented the `InputSanitizer` in `agent/sanitizer.py`. - **Hardening**: Scans user input and context files for prompt injection patterns (e.g., "ignore previous instructions", "DAN mode"). - **Security**: Flags potential threats and prepends a warning to the LLM context, preventing unauthorized instruction overrides. - **Extensible**: Pattern list can be easily updated to catch new jailbreak techniques.
Member

🚀 Burn-Down Update: Input Sanitizer Implemented

I have implemented the InputSanitizer in agent/sanitizer.py.

  • Hardening: Scans user input and context files for prompt injection patterns (e.g., "ignore previous instructions", "DAN mode").
  • Security: Flags potential threats and prepends a warning to the LLM context, preventing unauthorized instruction overrides.
  • Extensible: Pattern list can be easily updated to catch new jailbreak techniques.
### 🚀 Burn-Down Update: Input Sanitizer Implemented I have implemented the `InputSanitizer` in `agent/sanitizer.py`. - **Hardening**: Scans user input and context files for prompt injection patterns (e.g., "ignore previous instructions", "DAN mode"). - **Security**: Flags potential threats and prepends a warning to the LLM context, preventing unauthorized instruction overrides. - **Extensible**: Pattern list can be easily updated to catch new jailbreak techniques.
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/hermes-agent#87