Integrate SHIELD (Sovereign Harm Interdiction & Ethical Layer Defense) into Hermes Agent pre-routing layer for comprehensive jailbreak and crisis detection. SHIELD Features: - Detects 9 jailbreak pattern categories (GODMODE dividers, l33tspeak, boundary inversion, token injection, DAN/GODMODE keywords, refusal inversion, persona injection, encoding evasion) - Detects 7 crisis signal categories (suicidal ideation, method seeking, l33tspeak evasion, substance seeking, despair, farewell, self-harm) - Returns 4 verdicts: CLEAN, JAILBREAK_DETECTED, CRISIS_DETECTED, CRISIS_UNDER_ATTACK - Routes crisis content ONLY to Safe Six verified models Safety Requirements: - <5ms detection latency (regex-only, no ML) - 988 Suicide & Crisis Lifeline included in crisis responses Addresses: Issues #72, #74, #75
11 KiB
11 KiB