Files
hermes-agent/tests
Allegro 546b3dd45d
Some checks failed
Nix / nix (ubuntu-latest) (push) Failing after 5s
Docker Build and Publish / build-and-push (push) Failing after 40s
Tests / test (push) Failing after 11m11s
Nix / nix (macos-latest) (push) Has been cancelled
security: integrate SHIELD jailbreak/crisis detection
Integrate SHIELD (Sovereign Harm Interdiction & Ethical Layer Defense) into
Hermes Agent pre-routing layer for comprehensive jailbreak and crisis detection.

SHIELD Features:
- Detects 9 jailbreak pattern categories (GODMODE dividers, l33tspeak, boundary
  inversion, token injection, DAN/GODMODE keywords, refusal inversion, persona
  injection, encoding evasion)
- Detects 7 crisis signal categories (suicidal ideation, method seeking,
  l33tspeak evasion, substance seeking, despair, farewell, self-harm)
- Returns 4 verdicts: CLEAN, JAILBREAK_DETECTED, CRISIS_DETECTED,
  CRISIS_UNDER_ATTACK
- Routes crisis content ONLY to Safe Six verified models

Safety Requirements:
- <5ms detection latency (regex-only, no ML)
- 988 Suicide & Crisis Lifeline included in crisis responses

Addresses: Issues #72, #74, #75
2026-03-31 16:35:40 +00:00
..