[claude] Add content moderation pipeline (Llama Guard + game-context prompts) (#987) #1055

Closed
claude wants to merge 1 commit from claude/issue-987 into main

1 Commit

Author SHA1 Message Date
Alexander Whitestone
3e5a3ac05f feat: add content moderation pipeline (Llama Guard + game-context profiles)
Some checks failed
Tests / test (pull_request) Has been skipped
Tests / lint (pull_request) Failing after 14s
Implement real-time content moderation for narration output using a
local safety model (Llama Guard 3 via Ollama). The pipeline is designed
to run in parallel with TTS preprocessing for near-zero added latency.
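
As a rough illustration of the parallel design described above, the sketch below starts a moderation check and a TTS preprocessing step together with asyncio.gather, so the wait is bounded by the slower of the two rather than their sum. The names moderate, preprocess_for_tts, narrate, and the fallback text are hypothetical stand-ins, not the functions in this commit.

```python
import asyncio


async def moderate(text: str) -> bool:
    """Hypothetical stand-in for the safety-model call; True means allowed."""
    await asyncio.sleep(0.05)  # simulates model latency
    return "forbidden" not in text


async def preprocess_for_tts(text: str) -> str:
    """Hypothetical stand-in for TTS preprocessing (here: whitespace cleanup)."""
    await asyncio.sleep(0.05)  # simulates preprocessing latency
    return " ".join(text.split())


async def narrate(text: str, fallback: str = "The narrator falls silent.") -> str:
    # Both coroutines run concurrently; gather returns results in call order.
    allowed, prepared = await asyncio.gather(
        moderate(text),
        preprocess_for_tts(text),
    )
    # Only the moderated result decides what reaches the speaker.
    return prepared if allowed else fallback


if __name__ == "__main__":
    print(asyncio.run(narrate("Welcome   to Balmora, outlander.")))
```

If the check flags the text, the preprocessed result is simply discarded and the fallback narration is used instead.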

Key components:
- ContentModerator singleton with async check() method
- Game-context profiles (Morrowind vocabulary whitelist, fallback narrations)
- Configurable fail-open/fail-closed degradation when the model is unavailable
- Llama Guard response parsing (safe/unsafe with category codes)
- 40 unit tests covering profiles, parsing, whitelist, and async checks
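
For the response-parsing component, a minimal sketch might look like the following. It assumes the usual Llama Guard output shape: a first line of "safe" or "unsafe", optionally followed by a line of comma-separated category codes such as S1,S10. Verdict and parse_guard_response are illustrative names and may not match the code in this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical result type: safety flag plus any category codes."""
    safe: bool
    categories: tuple[str, ...] = ()


def parse_guard_response(raw: str) -> Verdict:
    """Parse a Llama Guard style reply into a Verdict.

    Raises ValueError on empty or unrecognised output so the caller can
    apply its own fail-open or fail-closed policy.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty moderation response")

    head = lines[0].lower()
    if head == "safe":
        return Verdict(safe=True)
    if head == "unsafe":
        # Category codes, if present, sit on the next non-blank line.
        codes = lines[1].split(",") if len(lines) > 1 else []
        cats = tuple(c.strip().upper() for c in codes if c.strip())
        return Verdict(safe=False, categories=cats)
    raise ValueError(f"unrecognised moderation response: {lines[0]!r}")
```

Raising on malformed output, rather than guessing, keeps the parsing step separate from the degradation policy.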

Config settings: moderation_enabled, moderation_model, moderation_timeout_ms,
moderation_fail_open, moderation_game_profile
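
One way these settings could interact is sketched below. The field names come from the list above; the default values, the ModerationSettings class, and the check_with_degradation helper are assumptions made for illustration only. The classifier is passed in as a coroutine function so the example runs without Ollama.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class ModerationSettings:
    # Field names follow the commit message; every default is an assumption.
    moderation_enabled: bool = True
    moderation_model: str = "llama-guard3"
    moderation_timeout_ms: int = 500
    moderation_fail_open: bool = True
    moderation_game_profile: str = "morrowind"


async def check_with_degradation(
    classify: Callable[[str], Awaitable[bool]],
    text: str,
    settings: ModerationSettings,
) -> bool:
    """Return True if the text may be narrated."""
    if not settings.moderation_enabled:
        return True
    try:
        # Bound the model call by the configured timeout (ms -> seconds).
        return await asyncio.wait_for(
            classify(text),
            timeout=settings.moderation_timeout_ms / 1000,
        )
    except (asyncio.TimeoutError, ConnectionError):
        # Model slow or unreachable: fail open allows, fail closed blocks.
        return settings.moderation_fail_open
```

With moderation_fail_open set to False, a timeout or connection error blocks the narration, which is where a profile's fallback narration would presumably be substituted.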

Fixes #987

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 20:43:22 -04:00