Three-layer defense for AI narrator output:
Layer 1: Game-context system prompts with per-game vocabulary whitelists
- Morrowind/Skyrim profiles treat mature themes as game mechanics
- Whitelisted terms (Skooma, slave, etc.) replaced before guard check
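The whitelist substitution in Layer 1 can be sketched roughly as follows (names and mappings are illustrative, not the actual module API):

```python
import re

# Hypothetical per-game vocabulary whitelist: in-game terms a generic guard
# model tends to flag, mapped to neutral placeholders before the guard check.
SKYRIM_WHITELIST = {
    "skooma": "potion",
    "slave": "servant",
}

def mask_whitelisted_terms(text: str, whitelist: dict) -> str:
    """Replace whitelisted game terms (case-insensitive, whole words only)
    with neutral stand-ins so the guard model sees sanitized input."""
    for term, placeholder in whitelist.items():
        text = re.sub(rf"\b{re.escape(term)}\b", placeholder, text,
                      flags=re.IGNORECASE)
    return text
```

Whole-word matching avoids mangling substrings, and masking happens before the guard call so game-mechanic vocabulary never triggers a false positive.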
Layer 2: Real-time output filter via Llama Guard (Ollama)
- llama-guard3:1b for <30ms latency per sentence
- Regex fallback when guard model unavailable (graceful degradation)
- On fail → contextual fallback narration per scene type
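The Layer 2 flow (guard model first, regex fallback on error, canned narration on failure) might look like this minimal sketch; the pattern list, scene keys, and function names are assumptions, not the shipped implementation:

```python
import re

# Hypothetical regex denylist used when the guard model is unreachable.
UNSAFE_PATTERNS = [re.compile(p, re.IGNORECASE)
                   for p in (r"\bgore\b", r"\btorture\b")]

# Hypothetical contextual fallback narration, keyed by scene type.
FALLBACK_NARRATION = {
    "combat": "The clash of steel echoes through the hall.",
    "default": "The scene shifts, and the story moves on.",
}

def regex_guard(sentence: str) -> bool:
    """Fallback check: True if the sentence passes the regex denylist."""
    return not any(p.search(sentence) for p in UNSAFE_PATTERNS)

def moderate(sentence: str, scene: str = "default", guard=None) -> str:
    """Return the sentence if it passes the guard; on failure, return a
    scene-appropriate fallback line. Degrades to the regex check if the
    guard model call raises."""
    try:
        passed = guard(sentence) if guard is not None else regex_guard(sentence)
    except Exception:
        passed = regex_guard(sentence)  # graceful degradation
    if passed:
        return sentence
    return FALLBACK_NARRATION.get(scene, FALLBACK_NARRATION["default"])
```

Passing the guard model in as a callable keeps the regex path testable without a running Ollama instance.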
Layer 3: Per-game moderation profiles with threshold tuning
- Configurable confidence thresholds per game
- Low-confidence flags pass through (prevents over-filtering)
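The Layer 3 threshold rule can be sketched as below; the threshold values and game keys are placeholders for whatever config/moderation.yaml actually specifies:

```python
# Hypothetical per-game confidence thresholds (would be loaded from
# config/moderation.yaml in practice).
GAME_THRESHOLDS = {"skyrim": 0.85, "morrowind": 0.90, "default": 0.75}

def should_block(flagged: bool, confidence: float, game: str) -> bool:
    """Block only when the guard flagged the output AND its confidence
    meets the per-game threshold; low-confidence flags pass through."""
    threshold = GAME_THRESHOLDS.get(game, GAME_THRESHOLDS["default"])
    return flagged and confidence >= threshold
```

Requiring both the flag and a confidence above the threshold is what prevents over-filtering on borderline guard outputs.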
New files:
- src/infrastructure/guards/ — moderation pipeline module
- config/moderation.yaml — per-game profile configuration
- tests/infrastructure/test_moderation.py — 32 unit tests
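A per-game profile in config/moderation.yaml might look like the fragment below (all field names and values are illustrative):

```yaml
skyrim:
  threshold: 0.85
  whitelist:
    skooma: potion
    slave: servant
  fallback_scene: default
```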
Fixes #1056
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>