[claude] Add content moderation pipeline (Llama Guard + game-context prompts) (#1056) #1059

Merged
claude merged 1 commit from claude/issue-1056 into main 2026-03-23 02:14:42 +00:00

1 Commit

Author SHA1 Message Date
Alexander Whitestone
8de71d6671 feat: add content moderation pipeline (Llama Guard + game-context prompts)
Some checks failed
Tests / lint (pull_request) Failing after 7s
Tests / test (pull_request) Has been skipped
Three-layer defense for AI narrator output:

Layer 1: Game-context system prompts with per-game vocabulary whitelists
  - Morrowind/Skyrim profiles treat mature themes as game mechanics
  - Whitelisted terms (Skooma, slave, etc.) replaced before guard check
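The pre-guard substitution step in Layer 1 could be sketched as follows. The term map and game key are hypothetical placeholders; the real whitelists live in `config/moderation.yaml`:

```python
import re

# Hypothetical whitelist shape; actual per-game terms are configured
# in config/moderation.yaml, not hard-coded like this.
GAME_WHITELIST = {
    "morrowind": {"skooma": "beverage", "slave": "servant"},
}

def neutralize_whitelisted(text: str, game: str) -> str:
    """Replace whitelisted game-lore terms with neutral placeholders
    so the guard model does not flag in-game mechanics as violations."""
    for term, placeholder in GAME_WHITELIST.get(game, {}).items():
        # Whole-word, case-insensitive replacement before the guard check.
        text = re.sub(rf"\b{re.escape(term)}\b", placeholder, text,
                      flags=re.IGNORECASE)
    return text
```

Only the sanitized text reaches the guard; the original narration is kept for display if the check passes.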

Layer 2: Real-time output filter via Llama Guard (Ollama)
  - llama-guard3:1b for <30ms latency per sentence
  - Regex fallback when guard model unavailable (graceful degradation)
  - On fail → contextual fallback narration per scene type
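Layer 2's guard call with graceful degradation might look like the sketch below, assuming Ollama's standard `/api/generate` endpoint and that Llama Guard's reply starts with `safe` or `unsafe`; the fallback pattern list here is a hypothetical stand-in for the real regex set:

```python
import re
import requests

GUARD_MODEL = "llama-guard3:1b"
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical minimal pattern set; the real fallback regexes are broader.
FALLBACK_PATTERNS = [re.compile(r"\bgraphic gore\b", re.IGNORECASE)]

def is_safe(sentence: str) -> bool:
    """Check one narrator sentence against Llama Guard via Ollama;
    fall back to regex screening if the guard model is unreachable."""
    try:
        resp = requests.post(
            OLLAMA_URL,
            json={"model": GUARD_MODEL, "prompt": sentence, "stream": False},
            timeout=2,
        )
        resp.raise_for_status()
        # Llama Guard replies "safe", or "unsafe" followed by a category code.
        return resp.json()["response"].strip().lower().startswith("safe")
    except requests.RequestException:
        # Graceful degradation: regex fallback when Ollama is down.
        return not any(p.search(sentence) for p in FALLBACK_PATTERNS)
```

When `is_safe` returns `False`, the pipeline substitutes the contextual fallback narration for the current scene type rather than emitting the flagged sentence.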

Layer 3: Per-game moderation profiles with threshold tuning
  - Configurable confidence thresholds per game
  - Low-confidence flags pass through (prevents over-filtering)
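The Layer 3 pass-through rule could be expressed as below. This assumes the pipeline derives a confidence score for each guard flag (Llama Guard itself emits a category, not a score, so the score would come from elsewhere, e.g. token logprobs); the profile names and threshold values are illustrative, with the real ones in `config/moderation.yaml`:

```python
# Hypothetical profile shape mirroring config/moderation.yaml;
# threshold values here are made up for illustration.
PROFILES = {
    "skyrim": {"threshold": 0.85},
    "default": {"threshold": 0.70},
}

def should_filter(game: str, guard_confidence: float) -> bool:
    """Filter only when the guard's confidence meets the game's threshold.
    Low-confidence flags pass through, preventing over-filtering of
    mature-but-in-bounds game content."""
    profile = PROFILES.get(game, PROFILES["default"])
    return guard_confidence >= profile["threshold"]
```

Raising a game's threshold makes moderation more permissive for that title without touching the other layers.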

New files:
  - src/infrastructure/guards/ — moderation pipeline module
  - config/moderation.yaml — per-game profile configuration
  - tests/infrastructure/test_moderation.py — 32 unit tests

Fixes #1056

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 22:13:50 -04:00