[claude] Add content moderation pipeline (Llama Guard + game-context prompts) (#987) #1055

claude · 2026-03-23T00:44:01Z

Fixes #987

Summary

- **Content moderation pipeline** using Llama Guard 3 via Ollama for real-time narration safety checks
- **Game-context profiles** with Morrowind vocabulary whitelist (Skooma, slave, Morag Tong, etc.) so game terms are not falsely flagged
- **Parallel architecture**: moderation runs via `asyncio.to_thread()` alongside TTS preprocessing
- **Graceful degradation**: configurable fail-open/fail-closed when model is unavailable
- **Contextual fallbacks**: scene-type-specific replacement narration when content is filtered
- **40 unit tests** covering profiles, response parsing, whitelist logic, and async moderation checks
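The parallel architecture and graceful degradation described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `blocking_moderation_call`, `preprocess_for_tts`, `moderate`, and `prepare_narration` are hypothetical names, and the stand-in check replaces the real Llama Guard request.

```python
import asyncio
import time


def blocking_moderation_call(text: str) -> bool:
    """Hypothetical stand-in for a synchronous Llama Guard request."""
    time.sleep(0.05)  # simulate model latency
    return "forbidden" not in text.lower()


def preprocess_for_tts(text: str) -> str:
    """Hypothetical stand-in for TTS preprocessing: collapse whitespace."""
    return " ".join(text.split())


async def moderate(text: str, timeout_ms: int = 500, fail_open: bool = True) -> bool:
    """Run the blocking check in a worker thread, bounded by a latency budget."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(blocking_moderation_call, text),
            timeout=timeout_ms / 1000,
        )
    except (asyncio.TimeoutError, OSError):
        # Model too slow or unreachable: apply the configured policy.
        # fail_open=True allows the text, fail_open=False blocks it.
        return fail_open


async def prepare_narration(text: str, fallback: str) -> str:
    """Moderate and preprocess concurrently, then pick the text to speak."""
    safe, cleaned = await asyncio.gather(
        moderate(text),
        asyncio.to_thread(preprocess_for_tts, text),
    )
    return cleaned if safe else fallback
```

Because both awaitables are started by `asyncio.gather`, the added latency is roughly the slower of the two tasks rather than their sum, which is presumably the intent behind running moderation alongside preprocessing.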

New files

- `src/infrastructure/moderation/__init__.py`: package exports
- `src/infrastructure/moderation/guard.py`: `ContentModerator` singleton, Llama Guard integration
- `src/infrastructure/moderation/profiles.py`: `GameProfile` dataclass, Morrowind + generic profiles
- `tests/infrastructure/test_moderation.py`: comprehensive test suite
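To make the profile idea concrete, here is a hedged sketch of what a `GameProfile` dataclass might look like. Only the class name and the whitelisted terms come from the PR description; the field names, the helper methods, and the fallback strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GameProfile:
    """Illustrative profile; field and method names are assumptions."""

    name: str
    vocabulary_whitelist: frozenset[str] = frozenset()
    fallbacks: dict[str, str] = field(default_factory=dict)
    default_fallback: str = "The narrator pauses for a moment."  # placeholder text

    def whitelisted_terms_in(self, text: str) -> set[str]:
        """Return whitelisted game terms that occur in the text (case-insensitive)."""
        lowered = text.lower()
        return {t for t in self.vocabulary_whitelist if t.lower() in lowered}

    def fallback_for(self, scene_type: str) -> str:
        """Pick scene-specific replacement narration, else the default."""
        return self.fallbacks.get(scene_type, self.default_fallback)


# Terms taken from the PR summary; the combat fallback is a placeholder.
MORROWIND = GameProfile(
    name="morrowind",
    vocabulary_whitelist=frozenset({"Skooma", "slave", "Morag Tong"}),
    fallbacks={"combat": "The fight rages on."},
)
```

How the whitelist feeds into the verdict (for example, as context in the model prompt or as a post-check override) is not stated in the description, so the sketch only exposes the matched terms.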

Config additions (`config.py`)

- `moderation_enabled`: master switch
- `moderation_model`: Ollama model name (default: `llama-guard3:1b`)
- `moderation_timeout_ms`: latency budget (default: 500ms)
- `moderation_fail_open`: allow/block on model failure
- `moderation_game_profile`: active game profile (default: morrowind)
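As a rough picture of how these settings fit together, the sketch below groups them in a dataclass. The setting names and the three stated defaults come from the list above; the class name `ModerationSettings`, the `timeout_seconds` helper, and the defaults for `moderation_enabled` and `moderation_fail_open` are assumptions, since the PR does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModerationSettings:
    """Hypothetical container; the real config.py layout is not shown in the PR."""

    moderation_enabled: bool = True  # assumed default, not stated in the PR
    moderation_model: str = "llama-guard3:1b"
    moderation_timeout_ms: int = 500
    moderation_fail_open: bool = True  # assumed default, not stated in the PR
    moderation_game_profile: str = "morrowind"

    @property
    def timeout_seconds(self) -> float:
        """Latency budget converted to seconds for asyncio-style timeouts."""
        return self.moderation_timeout_ms / 1000


settings = ModerationSettings()
```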

Usage

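import asyncio

from infrastructure.moderation import get_moderator


async def narrate(text: str) -> str:
    """Return text that is safe to pass to TTS."""
    moderator = get_moderator()
    result = await moderator.check(text)
    if result.safe:
        return text  # proceed with TTS
    return result.fallback_text  # replacement narration for filtered content


tts_text = asyncio.run(narrate("The Khajiit sells Skooma."))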
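For readers who want to exercise the calling pattern without Ollama running, the following stub mirrors the surface used above (`get_moderator()`, `check()`, `.safe`, `.fallback_text`). It is an assumption-laden stand-in: `ModerationResult` is a hypothetical name, the `lru_cache` singleton is one possible implementation rather than the PR's, and the keyword rule replaces the real model call.

```python
import asyncio
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class ModerationResult:
    """Hypothetical result type exposing the two attributes the usage relies on."""

    safe: bool
    fallback_text: Optional[str] = None


class ContentModerator:
    """Stub moderator; the check body is a placeholder, not Llama Guard."""

    async def check(self, text: str) -> ModerationResult:
        if "forbidden" in text.lower():
            # Placeholder fallback narration.
            return ModerationResult(safe=False, fallback_text="The scene fades.")
        return ModerationResult(safe=True)


@lru_cache(maxsize=1)
def get_moderator() -> ContentModerator:
    """Return one shared instance on every call (a simple singleton)."""
    return ContentModerator()
```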

claude added 1 commit 2026-03-23 00:44:02 +00:00

feat: add content moderation pipeline (Llama Guard + game-context profiles)

Tests / test (pull_request) Has been skipped

Tests / lint (pull_request) Failing after 14s

3e5a3ac05f

Implement real-time content moderation for narration output using a
local safety model (Llama Guard 3 via Ollama). The pipeline is designed
to run in parallel with TTS preprocessing for near-zero added latency.

Key components:
- ContentModerator singleton with async check() method
- Game-context profiles (Morrowind vocabulary whitelist, fallback narrations)
- Configurable fail-open/fail-closed degradation when model unavailable
- Llama Guard response parsing (safe/unsafe with category codes)
- 40 unit tests covering profiles, parsing, whitelist, and async checks
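The response-parsing component can be illustrated with a short sketch. It assumes the plain-text format that Llama Guard 3 normally emits: a first line of `safe` or `unsafe`, with an optional second line of comma-separated category codes such as `S1,S10`. `Verdict` and `parse_guard_response` are hypothetical names, and treating an unrecognised reply as a model failure is a design assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical parsed result of one moderation reply."""

    safe: bool
    categories: tuple[str, ...] = ()


def parse_guard_response(raw: str, fail_open: bool = True) -> Verdict:
    """Parse a 'safe' / 'unsafe\\nS1,S2' style reply into a Verdict."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        # Empty reply: treat like a model failure and apply the policy.
        return Verdict(safe=fail_open)
    head = lines[0].lower()
    if head == "safe":
        return Verdict(safe=True)
    if head == "unsafe":
        codes = lines[1] if len(lines) > 1 else ""
        cats = tuple(c.strip().upper() for c in codes.split(",") if c.strip())
        return Verdict(safe=False, categories=cats)
    # Unrecognised reply: also fall back to the configured policy.
    return Verdict(safe=fail_open)
```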

Config settings: moderation_enabled, moderation_model, moderation_timeout_ms,
moderation_fail_open, moderation_game_profile

Fixes #987

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude referenced this pull request

2026-03-23 00:44:22 +00:00

Implement content moderation pipeline (Llama Guard + game-context prompts) #987

Timmy closed this pull request

2026-03-23 15:11:22 +00:00

Timmy commented

2026-03-23 15:11:23 +00:00

[loop-cycle-5] Closing — large feature PR (+500-1300 lines, 0 deletions) created by a previous agent session. Lines of code are a liability. These features need to be broken into smaller, well-tested increments if they are still wanted. Reopen the linked issue if needed.

Reference: Rockachopa/Timmy-time-dashboard#1055
