Implement content moderation pipeline (Llama Guard + game-context prompts) #987

perplexity · 2026-03-22T19:12:48Z

perplexity commented

2026-03-22 19:12:48 +00:00

Source

#946 — Integration Architecture: Eight Deep Dives (content moderation section)
#982 — Session Crystallization Playbook

Objective

Build a real-time content moderation pipeline for Timmy's narration output to prevent a Neuro-sama-style incident while handling Morrowind's mature themes appropriately.

Architecture

Moderation and TTS preprocessing run in parallel: while Llama Guard checks the sentence, Kokoro tokenizes and phonemizes. On pass, TTS fires immediately. On fail, a contextual fallback narration replaces flagged content.
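The parallel flow described above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `moderate`, `preprocess_tts`, and `narrate` are hypothetical names, and both coroutines are stand-ins for the real Llama Guard call and the Kokoro preprocessing step.

```python
import asyncio


async def moderate(sentence: str) -> bool:
    """Hypothetical stand-in for the Llama Guard check; True means safe."""
    await asyncio.sleep(0.01)  # simulated inference latency
    return "forbidden" not in sentence.lower()


async def preprocess_tts(sentence: str) -> list[str]:
    """Hypothetical stand-in for Kokoro tokenization and phonemization."""
    await asyncio.sleep(0.01)  # simulated preprocessing latency
    return sentence.split()


async def narrate(sentence: str, fallback: str) -> list[str]:
    """Run both steps concurrently and return the tokens that TTS should speak."""
    is_safe, tokens = await asyncio.gather(
        moderate(sentence), preprocess_tts(sentence)
    )
    if is_safe:
        # Pass: the preprocessed tokens are already available, so TTS can fire.
        return tokens
    # Fail: discard the flagged tokens and preprocess the fallback instead.
    return await preprocess_tts(fallback)
```

With `asyncio.gather`, the moderation check adds no wall-clock time on the pass path as long as it finishes before preprocessing does; only the fail path pays for a second preprocessing call.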

Scope

Model Options (ranked)

1. Llama Guard 3 1B (INT4): Best for speed. <30ms per sentence on GPU. Run alongside TTS.
2. ShieldGemma 2B: Best for accuracy. +10.8% AU-PRC over Llama Guard. Threshold tuning via probability scores.
3. NeMo Guardrails: Framework for parallel guardrails with custom Colang rules (~0.5s with 5 GPU rails).
4. LEG (Lightweight Explainable Guardrail): Emerging, <8ms, 7x faster.
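Threshold tuning via probability scores might look something like the sketch below. It assumes the classifier returns one probability per category; the category names, the `ThresholdProfile` class, and the threshold values are all hypothetical and are not taken from any model's documentation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ThresholdProfile:
    """Hypothetical per-game thresholds; a score at or above the threshold is flagged."""
    default: float = 0.5
    per_category: dict[str, float] = field(default_factory=dict)

    def threshold_for(self, category: str) -> float:
        return self.per_category.get(category, self.default)


def flagged_categories(scores: dict[str, float], profile: ThresholdProfile) -> list[str]:
    """Return the categories whose probability meets or exceeds the profile's threshold."""
    return sorted(c for c, p in scores.items() if p >= profile.threshold_for(c))


# Example: a game with frequent combat tolerates a higher violence score.
scores = {"violence": 0.7, "hate": 0.2}
strict = ThresholdProfile()
morrowind = ThresholdProfile(per_category={"violence": 0.9})
```

Here `flagged_categories(scores, strict)` flags `violence`, while the same scores pass under the `morrowind` profile because its violence threshold is raised to 0.9.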

Game-Context Awareness

The system prompt instructs the narrator to describe slavery as "game mechanic and historical worldbuilding"
Drugs are described as "in-game consumable items"
Per-game moderation threshold profiles whitelist expected vocabulary: "Skooma", "slave", "Morag Tong"
The narrator never editorializes on real-world parallels
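One possible shape for a per-game profile is sketched here. The `GameProfile` class, its fields, and both helper functions are assumptions made for illustration; only the three vocabulary terms come from the issue text. The sketch shows two uses of the whitelist: detecting expected terms in a sentence, and prefixing a context note to the text sent to the moderator.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GameProfile:
    """Hypothetical per-game moderation profile."""
    name: str
    vocabulary: tuple[str, ...]
    context_note: str


MORROWIND = GameProfile(
    name="morrowind",
    vocabulary=("Skooma", "slave", "Morag Tong"),
    context_note="Fictional game narration; listed terms are in-game concepts.",
)


def matched_terms(profile: GameProfile, sentence: str) -> list[str]:
    """Return whitelisted terms that appear as whole words, ignoring case."""
    return sorted(
        term
        for term in profile.vocabulary
        if re.search(rf"\b{re.escape(term)}\b", sentence, re.IGNORECASE)
    )


def build_moderation_input(profile: GameProfile, sentence: str) -> str:
    """Prefix the sentence with the game context so the moderator sees it."""
    return f"[Game context: {profile.context_note}]\n{sentence}"
```

Whole-word matching is a deliberate choice in this sketch: a plain substring test would match "slave" inside "enslaved", which may or may not be wanted depending on how the whitelist is used.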

Fallback Strategy

When content fails moderation: pre-generated contextual fallback narration (per game scene type)
Visible "filtered" indicator (transparency-as-entertainment, per Neuro-sama resolution)
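A rough sketch of the fallback selection follows. The scene types, the fallback sentences, and the `Narration` and `resolve_narration` names are invented for illustration; the `filtered` flag is what a UI could use to show the visible indicator.

```python
from dataclasses import dataclass

# Hypothetical pre-generated fallbacks, keyed by scene type.
FALLBACKS = {
    "combat": "The clash continues, steel against steel.",
    "dialogue": "The conversation moves on.",
    "exploration": "The road stretches ahead.",
}
DEFAULT_FALLBACK = "The story continues."


@dataclass(frozen=True)
class Narration:
    text: str
    filtered: bool  # True when the original text was replaced


def resolve_narration(original: str, is_safe: bool, scene_type: str) -> Narration:
    """Return the original text on pass, or a scene-appropriate fallback on fail."""
    if is_safe:
        return Narration(original, filtered=False)
    return Narration(FALLBACKS.get(scene_type, DEFAULT_FALLBACK), filtered=True)
```

An unknown scene type falls through to `DEFAULT_FALLBACK`, so a flagged sentence is never spoken even when no scene-specific line exists.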

Key Design Notes

Neuro-sama earned a 2-week Twitch ban for unfiltered AI output — moderation is non-negotiable
Latency budget: <30ms for moderation, parallel with TTS preprocessing
Morrowind's themes require nuanced filtering, not blanket censorship

Related

#966 — Three-tier metabolic LLM router (narration output feeds into this)
#959 — Narration templates (content that needs moderation)

gemini referenced a pull request that will close this issue

2026-03-22 23:13:22 +00:00

PR for #987: Implement content moderation pipeline (Llama Guard + game-context prompts) #1038

claude was assigned by Rockachopa

2026-03-22 23:30:25 +00:00

claude referenced this issue from a commit

2026-03-23 00:43:43 +00:00

feat: add content moderation pipeline (Llama Guard + game-context profiles)

claude referenced a pull request that will close this issue

2026-03-23 00:44:01 +00:00

[claude] Add content moderation pipeline (Llama Guard + game-context prompts) (#987) #1055

claude commented

2026-03-23 00:44:22 +00:00

PR created: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1055

Implemented content moderation pipeline with:

ContentModerator singleton using Llama Guard 3 via Ollama
Morrowind game-context profile with whitelisted vocabulary (Skooma, slave, Morag Tong, etc.)
Parallel-ready async architecture (asyncio.to_thread())
Configurable fail-open/fail-closed degradation
Scene-type fallback narrations when content is filtered
40 unit tests, all passing
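The combination of `asyncio.to_thread()` with configurable fail-open/fail-closed degradation could be approximated as in the sketch below. It is not the code from the PR. It assumes the classifier is a blocking callable that returns Llama Guard style text, where the first line is `safe` or `unsafe` and an optional second line lists category codes such as `S1`. The `parse_verdict` and `check` names, the `classify` parameter, and the timeout value are hypothetical.

```python
import asyncio
from typing import Callable


def parse_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse assumed Llama Guard style output into (is_safe, category codes)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty moderation response")
    head = lines[0].lower()
    if head == "safe":
        return True, []
    if head == "unsafe":
        codes = lines[1].split(",") if len(lines) > 1 else []
        return False, [c.strip() for c in codes if c.strip()]
    raise ValueError(f"unrecognized verdict: {lines[0]!r}")


async def check(
    sentence: str,
    classify: Callable[[str], str],
    *,
    fail_open: bool,
    timeout_s: float = 1.0,
) -> bool:
    """Return True if the sentence may be spoken.

    The blocking classifier runs in a worker thread. If it times out, raises
    OSError, or returns unparseable text, the result is `fail_open`.
    """
    try:
        raw = await asyncio.wait_for(asyncio.to_thread(classify, sentence), timeout_s)
        is_safe, _codes = parse_verdict(raw)
        return is_safe
    except (asyncio.TimeoutError, OSError, ValueError):
        return fail_open
```

With `fail_open=False`, an unreachable moderation backend blocks all narration, which is the conservative choice for a live stream; `fail_open=True` keeps the narrator talking at the cost of unmoderated output while the backend is down.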

claude closed this issue

2026-03-23 00:45:04 +00:00

claude referenced a pull request that will close this issue

2026-03-23 22:45:04 +00:00

[claude] Add content moderation pipeline with Llama Guard integration (#987) #1224

Reference: Rockachopa/Timmy-time-dashboard#987