Timmy-time-dashboard/src/infrastructure/guards/profiles.py
Alexander Whitestone 8de71d6671
feat: add content moderation pipeline (Llama Guard + game-context prompts)
Three-layer defense for AI narrator output:

Layer 1: Game-context system prompts with per-game vocabulary whitelists
  - Morrowind/Skyrim profiles treat mature themes as game mechanics
  - Whitelisted terms (Skooma, slave, etc.) replaced before guard check
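A minimal sketch of the whitelist substitution described in Layer 1, assuming the terms are swapped for a neutral placeholder before the text reaches the guard. The function name `mask_whitelisted_terms` and the `[game-term]` placeholder are hypothetical and not taken from the module.

```python
import re


def mask_whitelisted_terms(
    text: str, whitelist: list[str], placeholder: str = "[game-term]"
) -> str:
    """Replace whitelisted game vocabulary with a neutral placeholder.

    Hypothetical helper: the real pipeline may substitute differently.
    """
    if not whitelist:
        return text
    # Longest terms first so multi-word entries win over shorter overlaps.
    ordered = sorted(whitelist, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)


masked = mask_whitelisted_terms(
    "The trader sells Skooma and moon sugar.", ["Skooma", "moon sugar"]
)
print(masked)
```

Matching on word boundaries keeps the mask from touching longer words that merely contain a whitelisted term.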

Layer 2: Real-time output filter via Llama Guard (Ollama)
  - llama-guard3:1b for <30ms latency per sentence
  - Regex fallback when guard model unavailable (graceful degradation)
  - On fail → contextual fallback narration per scene type
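The degradation path in Layer 2 could look roughly like the sketch below. It assumes the guard is passed in as a callable returning `True` for safe text, so no real Ollama call is shown; the pattern list, the fallback lines, and all function names are hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical regex fallback, used only when the guard model cannot be reached.
FALLBACK_PATTERNS = [re.compile(r"\bexample-banned-phrase\b", re.IGNORECASE)]

# Hypothetical per-scene replacement narration.
FALLBACK_NARRATION = {
    "combat": "The fight rages on.",
    "default": "The story moves on.",
}


def regex_is_safe(sentence: str) -> bool:
    """Return True when no fallback pattern matches the sentence."""
    return not any(p.search(sentence) for p in FALLBACK_PATTERNS)


def check_sentence(
    sentence: str, guard: Optional[Callable[[str], bool]] = None
) -> bool:
    """Ask the guard first; fall back to regex if it is missing or raises."""
    if guard is not None:
        try:
            return guard(sentence)
        except Exception:
            pass  # Guard unavailable: degrade to the regex check below.
    return regex_is_safe(sentence)


def moderate(
    sentence: str,
    scene_type: str,
    guard: Optional[Callable[[str], bool]] = None,
) -> str:
    """Return the sentence if it passes, else fallback narration for the scene."""
    if check_sentence(sentence, guard):
        return sentence
    return FALLBACK_NARRATION.get(scene_type, FALLBACK_NARRATION["default"])
```

Injecting the guard as a callable keeps the fallback logic testable without a running model.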

Layer 3: Per-game moderation profiles with threshold tuning
  - Configurable confidence thresholds per game
  - Low-confidence flags pass through (prevents over-filtering)
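The threshold rule in Layer 3 can be sketched as follows, assuming the guard reports a flag plus a confidence score. `GuardVerdict` and `should_block` are hypothetical names, and treating a score equal to the threshold as a block is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class GuardVerdict:
    """Hypothetical guard result: whether it flagged the text, and how sure it is."""

    flagged: bool
    confidence: float


def should_block(verdict: GuardVerdict, threshold: float = 0.8) -> bool:
    """Block only flags at or above the game's threshold; weaker flags pass."""
    return verdict.flagged and verdict.confidence >= threshold


print(should_block(GuardVerdict(flagged=True, confidence=0.95)))  # True
print(should_block(GuardVerdict(flagged=True, confidence=0.40)))  # False
```

A game with darker themes would raise its threshold so that fewer borderline flags replace the narration.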

New files:
  - src/infrastructure/guards/ — moderation pipeline module
  - config/moderation.yaml — per-game profile configuration
  - tests/infrastructure/test_moderation.py — 32 unit tests

Fixes #1056

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 22:13:50 -04:00


"""Load game moderation profiles from config/moderation.yaml.

Returns an empty mapping if the YAML file is missing or malformed, leaving
default handling to the caller.
"""
import logging
from pathlib import Path

from infrastructure.guards.moderation import GameProfile

logger = logging.getLogger(__name__)


def load_profiles(config_path: Path | None = None) -> dict[str, GameProfile]:
    """Load game moderation profiles from YAML config.

    Args:
        config_path: Path to moderation.yaml. Defaults to config/moderation.yaml.

    Returns:
        Dict mapping game_id to GameProfile. Empty if the config is missing,
        PyYAML is not installed, or the file cannot be parsed.
    """
    path = config_path or Path("config/moderation.yaml")
    if not path.exists():
        logger.info("Moderation config not found at %s; using defaults", path)
        return {}

    # Import lazily so PyYAML stays an optional dependency.
    try:
        import yaml
    except ImportError:
        logger.warning("PyYAML not installed; using default moderation profiles")
        return {}

    try:
        data = yaml.safe_load(path.read_text(encoding="utf-8"))
    except Exception as exc:
        logger.error("Failed to parse moderation config: %s", exc)
        return {}

    # safe_load returns None for an empty file and may return a non-mapping.
    raw_profiles = data.get("profiles") if isinstance(data, dict) else None
    if not isinstance(raw_profiles, dict):
        logger.warning("No 'profiles' mapping in %s; using defaults", path)
        return {}

    profiles: dict[str, GameProfile] = {}
    for game_id, profile_data in raw_profiles.items():
        try:
            profiles[game_id] = GameProfile(
                game_id=game_id,
                display_name=profile_data.get("display_name", game_id),
                vocabulary_whitelist=profile_data.get("vocabulary_whitelist", []),
                context_prompt=profile_data.get("context_prompt", ""),
                threshold=float(profile_data.get("threshold", 0.8)),
                fallbacks=profile_data.get("fallbacks", {}),
            )
        except Exception as exc:
            # Skip a bad entry (wrong type, non-numeric threshold) but keep the rest.
            logger.warning("Invalid profile '%s': %s", game_id, exc)

    logger.info("Loaded %d moderation profiles from %s", len(profiles), path)
    return profiles
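
To illustrate the shape of data the loader reads, here is a hedged sketch that works on an already-parsed dictionary, which is what `yaml.safe_load` would return for a config with a top-level `profiles` key. The `GameProfile` dataclass below is a stand-in with the fields the loader passes, not the real class from `infrastructure.guards.moderation`; `build_profiles` and all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GameProfile:
    """Stand-in for the real GameProfile; field names follow the loader's call."""

    game_id: str
    display_name: str
    vocabulary_whitelist: list[str] = field(default_factory=list)
    context_prompt: str = ""
    threshold: float = 0.8
    fallbacks: dict[str, str] = field(default_factory=dict)


# Hypothetical parsed config; the values are illustrative only.
parsed = {
    "profiles": {
        "morrowind": {
            "display_name": "Morrowind",
            "vocabulary_whitelist": ["Skooma", "slave"],
            "context_prompt": "Mature themes are game mechanics here.",
            "threshold": 0.85,
            "fallbacks": {"combat": "The fight rages on."},
        },
        "skyrim": {},  # Every key is optional; defaults fill the gaps.
    }
}


def build_profiles(data: dict) -> dict[str, GameProfile]:
    """Mirror the loader's per-profile defaults on an already-parsed mapping."""
    profiles: dict[str, GameProfile] = {}
    for game_id, raw in (data.get("profiles") or {}).items():
        raw = raw or {}
        profiles[game_id] = GameProfile(
            game_id=game_id,
            display_name=raw.get("display_name", game_id),
            vocabulary_whitelist=raw.get("vocabulary_whitelist", []),
            context_prompt=raw.get("context_prompt", ""),
            threshold=float(raw.get("threshold", 0.8)),
            fallbacks=raw.get("fallbacks", {}),
        )
    return profiles


profiles = build_profiles(parsed)
print(sorted(profiles))
```

An entry with no keys still yields a usable profile: its display name falls back to the game id and its threshold to 0.8.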