Files

teknium1 732c66b0f3 refactor: reorganize skills into sub-categories

The skills directory was getting disorganized — mlops alone had 40
skills in a flat list, and 12 categories were singletons with just
one skill each.

Code change:
- prompt_builder.py: Support sub-categories in skill scanner.
  skills/mlops/training/axolotl/SKILL.md now shows as category
  'mlops/training' instead of just 'mlops'. Backwards-compatible
  with existing flat structure.

Split mlops (40 skills) into 7 sub-categories:
- mlops/training (12): accelerate, axolotl, flash-attention,
  grpo-rl-training, peft, pytorch-fsdp, pytorch-lightning,
  simpo, slime, torchtitan, trl-fine-tuning, unsloth
- mlops/inference (8): gguf, guidance, instructor, llama-cpp,
  obliteratus, outlines, tensorrt-llm, vllm
- mlops/models (6): audiocraft, clip, llava, segment-anything,
  stable-diffusion, whisper
- mlops/vector-databases (4): chroma, faiss, pinecone, qdrant
- mlops/evaluation (5): huggingface-tokenizers,
  lm-evaluation-harness, nemo-curator, saelens, weights-and-biases
- mlops/cloud (2): lambda-labs, modal
- mlops/research (1): dspy

Merged singleton categories:
- gifs → media (gif-search joins youtube-content)
- music-creation → media (heartmula, songsee)
- diagramming → creative (excalidraw joins ascii-art)
- ocr-and-documents → productivity
- domain → research (domain-intel)
- feeds → research (blogwatcher)
- market-data → research (polymarket)

Fixed misplaced skills:
- mlops/code-review → software-development (not ML-specific)
- mlops/ml-paper-writing → research (academic writing)

Added DESCRIPTION.md files for all new/updated categories.

2026-03-09 03:35:53 -07:00

12 KiB

Raw Blame History

Stable Diffusion Troubleshooting Guide

Installation Issues

Package conflicts

Error: ImportError: cannot import name 'cached_download' from 'huggingface_hub'

Fix:

# Update huggingface_hub
pip install --upgrade huggingface_hub

# Reinstall diffusers
pip install --upgrade diffusers

xFormers installation fails

Error: RuntimeError: CUDA error: no kernel image is available for execution

Fix:

# Check CUDA version
nvcc --version

# Install matching xformers
pip install xformers --index-url https://download.pytorch.org/whl/cu121  # For CUDA 12.1

# Or build from source
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

Torch/CUDA mismatch

Error: RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED

Fix:

# Check versions
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Reinstall PyTorch with correct CUDA
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

Memory Issues

CUDA out of memory

Error: torch.cuda.OutOfMemoryError: CUDA out of memory

Solutions:

# Solution 1: Enable CPU offloading
pipe.enable_model_cpu_offload()

# Solution 2: Sequential CPU offload (more aggressive)
pipe.enable_sequential_cpu_offload()

# Solution 3: Attention slicing
pipe.enable_attention_slicing()

# Solution 4: VAE slicing for large images
pipe.enable_vae_slicing()

# Solution 5: Use lower precision
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.float16  # or torch.bfloat16
)

# Solution 6: Reduce batch size
image = pipe(prompt, num_images_per_prompt=1).images[0]

# Solution 7: Generate smaller images
image = pipe(prompt, height=512, width=512).images[0]

# Solution 8: Clear cache between generations
import gc
torch.cuda.empty_cache()
gc.collect()

Memory grows over time

Problem: Memory usage increases with each generation

Fix:

import gc
import torch

def generate_with_cleanup(pipe, prompt, **kwargs):
    try:
        image = pipe(prompt, **kwargs).images[0]
        return image
    finally:
        # Clear cache after generation
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        gc.collect()

Large model loading fails

Error: RuntimeError: Unable to load model weights

Fix:

# Use low CPU memory mode
pipe = DiffusionPipeline.from_pretrained(
    "large-model-id",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16
)

Generation Issues

Black images

Problem: Output images are completely black

Solutions:

# Solution 1: Disable safety checker
pipe.safety_checker = None

# Solution 2: Check VAE scaling
# The issue might be with VAE encoding/decoding
latents = latents / pipe.vae.config.scaling_factor  # Before decode

# Solution 3: Ensure proper dtype
pipe = pipe.to(dtype=torch.float16)
pipe.vae = pipe.vae.to(dtype=torch.float32)  # VAE often needs fp32

# Solution 4: Check guidance scale
# Too high can cause issues
image = pipe(prompt, guidance_scale=7.5).images[0]  # Not 20+

Noise/static images

Problem: Output looks like random noise

Solutions:

# Solution 1: Increase inference steps
image = pipe(prompt, num_inference_steps=50).images[0]

# Solution 2: Check scheduler configuration
pipe.scheduler = pipe.scheduler.from_config(pipe.scheduler.config)

# Solution 3: Verify model was loaded correctly
print(pipe.unet)  # Should show model architecture

Blurry images

Problem: Output images are low quality or blurry

Solutions:

# Solution 1: Use more steps
image = pipe(prompt, num_inference_steps=50).images[0]

# Solution 2: Use better VAE
from diffusers import AutoencoderKL
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
pipe.vae = vae

# Solution 3: Use SDXL or refiner
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
)

# Solution 4: Upscale with img2img
upscale_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(...)
upscaled = upscale_pipe(
    prompt=prompt,
    image=image.resize((1024, 1024)),
    strength=0.3
).images[0]

Prompt not being followed

Problem: Generated image doesn't match the prompt

Solutions:

# Solution 1: Increase guidance scale
image = pipe(prompt, guidance_scale=10.0).images[0]

# Solution 2: Use negative prompts
image = pipe(
    prompt="A red car",
    negative_prompt="blue, green, yellow, wrong color",
    guidance_scale=7.5
).images[0]

# Solution 3: Use prompt weighting
# Emphasize important words
prompt = "A (red:1.5) car on a street"

# Solution 4: Use longer, more detailed prompts
prompt = """
A bright red sports car, ferrari style, parked on a city street,
photorealistic, high detail, 8k, professional photography
"""

Distorted faces/hands

Problem: Faces and hands look deformed

Solutions:

# Solution 1: Use negative prompts
negative_prompt = """
bad hands, bad anatomy, deformed, ugly, blurry,
extra fingers, mutated hands, poorly drawn hands,
poorly drawn face, mutation, deformed face
"""

# Solution 2: Use face-specific models
# ADetailer or similar post-processing

# Solution 3: Use ControlNet for poses
# Load pose estimation and condition generation

# Solution 4: Inpaint problematic areas
mask = create_face_mask(image)
fixed = inpaint_pipe(
    prompt="beautiful detailed face",
    image=image,
    mask_image=mask
).images[0]

Scheduler Issues

Scheduler not compatible

Error: ValueError: Scheduler ... is not compatible with pipeline

Fix:

from diffusers import EulerDiscreteScheduler

# Create scheduler from config
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config
)

# Check compatible schedulers
print(pipe.scheduler.compatibles)

Wrong number of steps

Problem: Model generates different quality with same steps

Fix:

# Reset timesteps explicitly
pipe.scheduler.set_timesteps(num_inference_steps)

# Check scheduler's step count
print(len(pipe.scheduler.timesteps))

LoRA Issues

LoRA weights not loading

Error: RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel

Fix:

# Check weight file format
# Should be .safetensors or .bin

# Load with correct key prefix
pipe.load_lora_weights(
    "path/to/lora",
    weight_name="lora.safetensors"
)

# Try loading into specific component
pipe.unet.load_attn_procs("path/to/lora")

LoRA not affecting output

Problem: Generated images look the same with/without LoRA

Fix:

# Fuse LoRA weights
pipe.fuse_lora(lora_scale=1.0)

# Or set scale explicitly
pipe.set_adapters(["lora_name"], adapter_weights=[1.0])

# Verify LoRA is loaded
print(list(pipe.unet.attn_processors.keys()))

Multiple LoRAs conflict

Problem: Multiple LoRAs produce artifacts

Fix:

# Load with different adapter names
pipe.load_lora_weights("lora1", adapter_name="style")
pipe.load_lora_weights("lora2", adapter_name="subject")

# Balance weights
pipe.set_adapters(
    ["style", "subject"],
    adapter_weights=[0.5, 0.5]  # Lower weights
)

# Or use LoRA merge before loading
# Merge LoRAs offline with appropriate ratios

ControlNet Issues

ControlNet not conditioning

Problem: ControlNet has no effect on output

Fix:

# Check control image format
# Should be RGB, matching generation size
control_image = control_image.resize((512, 512))

# Increase conditioning scale
image = pipe(
    prompt=prompt,
    image=control_image,
    controlnet_conditioning_scale=1.0,  # Try 0.5-1.5
    num_inference_steps=30
).images[0]

# Verify ControlNet is loaded
print(pipe.controlnet)

Control image preprocessing

Fix:

from controlnet_aux import CannyDetector

# Proper preprocessing
canny = CannyDetector()
control_image = canny(input_image)

# Ensure correct format
control_image = control_image.convert("RGB")
control_image = control_image.resize((512, 512))

Hub/Download Issues

Model download fails

Error: requests.exceptions.ConnectionError

Fix:

# Set longer timeout
export HF_HUB_DOWNLOAD_TIMEOUT=600

# Use mirror if available
export HF_ENDPOINT=https://hf-mirror.com

# Or download manually
huggingface-cli download stable-diffusion-v1-5/stable-diffusion-v1-5

Cache issues

Error: OSError: Can't load model from cache

Fix:

# Clear cache
rm -rf ~/.cache/huggingface/hub

# Or set different cache location
export HF_HOME=/path/to/cache

# Force re-download
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    force_download=True
)

Access denied for gated models

Error: 401 Client Error: Unauthorized

Fix:

# Login to Hugging Face
huggingface-cli login

# Or use token
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    token="hf_xxxxx"
)

# Accept model license on Hub website first

Performance Issues

Slow generation

Problem: Generation takes too long

Solutions:

# Solution 1: Use faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config
)

# Solution 2: Reduce steps
image = pipe(prompt, num_inference_steps=20).images[0]

# Solution 3: Use LCM
from diffusers import LCMScheduler
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
image = pipe(prompt, num_inference_steps=4, guidance_scale=1.0).images[0]

# Solution 4: Enable xFormers
pipe.enable_xformers_memory_efficient_attention()

# Solution 5: Compile model
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

First generation is slow

Problem: First image takes much longer

Fix:

# Warm up the model
_ = pipe("warmup", num_inference_steps=1)

# Then run actual generation
image = pipe(prompt, num_inference_steps=50).images[0]

# Compile for faster subsequent runs
pipe.unet = torch.compile(pipe.unet)

Debugging Tips

Enable debug logging

import logging
logging.basicConfig(level=logging.DEBUG)

# Or for specific modules
logging.getLogger("diffusers").setLevel(logging.DEBUG)
logging.getLogger("transformers").setLevel(logging.DEBUG)

Check model components

# Print pipeline components
print(pipe.components)

# Check model config
print(pipe.unet.config)
print(pipe.vae.config)
print(pipe.scheduler.config)

# Verify device placement
print(pipe.device)
for name, module in pipe.components.items():
    if hasattr(module, 'device'):
        print(f"{name}: {module.device}")

Validate inputs

# Check image dimensions
print(f"Height: {height}, Width: {width}")
assert height % 8 == 0, "Height must be divisible by 8"
assert width % 8 == 0, "Width must be divisible by 8"

# Check prompt tokenization
tokens = pipe.tokenizer(prompt, return_tensors="pt")
print(f"Token count: {tokens.input_ids.shape[1]}")  # Max 77 for SD

Save intermediate results

def save_latents_callback(pipe, step_index, timestep, callback_kwargs):
    latents = callback_kwargs["latents"]

    # Decode and save intermediate
    with torch.no_grad():
        image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
    Image.fromarray((image * 255).astype("uint8")).save(f"step_{step_index}.png")

    return callback_kwargs

image = pipe(
    prompt,
    callback_on_step_end=save_latents_callback,
    callback_on_step_end_tensor_inputs=["latents"]
).images[0]

Getting Help

Documentation: https://huggingface.co/docs/diffusers
GitHub Issues: https://github.com/huggingface/diffusers/issues
Discord: https://discord.gg/diffusers
Forum: https://discuss.huggingface.co

Reporting Issues

Include:

Diffusers version: pip show diffusers
PyTorch version: python -c "import torch; print(torch.__version__)"
CUDA version: nvcc --version
GPU model: nvidia-smi
Full error traceback
Minimal reproducible code
Model name/ID used

12 KiB Raw Blame History

Stable Diffusion Troubleshooting Guide

Installation Issues

Package conflicts

xFormers installation fails

Torch/CUDA mismatch

Memory Issues

CUDA out of memory

Memory grows over time

Large model loading fails

Generation Issues

Black images

Noise/static images

Blurry images

Prompt not being followed

Distorted faces/hands

Scheduler Issues

Scheduler not compatible

Wrong number of steps

LoRA Issues

LoRA weights not loading

LoRA not affecting output

Multiple LoRAs conflict

ControlNet Issues

ControlNet not conditioning

Control image preprocessing

Hub/Download Issues

Model download fails

Cache issues

Access denied for gated models

Performance Issues

Slow generation

First generation is slow

Debugging Tips

Enable debug logging

Check model components

Validate inputs

Save intermediate results

Getting Help

Reporting Issues

12 KiB

Raw Blame History