The skills directory was getting disorganized — mlops alone had 40 skills in a flat list, and 12 categories were singletons with just one skill each. Code change: - prompt_builder.py: Support sub-categories in skill scanner. skills/mlops/training/axolotl/SKILL.md now shows as category 'mlops/training' instead of just 'mlops'. Backwards-compatible with existing flat structure. Split mlops (40 skills) into 7 sub-categories: - mlops/training (12): accelerate, axolotl, flash-attention, grpo-rl-training, peft, pytorch-fsdp, pytorch-lightning, simpo, slime, torchtitan, trl-fine-tuning, unsloth - mlops/inference (8): gguf, guidance, instructor, llama-cpp, obliteratus, outlines, tensorrt-llm, vllm - mlops/models (6): audiocraft, clip, llava, segment-anything, stable-diffusion, whisper - mlops/vector-databases (4): chroma, faiss, pinecone, qdrant - mlops/evaluation (5): huggingface-tokenizers, lm-evaluation-harness, nemo-curator, saelens, weights-and-biases - mlops/cloud (2): lambda-labs, modal - mlops/research (1): dspy Merged singleton categories: - gifs → media (gif-search joins youtube-content) - music-creation → media (heartmula, songsee) - diagramming → creative (excalidraw joins ascii-art) - ocr-and-documents → productivity - domain → research (domain-intel) - feeds → research (blogwatcher) - market-data → research (polymarket) Fixed misplaced skills: - mlops/code-review → software-development (not ML-specific) - mlops/ml-paper-writing → research (academic writing) Added DESCRIPTION.md files for all new/updated categories.
12 KiB
Stable Diffusion Troubleshooting Guide
Installation Issues
Package conflicts
Error: ImportError: cannot import name 'cached_download' from 'huggingface_hub'
Fix:
# Update huggingface_hub
pip install --upgrade huggingface_hub
# Reinstall diffusers
pip install --upgrade diffusers
xFormers installation fails
Error: RuntimeError: CUDA error: no kernel image is available for execution
Fix:
# Check CUDA version
nvcc --version
# Install matching xformers
pip install xformers --index-url https://download.pytorch.org/whl/cu121 # For CUDA 12.1
# Or build from source
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
Torch/CUDA mismatch
Error: RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
Fix:
# Check versions
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Reinstall PyTorch with correct CUDA
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
Memory Issues
CUDA out of memory
Error: torch.cuda.OutOfMemoryError: CUDA out of memory
Solutions:
# Solution 1: Enable CPU offloading
pipe.enable_model_cpu_offload()
# Solution 2: Sequential CPU offload (more aggressive)
pipe.enable_sequential_cpu_offload()
# Solution 3: Attention slicing
pipe.enable_attention_slicing()
# Solution 4: VAE slicing for large images
pipe.enable_vae_slicing()
# Solution 5: Use lower precision
pipe = DiffusionPipeline.from_pretrained(
"model-id",
torch_dtype=torch.float16 # or torch.bfloat16
)
# Solution 6: Reduce batch size
image = pipe(prompt, num_images_per_prompt=1).images[0]
# Solution 7: Generate smaller images
image = pipe(prompt, height=512, width=512).images[0]
# Solution 8: Clear cache between generations
import gc
torch.cuda.empty_cache()
gc.collect()
Memory grows over time
Problem: Memory usage increases with each generation
Fix:
import gc
import torch
def generate_with_cleanup(pipe, prompt, **kwargs):
try:
image = pipe(prompt, **kwargs).images[0]
return image
finally:
# Clear cache after generation
if torch.cuda.is_available():
torch.cuda.empty_cache()
gc.collect()
Large model loading fails
Error: RuntimeError: Unable to load model weights
Fix:
# Use low CPU memory mode
pipe = DiffusionPipeline.from_pretrained(
"large-model-id",
low_cpu_mem_usage=True,
torch_dtype=torch.float16
)
Generation Issues
Black images
Problem: Output images are completely black
Solutions:
# Solution 1: Disable safety checker
pipe.safety_checker = None
# Solution 2: Check VAE scaling
# The issue might be with VAE encoding/decoding
latents = latents / pipe.vae.config.scaling_factor # Before decode
# Solution 3: Ensure proper dtype
pipe = pipe.to(dtype=torch.float16)
pipe.vae = pipe.vae.to(dtype=torch.float32) # VAE often needs fp32
# Solution 4: Check guidance scale
# Too high can cause issues
image = pipe(prompt, guidance_scale=7.5).images[0] # Not 20+
Noise/static images
Problem: Output looks like random noise
Solutions:
# Solution 1: Increase inference steps
image = pipe(prompt, num_inference_steps=50).images[0]
# Solution 2: Check scheduler configuration
pipe.scheduler = pipe.scheduler.from_config(pipe.scheduler.config)
# Solution 3: Verify model was loaded correctly
print(pipe.unet) # Should show model architecture
Blurry images
Problem: Output images are low quality or blurry
Solutions:
# Solution 1: Use more steps
image = pipe(prompt, num_inference_steps=50).images[0]
# Solution 2: Use better VAE
from diffusers import AutoencoderKL
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
pipe.vae = vae
# Solution 3: Use SDXL or refiner
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0"
)
# Solution 4: Upscale with img2img
upscale_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(...)
upscaled = upscale_pipe(
prompt=prompt,
image=image.resize((1024, 1024)),
strength=0.3
).images[0]
Prompt not being followed
Problem: Generated image doesn't match the prompt
Solutions:
# Solution 1: Increase guidance scale
image = pipe(prompt, guidance_scale=10.0).images[0]
# Solution 2: Use negative prompts
image = pipe(
prompt="A red car",
negative_prompt="blue, green, yellow, wrong color",
guidance_scale=7.5
).images[0]
# Solution 3: Use prompt weighting
# Emphasize important words
prompt = "A (red:1.5) car on a street"
# Solution 4: Use longer, more detailed prompts
prompt = """
A bright red sports car, ferrari style, parked on a city street,
photorealistic, high detail, 8k, professional photography
"""
Distorted faces/hands
Problem: Faces and hands look deformed
Solutions:
# Solution 1: Use negative prompts
negative_prompt = """
bad hands, bad anatomy, deformed, ugly, blurry,
extra fingers, mutated hands, poorly drawn hands,
poorly drawn face, mutation, deformed face
"""
# Solution 2: Use face-specific models
# ADetailer or similar post-processing
# Solution 3: Use ControlNet for poses
# Load pose estimation and condition generation
# Solution 4: Inpaint problematic areas
mask = create_face_mask(image)
fixed = inpaint_pipe(
prompt="beautiful detailed face",
image=image,
mask_image=mask
).images[0]
Scheduler Issues
Scheduler not compatible
Error: ValueError: Scheduler ... is not compatible with pipeline
Fix:
from diffusers import EulerDiscreteScheduler
# Create scheduler from config
pipe.scheduler = EulerDiscreteScheduler.from_config(
pipe.scheduler.config
)
# Check compatible schedulers
print(pipe.scheduler.compatibles)
Wrong number of steps
Problem: Model generates different quality with same steps
Fix:
# Reset timesteps explicitly
pipe.scheduler.set_timesteps(num_inference_steps)
# Check scheduler's step count
print(len(pipe.scheduler.timesteps))
LoRA Issues
LoRA weights not loading
Error: RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel
Fix:
# Check weight file format
# Should be .safetensors or .bin
# Load with correct key prefix
pipe.load_lora_weights(
"path/to/lora",
weight_name="lora.safetensors"
)
# Try loading into specific component
pipe.unet.load_attn_procs("path/to/lora")
LoRA not affecting output
Problem: Generated images look the same with/without LoRA
Fix:
# Fuse LoRA weights
pipe.fuse_lora(lora_scale=1.0)
# Or set scale explicitly
pipe.set_adapters(["lora_name"], adapter_weights=[1.0])
# Verify LoRA is loaded
print(list(pipe.unet.attn_processors.keys()))
Multiple LoRAs conflict
Problem: Multiple LoRAs produce artifacts
Fix:
# Load with different adapter names
pipe.load_lora_weights("lora1", adapter_name="style")
pipe.load_lora_weights("lora2", adapter_name="subject")
# Balance weights
pipe.set_adapters(
["style", "subject"],
adapter_weights=[0.5, 0.5] # Lower weights
)
# Or use LoRA merge before loading
# Merge LoRAs offline with appropriate ratios
ControlNet Issues
ControlNet not conditioning
Problem: ControlNet has no effect on output
Fix:
# Check control image format
# Should be RGB, matching generation size
control_image = control_image.resize((512, 512))
# Increase conditioning scale
image = pipe(
prompt=prompt,
image=control_image,
controlnet_conditioning_scale=1.0, # Try 0.5-1.5
num_inference_steps=30
).images[0]
# Verify ControlNet is loaded
print(pipe.controlnet)
Control image preprocessing
Fix:
from controlnet_aux import CannyDetector
# Proper preprocessing
canny = CannyDetector()
control_image = canny(input_image)
# Ensure correct format
control_image = control_image.convert("RGB")
control_image = control_image.resize((512, 512))
Hub/Download Issues
Model download fails
Error: requests.exceptions.ConnectionError
Fix:
# Set longer timeout
export HF_HUB_DOWNLOAD_TIMEOUT=600
# Use mirror if available
export HF_ENDPOINT=https://hf-mirror.com
# Or download manually
huggingface-cli download stable-diffusion-v1-5/stable-diffusion-v1-5
Cache issues
Error: OSError: Can't load model from cache
Fix:
# Clear cache
rm -rf ~/.cache/huggingface/hub
# Or set different cache location
export HF_HOME=/path/to/cache
# Force re-download
pipe = DiffusionPipeline.from_pretrained(
"model-id",
force_download=True
)
Access denied for gated models
Error: 401 Client Error: Unauthorized
Fix:
# Login to Hugging Face
huggingface-cli login
# Or use token
pipe = DiffusionPipeline.from_pretrained(
"model-id",
token="hf_xxxxx"
)
# Accept model license on Hub website first
Performance Issues
Slow generation
Problem: Generation takes too long
Solutions:
# Solution 1: Use faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
pipe.scheduler.config
)
# Solution 2: Reduce steps
image = pipe(prompt, num_inference_steps=20).images[0]
# Solution 3: Use LCM
from diffusers import LCMScheduler
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
image = pipe(prompt, num_inference_steps=4, guidance_scale=1.0).images[0]
# Solution 4: Enable xFormers
pipe.enable_xformers_memory_efficient_attention()
# Solution 5: Compile model
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
First generation is slow
Problem: First image takes much longer
Fix:
# Warm up the model
_ = pipe("warmup", num_inference_steps=1)
# Then run actual generation
image = pipe(prompt, num_inference_steps=50).images[0]
# Compile for faster subsequent runs
pipe.unet = torch.compile(pipe.unet)
Debugging Tips
Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)
# Or for specific modules
logging.getLogger("diffusers").setLevel(logging.DEBUG)
logging.getLogger("transformers").setLevel(logging.DEBUG)
Check model components
# Print pipeline components
print(pipe.components)
# Check model config
print(pipe.unet.config)
print(pipe.vae.config)
print(pipe.scheduler.config)
# Verify device placement
print(pipe.device)
for name, module in pipe.components.items():
if hasattr(module, 'device'):
print(f"{name}: {module.device}")
Validate inputs
# Check image dimensions
print(f"Height: {height}, Width: {width}")
assert height % 8 == 0, "Height must be divisible by 8"
assert width % 8 == 0, "Width must be divisible by 8"
# Check prompt tokenization
tokens = pipe.tokenizer(prompt, return_tensors="pt")
print(f"Token count: {tokens.input_ids.shape[1]}") # Max 77 for SD
Save intermediate results
def save_latents_callback(pipe, step_index, timestep, callback_kwargs):
latents = callback_kwargs["latents"]
# Decode and save intermediate
with torch.no_grad():
image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
image = (image / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
Image.fromarray((image * 255).astype("uint8")).save(f"step_{step_index}.png")
return callback_kwargs
image = pipe(
prompt,
callback_on_step_end=save_latents_callback,
callback_on_step_end_tensor_inputs=["latents"]
).images[0]
Getting Help
- Documentation: https://huggingface.co/docs/diffusers
- GitHub Issues: https://github.com/huggingface/diffusers/issues
- Discord: https://discord.gg/diffusers
- Forum: https://discuss.huggingface.co
Reporting Issues
Include:
- Diffusers version:
pip show diffusers - PyTorch version:
python -c "import torch; print(torch.__version__)" - CUDA version:
nvcc --version - GPU model:
nvidia-smi - Full error traceback
- Minimal reproducible code
- Model name/ID used