# -*- coding: utf-8 -*-
"""diffusers.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/drive/1E-vQ_vTMRbIl6jdBWxLDwnDRX_ow_tgz

https://huggingface.co/docs/diffusers/quicktour

Quicktour

Diffusion models are trained to denoise random Gaussian noise step-by-step to generate a sample of interest, such as an image or audio. This has sparked a tremendous amount of interest in generative AI, and you have probably seen examples of diffusion-generated images on the internet. 🧨 Diffusers is a library aimed at making diffusion models widely accessible to everyone.

Whether you’re a developer or an everyday user, this quicktour will introduce you to 🧨 Diffusers and help you get up and generating quickly! There are three main components of the library to know about:

    The DiffusionPipeline is a high-level end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference.
    Popular pretrained model architectures and modules that can be used as building blocks for creating diffusion systems.
    Many different schedulers: algorithms that control how noise is added during training and how denoised images are generated during inference.

The quicktour will show you how to use the DiffusionPipeline for inference, and then walk you through how to combine a model and scheduler to replicate what’s happening inside the DiffusionPipeline.

The quicktour is a simplified version of the introductory 🧨 Diffusers notebook to help you get started quickly. If you want to learn more about 🧨 Diffusers’ goal, design philosophy, and additional details about its core API, check out the notebook!

Before you begin, make sure you have all the necessary libraries installed:

# uncomment to install the necessary libraries in Colab
#!pip install --upgrade diffusers accelerate transformers

    🤗 Accelerate speeds up model loading for inference and training.
    🤗 Transformers is required to run the most popular diffusion models, such as Stable Diffusion.
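
You may want to confirm that these packages are importable before running the rest of the notebook. A small check with the standard library’s importlib.util.find_spec is enough. This is a minimal sketch: the helper name missing_packages is made up for illustration, and it only checks that each package can be found, not which version is installed.

```python
import importlib.util

# Import names of the packages installed by the pip command above
REQUIRED = ["diffusers", "accelerate", "transformers"]


def missing_packages(names):
    # Return the subset of names that cannot be located on the import path
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```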

DiffusionPipeline

The DiffusionPipeline is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the DiffusionPipeline out-of-the-box for many tasks; for a complete list of supported tasks, check out the 🧨 Diffusers Summary table.

Start by creating an instance of a DiffusionPipeline and specify which pipeline checkpoint you would like to download. You can use the DiffusionPipeline for any checkpoint stored on the Hugging Face Hub. In this quicktour, you’ll load the stable-diffusion-v1-5 checkpoint for text-to-image generation.

For Stable Diffusion models, please read the license carefully before running the model. 🧨 Diffusers implements a safety_checker to prevent offensive or harmful content, but the model’s improved image generation capabilities can still produce potentially harmful content.

Load the model with the from_pretrained() method:
"""

from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
pipeline.to("cuda")
pipeline

"""The DiffusionPipeline downloads and caches all modeling, tokenization, and scheduling components. Printing the pipeline shows that the Stable Diffusion pipeline is composed of the UNet2DConditionModel and PNDMScheduler, among other things.

Now you can pass a text prompt to the pipeline to generate an image, and then access the denoised image. By default, the image output is wrapped in a PIL.Image object.
"""

image = pipeline("An image of a squirrel in Picasso style").images[0]
image

"""Save the image by calling save:"""

image.save("image_of_squirrel_painting.png")

# Commented out IPython magic to ensure Python compatibility.
# %%script echo Disabled
# !git lfs install
# !git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5

# Commented out IPython magic to ensure Python compatibility.
# %%script echo Disabled
# pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", use_safetensors=True)

# from_pretrained() downloads and caches the checkpoint from the Hugging Face Hub,
# so the manual git clone in the disabled cells above is not needed.
from diffusers import DiffusionPipeline

# Load the pipeline directly from the Hub (same checkpoint ID as in the first cell)
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
pipeline.to("cuda")
pipeline

"""Swapping schedulers

Different schedulers come with different denoising speeds and quality trade-offs. The best way to find out which one works best for you is to try them out! One of the main features of 🧨 Diffusers is to allow you to easily switch between schedulers. For example, to replace the default PNDMScheduler with the EulerDiscreteScheduler, load it with the from_config() method:
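
Conceptually, from_config() builds the new scheduler from the old scheduler’s configuration dictionary. Settings such as num_train_timesteps and the beta range therefore carry over to the replacement. The toy classes below are hypothetical stand-ins, not the 🧨 Diffusers classes. They only illustrate the idea of rebuilding an object from another object’s config, keeping the keys it accepts and dropping the rest.

```python
from dataclasses import asdict, dataclass, fields


# Hypothetical stand-ins for two schedulers; these are NOT the diffusers classes
@dataclass
class ToyPNDMScheduler:
    num_train_timesteps: int = 1000
    beta_start: float = 0.0001
    beta_end: float = 0.02
    skip_prk_steps: bool = False  # option only this toy scheduler understands

    @property
    def config(self):
        # Expose the constructor arguments as a plain dictionary
        return asdict(self)


@dataclass
class ToyEulerScheduler:
    num_train_timesteps: int = 1000
    beta_start: float = 0.0001
    beta_end: float = 0.02

    @classmethod
    def from_config(cls, config):
        # Keep only the keys this class accepts and drop the others
        accepted = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in config.items() if k in accepted})


old = ToyPNDMScheduler(beta_end=0.012, skip_prk_steps=True)
new = ToyEulerScheduler.from_config(old.config)
print(new)  # ToyEulerScheduler(num_train_timesteps=1000, beta_start=0.0001, beta_end=0.012)
```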
"""

from diffusers import EulerDiscreteScheduler

pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
pipeline.to("cuda")  # the reloaded pipeline starts on the CPU, so move it back to the GPU

"""Try generating an image with the new scheduler and see if you notice a difference!

"""

image = pipeline("An image of a squirrel in Picasso style").images[0]
image

"""
In the next section, you’ll take a closer look at the components (the model and the scheduler) that make up the DiffusionPipeline and learn how to use them to generate an image of a cat.
Models

Most models take a noisy sample and, at each timestep, predict the noise residual: the difference between a less noisy image and the input image. Other models instead learn to predict the previous sample directly, or the velocity (v-prediction). You can mix and match models to create other diffusion systems.

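The relationship between these prediction targets can be written out for a single scalar. This sketch assumes the standard forward process x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, where alpha_bar is the cumulative noise-schedule term for timestep t. The function names are hypothetical and are not part of 🧨 Diffusers.

```python
import math


def add_noise(x0, eps, alpha_bar):
    # Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps


def x0_from_eps(x_t, eps_pred, alpha_bar):
    # Invert the forward process when the model predicts the noise (epsilon)
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps_pred) / math.sqrt(alpha_bar)


def v_target(x0, eps, alpha_bar):
    # The "velocity" target learned by v-prediction models
    return math.sqrt(alpha_bar) * eps - math.sqrt(1.0 - alpha_bar) * x0


def x0_from_v(x_t, v_pred, alpha_bar):
    # Recover x0 from a v-prediction: x0 = sqrt(alpha_bar) * x_t - sqrt(1 - alpha_bar) * v
    return math.sqrt(alpha_bar) * x_t - math.sqrt(1.0 - alpha_bar) * v_pred


x0, eps, alpha_bar = 0.3, -1.2, 0.5
x_t = add_noise(x0, eps, alpha_bar)
print(round(x0_from_eps(x_t, eps, alpha_bar), 6))  # 0.3
print(round(x0_from_v(x_t, v_target(x0, eps, alpha_bar), alpha_bar), 6))  # 0.3
```

Whichever target a model predicts, a scheduler can convert it back to an estimate of the clean sample. This is why the same scheduler code can serve models trained with different prediction types.
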
Models are initialized with the from_pretrained() method, which also caches the model weights locally so the model loads faster the next time. For the quicktour, you’ll load the UNet2DModel, a basic unconditional image generation model with a checkpoint trained on cat images:"""

from diffusers import UNet2DModel

repo_id = "google/ddpm-cat-256"
model = UNet2DModel.from_pretrained(repo_id, use_safetensors=True)

"""To access the model’s configuration parameters, use the model.config attribute:"""

model.config

"""The model configuration is a 🧊 frozen 🧊 dictionary, which means those parameters can’t be changed after the model is created. This is intentional and ensures that the parameters used to define the model architecture at the start remain the same, while other parameters can still be adjusted during inference.

Some of the most important parameters are:

    sample_size: the height and width of the input sample.
    in_channels: the number of input channels of the input sample.
    down_block_types and up_block_types: the type of down- and upsampling blocks used to create the UNet architecture.
    block_out_channels: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks.
    layers_per_block: the number of ResNet blocks present in each UNet block.
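
The sketch below shows how sample_size and block_out_channels interact. It walks through a hypothetical configuration and assumes, as in a typical UNet, that every down block except the last halves the spatial resolution and that the up path reuses the channel counts in reverse order. The values are illustrative and not read from the google/ddpm-cat-256 checkpoint.

```python
# Hypothetical config values for illustration only
sample_size = 256
block_out_channels = (128, 128, 256, 256, 512, 512)


def down_path(sample_size, block_out_channels):
    # Return (resolution, channels) for each down block, assuming every
    # block except the last downsamples its output by a factor of 2
    levels = []
    size = sample_size
    for i, channels in enumerate(block_out_channels):
        levels.append((size, channels))
        if i < len(block_out_channels) - 1:
            size //= 2
    return levels


down = down_path(sample_size, block_out_channels)
up_channels = tuple(reversed(block_out_channels))  # up blocks mirror the down blocks
print(down)         # [(256, 128), (128, 128), (64, 256), (32, 256), (16, 512), (8, 512)]
print(up_channels)  # (512, 512, 256, 256, 128, 128)
```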

To use the model for inference, create a tensor of random Gaussian noise with the shape of an image. It needs a batch axis because the model can receive multiple random noises at once, a channel axis corresponding to the number of input channels, and two sample_size axes for the height and width of the image:
"""

import torch

torch.manual_seed(0)

noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
noisy_sample.shape

"""For inference, pass the noisy image and a timestep to the model. The timestep indicates how noisy the input image is, with more noise at the beginning of the denoising process and less at the end. This tells the model where it is in the diffusion process: closer to the start or closer to the end. Read the sample attribute of the output to get the model output:"""

with torch.no_grad():
    noisy_residual = model(sample=noisy_sample, timestep=2).sample

"""To generate actual examples though, you’ll need a scheduler to guide the denoising process. In the next section, you’ll learn how to couple a model with a scheduler.
Schedulers

Schedulers manage going from a noisy sample to a less noisy sample given the model output; in this case, the model output is the noisy_residual.

🧨 Diffusers is a toolbox for building diffusion systems. While the DiffusionPipeline is a convenient way to get started with a pre-built diffusion system, you can also choose your own model and scheduler components separately to build a custom diffusion system.

For the quicktour, you’ll instantiate the DDPMScheduler with its from_pretrained() method, which loads the scheduler configuration stored in the same checkpoint repository as the model:
"""

from diffusers import DDPMScheduler

scheduler = DDPMScheduler.from_pretrained(repo_id)
scheduler

"""💡 Unlike a model, a scheduler does not have trainable weights and is parameter-free!

Some of the most important parameters are:

    num_train_timesteps: the length of the denoising process or, in other words, the number of timesteps required to process random Gaussian noise into a data sample.
    beta_schedule: the type of noise schedule to use for inference and training.
    beta_start and beta_end: the start and end noise values for the noise schedule.
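
As an example, a linear beta schedule spaces num_train_timesteps values evenly between beta_start and beta_end. The cumulative product of (1 - beta) then tells you how much of the original signal survives at each timestep. The sketch below assumes a linear schedule with beta_start=0.0001 and beta_end=0.02 over 1000 steps, the values used in the original DDPM paper. Check your scheduler’s config for the values it actually uses.

```python
import math


def linear_betas(num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02):
    # Evenly spaced noise variances from beta_start to beta_end (inclusive)
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + i * step for i in range(num_train_timesteps)]


def alphas_cumprod(betas):
    # alpha_bar[t] = product of (1 - beta_s) for s <= t: signal variance left at step t
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


betas = linear_betas()
alpha_bar = alphas_cumprod(betas)
for t in (0, 250, 500, 999):
    # Standard deviation of the noise mixed into x_t at timestep t
    print(t, round(math.sqrt(1.0 - alpha_bar[t]), 4))
```

The printed noise level grows with t. This is why high timesteps correspond to almost pure noise and the denoising loop starts from them.
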

To predict a slightly less noisy image, pass the following to the scheduler’s step() method: model output, timestep, and current sample.
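
For an epsilon-predicting model, the DDPM update can be written out by hand. First estimate the clean sample from the predicted noise. Then take the mean of the posterior q(x_{t-1} | x_t, x_0) and add Gaussian noise scaled by the posterior standard deviation. This scalar sketch follows the formulas from the DDPM paper with the "fixed small" variance. It is not the DDPMScheduler source, which also handles sample clipping, other variance types, and other prediction types. The ddpm_step name and the toy 10-step schedule are hypothetical.

```python
import math
import random


def ddpm_step(eps_pred, t, x_t, betas, alpha_bar, rng=random):
    # alpha_bar for the previous step; by convention it is 1.0 before the first step
    alpha_bar_t = alpha_bar[t]
    alpha_bar_prev = alpha_bar[t - 1] if t > 0 else 1.0
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t

    # 1. Estimate x_0 from the predicted noise
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)

    # 2. Mean of the posterior q(x_{t-1} | x_t, x_0)
    coef_x0 = math.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t)
    coef_xt = math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    mean = coef_x0 * x0_pred + coef_xt * x_t

    # 3. Add noise with the "fixed small" posterior variance (zero at t == 0)
    variance = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * beta_t
    noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
    return mean + math.sqrt(variance) * noise


# Hypothetical toy schedule: 10 steps with a constant beta of 0.1
betas = [0.1] * 10
alpha_bar, running = [], 1.0
for b in betas:
    running *= 1.0 - b
    alpha_bar.append(running)

# At the last step (t == 0) the update returns the estimated clean sample
x0, eps = 0.5, 0.8
x_t = math.sqrt(alpha_bar[0]) * x0 + math.sqrt(1.0 - alpha_bar[0]) * eps
print(ddpm_step(eps, 0, x_t, betas, alpha_bar))
```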
"""

less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
less_noisy_sample.shape

"""The less_noisy_sample can be passed to the next timestep where it’ll get even less noisy! Let’s bring it all together now and visualize the entire denoising process.

First, create a function that postprocesses and displays the denoised image as a PIL.Image:
"""

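import numpy as np
import PIL.Image
from IPython.display import display  # a builtin in notebooks; imported explicitly for scripts


def display_sample(sample, i):
    # (batch, channels, height, width) -> (batch, height, width, channels)
    image_processed = sample.cpu().permute(0, 2, 3, 1)
    # Map values from [-1, 1] to [0, 255]; clamp so out-of-range values don't overflow uint8
    image_processed = ((image_processed + 1.0) * 127.5).clamp(0, 255)
    image_processed = image_processed.numpy().astype(np.uint8)

    image_pil = PIL.Image.fromarray(image_processed[0])
    display(f"Image at step {i}")
    display(image_pil)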

"""To speed up the denoising process, move the input and model to a GPU:"""

model.to("cuda")
noisy_sample = noisy_sample.to("cuda")

"""Now create a denoising loop that predicts the noise residual of the current sample and computes the less noisy sample with the scheduler:

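You can check the loop logic without downloading a model by swapping the UNet for an "oracle" that knows the clean sample and returns the exact noise for any x_t. With a perfect noise predictor, the DDPM loop should land back on the clean value at the final step. This is a hypothetical scalar toy: the oracle_eps and denoise helpers are made up for illustration. It uses the linear schedule from the DDPM paper and iterates timesteps from high to low, as scheduler.timesteps does.

```python
import math
import random

# Linear schedule from the DDPM paper (assumed, for illustration)
T = 1000
betas = [0.0001 + i * (0.02 - 0.0001) / (T - 1) for i in range(T)]
alpha_bar, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)

X0 = 0.5  # the "clean" value the oracle knows about


def oracle_eps(x_t, t):
    # Hypothetical perfect model: the exact noise that maps X0 to x_t
    return (x_t - math.sqrt(alpha_bar[t]) * X0) / math.sqrt(1.0 - alpha_bar[t])


def denoise(rng):
    sample = rng.gauss(0.0, 1.0)  # start from pure Gaussian noise
    for t in reversed(range(T)):  # high-to-low timesteps
        eps = oracle_eps(sample, t)  # 1. predict noise residual
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0
        x0_pred = (sample - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
        # 2. posterior mean plus "fixed small" noise: x_t -> x_{t-1}
        mean = (math.sqrt(a_prev) * betas[t] * x0_pred
                + math.sqrt(1.0 - betas[t]) * (1.0 - a_prev) * sample) / (1.0 - a_t)
        std = math.sqrt((1.0 - a_prev) / (1.0 - a_t) * betas[t])
        sample = mean + std * rng.gauss(0.0, 1.0)
    return sample


print(denoise(random.Random(0)))
```

A real model only approximates the noise, so its samples are not exact copies of a training image. The loop structure, however, is the same as in the cell below.
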

"""

import tqdm

sample = noisy_sample

for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
    # 1. predict noise residual
    with torch.no_grad():
        residual = model(sample, t).sample

    # 2. compute less noisy image and set x_t -> x_t-1
    sample = scheduler.step(residual, t, sample).prev_sample

    # 3. optionally look at image
    if (i + 1) % 50 == 0:
        display_sample(sample, i + 1)

"""Next steps

Hopefully, you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can:

    Train or finetune a model to generate your own images in the training tutorial.
    See example official and community training or finetuning scripts for a variety of use cases.
    Learn more about loading, accessing, changing, and comparing schedulers in the Using different Schedulers guide.
    Explore prompt engineering, speed and memory optimizations, and tips and tricks for generating higher-quality images with the Stable Diffusion guide.
    Dive deeper into speeding up 🧨 Diffusers with guides on optimized PyTorch on a GPU, and inference guides for running Stable Diffusion on Apple Silicon (M1/M2) and ONNX Runtime.
"""