Programming Project #5: Diffusion Models

CS180: Intro to Computer Vision and Computational Photography

Overview

Implement diffusion model sampling loops (Part 1) and train diffusion models for image generation (Part 2).

Part 1: Sampling Loops

1.1 Implementing the Forward Process

For Part 1.1, I implemented the forward process, which adds progressively larger amounts of noise to a clean image. This involves scaling the original image by a noise-schedule coefficient and adding noise sampled from a Gaussian distribution, showing how the image degrades at specific timesteps (t = 250, 500, 750).
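Below is a minimal sketch of this forward step, assuming a precomputed tensor `alphas_cumprod` of cumulative noise-schedule products (the names are my own and may differ from the course starter code):

```python
import torch

def forward(im, t, alphas_cumprod):
    """DDPM forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = alphas_cumprod[t]
    eps = torch.randn_like(im)                      # Gaussian noise, same shape as the image
    x_t = torch.sqrt(alpha_bar) * im + torch.sqrt(1 - alpha_bar) * eps
    return x_t, eps
```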

Noisy Image at t = 250

Noisy Campanile at t=250

Noisy Image at t = 500

Noisy Campanile at t=500

Noisy Image at t = 750

Noisy Campanile at t=750

1.2 Classical Denoising

For Part 1.2, I applied Gaussian blur filtering to try to denoise the noisy images created in Part 1.1. Although the blur smooths away some of the noise, it cannot restore fine detail, highlighting the limitations of classical denoising techniques compared to diffusion models.
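For reference, this classical baseline is just a low-pass filter; a sketch using torchvision's Gaussian blur (the kernel size and sigma are illustrative values, not necessarily the ones I used):

```python
import torchvision.transforms.functional as TF

def blur_denoise(noisy_im, kernel_size=5, sigma=2.0):
    """Classical 'denoising': suppress high-frequency noise with a Gaussian blur.
    This also removes fine image detail, which is why it performs poorly."""
    return TF.gaussian_blur(noisy_im, kernel_size=kernel_size, sigma=sigma)
```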

Noisy Image at t = 250

Noisy Image at t=250

Denoised Image at t = 250

Denoised Image at t=250

Noisy Image at t = 500

Noisy Image at t=500

Denoised Image at t = 500

Denoised Image at t=500

Noisy Image at t = 750

Noisy Image at t=750

Denoised Image at t = 750

Denoised Image at t=750

1.3 One-Step Denoising

For Part 1.3, I denoised the noisy images from Part 1.1 at timesteps t = [250, 500, 750] in a single step. Using the pretrained UNet, I estimated the noise in each noisy image and removed it with the appropriate scaling, giving a one-step estimate of the clean image. The visualizations below show the original image, the noisy version, and the denoised estimate so we can see how well the model performs.
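A minimal sketch of the one-step estimate, assuming `eps_pred` is the UNet's noise prediction for the noisy image x_t (the real pretrained UNet call also involves prompt embeddings, which I omit here):

```python
import torch

def one_step_denoise(x_t, t, eps_pred, alphas_cumprod):
    """Solve x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps for x_0."""
    alpha_bar = alphas_cumprod[t]
    return (x_t - torch.sqrt(1 - alpha_bar) * eps_pred) / torch.sqrt(alpha_bar)
```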

Original Image (t=250)

Original Image t=250

Original Image (t=500)

Original Image t=500

Original Image (t=750)

Original Image t=750

Noisy Image (t=250)

Noisy Image t=250

Noisy Image (t=500)

Noisy Image t=500

Noisy Image (t=750)

Noisy Image t=750

Denoised Image (t=250)

Denoised Image t=250

Denoised Image (t=500)

Denoised Image t=500

Denoised Image (t=750)

Denoised Image t=750

1.4 Iterative Denoising

Instead of removing the noise in one step, I removed it gradually over a sequence of timesteps, so the image gets cleaner bit by bit. By striding over (skipping) some timesteps, I sped up the process while still getting good results. This iterative approach worked much better than one-step denoising or classical methods like blurring.
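A sketch of a single update of this strided loop, where t_prev is the next, less noisy timestep; I omit the small added-noise/variance term here, and the variable names are my own:

```python
import torch

def iterative_denoise_step(x_t, t, t_prev, eps_pred, alphas_cumprod):
    """One update x_t -> x_{t_prev} of strided iterative denoising (t_prev < t)."""
    a_bar_t = alphas_cumprod[t]
    a_bar_prev = alphas_cumprod[t_prev]
    alpha = a_bar_t / a_bar_prev                    # effective alpha for this (possibly strided) step
    beta = 1 - alpha

    # Clean-image estimate from the noise prediction, as in one-step denoising.
    x0_est = (x_t - torch.sqrt(1 - a_bar_t) * eps_pred) / torch.sqrt(a_bar_t)

    # Blend the clean estimate with the current noisy image.
    return (torch.sqrt(a_bar_prev) * beta / (1 - a_bar_t)) * x0_est \
         + (torch.sqrt(alpha) * (1 - a_bar_prev) / (1 - a_bar_t)) * x_t
```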

Denoising at t=0

Denoising at t=60

Denoising at t=210

Denoising at t=360

Denoising at t=510

Denoising at t=660

1.5 Image Generation

I started with pure noise (just random pixels) and used the iterative denoising loop to create entirely new images. The model turned the noise into clear, realistic pictures, showing that it can generate images from scratch.
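Roughly, generation from scratch just feeds random noise into the loop above; a sketch assuming `strided_timesteps` runs from noisiest to cleanest and simplifying the UNet call (the real pretrained UNet also takes prompt embeddings):

```python
import torch

x = torch.randn(1, 3, 64, 64)                       # start from pure noise (the stage-1 model works at 64x64)
for i in range(len(strided_timesteps) - 1):
    t, t_prev = strided_timesteps[i], strided_timesteps[i + 1]
    eps_pred = unet(x, t)                           # noise estimate from the pretrained UNet (simplified call)
    x = iterative_denoise_step(x, t, t_prev, eps_pred, alphas_cumprod)
```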

Generated Image at t=0

Generated Image at t=1

Generated Image at t=4

1.7 Noise Levels

I took an existing image, added some noise, and then ran the denoising loop. This makes the model "rethink" the image and change it slightly: adding only a little noise keeps the edits small, while more noise produces bigger changes. I tried this with regular photographs and with hand-drawn images.
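A hedged sketch of this edit procedure, reusing `iterative_denoise_step` from Part 1.4; `i_start` indexes into `strided_timesteps`, and (assuming the schedule runs from noisiest to cleanest) a lower index means more noise and therefore larger edits:

```python
import torch

def edit_image(im, i_start, unet, strided_timesteps, alphas_cumprod):
    """SDEdit-style edit: noise a real image to strided_timesteps[i_start], then denoise from there."""
    t = strided_timesteps[i_start]
    a_bar = alphas_cumprod[t]
    x = torch.sqrt(a_bar) * im + torch.sqrt(1 - a_bar) * torch.randn_like(im)   # forward process to t
    for i in range(i_start, len(strided_timesteps) - 1):
        t, t_prev = strided_timesteps[i], strided_timesteps[i + 1]
        eps_pred = unet(x, t)                                                   # simplified noise-prediction call
        x = iterative_denoise_step(x, t, t_prev, eps_pred, alphas_cumprod)
    return x
```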

Noise Level: 1

Cleaned Image Noise Level 1

Noise Level: 3

Cleaned Image Noise Level 3

Noise Level: 5

Cleaned Image Noise Level 5

Noise Level: 7

Cleaned Image Noise Level 7

Noise Level: 10

Cleaned Image Noise Level 10

Noise Level: 20

Cleaned Image Noise Level 20

1.7.1 Timesteps

Visualizing cleaned images at different timesteps.

Timestep: 1

Cleaned Image Timestep 1

Timestep: 3

Cleaned Image Timestep 3

Timestep: 5

Cleaned Image Timestep 5

Timestep: 7

Cleaned Image Timestep 7

Timestep: 10

Cleaned Image Timestep 10

Timestep: 20

Cleaned Image Timestep 20

1.7.3 Custom Images

This was similar to the previous part, but I added a text prompt to guide the changes. For example, I could push an image toward "a rocket ship" by giving the model that prompt; the model follows the text prompt while removing the noise.
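A hypothetical sketch of the text-guided variant; the conditioning call is an assumption (written here as `unet(x, t, prompt_embeds)`), and the exact API follows the course starter code:

```python
import torch

def text_guided_edit(im, prompt_embeds, i_start, unet, strided_timesteps, alphas_cumprod):
    """Like edit_image above, but every noise estimate is conditioned on a text prompt embedding,
    so the result drifts toward the prompt as the noise is removed."""
    t = strided_timesteps[i_start]
    a_bar = alphas_cumprod[t]
    x = torch.sqrt(a_bar) * im + torch.sqrt(1 - a_bar) * torch.randn_like(im)
    for i in range(i_start, len(strided_timesteps) - 1):
        t, t_prev = strided_timesteps[i], strided_timesteps[i + 1]
        eps_pred = unet(x, t, prompt_embeds)        # text-conditioned prediction (assumed signature)
        x = iterative_denoise_step(x, t, t_prev, eps_pred, alphas_cumprod)
    return x
```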

Custom Timestep: 1

Generated Custom Timestep 1

Custom Timestep: 3

Generated Custom Timestep 3

Custom Timestep: 5

Generated Custom Timestep 5

Custom Timestep: 7

Generated Custom Timestep 7

Custom Timestep: 10

Generated Custom Timestep 10

Custom Timestep: 20

Generated Custom Timestep 20

1.8 Diffusion Outcomes

I compared outcomes across different parameters to evaluate the final model's efficiency (there was a bug here that I couldn't fix).

Outcome 1

Diffusion Outcome 1

Outcome 2

Diffusion Outcome 2

Outcome 3

Diffusion Outcome 3

1.10 Additional Results

Additional generated images at varying timesteps (there was a bug here that I couldn't fix).

Result 1

Generated Image 1

Result 2

Generated Image 2

Result 3

Generated Image 3

Part 2: Training

2.1 Adding Time Conditioning to UNet

I trained a time-conditioned UNet to predict the noise added to an image at a given timestep. The loss curve and the samples generated after 5 and 20 epochs illustrate the model's progress during training.
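A minimal sketch of one training epoch, assuming the UNet takes a normalized timestep and is trained to predict the added noise; the hyperparameters and names are illustrative rather than exactly what I used:

```python
import torch
import torch.nn.functional as F

def train_epoch(model, loader, optimizer, alphas_cumprod, T=300, device="cuda"):
    """Time-conditioned denoising training: add noise at a random timestep, predict it, minimize MSE."""
    model.train()
    for x0, _ in loader:                                             # labels are unused here
        x0 = x0.to(device)
        t = torch.randint(0, T, (x0.shape[0],), device=device)      # a random timestep per image
        a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
        eps = torch.randn_like(x0)
        x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1 - a_bar) * eps  # forward process
        loss = F.mse_loss(model(x_t, t / T), eps)                   # predict the noise that was added
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```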

Training Loss Curve

Time-Conditioned UNet Training Loss Curve

Generated Samples After 5 Epochs

Generated Samples After 5 Epochs

Generated Samples After 20 Epochs

Generated Samples After 20 Epochs

2.2 Iterative Noise Reduction

I experimented with the gradual reduction of noise over multiple timesteps. Below are the results showing how the model progressively refines the images.
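The reverse loop here can be written as standard DDPM-style ancestral sampling; a sketch under the same naming assumptions as above (`betas`, `alphas`, and `alphas_cumprod` are the schedule tensors, and the variance handling is simplified):

```python
import torch

@torch.no_grad()
def sample(model, betas, alphas, alphas_cumprod, T=300, shape=(16, 1, 28, 28), device="cuda"):
    """Start from noise and step backwards t = T-1, ..., 0, removing a little noise each step."""
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=device)
        eps_pred = model(x, t_batch / T)                             # predicted noise at this step
        coef = (1 - alphas[t]) / torch.sqrt(1 - alphas_cumprod[t])
        x = (x - coef * eps_pred) / torch.sqrt(alphas[t])            # posterior mean
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)       # re-inject noise except at the last step
    return x
```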

Step 0

Denoising at Step 0

Step 60

Denoising at Step 60

Step 210

Denoising at Step 210

Step 360

Denoising at Step 360

Step 510

Denoising at Step 510

Step 660

Denoising at Step 660

2.3 Comparing Outputs Across Epochs

By visualizing outputs after training for different numbers of epochs, we can see the model's improvements in generating realistic outputs from noisy inputs.

Output After 5 Epochs

Generated Samples After 5 Epochs

Output After 20 Epochs

Generated Samples After 20 Epochs

2.5 Class-Conditional Sample Generation

I generated samples for each digit class using a class-conditional UNet after training for 5 and 20 epochs. The generated images demonstrate how the model improves its ability to generate realistic outputs with more training.
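A hedged sketch of per-class sampling, assuming the class-conditional UNet takes a one-hot class vector as an extra input (the call signature is an assumption) and reusing the schedule tensors from above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_digits(model, betas, alphas, alphas_cumprod, T=300, device="cuda"):
    """Generate one sample per digit class 0-9 with the class-conditional UNet."""
    c = F.one_hot(torch.arange(10), 10).float().to(device)           # one one-hot row per digit class
    x = torch.randn(10, 1, 28, 28, device=device)
    for t in reversed(range(T)):
        t_batch = torch.full((10,), t, device=device)
        eps_pred = model(x, t_batch / T, c)                          # class- and time-conditioned estimate
        coef = (1 - alphas[t]) / torch.sqrt(1 - alphas_cumprod[t])
        x = (x - coef * eps_pred) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x
```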

Generated Samples After 5 Epochs

Generated Samples After 5 Epochs

Generated Samples After 20 Epochs

Generated Samples After 20 Epochs