6 min read

Upscaling Images Using AI

A tutorial on using Real-ESRGAN to upscale images in python
Upscaling Images Using AI

Introduction

Many of the open-source image generation models such as stable diffusion tend to produce images in a lower resolution. Stable diffusion 1.5 was trained on images that were 512 x 512 pixels. Version 2.0 was trained on 768 x 768 pixels. In an era where 4K images are common, these resolutions seem subpar. I'm confident that over time advancements in generative image models will lead to higher resolutions. However, in the present, are there any immediate actions we can take to address this problem?

Enter AI image upscaling.

Through a process called Super Resolution, you can take a low resolution image and convert it into a higher resolution without losing image quality. One of the most popular models using super resolution is Real-ESRGAN. Real-ESRGAN is a generative model that tries to create high-quality images that closely resemble the desired target images.

How It Works

Real-ESRGAN Architecture | Image Source

Real-ESRGAN consists of two neural networks:

  1. Generator
    • Responsible for the super-resolution task
    • Trained on a dataset of paired low-resolution and high-resolution images
    • Uses its learned weights and architecture to upscale the input image by adding details and improving its visual quality
  1. GAN(Generative Adversarial Network)
    • Used to distinguish between real high-resolution images and images generated by the generator
    • The generator tries to produce high-resolution images that are indistinguishable from real ones, while the GAN tries to become better at telling real from fake
TLDR; The generator continually tries to improve the quality of the images it has generated. While the GAN tries to continually improve its ability to detect fake images. These two models are pitted against each other, resulting in both models becoming extremely good at their respective task.
Real-ESRGAN Training Dataset

During training, the generator takes a low-resolution input image from the dataset and attempts to generate a high-resolution image. The GAN, on the other hand, receives the real high-resolution image and the generated image from the generator and learns to differentiate between them. The adversarial training process involves the generator trying to produce high-resolution images that are convincing enough to deceive the GAN.

At the end of the day, you get a generator model that is very good at producing high-resolution images from low-resolution inputs.

Code

To run this model, a GPU with at least 8GB of VRAM is required. Google Colab's free tier comes with a GPU with 16GB of VRAM, so that is more than enough.

For the full code, please take a look at this google colab:

Google Colaboratory

Step 1: Setup

!git clone https://github.com/xinntao/Real-ESRGAN.git
%cd Real-ESRGAN
!pip install gfpgan>=1.3.5
!pip install basicsr>=1.3.3.11
!pip install facexlib>=0.2.0.3
!pip install gfpgan>=0.2.1
!pip install -r requirements.txt
!python setup.py develop

Here we are simply cloning the project and installing the required python dependencies.

Step 2: Download the model weights

import os
os.makedirs('weights', exist_ok=True)
!wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth -P ./weights

We need the weights for the model in order to use it. There are multiple variants of the model, the RealESRGAN_x4 is a 4X image upscaling model that works well for most use cases, so we'll be using this one.

There are a few other weights you can try if you're curious:

  • RealESRNet_x4plus
  • RealESRGAN_x4plus_anime_6B
  • RealESRGAN_x2plus

Step 3: Import modules and set up model

from basicsr.archs.rrdbnet_arch import RRDBNet
import cv2

from realesrgan import RealESRGANer
from realesrgan.archs.srvgg_arch import SRVGGNetCompact
from gfpgan import GFPGANer
import tempfile
import base64
import numpy as np
import io
from PIL import Image


model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
netscale = 4
model_path = os.path.join("weights", "RealESRGAN_x4plus.pth")
upsampler = RealESRGANer(
            scale=netscale,
            model_path=model_path,
            model=model,
            tile=0,
            tile_pad=10,
            pre_pad=0,
            half=True)

In step 1 we cloned the git repo and ran the command !python setup.py develop. This is what allows us to import realesrgan as a python module. Using this module we can load the model using the RealESRGANer class.

The RRDBNet is a type of neural network used by Real-ESRGAN. The upsampler is an instance of the RealESRGANer class as mentioned above.

  1. RRDBNet
    • num_in_ch: Number of input channels, which is set to 3, indicating that it expects color images with Red, Green, and Blue (RGB) channels.
    • num_out_ch: Number of output channels, also set to 3, indicating that it will generate color high-resolution images.
    • num_feat: Number of feature maps or channels in the network. It's set to 64, which determines the network's capacity.
    • num_block: Number of residual blocks in the network. In this case, it's set to 23, indicating a deep network.
    • num_grow_ch: The growth rate of channels within the residual blocks, set to 32.
    • scale: The scaling factor for super-resolution. It's set to 4, which means the network is designed to upscale images by a factor of 4.
  1. upsampler
    • scale: This parameter defines the scaling factor by which the low-resolution input images will be upscaled. In this case, it's set to 4, meaning the input images will be scaled up by a factor of 4.
    • model_path: This is the path to the pre-trained Real-ESRGAN model weights, which are loaded to perform the upscaling. The weights are typically learned during the training phase.
    • model: The RRDBNet instance, which is the neural network responsible for the upscaling.
    • tile, tile_pad, and pre_pad: These parameters can relate to how the upscaling is applied in a tiled manner, which can be useful for processing large images in smaller pieces to save memory. tile determines the tile size, tile_pad specifies the padding around each tile, and pre_pad sets the padding before processing.

Step 4: Run the model

image = "/content/weird_llama.jpeg"
scale = 4

img = Image.open(image)
img = np.array(img)
output, _ = upsampler.enhance(img, outscale=scale)

output_image = Image.fromarray(output)
# Save the image
output_image.save('/content/output.jpeg')

The input to the model is an image path, in this case, I have an image of a llama I generated using stable diffusion at the path /content/weird_llama.jpeg. Similarly, the output of the image is stored at /content/output.jpeg.

Here are the results I got for the weird_llama image:

Here is another example:
💡
For some images, it might be difficult to see the difference. The difference becomes a lot more apparent once you zoom into the images.

Conclusion

In this post, my goal was to show how a deep learning model can be used to improve image quality by increasing the resolution. Real-ESRGAN isn't the only model that is capable of doing this task. Various other models can be used for image upscaling as well. However, this model has been battle-tested on real-world images and has displayed excellent performance in these areas.

With the rise of generative AI, the current models we have are just the beginning. In the future, models will continue to improve and give us the ability to restore and enhance all images with perfection. Generative image models like Stable Diffusion and Midjourney also provide us with a plethora of synthetic data which can help expedite training for models like GANs. I suspect we will see big improvements in this area in the next few years.

I hope you enjoyed this article and if you'd like to read more articles like this, please consider subscribing 😀