A Multi-Scale Deep Back-Projection Backbone for Face Super-Resolution with Diffusion Models

Face verification and recognition are important tasks that have made great progress in recent years. However, recognizing low-resolution faces from small images is still a difficult problem. In this paper, we advocate using diffusion models (DMs) to enhance face resolution and improve their quality...


Bibliographic Details
Main Authors: Juhao Gao, Ni Tang, Dongxiao Zhang
Format: Article
Language: English
Published: MDPI AG 2023-07-01
Series: Applied Sciences
Subjects: super-resolution; diffusion models; U-Net; back-projection
Online Access: https://www.mdpi.com/2076-3417/13/14/8110
Description: Face verification and recognition are important tasks that have made great progress in recent years. However, recognizing low-resolution faces in small images remains a difficult problem. In this paper, we advocate using diffusion models (DMs) to enhance the resolution and quality of face images for various downstream applications. Most existing DMs for super-resolution use U-Net as their backbone network, which exploits multi-scale features only along the spatial dimension. As a result, these DMs converge slowly and fail to capture complex details and fine textures. To address this issue, we propose a novel conditional generative model based on DMs called BPSR3, which replaces the U-Net in super-resolution via repeated refinement (SR3) with a multi-scale deep back-projection network structure. BPSR3 can extract richer features not only in depth but also in breadth, which helps to refine image quality effectively at different scales. Experimental results on facial datasets show that BPSR3 significantly improves both convergence speed and reconstruction performance. BPSR3 has about 1/4 of the parameters of SR3 but achieves a 50.1% improvement in PSNR, a 19.8% improvement in SSIM, and a 15.4% reduction in FID. Our contribution lies in reducing time and space consumption while improving reconstruction results. In addition, we propose the idea of enhancing the performance of DMs by replacing the U-Net with a better network.
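The back-projection idea named in the abstract can be illustrated with a minimal sketch. This shows only the classical, non-learned form of iterative back-projection on a 1-D signal; it is not the BPSR3 network, whose projection units are learned and whose details are not given in this record. The function names, the block-average degradation, and the nearest-neighbour upsampling are all assumptions chosen for illustration.

```python
def downsample(signal, scale):
    """Average each block of `scale` samples (assumed stand-in for the degradation)."""
    return [sum(signal[i:i + scale]) / scale for i in range(0, len(signal), scale)]


def upsample(signal, scale):
    """Nearest-neighbour upsampling: repeat each sample `scale` times."""
    return [value for value in signal for _ in range(scale)]


def back_project(hr_estimate, lr_observed, scale, steps=1):
    """Refine a high-resolution estimate so that it agrees with the low-resolution input.

    Each step measures the error in low-resolution space, projects that
    error back up to high resolution, and adds it to the estimate.
    """
    hr = list(hr_estimate)
    for _ in range(steps):
        simulated_lr = downsample(hr, scale)
        residual = [obs - sim for obs, sim in zip(lr_observed, simulated_lr)]
        correction = upsample(residual, scale)
        hr = [h + c for h, c in zip(hr, correction)]
    return hr


lr = [1.0, 3.0]                   # observed low-resolution signal
initial = [0.0, 2.0, 5.0, 5.0]    # rough high-resolution guess
refined = back_project(initial, lr, scale=2)
print(refined)
print(downsample(refined, 2))
```

With this particular pair of operators, one correction step is enough for the refined estimate to reproduce the low-resolution input when downsampled. Learned back-projection networks replace both operators with trainable layers and stack several such up/down stages.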
Affiliation (all authors): School of Science, Jimei University, Xiamen 361021, China
ISSN: 2076-3417
Citation: Applied Sciences, vol. 13, no. 14, article 8110 (2023-07-01)
DOI: 10.3390/app13148110