Exploiting diffusion prior for real-world image super-resolution
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution. Specifically, by employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost.
Main Authors: | Wang, Jianyi; Yue, Zongsheng; Zhou, Shangchen; Chan, Kelvin C. K.; Loy, Chen Change |
---|---|
Other Authors: | College of Computing and Data Science; S-Lab |
Format: | Journal Article |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science; Image restoration; Diffusion models |
Online Access: | https://hdl.handle.net/10356/180685 |
author | Wang, Jianyi; Yue, Zongsheng; Zhou, Shangchen; Chan, Kelvin C. K.; Loy, Chen Change |
author2 | College of Computing and Data Science |
collection | NTU |
description | We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution. Specifically, by employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity caused by the inherent stochasticity of diffusion models, we employ a controllable feature wrapping module that allows users to balance quality and fidelity by simply adjusting a scalar value during the inference process. Moreover, we develop a progressive aggregation sampling strategy to overcome the fixed-size constraints of pre-trained diffusion models, enabling adaptation to resolutions of any size. A comprehensive evaluation of our method using both synthetic and real-world benchmarks demonstrates its superiority over current state-of-the-art approaches. Code and models are available at https://github.com/IceClear/StableSR. |
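The abstract above describes a controllable feature wrapping module that blends features from the restoration encoder into the frozen diffusion decoder, with a single scalar adjusted at inference time to trade quality against fidelity. The snippet below is a minimal PyTorch sketch of that idea only; the module name `FeatureWrap`, the layer choices, and the residual blending rule are illustrative assumptions rather than the authors' implementation, which is available in the StableSR repository linked in the description.

```python
import torch
import torch.nn as nn

class FeatureWrap(nn.Module):
    """Hypothetical feature-wrapping block: predicts a residual correction to the
    decoder features from the concatenated encoder and decoder features."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor, w: float) -> torch.Tensor:
        # w = 0 leaves the generative decoder features untouched (quality-oriented);
        # larger w injects more information from the LR-conditioned encoder (fidelity-oriented).
        residual = self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
        return dec_feat + w * residual

# Dummy usage: blend 64-channel feature maps with a user-chosen scalar at inference time.
block = FeatureWrap(channels=64)
enc = torch.randn(1, 64, 32, 32)  # features from the degraded-image encoder (assumed shape)
dec = torch.randn(1, 64, 32, 32)  # features from the frozen diffusion decoder (assumed shape)
out = block(enc, dec, w=0.5)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Because the blending happens only at the feature level, the same trained module can be evaluated with any value of w, which is what makes the trade-off adjustable without retraining.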
format | Journal Article |
id | ntu-10356/180685 |
institution | Nanyang Technological University |
language | English |
publishDate | 2024 |
record_format | dspace |
spelling | ntu-10356/180685; 2024-10-21T02:09:42Z
Title: Exploiting diffusion prior for real-world image super-resolution
Authors: Wang, Jianyi; Yue, Zongsheng; Zhou, Shangchen; Chan, Kelvin C. K.; Loy, Chen Change
Affiliations: College of Computing and Data Science; S-Lab
Subjects: Computer and Information Science; Image restoration; Diffusion models
Abstract: as given in the description field above.
Funder: National Research Foundation (NRF)
Funding: This study is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG2-PhD-2022-01-033[T]), RIE2020 Industry Alignment Fund Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
Acknowledgements: We sincerely thank Yi Li for providing valuable advice and building the WebUI implementation (https://github.com/pkuliyi2015/sd-webui-stablesr) of our work. We also thank the continuous interest and contributions from the community.
Record dates: 2024-10-21T02:09:42Z; published 2024
Type: Journal Article
Citation: Wang, J., Yue, Z., Zhou, S., Chan, K. C. K. & Loy, C. C. (2024). Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision. https://dx.doi.org/10.1007/s11263-024-02168-7
ISSN: 0920-5691
Handle: https://hdl.handle.net/10356/180685
DOI: 10.1007/s11263-024-02168-7
Scopus: 2-s2.0-85198058630
Language: en
Grants: RIE2020; AISG2-PhD-2022-01-033[T]
Journal: International Journal of Computer Vision
Rights: © 2024 The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. All rights reserved. |
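The progressive aggregation sampling strategy mentioned in the abstract addresses the fixed input size of the pre-trained diffusion model by processing overlapping crops of a larger image and merging them. The following is a simplified, self-contained sketch of Gaussian-weighted tile aggregation in that spirit; the tile size, stride, and per-tile function are placeholders for illustration, not the paper's exact procedure, which operates inside the diffusion sampling loop.

```python
import torch

def gaussian_weight(tile: int, sigma: float = 0.3) -> torch.Tensor:
    """2D Gaussian weight map that peaks at the tile centre and decays toward the borders."""
    coords = torch.linspace(-1.0, 1.0, tile)
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    return torch.outer(g, g)

def _positions(length: int, tile: int, stride: int):
    """Top-left offsets of overlapping tiles that fully cover a dimension of `length`."""
    pos = list(range(0, length - tile + 1, stride))
    if pos[-1] != length - tile:
        pos.append(length - tile)  # make sure the border is covered
    return pos

def tiled_apply(x: torch.Tensor, fn, tile: int = 64, stride: int = 48) -> torch.Tensor:
    """Apply `fn`, which only accepts tile x tile inputs, to overlapping crops of `x`
    and merge the outputs with Gaussian weights to avoid visible seams."""
    _, _, h, w = x.shape
    assert h >= tile and w >= tile, "input must be at least one tile in each dimension"
    out = torch.zeros_like(x)
    acc = torch.zeros(1, 1, h, w)
    weight = gaussian_weight(tile).view(1, 1, tile, tile)
    for y in _positions(h, tile, stride):
        for xo in _positions(w, tile, stride):
            crop = x[:, :, y:y + tile, xo:xo + tile]
            out[:, :, y:y + tile, xo:xo + tile] += fn(crop) * weight
            acc[:, :, y:y + tile, xo:xo + tile] += weight
    return out / acc

# Dummy usage: an identity "restorer" on a 1x4x128x160 latent that a fixed 64x64
# model could not process in a single pass.
latent = torch.randn(1, 4, 128, 160)
merged = tiled_apply(latent, fn=lambda t: t)
print(merged.shape)  # torch.Size([1, 4, 128, 160])
```

Gaussian weighting down-weights tile borders, so neighbouring crops dominate near their own centres; when per-tile outputs disagree slightly, the blend suppresses seam artifacts in the merged result.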
title | Exploiting diffusion prior for real-world image super-resolution |
topic | Computer and Information Science Image restoration Diffusion models |
url | https://hdl.handle.net/10356/180685 |