Iterative token evaluation and refinement for real-world super-resolution

Real-world image super-resolution (RWSR) is a longstanding problem as low-quality (LQ) images often have complex and unidentified degradations. Existing methods such as Generative Adversarial Networks (GANs) or continuous diffusion models present their own issues including GANs being difficult to tr...

Bibliographic Details
Main Authors:	Chen, Chaofeng; Zhou, Shangchen; Liao, Liang; Wu, Haoning; Sun, Wenxiu; Yan, Qiong; Lin, Weisi
Other Authors:	College of Computing and Data Science
Format:	Conference Paper
Language:	English
Published:	2024
Subjects:	Computer and Information Science; Computational photography; Image & video synthesis
Online Access:	https://hdl.handle.net/10356/178460 https://ojs.aaai.org/index.php/AAAI/article/view/27861

description	Real-world image super-resolution (RWSR) is a longstanding problem because low-quality (LQ) images often have complex and unidentified degradations. Existing methods such as Generative Adversarial Networks (GANs) and continuous diffusion models have their own issues: GANs are difficult to train, while continuous diffusion models require numerous inference steps. In this paper, we propose an Iterative Token Evaluation and Refinement (ITER) framework for RWSR, which utilizes a discrete diffusion model operating in the discrete token representation space, i.e., indexes of features extracted from a VQGAN codebook pre-trained with high-quality (HQ) images. We show that ITER is easier to train than GANs and more efficient than continuous diffusion models. Specifically, we divide RWSR into two sub-tasks, i.e., distortion removal and texture generation. Distortion removal involves simple HQ token prediction from LQ images, while texture generation uses a discrete diffusion model to iteratively refine the distortion removal output with a token refinement network. In particular, we propose to include a token evaluation network in the discrete diffusion process. It learns to evaluate which tokens are good restorations and helps to improve the iterative refinement results. Moreover, the evaluation network can first check the status of the distortion removal output and then adaptively select the total number of refinement steps needed, thereby maintaining a good balance between distortion removal and texture generation. Extensive experimental results show that ITER is easy to train and performs well within just 8 iterative steps.
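The evaluate-then-refine loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the 0.5 acceptance threshold, and the rule that sets the step count from the fraction of rejected tokens are all assumptions made for illustration, and the two "networks" are simple stand-in functions over integer token indices.

```python
import math
import random
from typing import Callable, List, Tuple

Tokens = List[int]  # indices into a (hypothetical) VQGAN codebook


def iterative_refine(
    init_tokens: Tokens,
    evaluate: Callable[[Tokens], List[float]],
    refine: Callable[[Tokens, List[bool]], Tokens],
    max_steps: int = 8,
    threshold: float = 0.5,
) -> Tuple[Tokens, int]:
    """Refine tokens the evaluator rejects; return (tokens, planned steps)."""
    tokens = list(init_tokens)
    if not tokens:
        return tokens, 0
    scores = evaluate(tokens)
    # Assumed heuristic: budget steps in proportion to the rejected fraction.
    bad_fraction = sum(s < threshold for s in scores) / len(scores)
    steps = math.ceil(bad_fraction * max_steps)
    for _ in range(steps):
        mask = [s < threshold for s in scores]  # True = needs refinement
        if not any(mask):
            break  # every token is accepted, stop early
        tokens = refine(tokens, mask)
        scores = evaluate(tokens)
    return tokens, steps


# Toy stand-ins for the evaluation and refinement networks (not trained models).
CODEBOOK_SIZE = 4
TARGET = [0, 1, 2, 3, 0, 1, 2, 3]
rng = random.Random(0)


def toy_evaluate(tokens: Tokens) -> List[float]:
    # Scores a token 1.0 when it matches the known target, else 0.0.
    return [1.0 if t == g else 0.0 for t, g in zip(tokens, TARGET)]


def toy_refine(tokens: Tokens, mask: List[bool]) -> Tokens:
    # Resamples only the masked positions; accepted tokens are kept as-is.
    return [rng.randrange(CODEBOOK_SIZE) if m else t for t, m in zip(tokens, mask)]


coarse = [0, 1, 0, 0, 0, 1, 2, 2]  # pretend distortion-removal output
refined, steps = iterative_refine(coarse, toy_evaluate, toy_refine)
```

In this sketch, a distortion-removal output that the evaluator already accepts in full gets zero refinement steps, while a poorer one gets up to `max_steps`, which mirrors the adaptive step selection the abstract describes only in spirit.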
institution	Nanyang Technological University
record	ntu-10356/178460 (2024-06-21T02:04:14Z)
conference	38th AAAI Conference on Artificial Intelligence (2024)
affiliations	College of Computing and Data Science; School of Computer Science and Engineering; S-Lab
funding	This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
citation	Chen, C., Zhou, S., Liao, L., Wu, H., Sun, W., Yan, Q. & Lin, W. (2024). Iterative token evaluation and refinement for real-world super-resolution. 38th AAAI Conference on Artificial Intelligence (2024), 38, 1010-1018. https://dx.doi.org/10.1609/aaai.v38i2.27861
identifiers	10.1609/aaai.v38i2.27861; 2-s2.0-85189536364
rights	© 2024 Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Similar Items