Image Super-Resolution Using Dilated Window Transformer

Bibliographic Details
Main Authors: Soobin Park, Yong Suk Choi
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Image super-resolution; self-attention mechanism; transformer; window-based self-attention
Online Access: https://ieeexplore.ieee.org/document/10147198/
_version_ 1827915976680669184
author Soobin Park
Yong Suk Choi
author_facet Soobin Park
Yong Suk Choi
author_sort Soobin Park
collection DOAJ
description Transformer-based networks using attention mechanisms have shown promising results in low-level vision tasks, such as image super-resolution (SR). Specifically, recent studies that utilize window-based self-attention mechanisms have exhibited notable advancements in image SR. However, window-based self-attention results in a slower expansion of the receptive field, thereby restricting the modeling of long-range dependencies. To address this issue, we introduce a novel dilated window transformer, namely DWT, which utilizes a dilation strategy. We employ a simple yet efficient dilation strategy that enlarges the window by inserting intervals between the tokens of each window, enabling rapid and effective expansion of the receptive field. In particular, we widen the interval between the tokens as the layers go deeper. This strategy enables the extraction of local features by allowing interaction between neighboring tokens in the shallow layers, while also facilitating efficient extraction of global features by enabling interaction between not only adjacent but also distant tokens in the deep layers. We conduct extensive experiments on five benchmark datasets to demonstrate the superior performance of our proposed method. Our DWT surpasses the state-of-the-art network of similar size by a PSNR margin of 0.11 dB to 0.27 dB on the Urban100 dataset. Moreover, even when compared to a state-of-the-art network with about 1.4 times more parameters, DWT achieves competitive results in both quantitative and visual comparisons.
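
The record contains no code; purely as an illustrative sketch of the dilation idea described in the abstract, the following PyTorch-style function partitions a feature map into windows whose tokens are sampled with a fixed spacing, so the same number of tokens per window covers a larger spatial extent. The function name, the (B, H, W, C) tensor layout, and the window_size/dilation parameters are assumptions for illustration only; the authors' exact partitioning and per-layer dilation schedule may differ.

```python
import torch

def dilated_window_partition(x: torch.Tensor, window_size: int, dilation: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into dilated windows.

    Each window still contains window_size * window_size tokens, but the
    tokens are sampled `dilation` pixels apart, so the spatial extent one
    window covers (and hence the receptive field of window attention)
    grows with the dilation rate.
    """
    B, H, W, C = x.shape
    span = window_size * dilation  # spatial extent covered by one dilated window
    assert H % span == 0 and W % span == 0, "H and W must be divisible by window_size * dilation"

    # Split each spatial axis into (num_windows, window_size, dilation) so that
    # tokens sharing the same "phase" (last factor) and differing in the middle
    # factor are exactly `dilation` pixels apart.
    x = x.view(B, H // span, window_size, dilation, W // span, window_size, dilation, C)
    # Put (batch, window indices, phases) in front and the window's tokens last.
    x = x.permute(0, 1, 4, 3, 6, 2, 5, 7).contiguous()
    # -> (num_windows_total, tokens_per_window, C); attention would run per window.
    return x.view(-1, window_size * window_size, C)

# Example: a 64x64 feature map with 8x8 windows. With dilation=1 each window spans
# 8x8 pixels (local interaction); with dilation=4 the same 64 tokens span 32x32
# pixels, mimicking the wider spacing the abstract describes for deeper layers.
feat = torch.randn(1, 64, 64, 96)
local_windows = dilated_window_partition(feat, window_size=8, dilation=1)    # (64, 64, 96)
dilated_windows = dilated_window_partition(feat, window_size=8, dilation=4)  # (64, 64, 96)
```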
first_indexed 2024-03-13T03:07:17Z
format Article
id doaj.art-ec181c10028b4ef287566c3c96f79245
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T03:07:17Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-ec181c10028b4ef287566c3c96f79245 (2023-06-26T23:00:14Z) | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2023-01-01 | vol. 11, pp. 60028-60039 | DOI 10.1109/ACCESS.2023.3284539 | IEEE document 10147198 | Image Super-Resolution Using Dilated Window Transformer | Soobin Park (https://orcid.org/0000-0002-5373-2219), Department of Artificial Intelligence, Hanyang University, Seoul, South Korea | Yong Suk Choi (https://orcid.org/0000-0002-9042-0599), Department of Computer Science, Hanyang University, Seoul, South Korea | abstract as in the description field above | https://ieeexplore.ieee.org/document/10147198/ | keywords: Image super-resolution; self-attention mechanism; transformer; window-based self-attention
spellingShingle Soobin Park
Yong Suk Choi
Image Super-Resolution Using Dilated Window Transformer
IEEE Access
Image super-resolution
self-attention mechanism
transformer
window-based self-attention
title Image Super-Resolution Using Dilated Window Transformer
title_full Image Super-Resolution Using Dilated Window Transformer
title_fullStr Image Super-Resolution Using Dilated Window Transformer
title_full_unstemmed Image Super-Resolution Using Dilated Window Transformer
title_short Image Super-Resolution Using Dilated Window Transformer
title_sort image super resolution using dilated window transformer
topic Image super-resolution
self-attention mechanism
transformer
window-based self-attention
url https://ieeexplore.ieee.org/document/10147198/
work_keys_str_mv AT soobinpark imagesuperresolutionusingdilatedwindowtransformer
AT yongsukchoi imagesuperresolutionusingdilatedwindowtransformer