Image Super-Resolution Using Dilated Window Transformer

Bibliographic Details
Main Authors: Soobin Park, Yong Suk Choi
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Image super-resolution; self-attention mechanism; transformer; window-based self-attention
Online Access: https://ieeexplore.ieee.org/document/10147198/
_version_ 1827915976680669184
author Soobin Park
Yong Suk Choi
author_facet Soobin Park
Yong Suk Choi
author_sort Soobin Park
collection DOAJ
description Transformer-based networks using attention mechanisms have shown promising results in low-level vision tasks, such as image super-resolution (SR). Specifically, recent studies that utilize window-based self-attention mechanisms have exhibited notable advancements in image SR. However, window-based self-attention results in a slower expansion of the receptive field, thereby restricting the modeling of long-range dependencies. To address this issue, we introduce a novel dilated window transformer, namely DWT, which utilizes a dilation strategy. We employ a simple yet efficient dilation strategy that enlarges the window by inserting intervals between the tokens of each window, enabling rapid and effective expansion of the receptive field. In particular, we widen the interval between the tokens as the layers go deeper. This strategy enables the extraction of local features by allowing interaction between neighboring tokens in the shallow layers, while also facilitating efficient extraction of global features by enabling interaction between not only adjacent but also distant tokens in the deep layers. We conduct extensive experiments on five benchmark datasets to demonstrate the superior performance of our proposed method. Our DWT surpasses the state-of-the-art network of similar size by a PSNR margin of 0.11 dB to 0.27 dB on the Urban100 dataset. Moreover, even when compared to a state-of-the-art network with about 1.4 times more parameters, DWT achieves competitive results in both quantitative and visual comparisons.
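
The record contains no code; purely as an illustrative sketch of the dilation idea described in the abstract, the following PyTorch-style function partitions a feature map into windows whose tokens are sampled with a fixed spacing, so the same number of tokens per window covers a larger spatial extent. The function name, the (B, H, W, C) tensor layout, and the window_size/dilation parameters are assumptions for illustration only; the authors' exact partitioning and per-layer dilation schedule may differ.

```python
import torch

def dilated_window_partition(x: torch.Tensor, window_size: int, dilation: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into dilated windows.

    Each window still contains window_size * window_size tokens, but the
    tokens are sampled `dilation` pixels apart, so the spatial extent one
    window covers (and hence the receptive field of window attention)
    grows with the dilation rate.
    """
    B, H, W, C = x.shape
    span = window_size * dilation  # spatial extent covered by one dilated window
    assert H % span == 0 and W % span == 0, "H and W must be divisible by window_size * dilation"

    # Split each spatial axis into (num_windows, window_size, dilation) so that
    # tokens sharing the same "phase" (last factor) and differing in the middle
    # factor are exactly `dilation` pixels apart.
    x = x.view(B, H // span, window_size, dilation, W // span, window_size, dilation, C)
    # Put (batch, window indices, phases) in front and the window's tokens last.
    x = x.permute(0, 1, 4, 3, 6, 2, 5, 7).contiguous()
    # -> (num_windows_total, tokens_per_window, C); attention would run per window.
    return x.view(-1, window_size * window_size, C)

# Example: a 64x64 feature map with 8x8 windows. With dilation=1 each window spans
# 8x8 pixels (local interaction); with dilation=4 the same 64 tokens span 32x32
# pixels, mimicking the wider spacing the abstract describes for deeper layers.
feat = torch.randn(1, 64, 64, 96)
local_windows = dilated_window_partition(feat, window_size=8, dilation=1)    # (64, 64, 96)
dilated_windows = dilated_window_partition(feat, window_size=8, dilation=4)  # (64, 64, 96)
```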
first_indexed 2024-03-13T03:07:17Z
format Article
id doaj.art-ec181c10028b4ef287566c3c96f79245
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T03:07:17Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-ec181c10028b4ef287566c3c96f79245 (2023-06-26T23:00:14Z) | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2023-01-01 | vol. 11, pp. 60028-60039 | DOI 10.1109/ACCESS.2023.3284539 | IEEE document 10147198 | Image Super-Resolution Using Dilated Window Transformer | Soobin Park (https://orcid.org/0000-0002-5373-2219), Department of Artificial Intelligence, Hanyang University, Seoul, South Korea | Yong Suk Choi (https://orcid.org/0000-0002-9042-0599), Department of Computer Science, Hanyang University, Seoul, South Korea | abstract as in the description field above | https://ieeexplore.ieee.org/document/10147198/ | keywords: Image super-resolution; self-attention mechanism; transformer; window-based self-attention
spellingShingle Soobin Park
Yong Suk Choi
Image Super-Resolution Using Dilated Window Transformer
IEEE Access
Image super-resolution
self-attention mechanism
transformer
window-based self-attention
title Image Super-Resolution Using Dilated Window Transformer
title_full Image Super-Resolution Using Dilated Window Transformer
title_fullStr Image Super-Resolution Using Dilated Window Transformer
title_full_unstemmed Image Super-Resolution Using Dilated Window Transformer
title_short Image Super-Resolution Using Dilated Window Transformer
title_sort image super resolution using dilated window transformer
topic Image super-resolution
self-attention mechanism
transformer
window-based self-attention
url https://ieeexplore.ieee.org/document/10147198/
work_keys_str_mv AT soobinpark imagesuperresolutionusingdilatedwindowtransformer
AT yongsukchoi imagesuperresolutionusingdilatedwindowtransformer