Image Super-Resolution Using Dilated Window Transformer
Transformer-based networks using attention mechanisms have shown promising results in low-level vision tasks, such as image super-resolution (SR). Specifically, recent studies that utilize window-based self-attention mechanisms have exhibited notable advancements in image SR. However, window-based s...
Main Authors: | Soobin Park, Yong Suk Choi |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Image super-resolution, self-attention mechanism, transformer, window-based self-attention |
Online Access: | https://ieeexplore.ieee.org/document/10147198/ |
_version_ | 1827915976680669184 |
---|---|
author | Soobin Park Yong Suk Choi |
author_facet | Soobin Park Yong Suk Choi |
author_sort | Soobin Park |
collection | DOAJ |
description | Transformer-based networks using attention mechanisms have shown promising results in low-level vision tasks, such as image super-resolution (SR). Specifically, recent studies that utilize window-based self-attention mechanisms have exhibited notable advancements in image SR. However, window-based self-attention results in a slower expansion of the receptive field, thereby restricting the modeling of long-range dependencies. To address this issue, we introduce a novel dilated window transformer, namely DWT, which utilizes a dilation strategy. We employ a simple yet efficient dilation strategy that enlarges the window by inserting intervals between the tokens of each window to enable rapid and effective expansion of the receptive field. In particular, we adjust the interval between the tokens to become wider as the layers go deeper. This strategy enables the extraction of local features by allowing interaction between neighboring tokens in the shallow layers, while also facilitating efficient extraction of global features by enabling interaction between not only adjacent but also distant tokens in the deep layers. We conduct extensive experiments on five benchmark datasets to demonstrate the superior performance of our proposed method. Our DWT surpasses state-of-the-art networks of similar size by a PSNR margin of 0.11 dB to 0.27 dB on the Urban100 dataset. Moreover, even when compared to a state-of-the-art network with about 1.4 times more parameters, DWT achieves competitive results in both quantitative and visual comparisons. (An illustrative sketch of the dilation strategy appears after this record.) |
first_indexed | 2024-03-13T03:07:17Z |
format | Article |
id | doaj.art-ec181c10028b4ef287566c3c96f79245 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-13T03:07:17Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-ec181c10028b4ef287566c3c96f79245; indexed 2023-06-26T23:00:14Z; eng; IEEE; IEEE Access; ISSN 2169-3536; published 2023-01-01; vol. 11, pp. 60028-60039; DOI 10.1109/ACCESS.2023.3284539; IEEE document 10147198; Image Super-Resolution Using Dilated Window Transformer; Soobin Park (https://orcid.org/0000-0002-5373-2219), Department of Artificial Intelligence, Hanyang University, Seoul, South Korea; Yong Suk Choi (https://orcid.org/0000-0002-9042-0599), Department of Computer Science, Hanyang University, Seoul, South Korea; abstract as in the description field; https://ieeexplore.ieee.org/document/10147198/; keywords: image super-resolution, self-attention mechanism, transformer, window-based self-attention |
spellingShingle | Soobin Park Yong Suk Choi Image Super-Resolution Using Dilated Window Transformer IEEE Access Image super-resolution self-attention mechanism transformer window-based self-attention |
title | Image Super-Resolution Using Dilated Window Transformer |
title_full | Image Super-Resolution Using Dilated Window Transformer |
title_fullStr | Image Super-Resolution Using Dilated Window Transformer |
title_full_unstemmed | Image Super-Resolution Using Dilated Window Transformer |
title_short | Image Super-Resolution Using Dilated Window Transformer |
title_sort | image super resolution using dilated window transformer |
topic | Image super-resolution self-attention mechanism transformer window-based self-attention |
url | https://ieeexplore.ieee.org/document/10147198/ |
work_keys_str_mv | AT soobinpark imagesuperresolutionusingdilatedwindowtransformer AT yongsukchoi imagesuperresolutionusingdilatedwindowtransformer |
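The description above outlines the core mechanism: each attention window is enlarged by inserting an interval (a dilation) between its tokens, and the interval widens as the layers go deeper, so shallow layers mix neighboring tokens while deep layers also mix distant ones. The sketch below is a minimal, hypothetical illustration of how such a dilated window partition could be implemented in PyTorch; the function name, argument names, and the exact sampling order are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of dilated window partitioning (assumed names, not the paper's code).
import torch

def dilated_window_partition(x, window_size, dilation):
    """Group tokens into windows whose members are `dilation` pixels apart.

    x: (B, H, W, C) feature map. With dilation=1 this reduces to ordinary
    Swin-style window partitioning; a larger dilation spreads each window
    over a window_size*dilation region, enlarging the receptive field.
    """
    B, H, W, C = x.shape
    step = window_size * dilation
    assert H % step == 0 and W % step == 0, "pad the input so H and W divide evenly"
    # Split each spatial axis into (blocks, window positions, dilation offsets)
    # so that tokens spaced `dilation` apart fall into the same window.
    x = x.view(B, H // step, window_size, dilation,
               W // step, window_size, dilation, C)
    x = x.permute(0, 1, 4, 3, 6, 2, 5, 7).contiguous()
    # (num_windows * B, window_size * window_size, C) token groups for attention.
    return x.view(-1, window_size * window_size, C)

# Example: deeper layers would simply pass a larger dilation (e.g. 1, 2, 4, ...).
tokens = dilated_window_partition(torch.randn(1, 32, 32, 96), window_size=8, dilation=2)
print(tokens.shape)  # torch.Size([16, 64, 96])
```

Self-attention would then be applied within each group of 64 tokens exactly as in standard window attention; only the spatial sampling of the windows changes with depth.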