Image Super-Resolution Using Dilated Window Transformer

Cited by: 4
Authors
Park, Soobin [1 ]
Choi, Yong Suk [2 ]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
Source
IEEE ACCESS | 2023, Vol. 11, pp. 60028-60039
Funding
National Research Foundation, Singapore;
Keywords
Image super-resolution; self-attention mechanism; transformer; window-based self-attention;
DOI
10.1109/ACCESS.2023.3284539
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Transformer-based networks using attention mechanisms have shown promising results in low-level vision tasks, such as image super-resolution (SR). Specifically, recent studies that utilize window-based self-attention mechanisms have exhibited notable advancements in image SR. However, window-based self-attention results in a slower expansion of the receptive field, thereby restricting the modeling of long-range dependencies. To address this issue, we introduce a novel dilated window transformer, DWT, which utilizes a dilation strategy. We employ a simple yet efficient dilation strategy that enlarges the window by inserting intervals between the tokens of each window, enabling rapid and effective expansion of the receptive field. In particular, the interval between tokens widens as the layers go deeper. This strategy extracts local features in the shallow layers, where neighboring tokens interact, and efficiently extracts global features in the deep layers, where both adjacent and distant tokens interact. We conduct extensive experiments on five benchmark datasets to demonstrate the superior performance of our proposed method. Our DWT surpasses state-of-the-art networks of similar size by a PSNR margin of 0.11 dB to 0.27 dB on the Urban100 dataset. Moreover, even when compared to a state-of-the-art network with about 1.4 times more parameters, DWT achieves competitive results in both quantitative and visual comparisons.
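The dilation strategy described in the abstract can be pictured as sampling every d-th token when forming each attention window, so a fixed-size window covers a wider spatial area as the dilation grows with depth. Below is a minimal, hypothetical PyTorch sketch of such a dilated window partition; it is not the authors' released code, and the function name, tensor layout, and divisibility assumption are ours.

```python
# A minimal sketch (not the authors' code) of dilated window partitioning for
# window-based self-attention, where the dilation rate grows in deeper layers.
import torch

def dilated_window_partition(x, window_size, dilation):
    """Split a feature map into dilated windows of window_size x window_size tokens.

    x: (B, H, W, C) feature map; assumes H and W are divisible by
       window_size * dilation.
    Returns: (num_windows * B, window_size, window_size, C)
    """
    B, H, W, C = x.shape
    d, w = dilation, window_size
    # Decompose each spatial axis so that tokens inside one window are spaced
    # `dilation` apart in the original feature map (interval insertion).
    x = x.view(B, H // (w * d), w, d, W // (w * d), w, d, C)
    x = x.permute(0, 1, 3, 4, 6, 2, 5, 7).contiguous()
    return x.view(-1, w, w, C)

# Example: a deeper layer uses a larger dilation, so the same 8x8 window of
# tokens spans a wider spatial extent (faster receptive-field growth).
feat = torch.randn(1, 64, 64, 96)                                  # (B, H, W, C)
shallow = dilated_window_partition(feat, window_size=8, dilation=1)
deep = dilated_window_partition(feat, window_size=8, dilation=2)
print(shallow.shape, deep.shape)                                    # both (64, 8, 8, 96)
```

Self-attention would then be computed within each returned window exactly as in standard window-based attention; only the token sampling pattern changes with the dilation rate.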
Pages: 60028-60039
Number of pages: 12