Image Super-Resolution Using Dilated Window Transformer

Cited by: 4
Authors
Park, Soobin [1 ]
Choi, Yong Suk [2 ]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
Source
IEEE ACCESS | 2023, Vol. 11, pp. 60028-60039
Funding
National Research Foundation, Singapore;
Keywords
Image super-resolution; self-attention mechanism; transformer; window-based self-attention;
DOI
10.1109/ACCESS.2023.3284539
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Transformer-based networks using attention mechanisms have shown promising results in low-level vision tasks, such as image super-resolution (SR). Specifically, recent studies that utilize window-based self-attention mechanisms have exhibited notable advancements in image SR. However, window-based self-attention results in a slower expansion of the receptive field, thereby restricting the modeling of long-range dependencies. To address this issue, we introduce a novel dilated window transformer, DWT, which utilizes a dilation strategy. We employ a simple yet efficient dilation strategy that enlarges the window by inserting intervals between the tokens of each window, enabling rapid and effective expansion of the receptive field. In particular, the interval between tokens widens as the layers go deeper. This strategy extracts local features in the shallow layers, where neighboring tokens interact, and efficiently extracts global features in the deep layers, where both adjacent and distant tokens interact. We conduct extensive experiments on five benchmark datasets to demonstrate the superior performance of our proposed method. Our DWT surpasses state-of-the-art networks of similar size by a PSNR margin of 0.11 dB to 0.27 dB on the Urban100 dataset. Moreover, even when compared to a state-of-the-art network with about 1.4 times more parameters, DWT achieves competitive results in both quantitative and visual comparisons.
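The dilation strategy described in the abstract can be pictured as sampling every d-th token when forming each attention window, so a fixed-size window covers a wider spatial area as the dilation grows with depth. Below is a minimal, hypothetical PyTorch sketch of such a dilated window partition; it is not the authors' released code, and the function name, tensor layout, and divisibility assumption are ours.

```python
# A minimal sketch (not the authors' code) of dilated window partitioning for
# window-based self-attention, where the dilation rate grows in deeper layers.
import torch

def dilated_window_partition(x, window_size, dilation):
    """Split a feature map into dilated windows of window_size x window_size tokens.

    x: (B, H, W, C) feature map; assumes H and W are divisible by
       window_size * dilation.
    Returns: (num_windows * B, window_size, window_size, C)
    """
    B, H, W, C = x.shape
    d, w = dilation, window_size
    # Decompose each spatial axis so that tokens inside one window are spaced
    # `dilation` apart in the original feature map (interval insertion).
    x = x.view(B, H // (w * d), w, d, W // (w * d), w, d, C)
    x = x.permute(0, 1, 3, 4, 6, 2, 5, 7).contiguous()
    return x.view(-1, w, w, C)

# Example: a deeper layer uses a larger dilation, so the same 8x8 window of
# tokens spans a wider spatial extent (faster receptive-field growth).
feat = torch.randn(1, 64, 64, 96)                                  # (B, H, W, C)
shallow = dilated_window_partition(feat, window_size=8, dilation=1)
deep = dilated_window_partition(feat, window_size=8, dilation=2)
print(shallow.shape, deep.shape)                                    # both (64, 8, 8, 96)
```

Self-attention would then be computed within each returned window exactly as in standard window-based attention; only the token sampling pattern changes with the dilation rate.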
Pages: 60028-60039
Number of pages: 12