Local feature enhancement transformer for image super-resolution

Cited by: 0
Authors
Huang, Weijie [1 ]
Huang, Detian [2 ]
Affiliations
[1] Huaqiao Univ, Sch Business, Quanzhou 362021, Fujian, Peoples R China
[2] Huaqiao Univ, Coll Engn, Quanzhou 362021, Fujian, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image super-resolution; Transformer; Global context information; Local feature interaction; Attention mechanism; NETWORK;
DOI
10.1038/s41598-025-07650-x
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biosciences]; N [General Natural Sciences];
Subject Classification Codes
07 ; 0710 ; 09 ;
Abstract
Transformers have demonstrated remarkable success in image super-resolution (SR) owing to their powerful long-range dependency modeling capability. Although increasing the sliding window size of transformer-based models (e.g., SwinIR) can improve SR performance, it weakens the learning of fine-level local features, resulting in blurry details in the reconstructed images. To address this limitation, we propose a local feature enhancement transformer for image super-resolution (LFESR) that benefits from global feature capture while enhancing local feature interaction. The basis of our LFESR is the local feature enhancement transformer (LFET), which achieves a balance between spatial processing and channel configuration in self-attention. Our LFET contains neighborhood self-attention (NSA) and a ghost head, both of which can be easily applied to existing SR networks based on window self-attention. First, NSA utilizes the Hadamard operation to implement a third-order mapping that enhances local interaction, thus providing clues for high-quality image reconstruction. Next, the novel ghost head combines attention maps with static matrices to increase the channel capacity, thereby enhancing the inference capability of local features. Finally, ConvFFN is incorporated to further strengthen high-frequency detail information in the reconstructed images. Extensive experiments validate the proposed LFESR, which significantly outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics. In particular, LFESR exceeds SwinIR by 0.49 dB and 0.52 dB in PSNR at a scaling factor of ×4 on the Urban100 and Manga109 datasets, respectively.
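The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is only a rough interpretation of the description, not the paper's actual formulation: the shapes, the placement of the Hadamard product, and the way the static matrix is mixed into the attention map are all assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 8            # n tokens in a local window, d channels per head

q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))

# Standard (second-order) window self-attention with a stable softmax.
attn = q @ k.T / np.sqrt(d)
attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

# Third-order local interaction via a Hadamard (element-wise) product:
# the output now depends multiplicatively on q as well as on (attn @ v).
third_order = (attn @ v) * q

# "Ghost head" idea: mix the dynamic attention map with a static matrix
# (standing in for a learned parameter) to cheaply widen head capacity.
static = rng.standard_normal((n, n)) * 0.01
ghost_out = (attn + static) @ v

out = third_order + ghost_out
print(out.shape)        # per-window output keeps the (n, d) token layout
```

The point of the sketch is that both additions reuse the already-computed attention map, so they add local, higher-order interactions at small extra cost rather than enlarging the window itself.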
Pages: 15