Spatial and frequency information fusion transformer for image super-resolution

Cited by: 1
Authors
Zhang, Yan [1]; Xu, Fujie [1]; Sun, Yemei [1]; Wang, Jiao [1]
Affiliations
[1] Tianjin Chengjian University, College of Computer and Information Engineering, Tianjin 300384, People's Republic of China
Keywords
Super resolution; Vision transformer; Frequency components; Convolutional neural network
DOI
10.1016/j.neunet.2025.107351
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Previous works have shown that Transformer-based models deliver impressive reconstruction performance in single image super-resolution (SISR). However, existing Transformer-based approaches compute self-attention within non-overlapping windows. This restriction hinders the network's ability to exploit large receptive fields, which are essential for capturing global information and establishing long-distance dependencies, especially in the early layers. To fully leverage global information and activate more pixels during image reconstruction, we develop a Spatial and Frequency Information Fusion Transformer (SFFT) with an expansive receptive field. SFFT combines spatial- and frequency-domain information to exploit their complementary strengths, capturing both local and global image features while integrating low- and high-frequency information. Additionally, we employ an overlapping cross-attention block (OCAB) to facilitate pixel transmission between adjacent windows, further enhancing network performance. During training, we incorporate a Fast Fourier Transform (FFT) loss, which fully exercises the proposed modules and further taps the model's potential. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm surpasses state-of-the-art methods in accuracy. Specifically, our method achieves a PSNR of 32.67 dB on the Manga109 dataset, surpassing SwinIR by 0.64 dB and HAT by 0.19 dB. The source code and pre-trained models are available at https://github.com/Xufujie/SFFT.
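To give a concrete sense of the frequency-domain supervision mentioned in the abstract, below is a minimal PyTorch-style sketch of an FFT loss that penalizes the L1 distance between the 2-D Fourier transforms of the super-resolved output and the ground-truth image. The class name FFTLoss, the loss weight of 0.05, the orthonormal FFT normalization, and the L1 criterion are illustrative assumptions; the paper's exact formulation and weighting may differ.

import torch
import torch.nn as nn

class FFTLoss(nn.Module):
    # Hypothetical frequency-domain loss: compares the 2-D FFTs of the
    # super-resolved (sr) and high-resolution (hr) images with an L1 penalty.
    # The 0.05 weight is an assumed value, not taken from the paper.
    def __init__(self, loss_weight=0.05):
        super().__init__()
        self.loss_weight = loss_weight
        self.criterion = nn.L1Loss()

    def forward(self, sr, hr):
        # torch.fft.fft2 returns complex tensors; stack real and imaginary
        # parts so the L1 criterion operates on real-valued tensors.
        sr_fft = torch.fft.fft2(sr, norm='ortho')
        hr_fft = torch.fft.fft2(hr, norm='ortho')
        sr_freq = torch.stack([sr_fft.real, sr_fft.imag], dim=-1)
        hr_freq = torch.stack([hr_fft.real, hr_fft.imag], dim=-1)
        return self.loss_weight * self.criterion(sr_freq, hr_freq)

In a typical training setup such a term would be added to a pixel-wise loss, e.g. total_loss = l1_loss + FFTLoss()(sr, hr), so that the network is supervised in both the spatial and the frequency domain.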
Pages: 12
Related papers
68 in total
[1] Bevilacqua, Marco; Roumy, Aline; Guillemot, Christine; Morel, Marie-Line Alberi. Low-Complexity Single-Image Super-Resolution Based on Nonnegative Neighbor Embedding. Proceedings of the British Machine Vision Conference (BMVC), 2012.
[2] Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. End-to-End Object Detection with Transformers. Computer Vision - ECCV 2020, Part I, 2020, 12346: 213-229.
[3] Chen, Hanting; Wang, Yunhe; Guo, Tianyu; Xu, Chang; Deng, Yiping; Liu, Zhenhua; Ma, Siwei; Xu, Chunjing; Xu, Chao; Gao, Wen. Pre-Trained Image Processing Transformer. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021: 12294-12305.
[4] Chen, Liangyu; Lu, Xin; Zhang, Jie; Chu, Xiaojie; Chen, Chengpeng. HINet: Half Instance Normalization Network for Image Restoration. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021: 182-192.
[5] Chen, Xiangyu; Wang, Xintao; Zhou, Jiantao; Qiao, Yu; Dong, Chao. Activating More Pixels in Image Super-Resolution Transformer. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 22367-22377.
[6] Chen, Yuantao; Xia, Runlong; Yang, Kai; Zou, Ke. MICU: Image Super-Resolution via Multi-Level Information Compensation and U-Net. Expert Systems with Applications, 2024, 245.
[7] Chen, Zheng; Zhang, Yulun; Gu, Jinjin; Kong, Linghe; Yang, Xiaokang; Yu, Fisher. Dual Aggregation Transformer for Image Super-Resolution. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 12278-12287.
[8] Chen, Z. arXiv preprint, 2023, arXiv:2211.13654.
[9] Chi, L. Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020), 2020, 33: 4479. DOI: 10.5555/3495724.3496100.
[10] Chu, X. X. Advances in Neural Information Processing Systems (NeurIPS), 2021.