Efficient frequency feature aggregation transformer for image super-resolution

Cited by: 0
Authors
Song, Jianwen [1, 2]
Sowmya, Arcot [1]
Sun, Changming [1, 2]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[2] CSIRO Data61, Epping, NSW 1710, Australia
Keywords
Image super-resolution; Frequency aggregation; Transformer; Feature extraction; ATTENTION; MODULE;
DOI
10.1016/j.patcog.2025.111735
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Although vision transformers have shown remarkable performance in image super-resolution tasks, the key component, i.e., the self-attention mechanism, suffers from insufficient high-frequency information extraction capability and high computational costs, hindering further advancement. To address these limitations, we propose an efficient frequency feature aggregation transformer for single image super-resolution (EFATSR). Specifically, a frequency self-attention aggregation block is proposed to enhance the extraction of high-frequency information. This block incorporates a frequency spatial feature aggregation branch to supplement high-frequency feature extraction for a self-attention branch, enabling the model to capture high-frequency information more effectively. Additionally, a frequency channel-spatial aggregation block is proposed to extract channel and spatial features in the frequency domain, enhancing the efficiency of deep feature extraction. Extensive experiments on single image super-resolution demonstrate that EFATSR achieves state-of-the-art performance while maintaining low computational complexity. Furthermore, we extend EFATSR for stereo image super-resolution by incorporating a multi-head parallax-attention block, forming EFATSSR, which also shows remarkable performance and high efficiency. Source code is available at https://github.com/jianwensong/EFATSR.
Pages: 12
Related papers
57 in total
[21]   Adaptive Frequency Filters As Efficient Global Token Mixers [J].
Huang, Zhipeng ;
Zhang, Zhizheng ;
Lan, Cuiling ;
Zha, Zheng-Jun ;
Lu, Yan ;
Guo, Baining .
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, :6026-6036
[22]   Lightweight Image Super-Resolution with Information Multi-distillation Network [J].
Hui, Zheng ;
Gao, Xinbo ;
Yang, Yunchu ;
Wang, Xiumei .
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, :2024-2032
[23]   Hierarchical dense recursive network for image super-resolution [J].
Jiang, Kui ;
Wang, Zhongyuan ;
Yi, Peng ;
Jiang, Junjun .
PATTERN RECOGNITION, 2020, 107
[24]   Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring [J].
Kong, Lingshun ;
Dong, Jiangxin ;
Ge, Jianjun ;
Li, Mingqiang ;
Pan, Jinshan .
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, :5886-5895
[25]  
Li JM, 2023, AAAI CONF ARTIF INTE, P1343
[26]  
Li Wenbo, 2020, Advances in Neural Information Processing Systems, V33
[27]   DLGSANet: Lightweight Dynamic Local and Global Self-Attention Network for Image Super-Resolution [J].
Li, Xiang ;
Dong, Jiangxin ;
Tang, Jinhui ;
Pan, Jinshan .
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, :12746-12755
[28]   SwinIR: Image Restoration Using Swin Transformer [J].
Liang, Jingyun ;
Cao, Jiezhang ;
Sun, Guolei ;
Zhang, Kai ;
Van Gool, Luc ;
Timofte, Radu .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, :1833-1844
[29]   Enhanced Deep Residual Networks for Single Image Super-Resolution [J].
Lim, Bee ;
Son, Sanghyun ;
Kim, Heewon ;
Nah, Seungjun ;
Lee, Kyoung Mu .
2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, :1132-1140
[30]   Steformer: Efficient Stereo Image Super-Resolution With Transformer [J].
Lin, Jianxin ;
Yin, Lianying ;
Wang, Yijun .
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 :8396-8407