Image Super-Resolution via Efficient Transformer Embedding Frequency Decomposition With Restart

Cited by: 1
Authors
Zuo, Yifan [1]
Yao, Wenhao [1]
Hu, Yuqi [1]
Fang, Yuming [1]
Liu, Wei [2]
Peng, Yuxin [3]
Affiliations
[1] Jiangxi Univ Finance & Econ, Sch Informat Management, Nanchang 330032, Jiangxi, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
[3] Peking Univ, Wangxuan Inst Comp Technol, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Frequency decomposition; vision transformer; image super-resolution; self-attention; octave convolution; NETWORK;
DOI
10.1109/TIP.2024.3444317
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, transformer-based backbones have shown superior performance over their convolutional counterparts in computer vision. Because global attention has quadratic complexity with respect to the number of tokens, low-level image processing typically adopts local attention, which has linear complexity. However, the resulting limited receptive field harms performance. In this paper, motivated by Octave convolution, we propose a transformer-based single image super-resolution (SISR) model that explicitly embeds dynamic frequency decomposition into the standard local transformer. All frequency components are continuously updated and re-assigned via intra-scale attention and inter-scale interaction, respectively. Specifically, attention at low resolution is sufficient for low-frequency features, which both enlarges the receptive field and reduces the complexity. Compared with the standard local transformer, the proposed FDRTran layer reduces both FLOPs and parameters. By contrast, Octave convolution reduces only the FLOPs of the standard convolution and leaves the number of parameters unchanged. In addition, a restart mechanism is applied after every few frequency updates: it first fuses the low- and high-frequency features and then decomposes them again. In this way, the features can be decomposed from multiple viewpoints by learnable parameters, which avoids the risk of early saturation of the frequency representation. Furthermore, built on the FDRTran layer with the restart mechanism, the proposed FDRNet is the first transformer backbone for SISR that explores the Octave design. Extensive experiments show that our model reaches state-of-the-art performance on 6 synthetic and real datasets. The code and the models are available at https://github.com/catnip1029/FDRNet.
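The complexity argument in the abstract can be illustrated with a small counting sketch. It is not taken from the FDRNet code: the function names, the 64 x 64 feature map, the window size of 8, and the 2x downsampling of the low-frequency branch (one octave, as in Octave convolution) are all assumptions chosen for illustration. The sketch counts query-key pairs only, as a rough proxy for attention cost.

```python
# Illustrative sketch only; names and sizes are assumptions, not FDRNet's API.

def attention_pair_count(height, width, window=None):
    """Count query-key pairs for attention over a height x width token grid.

    window=None models global attention (every token attends to every token).
    Otherwise each token attends to the window*window tokens of its own
    non-overlapping window, so the count is linear in the token number.
    """
    tokens = height * width
    if window is None:
        return tokens * tokens
    return tokens * window * window


def octave_branch_costs(height, width, window, scale=2):
    """Return (high, low) pair counts for a two-branch, octave-style split.

    The high-frequency branch stays at full resolution. The low-frequency
    branch is assumed to run at 1/scale resolution with the same window size.
    """
    high = attention_pair_count(height, width, window)
    low = attention_pair_count(height // scale, width // scale, window)
    return high, low


if __name__ == "__main__":
    h, w, win, scale = 64, 64, 8, 2
    print("global attention pairs:", attention_pair_count(h, w))
    print("local attention pairs: ", attention_pair_count(h, w, win))
    high, low = octave_branch_costs(h, w, win, scale)
    print("high-frequency branch: ", high)
    print("low-frequency branch:  ", low)
    # A window on the downsampled grid spans win * scale pixels per side
    # when measured on the full-resolution grid.
    print("low-branch window span in full-resolution pixels:", win * scale)
```

Under these assumptions, the low-resolution branch handles a quarter of the tokens while each of its windows covers twice the spatial extent per side, which matches the abstract's claim of a larger receptive field at lower cost. The parameter savings of the FDRTran layer depend on channel allocation and are not modeled here.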
Pages: 4670-4685
Page count: 16