Enriched CNN-Transformer Feature Aggregation Networks for Super-Resolution

Cited by: 44
Authors
Yoo, Jinsu [1]
Kim, Taehoon [2]
Lee, Sihaeng [2]
Kim, Seung Hwan [2]
Lee, Honglak [2]
Kim, Tae Hyun [1]
Affiliations
[1] Hanyang Univ, Seoul, South Korea
[2] LG AI Res, Seoul, South Korea
Source
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023
DOI
10.1109/WACV56688.2023.00493
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent transformer-based super-resolution (SR) methods have achieved promising results against conventional CNN-based methods. However, these approaches suffer from essential shortsightedness created by only utilizing the standard self-attention-based reasoning. In this paper, we introduce an effective hybrid SR network to aggregate enriched features, including local features from CNNs and long-range multi-scale dependencies captured by transformers. Specifically, our network comprises transformer and convolutional branches, which synergetically complement each representation during the restoration procedure. Furthermore, we propose a cross-scale token attention module, allowing the transformer branch to exploit the informative relationships among tokens across different scales efficiently. Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
Pages: 4945-4954
Page count: 10