Enriched CNN-Transformer Feature Aggregation Networks for Super-Resolution

Cited by: 44

Authors:

Yoo, Jinsu [1]

Kim, Taehoon [2]

Lee, Sihaeng [2]

Kim, Seung Hwan [2]

Lee, Honglak [2]

Kim, Tae Hyun [1]

Affiliations:

[1] Hanyang University, Seoul, South Korea

[2] LG AI Research, Seoul, South Korea

Source:

2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) | 2023

DOI:

10.1109/WACV56688.2023.00493

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent transformer-based super-resolution (SR) methods have achieved promising results against conventional CNN-based methods. However, these approaches suffer from essential shortsightedness created by only utilizing the standard self-attention-based reasoning. In this paper, we introduce an effective hybrid SR network to aggregate enriched features, including local features from CNNs and long-range multi-scale dependencies captured by transformers. Specifically, our network comprises transformer and convolutional branches, which synergetically complement each representation during the restoration procedure. Furthermore, we propose a cross-scale token attention module, allowing the transformer branch to exploit the informative relationships among tokens across different scales efficiently. Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.

Citation

Pages: 4945-4954

Page count: 10

