Cross-Frame Transformer-Based Spatio-Temporal Video Super-Resolution

Cited by: 18
Authors
Zhang, Wenhui [1]
Zhou, Mingliang [2]
Ji, Cheng [3]
Sui, Xiubao [1]
Bai, Junqi [4]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Elect & Opt Engn, Nanjing 210094, Jiangsu, Peoples R China
[2] Chongqing Univ, Coll Comp Sci, Chongqing 400030, Peoples R China
[3] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Jiangsu, Peoples R China
[4] China Elect Technol Grp Corp, Res Inst 28, Equipment Technol Ctr, Nanjing 210007, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Superresolution; Feature extraction; Transformers; Image reconstruction; Interpolation; Convolution; Task analysis; Transformer network; spatio-temporal video super-resolution; cross-frame transformer module; multi-level residual reconstruction; self-attention; video frame interpolation; image quality assessment
DOI
10.1109/TBC.2022.3147145
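Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract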
In this paper, we explore the spatio-temporal video super-resolution task, which aims to generate a high-resolution, high-frame-rate video from an existing video with low resolution and low frame rate. First, we propose an end-to-end spatio-temporal video super-resolution network composed chiefly of cross-frame transformers instead of traditional convolutions. Specifically, the cross-frame transformer module divides the input feature sequence into query, key, and value matrices, and then obtains the maximum similarity and similarity coefficient matrices between the neighboring and current feature maps through self-attention operations. Next, we propose a multi-level residual reconstruction module, which makes full use of the maximum similarity and similarity coefficient matrices obtained by the cross-frame transformer to reconstruct the high-resolution, high-frame-rate results from coarse to fine. Qualitative and quantitative evaluation results show that our method offers better performance and requires fewer training parameters than the existing two-stage network.
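The abstract does not give the exact formulation of the cross-frame transformer, so the following is only a minimal sketch of one plausible reading: features of the current frame act as queries, features of a neighboring frame act as keys and values, and for each query position we keep the highest cosine similarity together with the neighbor feature found at that position. The function names `l2_normalize` and `cross_frame_attention`, the use of cosine similarity, and the toy feature vectors are assumptions made for illustration, not details taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cross_frame_attention(current, neighbor):
    """Match each current-frame feature to its most similar neighbor-frame feature.

    current:  list of N feature vectors (queries) from the current frame
    neighbor: list of M feature vectors (keys and values) from a neighboring frame
    Returns (max_sim, matched): the per-query maximum cosine similarity and
    the neighbor feature (value) selected at that maximum.
    Hypothetical helper: an illustration, not the authors' implementation.
    """
    queries = [l2_normalize(q) for q in current]
    keys = [l2_normalize(k) for k in neighbor]
    max_sim, matched = [], []
    for q in queries:
        # Cosine similarity between this query and every key.
        sims = [sum(a * b for a, b in zip(q, k)) for k in keys]
        # Index of the most similar neighbor position (first one on ties).
        best = max(range(len(sims)), key=sims.__getitem__)
        max_sim.append(sims[best])
        matched.append(list(neighbor[best]))
    return max_sim, matched

# Toy data: two query positions, three neighbor positions, two channels each.
current = [[1.0, 0.0], [0.0, 2.0]]
neighbor = [[0.0, 3.0], [2.0, 0.0], [1.0, 1.0]]
max_sim, matched = cross_frame_attention(current, neighbor)
print(max_sim)
print(matched)
```

In this reading, `max_sim` would play the role of a similarity coefficient that weights how much the reconstruction stage trusts each matched feature, while `matched` carries the neighbor-frame content selected at the maximum similarity.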
Pages: 359-369
Number of pages: 11