Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

Cited by: 13
Authors
Li, Gen [1 ]
Ji, Jie [1 ]
Qin, Minghai
Niu, Wei [2 ]
Ren, Bin [2 ]
Afghah, Fatemeh [1 ]
Guo, Linke [1 ]
Ma, Xiaolong [1 ]
Affiliations
[1] Clemson Univ, Clemson, SC 29631 USA
[2] William & Mary, Williamsburg, VA USA
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Funding
U.S. National Science Foundation;
Keywords
RECONSTRUCTION;
D O I
10.1109/CVPR52729.2023.00989
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in modern video delivery systems. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks is needed to ensure good overfitting quality, which substantially increases storage and consumes more bandwidth for data transmission. On the other hand, decreasing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution speed. To reconcile these issues, we propose a novel method for high-quality and efficient video resolution upscaling, which leverages spatial-temporal information to accurately divide a video into chunks, thus keeping both the number of chunks and the model size to a minimum. Additionally, we advance our method into a single overfitting model via a data-aware joint training technique, which further reduces the storage requirement with negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state-of-the-art, our method achieves a 28 fps streaming speed with 41.6 PSNR, which is 14x faster and 2.29 dB better in live video resolution upscaling tasks. Code available at https://github.com/coulsonlee/STDO-CVPR2023.git.
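The core idea the abstract describes — scoring spatial patches of video frames by reconstruction difficulty and grouping them into chunks, so each chunk can be overfit by a small SR model — can be sketched as follows. This is an illustrative toy with NumPy, not the paper's implementation: the difficulty score here is a crude proxy (MSE of a 2x box-downsample followed by nearest-neighbor upsample), and the patch size, chunk count, and function names are all assumptions for the sake of the example.

```python
import numpy as np

def patch_scores(frames, patch=8):
    """Score each spatial patch of each frame by how poorly a cheap
    upscaler reconstructs it. Higher score means harder content.
    The cheap upscaler here is 2x box-downsample + nearest-neighbor
    upsample, standing in for a real low-resolution encode."""
    scores = []
    for f in frames:
        h, w = f.shape
        for y in range(0, h, patch):
            for x in range(0, w, patch):
                p = f[y:y + patch, x:x + patch]
                # 2x down: average each 2x2 block; 2x up: repeat pixels.
                down = p.reshape(patch // 2, 2, patch // 2, 2).mean(axis=(1, 3))
                up = np.repeat(np.repeat(down, 2, axis=0), 2, axis=1)
                scores.append(float(np.mean((p - up) ** 2)))
    return np.array(scores)

def split_into_chunks(scores, n_chunks):
    """Partition patch indices into chunks ordered by difficulty, so
    patches of similar complexity share one overfitted SR model."""
    order = np.argsort(scores)
    return np.array_split(order, n_chunks)

# Toy data: 2 tiny grayscale frames, 4 patches each -> 8 patches total.
rng = np.random.default_rng(0)
frames = rng.random((2, 16, 16))
scores = patch_scores(frames, patch=8)
chunks = split_into_chunks(scores, n_chunks=2)
```

In this sketch each chunk would then be used as the training set for its own small SR network; the "data-aware joint training" step mentioned later would instead train one shared model on all chunks while weighting samples by chunk difficulty.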
Pages: 10259-10269
Page count: 11