M-LVC: Multiple Frames Prediction for Learned Video Compression

Cited by: 141
Authors
Lin, Jianping [1]
Liu, Dong [1]
Li, Houqiang [1]
Wu, Feng [1]
Affiliations
[1] Univ Sci & Technol China, CAS Key Lab Technol Geospatial Informat Proc & Ap, Hefei 230027, Peoples R China
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020
DOI
10.1109/CVPR42600.2020.00360
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited to using only the single previous frame as a reference. Our method instead uses multiple previous frames as references. In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one. With multiple reference frames and their associated MV fields, our network generates a more accurate prediction of the current frame, yielding a smaller residual. Multiple reference frames also help generate an MV prediction, which reduces the coding cost of the MV field. We use two deep auto-encoders to compress the residual and the MV, respectively. To compensate for the compression error of the auto-encoders, we further design an MV refinement network and a residual refinement network, which also make use of the multiple reference frames. All the modules in our scheme are jointly optimized through a single rate-distortion loss function. We use a step-by-step training strategy to optimize the entire scheme. Experimental results show that the proposed method outperforms existing learned video compression methods in low-latency mode. Our method also performs better than H.265 in both PSNR and MS-SSIM. Our code and models are publicly available.
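The abstract mentions a single rate-distortion loss but this record does not give its form. As a minimal sketch only, the generic objective used in learned compression can be written as rate plus a weighted distortion. The function name, the argument names, the use of MSE as the distortion, and the weight `lam` are all assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np


def rate_distortion_loss(original, reconstructed, mv_bits, residual_bits, lam):
    """Generic R + lam * D objective (illustrative, not the paper's exact loss).

    original, reconstructed: arrays of shape (H, W) or (H, W, C), values in [0, 1].
    mv_bits, residual_bits: estimated bits spent on the MV field and the residual.
    lam: hypothetical trade-off weight between rate and distortion.
    """
    num_pixels = original.shape[0] * original.shape[1]
    # Distortion: mean squared error between the frame and its reconstruction.
    distortion = float(np.mean((original - reconstructed) ** 2))
    # Rate: total bits for both streams, normalised to bits per pixel.
    rate_bpp = (mv_bits + residual_bits) / num_pixels
    return rate_bpp + lam * distortion


# Toy usage: a 4x4 frame reconstructed with a constant error of 0.5.
frame = np.zeros((4, 4))
decoded = np.full((4, 4), 0.5)
loss = rate_distortion_loss(frame, decoded, mv_bits=8, residual_bits=24, lam=4.0)
```

In this toy case the rate is 32 bits over 16 pixels (2.0 bits per pixel) and the MSE is 0.25, so the loss is 2.0 + 4.0 * 0.25 = 3.0. A larger `lam` favours reconstruction quality over bitrate.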
Pages: 3543-3551
Page count: 9