Deep Learning-Based Video Coding: A Review and a Case Study

Cited by: 127
Authors
Liu, Dong [1]
Li, Yue [1]
Lin, Jianping [1]
Li, Houqiang [1]
Wu, Feng [1]
Affiliation
[1] Univ Sci & Technol China, CAS Key Lab Technol Geospatial Informat Proc & Ap, 443 Huangshan Rd, Hefei 230027, Anhui, Peoples R China
Keywords
Deep learning; image coding; prediction; transform; video coding; IMAGE COMPRESSION; NEURAL-NETWORK; FRAMEWORK
DOI
10.1145/3368405
Chinese Library Classification
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The past decade has witnessed the great success of deep learning in many disciplines, especially in computer vision and image processing. However, deep learning-based video coding remains in its infancy. We review representative works on using deep learning for image/video coding, a research area that has been actively developing since 2015. We divide the related works into two categories: new coding schemes that are built primarily upon deep networks, and deep network-based coding tools intended for use within traditional coding schemes. For deep schemes, pixel probability modeling and auto-encoders are the two approaches, which can be viewed as predictive coding and transform coding, respectively. For deep tools, several techniques use deep learning to perform intra-picture prediction, inter-picture prediction, cross-channel prediction, probability distribution prediction, transform, post- or in-loop filtering, down- and up-sampling, as well as encoding optimizations. In the hope of promoting research on deep learning-based video coding, we present a case study of our prototype video codec, Deep Learning Video Coding (DLVC). DLVC features two deep tools that are both based on convolutional neural networks (CNNs), namely a CNN-based in-loop filter and CNN-based block adaptive resolution coding. The source code of DLVC has been released for future research.
Pages: 35