Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos

Cited by: 10
Authors
Guo, Yanhui [1]
Zhang, Xi [2]
Wu, Xiaolin [1]
Affiliations
[1] McMaster Univ, Hamilton, ON, Canada
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Keywords
Multi-modality; neural networks; video restoration; soft decoding; COMPUTATION; MFCC;
DOI
10.1145/3394171.3413709
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel deep multi-modality neural network for restoring very low bit-rate videos of talking heads. Such video content is very common in social media, teleconferencing, distance education, tele-medicine, etc., and often needs to be transmitted with limited bandwidth. The proposed CNN method exploits the correlations among three modalities (video, audio, and the emotional state of the speaker) to remove the video compression artifacts caused by spatial downsampling and quantization. The deep learning approach turns out to be ideally suited for the video restoration task, as the complex non-linear cross-modality correlations are very difficult to model analytically and explicitly. The new method is a video post-processor that can significantly boost the perceptual quality of aggressively compressed talking-head videos, while being fully compatible with all existing video compression standards.
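As an illustration of the degradation the abstract refers to, the following is a minimal sketch of spatial downsampling followed by quantization on a single grayscale frame. It is not the authors' pipeline or any real codec: the function names (`downsample`, `quantize`, `upsample_nearest`, `degrade`, `mse`), the average-pooling and nearest-neighbour choices, and the uniform quantization step are all assumptions made for illustration. A restoration network of the kind described would take the degraded frame (plus audio and emotion cues) as input and try to recover the original.

```python
# Toy degradation model (an assumption for illustration, not the paper's codec):
# average-pool downsampling -> uniform quantization -> nearest-neighbour upsampling.

def downsample(frame, factor):
    """Average-pool a 2-D grayscale frame (list of rows) by an integer factor."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [frame[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def quantize(frame, step):
    """Uniform scalar quantization: snap each value to a multiple of `step`."""
    return [[step * round(v / step) for v in row] for row in frame]


def upsample_nearest(frame, factor):
    """Nearest-neighbour upsampling back to the original resolution."""
    return [[v for v in row for _ in range(factor)]
            for row in frame for _ in range(factor)]


def degrade(frame, factor, step):
    """Apply the full toy degradation chain to one frame."""
    return upsample_nearest(quantize(downsample(frame, factor), step), factor)


def mse(a, b):
    """Mean squared error between two frames of equal size."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


if __name__ == "__main__":
    # A small synthetic 4x4 gradient stands in for a video frame.
    frame = [[0, 10, 20, 30],
             [10, 20, 30, 40],
             [20, 30, 40, 50],
             [30, 40, 50, 60]]
    degraded = degrade(frame, factor=2, step=16)
    print("degraded frame:", degraded)
    print("MSE vs. original:", mse(frame, degraded))
```

Larger `factor` or `step` values model more aggressive compression and produce a larger error for a restoration stage to undo.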
Pages: 3947-3955
Page count: 9