A GENERATIVE COMPRESSION FRAMEWORK FOR LOW BANDWIDTH VIDEO CONFERENCE

Cited by: 1
Authors
Feng, Dahu [1]
Huang, Yan [1]
Zhang, Yiwei [1]
Ling, Jun [1]
Tang, Anni [1]
Song, Li [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW) | 2021
Keywords
Video compression; Generative compression; Image reconstruction; Image synthesis
DOI
10.1109/ICMEW53276.2021.9455985
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Video conferencing introduces a new scenario for video transmission, one that prioritizes preserving the fidelity of faces even in low-bandwidth network environments. In this work, we propose VSBNet, a framework that uses facial landmarks for video compression. Our method uses adversarial learning to reconstruct the original frames from the landmarks. To recover more detail and keep identity consistent, we propose the concept of visual sensitivity, which separates the contour of the face from fast-moving parts such as the eyes and mouth. Experimental results demonstrate the superiority of our framework at a low bit rate of around 1 KB/s.
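As a rough back-of-the-envelope check on what a bit rate of around 1 KB/s implies for landmark-based transmission, the sketch below computes the uncompressed payload of sending landmarks every frame. All parameter values (68 landmarks, two 8-bit coordinates per point, 25 frames per second, 1 KB taken as 1000 bytes) are illustrative assumptions and are not taken from the paper, which does not describe its landmark encoding in this abstract.

```python
def raw_landmark_rate(num_points=68, coords_per_point=2, bits_per_coord=8, fps=25):
    """Uncompressed landmark payload in bytes per second.

    Every default is a hypothetical value chosen for illustration,
    not a setting reported for VSBNet.
    """
    bits_per_frame = num_points * coords_per_point * bits_per_coord
    return bits_per_frame * fps / 8


# Budget quoted in the abstract, assuming 1 KB = 1000 bytes.
BUDGET_BYTES_PER_SECOND = 1000

rate = raw_landmark_rate()
ratio = rate / BUDGET_BYTES_PER_SECOND
print(f"raw landmark stream: {rate:.0f} B/s, {ratio:.1f}x the 1 KB/s budget")
```

Under these assumed settings the raw stream exceeds the quoted budget, which suggests that some further reduction (coarser quantization, fewer points, a lower landmark rate, or entropy coding) would be needed; which of these, if any, the authors use is not stated here.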
Pages: 6
Related papers
16 in total
[1] Arjovsky M, 2017, PR MACH LEARN RES, V70
[2] Bossen F., 2018, JOINT VIDEO EXPERTS, V16
[3] Brandenburg Jens, VVEN C FRAUNHOFER, V1, P2
[4] Goodfellow IJ, 2014, ADV NEUR IN, V27, P2672
[5] Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos [J].
Guo, Yanhui; Zhang, Xi; Wu, Xiaolin.
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 3947-3955
[6] TOWARDS CODING FOR HUMAN AND MACHINE VISION: A SCALABLE IMAGE CODING APPROACH [J].
Hu, Yueyu; Yang, Shuai; Yang, Wenhan; Duan, Ling-Yu; Liu, Jiaying.
2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020
[7] Jun Ling, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12373), P37, DOI 10.1007/978-3-030-58604-1_3
[8] The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English [J].
Livingstone, Steven R.; Russo, Frank A.
PLOS ONE, 2018, 13(05)
[9] DVC: An End-to-end Deep Video Compression Framework [J].
Lu, Guo; Ouyang, Wanli; Xu, Dong; Zhang, Xiaoyun; Cai, Chunlei; Gao, Zhiyong.
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 10998-11007
[10] FSGAN: Subject Agnostic Face Swapping and Reenactment [J].
Nirkin, Yuval; Keller, Yosi; Hassner, Tal.
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 7183-7192