End-To-End Compression for Surveillance Video With Unsupervised Foreground-Background Separation

Cited by: 12
Authors
Zhao, Yu [1 ]
Luo, Dengyan [1 ]
Wang, Fuchun [1 ]
Gao, Han [1 ]
Ye, Mao [1 ]
Zhu, Ce [2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Encoding; Surveillance; Video compression; Video coding; Neural networks; Deep learning; Streaming media; foreground-background separation; surveillance video; PREDICTION; CASCADE; FRAMES; HEVC;
DOI
10.1109/TBC.2023.3280039
Chinese Library Classification (CLC)
TM (electrical engineering); TN (electronics and communication technology);
Discipline codes
0808; 0809;
Abstract
With the exponential growth of surveillance video, efficient video coding methods are in great demand. Existing learning-based methods either directly apply a general video compression framework, or separate the foreground and background and then compress them in two stages. However, they either ignore the fact that surveillance backgrounds are relatively static, or separate foreground and background in an offline mode, which degrades separation performance because temporal correlation is not fully exploited. In this paper, we propose an end-to-end Unsupervised foreground-background separation based Video Compression neural Network, dubbed UVCNet. Our method consists of three parts. First, a Mask Net separates foreground and background online in an unsupervised manner, making full use of the temporal-correlation prior. Then, a traditional motion-estimation-based residual coding module compresses the foreground. Simultaneously, a background compression module compresses the background residual and updates the background, fully exploiting its relatively static nature. Unlike previous approaches, our method does not separate foreground and background in advance but does so in an end-to-end manner, so we can not only exploit the static-background property to save bit rate but also achieve end-to-end online video compression. Experimental results demonstrate that the proposed UVCNet achieves superior performance compared with state-of-the-art methods. Specifically, UVCNet achieves an average Peak Signal-to-Noise Ratio (PSNR) improvement of 2.11 dB over H.265 on surveillance datasets.
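To make the pipeline described in the abstract concrete, here is a minimal toy sketch of the underlying idea: maintain a background model, derive a foreground mask from the frame-to-background residual, and update only background pixels to exploit the static-background prior. This is a classical running-average stand-in, not the paper's learned Mask Net or its neural codecs; all names and thresholds here are illustrative assumptions.

```python
import numpy as np

def separate_and_update(frame, background, alpha=0.05, thresh=20.0):
    """Toy foreground-background separation with a running-average
    background model (a classical proxy for the learned, end-to-end
    Mask Net in the paper; alpha and thresh are illustrative)."""
    residual = frame.astype(np.float64) - background
    mask = (np.abs(residual) > thresh).astype(np.float64)  # 1.0 = foreground
    # Update only background pixels, exploiting the static-background prior.
    new_bg = background + alpha * residual * (1.0 - mask)
    foreground = frame * mask              # would go to motion/residual coding
    bg_residual = residual * (1.0 - mask)  # would go to the background module
    return mask, foreground, bg_residual, new_bg

# Tiny demo: a static background of intensity 100 with one "moving object"
# pixel at intensity 200.
bg = np.full((4, 4), 100.0)
frame = bg.copy()
frame[1, 1] = 200.0
mask, fg, bg_res, new_bg = separate_and_update(frame, bg)
```

In this toy demo, only the changed pixel is flagged as foreground and the background model stays untouched there, which is the property the paper's background compression module relies on to save bit rate.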
Pages: 966-978 (13 pages)