A Novel Video Coding Strategy in HEVC for Object Detection

Cited by: 16

Authors:

Cai, Qi [1]

Chen, Zhifeng [2]

Wu, Dapeng Oliver [1]

Liu, Shan [3]

Li, Xiang [3]

Affiliations:

[1] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32608 USA

[2] Fuzhou Univ, Dept Phys & Informat Engn, Fuzhou 350108, Peoples R China

[3] Tencent Amer, Media Lab, Palo Alto, CA 94306 USA

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2021 / Vol. 31 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Video coding; Object detection; Bit rate; Encoding; Codecs; Detectors; Visualization; HEVC; object detection; detection accuracy modeling; pixel-level impact on detection; bit allocation; MODEL;

DOI:

10.1109/TCSVT.2021.3056134

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Occupying the most significant portion of global data traffic, video is being generated in almost every aspect of our life. Because of its huge volume, we are depending much more heavily on machine intelligence based analysis. In the meantime, video coding technology has been continuously improved for better compression efficiency. However, the state-of-the-art video coding standards, such as H.265/HEVC and versatile video coding (VVC), are still designed assuming that the compressed video will be watched by a human later. Such a design is not optimal when the compressed video will be used by computer vision applications. While the human visual system (HVS) is consistently sensitive to the content with high contrast, the impact of pixels on computer vision algorithms is task driven. For example, because of the different categories of objects used to train detection algorithms, the influence of the same image content on those detectors also varies. Therefore, human oriented video coding strategies may not be optimal when the compressed signal is further processed by algorithms, as the encoder is unaware of the task specific information. In this article, taking object detection as an example, we propose a novel video coding strategy for computer vision. By protecting the information according to its importance for an object detector rather than for the human visual system, our proposed method has the potential to achieve a better object detection performance with the same bandwidth. The main contributions of our paper are: 1) the modeling of the relationship between object detection accuracy and bit rate; 2) a back propagation based method to analyze the influence of each pixel on the detection of target objects; 3) an object detection oriented bit allocation and codec control parameter determination scheme; 4) an evaluation metric to compare the impact of video coding strategies on a given object detector over a predefined range of bit rate. 
Experimental results demonstrate that our proposed algorithm can better preserve the video content vital for object detection than state-of-the-art video coding schemes.
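The fourth contribution, comparing coding strategies for a given detector over a predefined bit rate range, can be illustrated with a small sketch. The abstract does not define the metric, so everything below is an assumption: the function name `mean_accuracy_gap`, the use of a logarithmic bit rate axis, and the piecewise-linear interpolation are hypothetical choices, similar in spirit to Bjontegaard-style curve averaging rather than a reproduction of the authors' method.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over sorted, distinct xs at x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled range")


def mean_accuracy_gap(anchor, proposed):
    """Average accuracy gain of `proposed` over `anchor` (hypothetical metric).

    Each curve is a list of (bitrate_kbps, detection_accuracy) pairs with
    distinct bit rates. The gap is averaged over the log10 bit rate range
    that both curves cover.
    """
    def prep(curve):
        pts = sorted(curve)
        return [math.log10(r) for r, _ in pts], [a for _, a in pts]

    xa, ya = prep(anchor)
    xb, yb = prep(proposed)
    lo, hi = max(xa[0], xb[0]), min(xa[-1], xb[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in bit rate")

    # The difference of two piecewise-linear curves is linear between the
    # merged sample points, so the trapezoid rule is exact on these knots.
    knots = sorted({lo, hi, *[x for x in xa + xb if lo < x < hi]})
    area = 0.0
    for x0, x1 in zip(knots, knots[1:]):
        d0 = _interp(x0, xb, yb) - _interp(x0, xa, ya)
        d1 = _interp(x1, xb, yb) - _interp(x1, xa, ya)
        area += 0.5 * (d0 + d1) * (x1 - x0)
    return area / (hi - lo)
```

A positive return value means the proposed strategy yields higher detection accuracy on average at equal bit rate within the shared range; the input points would come from encoding the same sequences at several rate points and running the detector on the decoded video.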

Pages: 4924-4937

Page count: 14

References: 42
