SALIENCY-DRIVEN VERSATILE VIDEO CODING FOR NEURAL OBJECT DETECTION

Times cited: 22

Authors:

Fischer, Kristian [1]

Fleckenstein, Felix [1]

Herglotz, Christian [1]

Kaup, Andre [1]

Affiliation:

[1] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Cauerstr. 7, D-91058 Erlangen, Germany

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021

Keywords:

Video Coding for Machines; Saliency Coding; Versatile Video Coding; Mask R-CNN; YOLO

DOI:

10.1109/ICASSP39728.2021.9415048

Chinese Library Classification (CLC) code:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Saliency-driven image and video coding for humans has gained importance in recent years. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task, built on the latest video coding standard, Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, we apply the state-of-the-art object segmentation network Mask R-CNN to the decoded frames. Extensive simulations show that, compared with the VVC reference at constant quality, the proposed saliency-driven framework saves up to 29% of the bitrate at the same detection accuracy on the decoder side. In addition, we compare YOLO with other, more traditional saliency detection methods.
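The abstract states that detector output decides which regions are coded at higher quality, but it does not describe the mapping or the novel decision criterion. The following is a minimal sketch of one plausible mapping, not the authors' method. It assumes detections arrive as axis-aligned pixel boxes with a confidence score, that quality is controlled per CTU of 128×128 luma samples (the largest CTU size VVC allows), that a CTU is salient if it overlaps any box scoring at least 0.5, and that non-salient CTUs get the base QP plus an offset of 10. The `Box` class, both function names, the threshold, and the offset are all hypothetical. How such a QP map would be passed to a VVC encoder is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned detection box in pixel coordinates (hypothetical format)."""
    x0: float
    y0: float
    x1: float  # exclusive right edge
    y1: float  # exclusive bottom edge
    score: float


def salient_ctu_mask(boxes, width, height, ctu_size=128, min_score=0.5):
    """Mark every CTU that overlaps at least one sufficiently confident box.

    Assumption: a plain overlap test with a fixed score threshold,
    NOT the decision criterion proposed in the paper.
    """
    cols = math.ceil(width / ctu_size)
    rows = math.ceil(height / ctu_size)
    mask = [[False] * cols for _ in range(rows)]
    for b in boxes:
        if b.score < min_score:
            continue  # low-confidence detection: ignored
        # Clip the box to the frame.
        x0, y0 = max(0.0, b.x0), max(0.0, b.y0)
        x1, y1 = min(float(width), b.x1), min(float(height), b.y1)
        if x1 <= x0 or y1 <= y0:
            continue  # box lies entirely outside the frame
        # Convert the pixel extents to inclusive CTU column/row indices.
        c0, c1 = int(x0 // ctu_size), math.ceil(x1 / ctu_size) - 1
        r0, r1 = int(y0 // ctu_size), math.ceil(y1 / ctu_size) - 1
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                mask[r][c] = True
    return mask


def ctu_qp_map(mask, base_qp, qp_offset=10, max_qp=63):
    """Keep base_qp for salient CTUs; quantise non-salient CTUs more coarsely.

    qp_offset is an illustrative value; max_qp=63 is the upper end of the
    VVC QP range.
    """
    return [[base_qp if salient else min(base_qp + qp_offset, max_qp)
             for salient in row]
            for row in mask]


if __name__ == "__main__":
    # Usage example on a 2048x1024 frame with one confident and one weak box.
    detections = [Box(100, 200, 300, 260, 0.9), Box(0, 0, 50, 50, 0.2)]
    mask = salient_ctu_mask(detections, 2048, 1024)
    for row in ctu_qp_map(mask, base_qp=32):
        print(row)
```

A per-CTU map is chosen here because it is the natural granularity for spatially varying quantisation in a block-based codec. A finer, pixel-accurate saliency map would still have to be reduced to block level before encoding.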

Pages: 1505-1509

Number of pages: 5

References (22 in total):

[1] Bjontegaard Gisle, 2001, Calculation of average PSNR differ

[2] Bossen F., 2019, TECH REP

[3] Chen J., 2019, TECH REP

[4] Cheng, Ming-Ming; Zhang, Ziming; Lin, Wen-Yan; Torr, Philip, 2014, BING: Binarized Normed Gradients for Objectness Estimation at 300fps, 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), P3286-3293

[5] Choi H, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P1792, DOI 10.1109/ICASSP.2018.8462653

[6] Cordts, Marius; Omran, Mohamed; Ramos, Sebastian; Rehfeld, Timo; Enzweiler, Markus; Benenson, Rodrigo; Franke, Uwe; Roth, Stefan; Schiele, Bernt, 2016, The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), P3213-3223

[7] Fischer K., 2020, IEEE INT WORKSH MULT, P1

[8] Fischer K, 2020, IEEE IMAGE PROC, P1147, DOI 10.1109/ICIP40778.2020.9191023

[9] Galteri Leonardo, 2018, 2018 24th International Conference on Pattern Recognition (ICPR), P3007, DOI 10.1109/ICPR.2018.8546064

[10] He KM, 2017, IEEE I CONF COMP VIS, P2980, DOI [10.1109/TPAMI.2018.2844175, 10.1109/ICCV.2017.322]
