Convolutional Neural Network-Based Occupancy Map Accuracy Improvement for Video-Based Point Cloud Compression

Cited by: 16

Authors:

Jia, Wei [1]

Li, Li [2]

Akhtar, Anique [1]

Li, Zhu [1]

Liu, Shan [3]

Affiliations:

[1] Univ Missouri, Dept Comp Sci & Elect Engn, Kansas City, MO 64110 USA

[2] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230027, Anhui, Peoples R China

[3] Tencent Media Lab, Palo Alto, CA 94306 USA

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2022 / Vol. 24

Keywords:

Three-dimensional displays; Videos; Geometry; Heuristic algorithms; Noise measurement; Bit rate; Software algorithms; Convolutional neural network; high efficiency video coding; occupancy map; segmentation; video-based point cloud compression; MPEG;

DOI:

10.1109/TMM.2021.3079698

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

In video-based point cloud compression (V-PCC), a dynamic point cloud is projected onto geometry and attribute videos patch by patch for compression. In addition to the geometry and attribute videos, an occupancy map video is compressed into a V-PCC bitstream to indicate whether a two-dimensional (2D) point in the projected geometry video corresponds to any point in three-dimensional (3D) space. The occupancy map video is usually downsampled before compression to obtain a tradeoff between the bitrate and the reconstructed point cloud quality. Due to the accuracy loss in the downsampling process, some noisy points are generated, which leads to severe objective and subjective quality degradation of the reconstructed point cloud. To improve the quality of the reconstructed point cloud, we propose using a convolutional neural network (CNN) to improve the accuracy of the occupancy map video. We mainly make the following contributions. First, we improve the accuracy of the occupancy map video by formulating the problem as a binary segmentation problem since the pixel values of the occupancy map video are either 0 or 1. Second, in addition to the downsampled occupancy map video, we introduce a reconstructed geometry video as the other input of the CNN to provide more useful information in order to indicate the occupancy map video. To the best of our knowledge, this is the first learning-based work to improve the performance of V-PCC. Compared to state-of-the-art schemes, our proposed CNN-based approach achieves much more accurate occupancy map videos and significant bitrate savings.
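The accuracy loss described in the abstract can be illustrated with a small sketch. It is not the authors' method or the V-PCC reference software. It assumes that a low-resolution block is marked occupied whenever any of its pixels is occupied, and that the decoder restores full resolution by simple block replication. The function names and the 4x4 block size are hypothetical, chosen for illustration only.

```python
import numpy as np

def downsample_occupancy(occ, block=4):
    """Mark a block as occupied if any pixel inside it is occupied (assumed rule)."""
    h, w = occ.shape
    assert h % block == 0 and w % block == 0, "map size must be a multiple of block"
    blocks = occ.reshape(h // block, block, w // block, block)
    return blocks.max(axis=(1, 3))

def upsample_occupancy(occ_low, block=4):
    """Restore full resolution by replicating each low-resolution value over its block."""
    return np.kron(occ_low, np.ones((block, block), dtype=occ_low.dtype))

def count_noisy_pixels(occ, block=4):
    """Count pixels that are empty in the original map but occupied after the round trip."""
    rec = upsample_occupancy(downsample_occupancy(occ, block), block)
    return int(np.sum((rec == 1) & (occ == 0)))

# Toy 8x8 binary occupancy map with a 3x5 occupied patch (15 pixels).
occ = np.zeros((8, 8), dtype=np.uint8)
occ[0:3, 0:5] = 1

noisy = count_noisy_pixels(occ, block=4)
print(noisy)  # 17 falsely occupied pixels in this toy case
```

In this toy case the patch touches two 4x4 blocks, so 32 pixels are marked occupied after the round trip although only 15 were occupied originally. Deciding per pixel whether such a position is truly occupied is the binary segmentation problem the abstract describes.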


Pages: 2352-2365

Number of pages: 14
