OctAttention: Octree-Based Large-Scale Contexts Model for Point Cloud Compression

Citations: 0

Authors:

Fu, Chunyang [1,2]

Li, Ge [1]

Song, Rui [1]

Gao, Wei [1,2]

Liu, Shan [3]

Affiliations:

[1] Peking Univ, Shenzhen Grad Sch, Sch Elect & Comp Engn, Beijing, Peoples R China

[2] Peng Cheng Lab, Shenzhen, Peoples R China

[3] Tencent Amer, Palo Alto, CA USA

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In point cloud compression, sufficient contexts are significant for modeling the point cloud distribution. However, the contexts gathered by previous voxel-based methods decrease when handling sparse point clouds. To address this problem, we propose a multiple-contexts deep learning framework called OctAttention that employs the octree structure, a memory-efficient representation for point clouds. Our approach encodes octree symbol sequences losslessly by gathering the information of sibling and ancestor nodes. Specifically, we first represent point clouds with an octree to reduce spatial redundancy, a representation that is robust for point clouds with different resolutions. We then design a conditional entropy model with a large receptive field that models the sibling and ancestor contexts to exploit the strong dependency among neighboring nodes, and we employ an attention mechanism to emphasize the correlated nodes in the context. Furthermore, we introduce a mask operation during training and testing to trade off encoding time against performance. Compared to previous state-of-the-art works, our approach obtains a 10%-35% BD-Rate gain on the LiDAR benchmark (e.g., SemanticKITTI) and object point cloud datasets (e.g., MPEG 8i, MVUB), and saves 95% of coding time compared to the voxel-based baseline.
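
The "octree symbol sequence" mentioned in the abstract can be illustrated with a minimal sketch. This is a generic breadth-first occupancy serialization, not the authors' implementation: the function names `octree_symbols` and `decode_symbols`, the bit ordering of the child index, and the assumption of integer coordinates in [0, 2**depth) are all illustrative choices made here. The entropy model and attention mechanism are not shown.

```python
from collections import deque


def octree_symbols(points, depth):
    """Serialize integer points in [0, 2**depth)^3 as breadth-first
    8-bit occupancy symbols (assumed convention, one symbol per node)."""
    symbols = []
    queue = deque([(sorted(set(points)), 0)])  # (points in node, level)
    while queue:
        pts, level = queue.popleft()
        if level == depth:
            continue  # leaf level: nothing further to emit
        shift = depth - 1 - level  # coordinate bit that selects the child
        children = [[] for _ in range(8)]
        for x, y, z in pts:
            idx = ((((x >> shift) & 1) << 2)
                   | (((y >> shift) & 1) << 1)
                   | ((z >> shift) & 1))
            children[idx].append((x, y, z))
        # Bit i of the symbol is set when child i contains at least one point.
        symbols.append(sum(1 << i for i, c in enumerate(children) if c))
        for c in children:
            if c:
                queue.append((c, level + 1))
    return symbols


def decode_symbols(symbols, depth):
    """Invert octree_symbols: rebuild the occupied leaf coordinates."""
    it = iter(symbols)
    nodes = [(0, 0, 0)]
    for _ in range(depth):
        next_nodes = []
        for x, y, z in nodes:
            s = next(it)
            for i in range(8):
                if (s >> i) & 1:
                    next_nodes.append((2 * x + ((i >> 2) & 1),
                                       2 * y + ((i >> 1) & 1),
                                       2 * z + (i & 1)))
        nodes = next_nodes
    return nodes
```

In this sketch the symbol sequence alone is enough to recover the quantized points, which is why coding it losslessly preserves the octree geometry; a learned model such as the one the abstract describes would estimate the probability of each symbol from its context before entropy coding.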

Pages: 625-633

Page count: 9

References (35 in total)

[1] 3DG, 2021, JTC1SC29WG7W20346 IS

[2] SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences [J]. Behley, Jens; Garbade, Martin; Milioto, Andres; Quenzel, Jan; Behnke, Sven; Stachniss, Cyrill; Gall, Juergen. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 9296-9306

[3] Biswas S., 2020, Advances in Neural Information Processing Systems (NeurIPS), V33, P1

[4] Charles L., 2016, Standard ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) m38673/M72012

[5] A Volumetric Approach to Point Cloud Compression-Part I: Attribute Compression [J]. Chou, Philip A.; Koroteev, Maxim; Krivokuca, Maja. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29: 2203-2216

[6] Learning-Based Lossless Compression of 3D Point Cloud Geometry [J]. Dat Thanh Nguyen; Quach, Maurice; Valenzise, Giuseppe; Duhamel, Pierre. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 4220-4224

[7] Geometric compression for interactive transmission [J]. Devillers, O; Gandoin, PM. VISUALIZATION 2000, PROCEEDINGS, 2000: 319-326

[8] Eugene d'Eon, 2017, ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006

[9] Gao Z., 2019, JTC1SC29WG11M51012 I

[10] Geometry Coding for Dynamic Voxelized Point Clouds Using Octrees and Multiple Contexts [J]. Garcia, Diogo C.; Fonseca, Tiago A.; Ferreira, Renan U.; de Queiroz, Ricardo L. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29: 313-322
