PQ-Transformer: Jointly Parsing 3D Objects and Layouts From Point Clouds

Cited by: 23

Authors:

Chen, Xiaoxue [1]

Zhao, Hao [2]

Zhou, Guyue [1]

Zhang, Ya-Qin [1]

Affiliations:

[1] Tsinghua Univ, Inst AI Ind Res, Beijing 100190, Peoples R China

[2] Peking Univ, Intel Labs China, Beijing 100871, Peoples R China

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2022, Vol. 7, No. 2

Keywords:

Object detection; layout; point cloud; NETWORK;

DOI:

10.1109/LRA.2022.3143224

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

3D scene understanding from point clouds plays a vital role in various robotic applications. Unfortunately, current state-of-the-art methods use separate neural networks for different tasks such as object detection or room layout estimation. Such a scheme has two limitations: 1) Storing and running several networks for different tasks is expensive for typical robotic platforms. 2) The intrinsic structure of the separate outputs is ignored and potentially violated. To this end, we propose the first transformer architecture that predicts 3D objects and layouts simultaneously from point cloud inputs. Unlike existing methods that estimate either layout keypoints or edges, we directly parameterize the room layout as a set of quads. Accordingly, the proposed architecture is termed P(oint)Q(uad)-Transformer. Along with the novel quad representation, we propose a tailored physical constraint loss function that discourages object-layout interference. Quantitative and qualitative evaluations on the public benchmark ScanNet show that the proposed PQ-Transformer successfully parses 3D objects and layouts jointly, running at a quasi-real-time rate (8.91 FPS) without efficiency-oriented optimization. Moreover, the new physical constraint loss can improve strong baselines, and the F1-score of the room layout is significantly improved from 37.9% to 57.9%.
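
The abstract names a quad-based layout representation and a physical constraint loss but gives neither formula. As a rough illustration only, the sketch below shows one way such an idea could look. Everything in it is an assumption made for illustration and not the paper's formulation: the `Quad` and `Box` classes, the choice of an inward-pointing unit normal, axis-aligned boxes, treating each quad as an infinite plane, and the hinge-style penalty on box corners that fall behind a wall.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Quad:
    """Hypothetical wall quad: center, unit normal, and planar extent."""
    center: tuple   # (x, y, z)
    normal: tuple   # unit normal, assumed to point into the room
    width: float    # extent along the wall (unused by the penalty below)
    height: float   # extent along the vertical (unused by the penalty below)


@dataclass
class Box:
    """Hypothetical axis-aligned object box."""
    center: tuple   # (x, y, z)
    size: tuple     # full extents along x, y, z


def box_corners(box):
    """Return the 8 corners of an axis-aligned box."""
    cx, cy, cz = box.center
    sx, sy, sz = box.size
    return [
        (cx + dx * sx / 2, cy + dy * sy / 2, cz + dz * sz / 2)
        for dx, dy, dz in product((-1, 1), repeat=3)
    ]


def signed_distance(point, quad):
    """Signed distance from a point to the quad's plane (positive = inside the room)."""
    return sum((p - c) * n for p, c, n in zip(point, quad.center, quad.normal))


def interference_penalty(boxes, quads):
    """Sum, over all box corners, of how far each corner lies behind each wall plane."""
    total = 0.0
    for quad in quads:
        for box in boxes:
            for corner in box_corners(box):
                # Hinge: only corners on the outer side of the wall contribute.
                total += max(0.0, -signed_distance(corner, quad))
    return total
```

With a wall at x = 0 facing +x, a unit cube centered at x = 1 incurs no penalty, while the same cube centered at x = 0.25 has four corners 0.25 behind the wall. A differentiable version of this penalty could in principle be added to a detection loss, which is the general role the abstract assigns to its physical constraint term.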


Pages: 2519-2526

Page count: 8
