3D Scene Graph Generation From Point Clouds

Cited by: 1
Authors
Wei, Wenwen [1, 2]
Wei, Ping [1, 2]
Qin, Jialu [1, 2]
Liao, Zhimin [1, 2]
Wang, Shuaijie [1, 2]
Cheng, Xiang [3]
Liu, Meiqin [1, 2]
Zheng, Nanning [1, 2]
Affiliations
[1] Xi An Jiao Tong Univ, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Peoples R China
[3] Peking Univ, Beijing 100871, Peoples R China
Keywords
Three-dimensional displays; Feature extraction; Point cloud compression; Task analysis; Head; Semantics; Proposals; 3D scene graph generation; point RoI; location attention; point cloud; OBJECT; NETWORK
DOI
10.1109/TMM.2023.3331583
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Scene graph generation is a significant and challenging task for scene understanding. Most existing methods are confined to 2D space (i.e., images) or rely on additional segmentation information, neglecting the richer spatial and geometric information of 3D space. In this paper, we propose a novel method to generate scene graphs from 3D point clouds. Specifically, our model consists of three parts: a point feature extraction backbone, a box head, and a relation head. The feature extraction backbone extracts base features directly from raw point clouds, and the box head produces detected 3D bounding boxes. The relation head takes the extracted features and 3D boxes as inputs and outputs the final 3D scene graphs. We also design a point RoI module that sequentially processes the points inside 3D boxes with a bidirectional LSTM. To further leverage the geometric characteristics of point clouds, we propose a location attention module that learns the influence of relative locations between objects. We introduce the RelationScanNet dataset with densely annotated semantic and geometric relationships, which extends ScanNetV2, one of the most widely used datasets in 3D indoor scene understanding. We test the proposed method on the RelationScanNet and 3DSSG datasets. The results demonstrate the strength of our method.
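The abstract names two geometric ingredients without giving formulas: gathering the points that fall inside a detected 3D box (the input to the point RoI module) and reasoning over relative locations between objects (the input to the location attention module). The sketch below only illustrates those two inputs and is not the authors' implementation. Every name in it (`Box3D`, `points_in_box`, `relative_location`, `location_attention`) is hypothetical, boxes are assumed to be axis-aligned, and the attention weights are a fixed softmax over negative center distances standing in for the learned module described in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical axis-aligned 3D box: center (x, y, z) and size (dx, dy, dz)."""
    center: tuple
    size: tuple


def points_in_box(points, box):
    """Return the points lying inside (or on the boundary of) an axis-aligned box."""
    return [
        p for p in points
        if all(abs(p[i] - box.center[i]) <= box.size[i] / 2 for i in range(3))
    ]


def relative_location(subject, obj):
    """Assumed relative-location encoding: center offset plus log size ratio."""
    offset = tuple(o - s for s, o in zip(subject.center, obj.center))
    log_ratio = tuple(math.log(o / s) for s, o in zip(subject.size, obj.size))
    return offset + log_ratio


def location_attention(boxes, i):
    """Fixed (not learned) stand-in: softmax over negative center distances
    from box i to every other box, so nearer boxes receive larger weights."""
    scores = {
        j: -math.dist(boxes[i].center, b.center)
        for j, b in enumerate(boxes) if j != i
    }
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {j: math.exp(s - peak) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


if __name__ == "__main__":
    boxes = [
        Box3D((0, 0, 0), (1, 1, 1)),
        Box3D((1, 0, 0), (1, 1, 1)),
        Box3D((3, 0, 0), (2, 2, 2)),
    ]
    cloud = [(0, 0, 0), (0.4, 0.4, 0.4), (2, 0, 0)]
    print(points_in_box(cloud, boxes[0]))
    print(relative_location(boxes[0], boxes[2]))
    print(location_attention(boxes, 0))
```

In the method the abstract describes, the gathered points would be fed sequentially to a bidirectional LSTM and the influence of relative locations would be learned rather than fixed by distance; the sketch only makes the geometric inputs concrete.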
Pages: 5358-5368
Page count: 11
Related papers
64 in total
[1]   VQA: Visual Question Answering [J].
Antol, Stanislaw ;
Agrawal, Aishwarya ;
Lu, Jiasen ;
Mitchell, Margaret ;
Batra, Dhruv ;
Zitnick, C. Lawrence ;
Parikh, Devi .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :2425-2433
[2]   3D Scene Graph: A structure for unified semantics, 3D space, and camera [J].
Armeni, Iro ;
He, Zhi-Yang ;
Gwak, JunYoung ;
Zamir, Amir R. ;
Fischer, Martin ;
Malik, Jitendra ;
Savarese, Silvio .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :5663-5672
[3]   Broad-to-Narrow Registration and Identification of 3D Objects in Partially Scanned and Cluttered Point Clouds [J].
Arvanitis, Gerasimos ;
Zacharaki, Evangelia I. ;
Vasa, Libor ;
Moustakas, Konstantinos .
IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 :2230-2245
[4]  
Beltrán J, 2018, IEEE INT C INTELL TR, P3517, DOI 10.1109/ITSC.2018.8569311
[5]   Knowledge-Embedded Routing Network for Scene Graph Generation [J].
Chen, Tianshui ;
Yu, Weihao ;
Chen, Riquan ;
Lin, Liang .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :6156-6164
[6]  
Chen VS, 2019, IEEE I CONF COMP VIS, P2580, DOI 10.1109/ICCV.2019.00267
[7]   Multi-View 3D Object Detection Network for Autonomous Driving [J].
Chen, Xiaozhi ;
Ma, Huimin ;
Wan, Ji ;
Li, Bo ;
Xia, Tian .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6526-6534
[8]  
Chen YL, 2019, IEEE I CONF COMP VIS, P9774, DOI 10.1109/ICCV.2019.00987
[9]   ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes [J].
Dai, Angela ;
Chang, Angel X. ;
Savva, Manolis ;
Halber, Maciej ;
Funkhouser, Thomas ;
Niessner, Matthias .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :2432-2443
[10]  
Engelcke Martin, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA), P1355, DOI 10.1109/ICRA.2017.7989161