Exploring Hierarchical Spatial Layout Cues for 3D Point Cloud Based Scene Graph Prediction

Cited by: 9

Authors:

Feng, Mingtao [1]

Hou, Haoran [1]

Zhang, Liang [1]

Guo, Yulan [2]

Yu, Hongshan [3,4]

Wang, Yaonan [3]

Mian, Ajmal [5]

Affiliations:

[1] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China

[2] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China

[3] Hunan Univ, Coll Elect & Informat Engn, Changsha 410082, Peoples R China

[4] Hunan Univ, Quanzhou Inst Ind Design & Machine Intelligence In, Changsha 410012, Peoples R China

[5] Univ Western Australia, Dept Comp Sci & Software Engn, Perth Crawley, WA 6009, Australia

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2025 / Vol. 27

Funding:

National Natural Science Foundation of China; Australian Research Council;

Keywords:

Three-dimensional displays; Layout; Point cloud compression; Solid modeling; Semantics; Cognition; Task analysis; 3D scene graph; hierarchical reasoning; point cloud; spatial layout; NETWORK;

DOI:

10.1109/TMM.2023.3277736

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

3D scene graph prediction is important for intelligent agents to gather information and perceive the semantics of their environments. However, constructing an effective graph is nontrivial given the complexity of natural scenes. Existing solutions for graph representation of 3D scenes still reason in a flat manner, distinguishing every detailed discrepancy among all relationships at once and ignoring the mechanism humans use to perform this task. Inspired by the role of the prefrontal cortex in hierarchical reasoning, we analyze this problem from a novel perspective: exploring hierarchical spatial layout cues in 3D space and navigating that hierarchy to make the 3D scene graph more accurate, following a strategy of vertical division followed by horizontal propagation. To this end, we first encode the contextual object features for fine-grained object category classification. Next, we build a bottom-up hierarchical graph to predict remarkably diverse support relationships as a single concept, regardless of the numerous irrelevant relationships. Finally, equipped with the spatially true and semantically meaningful support relationships, we focus on the local region layout and propagate the semantic features to predict the additional non-support relationships under the guidance of the referred hierarchical graph nodes. Experiments on the challenging 3DSSG benchmark show that our algorithm outperforms existing state-of-the-art methods and can also alleviate the impact of the long-tailed distribution of training data.
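
The two-stage idea in the abstract (a vertical support hierarchy first, then horizontal reasoning within a local region) can be illustrated with a toy data structure. The sketch below is not the authors' network: the object names, the `support_parent` map, and the choice to restrict non-support candidates to objects that share a supporter are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy scene: each object maps to the object that supports it.
# None marks the root (here, the floor). Labels are invented for illustration.
support_parent = {
    "floor": None,
    "table": "floor",
    "chair": "floor",
    "cup": "table",
    "laptop": "table",
}


def build_support_hierarchy(parent_of):
    """Vertical step: group objects by supporter (parent -> children map)."""
    children = defaultdict(list)
    for obj, parent in parent_of.items():
        if parent is not None:
            children[parent].append(obj)
    return dict(children)


def depth(obj, parent_of):
    """Number of support edges between an object and the root."""
    d = 0
    while parent_of[obj] is not None:
        obj = parent_of[obj]
        d += 1
    return d


def sibling_pairs(children):
    """Horizontal step: candidate pairs for non-support relationships.

    Assumption of this sketch: only objects sharing a supporter are
    considered, as a stand-in for the "local region layout".
    """
    pairs = []
    for kids in children.values():
        pairs.extend(combinations(sorted(kids), 2))
    return sorted(pairs)


hierarchy = build_support_hierarchy(support_parent)
candidates = sibling_pairs(hierarchy)
```

In this toy setup, the hierarchy narrows the set of object pairs that a later classifier would need to examine; how the paper actually selects and scores pairs is not specified in the abstract.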

Citation

Pages: 731-743

Page count: 13

