Granular3D: Delving into multi-granularity 3D scene graph prediction

Cited by: 0

Authors:

Huang, Kaixiang ^{[1,2]}

Yang, Jingru ^{[1,2]}

Wang, Jin ^{[1,2,6]}

He, Shengfeng ^{[3]}

Wang, Zhan ^{[4]}

He, Haiyan ^{[1,2,5]}

Zhang, Qifeng

Lu, Guodong ^{[1,2]}

Affiliations:

[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Zhejiang, Peoples R China

[2] Zhejiang Univ, Robot Inst, Hangzhou 310027, Zhejiang, Peoples R China

[3] Singapore Management Univ, Singapore 178903, Singapore

[4] Zhejiang Energy Digital Technol Co Ltd, Dept Artificial Intelligence & Robot, Hangzhou 310027, Zhejiang, Peoples R China

[5] Zhejiang Baima Lake Lab Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China

[6] Jinhua Key Lab Robot Intelligent Welding Technol, Jinhua 321000, Zhejiang, Peoples R China

Source:

PATTERN RECOGNITION | 2024 / Vol. 153

Funding:

National Natural Science Foundation of China;

Keywords:

3D point cloud; 3D semantic scene graph prediction; Multi-granularity; Gather point transformer; LANGUAGE;

DOI:

10.1016/j.patcog.2024.110562

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper addresses the significant challenges in 3D Semantic Scene Graph (3DSSG) prediction, essential for understanding complex 3D environments. Traditional approaches, primarily using PointNet and Graph Convolutional Networks, struggle with effectively extracting multi-grained features from intricate 3D scenes, largely due to a focus on global scene processing and single-scale feature extraction. To overcome these limitations, we introduce Granular3D, a novel approach that shifts the focus towards multi-granularity analysis by predicting relation triplets from specific sub-scenes. One key is the Adaptive Instance Enveloping Method (AIEM), which establishes an approximate envelope structure around irregular instances, providing shape-adaptive local point cloud sampling, thereby comprehensively covering the contextual environments of instances. Moreover, Granular3D incorporates a Hierarchical Dual-Stage Network (HDSN), which differentiates and processes features of instances and their pairs at varying scales, leading to a targeted prediction of instance categories and their relationships. To advance the perception of sub-scenes in HDSN, we design a Gather Point Transformer structure (GaPT) that enables the combinatorial interaction of local information from multiple point cloud sets, achieving a more comprehensive local contextual feature extraction. Extensive evaluations on the challenging 3DSSG benchmark demonstrate that our methods provide substantial improvements, establishing a new state-of-the-art in 3DSSG prediction, boosting the top-50 triplet accuracy by +2.8%.

Pages: 12
