Granular3D: Delving into multi-granularity 3D scene graph prediction

Cited by: 0
Authors
Huang, Kaixiang [1 ,2 ]
Yang, Jingru [1 ,2 ]
Wang, Jin [1 ,2 ,6 ]
He, Shengfeng [3 ]
Wang, Zhan [4 ]
He, Haiyan [1 ,2 ,5 ]
Zhang, Qifeng
Lu, Guodong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Robot Inst, Hangzhou 310027, Zhejiang, Peoples R China
[3] Singapore Management Univ, Singapore 178903, Singapore
[4] Zhejiang Energy Digital Technol Co Ltd, Dept Artificial Intelligence & Robot, Hangzhou 310027, Zhejiang, Peoples R China
[5] Zhejiang Baima Lake Lab Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China
[6] Jinhua Key Lab Robot Intelligent Welding Technol, Jinhua 321000, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D point cloud; 3D semantic scene graph prediction; Multi-granularity; Gather point transformer; LANGUAGE;
DOI
10.1016/j.patcog.2024.110562
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the significant challenges in 3D Semantic Scene Graph (3DSSG) prediction, which is essential for understanding complex 3D environments. Traditional approaches, primarily using PointNet and Graph Convolutional Networks, struggle to extract multi-grained features from intricate 3D scenes effectively, largely because they focus on global scene processing and single-scale feature extraction. To overcome these limitations, we introduce Granular3D, a novel approach that shifts the focus towards multi-granularity analysis by predicting relation triplets from specific sub-scenes. One key component is the Adaptive Instance Enveloping Method (AIEM), which establishes an approximate envelope structure around irregular instances, providing shape-adaptive local point cloud sampling and thereby comprehensively covering the contextual environments of instances. Moreover, Granular3D incorporates a Hierarchical Dual-Stage Network (HDSN), which differentiates and processes features of instances and their pairs at varying scales, leading to a targeted prediction of instance categories and their relationships. To advance the perception of sub-scenes in HDSN, we design a Gather Point Transformer structure (GaPT) that enables the combinatorial interaction of local information from multiple point cloud sets, achieving a more comprehensive local contextual feature extraction. Extensive evaluations on the challenging 3DSSG benchmark demonstrate that our methods provide substantial improvements, establishing a new state of the art in 3DSSG prediction and boosting the top-50 triplet accuracy by +2.8%.
Pages: 12