Granular3D: Delving into multi-granularity 3D scene graph prediction

Cited by: 0
Authors
Huang, Kaixiang [1 ,2 ]
Yang, Jingru [1 ,2 ]
Wang, Jin [1 ,2 ,6 ]
He, Shengfeng [3 ]
Wang, Zhan [4 ]
He, Haiyan [1 ,2 ,5 ]
Zhang, Qifeng
Lu, Guodong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Robot Inst, Hangzhou 310027, Zhejiang, Peoples R China
[3] Singapore Management Univ, Singapore 178903, Singapore
[4] Zhejiang Energy Digital Technol Co Ltd, Dept Artificial Intelligence & Robot, Hangzhou 310027, Zhejiang, Peoples R China
[5] Zhejiang Baima Lake Lab Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China
[6] Jinhua Key Lab Robot Intelligent Welding Technol, Jinhua 321000, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D point cloud; 3D semantic scene graph prediction; Multi-granularity; Gather point transformer; LANGUAGE;
DOI
10.1016/j.patcog.2024.110562
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the significant challenges in 3D Semantic Scene Graph (3DSSG) prediction, which is essential for understanding complex 3D environments. Traditional approaches, primarily based on PointNet and Graph Convolutional Networks, struggle to extract multi-grained features from intricate 3D scenes, largely because they focus on global scene processing and single-scale feature extraction. To overcome these limitations, we introduce Granular3D, a novel approach that shifts the focus towards multi-granularity analysis by predicting relation triplets from specific sub-scenes. One key component is the Adaptive Instance Enveloping Method (AIEM), which establishes an approximate envelope structure around irregular instances and provides shape-adaptive local point cloud sampling, thereby comprehensively covering the contextual environments of instances. Moreover, Granular3D incorporates a Hierarchical Dual-Stage Network (HDSN), which differentiates and processes features of instances and their pairs at varying scales, leading to a targeted prediction of instance categories and their relationships. To improve sub-scene perception in HDSN, we design a Gather Point Transformer structure (GaPT) that enables the combinatorial interaction of local information from multiple point cloud sets, achieving more comprehensive local contextual feature extraction. Extensive evaluations on the challenging 3DSSG benchmark demonstrate that our methods provide substantial improvements, establishing a new state of the art in 3DSSG prediction and raising the top-50 triplet accuracy by 2.8%.
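The idea of shape-adaptive local sampling around an instance can be illustrated with a minimal sketch. This is not the paper's AIEM, whose details are not given in the abstract; it only assumes that an "envelope" can be approximated by keeping every scene point that lies within a fixed margin of some instance point. The function name `envelope_sample`, the `margin` parameter, and the toy coordinates are all hypothetical.

```python
import numpy as np

def envelope_sample(points, instance_mask, margin):
    """Return a boolean mask over `points` (N, 3) marking every point that
    lies within `margin` of at least one instance point.

    Hypothetical stand-in for an instance envelope: the selected region
    follows the instance's shape instead of a fixed sphere or box.
    """
    instance_points = points[instance_mask]
    if instance_points.shape[0] == 0:
        # No instance points: nothing can be inside the envelope.
        return np.zeros(points.shape[0], dtype=bool)
    # Brute-force pairwise distances, shape (N, M); fine for small examples.
    diffs = points[:, None, :] - instance_points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return dists.min(axis=1) <= margin

# Toy scene: two instance points, one nearby context point, one far point.
scene = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.2, 0.0, 0.0],
    [5.0, 0.0, 0.0],
])
is_instance = np.array([True, True, False, False])
env = envelope_sample(scene, is_instance, margin=0.5)
```

Here `env` selects the two instance points plus the nearby context point and leaves out the distant one. A real pipeline would likely replace the brute-force distance matrix with a spatial index for large point clouds.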
Pages: 12