Granular3D: Delving into multi-granularity 3D scene graph prediction

Citations: 0
Authors
Huang, Kaixiang [1, 2]
Yang, Jingru [1, 2]
Wang, Jin [1, 2, 6]
He, Shengfeng [3]
Wang, Zhan [4]
He, Haiyan [1, 2, 5]
Zhang, Qifeng
Lu, Guodong [1, 2]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Robot Inst, Hangzhou 310027, Zhejiang, Peoples R China
[3] Singapore Management Univ, Singapore 178903, Singapore
[4] Zhejiang Energy Digital Technol Co Ltd, Dept Artificial Intelligence & Robot, Hangzhou 310027, Zhejiang, Peoples R China
[5] Zhejiang Baima Lake Lab Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China
[6] Jinhua Key Lab Robot Intelligent Welding Technol, Jinhua 321000, Zhejiang, Peoples R China
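Funding
National Natural Science Foundation of China
Keywords
3D point cloud; 3D semantic scene graph prediction; Multi-granularity; Gather point transformer; LANGUAGE
DOI
10.1016/j.patcog.2024.110562
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract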
This paper addresses the significant challenges in 3D Semantic Scene Graph (3DSSG) prediction, which is essential for understanding complex 3D environments. Traditional approaches, primarily using PointNet and Graph Convolutional Networks, struggle to extract multi-grained features from intricate 3D scenes, largely because they focus on global scene processing and single-scale feature extraction. To overcome these limitations, we introduce Granular3D, a novel approach that shifts the focus towards multi-granularity analysis by predicting relation triplets from specific sub-scenes. One key component is the Adaptive Instance Enveloping Method (AIEM), which establishes an approximate envelope structure around irregular instances, providing shape-adaptive local point cloud sampling and thereby comprehensively covering the contextual environments of instances. Moreover, Granular3D incorporates a Hierarchical Dual-Stage Network (HDSN), which differentiates and processes features of instances and their pairs at varying scales, leading to a targeted prediction of instance categories and their relationships. To advance the perception of sub-scenes in HDSN, we design a Gather Point Transformer structure (GaPT) that enables the combinatorial interaction of local information from multiple point cloud sets, achieving a more comprehensive local contextual feature extraction. Extensive evaluations on the challenging 3DSSG benchmark demonstrate that our methods provide substantial improvements, establishing a new state of the art in 3DSSG prediction and boosting the top-50 triplet accuracy by +2.8%.
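The abstract reports results as top-50 triplet accuracy but does not spell out the evaluation protocol. The sketch below shows one common way such a metric is computed, and it is an assumption rather than the authors' code: each candidate (subject, predicate, object) triplet for an edge is scored as the product of the three class probabilities, and the edge counts as a hit if the ground-truth triplet is among the k highest-scoring candidates. The function name `topk_triplet_accuracy` and its argument layout are hypothetical.

```python
import numpy as np


def topk_triplet_accuracy(subj_prob, pred_prob, obj_prob, gt_triplets, k=50):
    """Fraction of edges whose ground-truth triplet is ranked in the top k.

    Assumed layout (not taken from the paper):
      subj_prob, obj_prob: (E, C) class probabilities of each edge's endpoints
      pred_prob:           (E, P) predicate probabilities of each edge
      gt_triplets:         (E, 3) integer labels (subject, predicate, object)
    """
    hits = 0
    for s, p, o, (gs, gp, go) in zip(subj_prob, pred_prob, obj_prob, gt_triplets):
        # Score every candidate triplet as the product of its three probabilities.
        scores = s[:, None, None] * p[None, :, None] * o[None, None, :]
        flat = scores.ravel()
        kk = min(k, flat.size)
        # Indices of the kk largest scores; order inside the top set is irrelevant.
        top = np.argpartition(-flat, kk - 1)[:kk]
        # Position of the ground-truth triplet in the flattened score array.
        gt_index = np.ravel_multi_index((gs, gp, go), scores.shape)
        hits += int(gt_index in top)
    return hits / len(gt_triplets)


if __name__ == "__main__":
    # Toy data: 2 edges, 2 object classes, 2 predicates.
    s = np.array([[0.9, 0.1], [0.6, 0.4]])
    p = np.array([[0.8, 0.2], [0.3, 0.7]])
    o = np.array([[0.7, 0.3], [0.2, 0.8]])
    gt = np.array([[0, 0, 0], [1, 0, 0]])
    print(topk_triplet_accuracy(s, p, o, gt, k=1))
```

In the toy data the first edge's ground-truth triplet is its highest-scoring candidate, while the second edge's is its lowest-scoring one, so the metric only reaches 1.0 once k covers all eight candidates.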
Pages: 12