Granular3D: Delving into multi-granularity 3D scene graph prediction

Citations: 0
Authors
Huang, Kaixiang [1,2]
Yang, Jingru [1,2]
Wang, Jin [1,2,6]
He, Shengfeng [3]
Wang, Zhan [4]
He, Haiyan [1,2,5]
Zhang, Qifeng
Lu, Guodong [1,2]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Robot Inst, Hangzhou 310027, Zhejiang, Peoples R China
[3] Singapore Management Univ, Singapore 178903, Singapore
[4] Zhejiang Energy Digital Technol Co Ltd, Dept Artificial Intelligence & Robot, Hangzhou 310027, Zhejiang, Peoples R China
[5] Zhejiang Baima Lake Lab Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China
[6] Jinhua Key Lab Robot Intelligent Welding Technol, Jinhua 321000, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D point cloud; 3D semantic scene graph prediction; Multi-granularity; Gather point transformer; LANGUAGE
DOI
10.1016/j.patcog.2024.110562
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses key challenges in 3D Semantic Scene Graph (3DSSG) prediction, a task essential for understanding complex 3D environments. Traditional approaches, built primarily on PointNet and Graph Convolutional Networks, struggle to extract multi-grained features from intricate 3D scenes, largely because they focus on global scene processing and single-scale feature extraction. To overcome these limitations, we introduce Granular3D, a novel approach that shifts the focus toward multi-granularity analysis by predicting relation triplets from specific sub-scenes. A key component is the Adaptive Instance Enveloping Method (AIEM), which establishes an approximate envelope structure around irregular instances and provides shape-adaptive local point cloud sampling, thereby comprehensively covering the contextual environments of instances. Moreover, Granular3D incorporates a Hierarchical Dual-Stage Network (HDSN), which differentiates and processes features of instances and their pairs at varying scales, leading to targeted prediction of instance categories and their relationships. To advance the perception of sub-scenes in HDSN, we design a Gather Point Transformer structure (GaPT) that enables the combinatorial interaction of local information from multiple point cloud sets, achieving more comprehensive local contextual feature extraction. Extensive evaluations on the challenging 3DSSG benchmark demonstrate that our method provides substantial improvements, establishing a new state of the art in 3DSSG prediction and boosting the top-50 triplet accuracy by +2.8%.
Pages: 12
Related papers
50 in total
  • [1] Multi-granularity relationship reasoning network for high-fidelity 3D shape reconstruction
    Li, Lei
    Zhou, Zhiyuan
    Wu, Suping
    Li, Pan
    Zhang, Boyang
    PATTERN RECOGNITION, 2024, 155
  • [2] MULTI-GRANULARITY FEATURE INTERACTION AND RELATION REASONING FOR 3D DENSE ALIGNMENT AND FACE RECONSTRUCTION
    Li, Lei
    Li, Xiangzheng
    Wu, Kangbo
    Lin, Kui
    Wu, Suping
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4265 - 4269
  • [3] MLGPnet: Multi-granularity neural network for 3D shape recognition using pyramid data
    Li, Zekun
    Seah, Hock Soon
    Guo, Baolong
    Yang, Muli
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 239
  • [4] Multi-Granularity Sparse Relationship Matrix Prediction Network for End-to-End Scene Graph Generation
    Wang, Lei
    Yuan, Zejian
    Chen, Badong
    COMPUTER VISION-ECCV 2024, PT LXXXII, 2025, 15140 : 105 - 121
  • [5] Multi-granularity spatial temporal graph convolution network with consecutive attention for human motion prediction
    Ma, Jinli
    Zhang, Yumei
    Zhou, Hanghang
    Yang, Honghong
    Wu, Xiaojun
    APPLIED SOFT COMPUTING, 2024, 165
  • [6] Boosting the performance of molecular property prediction via graph-text alignment and multi-granularity representation enhancement
    Zhao, Zhuoran
    Zhou, Qing
    Wu, Chengkai
    Su, Renbin
    Xiong, Weihong
    JOURNAL OF MOLECULAR GRAPHICS & MODELLING, 2024, 132
  • [7] Multi-granularity PM2.5 concentration long sequence prediction model combined with spatial-temporal graph
    Zhang, Bo
    Qin, Hongsheng
    Zhang, Yuqi
    Li, Maozhen
    Qin, Dongming
    Guo, Xiaoyang
    Li, Meizi
    Guo, Chang
    ENVIRONMENTAL MODELLING & SOFTWARE, 2025, 188
  • [8] GNNGO3D: Protein Function Prediction Based on 3D Structure and Functional Hierarchy Learning
    Zhang, Liyuan
    Jiang, Yongquan
    Yang, Yan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (08) : 3867 - 3878
  • [9] SAM-Guided Graph Cut for 3D Instance Segmentation
    Guo, Haoyu
    Zhu, He
    Peng, Sida
    Wang, Yuang
    Shen, Yujun
    Hu, Ruizhen
    Zhou, Xiaowei
    COMPUTER VISION - ECCV 2024, PT XLVIII, 2025, 15106 : 234 - 251
  • [10] 3D Question Answering
    Ye, Shuquan
    Chen, Dongdong
    Han, Songfang
    Liao, Jing
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (03) : 1772 - 1786