Multi-Granularity Sparse Relationship Matrix Prediction Network for End-to-End Scene Graph Generation

Cited by: 0
Authors
Wang, Lei [1]
Yuan, Zejian [1]
Chen, Badong [1]
Affiliations
[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an 710049, People's Republic of China
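Source
COMPUTER VISION-ECCV 2024, PT LXXXII | 2025, Vol. 15140
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Scene Graph Generation; End-to-End; Sparse Relationship Matrix; Multi-Granularity
DOI
10.1007/978-3-031-73007-8_7
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract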
Current end-to-end Scene Graph Generation (SGG) relies solely on visual representations to separately detect sparse relations and entities in an image. This leads to the issue where the predictions of entities do not contribute to the prediction of relations, necessitating post-processing to assign corresponding subjects and objects to the predicted relations. In this paper, we introduce a sparse relationship matrix that bridges entity detection and relation detection. Our approach not only eliminates the need for relation matching, but also leverages the semantics and positional information of predicted entities to enhance relation prediction. Specifically, a multi-granularity sparse relationship matrix prediction network is proposed, which utilizes three gated pooling modules focusing on filtering negative samples at different granularities, thereby obtaining a sparse relationship matrix containing entity pairs most likely to form relations. Finally, a set of sparse, most probable subject-object pairs can be constructed and used for relation decoding. Experimental results on multiple datasets demonstrate that our method achieves a new state-of-the-art overall performance. Our code is available at https://github.com/wanglei0618/Mg-RMPN.
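The final step described in the abstract, building a small set of the most probable subject-object pairs from a relationship matrix, can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): the function name `top_k_pairs`, the toy score matrix, and the use of a plain top-k selection in place of the paper's gated pooling modules are all assumptions made for illustration.

```python
import heapq
from typing import List, Tuple


def top_k_pairs(scores: List[List[float]], k: int) -> List[Tuple[int, int, float]]:
    """Return the k highest-scoring (subject, object, score) triples.

    `scores[s][o]` is a hypothetical likelihood that entity `s` (subject)
    forms a relation with entity `o` (object). Self-pairs are skipped.
    """
    n = len(scores)
    candidates = (
        (scores[s][o], s, o)
        for s in range(n)
        for o in range(n)
        if s != o  # an entity is not paired with itself
    )
    best = heapq.nlargest(k, candidates)
    return [(s, o, score) for score, s, o in best]


# Toy 3-entity relationship matrix (values are made up for illustration).
scores = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.7],
    [0.05, 0.3, 0.0],
]
pairs = top_k_pairs(scores, k=2)
print(pairs)  # [(0, 1, 0.9), (1, 2, 0.7)]
```

Only the selected pairs would then be passed to a relation decoder, so the number of candidates depends on k rather than on the square of the number of detected entities.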
Pages: 105-121
Page count: 17