Multi-Granularity Sparse Relationship Matrix Prediction Network for End-to-End Scene Graph Generation

Cited by: 0

Authors:

Wang, Lei [1]

Yuan, Zejian [1]

Chen, Badong [1]

Affiliations:

[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an 710049, People's Republic of China

Source:

COMPUTER VISION-ECCV 2024, PT LXXXII | 2025 / Vol. 15140

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Scene Graph Generation; End-to-End; Sparse Relationship Matrix; Multi-Granularity;

DOI:

10.1007/978-3-031-73007-8_7

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Current end-to-end Scene Graph Generation (SGG) relies solely on visual representations to separately detect sparse relations and entities in an image. This leads to the issue where the predictions of entities do not contribute to the prediction of relations, necessitating post-processing to assign corresponding subjects and objects to the predicted relations. In this paper, we introduce a sparse relationship matrix that bridges entity detection and relation detection. Our approach not only eliminates the need for relation matching, but also leverages the semantics and positional information of predicted entities to enhance relation prediction. Specifically, a multi-granularity sparse relationship matrix prediction network is proposed, which utilizes three gated pooling modules focusing on filtering negative samples at different granularities, thereby obtaining a sparse relationship matrix containing entity pairs most likely to form relations. Finally, a set of sparse, most probable subject-object pairs can be constructed and used for relation decoding. Experimental results on multiple datasets demonstrate that our method achieves a new state-of-the-art overall performance. Our code is available at https://github.com/wanglei0618/Mg-RMPN.
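The last step described in the abstract, building a sparse set of the most probable subject-object pairs from a relationship matrix, can be illustrated with a minimal sketch. This is not the authors' gated pooling network: it assumes the pairwise scores are already given and simply keeps the top-k off-diagonal entries. The function name `top_k_pairs`, the parameter `k`, and the toy score values are all hypothetical.

```python
from typing import List, Tuple


def top_k_pairs(scores: List[List[float]], k: int) -> List[Tuple[int, int, float]]:
    """Keep the k highest-scoring (subject, object) index pairs.

    `scores[s][o]` is an assumed, precomputed likelihood that entity `s`
    and entity `o` form a relation. Self-pairs (s == o) are skipped.
    """
    candidates = [
        (s, o, scores[s][o])
        for s in range(len(scores))
        for o in range(len(scores[s]))
        if s != o
    ]
    # Sort by score, highest first, and keep only the top k entries.
    candidates.sort(key=lambda t: t[2], reverse=True)
    return candidates[:k]


# Toy 3-entity score matrix (illustrative values only).
scores = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.7],
    [0.05, 0.3, 0.0],
]
pairs = top_k_pairs(scores, 2)
print(pairs)
```

Each retained tuple names a subject index and an object index directly, which is why a pair-based formulation of this kind needs no separate step to match predicted relations back to entities.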

Pages: 105-121

Page count: 17
