Attention redirection transformer with semantic oriented learning for unbiased scene graph generation

Cited by: 0
Authors
Zhang, Ruonan [1, 2]
An, Gaoyun [1]
Cen, Yigang [1, 2]
Ruan, Qiuqi [1, 2]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing, Peoples R China
[2] Beijing Key Lab Adv Informat Sci & Network Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene graph generation; Transformer; Attention redirection; Translation embedding; Scene understanding
DOI
10.1016/j.patcog.2024.111039
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene Graph Generation (SGG) plays an important role in scene understanding because all of the objects and relations in an image can be abstracted into a concise topological graph. Due to the complexity of visual scenes, including mutual occlusion between objects and semantic ambiguity, SGG remains a challenging task. Most existing models focus only on the context of a single object and ignore the context provided by paired objects. In this paper, we propose an Attention Redirection Transformer (ART) that specifically extracts pair-level contexts and is divided into an attention distraction stage and an attention integration stage. In this way, the model's attention is forced to be redirected so that it explores the implicit information in the background. In addition, to incorporate the semantic information of predicates, a Semantic Oriented Learning Module (SOL) is designed, which may help obtain better textual semantics and also promotes cross-modal information fusion. Finally, a self-diversity driven Dual Translation Embedding Module (DTM) is designed, which refines the representations of subject and object and makes them distinct. Experimental results on the Visual Genome dataset demonstrate the effectiveness of the proposed method. Moreover, our method outperforms state-of-the-art methods on the mR@K metric. The source code is released on GitHub: https://github.com/Nora-Zhang98/ART-SOL.
Pages: 12