Understanding a visual scene requires not only identifying individual objects in isolation but also inferring the relationships and interactions between object pairs. In this study, we propose a novel Transformer-based scene graph generation framework that converts image data into a structured linguistic description, represented as the nodes and edges of a graph, of the information in a given image. The proposed model consists of three components. First, we propose an enhanced object detection module that uses a bidirectional long short-term memory (Bi-LSTM) network for object-to-object information exchange and predicts bounding boxes and classification probabilities for objects. Second, we introduce a novel context-capture module built from Transformer layers that produces context-aware object category representations as well as context-aware edge representations for specific object pairs. Finally, because relationship frequencies follow a long-tailed distribution, we design an adaptive inference module with a dedicated feature fusion strategy that softens the predicted distribution and adaptively classifies relationships based on the visual appearance of each object pair. We conduct extensive experiments on three popular open-source datasets, namely Visual Genome, OpenImages, and Visual Relationship Detection, together with ablation studies on each module; the results demonstrate significant improvements across different settings and evaluation metrics.
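
To make the three-stage pipeline concrete, the following PyTorch sketch illustrates the data flow only: Bi-LSTM object-context exchange, Transformer-based context capture for pairwise edge features, and a softened predicate distribution. It is not the paper's implementation; all names (`SceneGraphPipeline`, `feat_dim`, `pair_idx`) are hypothetical, and temperature scaling stands in for the unspecified feature fusion and distribution-softening strategy.

```python
import torch
import torch.nn as nn

class SceneGraphPipeline(nn.Module):
    """Minimal sketch of the three components described in the abstract."""

    def __init__(self, feat_dim=256, num_classes=151, num_predicates=51):
        super().__init__()
        # (1) Object-to-object information exchange over detector RoI features
        #     via a bidirectional LSTM, followed by an object classifier.
        self.obj_context = nn.LSTM(feat_dim, feat_dim // 2,
                                   bidirectional=True, batch_first=True)
        self.obj_cls = nn.Linear(feat_dim, num_classes)
        # (2) Transformer layers that refine object features with global
        #     context and feed pairwise (subject, object) edge features.
        encoder_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                                   batch_first=True)
        self.context_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.edge_proj = nn.Linear(2 * feat_dim, feat_dim)
        # (3) Adaptive inference head over fused pair features.
        self.rel_cls = nn.Linear(feat_dim, num_predicates)

    def forward(self, roi_feats, pair_idx, temperature=1.5):
        # roi_feats: (B, N, D) pooled detector features for N object proposals
        # pair_idx:  (P, 2) indices of candidate (subject, object) pairs
        ctx, _ = self.obj_context(roi_feats)           # Bi-LSTM exchange
        obj_logits = self.obj_cls(ctx)                 # object class scores
        ctx = self.context_encoder(ctx)                # Transformer context
        subj = ctx[:, pair_idx[:, 0]]                  # subject features
        obj = ctx[:, pair_idx[:, 1]]                   # object features
        edge = self.edge_proj(torch.cat([subj, obj], dim=-1))
        # Dividing logits by a temperature > 1 softens the predicate
        # distribution, a simple proxy for the paper's strategy.
        rel_logits = self.rel_cls(edge) / temperature
        return obj_logits, rel_logits


# Usage with random inputs: 4 proposals, 3 candidate pairs.
model = SceneGraphPipeline()
feats = torch.randn(1, 4, 256)
pairs = torch.tensor([[0, 1], [1, 2], [2, 3]])
obj_logits, rel_logits = model(feats, pairs)
print(obj_logits.shape, rel_logits.shape)  # (1, 4, 151), (1, 3, 51)
```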