Transformer networks with adaptive inference for scene graph generation

Cited: 1
Authors
Wang, Yini [1 ]
Gao, Yongbin [1 ]
Yu, Wenjun [1 ]
Guo, Ruyan [1 ]
Wan, Weibing [1 ]
Yang, Shuqun [1 ]
Huang, Bo [1 ]
Affiliations
[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene graph generation; Image-to-text translation; Visual relationship detection; Computer vision;
DOI
10.1007/s10489-022-04022-0
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Understanding a visual scene requires not only identifying individual objects in isolation but also inferring the relationships and interactions between object pairs. In this study, we propose a novel Transformer-based scene graph generation framework that converts image data into a linguistic description structured as the nodes and edges of a graph summarizing the given image. The proposed model consists of three components. First, we propose an enhanced object detection module with a bidirectional long short-term memory (Bi-LSTM) network for object-to-object information exchange, which generates classification probabilities for object bounding boxes and classes. Second, we introduce a novel context-capture module built from Transformer layers that outputs context-aware object categories as well as context-aware edge information for specific object pairs. Finally, because relationship frequencies follow a long-tailed distribution, an adaptive inference module with a dedicated feature fusion strategy is designed to soften this distribution and to reason adaptively about relationship classification based on the visual appearance of object pairs. We conducted detailed experiments on three popular open-source datasets, namely Visual Genome, OpenImages, and Visual Relationship Detection, and performed ablation studies on each module, demonstrating significant improvements under different settings and across various metrics.
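Below is a minimal, hypothetical sketch (in PyTorch) of the three-stage pipeline the abstract describes: a Bi-LSTM for object-to-object information exchange, Transformer encoder layers for context-aware object and edge features, and a relation classifier whose long-tailed frequency prior is softened before being fused with the visual prediction. All class names, dimensions, the temperature-based softening, and the freq_prior buffer are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class SceneGraphSketch(nn.Module):
    # Hypothetical sketch; the defaults of 151 object classes / 51 predicates follow
    # common Visual Genome splits and need not match the paper's configuration.
    def __init__(self, num_obj_classes=151, num_rel_classes=51, dim=256):
        super().__init__()
        # Stage 1: object-to-object information exchange over pooled detector features.
        self.obj_bilstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.obj_cls = nn.Linear(dim, num_obj_classes)
        # Stage 2: Transformer layers that add context to object and edge representations.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.context = nn.TransformerEncoder(layer, num_layers=2)
        # Stage 3: relation classifier over fused subject/object features, biased by a
        # relation-frequency prior (to be filled with training-set co-occurrence counts).
        self.rel_cls = nn.Linear(2 * dim, num_rel_classes)
        self.register_buffer(
            "freq_prior",
            torch.zeros(num_obj_classes, num_obj_classes, num_rel_classes),
        )

    def forward(self, roi_feats, pair_idx, temperature=2.0):
        # roi_feats: (B, N, dim) proposal features; pair_idx: (P, 2) subject/object indices.
        ctx, _ = self.obj_bilstm(roi_feats)                 # Bi-LSTM information exchange
        obj_logits = self.obj_cls(ctx)                      # refined object class scores
        ctx = self.context(ctx)                             # context-aware features
        subj, obj = ctx[:, pair_idx[:, 0]], ctx[:, pair_idx[:, 1]]
        rel_logits = self.rel_cls(torch.cat([subj, obj], dim=-1))
        # Soften the long-tailed frequency prior with a temperature before adding it.
        s_cls = obj_logits[:, pair_idx[:, 0]].argmax(-1)
        o_cls = obj_logits[:, pair_idx[:, 1]].argmax(-1)
        prior = torch.log_softmax(self.freq_prior[s_cls, o_cls] / temperature, dim=-1)
        return obj_logits, rel_logits + prior

A caller would populate freq_prior with (subject, object, predicate) statistics gathered from the training set; raising the temperature flattens that distribution, which is one plausible reading of the "softening" the abstract mentions.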
Pages: 9621-9633
Number of Pages: 13
Related Papers
50 in total
  • [21] Heterogeneous Learning for Scene Graph Generation. He, Yunqing; Ren, Tongwei; Tang, Jinhui; Wu, Gangshan. Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), 2022: 4704-4713.
  • [22] Scene Graph Generation With Hierarchical Context. Ren, Guanghui; Ren, Lejian; Liao, Yue; Liu, Si; Li, Bo; Han, Jizhong; Yan, Shuicheng. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(2): 909-915.
  • [23] RepSGG: Novel Representations of Entities and Relationships for Scene Graph Generation. Liu, Hengyue; Bhanu, Bir. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 8018-8035.
  • [24] Atom correlation based graph propagation for scene graph generation. Lin, Bingqian; Zhu, Yi; Liang, Xiaodan. Pattern Recognition, 2022, 122.
  • [25] Uncertainty-Aware Scene Graph Generation. Li, Xuewei; Wu, Tao; Zheng, Guangcong; Yu, Yunlong; Li, Xi. Pattern Recognition Letters, 2023, 167: 30-37.
  • [26] One-shot Scene Graph Generation. Guo, Yuyu; Song, Jingkuan; Gao, Lianli; Shen, Heng Tao. MM '20: Proceedings of the 28th ACM International Conference on Multimedia, 2020: 3090-3098.
  • [27] Neural Belief Propagation for Scene Graph Generation. Liu, Daqi; Bober, Miroslaw; Kittler, Josef. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(8): 10161-10172.
  • [28] Contextual Label Transformation for Scene Graph Generation. Lee, Wonhee; Kim, Sungeun; Kim, Gunhee. 2021 IEEE International Conference on Image Processing (ICIP), 2021: 2533-2537.
  • [29] Multimodal Context Embedding for Scene Graph Generation. Jung, Gayoung; Kim, Incheol. Journal of Information Processing Systems, 2020, 16(6): 1250-1260.
  • [30] Quaternion Relation Embedding for Scene Graph Generation. Wang, Zheng; Xu, Xing; Wang, Guoqing; Yang, Yang; Shen, Heng Tao. IEEE Transactions on Multimedia, 2023, 25: 8646-8656.