Transformer networks with adaptive inference for scene graph generation

Cited by: 1
Authors
Wang, Yini [1]
Gao, Yongbin [1]
Yu, Wenjun [1]
Guo, Ruyan [1]
Wan, Weibing [1]
Yang, Shuqun [1]
Huang, Bo [1]
Affiliations
[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene graph generation; Image-to-text translation; Visual relationship detection; Computer vision
DOI
10.1007/s10489-022-04022-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Understanding a visual scene requires not only identifying single objects in isolation but also inferring the relationships and interactions between object pairs. In this study, we propose a novel scene graph generation framework based on Transformer to convert image data into linguistic descriptions characterized as nodes and edges of a graph describing the information of the given image. The proposed model consists of three components. First, we propose an enhanced object detection module with bidirectional long short-term memory (Bi-LSTM) for object-to-object information exchange to generate the classification probabilities for object bounding boxes and classes. Second, we introduce a novel context information capture module containing Transformer layers that outputs object categories containing object context as well as edge information for specific object pairs with context. Finally, since the relationship frequencies follow a long-tailed distribution, an adaptive inference module with a special feature fusion strategy is designed to soften the distribution and perform adaptive reasoning about relationship classification based on the visual appearance of object pairs. We have conducted detailed experiments on three popular open-source datasets, namely, Visual Genome, OpenImages, and Visual Relationship Detection, and have performed ablation experiments on each module, demonstrating significant improvements under different settings and in terms of various metrics.
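The abstract describes the model's output as a graph whose nodes are detected objects and whose edges are relationships between object pairs. The following is a minimal sketch of such a structure, not the authors' implementation: the names `SceneObject`, `SceneGraph`, `add_object`, `add_relation`, `triples`, and `build_example`, as well as the labels and box coordinates, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """A graph node: an object class label plus its bounding box (hypothetical layout)."""
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class SceneGraph:
    """Objects as nodes; directed, labeled relationships as edges."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (subject_index, predicate, object_index)

    def add_object(self, label, box):
        """Append a node and return its index."""
        self.nodes.append(SceneObject(label, box))
        return len(self.nodes) - 1

    def add_relation(self, subj, predicate, obj):
        """Add a directed edge between two existing nodes."""
        if not (0 <= subj < len(self.nodes) and 0 <= obj < len(self.nodes)):
            raise IndexError("relation refers to an unknown node")
        self.edges.append((subj, predicate, obj))

    def triples(self):
        """Return the graph as (subject, predicate, object) label triples."""
        return [(self.nodes[s].label, p, self.nodes[o].label)
                for s, p, o in self.edges]


def build_example():
    """Build a toy graph with made-up objects and relationships."""
    graph = SceneGraph()
    man = graph.add_object("man", (10, 20, 110, 300))
    horse = graph.add_object("horse", (60, 120, 320, 340))
    hat = graph.add_object("hat", (35, 5, 85, 40))
    graph.add_relation(man, "riding", horse)
    graph.add_relation(man, "wearing", hat)
    return graph


if __name__ == "__main__":
    for triple in build_example().triples():
        print(triple)
```

Storing edges as index triples keeps object detection (nodes) separate from relationship classification (edges), which mirrors the division of labor among the modules named in the abstract.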
Pages: 9621-9633
Page count: 13
Related papers
50 in total
  • [31] Consistent Scene Graph Generation by Constraint Optimization
    Chen, Boqi
    Marussy, Kristof
    Pilarski, Sebastian
    Semerath, Oszkar
    Varro, Daniel
    PROCEEDINGS OF THE 37TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2022, 2022,
  • [32] Constrained Structure Learning for Scene Graph Generation
    Liu, Daqi
    Bober, Miroslaw
    Kittler, Josef
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 11588 - 11599
  • [33] Toward a Unified Transformer-Based Framework for Scene Graph Generation and Human-Object Interaction Detection
    He, Tao
    Gao, Lianli
    Song, Jingkuan
    Li, Yuan-Fang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 6274 - 6288
  • [34] Predicate Correlation Learning for Scene Graph Generation
    Tao, Leitian
    Mi, Li
    Li, Nannan
    Cheng, Xianhang
    Hu, Yaosi
    Chen, Zhenzhong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4173 - 4185
  • [35] Informative Scene Graph Generation via Debiasing
    Gao, Lianli
    Lyu, Xinyu
    Guo, Yuyu
    Hu, Yuxuan
    Li, Yuan-Fang
    Xu, Lu
    Shen, Heng Tao
    Song, Jingkuan
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, : 4196 - 4219
  • [36] Knowledge-Based Scene Graph Generation with Visual Contextual Dependency
    Zhang, Lizong
    Yin, Haojun
    Hui, Bei
    Liu, Sijuan
    Zhang, Wei
    MATHEMATICS, 2022, 10 (14)
  • [37] Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
    Hung, Zih-Siou
    Mallya, Arun
    Lazebnik, Svetlana
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (11) : 3820 - 3832
  • [38] Tackling the Challenges in Scene Graph Generation With Local-to-Global Interactions
    Woo, Sangmin
    Noh, Junhyug
    Kim, Kangil
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 9713 - 9726
  • [39] A unified deep sparse graph attention network for scene graph generation
    Zhou, Hao
    Yang, Yazhou
    Luo, Tingjin
    Zhang, Jun
    Li, Shuohao
    PATTERN RECOGNITION, 2022, 123
  • [40] Relation Detection with Transformers for Panoptic Scene Graph Generation
    Liu, Chang
    Yan, Wenchao
    Chen, Shilin
    Huang, Liqun
    Huang, Xiaotao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT IV, 2025, 15034 : 223 - 238