RepSGG: Novel Representations of Entities and Relationships for Scene Graph Generation

Cited by: 3
Authors
Liu, Hengyue [1]
Bhanu, Bir [1]
Affiliations
[1] Univ Calif Riverside, Dept Elect & Comp Engn, Riverside, CA 92521 USA
Funding
U.S. National Science Foundation;
Keywords
Feature extraction; Visualization; Semantics; Task analysis; Detectors; Shape; Training; Scene graph generation; visual relationship detection; long-tailed learning; human-object interaction;
DOI
10.1109/TPAMI.2024.3402143
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene Graph Generation (SGG) has achieved significant progress recently. However, most previous works rely heavily on fixed-size entity representations based on bounding box proposals, anchors, or learnable queries. As each representation's cardinality has different trade-offs between performance and computation overhead, extracting highly representative features efficiently and dynamically is both challenging and crucial for SGG. In this work, a novel architecture called RepSGG is proposed to address the aforementioned challenges, formulating a subject as queries, an object as keys, and their relationship as the maximum attention weight between pairwise queries and keys. With more fine-grained and flexible representation power for entities and relationships, RepSGG learns to sample semantically discriminative and representative points for relationship inference. Moreover, the long-tailed distribution also poses a significant challenge for generalization of SGG. A run-time performance-guided logit adjustment (PGLA) strategy is proposed such that the relationship logits are modified via affine transformations based on run-time performance during training. This strategy encourages a more balanced performance between dominant and rare classes. Experimental results show that RepSGG achieves the state-of-the-art or comparable performance on the Visual Genome and Open Images V6 datasets with fast inference speed, demonstrating the efficacy and efficiency of the proposed methods.
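The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the scaled dot-product attention, the per-query softmax over keys, the per-class recall input, and the names `relationship_score`, `adjust_logits`, `scale`, and `shift_strength` are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relationship_score(subject_queries, object_keys):
    """Score a subject-object pair as the maximum attention weight.

    Assumption: attention is scaled dot-product with a softmax over the
    object's keys for each subject query; the score is the largest
    weight found over all (query, key) pairs.
    """
    dim = len(object_keys[0])
    best = 0.0
    for q in subject_queries:
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in object_keys
        ]
        best = max(best, max(softmax(logits)))
    return best


def adjust_logits(logits, class_recall, scale=1.0, shift_strength=1.0):
    """Hypothetical performance-guided affine adjustment of logits.

    Assumption: each class logit z becomes scale * z + shift, where the
    shift grows as the class's running recall (in [0, 1]) drops, so that
    poorly performing classes are boosted during training.
    """
    return [
        scale * z + shift_strength * (1.0 - r)
        for z, r in zip(logits, class_recall)
    ]
```

In this sketch a subject with several query points and an object with several key points are related as strongly as their best-matching pair of points, and a class whose running recall is low receives a larger positive shift than a class that is already predicted well.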
Pages: 8018 - 8035
Page count: 18
Related papers
50 in total
  • [31] 3D Scene Graph Generation From Point Clouds
    Wei, Wenwen
    Wei, Ping
    Qin, Jialu
    Liao, Zhimin
    Wang, Shuaijie
    Cheng, Xiang
    Liu, Meiqin
    Zheng, Nanning
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 5358 - 5368
  • [32] Transformer networks with adaptive inference for scene graph generation
    Wang, Yini
    Gao, Yongbin
    Yu, Wenjun
    Guo, Ruyan
    Wan, Weibing
    Yang, Shuqun
    Huang, Bo
    APPLIED INTELLIGENCE, 2023, 53 (08) : 9621 - 9633
  • [33] BiFormer for Scene Graph Generation Based on VisionNet With Taylor Hiking Optimization Algorithm
    Monesh, S.
    Senthilkumar, N. C.
    IEEE ACCESS, 2025, 13 : 57207 - 57222
  • [35] SGG-MVAR: Cross-Modal Retrieval With Scene Graph Generation and Multiview Attribute Relationship Guidance
    Wang, Suping
    Zhou, Fei
    Yang, Ming
    Shi, Lei
    Tan, Chaohong
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2025,
  • [36] Gaussian Distribution-Aware Commonsense Knowledge Learning for Scene Graph Generation
    Tian, Hongshuo
    Xu, Ning
    Kankanhalli, Mohan
    Liu, An-An
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 13044 - 13057
  • [37] USGG: UNION MESSAGE BASED SCENE GRAPH GENERATION
    Sun, Shiqi
    Huang, Danlan
    Qin, Zhijin
    Tao, Xiaoming
    Pan, Chengkang
    Liu, Guangyi
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023 : 2575 - 2579
  • [38] Beware of Overcorrection: Scene-induced Commonsense Graph for Scene Graph Generation
    Chen, Lianggangxu
    Lu, Jiale
    Song, Youqi
    Wang, Changbo
    He, Gaoqi
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 2888 - 2897
  • [39] Dual-Branch Hybrid Learning Network for Unbiased Scene Graph Generation
    Zheng, Chaofan
    Gao, Lianli
    Lyu, Xinyu
    Zeng, Pengpeng
    El Saddik, Abdulmotaleb
    Shen, Heng Tao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (03) : 1743 - 1756
  • [40] Graph R-CNN for Scene Graph Generation
    Yang, Jianwei
    Lu, Jiasen
    Lee, Stefan
    Batra, Dhruv
    Parikh, Devi
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 690 - 706