Spatial-Temporal Transformer for Dynamic Scene Graph Generation

Cited by: 67
Authors
Cong, Yuren [1]
Liao, Wentong [1]
Ackermann, Hanno [1]
Rosenhahn, Bodo [1]
Yang, Michael Ying [2]
Affiliations
[1] Leibniz Univ Hannover, TNT, Hannover, Germany
[2] Univ Twente, SUG, Enschede, Netherlands
Keywords
LANGUAGE;
DOI
10.1109/ICCV48922.2021.01606
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dynamic scene graph generation aims to generate a scene graph for a given video. Compared with scene graph generation from images, it is more challenging because of the dynamic relationships between objects and the temporal dependencies between frames, which allow for a richer semantic interpretation. In this paper, we propose the Spatial-Temporal Transformer (STTran), a neural network that consists of two core modules: (1) a spatial encoder that takes an input frame, extracts spatial context, and reasons about the visual relationships within that frame, and (2) a temporal decoder that takes the output of the spatial encoder as input in order to capture the temporal dependencies between frames and infer the dynamic relationships. Furthermore, STTran can take videos of varying lengths as input without clipping, which is especially important for long videos. Our method is validated on the benchmark dataset Action Genome (AG). The experimental results demonstrate the superior performance of our method in generating dynamic scene graphs. Moreover, a set of ablation studies is conducted and the effect of each proposed module is justified. Code is available at: https://github.com/yrcong/STTran.
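The two-module design described in the abstract (a per-frame spatial encoder followed by a temporal decoder over frames) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation; the linked repository holds that. Assumptions made here for illustration only: each frame is represented as a list of relationship feature vectors; attention is single-head with identity query/key/value projections and no learned weights, layer norms, feed-forward layers, or frame encodings; and the temporal decoder uses a hypothetical causal sliding window in which each frame's tokens attend to the tokens of up to `window - 1` preceding frames. The names `spatial_encoder`, `temporal_decoder`, `sttran_sketch`, and `make_toy_video` are hypothetical.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: queries, keys and values are the tokens themselves
    (identity projections, no learned weights).
    """
    if not tokens:
        return []
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def spatial_encoder(frame_tokens):
    # Relationship tokens of ONE frame attend only to each other (spatial context).
    return self_attention(frame_tokens)


def temporal_decoder(encoded_frames, window=2):
    # Hypothetical causal sliding window: frame t's tokens attend to all tokens
    # of frames t-window+1 .. t, so any number of frames works without clipping.
    outputs = []
    for t, frame in enumerate(encoded_frames):
        start = max(0, t - window + 1)
        context = [tok for f in encoded_frames[start:t + 1] for tok in f]
        mixed = self_attention(context)
        # Frame t's tokens come last in `context`; keep only their outputs.
        outputs.append(mixed[len(mixed) - len(frame):])
    return outputs


def sttran_sketch(video, window=2):
    # Spatial encoder on each frame, then the temporal decoder over the sequence.
    return temporal_decoder([spatial_encoder(f) for f in video], window)


def make_toy_video(num_frames, dim=4, seed=0):
    # Random stand-in features: each frame has 1-3 relationship tokens of size `dim`.
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rng.randint(1, 3))]
        for _ in range(num_frames)
    ]


video = make_toy_video(num_frames=7)
relations = sttran_sketch(video)
print([len(frame) for frame in relations])
```

Because the window slides over frames instead of requiring a fixed-size clip, the same function accepts a 7-frame or a 60-frame input, and the attention cost per frame depends on the window size rather than on the total video length.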
Pages: 16352-16362
Number of pages: 11
Related papers
50 in total
  • [1] Video Scene Graph Generation with Spatial-Temporal Knowledge
    Pu, Tao
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 9340 - 9344
  • [2] Spatial-temporal Graph Transformer Network for Spatial-temporal Forecasting
    Dao, Minh-Son
    Zetsu, Koji
    Hoang, Duy-Tang
    Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024, 2024, : 1276 - 1281
  • [3] DyAdapTransformer: Dynamic Adaptive Spatial-Temporal Graph Transformer for Traffic Prediction
    Dong, Hui
    Pan, Xiao
    Chen, Xiao
    Sun, Jing
    Wang, Shuhai
    SPATIAL DATA AND INTELLIGENCE, SPATIALDI 2024, 2024, 14619 : 228 - 241
  • [4] Concurrent Transformer for Spatial-Temporal Graph Modeling
    Xie, Yi
    Xiong, Yun
    Zhu, Yangyong
    Yu, Philip S.
    Jin, Cheng
    Wang, Qiang
    Li, Haihong
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III, 2022, : 314 - 321
  • [5] Spatial–Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation
    Pu, Tao
    Chen, Tianshui
    Wu, Hefeng
    Lu, Yongyi
    Lin, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 556 - 568
  • [6] STGGAN: Spatial-temporal Graph Generation
    Zhang, Liming
    27TH ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2019), 2019, : 608 - 609
  • [7] A spatial-temporal graph gated transformer for traffic forecasting
    Bouchemoukha, Haroun
    Zennir, Mohamed Nadjib
    Alioua, Ahmed
    TRANSACTIONS ON EMERGING TELECOMMUNICATIONS TECHNOLOGIES, 2024, 35 (07):
  • [8] Graph Spatial-Temporal Transformer Network for Traffic Prediction
    Zhao, Zhenzhen
    Shen, Guojiang
    Wang, Lei
    Kong, Xiangjie
    BIG DATA RESEARCH, 2024, 36
  • [9] TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking
    Chu, Peng
    Wang, Jiang
    You, Quanzeng
    Ling, Haibin
    Liu, Zicheng
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4859 - 4869
  • [10] Spatial-Temporal Graph Sandwich Transformer for Traffic Flow Forecasting
    Fan, Yujie
    Yeh, Chin-Chia Michael
    Chen, Huiyuan
    Wang, Liang
    Zhuang, Zhongfang
    Wang, Junpeng
    Dai, Xin
    Zheng, Yan
    Zhang, Wei
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VII, 2023, 14175 : 210 - 225