Video Scene Graph Generation with Spatial-Temporal Knowledge

Authors
Pu, Tao [1]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Video Understanding; Vision and Language; Dynamic Scene Graph Generation
DOI
10.1145/3581783.3613433
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Various video understanding tasks have been extensively explored in the multimedia community, among which the video scene graph generation (VidSGG) task is more challenging since it requires identifying objects in comprehensive scenes and deducing their relationships. Existing methods for this task generally aggregate object-level visual information from both spatial and temporal perspectives to better learn powerful relationship representations. However, these leading techniques merely implicitly model the spatial-temporal context, which may lead to ambiguous predicate predictions when visual relations vary frequently. In this work, I propose incorporating spatial-temporal knowledge into relation representation learning to effectively constrain the spatial prediction space within each image and sequential variation across temporal frames. To this end, I design a novel spatial-temporal knowledge-embedded transformer (STKET) that incorporates the prior spatial-temporal knowledge into the multi-head cross-attention mechanism to learn more representative relationship representations. Extensive experiments conducted on Action Genome demonstrate the effectiveness of the proposed STKET.
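As a rough illustration of the cross-attention idea named in the abstract, the sketch below lets relation features (queries) attend over a set of prior-knowledge embeddings (keys and values). It is a minimal single-head version with no learned projections, not the STKET architecture itself; the function names, the toy vectors, and the choice of using the knowledge embeddings as both keys and values are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, knowledge):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries:   list of relation feature vectors, each of length d
    knowledge: list of prior-knowledge embedding vectors, each of length d
               (used as both keys and values; an assumption of this sketch)
    Returns (outputs, weights): one fused vector and one weight row per query.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this relation feature to every knowledge embedding.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in knowledge
        ]
        w = softmax(scores)
        # Attention-weighted sum of the knowledge embeddings.
        fused = [
            sum(w[j] * knowledge[j][i] for j in range(len(knowledge)))
            for i in range(d)
        ]
        outputs.append(fused)
        weights.append(w)
    return outputs, weights


# Hypothetical toy inputs: two relation features, three knowledge embeddings.
relation_features = [[1.0, 0.0], [0.0, 1.0]]
knowledge_embeddings = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, attn = cross_attention(relation_features, knowledge_embeddings)
```

Each row of `attn` is a probability distribution over the knowledge embeddings, so a relation feature draws most heavily on the prior it aligns with best. A multi-head variant would repeat this with separate learned projections per head and concatenate the results.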
Pages: 9340-9344
Page count: 5