SSAN: Separable Self-Attention Network for Video Representation Learning

Cited by: 20
Authors
Guo, Xudong [1 ,3 ]
Guo, Xun [2 ]
Lu, Yan [2 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] MSRA, Beijing, Peoples R China
Keywords
DOI
10.1109/CVPR46437.2021.01243
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-attention has been successfully applied to video representation learning due to the effectiveness of modeling long-range dependencies. Existing approaches build the dependencies merely by computing the pairwise correlations along spatial and temporal dimensions simultaneously. However, spatial correlations and temporal correlations represent different contextual information of scenes and temporal reasoning. Intuitively, learning spatial contextual information first will benefit temporal modeling. In this paper, we propose a separable self-attention (SSA) module, which models spatial and temporal correlations sequentially so that spatial contexts can be efficiently used in temporal modeling. By adding the SSA module into a 2D CNN, we build an SSA network (SSAN) for video representation learning. On the task of video action recognition, our approach outperforms state-of-the-art methods on the Something-Something and Kinetics-400 datasets. Our models often outperform counterparts with shallower networks and fewer modalities. We further verify the semantic learning ability of our method in the visual-language task of video retrieval, which showcases the homogeneity of video representations and text embeddings. On the MSR-VTT and YouCook2 datasets, video representations learned by SSA significantly improve the state-of-the-art performance.
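The abstract's core mechanism is a factorization of attention: spatial self-attention within each frame first, then temporal self-attention across frames at each spatial position. The PyTorch sketch below is a minimal illustration of that idea based only on the abstract; the class name SeparableSelfAttention, the use of nn.MultiheadAttention, and the (batch, time, positions, channels) tensor layout are our assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class SeparableSelfAttention(nn.Module):
    """Hypothetical sketch: spatial attention inside each frame, then
    temporal attention across frames at each spatial position."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # One attention block per factorized dimension.
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, positions, channels); positions = H*W of a feature map.
        b, t, s, c = x.shape

        # Step 1: spatial self-attention within each frame, so spatial
        # context is built before any temporal reasoning happens.
        xs = x.reshape(b * t, s, c)
        xs = xs + self.spatial_attn(xs, xs, xs, need_weights=False)[0]

        # Step 2: temporal self-attention at each spatial position,
        # now operating on spatially contextualized features.
        xt = xs.reshape(b, t, s, c).permute(0, 2, 1, 3).reshape(b * s, t, c)
        xt = xt + self.temporal_attn(xt, xt, xt, need_weights=False)[0]

        # Restore the (batch, time, positions, channels) layout.
        return xt.reshape(b, s, t, c).permute(0, 2, 1, 3)


if __name__ == "__main__":
    # Toy clip: batch of 2, 8 frames, a 7x7 feature map (49 positions), 64 channels.
    video = torch.randn(2, 8, 49, 64)
    out = SeparableSelfAttention(dim=64)(video)
    print(out.shape)  # torch.Size([2, 8, 49, 64])

Sequential ordering is the point of the design: the temporal block sees features that already encode per-frame scene context, rather than mixing spatial and temporal pairwise correlations in a single joint attention.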
Pages: 12613-12622
Page count: 10
Related Papers
50 in total
  • [1] Self-attention with Functional Time Representation Learning
    Xu, Da
    Ruan, Chuanwei
    Kumar, Sushant
    Korpeoglu, Evren
    Achan, Kannan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [2] LEARNING HIERARCHICAL SELF-ATTENTION FOR VIDEO SUMMARIZATION
    Liu, Yen-Ting
    Li, Yu-Jhe
    Yang, Fu-En
    Chen, Shang-Fu
    Wang, Yu-Chiang Frank
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3377 - 3381
  • [3] Attribute Network Representation Learning Based on Generative Adversarial Network and Self-attention Mechanism
    Li, Shanshan
    Tang, Meiling
    Dong, Yingnan
    International Journal of Network Security, 2024, 26 (01) : 51 - 58
  • [4] VStreamDRLS: Dynamic Graph Representation Learning with Self-Attention for Enterprise Distributed Video Streaming Solutions
    Antaris, Stefanos
    Rafailidis, Dimitrios
    2020 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2020, : 486 - 493
  • [5] Progressively Normalized Self-Attention Network for Video Polyp Segmentation
    Ji, Ge-Peng
    Chou, Yu-Cheng
    Fan, Deng-Ping
    Chen, Geng
    Fu, Huazhu
    Jha, Debesh
    Shao, Ling
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT I, 2021, 12901 : 142 - 152
  • [6] Subgraph representation learning with self-attention and free adversarial training
    Qin, Denggao
    Tang, Xianghong
    Lu, Jianguang
    APPLIED INTELLIGENCE, 2024, : 7012 - 7029
  • [7] SAQENet: A Quality Enhancement Network for Compressed Video with Self-attention
    Sun, Xuan
    Liu, Pengyu
    Jia, Kebin
    Chen, Shanji
    DCC 2022: 2022 DATA COMPRESSION CONFERENCE (DCC), 2022, : 485 - 485
  • [8] EGAD: Evolving Graph Representation Learning with Self-Attention and Knowledge Distillation for Live Video Streaming Events
    Antaris, Stefanos
    Rafailidis, Dimitrios
    Girdzijauskas, Sarunas
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 1455 - 1464
  • [9] Self-attention driven adversarial similarity learning network
    Gao, Xinjian
    Zhang, Zhao
    Mu, Tingting
    Zhang, Xudong
    Cui, Chaoran
    Wang, Meng
    PATTERN RECOGNITION, 2020, 105
  • [10] Self-Attention Based Video Summarization
    Li Y.
    Wang J.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2020, 32 (04): : 652 - 659