SSAN: Separable Self-Attention Network for Video Representation Learning

Cited by: 20
Authors
Guo, Xudong [1 ,3 ]
Guo, Xun [2 ]
Lu, Yan [2 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] MSRA, Beijing, Peoples R China
Keywords
DOI
10.1109/CVPR46437.2021.01243
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Self-attention has been successfully applied to video representation learning due to its effectiveness in modeling long-range dependencies. Existing approaches build the dependencies merely by computing pairwise correlations along the spatial and temporal dimensions simultaneously. However, spatial correlations and temporal correlations represent different kinds of contextual information: scene context and temporal reasoning, respectively. Intuitively, learning spatial contextual information first will benefit temporal modeling. In this paper, we propose a separable self-attention (SSA) module, which models spatial and temporal correlations sequentially so that spatial contexts can be efficiently used in temporal modeling. By adding the SSA module into a 2D CNN, we build an SSA network (SSAN) for video representation learning. On the task of video action recognition, our approach outperforms state-of-the-art methods on the Something-Something and Kinetics-400 datasets. Our models often outperform counterparts with a shallower network and fewer modalities. We further verify the semantic learning ability of our method on the visual-language task of video retrieval, which showcases the homogeneity of video representations and text embeddings. On the MSR-VTT and YouCook2 datasets, video representations learned by SSA significantly improve the state-of-the-art performance.
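The core idea described in the abstract, attending over spatial positions within each frame first, then over frames at each spatial position, can be sketched as follows. This is a minimal NumPy illustration of the separable (spatial-then-temporal) attention factorization, not the paper's actual SSA module: the function names, the identity query/key/value projections, and the residual connections are simplifying assumptions for brevity.

```python
import numpy as np

def softmax(a, axis=-1):
    # numerically stable softmax along the given axis
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over the second-to-last axis of k/v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def separable_self_attention(x):
    """x: (T, N, C) video features; T frames, N = H*W spatial positions.

    Step 1: spatial attention within each frame (across N positions).
    Step 2: temporal attention at each position (across T frames),
    so temporal modeling sees spatially contextualized features.
    Identity q/k/v projections are used for brevity (an assumption;
    the real module would use learned projections)."""
    # spatial attention, per frame, with a residual connection
    x = x + attention(x, x, x)                  # (T, N, C)
    # temporal attention, per spatial position, with a residual connection
    xt = np.swapaxes(x, 0, 1)                   # (N, T, C)
    xt = xt + attention(xt, xt, xt)
    return np.swapaxes(xt, 0, 1)                # back to (T, N, C)
```

Compared with joint spatio-temporal attention over all T*N tokens at once, this factorization reduces each attention map from (T*N)^2 entries to T*N^2 + N*T^2, while letting the temporal step operate on spatially contextualized features.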
Pages: 12613-12622
Page count: 10
Related papers
50 records in total
  • [31] Self-attention Multi-view Representation Learning with Diversity-promoting Complementarity
    Liu, Jian-wei
    Ding, Xi-hao
    Lu, Run-kun
    Luo, Xionglin
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 3972 - 3978
  • [32] Script event prediction method based on self-attention mechanism and graph representation learning
    Hu, Meng
    Bai, Lu
    Yang, Mei
    2022 IEEE 6TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2022, : 722 - 726
  • [33] Spatio-Temporal Self-Attention Network for Fire Detection and Segmentation in Video Surveillance
    Shahid, Mohammad
    Virtusio, John Jethro
    Wu, Yu-Hsien
    Chen, Yung-Yao
    Tanveer, M.
    Muhammad, Khan
    Hua, Kai-Lung
    IEEE ACCESS, 2022, 10 : 1259 - 1275
  • [34] Attentional control and the self: The Self-Attention Network (SAN)
    Humphreys, Glyn W.
    Sui, Jie
    COGNITIVE NEUROSCIENCE, 2016, 7 (1-4) : 5 - 17
  • [35] DySAT: Deep Neural Representation Learning on Dynamic Graphs via Self-Attention Networks
    Sankar, Aravind
    Wu, Yanhong
    Gou, Liang
    Zhang, Wei
    Yang, Hao
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM '20), 2020, : 519 - 527
  • [36] Self-Attention with Cross-Lingual Position Representation
    Ding, Liang
    Wang, Longyue
    Tao, Dacheng
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 1679 - 1685
  • [37] INTEGRATING DEPENDENCY TREE INTO SELF-ATTENTION FOR SENTENCE REPRESENTATION
    Ma, Junhua
    Li, Jiajun
    Liu, Yuxuan
    Zhou, Shangbo
    Li, Xue
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8137 - 8141
  • [38] Relational Self-Attention: What's Missing in Attention for Video Understanding
    Kim, Manjin
    Kwon, Heeseung
    Wang, Chunyu
    Kwak, Suha
    Cho, Minsu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [39] Self-attention binary neural tree for video summarization
    Fu, Hao
    Wang, Hongxing
    PATTERN RECOGNITION LETTERS, 2021, 143 : 19 - 26