SSAN: Separable Self-Attention Network for Video Representation Learning

Cited by: 20
Authors
Guo, Xudong [1 ,3 ]
Guo, Xun [2 ]
Lu, Yan [2 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] MSRA, Beijing, Peoples R China
DOI
10.1109/CVPR46437.2021.01243
CLC classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Self-attention has been successfully applied to video representation learning due to its effectiveness in modeling long-range dependencies. Existing approaches build the dependencies merely by computing pairwise correlations along spatial and temporal dimensions simultaneously. However, spatial correlations and temporal correlations represent different contextual information of scenes and temporal reasoning. Intuitively, learning spatial contextual information first will benefit temporal modeling. In this paper, we propose a separable self-attention (SSA) module, which models spatial and temporal correlations sequentially so that spatial contexts can be efficiently used in temporal modeling. By adding the SSA module into a 2D CNN, we build an SSA network (SSAN) for video representation learning. On the task of video action recognition, our approach outperforms state-of-the-art methods on the Something-Something and Kinetics-400 datasets. Our models often outperform counterparts with a shallower network and fewer modalities. We further verify the semantic learning ability of our method on the visual-language task of video retrieval, which showcases the homogeneity of video representations and text embeddings. On the MSR-VTT and YouCook2 datasets, video representations learnt by SSA significantly improve the state-of-the-art performance.
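The core idea of the abstract — attend over spatial positions within each frame first, then over time at each position — can be illustrated with a minimal, dependency-free sketch. This is only an assumption-laden toy: the paper's SSA module sits inside a 2D CNN and uses learned query/key/value projections, which are omitted here (queries, keys, and values are all taken to be the raw features), and the function names and tensor layout `[T][N][C]` are choices made for this illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(seq):
    # Plain scaled dot-product self-attention over a list of
    # feature vectors; Q = K = V = seq (no learned projections).
    d = len(seq[0])
    out = []
    for q in seq:
        scores = softmax([
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in seq
        ])
        out.append([sum(w * v[j] for w, v in zip(scores, seq))
                    for j in range(d)])
    return out

def separable_self_attention(video):
    # video: [T][N][C] — T frames, N spatial positions, C channels.
    # Step 1: spatial attention within each frame.
    spatial = [attention(frame) for frame in video]
    # Step 2: temporal attention across frames at each position,
    # so temporal modeling sees spatially contextualized features.
    T, N = len(spatial), len(spatial[0])
    out = [[None] * N for _ in range(T)]
    for n in range(N):
        tube = [spatial[t][n] for t in range(T)]
        attended = attention(tube)
        for t in range(T):
            out[t][n] = attended[t]
    return out
```

A joint space-time attention would instead flatten all T*N positions into one sequence and attend over it at once; the sequential decomposition above is what the abstract contrasts against that baseline.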
Pages: 12613-12622
Page count: 10
Related Papers
50 items in total
  • [41] Homogeneous Learning: Self-Attention Decentralized Deep Learning
    Sun, Yuwei
    Ochiai, Hideya
    IEEE ACCESS, 2022, 10 : 7695 - 7703
  • [42] MSAN: Multiscale self-attention network for pansharpening
    Lu, Hangyuan
    Yang, Yong
    Huang, Shuying
    Liu, Rixian
    Guo, Huimin
    PATTERN RECOGNITION, 2025, 162
  • [43] Contextualized Word Representations for Self-Attention Network
    Essam, Mariam
    Eldawlatly, Seif
    Abbas, Hazem
    PROCEEDINGS OF 2018 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND SYSTEMS (ICCES), 2018, : 116 - 121
  • [44] Self-attention recurrent network for saliency detection
    Sun, Fengdong
    Li, Wenhui
    Guan, Yuanyuan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (21) : 30793 - 30807
  • [45] Lightweight Self-Attention Network for Semantic Segmentation
    Zhou, Yan
    Zhou, Haibin
    Li, Nanjun
    Li, Jianxun
    Wang, Dongli
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [46] Self-Attention Based Network for Punctuation Restoration
    Wang, Feng
    Chen, Wei
    Yang, Zhen
    Xu, Bo
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 2803 - 2808
  • [47] QKSAN: A Quantum Kernel Self-Attention Network
    Zhao, Ren-Xin
    Shi, Jinjing
    Li, Xuelong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 10184 - 10195
  • [48] Crowd Counting Network with Self-attention Distillation
    Li, Yaoyao
    Wang, Li
    Zhao, Huailin
    Nie, Zhen
JOURNAL OF ROBOTICS NETWORKING AND ARTIFICIAL LIFE, 2020, 7 (02): 116 - 120
  • [49] Variational Self-attention Network for Sequential Recommendation
    Zhao, Jing
    Zhao, Pengpeng
    Zhao, Lei
    Liu, Yanchi
    Sheng, Victor S.
    Zhou, Xiaofang
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 1559 - 1570
  • [50] Self-attention empowered graph convolutional network for structure learning and node embedding
    Jiang, Mengying
    Liu, Guizhong
    Su, Yuanchao
    Wu, Xinliang
    PATTERN RECOGNITION, 2024, 153