Spatiotemporal distilled dense-connectivity network for video action recognition

Cited by: 41
Authors
Hao, Wangli [1 ,3 ]
Zhang, Zhaoxiang [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci CASIA Beijing, Inst Automat, CRIPAC, NLPR, Beijing 100190, Peoples R China
[2] Ctr Excellence Brain Sci & Intelligence Technol C, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci UCAS Beijing, Beijing 100190, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Two-stream; Action recognition; Dense-connectivity; Knowledge distillation;
DOI
10.1016/j.patcog.2019.03.005
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two-stream convolutional neural networks show great promise for action recognition tasks. However, most two-stream approaches train the appearance and motion subnetworks independently, which can degrade performance due to the lack of interaction between the two streams. To overcome this limitation, we propose a Spatiotemporal Distilled Dense-Connectivity Network (STDDCN) for video action recognition. This network combines knowledge distillation with dense connectivity (adapted from DenseNet). Using the STDDCN architecture, we explore interaction strategies between the appearance and motion streams at different hierarchies. Specifically, block-level dense connections between the appearance and motion pathways enable spatiotemporal interaction at the feature representation layers. Moreover, knowledge distillation between the two streams (each treated as a student) and their final fusion (treated as the teacher) allows both streams to interact at the high-level layers. This architecture lets STDDCN gradually obtain effective hierarchical spatiotemporal features, and it can be trained end-to-end. Finally, extensive ablation studies validate the effectiveness and generalization of our model on two benchmark datasets, UCF101 and HMDB51, on which it achieves promising performance. (C) 2019 Elsevier Ltd. All rights reserved.
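To make the distillation idea in the abstract concrete, below is a minimal PyTorch sketch of a loss in which both streams act as students and their late fusion acts as the teacher. It is an illustration only, not the authors' implementation: the fusion-by-averaging choice, the temperature and weighting values, and all function names are assumptions, and the block-level dense connections between the two pathways are not shown.

```python
# Illustrative sketch: two-stream distillation where the fused prediction
# teaches both streams. Assumptions (not from the paper): average fusion,
# temperature 3.0, equal CE/KD weighting.
import torch
import torch.nn.functional as F

def two_stream_distill_loss(appearance_logits, motion_logits, labels,
                            temperature=3.0, alpha=0.5):
    """Cross-entropy per stream plus KL distillation toward the fused teacher."""
    # Teacher: late fusion of the two streams (simple average; assumption).
    teacher_logits = (appearance_logits + motion_logits) / 2.0
    teacher_soft = F.softmax(teacher_logits.detach() / temperature, dim=1)

    total = 0.0
    for student_logits in (appearance_logits, motion_logits):
        ce = F.cross_entropy(student_logits, labels)
        kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                      teacher_soft, reduction='batchmean') * temperature ** 2
        total = total + (1 - alpha) * ce + alpha * kd
    return total

if __name__ == "__main__":
    # Toy usage: random tensors stand in for the two streams' class scores.
    B, C = 4, 101  # batch size, number of classes (e.g., UCF101)
    app = torch.randn(B, C, requires_grad=True)
    mot = torch.randn(B, C, requires_grad=True)
    y = torch.randint(0, C, (B,))
    loss = two_stream_distill_loss(app, mot, y)
    loss.backward()
    print(float(loss))
```

Detaching the teacher logits keeps gradients from flowing into the fused prediction, so each stream is pulled toward the fusion rather than the fusion being pulled toward the weaker stream; in the paper's end-to-end setting this detail may differ.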
Pages: 13 - 24
Number of pages: 12
Related papers
50 records in total
  • [41] Video action recognition method based on attention residual network and LSTM
    Zhang, Yu
    Dong, Pengyue
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 3611 - 3616
  • [42] MNv3-MFAE: A Lightweight Network for Video Action Recognition
    Liu, Jie
    Liu, Wenyue
    Han, Ke
    ELECTRONICS, 2025, 14 (05):
  • [43] CANet: Comprehensive Attention Network for video-based action recognition
    Gao, Xiong
    Chang, Zhaobin
    Ran, Xingcheng
    Lu, Yonggang
    KNOWLEDGE-BASED SYSTEMS, 2024, 296
  • [44] MLENet: Multi-Level Extraction Network for video action recognition
    Wang, Fan
    Li, Xinke
    Xiong, Han
    Mo, Haofan
    Li, Yongming
    PATTERN RECOGNITION, 2024, 154
  • [45] Exploiting Spatiotemporal Features for Action Recognition
    Bin Muslim, Usairam
    Khan, Muhammad Hassan
    Farid, Muhammad Shahid
    PROCEEDINGS OF 2021 INTERNATIONAL BHURBAN CONFERENCE ON APPLIED SCIENCES AND TECHNOLOGIES (IBCAST), 2021, : 613 - 619
  • [46] Recurrent Region Attention and Video Frame Attention Based Video Action Recognition Network Design
    Sang H.-F.
    Zhao Z.-Y.
    He D.-K.
    Chinese Institute of Electronics, 48: 1052 - 1061
  • [47] Action recognition on continuous video
    Chang, Y. L.
    Chan, C. S.
    Remagnino, P.
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (04) : 1233 - 1243
  • [48] Action recognition on continuous video
    Y. L. Chang
    C. S. Chan
    P. Remagnino
    Neural Computing and Applications, 2021, 33 : 1233 - 1243
  • [49] Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition
    Wang, Mengmeng
    Xing, Jiazheng
    Su, Jing
    Chen, Jun
    Liu, Yong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3347 - 3362
  • [50] An efficient video transformer network with token discard and keyframe enhancement for action recognition
    Zhang, Qian
    Yang, Zuosui
    Shao, Mingwen
    Liang, Hong
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (02)