Spatiotemporal distilled dense-connectivity network for video action recognition

Citations: 41
Authors
Hao, Wangli [1,3]
Zhang, Zhaoxiang [1,2,3]
Affiliations
[1] Chinese Acad Sci CASIA Beijing, Inst Automat, CRIPAC, NLPR, Beijing 100190, Peoples R China
[2] Ctr Excellence Brain Sci & Intelligence Technol C, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci UCAS Beijing, Beijing 100190, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Two-stream; Action recognition; Dense-connectivity; Knowledge distillation;
DOI
10.1016/j.patcog.2019.03.005
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two-stream convolutional neural networks show great promise for action recognition tasks. However, most two-stream approaches train the appearance and motion subnetworks independently, which may lead to a decline in performance due to the lack of interaction between the two streams. To overcome this limitation, we propose a Spatiotemporal Distilled Dense-Connectivity Network (STDDCN) for video action recognition. This network implements both knowledge distillation and dense connectivity (adapted from DenseNet). Using this STDDCN architecture, we aim to explore interaction strategies between the appearance and motion streams along different hierarchies. Specifically, block-level dense connections between the appearance and motion pathways enable spatiotemporal interaction at the feature representation layers. Moreover, knowledge distillation between the two streams (each treated as a student) and their final fusion (treated as the teacher) allows both streams to interact at the high-level layers. The special architecture of STDDCN allows it to gradually obtain effective hierarchical spatiotemporal features. Moreover, it can be trained end-to-end. Finally, numerous ablation studies validate the effectiveness and generalization of our model on two benchmark datasets, UCF101 and HMDB51. At the same time, our model achieves promising performance. (C) 2019 Elsevier Ltd. All rights reserved.
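The abstract describes a distillation scheme in which each stream acts as a student and the fused prediction acts as the teacher. A minimal numpy sketch of such a two-student distillation term is shown below; the temperature `T`, the averaging late fusion, and all variable names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 as is conventional in knowledge distillation."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))) * T * T)

# Toy class logits for one clip from each stream (hypothetical values)
appearance = np.array([2.0, 0.5, -1.0])
motion     = np.array([1.5, 1.0, -0.5])
teacher    = (appearance + motion) / 2.0  # simple late fusion as the "teacher"

# Each stream is distilled toward the fused teacher
loss = kd_loss(appearance, teacher) + kd_loss(motion, teacher)
```

In a full training setup this distillation term would be added to the usual cross-entropy losses of both streams, so that the streams interact at the high-level layers while still learning from ground-truth labels.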
Pages: 13-24
Number of pages: 12
Related papers
50 records
  • [31] Multipath Attention and Adaptive Gating Network for Video Action Recognition
    Zhang, Haiping
    Hu, Zepeng
    Yu, Dongjin
    Guan, Liming
    Liu, Xu
    Ma, Conghao
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [33] ATTENTIONAL FUSED TEMPORAL TRANSFORMATION NETWORK FOR VIDEO ACTION RECOGNITION
    Yang, Ke
    Wang, Zhiyuan
    Dai, Huadong
    Shen, Tianlong
    Qiao, Peng
    Niu, Xin
    Jiang, Jie
    Li, Dongsheng
    Dou, Yong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4377 - 4381
  • [34] FREQUENCY ENHANCEMENT NETWORK FOR EFFICIENT COMPRESSED VIDEO ACTION RECOGNITION
    Ming, Yue
    Xiong, Lu
    Jia, Xia
    Zheng, Qingfang
    Zhou, Jiangwan
    Feng, Fan
    Hu, Nannan
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 825 - 829
  • [35] Convolutional Neural Network-Based Video Super-Resolution for Action Recognition
    Zhang, Haochen
    Liu, Dong
    Xiong, Zhiwei
    PROCEEDINGS 2018 13TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2018), 2018, : 746 - 750
  • [36] An Effective Video Transformer With Synchronized Spatiotemporal and Spatial Self-Attention for Action Recognition
    Alfasly, Saghir
    Chui, Charles K.
    Jiang, Qingtang
    Lu, Jian
    Xu, Chen
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 2496 - 2509
  • [37] Deep learning network model based on fusion of spatiotemporal features for action recognition
    Yang, Ge
    Zou, Wu-xing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (07) : 9875 - 9896
  • [38] Action Recognition Network Based on Local Spatiotemporal Features and Global Temporal Excitation
    Li, Shukai
    Wang, Xiaofang
    Shan, Dongri
    Zhang, Peng
    APPLIED SCIENCES-BASEL, 2023, 13 (11):
  • [40] CSST-Net: Channel Split Spatiotemporal Network for Human Action Recognition
    Zhou, Xuan
    Ma, Jixiang
    Yi, Jianping
    INFORMATION TECHNOLOGY AND CONTROL, 2023, 52 (04): : 952 - 965