Temporal Pyramid Pooling-Based Convolutional Neural Network for Action Recognition

Cited by: 94
Authors
Wang, Peng [1]
Cao, Yuanzhouhan [2]
Shen, Chunhua [2, 3]
Liu, Lingqiao [2]
Shen, Heng Tao [1]
Affiliations
[1] University of Queensland, School of Information Technology and Electrical Engineering, St Lucia, QLD 4072, Australia
[2] University of Adelaide, School of Computer Science, Adelaide, SA 5005, Australia
[3] Australian Centre for Robotic Vision, Brisbane, QLD 4000, Australia
Funding
Australian Research Council
Keywords
Action recognition; convolutional neural network (CNN); temporal pyramid pooling
DOI
10.1109/TCSVT.2016.2576761
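Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809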
Abstract
Encouraged by the success of convolutional neural networks (CNNs) in image classification, much recent effort has been devoted to applying CNNs to video-based action recognition. One challenge is that a video contains a varying number of frames, which is incompatible with the fixed-size input format of standard CNNs. Existing methods handle this issue either by directly sampling a fixed number of frames or by bypassing it with a 3D convolutional layer, which performs convolution in the spatio-temporal domain. In this paper, we propose a novel network structure that accepts an arbitrary number of frames as input. The key to our solution is a module consisting of an encoding layer and a temporal pyramid pooling layer. The encoding layer maps the activations from the previous layers to a feature vector suitable for pooling, whereas the temporal pyramid pooling layer converts multiple frame-level activations into a fixed-length video-level representation. In addition, we adopt a feature concatenation layer that combines appearance and motion information. Compared with the frame sampling strategy, our method avoids the risk of missing important frames. Compared with the 3D convolutional method, which requires a huge video data set for network training, our model can be learned on a small target data set because we can leverage an off-the-shelf image-level CNN to initialize the model parameters. Experiments on three challenging data sets, Hollywood2, HMDB51, and UCF101, demonstrate the effectiveness of the proposed network.
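To make the fixed-length property concrete, here is a minimal Python sketch of temporal pyramid pooling over frame-level feature vectors. It assumes pyramid levels of 1, 2 and 4 roughly equal temporal segments and element-wise max pooling within each segment; these choices and the function name `temporal_pyramid_pool` are illustrative assumptions, not the paper's exact configuration, and the encoding layer that precedes pooling is omitted.

```python
from typing import List, Sequence

Vector = List[float]


def temporal_pyramid_pool(frames: Sequence[Vector],
                          levels: Sequence[int] = (1, 2, 4)) -> Vector:
    """Pool a variable number of frame-level feature vectors into one
    fixed-length video-level vector.

    Assumption (illustrative): level n splits the frame sequence into n
    roughly equal temporal segments, each segment is max-pooled
    element-wise, and the pooled segments of all levels are concatenated.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frame features must have the same length")

    t = len(frames)
    out: Vector = []
    for n in levels:
        for k in range(n):
            # Segment boundaries [start, end); force at least one frame so
            # that clips shorter than n frames still produce n outputs.
            start = (k * t) // n
            end = max(((k + 1) * t) // n, start + 1)
            segment = frames[start:end]
            # Element-wise max over the frames in this segment.
            out.extend(max(f[d] for f in segment) for d in range(dim))
    return out


# Usage: clips of different lengths map to vectors of the same length.
short_clip = [[0.1, 0.5], [0.9, 0.2], [0.4, 0.4]]
long_clip = [[float(i % 3), float(i % 5)] for i in range(40)]
pooled_short = temporal_pyramid_pool(short_clip)
pooled_long = temporal_pyramid_pool(long_clip)
print(len(pooled_short), len(pooled_long))  # 14 14
```

The output length is `dim * sum(levels)` regardless of the number of frames, so a 3-frame clip and a 40-frame clip both yield a 14-dimensional vector when `dim = 2`. This is what allows a subsequent fully connected layer to accept videos of any length without frame sampling.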
Pages: 2613-2622
Page count: 10