Video spatiotemporal mapping for human action recognition by convolutional neural network

Cited by: 15
Authors
Zare, Amin [1 ]
Abrishami Moghaddam, Hamid [2 ]
Sharifi, Arash [1 ]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Tehran, Iran
[2] KN Toosi Univ Technol, Fac Elect & Comp Engn, POB 16315-1355, Tehran, Iran
Funding
São Paulo Research Foundation, Brazil;
Keywords
Video spatiotemporal mapping; Convolutional neural network; Data augmentation; Human action recognition; FEATURES; TRAJECTORIES; DESCRIPTORS; DENSE;
DOI
10.1007/s10044-019-00788-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a 2D representation of a video clip called the video spatiotemporal map (VSTM). The VSTM is a compact representation that incorporates both the spatial and the temporal properties of a clip. It is created by vertically concatenating feature vectors generated from successive frames. The feature vector for each frame is generated by applying a wavelet transform to that frame (or to its difference from the subsequent frame) and computing the vertical and horizontal projections of the quantized coefficients of some specific wavelet subbands. The VSTM enables convolutional neural networks (CNNs) to process a video clip for human action recognition (HAR). The proposed approach benefits from the power of CNNs to analyze visual patterns and attempts to overcome some challenges of applying CNNs to video, such as variable video length and the lack of training data, which leads to over-fitting. The VSTM presents a sequence of frames to the CNN without imposing any additional computational cost on the CNN learning algorithm. Experimental results on the KTH, Weizmann, and UCF Sports HAR benchmark datasets show that the proposed method outperforms state-of-the-art methods that use CNNs for HAR.
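The construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the wavelet, the subbands, or the quantization scheme, so the code below assumes a single-level Haar transform with averaging normalization, the two subbands labelled `lh` and `hl`, and uniform quantization of coefficient magnitudes into `levels` bins. The function names `haar_dwt2`, `frame_features`, and `build_vstm` are hypothetical.

```python
import numpy as np


def haar_dwt2(frame):
    """Single-level 2D Haar transform (averaging form, an assumption).

    Returns four subbands (ll, lh, hl, hh), each half the frame size.
    """
    f = np.asarray(frame, dtype=np.float64)
    h, w = f.shape
    f = f[: h - h % 2, : w - w % 2]  # crop to even dimensions
    # Pairwise average/difference of neighbouring columns.
    lo = (f[:, 0::2] + f[:, 1::2]) / 2.0
    hi = (f[:, 0::2] - f[:, 1::2]) / 2.0
    # Pairwise average/difference of neighbouring rows.
    ll = (lo[0::2] + lo[1::2]) / 2.0
    lh = (lo[0::2] - lo[1::2]) / 2.0
    hl = (hi[0::2] + hi[1::2]) / 2.0
    hh = (hi[0::2] - hi[1::2]) / 2.0
    return ll, lh, hl, hh


def frame_features(frame, levels=8):
    """Quantize two detail subbands and concatenate their projections."""
    _, lh, hl, _ = haar_dwt2(frame)  # subband choice is an assumption
    parts = []
    for band in (lh, hl):
        mag = np.abs(band)
        peak = mag.max()
        # Uniform quantization of magnitudes into `levels` bins (assumption).
        if peak == 0:
            q = np.zeros_like(mag)
        else:
            q = np.floor(mag / peak * (levels - 1))
        parts.append(q.sum(axis=0))  # vertical projection: one value per column
        parts.append(q.sum(axis=1))  # horizontal projection: one value per row
    return np.concatenate(parts)


def build_vstm(frames, use_difference=True):
    """Stack per-frame feature vectors vertically into a 2D map."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    if use_difference:
        # Use the difference between each frame and the subsequent one.
        inputs = [b - a for a, b in zip(frames[:-1], frames[1:])]
    else:
        inputs = frames
    return np.vstack([frame_features(x) for x in inputs])


# Usage: five random 16x12 "frames" stand in for a real clip.
rng = np.random.default_rng(0)
clip = [rng.random((16, 12)) for _ in range(5)]
vstm = build_vstm(clip)
print(vstm.shape)
```

In this sketch each row of the map corresponds to one frame (or one frame difference), so the number of rows still depends on the clip length. How the paper turns that into a fixed-size CNN input is not stated in the abstract and is not modelled here.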
Pages: 265-279
Page count: 15