Global Temporal Representation Based CNNs for Infrared Action Recognition

Times Cited: 48
Authors
Liu, Yang [1 ]
Lu, Zhaoyang [1 ]
Li, Jing [1 ]
Yang, Tao [2 ]
Yao, Chao [3 ]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, SAIIP, Xian 710072, Shaanxi, Peoples R China
[3] Northwestern Polytech Univ, Sch Automat, Xian 710072, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks (CNN); deep learning; global temporal information; infrared action recognition;
DOI
10.1109/LSP.2018.2823910
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
Infrared human action recognition has many advantages, e.g., it is insensitive to illumination change, appearance variability, and shadows. Existing methods for infrared action recognition are based on either spatial or local temporal information; however, the global temporal information, which can better describe the movements of body parts across the whole video, is not considered. In this letter, we propose a novel global temporal representation named optical-flow stacked difference image (OFSDI) and extract robust and discriminative features from the infrared action data by considering local, global, and spatial temporal information together. Due to the small size of the infrared action dataset, we first apply convolutional neural networks to the local, spatial, and global temporal streams, respectively, to obtain efficient convolutional feature maps from the raw data rather than train a classifier directly. These convolutional feature maps are then aggregated by trajectory-constrained pooling into effective descriptors named three-stream trajectory-pooled deep-convolutional descriptors. Furthermore, we improve the robustness of these features by using the locality-constrained linear coding (LLC) method. With these features, a linear support vector machine (SVM) is adopted to classify the action data in our scheme. We conduct experiments on the infrared action recognition datasets InfAR and NTU RGB+D. The experimental results show that the proposed approach outperforms representative state-of-the-art methods based on handcrafted features and deep learning features for infrared action recognition.
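The core idea of the OFSDI representation is to collapse a whole clip's motion into a single map by stacking frame-to-frame differences of optical-flow fields. The sketch below illustrates that idea in NumPy; the exact stacking and normalization scheme here is a simplifying assumption for illustration, not the authors' precise formulation, and the `ofsdi` function name is hypothetical.

```python
import numpy as np

def ofsdi(flow_frames):
    """Simplified sketch of a global temporal representation in the spirit of
    OFSDI: accumulate absolute differences of successive optical-flow fields
    so the result summarizes motion across the whole clip.

    flow_frames: array of shape (T, H, W, 2), per-frame optical flow (dx, dy).
    Returns an (H, W, 2) map scaled to [0, 1] for use as CNN input.
    """
    flow = np.asarray(flow_frames, dtype=np.float64)
    diffs = np.abs(np.diff(flow, axis=0))   # (T-1, H, W, 2) successive differences
    rep = diffs.sum(axis=0)                 # accumulate over time -> global summary
    peak = rep.max()
    if peak > 0:
        rep /= peak                         # normalize so the map is CNN-friendly
    return rep

# Toy usage: 4 frames of 2x2 flow; horizontal motion appears only at t = 2.
frames = np.zeros((4, 2, 2, 2))
frames[2:, ..., 0] = 1.0                    # x-channel jumps from 0 to 1 at t = 2
rep = ofsdi(frames)
print(rep[..., 0])                          # x-channel records the change everywhere
```

Because the differences are accumulated over the full sequence, a single (H, W, 2) map retains evidence of motion from the entire clip, which is what distinguishes this global view from purely local (few-frame) temporal features.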
Pages: 848-852
Page count: 5
Related Papers
29 records in total
  • [1] BILEN H, 2016, PROC CVPR IEEE, P3034, DOI 10.1109/CVPR.2016.331
  • [2] LIBSVM: A Library for Support Vector Machines
    Chang, Chih-Chung
    Lin, Chih-Jen
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
  • [3] Sequential Segment Networks for Action Recognition
    Chen, Quan-Qi
    Zhang, Yu-Jin
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2017, 24 (05) : 712 - 716
  • [4] Convolutional Two-Stream Network Fusion for Video Action Recognition
    Feichtenhofer, Christoph
    Pinz, Axel
    Zisserman, Andrew
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 1933 - 1941
  • [5] InfAR dataset: Infrared action recognition at different times
    Gao, Chenqiang
    Du, Yinhe
    Liu, Jiang
    Lv, Jing
    Yang, Luyu
    Meng, Deyu
    Hauptmann, Alexander G.
    [J]. NEUROCOMPUTING, 2016, 212 : 36 - 47
  • [6] A New Dataset and Evaluation for Infrared Action Recognition
    Gao, Chenqiang
    Du, Yinhe
    Liu, Jiang
    Yang, Luyu
    Meng, Deyu
    [J]. COMPUTER VISION, CCCV 2015, PT II, 2015, 547 : 302 - 312
  • [7] 3D Convolutional Neural Networks for Human Action Recognition
    Ji, Shuiwang
    Xu, Wei
    Yang, Ming
    Yu, Kai
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (01) : 221 - 231
  • [8] Caffe: Convolutional Architecture for Fast Feature Embedding
    Jia, Yangqing
    Shelhamer, Evan
    Donahue, Jeff
    Karayev, Sergey
    Long, Jonathan
    Girshick, Ross
    Guadarrama, Sergio
    Darrell, Trevor
    [J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 675 - 678
  • [9] Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks
    Jiang, Zhuolin
    Rozgic, Viktor
    Adali, Sancar
    [J]. 2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 309 - 317
  • [10] Leveraging Structural Context Models and Ranking Score Fusion for Human Interaction Prediction
    Ke, Qiuhong
    Bennamoun, Mohammed
    An, Senjian
    Sohel, Ferdous
    Boussaid, Farid
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (07) : 1712 - 1723