Learning multi-temporal-scale deep information for action recognition

Times cited: 25
Authors
Yao, Guangle [1,2,3]
Lei, Tao [1]
Zhong, Jiandan [1,2,3]
Jiang, Ping [1]
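Affiliations
[1] Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, Sichuan, People's Republic of China
[2] University of Electronic Science and Technology of China, Chengdu, Sichuan, People's Republic of China
[3] University of Chinese Academy of Sciences, Beijing, People's Republic of China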
Keywords
Action recognition; Convolutional neural networks; Deep learning; Spatiotemporal information; Histograms; Networks
DOI
10.1007/s10489-018-1347-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Action recognition in video is widely applied in video indexing, intelligent surveillance, multimedia understanding, and other fields. A typical human action contains spatiotemporal information at various scales. Learning and fusing this multi-temporal-scale information make action recognition more reliable in terms of recognition accuracy. To demonstrate this claim, in this paper we use Res3D, a 3D Convolutional Neural Network (CNN) architecture, to extract information at multiple temporal scales. At each temporal scale, we transfer the knowledge learned from RGB to 3-channel optical flow (OF) and learn information from both the RGB and OF fields. We also propose Parallel Pair Discriminant Correlation Analysis (PPDCA) to fuse the multi-temporal-scale information into a lower-dimensional action representation. Experimental results show that, compared with the single-temporal-scale method, the proposed multi-temporal-scale method achieves higher recognition accuracy; it spends more time on feature extraction but less time on classification, owing to its lower-dimensional representation. Moreover, the proposed method achieves recognition performance comparable to that of state-of-the-art methods. The source code and 3D filter animations are available online: https://github.com/JerryYaoGl/multi-temporal-scale.
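The abstract states that PPDCA fuses the multi-temporal-scale features into a lower-dimensional representation but gives no equations. As a hedged illustration only, below is a minimal NumPy sketch of plain two-set Discriminant Correlation Analysis (DCA, Haghighat et al., 2016), the technique that PPDCA's name suggests it extends. It is not the authors' PPDCA, and the function names `dca_fit` and `dca_transform` are hypothetical. The sketch whitens each feature set's between-class scatter in a (c-1)-dimensional space (c = number of classes), then uses an SVD so that the cross-covariance of the two projected sets becomes the identity matrix, and finally concatenates or sums them.

```python
import numpy as np

# Hypothetical sketch of two-set DCA fusion; NOT the paper's PPDCA.


def _between_class_whitening(X, labels, r):
    """Return a (p, r) matrix that maps the between-class scatter of X
    (rows are samples) to the r x r identity matrix."""
    classes = np.unique(labels)
    mean_all = X.mean(axis=0)
    # One column per class: sqrt(n_i) * (class mean - global mean).
    phi = np.stack(
        [np.sqrt(np.sum(labels == c)) * (X[labels == c].mean(axis=0) - mean_all)
         for c in classes],
        axis=1,
    )  # shape (p, c)
    # Eigen-decompose the small c x c matrix instead of the p x p scatter.
    evals, evecs = np.linalg.eigh(phi.T @ phi)
    order = np.argsort(evals)[::-1][:r]  # keep the r largest eigenvalues
    return phi @ evecs[:, order] / np.sqrt(evals[order])


def dca_fit(X, Y, labels, r=None):
    """Fit DCA projections for feature sets X (n, p) and Y (n, q)."""
    labels = np.asarray(labels)
    r = len(np.unique(labels)) - 1 if r is None else r  # at most c - 1
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Wbx = _between_class_whitening(X - mx, labels, r)
    Wby = _between_class_whitening(Y - my, labels, r)
    Xp, Yp = (X - mx) @ Wbx, (Y - my) @ Wby  # (n, r) each
    # SVD of the between-set covariance; rescaling by s^(-1/2) turns the
    # cross-covariance of the final projections into the identity.
    U, s, Vt = np.linalg.svd(Xp.T @ Yp)
    Wx = Wbx @ U / np.sqrt(s)
    Wy = Wby @ Vt.T / np.sqrt(s)
    return mx, my, Wx, Wy


def dca_transform(model, X, Y, mode="concat"):
    """Project both sets and fuse them by concatenation or summation."""
    mx, my, Wx, Wy = model
    Xs, Ys = (X - mx) @ Wx, (Y - my) @ Wy
    return np.hstack([Xs, Ys]) if mode == "concat" else Xs + Ys
```

Under the assumption that X and Y hold, for example, Res3D features of the same clips at two temporal scales (or from the RGB and OF streams), the fused output has at most 2(c-1) columns with concatenation, or c-1 with summation, independent of the input dimensions p and q. This is why DCA-style fusion yields a lower-dimensional representation.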
Pages: 2017-2029
Number of pages: 13