Constructing Hierarchical Spatiotemporal Information for Action Recognition

Citations: 0
Authors
Yao, Guangle [1, 2, 3]
Zhong, Jiandan [1, 2, 3]
Lei, Tao [1 ]
Liu, Xianyuan [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Opt & Elect, Chengdu, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Sichuan, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
2018 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI) | 2018
Keywords
action recognition; convolutional neural network; spatiotemporal information; action representation; optical flow; NETWORKS;
DOI
10.1109/SmartWorld.2018.00123
Chinese Library Classification
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Video action recognition is widely applied in video indexing, intelligent surveillance, multimedia understanding, and other fields. It has recently been greatly improved by incorporating convolutional neural networks (ConvNets). Features from the shallow layers of a ConvNet tend to model the appearance and motion of actions, while features from the deep layers tend to represent actions. In this paper, we propose to construct hierarchical information by combining the spatiotemporal features of shallow and deep layers in a 3D ConvNet for action recognition. Specifically, we use Res3D to extract spatiotemporal information from different types of layers, and transfer the knowledge learned from RGB to the optical flow field. We also propose Parallel Pair Discriminant Correlation Analysis (PPDCA) to fuse the spatiotemporal information of multiple layers into a compact hierarchical action representation. The experimental results show that the proposed hierarchical spatiotemporal information achieves a good balance between accuracy and dimension, and that our method not only outperforms the single-layer Res3D methods but also achieves recognition performance comparable to that of state-of-the-art methods.
Pages: 596-602
Page count: 7