ECNN: Activity Recognition Using Ensembled Convolutional Neural Networks

Authors
Roy, Aprameyo [1]
Mishra, Deepak [1]
Affiliations
[1] Indian Inst Space Sci & Technol, Dept Avion, Thiruvananthapuram 695547, Kerala, India
Source
PROCEEDINGS OF THE 2019 IEEE REGION 10 CONFERENCE (TENCON 2019): TECHNOLOGY, KNOWLEDGE, AND SOCIETY | 2019
Keywords
Human Activity Recognition; 3D-CNN; 2D-CNN; spatio-temporal; ensembling
DOI
10.1109/tencon.2019.8929519
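Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract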
Human Activity Recognition (HAR) has long been a compelling problem in computer vision. We address trimmed activity recognition: identifying the class of human activity in a video that has been temporally trimmed to contain only the periods where human activity is present. In the past few years, there has been a transition from handcrafted features for classification to deep convolutional neural networks, which work on raw video data to extract features and classify human activities. 3D convolutional neural networks (3D-CNNs) learn features from both the temporal and spatial dimensions and prove to be very powerful in finding correlations in signals containing spatiotemporal information. 3D-CNNs have been extremely successful in activity recognition. We explore the shortcomings of a 3D-CNN architecture and propose ensembling it with a 2D-CNN to overcome them, for significantly better performance in activity recognition.
Pages: 757-760
Number of pages: 4
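The abstract proposes ensembling a 3D-CNN with a 2D-CNN but does not say how the two models are combined. The sketch below illustrates one common option, late fusion by weighted averaging of class probabilities. It is an assumption for illustration only and not necessarily the authors' method: the function names, the `weight_3d` parameter, and the frame-averaging step for the 2D branch are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(logits_3d, logits_2d_per_frame, weight_3d=0.5):
    """Hypothetical late-fusion ensemble of a 3D-CNN and a 2D-CNN.

    logits_3d: one score per class for the whole clip (3D-CNN output).
    logits_2d_per_frame: one list of class scores per sampled frame
        (2D-CNN output); averaging over frames is an assumption here.
    weight_3d: assumed mixing weight for the 3D branch, in [0, 1].
    Returns (predicted class index, fused class probabilities).
    """
    probs_3d = softmax(logits_3d)

    # Average the per-frame probabilities of the 2D branch over the clip.
    frame_probs = [softmax(frame) for frame in logits_2d_per_frame]
    num_frames = len(frame_probs)
    probs_2d = [
        sum(p[c] for p in frame_probs) / num_frames
        for c in range(len(probs_3d))
    ]

    # Weighted average of the two branches' class probabilities.
    fused = [
        weight_3d * a + (1.0 - weight_3d) * b
        for a, b in zip(probs_3d, probs_2d)
    ]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


if __name__ == "__main__":
    # Made-up scores for a 3-class problem, purely for illustration.
    clip_scores = [2.0, 1.0, 0.0]
    frame_scores = [[0.0, 3.0, 0.0], [0.0, 3.0, 0.0]]
    print(ensemble_predict(clip_scores, frame_scores))
```

With equal weights, a confident 2D branch can overrule a less confident 3D branch, which is the kind of complementary behaviour an ensemble is meant to exploit. In practice the weight would be tuned on validation data.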