A Spatio-Temporal Deep Learning Approach for Human Action Recognition in Infrared Videos

Times cited: 6
Authors
Shah, Anuj K. [1]
Ghosh, Ripul [2]
Akula, Aparna [2]
Affiliations
[1] Indian Inst Engn Sci & Technol, Sch Mechatron & Robot, Sibpur, India
[2] Cent Sci Instruments Org, CSIR, Chandigarh 160030, India
Source
OPTICS AND PHOTONICS FOR INFORMATION PROCESSING XII | 2018 / Vol. 10751
Keywords
Action recognition; Infrared images; Deep learning; Spatio-temporal; Sequential learning; Fall detection
DOI
10.1117/12.2502993
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808 [Electrical engineering]; 0809 [Electronic science and technology]
Abstract
Human action recognition in indoor environments can be crucial for avoiding serious accidents and damage. Applications range from monitoring elderly people living alone or persons with disabilities to monitoring people working alone in a chamber or in an isolated industrial environment. These scenarios demand automatic, near-real-time activity recognition and alerting to save lives and assets. Because the sensing modality must work round the clock in a non-intrusive manner, we opted for a thermal infrared (IR) camera, which captures the heat emitted by objects in the scene and forms an image from it. Motivated by the recent success of convolutional neural networks (CNNs) for human action recognition in IR images, we extend that line of work by incorporating an additional dimension, namely temporal information: we designed and implemented a 3D-CNN that learns both spatial and sequential features from thermal IR videos. Eight action classes are considered: Walking, Standing, Falling, Lying, Sitting, Falling from chair, Sitting up (recovering from a fall from a sitting posture), and Getting up (recovering from a fall from a lying posture). To evaluate the proposed framework, IR videos of the different actions were captured in three diverse home environments: a study room, a bedroom, and a garden. The dataset comprised 2641 training and 894 test IR videos, each half a second long, performed by more than 50 volunteers. The 3D-CNN consists of two blocks, each with two convolutional layers and one max-pooling layer, and automatically constructs features from raw data that incorporate both spatial and temporal information. The network parameters are learned through supervised training with the back-propagation algorithm.
Experimental results show that the proposed spatio-temporal deep learning architecture achieves 85% classification accuracy on the 894 complex test videos of the IR action dataset.
Pages: 9
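The abstract states the network's topology (two blocks, each with two 3D convolutions followed by one max-pooling layer, for eight action classes) but gives no hyperparameters. The sketch below is a hypothetical illustration, not the authors' configuration. It traces how a clip's tensor shape and the trainable parameter count evolve through such a network. The clip size (1 channel × 16 frames × 64 × 64 pixels), the filter counts (32 and 64), the 3 × 3 × 3 kernels with "same" padding, the 2 × 2 × 2 pooling, and the single fully connected output layer are all assumptions.

```python
"""Hypothetical shape/parameter trace for a two-block 3D-CNN.

Every hyperparameter here is an assumption for illustration; the abstract
only states the topology (two blocks of conv-conv-maxpool) and 8 classes.
"""
from dataclasses import dataclass
from typing import List, Tuple, Union

Shape = Tuple[int, int, int, int]  # (channels, frames, height, width)


@dataclass(frozen=True)
class Conv3d:
    out_channels: int
    kernel: int = 3   # assumed cubic k x k x k kernel, stride 1
    padding: int = 1  # assumed zero padding ("same" size for k = 3)


@dataclass(frozen=True)
class MaxPool3d:
    size: int = 2     # assumed non-overlapping size x size x size window


Layer = Union[Conv3d, MaxPool3d]


def out_len(n: int, k: int, stride: int, pad: int) -> int:
    """Output length along one axis, using floor rounding."""
    return (n + 2 * pad - k) // stride + 1


def block(filters: int) -> List[Layer]:
    """One block as described in the abstract: two convolutions, one max pool."""
    return [Conv3d(filters), Conv3d(filters), MaxPool3d()]


# Hypothetical filter counts for the two blocks.
LAYERS: List[Layer] = block(32) + block(64)


def trace(in_shape: Shape, layers: List[Layer], num_classes: int):
    """Return per-layer output shapes, flattened feature size and parameter count."""
    c, t, h, w = in_shape
    params = 0
    rows: List[Tuple[str, Shape]] = []
    for layer in layers:
        if isinstance(layer, Conv3d):
            k, p = layer.kernel, layer.padding
            # Weights (c_in * c_out * k^3) plus one bias per output channel.
            params += c * layer.out_channels * k ** 3 + layer.out_channels
            c = layer.out_channels
            t, h, w = [out_len(n, k, 1, p) for n in (t, h, w)]
        else:
            s = layer.size
            # Pooling has no trainable parameters; stride equals window size.
            t, h, w = [out_len(n, s, s, 0) for n in (t, h, w)]
        rows.append((type(layer).__name__, (c, t, h, w)))
    flat = c * t * h * w
    # Assumed single fully connected layer mapping features to class scores.
    params += flat * num_classes + num_classes
    return rows, flat, params


if __name__ == "__main__":
    # Hypothetical input: single-channel thermal clip, 16 frames of 64 x 64 px.
    rows, flat, params = trace((1, 16, 64, 64), LAYERS, num_classes=8)
    for name, shape in rows:
        print(f"{name:<10} -> {shape}")
    print("flattened features:", flat)
    print("trainable parameters:", params)
```

Under these assumptions, the trace ends at a (64, 4, 16, 16) feature volume, that is, 65,536 flattened features. The network has 718,888 trainable parameters, most of them in the final dense layer. With an odd clip length, such as 15 frames (what half a second would give at an assumed 30 fps), floor rounding drops the remainder at each pooling step, giving 7 and then 3 frames.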