A Spatio-Temporal Deep Learning Approach For Human Action Recognition in Infrared Videos

Cited by: 6

Authors:

Shah, Anuj K. [1]

Ghosh, Ripul [2]

Akula, Aparna [2]

Affiliations:

[1] Indian Inst Engn Sci & Technol, Sch Mechatron & Robot, Sibpur, India

[2] Cent Sci Instruments Org, CSIR, Chandigarh 160030, India

Source:

OPTICS AND PHOTONICS FOR INFORMATION PROCESSING XII | 2018 / Vol. 10751

Keywords:

Action recognition; Infrared images; Deep learning; Spatio-temporal; Sequential learning; Fall detection

DOI:

10.1117/12.2502993

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

Human action recognition in indoor environments can be crucial for avoiding serious accidents and/or damage. Applications range from monitoring elderly people living alone or persons with disabilities to monitoring persons working alone in a chamber or in an isolated industrial environment. These scenarios demand automatic, near real-time activity recognition and alerting to save lives and assets. Because the sensing modality should be able to work round the clock in a non-intrusive manner, we opted for a thermal infrared camera, which captures the heat emitted by objects in the scene and forms an image from it. Motivated by the recent success of convolutional neural networks (CNNs) for human action recognition in IR images, we extend that work by incorporating one additional dimension, i.e., temporal information. We designed and implemented a 3D-CNN that learns both spatial and sequential features from thermal IR videos. Eight action classes are considered: Walking, Standing, Falling, Lying, Sitting, Falling from chair, Sitting up (recovering from a fall from a sitting posture), and Getting up (recovering from a fall from a lying posture). To evaluate the proposed framework, infrared (IR) videos of the different actions were recorded in three diverse home environments: a study room, a bedroom, and a garden. The dataset comprised 2641 training and 894 test IR videos, each half a second long, performed by more than 50 volunteers. The 3D-CNN consists of two blocks, each with two convolution layers and one max-pooling layer, and automatically constructs features from the raw data that incorporate both spatial and temporal information. The network parameters are learned in a supervised manner using the back-propagation algorithm.
The proposed spatio-temporal deep learning architecture achieves 85% classification accuracy on the 894 complex test videos of the IR action dataset.
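The abstract fixes only the coarse layout of the network: two blocks, each with two convolution layers and one max-pooling layer, feeding a classifier over eight actions. The NumPy sketch below shows one way such a forward pass could be written. Everything beyond that layout is a hypothetical choice, not the authors' configuration: 3x3x3 kernels with zero "same" padding, ReLU activations, channel widths of 4 and 8, 2x2x2 pooling, a single fully connected softmax layer, and an 8-frame, 32x32, single-channel input clip (for example, half a second at an assumed 16 fps). The helper names (`conv3d_same`, `max_pool_2`, `init_params`, `forward`) are illustrative. Weights are random and the supervised back-propagation training is not shown, so the predicted label carries no meaning.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

# The eight action classes listed in the abstract.
CLASSES = ["Walking", "Standing", "Falling", "Lying", "Sitting",
           "Falling from chair", "Sitting up", "Getting up"]


def conv3d_same(x, w, b):
    """Stride-1 3D convolution (cross-correlation) with zero 'same' padding.

    x: (C_in, T, H, W), w: (C_out, C_in, k, k, k) with odd k, b: (C_out,).
    """
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))
    # patches has shape (C_in, T, H, W, k, k, k): one k*k*k window per output voxel,
    # so the kernel spans k consecutive frames as well as a k*k spatial patch.
    patches = sliding_window_view(xp, (k, k, k), axis=(1, 2, 3))
    out = np.einsum("cthwijk,ocijk->othw", patches, w, optimize=True)
    return out + b[:, None, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def max_pool_2(x):
    """Non-overlapping 2x2x2 max pooling over (T, H, W); sizes must be even."""
    c, t, h, w = x.shape
    return x.reshape(c, t // 2, 2, h // 2, 2, w // 2, 2).max(axis=(2, 4, 6))


def init_params(rng, clip_shape, widths=(4, 8), k=3, n_classes=len(CLASSES)):
    """Random He-style weights: len(widths) blocks of two conv layers, then one dense layer."""
    blocks, c_in = [], 1  # hypothetical: thermal frames treated as one intensity channel
    for c_out in widths:
        layers = []
        for _ in range(2):  # two convolution layers per block
            fan_in = c_in * k ** 3
            w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(c_out, c_in, k, k, k))
            layers.append((w, np.zeros(c_out)))
            c_in = c_out
        blocks.append(layers)
    s = 2 ** len(widths)  # each block's pooling halves T, H and W
    t, h, w_ = clip_shape
    flat = widths[-1] * (t // s) * (h // s) * (w_ // s)
    fc = (rng.normal(0.0, np.sqrt(1.0 / flat), size=(n_classes, flat)), np.zeros(n_classes))
    return {"blocks": blocks, "fc": fc}


def forward(params, clip):
    """Map one clip of shape (T, H, W) to a probability vector over CLASSES."""
    x = clip[None].astype(np.float64)  # add the channel axis -> (1, T, H, W)
    for layers in params["blocks"]:
        for w, b in layers:
            x = relu(conv3d_same(x, w, b))
        x = max_pool_2(x)
    w_fc, b_fc = params["fc"]
    logits = w_fc @ x.reshape(-1) + b_fc
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


# Usage with a random stand-in clip (hypothetical: 8 frames of 32x32 pixels).
rng = np.random.default_rng(0)
clip = rng.random((8, 32, 32))
params = init_params(rng, clip.shape)
probs = forward(params, clip)
print(probs.shape, CLASSES[int(probs.argmax())])  # untrained, so the label is arbitrary
```

Stacking two 3x3x3 convolutions before each pooling step gives every output voxel of a block a 5x5x5 spatio-temporal receptive field before resolution is halved; because the kernels span consecutive frames, motion cues enter the features directly, which is what separates this design from a per-frame 2D CNN.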


Pages: 9
