Robust Human Activity Recognition Using Multimodal Feature-Level Fusion

Cited by: 109
Authors
Ehatisham-Ul-Haq, Muhammad [1 ]
Javed, Ali [2 ]
Azam, Muhammad Awais [1 ]
Malik, Hafiz M. A. [3 ]
Irtaza, Aun [4 ]
Lee, Ik Hyun [5 ]
Mahmood, Muhammad Tariq [6 ]
Affiliations
[1] Univ Engn & Technol, Dept Comp Engn, Taxila 47080, Pakistan
[2] Univ Engn & Technol, Dept Software Engn, Taxila 47080, Pakistan
[3] Univ Michigan, Dept Elect & Comp Engn, Dearborn, MI 48128 USA
[4] Univ Engn & Technol, Dept Comp Sci, Taxila 47080, Pakistan
[5] Korea Polytech Univ, Dept Mechatron, Gyeonggi Do 15073, South Korea
[6] Korea Univ Technol & Educ, Sch Comp Sci & Informat Engn, Cheonan 31253, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Dense HOG; depth sensor; feature-level fusion; human action recognition; inertial sensor; RGB camera; SENSORS;
DOI
10.1109/ACCESS.2019.2913393
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Automated recognition of human activities or actions has great significance, as it supports wide-ranging applications, including surveillance, robotics, and personal health monitoring. Over the past few years, many computer vision-based methods have been developed for recognizing human actions from RGB and depth camera videos. These methods include space-time trajectories, motion encoding, key-pose extraction, space-time occupancy patterns, depth motion maps, and skeleton joints. However, such camera-based approaches are affected by background clutter and illumination changes and are applicable only to a limited field of view. Wearable inertial sensors provide a viable solution to these challenges but are subject to limitations of their own, such as sensitivity to location and orientation. Because of the complementary nature of the data obtained from cameras and inertial sensors, the use of multiple sensing modalities for accurate recognition of human actions is gradually increasing. This paper presents a viable multimodal feature-level fusion approach for robust human action recognition, which utilizes data from multiple sensors, including an RGB camera, a depth sensor, and wearable inertial sensors. We extract computationally efficient features from the data obtained from the RGB-D video camera and the inertial body sensors: densely extracted histogram of oriented gradients (HOG) features from the RGB/depth videos and statistical signal attributes from the wearable sensor data. The proposed human action recognition (HAR) framework is tested on the publicly available multimodal human action dataset UTD-MHAD, which consists of 27 different human actions. K-nearest neighbor and support vector machine classifiers are used for training and testing the proposed fusion model for HAR. The experimental results indicate that the proposed scheme achieves better recognition results than the state of the art. The feature-level fusion of RGB and inertial sensor data provides the best overall performance for the proposed system, with an accuracy of 97.6%.
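To make the described pipeline concrete, the sketch below illustrates the feature-level fusion idea from the abstract: HOG descriptors are densely extracted from video frames, statistical attributes are computed from an inertial signal window, and the two vectors are concatenated into a single feature vector for a classifier. This is a minimal sketch assuming scikit-image and scikit-learn, not the authors' implementation; the HOG parameters, the set of statistics, and the SVM kernel are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog  # dense HOG descriptor
from sklearn.svm import SVC      # SVM classifier, one of the two used in the paper

def video_features(frames):
    """Average densely extracted HOG descriptors over a clip's grayscale frames."""
    descriptors = [hog(f, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2)) for f in frames]
    return np.mean(descriptors, axis=0)

def inertial_features(window):
    """Simple statistical attributes of a (samples x axes) inertial window."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           window.min(axis=0), window.max(axis=0)])

def fused_features(frames, window):
    """Feature-level fusion: concatenate the per-modality feature vectors."""
    return np.concatenate([video_features(frames), inertial_features(window)])

# Hypothetical usage: build a design matrix from synchronized video/inertial
# samples, then train an SVM (a KNN classifier could be swapped in the same way).
# X = np.stack([fused_features(f, w) for f, w in samples])
# clf = SVC(kernel="rbf").fit(X, y)
```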
Pages: 60736-60751
Number of pages: 16