Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks

Citations: 137
Authors
Köpüklü, Okan [1]
Gunduz, Ahmet [1]
Köse, Neslihan [2]
Rigoll, Gerhard [1]
Affiliations
[1] Tech Univ Munich, Inst Human Machine Commun, Munich, Germany
[2] Intel Deutschland GmbH, Intel Labs Europe, Dependabil Res Lab, Feldkirchen, Germany
Source
2019 14TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2019) | 2019
DOI
10.1109/fg.2019.8756576
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Classification
0808 ; 0809 ;
Abstract
Real-time recognition of dynamic hand gestures from video streams is a challenging task since (i) there is no indication of when a gesture starts and ends in the video, (ii) a performed gesture should be recognized only once, and (iii) the entire architecture must be designed within a memory and power budget. In this work, we address these challenges by proposing a hierarchical structure that enables offline-working convolutional neural network (CNN) architectures to operate efficiently online using a sliding-window approach. The proposed architecture consists of two models: (1) a detector, a lightweight CNN that detects gestures, and (2) a classifier, a deep CNN that classifies the detected gestures. To evaluate the single-time activations of the detected gestures, we propose the Levenshtein distance as an evaluation metric, since it simultaneously measures misclassifications, multiple detections, and missing detections. We evaluate our architecture on two publicly available datasets, EgoGesture and NVIDIA Dynamic Hand Gesture, which require temporal detection and classification of the performed hand gestures. The ResNeXt-101 model, used as the classifier, achieves state-of-the-art offline classification accuracies of 94.04% and 83.82% for the depth modality on the EgoGesture and NVIDIA benchmarks, respectively. In real-time detection and classification, we obtain considerably early detections while achieving performance close to offline operation. The code and pretrained models used in this work are publicly available(1).
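The single-time activation metric described above can be illustrated with a standard Levenshtein (edit) distance over the sequence of predicted gesture labels versus the ground-truth sequence: a misclassification counts as a substitution, a repeated detection of the same gesture as an insertion, and a missed gesture as a deletion. The sketch below is a minimal, generic implementation for illustration only; the function name and interface are assumptions, not taken from the authors' released code.

```python
def levenshtein(pred, target):
    """Edit distance between a predicted and a ground-truth gesture
    label sequence. Substitutions model misclassifications, insertions
    model multiple detections, deletions model missing detections."""
    m, n = len(pred), len(target)
    # dp[i][j] = distance between pred[:i] and target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i predictions
    for j in range(n + 1):
        dp[0][j] = j  # insert all j targets
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == target[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]
```

For example, if the ground truth is `[3, 2]` and the system fires `[3, 1, 2]` (one spurious detection), the distance is 1; a perfect single-time activation sequence yields 0.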
Pages: 407-414 (8 pages)