Gesture Recognition and Multi-modal Fusion on a New Hand Gesture Dataset

Cited by: 1
Authors
Schak, Monika [1 ]
Gepperth, Alexander [1 ]
Affiliations
[1] Fulda Univ Appl Sci, D-36037 Fulda, Germany
Source
PATTERN RECOGNITION APPLICATIONS AND METHODS, ICPRAM 2021, ICPRAM 2022 | 2023 / Vol. 13822
Keywords
Hand gestures; Dataset; Multi-modal data; Data fusion; Sequence classification; Gesture recognition;
DOI
10.1007/978-3-031-24538-1_4
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a baseline for gesture recognition using state-of-the-art sequence classifiers on a new, freely available multi-modal dataset of free-hand gestures. The dataset consists of roughly 100,000 samples, grouped into six classes of typical, easy-to-learn hand gestures. It was recorded with two independent sensors, enabling experiments on multi-modal data fusion at several depths, covering early, intermediate, and late fusion techniques. Since the whole dataset was recorded by a single person, data quality is very high, with little to no risk of incorrectly performed gestures. We report results on unimodal sequence classification using both an LSTM and a CNN classifier. We also show that fusing all four modalities, via late fusion of the output layers of LSTM classifiers each trained on a single modality, results in higher precision. Finally, we demonstrate live gesture classification with an LSTM-based classifier, showing that generalization to other persons performing the gestures is high.
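The late-fusion scheme described in the abstract combines the output layers of independently trained unimodal classifiers. A minimal sketch of this idea, assuming the common approach of averaging per-modality softmax probabilities (the paper's exact fusion rule and classifier outputs are not given here; the logit values below are purely illustrative):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(per_modality_logits):
    """Average the class-probability outputs of independently trained
    per-modality classifiers and return the fused class prediction."""
    probs = np.stack([softmax(l) for l in per_modality_logits])
    fused = probs.mean(axis=0)  # one probability vector over all classes
    return int(fused.argmax())

# Hypothetical output-layer logits from four unimodal classifiers
# over the six gesture classes in the dataset (values are made up).
logits = [
    np.array([2.0, 0.1, 0.3, 0.2, 0.1, 0.0]),
    np.array([1.5, 0.2, 0.4, 0.1, 0.3, 0.2]),
    np.array([0.3, 0.1, 2.2, 0.2, 0.1, 0.0]),  # this modality disagrees
    np.array([1.8, 0.3, 0.5, 0.2, 0.1, 0.1]),
]
print(late_fusion(logits))
```

Because fusion happens only on the final probability vectors, each modality's classifier can be trained and tuned in isolation, which is what makes late fusion attractive for experiments comparing individual sensors against their combination.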
Pages: 76-97
Page count: 22