Comparing random forest approaches to segmenting and classifying gestures

Cited by: 34
Authors
Joshi, Ajjen [1 ]
Monnier, Camille [2 ]
Betke, Margrit [1 ]
Sclaroff, Stan [1 ]
Affiliations
[1] Boston Univ, Dept Comp Sci, 111 Cummington St, Boston, MA 02215 USA
[2] Charles River Analyt, Cambridge, MA 02138 USA
Funding
US National Science Foundation;
Keywords
Gesture spotting; Gesture classification; Random forest classifier; Recognition;
DOI
10.1016/j.imavis.2016.06.001
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A complete gesture recognition system should localize and classify each gesture from a given gesture vocabulary, within a continuous video stream. In this work, we compare two approaches: a method that performs the tasks of temporal segmentation and classification simultaneously with another that performs the tasks sequentially. The first method trains a single random forest model to recognize gestures from a given vocabulary, as presented in a training dataset of video plus 3D body joint locations, as well as out-of-vocabulary (non-gesture) instances. The second method employs a cascaded approach, training a binary random forest model to distinguish gestures from background and a multi-class random forest model to classify segmented gestures. Given a test input video stream, both frameworks are applied using sliding windows at multiple temporal scales. We evaluated our formulation in segmenting and recognizing gestures from two different benchmark datasets: the NATOPS dataset of 9600 gesture instances from a vocabulary of 24 aircraft handling signals, and the ChaLearn dataset of 7754 gesture instances from a vocabulary of 20 Italian communication gestures. The performance of our method compares favorably with state-of-the-art methods that employ Hidden Markov Models or Hidden Conditional Random Fields on the NATOPS dataset. We conclude with a discussion of the advantages of using our model for the task of gesture recognition and segmentation, and outline weaknesses which need to be addressed in the future. (C) 2016 Elsevier B.V. All rights reserved.
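The cascaded approach described in the abstract can be sketched in a simplified form: a binary random forest distinguishes gesture frames from background, a multi-class forest labels the spotted windows, and both are applied over sliding windows at multiple temporal scales. The sketch below uses scikit-learn and synthetic segment-structured data as a stand-in for the paper's video and 3D body-joint features; the feature construction, window scales, and windowed-mean descriptor are illustrative assumptions, not the authors' actual pipeline (which also handles multi-scale fusion and out-of-vocabulary training instances).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def make_stream(n_segments=20, dim=60):
    """Synthetic frame stream: contiguous segments of background (label 0)
    or one of two gesture classes, with class-shifted Gaussian features
    standing in for pose descriptors (hypothetical data, not NATOPS/ChaLearn)."""
    feats, labels = [], []
    for _ in range(n_segments):
        label = int(rng.integers(0, 3))          # 0 = background, 1-2 = gestures
        length = int(rng.integers(15, 26))       # segment duration in frames
        feats.append(rng.normal(size=(length, dim)) + label)
        labels.append(np.full(length, label))
    return np.vstack(feats), np.concatenate(labels)

X_train, y_train = make_stream()
X_test, y_test = make_stream()

# Stage 1: binary forest spots gesture vs. background frames.
spotter = RandomForestClassifier(n_estimators=50, random_state=0)
spotter.fit(X_train, (y_train > 0).astype(int))

# Stage 2: multi-class forest labels frames known to contain a gesture.
mask = y_train > 0
classifier = RandomForestClassifier(n_estimators=50, random_state=0)
classifier.fit(X_train[mask], y_train[mask])

def predict_stream(X, scales=(8, 16)):
    """Slide windows of several temporal scales over the stream; classify a
    window only if the spotter flags it as gesture. Later (larger) scales
    simply overwrite earlier ones here -- a crude stand-in for multi-scale fusion."""
    preds = np.zeros(len(X), dtype=int)
    for w in scales:
        step = max(1, w // 2)                    # 50% window overlap
        for start in range(0, len(X) - w + 1, step):
            window = X[start:start + w].mean(axis=0, keepdims=True)
            if spotter.predict(window)[0] == 1:
                preds[start:start + w] = classifier.predict(window)[0]
    return preds

pred = predict_stream(X_test)
accuracy = float((pred == y_test).mean())
```

Frame-level accuracy is high on this cleanly separated synthetic data, with most errors at segment boundaries where windows straddle two labels; the single-model alternative compared in the paper would instead train one forest whose classes include an explicit non-gesture label.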
Pages: 86-95
Page count: 10
Related Papers
35 in total
[11]   Multi-modal Gesture Recognition Challenge 2013: Dataset and Results [J].
Escalera, Sergio ;
Gonzalez, Jordi ;
Baro, Xavier ;
Reyes, Miguel ;
Lopes, Oscar ;
Guyon, Isabelle ;
Athitsos, Vassilis ;
Escalante, Hugo J. .
ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, :445-452
[12]   Hough Forests for Object Detection, Tracking, and Action Recognition [J].
Gall, Juergen ;
Yao, Angela ;
Razavi, Nima ;
Van Gool, Luc ;
Lempitsky, Victor .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (11) :2188-2202
[13]   Real-time sign language recognition using a consumer depth camera [J].
Kuznetsova, Alina ;
Leal-Taixe, Laura ;
Rosenhahn, Bodo .
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2013, :83-90
[14]  
Lafferty John, 2001, INT C MACH LEARN ICM
[15]   A gesture recognition system using 3D data [J].
Malassiotis, S ;
Aifanti, N ;
Strintzis, MG .
FIRST INTERNATIONAL SYMPOSIUM ON 3D DATA PROCESSING VISUALIZATION AND TRANSMISSION, 2002, :190-193
[16]   Random Forests of Local Experts for Pedestrian Detection [J].
Marin, Javier ;
Vazquez, David ;
Lopez, Antonio M. ;
Amores, Jaume ;
Leibe, Bastian .
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, :2592-2599
[17]  
Miranda L., 2012, 2012 XXV SIBGRAPI - Conference on Graphics, Patterns and Images (SIBGRAPI 2012), P268, DOI 10.1109/SIBGRAPI.2012.44
[18]   Gesture recognition: A survey [J].
Mitra, Sushmita ;
Acharya, Tinku .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2007, 37 (03) :311-324
[19]  
Neverova N., 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence
[20]  
Neverova N., 2014, P 2014 IEEE EUR C CO