A Multi Modal Approach to Gesture Recognition from Audio and Video Data

Cited by: 9
Authors
Bayer, Immanuel [1 ]
Silbermann, Thierry [1 ]
Affiliations
[1] Univ Konstanz, D-78457 Constance, Germany
Source
ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION | 2013
Keywords
Multi-modal interaction; speech and gesture recognition; fusion
DOI
10.1145/2522848.2532592
Chinese Library Classification
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
We describe in this paper our approach to the Multi-modal Gesture Recognition Challenge organized by ChaLearn in conjunction with the ICMI 2013 conference. The competition's task was to learn a vocabulary of 20 types of Italian gestures performed by different persons and to detect them in sequences. We develop an algorithm to find the gesture intervals in the audio data, extract audio features from those intervals, and train two different models. We engineer features from the skeleton data and use the gesture intervals in the training data to train a model that we afterwards apply to the test sequences using a sliding window. We combine the models through weighted averaging. We find that this way of combining information from two different sources boosts the models' performance significantly.
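The fusion step described in the abstract can be illustrated with a minimal sketch. The Python snippet below is an assumed implementation, not the authors' code: the names fuse_predictions, audio_probs, skeleton_probs, and w_audio are illustrative. It shows late fusion by weighted averaging of per-frame class probabilities from an audio model and a skeleton model, followed by an argmax to pick the predicted gesture class.

```python
import numpy as np

def fuse_predictions(audio_probs, skeleton_probs, w_audio=0.5):
    """Weighted-average fusion of two probability matrices.

    Both inputs have shape (n_frames, n_classes); the weight w_audio
    controls how much the audio model contributes to the fused score.
    Returns the predicted class index per frame.
    """
    fused = w_audio * audio_probs + (1.0 - w_audio) * skeleton_probs
    return fused.argmax(axis=1)

# Illustrative usage: 3 frames, 20 gesture classes, random probabilities.
rng = np.random.default_rng(0)
audio_probs = rng.dirichlet(np.ones(20), size=3)
skeleton_probs = rng.dirichlet(np.ones(20), size=3)
print(fuse_predictions(audio_probs, skeleton_probs, w_audio=0.6))
```

The weight here is a free parameter; in practice it would be chosen on validation data, for example by grid search over values in [0, 1].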
Pages: 461 - 465
Number of pages: 5