Seeking a Hierarchical Prototype for Multimodal Gesture Recognition

Cited: 0
Authors
Li, Yunan [1 ,2 ]
Qi, Tianyu [3 ]
Ma, Zhuoqi [3 ]
Quan, Dou [4 ]
Miao, Qiguang [1 ,2 ]
Affiliations
[1] Xidian Univ, Key Lab Big Data & Intelligent Vis, Key Lab Smart Human Comp Interact & Wearable Tech, Xian, Peoples R China
[2] Xidian Univ, Minist Educ, Key Lab Collaborat Intelligence Syst, Xian 710071, Peoples R China
[3] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
[4] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Generative adversarial network (GAN); gesture prototype; gesture recognition; memory bank; multimodal; NETWORKS; DATASET; FUSION;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Gesture recognition has drawn considerable attention from many researchers owing to its wide range of applications. Although significant progress has been made in this field, prior works typically focus on distinguishing between different gesture classes while ignoring the intra-class divergence caused by gesture-irrelevant factors. Meanwhile, for multimodal gesture recognition, feature or score fusion in the final stage is the usual way to combine information from different modalities. Consequently, the gesture-relevant features in different modalities may be redundant, whereas the complementarity of the modalities is not sufficiently exploited. To handle these problems, in this article we propose a hierarchical gesture prototype framework that highlights gesture-relevant features such as poses and motions. The framework consists of a sample-level prototype and a modal-level prototype. The sample-level gesture prototype is established with a memory bank, which avoids the distraction of gesture-irrelevant factors in each sample, such as illumination, background, and the performers' appearances. The modal-level prototype is then obtained via a generative adversarial network (GAN)-based subnetwork, in which modal-invariant features are extracted and pulled together. Meanwhile, the modal-specific attribute features are used to synthesize the features of other modalities, and this circulation of modality information helps to leverage their complementarity. Extensive experiments on three widely used gesture datasets demonstrate that our method effectively highlights gesture-relevant features and outperforms state-of-the-art methods.
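The sample-level prototype described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is an illustrative assumption of how a per-class prototype memory bank might work: each class keeps one prototype vector updated as an exponential moving average (EMA) of incoming sample features, so that gesture-irrelevant variation (illumination, background, performer appearance) averages out across samples. All names, the momentum value, and the EMA update rule are assumptions for illustration only.

```python
import numpy as np


class PrototypeMemoryBank:
    """Illustrative sketch of a sample-level prototype memory bank.

    One prototype vector per gesture class, updated as an EMA of
    incoming sample features. Not the paper's actual implementation.
    """

    def __init__(self, num_classes: int, feat_dim: int, momentum: float = 0.9):
        self.prototypes = np.zeros((num_classes, feat_dim))
        self.initialized = np.zeros(num_classes, dtype=bool)
        self.momentum = momentum  # assumed hyperparameter, not from the paper

    def update(self, feature: np.ndarray, label: int) -> None:
        """EMA-update the prototype of class `label` with a new sample feature."""
        if not self.initialized[label]:
            # First sample of this class initializes the prototype directly.
            self.prototypes[label] = feature
            self.initialized[label] = True
        else:
            self.prototypes[label] = (
                self.momentum * self.prototypes[label]
                + (1.0 - self.momentum) * feature
            )

    def classify(self, feature: np.ndarray) -> int:
        """Assign a feature to the class whose prototype is nearest (L2)."""
        dists = np.linalg.norm(self.prototypes - feature, axis=1)
        return int(np.argmin(dists))
```

Because each prototype aggregates many samples of one class, sample-specific nuisance factors are suppressed relative to any single feature vector, which is the intuition behind comparing features to prototypes rather than to individual samples.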
Pages: 198-209
Page count: 12
Related Papers
50 records in total
  • [31] A Trajectory-Based Approach for Device Independent Gesture Recognition in Multimodal User Interfaces
    Wilhelm, Mathias
    Roscher, Dirk
    Blumendorf, Marco
    Albayrak, Sahin
    HAPTIC AND AUDIO INTERACTION DESIGN, 2010, 6306 : 197 - 206
  • [32] A Hybrid Multimodal Fusion Framework for sEMG-ACC-Based Hand Gesture Recognition
    Duan, Shengcai
    Wu, Le
    Xue, Bo
    Liu, Aiping
    Qian, Ruobing
    Chen, Xun
    IEEE SENSORS JOURNAL, 2023, 23 (03) : 2773 - 2782
  • [33] Multimodal Fusion-GMM based Gesture Recognition for Smart Home by WiFi Sensing
    Ding, Jianyang
    Wang, Yong
    Si, Hongyan
    Ma, Jiannan
    He, Jingwen
    Liang, Kai
    Fu, Shaozhong
    2022 IEEE 95TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2022-SPRING), 2022,
  • [34] Gesture Recognition: The Gesture Segmentation Problem
    M. K. Viblis
    K. J. Kyriakopoulos
    Journal of Intelligent and Robotic Systems, 2000, 28 : 151 - 158
  • [35] Gesture recognition: The gesture segmentation problem
    Viblis, MK
    Kyriakopoulos, KJ
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2000, 28 (1-2) : 151 - 158
  • [36] Hierarchical Attention Approach in Multimodal Emotion Recognition for Human Robot Interaction
    Abdullah, Muhammad
    Ahmad, Mobeen
    Han, Dongil
    2021 36TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC), 2021,
  • [37] A multimodal hierarchical approach to speech emotion recognition from audio and text
    Singh, Prabhav
    Srivastava, Ridam
    Rana, K. P. S.
    Kumar, Vineet
    KNOWLEDGE-BASED SYSTEMS, 2021, 229
  • [38] Emotion recognition based on brain-like multimodal hierarchical perception
    Zhu X.
    Huang Y.
    Wang X.
    Wang R.
    Multimedia Tools and Applications, 2024, 83 (18) : 56039 - 56057
  • [39] A virtual surgical prototype system based on gesture recognition for virtual surgical training in maxillofacial surgery
    Zhao, Hanjiang
    Cheng, Mengjia
    Huang, Jingyang
    Li, Meng
    Cheng, Huanchong
    Tian, Kun
    Yu, Hongbo
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2023, 18 (05) : 909 - 919
  • [40] A virtual surgical prototype system based on gesture recognition for virtual surgical training in maxillofacial surgery
    Hanjiang Zhao
    Mengjia Cheng
    Jingyang Huang
    Meng Li
    Huanchong Cheng
    Kun Tian
    Hongbo Yu
    International Journal of Computer Assisted Radiology and Surgery, 2023, 18 : 909 - 919