Seeking a Hierarchical Prototype for Multimodal Gesture Recognition

Cited: 4
Authors
Li, Yunan [1 ,2 ]
Qi, Tianyu [3 ]
Ma, Zhuoqi [3 ]
Quan, Dou [4 ]
Miao, Qiguang [1 ,2 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian Key Lab Big Data & Intelligent Vis, Key Lab Smart Human Comp Interact & Wearable Techn, Xian 710071, Peoples R China
[2] Xidian Univ, Key Lab Collaborat Intelligence Syst, Minist Educ, Xian 710071, Peoples R China
[3] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
[4] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial network (GAN); gesture prototype; gesture recognition; memory bank; multimodal; NETWORKS; DATASET; FUSION
DOI
10.1109/TNNLS.2023.3295811
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Gesture recognition has drawn considerable attention from many researchers owing to its wide range of applications. Although significant progress has been made in this field, previous works typically focus on how to distinguish between different gesture classes while ignoring the intra-class divergence caused by gesture-irrelevant factors. Meanwhile, for multimodal gesture recognition, fusing features or scores in the final stage is the usual way to combine the information of different modalities. Consequently, the gesture-relevant features of different modalities may be redundant, whereas their complementarity is not sufficiently exploited. To handle these problems, in this article we propose a hierarchical gesture prototype framework that highlights gesture-relevant features such as poses and motions. The framework consists of a sample-level prototype and a modal-level prototype. The sample-level gesture prototype is established with a memory bank, which avoids the distraction of gesture-irrelevant factors in each sample, such as illumination, background, and the performers' appearances. The modal-level prototype is then obtained via a generative adversarial network (GAN)-based subnetwork, in which modal-invariant features are extracted and pulled together. Meanwhile, modal-specific attribute features are used to synthesize the features of the other modalities, and this circulation of modality information helps leverage their complementarity. Extensive experiments on three widely used gesture datasets demonstrate that our method effectively highlights gesture-relevant features and outperforms state-of-the-art methods.
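The abstract describes two mechanisms concretely enough to sketch: a memory-bank-style sample-level prototype that averages away gesture-irrelevant factors, and a modal-level decomposition into modal-invariant features (pulled together across modalities) and modal-specific features (fed to a GAN to synthesize the other modality). The PyTorch sketch below is a minimal illustration of those two ideas only, not the authors' implementation; every class, function, and hyperparameter in it (SampleLevelPrototype, momentum=0.9, the linear heads) is a hypothetical stand-in, and the GAN generator and discriminator used for cross-modal synthesis are omitted.

```python
# Hypothetical sketch of the two prototype levels described in the abstract.
# NOT the authors' code (see the TNNLS paper at the DOI above); names and
# hyperparameters are invented for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SampleLevelPrototype(nn.Module):
    """Per-class memory bank: a momentum-averaged prototype per gesture class,
    so per-sample nuisance factors (illumination, background, performer) wash out."""

    def __init__(self, num_classes: int, feat_dim: int, momentum: float = 0.9):
        super().__init__()
        self.momentum = momentum
        # Prototypes live outside the autograd graph, like a memory bank.
        self.register_buffer("bank", torch.zeros(num_classes, feat_dim))

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # Fold each sample's feature into its class slot by momentum averaging.
        for f, y in zip(feats, labels):
            self.bank[y] = self.momentum * self.bank[y] + (1.0 - self.momentum) * f

    def prototype_loss(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Pull each sample toward its class prototype (cosine distance).
        protos = F.normalize(self.bank[labels], dim=-1)
        feats = F.normalize(feats, dim=-1)
        return (1.0 - (feats * protos).sum(dim=-1)).mean()


class ModalLevelSplit(nn.Module):
    """Split a modality's feature into a modal-invariant part and a
    modal-specific part; the GAN that consumes the specific part to
    synthesize the other modality is omitted from this sketch."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.invariant_head = nn.Linear(feat_dim, feat_dim)
        self.specific_head = nn.Linear(feat_dim, feat_dim)

    def forward(self, feat: torch.Tensor):
        return self.invariant_head(feat), self.specific_head(feat)


def invariance_loss(inv_a: torch.Tensor, inv_b: torch.Tensor) -> torch.Tensor:
    # Pull the modal-invariant features of two modalities together.
    return F.mse_loss(F.normalize(inv_a, dim=-1), F.normalize(inv_b, dim=-1))


if __name__ == "__main__":
    # Toy usage with random tensors standing in for backbone features.
    # A real system would likely use one split module per modality; one is
    # reused here only for brevity.
    bank = SampleLevelPrototype(num_classes=20, feat_dim=512)
    rgb, depth = torch.randn(8, 512), torch.randn(8, 512)
    labels = torch.randint(0, 20, (8,))
    bank.update(rgb, labels)
    split = ModalLevelSplit(512)
    inv_rgb, _ = split(rgb)
    inv_depth, _ = split(depth)
    loss = bank.prototype_loss(rgb, labels) + invariance_loss(inv_rgb, inv_depth)
    print(float(loss))
```

In this toy setup the momentum update keeps each class slot as a running average over many samples, which is one plausible way a memory bank can suppress per-sample nuisance factors such as lighting or performer appearance while retaining the class-defining pose and motion cues.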
Pages: 1-12
Page count: 12
Related Papers
50 records in total
  • [1] Li, Yunan; Qi, Tianyu; Ma, Zhuoqi; Quan, Dou; Miao, Qiguang. Seeking a Hierarchical Prototype for Multimodal Gesture Recognition. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01): 198-209.
  • [2] Zhang, Cheng; Hou, Yibin; He, Jian; Xie, Xiaoyang. Gesture Recognition with Focuses Using Hierarchical Body Part Combination. TSINGHUA SCIENCE AND TECHNOLOGY, 2025, 30 (04): 1583-1599.
  • [3] Escalera, Sergio; Athitsos, Vassilis; Guyon, Isabelle. Challenges in multimodal gesture recognition. JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17.
  • [4] Wu, Di; Shao, Ling. Multimodal Dynamic Networks for Gesture Recognition. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014: 945-948.
  • [5] Lai, Helang; Wu, Keke; Li, Lingli. Multimodal emotion recognition with hierarchical memory networks. INTELLIGENT DATA ANALYSIS, 2021, 25 (04): 1031-1045.
  • [6] van Amsterdam, Beatrice; Funke, Isabel; Edwards, Eddie; Speidel, Stefanie; Collins, Justin; Sridhar, Ashwin; Kelly, John; Clarkson, Matthew J.; Stoyanov, Danail. Gesture Recognition in Robotic Surgery With Multimodal Attention. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2022, 41 (07): 1677-1687.
  • [7] Hirota, K.; Vu, H. A.; Le, P. Q.; Fatichah, C.; Liu, Z.; Tang, Y.; Tangel, M. L.; Mu, Z.; Sun, B.; Yan, F.; Masano, D.; Thet, O.; Yamaguchi, M.; Dong, F.; Yamazaki, Y. Multimodal Gesture Recognition Based on Choquet Integral. IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011: 772-776.
  • [8] Li, Yajie; Chen, Yiqiang; Gu, Yang; Ouyang, Jianquan. A Multimodal Fusion Model Based on Hybrid Attention Mechanism for Gesture Recognition. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2020, 2021, 12644: 302-312.
  • [9] Pitsikalis, Vassilis; Katsamanis, Athanasios; Theodorakis, Stavros; Maragos, Petros. Multimodal Gesture Recognition via Multiple Hypotheses Rescoring. JOURNAL OF MACHINE LEARNING RESEARCH, 2015, 16: 255-284.
  • [10] Tian, Jinrong; Cheng, Wentao; Sun, Ying; Li, Gongfa; Jiang, Du; Jiang, Guozhang; Tao, Bo; Zhao, Haoyi; Chen, Disi. Gesture recognition based on multilevel multimodal feature fusion. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (03): 2539-2550.