Seeking a Hierarchical Prototype for Multimodal Gesture Recognition

Cited: 4
Authors
Li, Yunan [1 ,2 ]
Qi, Tianyu [3 ]
Ma, Zhuoqi [3 ]
Quan, Dou [4 ]
Miao, Qiguang [1 ,2 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian Key Lab Big Data & Intelligent Vis, Key Lab Smart Human Comp Interact & Wearable Techn, Xian 710071, Peoples R China
[2] Xidian Univ, Key Lab Collaborat Intelligence Syst, Minist Educ, Xian 710071, Peoples R China
[3] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
[4] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial network (GAN); gesture prototype; gesture recognition; memory bank; multimodal; NETWORKS; DATASET; FUSION
DOI
10.1109/TNNLS.2023.3295811
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Gesture recognition has drawn considerable attention from many researchers owing to its wide range of applications. Although significant progress has been made in this field, previous works typically focus on how to distinguish between different gesture classes while ignoring the intra-class divergence caused by gesture-irrelevant factors. Meanwhile, for multimodal gesture recognition, fusing features or scores in the final stage is the usual way to combine the information of different modalities. Consequently, the gesture-relevant features of different modalities may be redundant, whereas their complementarity is not sufficiently exploited. To handle these problems, in this article we propose a hierarchical gesture prototype framework that highlights gesture-relevant features such as poses and motions. The framework consists of a sample-level prototype and a modal-level prototype. The sample-level gesture prototype is established with a memory bank, which avoids the distraction of gesture-irrelevant factors in each sample, such as illumination, background, and the performers' appearances. The modal-level prototype is then obtained via a generative adversarial network (GAN)-based subnetwork, in which modal-invariant features are extracted and pulled together. Meanwhile, modal-specific attribute features are used to synthesize the features of the other modalities, and this circulation of modality information helps leverage their complementarity. Extensive experiments on three widely used gesture datasets demonstrate that our method effectively highlights gesture-relevant features and outperforms state-of-the-art methods.
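The abstract describes two mechanisms concretely enough to sketch: a memory-bank-style sample-level prototype that averages away gesture-irrelevant factors, and a modal-level decomposition into modal-invariant features (pulled together across modalities) and modal-specific features (fed to a GAN to synthesize the other modality). The PyTorch sketch below is a minimal illustration of those two ideas only, not the authors' implementation; every class, function, and hyperparameter in it (SampleLevelPrototype, momentum=0.9, the linear heads) is a hypothetical stand-in, and the GAN generator and discriminator used for cross-modal synthesis are omitted.

```python
# Hypothetical sketch of the two prototype levels described in the abstract.
# NOT the authors' code (see the TNNLS paper at the DOI above); names and
# hyperparameters are invented for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SampleLevelPrototype(nn.Module):
    """Per-class memory bank: a momentum-averaged prototype per gesture class,
    so per-sample nuisance factors (illumination, background, performer) wash out."""

    def __init__(self, num_classes: int, feat_dim: int, momentum: float = 0.9):
        super().__init__()
        self.momentum = momentum
        # Prototypes live outside the autograd graph, like a memory bank.
        self.register_buffer("bank", torch.zeros(num_classes, feat_dim))

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # Fold each sample's feature into its class slot by momentum averaging.
        for f, y in zip(feats, labels):
            self.bank[y] = self.momentum * self.bank[y] + (1.0 - self.momentum) * f

    def prototype_loss(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Pull each sample toward its class prototype (cosine distance).
        protos = F.normalize(self.bank[labels], dim=-1)
        feats = F.normalize(feats, dim=-1)
        return (1.0 - (feats * protos).sum(dim=-1)).mean()


class ModalLevelSplit(nn.Module):
    """Split a modality's feature into a modal-invariant part and a
    modal-specific part; the GAN that consumes the specific part to
    synthesize the other modality is omitted from this sketch."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.invariant_head = nn.Linear(feat_dim, feat_dim)
        self.specific_head = nn.Linear(feat_dim, feat_dim)

    def forward(self, feat: torch.Tensor):
        return self.invariant_head(feat), self.specific_head(feat)


def invariance_loss(inv_a: torch.Tensor, inv_b: torch.Tensor) -> torch.Tensor:
    # Pull the modal-invariant features of two modalities together.
    return F.mse_loss(F.normalize(inv_a, dim=-1), F.normalize(inv_b, dim=-1))


if __name__ == "__main__":
    # Toy usage with random tensors standing in for backbone features.
    # A real system would likely use one split module per modality; one is
    # reused here only for brevity.
    bank = SampleLevelPrototype(num_classes=20, feat_dim=512)
    rgb, depth = torch.randn(8, 512), torch.randn(8, 512)
    labels = torch.randint(0, 20, (8,))
    bank.update(rgb, labels)
    split = ModalLevelSplit(512)
    inv_rgb, _ = split(rgb)
    inv_depth, _ = split(depth)
    loss = bank.prototype_loss(rgb, labels) + invariance_loss(inv_rgb, inv_depth)
    print(float(loss))
```

In this toy setup the momentum update keeps each class slot as a running average over many samples, which is one plausible way a memory bank can suppress per-sample nuisance factors such as lighting or performer appearance while retaining the class-defining pose and motion cues.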
Pages: 1-12
Page count: 12
Related Papers
50 records in total
  • [1] Li, Yunan; Qi, Tianyu; Ma, Zhuoqi; Quan, Dou; Miao, Qiguang. Seeking a Hierarchical Prototype for Multimodal Gesture Recognition. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01): 198-209.
  • [2] Zhang, Cheng; Hou, Yibin; He, Jian; Xie, Xiaoyang. Gesture Recognition with Focuses Using Hierarchical Body Part Combination. TSINGHUA SCIENCE AND TECHNOLOGY, 2025, 30 (04): 1583-1599.
  • [3] Escalera, Sergio; Athitsos, Vassilis; Guyon, Isabelle. Challenges in multimodal gesture recognition. JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17.
  • [4] Wu, Di; Shao, Ling. Multimodal Dynamic Networks for Gesture Recognition. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014: 945-948.
  • [5] Lai, Helang; Wu, Keke; Li, Lingli. Multimodal emotion recognition with hierarchical memory networks. INTELLIGENT DATA ANALYSIS, 2021, 25 (04): 1031-1045.
  • [6] van Amsterdam, Beatrice; Funke, Isabel; Edwards, Eddie; Speidel, Stefanie; Collins, Justin; Sridhar, Ashwin; Kelly, John; Clarkson, Matthew J.; Stoyanov, Danail. Gesture Recognition in Robotic Surgery With Multimodal Attention. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2022, 41 (07): 1677-1687.
  • [7] Hirota, K.; Vu, H. A.; Le, P. Q.; Fatichah, C.; Liu, Z.; Tang, Y.; Tangel, M. L.; Mu, Z.; Sun, B.; Yan, F.; Masano, D.; Thet, O.; Yamaguchi, M.; Dong, F.; Yamazaki, Y. Multimodal Gesture Recognition Based on Choquet Integral. IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011: 772-776.
  • [8] Li, Yajie; Chen, Yiqiang; Gu, Yang; Ouyang, Jianquan. A Multimodal Fusion Model Based on Hybrid Attention Mechanism for Gesture Recognition. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2020, 2021, 12644: 302-312.
  • [9] Pitsikalis, Vassilis; Katsamanis, Athanasios; Theodorakis, Stavros; Maragos, Petros. Multimodal Gesture Recognition via Multiple Hypotheses Rescoring. JOURNAL OF MACHINE LEARNING RESEARCH, 2015, 16: 255-284.
  • [10] Tian, Jinrong; Cheng, Wentao; Sun, Ying; Li, Gongfa; Jiang, Du; Jiang, Guozhang; Tao, Bo; Zhao, Haoyi; Chen, Disi. Gesture recognition based on multilevel multimodal feature fusion. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (03): 2539-2550.