Graph Transformer Mixture-of-Experts (GTMoE) for 3D Hand Gesture Recognition

Cited by: 0
Authors
Alboody, Ahed [1 ]
Slama, Rim [1 ]
Affiliations
[1] CESI, UR 7527, Equipe Accueil, CESI LINEACT Lab, Nanterre, France
Source
INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 3, INTELLISYS 2024 | 2024 / Vol. 1067
Keywords
Mixture-of-Experts (MoE); Transformers; Graph convolutional network; Hand gesture recognition; 3D skeleton data; Human-machine interaction;
DOI
10.1007/978-3-031-66431-1_21
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Mixture-of-experts (MoE) architectures have gained popularity for achieving high performance on a wide range of challenging tasks in large language modeling (LLM) and computer vision, especially with the rise of Mixture-of-Experts Transformers such as Mixtral/Mistral-7B. In this work, we propose the Graph Transformer Mixture-of-Experts (GTMoE) deep learning architecture, which enhances the Transformer model with Mixture-of-Experts (MoE) layers and Graph Convolutional Networks (GCNs) for graph-based hand gesture recognition from 3D hand skeleton data, a challenging task. The main challenge is how to integrate Mixture-of-Experts (MoE) with graphs for 3D hand gesture recognition. In this context, GTMoE aims to address the efficient use and integration of MoE architectures with GCNs for 3D hand gesture recognition. Recently, 3D hand gesture recognition has become one of the most active research areas in human-computer interaction and pattern recognition. For this task, the proposed GTMoE model decouples spatial and temporal graph learning of 3D hand gestures by integrating mixture-of-experts layers into a Transformer model together with Graph Convolutional Networks. The principal idea is to combine MoE layers with a Spatial Graph Convolutional Network (SGCN) that preprocesses the initial spatial features of intra-frame interactions to extract rich features from the different hand joints, and then to recognize hand gestures with a Mixture-of-Experts (MoE) Transformer encoder. Finally, we evaluate the performance of GTMoE on the SHREC'17 Track benchmark. The experiments show the effectiveness of several variants of the proposed Graph Transformer Mixture-of-Experts (GTMoE), which matches or outperforms the state of the art.
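To make the architecture described above concrete, the sketch below shows one plausible PyTorch realization of the GTMoE idea: a spatial GCN layer that aggregates intra-frame hand-joint features, followed by Transformer encoder blocks whose feed-forward sub-layers are replaced by mixture-of-experts MLPs, and a classification head. All module names, layer sizes, the dense (softmax) gating, and the two-block depth are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal GTMoE-style sketch (assumed configuration, not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialGCNLayer(nn.Module):
    """Aggregates features over hand joints within each frame using a
    fixed normalized skeleton adjacency matrix."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)          # (J, J) normalized adjacency
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                             # x: (B, T, J, C)
        x = torch.einsum("ij,btjc->btic", self.A, x)  # intra-frame neighborhood aggregation
        return F.relu(self.proj(x))


class MoEFeedForward(nn.Module):
    """Softmax-gated mixture of expert MLPs (dense routing for simplicity;
    sparse top-k routing is the usual MoE choice)."""
    def __init__(self, dim, hidden, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                             # x: (B, T, D)
        weights = F.softmax(self.gate(x), dim=-1)     # (B, T, E) per-token expert weights
        out = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, D, E)
        return (out * weights.unsqueeze(2)).sum(-1)


class GTMoEBlock(nn.Module):
    """Transformer encoder block over the temporal axis with an MoE feed-forward."""
    def __init__(self, dim, heads=4, num_experts=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.moe = MoEFeedForward(dim, hidden=2 * dim, num_experts=num_experts)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                             # x: (B, T, D)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.moe(self.norm2(x))


class GTMoE(nn.Module):
    """SGCN spatial feature extractor + MoE Transformer encoder + classifier."""
    def __init__(self, num_joints=22, in_dim=3, dim=64, num_classes=14, adjacency=None):
        super().__init__()
        if adjacency is None:                         # fallback: uniform fully connected graph
            adjacency = torch.full((num_joints, num_joints), 1.0 / num_joints)
        self.sgcn = SpatialGCNLayer(in_dim, dim, adjacency)
        self.frame_proj = nn.Linear(num_joints * dim, dim)
        self.encoder = nn.Sequential(GTMoEBlock(dim), GTMoEBlock(dim))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                             # x: (B, T, J, 3) 3D joint coordinates
        x = self.sgcn(x)                              # (B, T, J, D) spatial joint features
        x = self.frame_proj(x.flatten(2))             # (B, T, D) one token per frame
        x = self.encoder(x)                           # temporal modeling with MoE blocks
        return self.head(x.mean(dim=1))               # (B, num_classes)


# Example: 2 sequences of 32 frames with 22 joints (SHREC'17-style skeletons, 14 classes)
if __name__ == "__main__":
    model = GTMoE()
    logits = model(torch.randn(2, 32, 22, 3))
    print(logits.shape)                               # torch.Size([2, 14])
```

The key design choice this sketch illustrates is the decoupling stated in the abstract: the SGCN handles intra-frame (spatial) structure over the hand-joint graph, while the MoE Transformer blocks handle inter-frame (temporal) modeling before classification.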
Pages: 317-336
Number of pages: 20