Applying a Convolutional Vision Transformer for Emotion Recognition in Children with Autism: Fusion of Facial Expressions and Speech Features

Cited: 0
Authors
Wang, Yonggu [1 ]
Pan, Kailin [1 ]
Shao, Yifan [1 ]
Ma, Jiarong [1 ]
Li, Xiaojuan [2 ]
Affiliations
[1] Zhejiang Univ Technol, Coll Educ, Hangzhou 310023, Peoples R China
[2] Zhejiang Univ Finance & Econ, Mental Hlth Educ Ctr, Hangzhou 310018, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 06
Funding
National Natural Science Foundation of China;
Keywords
emotion recognition; multimodal feature fusion; deep learning; children with autism;
DOI
10.3390/app15063083
Chinese Library Classification
O6 [Chemistry];
Discipline Code
0703 ;
Abstract
With advances in digital technologies such as deep learning and big data analytics, new methods have been developed for autism diagnosis and intervention. Emotion recognition and the detection of autism in children are prominent subjects in autism research. Previous studies have typically analyzed the emotional states of children with autism using single-modal data and have found that the accuracy of recognition algorithms needs improvement. Our study creates datasets of the facial and speech emotions of children with autism in their natural states. A convolutional vision transformer-based emotion recognition model is constructed for the two distinct datasets. The findings indicate that the model achieves accuracies of 79.12% and 83.47% for facial expression recognition and Mel spectrogram recognition, respectively. Consequently, we propose a multimodal data fusion strategy for emotion recognition and construct a feature fusion model based on an attention mechanism, which attains a recognition accuracy of 90.73%. Finally, gradient-weighted class activation mapping is used to produce prediction heat maps that visualize facial expressions and speech features under four emotional states. This study offers a technical direction for applying intelligent perception technology in special education and enriches the theory of emotional intelligence perception for children with autism.
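The abstract's attention-based feature fusion can be sketched roughly as follows. This is a minimal illustrative example only: the feature dimensions, the way attention weights are derived, and the function names are assumptions, not the authors' published architecture (which the paper describes as a trained attention layer over convolutional vision transformer features).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(face_feat, speech_feat):
    """Fuse two modality feature vectors with attention-style weighting.

    Hypothetical sketch: no parameters are learned here. Each modality's
    relevance score is its similarity to the mean feature, standing in for
    a trained attention layer that would score modalities per sample.
    """
    feats = np.stack([face_feat, speech_feat])   # shape (2, d)
    scores = feats @ feats.mean(axis=0)          # one relevance score per modality
    weights = softmax(scores)                    # attention weights, sum to 1
    return weights @ feats                       # weighted sum, shape (d,)

# Toy usage with random stand-ins for extracted modality features.
rng = np.random.default_rng(0)
face = rng.normal(size=128)     # placeholder facial-expression embedding
speech = rng.normal(size=128)   # placeholder Mel-spectrogram embedding
fused = attention_fusion(face, speech)
print(fused.shape)
```

In a real pipeline the fused vector would feed a classifier over the four emotional states; the weighting step is what lets the model lean on whichever modality is more informative for a given sample.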
Pages: 35
Related Papers
50 records total
  • [31] Real-time facial emotion recognition system among children with autism based on deep learning and IoT
    Talaat, Fatma M. M.
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (17) : 12717 - 12728
  • [32] Emotion Classification in Children's Speech Using Fusion of Acoustic and Linguistic Features
    Polzehl, Tim
    Sundaram, Shiva
    Ketabdar, Hamed
    Wagner, Michael
    Metze, Florian
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 340 - +
  • [33] The role of motion and intensity in deaf children's recognition of real human facial expressions of emotion
    Jones, Anna C.
    Gutierrez, Roberto
    Ludlow, Amanda K.
    COGNITION & EMOTION, 2018, 32 (01) : 102 - 115
  • [34] A Novel Emotion-Aware Method Based on the Fusion of Textual Description of Speech, Body Movements, and Facial Expressions
    Du, Guanglong
    Zeng, Yuwen
    Su, Kang
    Li, Chunquan
    Wang, Xueqian
    Teng, Shaohua
    Li, Di
    Liu, Peter Xiaoping
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [35] MaxMViT-MLP: Multiaxis and Multiscale Vision Transformers Fusion Network for Speech Emotion Recognition
    Ong, Kah Liang
    Lee, Chin Poo
    Lim, Heng Siong
    Lim, Kian Ming
    Alqahtani, Ali
    IEEE ACCESS, 2024, 12 : 18237 - 18250
  • [37] Human Emotion Recognition by Integrating Facial and Speech Features: An Implementation of Multimodal Framework using CNN
    Srinivas, P. V. V. S.
    Mishra, Pragnyaban
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (01) : 592 - 603
  • [38] Semantic fusion of facial expressions and textual opinions from different datasets for learning-centered emotion recognition
    Cardenas-Lopez, Hector Manuel
    Zatarain-Cabada, Ramon
    Barron-Estrada, Maria Lucia
    Mitre-Hernandez, Hugo
    SOFT COMPUTING, 2023, 27 (22) : 17357 - 17367
  • [39] Bimodal system for emotion recognition from facial expressions and physiological signals using feature-level fusion
    Abdat, F.
    Maaoui, C.
    Pruski, A.
    UKSIM FIFTH EUROPEAN MODELLING SYMPOSIUM ON COMPUTER MODELLING AND SIMULATION (EMS 2011), 2011, : 24 - 29