Applying a Convolutional Vision Transformer for Emotion Recognition in Children with Autism: Fusion of Facial Expressions and Speech Features

Cited: 0
Authors
Wang, Yonggu [1 ]
Pan, Kailin [1 ]
Shao, Yifan [1 ]
Ma, Jiarong [1 ]
Li, Xiaojuan [2 ]
Affiliations
[1] Zhejiang Univ Technol, Coll Educ, Hangzhou 310023, Peoples R China
[2] Zhejiang Univ Finance & Econ, Mental Hlth Educ Ctr, Hangzhou 310018, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 06
Funding
National Natural Science Foundation of China;
Keywords
emotion recognition; multimodal feature fusion; deep learning; children with autism;
DOI
10.3390/app15063083
Chinese Library Classification
O6 [Chemistry];
Discipline Code
0703 ;
Abstract
With advances in digital technologies such as deep learning and big data analytics, new methods have been developed for autism diagnosis and intervention. Emotion recognition and the detection of autism in children are prominent subjects in autism research. Previous studies have typically analyzed the emotional states of children with autism using single-modal data and have found that the accuracy of recognition algorithms needs improvement. Our study creates datasets of the facial and speech emotions of children with autism in their natural states. A convolutional vision transformer-based emotion recognition model is constructed for the two distinct datasets. The findings indicate that the model achieves accuracies of 79.12% and 83.47% for facial expression recognition and Mel spectrogram recognition, respectively. Consequently, we propose a multimodal data fusion strategy for emotion recognition and construct a feature fusion model based on an attention mechanism, which attains a recognition accuracy of 90.73%. Finally, gradient-weighted class activation mapping is used to produce prediction heat maps that visualize facial expressions and speech features under four emotional states. This study offers a technical direction for applying intelligent perception technology in special education and enriches the theory of emotional intelligence perception for children with autism.
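The abstract's attention-based feature fusion can be sketched roughly as follows. This is a minimal illustrative example only: the feature dimensions, the way attention weights are derived, and the function names are assumptions, not the authors' published architecture (which the paper describes as a trained attention layer over convolutional vision transformer features).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(face_feat, speech_feat):
    """Fuse two modality feature vectors with attention-style weighting.

    Hypothetical sketch: no parameters are learned here. Each modality's
    relevance score is its similarity to the mean feature, standing in for
    a trained attention layer that would score modalities per sample.
    """
    feats = np.stack([face_feat, speech_feat])   # shape (2, d)
    scores = feats @ feats.mean(axis=0)          # one relevance score per modality
    weights = softmax(scores)                    # attention weights, sum to 1
    return weights @ feats                       # weighted sum, shape (d,)

# Toy usage with random stand-ins for extracted modality features.
rng = np.random.default_rng(0)
face = rng.normal(size=128)     # placeholder facial-expression embedding
speech = rng.normal(size=128)   # placeholder Mel-spectrogram embedding
fused = attention_fusion(face, speech)
print(fused.shape)
```

In a real pipeline the fused vector would feed a classifier over the four emotional states; the weighting step is what lets the model lean on whichever modality is more informative for a given sample.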
Pages: 35
Related Papers
50 records total
  • [31] Real-time facial emotion recognition system among children with autism based on deep learning and IoT
    Talaat, Fatma M. M.
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (17) : 12717 - 12728
  • [32] Emotion Classification in Children's Speech Using Fusion of Acoustic and Linguistic Features
    Polzehl, Tim
    Sundaram, Shiva
    Ketabdar, Hamed
    Wagner, Michael
    Metze, Florian
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 340 - +
  • [33] The role of motion and intensity in deaf children's recognition of real human facial expressions of emotion
    Jones, Anna C.
    Gutierrez, Roberto
    Ludlow, Amanda K.
    COGNITION & EMOTION, 2018, 32 (01) : 102 - 115
  • [34] A Novel Emotion-Aware Method Based on the Fusion of Textual Description of Speech, Body Movements, and Facial Expressions
    Du, Guanglong
    Zeng, Yuwen
    Su, Kang
    Li, Chunquan
    Wang, Xueqian
    Teng, Shaohua
    Li, Di
    Liu, Peter Xiaoping
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [35] MaxMViT-MLP: Multiaxis and Multiscale Vision Transformers Fusion Network for Speech Emotion Recognition
    Ong, Kah Liang
    Lee, Chin Poo
    Lim, Heng Siong
    Lim, Kian Ming
    Alqahtani, Ali
    IEEE ACCESS, 2024, 12 : 18237 - 18250
  • [37] Human Emotion Recognition by Integrating Facial and Speech Features: An Implementation of Multimodal Framework using CNN
    Srinivas, P. V. V. S.
    Mishra, Pragnyaban
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (01) : 592 - 603
  • [38] Semantic fusion of facial expressions and textual opinions from different datasets for learning-centered emotion recognition
    Cardenas-Lopez, Hector Manuel
    Zatarain-Cabada, Ramon
    Barron-Estrada, Maria Lucia
    Mitre-Hernandez, Hugo
    SOFT COMPUTING, 2023, 27 (22) : 17357 - 17367
  • [39] Bimodal system for emotion recognition from facial expressions and physiological signals using feature-level fusion
    Abdat, F.
    Maaoui, C.
    Pruski, A.
    UKSIM FIFTH EUROPEAN MODELLING SYMPOSIUM ON COMPUTER MODELLING AND SIMULATION (EMS 2011), 2011, : 24 - 29