Applying a Convolutional Vision Transformer for Emotion Recognition in Children with Autism: Fusion of Facial Expressions and Speech Features

Authors
Wang, Yonggu [1]
Pan, Kailin [1]
Shao, Yifan [1]
Ma, Jiarong [1]
Li, Xiaojuan [2]
Affiliations
[1] Zhejiang Univ Technol, Coll Educ, Hangzhou 310023, Peoples R China
[2] Zhejiang Univ Finance & Econ, Mental Hlth Educ Ctr, Hangzhou 310018, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / No. 06
Funding
National Natural Science Foundation of China
Keywords
emotion recognition; multimodal feature fusion; deep learning; children with autism
DOI
10.3390/app15063083
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
With advances in digital technology, including deep learning and big data analytics, new methods have been developed for autism diagnosis and intervention. Emotion recognition and the detection of autism in children are prominent subjects in autism research. Previous research has typically relied on single-modal data to analyze the emotional states of children with autism, and the accuracy of the resulting recognition algorithms still needs improvement. Our study creates datasets on the facial and speech emotions of children with autism in their natural states. A convolutional vision transformer-based emotion recognition model is constructed for the two distinct datasets. The findings indicate that the model achieves accuracies of 79.12% and 83.47% for facial expression recognition and Mel spectrogram recognition, respectively. Building on these results, we propose a multimodal data fusion strategy for emotion recognition and construct a feature fusion model based on an attention mechanism, which attains a recognition accuracy of 90.73%. Finally, gradient-weighted class activation mapping is used to produce prediction heat maps that visualize facial expressions and speech features under four emotional states. This study offers a technical direction for the use of intelligent perception technology in special education and enriches the theory of emotional intelligence perception of children with autism.
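The abstract does not say how the attention-based feature fusion is built, so the following is only a minimal illustrative sketch of the general idea, not the authors' model. It assumes each modality has already been reduced to a fixed-length embedding, and it scores the two embeddings against a hypothetical learned `query` vector. The function names, the toy values, and the scalar-weight design are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    face_feat: Sequence[float],
    speech_feat: Sequence[float],
    query: Sequence[float],
) -> List[float]:
    """Fuse two same-length feature vectors with scalar attention weights.

    Each modality is scored by its scaled dot product with `query`
    (a hypothetical learned vector, fixed here for illustration).
    The softmax of the two scores weights the modalities in the output.
    """
    if not (len(face_feat) == len(speech_feat) == len(query)):
        raise ValueError("all vectors must share one dimension")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * f for q, f in zip(query, feat)) / scale
        for feat in (face_feat, speech_feat)
    ]
    w_face, w_speech = softmax(scores)
    return [w_face * f + w_speech * s for f, s in zip(face_feat, speech_feat)]


# Toy 4-dimensional embeddings; real ones would come from the two encoders.
face = [0.2, 0.9, 0.1, 0.4]
speech = [0.7, 0.3, 0.5, 0.6]
query = [1.0, 0.0, 0.0, 1.0]
fused = attention_fuse(face, speech, query)
```

Because the two weights sum to one, every element of `fused` lies between the corresponding facial and speech values. A trained model would learn the query (or a full attention layer) jointly with the classifier rather than fixing it by hand.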
Pages: 35