Uncertainty-Based Learning of a Lightweight Model for Multimodal Emotion Recognition

Cited by: 1
Authors
Radoi, Anamaria [1]
Cioroiu, George [1]
Affiliation
[1] NUST Politehn Bucharest, Dept Appl Elect & Informat Engn, Bucharest 060042, Romania
Keywords
Emotion recognition; Visualization; Feature extraction; Training; Computer architecture; Data mining; Transformers; Convolutional neural networks; Entropy; Uncertainty; entropy; multimodal emotion recognition; uncertainty-based learning; MTCNN; CREMA-D; RAVDESS; FACIAL EXPRESSION; NEURAL-NETWORKS; REPRESENTATIONS
DOI
10.1109/ACCESS.2024.3450674
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Emotion recognition is a key research topic in the Affective Computing domain, with implications in marketing, human-robot interaction, and health domains. Continuous technological advances in sensors and the rapid development of artificial intelligence technologies have led to breakthroughs and improved the interpretation of human emotions. In this paper, we propose a lightweight neural network architecture that extracts and analyzes multimodal information using the same audio and visual networks across multiple temporal segments. Undoubtedly, data collection and annotation for emotion recognition tasks remain challenging in terms of the required expertise and effort. In this sense, the learning process of the proposed multimodal architecture is based on an iterative procedure that starts with a small volume of annotated samples and allows a step-by-step improvement of the system by assessing the model's uncertainty in recognizing discrete emotions. Specifically, at each epoch, the learning process is guided by the samples whose predicted labels are most uncertain and integrates different modes of expressing emotions through a simple augmentation technique. The framework is tested on two publicly available multimodal datasets for emotion recognition, i.e., CREMA-D and RAVDESS, using 5-fold cross-validation. Compared to state-of-the-art methods, the achieved performance demonstrates the effectiveness of the proposed approach, with an overall accuracy of 74.2% on CREMA-D and 76.3% on RAVDESS. Moreover, with a small number of model parameters and a low inference time, the proposed neural network architecture represents a valid candidate for integration into platforms with limited memory and computational resources.
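The abstract describes guiding each training epoch by the most uncertainly labeled samples, with entropy among the keywords. The selection step can be sketched as follows; this is a generic illustration of entropy-based uncertainty ranking with hypothetical function names, not the authors' implementation.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities.

    High entropy means the predicted distribution over emotion classes
    is close to uniform, i.e. the model is uncertain about the sample.
    """
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_most_uncertain(probs, k):
    """Indices of the k samples with the highest predictive entropy."""
    entropy = predictive_entropy(probs)
    return np.argsort(entropy)[::-1][:k]

# Example: three samples, three emotion classes
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction -> low entropy
    [0.34, 0.33, 0.33],  # near-uniform -> high entropy
    [0.70, 0.20, 0.10],  # moderately uncertain
])
print(select_most_uncertain(probs, 2))  # -> [1 2]
```

In an uncertainty-guided loop of the kind the abstract outlines, the selected indices would identify which unlabeled samples to annotate (or emphasize) before the next epoch.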
Pages: 120362-120374
Page count: 13
References
73 in total
[11] Cao, Houwei; Cooper, David G.; Keutmann, Michael K.; Gur, Ruben C.; Nenkova, Ani; Verma, Ragini. CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset. IEEE Transactions on Affective Computing, 2014, 5(4): 377-390.
[12] Carreira, Joao; Zisserman, Andrew. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 4724-4733.
[13] Cavallo, Filippo; Semeraro, Francesco; Fiorini, Laura; Magyar, Gergely; Sincak, Peter; Dario, Paolo. Emotion Modelling for Social Robotics Applications: A Review. Journal of Bionic Engineering, 2018, 15(2): 185-203.
[14] Chen L.-W. P IEEE INT C AC SPEE, 2023: 1.
[15] Cioroiu G. P INT S SIGN CIRC SY, 2023: 1.
[16] Cosentino S. IEEE INT C INT ROBOT, 2018: 813. DOI: 10.1109/IROS.2018.8593503.
[17] Cover T. M. Elements of Information Theory, 1999.
[18] Ortega JDS. arXiv:1907.03196, 2019. DOI: 10.48550/arXiv.1907.03196.
[19] Dhuheir, Marwan; Albaseer, Abdullatif; Baccour, Emna; Erbad, Aiman; Abdallah, Mohamed; Hamdi, Mounir. Emotion Recognition for Healthcare Surveillance Systems Using Neural Networks: A Survey. IWCMC 2021: 17th International Wireless Communications & Mobile Computing Conference (IWCMC), 2021: 681-687.
[20] Dixit, Chhavi; Satapathy, Shashank Mouli. A customizable framework for multimodal emotion recognition using ensemble of deep neural network models. Multimedia Systems, 2023, 29(6): 3151-3168.