Optimizing Speech Emotion Recognition with Machine Learning Based Advanced Audio Cue Analysis

Cited by: 0
Authors
Pallewela, Nuwan [1 ]
Alahakoon, Damminda [1 ]
Adikari, Achini [1 ]
Pierce, John E. [2 ]
Rose, Miranda L. [2 ]
Affiliations
[1] La Trobe Univ, Ctr Data Analyt & Cognit, La Trobe Business Sch, Melbourne, Vic 3083, Australia
[2] La Trobe Univ, Ctr Res Excellence Aphasia Recovery & Rehabil, Melbourne, Vic 3083, Australia
Funding
UK Medical Research Council;
关键词
speech emotion recognition; audio; sentiment; machine learning; artificial intelligence; FEATURES;
DOI
10.3390/technologies12070111
Chinese Library Classification
T [Industrial Technology];
Subject Classification
08;
Abstract
In today's fast-paced and interconnected world, where human-computer interaction is an integral component of daily life, the ability to recognize and understand human emotions has emerged as a crucial facet of technological advancement. However, human emotion, a complex interplay of physiological, psychological, and social factors, poses a formidable challenge even for other humans to comprehend accurately. With the emergence of voice assistants and other speech-based applications, improving audio-based emotion recognition has become essential. However, current emotion annotation practice lacks specificity and agreement, as evidenced by conflicting labels for the same speech segments in many human-annotated emotional datasets. Previous studies have had to filter out these conflicts, so a large portion of the collected data has been considered unusable. In this study, we aimed to improve the accuracy of computational prediction of uncertain emotion labels by utilizing high-confidence emotion-labelled speech segments from the IEMOCAP emotion dataset. We implemented an audio-based emotion recognition model that uses bag-of-audio-words (BoAW) encoding to represent the audio aspects of emotion in speech, combined with state-of-the-art recurrent neural network models. Our approach improved on state-of-the-art audio-based emotion recognition with a 61.09% accuracy rate, an improvement of 1.02% over the BiDialogueRNN model and 1.72% over the EmoCaps multi-modal emotion recognition model. In comparison to human annotation, our approach achieved similar results in identifying positive and negative emotions. Furthermore, it proved effective in accurately recognizing the sentiment of uncertain emotion segments that previous studies had considered unusable.
Improvements in audio emotion recognition could have implications for voice-based assistants, healthcare, and other industrial applications that benefit from automated communication.
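The bag-of-audio-words (BoAW) encoding mentioned in the abstract can be sketched roughly as follows: frame-level acoustic features are quantized against a learned codebook of "audio words", and the normalized histogram of assignments becomes a fixed-length utterance vector suitable as input to a recurrent or other classifier. This is a minimal illustration only, using synthetic "MFCC-like" frames and a toy k-means codebook; the function names, dimensions, and codebook size are hypothetical and not taken from the paper.

```python
import numpy as np

def learn_codebook(frames, k, iters=10, seed=0):
    """Tiny Lloyd's k-means to build the audio-word codebook."""
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each frame to its nearest center, then recompute centers
        d = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
        a = d.argmin(axis=1)
        for j in range(k):
            members = frames[a == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def boaw_encode(frames, codebook):
    """Quantize frames to nearest codeword; return the normalized
    histogram of codeword counts (the bag-of-audio-words vector)."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    assignments = d.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy example: 200 synthetic 13-dimensional "MFCC-like" frames
# standing in for one utterance's feature sequence.
rng = np.random.default_rng(1)
frames = rng.normal(size=(200, 13))
codebook = learn_codebook(frames, k=16)
vec = boaw_encode(frames, codebook)
print(vec.shape)  # fixed-length vector regardless of utterance length
```

Because the histogram length depends only on the codebook size, utterances of any duration map to vectors of the same dimensionality, which is what makes BoAW features convenient inputs for downstream sequence or classification models.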
Pages: 17