Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis

Cited by: 1
Authors
Langure, Alejandro de Leon [1]
Zareei, Mahdi [1]
Affiliations
[1] Tecnol Monterrey, Sch Engn & Sci, Monterrey 64849, Mexico
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Measurement; Emotion recognition; Accuracy; Indexes; Encoding; Computational modeling; Bidirectional control; Training; Recurrent neural networks; Predictive models; Natural language processing; Text detection; Affective computing; natural language processing; sentiment analysis; text emotion detection; text emotion recognition; MACHINE; CLASSIFICATION; RECOGNITION; MODEL
DOI
10.1109/ACCESS.2024.3491856
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As Artificial Intelligence assistants like OpenAI's ChatGPT or Google's Gemini become increasingly integrated into our daily lives, their ability to understand and respond to human emotions expressed in natural language becomes essential. Affective computing, including text emotion detection (TED), has become crucial for human-computer interaction. However, the quality of datasets used for training supervised machine learning algorithms in TED often receives insufficient attention, potentially impacting model performance and comparability. This study addresses this gap by proposing a comprehensive framework for assessing dataset quality in TED. We introduce 14 quantitative metrics across four dimensions (representativity, readability, structure, and part-of-speech tag distribution) and investigate their impact on model performance. We conduct experiments on datasets with varying quality characteristics using Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT) models. Our findings demonstrate that changes in these quality metrics can lead to statistically significant variations in model performance, with most metrics showing over 5% impact on prediction accuracy. Notably, pre-trained models like BERT exhibit more robustness to dataset quality variations than models trained from scratch. These results underscore the importance of considering and reporting dataset quality metrics in TED research, as they significantly influence model performance and generalizability. Our study lays the groundwork for more rigorous dataset quality assessment in affective computing, potentially leading to more reliable and comparable TED models in the future.
Pages: 166512-166536
Page count: 25