Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis

Cited by: 1

Authors:

Langure, Alejandro de Leon ^{[1]}

Zareei, Mahdi ^{[1]}

Affiliations:

[1] Tecnol Monterrey, Sch Engn & Sci, Monterrey 64849, Mexico

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Measurement; Emotion recognition; Accuracy; Indexes; Encoding; Computational modeling; Bidirectional control; Training; Recurrent neural networks; Predictive models; Natural language processing; Text detection; Affective computing; natural language processing; sentiment analysis; text emotion detection; text emotion recognition; MACHINE; CLASSIFICATION; RECOGNITION; MODEL

DOI:

10.1109/ACCESS.2024.3491856

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

As Artificial Intelligence assistants like OpenAI's ChatGPT or Google's Gemini become increasingly integrated into our daily lives, their ability to understand and respond to human emotions expressed in natural language becomes essential. Affective computing, including text emotion detection (TED), has become crucial for human-computer interaction. However, the quality of datasets used for training supervised machine learning algorithms in TED often receives insufficient attention, potentially impacting model performance and comparability. This study addresses this gap by proposing a comprehensive framework for assessing dataset quality in TED. We introduce 14 quantitative metrics across four dimensions: representativity, readability, structure, and part-of-speech tag distribution, and investigate their impact on model performance. We conduct experiments on datasets with varying quality characteristics using Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT) models. Our findings demonstrate that changes in these quality metrics can lead to statistically significant variations in model performance, with most metrics showing over 5% impact on prediction accuracy. Notably, pre-trained models like BERT exhibit more robustness to dataset quality variations than models trained from scratch. These results underscore the importance of considering and reporting dataset quality metrics in TED research, as they significantly influence model performance and generalizability. Our study lays the groundwork for more rigorous dataset quality assessment in affective computing, potentially leading to more reliable and comparable TED models in the future.

Citation

Pages: 166512-166536

Page count: 25

