An improved sentiment classification model based on data quality and word embeddings

Cited by: 0
Authors
Asma Siagh
Fatima Zohra Laallam
Okba Kazar
Hajer Salem
Affiliations
[1] Kasdi Merbah University Ouargla, Laboratoire d’INtelligence Artificielle et des Technologies de l’Information (LINATI), Department of Computer Science and Information Technologies
[2] University of Biskra, Smart Computer Science Laboratory (LINFI), Computer Science Department
[3] United Arab Emirates University, Department of Information Systems and Security, College of Information Technology
[4] Audensiel Technologies, Pôle R&D
Source
The Journal of Supercomputing | 2023 / Vol. 79
Keywords
Natural language processing; Sentiment analysis; Deep learning; Imbalanced data; Word representation; Transfer learning
DOI
Not available
Abstract
User-generated content on social media platforms has reached big-data scale. Sentiment analysis of this data offers valuable insights in many domains. However, real-world data is often class-imbalanced, which can harm a model's ability to generalize because it overfits the majority class. An efficient model that handles any imbalanced-data scenario is therefore needed in practice. To that end, this work proposes several models built on a study of how data quality and transfer learning through pre-trained embeddings affect minority-class detection. The proposed models are tested on imbalanced datasets related to social media and education. The experimental results highlight the effectiveness of Word2vec, GloVe, and fastText embeddings with preprocessed data. In contrast, BERT embeddings give better results with unpreprocessed data. Furthermore, compared with other methods, the best-performing model from this study achieves notable improvements.
Pages: 11871-11894
Page count: 23