Machine-Learning-Based Approaches for Multi-Level Sentiment Analysis of Romanian Reviews

Times cited: 1
Authors
Briciu, Anamaria [1]
Calin, Alina-Delia [1]
Miholca, Diana-Lucia [1]
Moroz-Dubenco, Cristiana [1]
Petrascu, Vladiela [1]
Dascalu, George [2]
Affiliations
[1] Babes Bolyai Univ, Dept Comp Sci, 1 M Kogalniceanu St, Cluj Napoca 400084, Romania
[2] T2 SRL, 35 Ceaus Firica St, Rosiori de Vede 145100, Romania
Keywords
sentiment analysis; latent semantic indexing; machine learning; deep learning; CNN; dense embedding layer; aspect term extraction; aspect category detection; Romanian language
DOI
10.3390/math12030456
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Sentiment analysis has gained increasing significance in commercial settings, driven by the growing impact of reviews on purchase decision-making in recent years. This research thoroughly examines the suitability of machine learning and deep learning approaches for sentiment analysis, using Romanian reviews as a case study, with the aim of gaining insights into their practical utility. A comprehensive, multi-level analysis is performed, covering the document, sentence, and aspect levels. The main contributions of the paper are an in-depth exploration of multiple sentiment analysis models at three textual levels and the improvements subsequently brought to these standard models. Additionally, a balanced dataset of Romanian reviews from twelve product categories is introduced. The results indicate that, at the document level, supervised deep learning techniques yield the best outcomes (specifically, a convolutional neural network model that obtains an AUC value of 0.93 for binary classification and a weighted average F1-score of 0.77 in a multi-class setting with five target classes), albeit with increased resource consumption. Favorable results are also achieved at the sentence level, despite the greater complexity of identifying sentiment there. At this level, the best-performing model is logistic regression, which obtains a weighted average F1-score of 0.77 in a three-class polarity classification task. Finally, at the aspect level, promising outcomes are observed in both the aspect term extraction and aspect category detection tasks, in the form of coherent and easily interpretable word clusters, encouraging further exploration of aspect-based sentiment analysis for the Romanian language.
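The abstract reports two evaluation measures: AUC for the binary document-level setting and the weighted average F1-score for the multi-class settings. As an illustration only, the sketch below computes both from scratch, assuming the common definitions (per-class F1 averaged with weights proportional to each class's true-label support, as in scikit-learn's average='weighted'; AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as 0.5). The function names and toy labels are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (hypothetical helper)."""
    support = Counter(y_true)  # number of true examples per class
    n = len(y_true)
    total = 0.0
    for c, s in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = s - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * s / n  # weight each class by its share of true labels
    return total


def binary_auc(y_true, scores):
    """ROC AUC via pairwise ranking: P(score of positive > score of negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy three-class polarity labels (hypothetical, not from the paper's data)
    y_true = ["neg", "neg", "neu", "pos", "pos", "pos"]
    y_pred = ["neg", "neu", "neu", "pos", "pos", "neg"]
    print(round(weighted_f1(y_true, y_pred), 4))  # 0.6778
    # Toy binary labels with model scores
    print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise AUC loop is quadratic in the number of examples, so it is meant for reading, not for large evaluation sets; a library routine or a rank-based formula would be the usual choice in practice.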
Pages: 36