tRF-BERT: A transformative approach to aspect-based sentiment analysis in the Bengali language

Cited by: 1
Authors
Ahmed, Shihab [1]
Samia, Moythry Manir [1]
Sayma, Maksuda Haider [2]
Kabir, Md. Mohsin [3, 4]
Mridha, M. F. [5]
Affiliations
[1] Comilla Univ, Dept Informat & Commun Technol, Cumilla, Bangladesh
[2] CCN Univ Sci & Technol, Dept Comp Sci & Engn, Cumilla, Bangladesh
[3] Bangladesh Univ Business & Technol, Dept Comp Sci & Engn, Dhaka, Bangladesh
[4] Eotvos Lorand Univ, Fac Informat, Budapest, Hungary
[5] Amer Int Univ Bangladesh, Dept Comp Sci, Dhaka, Bangladesh
DOI
10.1371/journal.pone.0308050
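CLC classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract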
In recent years, the surge in reviews and comments on newspapers and social media has made sentiment analysis a focal point for researchers, and it is gaining popularity for the Bengali language as well. However, Aspect-Based Sentiment Analysis (ABSA) remains difficult in Bengali because of the shortage of accurately labeled datasets and the complex linguistic variation of the language. This study uses two open-source Bengali benchmark datasets, Cricket and Restaurant, for the ABSA task. The original work on these datasets was based on Random Forest, Support Vector Machine, K-Nearest Neighbors, and Convolutional Neural Network models. In this work, we use Bidirectional Encoder Representations from Transformers (BERT), the Robustly Optimized BERT Approach (RoBERTa), and our proposed hybrid transformative Random Forest and BERT (tRF-BERT) model, and compare the results with the existing work. All the models used in our work achieved better results than any of the previous works on the same datasets. Among them, the proposed tRF-BERT achieved the highest F1 score and accuracy. For aspect detection, the accuracy and F1 score were 0.89 and 0.85, respectively, on the Cricket dataset and 0.92 and 0.89, respectively, on the Restaurant dataset.
Pages: 26