BERT-Based Sentiment Analysis for Low-Resourced Languages: A Case Study of Urdu Language

Cited by: 8
Authors
Ashraf, Muhammad Rehan [1, 2]
Jana, Yasmeen [2]
Umer, Qasim [2, 3]
Jaffar, M. Arfan [1]
Chung, Sungwook [4]
Ramay, Waheed Yousuf [5]
Affiliations
[1] Super Univ, Dept Comp Sci, Lahore 54000, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Vehari 61000, Pakistan
[3] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[4] Changwon Natl Univ, Dept Comp Engn, Chang Won 51140, South Korea
[5] Air Univ, Dept Comp Sci, Multan 60000, Pakistan
Keywords
Sentiment analysis; Support vector machines; Social networking (online); Sports; Blogs; Encoding; Natural language processing; Linguistics; Urdu; BERT; Classification; Roman Urdu; Machine
DOI
10.1109/ACCESS.2023.3322101
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis provides valuable insight into public opinion, which makes it important to many research projects. However, most sentiment analysis studies focus on English, leaving a research gap for low-resourced and regional languages such as Persian, Pashto, and Urdu. Moreover, computational linguists face the challenge of developing lexical resources for these languages. In light of this, this paper presents USA-BERT, a deep learning-based approach for Urdu Text Sentiment Analysis that leverages Bidirectional Encoder Representations from Transformers (BERT), and introduces the Urdu Dataset for Sentiment Analysis-23 (UDSA-23). USA-BERT first preprocesses the Urdu reviews using the BERT tokenizer. Second, it creates BERT embeddings for each Urdu review. Third, given the BERT embeddings, it fine-tunes a deep learning classifier (BERT). Finally, it applies the Pareto principle to two datasets, the state-of-the-art UCSA-21 and UDSA-23, to assess USA-BERT. The assessment results show that USA-BERT significantly surpasses existing methods, improving accuracy and F-measure by up to 26.09% and 25.87%, respectively.
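The evaluation step in the abstract can be illustrated with a minimal sketch. It assumes that "employing the Pareto principle" means an 80/20 train/test split, which the abstract does not state explicitly, and it computes accuracy and F-measure for a binary label set. The function names `pareto_split`, `accuracy`, and `f_measure` are hypothetical and are not taken from the paper; the BERT tokenization and fine-tuning steps are not reproduced here.

```python
import random


def pareto_split(samples, train_fraction=0.8, seed=0):
    """Shuffle samples and split them into train and test parts.

    Assumption: the Pareto principle is read as an 80/20 split.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels only; these are not results from the paper.
    train, test = pareto_split(range(10))
    print(len(train), len(test))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    print(f_measure([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a full pipeline, the test part of the split would be scored with the fine-tuned classifier's predictions in place of the toy label lists used here.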
Pages: 110245-110259
Number of pages: 15