BERT-Based Sentiment Analysis for Low-Resourced Languages: A Case Study of Urdu Language

Cited by: 8
Authors
Ashraf, Muhammad Rehan [1, 2]
Jana, Yasmeen [2]
Umer, Qasim [2, 3]
Jaffar, M. Arfan [1]
Chung, Sungwook [4]
Ramay, Waheed Yousuf [5]
Affiliations
[1] Super Univ, Dept Comp Sci, Lahore 54000, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Vehari 61000, Pakistan
[3] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[4] Changwon Natl Univ, Dept Comp Engn, Chang Won 51140, South Korea
[5] Air Univ, Dept Comp Sci, Multan 60000, Pakistan
Keywords
Sentiment analysis; Support vector machines; Social networking (online); Sports; Blogs; Encoding; Natural language processing; Linguistics; Urdu; BERT; Classification; Roman Urdu; Machine
DOI
10.1109/ACCESS.2023.3322101
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis is important in research because it provides valuable insight into public opinion. However, most sentiment analysis studies focus on English, leaving a research gap for low-resourced and regional languages such as Persian, Pashto, and Urdu. Moreover, computational linguists face the challenge of developing lexical resources for these languages. In light of this, this paper presents a deep learning-based approach for Urdu text sentiment analysis (USA-BERT) that leverages Bidirectional Encoder Representations from Transformers (BERT), and introduces an Urdu Dataset for Sentiment Analysis-23 (UDSA-23). USA-BERT first preprocesses the Urdu reviews with the BERT tokenizer. Second, it creates BERT embeddings for each Urdu review. Third, given the BERT embeddings, it fine-tunes a deep learning classifier (BERT). Finally, it applies the Pareto principle to two datasets, the state-of-the-art UCSA-21 and the new UDSA-23, to assess USA-BERT. The assessment results demonstrate that USA-BERT significantly surpasses existing methods, improving accuracy and F-measure by up to 26.09% and 25.87%, respectively.
Pages: 110245-110259
Number of pages: 15
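The abstract does not say how the Pareto principle is applied in the evaluation. One common reading is an 80/20 train/test split, and the minimal sketch below assumes that reading. The function name `pareto_split`, the seed, and the placeholder reviews are hypothetical and are not taken from the paper or its datasets.

```python
import random


def pareto_split(samples, train_fraction=0.8, seed=42):
    """Shuffle a copy of `samples` and split it into train and test parts.

    Assumption: "Pareto principle" is read as an 80/20 split; the paper's
    actual protocol may differ.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical placeholder data: (review text, label) pairs.
reviews = [(f"review_{i}", i % 2) for i in range(10)]
train, test = pareto_split(reviews)
print(len(train), len(test))
```

In practice the same split would be applied separately to each dataset (UCSA-21 and UDSA-23), so that the two evaluations stay independent.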