BERT-Based Sentiment Analysis for Low-Resourced Languages: A Case Study of Urdu Language

Cited by: 4
Authors
Ashraf, Muhammad Rehan [1, 2]
Jana, Yasmeen [2]
Umer, Qasim [2, 3]
Jaffar, M. Arfan [1]
Chung, Sungwook [4]
Ramay, Waheed Yousuf [5]
Affiliations
[1] Super Univ, Dept Comp Sci, Lahore 54000, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Vehari 61000, Pakistan
[3] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[4] Changwon Natl Univ, Dept Comp Engn, Chang Won 51140, South Korea
[5] Air Univ, Dept Comp Sci, Multan 60000, Pakistan
Source
IEEE ACCESS | 2023 / Volume 11
Keywords
Sentiment analysis; Support vector machines; Social networking (online); Sports; Blogs; Encoding; Natural language processing; Linguistics; Urdu; BERT; Classification; Roman Urdu; Machine
DOI
10.1109/ACCESS.2023.3322101
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis is important in research because it provides valuable insights into public opinion. However, most sentiment analysis studies focus on English, leaving a research gap for low-resourced or regional languages, e.g., Persian, Pashto, and Urdu. Moreover, computational linguists face the challenge of developing lexical resources for these languages. In light of this, this paper presents a deep learning-based approach for Urdu Text Sentiment Analysis (USA-BERT) that leverages Bidirectional Encoder Representations from Transformers (BERT), and it introduces an Urdu Dataset for Sentiment Analysis-23 (UDSA-23). USA-BERT first preprocesses the Urdu reviews with the BERT-Tokenizer. Second, it creates BERT embeddings for each Urdu review. Third, given the BERT embeddings, it fine-tunes a deep learning classifier (BERT). Finally, it applies the Pareto principle to two datasets (the state-of-the-art UCSA-21 and UDSA-23) to assess USA-BERT. The assessment results demonstrate that USA-BERT significantly surpasses the existing methods, improving accuracy and F-measure by up to 26.09% and 25.87%, respectively.
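The evaluation step in the abstract can be illustrated with a minimal Python sketch. It assumes that "Pareto principle" refers to an 80/20 train/test split, which the abstract does not state explicitly, and it computes accuracy and F-measure on placeholder labels. The function names (`pareto_split`, `accuracy`, `f_measure`) and the sample data are hypothetical and are not taken from the paper; the tokenization and fine-tuning steps are not shown.

```python
import random


def pareto_split(samples, train_fraction=0.8, seed=0):
    """Shuffle samples and split them into train and test parts.

    The 80/20 default is an assumption about what the abstract
    means by the Pareto principle.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder reviews with alternating labels (illustrative only).
reviews = [
    (f"review {i}", "positive" if i % 2 == 0 else "negative")
    for i in range(10)
]
train, test = pareto_split(reviews)

# Placeholder gold labels and classifier predictions (illustrative only).
y_true = ["positive", "negative", "positive", "negative"]
y_pred = ["positive", "positive", "positive", "negative"]

print(len(train), len(test))
print(accuracy(y_true, y_pred))
print(f_measure(y_true, y_pred, "positive"))
```

In practice, the predictions would come from the fine-tuned classifier on the held-out 20% of reviews rather than from hard-coded lists.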
Pages: 110245-110259
Page count: 15