Natural Language Processing and Sentiment Analysis on Bangla Social Media Comments on Russia-Ukraine War Using Transformers

Cited by: 8
Authors
Hasan, Mahmud [1 ]
Islam, Labiba [1 ]
Jahan, Ismat [1 ]
Meem, Sabrina Mannan [1 ]
Rahman, Rashedur M. [1 ]
Affiliation
[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh
Keywords
Natural language processing; sentiment analysis; transformers; Russia-Ukraine war
DOI
10.1142/S2196888823500021
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Bangla language ranks seventh in the list of most spoken languages, with 265 million native and non-native speakers around the world, and is the second most spoken Indo-Aryan language after Hindi. However, research on tasks such as sentiment analysis (SA) in Bangla has grown slowly compared to SA in English. This is because there are not enough high-quality, publicly available datasets for training language models on text classification tasks in Bangla. In this paper, we propose an annotated Bangla dataset for sentiment analysis on the ongoing Ukraine-Russia war. The dataset was developed by collecting Bangla comments from various videos of three prominent YouTube TV news channels of Bangladesh reporting on the ongoing conflict. A total of 10,861 Bangla comments were collected and labeled with three sentiment polarities, namely Neutral, Pro-Ukraine (Positive), and Pro-Russia (Negative). A benchmark classifier was developed by experimenting with several transformer-based language models, all pre-trained on an unlabeled Bangla corpus. The models were fine-tuned using our procured dataset. Hyperparameter optimization was performed on all five transformer language models: BanglaBERT, XLM-RoBERTa-base, XLM-RoBERTa-large, Distil-mBERT, and mBERT. Each model was evaluated and analyzed using several evaluation metrics: F1 score, accuracy, and AIC (Akaike Information Criterion). The best-performing model achieved an accuracy of 86% with an F1 score of 0.82. Based on accuracy, F1 score, and AIC, BanglaBERT outperforms the baseline and all the other transformer-based classifiers.
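The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The abstract does not say which F1 averaging was used or how AIC was computed for the transformer models, so macro-averaged F1 and the textbook definition AIC = 2k - 2 ln L are assumptions here, and the function names and toy labels are hypothetical.

```python
import math

# The three sentiment classes named in the abstract.
LABELS = ["Neutral", "Pro-Ukraine", "Pro-Russia"]


def accuracy(y_true, y_pred):
    """Fraction of comments whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the abstract)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def log_likelihood(y_true, probs):
    """Sum of log-probabilities the model assigns to the gold labels.

    `probs` is a list of dicts mapping each label to its predicted probability.
    """
    return sum(math.log(p[t]) for t, p in zip(y_true, probs))


def aic(log_lik, num_params):
    """Textbook Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_lik


# Toy example with made-up labels, only to show how the functions are called.
y_true = ["Neutral", "Pro-Ukraine", "Pro-Russia", "Neutral"]
y_pred = ["Neutral", "Pro-Ukraine", "Neutral", "Neutral"]
toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred)
```

With the textbook definition, AIC penalizes the parameter count k, so a model comparison based on it rewards both goodness of fit and compactness; how k was chosen for the fine-tuned models is not described in this record.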
Pages: 329-356
Number of pages: 28
Related papers
50 entries in total
  • [31] Analyzing Russia's propaganda tactics on Twitter using mixed methods network analysis and natural language processing: a case study of the 2022 invasion of Ukraine
    Alieva, Iuliia
    Kloo, Ian
    Carley, Kathleen M.
    EPJ DATA SCIENCE, 2024, 13 (01)
  • [32] Advancing Natural Language Processing with a Combined Approach: Sentiment Analysis and Transformation Using Graph Convolutional LSTM
    Karunasree, Kedala
    Shailaja, P.
    Rajesh, T.
    Sesadri, U.
    Neelima, Choudaraju
    Nimma, Divya
    Adak, Malabika
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (09) : 692 - 700
  • [33] Using Natural Language Processing and Data Mining for Forecasting Consumer Spending Through Social Media
    Mostafa, Noha
    Abdelazim, Kholoud
    Grida, Mohamed
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 4, INTELLISYS 2023, 2024, 825 : 882 - 901
  • [34] Using Natural Language Processing and Social Media Data to Understand the Lived Experience of People with Fibromyalgia
    Bell, Lucy
    Fordham, Beth
    Mumtaz, Sehreen
    Yaman, Reena
    Balistreri, Lisa
    Butendieck Jr, Ronald R.
    Irani, Anushka
    HEALTHCARE, 2024, 12 (24)
  • [35] Analyzing mass media influence using natural language processing and time series analysis
    Albanese, Federico
    Pinto, Sebastian
    Semeshenko, Viktoriya
    Balenzuela, Pablo
    JOURNAL OF PHYSICS-COMPLEXITY, 2020, 1 (02)
  • [36] Exploring a Language-Based Interest Assessment: Predicting Vocational Interests on Social Media Using Natural Language Processing
    Du, Yan Yi Lance
    Jain, Devansh
    Cho, Young-Min
    Hou, Daphne Xin
    Guntuku, Sharath Chandra
    Ungar, Lyle
    Tay, Louis
    JOURNAL OF CAREER ASSESSMENT, 2024
  • [37] Depression Detection from Social Media Text Analysis using Natural Language Processing Techniques and Hybrid Deep Learning Model
    Tejaswini, Vankayala
    Babu, Korra Sathya
    Sahoo, Bibhudatta
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (01)
  • [38] Natural language processing analysis applied to COVID-19 open-text opinions using a distilBERT model for sentiment categorization
    Jojoa, Mario
    Eftekhar, Parvin
    Nowrouzi-Kia, Behdin
    Garcia-Zapirain, Begonya
    AI & SOCIETY, 2024, 39 (03) : 883 - 890
  • [39] COVID-19 Pandemic: Identifying Key Issues Using Social Media and Natural Language Processing
    Oyebode, Oladapo
    Ndulue, Chinenye
    Mulchandani, Dinesh
    Suruliraj, Banuchitra
    Adib, Ashfaq
    Orji, Fidelia Anulika
    Milios, Evangelos
    Matwin, Stan
    Orji, Rita
    JOURNAL OF HEALTHCARE INFORMATICS RESEARCH, 2022, 6 (02) : 174 - 207
  • [40] Detecting Novel and Emerging Drug Terms Using Natural Language Processing: A Social Media Corpus Study
    Simpson, Sean S.
    Adams, Nikki
    Brugman, Claudia M.
    Conners, Thomas J.
    JMIR PUBLIC HEALTH AND SURVEILLANCE, 2018, 4 (01) : 200 - 213