Sentiment Analysis of Low-Resource Language Literature Using Data Processing and Deep Learning

Cited by: 3
Authors
Ali, Aizaz [1]
Khan, Maqbool [1, 2]
Khan, Khalil [3]
Khan, Rehan Ullah [4]
Aloraini, Abdulrahman [4]
Affiliations
[1] Pak Austria Fachhochschule Inst Appl Sci & Technol, Dept IT & Comp Sci, Haripur 22620, Pakistan
[2] Software Competence Ctr Hagenberg, Softwarepark 32a, A-4232 Hagenberg, Austria
[3] Nazarbayev Univ, Sch Engn & Digital Sci, Dept Comp Sci, Astana 010000, Kazakhstan
[4] Qassim Univ, Coll Comp, Dept Informat Technol, POB 1162, Buraydah, Saudi Arabia
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 79 / No. 01
Keywords
Urdu sentiment analysis; convolutional neural networks; recurrent neural network; deep learning; natural language processing; neural networks; ROMAN URDU; REVIEWS;
DOI
10.32604/cmc.2024.048712
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Sentiment analysis, a crucial task in discerning emotional tones within text, plays a pivotal role in understanding public opinion and user sentiment across diverse languages. While numerous scholars conduct sentiment analysis in widely spoken languages such as English, Chinese, Arabic, Roman Arabic, and more, grappling with resource-poor languages such as Urdu remains a challenge. Urdu is a uniquely crafted language, characterized by a script that amalgamates elements from diverse languages, including Arabic, Parsi, Pashtu, Turkish, Punjabi, Saraiki, and more. Urdu literature, characterized by distinct character sets and linguistic features, presents an additional hurdle due to the lack of accessible datasets, which renders sentiment analysis a formidable undertaking. The limited availability of resources has fueled increased interest among researchers, prompting a deeper exploration into Urdu sentiment analysis. This research is dedicated to Urdu language sentiment analysis, employing deep learning models on an extensive dataset categorized into five labels: Positive, Negative, Neutral, Mixed, and Ambiguous. The primary objective is to discern sentiments and emotions within the Urdu language, despite the absence of well-curated datasets. To tackle this challenge, the initial step involves the creation of a comprehensive Urdu dataset by aggregating data from various sources such as newspapers, articles, and social media comments. Following data collection, a thorough process of cleaning and preprocessing is applied to ensure the quality of the data. The study leverages two well-known deep learning models, namely Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), for both training and evaluating sentiment analysis performance. Additionally, the study explores hyperparameter tuning to optimize the models' efficacy. Evaluation metrics such as precision, recall, and the F1-score are employed to assess the effectiveness of the models. The research findings reveal that RNN surpasses CNN in Urdu sentiment analysis, achieving a significantly higher accuracy of 91%. This result underscores the strong performance of RNN and makes it a compelling option for sentiment analysis tasks in the Urdu language.
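The evaluation metrics named in the abstract (precision, recall, F1-score, accuracy) can be illustrated with a minimal sketch for the five sentiment labels. This is not the authors' code: the function names `per_class_metrics`, `macro_f1`, and `accuracy` are hypothetical, macro-averaging is an assumption (the abstract does not state the averaging scheme), and a metric is set to 0.0 when its denominator is zero.

```python
from collections import Counter

# The five labels listed in the abstract.
LABELS = ["Positive", "Negative", "Neutral", "Mixed", "Ambiguous"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Assumption: undefined ratios (zero denominator) are reported as 0.0.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values (assumed averaging scheme)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Calling `per_class_metrics(y_true, y_pred)` with the gold and predicted labels of a held-out set gives one (precision, recall, F1) triple per class; `macro_f1` then weights all five classes equally, which matters when minority classes such as Mixed or Ambiguous are rare.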
Pages: 713 - 733
Page count: 21
Related papers
50 in total
  • [31] Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning
    Medeiros, Eduardo
    Corado, Leonel
    Rato, Luis
    Quaresma, Paulo
    Salgueiro, Pedro
    FUTURE INTERNET, 2023, 15 (05)
  • [32] A Deep Learning model for Question Analysis in Low-resource Languages: A Dataset and Case Study for Persian
    Khaksefidi, Fatemeh Ebrahimi
    Fatemi, Afsaneh
    Nematbakhsh, Mohammad Ali
    Kia, Mahsa Abazari
    2024 14TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION SYSTEMS, ICPRS, 2024,
  • [33] Sentiment Analysis using Machine Learning and Deep Learning
    Chandra, Yogesh
    Jana, Antoreep
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM-2020), 2019, : 1 - 4
  • [34] Pashto poetry generation: deep learning with pre-trained transformers for low-resource languages
    Ullah, Imran
    Ullah, Khalil
    Khan, Hamad
    Aurangzeb, Khursheed
    Anwar, Muhammad Shahid
    Syed, Ikram
    PeerJ Computer Science, 2024, 10 : 1 - 23
  • [35] Sentiment Analysis Using Machine Learning and Deep Learning on Covid 19 Vaccine Twitter Data with Hadoop MapReduce
    Kul, Seda
    Sayar, Ahmet
    6TH INTERNATIONAL CONFERENCE ON SMART CITY APPLICATIONS, 2022, 393 : 859 - 868
  • [36] Improving sentiment analysis using hybrid deep learning model
    Pandey A.C.
    Rajpoot D.S.
    Recent Advances in Computer Science and Communications, 2020, 13 (04) : 627 - 640
  • [37] Fintech Sentiment Analysis using Deep Learning Approaches: a Survey
    Anis, Sarah
    Morsey, Mohamed Mabrouk
    Aref, Mostafa
    2024 5TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, ROBOTICS AND CONTROL, AIRC 2024, 2024, : 118 - 122
  • [38] Pashto poetry generation: deep learning with pre-trained transformers for low-resource languages
    Ullah, Imran
    Ullah, Khalil
    Khan, Hamad
    Aurangzeb, Khursheed
    Anwar, Muhammad Shahid
    Syed, Ikram
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [39] Sentiment Analysis Based on Deep Learning: A Comparative Study
    Dang, Nhan Cach
    Moreno-Garcia, Maria N.
    De la Prieta, Fernando
    ELECTRONICS, 2020, 9 (03)
  • [40] Sentiment Analysis of Social Media Content in Pashto Language using Deep Learning Algorithms
    Iqbal, Saqib
    Khan, Farhad
    Khan, Hikmat Ullah
    Iqba, Tassawar
    Shah, Jamal Hussain
    JOURNAL OF INTERNET TECHNOLOGY, 2022, 23 (07) : 1669 - 1677