Sentiment Analysis of Roman Urdu on E-Commerce Reviews Using Machine Learning

Cited by: 10
Authors
Chandio, Bilal [1]
Shaikh, Asadullah [2]
Bakhtyar, Maheen [1]
Alrizq, Mesfer [2]
Baber, Junaid [1]
Sulaiman, Adel [2]
Rajab, Adel [2]
Noor, Waheed [3]
Affiliations
[1] Univ Balochistan, Dept Comp Sci & Informat Technol, Quetta 87300, Pakistan
[2] Najran Univ, Coll Comp Sci & Informat Syst, Najran 61441, Saudi Arabia
[3] Univ Balochistan, Dept Informat Technol, Quetta 87300, Pakistan
Source
CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES | 2022 / Vol. 131 / Issue 03
Keywords
Sentiment analysis; Roman Urdu; machine learning; SVM; classification
DOI
10.32604/cmes.2022.019535
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Sentiment analysis has been widely studied for languages such as English and French. Roman Urdu sentiment analysis, however, still needs more attention from researchers because off-the-shelf Natural Language Processing (NLP) solutions are lacking. The primary objective of this study is to investigate diverse machine learning methods for the sentiment analysis of Roman Urdu data, which is very informal in nature and needs to be lexically normalized. To address this challenge, we propose a fine-tuned Support Vector Machine (SVM) powered by a dictionary-based Roman Urdu stemmer, which standardizes the text and so reduces its complexity. In the proposed scheme, the corpus is first cleaned to remove anomalies from the text. After this initial pre-processing, each user review is stemmed. The input text is then transformed into a feature vector using the bag-of-words model, and the SVM is used to classify the user sentiment. The proposed model is evaluated empirically under diverse experimental configurations to fine-tune the hyper-parameters and achieve superior performance. In addition, a series of experiments is conducted on diverse machine learning and deep learning models to compare their performance with that of the proposed model. We also introduce the Roman Urdu e-commerce dataset (RUECD), the largest available Roman Urdu dataset, which contains 26K+ user reviews annotated by a group of experts. The experiments show that the newly created dataset is quite challenging and that Roman Urdu sentiment analysis requires more attention from researchers.
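The pre-processing steps named in the abstract (cleaning, dictionary-based stemming, bag-of-words vectorization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stem dictionary entries, the sample reviews, and the function names `clean`, `stem`, `build_vocab`, and `bag_of_words` are all hypothetical and chosen only for illustration. The resulting count vectors are the kind of input an SVM classifier would be trained on.

```python
import re
from collections import Counter

# Hypothetical stem dictionary (an assumption, not taken from the paper):
# maps Roman Urdu spelling variants to one canonical form.
STEM_DICT = {
    "achaa": "acha", "achi": "acha", "achy": "acha",
    "khrab": "kharab", "kharaab": "kharab",
}

def clean(text):
    """Lowercase the review and keep only ASCII letters, then tokenize."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits, punctuation, symbols
    return text.split()

def stem(tokens):
    """Dictionary lookup: unknown tokens pass through unchanged."""
    return [STEM_DICT.get(tok, tok) for tok in tokens]

def build_vocab(docs):
    """Assign each distinct token a column index, in sorted order."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}

def bag_of_words(tokens, vocab):
    """Turn one tokenized review into a vector of term counts."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec

# Made-up sample reviews, for illustration only.
reviews = [
    "Product bohat achaa hai!!!",
    "Quality khrab hai, 2 din mein toot gaya",
]
docs = [stem(clean(r)) for r in reviews]
vocab = build_vocab(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
```

Here the variants "achaa" and "khrab" are mapped to "acha" and "kharab" before counting, so different spellings of the same word share one feature column. That is the normalization effect the abstract attributes to the stemmer.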
Pages: 1263-1287
Number of pages: 25