A Review of Urdu Sentiment Analysis with Multilingual Perspective: A Case of Urdu and Roman Urdu Language

Cited by: 18
Authors
Khan, Ihsan Ullah [1]
Khan, Aurangzeb [1, 2]
Khan, Wahab [1]
Su'ud, Mazliham Mohd [2]
Alam, Muhammad Mansoor [3]
Subhan, Fazli [2, 4]
Asghar, Muhammad Zubair [5]
Affiliations
[1] Univ Sci & Technol, Dept Comp Sci, Bannu 28100, Pakistan
[2] Multimedia Univ, Fac Comp & Informat, Kuala Lumpur 50050, Malaysia
[3] Riphah Int Univ, Rawalpindi 74400, Pakistan
[4] Natl Univ Modern Languages NUML, Fac Engn & Comp Sci, Islamabad 44000, Pakistan
[5] Gomal Univ, Inst Comp & Informat Technol, Dera Ismail Khan 29050, Pakistan
Keywords
preprocessing; feature extraction; classification;
DOI
10.3390/computers11010003
CLC number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Research efforts in the field of sentiment analysis have increased exponentially in the last few years due to its applicability in areas such as online product purchasing, marketing, and reputation management. Social media and online shopping sites have become a rich source of user-generated data. Manufacturing, sales, and marketing organizations are increasingly turning to this source to get worldwide feedback on their activities and products. Millions of sentences in Urdu and Roman Urdu are posted daily on social sites such as Facebook, Instagram, Snapchat, and Twitter. Disregarding people's opinions in Urdu and Roman Urdu and considering only the resource-rich English language means losing this vast amount of data. Our research focused on collecting research papers related to the Urdu and Roman Urdu languages and analyzing them in terms of preprocessing, feature extraction, and classification techniques. This paper presents a comprehensive study of research conducted on Roman Urdu and Urdu text for product reviews. The study is divided into categories such as collection of relevant corpora, data preprocessing, feature extraction, classification platforms and approaches, limitations, and future work. The comparison was based on evaluating different research factors, such as corpus, lexicon, and opinions. Each reviewed paper was evaluated against a set of benchmarks and categorized accordingly. Based on the results obtained and the comparisons made, we suggest some helpful steps for future studies.
Pages: 29
Related papers
54 in total
  • [1] Morphologically rich Urdu grammar parsing using Earley algorithm
    Abbas, Qaiser
    [J]. NATURAL LANGUAGE ENGINEERING, 2016, 22 (05) : 775 - 810
  • [2] Ahmad S, 2019, INT J COMPUT SCI NET, V19, P166
  • [3] Akhtar MS, 2016, P COLING 2016 26 INT, P482
  • [4] Automatic Detection of Offensive Language for Urdu and Roman Urdu
    Akhter, Muhammad Pervez
    Zheng Jiangbin
    Naqvi, Irfan Raza
    Abdelmajeed, Mohammed
    Sadiq, Muhammad Tariq
    [J]. IEEE ACCESS, 2020, 8 (08) : 91213 - 91226
  • [5] Alam M, 2017, 2017 INTERNATIONAL MULTI-TOPIC CONFERENCE (INMIC)
  • [6] Semantic text classification: A survey of past and recent advances
    Altinel, Berna
    Ganiz, Murat Can
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2018, 54 (06) : 1129 - 1153
  • [7] Balahur A., 2012, Association for Computational Linguistics, P52
  • [8] Sentiment classification of Roman-Urdu opinions using Naive Bayesian, Decision Tree and KNN classification techniques
    Bilal, Muhammad
    Israr, Huma
    Shahid, Muhammad
    Khan, Amin
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2016, 28 (03) : 330 - 344
  • [9] Bose R., 2020, INT J EMERG TRENDS E, V8, P3684, DOI 10.30534/ijeter/2020/129872020
  • [10] Cambria Erik., 2015, Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), P647, DOI 10.18653/V1/S15-2108