Exploiting Linguistic Features for Effective Sentence-Level Sentiment Analysis in Urdu Language

Times cited: 9
Authors
Altaf, Amna [1]
Anwar, Muhammad Waqas [1]
Jamal, Muhammad Hasan [1]
Bajwa, Usama Ijaz [1]
Affiliation
[1] COMSATS Univ Islamabad, Dept Comp Sci, Lahore Campus 1-5 Km Def Rd Raiwind Rd, Lahore, Punjab, Pakistan
Keywords
Supervised Machine Learning; Parts of Speech Tagging; Sentiment Analysis; Urdu Language; SELECTION
DOI
10.1007/s11042-023-15216-0
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rapid increase in social media use has led to the generation of gigabytes of information shared by billions of users worldwide. To analyze this information and determine people's behavior towards different events, researchers widely use sentiment analysis. Existing studies in Urdu sentiment analysis mostly use traditional n-gram features, which, unlike linguistic features, do not capture the context of what is being discussed. Moreover, no existing study classifies the sentiment of proverbs and idioms, a challenging task because they usually contain no sentiment words yet carry strong sentiment. This study exploits linguistic features of the Urdu language for sentence-level sentiment analysis and classifies idioms and proverbs using classical machine learning techniques. We develop a dataset comprising idioms, proverbs, and sentences from the news domain and, after careful linguistic analysis of Urdu, extract part-of-speech (POS) tag-based features, boolean features, and numeric features from it. Experimental results show that the J48 classifier performs best, achieving an accuracy of 90% and an F-measure of 88%.
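The abstract names three feature families (POS tag-based, boolean, numeric) without listing the exact features. The Python sketch below shows one plausible way such a feature vector could be built for a single sentence; it is not the authors' feature set. It assumes the sentence has already been POS-tagged, and the coarse tag names (ADJ, ADV, NN, VB, NEG, PRN) and the tiny Urdu word lists are hypothetical placeholders chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy lexicons for illustration only; NOT the paper's resources.
POSITIVE_WORDS = {"اچھا", "خوش"}        # achha (good), khush (happy)
NEGATIVE_WORDS = {"برا", "غم"}          # bura (bad), gham (sorrow)
NEGATION_WORDS = {"نہیں", "نہ", "مت"}   # nahin, na, mat (negators)
INTENSIFIERS = {"بہت"}                  # bohat (very)

# Hypothetical coarse tag set; a real Urdu POS tagger uses its own labels.
TAGS_OF_INTEREST = ("ADJ", "ADV", "NN", "VB")


def extract_features(tagged_sentence):
    """Map a POS-tagged sentence [(token, tag), ...] to a flat feature dict."""
    tokens = [tok for tok, _ in tagged_sentence]
    tag_counts = Counter(tag for _, tag in tagged_sentence)
    n = len(tokens)

    features = {}
    # POS tag-based features: relative frequency of each tag of interest.
    for tag in TAGS_OF_INTEREST:
        features[f"pos_{tag}"] = tag_counts[tag] / n if n else 0.0
    # Boolean features: presence of negation or intensifier cues.
    features["has_negation"] = any(t in NEGATION_WORDS for t in tokens)
    features["has_intensifier"] = any(t in INTENSIFIERS for t in tokens)
    # Numeric features: lexicon hit counts and sentence length in tokens.
    features["num_positive"] = sum(t in POSITIVE_WORDS for t in tokens)
    features["num_negative"] = sum(t in NEGATIVE_WORDS for t in tokens)
    features["length"] = n
    return features


if __name__ == "__main__":
    # "This food was not very good", hand-tagged with the hypothetical tag set.
    sentence = [("یہ", "PRN"), ("کھانا", "NN"), ("بہت", "ADV"),
                ("اچھا", "ADJ"), ("نہیں", "NEG"), ("تھا", "VB")]
    print(extract_features(sentence))
```

In the example sentence, the positive word اچھا appears under negation, which a pure n-gram count would treat as positive evidence; the boolean has_negation feature lets a downstream classifier learn to flip or discount it. A decision tree learner such as J48 (Weka's implementation of C4.5) could consume these dictionaries once they are converted to fixed-length vectors, but that training step is not shown here.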
Pages: 41813-41839
Number of pages: 27