Creating sentiment lexicon for sentiment analysis in Urdu: The case of a resource-poor language

Cited by: 40
Authors
Asghar, Muhammad Zubair [1]
Sattar, Anum [1]
Khan, Aurangzeb [2]
Ali, Amjad [3]
Kundi, Fazal Masud [1]
Ahmad, Shakeel [4]
Affiliations
[1] Gomal Univ, ICIT, Dera Ismail Khan, KP, Pakistan
[2] Univ Sci & Technol, Dept Comp Sci, Bannu, Pakistan
[3] Univ Swat, Dept Comp & Software Technol, Saidu Sharif, Pakistan
[4] King Abdul Aziz Univ KAU, FCITR, Jeddah, Saudi Arabia
Keywords
polarity lexicon; sentiment analysis; Urdu sentiment lexicon; Urdu SentiWordNet; FRAMEWORK
DOI
10.1111/exsy.12397
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment analysis (SA) applications are becoming popular among individuals and organizations for gathering and analysing users' sentiments about products, services, policies, and current affairs. Because a wide range of English lexical resources is available, such as part-of-speech taggers, parsers, and polarity lexicons, the development of sophisticated SA applications for the English language has attracted many researchers. Although there have been efforts to create polarity lexicons in non-English languages such as Urdu, they suffer from many deficiencies, such as the lack of publicly available sentiment lexicons with a proper scoring mechanism for opinion words and modifiers. In this work, we present a word-level translation scheme for creating a first comprehensive Urdu polarity resource, "Urdu Lexicon", using a merger of existing resources: a list of English opinion words, SentiWordNet, an English-Urdu bilingual dictionary, and a collection of Urdu modifiers. We assign two polarity scores, positive and negative, to each Urdu opinion word. Moreover, modifiers are collected, classified, and tagged with proper polarity scores. We also perform an extrinsic evaluation in terms of subjectivity detection and sentiment classification, and the evaluation results show that the polarity scores assigned by this technique are more accurate than those of the baseline methods.
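The word-level translation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists, scores, romanized Urdu translations, and the function name `build_urdu_lexicon` are all hypothetical, and averaging the scores when several English words map to the same Urdu word is an assumption made here, since the abstract does not say how such cases are resolved.

```python
from collections import defaultdict

# Toy stand-ins for the real resources; every entry and score is illustrative.
ENGLISH_OPINION_WORDS = ["good", "fine", "bad", "happy"]

# English word -> (positive score, negative score), in the style of a
# SentiWordNet lookup. These numbers are made up for the example.
ENGLISH_SCORES = {
    "good": (0.75, 0.0),
    "fine": (0.5, 0.0),
    "bad": (0.0, 0.625),
    "happy": (0.875, 0.0),
}

# English word -> Urdu translations (romanized here for readability).
BILINGUAL_DICT = {
    "good": ["acha", "umda"],
    "fine": ["acha"],
    "bad": ["bura"],
    "happy": ["khush"],
}


def build_urdu_lexicon(opinion_words, scores, dictionary):
    """Carry English polarity scores over to Urdu words via translation."""
    collected = defaultdict(list)
    for word in opinion_words:
        if word not in scores:
            continue  # no polarity information for this English word
        for urdu_word in dictionary.get(word, []):
            collected[urdu_word].append(scores[word])

    lexicon = {}
    for urdu_word, pairs in collected.items():
        # Assumption: average when several English words share a translation.
        positive = sum(p for p, _ in pairs) / len(pairs)
        negative = sum(n for _, n in pairs) / len(pairs)
        lexicon[urdu_word] = (positive, negative)
    return lexicon


lexicon = build_urdu_lexicon(ENGLISH_OPINION_WORDS, ENGLISH_SCORES, BILINGUAL_DICT)
for urdu_word, (positive, negative) in sorted(lexicon.items()):
    print(urdu_word, positive, negative)
```

Each Urdu entry ends up with two scores, positive and negative, as the abstract describes. A collection of modifiers could be stored in a second table with its own scores; that part is left out of the sketch.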
Pages: 19