Opinion Mining: An Approach to Feature Engineering

Cited by: 0
Authors
Siddiqui, Shafaq [1]
Rehman, M. Abdul [1]
Daudpota, Sher M. [1]
Waqas, Ahmad [1]
Affiliations
[1] Sukkur IBA Univ, Dept Comp Sci, Sukkur, Sindh, Pakistan
Keywords
Opinion mining; feature engineering; machine learning; classification; natural language processing
DOI
10.14569/IJACSA.2019.0100320
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Sentiment analysis, or opinion mining, refers to the process of identifying and categorizing subjective information in source materials using natural language processing (NLP), text analytics, and statistical linguistics. The main purpose of opinion mining is to determine the writer's attitude towards the topic under discussion. This is done by identifying the polarity of a given text passage using different feature sets. Feature engineering in the pre-processing phase plays a vital role in improving the performance of a classifier. In this paper, we empirically evaluate various feature weighting mechanisms against well-established classification techniques for opinion mining, i.e., Naive Bayes-Multinomial for binary polarity cases and SVM-LIN for multiclass cases. To evaluate these classification techniques, we train the classifiers on the publicly available Rotten Tomatoes movie reviews dataset, which is widely used by the research community for the same purpose. The experiments show that the feature set containing noun, verb, adverb, and adjective lemmas with the feature-frequency (FF) function performs best among all feature settings, with 84% and 85% of test instances correctly classified for Naive Bayes and SVM, respectively.
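As an illustration of the feature-frequency (FF) weighting paired with a multinomial Naive Bayes classifier, the following is a minimal sketch and not the authors' implementation. It assumes that part-of-speech filtering and lemmatization have already been applied upstream, so each document arrives as a list of noun, verb, adverb, and adjective lemmas. The toy reviews, the helper `feature_frequency`, and the class `MultinomialNB` are hypothetical and written here in plain Python for self-containment.

```python
import math
from collections import Counter, defaultdict


def feature_frequency(tokens):
    """FF weighting: map each lemma to its raw count in the document."""
    return Counter(tokens)


class MultinomialNB:
    """Toy multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value, not from the paper)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.term_counts = defaultdict(Counter)  # per-class lemma counts
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            ff = feature_frequency(tokens)
            self.term_counts[label].update(ff)
            self.vocab.update(ff)
        return self

    def predict(self, tokens):
        ff = feature_frequency(tokens)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n / n_docs)  # log prior
            for term, count in ff.items():
                if term in self.vocab:  # ignore lemmas unseen in training
                    p = (self.term_counts[label][term] + self.alpha) / denom
                    score += count * math.log(p)  # count acts as the FF weight
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, already-lemmatized toy reviews (not from Rotten Tomatoes).
train_docs = [
    ["love", "great", "film"],
    ["enjoy", "wonderful", "story"],
    ["hate", "boring", "film"],
    ["dislike", "terrible", "story"],
]
train_labels = ["pos", "pos", "neg", "neg"]

model = MultinomialNB().fit(train_docs, train_labels)
prediction = model.predict(["great", "wonderful", "film"])
print(prediction)
```

In this sketch the raw lemma count multiplies the log-likelihood term, which is how the FF weighting enters the multinomial model. A presence-based weighting would replace `count` with 1.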
Pages: 159-165
Page count: 7