SentiCon: A Concept Based Feature Set For Sentiment Analysis

Times cited: 0
Authors
Mitra, Satanik [1]
Jenamani, Mamata [1]
Affiliation
[1] Department of Industrial and Systems Engineering, IIT Kharagpur, Kharagpur, West Bengal, India
Source
2018 IEEE 13th International Conference on Industrial and Information Systems (IEEE ICIIS), 2018
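Keywords
Feature Extraction; Sentiment Analysis; Machine Learning; Classification Algorithms
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract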
Selecting and extracting appropriate numerical features for more accurate sentiment analysis of text data remains an open problem. In supervised machine-learning-based sentiment analysis, Term Frequency-Inverse Document Frequency (TF-IDF) scores are used as features for classifying the polarity of text. TF-IDF features are a high-dimensional representation of the importance of each word in a document. They are sparse and do not consider the correlations among words that make up the latent concepts of the document. Latent Semantic Analysis (LSA) removes this sparseness by representing the TF-IDF features in a low-dimensional matrix, thereby extracting the hidden concepts. A further natural property of a text document is its information content: quantitative estimates of Part-of-Speech tags, negation words, sentiment lexicons, etc. represent the quality of the information a text conveys. In this work, we propose an approach to generate SentiCon, a concept-based, domain-specific feature set, by consolidating LSA with the quality of information of the corpus. We applied Singular Value Decomposition (SVD) to the TF-IDF features to obtain the LSA representation. We tested SentiCon on two benchmark datasets, IMDB movie reviews and Epinion (Cars and Books), using four well-known classifiers: Decision Tree, Random Forest, Support Vector Machine, and K-Nearest Neighbour. We used the standard performance measures precision, recall, and F-measure to analyze the results.
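The pipeline outlined in the abstract has three steps: TF-IDF weighting, LSA via SVD, and consolidation with information-quality features. It can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The following are all hypothetical:

- the TF-IDF variant;
- the power-iteration stand-in for a library SVD;
- the mini-lexicons `NEGATIONS`, `POSITIVE` and `NEGATIVE`;
- the helper names `tfidf_matrix`, `lsa_document_vectors`, `quality_features` and `senticon_like_features`.

"Consolidation" is assumed to mean simple concatenation. Part-of-Speech counts are omitted because they need a tagger.

```python
import math
import random
from collections import Counter

# Hypothetical mini-lexicons: the paper's actual negation list and
# sentiment lexicons are not given in the abstract.
NEGATIONS = {"not", "no", "never", "nothing"}
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "boring", "hate"}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def tfidf_matrix(docs):
    """Dense TF-IDF matrix: one row per document, one column per vocabulary term.
    Assumed variant: tf = count / doc length, idf = log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1
        rows.append([counts[t] / length * math.log(n_docs / df[t]) for t in vocab])
    return rows, vocab


def lsa_document_vectors(a, k, iters=200, seed=0):
    """Return A @ V_k (= U_k * Sigma_k), each document's scores on the top-k
    latent concepts. V_k (top right singular vectors of A) is found by power
    iteration with deflation on A^T A, a stand-in for a library truncated SVD."""
    cols = [list(c) for c in zip(*a)]                      # one column per term
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]   # A^T A
    n_terms = len(cols)
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n_terms)]
        for _ in range(iters):
            w = [dot(row, v) for row in gram]
            for b in basis:                                # remove found directions
                p = dot(w, b)
                w = [wi - p * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:                               # rank < k: no more concepts
                v = [0.0] * n_terms
                break
            v = [wi / norm for wi in w]
        basis.append(v)
    return [[dot(row, b) for b in basis] for row in a]


def quality_features(doc):
    """Hypothetical information-quality features: per-token rates of negation
    words, positive-lexicon hits and negative-lexicon hits."""
    toks = doc.lower().split()
    n = len(toks) or 1
    return [
        sum(t in NEGATIONS for t in toks) / n,
        sum(t in POSITIVE for t in toks) / n,
        sum(t in NEGATIVE for t in toks) / n,
    ]


def senticon_like_features(docs, k):
    """Concatenate k LSA concept scores with the quality features per document."""
    a, _ = tfidf_matrix(docs)
    lsa = lsa_document_vectors(a, k)
    return [concepts + quality_features(d) for concepts, d in zip(lsa, docs)]


docs = [
    "the movie was great great",
    "the plot was not good",
    "boring and bad acting",
    "i love this great film",
]
features = senticon_like_features(docs, k=2)
print(len(features), len(features[0]))  # prints: 4 5 (2 concepts + 3 quality features)
```

In practice, a library routine such as scikit-learn's `TruncatedSVD` would replace the power-iteration loop. The resulting feature rows would then be passed to classifiers like the Decision Tree, Random Forest, Support Vector Machine and K-Nearest Neighbour models named above.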
Pages: 246-250
Number of pages: 5