Using Stylometric Features for Sentiment Classification

Cited by: 4
Authors
Anchieta, Rafael T. [1]
Ricarte Neto, Francisco Assis [2]
de Sousa, Rogerio Figueiredo [3]
Moura, Raimundo Santos [3]
Affiliations
[1] Inst Fed Piaui, Piaui, Brazil
[2] Univ Fed Pernambuco, Recife, Brazil
[3] Univ Fed Piaui, Piaui, Brazil
Source
COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II | 2015 / Vol. 9042
Keywords
FEATURE-SELECTION;
DOI
10.1007/978-3-319-18117-2_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper is a comparative study of text feature extraction methods for statistical learning in sentiment classification. Feature extraction is one of the most important steps in a classification system. We compare stylometry with two baseline methods, TF-IDF and Delta TF-IDF, in sentiment classification. Stylometry is a research area of linguistics that uses statistical techniques to analyze literary style. To assess the viability of stylometry, we created a corpus of product reviews from the most traditional online service in Portuguese, namely Buscape, gathering 2000 reviews about smartphones. We use three classifiers, Support Vector Machine (SVM), Naive Bayes, and J48, to evaluate whether stylometry achieves higher accuracy than the TF-IDF and Delta TF-IDF methods. The best result came from the SVM classifier: 82.75% accuracy with stylometry, 72.62% with Delta TF-IDF, and 56.25% with TF-IDF. The results show that stylometry is a feasible method for sentiment classification, outperforming the baseline methods in accuracy. The approach therefore shows promising results.
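As a rough illustration of what stylometric feature extraction can look like, the sketch below computes a handful of surface-level style statistics for a single review. The feature set, the function name `stylometric_features`, and the sample text are assumptions chosen for illustration; the record does not list the features the authors actually used, so this should not be read as their method.

```python
import re
import string


def stylometric_features(text):
    """Return a few illustrative style statistics for one review.

    NOTE: this feature set is hypothetical, not the one used in the paper.
    """
    # Word tokens: runs of Unicode word characters, lowercased.
    words = re.findall(r"\w+", text.lower())
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_punct = sum(1 for ch in text if ch in string.punctuation)
    return {
        "word_count": float(n_words),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "punctuation_ratio": n_punct / len(text) if text else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical sample review, not taken from the Buscape corpus.
    print(stylometric_features("Great phone. Great camera!"))
```

A vector of such statistics per review could then be passed to any standard classifier in place of a TF-IDF term vector; which classifier and which features work best is an empirical question that this sketch does not address.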
Pages: 189-200
Page count: 12