HYBRID FEATURE SELECTION FRAMEWORK FOR SENTIMENT ANALYSIS ON LARGE CORPORA

Cited by: 3
Authors
Adewole, Kayode S. [1]
Balogun, Abdullateef O. [1]
Raheem, Muiz O. [1]
Jimoh, Muhammed K. [2]
Jimoh, Rasheed G. [1]
Mabayoje, Modinat A. [1]
Usman-Hamza, Fatima E. [1]
Akintola, Abimbola G. [1]
Asaju-Gbolagade, Ayisat W. [1]
Affiliations
[1] Univ Ilorin, Dept Comp Sci, Ilorin, Nigeria
[2] Univ Ilorin, Dept Educ Technol, Ilorin, Nigeria
Source
JORDANIAN JOURNAL OF COMPUTERS AND INFORMATION TECHNOLOGY | 2021 / Vol. 7 / Issue 02
Keywords
Sentiment analysis; Opinion mining; Hybrid feature selection; Boruta; Recursive feature elimination; Classification
DOI
10.5455/jjcit.71-1609858713
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis has drawn considerable research attention in recent years owing to its applicability in determining users' opinions, sentiments and emotions from large collections of textual data. The goal of sentiment analysis centres on improving users' experience by deploying robust techniques that mine opinions and emotions from large corpora. There are several studies on sentiment analysis and opinion mining from textual information; however, domain-specific words, such as slang and abbreviations, together with grammatical mistakes, pose serious challenges to existing sentiment analysis methods. In this paper, we focus on the identification of an effective, discriminative subset of features that can aid classification of users' opinions from large corpora. This study proposes a hybrid feature selection framework that is based on the hybridization of filter- and wrapper-based feature selection methods. Correlation feature selection (CFS) is hybridized with Boruta and Recursive Feature Elimination (RFE) to identify the most discriminative feature subsets for sentiment analysis. Four publicly available datasets for sentiment analysis (Amazon, Yelp, IMDB and Kaggle) are considered to evaluate the performance of the proposed hybrid feature selection framework. This study evaluates the performance of three classification algorithms (Support Vector Machine (SVM), Naive Bayes and Random Forest) to ascertain the superiority of the proposed approach. Experimental results across the different contexts represented by these datasets show that CFS combined with Boruta produced promising results, especially when the selected features are passed to a Random Forest classifier. The proposed hybrid framework provides an effective way of predicting users' opinions and emotions while giving substantial consideration to predictive accuracy. The computing time of the resulting model is also shorter as a result of the proposed hybrid feature selection framework.
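
The filter-then-wrapper idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the filter stage uses the standard CFS merit score with Pearson correlation as a simple stand-in correlation measure, and the wrapper stage uses greedy backward elimination scored by a leave-one-out nearest-centroid classifier as a lightweight stand-in for Boruta or RFE with Random Forest. The toy word-count data, the feature names and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va and vb else 0.0

def cfs_merit(X, y, subset):
    """CFS merit: rewards feature-class correlation, penalizes redundancy."""
    col = lambda j: [row[j] for row in X]
    k = len(subset)
    r_cf = mean(abs(pearson(col(j), y)) for j in subset)
    pairs = list(combinations(subset, 2))
    r_ff = mean(abs(pearson(col(a), col(b))) for a, b in pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def cfs_filter(X, y):
    """Filter stage: greedy forward search that stops when merit stops improving."""
    remaining, chosen, best = set(range(len(X[0]))), [], 0.0
    while remaining:
        score, j = max((cfs_merit(X, y, chosen + [j]), j) for j in remaining)
        if score <= best:
            break
        chosen.append(j)
        remaining.discard(j)
        best = score
    return chosen

def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`."""
    hits = 0
    for i, row in enumerate(X):
        centroids = {}
        for label in set(y):
            rows = [r for n, r in enumerate(X) if n != i and y[n] == label]
            centroids[label] = [mean(r[j] for r in rows) for j in subset]
        pred = min(centroids, key=lambda lab: sum(
            (row[j] - c) ** 2 for j, c in zip(subset, centroids[lab])))
        hits += pred == y[i]
    return hits / len(X)

def backward_eliminate(X, y, subset):
    """Wrapper stage: drop features while held-out accuracy does not decrease."""
    subset = list(subset)
    best = loo_accuracy(X, y, subset)
    while len(subset) > 1:
        score, j = max(
            (loo_accuracy(X, y, [f for f in subset if f != j]), j) for j in subset)
        if score < best:
            break
        subset.remove(j)
        best = score
    return subset, best

# Hypothetical toy corpus: word counts per review; label 1 = positive, 0 = negative.
FEATURES = ["good", "bad", "the", "film"]
X = [[2, 0, 1, 1], [2, 0, 2, 0], [3, 0, 1, 1], [2, 0, 2, 0],
     [0, 2, 1, 1], [0, 2, 2, 0], [0, 3, 1, 1], [0, 2, 2, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

filtered = cfs_filter(X, y)                      # filter stage
selected, accuracy = backward_eliminate(X, y, filtered)  # wrapper stage
print("after filter: ", [FEATURES[j] for j in filtered])
print("after wrapper:", [FEATURES[j] for j in selected], "accuracy:", accuracy)
```

On this toy data the filter keeps only the two sentiment-bearing words, and the wrapper then removes one of them because the two are redundant with each other. Running the cheap filter first means the more expensive wrapper evaluates far fewer candidate subsets, which is consistent with the shorter computing time the abstract reports.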
Pages: 130-151
Page count: 22