Class Association and Attribute Relevancy Based Imputation Algorithm to Reduce Twitter Data for Optimal Sentiment Analysis

Cited by: 4
Authors
Bibi, Maryum [1]
Nadeem, Malik Sajjad Ahmed [1]
Khan, Imtiaz Hussain [2]
Shim, Seong-O [3]
Khan, Ishtiaq Rasool [3]
Naqvi, Uzma [1]
Aziz, Wajid [1, 3]
Affiliations
[1] Univ Azad Jammu & Kashmir, Dept Comp Sci & Informat Technol, Muzaffarabad 13100, Pakistan
[2] King Abdulaziz Univ, Dept Comp Sci, Jeddah 21959, Saudi Arabia
[3] Univ Jeddah, Coll Comp Sci & Engn, Jeddah 21959, Saudi Arabia
Keywords
Classification; class association; dimensionality reduction; imputation; machine learning; preprocessing; Twitter sentiment analysis; FEATURE-SELECTION; CLASSIFICATION; MACHINE;
DOI
10.1109/ACCESS.2019.2942112
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Twitter sentiment analysis is a challenging task that involves various preprocessing steps, including dimensionality reduction. Dimensionality reduction helps keep computational complexity low and improves performance during classification. In Twitter data, each tweet has feature values that may or may not reflect a person's response. As a result, a large number of sparse data points are generated when tweets are represented as a feature matrix, which increases computational overhead and error rates in Twitter sentiment analysis. This study proposes a novel preprocessing technique called the class association and attribute relevancy based imputation algorithm (CAARIA) to reduce the Twitter data size. CAARIA achieves the dimensionality reduction goal by imputing those tweets that belong to the same class and also share useful information. The performance of two classifiers (Naive Bayes and support vector machines) is evaluated on three Twitter datasets in terms of classification accuracy, measured as area under the curve (AUC), and time efficiency. CAARIA is also compared against two widely used feature selection (dimensionality reduction) techniques, information gain (IG) and Pearson's correlation (PC). The findings reveal that CAARIA outperforms IG and PC in terms of classification accuracy and time efficiency. These results suggest that CAARIA is a robust data preprocessing technique for the classification task.
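As a rough illustration of the information gain (IG) baseline named in the abstract, the sketch below scores each term of a tiny bag-of-words representation by how much knowing its presence reduces class entropy. It is not CAARIA, whose steps are not given in this record; the toy tweets, labels, and function names are hypothetical and chosen only to show the idea.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the tweet' w.r.t. the class."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
                + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional

# Hypothetical toy tweets (tokenized into sets) with sentiment labels.
tweets = [
    {"love", "this", "phone"},
    {"love", "the", "camera"},
    {"hate", "this", "phone"},
    {"hate", "the", "battery"},
]
labels = ["pos", "pos", "neg", "neg"]

vocabulary = sorted(set().union(*tweets))
scores = {t: information_gain(tweets, labels, t) for t in vocabulary}

# Fraction of zero entries in the binary tweet-by-term matrix.
sparsity = 1 - sum(len(t) for t in tweets) / (len(tweets) * len(vocabulary))

# Keep the two highest-scoring terms as the reduced feature set.
top_terms = sorted(vocabulary, key=lambda t: scores[t], reverse=True)[:2]
```

Even in this four-tweet example more than half of the matrix entries are zero, which is the sparsity problem the abstract describes; IG-style selection addresses it by dropping low-scoring columns rather than by imputing values.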
Pages: 136535-136544
Number of pages: 10