The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis

Cited by: 0
Authors
Saqib Alam
Nianmin Yao
Affiliations
[1] Dalian University of Technology, Department of Electronic Information and Electrical Engineering
Keywords
Preprocessing; Machine learning; Sentiment analysis; Word2Vec
DOI
Not available
Abstract
Big data and its related technologies have recently become active areas of research. A huge amount of data is generated every minute, including unstructured data, which is now a topic of particular interest for researchers. Considerable research is under way in the areas of text analytics and text preprocessing. In this paper, we study the impact of different preprocessing steps on the accuracy of three machine learning algorithms for sentiment analysis. We applied different text preprocessing techniques and measured their effect on sentiment classification accuracy using three well-known machine learning classifiers: Naïve Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We calculated the accuracy of each algorithm before and after applying the preprocessing steps. The results showed that the accuracy of the NB algorithm improved significantly after preprocessing, that the accuracy of the SVM algorithm improved slightly, and, interestingly, that the accuracy of the MaxE algorithm did not improve. In this comparative study, the NB algorithm also achieved significantly higher accuracy than the other algorithms after preprocessing, followed by the MaxE and SVM algorithms. This work shows that text preprocessing affects the accuracy of machine learning algorithms, and that the NB algorithm in particular benefits significantly from it.
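The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the preprocessing steps, so lowercasing, URL removal, punctuation stripping, and a small hypothetical stopword list stand in for them; the classifier is a hand-written multinomial Naive Bayes with add-one smoothing, not the authors' implementation; and the four training sentences are toy data, not the paper's corpus. The numbers it produces say nothing about the published results.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; the abstract does not list the steps actually used.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "was", "and", "to", "of"}

def raw_tokens(text):
    """Baseline: whitespace split only, no cleaning."""
    return text.split()

def preprocess(text):
    """Assumed cleaning steps: lowercase, drop URLs and non-letters, remove stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, tokenize):
        self.tokenize = tokenize
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in self.tokenize(doc):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(docs)

# Toy data invented for this sketch only.
train_docs = ["I LOVE this phone!!!", "Great battery, great screen.",
              "Terrible service :(", "I hate the awful camera"]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["love the screen", "awful service"]
test_labels = ["pos", "neg"]

raw_acc = accuracy(NaiveBayes().fit(train_docs, train_labels, raw_tokens),
                   test_docs, test_labels)
clean_acc = accuracy(NaiveBayes().fit(train_docs, train_labels, preprocess),
                     test_docs, test_labels)
print(f"accuracy without preprocessing: {raw_acc:.2f}")
print(f"accuracy with preprocessing:    {clean_acc:.2f}")
```

Passing the tokenizer into `fit` keeps the classifier identical across both runs, so any difference in accuracy comes only from the preprocessing step, which is the comparison the abstract describes.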
Pages: 319 - 335
Page count: 16