Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews

Cited by: 7
Authors
Chatterjee, Ishani [1]
Zhou, Mengchu [1,2,3]
Abusorrah, Abdullah [2,3]
Sedraoui, Khaled [2,3]
Alabdulwahab, Ahmed [2,3]
Affiliations
[1] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
[2] King Abdulaziz Univ, Dept Elect & Comp Engn, Fac Engn, Jeddah 21481, Saudi Arabia
[3] King Abdulaziz Univ, Ctr Res Excellence Renewable Energy & Power Syst, Jeddah 21481, Saudi Arabia
Keywords
sentiment analysis; interquartile range; TextBlob; natural language processing; outlier detection; data scraping; J-shaped distribution; imbalanced dataset; big data analytics; SENTIMENT ANALYSIS; ANOMALY DETECTION; SYSTEMS
DOI
10.3390/e23121645
Chinese Library Classification number
O4 [Physics]
Subject classification code
0702
Abstract
People nowadays use the internet to express their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites are a rich source of data for data analytics, sentiment analysis, natural language processing, and related tasks. Conventionally, the true sentiment of a customer review matches its star rating. There are exceptions, however, in which a review's star rating contradicts its true sentiment; this work labels such reviews as outliers in a dataset. State-of-the-art anomaly detection methods rely on manual searching, predefined rules, or traditional machine learning techniques to detect such instances. This paper conducts a sentiment analysis and outlier detection case study for Amazon customer reviews and proposes a statistics-based outlier detection and correction method (SODCM), which identifies such reviews and rectifies their star ratings to improve the performance of a sentiment analysis algorithm without any data loss. SODCM is applied to datasets containing customer reviews of various products, which are (a) scraped from Amazon.com and (b) publicly available. The paper also examines the datasets and evaluates the effect of SODCM on the performance of a sentiment analysis algorithm. The results show that SODCM achieves higher accuracy and recall than other state-of-the-art anomaly detection algorithms.
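The abstract and keywords point to an interquartile-range (IQR) test on review sentiment combined with star-rating correction. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' SODCM. It assumes each review already carries a polarity score in [-1, 1] (for example, one obtained from TextBlob's `TextBlob(text).sentiment.polarity`), flags reviews whose polarity lies outside Tukey's fences (Q1 - 1.5 × IQR, Q3 + 1.5 × IQR) within their own star-rating group, and replaces each flagged rating using a hypothetical linear mapping `polarity_to_stars`. The grouping by star rating, the multiplier `k`, the mapping, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return Tukey's (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def polarity_to_stars(polarity):
    """Hypothetical rule: map a polarity in [-1, 1] linearly onto 1-5 stars."""
    return max(1, min(5, round(3 + 2 * polarity)))


def detect_and_correct(reviews, k=1.5):
    """Flag reviews whose polarity is an IQR outlier within their star group.

    reviews: list of (star_rating, polarity) pairs.
    Returns (corrected_ratings, outlier_indices); no review is dropped.
    """
    groups = defaultdict(list)  # star rating -> indices of reviews with that rating
    for idx, (stars, _) in enumerate(reviews):
        groups[stars].append(idx)

    corrected = [stars for stars, _ in reviews]
    outliers = []
    for idxs in groups.values():
        if len(idxs) < 2:
            continue  # too few reviews to estimate quartiles; leave unchanged
        lower, upper = iqr_fences([reviews[i][1] for i in idxs], k)
        for i in idxs:
            polarity = reviews[i][1]
            if polarity < lower or polarity > upper:
                outliers.append(i)
                corrected[i] = polarity_to_stars(polarity)
    return corrected, sorted(outliers)


# Toy data invented for illustration: (star rating, polarity score).
sample = [
    (5, 0.8), (5, 0.7), (5, 0.9), (5, 0.75), (5, 0.85), (5, -0.6),
    (1, -0.7), (1, -0.8), (1, -0.65), (1, -0.9), (1, 0.8),
    (3, 0.0),
]

corrected, outliers = detect_and_correct(sample)
print(outliers)   # → [5, 10]
print(corrected)  # → [5, 5, 5, 5, 5, 2, 1, 1, 1, 1, 5, 3]
```

Judging each review only against reviews with the same star rating lets a strongly negative five-star review (index 5) or a strongly positive one-star review (index 10) stand out, and relabeling flagged rows rather than deleting them keeps the dataset size unchanged, in line with the abstract's "without any data loss" goal.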
Pages: 24