Weakening Feature Independence of Naive Bayes Using Feature Weighting and Selection on Imbalanced Customer Review Data

Citations: 0
Authors
Cahya, Reiza Adi [1]
Bachtiar, Fitra A. [1]
Affiliations
[1] Brawijaya Univ, Fac Comp Sci, Malang, Indonesia
Source
2019 5TH INTERNATIONAL CONFERENCE ON SCIENCE IN INFORMATION TECHNOLOGY (ICSITECH): EMBRACING INDUSTRY 4.0 - TOWARDS INNOVATION IN CYBER PHYSICAL SYSTEM | 2019
Keywords
sentiment analysis; genetic algorithm; imbalanced data; naive Bayes; feature selection; feature weighting; CONSTRUCTION; ALGORITHM;
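DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract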
E-commerce sites provide review sections where users can express their opinions about products or services. Decision makers, on the other hand, can use the abundant reviews to determine which aspects of products or services should be improved, a task known as sentiment analysis. Naive Bayes (NB) is a popular method for sentiment analysis because it is considerably faster than other methods while offering comparable performance. One weakness of NB, however, is that it assumes each feature is independent of the other features. This assumption does not hold in sentiment analysis because terms are correlated with one another. Two approaches, feature weighting (FW) and feature selection (FS), are used to weaken this assumption. Both approaches use a genetic algorithm (GA) to find optimal weights and feature subsets based on correlation and odds ratio, so that the imbalance in the review data is taken into account. Experiments on the Women Ecommerce Clothing Review dataset show that the FW approach yields results comparable to non-weighted NB, while FS yields worse results than NB. It can be concluded that the proposed FW and FS schemes do not improve on standard NB.
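The abstract does not give the exact weighting formula, so the following is only a minimal sketch of one common way to weight features in multinomial naive Bayes: each term's log-likelihood is multiplied by a per-term weight. A weight of 1 for every term reduces to standard NB, and weights restricted to 0 or 1 amount to feature selection. The function names (`train_nb`, `predict`), the toy reviews, and the hand-set weights are assumptions made for illustration; in the paper the weights and subsets are searched by a genetic algorithm, which is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Estimate log priors and Laplace-smoothed term log-likelihoods."""
    vocab = sorted({t for d in docs for t in d})
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        term_counts[y].update(d)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        # Add-one smoothing over the whole vocabulary.
        total = sum(term_counts[c].values()) + len(vocab)
        log_like[c] = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}
    return log_prior, log_like


def predict(doc, log_prior, log_like, weights=None):
    """Score each class as log P(c) + sum_t w_t * log P(t | c).

    Terms missing from `weights` default to 1.0 (standard NB);
    a weight of 0.0 removes the term, i.e. feature selection.
    """
    weights = weights or {}
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for t in doc:
            if t in log_like[c]:  # ignore out-of-vocabulary terms
                s += weights.get(t, 1.0) * log_like[c][t]
        scores[c] = s
    return max(scores, key=scores.get)


# Hypothetical, deliberately imbalanced toy reviews (3 positive, 1 negative).
docs = [["love", "fit"], ["great", "fit"], ["love", "great"], ["poor", "fit"]]
labels = ["pos", "pos", "pos", "neg"]
log_prior, log_like = train_nb(docs, labels)

print(predict(["poor"], log_prior, log_like))                  # neg
print(predict(["poor"], log_prior, log_like, {"poor": 0.0}))   # pos
```

With the term "poor" removed (weight 0), only the class prior remains, so the majority class wins. This is one way class imbalance can interact with feature selection.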
Pages: 182-187
Number of pages: 6