Resolving class imbalance and feature selection in customer churn dataset

Cited by: 6
Authors
Hanif, Aamer [1]
Azhar, Noor [1]
Affiliations
[1] Air Univ, Dept Comp Sci, Islamabad, Pakistan
Source
2017 INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT) | 2017
Keywords
customer churn; dataset balancing; dimensionality reduction; feature selection; SUPPORT VECTOR MACHINE; PREDICTION; CLASSIFICATION; SATISFACTION; LOYALTY;
DOI
10.1109/FIT.2017.00022
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Churn prediction datasets in the telecom sector often suffer from class imbalance. Because these datasets have a large number of features, dimensionality reduction (or feature selection) and dataset balancing become important data preprocessing steps. This research uses a real dataset to classify defecting customers in the telecom sector. Three different feature selection and dataset balancing techniques are applied for data preprocessing before the classification model is built. The results show that random oversampling performed better for balancing the dataset, while the three feature selection techniques performed equally well. Customer call-related features emerge as the more important features. The classification model is built using the random forest technique, and model evaluation measures are computed and reported. Conducting experiments on a real dataset that has no customer demographic variables is a significant contribution of this paper.
Pages: 82 - 86
Page count: 5
Related papers
37 in total
  • [21] Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry
    Mozer, MC
    Wolniewicz, R
    Grimes, DB
    Johnson, E
    Kaushansky, H
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 2000, 11 (03) : 690 - 696
  • [22] Defection detection: Measuring and understanding the predictive accuracy of customer churn models
    Neslin, SA
    Gupta, S
    Kamakura, W
    Lu, JX
    Mason, CH
    [J]. JOURNAL OF MARKETING RESEARCH, 2006, 43 (02) : 204 - 211
  • [23] Genetic algorithm based neural network approaches for predicting churn in cellular wireless network services
    Pendharkar, Parag C.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 6714 - 6720
  • [24] Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy
    Peng, HC
    Long, FH
    Ding, C
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2005, 27 (08) : 1226 - 1238
  • [25] R Core Team, 2003, R LANG ENV STAT COMP
  • [26] Rodan A., 2014, INT J INFORM, V17, P3961, DOI 10.1155/2015/473283
  • [27] Song GJ, 2006, ICDM 2006: Sixth IEEE International Conference on Data Mining, Workshops, P798
  • [28] Customer churn prediction by hybrid neural networks
    Tsai, Chih-Fong
    Lu, Yu-Hsin
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (10) : 12547 - 12553
  • [29] A comparison of machine learning techniques for customer churn prediction
    Vafeiadis, T.
    Diamantaras, K. I.
    Sarigiannidis, G.
    Chatzisavvas, K. Ch.
    [J]. SIMULATION MODELLING PRACTICE AND THEORY, 2015, 55 : 1 - 9
  • [30] Building comprehensible customer churn prediction models with advanced rule induction techniques
    Verbeke, Wouter
    Martens, David
    Mues, Christophe
    Baesens, Bart
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (03) : 2354 - 2364