Resolving class imbalance and feature selection in customer churn dataset

Cited by: 6
Authors
Hanif, Aamer [1]
Azhar, Noor [1]
Affiliation
[1] Air Univ, Dept Comp Sci, Islamabad, Pakistan
Source
2017 INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT) | 2017
Keywords
customer churn; dataset balancing; dimensionality reduction; feature selection; support vector machine; prediction; classification; satisfaction; loyalty
DOI
10.1109/FIT.2017.00022
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Churn prediction datasets from the telecom sector often suffer from class imbalance. Because these datasets also contain a large number of features, dimensionality reduction (feature selection) and dataset balancing become important data preprocessing steps. This research uses a real dataset to classify defecting customers in the telecom sector. Three different feature selection and dataset balancing techniques are applied during preprocessing, before the classification model is built. The results show that random oversampling balanced the dataset better than the alternatives, and that the three feature selection techniques performed equally well. Customer call-related features were identified as the more important features. The classification model is built using the random forest technique, and model evaluation measures are computed and reported. A significant contribution of this paper is that the experiments are conducted on a real dataset that contains no customer demographic variables.
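The balancing and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It uses only the standard library and is not the authors' implementation: `random_oversample` and `evaluation_measures` are hypothetical helper names, the data are toy rows, and accuracy, precision, recall and F1 are assumed as the evaluation measures because the abstract does not name them. Random oversampling duplicates randomly chosen minority-class rows (sampling with replacement) until every class matches the size of the majority class.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows (with replacement) until every
    class has as many rows as the largest class. Returns new lists; the
    original rows are kept, in their original order, at the front."""
    rng = random.Random(seed)  # seeded so the resampling is reproducible
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of all rows belonging to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


def evaluation_measures(y_true, y_pred, positive=1):
    """Confusion-matrix based measures, treating `positive` (the churner
    label, an assumption) as the class of interest. Zero denominators give 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy data: 8 non-churners (label 0) and 2 churners (label 1)
    X = [[i, i * 2] for i in range(10)]
    y = [0] * 8 + [1] * 2
    X_bal, y_bal = random_oversample(X, y, seed=42)
    print(Counter(y_bal))  # class counts after balancing
    print(evaluation_measures([1, 1, 0, 0], [1, 0, 0, 1]))
```

In practice, oversampling is applied only to the training split: duplicating rows before the train/test split lets copies of the same customer appear in both sets and inflates the evaluation measures.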
Pages: 82-86
Number of pages: 5