Resolving class imbalance and feature selection in customer churn dataset

Cited by: 6

Authors:

Hanif, Aamer [1]

Azhar, Noor [1]

Affiliation:

[1] Air Univ, Dept Comp Sci, Islamabad, Pakistan

Source:

2017 INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT) | 2017

Keywords:

customer churn; dataset balancing; dimensionality reduction; feature selection; SUPPORT VECTOR MACHINE; PREDICTION; CLASSIFICATION; SATISFACTION; LOYALTY;

DOI:

10.1109/FIT.2017.00022

Chinese Library Classification number:

TP39 [Computer applications];

Subject classification codes:

081203 ; 0835 ;

Abstract:

Churn prediction datasets from the telecom sector often suffer from class imbalance. Because such datasets also contain a large number of features, dimensionality reduction (or feature selection) and dataset balancing become important data preprocessing steps. This research uses a real dataset to classify defecting customers in the telecom sector. Three different feature selection and dataset balancing techniques are applied for data preprocessing before the classification model is built. The results show that random oversampling performed better at balancing the dataset, while the three feature selection techniques performed equally well. Customer call-related features are identified as the more important features. The classification model is built using the random forest technique, and model evaluation measures are computed and reported. A significant contribution of this paper is that the experiments are conducted on a real dataset that does not contain any customer demographic variables.
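
As a rough illustration of the random oversampling step named in the abstract, the following Python sketch duplicates randomly chosen minority-class rows until every class matches the majority count. It is not the authors' code: the function name random_oversample, the seed parameter, and the toy data are all assumptions made for this example, and the paper's actual preprocessing may differ.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Return copies of rows/labels with minority classes randomly
    duplicated (sampling with replacement) up to the majority count.
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw the missing number of rows with replacement
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (made up): 8 non-churners (label 0) and 2 churners (label 1)
rows = [[i, i * 2] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, oversampling of this kind is usually applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.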


Pages: 82-86

Page count: 5

