Sampling method based on improved C4.5 decision tree and its application in prediction of telecom customer churn

Cited by: 0
Authors
Deng W. [1 ]
Deng L. [1 ]
Liu J. [1 ]
Qi J. [2 ]
Affiliations
[1] School of Management and Economics, Chongqing University of Posts and Telecommunications, Nan'an District, Chongqing
[2] China Telecom Co., Ltd. Hefei Branch, 255, Changjiang West Road, Hefei
Source
International Journal of Information Technology and Management | 2019 / Vol. 18 / No. 1
Keywords
Data mining; Decision tree; Imbalanced data; Over-sampling; Telecom customer churn; Under-sampling
DOI
10.1504/IJITM.2019.097887
Abstract
Customer churn prediction is important for telecom operators seeking to reduce churn rates and remain competitive. However, the imbalance between retained customers and churners degrades prediction accuracy. To address this problem, a new sampling method based on an improved C4.5 decision tree is proposed. First, an initial weight is assigned to each sample according to the size of its class. The sample weights are then adjusted over several rounds of alternating training with the improved C4.5 decision tree algorithm, whose splitting criterion considers both the gain ratio and the misclassification cost. Next, the boundary minority examples and the centre majority examples are identified according to their weights. Finally, over-sampling is applied to the boundary minority examples using the synthetic minority over-sampling technique (SMOTE), and under-sampling is applied to the majority examples. Experiments on UCI public data and telecom operator data show the efficiency of the new method. Copyright © 2019 Inderscience Enterprises Ltd.
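The sampling steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the improved C4.5 training and the weight-adjustment rounds are not reproduced, the inverse-class-size weighting in `initial_weights` is only one assumed reading of "according to the data scale of each class", and `undersample` uses plain random selection as a stand-in. All function names are hypothetical. The `smote` function follows the standard interpolation idea of SMOTE: a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def initial_weights(labels):
    """Assumed scheme: weight each sample inversely to its class size,
    so that every class carries the same total weight."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[y]) for y in labels]


def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def undersample(majority, n_keep, rng=None):
    """Stand-in for the under-sampling step: keep a random subset."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)
```

In the method described above, the points passed to `smote` would be the boundary minority examples selected by their weights; here the caller is assumed to supply that subset.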
Pages: 93-109
Number of pages: 16
Related papers
29 items in total
[1] Blaszczynski J., Stefanowski J., Neighbourhood sampling in bagging for imbalanced data, Neurocomputing, 150, Part B, pp. 529-542, (2015)
[2] De Bock K.W., Van den Poel D., An empirical evaluation of rotation-based ensemble classifiers for customer churn prediction, Expert Systems with Applications, 38, 10, pp. 12293-12301, (2011)
[3] Breiman L., Bagging predictors, Machine Learning, 24, 2, pp. 123-140, (1996)
[4] Chawla N.V., Bowyer K.W., Hall L.O., Kegelmeyer W.P., SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, 1, pp. 321-357, (2002)
[5] Chen J., Wang G.Y., Positive domain reduction based on dominance relation in inconsistent system, Chinese Computer Science, 35, 3, pp. 216-218, (2008)
[6] Fan W., Stolfo S.J., Zhang J., Chan P.K., AdaCost: Misclassification cost-sensitive boosting, Proc. of the 16th Int. Conf. on Machine Learning, (1999)
[7] Freund Y., Schapire R.E., A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55, 1, pp. 119-139, (1997)
[8] Hashmi N., Butt N.A., Iqbal M., Customer churn prediction in telecommunication: A decade review and classification, International Journal of Computer Science Issues, 10, 2, pp. 271-282, (2013)
[9] Hido S., Kashima H., Takahashi Y., Roughly balanced bagging for imbalanced data, Statistical Analysis and Data Mining: The ASA Data Science Journal, 2, 5-6, pp. 412-426, (2009)
[10] Idris A., Khan A., Lee Y.S., Intelligent churn prediction in telecom: Employing mRMR feature selection and RotBoost based ensemble classification, Applied Intelligence, 39, 3, pp. 659-672, (2013)