An empirical comparison of techniques for the class imbalance problem in churn prediction

Cited by: 106
Authors
Zhu, Bing [1, 2]
Baesens, Bart [2, 3]
vanden Broucke, Seppe K. L. M. [2]
Affiliations
[1] Sichuan University, Business School, Chengdu 610064, People's Republic of China
[2] Katholieke Universiteit Leuven, Department of Decision Sciences and Information Management, B-3000 Leuven, Belgium
[3] University of Southampton, School of Management, Southampton SO17 1BJ, Hampshire, England
Funding
National Natural Science Foundation of China
Keywords
Churn prediction; Class imbalance; Benchmark experiment; Expected maximum profit measure; Customer churn; Classification; Framework; Machine
DOI
10.1016/j.ins.2017.04.015
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Class imbalance poses significant challenges for customer churn prediction, and many solutions have been developed to address it. In this paper, we comprehensively compare the performance of state-of-the-art techniques for dealing with class imbalance in the context of churn prediction. A recently developed expected maximum profit criterion is used as one of the main performance measures to offer more insight from a cost-benefit perspective. The experimental results show that the choice of evaluation metric has a great impact on how the techniques perform. An in-depth exploration of how the techniques react to different measures is conducted through intra-family comparisons within each solution group and a global comparison among representative techniques from different groups. The results also indicate that there is considerable room to improve the solutions' performance in terms of the profit-based measure. Our study offers valuable insights for academics and professionals, and it provides a baseline for developing new methods to deal with class imbalance in churn prediction. © 2017 Elsevier Inc. All rights reserved.
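The expected maximum profit (EMP) criterion mentioned above can be illustrated with a minimal Python sketch. It assumes the churn profit model of Verbraken et al. (2013): contacting a churner yields an expected profit of γ(CLV − d) − f, where γ is the probability that the churner accepts a retention incentive of cost d and f is the contact cost, while contacting a non-churner costs d + f (the non-churner is assumed to accept the incentive). Because γ is uncertain, it is modelled as Beta(α, β) and the maximum profit is averaged over it. The defaults (CLV = 200, d = 10, f = 1, α = 6, β = 14) are assumed values often quoted for this measure, not settings taken from this abstract; the function names are hypothetical; and the integral over γ is approximated with a simple midpoint rule rather than a closed-form computation.

```python
import math
from typing import List, Sequence, Tuple


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of the Beta(a, b) distribution at a point x strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def cutoff_counts(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[int, int]]:
    """(churners, non-churners) contacted at every cut-off, targeting customers
    from the highest score down; tied scores are never split. Label 1 = churner."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    counts = [(0, 0)]  # cut-off above every score: contact nobody
    churners = nonchurners = 0
    for k, (score, label) in enumerate(pairs):
        if label == 1:
            churners += 1
        else:
            nonchurners += 1
        # record a cut-off only where the next score differs (or at the end)
        if k == len(pairs) - 1 or pairs[k + 1][0] != score:
            counts.append((churners, nonchurners))
    return counts


def expected_max_profit(scores: Sequence[float], labels: Sequence[int],
                        clv: float = 200.0, d: float = 10.0, f: float = 1.0,
                        alpha: float = 6.0, beta: float = 14.0,
                        steps: int = 2000) -> float:
    """Average profit per customer at the profit-maximising cut-off,
    averaged over an uncertain acceptance rate gamma ~ Beta(alpha, beta)."""
    n = len(scores)
    counts = cutoff_counts(scores, labels)
    width = 1.0 / steps
    emp = 0.0
    for j in range(steps):
        gamma = (j + 0.5) * width       # midpoint rule over gamma in (0, 1)
        gain = gamma * (clv - d) - f    # expected profit of contacting a churner
        loss = d + f                    # cost of contacting a non-churner
        # profit is linear in the counts, so the best empirical cut-off is the maximum
        best = max(c * gain - nc * loss for c, nc in counts)
        emp += (best / n) * beta_pdf(gamma, alpha, beta) * width
    return emp
```

For instance, with these assumed defaults a scorer that ranks the only churner among four customers first, `expected_max_profit([0.9, 0.1, 0.2, 0.3], [1, 0, 0, 0])`, obtains an EMP of about 14 per customer, whereas ranking that churner last gives a lower value. Since profit is linear in the numbers of contacted churners and non-churners, checking every distinct cut-off of the empirical ranking is enough to find the maximum profit for each γ.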
Pages: 84-99
Number of pages: 16