A survey on machine learning methods for churn prediction

Cited by: 24
Authors
Geiler, Louis [1, 2]
Affeldt, Severine [1]
Nadif, Mohamed [1]
Affiliations
[1] Univ Paris, Ctr Borelli, UMR 9010, Paris, France
[2] Brigad, 34 Rue Sentier, F-75002 Paris, France
Keywords
Churn prediction; Machine learning; Ensemble technique; SUPPORT VECTOR MACHINES; CUSTOMER CHURN; IMBALANCED DATA; LOGISTIC-REGRESSION; NEAREST-NEIGHBOR; CLASSIFICATION; SATISFACTION; SELECTION; RETENTION; SERVICES;
DOI
10.1007/s41060-022-00312-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The diversity and specific needs of today's businesses have given rise to a wide range of prediction techniques. In particular, churn prediction is a major economic concern for many companies. The purpose of this study is to draw general guidelines from a benchmark of supervised machine learning techniques, combined with widely used data sampling approaches, on publicly available datasets in the context of churn prediction. Choosing a priori the most appropriate sampling method and the most suitable classification model is not trivial, because the choice depends strongly on the intrinsic characteristics of the data. In this paper, we study the behavior of eleven supervised and semi-supervised learning methods and seven sampling approaches on sixteen diverse and publicly available churn-like datasets. Our evaluations, reported in terms of the Area Under the Curve (AUC) metric, explore the influence of sampling approaches and data characteristics on the performance of the studied learning methods. In addition, we propose the Nemenyi test and Correspondence Analysis as means of comparing and visualizing the association between classification algorithms, sampling methods and datasets. Most importantly, our experiments lead to a practical recommendation for a prediction pipeline based on an ensemble approach. Our proposal can be successfully applied to a wide range of churn-like datasets.
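As a rough illustration of two ingredients named in the abstract (the AUC metric and class rebalancing by sampling), the sketch below computes AUC from its pairwise-ranking definition and balances a dataset by random oversampling. It is not the authors' code: the function names `auc` and `random_oversample` are hypothetical, and random oversampling is only a familiar stand-in, since the abstract does not list which seven sampling approaches were benchmarked.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking identity.

    labels: 0/1 ints (1 = churner); scores: higher means more likely to churn.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match.

    Assumes binary 0/1 labels with at least one row in each class.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority] * deficit


# Toy data (made up for illustration): 4 non-churners, 2 churners.
toy_labels = [0, 0, 0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(auc(toy_labels, toy_scores))  # 0.875

toy_rows = [[1], [2], [3], [4], [5], [6]]
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 4 4
```

In a benchmark of this kind, resampling would normally be applied to the training split only, with AUC measured on an untouched test split, so that duplicated rows do not leak into the evaluation.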
Pages: 217-242
Number of pages: 26