A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees

Cited by: 316
Authors
De Caigny, Arno [1]
Coussement, Kristof [1]
De Bock, Koen W. [2]
Affiliations
[1] Univ Catholique Lille, IESEG Sch Management, LEM, Dept Mkt, UMR CNRS 9221, 3 Rue Digue, F-59000 Lille, France
[2] Audencia Business Sch, 8 Route Joneliere, F-44312 Nantes, France
Keywords
OR in marketing; Hybrid algorithm; Customer churn prediction; Logit leaf model; Predictive analytics; Support vector machines; Feature selection; Rule extraction; Segmentation; Satisfaction; Service; Models; Tests; Classifiers; Performance
DOI
10.1016/j.ejor.2018.02.009
Chinese Library Classification (CLC) number
C93 [Management]
Subject classification codes
12; 1201; 1202; 120202
Abstract
Decision trees and logistic regression are two very popular algorithms in customer churn prediction, with strong predictive performance and good comprehensibility. Despite these strengths, decision trees tend to have problems handling linear relations between variables, and logistic regression has difficulties with interaction effects between variables. Therefore, a new hybrid algorithm, the logit leaf model (LLM), is proposed to better classify data. The idea behind the LLM is that different models constructed on segments of the data, rather than on the entire dataset, lead to better predictive performance while maintaining the comprehensibility of the models constructed in the leaves. The LLM consists of two stages: a segmentation phase and a prediction phase. In the first stage, customer segments are identified using decision rules; in the second stage, a model is created for every leaf of the resulting tree. This new hybrid approach is benchmarked against decision trees, logistic regression, random forests and logistic model trees with regard to predictive performance and comprehensibility. Predictive performance is measured by the area under the receiver operating characteristic curve (AUC) and top decile lift (TDL); on these measures the LLM scores significantly better than its building blocks, logistic regression and decision trees, and performs at least as well as the more advanced ensemble methods random forests and logistic model trees. Comprehensibility is addressed by a case study in which we observe some key benefits of using the LLM compared to using decision trees or logistic regression. (C) 2018 Elsevier B.V. All rights reserved.
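The two-stage idea described in the abstract (segment with a tree, then fit one logistic regression per leaf) can be illustrated with a minimal Python sketch. The abstract does not give the paper's tree-growing or variable-selection settings, so this sketch assumes scikit-learn's CART `DecisionTreeClassifier` for segmentation and a default `LogisticRegression` in each leaf as stand-ins. The class `LogitLeafModelSketch`, the helpers `make_segmented_churn_data` and `top_decile_lift`, and the synthetic data (in which the effect of one variable on churn flips sign between two segments) are hypothetical, chosen only to show the kind of interaction a single logistic regression misses. TDL is taken here as the churn rate among the 10% highest-scored customers divided by the overall churn rate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier


class LogitLeafModelSketch:
    """Hypothetical two-stage sketch: a tree segments customers, then one
    logistic regression is fitted per leaf (stand-in settings, not the paper's)."""

    def __init__(self, max_depth=1, min_samples_leaf=200):
        # Stage 1 model: a shallow CART tree used only to define segments.
        self.tree = DecisionTreeClassifier(
            max_depth=max_depth, min_samples_leaf=min_samples_leaf, random_state=0
        )
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)  # leaf (segment) id for every customer
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            if np.unique(y[mask]).size < 2:
                # Pure leaf: logistic regression cannot be fitted, use the leaf rate.
                self.leaf_models[leaf] = float(y[mask].mean())
            else:
                # Stage 2: a separate logistic regression per segment.
                self.leaf_models[leaf] = LogisticRegression(max_iter=1000).fit(
                    X[mask], y[mask]
                )
        return self

    def predict_churn_proba(self, X):
        # Route each customer to its leaf, then score with that leaf's model.
        leaves = self.tree.apply(X)
        proba = np.empty(len(X))
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            model = self.leaf_models[leaf]
            if isinstance(model, float):
                proba[mask] = model
            else:
                proba[mask] = model.predict_proba(X[mask])[:, 1]
        return proba


def top_decile_lift(y_true, scores):
    """Churn rate in the top 10% of scores divided by the overall churn rate."""
    y_true = np.asarray(y_true)
    n_top = max(1, int(np.ceil(0.1 * len(y_true))))
    top = np.argsort(-np.asarray(scores), kind="stable")[:n_top]
    return y_true[top].mean() / y_true.mean()


def make_segmented_churn_data(n, rng):
    """Hypothetical data: the effect of x1 on churn flips sign across x0 > 0."""
    X = rng.normal(size=(n, 2))
    in_segment_a = X[:, 0] > 0
    logit = np.where(in_segment_a, 2.0 + 3.0 * X[:, 1], -2.0 - 3.0 * X[:, 1])
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_segmented_churn_data(5000, rng)
X_test, y_test = make_segmented_churn_data(5000, rng)

llm = LogitLeafModelSketch().fit(X_train, y_train)
tree = DecisionTreeClassifier(max_depth=1, min_samples_leaf=200, random_state=0)
tree.fit(X_train, y_train)
global_lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

scores = {
    "LLM sketch": llm.predict_churn_proba(X_test),
    "decision tree": tree.predict_proba(X_test)[:, 1],
    "logistic regression": global_lr.predict_proba(X_test)[:, 1],
}
results = {}
for name, p in scores.items():
    results[name] = (roc_auc_score(y_test, p), top_decile_lift(y_test, p))
    print(f"{name:20s} AUC={results[name][0]:.3f} TDL={results[name][1]:.2f}")
```

On data like this, one would expect the per-leaf logistic regressions to recover the opposite slopes that a single global logistic regression averages away, while the depth-1 tree alone cannot rank customers within a segment; the exact figures depend on the random seed and on the stand-in settings.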
Pages: 760-772
Number of pages: 13