A Sampling-Based Stack Framework for Imbalanced Learning in Churn Prediction

Cited by: 8
Authors
De, Soumi [1]
Prabu, P. [2]
Affiliations
[1] CHRIST Deemed Univ, Dept Data Sci, Bengaluru 560029, India
[2] CHRIST Deemed Univ, Dept Comp Sci, Bengaluru 560029, India
Keywords
Stacking; Training; Prediction algorithms; Support vector machines; Classification algorithms; Licenses; Companies; Churn prediction; ensemble classifiers; over-sampling; under-sampling; ensemble stack
DOI
10.1109/ACCESS.2022.3185227
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Churn prediction is gaining popularity in the research community as a powerful paradigm that supports data-driven operational decisions. Datasets related to churn prediction are often skewed, with an imbalanced class distribution. Data-level solutions, such as over-sampling and under-sampling, have commonly been used by researchers to address this problem. Only a limited number of case studies attempt to evolve these data-level solutions by integrating them with computationally advanced frameworks such as ensembles. Ensembles primarily employ algorithmic diversity on a fixed set of training instances to achieve superior performance. This study aims to introduce algorithmic diversity in ensembles by modifying the fixed set of training instances with diverse sampling strategies, in order to increase predictive performance in imbalanced learning. Data is acquired from the world's largest open hotel commerce platform company. A four-part series of experiments is conducted to analyze the effect of sampling techniques and ensemble solutions on model performance. A new sampling-based stack framework called "Stacking of Samplers for Imbalanced Learning" is proposed. The framework combines the prediction capabilities of sampling solutions to stimulate the information gain of the meta features in the ensemble. The proposed framework is observed to improve model performance, with an AUC of 86.4% and a top-decile lift of 4.7 for customers of the hotel technology provider. Additionally, results show that the framework records a higher information gain for the meta features used in a stack than commonly used stack frameworks do.
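The abstract names over-sampling, under-sampling, and top-decile lift without defining them. The sketch below illustrates the simplest random variants of the two sampling strategies and one common definition of top-decile lift (churn rate among the 10% highest-scored customers divided by the overall churn rate). It is an assumption-laden illustration, not the authors' implementation: the function names are hypothetical, the abstract does not say which samplers the framework stacks, and the lift definition used in the paper may differ in detail.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    match the size of the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class
    (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def top_decile_lift(y_true, scores):
    """Churn rate in the top 10% of scores divided by the overall churn rate.
    Labels are assumed to be 0 (stays) or 1 (churns)."""
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("top-decile lift is undefined without any churners")
    n_top = max(1, len(scores) // 10)
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    top_rate = sum(lbl for _, lbl in ranked[:n_top]) / n_top
    return top_rate / base_rate


# Toy data: 20 customers, 2 churners (label 1), one feature each.
X = [[i] for i in range(20)]
y = [1, 1] + [0] * 18
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
```

In a stack of the kind the abstract describes, each differently sampled training set would presumably feed its own base learner, and the base learners' predictions would become the meta features of the second-level model. That wiring is left out here because the abstract does not specify it.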
Pages: 68017-68028
Number of pages: 12