Semi-supervised ensemble based on metacost model for customer churn prediction

Cited by: 0
Authors
Xiao J. [1, 2]
Li S. [1]
He X. [1, 2]
Teng G. [1, 2]
Jia P. [3]
Xie L. [4]
Affiliations
[1] Business School, Sichuan University, Chengdu
[2] Management Science and Operations Research Institute, Sichuan University, Chengdu
[3] Beijing Research Center Science of Science, Beijing
[4] School of Medical Information Engineering, Zunyi Medical University, Zunyi
Keywords
Co-training; Cost-sensitive; Customer churn prediction; Imbalanced class distribution; Semi-supervised
DOI
10.12011/SETP2019-2879
Abstract
Customer churn prediction is an important part of customer relationship management (CRM). In many real-world churn prediction tasks, the class distribution is highly imbalanced, which degrades model performance and makes satisfactory results hard to achieve. At the same time, only a small number of samples are labeled while most are unlabeled, so a great deal of useful information goes unused. To address both problems, this study combines meta cost-sensitive learning, semi-supervised learning, and the Bagging ensemble method, and proposes a semi-supervised ensemble based on metacost model (SSEM) for customer churn prediction. The model has three stages: 1) the MetaCost method is used to modify the labels of the initial labeled training set L, giving a new training set Lm, which is then randomly divided into a model training set Ltr and a model verification set Va; 2) Va is used to select the three base classifiers with the highest classification accuracy, and these classifiers cooperate to selectively label some samples from the unlabeled data set U, which are added to Ltr; 3) N base classifiers are trained on the new model training set Ltr and used to classify the samples in the test set, and the final classification results are obtained by integrating their outputs. An empirical analysis on two customer churn prediction datasets shows that SSEM outperforms commonly used supervised ensemble models and semi-supervised ensemble models. © 2021, Editorial Board of Journal of Systems Engineering Society of China. All rights reserved.
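The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base classifier, the cost matrix, the split ratio, or the rule by which the three classifiers "cooperate", so the nearest-centroid learner, the 70/30 split, the single labeling pass, the unanimous-agreement rule, and all function names below are assumptions made for illustration. Class probabilities in the MetaCost step are approximated by bagging vote fractions.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}

def predict_one(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

def bagging(X, y, n, rng):
    """Train n base learners, each on a bootstrap resample of (X, y)."""
    models = []
    for _ in range(n):
        idx = [rng.randrange(len(X)) for _ in X]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def vote(models, x):
    """Majority vote of the ensemble for one sample."""
    return Counter(predict_one(m, x) for m in models).most_common(1)[0][0]

def min_cost_label(prob, cost):
    """Label i minimising sum_j P(j|x) * cost[i][j].
    cost[i][j] is the cost of predicting i when the true class is j."""
    classes = sorted(cost)
    return min(classes, key=lambda i: sum(prob[j] * cost[i][j] for j in classes))

def metacost_relabel(X, y, cost, n, rng):
    """Stage 1: estimate P(j|x) by vote fractions, then relabel each sample."""
    models = bagging(X, y, n, rng)
    relabeled = []
    for x in X:
        votes = Counter(predict_one(m, x) for m in models)
        prob = {j: votes[j] / len(models) for j in cost}
        relabeled.append(min_cost_label(prob, cost))
    return relabeled

def ssem(L_X, L_y, U, cost, n=11, seed=0):
    """Sketch of the three stages; returns the final list of n base models."""
    rng = random.Random(seed)
    # Stage 1: relabel L into Lm, then split Lm into Ltr and Va (assumed 70/30).
    Lm_y = metacost_relabel(L_X, L_y, cost, n, rng)
    order = list(range(len(L_X)))
    rng.shuffle(order)
    cut = int(0.7 * len(order))
    tr, va = order[:cut], order[cut:]
    Ltr_X = [L_X[i] for i in tr]
    Ltr_y = [Lm_y[i] for i in tr]
    # Stage 2: keep the three models most accurate on Va; they pseudo-label U.
    models = bagging(Ltr_X, Ltr_y, n, rng)
    def hits(m):
        return sum(predict_one(m, L_X[i]) == Lm_y[i] for i in va)
    top3 = sorted(models, key=hits, reverse=True)[:3]
    for x in U:
        preds = {predict_one(m, x) for m in top3}
        if len(preds) == 1:  # assumed rule: add x only on unanimous agreement
            Ltr_X.append(x)
            Ltr_y.append(preds.pop())
    # Stage 3: train N base classifiers on the enlarged Ltr.
    return bagging(Ltr_X, Ltr_y, n, rng)
```

Calling `vote(ssem(L_X, L_y, U, cost), x)` on a test sample `x` gives the integrated prediction. With a cost matrix that penalises a missed churner more heavily than a false alarm, `min_cost_label` shifts borderline samples toward the churn class, which is how the relabeling step is meant to offset class imbalance.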
Pages: 188-199
Page count: 11