Customer churn prediction in telecom using machine learning in big data platform

Cited by: 138
Authors
Ahmad, Abdelrahim Kasem [1]
Jafar, Assef [1]
Aljoumaa, Kadan [1]
Affiliations
[1] Higher Inst Appl Sci & Technol, Fac Informat Technol, Damascus, Syria
Keywords
Customer churn prediction; Churn in telecom; Machine learning; Feature selection; Classification; Mobile Social Network Analysis; Big data; Class imbalance
DOI
10.1186/s40537-019-0191-6
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Customer churn is a major problem and one of the most important concerns for large companies. Because churn directly affects revenue, especially in the telecom sector, companies are seeking ways to predict which customers are likely to churn. Identifying the factors that increase churn is therefore important for taking the actions needed to reduce it. The main contribution of our work is a churn prediction model that helps telecom operators identify the customers most likely to churn. The model applies machine learning techniques on a big data platform and introduces a new approach to feature engineering and selection. Model performance is measured with the standard Area Under Curve (AUC) metric, and the AUC value obtained is 93.3%. A second main contribution is the use of the customer social network in the prediction model, through extracted Social Network Analysis (SNA) features. Adding SNA features improved the model's AUC from 84% to 93.3%. The model was built and tested in a Spark environment on a large dataset created by transforming big raw data provided by the SyriaTel telecom company. The dataset contained all customers' information over 9 months and was used to train, test, and evaluate the system at SyriaTel. Four algorithms were evaluated: Decision Tree, Random Forest, Gradient Boosted Machine Tree (GBM), and Extreme Gradient Boosting (XGBOOST). The best results were obtained with XGBOOST, which was therefore used for classification in the churn prediction model.
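As an illustration of the AUC metric the abstract reports, the following minimal sketch computes AUC as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. It is not the authors' evaluation code: the function name `roc_auc`, the toy labels, and the toy scores are all hypothetical, and the pairwise loop is chosen for readability rather than for the scale of a telecom dataset.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly.

    labels: 1 for a customer who churned, 0 for one who stayed.
    scores: model outputs, higher meaning "more likely to churn".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the SyriaTel dataset.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy example, 8 of the 9 churner/non-churner pairs are ranked correctly, so the AUC is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.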
Pages: 24