Predicting complex user behavior from CDR based social networks

Cited by: 7
Authors
Doyle, Casey [1, 2]
Herga, Zala [3, 4]
Dipple, Stephen [1, 2]
Szymanski, Boleslaw K. [1, 5, 6]
Korniss, Gyorgy [1, 2]
Mladenic, Dunja [3, 4]
Affiliations
[1] Rensselaer Polytech Inst, Network Sci & Technol Ctr, 110 8th St, Troy, NY 12180 USA
[2] Rensselaer Polytech Inst, Dept Phys Appl Phys & Astron, 110 8th St, Troy, NY 12180 USA
[3] Jozef Stefan Inst, Artificial Intelligence Lab, Jamova 39, Ljubljana, Slovenia
[4] Jozef Stefan Int Postgrad Sch, Jamova 39, Ljubljana, Slovenia
[5] Rensselaer Polytech Inst, Dept Comp Sci, 110 8th St, Troy, NY 12180 USA
[6] Wroclaw Univ Sci & Technol, Fac Comp Sci & Management, Wroclaw, Poland
Funding
European Union Horizon 2020;
Keywords
Social networks; Complex behavior prediction; Probability of default; Feature selection; Call detail record dataset; VARIABLE IMPORTANCE; RELATIVE WEIGHT; DEFAULT;
DOI
10.1016/j.ins.2019.05.082
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Call Detail Record (CDR) datasets provide enough information about personal interactions of cell phone service customers to enable building detailed social networks. We take one such dataset and create a realistic social network to predict which customer will default on payments for the phone services, a complex behavior combining social, economic, and legal considerations. After extracting a large feature set from this network, we find that each feature poorly correlates with the default status. Hence, we develop a sophisticated model to enable reliable predictions. Our main contribution is a methodology for building complex behavior models from very large sets of diverse features and using different methods to choose those features that perform best for the final model. This approach enables us to identify the most efficient features for our problem which, unexpectedly, are based on the number of unique users with whom the given user communicates around the Christmas and New Year's Eve holidays. In general, features based on the number of close ties maintained by a user perform better than others. Our resulting models significantly outperform the methods currently published in the literature. The paper contributes also a systematic analysis of properties of the network derived from CDR. (C) 2019 Elsevier Inc. All rights reserved.
Pages: 217-228
Page count: 12
Related papers
33 in total
[1]   Predicting financial trouble using call data-On social capital, phone logs, and financial trouble [J].
Agarwal, Rishav Raj ;
Lin, Chia-Ching ;
Chen, Kuan-Ta ;
Singh, Vivek Kumar .
PLOS ONE, 2018, 13 (02)
[2]  
[Anonymous], 2011, Dynamics of Socio-Economic Systems
[3]  
[Anonymous], 1987, P 2 INT TAMP C STAT
[4]   The dominance analysis approach for comparing predictors in multiple regression [J].
Azen, R ;
Budescu, DV .
PSYCHOLOGICAL METHODS, 2003, 8 (02) :129-148
[5]   A Personal Credit Rating Prediction Model Using Data Mining in Smart Ubiquitous Environments [J].
Bae, Jae Kwon ;
Kim, Jinhwa .
INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2015,
[6]  
Bjorkegren D., 2017, ARXIV171205840
[7]   A survey of results on mobile phone datasets analysis [J].
Blondel, Vincent D. ;
Decuyper, Adeline ;
Krings, Gautier .
EPJ DATA SCIENCE, 2015, 4 (01) :1-55
[8]  
Chawla NV, 2010, DATA MINING AND KNOWLEDGE DISCOVERY HANDBOOK, SECOND EDITION, P875, DOI 10.1007/978-0-387-09823-4_45
[9]   Predicting and Deterring Default with Social Media Information in Peer-to-Peer Lending [J].
Ge, Ruyi ;
Feng, Juan ;
Gu, Bin ;
Zhang, Pengzhu .
JOURNAL OF MANAGEMENT INFORMATION SYSTEMS, 2017, 34 (02) :401-424
[10]  
Jierui Xie, 2011, 2011 IEEE First International Network Science Workshop (NSW 2011), P188, DOI 10.1109/NSW.2011.6004645