Anomalous Telecom Customer Behavior Detection and Clustering Analysis Based on ISPs Operating Data

Cited by: 8
Authors
Zheng, Feng [1, 2]
Liu, Quanyun [1, 2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing Key Lab Network Syst Architecture & Conve, Beijing 100876, Peoples R China
[2] Beijing Univ Posts & Telecommun, Beijing Lab Adv Informat Networks, Beijing 100876, Peoples R China
Keywords
Telecommunications; Clustering algorithms; Dimensionality reduction; Anomaly detection; Mobile handsets; Approximation algorithms; clustering analysis; behavior analysis; telecom operators; dimension reduction; COMPONENT ANALYSIS; DIMENSIONALITY;
DOI
10.1109/ACCESS.2020.2976898
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Mobile networks and smartphones have become ubiquitous in daily life. Large amounts of customer-related telecom data are generated from various sources every day, and they reveal diverse behavior patterns, including some anomalous behaviors that are malicious. Efficient and effective customer behavior analysis based on telecom big data is therefore increasingly important. This paper proposes the Multi-faceted Telecom Customer Behavior Analysis (MTCBA) framework for anomalous telecom customer behavior detection and clustering analysis. Within this framework, we design the hierarchical Locality Sensitive Hashing-Local Outlier Factor (hierarchical LSH-LOF) scheme for suspicious customer detection, and the Autoencoders with Factorization Machines (FM-AE) structure for dimension reduction, which makes clustering more efficient. Hierarchical LSH-LOF improves on LOF with a hierarchical LSH process that selects the approximate k nearest neighbors from coarse to fine by gradually narrowing the search scope. Experiments show its superiority over KD-tree in search speed. FM-AE exploits Factorization Machines to learn second-order feature interactions, which we show to be useful through comparative experiments with five dimension reduction algorithms. With the proposed MTCBA framework, efficient and effective telecom customer behavior analysis, including anomalous customer behavior detection and clustering analysis, is performed on real-world telecom operating data provided by one of the major Internet service providers (ISPs) in China. The analysis yields six interpretable clusters, which provide valuable information for telecom operators' precision marketing, crime fighting, and social credit system construction.
Pages: 42734-42748
Page count: 15