Telecom Fraud Detection Based on Feature Binning and Autoencoder

Cited by: 2
Authors
Liang, Fei-Yao [1, 3]
Li, Fei-Peng [2, 3]
Xu, Rong-Hai [1, 3]
Cheng, Wei [2, 3]
Deng, Shi-Xian [2, 3]
Yang, Zhe-Rui [1, 3]
Wang, Chang-Dong [1, 3]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Guangdong Unicom, Guangzhou, Peoples R China
[3] Sun Yat Sen Univ Guangdong Unicom Lab Comp Power, Guangzhou, Guangdong, Peoples R China
Source
23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023 | 2023
Keywords
telecom fraud detection; feature binning; autoencoder; imbalance classification;
DOI
10.1109/ICDM58522.2023.00046
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of modern communication technology, telecom fraud has been increasing year by year. Accurately identifying fraudsters before they carry out their scams not only protects people from potential losses but also increases trust in telecom operators. Therefore, in recent years, telecom fraud detection has garnered widespread attention in both academia and industry. Although existing methods for telecom fraud detection have achieved good performance, many issues remain unresolved for real-world telecom operators. First, existing methods focus on only a single telecom scenario, while real-world telecom scenarios are diverse. Utilizing the characteristics of these different telecom scenarios can improve the effectiveness of telecom fraud detection. Second, existing methods usually use Graph Neural Networks (GNNs) to aggregate neighbor information. However, real-world telecom operators cannot obtain information about users of other operators, so destination node attributes are missing, which degrades the performance of GNNs. To address these issues, in this paper we propose a new model for Telecom Fraud Detection Based on Feature Binning and Autoencoder (TFD-FA). In TFD-FA, a feature binning framework is designed to partition users into different telecom scenarios in order to reflect their unique characteristics. An autoencoder component is also designed to aggregate neighbor information. Furthermore, an imbalance classifier component is constructed to address the problem that fraudsters are far fewer than normal users. Extensive experiments on a real-world dataset demonstrate the effectiveness of TFD-FA, which outperforms the compared baseline models.
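The abstract does not say how the binning or the imbalance handling is implemented, so the following is only an illustrative sketch of the general ideas, not the authors' method. It assumes equal-frequency (quantile) binning on a single hypothetical feature, `call_counts`, to assign users to scenarios, and inverse-frequency class weights as one common way to compensate for the small number of fraudsters. All function names and the toy data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def quantile_edges(values, n_bins):
    """Return n_bins - 1 cut points that split the values into roughly equal-sized bins."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def assign_bin(value, edges):
    """Map a value to a bin index; values equal to a cut point go to the upper bin."""
    return bisect_right(edges, value)


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Hypothetical toy data: outgoing call counts per user, and labels (1 = fraudster).
call_counts = [3, 5, 8, 12, 40, 45, 60, 200, 220, 300]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Assumption: three bins stand in for three "telecom scenarios".
edges = quantile_edges(call_counts, 3)
scenarios = [assign_bin(v, edges) for v in call_counts]
weights = balanced_class_weights(labels)

print(edges)      # cut points between scenarios
print(scenarios)  # scenario index per user
print(weights)    # per-class weights for a weighted loss
```

In a setup like this, each scenario could get its own downstream model or scenario-specific features, and the class weights would be passed to whatever classifier is used so that errors on the rare fraud class cost more.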
Pages: 368-377
Page count: 10