Telecom Fraud Detection Based on Feature Binning and Autoencoder

Cited by: 2

Authors:

Liang, Fei-Yao [1,3]

Li, Fei-Peng [2,3]

Xu, Rong-Hai [1,3]

Cheng, Wei [2,3]

Deng, Shi-Xian [2,3]

Yang, Zhe-Rui [1,3]

Wang, Chang-Dong [1,3]

Affiliations:

[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China

[2] Guangdong Unicom, Guangzhou, Peoples R China

[3] Sun Yat Sen Univ Guangdong Unicom Lab Comp Power, Guangzhou, Guangdong, Peoples R China

Source:

23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023 | 2023

Keywords:

telecom fraud detection; feature binning; autoencoder; imbalance classification;

DOI:

10.1109/ICDM58522.2023.00046

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid development of modern communication technology, telecom fraud has been increasing year by year. If fraudsters can be accurately identified before they carry out their scams, this can not only protect people from potential losses but also increase trust in telecom operators. Therefore, in recent years, telecom fraud detection has garnered widespread attention in both academia and industry. Although existing methods for telecom fraud detection have achieved good performance, many issues remain unresolved for real-world telecom operators. First, existing methods focus on only a single telecom scenario, while real-world telecom scenarios are diverse. Utilizing the characteristics of these different telecom scenarios can improve the effectiveness of telecom fraud detection. Second, existing methods usually use Graph Neural Networks (GNNs) to aggregate neighbor information. However, real-world telecom operators cannot obtain information about users of other operators, so destination node attributes are missing, which degrades the performance of GNNs. To address these issues, in this paper, we propose a new model for Telecom Fraud Detection Based on Feature Binning and Autoencoder (TFD-FA). In TFD-FA, a feature binning framework is designed to partition users into different telecom scenarios in order to reflect their unique characteristics. An autoencoder component is also designed to aggregate neighbor information. Furthermore, an imbalance classifier component is constructed to address the fact that fraudsters are far fewer than normal users. Extensive experiments on a real-world dataset demonstrate the effectiveness of TFD-FA, which outperforms the compared baseline models.
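As a rough illustration of two ideas named in the abstract, the sketch below bins users by a single feature and derives class weights for an imbalanced label set. It is not the TFD-FA method: the abstract does not specify the binning scheme or the classifier, so equal-frequency (quantile) binning and inverse-frequency weighting are assumptions chosen for simplicity, and the feature `monthly_calls`, the toy data, and all function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def quantile_edges(values, n_bins):
    """Return n_bins - 1 interior cut points at equal-frequency positions."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]


def assign_bin(value, edges):
    """Return the index of the bin that value falls into (0 .. len(edges))."""
    return bisect_right(edges, value)


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


# Hypothetical toy data: one feature per user and a fraud label (1 = fraudster).
monthly_calls = [3, 8, 15, 40, 42, 90, 120, 300]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

# Assumption: each bin stands in for one "telecom scenario".
edges = quantile_edges(monthly_calls, n_bins=4)
scenario = [assign_bin(v, edges) for v in monthly_calls]

# The rare class receives the larger weight, so its errors count for more.
weights = inverse_frequency_weights(labels)

print(edges)
print(scenario)
print(weights)
```

In a setup like this, each bin could be handled by its own model or used as an extra categorical feature, and the weights could be passed to any classifier that accepts per-class weights; whether the paper does either is not stated in the abstract.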


Pages: 368-377

Number of pages: 10
