Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection

Cited by: 0
Authors
Fast, Andrew [1]
Friedland, Lisa [1]
Maier, Marc [1]
Taylor, Brian [1]
Jensen, David [1]
Goldberg, Henry G. [2]
Komoroske, John [2]
Affiliations
[1] University of Massachusetts, Department of Computer Science, Amherst, MA 01003, USA
[2] National Association of Securities Dealers, Washington, DC 20006, USA
Funding
U.S. National Science Foundation
Keywords
Fraud detection; data pre-processing; statistical relational learning; normalization; relational probability trees
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data preprocessing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.
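As a rough illustration of the idea of inferring professional associations from employment histories, the sketch below links two individuals when their tenures overlapped at more than one branch. This is not the authors' method, which the abstract does not detail. The record layout, the `HISTORY` sample data, the function names, and the `min_shared` threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical employment records: (person, branch, start_year, end_year).
# end_year is treated as exclusive. None of this data comes from the paper.
HISTORY = [
    ("rep_a", "branch_1", 2000, 2003),
    ("rep_b", "branch_1", 2001, 2004),
    ("rep_a", "branch_2", 2003, 2006),
    ("rep_b", "branch_2", 2004, 2007),
    ("rep_c", "branch_1", 2005, 2008),
]


def shared_branch_counts(history):
    """Count, for each pair of people, the branches where their tenures overlapped."""
    by_branch = defaultdict(list)
    for person, branch, start, end in history:
        by_branch[branch].append((person, start, end))

    counts = defaultdict(int)
    for stints in by_branch.values():
        # Collect each overlapping pair once per branch, even if a person
        # has several stints at the same branch.
        overlapping_pairs = set()
        for (p1, s1, e1), (p2, s2, e2) in combinations(stints, 2):
            if p1 != p2 and s1 < e2 and s2 < e1:
                overlapping_pairs.add(tuple(sorted((p1, p2))))
        for pair in overlapping_pairs:
            counts[pair] += 1
    return dict(counts)


def infer_associations(history, min_shared=2):
    """Return pairs who overlapped at min_shared or more distinct branches (assumed rule)."""
    return {pair for pair, n in shared_branch_counts(history).items() if n >= min_shared}
```

With the sample records, `infer_associations(HISTORY)` links `rep_a` and `rep_b`, who overlapped at both branches, while `rep_c` joined `branch_1` after the others had left and is linked to no one. Requiring more than one shared branch is one simple way to separate people who move together from people who merely coincided at a large office once.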
Pages: 941 / +
Page count: 2