Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection

被引:0
|
作者
Fast, Andrew [1 ]
Friedland, Lisa [1 ]
Maier, Marc [1 ]
Taylor, Brian [1 ]
Jensen, David [1 ]
Goldberg, Henry G. [2 ]
Komoroske, John [2 ]
机构
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
[2] Natl Associat Secur Dealers, Washington, DC 20006 USA
基金
美国国家科学基金会;
关键词
Fraud detection; data pre-processing; statistical relational learning; normalization; relational probability trees;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data preprocessing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.
引用
收藏
页码:941 / +
页数:2
相关论文
共 50 条
  • [41] Kurtosis removal for data pre-processing
    Nicola Loperfido
    Advances in Data Analysis and Classification, 2023, 17 : 239 - 267
  • [42] Intelligent assistance for data pre-processing
    Bilalli, Besim
    Abello, Alberto
    Aluja-Banet, Tomas
    Wrembel, Robert
    COMPUTER STANDARDS & INTERFACES, 2018, 57 : 101 - 109
  • [43] A NEW METHOD FOR DATA PRE-PROCESSING
    RAISINGHANI, SC
    BILIMORIA, KD
    JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 1984, 7 (02) : 255 - 256
  • [44] Pre-Processing Methods of Data Mining
    Saleem, Asma
    Asif, Khadim Hussain
    Ali, Ahmad
    Awan, Shahid Mahmood
    AlGhamdi, Mohammed A.
    2014 IEEE/ACM 7TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC), 2014, : 451 - 456
  • [45] Pre-processing VDIF Data in FPGA
    Gan, Jiangying
    Xu, Zhijun
    2018 PROGRESS IN ELECTROMAGNETICS RESEARCH SYMPOSIUM (PIERS-TOYAMA), 2018, : 723 - 728
  • [46] A Study of Underwater Image Pre-processing and Techniques
    Prasenan, Pooja
    Suriyakala, C. D.
    COMPUTATIONAL VISION AND BIO-INSPIRED COMPUTING ( ICCVBIC 2021), 2022, 1420 : 313 - 333
  • [47] PRE-PROCESSING OF DATA FOR CHARACTER RECOGNITION
    ALCORN, TM
    HOGGAR, CW
    MARCONI REVIEW, 1969, 32 (172): : 61 - &
  • [48] Image pre-processing techniques for auto focusing
    Li, Qi
    Feng, Hua-Jun
    Xu, Zhi-Hai
    Guangdian Gongcheng/Opto-Electronic Engineering, 2004, 31 (09):
  • [49] Pre-processing Agilent microarray data
    Zahurak, Marianna
    Parmigiani, Giovanni
    Yu, Wayne
    Scharpf, Robert B.
    Berman, David
    Schaeffer, Edward
    Shabbeer, Shabana
    Cope, Leslie
    BMC BIOINFORMATICS, 2007, 8 (1)
  • [50] Pre-processing Agilent microarray data
    Marianna Zahurak
    Giovanni Parmigiani
    Wayne Yu
    Robert B Scharpf
    David Berman
    Edward Schaeffer
    Shabana Shabbeer
    Leslie Cope
    BMC Bioinformatics, 8