Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection

Cited by: 0
Authors
Fast, Andrew [1]
Friedland, Lisa [1]
Maier, Marc [1]
Taylor, Brian [1]
Jensen, David [1]
Goldberg, Henry G. [2]
Komoroske, John [2]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
[2] Natl Associat Secur Dealers, Washington, DC 20006 USA
Source
KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2007
Funding
U.S. National Science Foundation
Keywords
Fraud detection; data pre-processing; statistical relational learning; normalization; relational probability trees;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data preprocessing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.
Pages: 941 / +
Page count: 2
Related papers
50 in total
  • [31] Application of various Pre-processing techniques on Infrared (IR) Spectroscopy data for classification of different ghee samples
    Kumar, Navjot
    Panchariya, P. C.
    Patel, Surendra Singh
    Kiranmayee, A. H.
    Ranjan, Rishi
    2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018,
  • [32] The Role of Data Pre-processing Techniques in Improving Machine Learning Accuracy for Predicting Coronary Heart Disease
    Sami, Osamah
    Elsheikh, Yousef
    Almasalha, Fadi
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (06) : 812 - 820
  • [33] Application for pre-processing and visualization of electrodermal activity wearable data
    Suoja, K.
    Liukkonen, J.
    Jussila, J.
    Salonius, H.
    Venho, N.
    Sillanpaa, V.
    Vuori, V.
    Helander, N.
    EMBEC & NBC 2017, 2018, 65 : 93 - 96
  • [34] A data pre-processing method based on multi-threshold
    Su-bida
    Wang-shuhua
    Wang-Jingfeng
    Zhong-Hua
    Deng-Rong
    Hua-Hao
    Yang-suhui
    INTERNATIONAL SYMPOSIUM ON OPTOELECTRONIC TECHNOLOGY AND APPLICATION 2014: OPTICAL REMOTE SENSING TECHNOLOGY AND APPLICATIONS, 2014, 9299
  • [35] Text Data Pre-Processing for Time-series Modelling
    Pomenkova, Jitka
    Korab, Petr
    Strba, David
    2023 33RD INTERNATIONAL CONFERENCE RADIOELEKTRONIKA, RADIOELEKTRONIKA, 2023,
  • [36] A Hybrid Model Focusing on Data Pre-Processing in Diabetes Diagnosis
    Zeidi, Farnaz
    Azar, Lalah
    Arslan, Vasfiye
    Erol, Cigdem
    CYBERNETICS AND SYSTEMS, 2023, 54 (07) : 1199 - 1211
  • [37] A new method of power grid huge data pre-processing
    Qu, Zhaoyang
    Liu, Jingmin
    CEIS 2011, 2011, 15
  • [38] Pre-Processing Flow for Enhancing Learning from Medical Data
    Muresan, Sebastian
    Faloba, Ioana
    Lemnaru, Camelia
    Potolea, Rodica
    2015 IEEE 11TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING (ICCP), 2015, : 27 - 34
  • [39] ANALYSIS OF DATA PRE-PROCESSING METHODS FOR SENTIMENT ANALYSIS OF REVIEWS
    Parlar, Tuba
    Ozel, Selma Ayse
    Song, Fei
    COMPUTER SCIENCE-AGH, 2019, 20 (01) : 123 - 141
  • [40] prewas: data pre-processing for more informative bacterial GWAS
    Saund, Katie
    Lapp, Zena
    Thiede, Stephanie N.
    Pirani, Ali
    Snitkin, Evan S.
    MICROBIAL GENOMICS, 2020, 6 (05) : 1 - 8