Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection

Cited by: 0
Authors
Fast, Andrew [1 ]
Friedland, Lisa [1 ]
Maier, Marc [1 ]
Taylor, Brian [1 ]
Jensen, David [1 ]
Goldberg, Henry G. [2 ]
Komoroske, John [2 ]
Affiliations
[1] University of Massachusetts, Department of Computer Science, Amherst, MA 01003 USA
[2] National Association of Securities Dealers, Washington, DC 20006 USA
Source
KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2007
Funding
U.S. National Science Foundation
Keywords
Fraud detection; data pre-processing; statistical relational learning; normalization; relational probability trees;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data preprocessing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.
Pages: 941 / +
Page count: 2