Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection

Times cited: 0
Authors
Fast, Andrew [1]
Friedland, Lisa [1]
Maier, Marc [1]
Taylor, Brian [1]
Jensen, David [1]
Goldberg, Henry G. [2]
Komoroske, John [2]
Affiliations
[1] University of Massachusetts, Department of Computer Science, Amherst, MA 01003, USA
[2] National Association of Securities Dealers (NASD), Washington, DC 20006, USA
Source
KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2007
Funding
National Science Foundation (US)
Keywords
Fraud detection; data pre-processing; statistical relational learning; normalization; relational probability trees;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data preprocessing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.
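The abstract says the class label was normalized to adjust for spatial, temporal, and other heterogeneity, but it does not spell out the procedure. As a purely illustrative sketch, assuming a generic within-group z-score (a common normalization, not necessarily the authors' method), the Python below scores each record against peers in the same group and thresholds the score into a binary label. The grouping by a (region, year) pair, the field names `region`, `year`, and `count`, the function names, the threshold of 1.0, and the toy records are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_groups(records, key, value):
    """Z-score each record's `value` against the other records in its group.

    Hypothetical sketch: `key(record)` returns a group identifier, e.g. a
    (region, year) pair, so a raw value is compared only with peers from the
    same place and period.
    """
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r[value])
    # Population mean and standard deviation for each group.
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in groups.items()}
    scores = []
    for r in records:
        mu, sd = stats[key(r)]
        # A group with no variation gives every member a score of 0.
        scores.append(0.0 if sd == 0 else (r[value] - mu) / sd)
    return scores


def make_labels(records, key, value, threshold=1.0):
    """Hypothetical binary label: 1 if the within-group z-score exceeds `threshold`."""
    return [int(z > threshold) for z in normalize_within_groups(records, key, value)]


def group_key(r):
    # Hypothetical grouping: same region and same year.
    return (r["region"], r["year"])


# Toy, made-up records; field names are illustrative only.
records = [
    {"region": "NE", "year": 2003, "count": 0},
    {"region": "NE", "year": 2003, "count": 0},
    {"region": "NE", "year": 2003, "count": 6},
    {"region": "SW", "year": 2003, "count": 10},
    {"region": "SW", "year": 2003, "count": 10},
]

print(make_labels(records, group_key, "count"))  # prints [0, 0, 1, 0, 0]
```

In the toy data, both SW records have a higher raw count than any NE record yet receive label 0, because they are typical within their own group; a label built on raw counts alone would not make that distinction.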
Pages: 941-942
Number of pages: 2