Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection

Times cited: 0
Authors
Fast, Andrew [1]
Friedland, Lisa [1]
Maier, Marc [1]
Taylor, Brian [1]
Jensen, David [1]
Goldberg, Henry G. [2]
Komoroske, John [2]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
[2] Natl Associat Secur Dealers, Washington, DC 20006 USA
Source
KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2007
Funding
National Science Foundation (USA);
Keywords
Fraud detection; data pre-processing; statistical relational learning; normalization; relational probability trees;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data preprocessing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.
Pages: 941 / +
Number of pages: 2