Mapping of Financial Services datasets using Human-in-the-Loop

Cited by: 1
Authors
Asthana, Shubhi [1]
Mahindru, Ruchi [2]
Affiliations
[1] IBM Res, San Jose, CA 95120 USA
[2] IBM Res, Yorktown Hts, NY USA
Source
3RD ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2022 | 2022
Keywords
Data Mapping; Financial Services; Data Analytics; Human-in-the-Loop; Clustering; Information
DOI
10.1145/3533271.3561705
Chinese Library Classification
F8 [Public Finance, Finance];
Discipline classification code
0202;
Abstract
Increasing access to financial services data helps accelerate the monitoring and management of datasets and facilitates better business decision-making. However, financial services datasets are typically vast, running to terabytes, and contain both structured and unstructured data. Combing through all the data and mapping it reasonably is laborious, yet mapping is essential for performing comprehensive analysis and making informed business decisions. Based on client engagements, we have observed a lack of industry standards for definitions of key terms and a lack of governance for maintaining business processes. This typically leads to disconnected, siloed datasets generated by disintegrated systems. To address these challenges, we developed a novel methodology, DaME (Data Mapping Engine), that performs data mapping by training a data mapping engine and utilizing human-in-the-loop techniques. The results from the industrial application and evaluation of DaME on a financial services dataset are encouraging: they indicate that it can help reduce manual effort by automating data mapping and reusing the learning. On our dataset, DaME achieves an accuracy of 69%, compared with 34% for the existing state of the art. It has also improved the productivity of industry practitioners, saving them 14,000 hours of manual mapping of vast data stores over a period of ten months.
Pages: 183-191
Page count: 9
Related papers
17 in total
[1]  
Ahamed Bazeer, 2020, 2020 International Conference on Computer Science and Software Engineering (CSASE). Proceedings, P56, DOI 10.1109/CSASE48920.2020.9142093
[2]   Toward a complete dataset of drug-drug interaction information from publicly available sources [J].
Ayvaz, Serkan ;
Horn, John ;
Hassanzadeh, Oktie ;
Zhu, Qian ;
Stan, Johann ;
Tatonetti, Nicholas P. ;
Vilar, Santiago ;
Brochhausen, Mathias ;
Samwald, Matthias ;
Rastegar-Mojarad, Majid ;
Dumontier, Michel ;
Boyce, Richard D. .
JOURNAL OF BIOMEDICAL INFORMATICS, 2015, 55 :206-217
[3]  
Benesty J, Noise reduction in speech processing
[4]   Ontology Mapping Framework with Feature Extraction and Semantic Embeddings [J].
Chandrashekar, Mayanka ;
Nagulapati, Rohithkumar ;
Lee, Yugyung .
2018 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS WORKSHOPS (ICHI-W), 2018, :34-42
[5]  
Hartigan J. A., 1979, Applied Statistics, V28, P100, DOI 10.2307/2346830
[6]   A Collective, Probabilistic Approach to Schema Mapping Using Diverse Noisy Evidence [J].
Kimmig, Angelika ;
Memory, Alex ;
Miller, Renee J. ;
Getoor, Lise .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (08) :1426-1439
[7]   SystemT: A System for Declarative Information Extraction [J].
Krishnamurthy, Rajasekar ;
Li, Yunyao ;
Raghavan, Sriram ;
Reiss, Frederick ;
Vaithyanathan, Shivakumar ;
Zhu, Huaiyu .
SIGMOD RECORD, 2008, 37 (04) :7-13
[8]  
Luján-Mora S, 2004, LECT NOTES COMPUT SC, V3288, P191
[9]   WORDNET - A LEXICAL DATABASE FOR ENGLISH [J].
MILLER, GA .
COMMUNICATIONS OF THE ACM, 1995, 38 (11) :39-41
[10]  
Montgomery DC., 2021, Introduction to linear regression analysis