Mapping of Financial Services datasets using Human-in-the-Loop

Cited by: 1

Authors:

Asthana, Shubhi [1]

Mahindru, Ruchi [2]

Affiliations:

[1] IBM Res, San Jose, CA 95120 USA

[2] IBM Res, Yorktown Hts, NY USA

Source:

3RD ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2022 | 2022

Keywords:

Data Mapping; Financial Services; Data Analytics; Human-in-the-Loop; Clustering; INFORMATION;

DOI:

10.1145/3533271.3561705

Chinese Library Classification (CLC) number:

F8 [Public Finance and Finance];

Discipline classification code:

0202 ;

Abstract:

Increasing access to financial services data helps accelerate the monitoring and management of datasets and facilitates better business decision-making. However, financial services datasets are typically vast, often spanning terabytes, and contain both structured and unstructured data. Combing through all of this data and mapping it reasonably is laborious, yet mapping is essential for performing comprehensive analysis and making informed business decisions. Based on client engagements, we have observed a lack of industry standards for the definitions of key terms and a lack of governance for maintaining business processes. This typically leads to disconnected, siloed datasets generated by disintegrated systems. To address these challenges, we developed DaME (Data Mapping Engine), a novel methodology that performs data mapping by training a data mapping engine and utilizing human-in-the-loop techniques. Results from the industrial application and evaluation of DaME on a financial services dataset are encouraging: it can help reduce manual effort by automating data mapping and reusing what has been learned. On our dataset, DaME achieved an accuracy of 69%, compared with 34% for the existing state of the art. It has also improved the productivity of industry practitioners, saving them 14,000 hours of manual mapping of vast data stores over a period of ten months.

Citation

Pages: 183-191

Page count: 9
