Assessment of Creditworthiness Models Privacy-Preserving Training with Synthetic Data

Cited by: 3

Authors:

Munoz-Cancino, Ricardo [1]

Bravo, Cristian [2]

Rios, Sebastian A. [1]

Grana, Manuel [3]

Affiliations:

[1] Univ Chile, Dept Ind Engn, Business Intelligence Res Ctr CEINE, Beauchef 851, Santiago 8370456, Chile

[2] Univ Western Ontario, Dept Stat & Actuarial Sci, 1151 Richmond St, London, ON N6A 3K7, Canada

[3] Univ Basque Country, Computat Intelligence Grp, San Sebastian 20018, Spain

Source:

HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2022 | 2022 / Vol. 13469

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Credit scoring; Synthetic data; Generative adversarial networks; Variational autoencoders;

DOI:

10.1007/978-3-031-15471-3_32

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Credit scoring models are the primary instrument used by financial institutions to manage credit risk. Research on behavioral scoring is scarce because the data are difficult to access: financial institutions' obligation to maintain the privacy and security of borrowers' information keeps them from collaborating in research initiatives. In this work, we present a methodology that allows us to evaluate the performance of models trained with synthetic data when they are applied to real-world data. Our results show that synthetic data quality degrades as the number of attributes increases. However, creditworthiness assessment models trained with synthetic data show a reduction of 3% in AUC and 6% in KS when compared with models trained with real data. These results have a significant impact since they encourage credit risk research based on synthetic data, making it possible to maintain borrowers' privacy and to address problems that until now have been hampered by the limited availability of information.
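
The comparison reported in the abstract rests on two discrimination metrics, AUC and KS, measured on real data for a model trained on real data and for one trained on synthetic data. The following is a minimal sketch of how such a comparison could be computed, not the authors' implementation. The function names, the toy labels and scores, and the choice of a relative (rather than absolute) reduction are all assumptions made for illustration.

```python
# Illustrative sketch only: toy data and helper names are hypothetical,
# and the paper does not specify this exact computation.

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks(labels, scores):
    """Kolmogorov-Smirnov statistic: largest gap between the empirical
    score CDFs of the two classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def relative_drop(real_metric, synth_metric):
    """Relative reduction of a metric (assumption: the abstract may
    instead refer to an absolute difference)."""
    return (real_metric - synth_metric) / real_metric


# Toy real-world hold-out labels (1 = default) and scores from two
# hypothetical models: one trained on real data, one on synthetic data.
labels = [0, 0, 0, 1, 1, 1]
scores_real_trained = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
scores_synth_trained = [0.1, 0.2, 0.8, 0.3, 0.7, 0.9]

auc_real = auc(labels, scores_real_trained)
auc_synth = auc(labels, scores_synth_trained)
ks_real = ks(labels, scores_real_trained)
ks_synth = ks(labels, scores_synth_trained)

print("AUC drop:", relative_drop(auc_real, auc_synth))
print("KS drop:", relative_drop(ks_real, ks_synth))
```

Both models are scored on the same real hold-out set, so any gap between the two metric values reflects the training data rather than the evaluation data.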


Pages: 375-384

Page count: 10

