VertiBayes: learning Bayesian network parameters from vertically partitioned data with missing values

Cited by: 1

Authors:

van Daalen, Florian ^{[1,2]}

Ippel, Lianne ^{[2]}

Dekker, Andre ^{[1]}

Bermejo, Inigo ^{[1]}

Affiliations:

[1] Maastricht Univ, Med Ctr, GROW Sch Oncol & Reprod, Dept Radiat Oncol MAASTRO, Maastricht, Netherlands

[2] Stat Netherlands, Methodol, Heerlen, Netherlands

Source:

COMPLEX & INTELLIGENT SYSTEMS | 2024 / Vol. 10 / Issue 04

Keywords:

Federated Learning; Bayesian network; Privacy preserving; Vertically partitioned data; Parameter learning; Structure learning;

DOI:

10.1007/s40747-024-01424-0

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Federated learning makes it possible to train a machine learning model on decentralized data. Bayesian networks are widely used probabilistic graphical models. While some research has been published on the federated learning of Bayesian networks, publications on Bayesian networks in a vertically partitioned data setting are limited, with important omissions, such as handling missing data. We propose a novel method called VertiBayes to train Bayesian networks (structure and parameters) on vertically partitioned data, which can handle missing values as well as an arbitrary number of parties. For structure learning, we adapted the K2 algorithm with a privacy-preserving scalar product protocol. For parameter learning, we use a two-step approach: first, we learn an intermediate model using maximum likelihood, treating missing values as a special value; then, we train a model on synthetic data generated by the intermediate model using the EM algorithm. The privacy guarantees of VertiBayes are equivalent to those provided by the privacy-preserving scalar product protocol used. We experimentally show that VertiBayes produces models comparable to those learnt using traditional algorithms. Finally, we propose two alternative approaches to estimate the performance of the model using vertically partitioned data, and we show in experiments that these give accurate estimates.
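The first parameter-learning step described in the abstract (maximum likelihood with missing values treated as a special value) can be illustrated with a minimal plaintext sketch. It assumes two parties that each hold one column for the same, identically ordered records, and it computes joint counts as scalar products of indicator vectors. The scalar product here is an ordinary, non-private one standing in for the privacy-preserving protocol; the names `MISSING`, `indicator`, `scalar_product`, `conditional_table` and the toy columns are hypothetical and not taken from the paper. The second step (EM on synthetic data) is not shown.

```python
# Plaintext illustration only: a real deployment would replace scalar_product
# with a privacy-preserving protocol so that no party sees the other's vector.

MISSING = "?"  # assumption: missing values are encoded as one extra state


def indicator(column, value):
    """Return a 0/1 vector marking the records where column == value."""
    return [1 if v == value else 0 for v in column]


def scalar_product(u, v):
    """Ordinary dot product; counts records where both indicators are 1."""
    return sum(a * b for a, b in zip(u, v))


def conditional_table(parent_col, child_col):
    """Maximum-likelihood estimate of P(child | parent) from joint counts."""
    parent_states = sorted(set(parent_col))
    child_states = sorted(set(child_col))
    table = {}
    for p in parent_states:
        p_ind = indicator(parent_col, p)
        counts = {
            c: scalar_product(p_ind, indicator(child_col, c))
            for c in child_states
        }
        total = sum(counts.values())  # > 0 because p occurs in parent_col
        table[p] = {c: counts[c] / total for c in child_states}
    return table


# Hypothetical toy data: party A holds "smoker", party B holds "cough".
party_a_smoker = ["yes", "no", "yes", MISSING, "no", "yes"]
party_b_cough = ["yes", "no", MISSING, "no", "no", "yes"]

cpt = conditional_table(party_a_smoker, party_b_cough)
```

In this sketch the missing marker simply becomes one more row and column of the conditional table, which is what lets the intermediate model be estimated by counting alone.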


Pages: 5317-5329

Page count: 13
