Cross-Silo Federated Learning based Decision Trees

Cited by: 2
Authors
Kalloori, Saikishore [1 ]
Klingler, Severin [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Media Technol Ctr, Zurich, Switzerland
Source
37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING | 2022
Keywords
Federated learning; Random Forest; Decision Tree;
DOI
10.1145/3477314.3507149
CLC Classification
TP39 [Applications of Computers]
Discipline Codes
081203; 0835
Abstract
Most research and applications in the field of Machine Learning focus on training a model for a particular task, such as churn prediction, using training data held on one machine or in a single data center. Nowadays, in many organizations and industries, the training data exists in different (isolated) locations. To protect data privacy and security, it is not feasible to gather all the training data in one location and perform centralized training of machine learning models. Federated Learning (FL) is a machine learning technique whose goal is to learn a high-quality model trained across multiple clients (such as mobile devices) or data centers without ever exchanging their training data. Most existing research on FL focuses on two directions: (a) training parametric models such as neural networks, and (b) FL setups containing millions of clients. In this work, however, we focus on non-parametric models such as decision trees; more specifically, we build decision trees using federated learning and train a random forest model. Our work aims at involving corporate companies instead of mobile devices in the federated learning process. We consider a setting where a small number of organizations or industry companies collaboratively build machine learning models without exchanging their privately held large data sets. We designed a federated decision tree-based random forest algorithm using FL and conducted experiments on different datasets. Our results demonstrate that each participating company benefits from federated learning through improved model performance. We also describe how to incorporate differential privacy into our decision tree-based random forest algorithm.
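The abstract describes cross-silo federated training of a random forest in which silos share models rather than raw data. As a hedged illustration only (the paper's exact algorithm is not given here), the sketch below has each silo fit depth-1 trees (stumps) on bootstrap samples of its local data and contribute only the fitted trees to a shared forest, which predicts by majority vote; all function names and the stump simplification are assumptions for this sketch.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def train_stump(X, y):
    """Fit a depth-1 decision tree by minimizing weighted Gini impurity."""
    best = None
    for f in range(len(X[0])):                      # every feature
        for t in sorted({row[f] for row in X}):     # every candidate threshold
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                ll = Counter(left).most_common(1)[0][0] if left else 0
                rl = Counter(right).most_common(1)[0][0] if right else 0
                best = (score, f, t, ll, rl)
    return best[1:]          # (feature, threshold, left_label, right_label)

def predict_stump(stump, row):
    f, t, ll, rl = stump
    return ll if row[f] <= t else rl

def federated_forest(silos, trees_per_silo=7, seed=0):
    """Each silo trains trees on bootstrap samples of its own private data;
    only the fitted trees (never the data) are pooled into the global forest."""
    rng = random.Random(seed)
    forest = []
    for X, y in silos:
        for _ in range(trees_per_silo):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap
            forest.append(train_stump([X[i] for i in idx],
                                      [y[i] for i in idx]))
    return forest

def predict(forest, row):
    """Majority vote over all silos' trees."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]

# Two silos, each holding a private slice of a 1-D toy dataset (label = x > 0.5).
silo_a = ([[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]], [0, 0, 0, 1, 1, 1])
silo_b = ([[0.15], [0.25], [0.35], [0.65], [0.75], [0.85]], [0, 0, 0, 1, 1, 1])
forest = federated_forest([silo_a, silo_b])
print(predict(forest, [0.05]), predict(forest, [0.95]))
```

The differential-privacy extension mentioned in the abstract would typically perturb the statistics each silo releases (e.g. adding Laplace noise to the label counts used at each split), but the concrete mechanism is specified in the paper itself, not reproduced here.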
Pages: 1117-1124
Page count: 8