Machine Unlearning in Gradient Boosting Decision Trees

Cited by: 3
Authors
Lin, Huawei [1 ]
Chung, Jun Woo [1 ]
Lao, Yingjie [2 ]
Zhao, Weijie [1 ]
Affiliations
[1] RIT, Rochester, NY 14623 USA
[2] Clemson Univ, Clemson, SC USA
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Funding
U.S. National Science Foundation
Keywords
Machine Unlearning; Gradient Boosting Decision Trees; Privacy; STATISTICAL VIEW;
DOI
10.1145/3580305.3599420
CLC number
TP [automation technology, computer technology]
Discipline code
0812
Abstract
Many machine learning applications train models on users' data. Recently enacted legislation establishes the right to be forgotten, requiring companies to remove users' data upon request. In the context of machine learning, a trained model potentially memorizes its training data, so machine learning algorithms must be able to unlearn the data whose deletion is requested. Gradient Boosting Decision Trees (GBDT) are widely deployed in many machine learning applications, yet few studies have investigated unlearning for GBDT. This paper proposes a novel unlearning framework for GBDT; to the best of our knowledge, this is the first work to consider machine unlearning on GBDT, and unlearning methods designed for DNNs do not transfer straightforwardly to the GBDT setting. We formalize the machine unlearning problem and a relaxed version of it, and we propose an unlearning framework that efficiently and effectively unlearns a given collection of data without retraining the model from scratch. We introduce a collection of techniques, including random split point selection and random partitioning layer training, into the training process of the original tree model to ensure that the trained model requires few subtree retrainings during unlearning. We identify the intermediate data and statistics to store as an auxiliary data structure during training so that we can immediately determine whether a subtree must be retrained without touching the original training dataset. Furthermore, a lazy update technique is proposed as a trade-off between unlearning time and model functionality. We experimentally evaluate the proposed methods on public datasets; the empirical results confirm the effectiveness of our framework.
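The interplay between randomized split selection and cached statistics can be illustrated with a minimal, hypothetical sketch (this is not the paper's implementation): each node picks its split threshold from a small random candidate set and caches that set, so that after a deletion only the cached candidates need to be re-scored to decide whether the subtree must be retrained. The function names and the variance-gain criterion below are illustrative assumptions, not the authors' API.

```python
import random

def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys, candidates):
    """Return (threshold, gain) of the best split among the candidate thresholds."""
    total = sse(ys)
    best = (None, 0.0)
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = total - sse(left) - sse(right)
        if gain > best[1]:
            best = (t, gain)
    return best

def train_node(xs, ys, n_candidates=3, seed=0):
    """Choose the split from a small random subset of thresholds and
    cache the statistics needed to re-check the choice later."""
    rng = random.Random(seed)
    pool = sorted(set(xs))
    candidates = rng.sample(pool, min(n_candidates, len(pool)))
    t, gain = best_split(xs, ys, candidates)
    return {"threshold": t, "gain": gain, "candidates": candidates}

def needs_retrain(node, xs, ys, removed):
    """After deleting the sample indices in `removed`, re-score only the
    cached candidate set; retrain iff the winning threshold changes."""
    keep = [i for i in range(len(xs)) if i not in removed]
    xs2 = [xs[i] for i in keep]
    ys2 = [ys[i] for i in keep]
    t, _ = best_split(xs2, ys2, node["candidates"])
    return t != node["threshold"]
```

Because the candidate pool is small and cached, the retrain check never touches feature values outside that pool, which mirrors the abstract's claim that the decision can be made without revisiting the full original training dataset.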
Pages: 1374-1383 (10 pages)