Machine Unlearning in Gradient Boosting Decision Trees

Cited by: 3
Authors
Lin, Huawei [1 ]
Chung, Jun Woo [1 ]
Lao, Yingjie [2 ]
Zhao, Weijie [1 ]
Affiliations
[1] RIT, Rochester, NY 14623 USA
[2] Clemson Univ, Clemson, SC USA
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Funding
U.S. National Science Foundation
Keywords
Machine Unlearning; Gradient Boosting Decision Trees; Privacy; STATISTICAL VIEW;
DOI
10.1145/3580305.3599420
CLC number
TP [automation technology, computer technology]
Discipline code
0812
Abstract
Many machine learning applications train models on users' data. Recently enacted legislation establishes the right to be forgotten, requiring companies to remove users' data upon request. In the context of machine learning, a trained model potentially memorizes its training data, so machine learning algorithms must be able to unlearn the data whose deletion is requested. Gradient Boosting Decision Trees (GBDT) are widely deployed in many machine learning applications, yet few studies have investigated unlearning for GBDT. This paper proposes a novel unlearning framework for GBDT; to the best of our knowledge, this is the first work to consider machine unlearning on GBDT, and unlearning methods designed for DNNs do not transfer straightforwardly to the GBDT setting. We formalize the machine unlearning problem and a relaxed version of it, and we propose an unlearning framework that efficiently and effectively unlearns a given collection of data without retraining the model from scratch. We introduce a collection of techniques, including random split point selection and random partitioning layer training, into the training process of the original tree model to ensure that the trained model requires few subtree retrainings during unlearning. We identify the intermediate data and statistics to store as an auxiliary data structure during training so that we can immediately determine whether a subtree must be retrained without touching the original training dataset. Furthermore, a lazy update technique is proposed as a trade-off between unlearning time and model functionality. We experimentally evaluate the proposed methods on public datasets; the empirical results confirm the effectiveness of our framework.
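The interplay between randomized split selection and cached statistics can be illustrated with a minimal, hypothetical sketch (this is not the paper's implementation): each node picks its split threshold from a small random candidate set and caches that set, so that after a deletion only the cached candidates need to be re-scored to decide whether the subtree must be retrained. The function names and the variance-gain criterion below are illustrative assumptions, not the authors' API.

```python
import random

def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys, candidates):
    """Return (threshold, gain) of the best split among the candidate thresholds."""
    total = sse(ys)
    best = (None, 0.0)
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = total - sse(left) - sse(right)
        if gain > best[1]:
            best = (t, gain)
    return best

def train_node(xs, ys, n_candidates=3, seed=0):
    """Choose the split from a small random subset of thresholds and
    cache the statistics needed to re-check the choice later."""
    rng = random.Random(seed)
    pool = sorted(set(xs))
    candidates = rng.sample(pool, min(n_candidates, len(pool)))
    t, gain = best_split(xs, ys, candidates)
    return {"threshold": t, "gain": gain, "candidates": candidates}

def needs_retrain(node, xs, ys, removed):
    """After deleting the sample indices in `removed`, re-score only the
    cached candidate set; retrain iff the winning threshold changes."""
    keep = [i for i in range(len(xs)) if i not in removed]
    xs2 = [xs[i] for i in keep]
    ys2 = [ys[i] for i in keep]
    t, _ = best_split(xs2, ys2, node["candidates"])
    return t != node["threshold"]
```

Because the candidate pool is small and cached, the retrain check never touches feature values outside that pool, which mirrors the abstract's claim that the decision can be made without revisiting the full original training dataset.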
Pages: 1374-1383 (10 pages)