Data Provenance via Differential Auditing

Cited by: 0
Authors
Mu, Xin [1]
Pang, Ming [2]
Zhu, Feida [3]
Affiliations
[1] Peng Cheng Laboratory, Department of Strategic and Advanced Interdisciplinary Research, Shenzhen 518066, People's Republic of China
[2] JD.com Inc., Beijing 100101, People's Republic of China
[3] Singapore Management University, School of Computing and Information Systems, Singapore 188065, Singapore
Funding
National Research Foundation, Singapore; National Science Foundation (United States)
Keywords
Auditing data; data provenance; machine learning
DOI
10.1109/TKDE.2023.3334821
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rising awareness of data assets, data governance, which seeks to understand where data comes from, how it is collected, and how it is used, has been assuming ever-growing importance. One critical component of data governance that is gaining attention is auditing machine learning models to determine whether specific data has been used for training. Existing auditing techniques, such as shadow auditing methods, have shown feasibility under specific conditions, such as access to label information and knowledge of training protocols. However, these conditions are often not met in real-world applications. In this paper, we introduce a practical framework for auditing data provenance based on a differential mechanism: after a carefully designed transformation, perturbed input data from the target model's training set results in much more drastic changes in the output than perturbed data from outside the training set. Our framework is data-dependent and does not require distinguishing training data from non-training data or training additional shadow models with labeled output data. Furthermore, our framework extends beyond point-based data auditing to group-based data auditing, aligning with the needs of real-world applications. Our theoretical analysis of the differential mechanism and the experimental results on real-world data sets verify the proposal's effectiveness.
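The differential idea in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify the transformation, the distance measure, or the decision rule, so the L1 distance, the fixed threshold, the feature-shift transform, and the memorizing toy model below are all assumptions chosen only to make the mechanism concrete.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], List[float]]
Transform = Callable[[Vector], List[float]]


def output_shift(model: Model, x: Vector, transform: Transform) -> float:
    """L1 distance between the model's outputs on x and on transform(x)."""
    before = model(x)
    after = model(transform(x))
    return sum(abs(a - b) for a, b in zip(before, after))


def audit_group(
    model: Model, group: Sequence[Vector], transform: Transform, threshold: float
) -> Tuple[float, bool]:
    """Group-based audit: mean output shift and a 'likely training data' flag."""
    shifts = [output_shift(model, x, transform) for x in group]
    mean_shift = sum(shifts) / len(shifts)
    return mean_shift, mean_shift > threshold


# Hypothetical stand-in for an overfitted target model: it is confident only
# on the exact points it memorized and outputs a uniform guess elsewhere.
MEMORIZED = {(0.0, 0.0): 0, (1.0, 1.0): 1}


def toy_model(x: Vector) -> List[float]:
    label = MEMORIZED.get(tuple(x))
    if label is None:
        return [0.5, 0.5]
    return [1.0 if i == label else 0.0 for i in range(2)]


def shift_first_feature(x: Vector) -> List[float]:
    """Assumed transformation: nudge the first feature by a small amount."""
    return [x[0] + 0.1, *x[1:]]


train_group = [[0.0, 0.0], [1.0, 1.0]]   # points the toy model memorized
other_group = [[0.3, 0.7], [0.8, 0.2]]   # points it never saw

train_score, train_flag = audit_group(toy_model, train_group, shift_first_feature, 0.5)
other_score, other_flag = audit_group(toy_model, other_group, shift_first_feature, 0.5)
```

In this toy setting the memorized group shows a large mean shift and is flagged, while the unseen group shows none. A real audit would need a transformation and threshold suited to the target model and data, which the abstract leaves to the full paper.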
Pages: 5066-5079
Page count: 14