Data Provenance via Differential Auditing

Cited by: 0

Authors:

Mu, Xin [1]

Pang, Ming [2]

Zhu, Feida [3]

Affiliations:

[1] Peng Cheng Lab, Dept Strateg & Adv Interdisciplinary Res, Shenzhen 518066, Peoples R China

[2] JD Com Inc, Beijing 100101, Peoples R China

[3] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 188065, Singapore

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2024 / Vol. 36 / Issue 10

Funding:

National Research Foundation, Singapore; National Science Foundation (USA);

Keywords:

Auditing data; data provenance; machine learning;

DOI:

10.1109/TKDE.2023.3334821

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rising awareness of data assets, data governance, which is to understand where data comes from, how it is collected, and how it is used, has been assuming ever-growing importance. One critical component of data governance gaining increasing attention is auditing machine learning models to determine if specific data has been used for training. Existing auditing techniques, like shadow auditing methods, have shown feasibility under specific conditions such as having access to label information and knowledge of training protocols. However, these conditions are often not met in most real-world applications. In this paper, we introduce a practical framework for auditing data provenance based on a differential mechanism, i.e., after carefully designed transformation, perturbed input data from the target model's training set would result in much more drastic changes in the output than those from the model's non-training set. Our framework is data-dependent and does not require distinguishing training data from non-training data or training additional shadow models with labeled output data. Furthermore, our framework extends beyond point-based data auditing to group-based data auditing, aligning with the needs of real-world applications. Our theoretical analysis of the differential mechanism and the experimental results on real-world data sets verify the proposal's effectiveness.

Pages: 5066-5079

Page count: 14

