ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems

Cited by: 2
Authors
Nguyen Khoi Tran [1]
Sabir, Bushra [1]
Babar, Muhammad Ali [1]
Cui, Nini [1]
Abolhasan, Mehran [2]
Lipman, Justin [2]
Affiliations
[1] Univ Adelaide, Adelaide, SA, Australia
[2] Univ Technol Sydney, Sydney, NSW, Australia
Source
SOFTWARE ARCHITECTURE, ECSA 2022 | 2022 / Vol. 13444
Keywords
SE for AI; Provenance; Machine Learning; Blockchain
DOI
10.1007/978-3-031-16697-6_4
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Large-scale Machine Learning (ML) based software systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fairness. Unfortunately, it is challenging for ML teams to access and reconstruct such historical information of ML assets (ML provenance) because it is generally fragmented across distributed ML teams and threatened by the same adversaries that attack ML assets. This paper proposes ProML, a decentralised platform that leverages blockchain and smart contracts to empower distributed ML teams to jointly manage a single source of truth about circulated ML assets' provenance without relying on a third party, which is vulnerable to insider threats and presents a single point of failure. We propose a novel architectural approach called Artefact-as-a-State-Machine to leverage blockchain transactions and smart contracts for managing ML provenance information, and we introduce a user-driven provenance capturing mechanism to integrate existing scripts and tools into ProML without compromising participants' control over their assets and toolchains. We evaluate the performance and overheads of ProML by benchmarking a proof-of-concept system on a global blockchain. Furthermore, we assess ProML's security against a threat model of a distributed ML workflow.
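The abstract names the Artefact-as-a-State-Machine approach but does not spell out its mechanics. The following is only a minimal sketch of the general idea, under loud assumptions: the lifecycle states, the action names, the `ArtefactStateMachine` class, and the hash-chained in-memory log are all hypothetical illustrations, not ProML's actual smart contracts or data model. It treats an ML asset as a state machine whose transitions are recorded as append-only, tamper-evident provenance records, standing in for blockchain transactions.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical lifecycle: these states and actions are illustrative only,
# they are not taken from the ProML paper.
TRANSITIONS = {
    ("registered", "train"): "trained",
    ("trained", "evaluate"): "evaluated",
    ("evaluated", "publish"): "published",
}

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


@dataclass
class ArtefactStateMachine:
    """An ML asset whose provenance is the log of its state transitions."""
    artefact_id: str
    state: str = "registered"
    log: list = field(default_factory=list)

    def apply(self, action: str, actor: str) -> dict:
        """Validate a transition, append a hash-chained record, update state."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"action {action!r} not allowed in state {self.state!r}")
        record = {
            "artefact": self.artefact_id,
            "actor": actor,
            "action": action,
            "from": self.state,
            "to": TRANSITIONS[key],
            "prev": self.log[-1]["hash"] if self.log else GENESIS,
        }
        record["hash"] = digest(record)  # hash covers every field above
        self.log.append(record)
        self.state = record["to"]
        return record

    def verify(self) -> bool:
        """Recompute the hash chain; any edited record breaks it."""
        prev = GENESIS
        for record in self.log:
            body = {k: v for k, v in record.items() if k != "hash"}
            if record["prev"] != prev or digest(body) != record["hash"]:
                return False
            prev = record["hash"]
        return True
```

In this sketch, calling `apply("train", "alice")` on a freshly registered artefact appends one record and moves it to the `trained` state, while `verify()` returns `False` once any stored record has been altered. A real deployment would rely on the blockchain's own consensus and transaction signatures rather than a local list.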
Pages: 49-65
Number of pages: 17