Data pricing in machine learning pipelines

Cited by: 0
Authors
Zicun Cong
Xuan Luo
Jian Pei
Feida Zhu
Yong Zhang
Affiliations
[1] Simon Fraser University
[2] Singapore Management University
[3] Huawei Technologies Canada
Source
Knowledge and Information Systems | 2022 / Vol. 64
Keywords
Data assets; Data pricing; Data products; Machine learning; AI
DOI
Not available
Abstract
Machine learning is disruptive. At the same time, machine learning can succeed only through collaboration among many parties across multiple steps, which naturally form pipelines in an ecosystem: collecting data for possible machine learning applications, collaboratively training models by multiple parties, and delivering machine learning services to end users. Data are critical and pervasive throughout machine learning pipelines. Because these pipelines involve many parties and, to be successful, must form a constructive and dynamic ecosystem, marketplaces and data pricing are fundamental in connecting and facilitating those parties. In this article, we survey the principles and the latest research developments of data pricing in machine learning pipelines. We start with a brief review of data marketplaces and pricing desiderata. Then, we focus on pricing in three important steps of machine learning pipelines. To understand pricing in the training data collection step, we review the pricing of raw data sets and data labels. We then investigate pricing in the collaborative training of machine learning models, and we give an overview of pricing machine learning models for end users in the deployment step. Finally, we discuss a series of possible future directions.
Pages: 1417-1455
Page count: 38