Data pricing in machine learning pipelines

Cited by: 0
Authors
Zicun Cong
Xuan Luo
Jian Pei
Feida Zhu
Yong Zhang
Affiliations
[1] Simon Fraser University
[2] Singapore Management University
[3] Huawei Technologies Canada
Source
Knowledge and Information Systems | 2022 / Vol. 64
Keywords
Data assets; Data pricing; Data products; Machine learning; AI
DOI
Not available
Abstract
Machine learning is disruptive. At the same time, machine learning can succeed only through collaboration among many parties across multiple steps, which naturally form pipelines in an ecosystem: collecting data for possible machine learning applications, collaboratively training models among multiple parties, and delivering machine learning services to end users. Data are critical and pervasive throughout machine learning pipelines. Because these pipelines involve many parties and, to be successful, must form a constructive and dynamic ecosystem, marketplaces and data pricing are fundamental in connecting and facilitating those parties. In this article, we survey the principles and the latest research developments of data pricing in machine learning pipelines. We start with a brief review of data marketplaces and pricing desiderata. Then, we focus on pricing in three important steps of machine learning pipelines. To understand pricing in the training data collection step, we review the pricing of raw data sets and data labels. We also investigate pricing in the collaborative training of machine learning models, and give an overview of pricing machine learning models for end users in the deployment step. Finally, we discuss a series of possible future directions.
Pages: 1417-1455
Page count: 38