LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation

被引:14
作者
Ma, Shiqing [1 ]
Aafer, Yousra [1 ]
Xu, Zhaogui [2 ]
Lee, Wen-Chuan [1 ]
Zhai, Juan [2 ]
Liu, Yingqi [1 ]
Zhang, Xiangyu [1 ]
机构
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Nanjing Univ, Nanjing, Jiangsu, Peoples R China
来源
ESEC/FSE 2017: PROCEEDINGS OF THE 2017 11TH JOINT MEETING ON FOUNDATIONS OF SOFTWARE ENGINEERING | 2017年
基金
美国国家科学基金会;
关键词
Data Provenance; Machine Learning; Debugging; AUTOMATIC DIFFERENTIATION;
D O I
10.1145/3106237.3106291
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical to machine learning algorithms, in which an output tends to depend on a huge set of inputs while only some of them are of importance. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation to different processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program dependence based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering.
引用
收藏
页码:786 / 797
页数:12
相关论文
共 50 条
[1]  
[Anonymous], 1981, SPRINGER LECT NOTES
[2]  
[Anonymous], 1989, Mathematical Programming: recent developments and applications
[3]  
[Anonymous], 2000, NIPS 00
[4]  
[Anonymous], 2004, P 2004 C EMP METH NA, DOI DOI 10.1016/0305-0491(73)90144-2
[5]  
[Anonymous], 2012, NSDI
[6]  
[Anonymous], 1999, PAGERANK CITATION RA
[7]  
[Anonymous], URBANA
[8]  
[Anonymous], 2012, RECENT ADV ALGORITHM, DOI DOI 10.1007/978-3-642-30023-3_27
[9]  
Chakrabarti S., 2007, P INT C WORLD WID WE, P571, DOI [10.1145/ 1242572.1242650, DOI 10.1145/1242572.1242650]
[10]  
Chen HH, 2013, 2013 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), P448