Provenance Network Analytics: An approach to data analytics using data provenance

Cited by: 0
Authors
Trung Dong Huynh
Mark Ebden
Joel Fischer
Stephen Roberts
Luc Moreau
Affiliations
[1] University of Southampton, Electronics and Computer Science
[2] University of Oxford, Information Engineering, Department of Engineering Science
[3] University of Nottingham, Mixed Reality Lab, School of Computer Science
[4] King’s College London, Department of Informatics
Source
Data Mining and Knowledge Discovery | 2018 / Vol. 32
Keywords
Data provenance; Data analytics; Network metrics; Graph classification
DOI
Not available
Abstract
Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data’s provenance as represented using the World Wide Web Consortium’s domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
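As a rough illustration of the idea in the abstract, the sketch below turns a tiny provenance graph into a vector of network metrics. The graph, the node names, the function names, and the particular metrics chosen here are all assumptions made for illustration; they are not necessarily the metrics defined in the paper. Edges follow PROV's convention of pointing from the later element back to what it depended on.

```python
from collections import Counter, deque

# Hypothetical toy provenance graph: each node is mapped to its PROV type.
NODE_TYPES = {
    "report": "entity",
    "draft": "entity",
    "editing": "activity",
    "alice": "agent",
}

# Directed edges (source, target), annotated with the PROV relation they stand for.
EDGES = [
    ("report", "editing"),  # report wasGeneratedBy editing
    ("editing", "draft"),   # editing used draft
    ("editing", "alice"),   # editing wasAssociatedWith alice
    ("report", "draft"),    # report wasDerivedFrom draft
]


def max_finite_distance(nodes, edges):
    """Longest finite shortest-path length, following edge direction (BFS)."""
    successors = {node: [] for node in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    best = 0
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in successors[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best


def provenance_features(node_types, edges):
    """Summarise a provenance graph as a dictionary of simple network metrics."""
    counts = Counter(node_types.values())
    return {
        "nodes": len(node_types),
        "edges": len(edges),
        "entities": counts["entity"],
        "activities": counts["activity"],
        "agents": counts["agent"],
        "max_finite_distance": max_finite_distance(node_types, edges),
    }


features = provenance_features(NODE_TYPES, EDGES)
print(features)
```

Feature dictionaries of this kind, computed for many provenance graphs with known labels, could then be passed to an off-the-shelf classifier to predict a property such as data quality.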
Pages: 708-735
Page count: 27