Provenance Network Analytics: An approach to data analytics using data provenance

Cited by: 0
Authors
Trung Dong Huynh
Mark Ebden
Joel Fischer
Stephen Roberts
Luc Moreau
Affiliations
[1] University of Southampton, Electronics and Computer Science
[2] University of Oxford, Information Engineering, Department of Engineering Science
[3] University of Nottingham, Mixed Reality Lab., School of Computer Science
[4] King’s College London,Department of Informatics
Source
Data Mining and Knowledge Discovery | 2018 / Vol. 32
Keywords
Data provenance; Data analytics; Network metrics; Graph classification
DOI
Not available
Abstract
Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data’s provenance as represented using the World Wide Web Consortium’s domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
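The abstract says the approach computes network metrics over PROV-represented provenance and then applies established machine learning techniques to those metrics. As a rough illustration only, the Python sketch below computes a few generic graph metrics (node and edge counts, counts per PROV node type, one relation count, and the undirected diameter) for a small hypothetical PROV-style graph and flattens them into a fixed-order feature vector. The toy graph, the names `NODES`, `EDGES`, `network_metrics` and `to_vector`, and the choice of metrics are assumptions made for this sketch; this record does not list the paper's actual metric set, and no PROV library is used.

```python
from collections import Counter, deque

# Hypothetical toy provenance graph loosely following W3C PROV terms.
# Node id -> PROV node type (entity, activity, or agent).
NODES = {
    "ex:report": "entity",
    "ex:dataset": "entity",
    "ex:analyse": "activity",
    "ex:alice": "agent",
}
# Directed edges as (source, PROV relation, target).
EDGES = [
    ("ex:report", "wasGeneratedBy", "ex:analyse"),
    ("ex:analyse", "used", "ex:dataset"),
    ("ex:analyse", "wasAssociatedWith", "ex:alice"),
    ("ex:report", "wasDerivedFrom", "ex:dataset"),
]


def undirected_adjacency(nodes, edges):
    """Map each node to its neighbours, ignoring edge direction and relation."""
    adj = {n: set() for n in nodes}
    for src, _rel, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def bfs_distances(adj, start):
    """Hop counts of shortest paths from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def network_metrics(nodes, edges):
    """Illustrative metrics for one provenance graph (assumed, not the paper's set)."""
    adj = undirected_adjacency(nodes, edges)
    type_counts = Counter(nodes.values())
    relation_counts = Counter(rel for _src, rel, _dst in edges)
    # Diameter: longest finite shortest path, in hops, on the undirected graph.
    diameter = max(
        (d for n in adj for d in bfs_distances(adj, n).values()),
        default=0,
    )
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        "entities": type_counts["entity"],
        "activities": type_counts["activity"],
        "agents": type_counts["agent"],
        "wasDerivedFrom": relation_counts["wasDerivedFrom"],
        "diameter": diameter,
    }


def to_vector(metrics):
    """Use a fixed key order so every graph yields a comparable feature vector."""
    return [metrics[key] for key in sorted(metrics)]


if __name__ == "__main__":
    features = network_metrics(NODES, EDGES)
    print(features)
    print(to_vector(features))
```

For this toy graph the diameter is 2 (for example, `ex:report` reaches `ex:alice` only through the activity `ex:analyse`). In the pipeline the abstract describes, one such vector per provenance document would be passed to an established classifier trained on labelled examples; which classifiers and metrics the authors actually used is not stated in this record.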
Pages: 708-735
Number of pages: 27