Prov-Dominoes: An approach for knowledge discovery from provenance data

Times cited: 0

Authors:

Alencar, Victor [1]

Kohwalter, Troy [2]

Braganholo, Vanessa [2]

Da Silva Junior, Jose Ricardo [3,4]

Murta, Leonardo [2]

Affiliations:

[1] CASNAV, Brazilian Navy, Rio De Janeiro, RJ, Brazil

[2] Univ Fed Fluminense, Inst Computacao, Niteroi, RJ, Brazil

[3] IFRJ, Dept Computacao, Niteroi, RJ, Brazil

[4] Inst Fed Rio Janeiro, Niteroi, RJ, Brazil

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 245

Keywords:

Knowledge discovery; Data analysis; Provenance; GPU computing; Visualization; Model

DOI:

10.1016/j.eswa.2023.123030

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Provenance has become increasingly relevant to understanding, auditing, and reproducing computational tasks. However, provenance analysis can overwhelm users because of the large volume of data, the many relationships among data items, and the implicit information buried in the data. Existing provenance analysis tools either rely on visual exploration (which becomes overwhelming for large provenance graphs) or do not support the exploration of implicit provenance data, such as the inferences defined by the PROV Data Model constraints. To fill this gap, we introduce Prov-Dominoes, a tool designed to enable interactive knowledge discovery on provenance data. Prov-Dominoes promotes the provenance relationships among entities, activities, and agents to first-class elements represented as domino tiles. It allows users to combine and compose these tiles visually and interactively, with GPU acceleration. The benefits of Prov-Dominoes are threefold: first, it displays provenance data as matrices, a more compact representation than graphs; second, it allows users to easily explore implicit information; third, it can efficiently process large datasets using GPUs. We evaluated Prov-Dominoes in distinct case studies, which allowed us to observe it in action. We also evaluated the performance of sequential combinations executed in Prov-Dominoes on provenance data with thousands of relations, contrasting their execution on GPU and CPU. The results showed that, for a large dataset, the GPU execution was more than a hundred times faster than the CPU execution.
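The abstract describes relationships stored as matrices and tiles that can be composed to expose implicit information. The sketch below illustrates that idea under a stated assumption: each tile is treated as a count matrix between two kinds of PROV elements, and combining two tiles is treated as a matrix product over their shared middle elements. The names `Tile`, `combine`, and `pairs`, and the sample entities, activities, and agents, are hypothetical and not taken from Prov-Dominoes itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Tile:
    """Hypothetical model of a domino tile: a relation matrix between two element types."""
    rows: List[str]          # labels of the row elements (e.g., entities)
    cols: List[str]          # labels of the column elements (e.g., activities)
    cells: List[List[int]]   # cells[i][j] counts links from rows[i] to cols[j]


def combine(left: Tile, right: Tile) -> Tile:
    """Compose an X-by-Y tile with a Y-by-Z tile into an X-by-Z tile (matrix product).

    Assumption: composition is modeled as plain matrix multiplication, so each
    output cell counts the paths rows[i] -> middle element -> cols[j].
    """
    if left.cols != right.rows:
        raise ValueError("tiles do not share the same middle elements")
    inner = range(len(left.cols))
    cells = [
        [sum(left.cells[i][k] * right.cells[k][j] for k in inner)
         for j in range(len(right.cols))]
        for i in range(len(left.rows))
    ]
    return Tile(left.rows, right.cols, cells)


def pairs(tile: Tile) -> List[Tuple[str, str]]:
    """List the (row, column) label pairs whose cell is non-zero."""
    return [
        (r, c)
        for i, r in enumerate(tile.rows)
        for j, c in enumerate(tile.cols)
        if tile.cells[i][j] != 0
    ]


# Hypothetical sample data: which activity generated each entity ...
generated_by = Tile(["e1", "e2", "e3"], ["a1", "a2"], [[1, 0], [1, 0], [0, 1]])
# ... and which agents were associated with each activity.
associated_with = Tile(["a1", "a2"], ["alice", "bob"], [[1, 0], [1, 1]])

# Entity-by-agent tile: links each entity to the agents of its generating
# activity, a relationship that neither input tile states directly.
entity_agent = combine(generated_by, associated_with)
print(pairs(entity_agent))
# prints [('e1', 'alice'), ('e2', 'alice'), ('e3', 'alice'), ('e3', 'bob')]
```

Entity e3 reaches both agents because its generating activity a2 was associated with both. This sketch runs on plain nested lists on the CPU; it does not reproduce the GPU acceleration the abstract reports, and it makes no claim that this composition corresponds to any specific PROV constraint inference.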


Pages: 17
