Prov-Dominoes: An approach for knowledge discovery from provenance data

Cited by: 0
Authors
Alencar, Victor [1]
Kohwalter, Troy [2]
Braganholo, Vanessa [2]
Da Silva Junior, Jose Ricardo [3, 4]
Murta, Leonardo [2]
Affiliations
[1] CASNAV, Brazilian Navy, Rio De Janeiro, RJ, Brazil
[2] Univ Fed Fluminense, Inst Computacao, Niteroi, RJ, Brazil
[3] IFRJ, Dept Computacao, Niteroi, RJ, Brazil
[4] Inst Fed Rio Janeiro, Niteroi, RJ, Brazil
Keywords
Knowledge discovery; Data analysis; Provenance; GPU computing; Visualization; Model
DOI
10.1016/j.eswa.2023.123030
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Provenance has become increasingly relevant to understanding, auditing, and reproducing computational tasks. Provenance analysis can often overwhelm users because of the large volume of data, the multiple relationships among data items, and the implicit information buried in the data. Existing provenance analysis tools either rely on visual exploration (which is overwhelming for large provenance graphs) or do not support the exploration of implicit provenance data, such as the inferences of the PROV Data Model Constraints. To fill this gap, we introduce Prov-Dominoes, a tool designed to enable interactive knowledge discovery on provenance data. Prov-Dominoes promotes the provenance relationships among entities, activities, and agents into first-class elements represented by domino tiles. It allows users to combine and compose such domino tiles visually and interactively, using GPUs. The benefits of Prov-Dominoes are threefold: first, it uses matrices to display provenance data, which are more compact than graphs; second, it allows users to easily explore implicit information; third, it can process large datasets efficiently using GPUs. We evaluated Prov-Dominoes in distinct case studies that show the tool in action. We also evaluated the performance of sequential combinations executed in Prov-Dominoes on provenance data with thousands of relations, contrasting their execution on GPU and CPU. The results showed that, for a large dataset, the GPU was more than a hundred times faster than the CPU.
Pages: 17
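The abstract describes provenance relations shown as matrices ("domino tiles") that users combine to surface implicit information. The sketch below illustrates that idea under stated assumptions; it is not the tool's implementation. It assumes that combining two tiles can be modeled as a Boolean matrix product, which the abstract does not confirm. The function name `combine`, the toy relations, and the derived name `depends_on` are all hypothetical. It runs in plain Python on the CPU, whereas the paper reports GPU execution.

```python
def combine(left, right):
    """Boolean matrix product: result[i][j] is 1 when some k links row i to column j.

    Assumption: this stands in for combining two tiles; the tool's actual
    operation is not specified in the abstract.
    """
    if not left or not right or len(left[0]) != len(right):
        raise ValueError("inner dimensions must match")
    inner = len(right)
    cols = len(right[0])
    return [
        [int(any(row[k] and right[k][j] for k in range(inner))) for j in range(cols)]
        for row in left
    ]


# Hypothetical toy data: two activities and three entities.
activities = ["a1", "a2"]
entities = ["e1", "e2", "e3"]

# used[i][j] == 1 means activity i used entity j.
used = [
    [0, 0, 0],  # a1 used nothing
    [1, 1, 0],  # a2 used e1 and e2
]

# generated_by[i][j] == 1 means entity i was generated by activity j.
generated_by = [
    [1, 0],  # e1 was generated by a1
    [0, 0],  # e2 has no recorded generator
    [0, 1],  # e3 was generated by a2
]

# Activity x activity: a2 used an entity (e1) that a1 generated.
depends_on = combine(used, generated_by)

for name, row in zip(activities, depends_on):
    print(name, row)
```

The resulting activity-by-activity matrix makes explicit a dependency that neither input relation records directly, which is the kind of implicit information the abstract refers to.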