Programming knowledge discovery workflows in service-oriented distributed systems

Cited by: 8
Authors
Cesario, Eugenio [1]
Lackovic, Marco [2]
Talia, Domenico [1, 2]
Trunfio, Paolo [2]
Affiliations
[1] ICAR CNR, Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
distributed data mining; workflows; Grid computing; Knowledge Grid
DOI
10.1002/cpe.2936
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In several scientific and business domains, very large data repositories are generated. To find interesting and useful information in those repositories, efficient data mining techniques and knowledge discovery processes must be used. In science, data mining helps researchers form hypotheses and supports their scientific practice; in industry, it turns existing data sources into real value for companies, which can take advantage of the knowledge extracted from their large data sources. Data mining tasks are often composed of multiple stages that may be linked to each other to form various execution flows. Moreover, data mining tasks are often distributed because they involve data and tools located across geographically distributed environments. Therefore, it is fundamental to exploit effective paradigms, such as services and workflows, to model data mining tasks that are both multi-staged and distributed. This paper discusses data mining services and workflows for analyzing scientific data in high-performance distributed environments such as Grids and Clouds. We discuss how basic and complex services can be defined to support distributed data mining tasks in Grids. We also present a workflow formalism and a service-oriented programming framework, named DIS3GNO, for designing and running distributed knowledge discovery processes in the Knowledge Grid system. DIS3GNO supports all the phases of a knowledge discovery process, including composition, execution, and results visualization. After introducing DIS3GNO, we discuss some relevant use cases implemented with it and a performance evaluation of the system. Copyright (C) 2012 John Wiley & Sons, Ltd.
Pages: 1482-1504
Number of pages: 23