Programming knowledge discovery workflows in service-oriented distributed systems

Cited by: 8
Authors
Cesario, Eugenio [1]
Lackovic, Marco [2]
Talia, Domenico [1, 2]
Trunfio, Paolo [2]
Affiliations
[1] ICAR CNR, Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2013, Vol. 25, No. 10
Keywords
distributed data mining; workflows; Grid computing; Knowledge Grid
DOI
10.1002/cpe.2936
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Many scientific and business domains generate very large data repositories. To find interesting and useful information in those repositories, efficient data mining techniques and knowledge discovery processes must be used. In science, data mining helps researchers form hypotheses and supports their scientific practice, whereas in industry it turns existing data sources into real value for companies, which can take advantage of the knowledge extracted from their large data sources. Data mining tasks are often composed of multiple stages that may be linked to each other to form various execution flows. Moreover, data mining tasks are often distributed because they involve data and tools located in geographically distributed environments. It is therefore fundamental to exploit effective paradigms, such as services and workflows, to model data mining tasks that are both multi-staged and distributed. This paper discusses data mining services and workflows for analyzing scientific data in high-performance distributed environments such as Grids and Clouds. We discuss how basic and complex services can be defined to support distributed data mining tasks in Grids. We also present a workflow formalism and a service-oriented programming framework, named DIS3GNO, for designing and running distributed knowledge discovery processes in the Knowledge Grid system. DIS3GNO supports all the phases of a knowledge discovery process, including composition, execution, and results visualization. After introducing DIS3GNO, we discuss some relevant use cases implemented with it and a performance evaluation of the system. Copyright (C) 2012 John Wiley & Sons, Ltd.
Pages: 1482 - 1504
Number of pages: 23
Related papers
50 in total
  • [21] Optimization of Supercontinuum Spectrum Using Genetic Algorithms on Service-Oriented Grids
    Molto, G.
    Arevalillo-Herraez, M.
    Milian, C.
    Zacares, M.
    Hernandez, V.
    Ferrando, A.
    IBERGRID: 3RD IBERIAN GRID INFRASTRUCTURE CONFERENCE PROCEEDINGS, 2009, : 137 - 147
  • [22] A service-oriented WSRF-based architecture for metascheduling on computational Grids
    Molto, G.
    Hernandez, V.
    Alonso, J. M.
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2008, 24 (04): : 317 - 328
  • [23] Allocating QOS-constrained applications in a web service-oriented grid
    Patel, Yash
    Darlington, John
    DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, PROCEEDINGS, 2006, 4317 : 278 - +
  • [24] Service-oriented middleware for financial Monte Carlo simulations on the cell broadband engine
    Rotaru, T.
    Dalheimer, M.
    Pfreundt, F. -J.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2010, 22 (05): : 643 - 657
  • [25] Enabling the simulation of service-oriented computing and provisioning policies for autonomic utility grids
    de Assuncao, Marcos Dias
    Streitberger, Werner
    Eymann, Torsten
    Buyya, Rajkumar
    GRID ECONOMICS AND BUSINESS MODELS, 2007, 4685 : 136 - +
  • [26] Gaussian grid: a computational chemistry experiment over a web service-oriented grid
    Sanna, N.
    Castrignano, T.
    D’Onorio De Meo, P.
    Carrabino, D.
    Grandi, A.
    Morelli, G.
    Caruso, P.
    Barone, V.
    THEORETICAL CHEMISTRY ACCOUNTS, 2007, 117 : 1145 - 1152
  • [27] Gaussian grid: a computational chemistry experiment over a web service-oriented grid
    Sanna, N.
    Castrignano, T.
    De Meo, P. D'Onorio
    Carrabino, D.
    Grandi, A.
    Morelli, G.
    Caruso, P.
    Barone, V.
    THEORETICAL CHEMISTRY ACCOUNTS, 2007, 117 (5-6) : 1145 - 1152
  • [28] A Recursive Distributed Topology Discovery Service for Grid Clients
    Valcarenghi, Luca
    Paolucci, Francesco
    Cugini, Filippo
    Castoldi, Piero
    IEEE COMMUNICATIONS LETTERS, 2009, 13 (07) : 549 - 551
  • [29] Distributed Knowledge Discovery with Non Linear Dimensionality Reduction
    Magdalinos, Panagis
    Vazirgiannis, Michalis
    Valsamou, Dialecti
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II, PROCEEDINGS, 2010, 6119 : 14 - 26
  • [30] An Application of a Service-oriented System to Support ArrayAnnotation in Custom Chip Design for Epigenomic Analysis
    Han, Junghee
    Potter, Dustin
    Kurc, Tahsin
    Singer, Greg
    Yan, Pearlly S.
    Hao, Sun
    Hastings, Shannon
    Langella, Stephen
    Oster, Scott
    Davuluri, Ramana V.
    Huang, Tim H. -M.
    Saltz, Joel H.
    CANCER INFORMATICS, 2008, 6 : 111 - 125