Programming knowledge discovery workflows in service-oriented distributed systems

Cited by: 8
Authors
Cesario, Eugenio [1]
Lackovic, Marco [2]
Talia, Domenico [1, 2]
Trunfio, Paolo [2]
Affiliations
[1] ICAR CNR, Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
distributed data mining; workflows; Grid computing; Knowledge Grid
DOI
10.1002/cpe.2936
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In several scientific and business domains, very large data repositories are generated. To find interesting and useful information in those repositories, efficient data mining techniques and knowledge discovery processes must be used. In science, data mining techniques help scientists form hypotheses and support their scientific practice, whereas in industry, data mining turns existing data sources into real value for companies, which can take advantage of the knowledge extracted from their large data sources. Data mining tasks are often composed of multiple stages that may be linked to each other to form various execution flows. Moreover, data mining tasks are often distributed because they involve data and tools located across geographically distributed environments. Therefore, it is fundamental to exploit effective paradigms, such as services and workflows, to model data mining tasks that are both multi-staged and distributed. This paper discusses data mining services and workflows for analyzing scientific data in high-performance distributed environments such as Grids and Clouds. We discuss how basic and complex services can be defined to support distributed data mining tasks in Grids. We also present a workflow formalism and a service-oriented programming framework, named DIS3GNO, for designing and running distributed knowledge discovery processes in the Knowledge Grid system. DIS3GNO supports all the phases of a knowledge discovery process, including composition, execution, and results visualization. After introducing DIS3GNO, we discuss some relevant use cases implemented with it and a performance evaluation of the system. Copyright (C) 2012 John Wiley & Sons, Ltd.
Pages: 1482-1504
Page count: 23