Optimization and Scheduling Algorithm for Data Intensive Workflows in Distributed Data Mining Architecture

Times cited: 0
Authors
Kakasevski, Gorgi [1 ]
Mishev, Anastas [2 ]
Affiliations
[1] FON Univ, Fac Informat, Skopje, North Macedonia
[2] Ss Cyril & Methodius Univ, Fac Comp Sci & Engn, Skopje, North Macedonia
Source
17TH IEEE INTERNATIONAL CONFERENCE ON SMART TECHNOLOGIES - IEEE EUROCON 2017 CONFERENCE PROCEEDINGS | 2017
Keywords
Grid computing; distributed data mining; scheduling algorithm;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
New Grid and cloud solutions for distributed data mining and data processing are needed to execute data intensive workflows. In contrast to standard workflows, in which jobs exchange data as files and finish once they have processed their input, data intensive workflows receive data organized in blocks that are streamed to their inputs, analyze the data, and produce streamed output. Each job stays active for a long period of time and can receive new data. In our previous work we proposed the Open Grid Service Architecture for Data Mining (OGSA-DM), which is capable of executing data intensive workflows. According to our analysis, current workflow scheduling algorithms cannot be applied to data intensive workflows because they produce unsatisfactory results and cannot guarantee an optimal solution. In this paper we propose a new optimization and scheduling algorithm that builds on the advantages of data intensive workflows. In several experiments we show that the proposed algorithm works and gives satisfactory results.
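The contrast the abstract draws between file-based jobs and streamed jobs can be illustrated with a minimal Python sketch. This is not the OGSA-DM implementation and not the proposed scheduling algorithm; the names `batch_job`, `streaming_job`, and `scale`, and the running-mean computation, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List


def batch_job(input_data: List[float]) -> float:
    """Standard workflow job (illustrative): consume the whole input,
    return one result, and finish."""
    return sum(input_data) / len(input_data)


def streaming_job(blocks: Iterable[List[float]]) -> Iterator[float]:
    """Data intensive job (illustrative): stay active across blocks and
    emit one output value per incoming block."""
    count = 0
    total = 0.0
    for block in blocks:
        count += len(block)
        total += sum(block)
        yield total / count  # running mean over all data seen so far


def scale(stream: Iterable[float], factor: float) -> Iterator[float]:
    """A downstream job (illustrative) that consumes the upstream output
    stream as it is produced, without waiting for it to end."""
    for value in stream:
        yield value * factor


# Batch: a single result after all input has been read.
batch_result = batch_job([1.0, 2.0, 3.0, 4.0])

# Streaming: two chained jobs, each emitting output block by block.
stream_result = list(scale(streaming_job([[1.0, 2.0], [3.0, 4.0]]), 2.0))
```

Generators are used here because they keep per-job state between blocks, which mirrors the abstract's description of a job that remains active and can receive new data.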
Pages: 775-780
Number of pages: 6