An effective reliability-driven technique of allocating tasks on heterogeneous cluster systems

Times cited: 19

Authors:

Tang, Xiaoyong [1, 2]

Li, Kenli [1]

Liao, Guiping [3]

Affiliations:

[1] Hunan Univ, Sch Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China

[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210046, Jiangsu, Peoples R China

[3] Hunan Agr Univ, Informat Sci & Technol Coll, Changsha 410128, Hunan, Peoples R China

Source:

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2014 / Vol. 17 / No. 4

Funding:

U.S. National Science Foundation;

Keywords:

Reliability analysis; Heterogeneous cluster systems; Scheduling algorithm; Duplication; Weibull distribution; REPLICATION; PERFORMANCE; ALGORITHMS; TIME;

DOI:

10.1007/s10586-014-0372-1

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

In large-scale heterogeneous cluster computing systems, processor and network failures are inevitable and can adversely affect the applications executing on such systems. One way of taking failures into account is to employ a reliable scheduling algorithm. However, most existing scheduling algorithms for precedence-constrained tasks in heterogeneous systems consider only the schedule length and do not efficiently satisfy the reliability requirements of tasks. To address this problem, we build an application reliability analysis model based on the Weibull distribution, which can dynamically measure the reliability of tasks executing on heterogeneous clusters with arbitrary network architectures. We then propose a reliability-driven earliest finish time with duplication scheduling algorithm (REFTD), which incorporates task reliability overhead into scheduling. Furthermore, to improve system reliability, REFTD duplicates a task if its hazard rate exceeds a threshold. A comparison study based on both randomly generated graphs and the graphs of some real applications shows that our scheduling algorithm can significantly shorten the schedule length and improve system reliability.
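The abstract names two ingredients: a Weibull-based reliability model and a rule that duplicates a task when its hazard rate exceeds a threshold. The sketch below illustrates both using the standard Weibull survival function R(t) = exp(-(t/eta)^beta) and hazard rate h(t) = (beta/eta)(t/eta)^(beta-1). It is not the authors' REFTD implementation. The `Processor` class, per-processor scale/shape parameters, evaluating the hazard at the task's finish time, conditioning on the processor being up when the task starts, and the independent-failure formula for a duplicated task are all assumptions made for illustration. Network link failures are not modeled here.

```python
import math
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical processor model with a Weibull failure law (assumption)."""
    name: str
    eta: float   # scale parameter (characteristic life), same time unit as task times
    beta: float  # shape: < 1 infant mortality, == 1 exponential, > 1 wear-out


def reliability(p: Processor, t: float) -> float:
    """Weibull survival function R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / p.eta) ** p.beta))


def hazard(p: Processor, t: float) -> float:
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    if t <= 0:
        raise ValueError("hazard rate is evaluated only for t > 0")
    return (p.beta / p.eta) * (t / p.eta) ** (p.beta - 1)


def task_reliability(p: Processor, start: float, duration: float) -> float:
    """Probability that p survives [start, start + duration], given it is up at start."""
    return reliability(p, start + duration) / reliability(p, start)


def combined_reliability(r_primary: float, r_copy: float) -> float:
    """Assuming independent failures, a duplicated task fails only if both copies fail."""
    return 1.0 - (1.0 - r_primary) * (1.0 - r_copy)


def should_duplicate(p: Processor, start: float, duration: float, threshold: float) -> bool:
    """Hypothetical rule: duplicate when the hazard at the task's finish time exceeds threshold."""
    return hazard(p, start + duration) > threshold


if __name__ == "__main__":
    # A wear-out processor (beta > 1): its hazard rate grows over time.
    p = Processor("p1", eta=100.0, beta=2.0)
    print(should_duplicate(p, start=0.0, duration=10.0, threshold=0.005))   # hazard 0.002
    print(should_duplicate(p, start=40.0, duration=10.0, threshold=0.005))  # hazard 0.01
```

With beta > 1, the same task is more likely to be duplicated when it is scheduled late on an aging processor. This is the kind of time-dependent behavior a Weibull model captures and a constant-rate exponential model does not.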


Pages: 1413-1425

Number of pages: 13

References (29 in total):

[1] [Anonymous], 1979, COMPUTERS INTRACTABI

[2] A hybrid policy for fault tolerant load balancing in grid computing environments [J]. Balasangameshwara, Jasma; Raju, Nedunchezhian. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2012, 35(01): 412-422

[3] COMPUTATIONAL-COMPLEXITY OF NETWORK RELIABILITY-ANALYSIS - AN OVERVIEW [J]. BALL, MO. IEEE TRANSACTIONS ON RELIABILITY, 1986, 35(03): 230-239

[4] Network modeling issues for Grid application scheduling [J]. Casanova, H. INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE, 2005, 16(02): 145-162

[5] A comparative study of exponential distribution vs Weibull distribution in machine reliability analysis in a CMS design [J]. Das, K. COMPUTERS & INDUSTRIAL ENGINEERING, 2008, 54(01): 12-33

[6] Matching and scheduling algorithms for minimizing execution time and failure probability of applications in heterogeneous computing [J]. Dogan, A; Özgüner, F. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2002, 13(03): 308-323

[7] Enhancing performance of failure-prone clusters by adaptive provisioning of cloud resources [J]. Javadi, Bahman; Thulasiraman, Parimala; Buyya, Rajkumar. JOURNAL OF SUPERCOMPUTING, 2013, 63(02): 467-489

[8] Optimizing performance and reliability on heterogeneous parallel systems: Approximation algorithms and heuristics [J]. Jeannot, Emmanuel; Saule, Erik; Trystram, Denis. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2012, 72(02): 268-280

[9] Jin H, 2009, CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, P236, DOI 10.1109/CCGRID.2009.55

[10] Scheduling for heterogeneous systems using constrained critical paths [J]. Khan, Minhaj Ahmad. PARALLEL COMPUTING, 2012, 38(4-5): 175-193
