Predicting Transient Downtime in Virtual Server Systems: An Efficient Sample Path Randomization Approach

Cited by: 8

Authors:

Du, Anna Ye [1]

Das, Sanjukta [1]

Yang, Zhouhan [2]

Qiao, Chunming [2]

Ramesh, R. [1]

Affiliations:

[1] SUNY Buffalo, Dept Management Sci & Syst, Buffalo, NY 14260 USA

[2] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2015 / Vol. 64 / No. 12

Funding:

U.S. National Science Foundation;

Keywords:

Cloud computing; virtual infrastructure; fault-tolerant systems; Markov chains; REPAIRABLE COMPUTER-SYSTEMS; FAULT-TOLERANT SYSTEMS; MARKOV-MODELS; PERFORMABILITY; DEPENDABILITY; DISTRIBUTIONS; AVAILABILITY;

DOI:

10.1109/TC.2015.2394437

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

A central challenge in developing cloud datacenters Service Level Agreements is the estimation of downtime distribution of a set of provisioned servers over a service window, which is compounded by three facts. First, while steady-state probabilities have been derived for birth-death processes involving server failures and repairs, they could be highly inaccurate under transience. Furthermore, steady-state cannot be assured under typical service windows. Therefore, estimation of transient distributions is essential. Second, the processes of failures and repairs may follow any distribution and hence need to be extracted using system log data and modeled using appropriate general distributions. Third, downtime distributions over service windows depend on the number of servers and their deployment structure for a contract. We develop an efficient and generalized sample path randomization approach to precisely estimate transient probabilities under three different checkpointing strategies and three flexible failure distribution models. The estimators are unbiased, consistent, efficient and sufficient. Their asymptotic convergence is established. The estimation algorithms are computationally efficient in solving practical problems and yield rich information on transient system behaviors. The methodology is general and extensible to various server failure and repair processes characterized using birth-death modeling.


Pages: 3541-3554

Page count: 14
