Google hostload prediction based on Bayesian model with optimized feature combination

Cited by: 40
Authors
Di, Sheng [1]
Kondo, Derrick [1]
Cirne, Walfredo [2]
Affiliations
[1] INRIA, Grenoble, France
[2] Google Inc, Mountain View, CA 94043 USA
Keywords
Hostload prediction; Bayesian model; Google data center
DOI
10.1016/j.jpdc.2013.10.001
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
We design a novel prediction method based on a Bayes model to predict a load fluctuation pattern over a long-term interval, in the context of Google data centers. We exploit a set of features that capture the expectation, trend, stability and patterns of recent host loads. We also investigate the correlations among these features and explore the most effective combinations of features with various training periods. All of the prediction methods are evaluated using a Google trace with 10,000+ heterogeneous hosts. Experiments show that our Bayes method improves the long-term load prediction accuracy by 5.6%-50%, compared to other state-of-the-art methods based on moving average, auto-regression, and/or noise filters. The mean squared error of pattern prediction with the Bayes method is approximately bounded within [10^-8, 10^-5]. Through a load balancing scenario, we confirm that the precision of pattern prediction in finding a set of idlest/busiest hosts from among 10,000+ hosts can be improved by about 7% on average. © 2013 Elsevier Inc. All rights reserved.
Pages: 1820-1832
Page count: 13