On the use of hybrid reinforcement learning for autonomic resource allocation

Cited by: 45
Authors
Tesauro, Gerald
Jong, Nicholas K.
Das, Rajarshi
Bennani, Mohamed N.
Affiliations
[1] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA
[2] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
[3] Oracle Inc, Portland, OR 97204 USA
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2007, Vol. 10, No. 3
Keywords
reinforcement learning; resource allocation; performance management; policy learning;
DOI
10.1007/s10586-007-0035-6
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline classification code
0812;
Abstract
Reinforcement Learning (RL) provides a promising new approach to systems performance management that differs radically from standard queuing-theoretic approaches making use of explicit system performance models. In principle, RL can automatically learn high-quality management policies without an explicit performance model or traffic model, and with little or no built-in system-specific knowledge. In our original work (Das, R., Tesauro, G., Walsh, W.E.: IBM Research, Tech. Rep. RC23802 (2005); Tesauro, G.: In: Proc. of AAAI-05, pp. 886-891 (2005); Tesauro, G., Das, R., Walsh, W.E., Kephart, J.O.: In: Proc. of ICAC-05, pp. 342-343 (2005)) we showed the feasibility of using online RL to learn resource valuation estimates (in lookup table form) which can be used to make high-quality server allocation decisions in a multi-application prototype Data Center scenario. The present work shows how to combine the strengths of both RL and queuing models in a hybrid approach, in which RL trains offline on data collected while a queuing model policy controls the system. By training offline we avoid suffering potentially poor performance in live online training. We also now use RL to train nonlinear function approximators (e.g., multi-layer perceptrons) instead of lookup tables; this enables scaling to substantially larger state spaces. Our results show that, in both open-loop and closed-loop traffic, hybrid RL training can achieve significant performance improvements over a variety of initial model-based policies. We also find that, as expected, RL can deal effectively with both transients and switching delays, which lie outside the scope of traditional steady-state queuing theory.
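The hybrid scheme sketched in the abstract can be illustrated with a short, self-contained Python example: transitions are logged while a model-based allocation policy controls a (here, simulated) system, and a multi-layer-perceptron value function is then trained offline on that log with fitted SARSA(0)-style targets. Everything below, including the toy demand model, the reward shape, the MLPRegressor settings, and helper names such as model_policy and rl_policy, is an illustrative assumption, not the authors' actual environment or implementation.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N_SERVERS = 8    # servers available to one application (illustrative)
GAMMA = 0.5      # discount factor for the SARSA(0) targets

def reward(demand, n_alloc):
    # Hypothetical utility: value of served demand minus a per-server cost.
    served = min(demand, 10.0 * n_alloc)
    return served - 1.5 * n_alloc

def model_policy(demand):
    # Stand-in for the initial queuing-model policy: roughly one server
    # per 10 units of expected demand.
    return int(np.clip(round(demand / 10.0), 1, N_SERVERS))

# Phase 1: log (state, action, reward, next_state, next_action) tuples
# while the model-based policy controls the (simulated) system.
demand, log = 40.0, []
for _ in range(5000):
    a = model_policy(demand)
    r = reward(demand, a)
    next_demand = float(np.clip(demand + rng.normal(0.0, 8.0), 0.0, 100.0))
    log.append((demand, a, r, next_demand, model_policy(next_demand)))
    demand = next_demand
log = np.array(log)
sa, rewards, next_sa = log[:, 0:2], log[:, 2], log[:, 3:5]

# Phase 2: offline training of the nonlinear value-function approximator.
# Fitted SARSA(0): targets use the current Q estimate at the *logged*
# next action, i.e. the action the behavior policy actually took.
q_net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)
q_net.fit(sa, rewards)                  # initialize toward immediate reward
for _ in range(10):
    targets = rewards + GAMMA * q_net.predict(next_sa)
    q_net.fit(sa, targets)

# Deployment: allocate greedily with respect to the learned Q values.
def rl_policy(demand):
    candidates = np.array([[demand, a] for a in range(1, N_SERVERS + 1)])
    return int(np.argmax(q_net.predict(candidates))) + 1

print("demand=25 ->", rl_policy(25.0), "servers (model policy:", model_policy(25.0), ")")

Because training happens entirely on the logged data, the live system is never exposed to an untrained policy; the main caveat of such a sketch is that the learned Q values are extrapolated for allocations the behavior policy rarely chose.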
Pages: 287-299
Number of pages: 13