A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems

被引:0
作者
D. E. Koulouriotis
A. Xanthopoulos
机构
[1] Democritus University of Thrace,Department of Production and Management Engineering, School of Engineering
关键词
Multi-armed bandits; Exploitation-exploration trade-off; Evolutionary algorithm; Heuristic techniques;
D O I
10.1007/s12351-008-0007-5
中图分类号
学科分类号
摘要
In this paper we present an evolutionary approach for the finite-horizon, undiscounted multi-armed bandit problem. The evolutionary algorithm that we designed exhibits a number of novel features dictated by its intended application to the bandit problem. We also present five efficient ad hoc techniques for solving the multi-armed bandit problem that exist in the literature. In order to gain insight on the presented algorithms’ behaviour and compare their performance, we carried out a series of simulation experiments. We present the numerical results and then discuss the way in which performance is affected by parameter selection. The paper is concluded with a number of empirical suggestions on how to select a suitable evolutionary algorithm parameter set for a particular bandit task along with some directions for future research.
引用
收藏
页码:105 / 122
页数:17
相关论文
共 25 条
[1]  
Azoulay-Schwartz R(2004)Exploitation vs. exploration: choosing a supplier in an environment of incomplete information Decis Support Syst 38 1-18
[2]  
Kraus S(1994)Switching costs and the Gittins index Econometrica 62 687-694
[3]  
Wilkenfeld J(2002)Optimal learning and experimentation in bandit problems J Econ Dyn Control 27 87-107
[4]  
Banks JS(2005)Information technology project failures: applying the bandit problem to evaluate managerial decision making Inf Manag Comp Secur 13 135-143
[5]  
Sundaram RK(2000)Do evolutionary processes minimize expected losses? J Theor Biol 207 117-123
[6]  
Brezzi M(2004)A survey on the bandit problem with switching costs Economist 152 513-541
[7]  
Lai TL(1996)Reinforcement learning: a survey J Artif Intell Res 4 237-285
[8]  
Chulkov DV(1998)Multi-armed bandits in discrete and continuous time Ann Appl Probab 8 1270-1290
[9]  
Desai MS(2001)Dynamic pricing on the internet: theory and simulations Electron Commer Res 1 265-276
[10]  
Fogel DB(1981)Systematic search, related information and the Gittins’ index Econ Lett 8 327-333