A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems

被引：0

作者：

D. E. Koulouriotis

A. Xanthopoulos

机构：

[1] Democritus University of Thrace,Department of Production and Management Engineering, School of Engineering

来源：

Operational Research | 2008年 / 8卷 / 2期

关键词：

Multi-armed bandits; Exploitation-exploration trade-off; Evolutionary algorithm; Heuristic techniques;

D O I：

10.1007/s12351-008-0007-5

中图分类号：

学科分类号：

摘要：

In this paper we present an evolutionary approach for the finite-horizon, undiscounted multi-armed bandit problem. The evolutionary algorithm that we designed exhibits a number of novel features dictated by its intended application to the bandit problem. We also present five efficient ad hoc techniques for solving the multi-armed bandit problem that exist in the literature. In order to gain insight on the presented algorithms’ behaviour and compare their performance, we carried out a series of simulation experiments. We present the numerical results and then discuss the way in which performance is affected by parameter selection. The paper is concluded with a number of empirical suggestions on how to select a suitable evolutionary algorithm parameter set for a particular bandit task along with some directions for future research.

引用

页码：105 / 122

页数：17

共 25 条

[1]

Azoulay-Schwartz R(2004)Exploitation vs. exploration: choosing a supplier in an environment of incomplete information Decis Support Syst 38 1-18

[2]

Kraus S(1994)Switching costs and the Gittins index Econometrica 62 687-694

[3]

Wilkenfeld J(2002)Optimal learning and experimentation in bandit problems J Econ Dyn Control 27 87-107

[4]

Banks JS(2005)Information technology project failures: applying the bandit problem to evaluate managerial decision making Inf Manag Comp Secur 13 135-143

[5]

Sundaram RK(2000)Do evolutionary processes minimize expected losses? J Theor Biol 207 117-123

[6]

Brezzi M(2004)A survey on the bandit problem with switching costs Economist 152 513-541

[7]

Lai TL(1996)Reinforcement learning: a survey J Artif Intell Res 4 237-285

[8]

Chulkov DV(1998)Multi-armed bandits in discrete and continuous time Ann Appl Probab 8 1270-1290

[9]

Desai MS(2001)Dynamic pricing on the internet: theory and simulations Electron Commer Res 1 265-276

[10]

Fogel DB(1981)Systematic search, related information and the Gittins’ index Econ Lett 8 327-333

← 1 2 3 →