Distributed policy search reinforcement learning for job-shop scheduling tasks

Cited by: 61

Authors:

Gabel, Thomas [1]

Riedmiller, Martin [1]

Affiliations:

[1] Univ Freiburg, Machine Learning Lab, Dept Comp Sci, Freiburg, Germany

Source:

INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH | 2012 / Vol. 50 / Issue 01

Keywords:

distributed reinforcement learning; job-shop scheduling; distributed control; multi-agent systems; policy search; ALGORITHMS;

DOI:

10.1080/00207543.2011.571443

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

We interpret job-shop scheduling problems as sequential decision problems that are handled by independent learning agents. These agents act completely decoupled from one another and employ probabilistic dispatching policies for which we propose a compact representation using a small set of real-valued parameters. During ongoing learning, the agents adapt these parameters using policy gradient reinforcement learning, with the aim of improving the performance of the joint policy measured in terms of a standard scheduling objective function. Moreover, we suggest a lightweight communication mechanism that enhances the agents' capabilities beyond purely reactive job dispatching. We evaluate the effectiveness of our learning approach using various deterministic as well as stochastic job-shop scheduling benchmark problems, demonstrating that the utilisation of policy gradient methods can be effective and beneficial for scheduling problems.
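The idea of a probabilistic dispatching policy adapted by policy gradient learning can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the representation or update rule from the paper: the agent keeps one real-valued parameter per job, picks among waiting jobs with a softmax over those parameters, and applies a REINFORCE-style update in which the reward stands for a scheduling objective such as negative makespan. The names `dispatch_probs`, `sample_job`, `log_prob_gradient`, and `reinforce_update` are hypothetical.

```python
import math
import random


def dispatch_probs(theta, waiting):
    """Softmax over the parameters of the jobs currently waiting (assumed policy form)."""
    m = max(theta[j] for j in waiting)  # subtract the max for numerical stability
    weights = [math.exp(theta[j] - m) for j in waiting]
    total = sum(weights)
    return [w / total for w in weights]


def sample_job(theta, waiting, rng):
    """Draw the next job to dispatch according to the stochastic policy."""
    probs = dispatch_probs(theta, waiting)
    return rng.choices(waiting, weights=probs, k=1)[0]


def log_prob_gradient(theta, waiting, chosen):
    """Gradient of log pi(chosen | waiting) w.r.t. theta: indicator minus probability."""
    grad = {j: 0.0 for j in theta}
    for j, p in zip(waiting, dispatch_probs(theta, waiting)):
        grad[j] = (1.0 if j == chosen else 0.0) - p
    return grad


def reinforce_update(theta, episode, reward, baseline, lr):
    """One REINFORCE-style step over an episode of (waiting, chosen) decisions.

    reward is a scalar episode return, e.g. negative makespan (an assumption here).
    """
    new_theta = dict(theta)
    for waiting, chosen in episode:
        grad = log_prob_gradient(theta, waiting, chosen)
        for j, g in grad.items():
            new_theta[j] += lr * (reward - baseline) * g
    return new_theta


# Usage: one agent, three jobs, a single dispatching decision in the episode.
theta = {"a": 0.0, "b": 0.0, "c": 0.0}
rng = random.Random(0)
first = sample_job(theta, ["a", "b"], rng)
updated = reinforce_update(theta, [(["a", "b"], "a")], reward=-10.0, baseline=-12.0, lr=0.1)
```

In this sketch, an episode that ends better than the baseline raises the parameters of the jobs that were chosen and lowers those of the waiting jobs that were passed over, so the same choices become more likely in later episodes.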


Pages: 41-61

Number of pages: 21
