Routing in Reinforcement Learning Markov Chains

被引：0

作者：

Moll, Maximilian ^{[1
]}

Weller, Dominic ^{[1
]}

机构：

[1] Univ Bundeswehr Munchen, Werner Heisenberg Weg 39, D-85577 Neubiberg, Germany

来源：

OPERATIONS RESEARCH PROCEEDINGS 2021 | 2022年

关键词：

Reinforcement Learning; Routing; ASTERISK; GO;

D O I：

10.1007/978-3-031-08623-6_60

中图分类号：

C93 [管理学];

学科分类号：

12 ; 1201 ; 1202 ; 120202 ;

摘要：

With computers beating human players in challenging games like Chess, Go, and StarCraft, Reinforcement Learning has gained much attention recently. The growing field of this data-driven approach to control theory has produced various promising algorithms that combine simulation for data generation, optimization, and often bootstrapping. However, underneath each of those lies the assumption that the problem can be cast as a Markov Decision Process, which extends the usual Markov Chain by assigning controls and resulting rewards to each potential transition. This assumption implies that the underlyingMarkov Chain and the reward, the data equivalent of an inverse cost function, form a weighted network. Consequently, the optimization problem in Reinforcement Learning can be translated to a routing problem in such possibly immense and largely unknown networks. This paper analyzes this novel interpretation and provides some first approaches to its solution.

引用

页码：409 / 414

页数：6

共 12 条

[1]

Brockman G, 2016, Arxiv, DOI arXiv:1606.01540

[2]

Cui X, 2011, INT J COMPUT SCI NET, V11, P125

[3] PHA*: Finding the shortest path with a in an unknown physical environment [J].

Felner, A ;

Stern, R ;

Ben-Yair, A ;

Kraus, S ;

Netanyahu, N .

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2004, 21 :631-670

[4] A reinforcement learning approach for optimizing multiple traveling salesman problems over graphs [J].

Hu, Yujiao ;

Yao, Yuan ;

Lee, Wee Sun .

KNOWLEDGE-BASED SYSTEMS, 2020, 204

[5] Reinforcement learning for combinatorial optimization: A survey [J].

Mazyavkina, Nina ;

Sviridov, Sergey ;

Ivanov, Sergei ;

Burnaev, Evgeny .

COMPUTERS & OPERATIONS RESEARCH, 2021, 134

[6]

Mnih V, 2013, Arxiv, DOI [arXiv:1312.5602, DOI 10.48550/ARXIV.1312.5602]

[7]

Moll M, 2017, IN C IND ENG ENG MAN, P1047, DOI 10.1109/IEEM.2017.8290052

[8]

Moore A. W., 1990, Technical report

[9] A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play [J].

Silver, David ;

Hubert, Thomas ;

Schrittwieser, Julian ;

Antonoglou, Ioannis ;

Lai, Matthew ;

Guez, Arthur ;

Lanctot, Marc ;

Sifre, Laurent ;

Kumaran, Dharshan ;

Graepel, Thore ;

Lillicrap, Timothy ;

Simonyan, Karen ;

Hassabis, Demis .

SCIENCE, 2018, 362 (6419) :1140-+

[10] Mastering the game of Go without human knowledge [J].

Silver, David ;

Schrittwieser, Julian ;

Simonyan, Karen ;

Antonoglou, Ioannis ;

Huang, Aja ;

Guez, Arthur ;

Hubert, Thomas ;

Baker, Lucas ;

Lai, Matthew ;

Bolton, Adrian ;

Chen, Yutian ;

Lillicrap, Timothy ;

Hui, Fan ;

Sifre, Laurent ;

van den Driessche, George ;

Graepel, Thore ;

Hassabis, Demis .

NATURE, 2017, 550 (7676) :354-+

← 1 2 →