Acquisition of Automated Guided Vehicle Route Planning Policy Using Deep Reinforcement Learning

Cited by: 0
Authors
Kamoshida, Ryota [1 ]
Kazama, Yoriko [1 ]
Affiliations
[1] Hitachi Ltd, Res & Dev Grp, Ctr Technol Innovat Syst Engn, Kokubunji, Tokyo 1858601, Japan
Source
2017 6th IEEE International Conference on Advanced Logistics and Transport (ICALT) | 2017
Keywords
Automated guided vehicle; order picking; warehouse; reinforcement learning; deep learning;
DOI
Not available
CLC classification number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code
0808; 0809
Abstract
Automated guided vehicle (AGV) systems are widely used in warehouses to improve productivity and reduce costs. In almost every warehouse, order picking is the most costly activity, and the picker's travel time is its dominant component. To eliminate this travel time, we have developed a picking system in which AGVs transport entire shelves containing the required items to the pickers, instead of the pickers walking to the shelves, which improves the efficiency of picking. To minimize the time pickers spend waiting for shelves, an intelligent AGV control method such as route planning is required. Existing approaches apply reinforcement learning to this problem, but reinforcement learning typically requires a hand-engineered low-dimensional state representation, which discards some state information. In this paper, we present an AGV route planning method for an AGV picking system based on deep reinforcement learning. The method takes raw high-dimensional map information as input instead of a hand-engineered low-dimensional state representation and thereby acquires a successful AGV route planning policy. We evaluated the proposed method with an AGV picking system simulator and found that it outperforms other route planning strategies, including our previous method.
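To illustrate the kind of approach the abstract describes, the following is a minimal sketch (not the authors' implementation) of a DQN-style Q-network that consumes a raw grid-map observation directly rather than hand-engineered features. The channel layout, layer sizes, action set, and the choice of PyTorch are illustrative assumptions.

import torch
import torch.nn as nn

class MapQNetwork(nn.Module):
    """Q-network over a raw warehouse map (assumed channels: walls, shelves, AGVs, pickers)."""
    def __init__(self, in_channels: int = 4, map_size: int = 16, n_actions: int = 5):
        super().__init__()
        # Convolutions consume the raw map directly, avoiding hand-engineered
        # low-dimensional state features.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * map_size * map_size, 256), nn.ReLU(),
            nn.Linear(256, n_actions),  # one Q-value per move, e.g. up/down/left/right/wait
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.conv(obs))

# Epsilon-greedy action selection for one AGV on a toy 16x16 map observation.
net = MapQNetwork()
obs = torch.zeros(1, 4, 16, 16)                    # batch of one raw map observation
epsilon = 0.1
if torch.rand(1).item() < epsilon:
    action = torch.randint(0, 5, (1,)).item()      # explore: random move
else:
    action = net(obs).argmax(dim=1).item()         # exploit: best move under current Q-values

In a full training loop, such a network would be updated with standard deep Q-learning (experience replay and a target network, as in Mnih et al., 2015); this sketch only shows how the raw map can serve as the state input.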
Pages: 1-6 (6 pages)