Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching

Times cited: 96

Authors:

Wang, Zhaodong [1]

Qin, Zhiwei [2]

Tang, Xiaocheng [2]

Ye, Jieping [3]

Zhu, Hongtu [3]

Affiliations:

[1] Washington State University, Pullman, WA 99164, USA

[2] DiDi Research America, Mountain View, CA, USA

[3] DiDi Chuxing, Beijing, People's Republic of China

Source:

2018 IEEE International Conference on Data Mining (ICDM) | 2018

Keywords:

Ride dispatching; Deep reinforcement learning; Transfer learning; Spatio-temporal mining

DOI:

10.1109/ICDM.2018.00077

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Ride dispatching is a central operation task on a ride-sharing platform to continuously match drivers to trip-requesting passengers. In this work, we model the ride dispatching problem as a Markov Decision Process and propose learning solutions based on deep Q-networks with action search to optimize the dispatching policy for drivers on ride-sharing platforms. We train and evaluate dispatching agents for this challenging decision task using real-world spatio-temporal trip data from the DiDi ride-sharing platform. A large-scale dispatching system typically supports many geographical locations with diverse demand-supply settings. To increase learning adaptability and efficiency, we propose a new transfer learning method Correlated Feature Progressive Transfer, along with two existing methods, enabling knowledge transfer in both spatial and temporal spaces. Through an extensive set of experiments, we demonstrate the learning and optimization capabilities of our deep reinforcement learning algorithms. We further show that dispatching policies learned by transferring knowledge from a source city to target cities or across temporal space within the same city significantly outperform those without transfer learning.
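The abstract describes dispatching as a Markov Decision Process solved with deep Q-networks plus "action search". As a rough, hypothetical illustration of what searching over actions can mean inside a Q-learning update, the sketch below computes a one-step target by maximizing a target network's value over a finite set of candidate next actions instead of the full action space. The state and action tuples, the `td_target` function, the fixed discount `gamma`, and however the candidate set would be built (for example from nearby historical trips) are all assumptions for illustration; this is not the authors' implementation.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical encodings (assumptions, not from the paper):
# a state is a driver's (lat, lng, time); an action is a destination (lat, lng, time).
State = Tuple[float, float, float]
Action = Tuple[float, float, float]


def td_target(
    reward: float,
    next_state: State,
    candidate_actions: Sequence[Action],
    q_target: Callable[[State, Action], float],
    gamma: float = 0.9,
    terminal: bool = False,
) -> float:
    """One-step Q-learning target, with the max restricted to a finite candidate set.

    y = r                                         if the episode ends here
    y = r + gamma * max_{a in candidates} Q'(s', a)  otherwise
    """
    # Design choice (assumption): with no feasible candidates, treat s' as terminal.
    if terminal or not candidate_actions:
        return reward
    # "Action search": evaluate the target network only on the candidate actions.
    best_next = max(q_target(next_state, a) for a in candidate_actions)
    return reward + gamma * best_next


def toy_q(state: State, action: Action) -> float:
    # Toy stand-in for a target network: value is the action's first coordinate.
    return action[0]


if __name__ == "__main__":
    candidates = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(td_target(1.0, (0.0, 0.0, 0.0), candidates, toy_q, gamma=0.5))  # 2.5
```

Restricting the max to a sampled candidate set keeps each update cheap when the action space (every reachable destination and arrival time) is too large to enumerate; the quality of the target then depends on how representative the candidate set is.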


Pages: 617-626

Number of pages: 10

References (26 in total):

[1] [Anonymous], 2017, NIPS

[2] [Anonymous], 2009, 8 INT C AUT AG MULT

[3] [Anonymous], 2017, ARXIV170708475

[4] [Anonymous], 2016, P 4 INT C LEARN REPR

[5] [Anonymous], 1998, REINFORCEMENT LEARNI

[6] [Anonymous], 2016, NEURAL INFORM PROCES

[7] [Anonymous], 1998, COMBINATORIAL OPTIMI

[8] Hinton, G. E.; Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks [J]. Science, 2006, 313 (5786): 504-507

[9] Kirkpatrick, James; Pascanu, Razvan; Rabinowitz, Neil; Veness, Joel; Desjardins, Guillaume; Rusu, Andrei A.; Milan, Kieran; Quan, John; Ramalho, Tiago; Grabska-Barwinska, Agnieszka; Hassabis, Demis; Clopath, Claudia; Kumaran, Dharshan; Hadsell, Raia. Overcoming catastrophic forgetting in neural networks [J]. Proceedings of the National Academy of Sciences of the United States of America, 2017, 114 (13): 3521-3526

[10] Lee, DH; Wang, H; Cheu, RL; Teo, SH. Taxi dispatch system based on current demands and real-time traffic conditions [J]. Transportation Network Modeling 2004, 2004, (1882): 193-200
