Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching

Cited by: 87
Authors
Wang, Zhaodong [1]
Qin, Zhiwei [2]
Tang, Xiaocheng [2]
Ye, Jieping [3]
Zhu, Hongtu [3]
Affiliations
[1] Washington State Univ, Pullman, WA 99164 USA
[2] DiDi Res Amer, Mountain View, CA USA
[3] DiDi Chuxing, Beijing, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) | 2018
Keywords
Ride dispatching; Deep reinforcement learning; Transfer learning; Spatio-temporal mining;
DOI
10.1109/ICDM.2018.00077
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Ride dispatching is a central operation task on a ride-sharing platform to continuously match drivers to trip-requesting passengers. In this work, we model the ride dispatching problem as a Markov Decision Process and propose learning solutions based on deep Q-networks with action search to optimize the dispatching policy for drivers on ride-sharing platforms. We train and evaluate dispatching agents for this challenging decision task using real-world spatio-temporal trip data from the DiDi ride-sharing platform. A large-scale dispatching system typically supports many geographical locations with diverse demand-supply settings. To increase learning adaptability and efficiency, we propose a new transfer learning method Correlated Feature Progressive Transfer, along with two existing methods, enabling knowledge transfer in both spatial and temporal spaces. Through an extensive set of experiments, we demonstrate the learning and optimization capabilities of our deep reinforcement learning algorithms. We further show that dispatching policies learned by transferring knowledge from a source city to target cities or across temporal space within the same city significantly outperform those without transfer learning.
Pages: 617-626
Page count: 10