InBEDE: Integrating Contextual Bandit with TD Learning for Joint Pricing and Dispatch of Ride-Hailing Platforms

Cited by: 31
Authors
Chen, Haipeng [1, 2]
Jiao, Yan [2 ]
Qin, Zhiwei [2 ]
Tang, Xiaocheng [2 ]
Li, Hao [2 ]
An, Bo [1 ]
Zhu, Hongtu [2 ]
Ye, Jieping [2 ]
Affiliations
[1] Nanyang Technological University, Singapore, Singapore
[2] Didi Chuxing, AI Labs, Beijing, People's Republic of China
Source
2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019) | 2019
Keywords
Ride-hailing platform; Joint pricing and dispatch; Reinforcement learning
DOI
10.1109/ICDM.2019.00016
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For both the traditional street-hailing taxi industry and the recently emerged on-line ride-hailing, it has been a major challenge to improve the ride-hailing marketplace efficiency due to spatio-temporal imbalance between the supply and demand, among other factors. Despite the numerous approaches to improve marketplace efficiency using pricing and dispatch strategies, they usually optimize pricing or dispatch separately. In this paper, we show that these two processes are in fact intrinsically interrelated. Motivated by this observation, we make an attempt to simultaneously optimize pricing and dispatch strategies. However, such a joint optimization is extremely challenging due to the inherent huge scale and lack of a uniform model of the problem. To handle the high complexity brought by the new problem, we propose InBEDE (Integrating contextual Bandit with tEmporal DiffErence learning), a learning framework where pricing strategies are learned via a contextual bandit algorithm, and the dispatch strategies are optimized with the help of temporal difference learning. The two learning components proceed in a mutual bootstrapping manner, in the sense that the policy evaluations of the two components are inter-dependent. Evaluated with real-world datasets of two Chinese cities from Didi Chuxing, an online ride-hailing platform, we show that the market efficiency of the ride-hailing platform can be significantly improved using InBEDE.
Pages: 61-70
Number of pages: 10
Related papers
38 in total
[1] Agrawal S, 2013, INT C MACH LEARN, P127
[2] ALLESIARDO R, 2014, NIPS, V8834, P374
[3] [Anonymous], 2016, PROC INT C MACH LEAR
[4] [Anonymous], 2016, USING BIG DATA ESTIM
[5] Finite-time analysis of the multiarmed bandit problem [J]. Auer, P; Cesa-Bianchi, N; Fischer, P. MACHINE LEARNING, 2002, 47 (2-3): 235-256
[6] Bai J., 2018, MANUFACTURING SERVIC
[7] Banerjee S., 2016, Pricing and optimization in shared vehicle systems: An approximation framework
[8] Optimal adaptive policies for sequential allocation problems [J]. Burnetas, AN; Katehakis, MN. ADVANCES IN APPLIED MATHEMATICS, 1996, 17 (02): 122-142
[9] Surge Pricing Solves the Wild Goose Chase [J]. Castillo, Juan Camilo; Knoepfle, Dan; Weyl, Glen. EC'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON ECONOMICS AND COMPUTATION, 2017: 241-242
[10] Chen M., 2017, ARXIV170701625