Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning

Cited by: 191
Authors
Li, Minne [1]
Qin, Zhiwei [2]
Jiao, Yan [2]
Yang, Yaodong [1]
Gong, Zhichen [1]
Wang, Jun [1]
Wang, Chenxi
Wu, Guobin
Ye, Jieping
Affiliations
[1] UCL, London, England
[2] DiDi Res Amer, Mountain View, CA USA
Source
WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019) | 2019
Keywords
Multi-Agent Reinforcement Learning; Mean Field Reinforcement Learning; Order Dispatching; INTERNET; THINGS;
DOI
10.1145/3308558.3313433
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
A fundamental question in any peer-to-peer ridesharing system is how to dispatch users' ride requests to the right drivers in real time, both effectively and efficiently. Traditional rule-based solutions usually work on a simplified problem setting, which requires a sophisticated hand-crafted weight design for either centralized authority control or decentralized multi-agent scheduling systems. Although recent approaches have used reinforcement learning to provide centralized combinatorial optimization algorithms with informative weight values, their single-agent setting can hardly model the complex interactions between drivers and orders. In this paper, we address the order dispatching problem using multi-agent reinforcement learning (MARL), which follows the distributed nature of the peer-to-peer ridesharing problem and can capture the stochastic demand-supply dynamics in large-scale ridesharing scenarios. Being more reliable than centralized approaches, our proposed MARL solutions could also support fully distributed execution through recent advances in the Internet of Vehicles (IoV) and Vehicle-to-Network (V2N). Furthermore, we adopt the mean field approximation to simplify the local interactions by taking an average action among neighborhoods. The mean field approximation is capable of globally capturing dynamic demand-supply variations by propagating many local interactions between agents and the environment. Our extensive experiments show significant improvements of the MARL order dispatching algorithms over several strong baselines on the accumulated driver income (ADI) and order response rate measures. In addition, simulated experiments with real data show that our solution can alleviate the supply-demand gap during rush hours, and is thus capable of reducing traffic congestion.
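The "average action among neighborhoods" mentioned in the abstract can be illustrated with a minimal sketch. It assumes discrete actions that are one-hot encoded and then averaged over an agent's neighbors, which is a common way to form a mean action in mean field reinforcement learning; the abstract does not give the paper's exact formulation, and the function name `mean_action` and its parameters are hypothetical.

```python
from typing import List


def mean_action(neighbor_actions: List[int], num_actions: int) -> List[float]:
    """Average the one-hot encodings of the neighbors' discrete actions.

    Illustrative assumption only: actions are integer indices in
    [0, num_actions), and an agent with no neighbors gets an all-zero vector.
    """
    if not neighbor_actions:
        return [0.0] * num_actions
    counts = [0] * num_actions
    for action in neighbor_actions:
        counts[action] += 1  # summing one-hot vectors is the same as counting
    n = len(neighbor_actions)
    return [c / n for c in counts]


# Four neighboring agents choosing among three actions.
example = mean_action([0, 2, 2, 1], num_actions=3)
print(example)
```

Under this assumption each agent conditions on a single fixed-size vector rather than on every neighbor's individual action, so the input size does not grow with the number of neighbors.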
Pages: 983-994
Number of pages: 12