Dynamic Bike Reposition: A Spatio-Temporal Reinforcement Learning Approach

Cited by: 88

Authors:

Li, Yexin [1]

Zheng, Yu [2]

Yang, Qiang [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[2] JD Finance, Urban Comp Business Unit, Beijing, Peoples R China

Source:

KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

Bike-Sharing System; Dynamic Bike Reposition; Reinforcement Learning;

DOI:

10.1145/3219819.3220110

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Bike-sharing systems are widely deployed in many major cities, but jammed and empty stations in them lead to severe customer loss. Currently, operators try to constantly reposition bikes among stations while the system is operating. However, how to reposition efficiently so as to minimize customer loss over a long period remains unsolved. We propose a spatio-temporal reinforcement learning based bike reposition model to address this problem. First, we propose an inter-independent inner-balance clustering algorithm to cluster stations into groups. The resulting clusters have two properties: each cluster is inner-balanced and independent of the others. Because many trikes reposition simultaneously in a very large system, clustering is necessary to reduce the problem complexity. Second, we allocate multiple trikes to each cluster to conduct inner-cluster bike reposition. A spatio-temporal reinforcement learning model is designed for each cluster to learn a reposition policy within it, aiming to minimize its customer loss over a long period. To learn each model, we design a deep neural network to estimate its optimal long-term value function, from which the optimal policy can be easily inferred. Besides formulating the model in a multi-agent way, we further reduce its training complexity with two spatio-temporal pruning rules. Third, we design a system simulator based on two predictors to train and evaluate the reposition model. Experiments on real-world datasets from Citi Bike confirm the effectiveness of our model.
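The objective named in the abstract, customer loss caused by empty and jammed stations, can be illustrated with a toy sketch. This is not the paper's model, clustering algorithm, or simulator: the event format, the function names `customer_loss` and `reposition`, and the two-station data are all hypothetical, chosen only to show how a single reposition move can change the number of lost customers.

```python
from typing import Dict, List, Tuple

# Hypothetical event format (an assumption, not from the paper):
# (station, kind), where kind is "rent" or "return".
Event = Tuple[str, str]


def customer_loss(bikes: Dict[str, int],
                  capacity: Dict[str, int],
                  events: List[Event]) -> int:
    """Count customers lost to empty stations (failed rents)
    or jammed stations (failed returns)."""
    bikes = dict(bikes)  # copy so the caller's state is untouched
    lost = 0
    for station, kind in events:
        if kind == "rent":
            if bikes[station] == 0:
                lost += 1          # empty station: customer cannot rent
            else:
                bikes[station] -= 1
        elif kind == "return":
            if bikes[station] >= capacity[station]:
                lost += 1          # jammed station: customer cannot return
            else:
                bikes[station] += 1
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return lost


def reposition(bikes: Dict[str, int], src: str, dst: str, n: int,
               capacity: Dict[str, int]) -> Dict[str, int]:
    """Move up to n bikes from src to dst, limited by the stock at src
    and the free docks at dst. Returns a new state."""
    moved = min(n, bikes[src], capacity[dst] - bikes[dst])
    after = dict(bikes)
    after[src] -= moved
    after[dst] += moved
    return after


# Toy data: station A is jammed, station B is empty.
capacity = {"A": 5, "B": 5}
bikes = {"A": 5, "B": 0}
events = [("B", "rent"), ("B", "rent"), ("A", "return"), ("A", "return")]

loss_before = customer_loss(bikes, capacity, events)
loss_after = customer_loss(reposition(bikes, "A", "B", 2, capacity),
                           capacity, events)
print(loss_before, loss_after)
```

In this toy setting, moving two bikes from the jammed station to the empty one serves all four customers. The paper's contribution lies in learning which such moves to make over a long horizon, which this sketch does not attempt.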

Citation

Pages: 1724-1733

Page count: 10
