A stacking model for variation prediction of public bicycle traffic flow

Cited by: 16

Authors:

Lin, Fei [1]

Jiang, Jian [1]

Fan, Jin [1]

Wang, Shihua [1]

Affiliation:

[1] Hangzhou Dianzi University, School of Computer Science and Technology, Hangzhou 310018, Zhejiang, People's Republic of China

Source:

INTELLIGENT DATA ANALYSIS | 2018 / Vol. 22 / No. 04

Funding:

National Natural Science Foundation of China

Keywords:

Public bicycle; traffic flow variation prediction; stacking; xgboost; k-medoids; data mining; machine learning;

DOI:

10.3233/IDA-173443

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Public bicycle systems, which have been deployed in many cities around the world, can improve the efficiency of public transport travel and reduce environmental pollution. However, bicycle usage has become highly skewed and imbalanced across stations. A system that can recommend the nearest available station to passengers, whether they are looking for a dock or a bike, is therefore of great importance. Monitoring the current number of docks or bikes at each station cannot solve this problem, because once the imbalance has occurred it is too late to recommend a station for renting or returning bikes. To address this issue, we propose SMVP, a stacking model for variation prediction of public bicycle traffic flow, built on real-world datasets. The stacking model integrates multiple base models, each trained on a different combination of features, to achieve better performance. We adopt the machine learning system XGBoost [25] to train the models, and we construct multiple complex factors that affect public bicycle traffic flow. Traditional temporal, spatial, historical, and meteorological factors are taken into consideration. The framework also introduces a new clustering factor that considers both the geographical positions and the transition patterns of stations: we construct a new station relation matrix that uses these two factors to define the distance between stations, and then apply the K-Medoids algorithm [12] to cluster the stations into groups. Compared with traditional stacking [5] and a single model, SMVP improves performance on the Hangzhou and New York City datasets; in particular, the coefficient of determination improves by 25.58% in Hangzhou.
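The abstract says SMVP clusters stations with K-Medoids [12] over a station relation matrix that folds geographic position and transition patterns into one distance, but it does not give the matrix formula. The Python sketch below is illustrative only and assumes a hypothetical weighted sum: `alpha` times the normalized Euclidean distance plus `1 - alpha` times one minus the normalized two-way trip count. The function names, `alpha`, and the toy data are all assumptions; the clustering uses a greedy BUILD-style start followed by simple alternating K-Medoids, a common simplification of PAM and not necessarily the variant used in the paper.

```python
import math


def station_distance_matrix(coords, trips, alpha=0.5):
    """Hypothetical station relation matrix mixing geography and transitions.

    coords: (x, y) position of each station, e.g. projected kilometres.
    trips:  trips[i][j] = number of bikes rented at station i and returned at j.
    alpha:  assumed weight of the geographic term (1 - alpha for transitions).
    """
    n = len(coords)
    geo = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    flow = [[trips[i][j] + trips[j][i] for j in range(n)] for i in range(n)]
    max_geo = max(max(row) for row in geo) or 1.0
    max_flow = max(max(row) for row in flow) or 1.0
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                geo_term = geo[i][j] / max_geo            # far apart -> close to 1
                trans_term = 1.0 - flow[i][j] / max_flow  # many trips -> close to 0
                dist[i][j] = alpha * geo_term + (1.0 - alpha) * trans_term
    return dist


def total_cost(dist, medoids):
    """Sum of each station's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def k_medoids(dist, k, max_iter=100):
    """K-Medoids on a precomputed distance matrix.

    Greedy (PAM BUILD-style) initialisation, then alternate between assigning
    stations to the nearest medoid and re-centring each cluster on the member
    with the smallest total in-cluster distance.
    """
    n = len(dist)
    medoids = [min(range(n), key=lambda m: sum(dist[m]))]  # most central station
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


# Toy example (hypothetical data): two neighbourhoods with heavy internal traffic.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
group = [0, 0, 0, 1, 1, 1]
trips = [[5 if i != j and group[i] == group[j] else 0 for j in range(6)] for i in range(6)]
dist = station_distance_matrix(coords, trips)
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Because K-Medoids needs only pairwise distances and always picks real stations as cluster centres, any other station relation matrix can be swapped in without touching the clustering code. In this toy run the two neighbourhoods land in separate clusters, since both proximity and heavy internal traffic shrink their internal distances.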
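The stacking idea in the abstract, several base models trained on different feature combinations and then combined by a second-level model, can be sketched in Python as follows. This is a generic stacking recipe under stated assumptions, not SMVP itself: ordinary least squares (with a tiny ridge term) stands in for XGBoost so the example runs on the standard library alone, the meta-model is linear and is trained on out-of-fold base predictions from a hypothetical 3-fold split, and the data are synthetic. `r2_score` computes the coefficient of determination, the metric the abstract reports.

```python
import random


def solve(a, b):
    """Solve the small linear system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(rows, y, lam=1e-8):
    """Least squares with an intercept and a tiny ridge term for numerical stability."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    ata = [[sum(d[i] * d[j] for d in design) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(ata, aty)


def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def select(row, cols):
    return [row[c] for c in cols]


def fit_stacking(X, y, subsets, k=3):
    """Level 0: one base model per feature subset.
    Level 1: a linear meta-model trained on out-of-fold base predictions."""
    n = len(X)
    oof = [[0.0] * len(subsets) for _ in range(n)]
    for f in range(k):
        held = set(range(f, n, k))  # every k-th sample forms one fold
        train = [i for i in range(n) if i not in held]
        for s, cols in enumerate(subsets):
            w = fit_ridge([select(X[i], cols) for i in train], [y[i] for i in train])
            for i in held:
                oof[i][s] = predict(w, select(X[i], cols))
    meta = fit_ridge(oof, y)
    # Refit every base model on all data for use at prediction time.
    bases = [fit_ridge([select(r, cols) for r in X], y) for cols in subsets]
    return bases, meta


def predict_stacking(model, X, subsets):
    bases, meta = model
    return [predict(meta, [predict(w, select(r, cols)) for w, cols in zip(bases, subsets)])
            for r in X]


def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical): three features, target linear in all of them.
rng = random.Random(42)
X = [[rng.random() for _ in range(3)] for _ in range(90)]
y = [2 * a + 3 * b - c + 1 for a, b, c in X]
subsets = [[0, 1], [1, 2], [0, 2]]  # each base model sees a different feature combination

model = fit_stacking(X, y, subsets)
stack_r2 = r2_score(y, predict_stacking(model, X, subsets))
base_r2 = [r2_score(y, [predict(w, select(r, cols)) for r in X])
           for w, cols in zip(model[0], subsets)]
print(round(stack_r2, 3), [round(v, 3) for v in base_r2])
```

Each base model sees only two of the three features, so none can fit the target alone, while the meta-model can combine their predictions to recover the missing information. Training the meta-model on out-of-fold predictions rather than in-sample ones keeps it from simply trusting whichever base model overfits most.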


Pages: 911-933

Page count: 23

References (28 in total):

[1] [Anonymous], CIRRELT

[2] [Anonymous], JOURNEYS

[3] [Anonymous], J STAT COMPUTATION S

[4] BALANCING THE STATIONS OF A SELF SERVICE "BIKE HIRE" SYSTEM [J]. Benchimol, Mike; Benchimol, Pascal; Chappert, Benoit; de la Taille, Arnaud; Laroche, Fabien; Meunier, Frederic; Robinet, Ludovic. RAIRO-OPERATIONS RESEARCH, 2011, 45(01): 37-61

[5] SHARED BICYCLES IN A CITY: A SIGNAL PROCESSING AND DATA ANALYSIS PERSPECTIVE [J]. Borgnat, Pierre; Abry, Patrice; Flandrin, Patrick; Robardet, Celine; Rouquier, Jean-Baptiste; Fleury, Eric. ADVANCES IN COMPLEX SYSTEMS, 2011, 14(03): 415-438

[6] Chen T., 2016, ACM KNOWLEDGE DISCOV

[7] DeMaio P., 2009, BIKE SHARING HIST I, V12, P41, DOI 10.5038/2375-0901.12.4.3

[8] Deng L., 2012, INT CONF ACOUST SPEE, P2133, DOI 10.1109/ICASSP.2012.6288333

[9] Model-Based Count Series Clustering for Bike Sharing System Usage Mining: A Case Study with the Velib' System of Paris [J]. Etienne, Come; Latifa, Oukhellou. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2014, 5(03)

[10] Greedy function approximation: A gradient boosting machine [J]. Friedman, JH. ANNALS OF STATISTICS, 2001, 29(05): 1189-1232
