Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models

Cited by: 26
Authors
Hu, Songhua [1]
Xiong, Chenfeng [2]
Chen, Peng [3]
Schonfeld, Paul [1]
Affiliations
[1] Univ Maryland, Dept Civil & Environm Engn, College Pk, MD 20742 USA
[2] Villanova Univ, Dept Civil & Environm Engn, Villanova, PA 19085 USA
[3] Univ S Florida, Sch Publ Affairs, Tampa, FL 33620 USA
Keywords
Explainable machine learning; Interpretability; Nonlinearity; Mobile device location data; Travel demand; BUILT ENVIRONMENT; TRAVEL; BEHAVIOR; CHOICE;
DOI
10.1016/j.tra.2023.103743
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
Mobile device location data (MDLD) contain population-representative, fine-grained travel demand information, facilitating opportunities to validate established relations between travel demand and underlying factors from a big data perspective. Using the nationwide census block group (CBG)-level population inflow derived from MDLD as the proxy of travel demand, this study examines its relations with various factors including socioeconomics, demographics, land use, and CBG attributes. A host of tree-based machine learning (ML) models and interpretation techniques (feature importance, partial dependence plot (PDP), accumulated local effect (ALE), SHapley Additive exPlanations (SHAP)) are extensively compared to determine the best model architecture and justify interpretation robustness. Empirical results show that: 1) Boosting trees perform the best among all models, followed by bagging trees, single trees, and linear regressions. 2) Feature importance holds consistently among different tree-based models but is influenced by measures of importance and hyperparameter settings. 3) Pronounced nonlinearities, threshold effects, and interaction effects are observed in relations among population inflow and most of its determinants. 4) Compared with PDP, ALE and SHAP plots are more reliable in the presence of outliers, feature dependency, and local heterogeneity. Taken together, techniques introduced in this study can either be integrated into customary travel demand models to enhance model accuracy or serve as interpretation tools that offer a comprehensive understanding of intricate relations.
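As a rough illustration of finding 4), the sketch below contrasts a partial dependence curve with an accumulated local effect curve when two features are strongly dependent. It is a toy under stated assumptions, not the authors' models or data: the `model` function, the synthetic `data`, and the bin `edges` are all hypothetical, and the ALE centring is simplified.

```python
import random


def model(x1, x2):
    # Hypothetical stand-in for a fitted model: a pure x1-x2 interaction.
    return x1 * x2


def partial_dependence(f, data, grid):
    # PDP: fix x1 at each grid value and average predictions over ALL observed
    # x2, including (x1, x2) combinations that never occur in the data.
    return [sum(f(v, x2) for _, x2 in data) / len(data) for v in grid]


def accumulated_local_effects(f, data, edges):
    # ALE: in each x1 bin, average the prediction change across the bin using
    # only the points observed in that bin, then accumulate the bin effects.
    effects = [0.0]
    last = len(edges) - 2
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        x2_in_bin = [
            x2 for x1, x2 in data
            if lo <= x1 < hi or (k == last and x1 == hi)
        ]
        local = 0.0
        if x2_in_bin:
            local = sum(f(hi, x2) - f(lo, x2) for x2 in x2_in_bin) / len(x2_in_bin)
        effects.append(effects[-1] + local)
    # Simplified centring: subtract the mean over bin edges
    # (standard ALE centres on the data distribution instead).
    mean = sum(effects) / len(effects)
    return [e - mean for e in effects]


# Synthetic data (assumption): x2 is almost a copy of x1.
rng = random.Random(0)
data = []
for _ in range(2000):
    x1 = rng.uniform(-1.0, 1.0)
    data.append((x1, x1 + rng.uniform(-0.1, 0.1)))

edges = [-1.0 + k / 5 for k in range(11)]  # 10 equal-width bins on [-1, 1]
pdp = partial_dependence(model, data, edges)
ale = accumulated_local_effects(model, data, edges)

for v, p, a in zip(edges, pdp, ale):
    print(f"x1={v:+.1f}  PDP={p:+.3f}  ALE={a:+.3f}")
```

In this toy setting the PDP should stay close to zero, because averaging over every observed x2 cancels the interaction, while the ALE curve should be roughly U-shaped, following the relation that holds along the observed data. Whether the same contrast appears for a real fitted model depends on the model and the data.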
Pages: 20