Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models

Cited by: 26
Authors
Hu, Songhua [1]
Xiong, Chenfeng [2]
Chen, Peng [3]
Schonfeld, Paul [1]
Affiliations
[1] Univ Maryland, Dept Civil & Environm Engn, College Pk, MD 20742 USA
[2] Villanova Univ, Dept Civil & Environm Engn, Villanova, PA 19085 USA
[3] Univ S Florida, Sch Publ Affairs, Tampa, FL 33620 USA
Keywords
Explainable machine learning; Interpretability; Nonlinearity; Mobile device location data; Travel demand; BUILT ENVIRONMENT; TRAVEL; BEHAVIOR; CHOICE;
DOI
10.1016/j.tra.2023.103743
Chinese Library Classification (CLC) number
F [Economics];
Discipline classification code
02;
Abstract
Mobile device location data (MDLD) contain population-representative, fine-grained travel demand information, creating opportunities to validate established relations between travel demand and underlying factors from a big data perspective. Using the nationwide census block group (CBG)-level population inflow derived from MDLD as the proxy of travel demand, this study examines its relations with various factors including socioeconomics, demographics, land use, and CBG attributes. A host of tree-based machine learning (ML) models and interpretation techniques (feature importance, partial dependence plot (PDP), accumulated local effect (ALE), and SHapley Additive exPlanations (SHAP)) are extensively compared to determine the best model architecture and justify interpretation robustness. Empirical results show that: (1) Boosting trees perform the best among all models, followed by bagging trees, single trees, and linear regressions. (2) Feature importance holds consistently among different tree-based models but is influenced by measures of importance and hyperparameter settings. (3) Pronounced nonlinearities, threshold effects, and interaction effects are observed in relations between population inflow and most of its determinants. (4) Compared with PDP, ALE and SHAP plots are more reliable in the presence of outliers, feature dependency, and local heterogeneity. Taken together, techniques introduced in this study can either be integrated into customary travel demand models to enhance model accuracy or serve as interpretation tools that offer a comprehensive understanding of intricate relations.
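Finding (4) in the abstract, that ALE is more reliable than PDP under feature dependency, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy function `model` stands in for a fitted tree ensemble, the two features `x1` and `x2` are synthetic and strongly correlated, and the bin edges are arbitrary. The PDP averages predictions over all `x2` values, including combinations that never occur in the data, while the first-order ALE only uses points that fall inside each `x1` bin.

```python
import random
from statistics import mean


def model(x1, x2):
    """Hypothetical stand-in for a fitted model with an x1-x2 interaction."""
    return x1 * x2


def partial_dependence(f, data, grid):
    """PDP for x1: average prediction with x1 forced to each grid value."""
    return [mean(f(g, x2) for _, x2 in data) for g in grid]


def accumulated_local_effects(f, data, edges):
    """First-order ALE for x1 over the bins defined by `edges`.

    Returns the per-bin local effects and the centered accumulated curve
    evaluated at the right edge of each bin.
    """
    effects, counts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are (lo, hi]; the first bin also includes its left edge.
        in_bin = [
            (x1, x2) for x1, x2 in data
            if lo < x1 <= hi or (x1 == lo == edges[0])
        ]
        # Local effect: prediction change across the bin, using only
        # the x2 values actually observed together with x1 in this bin.
        diffs = [f(hi, x2) - f(lo, x2) for _, x2 in in_bin]
        effects.append(mean(diffs) if diffs else 0.0)
        counts.append(len(in_bin))

    # Accumulate the local effects bin by bin.
    curve, total = [], 0.0
    for e in effects:
        total += e
        curve.append(total)

    # Center the curve (count-weighted mean of zero); this is a simple
    # approximation of the usual ALE centering step.
    offset = sum(a * c for a, c in zip(curve, counts)) / sum(counts)
    return effects, [a - offset for a in curve]


# Synthetic, strongly correlated features: x2 is roughly equal to x1.
rng = random.Random(0)
data = []
for _ in range(2000):
    x1 = rng.random()
    data.append((x1, x1 + rng.gauss(0.0, 0.05)))

edges = [i / 5 for i in range(6)]  # 0.0, 0.2, ..., 1.0

pdp = partial_dependence(model, data, edges)
effects, ale = accumulated_local_effects(model, data, edges)

print("PDP at bin edges:      ", [round(v, 3) for v in pdp])
print("ALE local effects/bin: ", [round(v, 3) for v in effects])
print("Centered ALE curve:    ", [round(v, 3) for v in ale])
```

In this toy setting the PDP is a straight line in `x1`, because it multiplies each grid value by the global mean of `x2`, whereas the ALE local effects grow from bin to bin because larger `x1` values co-occur with larger `x2` values. Real applications would replace `model` with a trained estimator and typically use quantile-based bins.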
Pages: 20