Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models

Cited by: 26

Authors:

Hu, Songhua [1]

Xiong, Chenfeng [2]

Chen, Peng [3]

Schonfeld, Paul [1]

Affiliations:

[1] Univ Maryland, Dept Civil & Environm Engn, College Pk, MD 20742 USA

[2] Villanova Univ, Dept Civil & Environm Engn, Villanova, PA 19085 USA

[3] Univ S Florida, Sch Publ Affairs, Tampa, FL 33620 USA

Source:

TRANSPORTATION RESEARCH PART A-POLICY AND PRACTICE | 2023 / Vol. 174

Keywords:

Explainable machine learning; Interpretability; Nonlinearity; Mobile device location data; Travel demand; BUILT ENVIRONMENT; TRAVEL; BEHAVIOR; CHOICE

DOI:

10.1016/j.tra.2023.103743

Chinese Library Classification (CLC) number:

F [Economics]

Discipline classification code:

02

Abstract:

Mobile device location data (MDLD) contain population-representative, fine-grained travel demand information, facilitating opportunities to validate established relations between travel demand and underlying factors from a big data perspective. Using the nationwide census block group (CBG)-level population inflow derived from MDLD as the proxy of travel demand, this study examines its relations with various factors including socioeconomics, demographics, land use, and CBG attributes. A host of tree-based machine learning (ML) models and interpretation techniques (feature importance, partial dependence plot (PDP), accumulated local effect (ALE), and SHapley Additive exPlanations (SHAP)) are extensively compared to determine the best model architecture and justify interpretation robustness. Empirical results show that: 1) Boosting trees perform the best among all models, followed by bagging trees, single trees, and linear regressions. 2) Feature importance holds consistently among different tree-based models but is influenced by measures of importance and hyperparameter settings. 3) Pronounced nonlinearities, threshold effects, and interaction effects are observed in relations between population inflow and most of its determinants. 4) Compared with PDP, ALE and SHAP plots are more reliable in the presence of outliers, feature dependency, and local heterogeneity. Taken together, techniques introduced in this study can either be integrated into customary travel demand models to enhance model accuracy or serve as interpretation tools that offer a comprehensive understanding of intricate relations.
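The abstract's fourth finding, that PDP is less reliable than ALE under feature dependency, can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the "model" is a hand-written interaction function rather than a fitted tree ensemble, the data are synthetic with two perfectly dependent features, and the helper names are hypothetical.

```python
from statistics import mean

def model(x1, x2):
    """Hypothetical fitted model: a pure interaction, chosen for illustration."""
    return x1 * x2

# Toy data with perfect feature dependency (x2 always equals x1).
data = [((i - 100) / 100, (i - 100) / 100) for i in range(201)]

def partial_dependence(f, data, grid):
    """PDP for x1: average f(v, x2) over every observed x2, for each grid value v."""
    return [mean(f(v, x2) for _, x2 in data) for v in grid]

def accumulated_local_effects(f, data, edges):
    """First-order ALE for x1: one centered value per bin defined by `edges`."""
    last = len(edges) - 2
    values, counts, running = [], [], 0.0
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Only points that actually fall in the bin contribute (last bin is closed).
        x2_in_bin = [x2 for x1, x2 in data
                     if lo <= x1 < hi or (k == last and x1 == hi)]
        if x2_in_bin:
            running += mean(f(hi, x2) - f(lo, x2) for x2 in x2_in_bin)
        values.append(running)
        counts.append(len(x2_in_bin))
    # Center so the data-weighted average effect is zero.
    center = sum(c * v for c, v in zip(counts, values)) / sum(counts)
    return [v - center for v in values]

edges = [-1.0, -0.5, 0.0, 0.5, 1.0]
pdp = partial_dependence(model, data, edges)
ale = accumulated_local_effects(model, data, edges)
print("PDP:", [round(p, 4) for p in pdp])
print("ALE:", [round(a, 4) for a in ale])
```

In this toy setting the PDP is flat at zero because it averages over feature combinations that never occur in the data (for example x1 = 1 paired with x2 = -1), while the ALE values trace a U shape because each bin uses only the points observed in it.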


Pages: 20

43 items in total

[31] Analysis of harsh braking and harsh acceleration occurrence via explainable imbalanced machine learning using high-resolution smartphone telematics and traffic data
Ziakopoulos, Apostolos
ACCIDENT ANALYSIS AND PREVENTION, 2024, 207
[32] Prediction of load-bearing capacity of FRP-steel composite tubed concrete columns: Using explainable machine learning model with limited data
Liu, Xiaoyang
Sun, Guozheng
Ju, Ruiqing
Li, Jing
Li, Zili
Jiang, Yali
Zhao, Kai
Zhang, Ye
Jing, Yucai
Yang, Guotao
STRUCTURES, 2025, 71
[33] Observational data analysis using generalizability theory and general and mixed linear models: an empirical study of infant learning and development
Blanco-Villasenor, Angel
Escolano-Perez, Elena
ANALES DE PSICOLOGIA, 2017, 33 (03): 450-460
[34] Interpretable deep learning for consistent large-scale urban population estimation using Earth observation data
Doda, Sugandha
Kahl, Matthias
Ouan, Kim
Obadic, Ivica
Wang, Yuanyuan
Taubenboeck, Hannes
Zhu, Xiao Xiang
INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 128
[35] Enhancing heart failure treatment decisions: interpretable machine learning models for advanced therapy eligibility prediction using EHR data
Yufeng Zhang
Jessica R. Golbus
Emily Wittrup
Keith D. Aaronson
Kayvan Najarian
BMC Medical Informatics and Decision Making, 24
[36] Enhancing heart failure treatment decisions: interpretable machine learning models for advanced therapy eligibility prediction using EHR data
Zhang, Yufeng
Golbus, Jessica R.
Wittrup, Emily
Aaronson, Keith D.
Najarian, Kayvan
BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 24 (01)
[37] A systematic comparison of short-term and long-term mortality prediction in acute myocardial infarction using machine learning models
Yawei Yang
Junjie Tang
Liping Ma
Feng Wu
Xiaoqing Guan
BMC Medical Informatics and Decision Making, 25 (1)
[38] Risk of crashes among self-employed truck drivers: Prevalence evaluation using fatigue data and machine learning prediction models
Soliani, Rodrigo Duarte
Lopes, Alisson Vinicius Brito
Santiago, Fabio
da Silva, Luiz Bueno
Emekwuru, Nwabueze
Lorena, Ana Carolina
JOURNAL OF SAFETY RESEARCH, 2025, 92: 68-80
[39] Machine learning-aided selection of CPT-based transformation models using field monitoring data from a specific project
Tian, Hua-Ming
Wang, Yu
Shi, Chao
ACTA GEOTECHNICA, 2025, 20 (01): 439-459
[40] Exploring interpretable and non-interpretable machine learning models for estimating winter wheat evapotranspiration using particle swarm optimization with limited climatic data
Zhao, Xin
Zhang, Lei
Zhu, Ge
Cheng, Chenguang
He, Jun
Traore, Seydou
Singh, Vijay P.
COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2023, 212
