Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models

Cited by: 26
Authors
Hu, Songhua [1]
Xiong, Chenfeng [2]
Chen, Peng [3]
Schonfeld, Paul [1]
Affiliations
[1] Univ Maryland, Dept Civil & Environm Engn, College Pk, MD 20742 USA
[2] Villanova Univ, Dept Civil & Environm Engn, Villanova, PA 19085 USA
[3] Univ S Florida, Sch Publ Affairs, Tampa, FL 33620 USA
Keywords
Explainable machine learning; Interpretability; Nonlinearity; Mobile device location data; Travel demand; Built environment; Travel; Behavior; Choice
DOI
10.1016/j.tra.2023.103743
Chinese Library Classification
F [Economics]
Discipline classification code
02
Abstract
Mobile device location data (MDLD) contain population-representative, fine-grained travel demand information, facilitating opportunities to validate established relations between travel demand and underlying factors from a big data perspective. Using the nationwide census block group (CBG)-level population inflow derived from MDLD as a proxy for travel demand, this study examines its relations with various factors including socioeconomics, demographics, land use, and CBG attributes. A host of tree-based machine learning (ML) models and interpretation techniques (feature importance, partial dependence plot (PDP), accumulated local effect (ALE), and SHapley Additive exPlanations (SHAP)) are extensively compared to determine the best model architecture and justify interpretation robustness. Empirical results show that: 1) Boosting trees perform the best among all models, followed by bagging trees, single trees, and linear regressions. 2) Feature importance holds consistently among different tree-based models but is influenced by measures of importance and hyperparameter settings. 3) Pronounced nonlinearities, threshold effects, and interaction effects are observed in relations between population inflow and most of its determinants. 4) Compared with PDP, ALE and SHAP plots are more reliable in the presence of outliers, feature dependency, and local heterogeneity. Taken together, techniques introduced in this study can either be integrated into customary travel demand models to enhance model accuracy or serve as interpretation tools that offer a comprehensive understanding of intricate relations.
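The abstract's fourth finding, that PDP and ALE can diverge under feature dependency, can be illustrated with a small sketch. This is not the paper's code or data. The `model` function is a hypothetical stand-in for a fitted model, the two features are synthetic and deliberately correlated, and the ALE shown is a simplified, uncentered first-order version written from scratch for illustration.

```python
import random

def model(x1, x2):
    # Hypothetical stand-in for a fitted model with a pure interaction effect.
    return x1 * x2

def partial_dependence(f, rows, grid):
    # PDP: force x1 to each grid value for every row, then average predictions.
    # This pairs x1 with x2 values it may never co-occur with in the data.
    return [sum(f(g, x2) for _, x2 in rows) / len(rows) for g in grid]

def accumulated_local_effects(f, rows, edges):
    # Simplified first-order ALE (uncentered): within each x1 bin, average the
    # prediction change from the lower to the upper bin edge using only rows
    # whose x1 lies in that bin, then accumulate the averages across bins.
    ale, total = [0.0], 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [x2 for x1, x2 in rows if lo <= x1 < hi]
        if in_bin:
            total += sum(f(hi, x2) - f(lo, x2) for x2 in in_bin) / len(in_bin)
        ale.append(total)
    return ale

# Synthetic, strongly dependent features (an assumption): x2 is x1 plus noise.
rng = random.Random(0)
rows = []
for _ in range(2000):
    x1 = rng.random()
    rows.append((x1, x1 + rng.gauss(0.0, 0.05)))

edges = [i / 10 for i in range(11)]
pdp = partial_dependence(model, rows, edges)
ale = accumulated_local_effects(model, rows, edges)

for x, p, a in zip(edges, pdp, ale):
    print(f"x1={x:.1f}  PDP={p:.3f}  ALE={a:.3f}")
```

In this toy setup the PDP curve is linear in x1, because every grid value is multiplied by the overall mean of x2. The ALE curve is convex, because each bin uses only the x2 values that actually accompany that range of x1. The sketch only shows how the two summaries can disagree when features are dependent; it makes no claim about the magnitudes reported in the paper.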
Pages: 20