Effects of Different Training Datasets on Machine Learning Models for Pavement Performance Prediction

Cited by: 1
Authors
Aranha, Ana Luisa [1]
Bernucci, Liedi Legi Bariani [1]
Vasconcelos, Kamilla L. [1]
Affiliations
[1] Univ Sao Paulo, Dept Transportat Engn, Sao Paulo, Brazil
Keywords
data and data science; machine learning (artificial intelligence); infrastructure; infrastructure management and system preservation; pavement management systems; pavement performance
DOI
10.1177/03611981231155902
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
With improvements in data collection, storage, and processing, machine learning (ML) is gaining momentum as a behavior prediction method in engineering. Several studies have evaluated the potential of ML algorithms to predict pavement serviceability; however, some challenges limit their use. Training data preprocessing has a great impact on a model's predictive performance, is highly dependent on the modeler's experience, and is not typically reported in engineering-related literature. The objective of this study was to assess the effects of data preprocessing, hyperparameter selection, and time series size on the model's evaluation metrics. To that end, this paper analyzes the performance of three ML algorithms on the prediction of maximum deflection (D₀) and international roughness index (IRI): support vector machine, random forest (RF), and artificial neural network (ANN). An R² and mean square error (MSE) analysis was conducted on 12 training datasets, with two sizes of historical data and five stages of data preprocessing. The results indicated that ANN was the most accurate technique, with an R² of 0.99 and MSE of 20 × 10⁻³ mm on the D₀ prediction and an R² of 0.91 and MSE of 0.03 m/km on the IRI prediction. RF was also identified as an effective technique, generating similar results with less data preprocessing. The addition of structural and traffic categorical features to the training dataset resulted in the most significant improvement in the support vector regression and ANN performance metrics; hyperparameter selection was effective only on IRI prediction, especially with the ANN algorithm.
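The two evaluation metrics named in the abstract, R² and MSE, can be sketched in a few lines of plain Python. This is a minimal illustration of the standard definitions, not the authors' code; the function names and the sample IRI values are hypothetical and were invented for this example only.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared residuals between observed and predicted values."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical IRI observations and model predictions (m/km), for illustration only.
observed = [1.8, 2.1, 2.4, 2.9, 3.3]
predicted = [1.9, 2.0, 2.5, 2.8, 3.4]

print("MSE:", mean_squared_error(observed, predicted))
print("R^2:", r_squared(observed, predicted))
```

Applying the same two functions to each model's predictions on a held-out set gives a like-for-like comparison across algorithms and training datasets.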
Pages: 196-206
Page count: 11
Related papers
50 in total
  • [31] Choices Matter When Training Machine Learning Models for Return Prediction
    Howard, Clint
    FINANCIAL ANALYSTS JOURNAL, 2024, 80 (04) : 81 - 107
  • [32] Performance Assessment of Machine Learning Based Models for Diabetes Prediction
    Deo, Ridhi
    Panigrahi, Suranjan
    2019 IEEE HEALTHCARE INNOVATIONS AND POINT OF CARE TECHNOLOGIES (HI-POCT), 2019, : 147 - 150
  • [34] Performance Analysis of Machine Learning Techniques on Software Defect Prediction using NASA Datasets
    Iqbal, Ahmed
    Aftab, Shabib
    Ali, Umair
    Nawaz, Zahid
    Sana, Laraib
    Ahmad, Munir
    Husen, Arif
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (05) : 300 - 308
  • [35] Is handling unbalanced datasets for machine learning uplifts system performance?: A case of diabetic prediction
    Narwane, Swati V.
    Sawarkar, Sudhir D.
    DIABETES & METABOLIC SYNDROME-CLINICAL RESEARCH & REVIEWS, 2022, 16 (09)
  • [36] Several models for tunnel boring machine performance prediction based on machine learning
    Mahmoodzadeh, Arsalan
    Nejati, Hamid Reza
    Ibrahim, Hawkar Hashim
    Ali, Hunar Farid Hama
    Mohammed, Adil Hussein
    Rashidi, Shima
    Majeed, Mohammed Kamal
    GEOMECHANICS AND ENGINEERING, 2022, 30 (01) : 75 - 91
  • [37] Unsupervised machine learning for disease prediction: a comparative performance analysis using multiple datasets
    Lu, Haohui
    Uddin, Shahadat
    HEALTH AND TECHNOLOGY, 2024, 14 (01) : 141 - 154
  • [38] Machine learning algorithms for monitoring pavement performance
    Cano-Ortiz, Saul
    Pascual-Munoz, Pablo
    Castro-Fresno, Daniel
    AUTOMATION IN CONSTRUCTION, 2022, 139
  • [39] Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction
    Rahmani, Keyvan
    Thapa, Rahul
    Tsou, Peiling
    Chetty, Satish Casie
    Barnes, Gina
    Lam, Carson
    Tso, Chak Foon
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2023, 173
  • [40] Airfield pavement condition prediction with machine learning models for life-cycle cost analysis
    Clemmensen, April
    Wang, Hao
    INTERNATIONAL JOURNAL OF PAVEMENT ENGINEERING, 2024, 25 (01)