Machine and deep learning performance in out-of-distribution regressions

Cited by: 0
Authors
Shmuel, Assaf [1]
Glickman, Oren [1]
Lazebnik, Teddy [2]
Affiliations
[1] Bar Ilan Univ, Dept Comp Sci, Ramat Gan, Israel
[2] UCL, Canc Inst, Dept Canc Biol, London, England
Source
MACHINE LEARNING-SCIENCE AND TECHNOLOGY | 2024 / Vol. 5 / Issue 04
Keywords
data-driven model generalization; out of distribution; feature engineering; symbolic regression; machine learning robustness; SYMBOLIC REGRESSION; BIG DATA; MODEL;
DOI
10.1088/2632-2153/ada221
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning (ML) and deep learning (DL) models are gaining popularity due to their effectiveness in many computational tasks. These models rest on an intuitive but frequently unsatisfied assumption: that the data used to train them represents the task at hand well. When this assumption fails, the out-of-distribution (OOD) challenge arises, which can cause an unexpected drop in a data-driven model's performance. In this study, we evaluate the performance of various ML and DL models in in-distribution (ID) versus OOD prediction. While the degradation in OOD performance is well acknowledged, to the best of our knowledge, this is one of the first studies to quantify it for various models on a large benchmark of n = 15 real-world regression datasets. We extensively (n > 40,000 runs) compare the ID versus OOD performance of XGBoost, random forest, K-nearest-neighbors, support vector machine, and linear regression models, as well as AutoML models (Tree-based Pipeline Optimization Tool and AutoKeras). In addition, to tackle this challenge, we propose integrating symbolic regression (SR), used as a feature engineering method, with an ML or DL model to improve its performance on OOD samples. Our results show that incorporating SR-derived features significantly enhances the predictive capabilities of both ML and DL models on OOD samples, by 3.70% and 10.20% on average, respectively, without reducing ID performance and in fact improving it to a slightly lower extent. As such, this method can help produce more generalized and robust data-driven models.
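The ID-versus-OOD comparison and the idea of adding an SR-derived feature can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic target `3x^2 + 2`, the OOD split defined by holding out a feature range the model never saw, the small least-squares solver, and the feature `x * x`, which is hand-written here as a stand-in for an expression a symbolic regression tool might return (no SR tool is actually run).

```python
import math


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    n_feat = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n_feat)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n_feat)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_feat):
        pivot = max(range(col, n_feat), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n_feat):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n_feat] / a[i][i] for i in range(n_feat)]


def rmse(weights, rows, targets):
    """Root mean squared error of a linear model on the given rows."""
    sq = [
        (sum(w * v for w, v in zip(weights, r)) - t) ** 2
        for r, t in zip(rows, targets)
    ]
    return math.sqrt(sum(sq) / len(sq))


def target(x):
    # Hypothetical ground truth, chosen only for this illustration.
    return 3.0 * x * x + 2.0


def raw_features(x):
    # Bias term plus the raw input.
    return [1.0, x]


def augmented_features(x):
    # Raw features plus x * x, a hand-written stand-in for an
    # SR-derived expression (assumption: no SR library is used here).
    return [1.0, x, x * x]


# ID data lies in [0, 5]; OOD data lies in (5, 10], a range unseen in training.
xs_in = [i / 10 for i in range(51)]
train_x = xs_in[0::2]
id_test_x = xs_in[1::2]
ood_test_x = [i / 10 for i in range(51, 101)]


def evaluate(featurize):
    """Fit on the training range, then report (ID RMSE, OOD RMSE)."""
    weights = fit_ols([featurize(x) for x in train_x],
                      [target(x) for x in train_x])
    id_err = rmse(weights, [featurize(x) for x in id_test_x],
                  [target(x) for x in id_test_x])
    ood_err = rmse(weights, [featurize(x) for x in ood_test_x],
                   [target(x) for x in ood_test_x])
    return id_err, ood_err


raw_id, raw_ood = evaluate(raw_features)
aug_id, aug_ood = evaluate(augmented_features)
print(f"raw features:       ID RMSE={raw_id:.3f}  OOD RMSE={raw_ood:.3f}")
print(f"augmented features: ID RMSE={aug_id:.3g}  OOD RMSE={aug_ood:.3g}")
```

In this toy setting the linear model on raw features degrades sharply outside the training range, while the augmented model extrapolates because the added feature captures the underlying functional form. Real datasets are noisy and the SR expression is only approximate, so gains of the size reported in the abstract should not be read off this example.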
Pages: 30