Machine and deep learning performance in out-of-distribution regressions

Cited by: 0
Authors
Shmuel, Assaf [1]
Glickman, Oren [1]
Lazebnik, Teddy [2]
Affiliations
[1] Department of Computer Science, Bar Ilan University, Ramat Gan
[2] Department of Cancer Biology, Cancer Institute, University College London, London
Source
Machine Learning: Science and Technology | 2024 / Volume 5 / Issue 04
Keywords
data-driven model generalization; feature engineering; machine learning robustness; out of distribution; symbolic regression
DOI
10.1088/2632-2153/ada221
Abstract
Machine learning (ML) and deep learning (DL) models are gaining popularity due to their effectiveness in many computational tasks. These models rest on an intuitive, but frequently unsatisfied, assumption: that the data used to train them represent the task at hand well. When this assumption fails, the out-of-distribution (OOD) challenge arises, which can cause an unexpected drop in a data-driven model's performance. In this study, we evaluate the performance of various ML and DL models in in-distribution (ID) versus OOD prediction. While the degradation in OOD performance is well acknowledged, to the best of our knowledge, this is one of the first studies to quantify it for various models on a large benchmark of n = 15 real-world regression datasets. We extensively (n > 40 000 runs) compare the ID versus OOD performance of XGBoost, random forest, K-nearest-neighbors, support vector machine, and linear regression models, as well as AutoML models (Tree-based Pipeline Optimization Tool and AutoKeras). In addition, to tackle this challenge, we propose integrating symbolic regression (SR), used as a feature engineering method, with an ML or DL model to improve its performance on OOD samples. Our results show that incorporating SR-derived features significantly enhances the predictive capabilities of both ML and DL models on OOD samples, by 3.70% and 10.20% on average, respectively, without reducing ID performance; ID performance in fact also improves, though to a slightly lesser extent. As such, this method can help produce more generalized and robust data-driven models. © 2025 The Author(s). Published by IOP Publishing Ltd.
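The ID-versus-OOD comparison described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' benchmark or pipeline. Everything here is an assumption made for illustration: the synthetic target y = x1 * x2, the split (training and ID test points drawn from [0, 1], OOD test points from [2, 3]), the hypothetical helper names, and the hand-written product feature that stands in for a term a symbolic regressor might discover. A plain linear regression is fitted with and without that extra feature and scored by RMSE on both splits.

```python
import random


def solve_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]


def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def raw_features(x1, x2):
    # Intercept plus the two raw inputs.
    return [1.0, x1, x2]


def sr_features(x1, x2):
    # Assumption: x1 * x2 is a hand-written stand-in for an SR-derived feature.
    return [1.0, x1, x2, x1 * x2]


def sample(rng, low, high, n):
    """Draw n points uniformly from [low, high]^2 with a noisy target x1 * x2."""
    points = [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]
    targets = [x1 * x2 + rng.gauss(0.0, 0.01) for x1, x2 in points]
    return points, targets


rng = random.Random(0)
train_pts, train_y = sample(rng, 0.0, 1.0, 200)  # training range
id_pts, id_y = sample(rng, 0.0, 1.0, 100)        # same range as training (ID)
ood_pts, ood_y = sample(rng, 2.0, 3.0, 100)      # outside the training range (OOD)

splits = {"ID": (id_pts, id_y), "OOD": (ood_pts, ood_y)}
results = {}
for name, featurize in [("raw", raw_features), ("raw+SR", sr_features)]:
    w = solve_least_squares([featurize(*p) for p in train_pts], train_y)
    results[name] = {
        split: rmse(y, predict(w, [featurize(*p) for p in pts]))
        for split, (pts, y) in splits.items()
    }
    print(name, results[name])
```

In this toy setting the model with only raw inputs degrades sharply on the OOD split, while the model given the product feature extrapolates far better. The gap is much larger than on real data because the stand-in feature matches the true target exactly; the sketch shows only the evaluation protocol and the direction of the effect, not the magnitudes reported in the abstract.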