Imbalanced generative sampling of training data for improving quality of machine learning model

Cited by: 0
Authors
Coskun, Umut Can [1]
Dogan, Kemal Mert [2]
Gunpinar, Erkan [3]
Affiliations
[1] Numedyne Informat & Engn Inc, Istanbul, Turkiye
[2] Yildiz Tech Univ, TR-34210 Istanbul, Turkiye
[3] Istanbul Tech Univ, Istanbul, Turkiye
Keywords
Imbalanced sampling; Machine learning; Computer-aided design; Design exploration; Training data; Computational fluid dynamics; DESIGN; OPTIMIZATION; PERFORMANCE; UNCERTAINTY; ALGORITHM; SYSTEM;
DOI
10.1016/j.aei.2024.102631
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Design exploration in engineering applications often requires meticulous experimental or numerical studies to evaluate the performance (Y) of each design, which can demand considerable effort, time, or resources. Reducing the number of such tests needed to find a good design is therefore of paramount importance in all engineering fields. This study aims to compute a machine learning (ML) model using fewer designs as training data. Uniform sampling (US) in the design space (defined by predefined design parameters) is a promising approach for obtaining training data. We extend this sampling concept by also employing the ML model to select designs in the design space. The designs are selected via two non-uniform (imbalanced) sampling methods, height-based sampling (HBS) and gradient-based sampling (GBS), which consider the designs' Y and gradient (dY) values. These values are divided into uniform intervals, and we aim to equalize the number of training designs in each interval as much as possible. This forces the training data to include designs with minimum or maximum Y or dY values, which in general lie in only a small portion of the design space, so that designs from all portions of the design space can be captured. The proposed methods are compared against US and two well-studied non-uniform sampling strategies, Stratified Over Sampling (SOS) and Gaussian-Process-Based Sampling (GPBS). To reliably assess the quality of ML models trained on designs sampled via US, SOS, GPBS, HBS and GBS, we use standard (known) test functions, such as the Easom and Beale functions, as substitutes for engineering problems. According to the results, ML models trained with HBS and GBS have either better prediction accuracy or wider applicability than all the other tested sampling methods.
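The abstract describes the interval-equalizing idea behind HBS and GBS only at a high level. The Python sketch below illustrates one possible reading of it, under several assumptions that are not from the paper: the true Easom function stands in for the ML model's predictions of Y, a finite-difference gradient magnitude stands in for dY, candidates are drawn from a dense uniform pool, and counts are equalized by a round-robin fill across the intervals. The helper `interval_balanced_sample` and all parameter values (10 intervals, 100 designs, the [-10, 10] domain) are hypothetical, not the authors' implementation.

```python
import math
import random


def easom(x, y):
    """Easom test function; global minimum Y = -1 at (pi, pi)."""
    return -math.cos(x) * math.cos(y) * math.exp(-((x - math.pi) ** 2 + (y - math.pi) ** 2))


def grad_norm(f, x, y, h=1e-5):
    """Central finite-difference estimate of |grad f| (assumed stand-in for dY)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return math.hypot(dfdx, dfdy)


def interval_balanced_sample(candidates, scores, n_bins, n_select, seed=0):
    """Hypothetical helper: choose n_select candidates so that each of n_bins
    uniform score intervals contributes as equally as possible."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all scores are equal
    bins = [[] for _ in range(n_bins)]
    for cand, s in zip(candidates, scores):
        k = min(int((s - lo) / width), n_bins - 1)  # max score goes to last bin
        bins[k].append(cand)
    rng = random.Random(seed)
    for b in bins:
        rng.shuffle(b)
    chosen = []
    # Round-robin over the intervals: sparse intervals (extreme Y or dY) are
    # emptied first, and the remaining picks come from the denser intervals.
    while len(chosen) < n_select and any(bins):
        for b in bins:
            if b and len(chosen) < n_select:
                chosen.append(b.pop())
    return chosen


# Usage. Assumption: the true test function replaces the ML model's prediction,
# and a dense uniform candidate pool stands in for the design space.
rng = random.Random(42)
pool = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(5000)]
y = [easom(a, b) for a, b in pool]
dy = [grad_norm(easom, a, b) for a, b in pool]

us = rng.sample(pool, 100)                                          # uniform sampling
hbs = interval_balanced_sample(pool, y, n_bins=10, n_select=100)    # height-based
gbs = interval_balanced_sample(pool, dy, n_bins=10, n_select=100)   # gradient-based

for name, picks in (("US", us), ("HBS", hbs), ("GBS", gbs)):
    near_min = sum(1 for a, b in picks if easom(a, b) < -0.5)
    print(f"{name}: {near_min} of {len(picks)} designs have Y < -0.5")
```

Because the low-Y intervals near the Easom minimum hold only a handful of pool points, the round-robin fill tends to include most or all of them, whereas uniform sampling tends to miss that small region. Replacing `easom` with a trained model's predictions would bring the sketch closer to the model-guided sampling the abstract describes.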
Pages: 14
Related papers (50 in total)
  • [21] Comparison Of The Different Sampling Techniques For Imbalanced Classification Problems In Machine Learning
    Peng Zhihao
    Yan Fenglong
    Li Xucheng
    2019 11TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2019), 2019, : 431 - 434
  • [22] A machine learning method for incomplete and imbalanced medical data
    Salman, Issam
    Vomlel, Jiri
    PROCEEDINGS OF THE 20TH CZECH-JAPAN SEMINAR ON DATA ANALYSIS AND DECISION MAKING UNDER UNCERTAINTY, 2017, : 188 - 195
  • [23] Machine-learning classifiers for imbalanced tornado data
    Trafalis T.B.
    Adrianto I.
    Richman M.B.
    Lakshmivarahan S.
    Computational Management Science, 2014, 11 (4) : 403 - 418
  • [24] Effect of Training Data Order for Machine Learning
    Mange, Jeremy
    2019 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2019), 2019, : 406 - 407
  • [25] Improving outdoor thermal environmental quality through kinetic canopy empowered by machine learning and control algorithms
    Zeng, Tiancheng
    Ma, Xintong
    Luo, Yilu
    Yin, Jun
    Ji, Yuxin
    Lu, Shuai
    BUILDING SIMULATION, 2025, : 699 - 720
  • [26] A binary PSO-based ensemble under-sampling model for rebalancing imbalanced training data
    Li, Jinyan
    Wu, Yaoyang
    Fong, Simon
    Tallon-Ballesteros, Antonio J.
    Yang, Xin-she
    Mohammed, Sabah
    Wu, Feng
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (05) : 7428 - 7463
  • [27] Predicting severely imbalanced data disk drive failures with machine learning models
    Ahmed, Jishan
    Green II, Robert C.
    MACHINE LEARNING WITH APPLICATIONS, 2022, 9
  • [28] Machine learning-based sensitivity of steel frames with highly imbalanced and data
    Koh, Hyeyoung
    Blum, Hannah B.
    ENGINEERING STRUCTURES, 2022, 259
  • [29] Comparison of Sampling Methods Using Machine Learning and Deep Learning Algorithms with an Imbalanced Data Set for the Prevention of Violence Against Physicians
    Cakir, Hilal
    Incereis, Nilgun
    Akgun, Bekir Tevfik
    Tastemir, Av. S. Yazgulu
    2021 15TH TURKISH NATIONAL SOFTWARE ENGINEERING SYMPOSIUM (UYMS), 2021, : 88 - 94
  • [30] Improving the Quality of Art Market Data Using Linked Open Data and Machine Learning
    Filipiak, Dominik
    Filipowska, Agata
    BUSINESS INFORMATION SYSTEMS WORKSHOPS, BIS 2016, 2017, 263 : 418 - 428