Imbalanced generative sampling of training data for improving quality of machine learning model

Cited by: 0

Authors:

Coskun, Umut Can ^{[1]}

Dogan, Kemal Mert ^{[2]}

Gunpinar, Erkan ^{[3]}

Affiliations:

[1] Numedyne Informat & Engn Inc, Istanbul, Turkiye

[2] Yildiz Tech Univ, TR-34210 Istanbul, Turkiye

[3] Istanbul Tech Univ, Istanbul, Turkiye

Source:

ADVANCED ENGINEERING INFORMATICS | 2024 / Vol. 62

Keywords:

Imbalanced sampling; Machine learning; Computer-aided design; Design exploration; Training data; Computational fluid dynamics; DESIGN; OPTIMIZATION; PERFORMANCE; UNCERTAINTY; ALGORITHM; SYSTEM;

DOI:

10.1016/j.aei.2024.102631

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Design exploration in engineering applications often requires a meticulous experimental or numerical study to evaluate the performance (Y) of each design, which may require great effort, time or resources. Reducing the number of these tests needed to find a good design is of paramount importance in all engineering fields. This study aims at computing a machine learning (ML) model using fewer designs as training data. Uniform sampling (US) in the design space (based on predefined design parameters) to obtain training data is a promising approach. We further extend this sampling concept to obtain designs in the design space by also employing the ML model. The designs are selected via two non-uniform (imbalanced) sampling methods, namely height-based sampling (HBS) and gradient-based sampling (GBS), while considering their Y and gradient (dY) values. These values are divided into uniform intervals, and we aim at equalizing the number of designs in the training data at each interval as much as possible. This can force the selection of designs with minimum or maximum Y or dY values, which in general lie on a small portion of the design space. Therefore, designs can be captured from all portions of the design space. Results of the proposed methods are compared against US along with two well-studied non-uniform sampling strategies, Stratified Over Sampling (SOS) and Gaussian-Process Based Sampling (GPBS). To reliably investigate the quality of ML models obtained using designs sampled via US, SOS, GPBS, HBS and GBS, we utilize standard (known) test functions (such as Easom and Beale) as substitutes for engineering problems. According to the results presented, ML models using HBS and GBS have either better prediction accuracy or wider applicability compared to all other tested sampling methods.
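
The interval-balancing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `balanced_interval_sample`, the candidate pool, the number of intervals, and the round-robin selection rule are all assumptions made for illustration, and the Easom function is evaluated directly rather than through an ML model as the paper describes.

```python
import numpy as np

def easom(x):
    """Easom test function for points x of shape (n, 2); global minimum -1 at (pi, pi)."""
    x1, x2 = x[:, 0], x[:, 1]
    return -np.cos(x1) * np.cos(x2) * np.exp(-((x1 - np.pi) ** 2 + (x2 - np.pi) ** 2))

def balanced_interval_sample(values, n_select, n_bins=10, seed=None):
    """Hypothetical helper: choose indices so that uniform intervals of
    `values` hold as equal a number of selected points as possible."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Uniform intervals between the smallest and largest value.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Interior edges only, so every value maps to an interval id in 0..n_bins-1.
    bin_ids = np.digitize(values, edges[1:-1])
    # One shuffled pool of candidate indices per interval.
    pools = [list(rng.permutation(np.flatnonzero(bin_ids == b))) for b in range(n_bins)]
    chosen = []
    # Round-robin over intervals: sparse intervals are exhausted first,
    # so rare extreme values are not crowded out by common ones.
    while len(chosen) < n_select and any(pools):
        for pool in pools:
            if pool and len(chosen) < n_select:
                chosen.append(int(pool.pop()))
    return np.array(chosen), bin_ids

# Assumed setup: a uniformly sampled candidate pool on [0, 2*pi]^2.
rng = np.random.default_rng(0)
candidates = rng.uniform(0.0, 2.0 * np.pi, size=(5000, 2))
y = easom(candidates)
idx, bin_ids = balanced_interval_sample(y, n_select=50, n_bins=10, seed=0)
training_designs = candidates[idx]
```

Under the same assumptions, a gradient-based variant could pass an estimate of the gradient magnitude (for example from finite differences) as `values` instead of `y`.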

Pages: 14

50 items in total

[1] Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning
Tyagi, Shivani
Mittal, Sangeeta
PROCEEDINGS OF RECENT INNOVATIONS IN COMPUTING, ICRIC 2019, 2020, 597 : 209 - 221
[2] Generative learning for imbalanced data using the Gaussian mixed model
Xie, Yuxi
Peng, Lizhi
Chen, Zhenxiang
Yang, Bo
Zhang, Hongli
Zhang, Haibo
APPLIED SOFT COMPUTING, 2019, 79 : 439 - 451
[3] On Machine Learning with Imbalanced Data and Research Quality Evaluation Methodologies
Lipitakis, Anastasia-Dimitra
Lipitakis, Evangelia A. E. C.
2014 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), VOL 1, 2014, : 451 - 457
[4] Imbalanced Data Problem in Machine Learning: A Review
Altalhan, Manahel
Algarni, Abdulmohsen
Alouane, Monia Turki-Hadj
IEEE ACCESS, 2025, 13 : 13686 - 13699
[5] Uncertainty quantification driven machine learning for improving model accuracy in imbalanced regression tasks
Dolar, Tuba
Chen, Jie
Chen, Wei
EXPERT SYSTEMS WITH APPLICATIONS, 2025, 261
[6] Online Extreme Learning Machine with Hybrid Sampling Strategy for Sequential Imbalanced Data
Mao, Wentao
Jiang, Mengxue
Wang, Jinwan
Li, Yuan
COGNITIVE COMPUTATION, 2017, 9 (06) : 780 - 800
[7] A Hybrid Machine Learning Approach for Improving Mortality Risk Prediction on Imbalanced Data
Tashkandi, Araek
Wiese, Lena
IIWAS2019: THE 21ST INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES, 2019, : 83 - 92
[8] Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction
Wah, Yap Bee
Ismail, Azlan
Azid, Nur Niswah Naslina
Jaafar, Jafreezal
Aziz, Izzatdin Abdul
Hasan, Mohd Hilmi
Zain, Jasni Mohamad
CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03) : 4821 - 4841
[9] A comparative analysis of machine learning techniques for imbalanced data
Mrad, Ali Ben
Lahiani, Amine
Mefteh-Wali, Salma
Mselmi, Nada
ANNALS OF OPERATIONS RESEARCH, 2024,
[10] Machine Learning on Imbalanced Data in Credit Risk
Birla, Shiivong
Kohli, Kashish
Dutta, Akash
7TH IEEE ANNUAL INFORMATION TECHNOLOGY, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE IEEE IEMCON-2016, 2016,