On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing: A Systematic Review

Cited by: 0
Authors
Xie, Jiarui [1]
Sun, Lijun [1, 2]
Zhao, Yaoyao Fiona [1]
Affiliations
[1] McGill Univ, Dept Mech Engn, Addit Design & Mfg Lab, Montreal, PQ H3A 0G4, Canada
[2] McGill Univ, Dept Civil Engn, Smart Transportat Lab, Montreal, PQ H3A 0G4, Canada
Source
ENGINEERING | 2025 / Vol. 45
Keywords
Machine learning; Design and manufacturing; Data quality; Data augmentation; Active learning; CONVOLUTIONAL NEURAL-NETWORK; DATA GOVERNANCE; DEEP; FRAMEWORK; VISION; METHODOLOGY; INSPECTION; SELECTION; MODEL
DOI
10.1016/j.eng.2024.04.024
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Machine learning (ML) has recently enabled many modeling tasks in design, manufacturing, and condition monitoring due to its unparalleled learning ability using existing data. Data have become the limiting factor when implementing ML in industry. However, there is no systematic investigation on how data quality can be assessed and improved for ML-based design and manufacturing. The aim of this survey is to uncover the data challenges in this domain and review the techniques used to resolve them. To establish the background for the subsequent analysis, crucial data terminologies in ML-based modeling are reviewed and categorized into data acquisition, management, analysis, and utilization. Thereafter, the concepts and frameworks established to evaluate data quality and imbalance, including data quality assessment, data readiness, information quality, data biases, fairness, and diversity, are further investigated. The root causes and types of data challenges, including human factors, complex systems, complicated relationships, lack of data quality, data heterogeneity, data imbalance, and data scarcity, are identified and summarized. Methods to improve data quality and mitigate data imbalance and their applications in this domain are reviewed. This literature review focuses on two promising methods: data augmentation and active learning. The strengths, limitations, and applicability of the surveyed techniques are illustrated. The trends of data augmentation and active learning are discussed with respect to their applications, data types, and approaches. Based on this discussion, future directions for data quality improvement and data imbalance mitigation in this domain are identified. (c) 2024 THE AUTHORS. Published by Elsevier LTD on behalf of Chinese Academy of Engineering and Higher Education Press Limited Company. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 105-131
Number of pages: 27