Systematic review of data-centric approaches in artificial intelligence and machine learning

Cited by: 0
Author
Singh P. [1]
Affiliation
[1] Wellington, New Zealand
Source
Data Science and Management | 2023 / Vol. 6 / Issue 3
Funding
Wellcome Trust (UK); National Science Foundation (US); National Institutes of Health (US); European Research Council;
Keywords
Data management; Data preprocessing; Data-centric; Machine learning; MLOps; Semi-supervised learning; Technical debt;
DOI
10.1016/j.dsm.2023.06.001
Abstract
Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) algorithms have been developed to improve the performance of AI systems. However, model-centric approaches are limited by the absence of high-quality data. Data-centric AI is an emerging approach to solving machine learning (ML) problems. It is a collection of data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented, and researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers are already using, intentionally or unintentionally, to improve the quality of AI systems: big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. It also highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI. Technical debt builds up when software design and implementation decisions run into, or outright collide with, business goals and timelines. This survey lays the groundwork for future data-centric AI discussions by summarizing various data-centric approaches. © 2023 Xi'an Jiaotong University
Pages: 144-157
Page count: 13