Systematic review of data-centric approaches in artificial intelligence and machine learning

Cited by: 0
Author
Singh P. [1]
Affiliation
[1] Wellington, New Zealand
Source
Data Science and Management | 2023 / Vol. 6 / Issue 03
Funding
Wellcome Trust (UK); US National Science Foundation; US National Institutes of Health; European Research Council
Keywords
Data management; Data preprocessing; Data-centric; Machine learning; MLOps; Semi-supervised learning; Technical debt;
DOI
10.1016/j.dsm.2023.06.001
Abstract
Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) AI algorithms have been developed to improve the performance of AI-oriented structures. However, model-centric approaches are limited by the absence of high-quality data. Data-centric AI is an emerging approach for solving machine learning (ML) problems. It is a collection of data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented, and researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers already use, intentionally or unintentionally, to improve the quality of AI systems: big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. In addition, it highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI. Technical debt builds up when software design and implementation decisions run into, or outright collide with, business goals and timelines. This survey lays the groundwork for future data-centric AI discussions by summarizing various data-centric approaches. © 2023 Xi'an Jiaotong University
Pages: 144-157
Page count: 13