Systematic review of data-centric approaches in artificial intelligence and machine learning

Authors
Singh P. [1]
Affiliation
[1] Wellington, New Zealand
Source
Data Science and Management | 2023 / Volume 6 / Issue 03
Funding
European Research Council; US National Institutes of Health; Wellcome Trust (UK); US National Science Foundation;
Keywords
Data management; Data preprocessing; Data-centric; Machine learning; MLOps; Semi-supervised learning; Technical debt;
DOI
10.1016/j.dsm.2023.06.001
Abstract
Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) AI smart algorithms have been developed to improve the performance of AI-oriented structures. However, model-centric approaches are limited by the absence of high-quality data. Data-centric AI is an emerging approach for solving machine learning (ML) problems. It is a collection of various data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented. Researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers are already using to intentionally or unintentionally improve the quality of AI systems. These include big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. In addition, it highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI. Technical debt builds up when software design and implementation decisions run into, or outright collide with, business goals and timelines. This survey lays the groundwork for future data-centric AI discussions by summarizing various data-centric approaches. © 2023 Xi'an Jiaotong University
Pages: 144-157
Page count: 13