Systematic review of data-centric approaches in artificial intelligence and machine learning

Cited by: 0
Authors
Singh P. [1]
Affiliation
[1] Wellington, New Zealand
Source
Data Science and Management | 2023, Vol. 6, No. 3
Funding
Wellcome Trust (UK); US National Science Foundation; US National Institutes of Health; European Research Council
Keywords
Data management; Data preprocessing; Data-centric; Machine learning; MLOps; Semi-supervised learning; Technical debt;
DOI
10.1016/j.dsm.2023.06.001
Abstract
Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) AI algorithms have been developed to improve the performance of AI-oriented structures. However, model-centric approaches are limited by the absence of high-quality data. Data-centric AI is an emerging approach for solving machine learning (ML) problems. It is a collection of data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented, and researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers are already using, intentionally or unintentionally, to improve the quality of AI systems: big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. In addition, it highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI. Technical debt builds up when software design and implementation decisions run into, or outright collide with, business goals and timelines. This survey lays the groundwork for future data-centric AI discussions by summarizing various data-centric approaches. © 2023 Xi'an Jiaotong University
Pages: 144 - 157
Page count: 13
Related papers
50 in total
  • [1] Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer
    Adeoye, John
    Hui, Liuling
    Su, Yu-Xiong
    JOURNAL OF BIG DATA, 2023, 10 (01)
  • [2] Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer
    John Adeoye
    Liuling Hui
    Yu-Xiong Su
    Journal of Big Data, 10
  • [3] Data-Centric Artificial Intelligence
    Jakubik, Johannes
    Voessing, Michael
    Kuehl, Niklas
    Walk, Jannis
    Satzger, Gerhard
    BUSINESS & INFORMATION SYSTEMS ENGINEERING, 2024, 66 (04) : 507 - 515
  • [4] Data-Centric Approaches to Radio Frequency Machine Learning
    Kuzdeba, Scott
    Robinson, Josh
    2022 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM), 2022,
  • [5] Data-centric Artificial Intelligence: A Survey
    Zha, Daochen
    Bhat, Zaid Pervaiz
    Lai, Kwei-Herng
    Yang, Fan
    Jiang, Zhimeng
    Zhong, Shaochen
    Hu, Xia
    ACM COMPUTING SURVEYS, 2025, 57 (05)
  • [6] Data-Centric Green Artificial Intelligence: A Survey
    Salehi S.
    Schmeink A.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (05) : 1973 - 1989
  • [7] Technical Analysis of Data-Centric and Model-Centric Artificial Intelligence
    Majeed, Abdul
    Hwang, Seong Oun
    IT PROFESSIONAL, 2023, 25 (06) : 62 - 70
  • [8] Machine learning for data-centric epidemic forecasting
    Rodriguez, Alexander
    Kamarthi, Harshavardhan
    Agarwal, Pulak
    Ho, Javen
    Patel, Mira
    Sapre, Suchet
    Prakash, B. Aditya
    NATURE MACHINE INTELLIGENCE, 2024, 6 (10) : 1122 - 1131
  • [9] A Data-Centric Optimization Framework for Machine Learning
    Rausch, Oliver
    Ben-Nun, Tal
    Dryden, Nikoli
    Ivanov, Andrei
    Li, Shigang
    Hoefler, Torsten
    PROCEEDINGS OF THE 36TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS 2022, 2022,
  • [10] Data-Centric Artificial Intelligence, Preprocessing, and the Quest for Transformative Artificial Intelligence Systems Development
    Majeed, Abdul
    Hwang, Seong Oun
    COMPUTER, 2023, 56 (05) : 109 - 115