Systematic review of data-centric approaches in artificial intelligence and machine learning

Author
Singh P. [1]
Affiliation
[1] Wellington, New Zealand
Source
Data Science and Management | 2023 / Vol. 6 / No. 03
Funding
European Research Council; US National Institutes of Health; Wellcome Trust (UK); US National Science Foundation;
Keywords
Data management; Data preprocessing; Data-centric; Machine learning; MLOps; Semi-supervised learning; Technical debt;
DOI
10.1016/j.dsm.2023.06.001
Abstract
Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) AI algorithms have been developed to improve the performance of AI-oriented structures. However, model-centric approaches are limited by the absence of high-quality data. Data-centric AI is an emerging approach for solving machine learning (ML) problems. It is a collection of various data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented. Researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers are already using to intentionally or unintentionally improve the quality of AI systems. These include big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. In addition, it highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI. Technical debt builds up when software design and implementation decisions run into “or outright collide with” business goals and timelines. This survey lays the groundwork for future data-centric AI discussions by summarizing various data-centric approaches. © 2023 Xi'an Jiaotong University
Pages: 144 - 157
Page count: 13
Related papers
50 in total
  • [41] Machine Learning, Deep Learning, Artificial Intelligence and Aesthetic Plastic Surgery: A Qualitative Systematic Review
    Nogueira, Raquel
    Eguchi, Marina
    Kasmirski, Julia
    de Lima, Bruno Veronez
    Dimatos, Dimitri Cardoso
    Lima, Diego L.
    Glatter, Robert
    Tran, David L.
    Piccinini, Pedro Salomao
    AESTHETIC PLASTIC SURGERY, 2025, 49 (01) : 389 - 399
  • [42] Empowering engineering with data, machine learning and artificial intelligence: a short introductive review
    Francisco Chinesta
    Elias Cueto
    Advanced Modeling and Simulation in Engineering Sciences, 9
  • [43] A Machine-Learning-Based Data-Centric Misbehavior Detection Model for Internet of Vehicles
    Sharma, Prinkle
    Liu, Hong
    IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (06) : 4991 - 4999
  • [44] What Is a Digital Twin? Experimental Design for a Data-Centric Machine Learning Perspective in Health
    Emmert-Streib, Frank
    Yli-Harja, Olli
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2022, 23 (21)
  • [45] Empowering engineering with data, machine learning and artificial intelligence: a short introductive review
    Chinesta, Francisco
    Cueto, Elias
    ADVANCED MODELING AND SIMULATION IN ENGINEERING SCIENCES, 2022, 9 (01)
  • [46] Data-Centric Machine Learning: Improving Model Performance and Understanding Through Dataset Analysis
    Westermann, Hannes
    Savelka, Jaromir
    Walker, Vern R.
    Ashley, Kevin D.
    Benyekhlef, Karim
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS, 2021, 346 : 54 - 57
  • [47] Artificial Intelligence and Machine Learning in Predicting Intradialytic Hypotension in Hemodialysis Patients: A Systematic Review
    Chaudhry, Taha Zahid
    Yadav, Mansi
    Bokhari, Syed Faqeer Hussain
    Fatimah, Syeda Rubab
    Rehman, Abdur
    Kamran, Muhammad
    Asim, Aiman
    Elhefyan, Mohamed
    Yousif, Osman
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (07)
  • [48] Machine Learning and Artificial Intelligence in Circular Economy: A Bibliometric Analysis and Systematic Literature Review
    Noman A.A.
    Akter U.H.
    Pranto T.H.
    Haque A.K.M.B.
    Ann. Emer. Tech. Comp., 2022, 2 (13-40) : 13 - 40
  • [49] Predicting Mandibular Bone Growth Using Artificial Intelligence and Machine Learning: A Systematic Review
    Dashti, Mahmood
    Khosraviani, Farshad
    Azimi, Tara
    Sehat, Mohammad Soroush
    Alekajbaf, Ehsan
    Fahimipour, Amir
    Zare, Niusha
    ADVANCES IN ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, 2024, 4 (03) : 2731 - 2745
  • [50] Artificial intelligence and machine learning approaches in cerebral palsy diagnosis, prognosis, and management: a comprehensive review
    Balgude, Shalini Dhananjay
    Gite, Shilpa
    Pradhan, Biswajeet
    Lee, Chang-Wook
    PEERJ COMPUTER SCIENCE, 2024, 10 : 1 - 52