ydata-profiling: Accelerating data-centric AI with high-quality data

Cited by: 6
Authors
Clemente, Fabiana [1]
Ribeiro, Goncalo Martins [1]
Quemy, Alexandre [1]
Santos, Miriam Seoane [1]
Pereira, Ricardo Cardoso [1]
Barros, Alex [1]
Affiliations
[1] YData Labs Inc, Seattle, WA 98121 USA
Keywords
Exploratory data analysis; Data profiling; Data quality; Data-centric AI; Data Intrinsic Characteristics; Data Complexity; TRENDS; CLASSIFICATION; AUTOENCODERS; IMPUTATION
DOI
10.1016/j.neucom.2023.126585
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
ydata-profiling is an open-source Python package for advanced exploratory data analysis that enables users to generate data profiling reports in a simple, fast, and efficient manner, fostering a standardized and visual understanding of the data. Beyond traditional descriptive properties and statistics, ydata-profiling follows a Data-Centric AI approach to exploratory analysis, as it focuses on the automatic detection and highlighting of complex data characteristics often associated with potential data quality issues, such as high ratios of missing or imbalanced data, infinite, unique, or constant values, skewness, high correlation, high cardinality, non-stationarity, seasonality, duplicate records, and other inconsistencies. The source code, documentation, and examples are available in the GitHub repository: https://github.com/ydataai/ydata-profiling.
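To make a few of the alert types named in the abstract concrete, the sketch below hand-rolls checks for high missing ratios, infinite values, constant and unique columns, and duplicate records using only the Python standard library. It is an illustration under stated assumptions, not the package's implementation: the function names `profile_columns` and `count_duplicate_rows`, the 0.3 missing-ratio threshold, and the sample rows are all hypothetical. In the package itself, reports are produced through its `ProfileReport` class rather than through code like this.

```python
import math
from collections import Counter


def profile_columns(rows, missing_threshold=0.3):
    """Return a dict mapping each column to a list of simple alerts.

    rows: list of dicts sharing the same keys; None marks a missing value.
    missing_threshold: hypothetical cutoff for flagging a column as "missing".
    """
    alerts = {}
    n = len(rows)
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        found = []
        # High ratio of missing values
        if n and (n - len(present)) / n > missing_threshold:
            found.append("missing")
        # Infinite values (only meaningful for floats)
        if any(isinstance(v, float) and math.isinf(v) for v in present):
            found.append("infinite")
        distinct = set(present)
        if present and len(distinct) == 1:
            # Every observed value is the same
            found.append("constant")
        elif len(present) > 1 and len(distinct) == len(present):
            # Every observed value is different
            found.append("unique")
        alerts[col] = found
    return alerts


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row exactly."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(c - 1 for c in counts.values())


# Hypothetical sample data
rows = [
    {"id": 1, "country": "PT", "income": 1200.0},
    {"id": 2, "country": "PT", "income": None},
    {"id": 3, "country": "PT", "income": float("inf")},
    {"id": 4, "country": "PT", "income": None},
    {"id": 5, "country": "PT", "income": 1200.0},
]

alerts = profile_columns(rows)
for column, found in alerts.items():
    print(column, found)
print("duplicate rows:", count_duplicate_rows(rows))
```

The other characteristics listed in the abstract, such as skewness, correlation, non-stationarity, and seasonality, need statistical machinery beyond this sketch and are deliberately left out.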
Pages: 10