ydata-profiling: Accelerating data-centric AI with high-quality data

Cited by: 12
Authors
Clemente, Fabiana [1]
Ribeiro, Goncalo Martins [1]
Quemy, Alexandre [1]
Santos, Miriam Seoane [1]
Pereira, Ricardo Cardoso [1]
Barros, Alex [1]
Affiliations
[1] YData Labs Inc, Seattle, WA 98121 USA
Keywords
Exploratory data analysis; Data profiling; Data quality; Data-centric AI; Data Intrinsic Characteristics; Data Complexity; TRENDS; CLASSIFICATION; AUTOENCODERS; IMPUTATION
DOI
10.1016/j.neucom.2023.126585
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
ydata-profiling is an open-source Python package for advanced exploratory data analysis that enables users to generate data profiling reports in a simple, fast, and efficient manner, fostering a standardized and visual understanding of the data. Beyond traditional descriptive properties and statistics, ydata-profiling follows a Data-Centric AI approach to exploratory analysis, as it focuses on the automatic detection and highlighting of complex data characteristics often associated with potential data quality issues, such as high ratios of missing or imbalanced data, infinite, unique, or constant values, skewness, high correlation, high cardinality, non-stationarity, seasonality, duplicate records, and other inconsistencies. The source code, documentation, and examples are available in the GitHub repository: https://github.com/ydataai/ydata-profiling.
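To make the kinds of checks listed in the abstract concrete, the following is a minimal, standard-library-only sketch of how a few of them (high missing ratio, infinite values, constant columns, unique columns, duplicate records) could be flagged. It is not the package's implementation or API: the function name `simple_alerts`, the alert labels, the `__dataset__` key, and the 0.2 missing-ratio threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def simple_alerts(rows, missing_threshold=0.2):
    """Flag a few basic data quality issues in a list of row dicts.

    Hypothetical helper for illustration only; labels and threshold
    are assumptions, not ydata-profiling's own names or defaults.
    Missing values are represented as None.
    """
    alerts = {}
    columns = list(rows[0].keys()) if rows else []

    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        col_alerts = []

        # Share of missing entries in this column.
        missing_ratio = 1 - len(present) / len(values)
        if missing_ratio > missing_threshold:
            col_alerts.append("MISSING")

        # Infinite floating-point values.
        if any(isinstance(v, float) and math.isinf(v) for v in present):
            col_alerts.append("INFINITE")

        # Constant: one distinct value. Unique: every value distinct.
        distinct = set(present)
        if len(distinct) == 1:
            col_alerts.append("CONSTANT")
        elif len(present) > 1 and len(distinct) == len(present):
            col_alerts.append("UNIQUE")

        if col_alerts:
            alerts[col] = col_alerts

    # Duplicate records: rows that repeat an earlier row exactly.
    row_counts = Counter(tuple(row.get(c) for c in columns) for row in rows)
    if any(count > 1 for count in row_counts.values()):
        alerts["__dataset__"] = ["DUPLICATES"]

    return alerts


rows = [
    {"id": 1, "country": "PT", "income": 10.0},
    {"id": 2, "country": "PT", "income": None},
    {"id": 3, "country": "PT", "income": float("inf")},
    {"id": 4, "country": "PT", "income": 10.0},
    {"id": 5, "country": "PT", "income": None},
]
print(simple_alerts(rows))
```

In this toy dataset, `id` is flagged as unique, `country` as constant, and `income` as both missing-heavy and containing infinite values. The remaining characteristics named in the abstract (skewness, correlation, cardinality, non-stationarity, seasonality) need statistical estimates and are left out of this sketch.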
Pages: 10