ydata-profiling: Accelerating data-centric AI with high-quality data

Cited by: 6
Authors
Clemente, Fabiana [1]
Ribeiro, Goncalo Martins [1]
Quemy, Alexandre [1]
Santos, Miriam Seoane [1]
Pereira, Ricardo Cardoso [1]
Barros, Alex [1]
Affiliation
[1] YData Labs Inc, Seattle, WA 98121 USA
Keywords
Exploratory data analysis; Data profiling; Data quality; Data-centric AI; Data Intrinsic Characteristics; Data Complexity; TRENDS; CLASSIFICATION; AUTOENCODERS; IMPUTATION
DOI
10.1016/j.neucom.2023.126585
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
ydata-profiling is an open-source Python package for advanced exploratory data analysis that enables users to generate data profiling reports in a simple, fast, and efficient manner, fostering a standardized and visual understanding of the data. Beyond traditional descriptive properties and statistics, ydata-profiling follows a Data-Centric AI approach to exploratory analysis, as it focuses on the automatic detection and highlighting of complex data characteristics often associated with potential data quality issues, such as high ratios of missing or imbalanced data, infinite, unique, or constant values, skewness, high correlation, high cardinality, non-stationarity, seasonality, duplicate records, and other inconsistencies. The source code, documentation, and examples are available in the GitHub repository: https://github.com/ydataai/ydata-profiling.
Pages: 10
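
The abstract lists several data characteristics that the package highlights as potential quality issues. As a rough illustration of what a few of those checks amount to (high missing ratio, infinite, constant, and unique values, duplicate records), here is a minimal standard-library sketch. It is not the package's implementation: the function name `simple_alerts`, the `missing_threshold` parameter, the alert labels, and the sample records are all hypothetical and chosen only for this example.

```python
from collections import Counter
import math


def _is_nan(value):
    """True only for float NaN; any other value counts as present."""
    return isinstance(value, float) and math.isnan(value)


def simple_alerts(rows, missing_threshold=0.5):
    """Return (column, alert) pairs for a list of dict records.

    Hypothetical helper for illustration; NOT part of ydata-profiling.
    """
    alerts = []
    if not rows:
        return alerts
    columns = sorted({key for row in rows for key in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and not _is_nan(v)]
        # Share of records where the column is None, NaN, or absent.
        missing_ratio = 1 - len(present) / len(values)
        if missing_ratio > missing_threshold:
            alerts.append((col, "high missing ratio"))
        if any(isinstance(v, float) and math.isinf(v) for v in present):
            alerts.append((col, "infinite values"))
        distinct = set(present)
        # Require at least two observed values before judging a column.
        if len(present) > 1 and len(distinct) == 1:
            alerts.append((col, "constant"))
        elif len(present) > 1 and len(distinct) == len(present):
            alerts.append((col, "unique"))
    # Duplicate records: rows whose values match on every column.
    row_counts = Counter(
        tuple((col, row.get(col)) for col in columns) for row in rows
    )
    if any(count > 1 for count in row_counts.values()):
        alerts.append(("<rows>", "duplicate records"))
    return alerts


# Made-up sample records, used only to exercise the checks above.
sample_rows = [
    {"id": 1, "country": "PT", "score": 0.5, "note": None},
    {"id": 2, "country": "PT", "score": float("inf"), "note": None},
    {"id": 3, "country": "PT", "score": 0.7, "note": None},
    {"id": 4, "country": "PT", "score": 0.7, "note": "ok"},
]

for column, alert in simple_alerts(sample_rows):
    print(f"{column}: {alert}")
```

In this sketch each check is a simple threshold or set comparison on one column at a time. Characteristics from the abstract that need more than that, such as correlation, skewness, non-stationarity, or seasonality, are deliberately left out.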