ydata-profiling: Accelerating data-centric AI with high-quality data

Times cited: 6
Authors
Clemente, Fabiana [1]
Ribeiro, Goncalo Martins [1]
Quemy, Alexandre [1]
Santos, Miriam Seoane [1]
Pereira, Ricardo Cardoso [1]
Barros, Alex [1]
Affiliation
[1] YData Labs Inc, Seattle, WA 98121 USA
Keywords
Exploratory data analysis; Data profiling; Data quality; Data-centric AI; Data Intrinsic Characteristics; Data Complexity; TRENDS; CLASSIFICATION; AUTOENCODERS; IMPUTATION
DOI
10.1016/j.neucom.2023.126585
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
ydata-profiling is an open-source Python package for advanced exploratory data analysis that lets users generate data profiling reports simply, quickly, and efficiently, fostering a standardized and visual understanding of the data. Beyond traditional descriptive properties and statistics, ydata-profiling follows a Data-Centric AI approach to exploratory analysis: it focuses on automatically detecting and highlighting complex data characteristics that are often associated with potential data quality issues, such as high ratios of missing or imbalanced data, infinite, unique, or constant values, skewness, high correlation, high cardinality, non-stationarity, seasonality, duplicate records, and other inconsistencies. The source code, documentation, and examples are available in the GitHub repository: https://github.com/ydataai/ydata-profiling.
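To make a few of the checks listed in the abstract concrete, the sketch below computes some of them (missing ratio, constant, unique, infinite, skewed, and high-cardinality columns, plus duplicate records) with the Python standard library only. It is not ydata-profiling's implementation: the function names `column_alerts` and `duplicate_rows`, the alert labels, and the thresholds are hypothetical choices for illustration. With the package itself, a report is normally produced from a pandas DataFrame with `ProfileReport(df).to_file("report.html")` (imported from `ydata_profiling`); the repository documentation describes the actual options.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical thresholds for illustration; NOT ydata-profiling's defaults.
MISSING_RATIO = 0.2      # flag columns with more than 20% missing values
HIGH_CARDINALITY = 50    # flag text columns with more than 50 distinct values
SKEW_THRESHOLD = 1.0     # flag numeric columns with |skewness| > 1


def skewness(values):
    """Population (Fisher-Pearson) skewness of a list of finite numbers."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def column_alerts(rows):
    """Return {column: [alerts]} for a list of dict records (None = missing)."""
    if not rows:
        return {}
    n, alerts = len(rows), {}
    for col in rows[0].keys():
        present = [r.get(col) for r in rows if r.get(col) is not None]
        found = []
        if 1 - len(present) / n > MISSING_RATIO:
            found.append("MISSING")
        distinct = len(set(present))
        if distinct == 1:
            found.append("CONSTANT")
        elif present and distinct == len(present):
            found.append("UNIQUE")
        # High cardinality is only checked for text (categorical-like) columns.
        if all(isinstance(v, str) for v in present) and distinct > HIGH_CARDINALITY:
            found.append("HIGH_CARDINALITY")
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if any(math.isinf(v) for v in numeric):
            found.append("INFINITE")
        finite = [v for v in numeric if math.isfinite(v)]
        if len(finite) > 2 and abs(skewness(finite)) > SKEW_THRESHOLD:
            found.append("SKEWED")
        if found:
            alerts[col] = found
    return alerts


def duplicate_rows(rows):
    """Count records that repeat an earlier record (values must be hashable)."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return sum(c - 1 for c in counts.values() if c > 1)


# Small made-up dataset used only to exercise the checks.
rows = [
    {"id": 1, "country": "PT", "amount": 1.0, "score": None},
    {"id": 2, "country": "PT", "amount": 1.0, "score": 0.5},
    {"id": 3, "country": "PT", "amount": 2.0, "score": 0.7},
    {"id": 4, "country": "PT", "amount": 100.0, "score": 0.5},
]
print(column_alerts(rows))
print(duplicate_rows(rows + [rows[0]]))
```

On this toy data, `id` is flagged as unique, `country` as constant, `amount` as skewed by its single large value, and `score` as having too many missing values; appending a copy of the first record yields one duplicate. A real profiler computes far more (correlations, stationarity, seasonality) and works on typed columns rather than Python dicts.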
Pages: 10