Profiler: Integrated Statistical Analysis and Visualization for Data Quality Assessment

Cited by: 121
Authors
Kandel, Sean [1]
Parikh, Ravi [1]
Paepcke, Andreas [1]
Hellerstein, Joseph M.
Heer, Jeffrey [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Source
PROCEEDINGS OF THE INTERNATIONAL WORKING CONFERENCE ON ADVANCED VISUAL INTERFACES | 2012
Funding
U.S. National Science Foundation
Keywords
Data analysis; visualization; data quality; anomaly detection
DOI
10.1145/2254556.2254659
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data quality issues such as missing, erroneous, extreme and duplicate values undermine analysis and are time-consuming to find and fix. Automated methods can help identify anomalies, but determining what constitutes an error is context-dependent and so requires human judgment. While visualization tools can facilitate this process, analysts must often manually construct the necessary views, requiring significant expertise. We present Profiler, a visual analysis tool for assessing quality issues in tabular data. Profiler applies data mining methods to automatically flag problematic data and suggests coordinated summary visualizations for assessing the data in context. The system contributes novel methods for integrated statistical and visual analysis, automatic view suggestion, and scalable visual summaries that support real-time interaction with millions of data points. We present Profiler's architecture, including modular components for custom data types, anomaly detection routines and summary visualizations, and describe its application to motion picture, natural disaster and water quality data sets.
Pages: 547-554
Page count: 8