Research directions in data wrangling: Visualizations and transformations for usable and credible data

Times cited: 191
Authors
Kandel, Sean [1]
Heer, Jeffrey [1]
Plaisant, Catherine [2]
Kennedy, Jessie [3]
van Ham, Frank
Riche, Nathalie Henry [4]
Weaver, Chris [5]
Lee, Bongshin [4]
Brodbeck, Dominique
Buono, Paolo [6]
Affiliations
[1] Stanford Univ, Dept Comp Sci, San Francisco, CA 94107 USA
[2] Univ Maryland, Human Comp Interact Lab, College Pk, MD 20742 USA
[3] Edinburgh Napier Univ, Inst Informat & Digital Innovat, Edinburgh, Midlothian, Scotland
[4] Microsoft Res, Redmond, WA USA
[5] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
[6] Univ Bari Aldo Moro, Dipartimento Informat, Bari, Italy
Funding
US National Science Foundation
Keywords
data cleaning; data quality; data transformation; uncertainty; visualization; TOOL
DOI
10.1177/1473871611415994
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
In spite of advances in technologies for working with data, analysts still spend an inordinate amount of time diagnosing data quality issues and manipulating data into a usable form. This process of 'data wrangling' often constitutes the most tedious and time-consuming aspect of analysis. Though data cleaning and integration are longstanding issues in the database community, relatively little research has explored how interactive visualization can advance the state of the art. In this article, we review the challenges and opportunities associated with addressing data quality issues. We argue that analysts might more effectively wrangle data through new interactive systems that integrate data verification, transformation, and visualization. We identify a number of outstanding research questions, including how appropriate visual encodings can facilitate apprehension of missing data, discrepant values, and uncertainty; how interactive visualizations might facilitate data transform specification; and how recorded provenance and social interaction might enable wider reuse, verification, and modification of data transformations.
Pages: 271-288
Page count: 18