Quantifying and comparing features in high-dimensional datasets

Cited by: 22
Authors
Piringer, Harald [1]
Berger, Wolfgang [1]
Hauser, Helwig
Affiliations
[1] VRVis Res Ctr, Vienna, Austria
Source
PROCEEDINGS OF THE 12TH INTERNATIONAL INFORMATION VISUALISATION | 2008
DOI
10.1109/IV.2008.17
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Linking and brushing is a proven approach to analyzing multi-dimensional datasets in the context of multiple coordinated views. Nevertheless, most of the respective visualization techniques offer only qualitative visual results. Many user tasks, however, also require precise quantitative results such as those offered by statistical analysis. Building on the Rank-by-Feature Framework, this paper describes a joint visual and statistical approach for guiding the user through a high-dimensional dataset by ranking dimensions (1D case) and pairs of dimensions (2D case) according to statistical summaries. While the original Rank-by-Feature Framework is limited to global features, the most important novelty here is the consideration of local features, i.e., data subsets defined by brushing in linked views. The ability to compare subsets with other subsets, and subsets with the whole dataset, across a large number of dimensions significantly extends the benefits of the approach, especially in later stages of an exploratory data analysis. A case study illustrates the workflow by analyzing keyword counts for classifying e-mails as spam or non-spam.
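As a rough illustration of the 1D case described in the abstract, the sketch below ranks dimensions by how strongly a brushed subset differs from the whole dataset. The abstract does not say which statistical summaries the paper uses, so the score here (absolute difference between subset mean and global mean, in units of the global standard deviation) is an assumption chosen for simplicity. The function name `rank_dimensions`, the keyword names, and the toy counts are all hypothetical.

```python
from statistics import mean, pstdev


def rank_dimensions(rows, selected, names):
    """Rank dimensions by how far the brushed subset's mean lies from the
    global mean, measured in global standard deviations (assumed score)."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]           # whole dataset
        subset = [rows[i][j] for i in selected]     # brushed records only
        spread = pstdev(column)
        # A constant dimension cannot separate the subset; score it as 0.
        score = 0.0 if spread == 0 else abs(mean(subset) - mean(column)) / spread
        scores.append((name, score))
    # Highest score first: the dimensions where the subset stands out most.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: keyword counts per e-mail (one row per e-mail).
names = ["free", "meeting", "the"]
rows = [
    [5, 0, 3],
    [4, 0, 2],
    [0, 2, 3],
    [0, 3, 2],
]
selected = [0, 1]  # indices of the records selected by a brush

ranking = rank_dimensions(rows, selected, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data the brushed e-mails differ most from the rest in the count of "free" and not at all in the count of "the", so those dimensions land at the top and bottom of the ranking. Comparing one subset with another, rather than with the whole dataset, would only require passing a second index list and replacing the global column with that subset.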
Pages: 240-245
Number of pages: 6