The Generalized Pairs Plot

Cited by: 91
Authors
Emerson, John W. [1]
Green, Walton A. [2]
Schloerke, Barret [3]
Crowley, Jason [3]
Cook, Dianne [3]
Hofmann, Heike [3]
Wickham, Hadley [4]
Affiliations
[1] Yale Univ, Dept Stat, New Haven, CT 06520 USA
[2] Harvard Univ, Dept Organism & Evolutionary Biol, Cambridge, MA 02138 USA
[3] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
[4] Rice Univ, Dept Stat, Houston, TX 77251 USA
Keywords
Exploratory data analysis; Grammar of graphics; Graphics; Multivariate data; Scatterplot matrix; Visualization; DISPLAYS;
DOI
10.1080/10618600.2012.694762
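Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code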
020208 ; 070103 ; 0714 ;
Abstract
This article develops a generalization of the scatterplot matrix based on the recognition that most datasets include both categorical and quantitative information. Traditional grids of scatterplots often obscure important features of the data when one or more variables are categorical but coded as numerical. The generalized pairs plot offers a range of displays of paired combinations of categorical and quantitative variables. A mosaic plot, fluctuation diagram, or faceted bar chart may be used to display two categorical variables. A side-by-side boxplot, stripplot, faceted histogram, or density plot helps visualize a categorical and a quantitative variable. A traditional scatterplot is suitable for displaying a pair of numerical variables, but options also support density contours or annotating summary statistics such as the correlation and number of missing values, for example. By combining these, the generalized pairs plot may help to reveal structure in multivariate data that otherwise might go unnoticed in the process of exploratory data analysis. Two different R packages provide implementations of the generalized pairs plot, gpairs and GGally. Supplementary materials for this article are available online on the journal web site.
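The pairing rule described in the abstract can be sketched in a few lines of Python. This is only an illustration of the idea, not the gpairs or GGally implementation: the names `PANEL_OPTIONS`, `variable_kind`, and `panel_grid` are hypothetical, the toy columns are made up, and the type check is a naive assumption. In particular, it would misclassify a categorical variable coded as numbers, which is exactly the case the abstract warns about, so real use would need the variable types declared explicitly.

```python
from itertools import product

# Hypothetical lookup: displays the abstract lists for each pair of variable kinds.
PANEL_OPTIONS = {
    ("categorical", "categorical"): [
        "mosaic plot", "fluctuation diagram", "faceted bar chart"],
    ("categorical", "quantitative"): [
        "side-by-side boxplot", "stripplot", "faceted histogram", "density plot"],
    ("quantitative", "quantitative"): [
        "scatterplot", "density contours", "summary statistics"],
}


def variable_kind(values):
    """Naive assumption: a column is quantitative if every non-missing value is a number."""
    present = [v for v in values if v is not None]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "quantitative" if present and numeric else "categorical"


def panel_grid(columns):
    """Pick the first listed display for every off-diagonal cell of the pairs grid."""
    kinds = {name: variable_kind(vals) for name, vals in columns.items()}
    grid = {}
    for row, col in product(columns, repeat=2):
        if row == col:
            continue  # diagonal cells show a single variable, not a pair
        key = tuple(sorted((kinds[row], kinds[col])))
        grid[(row, col)] = PANEL_OPTIONS[key][0]
    return grid


# Toy columns, invented purely for illustration.
columns = {
    "species": ["a", "b", "a"],
    "site": ["x", "y", "x"],
    "length": [1.0, 2.5, 3.1],
    "width": [0.4, 0.9, None],
}
grid = panel_grid(columns)
for cell, display in grid.items():
    print(cell, "->", display)
```

Taking the first entry of each list is an arbitrary default; the abstract presents the alternatives as options for the same kind of cell, so a fuller sketch would let the caller choose among them.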
Pages: 79-91
Page count: 13