Visualizing bivariate long-tailed data

Cited by: 1
Authors
Dyer, Justin S. [1]
Owen, Art B. [1]
Affiliation
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Source
Electronic Journal of Statistics | 2011 / Vol. 5
Funding
U.S. National Science Foundation
Keywords
Copula; bivariate Zipf; bipartite preferential attachment; preferential attachment; Zipf-Mandelbrot; complex networks
DOI
10.1214/11-EJS622
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Variables in large data sets in biology or e-commerce often have a head made up of very frequent values and a long tail of ever rarer values. Models such as Zipf or Zipf-Mandelbrot provide a good description. The problem we address here is the visualization of two such long-tailed variables, as one might see in a bivariate Zipf context. We introduce a copula plot to display the joint behavior of such variables. The plot uses an empirical ordering of the data; we prove that this ordering is asymptotically accurate in a Zipf-Mandelbrot-Poisson model. We often see an association between entities at the head of one variable and those from the tail of the other. We present two generative models (saturation and bipartite preferential attachment) that show such qualitative behavior, and we characterize the power-law behavior of the marginal distributions in these models.
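The idea of a copula plot built from an empirical ordering can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' construction: the Zipf-Mandelbrot-Poisson parameterization `scale * (i + b) ** -alpha`, the helper names `zipf_mandelbrot_poisson`, `top_k_order_correct`, and `copula_coords`, and the choice to place each entity's observations at the midpoint of its data-weighted block are all hypothetical. Entities are ordered by observed frequency (head first), and each observation is mapped into (0, 1) so that each margin is approximately uniform.

```python
import numpy as np

def zipf_mandelbrot_poisson(n_items, alpha, b, scale, rng):
    """Counts N_i ~ Poisson(scale * (i + b) ** -alpha) for items i = 1..n_items.

    Hypothetical parameterization chosen for illustration only.
    """
    i = np.arange(1, n_items + 1)
    return rng.poisson(scale * (i + b) ** (-alpha))

def top_k_order_correct(counts, k):
    """True if ordering items by observed count (largest first) recovers
    the true order of the first k items (item 0 has the largest mean)."""
    observed = np.argsort(-counts, kind="stable")[:k]
    return bool(np.array_equal(observed, np.arange(k)))

def copula_coords(labels):
    """Map each observation's entity label to a coordinate in (0, 1).

    Entities are sorted by observed frequency (head first). Each entity owns
    a block of the data-weighted cumulative distribution, and its observations
    are placed at the block midpoint, so the marginal is roughly uniform.
    Frequency ties are broken by label order (stable sort).
    """
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    order = np.argsort(-counts, kind="stable")          # entity indices, head first
    sorted_counts = counts[order]
    upper = np.cumsum(sorted_counts)                    # block upper edges
    mid = (upper - sorted_counts / 2.0) / len(labels)   # block midpoints in (0, 1)
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))                 # rank of each entity
    return mid[rank[inverse]]

rng = np.random.default_rng(0)
counts_x = zipf_mandelbrot_poisson(2000, alpha=1.1, b=2.0, scale=5e4, rng=rng)
counts_y = zipf_mandelbrot_poisson(500, alpha=0.9, b=1.0, scale=5e4, rng=rng)

# Toy paired data: x labels repeated by their counts; y labels drawn
# independently from the y frequencies, so there is no association here.
x = np.repeat(np.arange(counts_x.size), counts_x)
y = rng.choice(counts_y.size, size=x.size, p=counts_y / counts_y.sum())

u, v = copula_coords(x), copula_coords(y)
# A copula plot would scatter (u, v), e.g. with matplotlib: plt.scatter(u, v, s=1)
```

Because every observation of an entity sits at the midpoint of that entity's block, each margin has mean exactly 1/2. The toy pairs here are independent, so a scatter of `(u, v)` should show no association; data of the kind described above could instead show head entities of one variable concentrating against tail entities of the other. With a large `scale`, `top_k_order_correct(counts_x, 5)` illustrates how the empirical ordering of the head tends to match the true ordering.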
Pages: 642–668
Page count: 27