Power-Law Distributions in Empirical Data

Times cited: 6167
Authors
Clauset, Aaron [1, 2]
Shalizi, Cosma Rohilla [3]
Newman, M. E. J. [4, 5]
Affiliations
[1] Santa Fe Inst, Santa Fe, NM 87501 USA
[2] Univ New Mexico, Dept Comp Sci, Albuquerque, NM 87131 USA
[3] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
[4] Univ Michigan, Dept Phys, Ann Arbor, MI 48109 USA
[5] Univ Michigan, Ctr Study Complex Syst, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA)
Keywords
power-law distributions; Pareto; Zipf; maximum likelihood; heavy-tailed distributions; likelihood ratio test; model selection; likelihood ratio; inference; tests
DOI
10.1137/070710111
Chinese Library Classification number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution (the part of the distribution representing large but rare events) and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data, and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.
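The abstract names the ingredients of the framework (maximum-likelihood fitting plus a KS-based measure of fit) but gives no formulas. The Python sketch below illustrates those ingredients under loudly stated assumptions: the data are continuous and positive; the model is the continuous power law p(x) proportional to x^(-alpha) for x >= x_min; alpha is estimated with the standard closed-form continuous MLE, alpha_hat = 1 + n / sum_i ln(x_i / x_min); and x_min is chosen, as an assumption of this sketch, by scanning the observed values and keeping the one whose fit has the smallest KS distance. All function names (sample_power_law, fit_alpha, ks_distance, fit_power_law) are hypothetical. The sketch stops at the KS distance itself; it does not implement the goodness-of-fit test or the likelihood-ratio comparisons with alternative distributions that the abstract describes, and it does not handle discrete data.

```python
import math
import random
from bisect import bisect_left
from typing import List, Tuple


def sample_power_law(n: int, alpha: float, xmin: float, seed: int = 0) -> List[float]:
    """Draw n continuous power-law samples (x >= xmin) by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so every sample is finite and >= xmin.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(tail: List[float], xmin: float) -> float:
    """Closed-form continuous MLE: alpha = 1 + n / sum(ln(x / xmin)) over the tail x >= xmin."""
    log_sum = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / log_sum


def ks_distance(tail: List[float], xmin: float, alpha: float) -> float:
    """Largest gap between the empirical CDF of the tail and the fitted power-law CDF."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Fitted CDF of the continuous power law above xmin.
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides of the step.
        d = max(d, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return d


def fit_power_law(data: List[float]) -> Tuple[float, float, float]:
    """Return (xmin, alpha, ks) for the candidate xmin with the smallest KS distance.

    Assumption of this sketch: xmin is selected by minimizing the KS distance
    over the distinct observed values.
    """
    xs = sorted(data)
    best = None
    # Skip the largest distinct value so every tail has at least one x > xmin
    # (otherwise the log-sum in fit_alpha would be zero).
    for xmin in sorted(set(xs))[:-1]:
        tail = xs[bisect_left(xs, xmin):]  # all observations >= xmin
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    if best is None:
        raise ValueError("need at least two distinct positive observations")
    return best


if __name__ == "__main__":
    # Synthetic check: data drawn from a pure power law with alpha = 2.5, xmin = 1.
    data = sample_power_law(1000, alpha=2.5, xmin=1.0, seed=1)
    xmin, alpha, d = fit_power_law(data)
    print(f"xmin={xmin:.3f} alpha={alpha:.3f} KS={d:.4f}")
```

The x_min scan is quadratic in the number of distinct values, which is acceptable for a few thousand points but would need a smarter search (or a capped candidate list) for large data sets. Running it on synthetic data, as in the usage block, is a quick way to confirm that the estimated alpha lands near the value used to generate the sample.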
Pages: 661–703
Number of pages: 43