Is Least-Squares Inaccurate in Fitting Power-Law Distributions? The Criticism is Complete Nonsense

Cited by: 2
Authors
Zhong, Xiaoshi [1 ]
Wang, Muyin [1 ]
Zhang, Hongkun [1 ]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22) | 2022
Keywords
Power-law distributions; least-squares estimation (LSE); average strategy; long-tailed noises; GOODNESS-OF-FIT; PARAMETER-ESTIMATION; STATISTICS; ABUNDANCE; SPECTRA; SYSTEMS; MODEL;
DOI
10.1145/3485447.3511995
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Ordinary least-squares estimation is proven to be the best linear unbiased estimator by the Gauss-Markov theorem. Over the last two decades, however, some researchers have claimed that least-squares is substantially inaccurate in fitting power-law distributions, and this criticism has caused a strong bias in the research community. In this paper, we conduct extensive experiments to show that such criticism is complete nonsense. Specifically, we sample discrete and continuous data of different sizes from power-law models and show that, even though the long-tailed noises are sampled from power-law models, they cannot be treated as power-law data. We define the correct way to bin continuous power-law data into data points and propose an average strategy for least-squares to fit power-law distributions. Experiments on both simulated and real-world data show that our proposed method fits power-law data perfectly. We uncover a fundamental flaw in the popular method proposed by Clauset et al. [12]: it tends to discard the majority of the power-law data and fit the long-tailed noises. Experiments also show that the reverse cumulative distribution function is a poor choice for plotting power-law data in practice, because it usually hides the true probability distribution of the data. We hope that our research can dispel the bias against using least-squares to fit power-law distributions. Source code can be found at https://github.com/xszhong/LSavg.
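As a rough illustration of the kind of fit the abstract discusses, the sketch below applies ordinary least squares to log-transformed frequencies of discrete power-law data. It is not the authors' average strategy or their binning procedure, which the abstract does not spell out; the function name `fit_power_law_lse` and the noise-free test data are assumptions made here for illustration only.

```python
import numpy as np

def fit_power_law_lse(values, freqs):
    """Fit freq ~ C * value**(-alpha) by ordinary least squares in log-log space.

    Hypothetical helper for illustration; not taken from the paper's code.
    Returns (alpha, C).
    """
    values = np.asarray(values, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # Logarithms are only defined for strictly positive entries.
    mask = (values > 0) & (freqs > 0)
    # Straight-line fit: log(freq) = intercept + slope * log(value).
    slope, intercept = np.polyfit(np.log(values[mask]), np.log(freqs[mask]), 1)
    return -slope, np.exp(intercept)

# Noise-free discrete power law on x = 1..100 (assumed toy data, no sampling noise).
x = np.arange(1, 101).astype(float)
true_alpha = 2.5
p = x ** (-true_alpha)
p = p / p.sum()  # normalize to a probability mass function

alpha_hat, c_hat = fit_power_law_lse(x, p)
print(f"estimated alpha = {alpha_hat:.4f}")
```

On exact power-law frequencies the log-log relation is a straight line, so the fit recovers the exponent up to floating-point error. With sampled data, the sparse tail frequencies are the long-tailed noises the abstract refers to, and how they are handled is exactly where the paper's method differs from this plain fit.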
Pages: 2748-2758
Number of pages: 11
Related papers
62 in total
  • [1] Adamic L.A., 2000, J ELECT COMMERCE, V1, P5, DOI 10.2139/SSRN.166108
  • [2] Internet: Diameter of the World-Wide Web
    Albert, R
    Jeong, H
    Barabási, AL
    [J]. NATURE, 1999, 401 (6749) : 130 - 131
  • [3] How rare are power-law networks really?
    Artico, I.
    Smolyarenko, I.
    Vinciotti, V.
    Wit, E. C.
    [J]. PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2020, 476 (2241):
  • [4] A/B Testing with Fat Tails
    Azevedo, Eduardo M.
    Deng, Alex
    Montiel Olea, Jose Luis
    Rao, Justin
    Weyl, E. Glen
    [J]. JOURNAL OF POLITICAL ECONOMY, 2020, 128 (12) : 4614 - 4672
  • [5] Emergence of scaling in random networks
    Barabási, AL
    Albert, R
    [J]. SCIENCE, 1999, 286 (5439) : 509 - 512
  • [7] Blasius Bernd, 2020, CHAOS, V30, P9
  • [8] Scaling of fracture systems in geological media
    Bonnet, E
    Bour, O
    Odling, NE
    Davy, P
    Main, I
    Cowie, P
    Berkowitz, B
    [J]. REVIEWS OF GEOPHYSICS, 2001, 39 (03) : 347 - 383
  • [9] Heavy-tailed distributions for building stock data
    Bradley, Patrick Erik
    Behnisch, Martin
    [J]. ENVIRONMENT AND PLANNING B-URBAN ANALYTICS AND CITY SCIENCE, 2019, 46 (07) : 1281 - 1296
  • [10] Scale-free networks are rare
    Broido, Anna D.
    Clauset, Aaron
    [J]. NATURE COMMUNICATIONS, 2019, 10 (1)