A formal derivation of Heaps' law

Cited by: 53
Authors
van Leijenhorst, DC [1]
van der Weide, TP [1]
Affiliation
[1] Radboud Univ Nijmegen, Fac Math & Comp Sci, Dept Comp Sci, NL-6525 ED Nijmegen, Netherlands
Keywords
Mandelbrot distribution; Statistical distribution; Statistical model
DOI
10.1016/j.ins.2004.03.006
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Word frequencies in text documents can be reasonably described by the Mandelbrot distribution, which has Zipf's Law as a special case. Furthermore, the growth of vocabulary size as a function of the text size (its number of words) has been described in Heaps' Law. It has been shown that these two experimental laws are related. In this paper we go a step further, and provide a (formal) derivation of Heaps' Law from the Mandelbrot distribution. We also provide a specification of the validity area for applying Heaps' Law. © 2004 Elsevier Inc. All rights reserved.
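The relationship described in the abstract can be illustrated empirically. The following is a minimal sketch, not the paper's formal derivation: it draws tokens from a Mandelbrot distribution, taken here in its usual form p(r) ∝ 1/(r + b)^a over word ranks r, tracks the vocabulary size V(n) after n tokens, and fits the usual Heaps form V(n) ≈ K·n^β by least squares in log-log space. The function names, the parameter values (a, b, the number of word types, the seed) and the fitting procedure are all assumptions chosen for illustration; none of them come from the paper.

```python
import math
import random
from itertools import accumulate


def mandelbrot_weights(n_types, a=1.2, b=2.7):
    """Unnormalized Mandelbrot weights 1 / (rank + b) ** a for ranks 1..n_types.

    a and b are illustrative values, not parameters reported in the paper.
    """
    return [1.0 / (rank + b) ** a for rank in range(1, n_types + 1)]


def vocabulary_growth(n_tokens, n_types=20_000, a=1.2, b=2.7, seed=0):
    """Sample n_tokens word ids and return [V(1), ..., V(n_tokens)],
    where V(n) is the number of distinct ids among the first n tokens."""
    rng = random.Random(seed)
    cum_weights = list(accumulate(mandelbrot_weights(n_types, a, b)))
    tokens = rng.choices(range(n_types), cum_weights=cum_weights, k=n_tokens)
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def fit_heaps(growth, start=100):
    """Least-squares fit of log V(n) = log K + beta * log n for n >= start.

    Returns (K, beta). Skipping the first few tokens is an ad hoc choice.
    """
    ns = range(start, len(growth) + 1)
    xs = [math.log(n) for n in ns]
    ys = [math.log(growth[n - 1]) for n in ns]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


if __name__ == "__main__":
    growth = vocabulary_growth(50_000)
    k, beta = fit_heaps(growth)
    print(f"Fitted Heaps parameters: K = {k:.2f}, beta = {beta:.3f}")
```

Because the simulated vocabulary is finite, V(n) eventually saturates, so the fitted exponent depends on the range of n used. This is one informal way to see why a validity area for Heaps' Law, as mentioned in the abstract, matters.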
Pages: 263-272
Number of pages: 10