Solvable null model for the distribution of word frequencies

Cited by: 16
Authors
Fontanari, JF
Perlovsky, LI
Affiliations
[1] Univ Sao Paulo, Inst Fis Sao Carlos, BR-13560970 Sao Paulo, Brazil
[2] USAF, Res Lab, Hanscom AFB, MA 01731 USA
Source
PHYSICAL REVIEW E | 2004 / Vol. 70 / Issue 04
Funding
São Paulo Research Foundation (FAPESP), Brazil;
DOI
10.1103/PhysRevE.70.042901
Chinese Library Classification (CLC) number
O35 [Fluid mechanics]; O53 [Plasma physics];
Discipline classification code
070204 ; 080103 ; 080704 ;
Abstract
Zipf's law asserts that in all natural languages the frequency of a word is inversely proportional to its rank. The significance, if any, of this result for language remains a mystery. Here we examine a null hypothesis for the distribution of word frequencies, a so-called discourse-triggered word choice model, which is based on the assumption that the more a word is used, the more likely it is to be used again. We argue that this model is equivalent to the neutral infinite-alleles model of population genetics and so the degeneracy of the different words composing a sample of text is given by the celebrated Ewens sampling formula [Theor. Pop. Biol. 3, 87 (1972)], which we show to produce an exponential distribution of word frequencies.
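The rich-get-richer rule described in the abstract can be illustrated with a short simulation. The sketch below is not the authors' code. It assumes the model can be simulated as a Hoppe urn, a standard sequential construction of the neutral infinite-alleles sample: the token at position i is a brand-new word with probability theta / (theta + i), and otherwise repeats a uniformly chosen earlier token, so a word is reused in proportion to how often it has already appeared. The parameter name theta and the function names are illustrative choices, not taken from the paper.

```python
import random
from collections import Counter


def simulate_text(n_tokens, theta, seed=0):
    """Generate word ids with a Hoppe urn (assumed stand-in for the model).

    theta > 0 is a hypothetical innovation parameter: larger values
    introduce new words more often.
    """
    rng = random.Random(seed)
    tokens = []
    next_word_id = 0
    for i in range(n_tokens):
        # A new word appears with probability theta / (theta + i);
        # for i == 0 this equals 1, so the first token is always new.
        if rng.random() < theta / (theta + i):
            tokens.append(next_word_id)
            next_word_id += 1
        else:
            # Copy a uniformly chosen earlier token: a word used k times
            # so far is picked with probability proportional to k.
            tokens.append(rng.choice(tokens))
    return tokens


def expected_distinct_words(n_tokens, theta):
    """Expected number of distinct words in a sample of n_tokens tokens."""
    return sum(theta / (theta + i) for i in range(n_tokens))


def frequency_spectrum(tokens):
    """Map j -> number of distinct words that occur exactly j times."""
    word_counts = Counter(tokens)
    return Counter(word_counts.values())


if __name__ == "__main__":
    sample = simulate_text(n_tokens=10_000, theta=5.0)
    print("distinct words:", len(set(sample)))
    print("expected:", round(expected_distinct_words(10_000, 5.0), 2))
    print("words seen once:", frequency_spectrum(sample)[1])
```

The frequency spectrum returned here (how many words occur exactly j times) is the quantity whose distribution the Ewens sampling formula describes. Comparing the simulated number of distinct words with expected_distinct_words over several seeds is a simple sanity check on the sketch.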
Pages: 4
Related papers
22 in total
[1]  
[Anonymous], 1979, Monte Carlo Methods, DOI 10.1007/978-94-009-5819-7
[2]  
[Anonymous], 1949, Human behaviour and the principle of least-effort
[3]   The variant call format and VCFtools [J].
Danecek, Petr ;
Auton, Adam ;
Abecasis, Goncalo ;
Albers, Cornelis A. ;
Banks, Eric ;
DePristo, Mark A. ;
Handsaker, Robert E. ;
Lunter, Gerton ;
Marth, Gabor T. ;
Sherry, Stephen T. ;
McVean, Gilean ;
Durbin, Richard .
BIOINFORMATICS, 2011, 27 (15) :2156-2158
[4]  
EWENS WJ, 1972, THEOR POPUL BIOL, V3, P87, DOI 10.1016/0040-5809(72)90035-4
[5]   Language-tree divergence times support the Anatolian theory of Indo-European origin [J].
Gray, RD ;
Atkinson, QD .
NATURE, 2003, 426 (6965) :435-439
[6]   Zipf's law and the effect of ranking on probability distributions [J].
Gunther, R ;
Levitin, L ;
Schapiro, B ;
Wagner, P .
INTERNATIONAL JOURNAL OF THEORETICAL PHYSICS, 1996, 35 (02) :395-417
[7]   FREQUENCY-DISTRIBUTIONS IN POPULATION-GENETICS PARALLEL THOSE IN STATISTICAL PHYSICS [J].
HIGGS, PG .
PHYSICAL REVIEW E, 1995, 51 (01) :95-101
[8]   SAMPLING THEORY OF SELECTIVELY NEUTRAL ALLELES [J].
KARLIN, S ;
MCGREGOR, J .
THEORETICAL POPULATION BIOLOGY, 1972, 3 (01) :113-&
[9]  
KIMURA M, 1971, Theoretical Population Biology, V2, P174
[10]   Spontaneous evolution of linguistic structure - An iterated learning model of the emergence of regularity and irregularity [J].
Kirby, S .
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2001, 5 (02) :102-110