A Three-Parameter Rank-Frequency Relation in Natural Languages

Authors
Ding, Chenchen [1]
Utiyama, Masao [1]
Sumita, Eiichiro [1]
Affiliation
[1] Natl Inst Informat & Commun Technol, Adv Speech Translat Res & Dev Promot Ctr, Adv Translat Technol Lab, 3-5 Hikaridai, Seika, Kyoto 6190289, Japan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha}(r+\gamma)^{-\beta}$, where $f$ is the token frequency and $r$ is the rank by frequency, with $(\alpha, \beta, \gamma)$ as parameters. The formulation is derived from the empirical observation that $\mathrm{d}^2(x+y)/\mathrm{d}x^2$ is a typical impulse function, where $(x, y) = (\log r, \log f)$. The formulation reduces to the power law when $\beta = 0$ and to the Zipf-Mandelbrot law when $\alpha = 0$. From an investigation of multilingual corpora, we illustrate that $\alpha$ is related to the analytic features of syntax and $\beta + \gamma$ to those of morphology in natural languages.
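The relation in the abstract can be explored numerically. The following is a minimal sketch, not the authors' code: the function names, the constant `log_c`, and the parameter values are hypothetical choices made for illustration. It evaluates the relation in log-log coordinates, checks the power-law special case, and compares a finite-difference estimate of the second derivative of x + y with the closed form that follows from differentiating the stated formula, which is a single bump centered at x = log(gamma) with extreme value -beta/4.

```python
import numpy as np


def log_frequency(log_r, alpha, beta, gamma, log_c=0.0):
    """Return y = log f for f = C * r^(-alpha) * (r + gamma)^(-beta).

    log_c is a hypothetical normalization constant (log C); the abstract
    only states proportionality.
    """
    r = np.exp(log_r)
    return log_c - alpha * log_r - beta * np.log(r + gamma)


def second_derivative_x_plus_y(log_r, beta, gamma):
    """Closed form of d^2(x + y)/dx^2 obtained by differentiating the formula.

    With x = log r: d^2(x + y)/dx^2 = -beta * gamma * r / (r + gamma)^2.
    """
    r = np.exp(log_r)
    return -beta * gamma * r / (r + gamma) ** 2


# Illustrative parameter values only; they are not taken from the paper.
alpha, beta, gamma = 0.6, 0.8, 50.0

x = np.linspace(0.0, 12.0, 2401)          # x = log r on a uniform grid
y = log_frequency(x, alpha, beta, gamma)  # y = log f

# Finite-difference estimate of the second derivative of x + y.
numeric = np.gradient(np.gradient(x + y, x), x)
analytic = second_derivative_x_plus_y(x, beta, gamma)

# Largest disagreement away from the grid edges, where one-sided
# differences are less accurate.
max_error = np.max(np.abs(numeric[5:-5] - analytic[5:-5]))

# Location and height of the bump in the closed form.
peak_x = x[np.argmin(analytic)]
peak_value = analytic.min()

# Special case beta = 0: a pure power law, i.e. a straight line in log-log.
power_law = log_frequency(x, alpha, 0.0, gamma)

print(f"max finite-difference error: {max_error:.2e}")
print(f"bump location: {peak_x:.3f} (log gamma = {np.log(gamma):.3f})")
print(f"bump value: {peak_value:.4f} (-beta/4 = {-beta / 4:.4f})")
```

Setting `alpha=0.0` instead gives log f = log C - beta * log(r + gamma), which is the Zipf-Mandelbrot form named in the abstract.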
Pages: 460-464
Page count: 5